Ketan Jog

Plants respond to environmental stresses in diverse ways. This diversity allows some plants to thrive and others to decline in a given environment, and it is largely driven by species-specific differences in the complex regulatory networks that control gene expression. Although expression data are abundant for model plant organisms, comparable repositories for plants of practical relevance are lacking. Transcription factor binding motifs (TFBMs) indicate which transcription factors can bind to a given site and subsequently regulate gene expression, but few of these TFBMs are known. These binding sites are found around the promoter and terminator regions of each gene. Because these spatially local sequences are informative for predicting gene expression, we combine attention mechanisms with convolutional neural networks to build our prediction model. Using transfer learning, we leverage existing data from the model organism A. thaliana to build a prediction model for gene expression under stress (e.g., cold, drought, salinity) in O. sativa. The convolutional layer identifies regulatory motifs as features, while the attention mechanism identifies combinatorial relationships and dependencies between these motifs to improve predictions. We observe improved performance in predicting gene expression in O. sativa after transferring binding motif information from A. thaliana to it. We also perform a comparative study of the motifs generated by the model against a library of existing TFBMs responsible for gene expression.
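To make the convolution-plus-attention idea concrete, here is a minimal, untrained sketch in plain Python. It is an illustration under stated assumptions and not the model described above. The filter width, the number of filters, the identity query/key/value projections, the mean pooling, and the sigmoid readout (read here as a probability of up-regulation under stress) are all hypothetical choices. The weights are random, so the output carries no biological meaning.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 floats; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def conv1d_relu(x, filters):
    """Slide each filter (width x 4 weights) along the sequence and apply ReLU.
    Each filter acts like a motif scanner; returns one activation vector per window."""
    width = len(filters[0])
    out = []
    for start in range(len(x) - width + 1):
        row = []
        for f in filters:
            score = sum(f[i][c] * x[start + i][c]
                        for i in range(width) for c in range(4))
            row.append(max(0.0, score))
        out.append(row)
    return out

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def self_attention(h):
    """Scaled dot-product self-attention over window positions.
    Assumption: identity Q/K/V projections, i.e. no learned attention weights."""
    d = len(h[0])
    out, weights = [], []
    for q in h:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in h]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(w[j] * h[j][c] for j in range(len(h))) for c in range(d)])
    return out, weights

def predict(seq, filters, readout):
    """Convolution -> self-attention -> mean pooling -> sigmoid readout."""
    h = conv1d_relu(one_hot(seq), filters)
    ctx, _ = self_attention(h)
    pooled = [sum(row[c] for row in ctx) / len(ctx) for c in range(len(ctx[0]))]
    logit = sum(w * p for w, p in zip(readout, pooled))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical setup: 8 random filters of width 6 and a random readout vector.
rng = random.Random(0)
filters = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)] for _ in range(8)]
readout = [rng.uniform(-1, 1) for _ in range(8)]

seq = "ACGTGGCCACGTGTTAACGT"  # made-up promoter fragment, not real data
p = predict(seq, filters, readout)
```

In a transfer-learning setting like the one described, one plausible arrangement (again an assumption) is to train `filters` on A. thaliana sequences and then reuse or fine-tune them for O. sativa, so that motif detectors learned in the data-rich species carry over.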