Albert Xue

The vast majority of disease-associated genetic variants are located in non-coding regions, which represent more than 97% of the human genome. A systematic delineation of the mechanisms by which such non-coding variants cause disease requires accurate identification of the downstream target genes whose expression levels these variants regulate in diverse tissues. Because these target genes are highly tissue-specific and are usually located far from the variants in the 1D genome, predicting them has so far proven difficult. Previous studies have mostly focused on short-range regulation and are limited by low statistical power, overfitting across different tissues, and difficulty modeling the non-linear associations of multi-enhancer regulation. An efficient and robust machine learning algorithm is therefore needed to predict long-range links between regulatory elements and distal target genes via 3D chromatin interactions. Based on our preliminary analysis, we propose to leverage protein-protein interactions (PPIs) as features to predict long-range chromatin interactions. We will integrate PPIs, transcription factor binding, and chromatin and epigenetic signals to fine-tune a convolutional neural network under a transfer learning framework to achieve better accuracy on long-range regulation predictions. In addition, we will use our model to characterize the key PPIs that are important in establishing and maintaining tissue-specific regulatory links between enhancers and distal target genes, which will provide novel mechanistic insights into 3D chromatin structure formation. The predicted long-range regulatory networks will be a valuable platform for interpreting the functional roles of non-coding genetic variants and decoding the genetic basis of human disease.