InteractChrome: Combining Graphical Models and Graph Neural Networks to Identify Expression Guided Epigenetic Interactions
Abstract:
Gene regulation is a complex mechanism associated with differences in DNA sequence and with interactions among epigenetic factors. The recent revolution in genomic technologies has enabled genome-wide profiling of a large number of genomic and epigenomic measurements. The core aim of analyzing such genome-wide datasets is to identify the important regulatory factors and understand how they work together to influence gene regulation. Previous studies fall into two categories: (1) generative graphical models that learn a global, interpretable interaction network showing how epigenetic factors are conditionally dependent on each other, and (2) predictive models that associate inputs (such as chromatin measurements) with outputs (such as gene expression). Deep learning methods, which belong to the second category, have shown promising results; however, their adoption is hindered by the difficulty of explaining the resulting models, owing to their "black-box" nature. In this paper, we propose a complementary strategy that combines generative graphical model estimation with a deep neural network for interpretable data analysis. Our proposed method, InteractChrome, rests on the key intuition that the global interaction network estimated by a graphical model (GM) can provide a strong inductive bias that a deep learning model can exploit for downstream gene expression prediction via a graph convolutional network (GCN). This combination enables the GCN-based predictive model to identify how epigenetic factors collaborate in controlling a specific gene's expression (i.e., expression-guided interactions).
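The abstract does not specify the graphical model estimator or the GCN layer details, so the following is only a minimal NumPy sketch of the general GM-then-GCN idea under stated assumptions. As a simplified stand-in for a sparse Gaussian graphical model estimator (such as the graphical lasso), it inverts a ridge-regularized correlation matrix, converts the precision matrix to partial correlations, and keeps edges above a threshold. The resulting graph is normalized as D^{-1/2}(A + I)D^{-1/2}, the common Kipf and Welling GCN formulation, and used to propagate per-gene node features. The function names (`estimate_interaction_graph`, `normalized_adjacency`, `gcn_layer`), the ridge and threshold values, and the data shapes (5 marks, 100 bins per mark) are hypothetical choices for illustration, and the data are synthetic.

```python
import numpy as np

def estimate_interaction_graph(features, ridge=0.1, threshold=0.1):
    """Estimate an undirected dependency graph among the columns of `features`.

    Hypothetical stand-in for a sparse graphical model: standardize, invert a
    ridge-regularized correlation matrix, convert the precision matrix to
    partial correlations, and keep edges whose magnitude exceeds `threshold`.
    """
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    corr = z.T @ z / features.shape[0]
    precision = np.linalg.inv(corr + ridge * np.eye(corr.shape[0]))
    d = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(d, d)
    partial_corr = (partial_corr + partial_corr.T) / 2.0  # enforce exact symmetry
    adj = (np.abs(partial_corr) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-edges in the estimated network
    return adj

def normalized_adjacency(adj):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * np.outer(d_inv_sqrt, d_inv_sqrt)

def gcn_layer(a_norm, h, w):
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)

# Synthetic placeholder data: genes x marks x bins (illustrative shapes only).
rng = np.random.default_rng(0)
n_genes, n_marks, n_bins = 200, 5, 100
signals = rng.random((n_genes, n_marks, n_bins))
# Make mark 1 track mark 0 so the sketch has one real dependency to recover.
signals[:, 1, :] = signals[:, 0, :] + 0.1 * rng.random((n_genes, n_bins))

# One global graph over marks, estimated from per-gene mean signals.
adj = estimate_interaction_graph(signals.mean(axis=2))
a_norm = normalized_adjacency(adj)

# Gene-specific embeddings: the same graph, different node inputs per gene.
w1 = rng.normal(scale=0.1, size=(n_bins, 16))
h_gene0 = gcn_layer(a_norm, signals[0], w1)  # shape (n_marks, 16)
```

The design point this illustrates is that the graph is estimated once, globally, while each gene supplies its own node features, so the learned propagation is shared across genes but its outputs are gene-specific.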
Combining explicit relationship identification (via GM) with graph-guided deep prediction (via GCN) provides three main benefits: (1) gene-specific interpretation of how epigenetic factors interact and contribute to expression; (2) better parameter efficiency, since both GM and GCN are designed with parameter efficiency in mind; and (3) a novel, interpretable deep learning framework that is potentially applicable to similar problems. Empirically, we evaluate InteractChrome on 56 different cell types and achieve state-of-the-art performance in predicting gene expression from histone modification signals. More importantly, we show that gradient-based interpretation of InteractChrome provides a unique ability to identify the gene-specific importance of how epigenetic factors interact to influence regulation. InteractChrome validates previous observations regarding relationships between epigenetic factors and gene expression, and it extracts meaningful representations, as demonstrated by multiple t-SNE visualizations.
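The abstract mentions gradient-based interpretation without describing the model it is applied to. As a hedged illustration only, the sketch below uses a hypothetical one-layer GCN regressor, y = v · mean_nodes(ReLU(Â X W)), and derives its gradients analytically. The gradient with respect to the node inputs X gives per-mark, per-bin sensitivity (summarized here as input × gradient per mark), and the gradient with respect to the normalized adjacency Â gives a per-edge sensitivity that can be read as a gene-specific interaction score. A real implementation would more likely obtain these gradients by automatic differentiation; the model, the edge list, the names `forward` and `saliency`, and all shapes are placeholders, not InteractChrome's actual architecture.

```python
import numpy as np

def forward(a_norm, x, w, v):
    """Hypothetical one-layer GCN regressor: y = v . mean_nodes(ReLU(A X W))."""
    z = a_norm @ x @ w
    h = np.maximum(z, 0.0)
    return float(h.mean(axis=0) @ v), z

def saliency(a_norm, x, w, v):
    """Analytic gradients of y with respect to node inputs X and edge weights A."""
    _, z = forward(a_norm, x, w, v)
    n_nodes = x.shape[0]
    # dy/dZ: mean pooling spreads v over nodes; ReLU passes gradient only where z > 0.
    g = (z > 0).astype(float) * v[np.newaxis, :] / n_nodes
    grad_x = a_norm.T @ g @ w.T  # dy/dX, same shape as X
    grad_a = g @ (x @ w).T       # dy/dA, per-edge sensitivity
    return grad_x, grad_a

# Hypothetical graph over 5 marks, normalized as D^{-1/2} (A + I) D^{-1/2}.
n_marks, n_bins, hidden = 5, 100, 16
adj = np.zeros((n_marks, n_marks))
for i, j in [(0, 1), (1, 2), (0, 3)]:  # placeholder edges
    adj[i, j] = adj[j, i] = 1.0
a_hat = adj + np.eye(n_marks)
d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
a_norm = a_hat * np.outer(d_inv_sqrt, d_inv_sqrt)

# Synthetic inputs and untrained weights for a single gene.
rng = np.random.default_rng(1)
x = rng.random((n_marks, n_bins))
w = rng.normal(scale=0.1, size=(n_bins, hidden))
v = rng.normal(size=hidden)

grad_x, grad_a = saliency(a_norm, x, w, v)
mark_importance = np.abs(grad_x * x).sum(axis=1)  # input x gradient, per mark
edge_importance = np.abs(grad_a * a_norm)         # per mark pair; zero off-graph
```

Because each gene has its own input X, both importance scores change from gene to gene even though the graph is shared, which is the sense in which such interpretations are gene-specific.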
Committee:
- Alf Weaver (Chair)
- Yanjun Qi (Advisor)
- Worthy Martin
- Mary Lou Soffa
- Vicente Ordonez