Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions

Overview of attention for article published in BMC Systems Biology, August 2016

About this Attention Score

  • Average Attention Score compared to outputs of the same age
  • Above-average Attention Score compared to outputs of the same age and source (61st percentile)

Mentioned by

5 X users

Citations

29 Dimensions

Readers on

71 Mendeley
Published in
BMC Systems Biology, August 2016
DOI 10.1186/s12918-016-0302-3
Authors

Seong Gon Kim, Nawanol Theera-Ampornpunt, Chih-Hao Fang, Mrudul Harwani, Ananth Grama, Somali Chaterji

Abstract

Gene expression is mediated by specialized cis-regulatory modules (CRMs), the most prominent of which are called enhancers. Early experiments indicated that enhancers located far from the gene promoters are often responsible for mediating gene transcription. Knowing their properties, regulatory activity, and genomic targets is crucial to the functional understanding of cellular events, ranging from cellular homeostasis to differentiation. Recent genome-wide investigation of epigenomic marks has indicated that enhancer elements could be enriched for certain epigenomic marks, such as combinatorial patterns of histone modifications. Our efforts in this paper are motivated by these recent advances in epigenomic profiling methods, which have uncovered enhancer-associated chromatin features in different cell types and organisms. Specifically, we use recent state-of-the-art deep learning methods to develop a deep neural network (DNN)-based architecture, called EP-DNN, that predicts the presence and types of enhancers in the human genome. As features, it uses the levels of histone modifications at the peaks of the functional sites as well as in their adjacent regions. We apply EP-DNN to four different cell types: H1, IMR90, HepG2, and HeLa S3. We train EP-DNN using p300 binding sites as enhancers, and TSS (transcription start sites) and random non-DHS (non-DNase I hypersensitive) sites as non-enhancers. We use EP-DNN predictions to quantify the validation rate at different levels of confidence, and we compare EP-DNN against two state-of-the-art computational models for enhancer prediction, DEEP-ENCODE and RFECS. We find that EP-DNN is more accurate and takes less time to make predictions. Next, we develop methods to make EP-DNN interpretable by computing the importance of each input feature in the classification task.
This analysis indicates that the important histone modifications were distinct for different cell types, with some overlap; e.g., H3K27ac was important in cell type H1 but less so in HeLa S3, while H3K4me1 was relatively important in all four cell types. Finally, we use the feature importance analysis to reduce the number of input features needed to train the DNN, thereby reducing training time, which is often the computational bottleneck in using a DNN. In summary, EP-DNN achieves high prediction accuracy, with validation rates above 90% in the operational region of enhancer prediction for all four cell lines studied, outperforming DEEP-ENCODE and RFECS. We also developed a method to analyze a trained DNN and determine which histone modifications are important and, within each modification, which features proximal or distal to the enhancer site are important.
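The abstract describes two ideas that a short sketch can make concrete: each input is the signal of a histone mark at a site's peak and in its adjacent regions, and per-feature importances from a trained network can be summed by mark and by position (proximal versus distal). The Python below is a minimal sketch under stated assumptions: the mark list, the five-bin layout, the `permutation_importance`, `aggregate`, and `predict` helpers, and the synthetic data are all hypothetical. Permutation importance is a generic technique swapped in for illustration and is not necessarily the method the authors used for EP-DNN.

```python
import random

# Hypothetical feature layout (an assumption, not taken from the paper):
# each candidate site is described by the signal of each histone mark in
# N_BINS bins, with the peak in the middle bin and flanking bins on either side.
MARKS = ["H3K4me1", "H3K27ac"]
N_BINS = 5
CENTRE = N_BINS // 2  # index of the peak bin within each mark's block


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predict(X), y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled.

    Permutation importance is a generic, model-agnostic technique used here
    for illustration; it is not necessarily the method used for EP-DNN.
    """
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with column j replaced by its shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(predict, X_perm, y))
        scores.append(sum(drops) / n_repeats)
    return scores


def aggregate(scores):
    """Sum mark-major flat scores per histone mark and per bin position."""
    per_mark = {
        mark: sum(scores[i * N_BINS:(i + 1) * N_BINS])
        for i, mark in enumerate(MARKS)
    }
    per_bin = [
        sum(scores[i * N_BINS + b] for i in range(len(MARKS)))
        for b in range(N_BINS)
    ]
    return per_mark, per_bin


def predict(rows):
    """Hypothetical stand-in for a trained classifier: calls a site an
    enhancer when its H3K4me1 peak-bin signal is positive."""
    return [int(row[CENTRE] > 0) for row in rows]


# Purely synthetic data: 500 sites with random signal in every bin, labelled
# by the same rule the stand-in classifier uses.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(len(MARKS) * N_BINS)] for _ in range(500)]
y = predict(X)

per_mark, per_bin = aggregate(permutation_importance(predict, X, y))
print(per_mark)
print(per_bin)
```

Because the synthetic labels depend only on the H3K4me1 peak bin, the sketch attributes all of the importance to that mark and to the central bin. Applied to a real trained model, the same per-mark and per-bin sums would indicate which marks matter and whether their proximal or distal bins carry the signal.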

X Demographics

The data for this section were collected from the profiles of the 5 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 71 Mendeley readers of this research output.

Geographical breakdown

Country | Count | %
Unknown | 71 | 100%

Demographic breakdown

Readers by professional status | Count | %
Student > Ph. D. Student | 18 | 25%
Researcher | 11 | 15%
Student > Master | 8 | 11%
Student > Doctoral Student | 4 | 6%
Student > Postgraduate | 4 | 6%
Other | 13 | 18%
Unknown | 13 | 18%

Readers by discipline | Count | %
Computer Science | 21 | 30%
Biochemistry, Genetics and Molecular Biology | 11 | 15%
Agricultural and Biological Sciences | 11 | 15%
Engineering | 2 | 3%
Immunology and Microbiology | 2 | 3%
Other | 8 | 11%
Unknown | 16 | 23%
Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 06 August 2016.
  • All research outputs: #14,287,221 of 23,344,526 outputs
  • Outputs from BMC Systems Biology: #522 of 1,143 outputs
  • Outputs of similar age: #212,145 of 368,345 outputs
  • Outputs of similar age from BMC Systems Biology: #13 of 31 outputs
Altmetric has tracked 23,344,526 research outputs across all sources so far. This one is in the 37th percentile, i.e., 37% of other outputs scored the same or lower than it.
So far Altmetric has tracked 1,143 research outputs from this source. They receive a mean Attention Score of 3.6. Although this output's score of 2 is below that mean, it ranks above the median, scoring higher than 52% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score to the 368,345 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 41st percentile, i.e., 41% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 31 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 61% of its contemporaries.