Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data

Overview of attention for article published in BMC Bioinformatics, November 2015

About this Attention Score

  • Above-average Attention Score compared to outputs of the same age (59th percentile)
  • Average Attention Score compared to outputs of the same age and source

Mentioned by

  • Twitter: 6 tweeters

Citations

  • Dimensions: 24

Readers on

  • Mendeley: 47
Title: Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data
Published in: BMC Bioinformatics, November 2015
DOI: 10.1186/s12859-015-0797-4
Authors: Ralf Eggeling, Teemu Roos, Petri Myllymäki, Ivo Grosse

Abstract

Statistical modeling of transcription factor binding sites is one of the classical fields in bioinformatics. The position weight matrix (PWM) model, which assumes statistical independence among all nucleotides in a binding site, was the standard model for this task for more than three decades, but its simple assumptions are increasingly being called into question. Recent high-throughput sequencing methods have provided data sets of sufficient size and quality for studying the benefits of more complex models. However, learning more complex models typically entails the danger of overfitting. Model classes that dynamically adapt the model complexity to the data have been developed, but effective model selection is to date possible only for fully observable data and not, e.g., within de novo motif discovery.

To address this issue, we propose a stochastic algorithm for performing robust model selection in a latent variable setting. This algorithm yields a solution without relying on hyperparameter tuning via massive cross-validation or other computationally expensive resampling techniques. Using this algorithm for learning inhomogeneous parsimonious Markov models, we study the degree of putative higher-order intra-motif dependencies for transcription factor binding sites inferred via de novo motif discovery from ChIP-seq data. We find that intra-motif dependencies are prevalent and not limited to first-order dependencies among directly adjacent nucleotides; second-order models appear to be the significantly better choice.

The traditional PWM model appears to be insufficient for inferring realistic sequence motifs, as it is on average outperformed by more complex models that take intra-motif dependencies into account. Moreover, using such models together with an appropriate model selection procedure does not lead to a significant performance loss in comparison with the PWM model for any of the studied transcription factors. Hence, we recommend that any modern motif discovery algorithm attempt to take intra-motif dependencies into account.
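
To make the independence assumption concrete, the following is a minimal Python sketch that contrasts a PWM with a simple position-specific first-order model on toy data. It is not the authors' method: the paper uses inhomogeneous parsimonious Markov models with model selection in a latent variable setting, whereas this sketch assumes fully observed, aligned sites and a fixed first-order context. The function names, the pseudocount, and the toy sites are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def fit_pwm(sites, pseudocount=1.0):
    """Estimate per-position base probabilities, treating positions as independent."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm


def fit_first_order(sites, pseudocount=1.0):
    """Estimate position-specific P(base | previous base); position 0 has no context."""
    width = len(sites[0])
    model = [fit_pwm(sites, pseudocount)[0]]
    for pos in range(1, width):
        pairs = Counter((site[pos - 1], site[pos]) for site in sites)
        conditional = {}
        for prev in ALPHABET:
            total = sum(pairs[(prev, b)] for b in ALPHABET) + pseudocount * len(ALPHABET)
            conditional[prev] = {
                b: (pairs[(prev, b)] + pseudocount) / total for b in ALPHABET
            }
        model.append(conditional)
    return model


def log_lik_pwm(pwm, seq):
    """Log-likelihood of seq under the independent-position model."""
    return sum(math.log(pwm[i][base]) for i, base in enumerate(seq))


def log_lik_first_order(model, seq):
    """Log-likelihood of seq when each base depends on its left neighbour."""
    ll = math.log(model[0][seq[0]])
    for i in range(1, len(seq)):
        ll += math.log(model[i][seq[i - 1]][seq[i]])
    return ll


# Hypothetical toy sites: positions 1 and 2 co-vary (CG or TA, never CA or TG).
sites = ["ACGT"] * 10 + ["ATAT"] * 10
pwm = fit_pwm(sites)
first_order = fit_first_order(sites)

for candidate in ("ACGT", "ACAT"):
    print(candidate, log_lik_pwm(pwm, candidate), log_lik_first_order(first_order, candidate))
```

In this toy set the PWM cannot distinguish the observed site ACGT from the never-observed combination ACAT, because each position looks the same in isolation, while the first-order model assigns ACAT a lower likelihood. Real data would additionally require the kind of model selection the abstract describes to avoid overfitting the larger parameter set.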

Twitter Demographics

The data shown below were collected from the profiles of 6 tweeters who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 47 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
Germany              2     4%
United States        1     2%
Spain                1     2%
Unknown             43    91%

Demographic breakdown

Readers by professional status                  Count   As %
Student > Ph.D. Student                            13    28%
Researcher                                         10    21%
Student > Master                                   10    21%
Student > Bachelor                                  5    11%
Student > Doctoral Student                          3     6%
Other                                               5    11%
Unknown                                             1     2%

Readers by discipline                           Count   As %
Agricultural and Biological Sciences               20    43%
Biochemistry, Genetics and Molecular Biology       13    28%
Computer Science                                    9    19%
Immunology and Microbiology                         1     2%
Sports and Recreations                              1     2%
Other                                               1     2%
Unknown                                             2     4%

Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 04 May 2016.
All research outputs: #5,223,194 of 10,444,782 outputs
Outputs from BMC Bioinformatics: #2,332 of 4,169 outputs
Outputs of similar age: #96,729 of 251,416 outputs
Outputs of similar age from BMC Bioinformatics: #72 of 138 outputs
Altmetric has tracked 10,444,782 research outputs across all sources so far. This one is in the 48th percentile – i.e., 48% of other outputs scored the same or lower than it.
So far Altmetric has tracked 4,169 research outputs from this source. They receive a mean Attention Score of 4.9. This one is in the 41st percentile – i.e., 41% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 251,416 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 59% of its contemporaries.
We're also able to compare this research output to 138 others from the same source and published within six weeks on either side of this one. This one is in the 42nd percentile – i.e., 42% of its contemporaries scored the same or lower than it.
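
The percentile wording used above ("scored the same or lower") can be illustrated with a short Python sketch. This assumes the percentile is simply the share of peer scores that are less than or equal to the output's own score; Altmetric's exact handling of ties and rounding is not stated on this page, and the function name and the peer scores below are hypothetical.

```python
def percentile_same_or_lower(score, peer_scores):
    """Return the percentage of peer scores that are <= the given score."""
    if not peer_scores:
        raise ValueError("peer_scores must not be empty")
    same_or_lower = sum(1 for s in peer_scores if s <= score)
    return 100.0 * same_or_lower / len(peer_scores)


# Hypothetical Attention Scores for ten peer outputs (not real Altmetric data).
toy_peers = [0, 1, 1, 2, 2, 3, 5, 8, 13, 40]
toy_percentile = percentile_same_or_lower(2, toy_peers)
print(toy_percentile)
```

Because tied scores all count as "same or lower", a percentile computed this way need not match the position implied by the rank alone.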