High Resolution Models of Transcription Factor-DNA Affinities Improve In Vitro and In Vivo Binding Predictions

Overview of attention for article published in PLoS Computational Biology, September 2010

Mentioned by
1 X user

Citations
52 Dimensions

Readers on
145 Mendeley
17 CiteULike
Published in: PLoS Computational Biology, September 2010
DOI: 10.1371/journal.pcbi.1000916
Pubmed ID:
Authors: Phaedra Agius, Aaron Arvey, William Chang, William Stafford Noble, Christina Leslie

Abstract

Accurately modeling the DNA sequence preferences of transcription factors (TFs), and using these models to predict in vivo genomic binding sites for TFs, are key pieces in deciphering the regulatory code. These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices (PSSMs), which may match large numbers of sites and produce an unreliable list of target genes. Recently, protein binding microarray (PBM) experiments have emerged as a new source of high-resolution data on in vitro TF binding specificities. PBM data has been analyzed either by estimating PSSMs or via rank statistics on probe intensities, so that individual sequence patterns are assigned enrichment scores (E-scores). This representation is informative but unwieldy because every TF is assigned a list of thousands of scored sequence patterns. Meanwhile, high-resolution in vivo TF occupancy data from ChIP-seq experiments is also increasingly available. We have developed a flexible discriminative framework for learning TF binding preferences from high-resolution in vitro and in vivo data. We first trained support vector regression (SVR) models on PBM data to learn the mapping from probe sequences to binding intensities. We used a novel k-mer based string kernel called the di-mismatch kernel to represent probe sequence similarities. The SVR models are more compact than E-scores, more expressive than PSSMs, and can be readily used to scan genomic regions to predict in vivo occupancy. Using a large data set of yeast and mouse TFs, we found that our SVR models can better predict probe intensity than the E-score method or PBM-derived PSSMs. Moreover, by using SVRs to score yeast, mouse, and human genomic regions, we were better able to predict genomic occupancy as measured by ChIP-chip and ChIP-seq experiments.
Finally, we found that by training kernel-based models directly on ChIP-seq data, we greatly improved in vivo occupancy prediction, and by comparing a TF's in vitro and in vivo models, we could identify cofactors and disambiguate direct and indirect binding.
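The abstract describes pairing support vector regression with a k-mer based string kernel and then using the trained model to scan genomic regions. As a rough illustration only, the Python sketch below substitutes a plain cosine-normalized k-mer spectrum kernel for the di-mismatch kernel (which additionally tolerates mismatches between k-mers and is not reproduced here). It applies the standard kernel-regression prediction form f(s) = sum_i a_i K(s_i, s) + b to every fixed-width window of a sequence. The helper names (`kmer_counts`, `spectrum_kernel`, `svr_predict`, `scan_region`), the choice k = 3, and the "support" sequences and coefficients in the usage example are all hypothetical and are not taken from the paper; in a real model the coefficients would come from training on PBM probe intensities.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA string (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> float:
    """Cosine-normalized k-mer spectrum kernel (a stand-in, NOT the di-mismatch kernel)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for absent k-mers, so only shared k-mers add to the dot product.
    dot = sum(n * cy[kmer] for kmer, n in cx.items())
    norm = sqrt(sum(n * n for n in cx.values()) * sum(n * n for n in cy.values()))
    return dot / norm if norm else 0.0


def svr_predict(seq, support_seqs, dual_coefs, bias, k=3):
    """Kernel-regression prediction f(s) = sum_i a_i * K(s_i, s) + b."""
    return bias + sum(a * spectrum_kernel(s_i, seq, k)
                      for s_i, a in zip(support_seqs, dual_coefs))


def scan_region(region, width, support_seqs, dual_coefs, bias, k=3):
    """Score every window of `width` bases; return a list of (start, score) pairs."""
    return [(start, svr_predict(region[start:start + width],
                                support_seqs, dual_coefs, bias, k))
            for start in range(len(region) - width + 1)]


if __name__ == "__main__":
    # Hypothetical support sequences and coefficients standing in for a trained model.
    support = ["GCGTGACA", "TTTTAAAA"]
    coefs = [1.0, -0.5]
    hits = scan_region("ATTTTAAAGCGTGACATT", 8, support, coefs, bias=0.0)
    print(max(hits, key=lambda h: h[1]))  # → (8, 1.0)
```

Because the prediction is just a weighted sum of kernel evaluations against stored sequences, the same function can score any window, which is what makes scanning straightforward. How window scores are combined into a per-region occupancy prediction is not stated here; taking the maximum, as in the usage example, is only one plausible choice.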

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 145 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
United States 9 6%
France 3 2%
Germany 2 1%
Sweden 2 1%
Canada 2 1%
United Kingdom 1 <1%
Hong Kong 1 <1%
Argentina 1 <1%
Singapore 1 <1%
Other 2 1%
Unknown 121 83%

Demographic breakdown

Readers by professional status

Professional status Count As %
Researcher 44 30%
Student > Ph. D. Student 43 30%
Student > Master 11 8%
Professor > Associate Professor 8 6%
Student > Bachelor 7 5%
Other 22 15%
Unknown 10 7%

Readers by discipline

Discipline Count As %
Agricultural and Biological Sciences 82 57%
Computer Science 19 13%
Biochemistry, Genetics and Molecular Biology 14 10%
Mathematics 4 3%
Medicine and Dentistry 3 2%
Other 8 6%
Unknown 15 10%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 18 November 2014.
All research outputs: #22,938,588 of 25,576,801 outputs
Outputs from PLoS Computational Biology: #8,612 of 9,003 outputs
Outputs of similar age: #100,113 of 105,254 outputs
Outputs of similar age from PLoS Computational Biology: #55 of 55 outputs
Altmetric has tracked 25,576,801 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.
So far Altmetric has tracked 9,003 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 20.4. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 105,254 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 55 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.