↓ Skip to main content

A method combining a random forest-based technique with the modeling of linkage disequilibrium through latent variables, to run multilocus genome-wide association studies

Overview of attention for article published in BMC Bioinformatics, March 2018
Altmetric Badge

Mentioned by

twitter
2 X users

Citations

dimensions_citation
7 Dimensions

Readers on

mendeley
36 Mendeley
Title
A method combining a random forest-based technique with the modeling of linkage disequilibrium through latent variables, to run multilocus genome-wide association studies
Published in
BMC Bioinformatics, March 2018
DOI 10.1186/s12859-018-2054-0
Pubmed ID
Authors

Christine Sinoquet

Abstract

Genome-wide association studies (GWASs) have been widely used to discover the genetic basis of complex phenotypes. However, standard single-SNP GWASs suffer from lack of power. In particular, they do not directly account for linkage disequilibrium, that is the dependences between SNPs (Single Nucleotide Polymorphisms). We present the comparative study of two multilocus GWAS strategies, in the random forest-based framework. The first method, T-Trees, was designed by Botta and collaborators (Botta et al., PLoS ONE 9(4):e93379, 2014). We designed the other method, which is an innovative hybrid method combining T-Trees with the modeling of linkage disequilibrium. Linkage disequilibrium is modeled through a collection of tree-shaped Bayesian networks with latent variables, following our former works (Mourad et al., BMC Bioinformatics 12(1):16, 2011). We compared the two methods, both on simulated and real data. For dominant and additive genetic models, in either of the conditions simulated, the hybrid approach always slightly performs better than T-Trees. We assessed predictive powers through the standard ROC technique on 14 real datasets. For 10 of the 14 datasets analyzed, the already high predicted power observed for T-Trees (0.910-0.946) can still be increased by up to 0.030. We also assessed whether the distributions of SNPs' scores obtained from T-Trees and the hybrid approach differed. Finally, we thoroughly analyzed the intersections of top 100 SNPs output by any two or the three methods amongst T-Trees, the hybrid approach, and the single-SNP method. The sophistication of T-Trees through finer linkage disequilibrium modeling is shown beneficial. The distributions of SNPs' scores generated by T-Trees and the hybrid approach are shown statistically different, which suggests complementary of the methods. In particular, for 12 of the 14 real datasets, the distribution tail of highest SNPs' scores shows larger values for the hybrid approach. Thus are pinpointed more interesting SNPs than by T-Trees, to be provided as a short list of prioritized SNPs, for a further analysis by biologists. Finally, among the 211 top 100 SNPs jointly detected by the single-SNP method, T-Trees and the hybrid approach over the 14 datasets, we identified 72 and 38 SNPs respectively present in the top25s and top10s for each method.

X Demographics

X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output. Click here to find out more about how the information was compiled.
Mendeley readers

Mendeley readers

The data shown below were compiled from readership statistics for 36 Mendeley readers of this research output. Click here to see the associated Mendeley record.

Geographical breakdown

Country Count As %
Unknown 36 100%

Demographic breakdown

Readers by professional status Count As %
Researcher 11 31%
Student > Ph. D. Student 9 25%
Student > Master 5 14%
Student > Doctoral Student 3 8%
Student > Bachelor 2 6%
Other 2 6%
Unknown 4 11%
Readers by discipline Count As %
Biochemistry, Genetics and Molecular Biology 12 33%
Agricultural and Biological Sciences 10 28%
Computer Science 4 11%
Neuroscience 2 6%
Medicine and Dentistry 1 3%
Other 1 3%
Unknown 6 17%
Attention Score in Context

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 29 March 2018.
All research outputs
#17,937,475
of 23,031,582 outputs
Outputs from BMC Bioinformatics
#5,970
of 7,316 outputs
Outputs of similar age
#239,780
of 330,033 outputs
Outputs of similar age from BMC Bioinformatics
#77
of 113 outputs
Altmetric has tracked 23,031,582 research outputs across all sources so far. This one is in the 19th percentile – i.e., 19% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,316 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 13th percentile – i.e., 13% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 330,033 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 22nd percentile – i.e., 22% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 113 others from the same source and published within six weeks on either side of this one. This one is in the 27th percentile – i.e., 27% of its contemporaries scored the same or lower than it.