Report for: A method combining a random forest-based technique with the modeling of linkage disequilibrium through latent variables, to run multilocus genome-wide association studies

Title	A method combining a random forest-based technique with the modeling of linkage disequilibrium through latent variables, to run multilocus genome-wide association studies
Published in	BMC Bioinformatics, March 2018
DOI	10.1186/s12859-018-2054-0
Pubmed ID	29587628
Authors	Christine Sinoquet
Abstract	Genome-wide association studies (GWASs) have been widely used to discover the genetic basis of complex phenotypes. However, standard single-SNP GWASs suffer from lack of power. In particular, they do not directly account for linkage disequilibrium, that is, the dependences between SNPs (single nucleotide polymorphisms). We present the comparative study of two multilocus GWAS strategies, in the random forest-based framework. The first method, T-Trees, was designed by Botta and collaborators (Botta et al., PLoS ONE 9(4):e93379, 2014). We designed the other method, which is an innovative hybrid method combining T-Trees with the modeling of linkage disequilibrium. Linkage disequilibrium is modeled through a collection of tree-shaped Bayesian networks with latent variables, following our former works (Mourad et al., BMC Bioinformatics 12(1):16, 2011). We compared the two methods, both on simulated and real data. For dominant and additive genetic models, in either of the conditions simulated, the hybrid approach always performs slightly better than T-Trees. We assessed predictive powers through the standard ROC technique on 14 real datasets. For 10 of the 14 datasets analyzed, the already high predictive power observed for T-Trees (0.910-0.946) can still be increased by up to 0.030. We also assessed whether the distributions of SNPs' scores obtained from T-Trees and the hybrid approach differed. Finally, we thoroughly analyzed the intersections of the top 100 SNPs output by any two or all three of the methods amongst T-Trees, the hybrid approach, and the single-SNP method. The sophistication of T-Trees through finer linkage disequilibrium modeling is shown to be beneficial. The distributions of SNPs' scores generated by T-Trees and the hybrid approach are shown to be statistically different, which suggests complementarity of the methods. In particular, for 12 of the 14 real datasets, the distribution tail of highest SNPs' scores shows larger values for the hybrid approach. The hybrid approach thus pinpoints more interesting SNPs than T-Trees, to be provided as a short list of prioritized SNPs for further analysis by biologists. Finally, among the 211 top-100 SNPs jointly detected by the single-SNP method, T-Trees and the hybrid approach over the 14 datasets, we identified 72 and 38 SNPs respectively present in the top 25 and top 10 for each method.

X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.

Geographical breakdown

Country	Count	As %
Unknown	2	100%

Demographic breakdown

Type	Count	As %
Members of the public	2	100%

Mendeley readers

The data shown below were compiled from readership statistics for 36 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	36	100%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	11	31%
Student > Ph. D. Student	9	25%
Student > Master	5	14%
Student > Doctoral Student	3	8%
Student > Bachelor	2	6%
Other	2	6%
Unknown	4	11%

Readers by discipline	Count	As %
Biochemistry, Genetics and Molecular Biology	12	33%
Agricultural and Biological Sciences	10	28%
Computer Science	4	11%
Neuroscience	2	6%
Medicine and Dentistry	1	3%
Other	1	3%
Unknown	6	17%
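
The "As %" columns above can be recomputed from the raw counts. The following is a minimal sketch, assuming each share is the count divided by the table total and rounded to the nearest whole percent; the function name `percent_shares` is hypothetical, and the page does not state how the percentages were actually derived.

```python
def percent_shares(counts):
    """Return each count as a whole-number percentage of the total.

    Assumption: shares are rounded to the nearest integer percent.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must sum to a positive number")
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts copied from the "Readers by professional status" table.
readers_by_status = {
    "Researcher": 11,
    "Student > Ph. D. Student": 9,
    "Student > Master": 5,
    "Student > Doctoral Student": 3,
    "Student > Bachelor": 2,
    "Other": 2,
    "Unknown": 4,
}

shares = percent_shares(readers_by_status)
for label, pct in shares.items():
    print(f"{label}\t{readers_by_status[label]}\t{pct}%")
```

Because each share is rounded independently, the rounded percentages need not sum to exactly 100%.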

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 29 March 2018.

Context	Rank	Out of
All research outputs	#17,937,475	23,031,582 outputs
Outputs from BMC Bioinformatics	#5,970	7,316 outputs
Outputs of similar age	#239,780	330,033 outputs
Outputs of similar age from BMC Bioinformatics	#77	113 outputs

Altmetric has tracked 23,031,582 research outputs across all sources so far. This one is in the 19th percentile – i.e., 19% of other outputs scored the same or lower than it.

So far Altmetric has tracked 7,316 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 13th percentile – i.e., 13% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 330,033 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 22nd percentile – i.e., 22% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 113 others from the same source and published within six weeks on either side of this one. This one is in the 27th percentile – i.e., 27% of its contemporaries scored the same or lower than it.
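
The percentile wording used above ("X% of other outputs scored the same or lower") can be illustrated with a short sketch. It assumes the percentile is simply the share of comparison outputs whose score is less than or equal to this one; the function name and the peer scores are hypothetical, and Altmetric's exact calculation is not described on this page.

```python
def percentile_same_or_lower(score, other_scores):
    """Percentage of other outputs whose score is the same or lower.

    Assumption: a plain count of scores <= `score`, divided by the
    number of comparison outputs.
    """
    if not other_scores:
        raise ValueError("need at least one comparison score")
    same_or_lower = sum(1 for s in other_scores if s <= score)
    return 100 * same_or_lower / len(other_scores)


# Hypothetical Attention Scores for ten peer outputs (not real data).
peer_scores = [0.25, 1, 1, 3, 5, 8, 12, 20, 40, 110]

# An output with a score of 1 is matched or trailed by 3 of the 10 peers.
pct = percentile_same_or_lower(1, peer_scores)
print(f"{pct:.0f}th percentile")
```

Under this definition, ties count in the output's favor, so many outputs sharing the same low score fall in the same percentile.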
