Title | r2VIM: A new variable selection method for random forests in genome-wide association studies |
---|---|
Published in | BioData Mining, February 2016 |
DOI | 10.1186/s13040-016-0087-3 |
Pubmed ID | |
Authors | Silke Szymczak, Emily Holzinger, Abhijit Dasgupta, James D. Malley, Anne M. Molloy, James L. Mills, Lawrence C. Brody, Dwight Stambolian, Joan E. Bailey-Wilson |
Abstract | Machine learning methods and in particular random forests (RFs) are a promising alternative to standard single SNP analyses in genome-wide association studies (GWAS). RFs provide variable importance measures (VIMs) to rank SNPs according to their predictive power. However, in contrast to the established genome-wide significance threshold, no clear criteria exist to determine how many SNPs should be selected for downstream analyses. We propose a new variable selection approach, recurrent relative variable importance measure (r2VIM). Importance values are calculated relative to an observed minimal importance score for several runs of RF and only SNPs with large relative VIMs in all of the runs are selected as important. Evaluations on simulated GWAS data show that the new method controls the number of false-positives under the null hypothesis. Under a simple alternative hypothesis with several independent main effects it is only slightly less powerful than logistic regression. In an experimental GWAS data set, the same strong signal is identified while the approach selects none of the SNPs in an underpowered GWAS. The novel variable selection method r2VIM is a promising extension to standard RF for objectively selecting relevant SNPs in GWAS while controlling the number of false-positive results. |
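The selection rule summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes importance scores that are assumed to have already been computed by several RF runs (for example, permutation importance from runs with different random seeds), divides each run's scores by the absolute value of that run's minimal score, and keeps a SNP only if its relative importance exceeds a threshold in every run. The function names `relative_importance` and `r2vim_select`, the `threshold` parameter, and the toy SNP labels and scores are hypothetical.

```python
from typing import Dict, List, Sequence


def relative_importance(vims: Dict[str, float]) -> Dict[str, float]:
    """Scale one RF run's importance scores by the magnitude of its minimal score.

    Assumption: the minimal observed importance is negative (typical for
    permutation importance of null SNPs), so its absolute value is used as
    an estimate of the noise level of importance scores in that run.
    """
    min_vim = min(vims.values())
    if min_vim >= 0:
        raise ValueError("expected a negative minimal importance score")
    scale = abs(min_vim)
    return {snp: v / scale for snp, v in vims.items()}


def r2vim_select(runs: Sequence[Dict[str, float]], threshold: float = 1.0) -> List[str]:
    """Return SNPs whose relative importance exceeds `threshold` in every run.

    Requiring the minimum across runs to exceed the threshold is equivalent
    to requiring a large relative importance in all of the runs.
    """
    relative_runs = [relative_importance(run) for run in runs]
    snps = relative_runs[0].keys()  # assumes every run scores the same SNPs
    selected = [
        snp for snp in snps
        if min(rel[snp] for rel in relative_runs) > threshold
    ]
    return sorted(selected)


# Hypothetical importance scores from three RF runs (toy values, not real data).
runs = [
    {"snp_a": 0.050, "snp_b": 0.004, "snp_c": -0.002, "snp_d": 0.010},
    {"snp_a": 0.045, "snp_b": -0.003, "snp_c": 0.001, "snp_d": 0.012},
    {"snp_a": 0.060, "snp_b": 0.002, "snp_c": -0.004, "snp_d": 0.009},
]

print(r2vim_select(runs, threshold=3.0))  # → ['snp_a']
print(r2vim_select(runs, threshold=1.0))  # → ['snp_a', 'snp_d']
```

Taking the minimum across runs is what enforces the "in all of the runs" requirement: `snp_b` scores well in the first run but falls below zero in the second, so it is dropped. Raising `threshold` makes the selection more conservative, which is how a rule of this kind trades power for fewer false positives.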
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 3 | 27% |
France | 2 | 18% |
United Kingdom | 2 | 18% |
Unknown | 4 | 36% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 6 | 55% |
Members of the public | 5 | 45% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 1 | 1% |
Germany | 1 | 1% |
Unknown | 78 | 98% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 22 | 28% |
Student > Ph. D. Student | 14 | 18% |
Student > Master | 9 | 11% |
Student > Bachelor | 9 | 11% |
Student > Doctoral Student | 4 | 5% |
Other | 15 | 19% |
Unknown | 7 | 9% |

Readers by discipline | Count | As % |
---|---|---|
Agricultural and Biological Sciences | 22 | 28% |
Biochemistry, Genetics and Molecular Biology | 17 | 21% |
Computer Science | 11 | 14% |
Engineering | 6 | 8% |
Medicine and Dentistry | 4 | 5% |
Other | 9 | 11% |
Unknown | 11 | 14% |