Title | A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL |
---|---|
Published in | Genetic Epidemiology, January 2017 |
DOI | 10.1002/gepi.22029 |
Pubmed ID | |
Authors | Tamar Sofer, Ruth Heller, Marina Bogomolov, Christy L. Avery, Mariaelisa Graff, Kari E. North, Alex P. Reiner, Timothy A. Thornton, Kenneth Rice, Yoav Benjamini, Cathy C. Laurie, Kathleen F. Kerr |
Abstract | In genome-wide association studies (GWAS), "generalization" is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWER_g) and FDR (FDR_g) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of Single Nucleotide Polymorphism-(SNP)-trait associations. Our methods control FWER_g or FDR_g under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5×10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6×10⁻⁵ (89 regions), we generalized SNPs from 27 regions. |
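
To make the two-stage idea in the abstract concrete, here is a minimal Python sketch of a simple screen: select SNPs by a discovery P-value threshold, convert each follow-up P-value to a one-sided P-value in the direction seen at discovery, and apply a Bonferroni cutoff over the selected SNPs. This is an illustration only and is not the r-value procedure developed in the paper; the names `Snp`, `directional_p`, and `naive_generalization` are hypothetical, and the sketch assumes two-sided P-values and signed effect estimates are available from both studies.

```python
from typing import List, NamedTuple


class Snp(NamedTuple):
    """Hypothetical container for one SNP's summary statistics."""
    name: str
    p_discovery: float     # two-sided P-value in the discovery study
    beta_discovery: float  # signed effect estimate in the discovery study
    p_followup: float      # two-sided P-value in the follow-up study
    beta_followup: float   # signed effect estimate in the follow-up study


def directional_p(p_two_sided: float, same_direction: bool) -> float:
    """One-sided follow-up P-value in the direction observed at discovery.

    A consistent direction halves the two-sided P-value; an inconsistent
    direction maps it to 1 - p/2, i.e. evidence against generalization.
    """
    return p_two_sided / 2 if same_direction else 1 - p_two_sided / 2


def naive_generalization(
    snps: List[Snp], threshold: float, alpha: float = 0.05
) -> List[str]:
    """Simplified two-stage screen (assumption: not the paper's r-values).

    1. Keep SNPs whose discovery P-value is below `threshold`.
    2. Test each kept SNP in the follow-up study with a directional
       P-value, using a Bonferroni cutoff over the kept SNPs only.
    """
    selected = [s for s in snps if s.p_discovery < threshold]
    if not selected:
        return []
    cutoff = alpha / len(selected)
    generalized = []
    for s in selected:
        same = s.beta_discovery * s.beta_followup > 0
        if directional_p(s.p_followup, same) <= cutoff:
            generalized.append(s.name)
    return generalized
```

Loosening `threshold` admits more SNPs into the follow-up stage but tightens the per-SNP cutoff, which is the trade-off behind the abstract's observation that a more lenient threshold is often beneficial.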
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United Kingdom | 1 | 50% |
Unknown | 1 | 50% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 2 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 19 | 100% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 6 | 32% |
Researcher | 3 | 16% |
Student > Bachelor | 2 | 11% |
Student > Master | 2 | 11% |
Student > Doctoral Student | 1 | 5% |
Other | 1 | 5% |
Unknown | 4 | 21% |
Readers by discipline | Count | As % |
---|---|---|
Medicine and Dentistry | 5 | 26% |
Agricultural and Biological Sciences | 3 | 16% |
Biochemistry, Genetics and Molecular Biology | 2 | 11% |
Nursing and Health Professions | 1 | 5% |
Psychology | 1 | 5% |
Other | 1 | 5% |
Unknown | 6 | 32% |