Title | Genotyping Informatics and Quality Control for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort |
---|---|
Published in | Genetics, June 2015 |
DOI | 10.1534/genetics.115.178905 |
Pubmed ID | |
Authors | Mark N Kvale, Stephanie Hesselson, Thomas J Hoffmann, Yang Cao, David Chan, Sheryl Connell, Lisa A Croen, Brad P Dispensa, Jasmin Eshragh, Andrea Finn, Jeremy Gollub, Carlos Iribarren, Eric Jorgenson, Lawrence H Kushi, Richard Lao, Yontao Lu, Dana Ludwig, Gurpreet K Mathauda, William B McGuire, Gangwu Mei, Sunita Miles, Michael Mittman, Mohini Patil, Charles P Quesenberry, Dilrini Ranatunga, Sarah Rowell, Marianne Sadler, Lori C Sakoda, Michael Shapero, Ling Shen, Tanu Shenoy, David Smethurst, Carol P Somkin, Stephen K Van Den Eeden, Lawrence Walter, Eunice Wan, Teresa Webster, Rachel A Whitmer, Simon Wong, Chia Zau, Yiping Zhan, Catherine Schaefer, Pui-Yan Kwok, Neil Risch |
Abstract | The Kaiser Permanente (KP) Research Program on Genes, Environment and Health (RPGEH), in collaboration with the University of California, San Francisco, undertook genome-wide genotyping of over 100,000 subjects that constitute the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. The project, which generated over 70 billion genotypes, represents the first large-scale use of the Affymetrix Axiom Genotyping Solution. Because genotyping took place over a short 14-month period, creating a near-real-time analysis pipeline for experimental assay quality control and final optimized analyses was critical. Because of the multi-ethnic nature of the cohort, four different ethnic-specific arrays were employed to enhance genome-wide coverage. All assays were performed on DNA extracted from saliva samples. To improve sample call rates and significantly increase genotype concordance, we partitioned the cohort into disjoint packages of plates with similar assay contexts. Using strict QC criteria, the overall genotyping success rate was 103,067 out of 109,837 samples assayed (93.8%), with a range of 92.1% to 95.4% for the four different arrays. Similarly, the SNP genotyping success rate ranged from 98.1% to 99.4% across the four arrays, the variation mostly depending on how many SNPs were included as single copy versus double copy on a particular array. The high quality and large scale of genotype data created on this cohort, in conjunction with comprehensive longitudinal data from the KP electronic health records of participants, will enable a broad range of highly powered genome-wide association studies on a diversity of traits and conditions. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 25% |
United Kingdom | 1 | 13% |
Thailand | 1 | 13% |
Unknown | 4 | 50% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 4 | 50% |
Scientists | 2 | 25% |
Science communicators (journalists, bloggers, editors) | 1 | 13% |
Practitioners (doctors, other healthcare professionals) | 1 | 13% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United Kingdom | 2 | 2% |
Unknown | 95 | 98% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 20 | 21% |
Researcher | 20 | 21% |
Student > Bachelor | 11 | 11% |
Student > Master | 9 | 9% |
Professor | 5 | 5% |
Other | 16 | 16% |
Unknown | 16 | 16% |
Readers by discipline | Count | As % |
---|---|---|
Biochemistry, Genetics and Molecular Biology | 19 | 20% |
Agricultural and Biological Sciences | 16 | 16% |
Medicine and Dentistry | 14 | 14% |
Computer Science | 5 | 5% |
Psychology | 3 | 3% |
Other | 13 | 13% |
Unknown | 27 | 28% |