Title | Effective filtering strategies to improve data quality from population-based whole exome sequencing studies |
---|---|
Published in | BMC Bioinformatics, May 2014 |
DOI | 10.1186/1471-2105-15-125 |
Pubmed ID | |
Authors | Andrew R Carson, Erin N Smith, Hiroko Matsui, Sigrid K Brækkan, Kristen Jepsen, John-Bjarne Hansen, Kelly A Frazer |
Abstract | Genotypes generated in next generation sequencing studies contain errors which can significantly impact the power to detect signals in common and rare variant association tests. These genotyping errors are not explicitly filtered by the standard GATK Variant Quality Score Recalibration (VQSR) tool and thus remain a source of errors in whole exome sequencing (WES) projects that follow GATK's recommended best practices. Therefore, additional data filtering methods are required to effectively remove these errors before performing association analyses with complex phenotypes. Here we empirically derive thresholds for genotype and variant filters that, when used in conjunction with the VQSR tool, achieve higher data quality than when using VQSR alone. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Germany | 1 | 17% |
United Kingdom | 1 | 17% |
Norway | 1 | 17% |
Israel | 1 | 17% |
Unknown | 2 | 33% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 4 | 67% |
Members of the public | 2 | 33% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | <1% |
United Kingdom | 2 | <1% |
Italy | 2 | <1% |
Hong Kong | 1 | <1% |
South Africa | 1 | <1% |
Germany | 1 | <1% |
France | 1 | <1% |
Finland | 1 | <1% |
New Zealand | 1 | <1% |
Other | 1 | <1% |
Unknown | 264 | 95% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 67 | 24% |
Researcher | 57 | 21% |
Student > Master | 38 | 14% |
Other | 20 | 7% |
Student > Bachelor | 18 | 6% |
Other | 41 | 15% |
Unknown | 36 | 13% |
Readers by discipline | Count | As % |
---|---|---|
Agricultural and Biological Sciences | 111 | 40% |
Biochemistry, Genetics and Molecular Biology | 72 | 26% |
Medicine and Dentistry | 22 | 8% |
Computer Science | 11 | 4% |
Neuroscience | 8 | 3% |
Other | 9 | 3% |
Unknown | 44 | 16% |