Identification of missing variants by combining multiple analytic pipelines

Overview of attention for article published in BMC Bioinformatics, April 2018

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (81st percentile)
  • High Attention Score compared to outputs of the same age and source (89th percentile)

Mentioned by

  • News: 1 news outlet
  • X (Twitter): 2 X users

Citations

  • Dimensions: 11

Readers on

  • Mendeley: 32
Title: Identification of missing variants by combining multiple analytic pipelines
Published in: BMC Bioinformatics, April 2018
DOI: 10.1186/s12859-018-2151-0
Pubmed ID
Authors

Yingxue Ren, Joseph S. Reddy, Cyril Pottier, Vivekananda Sarangi, Shulan Tian, Jason P. Sinnwell, Shannon K. McDonnell, Joanna M. Biernacka, Minerva M. Carrasquillo, Owen A. Ross, Nilüfer Ertekin-Taner, Rosa Rademakers, Matthew Hudson, Liudmila Sergeevna Mainzer, Yan W. Asmann

Abstract

After decades of identifying risk factors using array-based genome-wide association studies (GWAS), genetic research on complex diseases has shifted to sequencing-based rare-variant discovery. This requires large sample sizes for statistical power and raises the question of whether current variant calling practices are adequate for large cohorts. It is well known that there are discrepancies between variants called by different pipelines, and that using a single pipeline always misses true variants exclusively identifiable by other pipelines. Nonetheless, it is common practice today to call variants with one pipeline because of computational cost, and to assume that false negative calls are a small percentage of the total.
We analyzed 10,000 exomes from the Alzheimer's Disease Sequencing Project (ADSP) using multiple analytic pipelines consisting of different read aligners and variant calling strategies. We compared variants identified by using two aligners in 50, 100, 200, 500, 1000, and 1952 samples, and compared variants identified by adding single-sample genotyping to the default multi-sample joint genotyping in 50, 100, 500, 2000, 5000, and 10,000 samples. We found that a single pipeline missed a number of high-quality variants that increased with sample size. By combining two read aligners and two variant calling strategies, we rescued 30% of pass-QC variants at a sample size of 2000, and 56% at 10,000 samples. The rescued variants had higher proportions of low-frequency (minor allele frequency [MAF] 1-5%) and rare (MAF < 1%) variants, which are the very types of variants of interest. In 660 Alzheimer's disease cases with earlier onset ages of ≤65 years, 4 out of 13 (31%) previously published rare pathogenic and protective mutations in the APP, PSEN1, and PSEN2 genes were undetected by the default one-pipeline approach but recovered by the multi-pipeline approach.
Identification of the complete variant set from sequencing data is a prerequisite for genetic association analyses. The current analytic practice of calling genetic variants from sequencing data using a single bioinformatics pipeline is no longer adequate for increasingly large projects. The number and percentage of variants that passed quality filters but were missed by the one-pipeline approach increased rapidly with sample size.
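
As an illustrative sketch of the multi-pipeline idea described in the abstract (not the authors' implementation), the following Python snippet treats each pipeline's output as a set of variants keyed by chromosome, position, reference allele, and alternate allele, takes their union, and reports the variants contributed only by the additional pipeline. The `Variant` type, the `combine_call_sets` function, the toy variants, and the choice to express the rescued fraction relative to the combined set are all assumptions made for illustration; the abstract does not state the exact denominator used.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, position, ref allele, alt allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def combine_call_sets(
    default_calls: Iterable[Variant],
    extra_calls: Iterable[Variant],
) -> Tuple[Set[Variant], Set[Variant], float]:
    """Union two call sets and report what the extra pipeline adds.

    Returns (combined, rescued, rescued_fraction), where `rescued` holds
    variants absent from the default pipeline and `rescued_fraction` is
    len(rescued) / len(combined). The denominator is an assumption.
    """
    default_set = set(default_calls)
    combined = default_set | set(extra_calls)
    rescued = combined - default_set
    fraction = len(rescued) / len(combined) if combined else 0.0
    return combined, rescued, fraction


# Toy call sets (made-up variants, not data from the study).
default_pipeline = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "C", "T"),
    Variant("chr2", 500, "G", "A"),
}
second_pipeline = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "G", "A"),
    Variant("chr3", 750, "T", "C"),
}

combined, rescued, fraction = combine_call_sets(default_pipeline, second_pipeline)
print(len(combined), sorted(rescued), fraction)
```

In a real analysis the call sets would come from quality-filtered VCF files, and variants would typically be normalized (for example, left-aligned and split into biallelic records) before comparison so that identical variants are keyed identically across pipelines.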

X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 32 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    32       100%

Demographic breakdown

Readers by professional status                  Count    As %
Researcher                                      9        28%
Student > Ph.D. Student                         5        16%
Student > Master                                4        13%
Other                                           3        9%
Professor > Associate Professor                 2        6%
Other                                           3        9%
Unknown                                         6        19%

Readers by discipline                           Count    As %
Biochemistry, Genetics and Molecular Biology    6        19%
Medicine and Dentistry                          6        19%
Neuroscience                                    5        16%
Agricultural and Biological Sciences            4        13%
Computer Science                                2        6%
Other                                           3        9%
Unknown                                         6        19%

Attention Score in Context

This research output has an Altmetric Attention Score of 11. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 20 April 2018.

  • All research outputs: #2,856,023 of 23,043,346 outputs
  • Outputs from BMC Bioinformatics: #970 of 7,318 outputs
  • Outputs of similar age: #56,022 of 296,868 outputs
  • Outputs of similar age from BMC Bioinformatics: #11 of 108 outputs

Altmetric has tracked 23,043,346 research outputs across all sources so far. Compared to these, this one has done well and is in the 87th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 7,318 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one has done well, scoring higher than 86% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score to the 296,868 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 81% of its contemporaries.
We're also able to compare this research output to 108 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 89% of its contemporaries.
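
The percentiles quoted above are consistent with a simple rank-based calculation. The sketch below is an assumption about how such figures could be derived, not Altmetric's documented method: it takes the share of outputs ranked below this one and rounds down to a whole percent. The function name `percentile_from_rank` is hypothetical.

```python
import math


def percentile_from_rank(rank: int, total: int) -> int:
    """Percent of outputs ranked below `rank` out of `total`, rounded down.

    Assumed formula: floor(100 * (1 - rank / total)).
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return math.floor(100 * (1 - rank / total))


# Rank and cohort size for each comparison group listed on this page.
rankings = {
    "All research outputs": (2_856_023, 23_043_346),
    "Outputs from BMC Bioinformatics": (970, 7_318),
    "Outputs of similar age": (56_022, 296_868),
    "Outputs of similar age from BMC Bioinformatics": (11, 108),
}

for label, (rank, total) in rankings.items():
    print(f"{label}: {percentile_from_rank(rank, total)}")
# All research outputs: 87
# Outputs from BMC Bioinformatics: 86
# Outputs of similar age: 81
# Outputs of similar age from BMC Bioinformatics: 89
```

Under this assumed formula, the four ranks reproduce the 87th, 86th, 81st, and 89th percentile figures stated in the text.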