A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy

Overview of attention for article published in BMC Bioinformatics, May 2017

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (89th percentile)
  • High Attention Score compared to outputs of the same age and source (92nd percentile)

Mentioned by

39 X users

Citations

156 Dimensions

Readers on

230 Mendeley
Title
A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy
Published in
BMC Bioinformatics, May 2017
DOI 10.1186/s12859-017-1670-4
Authors

Xiang Gao, Huaiying Lin, Kashi Revanna, Qunfeng Dong

Abstract

Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement.

We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes.

Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA.
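
The weighting idea in the abstract can be illustrated with a minimal Python sketch. This is not the BLCA implementation: the function names, the `sharpness` parameter, and the likelihood form (similarity raised to a power, under a uniform prior over hits) are all assumptions made for illustration, the taxon names are placeholders, and the bootstrap step described in the abstract is left out. Each database hit receives a normalized weight, and at every rank the taxon with the largest summed weight is reported together with that sum as its support.

```python
from collections import defaultdict

# Ranks are visited from species up to phylum, as in the abstract.
RANKS = ["species", "genus", "family", "order", "class", "phylum"]


def posterior_weights(similarities, sharpness=20.0):
    """Normalize per-hit likelihoods into weights that sum to 1.

    Assumption (hypothetical, not taken from the paper): uniform prior over
    hits and a likelihood proportional to similarity ** sharpness.
    """
    likelihoods = [s ** sharpness for s in similarities]
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


def classify(hits, sharpness=20.0):
    """Weighted vote at each rank.

    hits: list of (similarity, lineage) pairs, where similarity is in (0, 1]
    and lineage maps every rank in RANKS to a taxon name.
    Returns {rank: (best_taxon, summed_weight)}.
    """
    weights = posterior_weights([sim for sim, _ in hits], sharpness)
    result = {}
    for rank in RANKS:
        support = defaultdict(float)
        for weight, (_, lineage) in zip(weights, hits):
            support[lineage[rank]] += weight
        best = max(support, key=support.get)
        result[rank] = (best, support[best])
    return result


def lineage(species, genus):
    """Build a placeholder lineage; all higher ranks are shared."""
    return {"species": species, "genus": genus, "family": "FamilyX",
            "order": "OrderX", "class": "ClassX", "phylum": "PhylumX"}


example_hits = [
    (1.00, lineage("GenusA sp. 1", "GenusA")),
    (0.99, lineage("GenusA sp. 2", "GenusA")),
    (0.90, lineage("GenusB sp. 1", "GenusB")),
]

if __name__ == "__main__":
    for rank, (taxon, support) in classify(example_hits).items():
        print(f"{rank:8s} {taxon:14s} {support:.3f}")
```

In this toy input two near-identical hits belong to different species of the same genus, so the species call has only moderate support while the genus and higher ranks are strongly supported. That is the kind of distinction a similarity-weighted vote is meant to expose.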

X Demographics

The data shown below were collected from the profiles of 39 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 230 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Sweden 1 <1%
Switzerland 1 <1%
Unknown 228 99%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 51 22%
Student > Master 43 19%
Researcher 40 17%
Student > Bachelor 23 10%
Student > Doctoral Student 9 4%
Other 25 11%
Unknown 39 17%

Readers by discipline Count As %
Agricultural and Biological Sciences 72 31%
Biochemistry, Genetics and Molecular Biology 35 15%
Environmental Science 18 8%
Immunology and Microbiology 14 6%
Computer Science 11 5%
Other 32 14%
Unknown 48 21%
Attention Score in Context

This research output has an Altmetric Attention Score of 20. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 21 May 2019.
All research outputs: #1,780,149 of 24,885,505 outputs
Outputs from BMC Bioinformatics: #340 of 7,601 outputs
Outputs of similar age: #33,710 of 316,006 outputs
Outputs of similar age from BMC Bioinformatics: #9 of 101 outputs
Altmetric has tracked 24,885,505 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 92nd percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 7,601 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has done particularly well, scoring higher than 95% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 316,006 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 89% of its contemporaries.
We're also able to compare this research output to 101 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 92% of its contemporaries.
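
The percentages quoted above can be related to the rank figures with a short calculation. The page does not state how Altmetric derives them, so the formula below is an assumption: the share of the other outputs in the comparison group that rank lower, rounded down. The function name `percent_outscored` is hypothetical. Under that assumption the formula reproduces all four figures quoted on this page.

```python
def percent_outscored(rank: int, total: int) -> int:
    """Whole-number percentage of the other outputs that rank below this one.

    Assumed formula (not documented on this page):
        floor(100 * (total - rank) / (total - 1))
    Integer arithmetic is used so the rounding is exact.
    """
    if total < 2 or not 1 <= rank <= total:
        raise ValueError("rank must be in 1..total and total must be >= 2")
    return (100 * (total - rank)) // (total - 1)


# Rank and group size as listed on this page.
comparisons = {
    "All research outputs": (1_780_149, 24_885_505),
    "Outputs from BMC Bioinformatics": (340, 7_601),
    "Outputs of similar age": (33_710, 316_006),
    "Outputs of similar age from BMC Bioinformatics": (9, 101),
}

if __name__ == "__main__":
    for label, (rank, total) in comparisons.items():
        print(f"{label}: higher than {percent_outscored(rank, total)}%")
```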