A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy

Overview of attention for article published in BMC Bioinformatics, May 2017

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (90th percentile)
  • High Attention Score compared to outputs of the same age and source (99th percentile)

Mentioned by

  • Twitter: 43 tweeters

Citations

  • Dimensions: 103

Readers on

  • Mendeley: 189
Title: A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy
Published in: BMC Bioinformatics, May 2017
DOI: 10.1186/s12859-017-1670-4
Pubmed ID
Authors: Xiang Gao, Huaiying Lin, Kashi Revanna, Qunfeng Dong

Abstract

Species-level classification of 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification or give unreliable results. The results are unreliable because existing methods either lack solid probability-based criteria for evaluating the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as a proxy for sequence similarity.

We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum level based on the lowest common ancestors of multiple database hits for each query sequence, and the reliability of each classification is then evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based on the degree of sequence similarity between the database hit and the query sequence. Our method does not need training datasets specific to different taxonomic groups. Instead, only a reference database is required for alignment against the query sequences, which makes our method easily applicable to different regions of the 16S rRNA gene or to other phylogenetic marker genes.

Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools, and we provide probability-based confidence scores to evaluate the reliability of our taxonomic assignments based on multiple database matches to query sequences. Despite its higher computational cost, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied to taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
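The weighting idea in the abstract can be illustrated with a small sketch. This is not the BLCA implementation: the function names, the `sharpness` parameter, the exponential likelihood, and the uniform prior are all assumptions made for illustration, and the bootstrap step described in the abstract is left out. Each database hit gets a normalized weight that grows with its alignment similarity to the query, and at every rank the taxon with the largest summed weight is reported together with that weight.

```python
import math
from collections import defaultdict

# Ranks reported, from broadest to most specific.
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def hit_weights(similarities, sharpness=50.0):
    """Convert alignment similarities (0..1) into weights that sum to 1.

    Assumption for this sketch: likelihood is proportional to
    exp(sharpness * similarity) and the prior over hits is uniform,
    so the posterior is a softmax over the similarities.
    """
    top = max(similarities)  # subtract the maximum for numerical stability
    raw = [math.exp(sharpness * (s - top)) for s in similarities]
    total = sum(raw)
    return [r / total for r in raw]


def classify(hits, sharpness=50.0):
    """Return {rank: (taxon, support)} for one query sequence.

    `hits` is a list of (lineage, similarity) pairs, where `lineage`
    maps each rank name in RANKS to a taxon name.
    """
    weights = hit_weights([sim for _, sim in hits], sharpness)
    calls = {}
    for rank in RANKS:
        support = defaultdict(float)
        for (lineage, _), w in zip(hits, weights):
            support[lineage[rank]] += w  # each hit votes with its weight
        taxon = max(support, key=support.get)
        calls[rank] = (taxon, support[taxon])
    return calls
```

With toy input in which two near-identical hits share a species and a weaker hit belongs to a sister species, the species call is dominated by the two strong hits, while the genus call receives full support because all hits agree at that rank.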

Twitter Demographics

The data shown below were collected from the profiles of 43 tweeters who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 189 Mendeley readers of this research output.

Geographical breakdown

Country        Count    As %
Sweden         1        <1%
Switzerland    1        <1%
Unknown        187      99%

Demographic breakdown

Readers by professional status                  Count    As %
Student > Ph. D. Student                        40       21%
Student > Master                                40       21%
Researcher                                      37       20%
Student > Bachelor                              18       10%
Student > Doctoral Student                      8        4%
Other                                           22       12%
Unknown                                         24       13%

Readers by discipline                           Count    As %
Agricultural and Biological Sciences            66       35%
Biochemistry, Genetics and Molecular Biology    32       17%
Environmental Science                           15       8%
Immunology and Microbiology                     11       6%
Computer Science                                9        5%
Other                                           24       13%
Unknown                                         32       17%

Attention Score in Context

This research output has an Altmetric Attention Score of 22. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 21 May 2019.
All research outputs: #1,294,807 of 21,105,374 outputs
Outputs from BMC Bioinformatics: #240 of 6,876 outputs
Outputs of similar age: #27,901 of 282,805 outputs
Outputs of similar age from BMC Bioinformatics: #1 of 11 outputs
Altmetric has tracked 21,105,374 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 93rd percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 6,876 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one has done particularly well, scoring higher than 96% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 282,805 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 90% of its contemporaries.
We're also able to compare this research output to 11 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 99% of its contemporaries.
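The percentile figures quoted above can be roughly checked against the rank lines. The sketch below assumes that a percentile is the share of outputs ranked below this one, rounded down to a whole number; that rule is an assumption, since the page does not state how Altmetric computes the figures, and `percentile_from_rank` is a hypothetical helper name.

```python
def percentile_from_rank(rank, total):
    """Whole-number share of outputs ranked below `rank` (assumed rule)."""
    return 100 * (total - rank) // total


# (rank, total) pairs copied from the rank lines on this page.
rank_lines = {
    "All research outputs": (1_294_807, 21_105_374),
    "Outputs from BMC Bioinformatics": (240, 6_876),
    "Outputs of similar age": (27_901, 282_805),
    "Outputs of similar age from BMC Bioinformatics": (1, 11),
}

for label, (rank, total) in rank_lines.items():
    print(f"{label}: {percentile_from_rank(rank, total)}")
```

Under this assumed rule the first three comparisons give 93, 96, and 90, matching the text. The last comparison (#1 of 11) gives 90 rather than the 99% quoted, so that figure presumably follows a different rule.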