Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

Overview of attention for article published in PLoS Computational Biology, July 2016

About this Attention Score

  • In the top 5% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (96th percentile)
  • High Attention Score compared to outputs of the same age and source (91st percentile)

Mentioned by

  • 1 blog
  • 108 X users
  • 1 patent

Citations

  • 449 Dimensions

Readers on

  • 745 Mendeley
  • 8 CiteULike

Published in: PLoS Computational Biology, July 2016
DOI: 10.1371/journal.pcbi.1004977
Authors

Edoardo Pasolli, Duy Tin Truong, Faizan Malik, Levi Waldron, Nicola Segata

Abstract

Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis. 
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
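
The distinction the abstract draws between within-study cross-validation and cross-study transfer can be illustrated with a minimal sketch. This is not the authors' framework: the data are synthetic, the nearest-centroid classifier is a stand-in chosen only to keep the example dependency-free, and every name here (`make_study`, `batch_shift`, `within_study_cv`, `cross_study`) is hypothetical.

```python
import random

def make_study(n, batch_shift, rng, n_features=5):
    """Synthetic 'study' (assumption, not real data): cases (label 1) have
    feature 0 raised by 3; batch_shift offsets every feature to mimic a
    cohort-specific effect."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate controls (0) and cases (1)
        row = [rng.gauss(batch_shift, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def fit_centroids(X, y):
    """'Train' a nearest-centroid model: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest in squared distance."""
    return min(centroids,
               key=lambda l: sum((a - b) ** 2 for a, b in zip(x, centroids[l])))

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == l for x, l in zip(X, y)) / len(y)

def within_study_cv(X, y, k=5):
    """k-fold cross-validation inside a single study."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(y), k))
        pairs = list(enumerate(zip(X, y)))
        train = [p for i, p in pairs if i not in test_idx]
        test = [p for i, p in pairs if i in test_idx]
        centroids = fit_centroids(*zip(*train))
        scores.append(accuracy(centroids, *zip(*test)))
    return sum(scores) / k

def cross_study(train_study, test_study):
    """Train on one study, test on a different one."""
    return accuracy(fit_centroids(*train_study), *test_study)

rng = random.Random(0)
study_a = make_study(200, 0.0, rng)  # reference cohort
study_b = make_study(200, 1.0, rng)  # same signal, shifted baseline

print("within-study CV accuracy (A):", within_study_cv(*study_a))
print("cross-study accuracy (A -> B):", cross_study(study_a, study_b))
```

In this toy setup the shifted baseline of the second cohort tends to pull its control samples toward the case centroid learned from the first, which is one simple way a transferred model can lose accuracy relative to within-study cross-validation.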

X Demographics

The data were collected from the profiles of 108 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 745 Mendeley readers of this research output.

Geographical breakdown

Country         Count  As %
United States       5  <1%
Brazil              4  <1%
United Kingdom      3  <1%
Sweden              1  <1%
Germany             1  <1%
Ireland             1  <1%
Canada              1  <1%
Ukraine             1  <1%
Estonia             1  <1%
Other               1  <1%
Unknown           726  97%

Demographic breakdown

Readers by professional status  Count  As %
Researcher                        148  20%
Student > Ph. D. Student          141  19%
Student > Master                  105  14%
Student > Bachelor                 55  7%
Student > Doctoral Student         37  5%
Other                             107  14%
Unknown                           152  20%

Readers by discipline                         Count  As %
Agricultural and Biological Sciences            171  23%
Biochemistry, Genetics and Molecular Biology    143  19%
Computer Science                                 76  10%
Medicine and Dentistry                           42  6%
Immunology and Microbiology                      31  4%
Other                                           105  14%
Unknown                                         177  24%

Attention Score in Context

This research output has an Altmetric Attention Score of 70. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 09 June 2022.

  • All research outputs: #620,937 of 25,646,963 outputs
  • Outputs from PLoS Computational Biology: #452 of 9,021 outputs
  • Outputs of similar age: #12,193 of 370,826 outputs
  • Outputs of similar age from PLoS Computational Biology: #12 of 144 outputs

Altmetric has tracked 25,646,963 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 97th percentile: it's in the top 5% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 9,021 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 20.3. This one has done particularly well, scoring higher than 94% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 370,826 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 96% of its contemporaries.
We're also able to compare this research output to 144 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 91% of its contemporaries.
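
The percentiles quoted above can be related to the rank figures with a short calculation. The sketch below assumes the percentile is simply the share of tracked outputs ranked below this one, rounded down to a whole number; that rule is an assumption for illustration, not Altmetric's documented method, and `percentile_from_rank` is a hypothetical helper.

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Share of outputs ranked below `rank`, as a whole percentage rounded down.

    Assumed rule for illustration only; Altmetric's exact method may differ.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and total")
    return (total - rank) * 100 // total  # integer arithmetic avoids float rounding

# Rank and pool size as reported on this page.
rankings = {
    "All research outputs": (620_937, 25_646_963),
    "Outputs from PLoS Computational Biology": (452, 9_021),
    "Outputs of similar age": (12_193, 370_826),
    "Outputs of similar age from PLoS Computational Biology": (12, 144),
}

for label, (rank, total) in rankings.items():
    pct = percentile_from_rank(rank, total)
    print(f"{label}: scored higher than about {pct}% of outputs")
```

Under this assumed rule the four rankings give 97, 94, 96, and 91, which agrees with the percentages stated in the text.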