Title | MetaBinG2: a fast and accurate metagenomic sequence classification system for samples with many unknown organisms |
---|---|
Published in | Biology Direct, August 2018 |
DOI | 10.1186/s13062-018-0220-y |
Pubmed ID | |
Authors | Yuyang Qiao, Ben Jia, Zhiqiang Hu, Chen Sun, Yijin Xiang, Chaochun Wei |
Abstract | Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on genome sequences of the known organisms. A large portion of sequencing sequences may be classified as unknown, which greatly impairs our understanding of the whole sample. Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition, and uses GPUs to accelerate its speed. A million 100 bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it to multiple popular existing methods. We then applied MetaBinG2 to the dataset of MetaSUB Inter-City Challenge provided by CAMDA data analysis contest and compared community composition structures for environmental samples from different public places across cities. Compared to existing methods, MetaBinG2 is fast and accurate, especially for those samples with significant proportions of unknown organisms. This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Australia | 2 | 15% |
France | 1 | 8% |
United States | 1 | 8% |
Netherlands | 1 | 8% |
Sweden | 1 | 8% |
Norway | 1 | 8% |
India | 1 | 8% |
Vietnam | 1 | 8% |
Unknown | 4 | 31% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 6 | 46% |
Members of the public | 6 | 46% |
Practitioners (doctors, other healthcare professionals) | 1 | 8% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 26 | 100% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 5 | 19% |
Student > Master | 4 | 15% |
Professor | 3 | 12% |
Researcher | 3 | 12% |
Student > Doctoral Student | 2 | 8% |
Other | 7 | 27% |
Unknown | 2 | 8% |
Readers by discipline | Count | As % |
---|---|---|
Biochemistry, Genetics and Molecular Biology | 8 | 31% |
Agricultural and Biological Sciences | 7 | 27% |
Computer Science | 3 | 12% |
Immunology and Microbiology | 3 | 12% |
Environmental Science | 1 | 4% |
Other | 1 | 4% |
Unknown | 3 | 12% |