Title | StrainSeeker: fast identification of bacterial strains from raw sequencing reads using user-provided guide trees |
---|---|
Published in | PeerJ, May 2017 |
DOI | 10.7717/peerj.3353 |
Pubmed ID | |
Authors | Märt Roosaare, Mihkel Vaher, Lauris Kaplinski, Märt Möls, Reidar Andreson, Maarja Lepamets, Triinu Kõressaar, Paul Naaber, Siiri Kõljalg, Maido Remm |
Abstract | Fast, accurate and high-throughput identification of bacterial isolates is in great demand. The present work was conducted to investigate the possibility of identifying isolates from unassembled next-generation sequencing reads using custom-made guide trees. A tool named StrainSeeker was developed that constructs a list of specific k-mers for each node of any given Newick-format tree and enables the identification of bacterial isolates in 1-2 min. It uses a novel algorithm, which analyses the observed and expected fractions of node-specific k-mers to test the presence of each node in the sample. This allows StrainSeeker to determine where the isolate branches off the guide tree and assign it to a clade, whereas other tools assign each read to a reference genome. Using a dataset of 100 *Escherichia coli* isolates, we demonstrate that StrainSeeker can predict the clades of *E. coli* with 92% accuracy and correct tree branch assignment with 98% accuracy. Twenty-five thousand Illumina HiSeq reads are sufficient for identification of the strain. StrainSeeker is a software program that identifies bacterial isolates by assigning them to nodes or leaves of a custom-made guide tree. StrainSeeker's web interface and pre-computed guide trees are available at http://bioinfo.ut.ee/strainseeker. Source code is stored at GitHub: https://github.com/bioinfo-ut/StrainSeeker. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 6 | 15% |
United Kingdom | 3 | 8% |
Austria | 3 | 8% |
Germany | 3 | 8% |
Israel | 1 | 3% |
China | 1 | 3% |
Finland | 1 | 3% |
Sweden | 1 | 3% |
Spain | 1 | 3% |
Other | 3 | 8% |
Unknown | 16 | 41% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 19 | 49% |
Members of the public | 18 | 46% |
Science communicators (journalists, bloggers, editors) | 1 | 3% |
Practitioners (doctors, other healthcare professionals) | 1 | 3% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Germany | 2 | 1% |
France | 1 | <1% |
Egypt | 1 | <1% |
Estonia | 1 | <1% |
United States | 1 | <1% |
Unknown | 130 | 96% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 32 | 24% |
Student > Ph. D. Student | 30 | 22% |
Student > Bachelor | 15 | 11% |
Student > Master | 12 | 9% |
Student > Postgraduate | 7 | 5% |
Other | 19 | 14% |
Unknown | 21 | 15% |
Readers by discipline | Count | As % |
---|---|---|
Agricultural and Biological Sciences | 38 | 28% |
Biochemistry, Genetics and Molecular Biology | 29 | 21% |
Computer Science | 14 | 10% |
Immunology and Microbiology | 8 | 6% |
Medicine and Dentistry | 6 | 4% |
Other | 11 | 8% |
Unknown | 30 | 22% |