Title | OrfM: a fast open reading frame predictor for metagenomic data |
---|---|
Published in | Bioinformatics, May 2016 |
DOI | 10.1093/bioinformatics/btw241 |
Pubmed ID | |
Authors | Ben J Woodcroft, Joel A Boyd, Gene W Tyson |
Abstract | Finding and translating stretches of DNA lacking stop codons is a common task in the analysis of sequence data. However, the computational tools for finding open reading frames are sufficiently slow that they are becoming a bottleneck as the volume of sequence data grows. This computational bottleneck is especially problematic in metagenomics when searching unassembled reads or screening assembled contigs for genes of interest. Here we present OrfM, a tool to rapidly identify open reading frames (ORFs) in sequence data by applying the Aho-Corasick algorithm to find regions uninterrupted by stop codons. Benchmarking revealed that OrfM finds the same ORFs as similar tools ('GetOrf' and 'Translate') but is five times faster. While OrfM is sequencing platform-agnostic, it is best suited to large, high-quality datasets such as those produced by Illumina sequencers. Source code and binaries are freely available for download at http://github.com/wwood/OrfM or through GNU Guix under the LGPL 3+ license; OrfM is implemented in C and supported on GNU/Linux and OSX. Supplementary data are available at Bioinformatics online. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 6 | 32% |
United Kingdom | 2 | 11% |
Australia | 2 | 11% |
Norway | 1 | 5% |
Canada | 1 | 5% |
Mexico | 1 | 5% |
Spain | 1 | 5% |
Taiwan | 1 | 5% |
Unknown | 4 | 21% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 11 | 58% |
Members of the public | 8 | 42% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Brazil | 2 | 2% |
Australia | 1 | <1% |
Norway | 1 | <1% |
Sweden | 1 | <1% |
Canada | 1 | <1% |
New Zealand | 1 | <1% |
Japan | 1 | <1% |
United States | 1 | <1% |
Unknown | 107 | 92% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 29 | 25% |
Researcher | 20 | 17% |
Student > Master | 20 | 17% |
Student > Bachelor | 10 | 9% |
Student > Doctoral Student | 7 | 6% |
Other | 14 | 12% |
Unknown | 16 | 14% |
Readers by discipline | Count | As % |
---|---|---|
Agricultural and Biological Sciences | 35 | 30% |
Biochemistry, Genetics and Molecular Biology | 33 | 28% |
Environmental Science | 7 | 6% |
Computer Science | 6 | 5% |
Immunology and Microbiology | 5 | 4% |
Other | 13 | 11% |
Unknown | 17 | 15% |