Chapter title | GMAP and GSNAP for Genomic Sequence Alignment: Enhancements to Speed, Accuracy, and Functionality |
---|---|
Chapter number | 15 |
Book title | Statistical Genomics |
Published in | Methods in Molecular Biology, January 2016 |
DOI | 10.1007/978-1-4939-3578-9_15 |
Pubmed ID | |
Book ISBNs | 978-1-4939-3576-5, 978-1-4939-3578-9 |
Authors | Thomas D. Wu, Jens Reeder, Michael Lawrence, Gabe Becker, Matthew J. Brauer |
Editors | Ewy Mathé, Sean Davis |
Abstract | The programs GMAP and GSNAP, for aligning RNA-Seq and DNA-Seq datasets to genomes, have evolved along with advances in biological methodology to handle longer reads, larger volumes of data, and new types of biological assays. The genomic representation has been improved to include linear genomes that can compare sequences using single-instruction multiple-data (SIMD) instructions, compressed genomic hash tables with fast access using SIMD instructions, handling of large genomes with more than four billion bp, and enhanced suffix arrays (ESAs) with novel data structures for fast access. Improvements to the algorithms have included a greedy match-and-extend algorithm using suffix arrays, segment chaining using genomic hash tables, diagonalization using segmental hash tables, and nucleotide-level dynamic programming procedures that use SIMD instructions and eliminate the need for F-loop calculations. Enhancements to the functionality of the programs include standardization of indel positions, handling of ambiguous splicing, clipping and merging of overlapping paired-end reads, and alignments to circular chromosomes and alternate scaffolds. The programs have been adapted for use in pipelines by integrating their usage into R/Bioconductor packages such as gmapR and HTSeqGenie, and these pipelines have facilitated the discovery of numerous biological phenomena. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 1 | 20% |
Unknown | 4 | 80% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 4 | 80% |
Scientists | 1 | 20% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Switzerland | 1 | <1% |
Unknown | 170 | 99% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 38 | 22% |
Researcher | 35 | 20% |
Student > Doctoral Student | 18 | 11% |
Student > Master | 17 | 10% |
Student > Bachelor | 9 | 5% |
Other | 25 | 15% |
Unknown | 29 | 17% |
Readers by discipline | Count | As % |
---|---|---|
Biochemistry, Genetics and Molecular Biology | 51 | 30% |
Agricultural and Biological Sciences | 50 | 29% |
Computer Science | 16 | 9% |
Immunology and Microbiology | 7 | 4% |
Medicine and Dentistry | 3 | 2% |
Other | 8 | 5% |
Unknown | 36 | 21% |