Title | The Challenge of Small-Scale Repeats for Indel Discovery |
---|---|
Published in | Frontiers in Bioengineering and Biotechnology, January 2015 |
DOI | 10.3389/fbioe.2015.00008 |
Pubmed ID | |
Authors | Giuseppe Narzisi, Michael C. Schatz |
Abstract | Repetitive sequences are abundant in the human genome. Different classes of repetitive DNA sequences, including simple repeats, tandem repeats, segmental duplications, interspersed repeats, and other elements, collectively span more than 50% of the genome. Because repeat sequences occur in the genome at different scales, they can cause various types of sequence analysis errors, including errors in alignment, de novo assembly, and annotation. This mini-review highlights the challenges that small-scale repeat sequences, especially near-identical tandem or closely located repeats and short tandem repeats, pose for discovering DNA insertion and deletion (indel) mutations from next-generation sequencing data. We also discuss the de Bruijn graph sequence assembly paradigm, which is emerging as the most popular and promising approach for detecting indels. Using the human exome as an example, we show how these repetitive elements can obscure indels or introduce errors when detecting them. |
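
The abstract's point about small-scale repeats and de Bruijn graphs can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a graph whose nodes are the k-mers of a single sequence, so a k-mer that occurs twice forces the path to revisit a node and form a cycle. The sequence `EXAMPLE` and the helper names `has_repeated_kmer` and `smallest_repeat_free_k` are hypothetical and exist only for this illustration.

```python
# Illustrative sketch only: shows how a short tandem repeat makes k-mers
# collide in a de Bruijn graph at small k, and how a larger k removes them.

# Hypothetical example: a (CA)x4 short tandem repeat with unique flanks.
EXAMPLE = "GGT" + "CACACACA" + "TTG"


def kmers(seq, k):
    """Return the k-mers of seq in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def has_repeated_kmer(seq, k):
    """True if some k-mer occurs more than once in seq.

    With k-mers as graph nodes, a repeated k-mer means the path spelled
    by seq visits the same node twice, i.e. the graph contains a cycle.
    """
    seen = set()
    for kmer in kmers(seq, k):
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def smallest_repeat_free_k(seq, k_min=3):
    """Return the smallest k >= k_min at which no k-mer of seq repeats."""
    for k in range(k_min, len(seq) + 1):
        if not has_repeated_kmer(seq, k):
            return k
    return None  # only reached if k_min exceeds len(seq)


if __name__ == "__main__":
    for k in range(3, 9):
        print(k, has_repeated_kmer(EXAMPLE, k))
    print("smallest repeat-free k:", smallest_repeat_free_k(EXAMPLE))
```

For this toy sequence the repeat disappears from the graph only once k is long enough to reach from the repeat run into its flanking sequence. This is one intuition for why the choice of k matters when assembling reads around short tandem repeats.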
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 4 | 31% |
Canada | 3 | 23% |
Norway | 1 | 8% |
Unknown | 5 | 38% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 10 | 77% |
Members of the public | 2 | 15% |
Science communicators (journalists, bloggers, editors) | 1 | 8% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
China | 2 | 2% |
Norway | 1 | 1% |
Italy | 1 | 1% |
France | 1 | 1% |
Belgium | 1 | 1% |
United Kingdom | 1 | 1% |
Russia | 1 | 1% |
United States | 1 | 1% |
Unknown | 79 | 90% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 22 | 25% |
Researcher | 16 | 18% |
Student > Bachelor | 11 | 13% |
Student > Master | 10 | 11% |
Other | 5 | 6% |
Other | 11 | 13% |
Unknown | 13 | 15% |
Readers by discipline | Count | As % |
---|---|---|
Biochemistry, Genetics and Molecular Biology | 24 | 27% |
Agricultural and Biological Sciences | 23 | 26% |
Computer Science | 9 | 10% |
Medicine and Dentistry | 5 | 6% |
Mathematics | 3 | 3% |
Other | 10 | 11% |
Unknown | 14 | 16% |