Chapter title | Protein Sequence Analysis by Proximities |
---|---|
Chapter number | 12 |
Book title | Statistical Analysis in Proteomics |
Published in | Methods in molecular biology, January 2016 |
DOI | 10.1007/978-1-4939-3106-4_12 |
Pubmed ID | |
Book ISBNs | 978-1-4939-3105-7, 978-1-4939-3106-4 |
Authors | Frank-Michael Schleif |
Abstract | Sequence data are widely used to gain deeper insight into biological systems. From a data analysis perspective, they are given as a set of symbol sequences of varying length and are generally compared using nonmetric score functions. In this form the data are nonstandard: they do not provide an immediate metric vector space, which complicates their analysis with standard methods. In this chapter we provide various strategies for analyzing this type of data in a mathematically accurate way, instead of the ad hoc solutions often seen. Our approach is based on the scoring values from protein sequence data, although it is applicable in a broader sense. We discuss potential recoding concepts for the scores and describe algorithms that solve clustering, classification, and embedding tasks on score data for a protein sequence application. |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 8 | 100% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 3 | 38% |
Student > Bachelor | 1 | 13% |
Student > Master | 1 | 13% |
Unknown | 3 | 38% |
Readers by discipline | Count | As % |
---|---|---|
Biochemistry, Genetics and Molecular Biology | 1 | 13% |
Agricultural and Biological Sciences | 1 | 13% |
Computer Science | 1 | 13% |
Neuroscience | 1 | 13% |
Design | 1 | 13% |
Other | 0 | 0% |
Unknown | 3 | 38% |