Title | Mining Skeletal Phenotype Descriptions from Scientific Literature |
---|---|
Published in | PLOS ONE, February 2013 |
DOI | 10.1371/journal.pone.0055656 |
PubMed ID | |
Authors | Tudor Groza, Jane Hunter, Andreas Zankl |
Abstract | Phenotype descriptions are important for our understanding of genetics, as they enable the computation and analysis of a varied range of issues related to the genetic and developmental bases of correlated characters. The literature contains a wealth of such phenotype descriptions, usually reported as free-text entries, similar to typical clinical summaries. In this paper, we focus on creating and making available an annotated corpus of skeletal phenotype descriptions. In addition, we present and evaluate a hybrid Machine Learning approach for mining phenotype descriptions from free text. Our hybrid approach uses an ensemble of four classifiers and experiments with several aggregation techniques. The best scoring technique achieves an F-1 score of 71.52%, which is close to the state-of-the-art in other domains, where training data exists in abundance. Finally, we discuss the influence of the features chosen for the model on the overall performance of the method. |
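
The abstract mentions aggregating the outputs of four classifiers and reporting an F-1 score, but it does not say which aggregation techniques were tried. As a rough illustration only, the sketch below assumes simple majority voting over token-level labels and a token-level F-1 computation. The label names (`PHENO`, `O`), the tie-breaking rule, and the function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions, fallback="O"):
    """Aggregate label sequences from several classifiers by majority vote.

    predictions: list of equal-length label sequences, one per classifier.
    fallback: label used when the top two labels are tied (an assumption).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all classifiers must label the same tokens")

    aggregated = []
    for labels in zip(*predictions):
        counts = Counter(labels).most_common()
        top_label, top_count = counts[0]
        # A tie between the two most frequent labels falls back to `fallback`.
        if len(counts) > 1 and counts[1][1] == top_count:
            aggregated.append(fallback)
        else:
            aggregated.append(top_label)
    return aggregated


def f1_score(gold, predicted, positive="PHENO"):
    """Token-level F-1 for one positive label: harmonic mean of P and R."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of four classifiers on the same four tokens.
classifier_outputs = [
    ["PHENO", "PHENO", "O", "O"],
    ["PHENO", "O", "O", "PHENO"],
    ["PHENO", "PHENO", "O", "O"],
    ["O", "O", "PHENO", "PHENO"],
]
gold_labels = ["PHENO", "PHENO", "O", "O"]

combined = majority_vote(classifier_outputs)
score = f1_score(gold_labels, combined)
print(combined, round(score, 4))
```

With an even number of voters, ties are possible, which is why the sketch needs an explicit fallback. A weighted vote or a meta-classifier would avoid that, at the cost of extra training data.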
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United Kingdom | 1 | 100% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 1 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Spain | 1 | 4% |
Canada | 1 | 4% |
Unknown | 26 | 93% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 3 | 11% |
Professor > Associate Professor | 3 | 11% |
Other | 2 | 7% |
Student > Bachelor | 2 | 7% |
Student > Doctoral Student | 2 | 7% |
Other | 7 | 25% |
Unknown | 9 | 32% |
Readers by discipline | Count | As % |
---|---|---|
Medicine and Dentistry | 8 | 29% |
Computer Science | 5 | 18% |
Agricultural and Biological Sciences | 4 | 14% |
Biochemistry, Genetics and Molecular Biology | 2 | 7% |
Unspecified | 1 | 4% |
Other | 1 | 4% |
Unknown | 7 | 25% |