Title | Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations |
---|---|
Published in | Journal of Cheminformatics, January 2015 |
DOI | 10.1186/1758-2946-7-s1-s9 |
Pubmed ID | |
Authors | Tsendsuren Munkhdalai, Meijing Li, Khuyagbaatar Batsuren, Hyeon Ah Park, Nak Hyeon Choi, Keun Ho Ryu |
Abstract | Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite for effective text mining of biochemical text data. Exploiting unlabeled text data to improve system performance has been an active and challenging research topic in text mining, driven by the recent growth in the amount of biomedical literature. We present a semi-supervised learning method that efficiently exploits unlabeled data to incorporate domain knowledge into a named entity recognition model and improve system performance. The proposed method includes Natural Language Processing (NLP) tasks for text preprocessing, learning word representation features from a large amount of text data for feature extraction, and conditional random fields for token classification. Other than free text in the domain, the proposed method does not rely on any lexicon or dictionary, which keeps the system applicable to other NER tasks in bio-text data. |
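
The abstract describes a pipeline of preprocessing, word representation features, and conditional random fields for token classification. As a rough illustration of the feature-extraction step only, the sketch below builds one feature dictionary per token and adds a word-representation feature looked up from a cluster table. It is not the authors' implementation: the `WORD_CLUSTERS` table, its bit strings, the function names, and the choice of orthographic features are all assumptions made for this example, and the cluster table stands in for representations that would be learned from a large unlabeled corpus.

```python
import re
from typing import Dict, List

# Hypothetical word-to-cluster table. In a real system this would be learned
# from unlabeled domain text; the bit strings below are made up for illustration.
WORD_CLUSTERS: Dict[str, str] = {
    "aspirin": "110100",
    "ibuprofen": "110101",
    "inhibits": "0110",
    "cox-2": "111001",
}


def word_shape(token: str) -> str:
    """Collapse a token into a coarse shape, e.g. 'COX-2' becomes 'A-0'."""
    shape = re.sub(r"[A-Z]+", "A", token)
    shape = re.sub(r"[a-z]+", "a", shape)
    shape = re.sub(r"[0-9]+", "0", shape)
    return shape


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build a feature dict for tokens[i] using surface and cluster features."""
    token = tokens[i]
    feats = {
        "lower": token.lower(),
        "shape": word_shape(token),
        "suffix3": token[-3:].lower(),
        # Neighbouring words give the sequence model some local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    # Word-representation feature: only added when the word is in the table.
    cluster = WORD_CLUSTERS.get(token.lower())
    if cluster is not None:
        feats["cluster"] = cluster
        # A shorter prefix groups the word with a broader class of words.
        feats["cluster_prefix4"] = cluster[:4]
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["Aspirin", "inhibits", "COX-2"]):
        print(feats)
```

Feature dictionaries of this kind are the usual input to a linear-chain CRF toolkit, which would then learn label transitions over the token sequence. Because the cluster feature comes from unlabeled text alone, no lexicon or dictionary is needed, which matches the constraint stated in the abstract.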
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
India | 1 | 1% |
Germany | 1 | 1% |
Korea, Republic of | 1 | 1% |
Unknown | 96 | 97% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 17 | 17% |
Student > Master | 15 | 15% |
Researcher | 11 | 11% |
Student > Doctoral Student | 7 | 7% |
Student > Bachelor | 7 | 7% |
Other | 18 | 18% |
Unknown | 24 | 24% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 40 | 40% |
Agricultural and Biological Sciences | 7 | 7% |
Psychology | 4 | 4% |
Neuroscience | 3 | 3% |
Social Sciences | 3 | 3% |
Other | 18 | 18% |
Unknown | 24 | 24% |