Title | Rewriting and suppressing UMLS terms for improved biomedical term identification |
---|---|
Published in | Journal of Biomedical Semantics, March 2010 |
DOI | 10.1186/2041-1480-1-5 |
Pubmed ID | |
Authors | Kristina M Hettne, Erik M van Mulligen, Martijn J Schuemie, Bob JA Schijvenaars, Jan A Kors |
Abstract | Identification of terms is essential for biomedical text mining. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact of the different rules on the number of terms identified in a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms together with a sample of 100 randomly selected terms were evaluated for every rule. |
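
The before-and-after measurement described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two rules (`rewrite_strip_bracket_suffix`, `suppress_short_term`), the toy vocabulary, and the two-sentence corpus are hypothetical and are not taken from the paper, whose seventeen rules are not listed in this record. The sketch only shows the general shape of the comparison: count the uniquely identified terms and their frequencies with the raw vocabulary, apply the rules, and count again.

```python
import re
from collections import Counter


def rewrite_strip_bracket_suffix(term):
    """Hypothetical rewrite rule: drop a trailing bracketed qualifier."""
    return re.sub(r"\s*[\(\[][^\)\]]*[\)\]]\s*$", "", term)


def suppress_short_term(term):
    """Hypothetical suppression rule: reject terms shorter than 3 characters."""
    return len(term) < 3


def apply_rules(vocabulary):
    """Return the vocabulary after suppression, extended with rewritten variants."""
    result = set()
    for term in vocabulary:
        if suppress_short_term(term):
            continue
        result.add(term)
        rewritten = rewrite_strip_bracket_suffix(term)
        if rewritten and not suppress_short_term(rewritten):
            result.add(rewritten)
    return result


def count_terms(vocabulary, documents):
    """Count case-insensitive whole-term matches; keep only terms that were found."""
    counts = Counter()
    for term in vocabulary:
        pattern = re.compile(r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)")
        for doc in documents:
            counts[term] += len(pattern.findall(doc.lower()))
    return Counter({t: n for t, n in counts.items() if n > 0})


# Toy data (assumed, not from the paper).
vocabulary = {"aspirin (substance)", "at", "myocardial infarction"}
documents = [
    "Aspirin was given at admission after myocardial infarction.",
    "Aspirin reduces risk of myocardial infarction.",
]

before = count_terms(vocabulary, documents)
after = count_terms(apply_rules(vocabulary), documents)

print("unique terms before/after:", len(before), len(after))
print("total matches before/after:", sum(before.values()), sum(after.values()))
```

In this toy run the suppression rule removes the ambiguous two-letter term, while the rewrite rule lets the drug name match text in which the bracketed qualifier never appears. A real evaluation would also need manual review of the matched terms, as the abstract describes for the 50 most frequent and 100 randomly selected terms per rule.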
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Netherlands | 4 | 7% |
Portugal | 3 | 5% |
United Kingdom | 2 | 4% |
Spain | 2 | 4% |
Belgium | 1 | 2% |
Australia | 1 | 2% |
China | 1 | 2% |
United States | 1 | 2% |
Unknown | 41 | 73% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 18 | 32% |
Student > Ph. D. Student | 16 | 29% |
Other | 5 | 9% |
Professor > Associate Professor | 4 | 7% |
Student > Master | 4 | 7% |
Other | 6 | 11% |
Unknown | 3 | 5% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 28 | 50% |
Agricultural and Biological Sciences | 15 | 27% |
Linguistics | 2 | 4% |
Business, Management and Accounting | 1 | 2% |
Unspecified | 1 | 2% |
Other | 5 | 9% |
Unknown | 4 | 7% |