Report for: Sortal anaphora resolution to enhance relation extraction from biomedical literature

Title	Sortal anaphora resolution to enhance relation extraction from biomedical literature
Published in	BMC Bioinformatics, April 2016
DOI	10.1186/s12859-016-1009-6
Pubmed ID	27080229
Authors	Halil Kilicoglu, Graciela Rosemblat, Marcelo Fiszman, Thomas C. Rindflesch
Abstract	Entity coreference is common in biomedical literature and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from sentence level to discourse level. We evaluated our sortal anaphora resolution method in several ways. The first evaluation specifically focused on sortal anaphora relations. Our methodology achieved an F1 score of 59.6 on the test portion of a manually annotated corpus of 320 Medline abstracts, a 4-fold improvement over the baseline method. Investigating the impact of sortal anaphora resolution on relation extraction, we found that the overall effect was positive, with 50% of the changes involving uninformative relations being replaced by more specific and informative ones, while 35% of the changes had no effect, and only 15% were negative. We estimate that anaphora resolution results in changes in about 1.5% of approximately 82 million semantic relations extracted from the entire PubMed. Our results demonstrate that a heavily semantic approach to sortal anaphora resolution is largely effective for biomedical literature. Our evaluation and error analysis highlight some areas for further improvements, such as coordination processing and intra-sentential antecedent selection.

X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.

Geographical breakdown

Country	Count	As %
Unknown	2	100%

Demographic breakdown

Type	Count	As %
Scientists	2	100%

Mendeley readers

The data shown below were compiled from readership statistics for 44 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	44	100%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	15	34%
Student > Ph. D. Student	10	23%
Student > Master	3	7%
Student > Bachelor	2	5%
Student > Postgraduate	2	5%
Other	3	7%
Unknown	9	20%

Readers by discipline	Count	As %
Computer Science	18	41%
Medicine and Dentistry	7	16%
Agricultural and Biological Sciences	3	7%
Biochemistry, Genetics and Molecular Biology	2	5%
Linguistics	1	2%
Other	4	9%
Unknown	9	20%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 14 April 2016.

Compared with	Rank	Out of
All research outputs	#18,451,892	22,862,742 outputs
Outputs from BMC Bioinformatics	#6,328	7,295 outputs
Outputs of similar age	#220,037	300,620 outputs
Outputs of similar age from BMC Bioinformatics	#94	107 outputs

Altmetric has tracked 22,862,742 research outputs across all sources so far. This one is in the 11th percentile – i.e., 11% of other outputs scored the same or lower than it.

So far Altmetric has tracked 7,295 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 5th percentile – i.e., 5% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 300,620 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 15th percentile – i.e., 15% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 107 others from the same source and published within six weeks on either side of this one. This one is in the 3rd percentile – i.e., 3% of its contemporaries scored the same or lower than it.
