
Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature

Overview of attention for article published in Journal of Biomedical Semantics, April 2018

Mentioned by: 1 X user
Citations: 33 (Dimensions)
Mendeley readers: 87
Title
Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature
Published in
Journal of Biomedical Semantics, April 2018
DOI 10.1186/s13326-018-0181-1
PubMed ID
Authors

Mercedes Arguello Casteleiro, George Demetriou, Warren Read, Maria Jesus Fernandez Prieto, Nava Maroto, Diego Maseda Fernandez, Goran Nenadic, Julie Klein, John Keane, Robert Stevens

Abstract

Automatic identification of term variants or acceptable alternative free-text terms for gene and protein names from the millions of biomedical publications is a challenging task. Ontologies, such as the Cardiovascular Disease Ontology (CVDO), capture domain knowledge in a computational form and can provide context for gene/protein names as written in the literature. This study investigates: 1) whether word embeddings from Deep Learning algorithms can provide a list of term variants for a given gene/protein of interest; and 2) whether biological knowledge from the CVDO can improve such a list without modifying the word embeddings created. We manually annotated 105 gene/protein names from 25 PubMed titles/abstracts and mapped them to 79 unique UniProtKB entries corresponding to gene and protein classes from the CVDO. Using more than 14 million PubMed articles (titles and available abstracts), word embeddings were generated with CBOW and Skip-gram. We set up two experiments for a synonym detection task, each with four raters and 3672 pairs of terms (target term and candidate term) from the word embeddings created. For Experiment I, the target terms for 64 UniProtKB entries were those that appear in the titles/abstracts; Experiment II involves 63 UniProtKB entries, and the target terms are a combination of terms from PubMed titles/abstracts with terms (i.e., increased context) from the CVDO protein class expressions and labels. In Experiment I, Skip-gram finds term variants (full and/or partial) for 89% of the 64 UniProtKB entries, while CBOW finds term variants for 67%. In Experiment II (with the aid of the CVDO), Skip-gram finds term variants for 95% of the 63 UniProtKB entries, while CBOW finds term variants for 78%. Combining the results of both experiments, Skip-gram finds term variants for 97% of the 79 UniProtKB entries, while CBOW finds term variants for 81%. This study shows performance improvements for both CBOW and Skip-gram on a gene/protein synonym detection task by adding knowledge formalised in the CVDO and without modifying the word embeddings created. Hence, the CVDO supplies context that is effective in inducing term variability for both CBOW and Skip-gram while reducing ambiguity. Skip-gram outperforms CBOW and finds more pertinent term variants for gene/protein names annotated from the scientific literature.
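
The abstract describes the core technique: train CBOW and Skip-gram word embeddings over PubMed titles/abstracts and read candidate term variants off the nearest neighbours of a target gene/protein term, optionally biased by extra context terms taken from CVDO class labels and expressions. The sketch below is a minimal illustration of that idea using gensim; it is not the authors' pipeline, and the file name, tokenisation, hyperparameters, and the example terms "apoa1" and "apolipoprotein" are assumptions for illustration only.

```python
# Minimal sketch (assumptions, not the published setup): CBOW vs Skip-gram
# embeddings from pre-tokenised PubMed titles/abstracts, queried for
# candidate term variants of a target gene/protein term.
from gensim.models import Word2Vec

def load_corpus(path="pubmed_titles_abstracts.txt"):
    """Yield one lower-cased, whitespace-tokenised title/abstract per line
    (hypothetical input file; real pipelines need proper tokenisation)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            tokens = line.lower().split()
            if tokens:
                yield tokens

corpus = list(load_corpus())

# sg=0 trains CBOW, sg=1 trains Skip-gram (gensim >= 4.0 parameter names);
# vector size, window, and min_count here are illustrative defaults.
cbow = Word2Vec(corpus, vector_size=200, window=5, min_count=5, sg=0, workers=4)
skipgram = Word2Vec(corpus, vector_size=200, window=5, min_count=5, sg=1, workers=4)

# Experiment I style query: nearest neighbours of a target term as it
# appears in a PubMed title/abstract.
print(skipgram.wv.most_similar(positive=["apoa1"], topn=10))

# Experiment II style query: add a context term derived from the CVDO class
# labels/expressions; gensim averages the positive vectors before ranking,
# so the ontology context steers the neighbourhood without retraining or
# modifying the embeddings themselves.
print(skipgram.wv.most_similar(positive=["apoa1", "apolipoprotein"], topn=10))
```

The ranked neighbour lists produced by such queries correspond to the (target term, candidate term) pairs that the raters would judge in a synonym detection task.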

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 87 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Unknown 87 100%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 19 22%
Researcher 13 15%
Other 6 7%
Student > Master 6 7%
Student > Doctoral Student 6 7%
Other 13 15%
Unknown 24 28%
Readers by discipline Count As %
Computer Science 35 40%
Agricultural and Biological Sciences 4 5%
Business, Management and Accounting 3 3%
Biochemistry, Genetics and Molecular Biology 2 2%
Linguistics 2 2%
Other 10 11%
Unknown 31 36%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 13 June 2018.
All research outputs: #15,708,425 of 23,344,526 outputs
Outputs from Journal of Biomedical Semantics: #238 of 367 outputs
Outputs of similar age: #210,948 of 330,046 outputs
Outputs of similar age from Journal of Biomedical Semantics: #3 of 3 outputs
Altmetric has tracked 23,344,526 research outputs across all sources so far. This one is in the 22nd percentile – i.e., 22% of other outputs scored the same or lower than it.
So far Altmetric has tracked 367 research outputs from this source. They receive a mean Attention Score of 4.6. This one is in the 20th percentile – i.e., 20% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 330,046 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 27th percentile – i.e., 27% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 3 others from the same source and published within six weeks on either side of this one.