Mimvec: a deep learning approach for analyzing the human phenome

Overview of attention for article published in BMC Systems Biology, September 2017

About this Attention Score

  • Good Attention Score compared to outputs of the same age (68th percentile)
  • High Attention Score compared to outputs of the same age and source (88th percentile)

Mentioned by

  • 6 X users
  • 1 research highlight platform

Citations

  • 6 Dimensions

Readers on

  • 57 Mendeley
Title
Mimvec: a deep learning approach for analyzing the human phenome
Published in
BMC Systems Biology, September 2017
DOI 10.1186/s12918-017-0451-z
Pubmed ID
Authors

Mingxin Gan, Wenran Li, Wanwen Zeng, Xiaojian Wang, Rui Jiang

Abstract

The human phenome has been widely used with a variety of genomic data sources in the inference of disease genes. However, most existing methods thus far derive phenotype similarity based on the analysis of biomedical databases by using the traditional term frequency-inverse document frequency (TF-IDF) formulation. This framework, though intuitive, not only ignores semantic relationships between words but also tends to produce high-dimensional vectors, and hence lacks the ability to precisely capture intrinsic semantic characteristics of biomedical documents. To overcome these limitations, we propose a framework called mimvec to analyze the human phenome by making use of a state-of-the-art deep learning technique in natural language processing. We converted 24,061 records in the Online Mendelian Inheritance in Man (OMIM) database to low-dimensional vectors using our method. We demonstrated that the vector representation not only effectively enabled classification of phenotype records against gene records, but also succeeded in discriminating diseases of different inheritance styles and different mechanisms. We further derived pairwise phenotype similarities between 7,988 human inherited diseases using their vector representations. With a joint analysis of this phenome with multiple genomic data sources, we showed that phenotype overlap indeed implied genotype overlap. We finally used the derived phenotype similarities with genomic data to prioritize candidate genes and demonstrated advantages of this method over existing ones. Our method is capable of not only capturing semantic relationships between words in biomedical records but also alleviating the dimensional disaster accompanying the traditional TF-IDF framework. With the approach of precision medicine, there will be abundant electronic medical and health records awaiting deep analysis, and we expect to see a wide spectrum of applications borrowing the idea of our method in the near future.
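
The abstract's two criticisms of TF-IDF (no notion of semantic relatedness, and one vector dimension per vocabulary word) can be illustrated with a minimal Python sketch. This is not the mimvec implementation: the three phrases are hypothetical toy inputs rather than OMIM records, the names `records`, `tfidf_vectors`, and `cosine` are invented for the illustration, and the weighting tf × log(N/df) is just one common TF-IDF variant assumed here.

```python
import math
from collections import Counter

# Hypothetical toy phrases for illustration only; NOT real OMIM records.
records = {
    "A": "muscle wasting",
    "B": "muscular atrophy",  # near-synonym of A, but shares no token with it
    "C": "muscle pain",       # different symptom, shares the token "muscle" with A
}

def tfidf_vectors(texts):
    """Return (vocabulary, vectors) using tf * log(N / df), one common TF-IDF variant."""
    tokenized = [text.lower().split() for text in texts]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many texts each word occurs.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append([
            (term_freq[word] / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word in vocab
        ])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vocab, (vec_a, vec_b, vec_c) = tfidf_vectors(list(records.values()))
sim_ab = cosine(vec_a, vec_b)  # near-synonyms with no shared token
sim_ac = cosine(vec_a, vec_c)  # different symptoms with one shared token
print(len(vocab), sim_ab, sim_ac)
```

In this toy setting the near-synonymous pair A and B scores exactly zero because the two phrases share no token, while A and C score above zero purely because both contain "muscle". Each vector also has one entry per vocabulary word, so its length grows with the corpus vocabulary, whereas a learned low-dimensional embedding of the kind the abstract describes has a fixed length.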

X Demographics

The data shown below were collected from the profiles of 6 X users who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 57 Mendeley readers of this research output.

Geographical breakdown

Country    Count   As %
Unknown       57   100%

Demographic breakdown

Readers by professional status                  Count   As %
Student > Ph. D. Student                            9    16%
Researcher                                          8    14%
Other                                               5     9%
Student > Master                                    5     9%
Student > Bachelor                                  4     7%
Other                                               9    16%
Unknown                                            17    30%

Readers by discipline                           Count   As %
Computer Science                                   10    18%
Medicine and Dentistry                              7    12%
Biochemistry, Genetics and Molecular Biology        6    11%
Engineering                                         3     5%
Nursing and Health Professions                      2     4%
Other                                               5     9%
Unknown                                            24    42%

Attention Score in Context

This research output has an Altmetric Attention Score of 5. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 06 October 2017.

  • All research outputs: #6,483,378 of 23,571,271 outputs
  • Outputs from BMC Systems Biology: #221 of 1,135 outputs
  • Outputs of similar age: #101,367 of 319,231 outputs
  • Outputs of similar age from BMC Systems Biology: #3 of 18 outputs

Altmetric has tracked 23,571,271 research outputs across all sources so far. This one has received more attention than most of these and is in the 72nd percentile.
So far Altmetric has tracked 1,135 research outputs from this source. They receive a mean Attention Score of 3.6. This one has done well, scoring higher than 80% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 319,231 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 68% of its contemporaries.
We're also able to compare this research output to 18 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 88% of its contemporaries.
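
As a rough check on how the ranks above relate to the quoted percentages, the following Python sketch assumes a simple formula: the share of tracked outputs ranked below this one, rounded down to a whole percent. This formula is an assumption, not Altmetric's documented calculation, and the function name `percentile_from_rank` is hypothetical. It reproduces the 72nd percentile, 80%, and 68% figures, but for rank #3 of 18 it gives 83% rather than the 88% reported, so the actual calculation evidently differs in detail.

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Share of outputs ranked below `rank`, as a whole percent (rounded down).

    Assumed formula for illustration; not Altmetric's documented method.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) * 100 // total

# (rank, total) pairs copied from the figures listed above.
figures = {
    "All research outputs": (6_483_378, 23_571_271),
    "Outputs from BMC Systems Biology": (221, 1_135),
    "Outputs of similar age": (101_367, 319_231),
    "Outputs of similar age from BMC Systems Biology": (3, 18),
}

for label, (rank, total) in figures.items():
    print(f"{label}: {percentile_from_rank(rank, total)}%")
```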