
A framework for feature extraction from hospital medical data with applications in risk prediction

Overview of attention for article published in BMC Bioinformatics, December 2014

About this Attention Score

  • Average Attention Score compared to outputs of the same age

Mentioned by

2 X users
1 Facebook page

Citations

31 Dimensions

Readers on

99 Mendeley
1 CiteULike
DOI 10.1186/s12859-014-0425-8
Pubmed ID
Authors

Truyen Tran, Wei Luo, Dinh Phung, Sunil Gupta, Santu Rana, Richard Lee Kennedy, Ann Larkins, Svetha Venkatesh

Abstract

Background: Feature engineering is a time-consuming component of predictive modelling. We propose a versatile platform to automatically extract features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type and risk prediction task. We contrast auto-extracted features with baselines generated from the Elixhauser comorbidities.

Results: Hospital medical records were transformed into event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task and compared with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was built using logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6 and 12 months were considered for four diverse diseases (diabetes, COPD, mental disorders and pneumonia), with derivation and validation cohorts defined on non-overlapping data-collection periods.

For unplanned readmissions, the auto-extracted feature set using socio-demographic information and medical records outperformed baselines derived from socio-demographic information and Elixhauser comorbidities across 20 settings (5 prediction horizons for each of 4 diseases). In particular, for 30-day prediction the AUCs are:
  • COPD: baseline 0.60 (95% CI: 0.57, 0.63); auto-extracted 0.67 (0.64, 0.70)
  • Diabetes: baseline 0.60 (0.58, 0.63); auto-extracted 0.67 (0.64, 0.69)
  • Mental disorders: baseline 0.57 (0.54, 0.60); auto-extracted 0.69 (0.64, 0.70)
  • Pneumonia: baseline 0.61 (0.59, 0.63); auto-extracted 0.70 (0.67, 0.72)

Conclusions: The advantages of auto-extracting standard features from complex medical records in a disease- and task-agnostic manner were demonstrated. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have the potential to form the foundation of complex automated analytic tasks.
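The abstract describes turning hospital records into event sequences, applying filters at several temporal scales, and predicting readmission over horizons of 1 to 12 months. A minimal sketch of that idea follows. It is not the authors' platform or entity schema: the `Event` record, the `kind:code@window` feature names, the look-back windows in `WINDOWS`, the example codes, and the approximation of the 1, 2, 3, 6 and 12-month horizons as 30, 60, 90, 180 and 365 days are all hypothetical, and separating unplanned from planned admissions is left to the caller.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Hypothetical look-back windows (days) used to capture several temporal scales.
WINDOWS = (30, 90, 180, 365)
# Hypothetical day-based approximations of the 1, 2, 3, 6 and 12-month horizons.
HORIZONS = (30, 60, 90, 180, 365)


@dataclass(frozen=True)
class Event:
    """One timestamped item from a patient's record (hypothetical schema)."""
    day: date
    kind: str  # e.g. "diagnosis" or "procedure"
    code: str  # e.g. a diagnosis or procedure code


def extract_features(events, index_day, windows=WINDOWS):
    """Count each (kind, code) pair inside every look-back window.

    An event `age` days before `index_day` is counted in window w when
    0 <= age < w; events after `index_day` are ignored to avoid leakage.
    """
    counts = Counter()
    for e in events:
        age = (index_day - e.day).days
        if age < 0:
            continue  # event lies after the prediction point
        for w in windows:
            if age < w:
                counts[f"{e.kind}:{e.code}@{w}d"] += 1
    return dict(counts)


def readmission_labels(admission_days, index_day, horizons=HORIZONS):
    """Return {h: 1 or 0}: 1 if any admission falls in (index_day, index_day + h].

    The caller is assumed to pass only unplanned admissions.
    """
    return {
        h: int(any(0 < (d - index_day).days <= h for d in admission_days))
        for h in horizons
    }


if __name__ == "__main__":
    index_day = date(2012, 6, 1)  # hypothetical discharge date
    history = [
        Event(date(2012, 5, 20), "diagnosis", "E11"),
        Event(date(2012, 1, 10), "diagnosis", "E11"),
        Event(date(2011, 9, 1), "procedure", "P1"),
    ]
    print(extract_features(history, index_day))
    print(readmission_labels([date(2012, 10, 1)], index_day))
```

Each window yields its own count for the same code, so one diagnosis contributes to every scale that covers it; a downstream model (the abstract names elastic-net logistic regression) can then weight recent and long-term history separately.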

X Demographics

The data were collected from the profiles of the 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 99 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Canada 2 2%
Malaysia 1 1%
United States 1 1%
Australia 1 1%
Unknown 94 95%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 22 22%
Researcher 13 13%
Student > Master 11 11%
Other 8 8%
Student > Bachelor 6 6%
Other 18 18%
Unknown 21 21%
Readers by discipline Count As %
Computer Science 33 33%
Medicine and Dentistry 16 16%
Engineering 10 10%
Unspecified 3 3%
Agricultural and Biological Sciences 3 3%
Other 7 7%
Unknown 27 27%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 03 January 2015.
All research outputs: #15,313,289 of 22,775,504 outputs
Outputs from BMC Bioinformatics: #5,373 of 7,276 outputs
Outputs of similar age: #208,678 of 352,738 outputs
Outputs of similar age from BMC Bioinformatics: #105 of 151 outputs
Altmetric has tracked 22,775,504 research outputs across all sources so far. This one is in the 22nd percentile – i.e., 22% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,276 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 18th percentile – i.e., 18% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 352,738 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 31st percentile – i.e., 31% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 151 others from the same source and published within six weeks on either side of this one. This one is in the 17th percentile – i.e., 17% of its contemporaries scored the same or lower than it.
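Each percentile statement above reduces to two steps: restrict to a peer group (for the age-matched comparisons, outputs published within six weeks on either side), then take the share of peers that scored the same or lower. A minimal sketch of that arithmetic as worded on this page follows. It is not Altmetric's code: reading "six weeks" as an inclusive 42-day window, whether an output counts among its own peers, and tie handling are assumptions, and the peer dates, scores and publication day in the usage lines are made up.

```python
from datetime import date

# Assumption: "six weeks on either side" read as an inclusive 42-day window.
SIX_WEEKS_DAYS = 42


def contemporaries(published, peers, window_days=SIX_WEEKS_DAYS):
    """Return scores of peers published within window_days of `published`.

    `peers` is an iterable of (publication_date, score) pairs.
    """
    return [score for day, score in peers
            if abs((day - published).days) <= window_days]


def percent_same_or_lower(score, peer_scores):
    """Percentage of peer scores that are equal to or lower than `score`."""
    if not peer_scores:
        raise ValueError("peer_scores must not be empty")
    same_or_lower = sum(1 for s in peer_scores if s <= score)
    return 100.0 * same_or_lower / len(peer_scores)


if __name__ == "__main__":
    # Hypothetical peers: (publication date, Attention Score).
    peers = [
        (date(2014, 12, 1), 0),
        (date(2014, 11, 1), 3),   # 49 days earlier: outside the window
        (date(2014, 6, 1), 1),    # far outside the window
        (date(2015, 1, 10), 1),
        (date(2015, 1, 15), 4),
    ]
    nearby = contemporaries(date(2014, 12, 20), peers)  # hypothetical publication day
    print(nearby, round(percent_same_or_lower(1, nearby), 1))
```

Because ties count as "same or lower", an output with a common low score such as 1 can sit at a higher percentile than its raw rank alone suggests.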