
A comparative study: classification vs. user-based collaborative filtering for clinical prediction

Overview of attention for article published in BMC Medical Research Methodology, December 2016

About this Attention Score

  • Average Attention Score compared to outputs of the same age

Mentioned by

2 X users

Citations

13 Dimensions

Readers on

81 Mendeley
Title
A comparative study: classification vs. user-based collaborative filtering for clinical prediction
Published in
BMC Medical Research Methodology, December 2016
DOI 10.1186/s12874-016-0261-9
Authors

Fang Hao, Rachael Hageman Blair

Abstract

Recommender systems have shown tremendous value for predicting personalized item recommendations for individuals in a variety of settings (e.g., marketing, e-commerce). User-based collaborative filtering is a popular recommender system, which leverages an individual's prior satisfaction with items, as well as the satisfaction of individuals who are "similar". Recently, collaborative filtering based recommender systems have been applied to clinical risk prediction. In these applications, individuals represent patients, and items represent clinical data, including an outcome. Applying a recommender system to a problem of this type requires recasting a supervised learning problem as an unsupervised one. The rationale is that patients with similar clinical features carry a similar disease risk. As the "Big Data" era progresses, approaches of this type are likely to see increasing use as biomedical data continue to grow in both size and complexity (e.g., electronic health records). In the present study, we set out to understand and assess the performance of recommender systems in a controlled yet realistic setting. User-based collaborative filtering recommender systems are compared to logistic regression and random forests with different types of imputation and varying amounts of missingness on four publicly available medical data sets: the National Health and Nutrition Examination Survey (NHANES, 2011-2012, obesity), the Study to Understand Prognoses Preferences Outcomes and Risks of Treatment (SUPPORT), chronic kidney disease, and dermatology data. We also examined performance on simulated data with observations Missing At Random (MAR) or Missing Completely At Random (MCAR) under various degrees of missingness and levels of class imbalance in the response variable.
Our results demonstrate that user-based collaborative filtering is consistently inferior to logistic regression and random forests, under a range of imputation methods, on both real and simulated data. These results warrant caution in using collaborative filtering for clinical risk prediction when traditional classification is feasible and practical. We describe some natural "Big Data" applications where CF would be preferred, and conclude with some insights as to why caution may be warranted in this context.
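The recasting described in the abstract — treating the clinical outcome as a missing "item" and imputing it from similar patients — can be sketched in a few lines. This is a minimal illustration of the user-based CF prediction rule (cosine similarity over observed features, similarity-weighted average of neighbors' outcomes), not the authors' implementation; the function name, neighborhood size, and toy data are our own.

```python
import numpy as np

def user_based_cf_predict(X, y, x_new, k=3):
    """Predict an outcome for a new patient by treating the outcome as a
    missing 'item' and imputing it from the k most similar patients
    (cosine similarity over the observed clinical features)."""
    # Cosine similarity between the new patient and each training patient
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(x_new)
    sims = X @ x_new / np.where(norms == 0, 1e-12, norms)
    # The k nearest neighbors vote, weighted by their similarity --
    # the standard user-based CF prediction rule
    top = np.argsort(sims)[-k:]
    w = sims[top]
    if w.sum() <= 0:
        return float(y[top].mean())
    return float(np.dot(w, y[top]) / w.sum())
```

For a binary outcome, the returned score can be thresholded (e.g., at 0.5) to obtain a class label, which is how a recommender's rating prediction is turned into the risk classification the study evaluates.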
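The simulation design also relies on masking observations MCAR, i.e., each cell is deleted independently with a fixed probability, regardless of its value. A minimal sketch of such a masking step (our own helper, shown only to illustrate the MCAR mechanism; the paper's actual simulation code is not reproduced here):

```python
import numpy as np

def mask_mcar(X, frac, seed=None):
    """Return a copy of X with roughly `frac` of its entries set to NaN,
    Missing Completely At Random: each cell is masked independently,
    with probability that does not depend on any data value."""
    rng = np.random.default_rng(seed)
    X = X.astype(float).copy()
    mask = rng.random(X.shape) < frac
    X[mask] = np.nan
    return X
```

MAR, by contrast, would make the masking probability depend on other *observed* variables, which is why the two mechanisms are simulated and reported separately in the study.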

X Demographics

The data shown below were collected from the profiles of the 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for the 81 Mendeley readers of this research output.

Geographical breakdown

Country  Count  As %
Unknown  81     100%

Demographic breakdown

Readers by professional status  Count  As %
Student > Master                16     20%
Student > Ph.D. Student         11     14%
Researcher                      9      11%
Student > Bachelor              7      9%
Student > Doctoral Student      5      6%
Other                           13     16%
Unknown                         20     25%
Readers by discipline                Count  As %
Computer Science                     21     26%
Medicine and Dentistry               10     12%
Engineering                          8      10%
Business, Management and Accounting  4      5%
Psychology                           3      4%
Other                                12     15%
Unknown                              23     28%
Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 01 November 2017.
All research outputs
#14,878,745
of 22,912,409 outputs
Outputs from BMC Medical Research Methodology
#1,450
of 2,025 outputs
Outputs of similar age
#240,633
of 419,640 outputs
Outputs of similar age from BMC Medical Research Methodology
#22
of 29 outputs
Altmetric has tracked 22,912,409 research outputs across all sources so far. This one is in the 33rd percentile – i.e., 33% of other outputs scored the same or lower than it.
So far Altmetric has tracked 2,025 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.1. This one is in the 25th percentile – i.e., 25% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 419,640 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 39th percentile – i.e., 39% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 29 others from the same source and published within six weeks on either side of this one. This one is in the 20th percentile – i.e., 20% of its contemporaries scored the same or lower than it.