Sorting Through the Safety Data Haystack: Using Machine Learning to Identify Individual Case Safety Reports in Social-Digital Media

Overview of attention for article published in Drug Safety, February 2018

About this Attention Score

  • In the top 5% of all research outputs scored by Altmetric
  • One of the highest-scoring outputs from this source (#9 of 1,870)
  • High Attention Score compared to outputs of the same age (99th percentile)
  • High Attention Score compared to outputs of the same age and source (93rd percentile)

Mentioned by

  • 40 news outlets
  • 1 blog
  • 31 X users
  • 1 Facebook page

Citations

  • 35 citations on Dimensions

Readers on

  • 119 readers on Mendeley

Published in: Drug Safety, February 2018
DOI: 10.1007/s40264-018-0641-7
Authors: Shaun Comfort, Sujan Perera, Zoe Hudson, Darren Dorrell, Shawman Meireis, Meenakshi Nagarajan, Cartic Ramakrishnan, Jennifer Fine

Abstract

There is increasing interest in social digital media (SDM) as a data source for pharmacovigilance activities; however, SDM is considered a low-information-content data source for safety data. Given that pharmacovigilance itself operates in a high-noise, lower-validity environment without objective 'gold standards' beyond process definitions, the introduction of large volumes of SDM into the pharmacovigilance workflow has the potential to exacerbate issues with limited manual resources to perform adverse event identification and processing. Recent advances in medical informatics have resulted in methods for developing programs which can assist human experts in the detection of valid individual case safety reports (ICSRs) within SDM.

In this study, we developed rule-based and machine learning (ML) models for classifying ICSRs from SDM and compared their performance with that of human pharmacovigilance experts.

We used a random sample from a collection of 311,189 SDM posts that mentioned Roche products and brands in combination with common medical and scientific terms, sourced from Twitter, Tumblr, Facebook, and a spectrum of news media blogs, to develop and evaluate three iterations of an automated ICSR classifier. The ICSR classifier models consisted of sub-components to annotate the relevant ICSR elements and a component to make the final decision on the validity of the ICSR. Agreement with human pharmacovigilance experts was chosen as the preferred performance metric and was evaluated by calculating the Gwet AC1 statistic (gKappa). The best performing model was tested against the Roche global pharmacovigilance expert using a blind dataset and put through a time test of the full 311,189-post dataset.

During this effort, the initial strict rule-based approach to ICSR classification resulted in a model with an accuracy of 65% and a gKappa of 46%. Adding an ML-based adverse event annotator improved the accuracy to 74% and gKappa to 60%. Adding a further ML-based ICSR detector improved performance again. On a blind test set of 2,500 posts, the final model demonstrated a gKappa of 78% and an accuracy of 83%. In the time test, it took the final model 48 h to complete a task that would have taken an estimated 44,000 h for human experts to perform.

The results of this study indicate that an effective and scalable solution to the challenge of ICSR detection in SDM includes a workflow using an automated ML classifier to identify likely ICSRs for further review by human subject matter experts (SMEs).
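
The abstract reports two agreement measures between the classifier and the human experts: accuracy and the Gwet AC1 statistic (gKappa). The abstract does not spell out how either was computed, so the following is only a minimal sketch of the standard two-rater AC1 formula, assuming binary labels (1 = valid ICSR, 0 = not). The function names and the ten sample labels are hypothetical and are not data from the study.

```python
from collections import Counter


def accuracy(model_labels, expert_labels):
    """Fraction of posts on which the model and the expert give the same label."""
    if len(model_labels) != len(expert_labels) or not model_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(m == e for m, e in zip(model_labels, expert_labels))
    return matches / len(model_labels)


def gwet_ac1(rater_a, rater_b):
    """Gwet's AC1 for two raters: (pa - pe) / (1 - pe).

    pa is the observed agreement. pe is the chance agreement,
    sum_k pi_k * (1 - pi_k) / (q - 1), where pi_k is the mean share of
    category k across the two raters and q is the number of categories.
    """
    pa = accuracy(rater_a, rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    q = len(categories)
    if q < 2:
        # Assumption: if only one category was ever used, treat it as full agreement.
        return 1.0
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    pe = 0.0
    for k in categories:
        pi_k = (counts_a[k] / n + counts_b[k] / n) / 2
        pe += pi_k * (1 - pi_k)
    pe /= q - 1
    return (pa - pe) / (1 - pe)


# Hypothetical labels for ten posts (illustration only, not study data).
model_labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
expert_labels = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]

print(f"accuracy = {accuracy(model_labels, expert_labels):.2f}")  # accuracy = 0.80
print(f"Gwet AC1 = {gwet_ac1(model_labels, expert_labels):.3f}")  # Gwet AC1 = 0.615
```

In this toy example the raters agree on 8 of 10 posts, so accuracy is 0.80, while AC1 is lower (about 0.62) because it discounts the agreement expected by chance.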

X Demographics

The data shown below were collected from the profiles of 31 X users who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 119 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    119      100%

Demographic breakdown

Readers by professional status                         Count    As %
Student > Master                                       21       18%
Researcher                                             13       11%
Other                                                  10       8%
Student > Ph. D. Student                               10       8%
Student > Bachelor                                     8        7%
Other                                                  21       18%
Unknown                                                36       30%

Readers by discipline                                  Count    As %
Computer Science                                       17       14%
Medicine and Dentistry                                 10       8%
Nursing and Health Professions                         9        8%
Engineering                                            8        7%
Pharmacology, Toxicology and Pharmaceutical Science    7        6%
Other                                                  20       17%
Unknown                                                48       40%


Attention Score in Context

This research output has an Altmetric Attention Score of 351. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 11 April 2019.

  • All research outputs: #93,740 of 25,698,912 outputs
  • Outputs from Drug Safety: #9 of 1,870 outputs
  • Outputs of similar age: #2,445 of 457,488 outputs
  • Outputs of similar age from Drug Safety: #2 of 29 outputs

Altmetric has tracked 25,698,912 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 99th percentile: it's in the top 5% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 1,870 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 12.1. This one has done particularly well, scoring higher than 99% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 457,488 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 99% of its contemporaries.
We're also able to compare this research output to 29 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 93% of its contemporaries.
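
The percentile figures quoted in this section can be related to the rankings with simple arithmetic. The sketch below assumes a percentile is the share of outputs ranked below this one, rounded down to a whole number. That rule is an assumption chosen because it reproduces the figures on this page; it is not Altmetric's documented method, and the function name `percentile_from_rank` is hypothetical.

```python
def percentile_from_rank(rank, total):
    """Whole-number percentage of outputs ranked below the given rank.

    Assumed rule: floor(100 * (total - rank) / total), computed with
    integer arithmetic to avoid floating-point rounding.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (100 * (total - rank)) // total


# Rankings as listed in this section: (rank, number of outputs compared).
rankings = {
    "All research outputs": (93_740, 25_698_912),
    "Outputs from Drug Safety": (9, 1_870),
    "Outputs of similar age": (2_445, 457_488),
    "Outputs of similar age from Drug Safety": (2, 29),
}

for label, (rank, total) in rankings.items():
    print(f"{label}: higher than {percentile_from_rank(rank, total)}% of outputs")
# Prints 99, 99, 99 and 93 for the four comparisons, in that order.
```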