Integrating Natural Language Processing and Machine Learning Algorithms to Categorize Oncologic Response in Radiology Reports

Overview of attention for article published in Journal of Digital Imaging, October 2017

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (82nd percentile)
  • High Attention Score compared to outputs of the same age and source (80th percentile)

Mentioned by

15 X users
2 patents

Citations

79 Dimensions

Readers on

130 Mendeley
Title
Integrating Natural Language Processing and Machine Learning Algorithms to Categorize Oncologic Response in Radiology Reports
Published in
Journal of Digital Imaging, October 2017
DOI 10.1007/s10278-017-0027-x
Pubmed ID
Authors

Po-Hao Chen, Hanna Zafar, Maya Galperin-Aizenberg, Tessa Cook

Abstract

A significant volume of medical data remains unstructured. Natural language processing (NLP) and machine learning (ML) techniques have been shown to successfully extract insights from radiology reports. However, the codependent effects of NLP and ML in this context have not been well studied. Between April 1, 2015 and November 1, 2016, 9418 cross-sectional abdomen/pelvis CT and MR examinations containing our internal structured reporting element for cancer were separated into four categories: Progression, Stable Disease, Improvement, or No Cancer. We combined each of three NLP techniques with five ML algorithms to predict the assigned label using the unstructured report text and compared the performance of each combination. The three NLP algorithms included term frequency-inverse document frequency (TF-IDF), term frequency weighting (TF), and 16-bit feature hashing. The ML algorithms included logistic regression (LR), random decision forest (RDF), one-vs-all support vector machine (SVM), one-vs-all Bayes point machine (BPM), and fully connected neural network (NN). The best-performing NLP model consisted of tokenized unigrams and bigrams with TF-IDF. Increasing N-gram length yielded little to no added benefit for most ML algorithms. With all parameters optimized, SVM had the best performance on the test dataset, with an average accuracy of 90.6% and an F score of 0.813. The interplay between ML and NLP algorithms and their effect on interpretation accuracy is complex. The best accuracy is achieved when both algorithms are optimized concurrently.
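
The three text representations named in the abstract (TF, TF-IDF, and 16-bit feature hashing over unigrams and bigrams) can be illustrated with a minimal pure-Python sketch. It assumes simple lowercase alphanumeric tokenization, raw n-gram counts as TF, the smoothed IDF that scikit-learn uses by default, ln((1 + N) / (1 + df)) + 1, and reads "16-bit feature hashing" as hashing each n-gram into 2^16 buckets with CRC32. The study's actual tokenizer, weighting variant, and hash function are not stated here, and the helper names and report snippets below are hypothetical.

```python
import math
import re
import zlib
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase, split on non-alphanumerics, return all 1..n_max-grams (assumed tokenizer)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tf_vectors(docs, n_max=2):
    """Term frequency (TF): raw n-gram counts per document."""
    return [Counter(ngrams(d, n_max)) for d in docs]


def tfidf_vectors(docs, n_max=2):
    """TF-IDF: raw counts scaled by the assumed smoothed IDF ln((1 + N) / (1 + df)) + 1."""
    counts = tf_vectors(docs, n_max)
    df = Counter(term for c in counts for term in c)  # number of docs containing each term
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + f)) + 1 for t, f in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def hashed_vector(text, n_bits=16, n_max=2):
    """Feature hashing: map each n-gram to one of 2**n_bits indices via CRC32 (assumed hash)."""
    mask = (1 << n_bits) - 1
    vec = Counter()
    for g in ngrams(text, n_max):
        vec[zlib.crc32(g.encode("utf-8")) & mask] += 1  # colliding n-grams simply add up
    return vec


# Hypothetical report impressions, roughly one per response category.
reports = [
    "Interval increase in size of hepatic metastases, consistent with progression.",
    "Stable hepatic metastases. No new lesions.",
    "Decreased size of hepatic metastases, compatible with treatment response.",
    "No evidence of malignancy in the abdomen or pelvis.",
]

tfidf = tfidf_vectors(reports)
hashed = [hashed_vector(r) for r in reports]
```

Terms shared by every category (such as "hepatic metastases") receive a lower IDF than category-specific terms (such as "stable"), which is the property that lets TF-IDF emphasize discriminative words. Each resulting sparse vector would then be passed to one of the five classifiers, for example a one-vs-all linear SVM; raising `n_max` to 3 adds trigrams, which the abstract reports gave little to no added benefit for most algorithms.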

X Demographics

The data shown below were collected from the profiles of 15 X users who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 130 Mendeley readers of this research output.

Geographical breakdown

Country   Count   As %
Unknown     130   100%

Demographic breakdown

Readers by professional status

Professional status         Count   As %
Researcher                     27    21%
Student > Ph. D. Student       18    14%
Student > Bachelor             15    12%
Lecturer                        8     6%
Student > Master                8     6%
Other                          26    20%
Unknown                        28    22%

Readers by discipline

Discipline                           Count   As %
Medicine and Dentistry                  36    28%
Computer Science                        19    15%
Engineering                             10     8%
Nursing and Health Professions           5     4%
Business, Management and Accounting      4     3%
Other                                   21    16%
Unknown                                 35    27%
Attention Score in Context

This research output has an Altmetric Attention Score of 11. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 11 July 2023.
All research outputs: #3,112,862 of 24,585,148 outputs
Outputs from Journal of Digital Imaging: #90 of 1,119 outputs
Outputs of similar age: #57,601 of 333,585 outputs
Outputs of similar age from Journal of Digital Imaging: #5 of 21 outputs
Altmetric has tracked 24,585,148 research outputs across all sources so far. Compared to these, this one has done well and is in the 87th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 1,119 research outputs from this source. They receive a mean Attention Score of 4.6. This one has done particularly well, scoring higher than 91% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 333,585 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 82% of its contemporaries.
We're also able to compare this research output to 21 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 80% of its contemporaries.
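
The percentile figures above can be approximately reproduced from the listed ranks. This is a minimal sketch assuming the percentile is simply the share of outputs ranked below this one, floored to a whole number; Altmetric's exact rule is not stated on this page, and the function name is hypothetical. The assumed rule gives 87, 91, and 82 for the first three cohorts, matching the text, but 76 for the 21-output cohort rather than the reported 80, so Altmetric's actual method (for example, its handling of tied scores) evidently differs at least for small cohorts.

```python
import math


def percentile_below(rank, total):
    """Assumed rule: whole-number percentage of outputs ranked below position `rank`."""
    return math.floor(100 * (total - rank) / total)


# Ranks and cohort sizes as listed on this page.
cohorts = {
    "All research outputs": (3_112_862, 24_585_148),
    "Journal of Digital Imaging": (90, 1_119),
    "Similar age": (57_601, 333_585),
    "Similar age, same journal": (5, 21),
}

for name, (rank, total) in cohorts.items():
    print(f"{name}: {percentile_below(rank, total)}")
```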