Report for: Integrating Natural Language Processing and Machine Learning Algorithms to Categorize Oncologic Response in Radiology Reports

Title	Integrating Natural Language Processing and Machine Learning Algorithms to Categorize Oncologic Response in Radiology Reports
Published in	Journal of Digital Imaging, October 2017
DOI	10.1007/s10278-017-0027-x
Pubmed ID	29079959
Authors	Po-Hao Chen, Hanna Zafar, Maya Galperin-Aizenberg, Tessa Cook
Abstract	A significant volume of medical data remains unstructured. Natural language processing (NLP) and machine learning (ML) techniques have been shown to extract insights from radiology reports successfully. However, the codependent effects of NLP and ML in this context have not been well studied. Between April 1, 2015 and November 1, 2016, 9418 cross-sectional abdomen/pelvis CT and MR examinations containing our internal structured reporting element for cancer were separated into four categories: Progression, Stable Disease, Improvement, or No Cancer. We combined each of three NLP techniques with five ML algorithms to predict the assigned label using the unstructured report text and compared the performance of each combination. The three NLP algorithms included term frequency-inverse document frequency (TF-IDF), term frequency weighting (TF), and 16-bit feature hashing. The ML algorithms included logistic regression (LR), random decision forest (RDF), one-vs-all support vector machine (SVM), one-vs-all Bayes point machine (BPM), and fully connected neural network (NN). The best-performing NLP model consisted of tokenized unigrams and bigrams with TF-IDF. Increasing N-gram length yielded little to no added benefit for most ML algorithms. With all parameters optimized, SVM had the best performance on the test dataset, with an average accuracy of 90.6% and an F score of 0.813. The interplay between ML and NLP algorithms and their effect on interpretation accuracy is complex. The best accuracy is achieved when both algorithms are optimized concurrently.
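
To make the best-performing feature set in the abstract concrete, here is a minimal sketch of TF-IDF weighting over unigrams and bigrams. It is not the authors' pipeline: the whitespace tokenizer, the idf = ln(N / df) variant, the function names `ngrams` and `tfidf`, and the three report snippets are all assumptions made up for illustration.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return every 1..n_max-gram."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_max=2):
    """Return one {ngram: weight} dict per document.

    Assumes raw term counts for tf and idf = ln(N / df),
    one common textbook variant among several.
    """
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each document counts once per n-gram
    n_docs = len(docs)
    return [
        {g: tf * math.log(n_docs / df[g]) for g, tf in c.items()}
        for c in counts
    ]


# Hypothetical report snippets, not taken from the study data
reports = [
    "interval increase in hepatic lesions",
    "stable hepatic lesions",
    "no evidence of malignancy",
]
weights = tfidf(reports)
```

In this sketch, an n-gram that appears in only one report (such as "interval increase") receives a larger weight than one shared by two reports (such as "hepatic lesions"). The resulting sparse vectors are the kind of input a classifier such as an SVM would then consume.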

X Demographics

The data shown below were collected from the profiles of 15 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United States	6	40%
Spain	2	13%
Germany	1	7%
United Kingdom	1	7%
Italy	1	7%
Unknown	4	27%

Demographic breakdown

Type	Count	As %
Members of the public	10	67%
Practitioners (doctors, other healthcare professionals)	4	27%
Scientists	1	7%

Mendeley readers

The data shown below were compiled from readership statistics for 130 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	130	100%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	27	21%
Student > Ph. D. Student	18	14%
Student > Bachelor	15	12%
Lecturer	8	6%
Student > Master	8	6%
Other	26	20%
Unknown	28	22%

Readers by discipline	Count	As %
Medicine and Dentistry	36	28%
Computer Science	19	15%
Engineering	10	8%
Nursing and Health Professions	5	4%
Business, Management and Accounting	4	3%
Other	21	16%
Unknown	35	27%

Attention Score in Context

This research output has an Altmetric Attention Score of 11. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 11 July 2023.

Comparison group	Rank	Out of
All research outputs	#3,112,862	24,585,148 outputs
Outputs from Journal of Digital Imaging	#90	1,119 outputs
Outputs of similar age	#57,601	333,585 outputs
Outputs of similar age from Journal of Digital Imaging	(rank not shown)	21 outputs

Altmetric has tracked 24,585,148 research outputs across all sources so far. Compared to these, this one has done well and is in the 87th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.

So far Altmetric has tracked 1,119 research outputs from this source. They receive a mean Attention Score of 4.6. This one has done particularly well, scoring higher than 91% of its peers.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 333,585 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 82% of its contemporaries.

We're also able to compare this research output to 21 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 80% of its contemporaries.
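
The percentages quoted above can be related to the ranks with simple arithmetic. The sketch below assumes the percentage is the share of outputs ranked below this one, rounded down; that is a guess that happens to reproduce the three figures for which a rank is given, not Altmetric's documented method, and the function name `percent_outscored` is hypothetical.

```python
def percent_outscored(rank, total):
    """Whole-number share of outputs ranked below `rank` (assumed formula)."""
    return (total - rank) * 100 // total


all_outputs = percent_outscored(3_112_862, 24_585_148)  # all research outputs
same_source = percent_outscored(90, 1_119)              # Journal of Digital Imaging
similar_age = percent_outscored(57_601, 333_585)        # outputs of similar age
```

Under this assumption the three values come out as 87, 91, and 82, matching the percentile and "scoring higher than" figures in the text.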
