Healthcare Text Classification System and its Performance Evaluation: A Source of Better Intelligence by Characterizing Healthcare Text

Overview of attention for article published in Journal of Medical Systems, April 2018

Altmetric Badge

Citations

dimensions_citation: 12 Dimensions

Readers on

mendeley: 49 Mendeley

Summary Dimensions citations

You are seeing a free-to-access but limited selection of the activity Altmetric has collected about this research output. Click here to find out more.

Title	Healthcare Text Classification System and its Performance Evaluation: A Source of Better Intelligence by Characterizing Healthcare Text
Published in	Journal of Medical Systems, April 2018
DOI	10.1007/s10916-018-0941-6
Pubmed ID	29654417
Authors	Saurabh Kumar Srivastava, Sandeep Kumar Singh, Jasjit S. Suri
Abstract	A machine learning (ML)-based text classification system has several classifiers. The performance evaluation (PE) of the ML system is typically driven by the training data size and the partition protocols used. Such systems lead to low accuracy because the text classification systems lack the ability to model the input text data in terms of noise characteristics. This research study proposes a concept of misrepresentation ratio (MRR) on input healthcare text data and models the PE criteria for validating the hypothesis. Further, such a novel system provides a platform to amalgamate several attributes of the ML system such as: data size, classifier type, partitioning protocol and percentage MRR. Our comprehensive data analysis consisted of five types of text data sets (TwitterA, WebKB4, Disease, Reuters (R8), and SMS); five kinds of classifiers (support vector machine with linear kernel (SVM-L), MLP-based neural network, AdaBoost, stochastic gradient descent and decision tree); and five types of training protocols (K2, K4, K5, K10 and JK). Using the decreasing order of MRR, our ML system demonstrates the mean classification accuracies as: 70.13 ± 0.15%, 87.34 ± 0.06%, 93.73 ± 0.03%, 94.45 ± 0.03% and 97.83 ± 0.01%, respectively, using all the classifiers and protocols. The corresponding AUC is 0.98 for SMS data using Multi-Layer Perceptron (MLP) based neural network. All the classifiers, the best accuracy of 91.84 ± 0.04% is shown to be of MLP-based neural network and this is 6% better over previously published. Further we observed that as MRR decreases, the system robustness increases and validated by standard deviations. The overall text system accuracy using all data types, classifiers, protocols is 89%, thereby showing the entire ML system to be novel, robust and unique. The system is also tested for stability and reliability.

View on publisher site Alert me about new mentions

Mendeley readers

Mendeley readers

The data shown below were compiled from readership statistics for 49 Mendeley readers of this research output. Click here to see the associated Mendeley record.

Geographical breakdown

Country	Count	As %
Unknown	49	100%

Demographic breakdown

Readers by professional status	Count	As %
Unspecified	8	16%
Student > Master	8	16%
Student > Doctoral Student	5	10%
Researcher	4	8%
Professor	2	4%
Other	4	8%
Unknown	18	37%

Readers by discipline	Count	As %
Computer Science	10	20%
Unspecified	8	16%
Medicine and Dentistry	6	12%
Business, Management and Accounting	2	4%
Agricultural and Biological Sciences	1	2%
Other	3	6%
Unknown	19	39%