isGPT: An optimized model to identify sub-Golgi protein types using SVM and Random Forest based feature selection

Overview of attention for article published in Artificial Intelligence in Medicine, November 2017
DOI: 10.1016/j.artmed.2017.11.003
Authors: M. Saifur Rahman, Khaledur Rahman, M. Kaykobad, M. Sohel Rahman

Abstract

The Golgi Apparatus (GA) is a key organelle for protein synthesis within the eukaryotic cell. Its main task is to modify and sort proteins for transport throughout the cell. Proteins enter the GA on the side facing the ER (Endoplasmic Reticulum), called the cis side, and depart on the other side, called the trans side. This gives two types of GA proteins: cis-Golgi proteins and trans-Golgi proteins. Any dysfunction of GA proteins can result in congenital glycosylation disorders and other difficulties that may lead to neurodegenerative and inherited diseases such as diabetes, cancer and cystic fibrosis. Accurate classification of GA proteins may therefore contribute to drug development and, in turn, to treatment. In this paper, we focus on building a new computational model that not only introduces easy ways to extract features from protein sequences but also optimizes classification of trans-Golgi and cis-Golgi proteins. After feature extraction, we employ a Random Forest (RF) model to rank the features by the importance scores it produces. After selecting the top-ranked features, we apply a Support Vector Machine (SVM) to classify the sub-Golgi proteins. We trained a regression model as well as a classification model and found the former to be superior. The model shows improved performance over all previous methods. As the benchmark dataset is significantly imbalanced, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to balance it and conducted experiments on both the original and the balanced versions. Our method, identification of sub-Golgi Protein Types (isGPT), achieves accuracies of 95.4%, 95.9% and 95.3% for the 10-fold cross-validation test, the jackknife test and the independent test, respectively. According to different performance metrics, isGPT performs better than state-of-the-art techniques.
The source code of isGPT, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/isGPT.
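
The balancing step mentioned in the abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation (see the repository above for that). The function name `smote`, its parameters, and the toy two-feature data are hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class feature vectors.

    Hypothetical helper for illustration only, not taken from isGPT.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 6 majority vs 3 minority samples, two made-up features each.
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]

# Generate just enough synthetic points to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 6 6
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region those samples already occupy. In an evaluation setting, oversampling of this kind is usually applied to training data only, so that test samples remain real observations.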

Mendeley readers

The data shown below were compiled from readership statistics for 44 Mendeley readers of this research output.

Geographical breakdown

Country | Count | As %
Unknown | 44 | 100%

Demographic breakdown

Readers by professional status | Count | As %
Student > Ph. D. Student | 8 | 18%
Student > Postgraduate | 3 | 7%
Researcher | 3 | 7%
Student > Doctoral Student | 2 | 5%
Student > Bachelor | 2 | 5%
Other | 8 | 18%
Unknown | 18 | 41%

Readers by discipline | Count | As %
Computer Science | 10 | 23%
Engineering | 3 | 7%
Medicine and Dentistry | 3 | 7%
Social Sciences | 2 | 5%
Economics, Econometrics and Finance | 1 | 2%
Other | 3 | 7%
Unknown | 22 | 50%