Risk Classification for NSCLC Survival using Microarray and Clinical Data
Lung cancer is one of the most common cancers in the world, and Non-Small Cell Lung Cancer (NSCLC) is the
most common type of lung cancer and among the most dangerous. Predicting NSCLC survival is therefore essential for
choosing suitable treatments. However, conventional methods of risk classification for cancer survival rely solely on
histopathology data, and their predictions are unreliable in many cases. In this paper, we propose a risk classification model that
uses high-throughput gene expression data together with clinical data to predict NSCLC survival. We use the Gain Ratio (GR) and
Improved Gene Expression Programming (IGEP) algorithms for attribute selection. For classification, we use a Support Vector
Machine (SVM) evaluated with 10-fold cross-validation. The results demonstrate the effectiveness of the proposed model, which
achieves an average accuracy of 90.7%, higher than that of other representative models. The selected attributes, three genes
(LCK, DUSP6, and ERBB3) together with the clinical factors T_stage and N_stage, yield good prediction results.
Index Terms- Lung cancer, risk classification, microarray dataset, clinical data.
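
To make the Gain Ratio step of the attribute selection concrete, the following is a minimal sketch. It assumes the standard definition of gain ratio (information gain divided by split information, as used in C4.5) applied to an attribute that has already been discretized; the abstract does not specify these details. The function names and the toy expression levels and risk labels are hypothetical and serve illustration only.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one discretized attribute with respect to class labels.

    Assumes the standard definition:
    (H(labels) - H(labels | attribute)) / H(attribute).
    """
    n = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Conditional entropy of the labels given the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    # An attribute with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: discretized expression of two genes in four patients.
risk = ["high", "high", "low", "low"]
gene_a = ["up", "up", "down", "down"]   # separates the two risk groups
gene_b = ["up", "down", "up", "down"]   # unrelated to the risk groups

scores = {"gene_a": gain_ratio(gene_a, risk), "gene_b": gain_ratio(gene_b, risk)}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, attributes would be ranked by their gain ratio and the top-ranked ones passed on to the next selection stage; in the toy data, `gene_a` ranks above `gene_b`.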