TY - JOUR
T1 - Subtype of cancer identification for patient survival prediction using semi supervised method
AU - Wasito, Ito
AU - Veritawati, Ionia
PY - 2012/8
Y1 - 2012/8
N2 - A number of techniques for cancer subtype identification exist in the literature. However, most of these techniques assume that the different subtypes of cancer are already known to exist. Even though methods for identifying such subtypes exist, they work well only for specific datasets. For these reasons, it would be desirable to develop a procedure that finds such subtypes from a small set of genes relevant to the clinical data and that is applicable in a wide variety of circumstances. The identified subtypes would then be useful for accurately predicting future patient survival. We used experimental data from [13] consisting of 1,536 genes in 100 colorectal carcinoma tissues and 11 normal tissues. First, we identify relevant genes that are correlated with patient survival time. Genes are selected for further analysis using Cox regression, keeping only those with a p-value less than 0.01. Based on our computation, 63 genes were identified for patient survival prediction. Then, 2-means clustering is applied to find patient subgroups using those 63 genes. With the subgroups identified, we apply Support Vector Machines (SVM) to classify future patients into the appropriate subgroup for survival prediction. For the tumour clinical data, we successfully identify 2 subgroups of patients with significant p-values based on Kaplan-Meier graphs. For the metastasis clinical data, we also successfully discover 2 subgroups for each group. Even though no prior subtype information exists, we are still able to predict the survival time of cancer patients using a combination of unsupervised and supervised methods, referred to as "semi supervised" methods. The results show that our proposed methods successfully unveil subgroups on various colorectal carcinoma parameters. The only partial success is on the lymph node parameter, where our proposed method successfully identified different survival times on lymphnode-0 and lymphnode-1 with a significant p-value.
AB - A number of techniques for cancer subtype identification exist in the literature. However, most of these techniques assume that the different subtypes of cancer are already known to exist. Even though methods for identifying such subtypes exist, they work well only for specific datasets. For these reasons, it would be desirable to develop a procedure that finds such subtypes from a small set of genes relevant to the clinical data and that is applicable in a wide variety of circumstances. The identified subtypes would then be useful for accurately predicting future patient survival. We used experimental data from [13] consisting of 1,536 genes in 100 colorectal carcinoma tissues and 11 normal tissues. First, we identify relevant genes that are correlated with patient survival time. Genes are selected for further analysis using Cox regression, keeping only those with a p-value less than 0.01. Based on our computation, 63 genes were identified for patient survival prediction. Then, 2-means clustering is applied to find patient subgroups using those 63 genes. With the subgroups identified, we apply Support Vector Machines (SVM) to classify future patients into the appropriate subgroup for survival prediction. For the tumour clinical data, we successfully identify 2 subgroups of patients with significant p-values based on Kaplan-Meier graphs. For the metastasis clinical data, we also successfully discover 2 subgroups for each group. Even though no prior subtype information exists, we are still able to predict the survival time of cancer patients using a combination of unsupervised and supervised methods, referred to as "semi supervised" methods. The results show that our proposed methods successfully unveil subgroups on various colorectal carcinoma parameters. The only partial success is on the lymph node parameter, where our proposed method successfully identified different survival times on lymphnode-0 and lymphnode-1 with a significant p-value.
KW - Cancer subtype
KW - Cox regression
KW - K-Means clustering
KW - Patient survival
KW - Semi supervised
UR - http://www.scopus.com/inward/record.url?scp=84864995769&partnerID=8YFLogxK
U2 - 10.4156/jcit.vol7.issue14.25
DO - 10.4156/jcit.vol7.issue14.25
M3 - Article
AN - SCOPUS:84864995769
SN - 1975-9320
VL - 7
SP - 215
EP - 222
JO - Journal of Convergence Information Technology
JF - Journal of Convergence Information Technology
IS - 14
ER -