Abstract
Early cancer prognosis is necessary to determine the proper treatment for each patient. However, DNA microarray data are high-dimensional, which makes the analysis challenging. Several studies on dimensionality reduction have sought the significant genes that yield the lowest error in cancer classification. One such approach applies a mining process such as feature selection using parametric and non-parametric statistical tests. Besides feature selection, data integration is also regarded as an effective way to improve cancer classification performance. In this paper, a dataset containing gene expression values and clinical parameters observed from 60 breast cancer patients is used in the experiment. The experiment integrates the data using an early kernel-based data integration model with a modified dimensionality reduction step: whereas existing related research uses kernel dimensionality reduction, this paper replaces it with a mining process based on several parametric and non-parametric statistical tests. The last step in kernel-based data integration is classification using a Support Vector Machine (SVM). A ten-fold cross-validation scheme is used in the experiment. The SVM with a linear kernel gives the best accuracy compared with the other kernels.
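The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (60 samples, matching only the patient count), "early integration" is assumed to mean concatenating the selected gene features with the clinical parameters before a single kernel is computed, and the specific tests (Welch's t-test and Mann-Whitney U), the number of selected genes, and all function and variable names are hypothetical choices made for the example.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def t_test_score(X, y):
    """Parametric gene ranking: Welch's t-test per column (labels assumed 0/1)."""
    stat, p = ttest_ind(X[y == 0], X[y == 1], axis=0, equal_var=False)
    return np.abs(stat), p


def mann_whitney_score(X, y):
    """Non-parametric gene ranking: Mann-Whitney U test per column."""
    p = np.array([
        mannwhitneyu(X[y == 0][:, j], X[y == 1][:, j],
                     alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ])
    return -np.log10(p), p  # smaller p-value -> larger score


def cv_accuracy(X, y, gene_cols, clin_cols, score_func, kernel, k=20):
    """Ten-fold CV accuracy; gene selection is refit inside every fold."""
    integrate = ColumnTransformer([
        ("genes", SelectKBest(score_func, k=k), gene_cols),  # reduce genes
        ("clinical", "passthrough", clin_cols),              # keep clinical as-is
    ])
    model = make_pipeline(integrate, StandardScaler(), SVC(kernel=kernel))
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy")


# Synthetic stand-in data: 60 samples, 200 "genes", 5 "clinical" columns.
rng = np.random.default_rng(0)
n_genes, n_clin = 200, 5
y = np.array([0] * 30 + [1] * 30)
X_genes = rng.normal(size=(60, n_genes))
X_genes[y == 1, :10] += 1.0  # make the first 10 genes informative
X_clin = rng.normal(size=(60, n_clin))
X = np.hstack([X_genes, X_clin])
gene_cols = list(range(n_genes))
clin_cols = list(range(n_genes, n_genes + n_clin))

for name, func in [("t-test", t_test_score), ("Mann-Whitney", mann_whitney_score)]:
    for kernel in ["linear", "rbf", "poly"]:
        acc = cv_accuracy(X, y, gene_cols, clin_cols, func, kernel)
        print(f"{name:12s} {kernel:6s} mean accuracy = {acc.mean():.3f}")
```

Placing the statistical test inside the cross-validated pipeline means genes are selected from the training folds only, which avoids leaking test-fold labels into the feature selection step. Accuracies on this synthetic data say nothing about the results reported in the paper.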
Original language | English |
---|---|
Pages (from-to) | 5489-5498 |
Number of pages | 10 |
Journal | Journal of Theoretical and Applied Information Technology |
Volume | 96 |
Issue number | 16 |
Publication status | Published - 31 Aug 2018 |
Keywords
- Data integration
- Gene expressions
- Kernel dimensionality
- Kernel method
- Recurrent cancer