TY - JOUR
T1 - Biclustering analysis using plaid model on gene expression data of colon cancer
AU - Siswantining, Titin
AU - Eriza Aminanto, A.
AU - Sarwinda, Devvi
AU - Swasti, Olivia
N1 - Funding Information:
This research was supported by a PUTI research grant with contract number: NKB-1955/-UN2.RST/HKP.05.00/2020. We would like to thank our colleagues from the Directorate of Research and Community Engagement, Universitas Indonesia, who provided insights and expertise that improved this research in innumerable ways.
Publisher Copyright:
© 2021, Austrian Statistical Society. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Unlike typical clustering analysis, which considers columns only, biclustering analysis partitions a matrix into sub-matrices based on rows and columns simultaneously. One approach to biclustering uses a probabilistic model, such as the plaid model, which provides overlapping biclusters. For each cell, the plaid model calculates the value of the element contributed by a particular sub-matrix; thus, the value can be seen as the contribution of a particular bicluster. The algorithm begins by preparing the input data as a matrix; an initial model is then assessed and a residual matrix is formed from the model. After that, bicluster candidates are determined, and their effect parameters and bicluster membership parameters are evaluated. Finally, each bicluster candidate is pruned to give the optimal bicluster. We implemented the algorithm on a gene expression dataset of colon cancer, where the rows and columns contain observations and types of genes, respectively. We carried out six distinct scenarios, each using different model parameters and threshold values. We measured the results using the Jaccard index and coherence variance. Our experiments show that biclustering analysis of the colon cancer data using a model with mean, row, and column effects yields low coherence variance.
KW - Biclustering
KW - Expression gene dataset
KW - Overlapping bicluster
KW - Plaid model
UR - http://www.scopus.com/inward/record.url?scp=85114037805&partnerID=8YFLogxK
U2 - 10.17713/ajs.v50i5.1195
DO - 10.17713/ajs.v50i5.1195
M3 - Article
AN - SCOPUS:85114037805
SN - 1026-597X
VL - 50
SP - 101
EP - 114
JO - Austrian Journal of Statistics
JF - Austrian Journal of Statistics
IS - 5
ER -