Abstract
The similarity-based biclustering (SBB) algorithm consists of four main phases: data transformation, construction of row (gene) and column (condition) similarity matrices, clustering of each similarity matrix, and bicluster extraction. In this study, we modified the data transformation stage of the SBB algorithm to use min-max normalisation, with the aim of identifying significant biclusters in diabetic nephropathy and retinopathy microarray data after genes were selected using relative and absolute deviations. In validation experiments comparing silhouette index values, SBB using partitioning around medoids (PAM) clustered genes and samples better than SBB using K-means or agglomerative hierarchical clustering (AHC) with Ward's linkage. Furthermore, the proposed technique identified a meaningful non-overlapping bicluster on a real dataset. Using gene ontology (GO) enrichment analysis with the Bonferroni correction, we identified biological evidence in each bicluster that is significant in terms of gene functions and biological processes.
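The first two phases and the silhouette-based comparison described above can be illustrated with a minimal sketch. This is not the authors' implementation: the column-wise scaling direction, the similarity measure 1 / (1 + Euclidean distance), the toy expression values and all function names are assumptions made for illustration only.

```python
from math import sqrt


def min_max_normalise(matrix):
    """Rescale each column (condition) of a genes x conditions matrix to [0, 1].

    Assumption: scaling is applied per column; the paper may scale differently.
    """
    scaled_columns = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column is mapped to 0.0 to avoid dividing by zero.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_matrix(vectors):
    """Pairwise similarity 1 / (1 + distance); an assumed, illustrative measure."""
    return [[1.0 / (1.0 + euclidean(u, v)) for v in vectors] for u in vectors]


def silhouette_index(vectors, labels):
    """Mean silhouette width of a labelling (singleton clusters score 0)."""
    scores = []
    for i, point in enumerate(vectors):
        # Collect distances from this point to every other point, per cluster.
        by_cluster = {}
        for j, other in enumerate(vectors):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(euclidean(point, other))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())  # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: 4 genes (rows) x 3 conditions (columns).
expression = [
    [1.0, 10.0, 100.0],
    [2.0, 12.0, 110.0],
    [9.0, 30.0, 400.0],
    [10.0, 32.0, 420.0],
]
normalised = min_max_normalise(expression)
gene_similarity = similarity_matrix(normalised)
condition_similarity = similarity_matrix([list(c) for c in zip(*normalised)])

# Compare two candidate gene labellings by their silhouette index.
good = silhouette_index(normalised, [0, 0, 1, 1])
poor = silhouette_index(normalised, [0, 1, 0, 1])
print(good > poor)
```

In this sketch the silhouette index plays the same role as in the comparison above: whichever clustering method (PAM, K-means or AHC) yields the labelling with the higher mean silhouette width would be preferred. The clustering step itself is omitted here and the labellings are supplied by hand.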
Original language | English |
---|---|
Pages (from-to) | 343-362 |
Number of pages | 20 |
Journal | International Journal of Bioinformatics Research and Applications |
Volume | 17 |
Issue number | 4 |
Publication status | Published - 2021 |
Keywords
- Agglomerative hierarchical clustering
- Biclustering
- Diabetic nephropathy
- Diabetic retinopathy
- Gene expression
- K-means
- Microarray data
- PAM
- Partitioning around medoids
- SBB
- Similarity-based biclustering