TY - JOUR
T1 - Conditional sliding windows
T2 - An approach for handling data limitation in colorectal histopathology image classification
AU - Haryanto, Toto
AU - Suhartanto, Heru
AU - Arymurthy, Aniati Murni
AU - Kusmardi
N1 - Funding Information:
This research was supported by the grant scheme Penelitian Terapan Unggulan Perguruan Tinggi (PTUPT) 2020 [grant number 8/AMD/E1/KP.PTNBH/2020, with contract numbers 8/AMD/E1/KP.PTNBH/2020 and 332/PKS/R/UI/2020, dated May 11, 2020], Ministry of Research and Technology, Republic of Indonesia.
Publisher Copyright:
© 2021 The Authors
PY - 2021/1
Y1 - 2021/1
N2 - Training a convolutional neural network (CNN) requires large amounts of data, because small datasets with low variation cause over-fitting and the resulting model cannot predict new data with high accuracy. In addition, histopathological medical data are hard to obtain because they cannot be accessed without ethical permission. This study therefore proposes a conditional sliding window algorithm that generates sub-sample images from histopathology images. Two original datasets were used: the Warwick dataset, with images of 775 × 522 pixels, and a dataset from the Department of Pathology and Anatomy, Faculty of Medicine, Universitas Indonesia. The algorithm is inspired by the conventional sliding window method but adds conditions: the window starts from the left at (x, y) pixel coordinates and moves from left to right, then from top to bottom, until the entire image is covered. New images were produced in two sizes, 200 × 200 and 300 × 300 pixels, and overlaps of 25 and 50 pixels were used to avoid loss of information. A CNN architecture, CNN 7-5-7, was designed and proposed for the classification task. The conditional sliding window algorithm produces a varying number of sub-samples depending on the image and window size. A CNN trained on the generated images predicted benign and malignant tissues more accurately than the model trained on the original dataset. Moreover, the sensitivity values on both the Warwick public dataset and the dataset generated in this study are above 0.80, which shows that the proposed CNN architecture is more stable than existing methods such as AlexNet and DenseNet121. This study thus addressed the limited availability of colorectal histopathological training data by developing a conditional sliding window algorithm.
This algorithm can be applied to generate other histopathological data. Moreover, the proposed CNN 7-5-7 is the fastest architecture to train while remaining comparable to state-of-the-art methods. Furthermore, the dataset was used to develop a model for colorectal cancer identification, which was integrated into a web-based application for further implementation.
KW - Augmentation
KW - Conditional sliding windows
KW - Convolutional neural network
KW - Histopathology
UR - http://www.scopus.com/inward/record.url?scp=85104075953&partnerID=8YFLogxK
U2 - 10.1016/j.imu.2021.100565
DO - 10.1016/j.imu.2021.100565
M3 - Article
AN - SCOPUS:85104075953
SN - 2352-9148
VL - 23
JO - Informatics in Medicine Unlocked
JF - Informatics in Medicine Unlocked
M1 - 100565
ER -
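
The abstract describes the window traversal only in outline, so the following is a minimal sketch of an overlapping sliding-window patch extractor, not the authors' implementation. The function names `window_starts` and `sliding_windows` are hypothetical. The record does not spell out the "conditions"; the border rule used here (shifting the last window back so it ends flush with the image edge) is an assumption chosen so that the entire image is covered. The 775 × 522 size is read as width × height, which is also an assumption.

```python
import numpy as np


def window_starts(length, window, overlap):
    """Start offsets along one axis for windows of size `window`.

    Consecutive windows share `overlap` pixels. If the regular stride leaves
    an uncovered strip at the end, one extra window is placed flush with the
    border (assumed border rule, not taken from the paper).
    """
    if window > length:
        raise ValueError("window is larger than the image axis")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    stride = window - overlap
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] + window < length:
        starts.append(length - window)
    return starts


def sliding_windows(image, window, overlap):
    """Return square patches scanned left to right, then top to bottom."""
    height, width = image.shape[:2]
    patches = []
    for y in window_starts(height, window, overlap):
        for x in window_starts(width, window, overlap):
            patches.append(image[y:y + window, x:x + window])
    return patches


# Placeholder image with the Warwick dimensions quoted in the abstract.
image = np.zeros((522, 775, 3), dtype=np.uint8)
patches_200 = sliding_windows(image, window=200, overlap=25)
patches_300 = sliding_windows(image, window=300, overlap=50)
print(len(patches_200), len(patches_300))
```

Under these assumptions, the number of patches per image depends only on the image size, window size, and overlap, which matches the abstract's remark that the algorithm produces a varying number of sub-samples.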