In 2016, type 2 diabetes mellitus (T2DM) was one of the leading causes of death. Inhibitors of dipeptidyl peptidase-4 (DPP-IV) have been added to the T2DM treatment algorithm, but the DPP-IV inhibitors currently on the market still have adverse side effects. In this research, we apply the K-means clustering algorithm to a set of DPP-IV bioactive molecules. We first determine the best number of clusters using several cluster validation approaches, and then validate the selection of the most representative molecules against Lipinski’s Rule of Five; the selected molecules are potential candidates for QSAR modelling in T2DM drug discovery. The evaluation of the best number of clusters is repeated with five different seeds for the pseudo-random number generator. The resulting clusters group molecules that are homogeneous with respect to their molecular descriptors. The cluster analysis yields three clusters, and the resulting data set of 100 DPP-IV bioactive molecules is a potential basis for designing new drug candidates for the treatment of T2DM.
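The workflow summarised above (choose the number of clusters by validating K-means runs over several seeds, then screen molecules with Lipinski’s Rule of Five) can be illustrated with a minimal, standard-library-only sketch. It is not the implementation used in this study. The abstract does not name the validation indices, so the mean silhouette coefficient is used here as one assumed example. The helper names (`kmeans`, `silhouette`, `best_k`, `passes_lipinski`, `make_blobs`), the k-means++-style seeding, the synthetic two-dimensional data standing in for molecular descriptors, and the strict no-violation reading of the Rule of Five are all assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, seed, iters=100):
    """Lloyd's K-means with k-means++-style seeding; returns one label per point."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Sample the next center with probability proportional to squared distance.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means better-separated clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_values, seeds):
    """Average the silhouette over all seeds and return the best-scoring k."""
    mean_score = {
        k: sum(silhouette(points, kmeans(points, k, s)) for s in seeds) / len(seeds)
        for k in k_values
    }
    return max(mean_score, key=mean_score.get), mean_score


def passes_lipinski(mol_weight, logp, h_donors, h_acceptors):
    """Rule of Five, strict form (assumption: no violations allowed)."""
    return (mol_weight <= 500 and logp <= 5
            and h_donors <= 5 and h_acceptors <= 10)


def make_blobs(seed=0, per_cluster=20):
    """Synthetic 2-D stand-in for molecular descriptors: three separated blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centers for _ in range(per_cluster)]


points = make_blobs(seed=0)
k_star, scores = best_k(points, range(2, 7), seeds=[0, 1, 2, 3, 4])
print("selected number of clusters:", k_star)
print("hypothetical molecule passes Rule of Five:",
      passes_lipinski(mol_weight=350.0, logp=2.5, h_donors=2, h_acceptors=6))
```

Because the synthetic data are built from three well-separated blobs, the mean silhouette should favour k = 3 here. With real descriptor data the scores are usually closer together, which is one reason to average over several seeds and to compare more than one validation index before fixing the number of clusters.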