Sign language is a method of communication used by people with hearing impairments. Several countries have developed their own sign languages, including Indonesia's Sistem Isyarat Bahasa Indonesia (SIBI). SIBI is distinctive in that it applies Indonesian grammatical structure to its hand motions and uses facial expressions in place of intonation, and lip motions to pronounce each word. Previous studies indicate that improving the accuracy of sign language recognition requires supplementing hand tracking with lip reading. This study proposes a model that recognizes inflectional word gestures from lip motions. With the proposed model, this research found that sequences of lip motions are very similar to one another, which resulted in very low accuracy when determining prefixes (21.64%), root words (23.87%), and suffixes (22.61%) in SIBI from lip-motion sequences alone. Using the similarity groupings determined in this study, the accuracy of the proposed model increases significantly, to 73.94% for prefixes, 83.98% for root words, and 82.10% for suffixes. Another cause of the low accuracy is that labeling was done according to the hand gestures, so the captured frame-by-frame sequences were not representative of the complete lip motions.
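The effect of similarity groupings on accuracy can be illustrated with a minimal sketch. This is not the study's implementation. The group assignments, labels, and predictions below are hypothetical and chosen only to show how scoring at the group level can raise accuracy when a model confuses classes whose lip motions look alike.

```python
from typing import Dict, List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of positions where the prediction equals the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def grouped_accuracy(
    y_true: List[str], y_pred: List[str], group_of: Dict[str, str]
) -> float:
    """Accuracy after mapping every label to its similarity group.

    Labels missing from group_of are treated as their own group.
    """
    true_groups = [group_of.get(t, t) for t in y_true]
    pred_groups = [group_of.get(p, p) for p in y_pred]
    return accuracy(true_groups, pred_groups)


# Hypothetical grouping of prefixes with similar lip motions
# (assumed for illustration, not taken from the study).
group_of = {"ber": "G1", "per": "G1", "me": "G1", "di": "G2", "ter": "G2"}

# Hypothetical labels and model predictions.
y_true = ["ber", "per", "di", "me", "ter"]
y_pred = ["per", "per", "ter", "di", "ter"]

exact = accuracy(y_true, y_pred)                      # 2 of 5 correct
grouped = grouped_accuracy(y_true, y_pred, group_of)  # 4 of 5 correct
print(f"exact: {exact:.2f}, grouped: {grouped:.2f}")
```

In this toy data, three of the five predictions name the wrong prefix, but two of those errors stay inside the correct similarity group, so the grouped score is higher than the exact-match score.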