TY - GEN
T1 - Handwriting recognition on form document using convolutional neural network and support vector machines (CNN-SVM)
AU - Darmatasia,
AU - Fanany, Mohamad Ivan
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/10/18
Y1 - 2017/10/18
N2 - In this paper, we propose a workflow and a machine learning model for recognizing handwritten characters on form document. The learning model is based on Convolutional Neural Network (CNN) as a powerful feature extraction and Support Vector Machines (SVM) as a high-end classifier. The proposed method is more efficient than modifying the CNN with complex architecture. We evaluated some SVM and found that the linear SVM using L1 loss function and L2 regularization giving the best performance both of the accuracy rate and the computation time. Based on the experiment results using data from NIST SD 192nd edition both for training and testing, the proposed method which combines CNN and linear SVM using L1 loss function and L2 regularization achieved a recognition rate better than only CNN. The recognition rate achieved by the proposed method are 98.85% on numeral characters, 93.05% on uppercase characters, 86.21% on lowercase characters, and 91.37% on the merger of numeral and uppercase characters. While the original CNN achieves an accuracy rate of 98.30% on numeral characters, 92.33% on uppercase characters, 83.54% on lowercase characters, and 88.32% on the merger of numeral and uppercase characters. The proposed method was also validated by using ten folds cross-validation, and it shows that the proposed method still can improve the accuracy rate. The learning model was used to construct a handwriting recognition system to recognize a more challenging data on form document automatically. The pre-processing, segmentation and character recognition are integrated into one system. The output of the system is converted into an editable text. The system gives an accuracy rate of 83.37% on ten different test form document.
AB - In this paper, we propose a workflow and a machine learning model for recognizing handwritten characters on form document. The learning model is based on Convolutional Neural Network (CNN) as a powerful feature extraction and Support Vector Machines (SVM) as a high-end classifier. The proposed method is more efficient than modifying the CNN with complex architecture. We evaluated some SVM and found that the linear SVM using L1 loss function and L2 regularization giving the best performance both of the accuracy rate and the computation time. Based on the experiment results using data from NIST SD 192nd edition both for training and testing, the proposed method which combines CNN and linear SVM using L1 loss function and L2 regularization achieved a recognition rate better than only CNN. The recognition rate achieved by the proposed method are 98.85% on numeral characters, 93.05% on uppercase characters, 86.21% on lowercase characters, and 91.37% on the merger of numeral and uppercase characters. While the original CNN achieves an accuracy rate of 98.30% on numeral characters, 92.33% on uppercase characters, 83.54% on lowercase characters, and 88.32% on the merger of numeral and uppercase characters. The proposed method was also validated by using ten folds cross-validation, and it shows that the proposed method still can improve the accuracy rate. The learning model was used to construct a handwriting recognition system to recognize a more challenging data on form document automatically. The pre-processing, segmentation and character recognition are integrated into one system. The output of the system is converted into an editable text. The system gives an accuracy rate of 83.37% on ten different test form document.
KW - CNN
KW - SVM
KW - form document
KW - handwriting recognition
UR - http://www.scopus.com/inward/record.url?scp=85039957894&partnerID=8YFLogxK
U2 - 10.1109/ICoICT.2017.8074699
DO - 10.1109/ICoICT.2017.8074699
M3 - Conference contribution
AN - SCOPUS:85039957894
T3 - 2017 5th International Conference on Information and Communication Technology, ICoIC7 2017
BT - 2017 5th International Conference on Information and Communication Technology, ICoIC7 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th International Conference on Information and Communication Technology, ICoIC7 2017
Y2 - 17 May 2017 through 19 May 2017
ER -