Latent Semantic Analysis Based Cross Language Plagiarism Detection System with Support Vector Machine Classifier

Research output: Contribution to journal › Article › peer-review

Abstract

The Department of Electrical Engineering, Universitas Indonesia, has developed a cross-language plagiarism detection system based on Latent Semantic Analysis (LSA) for Indonesian and English papers. The system generates the Frobenius norm, slice, and pad as its output data. This paper describes and analyzes a further development of the plagiarism detection system, namely the application of the Support Vector Machine (SVM) algorithm. The SVM divides the output data into two classes, "plagiarism" and "not plagiarism", using two methods: a combination of input data and output data, and the AND method. Several modifications to the program input have been made, including varying the learning parameters and the output data of the program. Measured by the balance of precision and relevance of the program, the accuracy of the SVM is 63.15%. However, measured as the percentage of data that is correctly classified, the accuracy of the SVM is 97.04%.
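
The gap between the two reported figures is typical of a two-class problem in which one class is rare. The sketch below illustrates the effect with invented toy labels; it does not use the paper's data or code. It assumes that the "balance of precision and relevance" behaves like an F1-style score (the harmonic mean of precision and recall), which the abstract does not confirm. The function names `confusion_counts`, `accuracy`, and `f1_score` are hypothetical.

```python
# Illustrative sketch only: toy labels, not data from the paper.
# Assumption: the "balance of precision and relevance" is treated here as
# an F1-style score (harmonic mean of precision and recall).

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = "plagiarism", 0 = "not plagiarism".
# Only 5 of 100 document pairs are actual plagiarism cases.
y_true = [1] * 5 + [0] * 95
y_pred = [1] * 3 + [0] * 2 + [1] * 2 + [0] * 93

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"accuracy: {accuracy(tp, fp, fn, tn):.2%}")
print(f"F1 score: {f1_score(tp, fp, fn):.2%}")
```

In this toy case the many correctly rejected "not plagiarism" pairs keep the share of correct classifications high, while the few missed and falsely flagged "plagiarism" pairs pull the precision-based score down sharply.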
Original language: English
Pages (from-to): 13-26
Journal: International Journal of Software Engineering and its Applications
Volume: 11
Publication status: Published - 31 May 2017
