TY - GEN
T1 - Cross-language automatic plagiarism detector using latent semantic analysis and self-organizing map
AU - Ratna, Anak Agung Putri
AU - Nabhastala, Paskalis Nandana Yestha
AU - Ibrahim, Ihsan
AU - Ekadiyanto, F. Astha
AU - Salman, Muhammad
AU - Purnamasari, Prima Dewi
AU - Herusaktiawan, Muhammad Yusuf Irfan
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/11/23
Y1 - 2018/11/23
AB - Computer-assisted (automatic) plagiarism detection can help humans check whether the author of a paper has committed plagiarism. The Department of Electrical Engineering, Universitas Indonesia, has been developing a cross-language automatic plagiarism detector in which the test paper is written in Indonesian and the reference paper is written in English. A more accurate automatic detection system is needed to prevent plagiarism, especially in academic papers. The system is based on the Latent Semantic Analysis (LSA) algorithm, with a Self-Organizing Map (SOM) added to classify the output of LSA. Two of the SOM features, the Frobenius norm and cosine similarity, are extracted from the singular value matrix produced by LSA. Together with the percentage of technical terms, these features are used as input to the SOM, which classifies the data into 10, 5, 3, or 2 classes. In LSA, using 5 classes gives equal accuracy for all classes, with the highest accuracy reaching 83.09%. In LSA-SOM, the best accuracy is 83.53% on training data and 80.47% on testing data, obtained with the 2-class configuration and three features: percentage of technical terms, Frobenius norm, and pad.
KW - Automatic plagiarism detection
KW - Cross language
KW - Latent semantic analysis
KW - Self-organizing map
KW - Singular value decomposition
UR - http://www.scopus.com/inward/record.url?scp=85062796568&partnerID=8YFLogxK
U2 - 10.1145/3293663.3293681
DO - 10.1145/3293663.3293681
M3 - Conference contribution
AN - SCOPUS:85062796568
T3 - ACM International Conference Proceeding Series
SP - 83
EP - 87
BT - AIVR 2018 - 2018 International Conference on Artificial Intelligence and Virtual Reality
PB - Association for Computing Machinery
T2 - 2018 International Conference on Artificial Intelligence and Virtual Reality, AIVR 2018
Y2 - 23 November 2018 through 25 November 2018
ER -