Analysis on the effect of term-document's matrix to the accuracy of latent-semantic-analysis-based crosslanguage plagiarism detection

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents the results of an experimental investigation into the impact of term-document matrix variations on the accuracy of cross-language LSA-based plagiarism detection. The experiment focused on comparing Indonesian and English papers. Increasing the document definition size used as the source for matrix construction had a significant negative impact on detection accuracy in all scenarios. The results showed that the document definition size must be kept below 10 to maintain high accuracy, and that performance was worst at a size of 25. Additionally, building the term-document matrix from word occurrence frequencies was found to improve detection accuracy compared with the binary implementation, which records only the presence or absence of words.
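
As a rough illustration of the two matrix weightings compared in the abstract, the following Python sketch builds a frequency-based and a binary term-document matrix from the same documents. The function name `term_document_matrix`, the whitespace tokenization, and the sample documents are assumptions made for illustration and are not taken from the paper. The paper's own preprocessing, its "document definition size" parameter, and the subsequent LSA step are not shown.

```python
from collections import Counter


def term_document_matrix(documents, binary=False):
    """Return (vocabulary, matrix); matrix[i][j] describes term i in document j.

    Hypothetical helper: with binary=False each cell holds the number of
    times the term occurs in the document; with binary=True it holds 1 if
    the term occurs at all and 0 otherwise.
    """
    # Assumption: simple lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]

    matrix = []
    for term in vocabulary:
        row = []
        for doc_counts in counts:
            freq = doc_counts[term]  # Counter returns 0 for missing terms
            row.append(min(freq, 1) if binary else freq)
        matrix.append(row)
    return vocabulary, matrix


# Made-up sample documents, for illustration only.
docs = ["plagiarism detection detection", "semantic analysis"]
vocab, freq_matrix = term_document_matrix(docs)
_, bin_matrix = term_document_matrix(docs, binary=True)
```

In this toy input the word "detection" appears twice in the first document, so its row differs between the two matrices. This is the kind of information the binary variant discards.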

Original language: English
Title of host publication: Proceedings of 2016 5th International Conference on Network, Communication and Computing, ICNCC 2016
Publisher: Association for Computing Machinery
Pages: 78-82
Number of pages: 5
ISBN (Electronic): 9781450347938
DOIs
Publication status: Published - 17 Dec 2016
Event: 5th International Conference on Network, Communication and Computing, ICNCC 2016 - Kyoto, Japan
Duration: 17 Dec 2016 to 21 Dec 2016

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 5th International Conference on Network, Communication and Computing, ICNCC 2016
Country: Japan
City: Kyoto
Period: 17/12/16 to 21/12/16

Keywords

  • Cross-Language Plagiarism Detection
  • Latent Semantic Analysis
  • Learning Vector Quantization
  • Term-Document Matrix
