TY - JOUR
T1 - Authorship Obfuscation System Development based on Long Short-term Memory Algorithm
AU - Maulana, Hendrik
AU - Sari, Riri Fitri
N1 - Funding Information:
This work is supported by Universitas Indonesia under the Q1Q2 Grant Number NKB-0321/UN2.R3.1/HKP.05.00/2019.
Publisher Copyright:
© 2022. International Journal of Technology. All Rights Reserved.
PY - 2022
Y1 - 2022
N2 - Stylometry is an authorship analysis technique that uses statistics. Through stylometry, the author of a document can be identified with high accuracy, which poses a threat to the author's privacy. Authorship obfuscation, the elimination of authorship identity, is a stylometry method that can protect the privacy of writers. This study applies an authorship obfuscation method to the Federalist Papers corpus. The Federalist Papers are a well-known corpus that has been studied extensively, especially with authorship identification methods, because the corpus contains 12 disputed texts. One identification method uses the support vector machine (SVM) algorithm, which identifies the author of the disputed texts with 86% accuracy. The authorship obfuscation method changes the writing style of a text while maintaining its meaning. Long short-term memory (LSTM) is a deep learning-based algorithm that can predict words well. Using a model built with the LSTM algorithm, the writing style of the disputed documents in the Federalist Papers can be changed. As a result, 4 of the 12 disputed documents were changed from one author identity to another. The similarity level of the changed documents ranges from 40% to 57%, which indicates that the meaning of the original documents is preserved. Our experimental results indicate that the proposed method can eliminate authorship identity well.
KW - Authorship
KW - Long short-term memory (LSTM)
KW - Obfuscation
UR - http://www.scopus.com/inward/record.url?scp=85128997114&partnerID=8YFLogxK
U2 - 10.14716/ijtech.v13i2.4257
DO - 10.14716/ijtech.v13i2.4257
M3 - Article
AN - SCOPUS:85128997114
SN - 2086-9614
VL - 13
SP - 345
EP - 355
JO - International Journal of Technology
JF - International Journal of Technology
IS - 2
ER -