TY - JOUR
T1 - Predicting Data Exfiltration using Supervised Machine Learning based on Tactics Mapping from Threat Reports and Event Logs
T2 - Predicting Data Exfiltration Using Supervised Machine Learning Based on Tactics Mapping From Threat Reports and Event Logs
AU - Hakim, Arif Rahman
AU - Ramli, Kalamullah
AU - Salman, Muhammad
AU - Pranggono, Bernardi
AU - Agustina, Esti Rahmawati
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2024
Y1 - 2024
N2 - Data breach attacks are unique, especially when attackers exfiltrate data from their target's systems. Furthermore, as data breaches continue to increase in frequency and severity, they pose a growing risk to society and organizations. Unfortunately, no prior research focused on predicting exfiltration occurrence based on a sequence of tactics identified from low-level logs. In addition, integrating low-level logs with a high-level conceptual framework presents a critical challenge. The need for automation in the mapping process and advanced methods to assist defenders in analyzing the occurrence of exfiltration within their systems is urgent. In this paper, we focus on developing a machine learning (ML) model to predict the occurrence of data exfiltration by analyzing the sequence of tactics employed by an attacker. We propose two main contributions, including addressing the gap level between low-level logs and high-level data breach conceptual steps and integrating collected event logs and ML models to predict exfiltration tactics. Our dataset for the ML model is created based on tactics identified in threat reports, cleaned to obtain ten features, and balanced using the SMOTE+ENN technique. The prediction is made using tactics identified from low-level logs that serve as input to the ML model to determine whether the events lead to the occurrence of exfiltration. We benchmarked three resampling methods, five feature selection techniques, and five ML algorithms to achieve optimal ML model performance. A new dataset, comprehensive techniques used to develop the ML model, and the proposed prediction method represent the key contributions of this study compared to existing research. In addition, to demonstrate the effectiveness of our proposed method, we present case studies using event logs from real-world incidents. The investigation shows that our proposed method effectively predicts the occurrence of exfiltration with higher accuracy than existing studies.
AB - Data breach attacks are unique, especially when attackers exfiltrate data from their target's systems. Furthermore, as data breaches continue to increase in frequency and severity, they pose a growing risk to society and organizations. Unfortunately, no prior research focused on predicting exfiltration occurrence based on a sequence of tactics identified from low-level logs. In addition, integrating low-level logs with a high-level conceptual framework presents a critical challenge. The need for automation in the mapping process and advanced methods to assist defenders in analyzing the occurrence of exfiltration within their systems is urgent. In this paper, we focus on developing a machine learning (ML) model to predict the occurrence of data exfiltration by analyzing the sequence of tactics employed by an attacker. We propose two main contributions, including addressing the gap level between low-level logs and high-level data breach conceptual steps and integrating collected event logs and ML models to predict exfiltration tactics. Our dataset for the ML model is created based on tactics identified in threat reports, cleaned to obtain ten features, and balanced using the SMOTE+ENN technique. The prediction is made using tactics identified from low-level logs that serve as input to the ML model to determine whether the events lead to the occurrence of exfiltration. We benchmarked three resampling methods, five feature selection techniques, and five ML algorithms to achieve optimal ML model performance. A new dataset, comprehensive techniques used to develop the ML model, and the proposed prediction method represent the key contributions of this study compared to existing research. In addition, to demonstrate the effectiveness of our proposed method, we present case studies using event logs from real-world incidents. The investigation shows that our proposed method effectively predicts the occurrence of exfiltration with higher accuracy than existing studies.
KW - data exfiltration
KW - event logs
KW - machine learning
KW - tactics mapping
KW - threats report
UR - http://www.scopus.com/inward/record.url?scp=85214293540&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3524502
DO - 10.1109/ACCESS.2024.3524502
M3 - Article
AN - SCOPUS:85214293540
SN - 2169-3536
VL - 13
SP - 28381
EP - 28397
JO - IEEE Access
JF - IEEE Access
ER -