Predicting Data Exfiltration Using Supervised Machine Learning Based on Tactics Mapping From Threat Reports and Event Logs

Arif Rahman Hakim, Kalamullah Ramli, Muhammad Salman, Bernardi Pranggono, Esti Rahmawati Agustina

Research output: Contribution to journal › Article › peer-review

Abstract

Data breach attacks are unique, especially when attackers exfiltrate data from their target's systems. Furthermore, as data breaches continue to increase in frequency and severity, they pose a growing risk to society and organizations. Unfortunately, no prior research has focused on predicting the occurrence of exfiltration from a sequence of tactics identified in low-level logs. In addition, integrating low-level logs with a high-level conceptual framework remains a critical challenge. There is an urgent need to automate the mapping process and to give defenders advanced methods for analyzing whether exfiltration is occurring within their systems. In this paper, we focus on developing a machine learning (ML) model to predict the occurrence of data exfiltration by analyzing the sequence of tactics employed by an attacker. We make two main contributions: bridging the gap between low-level logs and the high-level conceptual steps of a data breach, and integrating collected event logs with ML models to predict exfiltration tactics. Our dataset for the ML model is created from tactics identified in threat reports, cleaned to obtain ten features, and balanced using the SMOTE+ENN technique. Tactics identified from low-level logs serve as input to the ML model, which determines whether the events lead to exfiltration. We benchmarked three resampling methods, five feature selection techniques, and five ML algorithms to achieve optimal model performance. A new dataset, the comprehensive set of techniques used to develop the ML model, and the proposed prediction method are the key contributions of this study relative to existing research. In addition, to demonstrate the effectiveness of the proposed method, we present case studies using event logs from real-world incidents. The investigation shows that the proposed method predicts the occurrence of exfiltration with higher accuracy than existing studies.
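
The mapping step described in the abstract, from low-level log events to tactics and then to a fixed set of features for a classifier, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the ten tactic names are borrowed from the MITRE ATT&CK enterprise matrix, the event names and the `EVENT_TO_TACTIC` rule table are hypothetical, and the 0/1 encoding is only one possible choice (it records which tactics were seen, not their order). The paper's actual ten features and mapping method may differ.

```python
from typing import Dict, List

# Ten tactic names borrowed from MITRE ATT&CK. Using exactly these ten as
# features is an assumption for illustration, not the paper's feature set.
TACTICS: List[str] = [
    "initial-access", "execution", "persistence", "privilege-escalation",
    "defense-evasion", "credential-access", "discovery", "lateral-movement",
    "collection", "command-and-control",
]

# Hypothetical rule table: log event type -> tactic. A real deployment would
# derive this from detection rules or analyst knowledge.
EVENT_TO_TACTIC: Dict[str, str] = {
    "phishing_attachment_opened": "initial-access",
    "powershell_script_run": "execution",
    "new_scheduled_task": "persistence",
    "lsass_memory_read": "credential-access",
    "network_share_enumeration": "discovery",
    "archive_created_in_temp": "collection",
    "beacon_to_external_host": "command-and-control",
}


def map_events_to_tactics(events: List[str]) -> List[str]:
    """Return the observed tactics in first-seen order, skipping unmapped events."""
    seen: List[str] = []
    for event in events:
        tactic = EVENT_TO_TACTIC.get(event)
        if tactic is not None and tactic not in seen:
            seen.append(tactic)
    return seen


def tactics_to_features(tactics: List[str]) -> List[int]:
    """Encode observed tactics as a 0/1 vector aligned with TACTICS."""
    observed = set(tactics)
    return [1 if tactic in observed else 0 for tactic in TACTICS]


if __name__ == "__main__":
    # Hypothetical event stream for one incident.
    events = [
        "phishing_attachment_opened",
        "powershell_script_run",
        "unrecognised_event",
        "network_share_enumeration",
        "archive_created_in_temp",
    ]
    sequence = map_events_to_tactics(events)
    print(sequence)
    print(tactics_to_features(sequence))
```

A feature vector of this kind would then be passed to a trained classifier that outputs whether the incident is likely to end in exfiltration; the training, resampling, and feature selection steps are not shown here.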

Original language: English
Pages (from-to): 28381-28397
Number of pages: 17
Journal: IEEE Access
Volume: 13
Publication status: Accepted/In press - 2024

Keywords

  • data exfiltration
  • event logs
  • machine learning
  • tactics mapping
  • threat reports
