TY - JOUR
T1 - IMPROVING MODEL PERFORMANCE FOR PREDICTING EXFILTRATION ATTACKS THROUGH RESAMPLING STRATEGIES
AU - HAKIM, ARIF RAHMAN
AU - RAMLI, KALAMULLAH
AU - SALMAN, MUHAMMAD
AU - AGUSTINA, ESTI RAHMAWATI
N1 - Publisher Copyright: © 2025, International Islamic University Malaysia. All rights reserved.
PY - 2025
Y1 - 2025
AB - Addressing class imbalance is critical in cybersecurity applications, particularly in exfiltration detection, where skewed datasets lead to biased predictions and poor generalization for minority classes. This study investigates five Synthetic Minority Oversampling Technique (SMOTE) variants (BorderlineSMOTE, KMeansSMOTE, SMOTEENC, SMOTEENN, and SMOTETomek) to mitigate severe imbalance in our customized tactic-labeled dataset, which is characterized by a dominant majority class and weak class separability. We use seven imbalance metrics to assess each SMOTE variant's impact on class distribution stability and separability. Furthermore, we evaluate model performance across five classifiers: Logistic Regression, Naïve Bayes, Support Vector Machine, Random Forest, and XGBoost. Findings reveal that SMOTEENN consistently improves the performance metrics (accuracy, precision, recall, F1-score, and geometric mean), reaching an average of 99% across most classifiers, which establishes it as the most adaptable variant for handling imbalance. This study provides a comprehensive framework for selecting resampling strategies to enhance classification efficacy in cybersecurity tasks with imbalanced data.
KW - Exfiltration
KW - Imbalance Data
KW - Machine Learning
KW - SMOTE
UR - https://www.scopus.com/pages/publications/85216395552
U2 - 10.31436/IIUMEJ.V26I1.3547
DO - 10.31436/IIUMEJ.V26I1.3547
M3 - Article
AN - SCOPUS:85216395552
SN - 1511-788X
VL - 26
SP - 420
EP - 436
JO - IIUM Engineering Journal
JF - IIUM Engineering Journal
IS - 1
ER -