TY - JOUR
T1 - Three layer hybrid learning to improve intrusion detection system performance
AU - Harwahyu, Ruki
AU - Ndolu, Fajar Henri Erasmus
AU - Overbeek, Marlinda Vasty
N1 - Publisher Copyright: © 2024 Institute of Advanced Engineering and Science. All rights reserved.
PY - 2024
Y1 - 2024
N2 - In imbalanced network traffic, malicious cyberattacks can be hidden in a large amount of normal traffic, making it difficult for intrusion detection systems (IDS) to detect them. Anomaly-based IDS with machine learning offers a solution. However, a single machine learning algorithm cannot accurately detect all types of attacks. Therefore, a hybrid model that combines long short-term memory (LSTM) and random forest (RF) in three layers is proposed. Building the hybrid model starts with Nearmiss-2 class balancing, which reduces the number of normal samples without increasing the number of minority samples. Then, feature selection is performed using chi-square and RF. Next, hyperparameter tuning is performed to obtain the optimal model. In the first and second layers, LSTM and RF are used for binary classification to separate normal data from attack data. The third layer uses RF for multiclass classification. The hybrid model, verified using the CSE-CIC-IDS2018 dataset, showed better performance than a single algorithm. For multiclass classification, the hybrid model achieved 99.76% accuracy, 99.76% precision, 99.76% recall, and a 99.75% F1-score.
AB - In imbalanced network traffic, malicious cyberattacks can be hidden in a large amount of normal traffic, making it difficult for intrusion detection systems (IDS) to detect them. Anomaly-based IDS with machine learning offers a solution. However, a single machine learning algorithm cannot accurately detect all types of attacks. Therefore, a hybrid model that combines long short-term memory (LSTM) and random forest (RF) in three layers is proposed. Building the hybrid model starts with Nearmiss-2 class balancing, which reduces the number of normal samples without increasing the number of minority samples. Then, feature selection is performed using chi-square and RF. Next, hyperparameter tuning is performed to obtain the optimal model. In the first and second layers, LSTM and RF are used for binary classification to separate normal data from attack data. The third layer uses RF for multiclass classification. The hybrid model, verified using the CSE-CIC-IDS2018 dataset, showed better performance than a single algorithm. For multiclass classification, the hybrid model achieved 99.76% accuracy, 99.76% precision, 99.76% recall, and a 99.75% F1-score.
KW - CSE-CIC-IDS2018
KW - Hybrid learning
KW - Intrusion detection system
KW - Long short-term memory
KW - Random forest
UR - http://www.scopus.com/inward/record.url?scp=85185792299&partnerID=8YFLogxK
U2 - 10.11591/ijece.v14i2.pp1691-1699
DO - 10.11591/ijece.v14i2.pp1691-1699
M3 - Article
AN - SCOPUS:85185792299
SN - 2088-8708
VL - 14
SP - 1691
EP - 1699
JO - International Journal of Electrical and Computer Engineering
JF - International Journal of Electrical and Computer Engineering
IS - 2
ER -