STG2P: A two-stage pipeline model for intrusion detection based on improved LightGBM and K-means

Document Type


Publication Title

Simulation Modelling Practice and Theory


Network attack behavior is always mixed with a large number of normal communications, so attack characteristics account for only a very small fraction of the log data. From the perspective of simulation and modeling, the data for attack detection is extremely imbalanced if we regard the attack behavior as the positive label. Network intrusion detection is an important topic in identifying attack behavior, but detection methods based on simulation and modeling, such as traditional machine learning, face challenges of poor effectiveness and efficiency. Supervised models, such as LightGBM, can classify abnormal data effectively because of their fast training speed and high efficiency. However, they perform poorly on sparse negative data, such as network intrusion data. On the other hand, unsupervised models, such as K-means, can achieve good performance, but at an undesirable training time cost, and it is difficult to select appropriate parameters for network intrusion data. In this paper, we propose a two-stage pipeline model named STG2P, which leverages an improved LightGBM and a reinforced K-means. Specifically, STG2P introduces a threshold for LightGBM in the coarse classification stage, and pipelines the draft results to K-means, which filters out false positive samples in the fine classification stage. By adaptively combining the pipelined data of the improved LightGBM and K-means, the method can avoid the shortcomings of both models. We also conduct extensive simulations on the LANL dataset, and the results show that the AUC value can be improved by as much as 29.48%. The detection rate of our method can reach 96.64%, which is superior to some traditional detection methods. © 2022 Elsevier B.V.
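
The two-stage idea in the abstract (a thresholded coarse classifier, then clustering to filter false positives) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the stage-one attack scores are already available (for example, probabilities from a trained LightGBM model), uses a plain Lloyd's K-means with two clusters rather than the reinforced variant the paper describes, and assumes that the cluster whose centroid lies closer to a reference point for normal traffic holds the false positives. All function names, the threshold, and the toy data are hypothetical.

```python
from math import dist


def coarse_filter(scores, threshold):
    """Stage 1: keep indices whose attack score meets the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's K-means starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def two_stage_detect(features, scores, threshold, normal_reference):
    """Return indices flagged as attacks after both stages."""
    candidates = coarse_filter(scores, threshold)
    if len(candidates) < 2:
        return candidates  # too few candidates to cluster
    points = [features[i] for i in candidates]
    # Assumption: seed the two clusters with the candidates nearest to and
    # farthest from the normal-traffic reference point.
    seeds = [
        min(points, key=lambda p: dist(p, normal_reference)),
        max(points, key=lambda p: dist(p, normal_reference)),
    ]
    labels, centroids = kmeans(points, seeds)
    # Assumption: the cluster closest to normal traffic is false positives.
    normal_cluster = min(
        range(len(centroids)), key=lambda k: dist(centroids[k], normal_reference)
    )
    return [i for i, label in zip(candidates, labels) if label != normal_cluster]


# Hypothetical toy data: two features per event, plus stage-one scores.
features = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.85, 0.95), (0.3, 0.2)]
scores = [0.05, 0.40, 0.35, 0.90, 0.70, 0.10]
flagged = two_stage_detect(features, scores, threshold=0.3, normal_reference=(0.2, 0.2))
print(flagged)
```

With a deliberately low threshold of 0.3, stage one passes events 1 to 4 as candidates; stage two then discards events 1 and 2, which cluster near the normal-traffic reference, and keeps events 3 and 4.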



Publication Date



Keywords

Improved LightGBM, Intrusion detection, Pipeline model, Reinforced K-means, Efficiency, Pipelines, Reinforcement, Attack behavior, Detection methods, Improved lightgbm, Intrusion-Detection, K-means, Network intrusions, Performance, Pipeline models, Reinforced K-mean, Simulation and modeling


IR Deposit conditions:

OA version (pathway b): Accepted version

24 months embargo

Licence: CC BY-NC-ND

Must link to publisher version with DOI