Authors
- Sikha S. Bagui
(Department of Computer Science, The University of West Florida, Pensacola, FL 32514, USA)
- Mohammad Pale Khan
(Department of Computer Science, The University of West Florida, Pensacola, FL 32514, USA)
- Chedlyne Valmyr
(Department of Computer Science, The University of West Florida, Pensacola, FL 32514, USA)
- Subhash C. Bagui
(Department of Mathematics and Statistics, The University of West Florida, Pensacola, FL 32514, USA)
- Dustin Mink
(Department of Cybersecurity, The University of West Florida, Pensacola, FL 32514, USA)
Abstract
This paper presents a comprehensive model for detecting and addressing concept drift in network security data using the Isolation Forest algorithm. The approach leverages Isolation Forest’s inherent ability to efficiently isolate anomalies in high-dimensional data, making it suitable for adapting to shifting data distributions in dynamic environments. Anomalies in network attack data may not occur in large numbers, so it is important to be able to detect them even with small batch sizes. The novelty of this work lies in successfully detecting anomalies even with small batch sizes and identifying the point at which incremental retraining needs to start. Triggering retraining early also keeps the model in sync with the latest data, reducing the chance that attacks succeed. Our methodology implements an end-to-end workflow that continuously monitors incoming data, detects distribution changes using Isolation Forest, and then manages model retraining using Random Forest to maintain optimal performance. We evaluate our approach using UWF-ZeekDataFall22, a newly created dataset of Zeek’s Connection Logs collected through the Security Onion 2 network security monitor and labeled using the MITRE ATT&CK framework. Both incremental and full retraining are analyzed using Random Forest. The model’s performance increased steadily with incremental retraining, and full model retraining also had a positive impact on performance.
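The monitor, detect, and retrain loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. It assumes scikit-learn, uses synthetic Gaussian data in place of UWF-ZeekDataFall22, applies a hypothetical 20% anomaly-share threshold as the drift trigger, and treats adding trees with `warm_start` as one possible reading of "incremental retraining".

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier


def drift_detected(detector, batch, threshold=0.2):
    """Flag drift when the share of rows scored as anomalies exceeds `threshold`.

    The 0.2 default is a hypothetical value chosen for this sketch.
    """
    flags = detector.predict(batch)  # +1 = inlier, -1 = anomaly
    return float(np.mean(flags == -1)) > threshold


def make_batch(rng, mean, n_rows):
    """Synthetic stand-in for a batch of connection-log features and labels."""
    X = rng.normal(mean, 1.0, size=(n_rows, 4))
    # Median split guarantees both classes appear in every batch.
    y = (X[:, 1] > np.median(X[:, 1])).astype(int)
    return X, y


rng = np.random.default_rng(0)
X_ref, y_ref = make_batch(rng, mean=0.0, n_rows=500)

# Detector models the reference distribution; classifier is the deployed model.
detector = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
detector.fit(X_ref)
clf = RandomForestClassifier(n_estimators=50, warm_start=True, random_state=0)
clf.fit(X_ref, y_ref)

# Two small incoming batches: one from the reference distribution, one shifted.
X_same, y_same = make_batch(rng, mean=0.0, n_rows=50)
X_shift, y_shift = make_batch(rng, mean=6.0, n_rows=50)

same_drift = drift_detected(detector, X_same)
shift_drift = drift_detected(detector, X_shift)

if shift_drift:
    # Incremental retraining (assumed variant): keep the existing trees and
    # grow 25 additional trees on the drifted batch only.
    clf.n_estimators += 25
    clf.fit(X_shift, y_shift)
    # Re-anchor the detector on the new distribution.
    detector.fit(X_shift)
```

Full retraining would instead fit a fresh `RandomForestClassifier` on the accumulated data. With `warm_start`, every batch passed to `fit` should contain the same set of class labels, which is why the synthetic labels use a median split.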
Suggested Citation
Sikha S. Bagui & Mohammad Pale Khan & Chedlyne Valmyr & Subhash C. Bagui & Dustin Mink, 2025.
"Model Retraining upon Concept Drift Detection in Network Traffic Big Data,"
Future Internet, MDPI, vol. 17(8), pages 1-23, July.
Handle:
RePEc:gam:jftint:v:17:y:2025:i:8:p:328-:d:1709399