Author
Listed:
- Shailendra Mishra
- Naif S Alshammari
- Hashim Hussain
- Ruba Ahmed Alfahidah
Abstract
Intrusion Detection Systems (IDS) are considered critical security tools in ensuring network infrastructure security. However, recent studies on machine learning-based IDS systems are often constrained by their heavy dependence on a single dataset, lack of reproducibility, and lack of transparency in evaluating their performance. In addressing these challenges, a unified and transparent framework for evaluating IDS systems is proposed, which focuses on integrating feature harmonization, multi-model benchmarking, and statistical validation. In achieving this objective, a preprocessing pipeline is designed to harmonize features of both legacy and contemporary network intrusion datasets, i.e., NSL-KDD and CICIDS2017, respectively. This framework will assess various learning models, including supervised, unsupervised, deep learning, and ensemble-based models, through cross-validation and statistical tests such as Wilcoxon signed-rank, McNemar’s, and DeLong tests. Experimental results demonstrate that the Random Forest model performs best in terms of performance metrics, i.e., 98.0% accuracy and 97.0% F1-score on the harmonized data set. Moreover, feature harmonization is found to be the most important factor in improving performance using ablation analysis. Besides, a novel approach of using a cryptographic logging mechanism using SHA-256 hash chaining is proposed for tamper-evident traceability and reproducibility of results in experiments, though it is not as effective as using a blockchain-based approach. Although effective in its application, it is based on manual feature alignment and hence might not be effective in highly heterogeneous data sets.This work provides a unified, reproducible, and statistically grounded framework for evaluating IDS systems, focusing on generalization and transparency in cybersecurity research.
Suggested Citation
Shailendra Mishra & Naif S Alshammari & Hashim Hussain & Ruba Ahmed Alfahidah, 2026.
"A cross-dataset harmonized intrusion detection framework with statistically validated multi-model learning,"
PLOS ONE, Public Library of Science, vol. 21(4), pages 1-27, April.
Handle:
RePEc:plo:pone00:0346982
DOI: 10.1371/journal.pone.0346982
Download full text from publisher
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0346982. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.