
A cross-dataset harmonized intrusion detection framework with statistically validated multi-model learning

Authors
  • Shailendra Mishra
  • Naif S Alshammari
  • Hashim Hussain
  • Ruba Ahmed Alfahidah

Abstract

Intrusion Detection Systems (IDS) are critical tools for securing network infrastructure. However, recent machine learning-based IDS studies are often constrained by heavy dependence on a single dataset, limited reproducibility, and a lack of transparency in how performance is evaluated. To address these challenges, this work proposes a unified and transparent IDS evaluation framework that integrates feature harmonization, multi-model benchmarking, and statistical validation. To this end, a preprocessing pipeline is designed to harmonize the features of a legacy and a contemporary network intrusion dataset, NSL-KDD and CICIDS2017, respectively. The framework assesses supervised, unsupervised, deep learning, and ensemble-based models using cross-validation and statistical tests, namely the Wilcoxon signed-rank, McNemar’s, and DeLong tests. Experimental results show that the Random Forest model performs best, achieving 98.0% accuracy and a 97.0% F1-score on the harmonized dataset. An ablation analysis further identifies feature harmonization as the most important factor in improving performance. In addition, a novel cryptographic logging mechanism based on SHA-256 hash chaining is proposed to provide tamper-evident traceability and reproducibility of experimental results, although it is less effective than a blockchain-based approach. The framework is effective in its application, but it relies on manual feature alignment and may therefore be less effective on highly heterogeneous datasets. This work provides a unified, reproducible, and statistically grounded framework for evaluating IDSs, with a focus on generalization and transparency in cybersecurity research.

Suggested Citation

  • Shailendra Mishra & Naif S Alshammari & Hashim Hussain & Ruba Ahmed Alfahidah, 2026. "A cross-dataset harmonized intrusion detection framework with statistically validated multi-model learning," PLOS ONE, Public Library of Science, vol. 21(4), pages 1-27, April.
  • Handle: RePEc:plo:pone00:0346982
    DOI: 10.1371/journal.pone.0346982

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0346982
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0346982&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0346982?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0346982. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone. General contact details of provider: https://journals.plos.org/plosone/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.