Author
Listed:
- Mahbub E Sobhani
- Anika Tasnim Rodela
- Dewan Md Farid
Abstract
Imbalanced intrusion classification is a complex and challenging task as there are few number of instances/intrusions generally considered as minority instances/intrusions in the imbalanced intrusion datasets. Data sampling methods such as over-sampling and under-sampling methods are commonly applied for dealing with imbalanced intrusion data. In over-sampling, synthetic minority instances are generated e.g. SMOTE (Synthetic Minority Over-sampling Technique) and on the contrary, under-sampling methods remove the majority-class instances to create balanced data e.g. random under-sampling. Both over-sampling and under-sampling methods have the disadvantages as over-sampling technique creates overfitting and under-sampling technique ignores a large portion of the data. Ensemble learning in supervised machine learning is also a common technique for handling imbalanced data. Random Forest and Bagging techniques address the overfitting problem, and Boosting (AdaBoost) gives more attention to the minority-class instances in its iterations. In this paper, we have proposed a method for selecting the most informative instances that represent the overall dataset. We have applied both over-sampling and under-sampling techniques to balance the data by employing the majority and minority informative instances. We have used Random Forest, Bagging, and Boosting (AdaBoost) algorithms and have compared their performances. We have used decision tree (C4.5) as the base classifier of Random Forest and AdaBoost classifiers and naïve Bayes classifier as the base classifier of the Bagging model. The proposed method Adaptive TreeHive addresses both the issues of imbalanced ratio and high dimensionality, resulting in reduced computational power and execution time requirements. We have evaluated the proposed Adaptive TreeHive method using five large-scale public benchmark datasets. The experimental results, compared to data balancing methods such as under-sampling and over-sampling, exhibit superior performance of the Adaptive TreeHive with accuracy rates of 99.96%, 85.65%, 99.83%, 99.77%, and 95.54% on the NSL-KDD, UNSW-NB15, CIC-IDS2017, CSE-CIC-IDS2018, and CICDDoS2019 datasets, respectively, establishing the Adaptive TreeHive as a superior performer compared to the traditional ensemble classifiers.
Suggested Citation
Mahbub E Sobhani & Anika Tasnim Rodela & Dewan Md Farid, 2025.
"Adaptive TreeHive: Ensemble of trees for enhancing imbalanced intrusion classification,"
PLOS ONE, Public Library of Science, vol. 20(9), pages 1-38, September.
Handle:
RePEc:plo:pone00:0331307
DOI: 10.1371/journal.pone.0331307
Download full text from publisher
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0331307. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.