Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i5p626-d1342396.html

KDTM: Multi-Stage Knowledge Distillation Transfer Model for Long-Tailed DGA Detection

Authors

  • Baoyu Fan (Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China)
  • Han Ma (Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China)
  • Yue Liu (Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China)
  • Xiaochen Yuan (Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China)
  • Wei Ke (Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China)

Abstract

The Domain Generation Algorithm (DGA), the attack strategy most commonly used by botnets, is both highly stealthy and highly variable. Using deep learning models to detect the different families of DGA domain names can strengthen network defenses against attackers. However, sample sizes are extremely imbalanced across DGA categories, which leads to low classification accuracy for small-sample categories and even outright classification failure for some of them. To address this issue, we introduce the long-tailed concept and augment the data of the small-sample categories by transferring pre-trained knowledge. First, we propose the Data Balanced Review Method (DBRM) to reduce the differences in sample size between categories, producing a relatively balanced dataset for transfer learning. Second, we propose the Knowledge Transfer Model (KTM), which uses multi-stage transfer to pass weights from the large-sample categories to the small-sample categories and so enhances the knowledge available for the latter. Third, we propose the Knowledge Distillation Transfer Model (KDTM), which adds a knowledge distillation loss to the KTM to mitigate the catastrophic forgetting caused by transfer learning. The experimental results show that KDTM significantly improves the classification performance of all categories, especially the small-sample categories. It achieves a state-of-the-art macro-average F1 score of 84.5%. The robustness of KDTM is verified on three DGA datasets that follow Pareto distributions.
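
The abstract mentions a knowledge distillation loss added on top of the transfer model but does not give its form. As a rough illustration only, the sketch below shows the generic temperature-scaled distillation loss commonly used in the literature: a weighted sum of the hard-label cross-entropy and the KL divergence between softened teacher and student outputs. It is not taken from the paper; the function names and the `temperature` and `alpha` parameters are assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Hard-label cross-entropy for a single sample."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss (hypothetical helper, not the paper's exact loss).

    alpha weights the hard-label term; (1 - alpha) weights the soft term,
    which is scaled by temperature**2 as is conventional.
    """
    hard = cross_entropy(student_logits, label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Example: a student that drifts from the teacher on a 3-class problem.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.2, -0.5]
loss = distillation_loss(student, teacher, label=0)
```

In a transfer setting, the soft term penalizes the fine-tuned model for moving away from the outputs of the earlier-stage model, which is the usual intuition for how distillation can reduce catastrophic forgetting.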

Suggested Citation

  • Baoyu Fan & Han Ma & Yue Liu & Xiaochen Yuan & Wei Ke, 2024. "KDTM: Multi-Stage Knowledge Distillation Transfer Model for Long-Tailed DGA Detection," Mathematics, MDPI, vol. 12(5), pages 1-19, February.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:5:p:626-:d:1342396

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/5/626/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/5/626/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xianfu Wang & Xinmin Yang, 2015. "On the Existence of Minimizers of Proximity Functions for Split Feasibility Problems," Journal of Optimization Theory and Applications, Springer, vol. 166(3), pages 861-888, September.
    2. Emanuel Laude & Peter Ochs & Daniel Cremers, 2020. "Bregman Proximal Mappings and Bregman–Moreau Envelopes Under Relative Prox-Regularity," Journal of Optimization Theory and Applications, Springer, vol. 184(3), pages 724-761, March.
    3. Md. Tarek Hasan & Md. Al Emran Hossain & Md. Saddam Hossain Mukta & Arifa Akter & Mohiuddin Ahmed & Salekul Islam, 2023. "A Review on Deep-Learning-Based Cyberbullying Detection," Future Internet, MDPI, vol. 15(5), pages 1-47, May.
    4. Hédy Attouch & Jérôme Bolte & Patrick Redont & Antoine Soubeyran, 2010. "Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality," Mathematics of Operations Research, INFORMS, vol. 35(2), pages 438-457, May.
    5. Jabir, Brahim & Moutaouakil, Khalid El & Falih, Noureddine, 2023. "Developing an Efficient System with Mask R-CNN for Agricultural Applications," AGRIS on-line Papers in Economics and Informatics, Czech University of Life Sciences Prague, Faculty of Economics and Management, vol. 15(1), January.
    6. Regina S. Burachik & Minh N. Dao & Scott B. Lindstrom, 2021. "Generalized Bregman Envelopes and Proximity Operators," Journal of Optimization Theory and Applications, Springer, vol. 190(3), pages 744-778, September.
    7. Xianbin Wang & Yuqi Zhao & Weifeng Li, 2023. "Recognition of Commercial Vehicle Driving Cycles Based on Multilayer Perceptron Model," Sustainability, MDPI, vol. 15(3), pages 1-21, February.
    8. Peng Zhang & Huize Ren & Xiaobin Dong & Xuechao Wang & Mengxue Liu & Ying Zhang & Yufang Zhang & Jiuming Huang & Shuheng Dong & Ruiming Xiao, 2023. "Understanding and Applications of Tensors in Ecosystem Services: A Case Study of the Manas River Basin," Land, MDPI, vol. 12(2), pages 1-23, February.
    9. Yen-Huan Li & Volkan Cevher, 2019. "Convergence of the Exponentiated Gradient Method with Armijo Line Search," Journal of Optimization Theory and Applications, Springer, vol. 181(2), pages 588-607, May.
    10. Wenma Jin & Yair Censor & Ming Jiang, 2016. "Bounded perturbation resilience of projected scaled gradient methods," Computational Optimization and Applications, Springer, vol. 63(2), pages 365-392, March.
    11. Charles L. Byrne, 2013. "Alternating Minimization as Sequential Unconstrained Minimization: A Survey," Journal of Optimization Theory and Applications, Springer, vol. 156(3), pages 554-566, March.
    12. Souza, Sissy da S. & Oliveira, P.R. & da Cruz Neto, J.X. & Soubeyran, A., 2010. "A proximal method with separable Bregman distances for quasiconvex minimization over the nonnegative orthant," European Journal of Operational Research, Elsevier, vol. 201(2), pages 365-376, March.
    13. Vishakha Sood & Reet Kamal Tiwari & Sartajvir Singh & Ravneet Kaur & Bikash Ranjan Parida, 2022. "Glacier Boundary Mapping Using Deep Learning Classification over Bara Shigri Glacier in Western Himalayas," Sustainability, MDPI, vol. 14(20), pages 1-13, October.
