
Interpretable Machine Learning Models for Malicious Domains Detection Using Explainable Artificial Intelligence (XAI)

Authors

Listed:
  • Nida Aslam

    (SAUDI ARAMCO Cybersecurity Chair, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia)

  • Irfan Ullah Khan

    (Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia)

  • Samiha Mirza

    (Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia)

  • Alanoud AlOwayed

    (Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia)

  • Fatima M. Anis

    (Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia)

  • Reef M. Aljuaid

    (Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia)

  • Reham Baageel

    (Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia)

Abstract

With the expansion of the internet, a major threat has emerged involving the spread of malicious domains, which attackers use to perform illegal activities: targeting governments, violating the privacy of organizations, and even manipulating everyday users. Detecting these harmful domains is therefore necessary to combat growing network attacks. Machine Learning (ML) models have shown significant results in the detection of malicious domains. However, the “black box” nature of complex ML models obstructs their wide acceptance in some fields. The emergence of Explainable Artificial Intelligence (XAI) has made it possible to incorporate interpretability and explainability into complex models; in particular, post hoc XAI methods enable interpretability without affecting model performance. This study proposes an XAI model to detect malicious domains on a recent dataset containing 45,000 samples of malicious and non-malicious domains. Initially, several interpretable ML models, such as Decision Tree (DT) and Naïve Bayes (NB), and black-box ensemble models, such as Random Forest (RF), Extreme Gradient Boosting (XGB), AdaBoost (AB), and CatBoost (CB), were implemented; XGB outperformed the other classifiers. The post hoc XAI methods SHAP (SHapley Additive exPlanations), for global explanations, and LIME (Local Interpretable Model-agnostic Explanations), for local explanations, were then used to explain the XGB predictions. Two sets of experiments were performed: the model was first trained on the preprocessed dataset and then on features selected with the Sequential Forward Feature Selection algorithm. The results demonstrate that the ML algorithms were able to distinguish benign and malicious domains with overall accuracy ranging from 0.8479 to 0.9856. The ensemble classifier XGB achieved the highest result, with an AUC of 0.9991 and an accuracy of 0.9856 before feature selection, and an AUC of 0.999 and an accuracy of 0.9818 after feature selection. The proposed model outperformed the benchmark study.
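The pipeline the abstract describes (train several classifiers, keep the best-performing XGB model, explain it globally with SHAP and locally with LIME, then repeat with Sequential Forward Feature Selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: `load_domain_features`, the hyperparameters, and `k_features` are hypothetical placeholders.

```python
# Minimal sketch of the pipeline in the abstract, not the authors' code.
# Assumptions: the dataset loads as a pandas DataFrame X of numeric domain
# features with binary labels y (1 = malicious, 0 = benign);
# load_domain_features, all hyperparameters, and k_features are placeholders.
import numpy as np
import shap
import xgboost as xgb
from lime.lime_tabular import LimeTabularExplainer
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_domain_features()  # hypothetical loader for the 45,000-sample dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Black-box ensemble classifier; XGB was the best performer in the study.
model = xgb.XGBClassifier(n_estimators=300, eval_metric="logloss")
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]
print("Accuracy:", accuracy_score(y_test, proba > 0.5))
print("AUC:", roc_auc_score(y_test, proba))

# Global explanation with SHAP (TreeExplainer supports XGBoost natively).
shap_values = shap.TreeExplainer(model).shap_values(X_test)
shap.summary_plot(shap_values, X_test)

# Local explanation of a single test instance with LIME.
lime_explainer = LimeTabularExplainer(
    np.asarray(X_train),
    feature_names=list(X.columns),
    class_names=["benign", "malicious"],
    mode="classification",
)
print(lime_explainer.explain_instance(
    np.asarray(X_test)[0], model.predict_proba, num_features=10
).as_list())

# Second experiment: Sequential Forward Feature Selection (here via mlxtend),
# after which the model would be retrained on the selected subset.
sfs = SFS(model, k_features=10, forward=True, floating=False,
          scoring="roc_auc", cv=5).fit(X_train, y_train)
print("Selected features:", sfs.k_feature_names_)
```

Evaluating both AUC and accuracy before and after feature selection mirrors the paper's two reported experiments.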

Suggested Citation

  • Nida Aslam & Irfan Ullah Khan & Samiha Mirza & Alanoud AlOwayed & Fatima M. Anis & Reef M. Aljuaid & Reham Baageel, 2022. "Interpretable Machine Learning Models for Malicious Domains Detection Using Explainable Artificial Intelligence (XAI)," Sustainability, MDPI, vol. 14(12), pages 1-22, June.
  • Handle: RePEc:gam:jsusta:v:14:y:2022:i:12:p:7375-:d:840446

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/14/12/7375/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/14/12/7375/
    Download Restriction: no

    References listed on IDEAS

    1. Basim Mahbooba & Mohan Timilsina & Radhya Sahal & Martin Serrano & Ahmed Mostafa Khalil, 2021. "Explainable Artificial Intelligence (XAI) to Enhance Trust Management in Intrusion Detection Systems Using Decision Tree Model," Complexity, Hindawi, vol. 2021, pages 1-11, January.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Jonggu Jeong, 2022. "Introduction of the First AI Impact Assessment and Future Tasks: South Korea Discussion," Laws, MDPI, vol. 11(5), pages 1-11, September.
    2. Hung Viet Nguyen & Haewon Byeon, 2022. "Explainable Deep-Learning-Based Depression Modeling of Elderly Community after COVID-19 Pandemic," Mathematics, MDPI, vol. 10(23), pages 1-10, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Frank Cremer & Barry Sheehan & Michael Fortmann & Arash N. Kia & Martin Mullins & Finbarr Murphy & Stefan Materne, 2022. "Cyber risk and cybersecurity: a systematic review of data availability," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 47(3), pages 698-736, July.
    2. Jérôme Darmont & Boris Novikov & Robert Wrembel & Ladjel Bellatreche, 2022. "Advances on Data Management and Information Systems," Information Systems Frontiers, Springer, vol. 24(1), pages 1-10, February.
    3. Thi-Minh-Trang Huynh & Chuen-Fa Ni & Yu-Sheng Su & Vo-Chau-Ngan Nguyen & I-Hsien Lee & Chi-Ping Lin & Hoang-Hiep Nguyen, 2022. "Predicting Heavy Metal Concentrations in Shallow Aquifer Systems Based on Low-Cost Physiochemical Parameters Using Machine Learning Techniques," IJERPH, MDPI, vol. 19(19), pages 1-21, September.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jsusta:v:14:y:2022:i:12:p:7375-:d:840446. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager. General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.