Printed from https://ideas.repec.org/a/gam/jftint/v17y2025i4p182-d1639084.html

Cybersecurity Intelligence Through Textual Data Analysis: A Framework Using Machine Learning and Terrorism Datasets

Authors

  • Mohammed Salem Atoum

    (Department of Computer Science, The University of Jordan, Amman 11942, Jordan)

  • Ala Abdulsalam Alarood

    (College of Computer Science and Engineering, University of Jeddah, Jeddah 21959, Saudi Arabia)

  • Eesa Alsolami

    (College of Computer Science and Engineering, University of Jeddah, Jeddah 21959, Saudi Arabia)

  • Adamu Abubakar

    (Department of Computer Science, International Islamic University Malaysia, Kuala Lumpur 53100, Malaysia)

  • Ahmad K. Al Hwaitat

    (Department of Computer Science, The University of Jordan, Amman 11942, Jordan)

  • Izzat Alsmadi

    (Department of Computing, Engineering and Mathematical Sciences, Texas A&M University, San Antonio, TX 78224, USA;
    Department of Computer Information Systems, The University of Jordan, Aqaba 77110, Jordan)

Abstract

This study examines multi-lexical data sources, combining a dataset extracted from an open-source corpus with the Global Terrorism Datasets (GTDs), to predict lexical patterns that are directly linked to terrorism. This matters because specific patterns within a textual context can facilitate the identification of terrorism-related content. The research methodology focuses on generating a corpus from various published works and extracting texts pertinent to “terrorism”. We then extract additional lexical contexts from the GTDs that directly relate to terrorism. Integrating these multi-lexical data sources yields lexical patterns linked to terrorism. Machine learning models were trained on the resulting dataset. We conducted two primary experiments and analyzed the results. On the data obtained from open sources, the Extra Trees model achieved the highest accuracy at 94.31%, whereas the XGBoost model demonstrated superior overall performance after tuning, with a higher recall (81.32%) and F1-score (83.06%), indicating a better balance between sensitivity and precision. Similarly, on the GTD data, XGBoost consistently outperformed other models in recall and F1-score, making it a more suitable candidate for tasks where minimizing false negatives is critical. This implies that specific co-occurrences and contexts can be established within the terrorism dataset from multiple lexical data sources, enabling the effective identification of multi-lexical patterns such as “Suicide Attack/Casualty”, “Civilians/Victims”, and “Hostage Taking/Abduction” across various applications or contexts. This will facilitate the development of a framework for understanding the lexical patterns associated with terrorism.
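
The abstract's point that one model can lead on accuracy while another leads on recall and F1-score can be illustrated with a small worked example. The sketch below is not the authors' code, and the confusion-matrix counts are hypothetical values chosen only to show the effect on an imbalanced dataset; they are not results from the paper. The class name `Confusion` and the variables `model_a` and `model_b` are likewise assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts (positive = terrorism-related text)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in terms of counts.
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


# Hypothetical counts: 1000 documents, of which 100 are positive.
# model_a is conservative (few false alarms, many misses);
# model_b flags more documents (more false alarms, fewer misses).
model_a = Confusion(tp=50, fp=5, fn=50, tn=895)
model_b = Confusion(tp=80, fp=40, fn=20, tn=860)

for name, m in (("model_a", model_a), ("model_b", model_b)):
    print(
        f"{name}: accuracy={m.accuracy():.3f} precision={m.precision():.3f} "
        f"recall={m.recall():.3f} f1={m.f1():.3f}"
    )
```

With these illustrative counts, `model_a` has the slightly higher accuracy because the negative class dominates, while `model_b` misses far fewer positive documents and therefore has the higher recall and F1-score. This is the kind of trade-off the abstract refers to when it prefers the model that minimizes false negatives.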

Suggested Citation

  • Mohammed Salem Atoum & Ala Abdulsalam Alarood & Eesa Alsolami & Adamu Abubakar & Ahmad K. Al Hwaitat & Izzat Alsmadi, 2025. "Cybersecurity Intelligence Through Textual Data Analysis: A Framework Using Machine Learning and Terrorism Datasets," Future Internet, MDPI, vol. 17(4), pages 1-31, April.
  • Handle: RePEc:gam:jftint:v:17:y:2025:i:4:p:182-:d:1639084

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/17/4/182/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/17/4/182/
    Download Restriction: no

    References listed on IDEAS

    1. Song, Yu & Song, Yanqiu & Chang, Shiwei & He, Lele, 2024. "The role of gold in terrorism: Risk aversion or financing source?," Resources Policy, Elsevier, vol. 95(C).
    2. Xueli Hu & Fujun Lai & Gufan Chen & Rongcheng Zou & Qingxiang Feng, 2019. "Quantitative Research on Global Terrorist Attacks and Terrorist Attack Classification," Sustainability, MDPI, vol. 11(5), pages 1-16, March.
    3. Yang Liu & Tianxing Yang & Liwei Tian & Bincheng Huang & Jiaming Yang & Zihan Zeng, 2024. "Ada-XG-CatBoost: A Combined Forecasting Model for Gross Ecosystem Product (GEP) Prediction," Sustainability, MDPI, vol. 16(16), pages 1-19, August.
    4. Amjad Rehman Khan & Tanzila Saba & Tariq Sadad & Seng-phil Hong & Daqing Gong, 2022. "Cloud-Based Framework for COVID-19 Detection through Feature Fusion with Bootstrap Aggregated Extreme Learning Machine," Discrete Dynamics in Nature and Society, Hindawi, vol. 2022, pages 1-7, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhongbei Li & Xiangchun Li & Chen Dong & Fanfan Guo & Fan Zhang & Qi Zhang, 2021. "Quantitative Analysis of Global Terrorist Attacks Based on the Global Terrorism Database," Sustainability, MDPI, vol. 13(14), pages 1-19, July.
    2. repec:ers:journl:v:xxiv:y:2021:i:special4:p:18-39 is not listed on IDEAS
    3. Muhammad Athar Nadeem & Zhiying Liu & Haji Suleman Ali & Amna Younis & Muhammad Bilal & Yi Xu, 2020. "Innovation and Sustainable Development: Does Aid and Political Instability Impede Innovation?," SAGE Open, vol. 10(4), pages 21582440209, November.
    4. Vishnu Kumar Kaliappan & Sundharamurthy Gnanamurthy & Abid Yahya & Ravi Samikannu & Muhammad Babar & Basit Qureshi & Anis Koubaa, 2023. "Machine Learning Based Healthcare Service Dissemination Using Social Internet of Things and Cloud Architecture in Smart Cities," Sustainability, MDPI, vol. 15(6), pages 1-15, March.
