Printed from https://ideas.repec.org/a/gam/jftint/v17y2025i4p182-d1639084.html

Cybersecurity Intelligence Through Textual Data Analysis: A Framework Using Machine Learning and Terrorism Datasets

Authors
  • Mohammed Salem Atoum

    (Department of Computer Science, The University of Jordan, Amman 11942, Jordan)

  • Ala Abdulsalam Alarood

    (College of Computer Science and Engineering, University of Jeddah, Jeddah 21959, Saudi Arabia)

  • Eesa Alsolami

    (College of Computer Science and Engineering, University of Jeddah, Jeddah 21959, Saudi Arabia)

  • Adamu Abubakar

    (Department of Computer Science, International Islamic University Malaysia, Kuala Lumpur 53100, Malaysia)

  • Ahmad K. Al Hwaitat

    (Department of Computer Science, The University of Jordan, Amman 11942, Jordan)

  • Izzat Alsmadi

    (Department of Computing, Engineering and Mathematical Sciences, Texas A&M University, San Antonio, TX 78224, USA
    Department of Computer Information Systems, The University of Jordan, Aqaba 77110, Jordan)

Abstract

This study examines multi-lexical data sources, using a dataset extracted from an open-source corpus and the Global Terrorism Datasets (GTDs), to predict lexical patterns directly linked to terrorism. This matters because specific patterns within a textual context can help identify terrorism-related content. The methodology first generates a corpus from various published works and extracts texts pertinent to “terrorism”, then extracts additional lexical contexts from the GTDs that relate directly to terrorism. Integrating these multi-lexical data sources yields lexical patterns linked to terrorism, and the resulting dataset was used to train machine learning models. We conducted two primary experiments and analyzed the results. On the open-source data, the Extra Trees model achieved the highest accuracy (94.31%), but the tuned XGBoost model showed better overall performance, with higher recall (81.32%) and F1-score (83.06%), indicating a better balance between sensitivity and precision. Similarly, on the GTD dataset, XGBoost consistently outperformed the other models in recall and F1-score, making it the more suitable candidate for tasks where minimizing false negatives is critical. These results suggest that co-occurrence and context drawn from multiple lexical data sources can effectively identify specific multi-lexical patterns, such as “Suicide Attack/Casualty”, “Civilians/Victims”, and “Hostage Taking/Abduction”, across various applications and contexts. This supports the development of a framework for understanding the lexical patterns associated with terrorism.
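The contrast between Extra Trees (highest accuracy) and XGBoost (higher recall and F1-score) follows from how these metrics are defined: on imbalanced data, a model can score higher accuracy while missing more positive cases. The sketch below illustrates this in Python, assuming a binary task where "positive" means terrorism-related text. The confusion counts are hypothetical toy numbers, not results from the paper, and the names `Confusion` and `metrics` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (hypothetical; 'positive' = terrorism-related text)."""
    tp: int  # terrorism-related texts correctly flagged
    fp: int  # benign texts wrongly flagged
    fn: int  # terrorism-related texts missed (false negatives)
    tn: int  # benign texts correctly left unflagged


def metrics(c: Confusion) -> dict[str, float]:
    """Standard accuracy, precision, recall, and F1; zero-division cases return 0.0."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    # F1 is the harmonic mean of precision and recall: 2PR / (P + R).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy setup: 1,000 texts, of which 100 are terrorism-related.
    model_a = Confusion(tp=60, fp=5, fn=40, tn=895)   # conservative: few false alarms, many misses
    model_b = Confusion(tp=85, fp=35, fn=15, tn=865)  # sensitive: more false alarms, fewer misses
    for name, counts in (("model_a", model_a), ("model_b", model_b)):
        scores = metrics(counts)
        print(name, {k: round(v, 4) for k, v in scores.items()})
```

In this toy case, model_a has slightly higher accuracy (0.955 vs. 0.95) yet misses 40 of the 100 positives, while model_b trades extra false positives for fewer false negatives and ends up with higher recall and F1. This is the kind of trade-off the abstract points to when it favors XGBoost for tasks where minimizing false negatives is critical.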

Suggested Citation

  • Mohammed Salem Atoum & Ala Abdulsalam Alarood & Eesa Alsolami & Adamu Abubakar & Ahmad K. Al Hwaitat & Izzat Alsmadi, 2025. "Cybersecurity Intelligence Through Textual Data Analysis: A Framework Using Machine Learning and Terrorism Datasets," Future Internet, MDPI, vol. 17(4), pages 1-31, April.
  • Handle: RePEc:gam:jftint:v:17:y:2025:i:4:p:182-:d:1639084

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/17/4/182/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/17/4/182/
    Download Restriction: no


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.