Authors
- Mohammed Salem Atoum
(Department of Computer Science, The University of Jordan, Amman 11942, Jordan)
- Ala Abdulsalam Alarood
(College of Computer Science and Engineering, University of Jeddah, Jeddah 21959, Saudi Arabia)
- Eesa Alsolami
(College of Computer Science and Engineering, University of Jeddah, Jeddah 21959, Saudi Arabia)
- Adamu Abubakar
(Department of Computer Science, International Islamic University Malaysia, Kuala Lumpur 53100, Malaysia)
- Ahmad K. Al Hwaitat
(Department of Computer Science, The University of Jordan, Amman 11942, Jordan)
- Izzat Alsmadi
(Department of Computing, Engineering and Mathematical Sciences, Texas A&M University, San Antonio, TX 78224, USA; Department of Computer Information Systems, The University of Jordan, Aqaba 77110, Jordan)
Abstract
This study examines multi-lexical data sources, using a dataset extracted from an open-source corpus together with the Global Terrorism Datasets (GTDs), to predict lexical patterns that are directly linked to terrorism. Such patterns matter because specific patterns within a textual context can help identify terrorism-related content. The methodology first builds a corpus from various published works and extracts the texts pertinent to “terrorism”; it then extracts additional lexical contexts from the GTDs that relate directly to terrorism. Integrating these multi-lexical data sources yields lexical patterns linked to terrorism. Machine learning models were trained on the data. We conducted two primary experiments and analyzed the results. On the open-source data, the Extra Trees model achieved the highest accuracy (94.31%), but the XGBoost model demonstrated superior overall performance, with higher recall (81.32%) and F1-score (83.06%) after tuning, indicating a better balance between sensitivity and precision. Similarly, on the GTD dataset, XGBoost consistently outperformed the other models in recall and F1-score, making it the more suitable candidate for tasks where minimizing false negatives is critical. These results suggest that co-occurrence and context drawn from multiple lexical data sources can be used to identify multi-lexical patterns such as “Suicide Attack/Casualty”, “Civilians/Victims”, and “Hostage Taking/Abduction” across various applications or contexts. This will facilitate the development of a framework for understanding the lexical patterns associated with terrorism.
Suggested Citation
Mohammed Salem Atoum & Ala Abdulsalam Alarood & Eesa Alsolami & Adamu Abubakar & Ahmad K. Al Hwaitat & Izzat Alsmadi, 2025.
"Cybersecurity Intelligence Through Textual Data Analysis: A Framework Using Machine Learning and Terrorism Datasets,"
Future Internet, MDPI, vol. 17(4), pages 1-31, April.
Handle:
RePEc:gam:jftint:v:17:y:2025:i:4:p:182-:d:1639084