
CybAttT: A Dataset of Cyberattack News Tweets for Enhanced Threat Intelligence

Authors
  • Huda Lughbi

    (College of Computing, Umm-Alqura University, Mecca 24382, Saudi Arabia)

  • Mourad Mars

    (College of Computing, Umm-Alqura University, Mecca 24382, Saudi Arabia
    Higher Institute of Computer Sciences and Mathematics, Monastir University, Monastir 5000, Tunisia)

  • Khaled Almotairi

    (College of Computing, Umm-Alqura University, Mecca 24382, Saudi Arabia)

Abstract

The continuous development of information technologies has led to a significant rise in security concerns, including cybercrime, unauthorized access, and cyberattacks. Researchers have increasingly turned to social media platforms such as X to investigate cyberattacks. Collecting and analyzing cyberattack news from tweets can efficiently provide crucial insights into the attacks themselves, including their impacts, the regions where they occur, and potential mitigation strategies. However, labeled datasets related to cyberattacks remain scarce. This paper describes CybAttT, a dataset of 36,071 English cyberattack-related tweets, manually labeled into three classes: high-risk news, normal news, and not news. Our final overall inter-annotator agreement was 0.99 (Fleiss' kappa), which indicates very high agreement. To assess the dataset's reliability, quality, and suitability for its intended purpose, we conducted rigorous experiments using different supervised machine learning algorithms and various fine-tuned language models. The high F1-score of 87.6% achieved on CybAttT not only demonstrates the potential of our approach but also validates the quality and thoroughness of its annotations. We have made the CybAttT dataset publicly accessible for research purposes.
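The abstract reports agreement as Fleiss' kappa, kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean per-item observed agreement and P_e is the agreement expected by chance from the overall label proportions. The sketch below shows how that statistic is computed. It is illustrative only and makes several assumptions: every tweet was labeled by the same number of annotators, the function name fleiss_kappa is our own, and the toy labels in the usage note are invented, not drawn from CybAttT or from the authors' annotation procedure.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for a list of items.

    ratings[i] holds the labels that each annotator gave to item i.
    Assumes every item has the same number of ratings n >= 2.
    """
    N = len(ratings)
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    categories = sorted({label for r in ratings for label in r})
    # counts[i][j]: how many annotators put item i into category j
    counts = [Counter(r) for r in ratings]

    # P_i: fraction of annotator pairs that agree on item i
    p_items = [
        (sum(c[j] ** 2 for j in categories) - n) / (n * (n - 1)) for c in counts
    ]
    p_bar = sum(p_items) / N

    # p_j: share of all assigned labels that fall into category j
    p_cat = [sum(c[j] for c in counts) / (N * n) for j in categories]
    p_e = sum(p ** 2 for p in p_cat)  # chance agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when all labels are identical")

    return (p_bar - p_e) / (1 - p_e)
```

As a usage example with four hypothetical tweets and three annotators, where only the last tweet has a disagreement (two annotators say "normal", one says "not-news"), P_bar = 5/6 and P_e = 50/144, so kappa = 70/94, about 0.745. A value such as the 0.99 reported for CybAttT would require near-unanimous labels across almost all tweets.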

Suggested Citation

  • Huda Lughbi & Mourad Mars & Khaled Almotairi, 2024. "CybAttT: A Dataset of Cyberattack News Tweets for Enhanced Threat Intelligence," Data, MDPI, vol. 9(3), pages 1-16, February.
  • Handle: RePEc:gam:jdataj:v:9:y:2024:i:3:p:39-:d:1344674

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/9/3/39/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/9/3/39/
    Download Restriction: no