
DeepFakeX: A Comprehensive Multimodal Deepfake Dataset for Research and Analysis

Author

Listed:
  • Sonia Salman

    (Department of Computer Science, National University of Computer and Emerging Sciences, Karachi 75030, Pakistan)

  • Jawwad Ahmed Shamsi

    (Department of Computer Science, National University of Computer and Emerging Sciences, Karachi 75030, Pakistan)

  • Rizwan Qureshi

    (Department of Computer Science, Salim Habib University, Karachi 74900, Pakistan;
    Center for Research in Computer Vision, University of Central Florida, Orlando, FL 32826, USA)

Abstract

The expanding capabilities of deep learning-based media synthesis have intensified concerns regarding the authenticity of digital content and the reliability of forensic analysis tools. In response to these challenges, this work introduces DeepFakeX, a collection of 800 synthetically generated videos available under controlled access for research purposes. The dataset encompasses four distinct categories of AI-driven synthesis: facial identity replacement, audio track substitution, neural voice cloning, and combined audiovisual alteration. Unlike existing deepfake datasets that predominantly focus on facial synthesis, DeepFakeX covers a broader range of manipulation modalities, reflecting the diversity of synthetic media encountered in real-world settings. All deepfakes were generated using state-of-the-art, publicly available tools. Standardized post-processing procedures were applied to each video to ensure uniformity in terms of quality, duration and encoding format. DeepFakeX also emphasizes diversity in gender, age, ethnicity, and language. Video contexts span speeches, informational videos, movie clips, news broadcasts, and interviews that reflect content scenarios commonly encountered in real-world online environments. The dataset includes videos in both English and Urdu. The dataset’s quality and structural variability were assessed through visual and audio analyses using the Structural Similarity Index Measure (SSIM), Mel-Frequency Cepstral Coefficients (MFCCs), and Principal Component Analysis (PCA). The evaluation results revealed substantial variability within each manipulation category, along with clearly distinguishable patterns specific to each modality. DeepFakeX has been developed to facilitate rigorous and transparent research in deepfake detection, cross-modal forensic analysis, and AI-driven media forensics. It is hosted on Zenodo under controlled access for research use.
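
For readers unfamiliar with the Structural Similarity Index Measure named in the abstract, the following is a minimal sketch of the standard SSIM formula computed over a single global window. It is not the authors' evaluation pipeline: the abstract does not state the window size, color handling, or library used, and practical implementations usually average SSIM over local sliding windows. The function name `global_ssim` and the toy pixel values are illustrative assumptions.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equal-length sequences of pixel intensities.

    Simplified illustration: statistics are taken over the whole frame at once,
    whereas the usual windowed SSIM averages this quantity over local patches.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Stabilizing constants from the standard SSIM definition (K1=0.01, K2=0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n

    # Population variances and covariance of the two frames.
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Toy flattened grayscale "frames" (hypothetical values, not dataset samples).
frame_a = [0, 50, 100, 150, 200, 250]
frame_b = list(reversed(frame_a))

same_score = global_ssim(frame_a, frame_a)   # identical frames
diff_score = global_ssim(frame_a, frame_b)   # structurally different frames
```

Identical frames score 1, and the score falls as luminance, contrast, or structure diverge, which is why SSIM is commonly used to compare a manipulated frame against a reference.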

Suggested Citation

  • Sonia Salman & Jawwad Ahmed Shamsi & Rizwan Qureshi, 2026. "DeepFakeX: A Comprehensive Multimodal Deepfake Dataset for Research and Analysis," Data, MDPI, vol. 11(6), pages 1-17, June.
  • Handle: RePEc:gam:jdataj:v:11:y:2026:i:6:p:141-:d:1965334

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/11/6/141/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/11/6/141/
    Download Restriction: no