Authors
- Yinfei Xiao
(Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China)
- Yanbing Zhou
(Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China; these authors contributed equally to this work.)
- Pengzhan Cheng
(Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China; these authors contributed equally to this work.)
- Leqian Ni
(Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China)
- Xusheng Wu
(Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China)
- Tianxiang Zheng
(Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China)
Abstract
As face forgery techniques advance, particularly DeepFake methods that produce hyper-realistic facial representations, effective detection of such manipulations becomes imperative for mitigating security threats. Current spatial-domain approaches often struggle to generalize across forgery methods and compression artifacts. Frequency-based analyses show promise in identifying subtle local cues, but their lack of global context limits how far they can improve generalization. This study introduces a hybrid architecture that integrates Efficient-ViT with a multi-level wavelet transform and merges spatial and frequency features through a dynamic adaptive multi-branch attention (DAMA) mechanism, thereby deepening the interaction between the two modalities. We also devise a joint loss function and a training strategy to address data imbalance and improve the training process. Experiments on the FaceForensics++ and Celeb-DF (V2) datasets validate the approach, which attains 97.07% accuracy in intra-dataset evaluations and a 74.7% AUC score in cross-dataset assessments, surpassing the Efficient-ViT baseline by 14.1% and 7.7%, respectively. The findings indicate that the approach generalizes well across datasets and methods, and that an orthogonal loss that regularizes the feature space effectively reduces feature redundancy, as evidenced by the ablation study and parameter analysis.
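To make the "multi-level wavelet transform" mentioned in the abstract concrete, the following is a minimal sketch of a multi-level 2D decomposition. It is not the authors' implementation: the abstract does not name the wavelet family or the number of levels, so the Haar wavelet, the function names (`haar_level`, `multilevel_haar`, `energy`), and the use of plain Python lists are all assumptions made for illustration.

```python
def haar_level(img):
    """One level of an orthonormal 2D Haar transform (assumed wavelet).

    img is a list of rows with even height and width. Returns the
    half-size approximation band and three half-size detail bands.
    """
    approx, d_cols, d_rows, d_diag = [], [], [], []
    for r in range(0, len(img), 2):
        row_a, row_c, row_r, row_d = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]          # top pair of the 2x2 block
            p, q = img[r + 1][c], img[r + 1][c + 1]  # bottom pair
            row_a.append((a + b + p + q) / 2.0)  # local average (low frequency)
            row_c.append((a - b + p - q) / 2.0)  # change between neighbouring columns
            row_r.append((a + b - p - q) / 2.0)  # change between neighbouring rows
            row_d.append((a - b - p + q) / 2.0)  # diagonal change
        approx.append(row_a)
        d_cols.append(row_c)
        d_rows.append(row_r)
        d_diag.append(row_d)
    return approx, (d_cols, d_rows, d_diag)


def multilevel_haar(img, levels):
    """Apply haar_level repeatedly to the approximation band.

    Returns the coarsest approximation and a list of detail-band
    triples, finest level first.
    """
    details = []
    approx = img
    for _ in range(levels):
        approx, bands = haar_level(approx)
        details.append(bands)
    return approx, details


def energy(band):
    """Sum of squared coefficients in a band."""
    return sum(v * v for row in band for v in row)


if __name__ == "__main__":
    # Hypothetical 8x8 "image" used only to exercise the functions.
    image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    coarse, detail_levels = multilevel_haar(image, levels=2)
    print(len(coarse), len(coarse[0]))  # size of the coarsest approximation
```

Because this Haar variant is orthonormal, the total energy of the coefficients equals that of the input, so the decomposition only redistributes information between a coarse band and detail bands at each scale. In a detector along the lines the abstract describes, such detail bands would presumably feed the frequency branch, but how the paper does this is not stated here.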
Suggested Citation
Yinfei Xiao & Yanbing Zhou & Pengzhan Cheng & Leqian Ni & Xusheng Wu & Tianxiang Zheng, 2025.
"An Attention-Based Framework for Detecting Face Forgeries: Integrating Efficient-ViT and Wavelet Transform,"
Mathematics, MDPI, vol. 13(16), pages 1-30, August.
Handle:
RePEc:gam:jmathe:v:13:y:2025:i:16:p:2576-:d:1722772