Author
Listed:
- Rong Zhang
(College of Systems Engineering, National University of Defense Technology, Changsha 410073, China;
School of Naval Architecture and Civil Engineering, Jiangsu University of Science and Technology, Zhangjiagang 212003, China)
- Mao-Yi Xiong
(College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
- Jun-Jie Huang
(College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
Abstract
Multi-modality image fusion (MIF) aims to integrate complementary information from diverse imaging modalities into a single comprehensive representation and serves as an essential processing step for downstream high-level computer vision tasks. Existing deep unfolding-based methods demonstrate promising results; however, they often rely on deterministic priors with limited generalization ability and are usually decoupled from the training process of object detection. In this paper, we propose the Semantic-Aware Deep Unfolded Network with Diffusion Prior (SEND), a novel framework designed for transparent and effective multi-modality fusion and object detection. SEND consists of a Denoising Prior Guided Fusion Module and a Fusion Object Detection Module. Rather than using a traditional deterministic prior, the Denoising Prior Guided Fusion Module combines a diffusion prior with deep unfolding, leading to improved multi-modal fusion performance and generalization ability. It is built on a model-based optimization formulation for multi-modal image fusion, which is unfolded into two cascaded blocks: a Diffusion Denoising Fusion Block that generates informative diffusion priors and a Data Consistency Enhancement Block that explicitly aggregates complementary features from both the diffusion priors and the input modalities. Additionally, SEND couples the Fusion Object Detection Module with the Denoising Prior Guided Fusion Module and optimizes them for the object detection task using a carefully designed two-stage training strategy. Experiments demonstrate that the proposed SEND method outperforms state-of-the-art methods, achieving superior fusion quality with improved efficiency and interpretability.
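The two-block unfolding pattern described in the abstract (a prior step followed by a data-consistency step, repeated over a fixed number of stages) can be illustrated with a minimal sketch. This is not the SEND implementation: the abstract does not give the objective or the network details, so everything below is an assumption made for illustration. The quadratic objective, the closed-form update, the parameter `mu`, the stage count, and the function names `box_denoise`, `data_consistency`, and `unfolded_fusion` are all hypothetical, and a 3x3 mean filter stands in for the learned diffusion denoiser.

```python
import numpy as np


def box_denoise(x):
    """Stand-in prior step: 3x3 mean filter with edge padding.

    Assumption: this replaces the learned diffusion denoiser, which the
    abstract names but does not specify.
    """
    h, w = x.shape
    padded = np.pad(x, 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def data_consistency(z, a, b, mu):
    """Closed-form minimizer over x of the assumed objective
    0.5*||x - a||^2 + 0.5*||x - b||^2 + 0.5*mu*||x - z||^2.
    """
    return (a + b + mu * z) / (2.0 + mu)


def unfolded_fusion(a, b, stages=4, mu=1.0):
    """Alternate the prior step and the data-consistency step for a fixed
    number of stages, as in a generic unfolded (half-quadratic) scheme."""
    x = 0.5 * (a + b)  # simple initialization: average of the two modalities
    for _ in range(stages):
        z = box_denoise(x)                   # prior step (denoiser stand-in)
        x = data_consistency(z, a, b, mu)    # pull back toward both inputs
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    infrared = rng.random((8, 8))  # hypothetical input modality 1
    visible = rng.random((8, 8))   # hypothetical input modality 2
    fused = unfolded_fusion(infrared, visible)
    print(fused.shape)
```

In a learned unfolded network, each stage would typically carry its own trainable parameters in place of the fixed `mu` and the hand-written filter; the sketch only shows the alternation between the two kinds of block.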
Suggested Citation
Rong Zhang & Mao-Yi Xiong & Jun-Jie Huang, 2025.
"SEND: Semantic-Aware Deep Unfolded Network with Diffusion Prior for Multi-Modal Image Fusion and Object Detection,"
Mathematics, MDPI, vol. 13(16), pages 1-16, August.
Handle:
RePEc:gam:jmathe:v:13:y:2025:i:16:p:2584-:d:1723104