Authors
- Dan Tian
- Xiao Wang
- Dongxin Liu
- Ying Hao
Abstract
Standard detectors such as YOLOv8 face significant challenges when applied to aerial drone imagery, including extreme scale variations, minute targets, and complex backgrounds. Their generic feature fusion architecture is prone to generating false positives and missing small objects. To address these limitations, we propose MFDA-YOLO, an improved model based on YOLOv8. The model introduces an Attention-based Intra-scale Feature Interaction (AIFI) module in the backbone network to enhance high-level feature interactions, improve adaptation to multi-scale targets, and strengthen feature representation. In the neck network, we design the Drone Image Detection Pyramid (DIDP) network, which integrates a space-to-depth convolution module to efficiently propagate multi-scale features from shallow to deep layers. By introducing an omni-kernel module for image restoration into the cross-stage partial network, DIDP enhances global contextual awareness and avoids the computational burden of adding a conventional P2 detection layer. To address the insufficient synergy between the localization and classification tasks in the detection head, we design the Dynamic Alignment Detection Head (DADH). DADH achieves cross-task representation optimization through multi-scale feature interaction learning and a dynamic feature selection mechanism, which significantly reduces model complexity while maintaining detection accuracy. In addition, we employ the WIoUv3 loss function to dynamically adjust the focusing coefficients and enhance the model’s ability to distinguish small targets. Extensive experimental results demonstrate that MFDA-YOLO outperforms existing state-of-the-art methods such as YOLOv11 and YOLOv13 on the VisDrone2019, HIT-UAV, and NWPU VHR-10 datasets. On the VisDrone2019 dataset in particular, MFDA-YOLO surpasses the baseline YOLOv8n model, improving mAP0.5 by 4.4 percentage points and mAP0.5:0.95 by 2.7 percentage points. Furthermore, it reduces the parameter count by 17.2% while lowering both false negative and false positive rates.
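The space-to-depth rearrangement mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function name `space_to_depth`, the nested-list tensor layout, and the output channel ordering are all assumptions made for illustration. The sketch shows only the generic idea that spatial resolution can be reduced by moving each block of pixels into the channel axis, so that no pixel values are discarded.

```python
def space_to_depth(x, block=2):
    """Rearrange a [C][H][W] nested list into [C*block*block][H//block][W//block].

    Each block x block spatial patch is moved into the channel axis, so the
    resolution drops while every input value is preserved.
    (Hypothetical helper for illustration, not taken from the paper.)
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    # Assumed ordering: iterate over the offset inside the block first,
    # then over the source channel. Other orderings are equally valid.
    for dy in range(block):
        for dx in range(block):
            for c in range(channels):
                out.append([
                    [x[c][i][j] for j in range(dx, width, block)]
                    for i in range(dy, height, block)
                ])
    return out


# One 4x4 channel holding the values 0..15 in row-major order.
feature_map = [[[r * 4 + c for c in range(4)] for r in range(4)]]
downsampled = space_to_depth(feature_map, block=2)

print(len(downsampled), len(downsampled[0]), len(downsampled[0][0]))  # 4 2 2
print(downsampled[0])  # [[0, 2], [8, 10]]
```

In a detector, a rearrangement of this kind would normally be followed by a learned convolution over the enlarged channel axis; how DIDP does this is not specified in the abstract.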
Suggested Citation
Dan Tian & Xiao Wang & Dongxin Liu & Ying Hao, 2025.
"MFDA-YOLO: A multiscale feature fusion and dynamic alignment network for UAV small objects detection,"
PLOS ONE, Public Library of Science, vol. 20(12), pages 1-23, December.
Handle: RePEc:plo:pone00:0337810
DOI: 10.1371/journal.pone.0337810