Author
Listed:
- Haiyue Zhang
- Menglong Wu
- Xichang Cai
- Wenkai Liu
Abstract
Sound event detection (SED) and acoustic scene classification (ASC) are closely related tasks in environmental sound analysis. Given the interrelationship between sound events and scenes, some previous studies have proposed using multitask learning (MTL) to jointly analyze SED and ASC. However, these multitask learning methods are generally based on hard parameter sharing, which exchanges sound event and scene features only through the low-level network. Such approaches struggle to balance the complex interrelationships between SED and ASC, and they limit feature sharing and information flow between tasks during training. To address these challenges, this study proposes a novel multitask network based on a residual multi-level feature extraction (R-MFE) framework, which jointly analyzes the SED and ASC tasks and uses scene information to improve sound event detection performance. In addition, this study designs the D-LKAC attention module, which combines the advantages of self-attention mechanisms and convolution to capture global and local features. To further enhance SED performance, this study introduces the MS-conv module, which captures audio details from multiple dimensions. The proposed MTL method is evaluated on the TUT Acoustic Scenes 2016/2017 and TUT Sound Events 2016/2017 datasets. Experimental results indicate that our approach outperforms state-of-the-art techniques, improving the F-scores by 6.44%.
Suggested Citation
Haiyue Zhang & Menglong Wu & Xichang Cai & Wenkai Liu, 2025.
"Scene-dependent sound event detection based on multitask learning with deformable large kernel attention convolution,"
PLOS ONE, Public Library of Science, vol. 20(5), pages 1-15, May.
Handle: RePEc:plo:pone00:0322002
DOI: 10.1371/journal.pone.0322002