Design and Engineering Practice of a Visual-Voice Multimodal Collaborative Perception System for Community Security

Author

Yang Zhong
(Kingdee Software (China) Co., Ltd., Shenzhen, Guangdong 518025, China)

Abstract

Single-modal perception has inherent limitations in community security scenarios: visual detection degrades under low light and occlusion, while voice recognition is prone to misjudgments caused by environmental noise. To address these limitations, this study designs and implements a deep learning-based visual-voice multimodal collaborative perception system. Built around the idea of “heterogeneous modal complementary enhancement”, the system adopts a modular architecture that combines feature-level fusion with dynamic decision-making collaboration strategies: (1) the visual module employs an improved YOLOv12s algorithm, integrating adaptive Retinex contrast enhancement and dynamic Gaussian Mixture Model (GMM) background modeling to make object detection more robust under complex lighting; (2) the voice module is built on a CRNN (CNN+BiLSTM) architecture, combining multi-channel beamforming with SpecAugment data augmentation to strengthen abnormal sound recognition in noisy environments; (3) the multimodal collaboration module introduces an attention-based feature alignment mechanism and scene-adaptive threshold decision-making to fuse cross-modal information efficiently. Validated on the self-constructed CommunityGuard V1.0 community security dataset (50 hours of synchronized multi-scenario audio-visual data covering day/night, sunny/rainy, and noisy/quiet sub-scenarios), multimodal collaborative detection achieves F1-scores 5.8% and 13.6% higher than visual-only and voice-only detection, respectively. Particularly in night-noisy scenarios (illumination
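To make the idea of scene-adaptive decision-making concrete, the following is a minimal illustrative sketch and not the paper's implementation. It swaps in a simple score-level (late) fusion in place of the attention-based feature alignment named in the abstract. The function names, the reliability heuristics, the lux and decibel cut-offs, and the threshold values are all assumptions made for illustration only.

```python
import math


def fuse_scores(visual_score, audio_score, lux, noise_db):
    """Blend two detector confidences using attention-like weights.

    All heuristics here are hypothetical: vision is trusted more in
    bright scenes, audio is trusted more in quiet scenes.
    """
    visual_rel = min(lux / 100.0, 1.0)            # assumed: 100 lux counts as "bright"
    audio_rel = max(0.0, 1.0 - noise_db / 100.0)  # assumed: 100 dB counts as "fully noisy"
    # A softmax over the two reliabilities yields weights that sum to 1.
    exp_v, exp_a = math.exp(visual_rel), math.exp(audio_rel)
    w_visual = exp_v / (exp_v + exp_a)
    w_audio = 1.0 - w_visual
    return w_visual * visual_score + w_audio * audio_score


def scene_threshold(lux, noise_db, base=0.5):
    """Hypothetical scene-adaptive threshold.

    The alarm threshold is lowered when both modalities are degraded
    (dark and noisy), so that weak but consistent evidence still counts.
    """
    degraded = lux < 10 and noise_db > 60
    return base - 0.1 if degraded else base


def is_alarm(visual_score, audio_score, lux, noise_db):
    """Raise an alarm when the fused score reaches the scene threshold."""
    fused = fuse_scores(visual_score, audio_score, lux, noise_db)
    return fused >= scene_threshold(lux, noise_db)
```

In this sketch, two moderately confident detections (0.45 each) trigger an alarm in a dark, noisy scene but not in a well-lit, quiet one, because only the threshold changes between the two cases.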

Suggested Citation

Yang Zhong, 2025. "Design and Engineering Practice of a Visual-Voice Multimodal Collaborative Perception System for Community Security," Innovation in Science and Technology, Paradigm Academic Press, vol. 4(8), pages 55-65, September.

Handle: RePEc:bdz:inscte:v:4:y:2025:i:8:p:55-65
DOI: 10.63593/IST.2788-7030.2025.09.008

