
Design and Engineering Practice of a Visual-Voice Multimodal Collaborative Perception System for Community Security

Author

Listed:
  • Yang Zhong

    (Kingdee Software (China) Co., Ltd., Shenzhen, Guangdong 518025, China)

Abstract

To address the inherent limitations of single-modal perception in community security scenarios (visual detection is vulnerable to low light and occlusion, while voice recognition is prone to misjudgments caused by environmental noise), this study designs and implements a deep learning-based visual-voice multimodal collaborative perception system. Built around the principle of "heterogeneous modal complementary enhancement", the system adopts a modular architecture that combines feature-level fusion with dynamic decision-level collaboration: (1) the visual module employs an improved YOLOv12s detector, integrating adaptive Retinex contrast enhancement and dynamic Gaussian Mixture Model (GMM) background modeling to make object detection more robust under complex lighting; (2) the voice module is built on a CRNN (CNN+BiLSTM) architecture, combining multi-channel beamforming with SpecAugment data augmentation to strengthen abnormal-sound recognition in noisy environments; (3) the multimodal collaboration module introduces an attention-based feature alignment mechanism and scene-adaptive threshold decision-making to fuse cross-modal information efficiently. On the self-constructed CommunityGuard V1.0 community security dataset (50 hours of synchronized multi-scenario audio-visual data covering day/night, sunny/rainy, and noisy/quiet sub-scenarios), multimodal collaborative detection achieves F1-Scores 5.8% and 13.6% higher than visual-only and voice-only detection, respectively. Particularly in night-noisy scenarios (illumination
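The abstract names SpecAugment as the voice module's data augmentation. As a minimal sketch (not the author's code), the frequency-masking and time-masking steps of SpecAugment can be written in plain Python. The function name `spec_augment`, its mask-width limits, and the list-of-rows spectrogram layout are hypothetical choices; the sketch assumes a mean-normalized log-mel spectrogram, so masking with 0.0 corresponds to masking with the mean, and it omits SpecAugment's time-warping step.

```python
import random


def spec_augment(spec, num_freq_masks=1, max_f=8, num_time_masks=1, max_t=10, rng=None):
    """Return a masked copy of `spec` (a list of frequency rows, n_freq x n_time).

    Hypothetical sketch: frequency and time masking only, no time warping.
    Masked cells are set to 0.0, assuming the input is mean-normalized.
    """
    rng = rng or random.Random()
    out = [row[:] for row in spec]  # copy so the caller's spectrogram is untouched
    n_freq = len(out)
    n_time = len(out[0]) if n_freq else 0

    # Frequency masking: zero a band of f consecutive frequency rows.
    for _ in range(num_freq_masks):
        f = rng.randint(0, min(max_f, n_freq))  # band width in bins
        f0 = rng.randint(0, n_freq - f)         # band start
        for i in range(f0, f0 + f):
            out[i] = [0.0] * n_time

    # Time masking: zero a span of t consecutive frames across all rows.
    for _ in range(num_time_masks):
        t = rng.randint(0, min(max_t, n_time))  # span width in frames
        t0 = rng.randint(0, n_time - t)         # span start
        for row in out:
            for j in range(t0, t0 + t):
                row[j] = 0.0
    return out
```

Passing a seeded `random.Random` makes the masks reproducible, which is convenient when checking an augmentation pipeline; during training a fresh random mask would normally be drawn for every example.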

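The abstract does not specify how the scene-adaptive threshold is computed. One plausible reading, sketched here under stated assumptions, is decision-level fusion in which both the per-modality weights and the alarm threshold are looked up from the detected scene (day/night, quiet/noisy). The `FusionPolicy` table, all of its numbers, and the `fuse` function are hypothetical placeholders rather than values from the paper, and the paper's attention-based feature alignment is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionPolicy:
    w_visual: float   # weight on the visual detector's confidence
    w_voice: float    # weight on the abnormal-sound classifier's confidence
    threshold: float  # alarm threshold on the fused score


# Hypothetical policy table keyed by (is_night, is_noisy).
# Numbers are illustrative only; they are NOT taken from the paper.
POLICIES = {
    (False, False): FusionPolicy(0.6, 0.4, 0.50),  # day, quiet
    (False, True): FusionPolicy(0.8, 0.2, 0.55),   # day, noisy: lean on vision
    (True, False): FusionPolicy(0.3, 0.7, 0.50),   # night, quiet: lean on audio
    (True, True): FusionPolicy(0.5, 0.5, 0.60),    # night, noisy: both degraded, stricter
}


def fuse(visual_conf: float, voice_conf: float, is_night: bool, is_noisy: bool) -> tuple[float, bool]:
    """Return (fused_score, alarm) from a scene-dependent weighted average."""
    p = POLICIES[(is_night, is_noisy)]
    score = p.w_visual * visual_conf + p.w_voice * voice_conf
    return score, score >= p.threshold
```

With the same inputs (visual 0.3, voice 0.9), `fuse` raises an alarm in a night-quiet scene (score about 0.72 against a 0.50 threshold) but not in a day-noisy scene (about 0.42 against 0.55), which is the kind of behavior a scene-adaptive decision rule is meant to provide.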
Suggested Citation

  • Yang Zhong, 2025. "Design and Engineering Practice of a Visual-Voice Multimodal Collaborative Perception System for Community Security," Innovation in Science and Technology, Paradigm Academic Press, vol. 4(8), pages 55-65, September.
  • Handle: RePEc:bdz:inscte:v:4:y:2025:i:8:p:55-65
    DOI: 10.63593/IST.2788-7030.2025.09.008

    Download full text from publisher

    File URL: https://www.paradigmpress.org/ist/article/view/1805/1639
    Download Restriction: no

    File URL: https://libkey.io/10.63593/IST.2788-7030.2025.09.008?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bdz:inscte:v:4:y:2025:i:8:p:55-65. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Editorial Office. General contact details of the provider: https://www.paradigmpress.org/

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.