
CAG-MoE: Multimodal Emotion Recognition with Cross-Attention Gated Mixture of Experts

Author

Listed:
  • Axel Gedeon Mengara Mengara

    (Department of Artificial Intelligence and Data Science, Sejong University, 209 Neungdong-ro, Gwangjin District, Seoul 05006, Republic of Korea)

  • Yeon-kug Moon

    (Department of Artificial Intelligence and Data Science, Sejong University, 209 Neungdong-ro, Gwangjin District, Seoul 05006, Republic of Korea)

Abstract

Multimodal emotion recognition faces substantial challenges due to the inherent heterogeneity of data sources, each with its own temporal resolution, noise characteristics, and potential for incompleteness. For example, physiological signals, audio features, and textual data capture complementary yet distinct aspects of emotion, requiring specialized processing to extract meaningful cues. These challenges include aligning disparate modalities, handling varying levels of noise and missing data, and effectively fusing features without diluting critical contextual information. In this work, we propose a novel Mixture of Experts (MoE) framework that addresses these challenges by integrating specialized transformer-based sub-expert networks, a dynamic gating mechanism with sparse Top-k activation, and a cross-modal attention module. Each modality is processed by multiple dedicated sub-experts designed to capture intricate temporal and contextual patterns, while the dynamic gating network selectively weights the contributions of the most relevant experts. Our cross-modal attention module further enhances the integration by facilitating precise exchange of information among modalities, thereby reinforcing robustness in the presence of noisy or incomplete data. Additionally, an auxiliary diversity loss encourages expert specialization, ensuring the fused representation remains highly discriminative. Extensive theoretical analysis and rigorous experiments on benchmark datasets—the Korean Emotion Multimodal Database (KEMDy20) and the ASCERTAIN dataset—demonstrate that our approach significantly outperforms state-of-the-art methods in emotion recognition, setting new performance baselines in affective computing.
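
The abstract names three mechanisms: per-modality transformer sub-experts, a sparse Top-k gating network that weights only the most relevant experts, and a cross-modal attention step before fusion, plus an auxiliary diversity loss. The PyTorch sketch below is only an illustration of those ideas under assumed shapes and module names (128-dimensional features, four experts per modality, a squared-usage diversity penalty); it is not the authors' CAG-MoE implementation, whose exact architecture and loss are given in the full paper.

# Illustrative sketch only -- shapes, expert counts, and loss form are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseTopKGate(nn.Module):
    """Dynamic gating with sparse Top-k activation over the experts."""
    def __init__(self, dim, num_experts, k=2):
        super().__init__()
        self.proj = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x):                                    # x: (batch, dim)
        logits = self.proj(x)                                # (batch, num_experts)
        vals, idx = logits.topk(self.k, dim=-1)              # keep only the top-k experts
        return torch.zeros_like(logits).scatter_(-1, idx, F.softmax(vals, dim=-1))

class ModalityExperts(nn.Module):
    """Several transformer-encoder sub-experts for one modality, mixed by the gate."""
    def __init__(self, dim, num_experts=4, k=2, nhead=4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=dim, nhead=nhead, batch_first=True)
            for _ in range(num_experts)])
        self.gate = SparseTopKGate(dim, num_experts, k)

    def forward(self, x):                                    # x: (batch, seq, dim)
        w = self.gate(x.mean(dim=1))                         # gate on a pooled summary
        outs = torch.stack([e(x) for e in self.experts], 1)  # (batch, E, seq, dim)
        return (w[:, :, None, None] * outs).sum(dim=1), w    # weighted expert mixture

class CrossModalAttention(nn.Module):
    """One modality queries another before fusion."""
    def __init__(self, dim, nhead=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, nhead, batch_first=True)

    def forward(self, query_mod, key_mod):
        out, _ = self.attn(query_mod, key_mod, key_mod)
        return out

def diversity_loss(gate_weights):
    """A simple squared-usage penalty discouraging collapse onto a few experts;
    the paper's exact auxiliary diversity loss may differ."""
    usage = gate_weights.mean(dim=0)                         # average load per expert
    return (usage * usage).sum() * usage.numel()

# Usage with illustrative audio and text streams (dims and lengths are arbitrary).
audio = torch.randn(8, 50, 128)                              # (batch, time, dim)
text = torch.randn(8, 30, 128)                               # (batch, tokens, dim)
audio_moe, text_moe = ModalityExperts(128), ModalityExperts(128)
xattn = CrossModalAttention(128)
a, wa = audio_moe(audio)
t, wt = text_moe(text)
fused = xattn(a, t).mean(dim=1)                              # pooled joint representation
aux = diversity_loss(torch.cat([wa, wt], dim=0))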

Suggested Citation

  • Axel Gedeon Mengara Mengara & Yeon-kug Moon, 2025. "CAG-MoE: Multimodal Emotion Recognition with Cross-Attention Gated Mixture of Experts," Mathematics, MDPI, vol. 13(12), pages 1-37, June.
  • Handle: RePEc:gam:jmathe:v:13:y:2025:i:12:p:1907-:d:1673997

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/13/12/1907/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/13/12/1907/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. María Inmaculada Rodríguez-García & María Gema Carrasco-García & Javier González-Enrique & Juan Jesús Ruiz-Aguilar & Ignacio J. Turias, 2023. "Long Short-Term Memory Approach for Short-Term Air Quality Forecasting in the Bay of Algeciras (Spain)," Sustainability, MDPI, vol. 15(6), pages 1-20, March.
    2. Sang Won Choi & Brian H. S. Kim, 2021. "Applying PCA to Deep Learning Forecasting Models for Predicting PM 2.5," Sustainability, MDPI, vol. 13(7), pages 1-30, March.
    3. Axel Gedeon Mengara Mengara & Eunyoung Park & Jinho Jang & Younghwan Yoo, 2022. "Attention-Based Distributed Deep Learning Model for Air Quality Forecasting," Sustainability, MDPI, vol. 14(6), pages 1-34, March.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:13:y:2025:i:12:p:1907-:d:1673997. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.