Author
Listed:
- SAAD ALAHMARI
(Department of Computer Science, Applied College, Northern Border University, Arar, Saudi Arabia)
- NAJLA I. AL-SHATHRY
(Department of Language Preparation, Arabic Language Teaching Institute, Princess Nourah Bint Abdulrahman University, P. O. Box 84428, Riyadh 11671, Saudi Arabia)
- MAJDY M. ELTAHIR
(Department of Information Systems, Applied College at Mahayil, King Khalid University, Abha, Saudi Arabia)
- MUHAMMAD SWAILEH A. ALZAIDI
(Department of English Language, College of Language Sciences, King Saud University, P. O. Box 145111, Riyadh, Saudi Arabia)
- AYMAN AHMAD ALGHAMDI
(Department of Arabic Teaching, Arabic Language Institute, Umm Al-qura University, Mecca, Saudi Arabia)
- AHMED MAHMUD
(Research Center, Future University in Egypt, New Cairo 11835, Egypt)
Abstract
Speech Emotion Recognition (SER) plays a significant role in human–machine interaction applications. Over the last decade, many SER systems have been proposed; however, their performance remains a challenge owing to noise, high system complexity and ineffective feature discrimination. SER is challenging and vital, and feature extraction is critical to SER performance. Deep Learning (DL)-based techniques have emerged as proficient solutions for SER owing to their competence in learning from unlabeled data, superior feature representation, and capacity to handle larger datasets and complex features. Various DL techniques, such as Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN) and Deep Neural Network (DNN), have been successfully applied to automated SER. This study proposes a Robust SER and Classification using Natural Language Processing with DL (RSERC-NLPDL) model, which aims to identify the emotions in speech signals. In the RSERC-NLPDL technique, pre-processing is first performed to transform the input speech signal into a valid format. The technique then extracts a set of features comprising Mel-Frequency Cepstral Coefficients (MFCCs), Zero-Crossing Rate (ZCR), Harmonics-to-Noise Ratio (HNR) and the Teager Energy Operator (TEO). Next, feature selection is carried out using the Fractal Seagull Optimization Algorithm (FSOA). The Temporal Convolutional Autoencoder (TCAE) model is applied to identify speech emotions, and its hyperparameters are selected using the fractal Sand Cat Swarm Optimization (SCSO) algorithm. The RSERC-NLPDL method is evaluated on speech databases, where it achieved superior accuracies of 94.32% and 95.25% on the EMODB and RAVDESS datasets, respectively, outperforming other models across distinct measures.
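Two of the time-domain features named in the abstract, ZCR and TEO, follow directly from their standard definitions. The sketch below (illustrative only, not the authors' implementation; the synthetic sine input and function names are assumptions) computes both on a single frame with NumPy:

```python
# Illustrative sketch of two features from the abstract: Zero-Crossing
# Rate (ZCR) and the Teager Energy Operator (TEO). Not the paper's code.
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return float(np.mean(signs[:-1] != signs[1:]))

def teager_energy(frame: np.ndarray) -> np.ndarray:
    """Discrete TEO: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return frame[1:-1] ** 2 - frame[:-2] * frame[2:]

# Synthetic test frame: a 440 Hz tone at a 16 kHz sampling rate.
sr, f = 16000, 440.0
t = np.arange(1024) / sr
x = np.sin(2 * np.pi * f * t)

# For a pure tone, ZCR is roughly 2*f/sr crossings per sample, and the
# TEO is constant at sin^2(omega) with omega = 2*pi*f/sr.
print(zero_crossing_rate(x))
print(float(teager_energy(x).mean()))
```

In a full SER front end these frame-level values would be aggregated over a sliding window alongside MFCCs and HNR before feature selection.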
Suggested Citation
Saad Alahmari & Najla I. Al-Shathry & Majdy M. Eltahir & Muhammad Swaileh A. Alzaidi & Ayman Ahmad Alghamdi & Ahmed Mahmud, 2025.
"Toward Robust Speech Emotion Recognition And Classification Using Natural Language Processing With Deep Learning Model,"
FRACTALS (fractals), World Scientific Publishing Co. Pte. Ltd., vol. 33(02), pages 1-15.
Handle:
RePEc:wsi:fracta:v:33:y:2025:i:02:n:s0218348x25400225
DOI: 10.1142/S0218348X25400225
Download full text from publisher
As access to this document is restricted, you may want to search for a different version of it.
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:wsi:fracta:v:33:y:2025:i:02:n:s0218348x25400225. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help add them by using this form.
If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Tai Tone Lim (email available below). General contact details of provider: https://www.worldscientific.com/worldscinet/fractals .
Please note that corrections may take a couple of weeks to filter through the various RePEc services.