Author
Listed:
- Sadi M. Redwan
(University of Rajshahi, Bangladesh)
- Md Rashed-Al-Mahfuz
(University of Rajshahi, Bangladesh)
- Md Ekramul Hamid
(University of Rajshahi, Bangladesh)
Abstract
The importance of speech command recognition in a human-machine interaction system is increased in recent years. In this study, we propose a deep neural network-based system for acoustic and throat command speech recognition. We apply a preprocessed pipeline to create the input of the deep learning model. Firstly, speech commands are decomposed into components using well-known signal decomposition techniques. The Mel-frequency cepstral coefficients (MFCC) feature extraction method is applied to each component of the speech commands to obtain the feature inputs for the recognition system. At this stage, we apply and compare performance using different speech decomposition techniques such as wavelet packet decomposition (WPD), continuous wavelet transform (CWT), and empirical mode decomposition (EMD) in order to find out the best technique for our model. We observe that WPD shows the best performance in terms of classification accuracy. This paper investigates long short-term memory (LSTM)-based recurrent neural network (RNN), which is trained using the extracted MFCC features. The proposed neural network is trained and tested using acoustic speech commands. Moreover, we also train and test the proposed model using a throat mic. speech commands as well. Lastly, the transfer learning technique is employed to increase the test accuracy for throat speech recognition. The weights of the model train with the acoustic signal are used to initialize the model used for throat speech recognition. Overall, we have found significant classification accuracy for both acoustic and throat command speech. We obtain LSTM is much better than the GMM-HMM model, convolutional neural networks such as CNN-tpool2 and residual networks such as res15 and res26 with an accuracy score of over 97% on Google’s Speech Commands dataset and we achieve 95.35% accuracy on our throat speech data set using the transfer learning technique.
Suggested Citation
Handle:
RePEc:epw:comput:v:3:y:2023:i:2:id:10088
DOI: 10.24018/compute.2023.3.2.88
Download full text from publisher
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:epw:comput:v:3:y:2023:i:2:id:10088. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Support Team (email available below). General contact details of provider: https://eu-opensci.org/index.php/compute .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.