
Language Accent Detection with CNN Using Sparse Data from a Crowd-Sourced Speech Archive

Authors
  • Veranika Mikhailava

    (School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan
    These authors contributed equally to this work.)

  • Mariia Lesnichaia

    (Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
    These authors contributed equally to this work.)

  • Natalia Bogach

    (Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia)

  • Iurii Lezhenin

    (Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
    Speech Technology Center, 194044 St. Petersburg, Russia)

  • John Blake

    (School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan)

  • Evgeny Pyshkin

    (School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan)

Abstract

The problem of accent recognition has received considerable attention with the development of Automatic Speech Recognition (ASR) systems. The crux of the problem is that conventional acoustic language models, adapted to fit standard language corpora, cannot satisfy the recognition requirements for accented speech. In this research, we contribute to the accent recognition task for a group of up to nine European accents in English and provide evidence in favor of specific hyperparameter choices for neural network models, together with a search for the input speech signal parameters that best improve the baseline accent recognition accuracy. Specifically, we used a CNN-based model trained on audio features extracted from the Speech Accent Archive dataset, a crowd-sourced collection of accented speech recordings. We show that adding time–frequency and energy features (such as the spectrogram, chromagram, spectral centroid, spectral rolloff, and fundamental frequency) to the Mel-frequency cepstral coefficients (MFCC) may increase the accuracy of accent classification compared with the conventional feature sets of MFCC and/or raw spectrograms. Our experiments demonstrate that the greatest impact comes from amplitude mel-spectrograms on a linear scale fed into the model. These mel-spectrograms, which correlate with the audio signal energy, produce state-of-the-art classification results, bringing the recognition accuracy for English with Germanic, Romance, and Slavic accents into the range of 0.964 to 0.987, thus outperforming existing models that classify accents using the Speech Accent Archive. We also investigated how speech rhythm affects recognition accuracy. Based on our preliminary experiments, we used the audio recordings in their original form (i.e., with all pauses preserved) for the other accent classification experiments.
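The key input feature highlighted in the abstract, an amplitude mel-spectrogram on a linear scale, can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline; the FFT size, hop length, and number of mel bands below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def amplitude_mel_spectrogram(signal, sr, n_fft=512, hop=160, n_mels=40):
    # Frame the signal, window it, take the magnitude STFT, and project
    # onto mel filters; the result stays on a linear amplitude scale
    # (no dB conversion), matching the feature described in the abstract.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))   # (frames, bins)
    return mel_filterbank(n_mels, n_fft, sr) @ mag.T     # (mels, frames)

# Toy usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
spec = amplitude_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
print(spec.shape)  # (n_mels, n_frames)
```

The resulting 2-D array (mel bands × time frames) is the kind of image-like input that a CNN-based classifier, such as the one described here, can consume directly.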

Suggested Citation

  • Veranika Mikhailava & Mariia Lesnichaia & Natalia Bogach & Iurii Lezhenin & John Blake & Evgeny Pyshkin, 2022. "Language Accent Detection with CNN Using Sparse Data from a Crowd-Sourced Speech Archive," Mathematics, MDPI, vol. 10(16), pages 1-30, August.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:16:p:2913-:d:887274

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/16/2913/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/16/2913/
    Download Restriction: no




      IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.