The ASR Post-Processor Performance Challenges of BackTranScription (BTS): Data-Centric and Model-Centric Approaches

Author

Listed:
  • Chanjun Park

    (Department of Computer Science and Engineering, Korea University, Seoul 02841, Korea
    Upstage, Yongin 16942, Korea)

  • Jaehyung Seo

    (Department of Computer Science and Engineering, Korea University, Seoul 02841, Korea)

  • Seolhwa Lee

    (Department of Computer Science, University of Copenhagen, DK-2100 Copenhagen, Denmark)

  • Chanhee Lee

    (Naver Corporation, Seongnam 13561, Korea)

  • Heuiseok Lim

    (Department of Computer Science and Engineering, Korea University, Seoul 02841, Korea)

Abstract

Training a sequence-to-sequence (S2S) automatic speech recognition (ASR) post-processor requires parallel pairs (e.g., a speech recognition result and its human post-edited sentence) to construct the dataset, which demands a great deal of human labor. BackTranScription (BTS) is a data-building method proposed to mitigate this limitation of existing S2S-based ASR post-processors: it automatically generates large amounts of training data, reducing the time and cost of data construction. Despite the emergence of this novel approach, the BTS-based ASR post-processor still poses research challenges and remains largely untested across diverse approaches. In this study, we highlight these challenges through detailed experiments that analyze a data-centric approach (i.e., controlling the amount of data without altering the model) and a model-centric approach (i.e., modifying the model). In other words, we point out problems with the current research trend of pursuing model-centric approaches and caution against ignoring the importance of data. Our experimental results show that the data-centric approach outperformed the model-centric approach by +11.69, +17.64, and +19.02 in F1-score, BLEU, and GLEU, respectively.

Suggested Citation

  • Chanjun Park & Jaehyung Seo & Seolhwa Lee & Chanhee Lee & Heuiseok Lim, 2022. "The ASR Post-Processor Performance Challenges of BackTranScription (BTS): Data-Centric and Model-Centric Approaches," Mathematics, MDPI, vol. 10(19), pages 1-8, October.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:19:p:3618-:d:932368

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/19/3618/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/19/3618/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ana Laura Lezama-Sánchez & Mireya Tovar Vidal & José A. Reyes-Ortiz, 2022. "An Approach Based on Semantic Relationship Embeddings for Text Classification," Mathematics, MDPI, vol. 10(21), pages 1-15, November.
