
Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem

Author

Listed:
  • Cai Wingfield
  • Li Su
  • Xunying Liu
  • Chao Zhang
  • Phil Woodland
  • Andrew Thwaites
  • Elisabeth Fonteneau
  • William D Marslen-Wilson

Abstract

There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR) systems with near-human levels of performance are now available, which provide a computationally explicit solution for the recognition of words in continuous speech. This research aims to bridge the gap between speech recognition processes in humans and machines, using novel multivariate techniques to compare incremental ‘machine states’, generated as the ASR analysis progresses over time, to the incremental ‘brain states’, measured using combined electro- and magneto-encephalography (EMEG), generated as the same inputs are heard by human listeners. This direct comparison of dynamic human and machine internal states, as they respond to the same incrementally delivered sensory input, revealed a significant correspondence between neural response patterns in human superior temporal cortex and the structural properties of ASR-derived phonetic models. Spatially coherent patches in human temporal cortex responded selectively to individual phonetic features defined on the basis of machine-extracted regularities in the speech to lexicon mapping process. These results demonstrate the feasibility of relating human and ASR solutions to the problem of speech recognition, and suggest the potential for further studies relating complex neural computations in human speech comprehension to the rapidly evolving ASR systems that address the same problem domain.

Author summary: The ability to understand spoken language is a defining human capacity. But despite decades of research, there is still no well-specified account of how sound entering the ear is neurally interpreted as a sequence of meaningful words. At the same time, modern computer-based Automatic Speech Recognition (ASR) systems are capable of near-human levels of performance, especially where word-identification is concerned. In this research we aim to bridge the gap between human and machine solutions to speech recognition. We use a novel combination of neuroimaging and statistical methods to relate human and machine internal states that are dynamically generated as spoken words are heard by human listeners and analysed by ASR systems. We find that the stable regularities discovered by the ASR process, linking speech input to phonetic labels, can be significantly related to the regularities extracted in the human brain. Both systems may have in common a representation of these regularities in terms of articulatory phonetic features, consistent with an analysis process which recovers the articulatory gestures that generated the speech. These results suggest a possible partnership between human- and machine-based research which may deliver both a better understanding of how the human brain provides such a robust solution to speech understanding, and generate insights that enhance the performance of future ASR systems.
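
One multivariate technique for relating model states to brain states over a shared set of stimuli is representational similarity analysis (RSA). The sketch below is illustrative only and is not the authors' pipeline: it builds a dissimilarity matrix for each system (1 minus the Pearson correlation between response patterns) and compares the two matrices with a Spearman correlation. All function names, the toy data, and the dimensions (8 stimuli, 5 model features, 12 sensors) are assumptions made for the example.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rdm_upper(patterns):
    """Upper triangle of a representational dissimilarity matrix.

    patterns: one response vector per stimulus.
    Dissimilarity is 1 - Pearson correlation between response vectors.
    """
    return [1.0 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def rsa_score(model_patterns, brain_patterns):
    """Spearman correlation between the model RDM and the brain RDM."""
    return pearson(ranks(rdm_upper(model_patterns)),
                   ranks(rdm_upper(brain_patterns)))


# Hypothetical toy data: 8 stimuli, 5 model features, 12 sensors.
rng = random.Random(0)
model = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(8)]
# Simulated "brain" patterns: a noisy linear readout of the model features.
weights = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(12)]
brain = [[sum(w * f for w, f in zip(row, stim)) + rng.gauss(0, 0.1)
          for row in weights] for stim in model]
score = rsa_score(model, brain)
```

Comparing dissimilarity matrices rather than raw responses means the two systems need not share units or dimensionality; only the stimulus set has to match. A time-resolved analysis would repeat this comparison within sliding windows and local patches of sensors or sources, which this sketch leaves out.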

Suggested Citation

  • Cai Wingfield & Li Su & Xunying Liu & Chao Zhang & Phil Woodland & Andrew Thwaites & Elisabeth Fonteneau & William D Marslen-Wilson, 2017. "Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem," PLOS Computational Biology, Public Library of Science, vol. 13(9), pages 1-25, September.
  • Handle: RePEc:plo:pcbi00:1005617
    DOI: 10.1371/journal.pcbi.1005617

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005617
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005617&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1005617?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Hamed Nili & Cai Wingfield & Alexander Walther & Li Su & William Marslen-Wilson & Nikolaus Kriegeskorte, 2014. "A Toolbox for Representational Similarity Analysis," PLOS Computational Biology, Public Library of Science, vol. 10(4), pages 1-11, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Valentina Krenz & Arjen Alink & Tobias Sommer & Benno Roozendaal & Lars Schwabe, 2023. "Time-dependent memory transformation in hippocampus and neocortex is semantic in nature," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    2. Julia Berezutskaya & Zachary V Freudenburg & Umut Güçlü & Marcel A J van Gerven & Nick F Ramsey, 2020. "Brain-optimized extraction of complex sound features that drive continuous auditory perception," PLOS Computational Biology, Public Library of Science, vol. 16(7), pages 1-34, July.
    3. Manoj Kumar & Cameron T Ellis & Qihong Lu & Hejia Zhang & Mihai Capotă & Theodore L Willke & Peter J Ramadge & Nicholas B Turk-Browne & Kenneth A Norman, 2020. "BrainIAK tutorials: User-friendly learning materials for advanced fMRI analysis," PLOS Computational Biology, Public Library of Science, vol. 16(1), pages 1-12, January.
    4. Hamed Nili & Alexander Walther & Arjen Alink & Nikolaus Kriegeskorte, 2020. "Inferring exemplar discriminability in brain representations," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-28, June.
    5. Katherine R. Storrs & Barton L. Anderson & Roland W. Fleming, 2021. "Unsupervised learning predicts human perception and misperception of gloss," Nature Human Behaviour, Nature, vol. 5(10), pages 1402-1417, October.
    6. Agustin Lage-Castellanos & Giancarlo Valente & Elia Formisano & Federico De Martino, 2019. "Methods for computing the maximum performance of computational models of fMRI responses," PLOS Computational Biology, Public Library of Science, vol. 15(3), pages 1-25, March.
    7. Ming Bo Cai & Nicolas W Schuck & Jonathan W Pillow & Yael Niv, 2019. "Representational structure or task structure? Bias in neural representational similarity analysis and a Bayesian method for reducing bias," PLOS Computational Biology, Public Library of Science, vol. 15(5), pages 1-30, May.
    8. Michael F Bonner & Russell A Epstein, 2018. "Computational mechanisms underlying cortical responses to the affordance properties of visual scenes," PLOS Computational Biology, Public Library of Science, vol. 14(4), pages 1-31, April.
    9. Máté Aller & Agoston Mihalik & Uta Noppeney, 2022. "Audiovisual adaptation is expressed in spatial and decisional codes," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    10. Jörn Diedrichsen & Nikolaus Kriegeskorte, 2017. "Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis," PLOS Computational Biology, Public Library of Science, vol. 13(4), pages 1-33, April.
    11. Haider Al-Tahan & Yalda Mohsenzadeh, 2021. "Reconstructing feedback representations in the ventral visual pathway with a generative adversarial autoencoder," PLOS Computational Biology, Public Library of Science, vol. 17(3), pages 1-19, March.
    12. Kristjan Kalm & Dennis Norris, 2021. "Sequence learning recodes cortical representations instead of strengthening initial ones," PLOS Computational Biology, Public Library of Science, vol. 17(5), pages 1-34, May.
    13. Alexander J Barnett & Walter Reilly & Halle R Dimsdale-Zucker & Eda Mizrak & Zachariah Reagh & Charan Ranganath, 2021. "Intrinsic connectivity reveals functionally distinct cortico-hippocampal networks in the human brain," PLOS Biology, Public Library of Science, vol. 19(6), pages 1-34, June.
    14. Christianne Jacobs & Kirsten Petras & Pieter Moors & Valerie Goffaux, 2020. "Contrast versus identity encoding in the face image follow distinct orientation selectivity profiles," PLOS ONE, Public Library of Science, vol. 15(3), pages 1-22, March.
