
A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations

Authors

  • Ariel Goldstein (Hebrew University; Google Research)
  • Haocheng Wang (Princeton University)
  • Leonard Niekerken (Princeton University; Maastricht University)
  • Mariano Schain (Google Research)
  • Zaid Zada (Princeton University)
  • Bobbi Aubrey (Princeton University)
  • Tom Sheffer (Google Research)
  • Samuel A. Nastase (Princeton University)
  • Harshvardhan Gazula (Princeton University; Massachusetts General Hospital and Harvard Medical School)
  • Aditi Singh (Princeton University)
  • Aditi Rao (Princeton University)
  • Gina Choe (Princeton University)
  • Catherine Kim (Princeton University)
  • Werner Doyle (New York University School of Medicine)
  • Daniel Friedman (New York University School of Medicine)
  • Sasha Devore (New York University School of Medicine)
  • Patricia Dugan (New York University School of Medicine)
  • Avinatan Hassidim (Google Research)
  • Michael Brenner (Google Research; Harvard University)
  • Yossi Matias (Google Research)
  • Orrin Devinsky (New York University School of Medicine)
  • Adeen Flinker (New York University School of Medicine)
  • Uri Hasson (Princeton University)

Abstract

This study introduces a unified computational framework connecting acoustic, speech and word-level linguistic structures to study the neural basis of everyday conversations in the human brain. We used electrocorticography to record neural signals across 100 h of speech production and comprehension as participants engaged in open-ended real-life conversations. We extracted low-level acoustic, mid-level speech and contextual word embeddings from a multimodal speech-to-text model (Whisper). We developed encoding models that linearly map these embeddings onto brain activity during speech production and comprehension. Remarkably, this model accurately predicts neural activity at each level of the language processing hierarchy across hours of new conversations not used in training the model. The internal processing hierarchy in the model is aligned with the cortical hierarchy for speech and language processing, where sensory and motor regions better align with the model’s speech embeddings, and higher-level language areas better align with the model’s language embeddings. The Whisper model captures the temporal sequence of language-to-speech encoding before word articulation (speech production) and speech-to-language encoding post articulation (speech comprehension). The embeddings learned by this model outperform symbolic models in capturing neural activity supporting natural speech and language. These findings support a paradigm shift towards unified computational models that capture the entire processing hierarchy for speech comprehension and production in real-world conversations.
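
For readers who want a concrete sense of the encoding analysis described above, the sketch below illustrates the general recipe: represent each word by an embedding vector, fit a regularized linear map from embeddings to a single electrode's activity, and score predictions on held-out words. This is an illustrative reconstruction, not the authors' code: the embeddings and the neural signal are synthetic stand-ins (in the study they would be Whisper encoder/decoder hidden states and electrocorticographic responses around word onset), the 384-dimensional size simply matches whisper-tiny, and scikit-learn's RidgeCV is one reasonable choice of linear encoding model.

    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Word-aligned feature matrices. In the paper these come from Whisper
    # (encoder states for "speech", decoder states for "language"); here
    # they are random stand-ins.
    n_words, dim = 2000, 384
    speech_emb = rng.standard_normal((n_words, dim))
    lang_emb = rng.standard_normal((n_words, dim))

    # Synthetic "electrode": a signal that genuinely depends on the
    # language embeddings, plus noise.
    weights = rng.standard_normal(dim)
    neural = lang_emb @ weights + 5.0 * rng.standard_normal(n_words)

    # One linear encoding model per feature space, evaluated by correlating
    # predicted with actual activity on held-out words. (The paper holds out
    # entire conversations; a random split keeps the sketch short.)
    for name, X in [("speech", speech_emb), ("language", lang_emb)]:
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, neural, test_size=0.2, random_state=0)
        model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_tr, y_tr)
        r, _ = pearsonr(model.predict(X_te), y_te)
        print(f"{name} embeddings -> electrode: r = {r:.2f}")

Run as-is, the language embeddings should yield a strong held-out correlation while the unrelated speech embeddings stay near zero, mirroring how the study compares feature spaces electrode by electrode across the cortical hierarchy.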

Suggested Citation

  • Ariel Goldstein & Haocheng Wang & Leonard Niekerken & Mariano Schain & Zaid Zada & Bobbi Aubrey & Tom Sheffer & Samuel A. Nastase & Harshvardhan Gazula & Aditi Singh & Aditi Rao & Gina Choe & Catherine Kim & Werner Doyle & Daniel Friedman & Sasha Devore & Patricia Dugan & Avinatan Hassidim & Michael Brenner & Yossi Matias & Orrin Devinsky & Adeen Flinker & Uri Hasson, 2025. "A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations," Nature Human Behaviour, Nature, vol. 9(5), pages 1041-1055, May.
  • Handle: RePEc:nat:nathum:v:9:y:2025:i:5:d:10.1038_s41562-025-02105-9
    DOI: 10.1038/s41562-025-02105-9

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41562-025-02105-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1038/s41562-025-02105-9?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

