
Augmenting interpretable models with large language models during training

Authors

Listed:
  • Chandan Singh

    (Microsoft Research)

  • Armin Askari

    (University of California)

  • Rich Caruana

    (Microsoft Research)

  • Jianfeng Gao

    (Microsoft Research)

Abstract

Recent large language models (LLMs), such as ChatGPT, have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains and compute-limited settings has created a burgeoning need for interpretability and efficiency. We address this need by proposing Aug-imodels, a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable prediction models. Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency and often a speed/memory improvement of greater than 1000x for inference compared to LLMs. We explore two instantiations of Aug-imodels in natural-language processing: Aug-Linear, which augments a linear model with decoupled embeddings from an LLM, and Aug-Tree, which augments a decision tree with LLM feature expansions. Across a variety of text-classification datasets, both outperform their non-augmented, interpretable counterparts. Aug-Linear can even outperform much larger models, e.g., a 6-billion-parameter GPT-J model, despite having 10,000x fewer parameters and being fully transparent. We further explore Aug-imodels in a natural-language fMRI study, where they generate interesting interpretations from scientific data.
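To make the Aug-Linear idea from the abstract concrete, here is a minimal sketch of how an LLM can be used during fitting while inference collapses to a transparent per-ngram lookup. Everything below (the hash-seeded stand-in for an LLM embedding, the function names, the bigram featurization) is an illustrative assumption, not the authors' released implementation.

    # Minimal sketch of the Aug-Linear idea (illustrative; not the paper's code).
    # An "LLM" embeds each ngram during fitting; at inference the model is a
    # transparent linear lookup over per-ngram coefficients, with no LLM calls.
    import hashlib
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def bigrams(text):
        toks = text.lower().split()
        return [" ".join(toks[i:i + 2]) for i in range(len(toks) - 1)] or toks

    def embed(ngram, dim=64):
        # Hypothetical stand-in for a frozen LLM embedding (e.g., a mean-pooled
        # hidden state); a hash-seeded random vector keeps the sketch runnable.
        seed = int(hashlib.md5(ngram.encode()).hexdigest()[:8], 16)
        return np.random.default_rng(seed).normal(size=dim)

    def fit_aug_linear(texts, labels):
        # Fitting: each document is represented as the SUM of its ngram
        # embeddings -- "decoupled", since ngrams never interact.
        X = np.stack([np.sum([embed(g) for g in bigrams(t)], axis=0)
                      for t in texts])
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        # Cache one scalar per training ngram: w . embed(g). After this
        # step the LLM is no longer needed.
        vocab = {g for t in texts for g in bigrams(t)}
        coefs = {g: float(embed(g) @ clf.coef_[0]) for g in vocab}
        return coefs, float(clf.intercept_[0])

    def predict(coefs, intercept, text):
        # Inference: intercept plus a sum of per-ngram contributions.
        return int(intercept + sum(coefs.get(g, 0.0) for g in bigrams(text)) > 0)

Because the document representation is a plain sum of ngram embeddings, the fitted weight vector distributes through the sum, so each ngram's contribution reduces to a single precomputable scalar; this decoupling is what makes the fitted model fully transparent and LLM-free at inference.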

Suggested Citation

  • Chandan Singh & Armin Askari & Rich Caruana & Jianfeng Gao, 2023. "Augmenting interpretable models with large language models during training," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-43713-1
    DOI: 10.1038/s41467-023-43713-1

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-43713-1
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-023-43713-1?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Cathy Yi-Hsuan & Fengler, Matthias R. & Härdle, Wolfgang Karl & Liu, Yanchu, 2022. "Media-expressed tone, option characteristics, and stock return predictability," Journal of Economic Dynamics and Control, Elsevier, vol. 134(C).
    2. Paola Cerchiello & Giancarlo Nicola, 2018. "Assessing News Contagion in Finance," Econometrics, MDPI, vol. 6(1), pages 1-19, February.
    3. Priyank Sonkiya & Vikas Bajpai & Anukriti Bansal, 2021. "Stock price prediction using BERT and GAN," Papers 2107.09055, arXiv.org.
    4. Duygu Ider & Stefan Lessmann, 2022. "Forecasting Cryptocurrency Returns from Sentiment Signals: An Analysis of BERT Classifiers and Weak Supervision," Papers 2204.05781, arXiv.org, revised Mar 2023.
    5. Kirtac, Kemal & Germano, Guido, 2024. "Sentiment trading with large language models," LSE Research Online Documents on Economics 122592, London School of Economics and Political Science, LSE Library.
    6. Beau Sievers & Christopher Welker & Uri Hasson & Adam M. Kleinbaum & Thalia Wheatley, 2024. "Consensus-building conversation leads to neural alignment," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    7. Lorenza Lucchi Basili & Pier Luigi Sacco, 2017. "Tie-Up Cycles in Long-Term Mating. Part II: Fictional Narratives and the Social Cognition of Mating," Challenges, MDPI, vol. 8(1), pages 1-60, February.
    8. Andrea Ajello & Diego Silva & Travis Adams & Francisco Vazquez-Grande, 2023. "More than Words: Twitter Chatter and Financial Market Sentiment," Finance and Economics Discussion Series 2023-034, Board of Governors of the Federal Reserve System (U.S.).
    9. Ankur Sinha & Satishwar Kedas & Rishu Kumar & Pekka Malo, 2022. "SEntFiN 1.0: Entity‐aware sentiment analysis for financial news," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(9), pages 1314-1335, September.
    10. Tingsong Jiang & Andy Zeng, 2023. "Financial sentiment analysis using FinBERT with application in predicting stock movement," Papers 2306.02136, arXiv.org.
    11. Agam Shah & Arnav Hiray & Pratvi Shah & Arkaprabha Banerjee & Anushka Singh & Dheeraj Eidnani & Bhaskar Chaudhury & Sudheer Chava, 2024. "Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market Analysis," Papers 2402.11728, arXiv.org.
    12. Samuel Ronnqvist & Peter Sarlin, 2016. "Bank distress in the news: Describing events through deep learning," Papers 1603.05670, arXiv.org, revised Dec 2016.
    13. Alex Kim & Sangwon Yoon, 2023. "Corporate Bankruptcy Prediction with Domain-Adapted BERT," Papers 2312.03194, arXiv.org.
    14. Desjardins, Christoph, 2021. "Don't be too SMART, but SAVE your goals: Proposal for a renewed goal-setting formula for Generation Y," Journal of Applied Leadership and Management, Hochschule Kempten - University of Applied Sciences, Professional School of Business & Technology, vol. 9, pages 73-87.
    15. Maryam Honari-Jahromi & Brea Chouinard & Esti Blanco-Elorrieta & Liina Pylkkänen & Alona Fyshe, 2021. "Neural representation of words within phrases: Temporal evolution of color-adjectives and object-nouns during simple composition," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-17, March.
    16. Xue L. Gong & Alexander G. Huth & Fatma Deniz & Keith Johnson & Jack L. Gallant & Frédéric E. Theunissen, 2023. "Phonemic segmentation of narrative speech in human cerebral cortex," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    17. Jiaqi Zhang & Xijun He, 2023. "Earthquake magnitude prediction using a VMD-BP neural network model," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 117(1), pages 189-205, May.
    18. Laurent Caplette & Nicholas B. Turk-Browne, 2024. "Computational reconstruction of mental representations using human behavior," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    19. David M Alexander & Tonio Ball & Andreas Schulze-Bonhage & Cees van Leeuwen, 2019. "Large-scale cortical travelling waves predict localized future cortical signals," PLOS Computational Biology, Public Library of Science, vol. 15(11), pages 1-34, November.
    20. Ariel Goldstein & Avigail Grinstein-Dabush & Mariano Schain & Haocheng Wang & Zhuoqiao Hong & Bobbi Aubrey & Mariano Schain & Samuel A. Nastase & Zaid Zada & Eric Ham & Amir Feder & Harshvardhan Gazul, 2024. "Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns," Nature Communications, Nature, vol. 15(1), pages 1-12, December.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-43713-1. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.nature.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.