Augmenting interpretable models with large language models during training

Author

Listed:

Chandan Singh (Microsoft Research)
Armin Askari (University of California)
Rich Caruana (Microsoft Research)
Jianfeng Gao (Microsoft Research)

Abstract

Recent large language models (LLMs), such as ChatGPT, have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains and compute-limited settings has created a burgeoning need for interpretability and efficiency. We address this need by proposing Aug-imodels, a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable prediction models. Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency and often a speed/memory improvement of greater than 1000x for inference compared to LLMs. We explore two instantiations of Aug-imodels in natural-language processing: Aug-Linear, which augments a linear model with decoupled embeddings from an LLM, and Aug-Tree, which augments a decision tree with LLM feature expansions. Across a variety of text-classification datasets, both outperform their non-augmented, interpretable counterparts. Aug-Linear can even outperform much larger models, e.g., a 6-billion-parameter GPT-J model, despite having 10,000x fewer parameters and being fully transparent. We further explore Aug-imodels in a natural-language fMRI study, where they generate interesting interpretations from scientific data.
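The idea that an LLM is used during fitting but not during inference can be illustrated with a small sketch. It is not the authors' implementation, only one possible reading of the Aug-Linear description in the abstract. Everything named here is an assumption made for illustration: `fake_llm_embed` is a hypothetical stand-in that returns a fixed pseudo-random vector per n-gram (a real LLM would be called instead), ridge-regularized least squares is swapped in as a simple linear fit, and n-grams unseen during fitting are assumed to contribute nothing at inference.

```python
import hashlib
import numpy as np

DIM = 16  # embedding size of the stand-in "LLM" (assumption)

def fake_llm_embed(ngram):
    """Hypothetical stand-in for an LLM: a fixed pseudo-random vector per n-gram."""
    seed = int.from_bytes(hashlib.sha256(ngram.encode()).digest()[:4], "little")
    return np.random.default_rng(seed).standard_normal(DIM)

def ngrams(text, max_n=2):
    """All unigrams and bigrams of a whitespace-tokenized, lowercased text."""
    toks = text.lower().split()
    return [" ".join(toks[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(toks) - n + 1)]

def featurize(text):
    """Embed each n-gram on its own (decoupled) and sum the vectors."""
    return sum((fake_llm_embed(g) for g in ngrams(text)), np.zeros(DIM))

def fit_aug_linear(texts, labels, ridge=1.0):
    """Fit a linear model on summed embeddings, then collapse it to a lookup table."""
    X = np.stack([featurize(t) for t in texts])
    y = np.asarray(labels, dtype=float)
    bias = float(y.mean())
    # Ridge-regularized least squares (a simplification chosen for this sketch).
    w = np.linalg.solve(X.T @ X + ridge * np.eye(DIM), X.T @ (y - bias))
    # One scalar per n-gram: the embedding function is no longer needed afterwards.
    table = {g: float(fake_llm_embed(g) @ w) for t in texts for g in ngrams(t)}
    return table, bias, w

def predict(table, bias, text):
    """Inference uses only the lookup table; unseen n-grams add 0 (assumption)."""
    return bias + sum(table.get(g, 0.0) for g in ngrams(text))

train_texts = ["great movie", "terrible movie", "great acting", "terrible plot"]
train_labels = [1, 0, 1, 0]
table, bias, w = fit_aug_linear(train_texts, train_labels)
print(sorted(table.items(), key=lambda kv: kv[1]))  # per-n-gram contributions
```

Because the model is linear in a sum of per-n-gram embeddings, each n-gram's contribution reduces to a single scalar. The fitted model can therefore be inspected as a table of n-gram weights and evaluated by dictionary lookups alone.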

Suggested Citation

Chandan Singh & Armin Askari & Rich Caruana & Jianfeng Gao, 2023. "Augmenting interpretable models with large language models during training," Nature Communications, Nature, vol. 14(1), pages 1-11, December.

Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-43713-1
DOI: 10.1038/s41467-023-43713-1

References listed on IDEAS

Alexander G. Huth & Wendy A. de Heer & Thomas L. Griffiths & Frédéric E. Theunissen & Jack L. Gallant, 2016. "Natural speech reveals the semantic maps that tile human cerebral cortex," Nature, Nature, vol. 532(7600), pages 453-458, April.
Arnaud Mignan & Marco Broccardo, 2019. "One neuron versus deep learning in aftershock prediction," Nature, Nature, vol. 574(7776), pages 1-3, October.
Pekka Malo & Ankur Sinha & Pekka Korhonen & Jyrki Wallenius & Pyry Takala, 2014. "Good debt or bad debt: Detecting semantic orientations in economic texts," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(4), pages 782-796, April.
- Pekka Malo & Ankur Sinha & Pyry Takala & Pekka Korhonen & Jyrki Wallenius, 2013. "Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts," Papers 1307.5336, arXiv.org, revised Jul 2013.

Citations

Cited by:

Yang Zhao & Pu Wang & Yibo Zhao & Hongru Du & Hao Frank Yang, 2025. "SafeTraffic Copilot: adapting large language models for trustworthy traffic safety assessments and decision interventions," Nature Communications, Nature, vol. 16(1), pages 1-17, December.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Kirtac, Kemal & Germano, Guido, 2024. "Sentiment trading with large language models," Finance Research Letters, Elsevier, vol. 62(PB).
- Kirtac, Kemal & Germano, Guido, 2024. "Sentiment trading with large language models," LSE Research Online Documents on Economics 122592, London School of Economics and Political Science, LSE Library.
- Kemal Kirtac & Guido Germano, 2024. "Sentiment trading with large language models," Papers 2412.19245, arXiv.org.
Chen, Cathy Yi-Hsuan & Fengler, Matthias R. & Härdle, Wolfgang Karl & Liu, Yanchu, 2022. "Media-expressed tone, option characteristics, and stock return predictability," Journal of Economic Dynamics and Control, Elsevier, vol. 134(C).
- Chen, Cathy Yi-Hsuan & Fengler, Matthias R. & Härdle, Wolfgang Karl & Liu, Yanchu, 2019. "Media-expressed tone, Option Characteristics, and Stock Return Predictability," IRTG 1792 Discussion Papers 2019-015, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
Paola Cerchiello & Giancarlo Nicola, 2018. "Assessing News Contagion in Finance," Econometrics, MDPI, vol. 6(1), pages 1-19, February.
Dolaeva, Aishat & Beliaeva, Uliana & Grigoriev, Dmitry & Semenov, Alexander & Rysz, Maciej, 2025. "Analyzing and forecasting P/E ratios using investor sentiment in panel data regression and LSTM models," International Review of Economics & Finance, Elsevier, vol. 98(C).
Borchert, Philipp & Coussement, Kristof & De Weerdt, Jochen & De Caigny, Arno, 2024. "Industry-sensitive language modeling for business," European Journal of Operational Research, Elsevier, vol. 315(2), pages 691-702.
Priyank Sonkiya & Vikas Bajpai & Anukriti Bansal, 2021. "Stock price prediction using BERT and GAN," Papers 2107.09055, arXiv.org.
Ziliang Zhu & Huichao Yang & Haojie Wen & Jinyi Hung & Yueqin Hu & Yanchao Bi & Xi Yu, 2025. "Innate network mechanisms of temporal pole for semantic cognition in neonatal and adult twin studies," Nature Communications, Nature, vol. 16(1), pages 1-18, December.
Duygu Ider & Stefan Lessmann, 2022. "Forecasting Cryptocurrency Returns from Sentiment Signals: An Analysis of BERT Classifiers and Weak Supervision," Papers 2204.05781, arXiv.org, revised Mar 2023.
Darko B. Vuković & Senanu Dekpo-Adza & Stefana Matović, 2025. "AI integration in financial services: a systematic review of trends and regulatory challenges," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 12(1), pages 1-29, December.
Ankur Sinha & Chaitanya Agarwal & Pekka Malo, 2025. "FinBloom: Knowledge Grounding Large Language Model with Real-time Financial Data," Papers 2502.18471, arXiv.org.
Julian Junyan Wang & Victor Xiaoqi Wang, 2025. "Assessing Consistency and Reproducibility in the Outputs of Large Language Models: Evidence Across Diverse Finance and Accounting Tasks," Papers 2503.16974, arXiv.org, revised Sep 2025.
Yu Takagi & Daichi Shimizu & Mina Wakabayashi & Ryu Ohata & Hiroshi Imamizu, 2025. "Cross-modal deep generative models reveal the cortical representation of dancing," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
Beau Sievers & Christopher Welker & Uri Hasson & Adam M. Kleinbaum & Thalia Wheatley, 2024. "Consensus-building conversation leads to neural alignment," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
Sreejan Kumar & Theodore R. Sumers & Takateru Yamakoshi & Ariel Goldstein & Uri Hasson & Kenneth A. Norman & Thomas L. Griffiths & Robert D. Hawkins & Samuel A. Nastase, 2024. "Shared functional specialization in transformer-based language models and the human brain," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
Lorenza Lucchi Basili & Pier Luigi Sacco, 2017. "Tie-Up Cycles in Long-Term Mating. Part II: Fictional Narratives and the Social Cognition of Mating," Challenges, MDPI, vol. 8(1), pages 1-60, February.
Andrea Ajello & Diego Silva & Travis Adams & Francisco Vazquez-Grande, 2023. "More than Words: Twitter Chatter and Financial Market Sentiment," Finance and Economics Discussion Series 2023-034, Board of Governors of the Federal Reserve System (U.S.).
Ankur Sinha & Satishwar Kedas & Rishu Kumar & Pekka Malo, 2022. "SEntFiN 1.0: Entity‐aware sentiment analysis for financial news," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(9), pages 1314-1335, September.
Tingsong Jiang & Qingyun Zeng, 2023. "Financial sentiment analysis using FinBERT with application in predicting stock movement," Papers 2306.02136, arXiv.org, revised Jun 2025.
Abdollahi, Hooman & Junttila, Juha-Pekka & Lehkonen, Heikki, 2024. "Clustering asset markets based on volatility connectedness to political news," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 93(C).
Abdollahi, Hooman & Fjesme, Sturla L. & Sirnes, Espen, 2024. "Measuring market volatility connectedness to media sentiment," The North American Journal of Economics and Finance, Elsevier, vol. 71(C).
