
Improving text classification: logistic regression makes small LLMs strong and explainable ‘tens-of-shot’ classifiers

Authors

  • Marcus Buckmann (Bank of England)
  • Ed Hill (Bank of England)

Abstract

Text classification tasks such as sentiment analysis are common in economics and finance. We demonstrate that smaller, local generative language models can be effectively used for these tasks. Compared to large commercial models, they offer key advantages in privacy, availability, cost, and explainability. We use 17 sentence classification tasks (each with 2 to 4 classes) to show that penalised logistic regression on embeddings from a small language model often matches or exceeds the performance of a large model, even when trained on just dozens of labelled examples per class – the same amount typically needed to validate a large model’s performance. Moreover, this embedding-based approach yields stable and interpretable explanations for classification decisions.
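
As a rough illustration of the approach described in the abstract, the sketch below fits a penalised logistic regression on a few dozen labelled vectors per class. It is not the authors' code. The "embeddings" are synthetic Gaussian vectors standing in for the output of a small language model, the penalty is assumed to be L2 (the abstract does not say which penalty is used), and the function names (`fit_penalised_logreg`, `make_fake_embeddings`), penalty strength, learning rate, and dimensionality are all hypothetical choices made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_penalised_logreg(X, y, l2=0.1, lr=0.1, epochs=300):
    """Binary logistic regression with an L2 penalty, fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Mean log-loss gradient plus the gradient of (l2 / 2) * ||w||^2.
        # The intercept b is left unpenalised.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Predict class 1 when the estimated probability is at least 0.5.
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def make_fake_embeddings(rng, n_per_class, dim=8):
    # ASSUMPTION: synthetic stand-in for sentence embeddings from a small LLM.
    X, y = [], []
    for label, mean in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_fake_embeddings(rng, 30)   # "dozens" of examples per class
X_test, y_test = make_fake_embeddings(rng, 100)

w, b = fit_penalised_logreg(X_train, y_train)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, the synthetic vectors would be replaced by embeddings of the labelled sentences. Because the model is linear, each fitted weight in `w` shows how strongly one embedding dimension pushes a prediction toward a class.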

Suggested Citation

  • Marcus Buckmann & Ed Hill, 2025. "Improving text classification: logistic regression makes small LLMs strong and explainable ‘tens-of-shot’ classifiers," Bank of England working papers 1127, Bank of England.
  • Handle: RePEc:boe:boeewp:1127

    Download full text from publisher

    File URL: https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2025/improving-text-classification-logistic-regression-llms-tens-of-shot-classifiers.pdf
    File Function: Full text
    Download Restriction: no



    More about this item

    JEL classification:

    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C45 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Neural Networks and Related Topics
    • C80 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - General
