
The risk of machine learning

Author

Listed:
  • Alberto Abadie
  • Maximilian Kasy

Abstract

Many applied settings in empirical economics involve simultaneous estimation of a large number of parameters. In particular, applied economists are often interested in estimating the effects of many-valued treatments (like teacher effects or location effects), treatment effects for many groups, and prediction models with many regressors. In these settings, machine learning methods that combine regularized estimation and data-driven choices of regularization parameters are useful to avoid over-fitting. In this article, we analyze the performance of a class of such methods that includes ridge, lasso, and pretest, in contexts that require simultaneous estimation of many parameters. Our analysis aims to provide guidance to applied researchers on (i) the choice between regularized estimators in practice and (ii) data-driven selection of regularization parameters. To address (i), we characterize the risk (mean squared error) of regularized estimators and derive their relative performance as a function of simple features of the data generating process. To address (ii), we show that data-driven choices of regularization parameters, based on Stein's unbiased risk estimate or on cross-validation, yield estimators with risk uniformly close to the risk attained under the optimal (unfeasible) choice of regularization parameters. We use data from recent examples in the empirical economics literature to illustrate the practical applicability of our results.
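
As a rough illustration of the estimators named in the abstract, the sketch below assumes the canonical normal-means setting, in which each observation X_i is drawn from N(mu_i, 1) and each estimator is applied componentwise. This setup, the function names, the tuning grid, and the simulated data are all assumptions made for illustration; none of them come from this listing. The regularization parameter is chosen by minimizing Stein's unbiased risk estimate (SURE) for ridge and lasso. Pretest is included as an estimator only, because it is discontinuous and the simple SURE formula used here does not apply to it without a correction.

```python
import numpy as np


def ridge(x, lam):
    """Ridge: shrink every observation proportionally toward zero."""
    return x / (1.0 + lam)


def lasso(x, lam):
    """Lasso: soft thresholding at lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def pretest(x, lam):
    """Pretest: hard thresholding, keep x only if |x| exceeds lam."""
    return np.where(np.abs(x) > lam, x, 0.0)


def sure_ridge(x, lam):
    """SURE for ridge under unit noise variance:
    mean((m(x) - x)^2) + 2 * mean(m'(x)) - 1."""
    shrink = lam / (1.0 + lam)
    return np.mean((shrink * x) ** 2) + 2.0 / (1.0 + lam) - 1.0


def sure_lasso(x, lam):
    """SURE for soft thresholding under unit noise variance."""
    return (np.mean(np.minimum(x ** 2, lam ** 2))
            + 2.0 * np.mean(np.abs(x) > lam) - 1.0)


def choose_lambda(x, sure_fn, grid):
    """Pick the grid value that minimizes the estimated risk."""
    values = np.array([sure_fn(x, lam) for lam in grid])
    return grid[int(np.argmin(values))]


# Simulated example (hypothetical data): a sparse vector of true means.
rng = np.random.default_rng(0)
n = 2000
mu = np.where(rng.random(n) < 0.1, 4.0, 0.0)
x = mu + rng.standard_normal(n)

grid = np.linspace(0.0, 5.0, 101)
lam_ridge = choose_lambda(x, sure_ridge, grid)
lam_lasso = choose_lambda(x, sure_lasso, grid)

# Realized loss is computable here only because the simulation knows mu.
loss_ridge = np.mean((ridge(x, lam_ridge) - mu) ** 2)
loss_lasso = np.mean((lasso(x, lam_lasso) - mu) ** 2)
print(f"ridge: lambda={lam_ridge:.2f}, loss={loss_ridge:.3f}")
print(f"lasso: lambda={lam_lasso:.2f}, loss={loss_lasso:.3f}")
```

Changing how `mu` is generated (for example, making it dense rather than sparse) is one way to see how the ranking of the estimators depends on features of the data generating process, which is the kind of comparison the abstract describes.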

Suggested Citation

  • Alberto Abadie & Maximilian Kasy, 2017. "The risk of machine learning," Working Paper 383316, Harvard University OpenScholar.
  • Handle: RePEc:qsh:wpaper:383316

    Download full text from publisher

    File URL: http://scholar.harvard.edu/kasy/node/383316
    Download Restriction: no

    Citations



    Cited by:

    1. Johannes S. Kunz & Kevin E. Staub & Rainer Winkelmann, 2017. "Estimating Fixed Effects: Perfect Prediction and Bias in Binary Response Panel Models, with an Application to the Hospital Readmissions Reduction Program," IZA Discussion Papers 11182, Institute of Labor Economics (IZA).
    2. James Habyarimana & Stuti Khemani & Thiago Scot, 2023. "The importance of political selection for bureaucratic effectiveness," Economica, London School of Economics and Political Science, vol. 90(359), pages 746-779, July.
    3. David Easley & Eleonora Patacchini & Christopher Rojas, 2020. "Multidimensional diffusion processes in dynamic online networks," PLOS ONE, Public Library of Science, vol. 15(2), pages 1-21, February.
    4. Stéphane Bonhomme & Martin Weidner, 2019. "Posterior Average Effects," Papers 1906.06360, arXiv.org, revised Sep 2021.
    5. Fiona Burlig & Christopher Knittel & David Rapson & Mar Reguant & Catherine Wolfram, 2020. "Machine Learning from Schools about Energy Efficiency," Journal of the Association of Environmental and Resource Economists, University of Chicago Press, vol. 7(6), pages 1181-1217.
    6. Chiranjit Chakraborty & Andreas Joseph, 2017. "Machine learning at central banks," Bank of England working papers 674, Bank of England.
    7. Alexander M. Chinco & Adam D. Clark-Joseph & Mao Ye, 2017. "Sparse Signals in the Cross-Section of Returns," NBER Working Papers 23933, National Bureau of Economic Research, Inc.
    8. Deimante Teresiene & Margarita Aleksynaite, 2020. "The Use of Technical Analysis in the US, European and Asian Stock Markets," Technium Social Sciences Journal, Technium Science, vol. 8(1), pages 302-318, June.
    9. Pablo Picardo, 2019. "Predicción de precios de vivienda: Aprendizaje estadístico con datos de oferta y transacciones para la ciudad de Montevideo," Documentos de trabajo 2019002, Banco Central del Uruguay.
