The risk of machine learning


  • Alberto Abadie
  • Maximilian Kasy


Many applied settings in empirical economics involve simultaneous estimation of a large number of parameters. In particular, applied economists are often interested in estimating the effects of many-valued treatments (like teacher effects or location effects), treatment effects for many groups, and prediction models with many regressors. In these settings, machine learning methods that combine regularized estimation and data-driven choices of regularization parameters are useful to avoid over-fitting. In this article, we analyze the performance of a class of such methods that includes ridge, lasso, and pretest, in contexts that require simultaneous estimation of many parameters. Our analysis aims to provide guidance to applied researchers on (i) the choice between regularized estimators in practice and (ii) data-driven selection of regularization parameters. To address (i), we characterize the risk (mean squared error) of regularized estimators and derive their relative performance as a function of simple features of the data generating process. To address (ii), we show that data-driven choices of regularization parameters, based on Stein's unbiased risk estimate or on cross-validation, yield estimators with risk uniformly close to the risk attained under the optimal (unfeasible) choice of regularization parameters. We use data from recent examples in the empirical economics literature to illustrate the practical applicability of our results.
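The three estimators named in the abstract, and the idea of picking the regularization parameter by minimizing Stein's unbiased risk estimate (SURE), can be illustrated with a minimal sketch. It is not the authors' code. It assumes a simple normal-means setting in which each estimate x_i equals a true parameter mu_i plus independent standard normal noise, and every function name, the simulated data, and the grid of candidate values are hypothetical choices made for illustration. SURE is shown only for ridge and lasso, because the pretest estimator is discontinuous and the plain SURE formula does not apply to it without an extra correction.

```python
import numpy as np


def ridge(x, lam):
    """Componentwise ridge: scale every estimate toward zero by 1 / (1 + lam)."""
    return x / (1.0 + lam)


def lasso(x, lam):
    """Componentwise lasso (soft thresholding): shrink by lam, truncate at zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def pretest(x, lam):
    """Pretest (hard thresholding): keep an estimate only if |x| exceeds lam."""
    return np.where(np.abs(x) > lam, x, 0.0)


def sure_ridge(x, lam):
    """SURE of ridge per component, assuming unit noise variance."""
    # squared shift + 2 * average derivative of the estimator - noise variance
    return np.mean((ridge(x, lam) - x) ** 2) + 2.0 / (1.0 + lam) - 1.0


def sure_lasso(x, lam):
    """SURE of soft thresholding per component, assuming unit noise variance."""
    shift = np.mean(np.minimum(x ** 2, lam ** 2))
    kept = np.mean(np.abs(x) > lam)  # derivative is 1 where |x| > lam, else 0
    return shift + 2.0 * kept - 1.0


def pick_lambda(x, sure, grid):
    """Return the grid value that minimizes the supplied SURE function."""
    values = np.array([sure(x, lam) for lam in grid])
    return float(grid[int(np.argmin(values))])


def simulate(seed=0, n=2000):
    """Simulated sparse example (assumption): 10% of the mu_i equal 4, the rest 0."""
    rng = np.random.default_rng(seed)
    mu = np.where(rng.random(n) < 0.1, 4.0, 0.0)
    x = mu + rng.normal(size=n)
    return mu, x


mu, x = simulate()
grid = np.linspace(0.0, 5.0, 51)
for name, estimator, sure in [("ridge", ridge, sure_ridge),
                              ("lasso", lasso, sure_lasso)]:
    lam_hat = pick_lambda(x, sure, grid)
    loss = np.mean((estimator(x, lam_hat) - mu) ** 2)
    print(f"{name}: lambda chosen by SURE = {lam_hat:.2f}, realized loss = {loss:.3f}")
```

In a simulation the true mu is known, so the realized loss at the SURE-selected parameter can be compared directly with the loss at the best value on the grid. That comparison is the kind of question the abstract's point (ii) addresses.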

Suggested Citation

  • Alberto Abadie & Maximilian Kasy, 2017. "The risk of machine learning," Working Paper 383316, Harvard University OpenScholar.
  • Handle: RePEc:qsh:wpaper:383316

    Cited by:

    1. Chakraborty, Chiranjit & Joseph, Andreas, 2017. "Machine learning at central banks," Bank of England working papers 674, Bank of England.
    2. Alexander M. Chinco & Adam D. Clark-Joseph & Mao Ye, 2017. "Sparse Signals in the Cross-Section of Returns," NBER Working Papers 23933, National Bureau of Economic Research, Inc.
    3. Pablo Picardo, 2019. "Predicción de precios de vivienda: Aprendizaje estadístico con datos de oferta y transacciones para la ciudad de Montevideo," Documentos de trabajo 2019002, Banco Central del Uruguay.
    4. Kunz, Johannes S. & Staub, Kevin E. & Winkelmann, Rainer, 2017. "Estimating Fixed Effects: Perfect Prediction and Bias in Binary Response Panel Models, with an Application to the Hospital Readmissions Reduction Program," IZA Discussion Papers 11182, Institute of Labor Economics (IZA).
    5. Fiona Burlig & Christopher Knittel & David Rapson & Mar Reguant & Catherine Wolfram, 2017. "Machine Learning from Schools about Energy Efficiency," NBER Working Papers 23908, National Bureau of Economic Research, Inc.
    6. Stéphane Bonhomme & Martin Weidner, 2019. "Posterior Average Effects," Papers 1906.06360, revised Feb 2020.
