The risk of machine learning

Authors

Alberto Abadie
Maximilian Kasy

Abstract

Many applied settings in empirical economics involve simultaneous estimation of a large number of parameters. In particular, applied economists are often interested in estimating the effects of many-valued treatments (like teacher effects or location effects), treatment effects for many groups, and prediction models with many regressors. In these settings, machine learning methods that combine regularized estimation and data-driven choices of regularization parameters are useful to avoid over-fitting. In this article, we analyze the performance of a class of such methods that includes ridge, lasso, and pretest, in contexts that require simultaneous estimation of many parameters. Our analysis aims to provide guidance to applied researchers on (i) the choice between regularized estimators in practice and (ii) data-driven selection of regularization parameters. To address (i), we characterize the risk (mean squared error) of regularized estimators and derive their relative performance as a function of simple features of the data generating process. To address (ii), we show that data-driven choices of regularization parameters, based on Stein's unbiased risk estimate or on cross-validation, yield estimators with risk uniformly close to the risk attained under the optimal (unfeasible) choice of regularization parameters. We use data from recent examples in the empirical economics literature to illustrate the practical applicability of our results.
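The three estimators named in the abstract, and the data-driven choice of the regularization parameter, can be illustrated with a minimal sketch. It assumes a simple setting that the abstract itself does not spell out: each parameter is observed once with independent standard normal noise (unit variance), and the same regularization parameter is applied to every component. All function names below are hypothetical, and the simulated means are purely illustrative. Stein's unbiased risk estimate (SURE) is written out only for ridge and lasso; the pretest rule is discontinuous, so the basic SURE formula does not apply to it directly and is omitted here.

```python
import random


def ridge(x, lam):
    """Ridge: shrink proportionally toward zero."""
    return x / (1.0 + lam)


def lasso(x, lam):
    """Lasso: soft thresholding, shrink by lam and truncate at zero."""
    shrunk = max(abs(x) - lam, 0.0)
    return shrunk if x >= 0 else -shrunk


def pretest(x, lam):
    """Pretest: hard thresholding, keep x only if |x| exceeds lam."""
    return x if abs(x) > lam else 0.0


def sure_ridge(xs, lam):
    """SURE for ridge under unit noise variance (assumption)."""
    c = lam / (1.0 + lam)
    mean_sq = sum(x * x for x in xs) / len(xs)
    return 1.0 + c * c * mean_sq - 2.0 * c


def sure_lasso(xs, lam):
    """SURE for soft thresholding under unit noise variance (assumption)."""
    n = len(xs)
    shrink_term = sum(min(x * x, lam * lam) for x in xs) / n
    frac_zeroed = sum(1 for x in xs if abs(x) < lam) / n
    return 1.0 + shrink_term - 2.0 * frac_zeroed


def choose_lambda(xs, sure, grid):
    """Pick the grid value of lam that minimizes the risk estimate."""
    return min(grid, key=lambda lam: sure(xs, lam))


def average_loss(estimates, mus):
    """Average squared error against the true means (simulation only)."""
    return sum((e - m) ** 2 for e, m in zip(estimates, mus)) / len(mus)


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative simulated design: mostly zero means, a few large ones.
    mus = [0.0] * 800 + [4.0] * 200
    xs = [mu + rng.gauss(0.0, 1.0) for mu in mus]
    grid = [0.05 * k for k in range(101)]

    for name, estimator, sure in [("ridge", ridge, sure_ridge),
                                  ("lasso", lasso, sure_lasso)]:
        lam = choose_lambda(xs, sure, grid)
        loss = average_loss([estimator(x, lam) for x in xs], mus)
        print(name, "lambda:", round(lam, 2), "loss:", round(loss, 3))
```

In a simulation like this the true means are known, so the loss at the SURE-selected parameter can be compared with the loss at the best parameter on the grid, which is the kind of comparison the abstract's second result concerns.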

Suggested Citation

Alberto Abadie & Maximilian Kasy, 2017. "The risk of machine learning," Working Paper 383316, Harvard University OpenScholar.

Handle: RePEc:qsh:wpaper:383316

Citations

Cited by:

Chiranjit Chakraborty & Andreas Joseph, 2017. "Machine learning at central banks," Bank of England working papers 674, Bank of England.
Alexander M. Chinco & Adam D. Clark-Joseph & Mao Ye, 2017. "Sparse Signals in the Cross-Section of Returns," NBER Working Papers 23933, National Bureau of Economic Research, Inc.
Deimante Teresiene & Margarita Aleksynaite, 2020. "The Use of Technical Analysis in the US, European and Asian Stock Markets," Technium Social Sciences Journal, Technium Science, vol. 8(1), pages 302-318, June.
Pablo Picardo, 2019. "Predicción de precios de vivienda: Aprendizaje estadístico con datos de oferta y transacciones para la ciudad de Montevideo," Documentos de trabajo 2019002, Banco Central del Uruguay.
Kunz, Johannes S. & Staub, Kevin E. & Winkelmann, Rainer, 2017. "Estimating Fixed Effects: Perfect Prediction and Bias in Binary Response Panel Models, with an Application to the Hospital Readmissions Reduction Program," IZA Discussion Papers 11182, Institute of Labor Economics (IZA).
Fiona Burlig & Christopher Knittel & David Rapson & Mar Reguant & Catherine Wolfram, 2020. "Machine Learning from Schools about Energy Efficiency," Journal of the Association of Environmental and Resource Economists, University of Chicago Press, vol. 7(6), pages 1181-1217.
- Fiona Burlig & Christopher Knittel & David Rapson & Mar Reguant & Catherine Wolfram, 2017. "Machine Learning from Schools about Energy Efficiency," NBER Working Papers 23908, National Bureau of Economic Research, Inc.
James Habyarimana & Stuti Khemani & Thiago Scot, 2023. "The importance of political selection for bureaucratic effectiveness," Economica, London School of Economics and Political Science, vol. 90(359), pages 746-779, July.
David Easley & Eleonora Patacchini & Christopher Rojas, 2020. "Multidimensional diffusion processes in dynamic online networks," PLOS ONE, Public Library of Science, vol. 15(2), pages 1-21, February.
- David Easley & Eleonora Patacchini & Christopher Rojas, 2019. "Multidimensional Diffusion Processes in Dynamic Online Networks," EIEF Working Papers Series 1912, Einaudi Institute for Economics and Finance (EIEF), revised Jul 2019.
Stéphane Bonhomme & Martin Weidner, 2019. "Posterior average effects," CeMMAP working papers CWP43/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Stéphane Bonhomme & Martin Weidner, 2019. "Posterior Average Effects," Papers 1906.06360, arXiv.org, revised Sep 2021.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-ECM-2016-04-16 (Econometrics)

