Printed from https://ideas.repec.org/p/arx/papers/2505.13422.html

Machine learning the first stage in 2SLS: Practical guidance from bias decomposition and simulation

Author

Listed:
  • Connor Lennon
  • Edward Rubin
  • Glen Waddell

Abstract

Machine learning (ML) primarily evolved to solve "prediction problems." The first stage of two-stage least squares (2SLS) is a prediction problem, suggesting potential gains from ML first-stage assistance. However, little guidance exists on when ML helps 2SLS or when it hurts. We investigate the implications of inserting ML into 2SLS, decomposing the bias into three informative components. Mechanically, ML-in-2SLS procedures face issues common to prediction and causal-inference settings, as well as their interaction. Through simulation, we show linear ML methods (e.g., post-Lasso) work well, while nonlinear methods (e.g., random forests, neural nets) generate substantial bias in second-stage estimates, potentially exceeding the bias of endogenous OLS.
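
To make the setup concrete, the following is a minimal sketch of the first-stage/second-stage mechanics the abstract refers to. It is not the authors' simulation design or their bias decomposition: the data-generating process, the coefficient values, and the function names (simulate, slope, linear_first_stage) are all assumptions chosen for illustration. It compares endogenous OLS, 2SLS with a linear first stage, and a deliberately extreme stand-in for an overfit first stage that memorizes the endogenous regressor in sample.

```python
import numpy as np


def simulate(n=5000, beta=1.0, seed=0):
    """Hypothetical DGP: three valid instruments z, one endogenous regressor x."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 3))          # instruments, independent of u
    u = rng.normal(size=n)               # unobserved confounder
    x = z @ np.array([1.0, 0.5, 0.25]) + u + rng.normal(size=n)
    y = beta * x + 2.0 * u + rng.normal(size=n)
    return z, x, y


def slope(regressor, y):
    """Slope coefficient from a least-squares regression of y on [1, regressor]."""
    X = np.column_stack([np.ones_like(regressor), regressor])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def linear_first_stage(z, x):
    """Fitted values from a least-squares regression of x on [1, z]."""
    Z = np.column_stack([np.ones(len(x)), z])
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    return Z @ gamma


z, x, y = simulate()

# Endogenous OLS: biased upward here because x and y share the confounder u.
b_ols = slope(x, y)

# 2SLS with a linear first stage: regress y on the first-stage fitted values.
b_2sls = slope(linear_first_stage(z, x), y)

# Extreme stand-in for an overfit learner: in-sample predictions equal x itself,
# so the "second stage" is numerically the same regression as OLS.
b_overfit = slope(x.copy(), y)

print(f"true beta = 1.0, OLS = {b_ols:.3f}, 2SLS = {b_2sls:.3f}, "
      f"interpolating first stage = {b_overfit:.3f}")
```

Under these assumptions, the linear first stage recovers a slope close to the true value, while the interpolating first stage returns the OLS estimate exactly, since fitted values that reproduce x carry all of its endogenous variation into the second stage. This is only one simple mechanism by which a flexible first stage can reintroduce bias; the paper's three-component decomposition should be taken from the full text.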

Suggested Citation

  • Connor Lennon & Edward Rubin & Glen Waddell, 2025. "Machine learning the first stage in 2SLS: Practical guidance from bias decomposition and simulation," Papers 2505.13422, arXiv.org.
  • Handle: RePEc:arx:papers:2505.13422

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2505.13422
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. James J. Heckman & Sergio Urzua & Edward Vytlacil, 2006. "Understanding Instrumental Variables in Models with Essential Heterogeneity," The Review of Economics and Statistics, MIT Press, vol. 88(3), pages 389-432, August.
    2. Joshua D. Angrist & Alan B. Krueger, 2001. "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments," Journal of Economic Perspectives, American Economic Association, vol. 15(4), pages 69-85, Fall.
    3. David S. Lee & Justin McCrary & Marcelo J. Moreira & Jack Porter, 2022. "Valid t-Ratio Inference for IV," American Economic Review, American Economic Association, vol. 112(10), pages 3260-3290, October.
    4. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    5. Carrasco, Marine & Tchuente, Guy, 2015. "Regularized LIML for many instruments," Journal of Econometrics, Elsevier, vol. 186(2), pages 427-442.
    6. James J. Heckman & Edward Vytlacil, 2005. "Structural Equations, Treatment Effects, and Econometric Policy Evaluation," Econometrica, Econometric Society, vol. 73(3), pages 669-738, May.
    7. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    8. Ng, Serena & Bai, Jushan, 2009. "Selecting Instrumental Variables in a Data Rich Environment," Journal of Time Series Econometrics, De Gruyter, vol. 1(1), pages 1-34, April.
    9. Victor Chernozhukov & Christian Hansen & Martin Spindler, 2015. "Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach," Annual Review of Economics, Annual Reviews, vol. 7(1), pages 649-688, August.
    10. Hansen, Christian & Kozbur, Damian, 2014. "Instrumental variables estimation with many weak instruments using regularized JIVE," Journal of Econometrics, Elsevier, vol. 182(2), pages 290-308.
    11. Daniel A. Ackerberg & Paul J. Devereux, 2009. "Improved JIVE Estimators for Overidentified Linear Models with and without Heteroskedasticity," The Review of Economics and Statistics, MIT Press, vol. 91(2), pages 351-362, May.
    12. Jeffrey M Wooldridge, 2010. "Econometric Analysis of Cross Section and Panel Data," MIT Press Books, The MIT Press, edition 2, volume 1, number 0262232588, December.
    13. Biewen, Martin & Kugler, Philipp, 2021. "Two-stage least squares random forests with an application to Angrist and Evans (1998)," Economics Letters, Elsevier, vol. 204(C).
    14. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    15. Angrist, Joshua D & Krueger, Alan B, 1995. "Split-Sample Instrumental Variables Estimates of the Return to Schooling," Journal of Business & Economic Statistics, American Statistical Association, vol. 13(2), pages 225-235, April.
    16. Fuller, Wayne A, 1977. "Some Properties of a Modification of the Limited Information Estimator," Econometrica, Econometric Society, vol. 45(4), pages 939-953, May.
    17. Winkelried, D. & Smith, R.J., 2011. "Principal Components Instrumental Variable Estimation," Cambridge Working Papers in Economics 1119, Faculty of Economics, University of Cambridge.
    18. A. Belloni & D. Chen & V. Chernozhukov & C. Hansen, 2012. "Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain," Econometrica, Econometric Society, vol. 80(6), pages 2369-2429, November.
    19. Joshua D. Angrist & Jörn-Steffen Pischke, 2009. "Mostly Harmless Econometrics: An Empiricist's Companion," Economics Books, Princeton University Press, edition 1, number 8769.
    20. Joshua Angrist & Alan Krueger, 2001. "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments," Working Papers 834, Princeton University, Department of Economics, Industrial Relations Section.
    21. Angrist, J D & Imbens, G W & Krueger, A B, 1999. "Jackknife Instrumental Variables Estimation," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 14(1), pages 57-67, Jan.-Feb.
    22. Hansen, Christian & Hausman, Jerry & Newey, Whitney, 2008. "Estimation With Many Instrumental Variables," Journal of Business & Economic Statistics, American Statistical Association, vol. 26, pages 398-422.
    23. Ellora Derenoncourt, 2022. "Can You Move to Opportunity? Evidence from the Great Migration," American Economic Review, American Economic Association, vol. 112(2), pages 369-408, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Luis Antonio Fantozzi Alvarez & Rodrigo Toneto, 2024. "The interpretation of 2SLS with a continuous instrument: a weighted LATE representation," Working Papers, Department of Economics 2024_11, University of São Paulo (FEA-USP).
    2. Mogstad, Magne & Torgovitsky, Alexander, 2024. "Instrumental variables with unobserved heterogeneity in treatment effects," Handbook of Labor Economics, Elsevier.
    3. Alvarez, Luis A.F. & Toneto, Rodrigo, 2024. "The interpretation of 2SLS with a continuous instrument: A weighted LATE representation," Economics Letters, Elsevier, vol. 237(C).
    4. Lim, Dennis & Wang, Wenjie & Zhang, Yichong, 2024. "A conditional linear combination test with many weak instruments," Journal of Econometrics, Elsevier, vol. 238(2).
    5. Dennis Lim & Wenjie Wang & Yichong Zhang, 2022. "A Conditional Linear Combination Test with Many Weak Instruments," Papers 2207.11137, arXiv.org, revised Apr 2023.
    6. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2010. "LASSO Methods for Gaussian Instrumental Variables Models," Papers 1012.1297, arXiv.org, revised Feb 2011.
    7. Arne Henningsen & Guy Low & David Wuepper & Tobias Dalhaus & Hugo Storm & Dagim Belay & Stefan Hirsch, 2024. "Estimating Causal Effects with Observational Data: Guidelines for Agricultural and Applied Economists," IFRO Working Paper 2024/03, University of Copenhagen, Department of Food and Resource Economics.
    8. A. Belloni & D. Chen & V. Chernozhukov & C. Hansen, 2012. "Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain," Econometrica, Econometric Society, vol. 80(6), pages 2369-2429, November.
    9. Michal Kolesár, 2013. "Estimation in an Instrumental Variables Model With Treatment Effect Heterogeneity," Working Papers 2013-2, Princeton University. Economics Department.
    10. Matsushita, Yukitoshi & Otsu, Taisuke, 2024. "A jackknife Lagrange multiplier test with many weak instruments," LSE Research Online Documents on Economics 116392, London School of Economics and Political Science, LSE Library.
    11. Guilhem Bascle, 2008. "Controlling for endogeneity with instrumental variables in strategic management research," Post-Print hal-00576795, HAL.
    12. Michael P. Murray, 2006. "Avoiding Invalid Instruments and Coping with Weak Instruments," Journal of Economic Perspectives, American Economic Association, vol. 20(4), pages 111-132, Fall.
    13. Marine Carrasco & Guy Tchuente, 2016. "Efficient Estimation with Many Weak Instruments Using Regularization Techniques," Econometric Reviews, Taylor & Francis Journals, vol. 35(8-10), pages 1609-1637, December.
    14. Matthias Westphal & Daniel A Kamhöfer & Hendrik Schmitz, 2022. "Marginal College Wage Premiums Under Selection Into Employment," The Economic Journal, Royal Economic Society, vol. 132(646), pages 2231-2272.
    15. Tom Boot & Didier Nibbering, 2024. "Inference on LATEs with covariates," Papers 2402.12607, arXiv.org, revised Nov 2024.
    16. Victor Chernozhukov & Ivan Fernandez-Val & Chen Huang & Weining Wang, 2024. "Arellano-bond lasso estimator for dynamic linear panel models," CeMMAP working papers 09/24, Institute for Fiscal Studies.
    17. Eric Gautier & Christiern Rose, 2022. "Fast, Robust Inference for Linear Instrumental Variables Models using Self-Normalized Moments," Papers 2211.02249, arXiv.org, revised Nov 2022.
    18. Michael T. French & Ioana Popovici, 2011. "That instrument is lousy! In search of agreement when using instrumental variables estimation in substance use research," Health Economics, John Wiley & Sons, Ltd., vol. 20(2), pages 127-146, February.
    19. Abadie, Alberto & Gu, Jiaying & Shen, Shu, 2024. "Instrumental variable estimation with first-stage heterogeneity," Journal of Econometrics, Elsevier, vol. 240(2).
    20. Thomas Wiemann, 2023. "Optimal Categorical Instrumental Variables," Papers 2311.17021, arXiv.org, revised May 2024.
