Printed from https://ideas.repec.org/p/arx/papers/1903.08704.html

Omitted variable bias of Lasso-based inference methods: A finite sample analysis

Author

Listed:
  • Kaspar Wuthrich
  • Ying Zhu

Abstract

We study the finite sample behavior of Lasso-based inference methods such as post double Lasso and debiased Lasso. We show that these methods can exhibit substantial omitted variable biases (OVBs) due to Lasso not selecting relevant controls. This phenomenon can occur even when the coefficients are sparse and the sample size is large and larger than the number of controls. Therefore, relying on the existing asymptotic inference theory can be problematic in empirical applications. We compare the Lasso-based inference methods to modern high-dimensional OLS-based methods and provide practical guidance.
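The under-selection mechanism described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' design: the data-generating process, the coefficient sizes (`beta`, `gamma`), the hand-picked penalty levels, and all function names (`lasso_cd`, `post_double_lasso`, `simulate`) are assumptions made for illustration. It implements a plain coordinate-descent Lasso in NumPy, selects controls from the outcome and treatment equations, and then runs OLS of the outcome on the treatment and the union of selected controls.

```python
import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator used in the Lasso coordinate update.
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.
    # Data are centered, so no intercept is penalised.
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    r = y - y.mean()              # residual for the current b (b starts at 0)
    b = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = Xc[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != b[j]:
                r -= Xc[:, j] * (new - b[j])
                b[j] = new
    return b

def post_double_lasso(y, d, X, lam_y, lam_d):
    # Step 1: Lasso of y on X. Step 2: Lasso of d on X.
    # Step 3: OLS of y on d and the union of the selected controls.
    sel_y = np.flatnonzero(lasso_cd(X, y, lam_y))
    sel_d = np.flatnonzero(lasso_cd(X, d, lam_d))
    sel = np.union1d(sel_y, sel_d)
    Z = np.column_stack([np.ones(len(y)), d, X[:, sel]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1], sel

def simulate(n=500, p=50, alpha=0.0, beta=0.15, gamma=0.15, seed=0):
    # Hypothetical sparse design: only the first control matters,
    # and it enters both the treatment and the outcome equation.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    d = gamma * X[:, 0] + rng.standard_normal(n)
    y = alpha * d + beta * X[:, 0] + rng.standard_normal(n)
    return y, d, X

if __name__ == "__main__":
    y, d, X = simulate()
    for lam in (0.3, 0.001):  # illustrative penalties, not a data-driven rule
        est, sel = post_double_lasso(y, d, X, lam, lam)
        print(f"lam={lam}: alpha_hat={est:.3f}, controls selected={len(sel)}")
```

When the penalty is large relative to `beta` and `gamma`, neither Lasso step selects the relevant control, so the final regression reduces to OLS of the outcome on the treatment alone and inherits the omitted variable bias. This happens even though the model is sparse and `n` exceeds `p`. With a very small penalty the control is selected, at the cost of including many irrelevant ones.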

Suggested Citation

  • Kaspar Wuthrich & Ying Zhu, 2019. "Omitted variable bias of Lasso-based inference methods: A finite sample analysis," Papers 1903.08704, arXiv.org, revised Sep 2021.
  • Handle: RePEc:arx:papers:1903.08704

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1903.08704
    File Function: Latest version
    Download Restriction: no


    Citations



    Cited by:

    1. Breinlich, Holger & Corradi, Valentina & Rocha, Nadia & Ruta, Michele & Silva, J.M.C. Santos & Zylkin, Tom, 2021. "Machine learning in international trade research - evaluating the impact of trade agreements," LSE Research Online Documents on Economics 114379, London School of Economics and Political Science, LSE Library.
    2. Ahrens, Achim & Hansen, Christian B. & Schaffer, Mark E & Wiemann, Thomas, 2024. "Model Averaging and Double Machine Learning," IZA Discussion Papers 16714, Institute of Labor Economics (IZA).
    3. Joshua D. Angrist, 2022. "Empirical Strategies in Economics: Illuminating the Path From Cause to Effect," Econometrica, Econometric Society, vol. 90(6), pages 2509-2539, November.
    4. Masayuki Hirukawa & Di Liu & Irina Murtazashvili & Artem Prokhorov, 2023. "DS-HECK: double-lasso estimation of Heckman selection model," Empirical Economics, Springer, vol. 64(6), pages 3167-3195, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    2. Yu, Ping & Phillips, Peter C.B., 2018. "Threshold regression with endogeneity," Journal of Econometrics, Elsevier, vol. 203(1), pages 50-68.
    3. Victor Chernozhukov & Ivan Fernandez-Val & Christian Hansen, 2013. "Program evaluation with high-dimensional data," CeMMAP working papers CWP57/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    4. Wüthrich, Kaspar, 2019. "A closed-form estimator for quantile treatment effects with endogeneity," Journal of Econometrics, Elsevier, vol. 210(2), pages 219-235.
    5. Philipp Bach & Victor Chernozhukov & Malte S. Kurz & Martin Spindler & Sven Klaassen, 2021. "DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R," Papers 2103.09603, arXiv.org, revised Feb 2024.
    6. Hiroaki Kaido & Kaspar Wüthrich, 2021. "Decentralization estimators for instrumental variable quantile regression models," Quantitative Economics, Econometric Society, vol. 12(2), pages 443-475, May.
    7. Kaspar Wüthrich, 2015. "Semiparametric estimation of quantile treatment effects with endogeneity," Diskussionsschriften dp1509, Universitaet Bern, Departement Volkswirtschaft.
    8. Agboola, Oluwagbenga David & Yu, Han, 2023. "Neighborhood-based cross fitting approach to treatment effects with high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
    9. A. Belloni & V. Chernozhukov & I. Fernández‐Val & C. Hansen, 2017. "Program Evaluation and Causal Inference With High‐Dimensional Data," Econometrica, Econometric Society, vol. 85, pages 233-298, January.
    10. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    11. Michael C Knaus & Michael Lechner & Anthony Strittmatter, 2021. "Machine learning estimation of heterogeneous causal effects: Empirical Monte Carlo evidence," The Econometrics Journal, Royal Economic Society, vol. 24(1), pages 134-161.
    12. Hansen, Christian & Liao, Yuan, 2019. "The Factor-Lasso And K-Step Bootstrap Approach For Inference In High-Dimensional Economic Applications," Econometric Theory, Cambridge University Press, vol. 35(3), pages 465-509, June.
    13. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney K. Newey, 2016. "Double machine learning for treatment and causal parameters," CeMMAP working papers 49/16, Institute for Fiscal Studies.
    14. Zongwu Cai & Ying Fang & Ming Lin & Shengfang Tang, 2020. "Testing Unconfoundedness Assumption Using Auxiliary Variables," WORKING PAPERS SERIES IN THEORETICAL AND APPLIED ECONOMICS 202004, University of Kansas, Department of Economics, revised Feb 2020.
    15. Yang Ning & Sida Peng & Jing Tao, 2020. "Doubly Robust Semiparametric Difference-in-Differences Estimators with High-Dimensional Data," Papers 2009.03151, arXiv.org.
    16. Matias D Cattaneo & Michael Jansson & Xinwei Ma, 2019. "Two-Step Estimation and Inference with Possibly Many Included Covariates," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 86(3), pages 1095-1122.
    17. Damian Kozbur, 2017. "Testing-Based Forward Model Selection," American Economic Review, American Economic Association, vol. 107(5), pages 266-269, May.
    18. Simon Calmar Andersen & Louise Beuchert & Phillip Heiler & Helena Skyt Nielsen, 2023. "A Guide to Impact Evaluation under Sample Selection and Missing Data: Teacher's Aides and Adolescent Mental Health," Papers 2308.04963, arXiv.org.
    19. Adamek, Robert & Smeekes, Stephan & Wilms, Ines, 2023. "Lasso inference for high-dimensional time series," Journal of Econometrics, Elsevier, vol. 235(2), pages 1114-1143.
    20. Timothy B. Armstrong & Michal Kolesár & Soonwoo Kwon, 2020. "Bias-Aware Inference in Regularized Regression Models," Working Papers 2020-2, Princeton University. Economics Department.
