Printed from https://ideas.repec.org/p/ifs/cemmap/77-13.html

Program evaluation with high-dimensional data

Author

Listed:
  • Alexandre Belloni

    (Institute for Fiscal Studies)

  • Victor Chernozhukov

    (Institute for Fiscal Studies and MIT)

  • Ivan Fernandez-Val

    (Institute for Fiscal Studies and Boston University)

  • Christian Hansen

    (Institute for Fiscal Studies and Chicago GSB)

Abstract

In the first part of the paper, we consider estimation and inference on policy-relevant treatment effects, such as local average and local quantile treatment effects, in a data-rich environment where there may be many more control variables available than there are observations. In addition to allowing many control variables, the setting we consider allows endogenous receipt of treatment, heterogeneous treatment effects, and function-valued outcomes. To make informative inference possible, we assume that some reduced form predictive relationships are approximately sparse. That is, we require that the relationship between the control variables and the outcome, treatment status, and instrument status can be captured up to a small approximation error using a small number of the control variables whose identities are unknown to the researcher. This condition allows estimation and inference for a wide variety of treatment parameters to proceed after selection of an appropriate set of controls formed by selecting control variables separately for each reduced form relationship and then appropriately combining these reduced form relationships. We provide conditions under which post-selection inference is uniformly valid across a wide range of models and show that a key condition underlying the uniform validity of post-selection inference allowing for imperfect model selection is the use of approximately unbiased estimating equations. We illustrate the use of the proposed methods with an application to estimating the effect of 401(k) participation on accumulated assets.

In the second part of the paper, we present a generalization of the treatment effect framework to a much richer setting, where possibly a continuum of target parameters is of interest and the Lasso-type or post-Lasso type methods are used to estimate a continuum of high-dimensional nuisance functions. This framework encompasses the analysis of local treatment effects as a leading special case and also covers a wide variety of classical and modern moment-condition problems in econometrics. We establish a functional central limit theorem for the continuum of the target parameters, and show that it holds uniformly in a wide range of data-generating processes P, with continua of approximately sparse nuisance functions. We also establish validity of the multiplier bootstrap for resampling the first order approximations to the standardized continuum of the estimators, and show that this validity holds uniformly in P. We propose a notion of the functional delta method for finding the limit distribution and multiplier bootstrap of the smooth functionals of the target parameters that is valid uniformly in P. Finally, we establish rate and consistency results for continua of Lasso or post-Lasso type methods for estimating continua of the (nuisance) regression functions, also providing practical, theoretically justified penalty choices. Each of these results is new and could be of independent interest.
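To make the first part of the abstract concrete, the following is a minimal sketch, not the authors' implementation, of a local average treatment effect estimate built from separate post-Lasso fits for the instrument, outcome, and treatment reduced forms, combined through a doubly robust (approximately unbiased) moment. Everything here is an illustrative assumption: the synthetic data, the function names `lasso_cd`, `post_lasso`, and `late_orthogonal`, the heuristic penalty level, and the use of linear probability fits with clipping in place of a logistic Lasso.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100, tol=1e-8):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # residual y - X b, updated in place
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                r -= X[:, j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b

def post_lasso(X, y, c=1.1):
    """Select controls by Lasso, refit by OLS on the selected set."""
    n, p = X.shape
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc, yc = X - mu_x, y - mu_y
    # Heuristic penalty (an assumption, not the paper's data-driven choice).
    lam = c * yc.std() * np.sqrt(2 * np.log(2 * p) / n)
    support = np.flatnonzero(lasso_cd(Xc, yc, lam))
    coef = np.zeros(p)
    if support.size:
        coef[support] = np.linalg.lstsq(Xc[:, support], yc, rcond=None)[0]
    return lambda X_new: mu_y + (X_new - mu_x) @ coef

def late_orthogonal(Y, D, Z, X):
    """LATE as a ratio of doubly robust effects of Z on Y and on D."""
    m = np.clip(post_lasso(X, Z)(X), 0.05, 0.95)  # instrument propensity
    z1, z0 = Z == 1, Z == 0

    def score(V):
        g1 = post_lasso(X[z1], V[z1])(X)  # E[V | Z=1, X]
        g0 = post_lasso(X[z0], V[z0])(X)  # E[V | Z=0, X]
        return g1 - g0 + Z * (V - g1) / m - (1 - Z) * (V - g0) / (1 - m)

    psi_y, psi_d = score(Y), score(D)
    theta = psi_y.mean() / psi_d.mean()
    se = np.std(psi_y - theta * psi_d) / (abs(psi_d.mean()) * np.sqrt(len(Y)))
    return theta, se

# Synthetic data: 50 controls, only two matter; true effect is 1.0.
rng = np.random.default_rng(0)
n, p = 4000, 50
X = rng.standard_normal((n, p))
Z = (rng.random(n) < 0.5 + 0.15 * np.clip(X[:, 0], -2, 2)).astype(float)
u = rng.standard_normal(n)  # unobserved confounder, makes D endogenous
D = (1.5 * Z + 0.5 * X[:, 1] + u > 0.75).astype(float)
Y = 1.0 * D + X[:, 0] + X[:, 1] + u + rng.standard_normal(n)

theta_hat, se_hat = late_orthogonal(Y, D, Z, X)
print(f"LATE estimate: {theta_hat:.3f} (se {se_hat:.3f})")
```

The design point the sketch tries to show is that the controls are selected separately for each reduced form and only then combined in a moment whose first-order sensitivity to errors in those fits is zero, which is what allows moderate selection mistakes without invalidating the standard error. The uniform validity results, the treatment of function-valued outcomes, and the multiplier bootstrap described in the abstract are not reproduced here.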

Suggested Citation

  • Alexandre Belloni & Victor Chernozhukov & Ivan Fernandez-Val & Christian Hansen, 2013. "Program evaluation with high-dimensional data," CeMMAP working papers CWP77/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
  • Handle: RePEc:ifs:cemmap:77/13

    Download full text from publisher

    File URL: http://www.cemmap.ac.uk/wps/cwp771313.pdf
    Download Restriction: no

    Citations

    Cited by:

    1. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2019. "Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 749-758, April.
    2. Christian Hansen & Damian Kozbur & Sanjog Misra, 2016. "Targeted undersmoothing," ECON - Working Papers 282, Department of Economics - University of Zurich, revised Apr 2018.
    3. Alexandre Belloni & Victor Chernozhukov & Lie Wang, 2013. "Pivotal estimation via square-root lasso in nonparametric regression," CeMMAP working papers CWP62/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    4. Hansen, Christian & Liao, Yuan, 2019. "The Factor-Lasso And K-Step Bootstrap Approach For Inference In High-Dimensional Economic Applications," Econometric Theory, Cambridge University Press, vol. 35(3), pages 465-509, June.
    5. Alexandre Belloni & Victor Chernozhukov & Christian Hansen & Damian Kozbur, 2016. "Inference in High-Dimensional Panel Models With an Application to Gun Control," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 590-605, October.
    6. Damian Kozbur, 2017. "Testing-Based Forward Model Selection," American Economic Review, American Economic Association, vol. 107(5), pages 266-269, May.
    7. Matias D. Cattaneo & Michael Jansson, 2014. "Bootstrapping Kernel-Based Semiparametric Estimators," CREATES Research Papers 2014-25, Department of Economics and Business Economics, Aarhus University.
    8. Danquah, Michael & Iddrisu, Abdul Malik & Boakye, Ernest Owusu & Owusu, Solomon, 2021. "Do gender wage differences within households influence women's empowerment and welfare? Evidence from Ghana," Journal of Economic Behavior & Organization, Elsevier, vol. 188(C), pages 916-932.
    9. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    10. Ning Xu & Jian Hong & Timothy C. G. Fisher, 2016. "Finite-sample and asymptotic analysis of generalization ability with an application to penalized regression," Papers 1609.03344, arXiv.org, revised Sep 2016.
    11. Huber, Martin & Wüthrich, Kaspar, 2017. "Evaluating local average and quantile treatment effects under endogeneity based on instruments: a review," FSES Working Papers 479, Faculty of Economics and Social Sciences, University of Freiburg/Fribourg Switzerland.
    12. Denis Chetverikov & . ., 2016. "On cross-validated Lasso," CeMMAP working papers 47/16, Institute for Fiscal Studies.
    13. Farrell, Max H., 2015. "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, Elsevier, vol. 189(1), pages 1-23.
    14. Kasy Maximilian, 2019. "Uniformity and the Delta Method," Journal of Econometric Methods, De Gruyter, vol. 8(1), pages 1-19, January.
    15. Shao, Shuai & Xu, Le & Yang, Lili & Yu, Dianfan, 2024. "How do energy-saving policies improve environmental quality: Evidence from China’s Top 10,000 energy-consuming enterprises program," World Development, Elsevier, vol. 175(C).
    16. Guo, Zijian & Kang, Hyunseung & Cai, T. Tony & Small, Dylan S., 2018. "Testing endogeneity with high dimensional covariates," Journal of Econometrics, Elsevier, vol. 207(1), pages 175-187.
    17. Denis Chetverikov & . ., 2016. "On cross-validated Lasso," CeMMAP working papers CWP47/16, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    18. Braverman, Mark & Chassang, Sylvain, 2022. "Data-driven incentive alignment in capitation schemes," Journal of Public Economics, Elsevier, vol. 207(C).
    19. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    20. Victor Chernozhukov & Vira Semenova, 2018. "Simultaneous inference for Best Linear Predictor of the Conditional Average Treatment Effect and other structural functions," CeMMAP working papers CWP40/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    21. Kaspar Wüthrich, 2015. "Semiparametric estimation of quantile treatment effects with endogeneity," Diskussionsschriften dp1509, Universitaet Bern, Departement Volkswirtschaft.
    22. Susan Athey & Guido W. Imbens, 2017. "The State of Applied Econometrics: Causality and Policy Evaluation," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 3-32, Spring.
    23. Thai T. Pham & Yuanyuan Shen, 2017. "A Deep Causal Inference Approach to Measuring the Effects of Forming Group Loans in Online Non-profit Microfinance Platform," Papers 1706.02795, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. A. Belloni & V. Chernozhukov & I. Fernández‐Val & C. Hansen, 2017. "Program Evaluation and Causal Inference With High‐Dimensional Data," Econometrica, Econometric Society, vol. 85, pages 233-298, January.
    2. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    3. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2016. "Double/Debiased Machine Learning for Treatment and Causal Parameters," Papers 1608.00060, arXiv.org, revised Nov 2024.
    4. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney K. Newey, 2016. "Double machine learning for treatment and causal parameters," CeMMAP working papers 49/16, Institute for Fiscal Studies.
    5. Yu, Ping & Phillips, Peter C.B., 2018. "Threshold regression with endogeneity," Journal of Econometrics, Elsevier, vol. 203(1), pages 50-68.
    6. Farrell, Max H., 2015. "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, Elsevier, vol. 189(1), pages 1-23.
    7. Chen, Xiaohong, 2007. "Large Sample Sieve Estimation of Semi-Nonparametric Models," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 76, Elsevier.
    8. Victor Chernozhukov & Juan Carlos Escanciano & Hidehiko Ichimura & Whitney K. Newey & James M. Robins, 2022. "Locally Robust Semiparametric Estimation," Econometrica, Econometric Society, vol. 90(4), pages 1501-1535, July.
    9. Haitian Xie, 2020. "Efficient and Robust Estimation of the Generalized LATE Model," Papers 2001.06746, arXiv.org, revised Feb 2022.
    10. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    11. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "High-Dimensional Methods and Inference on Structural and Treatment Effects," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 29-50, Spring.
    12. Firpo, Sergio Pinheiro & Pinto, Rafael de Carvalho Cayres, 2012. "Combining Strategies for the Estimation of Treatment Effects," Brazilian Review of Econometrics, Sociedade Brasileira de Econometria - SBE, vol. 32(1), March.
    13. Agboola, Oluwagbenga David & Yu, Han, 2023. "Neighborhood-based cross fitting approach to treatment effects with high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
    14. Dong, Chaohua & Gao, Jiti & Linton, Oliver, 2023. "High dimensional semiparametric moment restriction models," Journal of Econometrics, Elsevier, vol. 232(2), pages 320-345.
    15. Carneiro, Pedro & Lee, Sokbae, 2009. "Estimating distributions of potential outcomes using local instrumental variables with an application to changes in college enrollment and wage inequality," Journal of Econometrics, Elsevier, vol. 149(2), pages 191-208, April.
    16. Phillip Heiler, 2020. "Efficient Covariate Balancing for the Local Average Treatment Effect," Papers 2007.04346, arXiv.org.
    17. Wüthrich, Kaspar, 2019. "A closed-form estimator for quantile treatment effects with endogeneity," Journal of Econometrics, Elsevier, vol. 210(2), pages 219-235.
    18. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355.
    19. Sant’Anna, Pedro H.C. & Zhao, Jun, 2020. "Doubly robust difference-in-differences estimators," Journal of Econometrics, Elsevier, vol. 219(1), pages 101-122.
    20. Halbert White & Karim Chalak, 2013. "Identification and Identification Failure for Treatment Effects Using Structural Systems," Econometric Reviews, Taylor & Francis Journals, vol. 32(3), pages 273-317, November.
