IDEAS home Printed from https://ideas.repec.org/p/arx/papers/1712.04802.html

Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India

Author

Listed:
  • Victor Chernozhukov
  • Mert Demirer
  • Esther Duflo
  • Iv'an Fern'andez-Val

Abstract

We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high dimensional settings, where the effects are proxied (but not necessarily consistently estimated) by predictive and causal machine learning methods. We post-process these proxies into estimates of the key features. Our approach is generic, it can be used in conjunction with penalized methods, neural networks, random forests, boosted trees, and ensemble methods, both predictive and causal. Estimation and inference are based on repeated data splitting to avoid overfitting and achieve validity. We use quantile aggregation of the results across many potential splits, in particular taking medians of p-values and medians and other quantiles of confidence intervals. We show that quantile aggregation lowers estimation risks over a single split procedure, and establish its principal inferential properties. Finally, our analysis reveals ways to build provably better machine learning proxies through causal learning: we can use the objective functions that we develop to construct the best linear predictors of the effects, to obtain better machine learning proxies in the initial step. We illustrate the use of both inferential tools and causal learners with a randomized field experiment that evaluates a combination of nudges to stimulate demand for immunization in India.

Suggested Citation

  • Victor Chernozhukov & Mert Demirer & Esther Duflo & Iv'an Fern'andez-Val, 2017. "Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India," Papers 1712.04802, arXiv.org, revised Oct 2023.
  • Handle: RePEc:arx:papers:1712.04802
    as

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1712.04802
    File Function: Latest version
    Download Restriction: no
    ---><---

    Other versions of this item:

    References listed on IDEAS

    as
    1. Victor Chernozhukov & Iván Fernández‐Val & Ye Luo, 2018. "The Sorted Effects Method: Discovering Heterogeneous Effects Beyond Their Averages," Econometrica, Econometric Society, vol. 86(6), pages 1911-1938, November.
    2. Gneezy, Uri & Rustichini, Aldo, 2000. "A Fine is a Price," The Journal of Legal Studies, University of Chicago Press, vol. 29(1), pages 1-17, January.
    3. Tatyana Deryugina & Garth Heutel & Nolan H. Miller & David Molitor & Julian Reif, 2019. "The Mortality and Medical Costs of Air Pollution: Evidence from Changes in Wind Direction," American Economic Review, American Economic Association, vol. 109(12), pages 4178-4219, December.
    4. Victor Chernozhukov & Kaspar Wüthrich & Yinchu Zhu, 2021. "An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1849-1864, October.
    5. Meinshausen, Nicolai & Meier, Lukas & Bühlmann, Peter, 2009. "p-Values for High-Dimensional Regression," Journal of the American Statistical Association, American Statistical Association, vol. 104(488), pages 1671-1681.
    6. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    7. Wright, Marvin N. & Ziegler, Andreas, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).
    8. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    9. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "Inference on Treatment Effects after Selection among High-Dimensional Controlsâ€," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 81(2), pages 608-650.
    10. Alberto Abadie, 2005. "Semiparametric Difference-in-Differences Estimators," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 72(1), pages 1-19.
    11. Duflo, Esther & Glennerster, Rachel & Kremer, Michael, 2008. "Using Randomization in Development Economics Research: A Toolkit," Handbook of Development Economics, in: T. Paul Schultz & John A. Strauss (ed.), Handbook of Development Economics, edition 1, volume 4, chapter 61, pages 3895-3962, Elsevier.
    12. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2013. "Uniform post selection inference for LAD regression models," CeMMAP working papers CWP24/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    13. Christian Hansen & Damian Kozbur & Sanjog Misra, 2016. "Targeted undersmoothing," ECON - Working Papers 282, Department of Economics - University of Zurich, revised Apr 2018.
    14. Jonathan M.V. Davis & Sara B. Heller, 2020. "Rethinking the Benefits of Youth Employment Programs: The Heterogeneous Effects of Summer Jobs," The Review of Economics and Statistics, MIT Press, vol. 102(4), pages 664-677, October.
    15. Imbens,Guido W. & Rubin,Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881, Enero-Abr.
    16. Michael Zimmert & Michael Lechner, 2019. "Nonparametric estimation of causal heterogeneity under high-dimensional confounding," Papers 1908.08779, arXiv.org.
    17. Keisuke Hirano & Guido W. Imbens & Geert Ridder, 2003. "Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score," Econometrica, Econometric Society, vol. 71(4), pages 1161-1189, July.
    18. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2013. "Uniform Post Selection Inference for LAD Regression and Other Z-estimation problems," Papers 1304.0282, arXiv.org, revised Oct 2020.
    19. Reshmaan Hussam & Natalia Rigol & Benjamin N. Roth, 2022. "Targeting High Ability Entrepreneurs Using Community Information: Mechanism Design in the Field," American Economic Review, American Economic Association, vol. 112(3), pages 861-898, March.
    20. Guido W. Imbens, 2004. "Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review," The Review of Economics and Statistics, MIT Press, vol. 86(1), pages 4-29, February.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Victor Chernozhukov & Mert Demirer & Esther Duflo & Ivan Fernandez-Val, 2017. "Generic machine learning inference on heterogenous treatment effects in randomized experiments," CeMMAP working papers 61/17, Institute for Fiscal Studies.
    2. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-Dimensional Econometrics and Regularized GMM," Papers 1806.01888, arXiv.org, revised Jun 2018.
    3. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    4. Michael Zimmert & Michael Lechner, 2019. "Nonparametric estimation of causal heterogeneity under high-dimensional confounding," Papers 1908.08779, arXiv.org.
    5. Athey, Susan & Imbens, Guido W. & Metzger, Jonas & Munro, Evan, 2024. "Using Wasserstein Generative Adversarial Networks for the design of Monte Carlo simulations," Journal of Econometrics, Elsevier, vol. 240(2).
    6. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    7. Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.
    8. Michael C Knaus & Michael Lechner & Anthony Strittmatter, 2021. "Machine learning estimation of heterogeneous causal effects: Empirical Monte Carlo evidence," The Econometrics Journal, Royal Economic Society, vol. 24(1), pages 134-161.
    9. Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
    10. Martin Huber & Jannis Kueck, 2022. "Testing the identification of causal effects in observational data," Papers 2203.15890, arXiv.org, revised May 2026.
    11. Lin, Zhexiao & Han, Fang, 2025. "On regression-adjusted imputation estimators of average treatment effects," Journal of Econometrics, Elsevier, vol. 251(C).
    12. Dmitry Arkhangelsky & Guido Imbens, 2023. "Causal Models for Longitudinal and Panel Data: A Survey," Papers 2311.15458, arXiv.org, revised Jun 2024.
    13. Goller, Daniel & Lechner, Michael & Moczall, Andreas & Wolff, Joachim, 2020. "Does the estimation of the propensity score by machine learning improve matching estimation? The case of Germany's programmes for long term unemployed," Labour Economics, Elsevier, vol. 65(C).
    14. Huber, Martin, 2019. "An introduction to flexible methods for policy evaluation," FSES Working Papers 504, Faculty of Economics and Social Sciences, University of Freiburg/Fribourg Switzerland.
    15. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    16. Verhagen, Mark D., 2023. "Using machine learning to monitor the equity of large-scale policy interventions: The Dutch decentralisation of the Social Domain," SocArXiv qzm7y, Center for Open Science.
    17. Achim Ahrens & Alessandra Stampi‐Bombelli & Selina Kurer & Dominik Hangartner, 2024. "Optimal multi‐action treatment allocation: A two‐phase field experiment to boost immigrant naturalization," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(7), pages 1379-1395, November.
    18. Chunrong Ai & Oliver Linton & Kaiji Motegi & Zheng Zhang, 2021. "A unified framework for efficient estimation of general treatment models," Quantitative Economics, Econometric Society, vol. 12(3), pages 779-816, July.
    19. Borgschulte, Mark & Vogler, Jacob, 2020. "Did the ACA Medicaid expansion save lives?," Journal of Health Economics, Elsevier, vol. 72(C).
    20. Isaac Meza, 2025. "Residual Balancing for Non-Linear Outcome Models in High Dimensions," Papers 2511.00324, arXiv.org.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:1712.04802. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.