Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance

Author

Listed:
  • Gabriel Okasa

Abstract

Estimation of causal effects using machine learning methods has become an active research field in econometrics. In this paper, we study the finite sample performance of meta-learners for estimation of heterogeneous treatment effects when sample-splitting and cross-fitting are used to reduce the overfitting bias. In both synthetic and semi-synthetic simulations, we find that the performance of the meta-learners in finite samples greatly depends on the estimation procedure. The results imply that, in large samples, sample-splitting is beneficial for bias reduction and cross-fitting for efficiency of the meta-learners, whereas full-sample estimation is preferable in small samples. Furthermore, we derive practical recommendations for the application of specific meta-learners in empirical studies depending on particular data characteristics such as treatment shares and sample size.
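
The three estimation procedures named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a doubly robust (DR) meta-learner with OLS base learners, a randomized treatment with a constant propensity score, a single two-fold split, and simulated data. All function names and the data-generating process are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares coefficients, with an intercept prepended."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0]

def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def dr_learner(X_nuis, w_nuis, y_nuis, X_eff, w_eff, y_eff, X_new):
    """DR-learner: nuisances from one sample, effect regression on another."""
    # Stage 1: outcome models and propensity score from the nuisance sample.
    b1 = ols_fit(X_nuis[w_nuis == 1], y_nuis[w_nuis == 1])
    b0 = ols_fit(X_nuis[w_nuis == 0], y_nuis[w_nuis == 0])
    e = w_nuis.mean()  # assumption: randomized treatment, constant propensity
    # Stage 2: doubly robust pseudo-outcomes on the effect sample,
    # regressed on the covariates to obtain the effect function.
    mu1, mu0 = ols_predict(b1, X_eff), ols_predict(b0, X_eff)
    psi = (mu1 - mu0
           + w_eff * (y_eff - mu1) / e
           - (1 - w_eff) * (y_eff - mu0) / (1 - e))
    return ols_predict(ols_fit(X_eff, psi), X_new)

def cate_full(X, w, y, X_new):
    """Full-sample estimation: both stages use all observations."""
    return dr_learner(X, w, y, X, w, y, X_new)

def two_folds(n, rng):
    idx = rng.permutation(n)
    return idx[: n // 2], idx[n // 2:]

def cate_split(X, w, y, X_new, rng):
    """Sample-splitting: nuisances on fold a, effect regression on fold b."""
    a, b = two_folds(len(y), rng)
    return dr_learner(X[a], w[a], y[a], X[b], w[b], y[b], X_new)

def cate_crossfit(X, w, y, X_new, rng):
    """Cross-fitting: swap the roles of the folds and average the estimates."""
    a, b = two_folds(len(y), rng)
    tau_ab = dr_learner(X[a], w[a], y[a], X[b], w[b], y[b], X_new)
    tau_ba = dr_learner(X[b], w[b], y[b], X[a], w[a], y[a], X_new)
    return 0.5 * (tau_ab + tau_ba)

# Hypothetical simulated data with a linear heterogeneous effect.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
w = rng.binomial(1, 0.5, size=n)
true_tau = lambda Z: 1.0 + 0.5 * Z[:, 0]
y = X[:, 0] + X[:, 1] + w * true_tau(X) + rng.normal(size=n)
X_new = rng.normal(size=(500, 2))

estimates = {
    "full": cate_full(X, w, y, X_new),
    "split": cate_split(X, w, y, X_new, rng),
    "crossfit": cate_crossfit(X, w, y, X_new, rng),
}
# Root mean squared error of each procedure against the known effect.
errors = {k: float(np.sqrt(np.mean((v - true_tau(X_new)) ** 2)))
          for k, v in estimates.items()}
print(errors)
```

Sample-splitting uses only half of the data for the effect regression, whereas cross-fitting reuses both halves by swapping their roles and averaging. This is the mechanism behind the bias and efficiency distinction the abstract draws. A single simulated draw like this one cannot reproduce the paper's finite sample comparisons.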

Suggested Citation

  • Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
  • Handle: RePEc:arx:papers:2201.12692

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2201.12692
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    2. Michael C Knaus & Michael Lechner & Anthony Strittmatter, 2021. "Machine learning estimation of heterogeneous causal effects: Empirical Monte Carlo evidence," The Econometrics Journal, Royal Economic Society, vol. 24(1), pages 134-161.
    3. Michael Lechner & Jana Mareckova, 2022. "Modified Causal Forest," Papers 2209.03744, arXiv.org.
    4. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    5. Goller, Daniel & Harrer, Tamara & Lechner, Michael & Wolff, Joachim, 2021. "Active labour market policies for the long-term unemployed: New evidence from causal machine learning," Economics Working Paper Series 2108, University of St. Gallen, School of Economics and Political Science.
    6. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    7. Daniel Boller & Michael Lechner & Gabriel Okasa, 2021. "The Effect of Sport in Online Dating: Evidence from Causal Machine Learning," Papers 2104.04601, arXiv.org.
    8. Lechner, Michael, 2018. "Modified Causal Forests for Estimating Heterogeneous Causal Effects," IZA Discussion Papers 12040, Institute of Labor Economics (IZA).
    9. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    10. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    11. Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
    12. Cockx, Bart & Lechner, Michael & Bollens, Joost, 2023. "Priority to unemployed immigrants? A causal machine learning evaluation of training in Belgium," Labour Economics, Elsevier, vol. 80(C).
    13. Kyle Colangelo & Ying-Ying Lee, 2020. "Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments," Papers 2004.03036, arXiv.org, revised Sep 2023.
    14. Martin Huber, 2019. "An introduction to flexible methods for policy evaluation," Papers 1910.00641, arXiv.org.
    15. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP54/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    16. Nora Bearth & Michael Lechner, 2024. "Causal Machine Learning for Moderation Effects," Papers 2401.08290, arXiv.org, revised Apr 2024.
    17. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    18. Michael Zimmert & Michael Lechner, 2019. "Nonparametric estimation of causal heterogeneity under high-dimensional confounding," Papers 1908.08779, arXiv.org.
    19. Tobias Cagala & Ulrich Glogowsky & Johannes Rincke & Anthony Strittmatter, 2021. "Optimal Targeting in Fundraising: A Causal Machine-Learning Approach," Papers 2103.10251, arXiv.org, revised Sep 2021.
    20. Goller, Daniel & Lechner, Michael & Moczall, Andreas & Wolff, Joachim, 2020. "Does the estimation of the propensity score by machine learning improve matching estimation? The case of Germany's programmes for long term unemployed," Labour Economics, Elsevier, vol. 65(C).
