Printed from https://ideas.repec.org/p/arx/papers/2601.11845.html

Reevaluating Causal Estimation Methods with Data from a Product Release

Authors

  • Justin Young
  • Muthoni Ngatia
  • Eleanor Wiske Dillon

Abstract

Recent developments in causal machine learning methods have made it easier to estimate flexible relationships between confounders, treatments and outcomes, making unconfoundedness assumptions in causal analysis more palatable. How successful are these approaches in recovering ground truth baselines? In this paper we analyze a new data sample including an experimental rollout of a new feature at a large technology company and a simultaneous sample of users who endogenously opted into the feature. We find that recovering ground truth causal effects is feasible -- but only with careful modeling choices. Our results build on the observational causal literature beginning with LaLonde (1986), offering best practices for more credible treatment effect estimation in modern, high-dimensional datasets.
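The comparison the abstract describes, an observational estimate checked against an experimental benchmark, can be illustrated with a minimal simulated sketch. Everything here is an assumption made for illustration: the data are synthetic, the single binary confounder `heavy_user`, the effect size `TRUE_EFFECT`, and the function names are hypothetical, and the adjustment is plain exact stratification, a deliberately simple stand-in rather than the causal machine learning estimators the paper evaluates.

```python
import random

TRUE_EFFECT = 1.0  # assumed effect of the feature in this synthetic example


def simulate(n, randomized, seed):
    """Return (heavy_user, treated, outcome) rows from a synthetic rollout."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        heavy_user = 1 if rng.random() < 0.5 else 0  # the only confounder
        if randomized:
            p_treat = 0.5  # experimental arm: coin-flip assignment
        else:
            p_treat = 0.8 if heavy_user else 0.2  # opt-in depends on usage
        treated = 1 if rng.random() < p_treat else 0
        outcome = TRUE_EFFECT * treated + 2.0 * heavy_user + rng.gauss(0, 1)
        rows.append((heavy_user, treated, outcome))
    return rows


def diff_in_means(rows):
    """Mean outcome of treated users minus mean outcome of untreated users."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def stratified_ate(rows):
    """Average the within-stratum differences, weighted by stratum share."""
    total = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        total += len(stratum) / len(rows) * diff_in_means(stratum)
    return total


experiment = simulate(20000, randomized=True, seed=1)
opt_in = simulate(20000, randomized=False, seed=2)

benchmark = diff_in_means(experiment)  # experimental "ground truth"
naive = diff_in_means(opt_in)          # ignores self-selection
adjusted = stratified_ate(opt_in)      # conditions on the confounder

print(f"experimental benchmark: {benchmark:.2f}")
print(f"naive observational:    {naive:.2f}")
print(f"stratified adjustment:  {adjusted:.2f}")
```

In this toy setting the adjustment works only because the one confounder is observed and conditioned on exactly. With many covariates, that step is where the modeling choices the abstract refers to come in.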

Suggested Citation

  • Justin Young & Muthoni Ngatia & Eleanor Wiske Dillon, 2026. "Reevaluating Causal Estimation Methods with Data from a Product Release," Papers 2601.11845, arXiv.org.
  • Handle: RePEc:arx:papers:2601.11845

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2601.11845
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Joshua D. Angrist & Jörn-Steffen Pischke, 2009. "Mostly Harmless Econometrics: An Empiricist's Companion," Economics Books, Princeton University Press, edition 1, number 8769, December.
    2. Glynn, Adam N. & Quinn, Kevin M., 2010. "An Introduction to the Augmented Inverse Propensity Weighted Estimator," Political Analysis, Cambridge University Press, vol. 18(1), pages 36-56, January.
    3. Xinkun Nie & Stefan Wager, 2017. "Quasi-Oracle Estimation of Heterogeneous Treatment Effects," Papers 1712.04912, arXiv.org, revised Aug 2020.
    4. Rajeev H. Dehejia & Sadek Wahba, 2002. "Propensity Score-Matching Methods For Nonexperimental Causal Studies," The Review of Economics and Statistics, MIT Press, vol. 84(1), pages 151-161, February.
    5. Victor Chernozhukov & Christian Hansen & Nathan Kallus & Martin Spindler & Vasilis Syrgkanis, 2024. "Applied Causal Inference Powered by ML and AI," Papers 2403.02467, arXiv.org.
    6. James J. Heckman & Hidehiko Ichimura & Petra E. Todd, 1997. "Matching As An Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 64(4), pages 605-654.
    7. Richard K. Crump & V. Joseph Hotz & Guido W. Imbens & Oscar A. Mitnik, 2009. "Dealing with limited overlap in estimation of average treatment effects," Biometrika, Biometrika Trust, vol. 96(1), pages 187-199.
    8. Smith, Jeffrey A. & Todd, Petra E., 2005. "Does matching overcome LaLonde's critique of nonexperimental estimators?," Journal of Econometrics, Elsevier, vol. 125(1-2), pages 305-353.
    9. Philipp Bach & Oliver Schacht & Victor Chernozhukov & Sven Klaassen & Martin Spindler, 2024. "Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study," Papers 2402.04674, arXiv.org.
    10. Guido W. Imbens, 2004. "Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review," The Review of Economics and Statistics, MIT Press, vol. 86(1), pages 4-29, February.
    11. Jinyong Hahn, 1998. "On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects," Econometrica, Econometric Society, vol. 66(2), pages 315-332, March.
    12. Friedlander, Daniel & Robins, Philip K, 1995. "Evaluating Program Evaluations: New Evidence on Commonly Used Nonexperimental Methods," American Economic Review, American Economic Association, vol. 85(4), pages 923-937, September.
    13. James Heckman & Hidehiko Ichimura & Jeffrey Smith & Petra Todd, 1998. "Characterizing Selection Bias Using Experimental Data," Econometrica, Econometric Society, vol. 66(5), pages 1017-1098, September.
    14. Keisuke Hirano & Guido W. Imbens & Geert Ridder, 2003. "Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score," Econometrica, Econometric Society, vol. 71(4), pages 1161-1189, July.
    15. Alberto Abadie & Guido W. Imbens, 2006. "Large Sample Properties of Matching Estimators for Average Treatment Effects," Econometrica, Econometric Society, vol. 74(1), pages 235-267, January.
    16. Alberto Abadie & Guido W. Imbens, 2016. "Matching on the Estimated Propensity Score," Econometrica, Econometric Society, vol. 84, pages 781-807, March.
    17. Alberto Abadie & David Drukker & Jane Leber Herr & Guido W. Imbens, 2004. "Implementing matching estimators for average treatment effects in Stata," Stata Journal, StataCorp LLC, vol. 4(3), pages 290-311, September.
    18. Patrick Kline, 2011. "Oaxaca-Blinder as a Reweighting Estimator," American Economic Review, American Economic Association, vol. 101(3), pages 532-537, May.
    19. Fan Li & Kari Lock Morgan & Alan M. Zaslavsky, 2018. "Balancing Covariates via Propensity Score Weighting," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(521), pages 390-400, January.
    20. Hotz, V. Joseph & Imbens, Guido W. & Mortimer, Julie H., 2005. "Predicting the efficacy of future training programs using past experiences at other locations," Journal of Econometrics, Elsevier, vol. 125(1-2), pages 241-270.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Huber, Martin & Lechner, Michael & Wunsch, Conny, 2013. "The performance of estimators based on the propensity score," Journal of Econometrics, Elsevier, vol. 175(1), pages 1-21.
    2. Carlos A. Flores & Oscar A. Mitnik, 2009. "Evaluating Nonexperimental Estimators for Multiple Treatments: Evidence from Experimental Data," Working Papers 2010-10, University of Miami, Department of Economics.
    3. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    4. Huber, Martin & Lechner, Michael & Wunsch, Conny, 2010. "How to Control for Many Covariates? Reliable Estimators Based on the Propensity Score," IZA Discussion Papers 5268, IZA Network @ LISER.
    5. Stephen L. Morgan & David J. Harding, 2006. "Matching Estimators of Causal Effects," Sociological Methods & Research, vol. 35(1), pages 3-60, August.
    6. Marco Caliendo & Sabine Kopeinig, 2008. "Some Practical Guidance For The Implementation Of Propensity Score Matching," Journal of Economic Surveys, Wiley Blackwell, vol. 22(1), pages 31-72, February.
    7. Lin, Zhexiao & Han, Fang, 2025. "On regression-adjusted imputation estimators of average treatment effects," Journal of Econometrics, Elsevier, vol. 251(C).
    8. Ferraro, Paul J. & Miranda, Juan José, 2014. "The performance of non-experimental designs in the evaluation of environmental programs: A design-replication study using a large-scale randomized experiment as a benchmark," Journal of Economic Behavior & Organization, Elsevier, vol. 107(PA), pages 344-365.
    9. Zeqin Liu & Zongwu Cai & Ying Fang & Ming Lin, 2019. "Statistical Analysis and Evaluation of Macroeconomic Policies: A Selective Review," WORKING PAPERS SERIES IN THEORETICAL AND APPLIED ECONOMICS 201904, University of Kansas, Department of Economics, revised Mar 2019.
    10. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
    11. Sant’Anna, Pedro H.C. & Song, Xiaojun, 2019. "Specification tests for the propensity score," Journal of Econometrics, Elsevier, vol. 210(2), pages 379-404.
    12. Peter R. Mueser & Kenneth R. Troske & Alexey Gorislavsky, 2007. "Using State Administrative Data to Measure Program Performance," The Review of Economics and Statistics, MIT Press, vol. 89(4), pages 761-783, November.
    13. Dettmann, E. & Becker, C. & Schmeißer, C., 2011. "Distance functions for matching in small samples," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1942-1960, May.
    14. Dettmann, Eva & Becker, Claudia & Schmeißer, Christian, 2010. "Is there a Superior Distance Function for Matching in Small Samples?," IWH Discussion Papers 3/2010, Halle Institute for Economic Research (IWH).
    15. Yihui He & Fang Han, 2023. "On propensity score matching with a diverging number of matches," Papers 2310.14142, arXiv.org, revised Nov 2023.
    16. Richard K. Crump & V. Joseph Hotz & Guido W. Imbens & Oscar A. Mitnik, 2006. "Moving the Goalposts: Addressing Limited Overlap in the Estimation of Average Treatment Effects by Changing the Estimand," NBER Technical Working Papers 0330, National Bureau of Economic Research, Inc.
    17. V. Joseph Hotz & Guido W. Imbens & Jacob A. Klerman, 2006. "Evaluating the Differential Effects of Alternative Welfare-to-Work Training Components: A Reanalysis of the California GAIN Program," Journal of Labor Economics, University of Chicago Press, vol. 24(3), pages 521-566, July.
    18. Lechner, Michael & Wunsch, Conny, 2013. "Sensitivity of matching-based program evaluations to the availability of control variables," Labour Economics, Elsevier, vol. 21(C), pages 111-121.
    19. Jones A.M & Rice N, 2009. "Econometric Evaluation of Health Policies," Health, Econometrics and Data Group (HEDG) Working Papers 09/09, HEDG, c/o Department of Economics, University of York.
    20. Paweł Strawiński, 2012. "Small sample properties of matching with caliper," Working Papers 2012-13, Faculty of Economic Sciences, University of Warsaw.
