Regularizing Extrapolation in Causal Inference

Authors

  • David Arbour
  • Harsh Parikh
  • Bijan Niknam
  • Elizabeth Stuart
  • Kara Rudolph
  • Avi Feller

Abstract

Many common estimators in machine learning and causal inference are linear smoothers, where the prediction is a weighted average of the training outcomes. Some estimators, such as ordinary least squares and kernel ridge regression, allow for arbitrarily negative weights, which improve feature balance but often at the cost of increased dependence on parametric modeling assumptions and higher variance. By contrast, estimators like importance weighting and random forests (sometimes implicitly) restrict weights to be non-negative, reducing both variance and dependence on parametric modeling at the cost of worse imbalance. In this paper, we propose a unified framework that directly penalizes the level of extrapolation, replacing the current practice of a hard non-negativity constraint with a soft constraint and corresponding hyperparameter. We derive a worst-case extrapolation error bound and introduce a novel "bias-bias-variance" tradeoff, encompassing bias due to feature imbalance, bias due to model misspecification, and estimator variance; this tradeoff is especially pronounced in high dimensions, particularly when positivity is poor. We then develop an optimization procedure that regularizes this bound while minimizing imbalance and outline how to use this approach as a sensitivity analysis for dependence on parametric modeling assumptions. We demonstrate the effectiveness of our approach through synthetic experiments and a real-world application involving the generalization of randomized controlled trial estimates to a target population of interest.
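
To make the weighting idea concrete, here is a toy sketch (not the authors' code). With an intercept and one covariate, the OLS prediction at a target point x_target equals sum(w_i * y_i) with implied weights w_i = 1/n + (x_i - xbar)(x_target - xbar)/Sxx. When x_target lies outside the range of the data, some of these weights are negative, which is the extrapolation the abstract refers to. The second function swaps a hard non-negativity constraint for a soft penalty of strength `lam`. The objective (squared covariate imbalance plus `lam` times the sum of squared negative parts, an optional `ridge` term as a crude variance proxy, weights constrained to sum to one), the function names, and the projected-gradient solver are all illustrative assumptions; they are not the paper's estimator, bound, or optimization procedure.

```python
def ols_implied_weights(x, x_target):
    """OLS (intercept + one covariate) prediction at x_target as sum(w_i * y_i).

    w_i = 1/n + (x_i - xbar) * (x_target - xbar) / Sxx
    These weights sum to one and exactly balance the covariate mean.
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) * (x_target - xbar) / sxx for xi in x]


def soft_extrapolation_weights(x, x_target, lam, ridge=0.0, iters=5000):
    """Hypothetical soft-constraint weights (illustration only).

    Minimizes, subject to sum(w) == 1:
        (sum(w_i * x_i) - x_target)**2   # covariate imbalance
        + lam * sum(min(w_i, 0)**2)       # penalty on negative (extrapolating) weight
        + ridge * sum(w_i**2)             # crude variance proxy
    using projected gradient descent with a fixed step of 1/L.
    """
    n = len(x)
    w = [1.0 / n] * n  # uniform start already satisfies sum(w) == 1
    # L = 2 * (||x||^2 + lam + ridge) bounds the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * (sum(xi * xi for xi in x) + lam + ridge))
    for _ in range(iters):
        imbalance = sum(wi * xi for wi, xi in zip(w, x)) - x_target
        grad = [
            2.0 * imbalance * xi + 2.0 * lam * min(wi, 0.0) + 2.0 * ridge * wi
            for wi, xi in zip(w, x)
        ]
        # Project the gradient onto {v : sum(v) == 0} so the weights keep summing to one.
        g_mean = sum(grad) / n
        w = [wi - step * (gi - g_mean) for wi, gi in zip(w, grad)]
    return w


x = [0.0, 1.0, 2.0, 3.0, 4.0]  # toy source covariate values
x_target = 6.0                  # target mean lies outside [0, 4]
for lam in (0.0, 100.0):
    w = soft_extrapolation_weights(x, x_target, lam)
    neg_mass = sum(-wi for wi in w if wi < 0)
    mean = sum(wi * xi for wi, xi in zip(w, x))
    print(f"lam={lam:g}: negative mass {neg_mass:.3f}, weighted mean {mean:.3f}")
# lam=0: negative mass 0.800, weighted mean 6.000
# lam=100: negative mass 0.154, weighted mean 4.462
```

With `lam = 0` the sketch recovers the OLS weights [-0.6, -0.2, 0.2, 0.6, 1.0], which balance the target exactly but place 0.8 of negative weight mass. With `lam = 100` the negative mass shrinks to about 0.154 while the weighted covariate mean falls short of the target, illustrating the balance-versus-extrapolation tradeoff that a single hyperparameter controls. A squared negative-part penalty was chosen here only because it keeps the objective smooth enough for plain gradient descent.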

Suggested Citation

  • David Arbour & Harsh Parikh & Bijan Niknam & Elizabeth Stuart & Kara Rudolph & Avi Feller, 2025. "Regularizing Extrapolation in Causal Inference," Papers 2509.17180, arXiv.org, revised Oct 2025.
  • Handle: RePEc:arx:papers:2509.17180

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2509.17180
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nicolaj N. Mühlbach, 2020. "Tree-based Synthetic Control Methods: Consequences of moving the US Embassy," CREATES Research Papers 2020-04, Department of Economics and Business Economics, Aarhus University.
    2. Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.
    3. Dennis Shen & Peng Ding & Jasjeet Sekhon & Bin Yu, 2022. "Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data," Papers 2207.14481, arXiv.org, revised Oct 2022.
    4. Dmitry Arkhangelsky & Guido Imbens, 2023. "Causal Models for Longitudinal and Panel Data: A Survey," Papers 2311.15458, arXiv.org, revised Jun 2024.
    5. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    6. Jason Poulos & Shuxi Zeng, 2021. "RNN‐based counterfactual prediction, with an application to homestead policy and public schooling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 1124-1139, August.
    7. Alberto Abadie & Jaume Vives-i-Bastida, 2022. "Synthetic Controls in Action," Papers 2203.06279, arXiv.org.
    8. Viviano, Davide & Bradic, Jelena, 2023. "Synthetic Learner: Model-free inference on treatments over time," Journal of Econometrics, Elsevier, vol. 234(2), pages 691-713.
    9. Daniel Albalate & Germà Bel & Ferran A. Mazaira-Font, 2020. "Ensuring Stability, Accuracy and Meaningfulness in Synthetic Control Methods: The Regularized SHAP-Distance Method," IREA Working Papers 202005, University of Barcelona, Research Institute of Applied Economics, revised Apr 2020.
    10. Bruno Ferman & Cristine Pinto & Vitor Possebom, 2020. "Cherry Picking with Synthetic Controls," Journal of Policy Analysis and Management, John Wiley & Sons, Ltd., vol. 39(2), pages 510-532, March.
    11. Arne Henningsen & Guy Low & David Wuepper & Tobias Dalhaus & Hugo Storm & Dagim Belay & Stefan Hirsch, 2024. "Estimating Causal Effects with Observational Data: Guidelines for Agricultural and Applied Economists," IFRO Working Paper 2024/03, University of Copenhagen, Department of Food and Resource Economics.
    12. Sheng, Yu & Xu, Xinpeng, 2019. "The productivity impact of climate change: Evidence from Australia's Millennium drought," Economic Modelling, Elsevier, vol. 76(C), pages 182-191.
    13. Sallin, Aurelién, 2021. "Estimating returns to special education: combining machine learning and text analysis to address confounding," Economics Working Paper Series 2109, University of St. Gallen, School of Economics and Political Science.
    14. Irene Botosaru & Bruno Ferman, 2019. "On the role of covariates in the synthetic control method," The Econometrics Journal, Royal Economic Society, vol. 22(2), pages 117-130.
    15. Jan Bruha & Jaromir Tonner, 2018. "An Exchange Rate Floor as an Instrument of Monetary Policy: An Ex-Post Assessment of the Czech Experience," Czech Journal of Economics and Finance (Finance a uver), Charles University Prague, Faculty of Social Sciences, vol. 68(6), pages 537-549, December.
    16. Camilla Beck Olsen & Hans Olav Melberg, 2018. "Did adolescents in Norway respond to the elimination of copayments for general practitioner services?," Health Economics, John Wiley & Sons, Ltd., vol. 27(7), pages 1120-1130, July.
    17. Niklas Potrafke & Fabian Ruthardt & Kaspar Wuthrich, 2020. "Protectionism and economic growth: Causal evidence from the first era of globalization," Papers 2010.02378, arXiv.org, revised Mar 2022.
    18. Susan Athey & Mohsen Bayati & Guido Imbens & Zhaonan Qu, 2019. "Ensemble Methods for Causal Effects in Panel Data Settings," AEA Papers and Proceedings, American Economic Association, vol. 109, pages 65-70, May.
    19. Peter Backus & Thien Nguyen, 2021. "The Effect of the Sex Buyer Law on the Market for Sex, Sexual Health and Sexual Violence," Economics Discussion Paper Series 2106, Economics, The University of Manchester.
    20. Francesca Caselli & Matilde Faralli & Paolo Manasse & Ugo Panizza, 2021. "On the Benefits of Repaying," IMF Working Papers 2021/233, International Monetary Fund.
