Federated Causal Inference in Heterogeneous Observational Data

Authors

  • Ruoxuan Xiong
  • Allison Koenecke
  • Michael Powell
  • Zhu Shen
  • Joshua T. Vogelstein
  • Susan Athey

Abstract

We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where data is stored locally for each site. Due to privacy constraints, individual-level data cannot be shared across sites; the sites may also have heterogeneous populations and treatment assignment mechanisms. Motivated by these considerations, we develop federated methods to draw inference on the average treatment effects of combined data across sites. Our methods first compute summary statistics locally using propensity scores and then aggregate these statistics across sites to obtain point and variance estimators of average treatment effects. We show that these estimators are consistent and asymptotically normal. To achieve these asymptotic properties, we find that the aggregation schemes need to account for the heterogeneity in treatment assignments and in outcomes across sites. We demonstrate the validity of our federated methods through a comparative study of two large medical claims databases.
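
The two-step pattern the abstract describes (local summary statistics built from propensity scores, then aggregation across sites) can be illustrated with a minimal sketch. This is not the paper's estimator: it assumes each site has already estimated its own propensity scores, uses a plain inverse-propensity-weighted (IPW) estimate, and ignores the extra variance from estimating those scores. The names `SiteSummary`, `local_ipw_summary`, `aggregate_sample_size`, and `aggregate_inverse_variance` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SiteSummary:
    """The only quantities a site shares: no individual-level rows."""
    n: int        # number of individuals at the site
    ate: float    # local IPW estimate of the average treatment effect
    var: float    # estimated variance of that local estimate


def local_ipw_summary(y, w, e):
    """Runs inside one site.

    y: outcomes, w: treatment indicators (0/1),
    e: locally estimated propensity scores, assumed strictly in (0, 1).
    """
    n = len(y)
    # Per-individual IPW score: w*y/e - (1-w)*y/(1-e)
    scores = [wi * yi / ei - (1 - wi) * yi / (1 - ei)
              for yi, wi, ei in zip(y, w, e)]
    ate = sum(scores) / n
    # Simplification: treats e as known, so this understates or misstates
    # the variance when propensity scores are estimated.
    var = sum((s - ate) ** 2 for s in scores) / (n * (n - 1))
    return SiteSummary(n=n, ate=ate, var=var)


def aggregate_sample_size(summaries):
    """Weights each site by its share of the combined sample."""
    total_n = sum(s.n for s in summaries)
    ate = sum(s.n * s.ate for s in summaries) / total_n
    var = sum((s.n / total_n) ** 2 * s.var for s in summaries)
    return ate, sqrt(var)


def aggregate_inverse_variance(summaries):
    """Weights each site by the precision of its local estimate."""
    weights = [1.0 / s.var for s in summaries]
    total = sum(weights)
    ate = sum(wt * s.ate for wt, s in zip(weights, summaries)) / total
    return ate, sqrt(1.0 / total)


# Toy data for two sites (made up purely to exercise the functions).
site_a = local_ipw_summary(y=[3, 1, 4, 2], w=[1, 0, 1, 0], e=[0.5] * 4)
site_b = local_ipw_summary(y=[5, 2, 6, 1, 3, 2], w=[1, 0, 1, 0, 1, 0],
                           e=[0.5] * 6)
ate_ss, se_ss = aggregate_sample_size([site_a, site_b])
ate_iv, se_iv = aggregate_inverse_variance([site_a, site_b])
```

The two aggregation rules answer slightly different questions. Sample-size weighting targets the effect in the combined population even when site-level effects differ, while inverse-variance weighting is most natural when sites are assumed to share a common effect. This is one simple way to see why, as the abstract notes, the aggregation scheme has to account for heterogeneity across sites.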

Suggested Citation

  • Ruoxuan Xiong & Allison Koenecke & Michael Powell & Zhu Shen & Joshua T. Vogelstein & Susan Athey, 2021. "Federated Causal Inference in Heterogeneous Observational Data," Papers 2107.11732, arXiv.org, revised Apr 2023.
  • Handle: RePEc:arx:papers:2107.11732

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2107.11732
    File Function: Latest version
    Download Restriction: no

    Citations

    Cited by:

    1. Aldo Gael Carranza & Susan Athey, 2023. "Federated Offline Policy Learning," Papers 2305.12407, arXiv.org, revised Oct 2024.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    2. Sant’Anna, Pedro H.C. & Zhao, Jun, 2020. "Doubly robust difference-in-differences estimators," Journal of Econometrics, Elsevier, vol. 219(1), pages 101-122.
    3. Yuya Sasaki & Takuya Ura & Yichong Zhang, 2022. "Unconditional quantile regression with high‐dimensional data," Quantitative Economics, Econometric Society, vol. 13(3), pages 955-978, July.
    4. Dmitry Arkhangelsky & Guido Imbens, 2023. "Causal Models for Longitudinal and Panel Data: A Survey," Papers 2311.15458, arXiv.org, revised Jun 2024.
    5. Su, Liangjun & Ura, Takuya & Zhang, Yichong, 2019. "Non-separable models with high-dimensional data," Journal of Econometrics, Elsevier, vol. 212(2), pages 646-677.
    6. Jikai Jin & Vasilis Syrgkanis, 2024. "Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation," Papers 2402.14264, arXiv.org, revised Mar 2024.
    7. Athey, Susan & Imbens, Guido W. & Metzger, Jonas & Munro, Evan, 2024. "Using Wasserstein Generative Adversarial Networks for the design of Monte Carlo simulations," Journal of Econometrics, Elsevier, vol. 240(2).
    8. Heiler, Phillip & Kazak, Ekaterina, 2021. "Valid inference for treatment effect parameters under irregular identification and many extreme propensity scores," Journal of Econometrics, Elsevier, vol. 222(2), pages 1083-1108.
    9. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    10. Miruna Oprescu & Vasilis Syrgkanis & Zhiwei Steven Wu, 2018. "Orthogonal Random Forest for Causal Inference," Papers 1806.03467, arXiv.org, revised Sep 2019.
    11. Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.
    12. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    13. Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
    14. Victor Chernozhukov & Mert Demirer & Esther Duflo & Ivan Fernandez-Val, 2017. "Generic machine learning inference on heterogenous treatment effects in randomized experiments," CeMMAP working papers 61/17, Institute for Fiscal Studies.
    15. Anna Baiardi & Andrea A. Naghi, 2021. "The Value Added of Machine Learning to Causal Inference: Evidence from Revisited Studies," Papers 2101.00878, arXiv.org.
    16. Jiaming Mao & Jingzhi Xu, 2020. "Ensemble Learning with Statistical and Structural Models," Papers 2006.05308, arXiv.org.
    17. Ashkan Ertefaie & Nima S. Hejazi & Mark J. van der Laan, 2023. "Nonparametric inverse‐probability‐weighted estimators based on the highly adaptive lasso," Biometrics, The International Biometric Society, vol. 79(2), pages 1029-1041, June.
    18. Retsef Levi & Elisabeth Paulson & Georgia Perakis & Emily Zhang, 2024. "Heterogeneous Treatment Effects in Panel Data," Papers 2406.05633, arXiv.org.
    19. Anna Baiardi & Andrea A. Naghi, 2021. "The Value Added of Machine Learning to Causal Inference: Evidence from Revisited Studies," Tinbergen Institute Discussion Papers 21-001/V, Tinbergen Institute.
    20. Dongcheng Zhang & Kunpeng Zhang, 2020. "Weighting-Based Treatment Effect Estimation via Distribution Learning," Papers 2012.13805, arXiv.org, revised May 2023.
