Printed from https://ideas.repec.org/p/arx/papers/2504.01702.html

A Causal Inference Framework for Data Rich Environments

Author

Listed:
  • Alberto Abadie
  • Anish Agarwal
  • Devavrat Shah

Abstract

We propose a formal model for counterfactual estimation with unobserved confounding in "data-rich" settings, i.e., settings with a large number of units and a large number of measurements per unit. Our model bridges the structural causal model view of causal inference common in the graphical models literature and the latent factor model view common in the potential outcomes literature. We show how classic models for potential outcomes and treatment assignments fit within our framework. We provide an identification argument for the average treatment effect, the average treatment effect on the treated, and the average treatment effect on the untreated. For any estimator whose estimation error for a certain nuisance parameter decays at a fast enough rate, we establish that it is consistent for these causal parameters. We then show that principal component regression is one such estimator, and we analyze the minimal smoothness required of the potential outcomes function for consistency.
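
For readers unfamiliar with principal component regression (PCR), the following is a minimal, generic sketch in Python with NumPy. It is not the paper's estimator and does not reproduce its assumptions or nuisance-parameter analysis; it only illustrates the textbook idea of regressing an outcome on the top-k principal components of a noisy, approximately low-rank measurement matrix. The function name `pcr_fit`, the simulated data, the dimensions, and the noise levels are all hypothetical choices made for illustration, and the columns are not centered.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Generic PCR: least squares of y on the rank-k truncation of X.

    Returns coefficients in the original measurement coordinates.
    Illustrative only; not the estimator analyzed in the paper.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Minimum-norm least squares on the rank-k approximation:
    # beta = V_k diag(1 / s_k) U_k^T y
    return Vt_k.T @ ((U_k.T @ y) / s_k)

# Simulated "data-rich" setting (assumed for illustration):
# many units (n), many measurements per unit (p), few latent factors (r).
rng = np.random.default_rng(0)
n, p, r = 200, 50, 3
latent_units = rng.normal(size=(n, r))
latent_measurements = rng.normal(size=(p, r))
X_clean = latent_units @ latent_measurements.T      # exactly rank r
X_noisy = X_clean + 0.1 * rng.normal(size=(n, p))   # observed measurements
beta_true = rng.normal(size=p)
y = X_clean @ beta_true + 0.1 * rng.normal(size=n)  # observed outcomes

beta_hat = pcr_fit(X_noisy, y, k=r)
y_hat = X_noisy @ beta_hat
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
print(f"in-sample RMSE: {rmse:.3f}")
```

Truncating to the top k singular components acts as a denoising step before the regression, which is why PCR is a natural candidate when the measurements are driven by a small number of latent factors.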

Suggested Citation

  • Alberto Abadie & Anish Agarwal & Devavrat Shah, 2025. "A Causal Inference Framework for Data Rich Environments," Papers 2504.01702, arXiv.org.
  • Handle: RePEc:arx:papers:2504.01702

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2504.01702
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Dmitry Arkhangelsky & Susan Athey & David A. Hirshberg & Guido W. Imbens & Stefan Wager, 2021. "Synthetic Difference-in-Differences," American Economic Review, American Economic Association, vol. 111(12), pages 4088-4118, December.
    2. Susan Athey & Mohsen Bayati & Nikolay Doudchenko & Guido Imbens & Khashayar Khosravi, 2021. "Matrix Completion Methods for Causal Panel Data Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1716-1730, October.
    3. Chamberlain, Gary & Rothschild, Michael, 1983. "Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets," Econometrica, Econometric Society, vol. 51(5), pages 1281-1304, September.
    4. Jushan Bai & Serena Ng, 2021. "Matrix Completion, Counterfactuals, and Factor Analysis of Missing Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1746-1763, October.
    5. Jushan Bai, 2009. "Panel Data Models With Interactive Fixed Effects," Econometrica, Econometric Society, vol. 77(4), pages 1229-1279, July.
    6. Abadie, Alberto & Diamond, Alexis & Hainmueller, Jens, 2010. "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program," Journal of the American Statistical Association, American Statistical Association, vol. 105(490), pages 493-505.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alberto Abadie & Anish Agarwal & Raaz Dwivedi & Abhin Shah, 2024. "Doubly Robust Inference in Causal Latent Factor Models," Papers 2402.11652, arXiv.org, revised Oct 2024.
    2. Dmitry Arkhangelsky & Guido Imbens, 2023. "Causal Models for Longitudinal and Panel Data: A Survey," Papers 2311.15458, arXiv.org, revised Jun 2024.
    3. Callaway, Brantly & Karami, Sonia, 2023. "Treatment effects in interactive fixed effects models with a small number of time periods," Journal of Econometrics, Elsevier, vol. 233(1), pages 184-208.
    4. Luis Costa & Vivek F. Farias & Patricio Foncea & Jingyuan (Donna) Gan & Ayush Garg & Ivo Rosa Montenegro & Kumarjit Pathak & Tianyi Peng & Dusan Popovic, 2023. "Generalized Synthetic Control for TestOps at ABI: Models, Algorithms, and Infrastructure," Interfaces, INFORMS, vol. 53(5), pages 336-349, September.
    5. Guido W. Imbens & Davide Viviano, 2023. "Identification and Inference for Synthetic Controls with Confounding," Papers 2312.00955, arXiv.org.
    6. Li, Xingyu & Shen, Yan & Zhou, Qiankun, 2024. "Confidence intervals of treatment effects in panel data models with interactive fixed effects," Journal of Econometrics, Elsevier, vol. 240(1).
    7. Ben Deaner & Chen-Wei Hsiang & Andrei Zeleneev, 2025. "Inferring Treatment Effects in Large Panels by Uncovering Latent Similarities," Papers 2503.20769, arXiv.org, revised Mar 2025.
    8. Xiong, Ruoxuan & Pelger, Markus, 2023. "Large dimensional latent factor modeling with missing observations and applications to causal inference," Journal of Econometrics, Elsevier, vol. 233(1), pages 271-301.
    9. Dallas Dotter & Duncan Chaplin & Maria Bartlett, "undated". "Impacts of School Reforms in Washington, DC on Student Achievement," Mathematica Policy Research Reports 44e95d7566434a21b8d57f951, Mathematica Policy Research.
    10. Jungjun Choi & Ming Yuan, 2023. "Matrix Completion When Missing Is Not at Random and Its Applications in Causal Panel Data Models," Papers 2308.02364, arXiv.org.
    11. Dmitry Arkhangelsky & Aleksei Samkov, 2024. "Sequential Synthetic Difference in Differences," Papers 2404.00164, arXiv.org.
    12. Belloni, Alexandre & Chen, Mingli & Madrid Padilla, Oscar Hernan & Wang, Zixuan (Kevin), 2019. "High Dimensional Latent Panel Quantile Regression with an Application to Asset Pricing," The Warwick Economics Research Paper Series (TWERPS) 1230, University of Warwick, Department of Economics.
    13. Viviano, Davide & Bradic, Jelena, 2023. "Synthetic Learner: Model-free inference on treatments over time," Journal of Econometrics, Elsevier, vol. 234(2), pages 691-713.
    14. Anish Agarwal & Munther Dahleh & Devavrat Shah & Dennis Shen, 2021. "Causal Matrix Completion," Papers 2109.15154, arXiv.org.
    15. Dennis Shen & Peng Ding & Jasjeet Sekhon & Bin Yu, 2022. "Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data," Papers 2207.14481, arXiv.org, revised Oct 2022.
    16. Cahan, Ercument & Bai, Jushan & Ng, Serena, 2023. "Factor-based imputation of missing values and covariances in panel data of large dimensions," Journal of Econometrics, Elsevier, vol. 233(1), pages 113-131.
    17. Qili Wang & Liangfei Qiu & Wei Xu, 2024. "Informal Payments and Doctor Engagement in an Online Health Community: An Empirical Investigation Using Generalized Synthetic Control," Information Systems Research, INFORMS, vol. 35(2), pages 706-726, June.
    18. Cummins, Joseph & Miller, Douglas L. & Smith, Brock & Simon, David, 2024. "Matching on Noise: Finite Sample Bias in the Synthetic Control Estimator," Journal of Econometric Methods, De Gruyter, vol. 13(1), pages 67-95, January.
    19. Michał Marcin Kobierecki & Michał Pierzgalski, 2022. "Sports Mega-Events and Economic Growth: A Synthetic Control Approach," Journal of Sports Economics, vol. 23(5), pages 567-597, June.
    20. Bai, Jushan & Wang, Peng, 2024. "Causal inference using factor models," MPRA Paper 120585, University Library of Munich, Germany.
