Causal Matrix Completion

Authors

  • Anish Agarwal
  • Munther Dahleh
  • Devavrat Shah
  • Dennis Shen

Abstract

Matrix completion is the study of recovering an underlying matrix from a sparse subset of noisy observations. Traditionally, it is assumed that the entries of the matrix are "missing completely at random" (MCAR), i.e., each entry is revealed at random, independent of everything else, with uniform probability. This is likely unrealistic due to the presence of "latent confounders", i.e., unobserved factors that determine both the entries of the underlying matrix and the missingness pattern in the observed matrix. For example, in the context of movie recommender systems -- a canonical application for matrix completion -- a user who vehemently dislikes horror films is unlikely to ever watch horror films. In general, these confounders yield "missing not at random" (MNAR) data, which can severely impact any inference procedure that does not correct for this bias. We develop a formal causal model for matrix completion through the language of potential outcomes, and provide novel identification arguments for a variety of causal estimands of interest. We design a procedure, which we call "synthetic nearest neighbors" (SNN), to estimate these causal estimands. We prove finite-sample consistency and asymptotic normality of our estimator. Our analysis also leads to new theoretical results for the matrix completion literature. In particular, we establish entry-wise, i.e., max-norm, finite-sample consistency and asymptotic normality results for matrix completion with MNAR data. As a special case, this also provides entry-wise bounds for matrix completion with MCAR data. Across simulated and real data, we demonstrate the efficacy of our proposed estimator.
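
To make the MCAR/MNAR distinction concrete, here is a minimal simulation sketch. It is not the SNN estimator and is not taken from the paper: the rank-one rating model, the logistic observation rule, and every name and constant (`simulate`, the 0.3 reveal rate, the slope of 4) are illustrative assumptions. It only shows that when a latent factor drives both a rating and whether that rating is observed, the plain average of the observed entries drifts away from the true average, whereas under uniform missingness it does not.

```python
import math
import random


def simulate(n_users=200, n_items=200, seed=0):
    """Compare the naive observed mean under MCAR and MNAR missingness.

    Hypothetical toy model (an assumption, not the paper's setup):
    rating[i][j] = u[i] * v[j] + Gaussian noise, where u and v are
    latent one-dimensional user and item factors.
    """
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in range(n_users)]  # latent user taste
    v = [rng.uniform(-1.0, 1.0) for _ in range(n_items)]  # latent item genre

    total = 0.0
    mcar_sum, mcar_cnt = 0.0, 0
    mnar_sum, mnar_cnt = 0.0, 0

    for i in range(n_users):
        for j in range(n_items):
            signal = u[i] * v[j]
            rating = signal + rng.gauss(0.0, 0.1)
            total += rating

            # MCAR: every entry is revealed with the same probability,
            # independent of everything else.
            if rng.random() < 0.3:
                mcar_sum += rating
                mcar_cnt += 1

            # MNAR: the same latent signal that sets the rating also sets
            # the chance of observing it (users watch what they like).
            p_observe = 1.0 / (1.0 + math.exp(-4.0 * signal))
            if rng.random() < p_observe:
                mnar_sum += rating
                mnar_cnt += 1

    true_mean = total / (n_users * n_items)
    return true_mean, mcar_sum / mcar_cnt, mnar_sum / mnar_cnt


if __name__ == "__main__":
    true_mean, mcar_mean, mnar_mean = simulate()
    print(f"true mean of all entries : {true_mean:+.3f}")
    print(f"naive mean, MCAR sample  : {mcar_mean:+.3f}")
    print(f"naive mean, MNAR sample  : {mnar_mean:+.3f}")
```

Under these assumptions the MCAR average should sit close to the true mean, while the MNAR average should be noticeably higher, because ratings are revealed mostly for user-item pairs with a positive latent signal. Correcting this kind of bias is what the abstract's causal framework is aimed at; the sketch only illustrates the problem, not the correction.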

Suggested Citation

  • Anish Agarwal & Munther Dahleh & Devavrat Shah & Dennis Shen, 2021. "Causal Matrix Completion," Papers 2109.15154, arXiv.org.
  • Handle: RePEc:arx:papers:2109.15154

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2109.15154
    File Function: Latest version
    Download Restriction: no

    Citations

    Cited by:

    1. Sandro Heiniger, 2024. "Data-driven model selection within the matrix completion method for causal panel data models," Papers 2402.01069, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dennis Shen & Peng Ding & Jasjeet Sekhon & Bin Yu, 2022. "Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data," Papers 2207.14481, arXiv.org, revised Oct 2022.
    2. Dmitry Arkhangelsky & Guido Imbens, 2023. "Causal Models for Longitudinal and Panel Data: A Survey," Papers 2311.15458, arXiv.org, revised Mar 2024.
    3. Roy Cerqueti & Raffaella Coppier & Alessandro Girardi & Marco Ventura, 2022. "The sooner the better: lives saved by the lockdown during the COVID-19 outbreak. The case of Italy [Using synthetic controls: Feasibility, data requirements, and methodological aspects]," The Econometrics Journal, Royal Economic Society, vol. 25(1), pages 46-70.
    4. Anish Agarwal & Vasilis Syrgkanis, 2022. "Synthetic Blip Effects: Generalizing Synthetic Controls for the Dynamic Treatment Regime," Papers 2210.11003, arXiv.org.
    5. Viviano, Davide & Bradic, Jelena, 2023. "Synthetic Learner: Model-free inference on treatments over time," Journal of Econometrics, Elsevier, vol. 234(2), pages 691-713.
    6. Stefano, Roberta di & Mellace, Giovanni, 2020. "The inclusive synthetic control method," Discussion Papers on Economics 14/2020, University of Southern Denmark, Department of Economics.
    7. Denis Fougère & Nicolas Jacquemet, 2020. "Policy Evaluation Using Causal Inference Methods," SciencePo Working papers Main hal-03455978, HAL.
    8. Alberto Abadie & Anish Agarwal & Raaz Dwivedi & Abhin Shah, 2024. "Doubly Robust Inference in Causal Latent Factor Models," Papers 2402.11652, arXiv.org, revised Apr 2024.
    9. Sandro Heiniger, 2024. "Data-driven model selection within the matrix completion method for causal panel data models," Papers 2402.01069, arXiv.org.
    10. Giulio Grossi & Marco Mariani & Alessandra Mattei & Patrizia Lattarulo & Ozge Oner, 2020. "Direct and spillover effects of a new tramway line on the commercial vitality of peripheral streets. A synthetic-control approach," Papers 2004.05027, arXiv.org, revised Nov 2023.
    11. David Gilchrist & Thomas Emery & Nuno Garoupa & Rok Spruk, 2023. "Synthetic Control Method: A tool for comparative case studies in economic history," Journal of Economic Surveys, Wiley Blackwell, vol. 37(2), pages 409-445, April.
    12. Lea Bottmer & Guido Imbens & Jann Spiess & Merrill Warnick, 2021. "A Design-Based Perspective on Synthetic Control Methods," Papers 2101.09398, arXiv.org, revised Jul 2023.
    13. Susan Athey & Mohsen Bayati & Nikolay Doudchenko & Guido Imbens & Khashayar Khosravi, 2021. "Matrix Completion Methods for Causal Panel Data Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1716-1730, October.
    14. Roberta Di Stefano & Giovanni Mellace, 2020. "The inclusive synthetic control method," Working Papers 21/20, Sapienza University of Rome, DISS.
    15. Dennis Shen & Peng Ding & Jasjeet Sekhon & Bin Yu, 2023. "Same Root Different Leaves: Time Series and Cross‐Sectional Methods in Panel Data," Econometrica, Econometric Society, vol. 91(6), pages 2125-2154, November.
    16. Florian Gunsilius, 2020. "Distributional synthetic controls," Papers 2001.06118, arXiv.org, revised Dec 2021.
    17. Christian Aleman & Christopher Busch & Alexander Ludwig & Raul Santaeulalia-Llopis, 2022. "A Stage-Based Identification of Policy Effects," PIER Working Paper Archive 22-026, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
    18. Enzo Brox & Riccardo Di Francesco, 2024. "The Cost of Coming Out," Papers 2403.03649, arXiv.org.
    19. Jason Poulos & Shuxi Zeng, 2021. "RNN‐based counterfactual prediction, with an application to homestead policy and public schooling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 1124-1139, August.
    20. Anish Agarwal & Devavrat Shah & Dennis Shen, 2020. "Synthetic Interventions," Papers 2006.07691, arXiv.org, revised Oct 2023.
