IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2505.24296.html

Data Fusion for Partial Identification of Causal Effects

Author

Listed:
  • Quinn Lanners
  • Cynthia Rudin
  • Alexander Volfovsky
  • Harsh Parikh

Abstract

Data fusion techniques integrate information from heterogeneous data sources to improve learning, generalization, and decision making across the data sciences. In causal inference, these methods leverage rich observational data to improve causal effect estimation while maintaining the trustworthiness of randomized controlled trials. Existing approaches often relax the strong no-unobserved-confounding assumption by instead assuming exchangeability of counterfactual outcomes across data sources. However, when both assumptions fail simultaneously (a common scenario in practice), current methods cannot identify or estimate causal effects. We address this limitation by proposing a novel partial identification framework that enables researchers to answer key questions such as "Is the causal effect positive or negative?" and "How severe must assumption violations be to overturn this conclusion?" Our approach introduces interpretable sensitivity parameters that quantify assumption violations and derives the corresponding causal effect bounds. We develop doubly robust estimators for these bounds and operationalize breakdown frontier analysis to understand how causal conclusions change as assumption violations increase. We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance. Our analysis reveals that the Project STAR results are robust to simultaneous violations of key assumptions, both on average and across various subgroups of interest. This strengthens confidence in the study's conclusions despite potential unmeasured biases in the data.
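
To make the idea of sensitivity parameters and a breakdown frontier concrete, here is a minimal sketch in Python. It does not reproduce the paper's bounds or estimators. It assumes a deliberately simple toy model in which each assumption violation shifts the effect estimate by at most an additive amount. The names `tau_hat`, `gamma`, `delta`, and all three functions are hypothetical and chosen only for illustration: `gamma` caps the bias from unobserved confounding, and `delta` caps the bias from failed exchangeability across data sources.

```python
from typing import Tuple


def effect_bounds(tau_hat: float, gamma: float, delta: float) -> Tuple[float, float]:
    """Toy bounds (an assumption, not the paper's result): the true effect
    lies within gamma + delta of the point estimate tau_hat."""
    if gamma < 0 or delta < 0:
        raise ValueError("sensitivity parameters must be non-negative")
    slack = gamma + delta
    return tau_hat - slack, tau_hat + slack


def sign_is_identified(tau_hat: float, gamma: float, delta: float) -> bool:
    """The sign is identified when the whole interval excludes zero."""
    lower, upper = effect_bounds(tau_hat, gamma, delta)
    return lower > 0 or upper < 0


def breakdown_delta(tau_hat: float, gamma: float) -> float:
    """For a fixed gamma, the value of delta at which the toy interval
    first touches zero. Tracing this over gamma gives the toy frontier."""
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    return max(abs(tau_hat) - gamma, 0.0)


if __name__ == "__main__":
    tau_hat = 5.0  # hypothetical point estimate, not a Project STAR number
    print(effect_bounds(tau_hat, gamma=1.0, delta=2.0))
    for gamma in (0.0, 1.0, 2.0, 6.0):
        print(gamma, breakdown_delta(tau_hat, gamma))
```

In this toy model the frontier is the line `gamma + delta = |tau_hat|`: any pair of violations strictly below it leaves the sign of the effect determined, and any pair on or above it does not. The paper's actual bounds and doubly robust estimators will generally have a different form.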

Suggested Citation

  • Quinn Lanners & Cynthia Rudin & Alexander Volfovsky & Harsh Parikh, 2025. "Data Fusion for Partial Identification of Causal Effects," Papers 2505.24296, arXiv.org.
  • Handle: RePEc:arx:papers:2505.24296

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2505.24296
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Harsh Parikh & Marco Morucci & Vittorio Orlandi & Sudeepa Roy & Cynthia Rudin & Alexander Volfovsky, 2023. "A Double Machine Learning Approach to Combining Experimental and Observational Data," Papers 2307.01449, arXiv.org, revised Apr 2024.
    2. Dasom Lee & Shu Yang & Lin Dong & Xiaofei Wang & Donglin Zeng & Jianwen Cai, 2023. "Improving trial generalizability using observational studies," Biometrics, The International Biometric Society, vol. 79(2), pages 1213-1225, June.
    3. Ashley L. Buchanan & Michael G. Hudgens & Stephen R. Cole & Katie R. Mollan & Paul E. Sax & Eric S. Daar & Adaora A. Adimora & Joseph J. Eron & Michael J. Mugavero, 2018. "Generalizing evidence from randomized trials using inverse probability of sampling weights," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(4), pages 1193-1209, October.
    4. Naoki Egami & Erin Hartman, 2021. "Covariate selection for generalizing experimental results: Application to a large‐scale development program in Uganda," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1524-1548, October.
    5. Oyenubi, Adeola & Kollamparambil, Umakrishnan, 2023. "Does noncompliance with COVID-19 regulations impact the depressive symptoms of others?," Economic Modelling, Elsevier, vol. 120(C).
    6. Benjamin Lu & Eli Ben-Michael & Avi Feller & Luke Miratrix, 2023. "Is It Who You Are or Where You Are? Accounting for Compositional Differences in Cross-Site Treatment Effect Variation," Journal of Educational and Behavioral Statistics, , vol. 48(4), pages 420-453, August.
    7. Fan Li & Ashley L. Buchanan & Stephen R. Cole, 2022. "Generalizing trial evidence to target populations in non‐nested designs: Applications to AIDS clinical trials," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(3), pages 669-697, June.
    8. Aditya Ghosh & Dominik Rothenhausler, 2025. "Assumption-robust Causal Inference," Papers 2505.08729, arXiv.org.
    9. Rui Chen & Guanhua Chen & Menggang Yu, 2023. "Entropy balancing for causal generalization with target sample summary information," Biometrics, The International Biometric Society, vol. 79(4), pages 3179-3190, December.
    10. Xinyu Li & Wang Miao & Fang Lu & Xiao‐Hua Zhou, 2023. "Improving efficiency of inference in clinical trials with external control data," Biometrics, The International Biometric Society, vol. 79(1), pages 394-403, March.
    11. David M. Phillippo & Anthony E. Ades & Sofia Dias & Stephen Palmer & Keith R. Abrams & Nicky J. Welton, 2018. "Methods for Population-Adjusted Indirect Comparisons in Health Technology Appraisal," Medical Decision Making, , vol. 38(2), pages 200-211, February.
    12. Melody Y Huang & Harsh Parikh, 2024. "Towards Generalizing Inferences from Trials to Target Populations," Papers 2402.17042, arXiv.org, revised May 2024.
    13. Lundberg, Ian & Brand, Jennie E. & Jeon, Nanum, 2022. "Researcher reasoning meets computational capacity: Machine learning for social science," SocArXiv s5zc8, Center for Open Science.
    14. Nicolaj N. Mühlbach, 2020. "Tree-based Synthetic Control Methods: Consequences of moving the US Embassy," CREATES Research Papers 2020-04, Department of Economics and Business Economics, Aarhus University.
    15. Ruoxuan Xiong & Allison Koenecke & Michael Powell & Zhu Shen & Joshua T. Vogelstein & Susan Athey, 2021. "Federated Causal Inference in Heterogeneous Observational Data," Papers 2107.11732, arXiv.org, revised Apr 2023.
    16. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    17. Sallin, Aurelién, 2021. "Estimating returns to special education: combining machine learning and text analysis to address confounding," Economics Working Paper Series 2109, University of St. Gallen, School of Economics and Political Science.
    18. Pedro H. C. Sant'Anna & Xiaojun Song & Qi Xu, 2022. "Covariate distribution balance via propensity scores," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(6), pages 1093-1120, September.
    19. Wendy Chan, 2018. "Applications of Small Area Estimation to Generalization With Subclassification by Propensity Scores," Journal of Educational and Behavioral Statistics, , vol. 43(2), pages 182-224, April.
    20. Michael C. Knaus, 2021. "A double machine learning approach to estimate the effects of musical practice on student’s skills," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 282-300, January.
