Combining Experimental and Observational Data to Estimate Treatment Effects on Long Term Outcomes

Authors

  • Susan Athey
  • Raj Chetty
  • Guido Imbens

Abstract

There has been an increase in interest in experimental evaluations to estimate causal effects, partly because their internal validity tends to be high. At the same time, as part of the big data revolution, large, detailed, and representative administrative data sets have become more widely available. However, the credibility of estimates of causal effects based on such data sets alone can be low. In this paper, we develop statistical methods for systematically combining experimental and observational data to obtain credible estimates of the causal effect of a binary treatment on a primary outcome that we only observe in the observational sample. Both the observational and experimental samples contain data about a treatment, observable individual characteristics, and a secondary (often short term) outcome. To estimate the effect of a treatment on the primary outcome while addressing the potential confounding in the observational sample, we propose a method that makes use of estimates of the relationship between the treatment and the secondary outcome from the experimental sample. If assignment to the treatment in the observational sample were unconfounded, we would expect the treatment effects on the secondary outcome in the two samples to be similar. We interpret differences in the estimated causal effects on the secondary outcome between the two samples as evidence of unobserved confounders in the observational sample, and develop control function methods for using those differences to adjust the estimates of the treatment effects on the primary outcome. We illustrate these ideas by combining data on class size and third grade test scores from the Project STAR experiment with observational data on class size and both third and eighth grade test scores from the New York school system.
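
The diagnostic idea in the abstract (comparing the estimated effect on the secondary outcome across the two samples) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's estimator: the data-generating process, the names `simulate` and `diff_in_means`, the true effect `tau_s = 1.0`, and the selection rule are all hypothetical, and the control function adjustment for the primary outcome is not implemented here.

```python
import numpy as np

def diff_in_means(y, w):
    """Difference in mean outcome between treated (w == 1) and control (w == 0) units."""
    return y[w == 1].mean() - y[w == 0].mean()

def simulate(n, confounded, rng, tau_s=1.0):
    """Simulate a binary treatment w and a secondary outcome s.

    Hypothetical data-generating process: u is an unobserved confounder
    that raises the secondary outcome in both samples.
    """
    u = rng.normal(size=n)
    if confounded:
        # Observational sample: units with higher u are more likely to be treated.
        p = 1.0 / (1.0 + np.exp(-1.5 * u))
    else:
        # Experimental sample: treatment is randomized, independent of u.
        p = np.full(n, 0.5)
    w = rng.binomial(1, p)
    s = tau_s * w + u + rng.normal(size=n)
    return w, s

rng = np.random.default_rng(0)
w_exp, s_exp = simulate(20_000, confounded=False, rng=rng)
w_obs, s_obs = simulate(20_000, confounded=True, rng=rng)

tau_exp = diff_in_means(s_exp, w_exp)  # close to the assumed true effect
tau_obs = diff_in_means(s_obs, w_obs)  # inflated by selection on u
gap = tau_obs - tau_exp                # nonzero gap signals unobserved confounding

print(f"experimental estimate:  {tau_exp:.2f}")
print(f"observational estimate: {tau_obs:.2f}")
print(f"gap:                    {gap:.2f}")
```

In this toy setup the experimental estimate recovers the assumed effect, while the observational estimate is biased upward because treated units have systematically higher values of the unobserved `u`. The gap between the two is the kind of discrepancy the abstract interprets as evidence of unobserved confounders.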

Suggested Citation

  • Susan Athey & Raj Chetty & Guido Imbens, 2020. "Combining Experimental and Observational Data to Estimate Treatment Effects on Long Term Outcomes," Papers 2006.09676, arXiv.org.
  • Handle: RePEc:arx:papers:2006.09676

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2006.09676
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Zhengyuan Zhou & Susan Athey & Stefan Wager, 2023. "Offline Multi-Action Policy Learning: Generalization and Optimization," Operations Research, INFORMS, vol. 71(1), pages 148-183, January.
    2. Charles F. Manski, 2013. "Response to the Review of ‘Public Policy in an Uncertain World’," Economic Journal, Royal Economic Society, vol. 0, pages 412-415, August.
    3. Manski, Charles F, 1990. "Nonparametric Bounds on Treatment Effects," American Economic Review, American Economic Association, vol. 80(2), pages 319-323, May.
    4. Joshua D. Angrist & Jörn-Steffen Pischke, 2010. "The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 24(2), pages 3-30, Spring.
    5. Angus Deaton, 2010. "Instruments, Randomization, and Learning about Development," Journal of Economic Literature, American Economic Association, vol. 48(2), pages 424-455, June.
    6. Keisuke Hirano & Jack R. Porter, 2009. "Asymptotics for Statistical Treatment Rules," Econometrica, Econometric Society, vol. 77(5), pages 1683-1701, September.
    7. Card, David & Krueger, Alan B, 1994. "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania," American Economic Review, American Economic Association, vol. 84(4), pages 772-793, September.
    8. Duflo, Esther & Glennerster, Rachel & Kremer, Michael, 2008. "Using Randomization in Development Economics Research: A Toolkit," Handbook of Development Economics, in: T. Paul Schultz & John A. Strauss (ed.), Handbook of Development Economics, edition 1, volume 4, chapter 61, pages 3895-3962, Elsevier.
    9. Dehejia, Rajeev H., 2005. "Program evaluation as a decision problem," Journal of Econometrics, Elsevier, vol. 125(1-2), pages 141-173.
    10. David Card, 1990. "The Impact of the Mariel Boatlift on the Miami Labor Market," ILR Review, Cornell University, ILR School, vol. 43(2), pages 245-257, January.
    11. Guido W. Imbens & Whitney K. Newey, 2009. "Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity," Econometrica, Econometric Society, vol. 77(5), pages 1481-1512, September.
    12. Krueger, Alan B & Whitmore, Diane M, 2001. "The Effect of Attending a Small Class in the Early Grades on College-Test Taking and Middle School Test Results: Evidence from Project STAR," Economic Journal, Royal Economic Society, vol. 111(468), pages 1-28, January.
    13. Ridder, Geert & Moffitt, Robert, 2007. "The Econometrics of Data Combination," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 75, Elsevier.
    14. Guido W. Imbens, 2010. "Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009)," Journal of Economic Literature, American Economic Association, vol. 48(2), pages 399-423, June.
    15. Rachel Glennerster & Kudzai Takavarasha, 2013. "Running Randomized Evaluations: A Practical Guide," Economics Books, Princeton University Press, edition 1, number 10085.
    16. Manski, Charles F., 2013. "Public Policy in an Uncertain World: Analysis and Decisions," Economics Books, Harvard University Press, number 9780674066892, Spring.
    17. Patrick Kline & Christopher R. Walters, 2019. "On Heckits, LATE, and Numerical Equivalence," Econometrica, Econometric Society, vol. 87(2), pages 677-696, March.
    18. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    19. Alberto Abadie & Guido W. Imbens, 2016. "Matching on the Estimated Propensity Score," Econometrica, Econometric Society, vol. 84, pages 781-807, March.
    20. Charles F. Manski, 2004. "Statistical Treatment Rules for Heterogeneous Populations," Econometrica, Econometric Society, vol. 72(4), pages 1221-1246, July.
    21. Susan Athey & Guido W. Imbens, 2006. "Identification and Inference in Nonlinear Difference-in-Differences Models," Econometrica, Econometric Society, vol. 74(2), pages 431-497, March.
    22. Raj Chetty, 2009. "Sufficient Statistics for Welfare Analysis: A Bridge Between Structural and Reduced-Form Methods," Annual Review of Economics, Annual Reviews, vol. 1(1), pages 451-488, May.
    23. Anne O. Krueger, "undated". "The Missing Middle," Indian Council for Research on International Economic Relations, New Delhi Working Papers 230, Indian Council for Research on International Economic Relations, New Delhi, India.
    24. Jeffrey M. Wooldridge, 2015. "Control Function Methods in Applied Econometrics," Journal of Human Resources, University of Wisconsin Press, vol. 50(2), pages 420-445.
    25. Athey, Susan & Wager, Stefan, 2017. "Efficient Policy Learning," Research Papers 3506, Stanford University, Graduate School of Business.
    26. Imbens, Guido W. & Rubin, Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881.
    27. Joseph Hotz, V. & Imbens, Guido W. & Mortimer, Julie H., 2005. "Predicting the efficacy of future training programs using past experiences at other locations," Journal of Econometrics, Elsevier, vol. 125(1-2), pages 241-270.

    Citations

    Cited by:

    1. Jiafeng Chen & David M. Ritzwoller, 2021. "Semiparametric Estimation of Long-Term Treatment Effects," Papers 2107.14405, arXiv.org, revised Aug 2023.
    2. Ruoxuan Xiong & Allison Koenecke & Michael Powell & Zhu Shen & Joshua T. Vogelstein & Susan Athey, 2021. "Federated Causal Inference in Heterogeneous Observational Data," Papers 2107.11732, arXiv.org, revised Apr 2023.
    3. Guido Imbens & Nathan Kallus & Xiaojie Mao & Yuhao Wang, 2022. "Long-term Causal Inference Under Persistent Confounding via Data Combination," Papers 2202.07234, arXiv.org, revised Aug 2023.
    4. Kyungmin Park & Stephanie Lee & Shahryar Doosti & Yong Tan, 2023. "Provision of helpful review videos: Effects of video characteristics on perceived helpfulness," Production and Operations Management, Production and Operations Management Society, vol. 32(7), pages 2031-2048, July.
    5. Xavier D'Haultfœuille & Christophe Gaillac & Arnaud Maurel, 2022. "Partially Linear Models under Data Combination," Papers 2204.05175, arXiv.org, revised Aug 2023.
    6. Dmitry Arkhangelsky & Guido Imbens, 2023. "Causal Models for Longitudinal and Panel Data: A Survey," Papers 2311.15458, arXiv.org, revised Mar 2024.
    7. Yechan Park & Yuya Sasaki, 2024. "A Bracketing Relationship for Long-Term Policy Evaluation with Combined Experimental and Observational Data," Papers 2401.12050, arXiv.org.
    8. George Z. Gui, 2020. "Combining Observational and Experimental Data to Improve Efficiency Using Imperfect Instruments," Papers 2010.05117, arXiv.org, revised Dec 2023.
    9. Harsh Parikh & Marco Morucci & Vittorio Orlandi & Sudeepa Roy & Cynthia Rudin & Alexander Volfovsky, 2023. "A Double Machine Learning Approach to Combining Experimental and Observational Data," Papers 2307.01449, arXiv.org, revised Apr 2024.
    10. Tatyana Deryugina & Julian Reif, 2023. "The Long-run Effect of Air Pollution on Survival," NBER Working Papers 31858, National Bureau of Economic Research, Inc.
    11. Xinyu Li & Wang Miao & Fang Lu & Xiao‐Hua Zhou, 2023. "Improving efficiency of inference in clinical trials with external control data," Biometrics, The International Biometric Society, vol. 79(1), pages 394-403, March.
    12. Carlos Fernández-Loría & Foster Provost, 2022. "Causal Decision Making and Causal Effect Estimation Are Not the Same…and Why It Matters," INFORMS Journal on Data Science, INFORMS, vol. 1(1), pages 4-16, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    2. Susan Athey & Guido W. Imbens, 2017. "The State of Applied Econometrics: Causality and Policy Evaluation," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 3-32, Spring.
    3. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    4. Susan Athey & Guido Imbens, 2016. "The Econometrics of Randomized Experiments," Papers 1607.00698, arXiv.org.
    5. Peter Hull & Michal Kolesár & Christopher Walters, 2022. "Labor by design: contributions of David Card, Joshua Angrist, and Guido Imbens," Scandinavian Journal of Economics, Wiley Blackwell, vol. 124(3), pages 603-645, July.
    6. Committee, Nobel Prize, 2021. "Answering causal questions using observational data," Nobel Prize in Economics documents 2021-2, Nobel Prize Committee.
    7. Jeffrey Smith & Arthur Sweetman, 2016. "Viewpoint: Estimating the causal effects of policies and programs," Canadian Journal of Economics, Canadian Economics Association, vol. 49(3), pages 871-905, August.
    8. Huber, Martin, 2019. "An introduction to flexible methods for policy evaluation," FSES Working Papers 504, Faculty of Economics and Social Sciences, University of Freiburg/Fribourg Switzerland.
    9. Alex Eble & Peter Boone & Diana Elbourne, 2017. "On Minimizing the Risk of Bias in Randomized Controlled Trials in Economics," The World Bank Economic Review, World Bank, vol. 31(3), pages 687-707.
    10. van der Klaauw, Bas, 2014. "From micro data to causality: Forty years of empirical labor economics," Labour Economics, Elsevier, vol. 30(C), pages 88-97.
    11. Guido W. Imbens, 2010. "Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009)," Journal of Economic Literature, American Economic Association, vol. 48(2), pages 399-423, June.
    12. Paul Hunermund & Elias Bareinboim, 2019. "Causal Inference and Data Fusion in Econometrics," Papers 1912.09104, arXiv.org, revised Mar 2023.
    13. Davide Viviano, 2019. "Policy Targeting under Network Interference," Papers 1906.10258, arXiv.org, revised Apr 2024.
    14. Guido W. Imbens, 2022. "Causality in Econometrics: Choice vs Chance," Econometrica, Econometric Society, vol. 90(6), pages 2541-2566, November.
    15. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    16. Denis Fougère & Nicolas Jacquemet, 2020. "Policy Evaluation Using Causal Inference Methods," SciencePo Working papers Main hal-03455978, HAL.
    17. Bryan S. Graham & Guido W. Imbens & Geert Ridder, 2014. "Complementarity and aggregate implications of assortative matching: A nonparametric analysis," Quantitative Economics, Econometric Society, vol. 5, pages 29-66, March.
    18. Takanori Ida & Takunori Ishihara & Koichiro Ito & Daido Kido & Toru Kitagawa & Shosei Sakaguchi & Shusaku Sasaki, 2022. "Choosing Who Chooses: Selection-Driven Targeting in Energy Rebate Programs," NBER Working Papers 30469, National Bureau of Economic Research, Inc.
    19. Takanori Ida & Takunori Ishihara & Koichiro Ito & Daido Kido & Toru Kitagawa & Shosei Sakaguchi & Shusaku Sasaki, 2021. "Paternalism, Autonomy, or Both? Experimental Evidence from Energy Saving Programs," Papers 2112.09850, arXiv.org.
    20. Thoresen, Thor O. & Vattø, Trine E., 2015. "Validation of the discrete choice labor supply model by methods of the new tax responsiveness literature," Labour Economics, Elsevier, vol. 37(C), pages 38-53.
