Using Experiments to Correct for Selection in Observational Studies

Authors

  • Susan Athey
  • Raj Chetty
  • Guido Imbens

Abstract

Researchers increasingly have access to two types of data: (i) large observational datasets where treatment (e.g., class size) is not randomized but several primary outcomes (e.g., graduation rates) and secondary outcomes (e.g., test scores) are observed and (ii) experimental data in which treatment is randomized but only secondary outcomes are observed. We develop a new method to estimate treatment effects on primary outcomes in such settings. We use the difference between the secondary outcome and its predicted value based on the experimental treatment effect to measure selection bias in the observational data. Controlling for this estimate of selection bias yields an unbiased estimate of the treatment effect on the primary outcome under a new assumption that we term latent unconfoundedness, which requires that the same confounders affect the primary and secondary outcomes. Latent unconfoundedness weakens the assumptions underlying commonly used surrogate estimators. We apply our estimator to identify the effect of third grade class size on students' outcomes. Estimated impacts on test scores using OLS regressions in observational school district data have the opposite sign from estimates from the Tennessee STAR experiment. In contrast, selection-corrected estimates in the observational data replicate the experimental estimates. Our estimator reveals that reducing class sizes by 25% increases high school graduation rates by 0.7 percentage points. Controlling for observables does not change the OLS estimates, demonstrating that experimental selection correction can remove biases that cannot be addressed with standard controls.
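
The selection-correction idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator. It is a deliberately simplified linear version with made-up parameters (TAU_S, TAU_Y, the selection rule, and the sample sizes are all hypothetical), and it assumes a single latent confounder that drives both outcomes and enters the secondary outcome without extra noise. Under those assumptions, the secondary outcome minus the experimentally estimated treatment effect acts as a proxy for selection bias, and adding it as a regressor recovers the effect on the primary outcome.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true effects on the secondary (S) and primary (Y) outcomes.
TAU_S, TAU_Y = 0.2, 0.1


def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Experimental sample: treatment is randomized, only S is observed.
n_exp = 50_000
u_exp = rng.normal(size=n_exp)            # latent confounder (harmless here)
w_exp = rng.integers(0, 2, size=n_exp)    # randomized treatment
s_exp = TAU_S * w_exp + u_exp
tau_s_hat = s_exp[w_exp == 1].mean() - s_exp[w_exp == 0].mean()

# Observational sample: units with low u select into treatment.
n_obs = 100_000
u = rng.normal(size=n_obs)
w = (u + rng.normal(size=n_obs) < 0).astype(float)
s = TAU_S * w + u                          # secondary outcome
y = TAU_Y * w + 0.5 * u + rng.normal(size=n_obs)  # primary outcome

# Naive OLS is contaminated by selection on u.
naive_s = ols(s, w)[1]
naive_y = ols(y, w)[1]

# Proxy for selection bias: S minus its experimentally predicted treatment effect.
bias_proxy = s - tau_s_hat * w

# Selection-corrected estimate: control for the proxy.
corrected_y = ols(y, w, bias_proxy)[1]

print("experimental effect on S:", tau_s_hat)
print("naive OLS effect on S:   ", naive_s)
print("naive OLS effect on Y:   ", naive_y)
print("corrected effect on Y:   ", corrected_y)
```

In this toy setup the naive estimates have the wrong sign because treated units have systematically lower values of the latent confounder, while the corrected coefficient lands close to TAU_Y. If the secondary outcome carried its own independent noise, the proxy would measure the confounder with error and this simple regression would no longer remove all of the bias.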

Suggested Citation

  • Susan Athey & Raj Chetty & Guido Imbens, 2020. "Using Experiments to Correct for Selection in Observational Studies," Papers 2006.09676, arXiv.org, revised May 2025.
  • Handle: RePEc:arx:papers:2006.09676

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2006.09676
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Charles F. Manski, 2013. "Response to the Review of ‘Public Policy in an Uncertain World’," Economic Journal, Royal Economic Society, vol. 0, pages 412-415, August.
    2. Joshua D. Angrist & Jörn-Steffen Pischke, 2010. "The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 24(2), pages 3-30, Spring.
    3. Manski, Charles F, 1990. "Nonparametric Bounds on Treatment Effects," American Economic Review, American Economic Association, vol. 80(2), pages 319-323, May.
    4. Angus Deaton, 2010. "Instruments, Randomization, and Learning about Development," Journal of Economic Literature, American Economic Association, vol. 48(2), pages 424-455, June.
    5. Keisuke Hirano & Jack R. Porter, 2009. "Asymptotics for Statistical Treatment Rules," Econometrica, Econometric Society, vol. 77(5), pages 1683-1701, September.
    6. Card, David & Krueger, Alan B, 1994. "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania," American Economic Review, American Economic Association, vol. 84(4), pages 772-793, September.
    7. David Card, 1990. "The Impact of the Mariel Boatlift on the Miami Labor Market," ILR Review, Cornell University, ILR School, vol. 43(2), pages 245-257, January.
    8. Guido W. Imbens & Whitney K. Newey, 2009. "Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity," Econometrica, Econometric Society, vol. 77(5), pages 1481-1512, September.
    9. Ridder, Geert & Moffitt, Robert, 2007. "The Econometrics of Data Combination," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 75, Elsevier.
    10. Rachel Glennerster & Kudzai Takavarasha, 2013. "Running Randomized Evaluations: A Practical Guide," Economics Books, Princeton University Press, edition 1, number 10085.
    11. Alberto Abadie & Guido W. Imbens, 2016. "Matching on the Estimated Propensity Score," Econometrica, Econometric Society, vol. 84, pages 781-807, March.
    12. Zhengyuan Zhou & Susan Athey & Stefan Wager, 2023. "Offline Multi-Action Policy Learning: Generalization and Optimization," Operations Research, INFORMS, vol. 71(1), pages 148-183, January.
    13. Raj Chetty, 2009. "Sufficient Statistics for Welfare Analysis: A Bridge Between Structural and Reduced-Form Methods," Annual Review of Economics, Annual Reviews, vol. 1(1), pages 451-488, May.
    14. Duflo, Esther & Glennerster, Rachel & Kremer, Michael, 2008. "Using Randomization in Development Economics Research: A Toolkit," Handbook of Development Economics, in: T. Paul Schultz & John A. Strauss (ed.), Handbook of Development Economics, edition 1, volume 4, chapter 61, pages 3895-3962, Elsevier.
    15. Charles F. Manski, 2004. "Statistical Treatment Rules for Heterogeneous Populations," Econometrica, Econometric Society, vol. 72(4), pages 1221-1246, July.
    16. Susan Athey & Guido W. Imbens, 2006. "Identification and Inference in Nonlinear Difference-in-Differences Models," Econometrica, Econometric Society, vol. 74(2), pages 431-497, March.
    17. Krueger, Alan B & Whitmore, Diane M, 2001. "The Effect of Attending a Small Class in the Early Grades on College-Test Taking and Middle School Test Results: Evidence from Project STAR," Economic Journal, Royal Economic Society, vol. 111(468), pages 1-28, January.
    18. Guido W. Imbens, 2010. "Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009)," Journal of Economic Literature, American Economic Association, vol. 48(2), pages 399-423, June.
    19. Patrick Kline & Christopher R. Walters, 2019. "On Heckits, LATE, and Numerical Equivalence," Econometrica, Econometric Society, vol. 87(2), pages 677-696, March.
    20. Hotz, V. Joseph & Imbens, Guido W. & Mortimer, Julie H., 2005. "Predicting the efficacy of future training programs using past experiences at other locations," Journal of Econometrics, Elsevier, vol. 125(1-2), pages 241-270.
    21. Dehejia, Rajeev H., 2005. "Program evaluation as a decision problem," Journal of Econometrics, Elsevier, vol. 125(1-2), pages 141-173.
    22. Manski, Charles F., 2013. "Public Policy in an Uncertain World: Analysis and Decisions," Economics Books, Harvard University Press, number 9780674066892, March.
    23. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    24. Anne O. Krueger, "undated". "The Missing Middle," Indian Council for Research on International Economic Relations, New Delhi Working Papers 230, Indian Council for Research on International Economic Relations, New Delhi, India.
    25. Jeffrey M. Wooldridge, 2015. "Control Function Methods in Applied Econometrics," Journal of Human Resources, University of Wisconsin Press, vol. 50(2), pages 420-445.
    26. Athey, Susan & Wager, Stefan, 2017. "Efficient Policy Learning," Research Papers 3506, Stanford University, Graduate School of Business.
    27. Imbens, Guido W. & Rubin, Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881, June.

    Citations

    Cited by:

    1. Jiafeng Chen & David M. Ritzwoller, 2021. "Semiparametric Estimation of Long-Term Treatment Effects," Papers 2107.14405, arXiv.org, revised Aug 2023.
    2. Francisco Blasques & Paolo Gorgi & Siem Jan Koopman & Noah Stegehuis, 2024. "Mitigating Estimation Risk: a Data-Driven Fusion of Experimental and Observational Data," Tinbergen Institute Discussion Papers 24-066/III, Tinbergen Institute.
    3. Ruoxuan Xiong & Allison Koenecke & Michael Powell & Zhu Shen & Joshua T. Vogelstein & Susan Athey, 2021. "Federated Causal Inference in Heterogeneous Observational Data," Papers 2107.11732, arXiv.org, revised Apr 2023.
    4. Guido Imbens & Nathan Kallus & Xiaojie Mao & Yuhao Wang, 2022. "Long-term Causal Inference Under Persistent Confounding via Data Combination," Papers 2202.07234, arXiv.org, revised Aug 2024.
    5. Li, Ting & Shi, Chengchun & Wen, Qianglin & Sui, Yang & Qin, Yongli & Lai, Chunbo & Zhu, Hongtu, 2024. "Combining experimental and historical data for policy evaluation," LSE Research Online Documents on Economics 125588, London School of Economics and Political Science, LSE Library.
    6. X D’Haultfœuille & C Gaillac & A Maurel, 2025. "Partially Linear Models under Data Combination," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 92(1), pages 238-267.
    7. Dmitry Arkhangelsky & Guido Imbens, 2023. "Causal Models for Longitudinal and Panel Data: A Survey," Papers 2311.15458, arXiv.org, revised Jun 2024.
    8. Yechan Park & Yuya Sasaki, 2024. "A Bracketing Relationship for Long-Term Policy Evaluation with Combined Experimental and Observational Data," Papers 2401.12050, arXiv.org.
    9. Harsh Parikh & Marco Morucci & Vittorio Orlandi & Sudeepa Roy & Cynthia Rudin & Alexander Volfovsky, 2023. "A Double Machine Learning Approach to Combining Experimental and Observational Data," Papers 2307.01449, arXiv.org, revised Apr 2024.
    10. Yechan Park & Yuya Sasaki, 2024. "Matching $\leq$ Hybrid $\leq$ Difference in Differences," Papers 2411.07952, arXiv.org, revised Feb 2025.
    11. Tatyana Deryugina & Julian Reif, 2023. "The Long-run Effect of Air Pollution on Survival," NBER Working Papers 31858, National Bureau of Economic Research, Inc.
    12. Xinyu Li & Wang Miao & Fang Lu & Xiao‐Hua Zhou, 2023. "Improving efficiency of inference in clinical trials with external control data," Biometrics, The International Biometric Society, vol. 79(1), pages 394-403, March.
    13. Kyungmin Park & Stephanie Lee & Shahryar Doosti & Yong Tan, 2023. "Provision of helpful review videos: Effects of video characteristics on perceived helpfulness," Production and Operations Management, Production and Operations Management Society, vol. 32(7), pages 2031-2048, July.
    14. George Z. Gui, 2020. "Combining Observational and Experimental Data to Improve Efficiency Using Imperfect Instruments," Papers 2010.05117, arXiv.org, revised Dec 2023.
    15. Carlos Fernández-Loría & Foster Provost, 2022. "Causal Decision Making and Causal Effect Estimation Are Not the Same…and Why It Matters," INFORMS Journal on Data Science, INFORMS, vol. 1(1), pages 4-16, April.
    16. Yechan Park & Yuya Sasaki, 2024. "The Informativeness of Combined Experimental and Observational Data under Dynamic Selection," Papers 2403.16177, arXiv.org.
    17. George Z. Gui, 2024. "Combining Observational and Experimental Data to Improve Efficiency Using Imperfect Instruments," Marketing Science, INFORMS, vol. 43(2), pages 378-391, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    2. Susan Athey & Guido W. Imbens, 2017. "The State of Applied Econometrics: Causality and Policy Evaluation," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 3-32, Spring.
    3. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    4. Susan Athey & Guido Imbens, 2016. "The Econometrics of Randomized Experiments," Papers 1607.00698, arXiv.org.
    5. Peter Hull & Michal Kolesár & Christopher Walters, 2022. "Labor by design: contributions of David Card, Joshua Angrist, and Guido Imbens," Scandinavian Journal of Economics, Wiley Blackwell, vol. 124(3), pages 603-645, July.
    6. Committee, Nobel Prize, 2021. "Answering causal questions using observational data," Nobel Prize in Economics documents 2021-2, Nobel Prize Committee.
    7. Molinari, Francesca, 2020. "Microeconometrics with partial identification," Handbook of Econometrics, in: Steven N. Durlauf & Lars Peter Hansen & James J. Heckman & Rosa L. Matzkin (ed.), Handbook of Econometrics, edition 1, volume 7, chapter 0, pages 355-486, Elsevier.
    8. Jeffrey Smith & Arthur Sweetman, 2016. "Viewpoint: Estimating the causal effects of policies and programs," Canadian Journal of Economics, Canadian Economics Association, vol. 49(3), pages 871-905, August.
    9. Huber, Martin, 2019. "An introduction to flexible methods for policy evaluation," FSES Working Papers 504, Faculty of Economics and Social Sciences, University of Freiburg/Fribourg Switzerland.
    10. Mogstad, Magne & Torgovitsky, Alexander, 2024. "Instrumental variables with unobserved heterogeneity in treatment effects," Handbook of Labor Economics, Elsevier.
    11. Yuehao Bai & Azeem M. Shaikh & Max Tabord-Meehan, 2024. "A Primer on the Analysis of Randomized Experiments and a Survey of some Recent Advances," Papers 2405.03910, arXiv.org, revised Apr 2025.
    12. Alex Eble & Peter Boone & Diana Elbourne, 2017. "On Minimizing the Risk of Bias in Randomized Controlled Trials in Economics," The World Bank Economic Review, World Bank, vol. 31(3), pages 687-707.
    13. van der Klaauw, Bas, 2014. "From micro data to causality: Forty years of empirical labor economics," Labour Economics, Elsevier, vol. 30(C), pages 88-97.
    14. Guido W. Imbens, 2010. "Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009)," Journal of Economic Literature, American Economic Association, vol. 48(2), pages 399-423, June.
    15. Paul Hunermund & Elias Bareinboim, 2019. "Causal Inference and Data Fusion in Econometrics," Papers 1912.09104, arXiv.org, revised Mar 2023.
    16. Davide Viviano, 2019. "Policy Targeting under Network Interference," Papers 1906.10258, arXiv.org, revised Apr 2024.
    17. Guido W. Imbens, 2022. "Causality in Econometrics: Choice vs Chance," Econometrica, Econometric Society, vol. 90(6), pages 2541-2566, November.
    18. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    19. Denis Fougère & Nicolas Jacquemet, 2020. "Policy Evaluation Using Causal Inference Methods," SciencePo Working papers Main hal-03455978, HAL.
    20. Bryan S. Graham & Guido W. Imbens & Geert Ridder, 2014. "Complementarity and aggregate implications of assortative matching: A nonparametric analysis," Quantitative Economics, Econometric Society, vol. 5, pages 29-66, March.
