
What Can Be Learned From Empirical Evaluations of Nonexperimental Methods?

Authors

  • Vivian C. Wong
  • Peter M. Steiner
  • Kylie L. Anglin

Abstract

Given the widespread use of nonexperimental (NE) methods for assessing program impacts, there is a strong need to know whether NE approaches yield causally valid results in field settings. In within-study comparison (WSC) designs, the researcher compares treatment effects from an NE design with those obtained from a randomized experiment that shares the same target population. The goal is to assess whether the stringent assumptions required for NE methods are likely to be met in practice. This essay provides an overview of recent efforts to empirically evaluate NE method performance in field settings. We give a brief history of the design, highlighting methodological innovations along the way. We also describe the papers included in this two-volume special issue on WSC approaches and suggest future areas for consideration in the design, implementation, and analysis of WSCs.

Suggested Citation

  • Vivian C. Wong & Peter M. Steiner & Kylie L. Anglin, 2018. "What Can Be Learned From Empirical Evaluations of Nonexperimental Methods?," Evaluation Review, vol. 42(2), pages 147-175, April.
  • Handle: RePEc:sae:evarev:v:42:y:2018:i:2:p:147-175
    DOI: 10.1177/0193841X18776870

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/0193841X18776870
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Andrew P. Jaciw, 2016. "Assessing the Accuracy of Generalized Inferences From Comparison Group Studies Using a Within-Study Comparison Approach," Evaluation Review, vol. 40(3), pages 199-240, June.
    2. Katherine Baicker & Theodore Svoronos, 2019. "Testing the Validity of the Single Interrupted Time Series Design," NBER Working Papers 26080, National Bureau of Economic Research, Inc.
    3. Fatih Unlu & Douglas Lee Lauen & Sarah Crittenden Fuller & Tiffany Berglund & Elc Estrera, 2021. "Can Quasi‐Experimental Evaluations That Rely On State Longitudinal Data Systems Replicate Experimental Results?," Journal of Policy Analysis and Management, John Wiley & Sons, Ltd., vol. 40(2), pages 572-613, March.
    4. Fortson, Kenneth & Gleason, Philip & Kopa, Emma & Verbitsky-Savitz, Natalya, 2015. "Horseshoes, hand grenades, and treatment effects? Reassessing whether nonexperimental estimators are biased," Economics of Education Review, Elsevier, vol. 44(C), pages 100-113.
    5. Ferraro, Paul J. & Miranda, Juan José, 2014. "The performance of non-experimental designs in the evaluation of environmental programs: A design-replication study using a large-scale randomized experiment as a benchmark," Journal of Economic Behavior & Organization, Elsevier, vol. 107(PA), pages 344-365.
    6. Katherine Baicker & Theodore Svoronos, 2019. "Testing the Validity of the Single Interrupted Time Series Design," CID Working Papers 364, Center for International Development at Harvard University.
    7. Flores, Carlos A. & Mitnik, Oscar A., 2009. "Evaluating Nonexperimental Estimators for Multiple Treatments: Evidence from Experimental Data," IZA Discussion Papers 4451, Institute of Labor Economics (IZA).
    8. Ben Weidmann & Luke Miratrix, 2021. "Lurking Inferential Monsters? Quantifying Selection Bias In Evaluations Of School Programs," Journal of Policy Analysis and Management, John Wiley & Sons, Ltd., vol. 40(3), pages 964-986, June.
    9. Kenneth Fortson & Philip Gleason & Emma Kopa & Natalya Verbitsky-Savitz, "undated". "Horseshoes, Hand Grenades, and Treatment Effects? Reassessing Bias in Nonexperimental Estimators," Mathematica Policy Research Reports 1c24988cd5454dd3be51fbc2c, Mathematica Policy Research.
    10. Vivian C. Wong & Peter M. Steiner, 2018. "Designs of Empirical Evaluations of Nonexperimental Methods in Field Settings," Evaluation Review, vol. 42(2), pages 176-213, April.
    11. Elizabeth Ty Wilde & Robinson Hollister, 2007. "How close is close enough? Evaluating propensity score matching using data from a class size reduction experiment," Journal of Policy Analysis and Management, John Wiley & Sons, Ltd., vol. 26(3), pages 455-477.
    12. Justine Burns & Malcolm Keswell & Rebecca Thornton, 2009. "Evaluating the Impact of Health Programmes," SALDRU Working Papers 40, Southern Africa Labour and Development Research Unit, University of Cape Town.
    13. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    14. Robin Jacob & Marie-Andree Somers & Pei Zhu & Howard Bloom, 2016. "The Validity of the Comparative Interrupted Time Series Design for Evaluating the Effect of School-Level Interventions," Evaluation Review, vol. 40(3), pages 167-198, June.
    15. Peter R. Mueser & Kenneth R. Troske & Alexey Gorislavsky, 2007. "Using State Administrative Data to Measure Program Performance," The Review of Economics and Statistics, MIT Press, vol. 89(4), pages 761-783, November.
    16. Lechner, Michael & Wunsch, Conny, 2013. "Sensitivity of matching-based program evaluations to the availability of control variables," Labour Economics, Elsevier, vol. 21(C), pages 111-121.
    17. Henrik Hansen & Ninja Ritter Klejnstrup & Ole Winckler Andersen, 2011. "A Comparison of Model-based and Design-based Impact Evaluations of Interventions in Developing Countries," IFRO Working Paper 2011/16, University of Copenhagen, Department of Food and Resource Economics.
    18. Sudhanshu Handa & John A. Maluccio, 2010. "Matching the Gold Standard: Comparing Experimental and Nonexperimental Evaluation Techniques for a Geographically Targeted Program," Economic Development and Cultural Change, University of Chicago Press, vol. 58(3), pages 415-447, April.
    19. Andrew P. Jaciw, 2016. "Applications of a Within-Study Comparison Approach for Evaluating Bias in Generalized Causal Inferences From Comparison Groups Studies," Evaluation Review, vol. 40(3), pages 241-276, June.
    20. Travis St.Clair & Kelly Hallberg & Thomas D. Cook, 2016. "The Validity and Precision of the Comparative Interrupted Time-Series Design," Journal of Educational and Behavioral Statistics, vol. 41(3), pages 269-299, June.
