
Administrative Data Linking and Statistical Power Problems in Randomized Experiments

Authors

  • Sarah Tahamont
  • Zubin Jelveh
  • Aaron Chalfin
  • Shi Yan
  • Benjamin Hansen

Abstract

Objective: The increasing availability of large administrative datasets has led to a particularly exciting innovation in criminal justice research: the “low-cost” randomized trial, in which administrative data are used to measure outcomes in lieu of costly primary data collection. In this paper, we point out that randomized experiments that rely on administrative data have an unfortunate consequence: the destruction of statistical power. Linking data from an experimental intervention to administrative records that track outcomes of interest typically requires matching datasets without a common unique identifier. To minimize mistaken linkages, researchers often use “exact matching” (retaining an individual only if all of their demographic variables match exactly in two or more datasets), so that speculative matches do not introduce errors into the analytic dataset.

Methods: We derive an analytic result for the consequences of linking errors on statistical power and show how the problem varies across different combinations of relevant inputs, including the matching error rate, the outcome density, and the sample size.

Results: We show that this seemingly conservative approach leads to underpowered experiments and potentially to the failure of entire experimental literatures. For marginally powered studies, which are common in empirical social science, exact matching is particularly problematic.

Conclusions: We conclude on an optimistic note by showing that simple machine learning-based probabilistic matching algorithms allow criminal justice researchers to recover a considerable share of the statistical power that is lost to errors in data linking.
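
Illustrative sketch (not taken from the paper): the dependence of power on the matching error rate, the outcome density, and the sample size can be explored with a simple normal-approximation calculation. The sketch below is not the authors' analytic result. It assumes a binary outcome, a two-sided two-sample z-test for proportions, and linking errors that take the form of missed links only, independent of treatment status, with every unlinked person recorded as having no outcome. The function names and the parameter `miss_rate` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treat, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the unpooled normal approximation with equal arm sizes.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treat * (1 - p_treat) / n_per_arm
    )
    shift = abs(p_treat - p_control) / se
    # Probability of rejecting in either tail under the alternative.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def power_with_missed_links(p_control, p_treat, n_per_arm, miss_rate, alpha=0.05):
    """Power when a share `miss_rate` of true outcomes fails to link.

    Assumption (ours, for illustration): a missed link is recorded as
    "no outcome", so each observed outcome rate is scaled by (1 - miss_rate).
    """
    keep = 1 - miss_rate
    return power_two_proportions(p_control * keep, p_treat * keep, n_per_arm, alpha)


if __name__ == "__main__":
    # Hypothetical design: 30% vs 25% outcome rates, 1,000 people per arm.
    for miss_rate in (0.0, 0.1, 0.2, 0.3):
        power = power_with_missed_links(0.30, 0.25, 1000, miss_rate)
        print(f"missed-link rate {miss_rate:.1f}: power = {power:.3f}")
```

Under these assumptions, missed links shrink both the observed outcome rates and the gap between arms, so power falls as the missed-link rate rises. That is the qualitative pattern the abstract describes for exact matching.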

Suggested Citation

  • Sarah Tahamont & Zubin Jelveh & Aaron Chalfin & Shi Yan & Benjamin Hansen, 2019. "Administrative Data Linking and Statistical Power Problems in Randomized Experiments," NBER Working Papers 25657, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:25657
    Note: TWP

    Download full text from publisher

    File URL: http://www.nber.org/papers/w25657.pdf
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jeffrey Smith & Arthur Sweetman, 2016. "Viewpoint: Estimating the causal effects of policies and programs," Canadian Journal of Economics, Canadian Economics Association, vol. 49(3), pages 871-905, August.
    2. Onur Altindag & Theodore J. Joyce & Julie A. Reeder, 2015. "Effects of Peer Counseling to Support Breastfeeding: Assessing the External Validity of a Randomized Field Experiment," NBER Working Papers 21013, National Bureau of Economic Research, Inc.
    3. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    4. Guido W. Imbens, 2010. "Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009)," Journal of Economic Literature, American Economic Association, vol. 48(2), pages 399-423, June.
    5. Susan Athey & Guido Imbens, 2016. "The Econometrics of Randomized Experiments," Papers 1607.00698, arXiv.org.
    6. Nobel Prize Committee, 2021. "Answering causal questions using observational data," Nobel Prize in Economics documents 2021-2, Nobel Prize Committee.
    7. Anna Aizer & Shari Eli & Adriana Lleras-Muney & Keyoung Lee, 2020. "Do Youth Employment Programs Work? Evidence from the New Deal," NBER Working Papers 27103, National Bureau of Economic Research, Inc.
    8. Peter Hull & Michal Kolesár & Christopher Walters, 2022. "Labor by design: contributions of David Card, Joshua Angrist, and Guido Imbens," Scandinavian Journal of Economics, Wiley Blackwell, vol. 124(3), pages 603-645, July.
    9. Dupraz, Yannick & Ferrara, Andreas, 2021. "Fatherless: The Long-Term Effects of Losing a Father in the U.S. Civil War," CAGE Online Working Paper Series 538, Competitive Advantage in the Global Economy (CAGE).
    10. Bhuller, Manudeep & Dahl, Gordon B & Løken, Katrine V. & Mogstad, Magne, 2018. "Incarceration Spillovers in Criminal and Family Networks," Discussion Paper Series in Economics 15/2018, Norwegian School of Economics, Department of Economics.
    11. Nicolas R. Ziebarth, 2018. "Social Insurance and Health," Contributions to Economic Analysis, in: Health Econometrics, volume 127, pages 57-84, Emerald Group Publishing Limited.
    12. Franziska Kugler & Guido Schwerdt & Ludger Wößmann, 2014. "Ökonometrische Methoden zur Evaluierung kausaler Effekte der Wirtschaftspolitik," Perspektiven der Wirtschaftspolitik, De Gruyter, vol. 15(2), pages 105-132, June.
    13. Yonatan Eyal, 2020. "Self-Assessment Variables as a Source of Information in the Evaluation of Intervention Programs: A Theoretical and Methodological Framework," SAGE Open, vol. 10(1), pages 21582440198, January.
    14. W. Bentley MacLeod, 2017. "Viewpoint: The human capital approach to inference," Canadian Journal of Economics, Canadian Economics Association, vol. 50(1), pages 5-39, February.
    15. Jesse Rothstein & Till von Wachter, 2016. "Social Experiments in the Labor Market," NBER Working Papers 22585, National Bureau of Economic Research, Inc.
    16. Sebastian Galiani & Juan Pantano, 2021. "Structural Models: Inception and Frontier," NBER Working Papers 28698, National Bureau of Economic Research, Inc.
    17. Ravallion, Martin, 2008. "Evaluating Anti-Poverty Programs," Handbook of Development Economics, in: T. Paul Schultz & John A. Strauss (ed.), Handbook of Development Economics, edition 1, volume 4, chapter 59, pages 3787-3846, Elsevier.
    18. James J. Heckman, 2005. "Micro Data, Heterogeneity and the Evaluation of Public Policy Part 2," The American Economist, Sage Publications, vol. 49(1), pages 16-44, March.
    19. Thoresen, Thor O. & Vattø, Trine E., 2015. "Validation of the discrete choice labor supply model by methods of the new tax responsiveness literature," Labour Economics, Elsevier, vol. 37(C), pages 38-53.
    20. Duo Qin & Yanqun Zhang, 2013. "A History of Polyvalent Structural Parameters: the Case of Instrument Variable Estimators," Working Papers 183, Department of Economics, SOAS University of London, UK.

    More about this item

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
    • C12 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Hypothesis Testing: General
    • K42 - Law and Economics - - Legal Procedure, the Legal System, and Illegal Behavior - - - Illegal Behavior and the Enforcement of Law
