
How Well Do Automated Linking Methods Perform? Lessons from U.S. Historical Data

Author

Listed:
  • Martha Bailey
  • Connor Cole
  • Morgan Henderson
  • Catherine Massey

Abstract

This paper reviews the literature on historical record linkage in the U.S. and examines the performance of widely used automated record linking algorithms in two high-quality historical datasets and one synthetic ground truth. Focusing on algorithms in current practice, our findings highlight the important effects of linking methods on data quality. We find that (1) no method (including hand-linking) consistently produces representative samples; (2) 15 to 37 percent of links chosen by prominent machine linking algorithms are identified as false links by human reviewers; and (3) these false links are systematically related to baseline sample characteristics, suggesting that machine algorithms may introduce complicated forms of bias into analyses. We find that prominent linking algorithms attenuate estimates of the intergenerational income elasticity by up to 20 percent and common variations in algorithm choices result in greater attenuation. These results suggest that current practice could be improved by placing more emphasis on reducing false links and less emphasis on increasing match rates. We conclude with constructive suggestions for reducing linking errors and directions for future research.

Suggested Citation

  • Martha Bailey & Connor Cole & Morgan Henderson & Catherine Massey, 2017. "How Well Do Automated Linking Methods Perform? Lessons from U.S. Historical Data," NBER Working Papers 24019, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:24019
    Note: AG DAE LS

    Download full text from publisher

    File URL: http://www.nber.org/papers/w24019.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Leah Platt Boustan & Matthew E. Kahn & Paul W. Rhode, 2012. "Moving to Higher Ground: Migration Response to Natural Disasters in the Early Twentieth Century," American Economic Review, American Economic Association, vol. 102(3), pages 238-244, May.
    2. DiNardo, John & Fortin, Nicole M & Lemieux, Thomas, 1996. "Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semiparametric Approach," Econometrica, Econometric Society, vol. 64(5), pages 1001-1044, September.
    3. Hoyt Bleakley & Joseph Ferrie, 2016. "Shocking Behavior: Random Wealth in Antebellum Georgia and Human Capital Across Generations," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 131(3), pages 1455-1495.
    4. Melvin Stephens & Takashi Unayama, 2019. "Estimating the Impacts of Program Benefits: Using Instrumental Variables with Underreported and Imputed Data," The Review of Economics and Statistics, MIT Press, vol. 101(3), pages 468-475, July.
    5. Bhashkar Mazumder, 2005. "Fortunate Sons: New Estimates of Intergenerational Mobility in the United States Using Social Security Earnings Data," The Review of Economics and Statistics, MIT Press, vol. 87(2), pages 235-255, May.
    6. Kasey S. Buckles & Daniel M. Hungerman, 2013. "Season of Birth and Later Outcomes: Old Questions, New Answers," The Review of Economics and Statistics, MIT Press, vol. 95(3), pages 711-724, July.
    7. P. Lahiri & Michael D. Larsen, 2005. "Regression Analysis With Linked Data," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 222-230, March.
    8. Horowitz, Joel L & Manski, Charles F, 1995. "Identification and Robustness with Contaminated and Corrupted Data," Econometrica, Econometric Society, vol. 63(2), pages 281-302, March.
    9. Raj Chetty & Nathaniel Hendren & Patrick Kline & Emmanuel Saez & Nicholas Turner, 2014. "Is the United States Still a Land of Opportunity? Recent Trends in Intergenerational Mobility," American Economic Review, American Economic Association, vol. 104(5), pages 141-147, May.
    10. Abowd, John M. & Vilhuber, Lars, 2005. "The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers," Journal of Business & Economic Statistics, American Statistical Association, vol. 23, pages 133-152, April.
    11. Steven Haider & Gary Solon, 2006. "Life-Cycle Variation in the Association between Current and Lifetime Earnings," American Economic Review, American Economic Association, vol. 96(4), pages 1308-1320, September.
    12. Raj Chetty & Nathaniel Hendren & Patrick Kline & Emmanuel Saez, 2014. "Where is the land of Opportunity? The Geography of Intergenerational Mobility in the United States," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 129(4), pages 1553-1623.
    13. Margo, Robert A., 2016. "Obama, Katrina, and the Persistence of Racial Inequality," The Journal of Economic History, Cambridge University Press, vol. 76(2), pages 301-341, June.
    14. Shari Eli & Laura Salisbury & Allison Shertzer, 2016. "Migration Responses to Conflict: Evidence from the Border of the American Civil War," NBER Working Papers 22591, National Bureau of Economic Research, Inc.
    15. Michael Hout & Avery M. Guest, 2013. "Intergenerational Occupational Mobility in Great Britain and the United States since 1850: Comment," American Economic Review, American Economic Association, vol. 103(5), pages 2021-2040, August.
    16. A'Hearn, Brian & Baten, Jörg & Crayen, Dorothee, 2009. "Quantifying Quantitative Literacy: Age Heaping and the History of Human Capital," The Journal of Economic History, Cambridge University Press, vol. 69(3), pages 783-808, September.
    17. Abramitzky, Ran & Boustan, Leah Platt & Eriksson, Katherine, 2013. "Have the poor always been less likely to migrate? Evidence from inheritance practices during the age of mass migration," Journal of Development Economics, Elsevier, vol. 102(C), pages 2-14.
    18. Dora L. Costa & Heather DeSomer & Eric Hanss & Christopher Roudiez & Sven E. Wilson & Noelle Yetter, 2017. "Union Army veterans, all grown up," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 50(2), pages 79-95, April.
    19. White, Halbert, 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, Econometric Society, vol. 48(4), pages 817-838, May.
    20. Hoyt Bleakley & Joseph P. Ferrie, 2013. "Up from Poverty? The 1832 Cherokee Land Lottery and the Long-run Distribution of Wealth," NBER Working Papers 19175, National Bureau of Economic Research, Inc.
    21. James Heckman & Hidehiko Ichimura & Jeffrey Smith & Petra Todd, 1998. "Characterizing Selection Bias Using Experimental Data," Econometrica, Econometric Society, vol. 66(5), pages 1017-1098, September.
    22. William J. Collins & Marianne H. Wanamaker, 2017. "African American Intergenerational Economic Mobility Since 1880," NBER Working Papers 23395, National Bureau of Economic Research, Inc.
    23. Yu Xie & Alexandra Killewald, 2013. "Intergenerational Occupational Mobility in Great Britain and the United States since 1850: Comment," American Economic Review, American Economic Association, vol. 103(5), pages 2003-2020, August.
    24. Leah Platt Boustan & Carola Frydman & Robert A. Margo, 2014. "Introduction to "Human Capital in History: The American Record"," NBER Chapters, in: Human Capital in History: The American Record, pages 1-14, National Bureau of Economic Research, Inc.
    25. Richard Hornbeck & Suresh Naidu, 2014. "When the Levee Breaks: Black Migration and Economic Development in the American South," American Economic Review, American Economic Association, vol. 104(3), pages 963-990, March.
    26. Catherine G. Massey, 2017. "Playing with matches: An assessment of accuracy in linked historical data," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 50(3), pages 129-143, July.
    27. Ran Abramitzky & Leah Platt Boustan & Katherine Eriksson, 2012. "Europe's Tired, Poor, Huddled Masses: Self-Selection and Economic Outcomes in the Age of Mass Migration," American Economic Review, American Economic Association, vol. 102(5), pages 1832-1856, August.
    28. Leah Platt Boustan & Carola Frydman & Robert A. Margo, 2014. "Human Capital in History: The American Record," NBER Books, National Bureau of Economic Research, Inc, number bous12-1, May.
    29. Solon, Gary, 1999. "Intergenerational mobility in the labor market," Handbook of Labor Economics, in: O. Ashenfelter & D. Card (ed.), Handbook of Labor Economics, edition 1, volume 3, chapter 29, pages 1761-1800, Elsevier.
    30. Collins, William J. & Wanamaker, Marianne H., 2015. "The Great Migration in Black and White: New Evidence on the Selection and Sorting of Southern Migrants," The Journal of Economic History, Cambridge University Press, vol. 75(4), pages 947-992, December.
    31. Maria J. Wisselgren & Sören Edvinsson & Mats Berggren & Maria Larsson, 2014. "Testing Methods of Record Linkage on Swedish Censuses," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 47(3), pages 138-151, September.
    32. Bhashkar Mazumder, 2015. "Estimating the Intergenerational Elasticity and Rank Association in the U.S.: Overcoming the Current Limitations of Tax Data," Working Paper Series WP-2015-4, Federal Reserve Bank of Chicago.
    33. Ran Abramitzky & Leah Platt Boustan & Katherine Eriksson, 2014. "A Nation of Immigrants: Assimilation and Economic Outcomes in the Age of Mass Migration," Journal of Political Economy, University of Chicago Press, vol. 122(3), pages 467-506.
    34. Otis Duncan, 1968. "Patterns of occupational mobility among Negro men," Demography, Springer;Population Association of America (PAA), vol. 5(1), pages 11-22, March.
    35. Solon, Gary, 1992. "Intergenerational Income Mobility in the United States," American Economic Review, American Economic Association, vol. 82(3), pages 393-408, June.
    36. Jørgen Modalsli, 2017. "Intergenerational Mobility in Norway, 1865–2011," Scandinavian Journal of Economics, Wiley Blackwell, vol. 119(1), pages 34-71, January.
    37. Zimmerman, David J, 1992. "Regression toward Mediocrity in Economic Stature," American Economic Review, American Economic Association, vol. 82(3), pages 409-429, June.
    38. Gunky Kim & Raymond Chambers, 2012. "Regression Analysis under Probabilistic Multi‐Linkage," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 66(1), pages 64-79, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zachary Ward, 2023. "Intergenerational Mobility in American History: Accounting for Race and Measurement Error," American Economic Review, American Economic Association, vol. 113(12), pages 3213-3248, December.
    2. Ran Abramitzky & Leah Boustan & Katherine Eriksson & James Feigenbaum & Santiago Pérez, 2021. "Automated Linking of Historical Data," Journal of Economic Literature, American Economic Association, vol. 59(3), pages 865-918, September.
    3. Brantly Callaway & Weige Huang, 2020. "Distributional Effects of a Continuous Treatment with an Application on Intergenerational Mobility," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 82(4), pages 808-842, August.
    4. Martin Nybom & Jan Stuhler, 2017. "Biases in Standard Measures of Intergenerational Income Dependence," Journal of Human Resources, University of Wisconsin Press, vol. 52(3), pages 800-825.
    5. Markus Jäntti & Stephen P. Jenkins, 2013. "Income Mobility," SOEPpapers on Multidisciplinary Panel Data Research 607, DIW Berlin, The German Socio-Economic Panel (SOEP).
    6. Catherine G. Massey, 2016. "Playing with Matches: An Assessment of Accuracy in Linked Historical Data," CARRA Working Papers 2016-05, Center for Economic Studies, U.S. Census Bureau.
    7. Elisa Jácome & Ilyana Kuziemko & Suresh Naidu, 2021. "Mobility for All: Representative Intergenerational Mobility Estimates over the 20th Century," Working Papers 302, Princeton University, Department of Economics, Center for Economic Policy Studies.
    8. Chelsea Murray & Robert Graham Clark & Silvia Mendolia & Peter Siminski, 2018. "Direct Measures of Intergenerational Income Mobility for Australia," The Economic Record, The Economic Society of Australia, vol. 94(307), pages 445-468, December.
    9. Florencia Torche, 2015. "Analyses of Intergenerational Mobility," The ANNALS of the American Academy of Political and Social Science, vol. 657(1), pages 37-62, January.
    10. Bhashkar Mazumder, 2018. "Intergenerational Mobility in the United States: What We Have Learned from the PSID," The ANNALS of the American Academy of Political and Social Science, vol. 680(1), pages 213-234, November.
    11. Huang, Xiao & Huang, Shoujun & Shui, Ailun, 2021. "Government spending and intergenerational income mobility: Evidence from China," Journal of Economic Behavior & Organization, Elsevier, vol. 191(C), pages 387-414.
    12. Chu, Luke Yu-Wei & Lin, Ming-Jen, 2016. "Economic development and intergenerational earnings mobility: Evidence from Taiwan," Working Paper Series 19495, Victoria University of Wellington, School of Economics and Finance.
    13. Inwood, Kris & Minns, Chris & Summerfield, Fraser, 2019. "Occupational income scores and immigrant assimilation. Evidence from the Canadian census," Explorations in Economic History, Elsevier, vol. 72(C), pages 114-122.
    14. Michelle M. Miller & Frank McIntyre, 2020. "Does Money Matter for Intergenerational Income Transmission?," Southern Economic Journal, John Wiley & Sons, vol. 86(3), pages 941-970, January.
    15. Galassi, Gabriela & Koll, David & Mayr, Lukas, 2019. "The Intergenerational Correlation of Employment: Is There a Role for Work Culture?," IZA Discussion Papers 12595, Institute of Labor Economics (IZA).
    16. Chenhong Peng & Paul Siu Fai Yip & Yik Wa Law, 2019. "Intergenerational Earnings Mobility and Returns to Education in Hong Kong: A Developed Society with High Economic Inequality," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 143(1), pages 133-156, May.
    17. Gabriela Galassi & David Koll & Lukas Mayr, 2019. "The Intergenerational Correlation of Employment: Is There a Role for Work Culture?," Staff Working Papers 19-33, Bank of Canada.
    18. Tharcisio Leone, 2019. "The Geography of Intergenerational Mobility: Evidence of Educational Persistence and the "Great Gatsby Curve" in Brazil," Documentos de Trabajo 17526, The Latin American and Caribbean Economic Association (LACEA).
    19. Jaehyun Nam, 2021. "Does Economic Inequality Constrain Intergenerational Economic Mobility? The Association Between Income Inequality During Childhood and Intergenerational Income Persistence in the United States," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 154(2), pages 469-488, April.
    20. Leone, Tharcisio, 2019. "The geography of intergenerational mobility: Evidence of educational persistence and the "Great Gatsby Curve" in Brazil," GIGA Working Papers 318, GIGA German Institute of Global and Area Studies.

    More about this item

    JEL classification:

    • J62 - Labor and Demographic Economics - - Mobility, Unemployment, Vacancies, and Immigrant Workers - - - Job, Occupational and Intergenerational Mobility; Promotion
    • N0 - Economic History - - General
