
Playing with Matches: An Assessment of Accuracy in Linked Historical Data

Author

Listed:
  • Catherine G. Massey

Abstract

This paper evaluates linkage quality achieved by various record linkage techniques used in historical demography. I create benchmark, or truth, data by linking the 2005 Current Population Survey Annual Social and Economic Supplement to the Social Security Administration’s Numeric Identification System by Social Security Number. By comparing simulated linkages to the benchmark data, I examine the value added (in terms of number and quality of links) from incorporating text-string comparators, adjusting age, and using a probabilistic matching algorithm. I find that text-string comparators and probabilistic approaches are useful for increasing the linkage rate, but use of text-string comparators may decrease accuracy in some cases. Overall, probabilistic matching offers the best balance between linkage rates and accuracy.
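The trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's procedure or data: the records, the 0.85 name threshold, the one-year birth-year band, and the helper names (`jaro_winkler`, `link`, `evaluate`) are all assumptions made for illustration. The shared integer ID stands in for the Social Security Number that defines the benchmark, and the Jaro-Winkler similarity serves as an example text-string comparator. The probabilistic algorithm mentioned in the abstract is not sketched here.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they sit within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def link(file_a, file_b, name_threshold=1.0, year_band=0):
    """Link each record in file_a to file_b when exactly one candidate qualifies."""
    links = {}
    for a_id, a_first, a_last, a_year in file_a:
        candidates = [
            b_id
            for b_id, b_first, b_last, b_year in file_b
            if jaro_winkler(a_first, b_first) >= name_threshold
            and jaro_winkler(a_last, b_last) >= name_threshold
            and abs(a_year - b_year) <= year_band
        ]
        if len(candidates) == 1:  # ambiguous or missing candidates stay unlinked
            links[a_id] = candidates[0]
    return links


def evaluate(links, n_records):
    """Return (linkage rate, share of links that agree with the benchmark ID)."""
    rate = len(links) / n_records
    accuracy = sum(a == b for a, b in links.items()) / len(links) if links else 0.0
    return rate, accuracy


# Hypothetical toy records: (benchmark id, first name, last name, birth year).
FILE_A = [(1, "CATHERINE", "MASSEY", 1980), (2, "JON", "SMITH", 1975),
          (3, "MARIA", "GARCIA", 1990), (4, "ANNE", "LEE", 1960)]
FILE_B = [(1, "CATHERINE", "MASSEY", 1980), (2, "JOHN", "SMITH", 1976),
          (3, "MARIA", "GARCIA", 1990), (5, "ANNA", "LEE", 1960)]

exact = evaluate(link(FILE_A, FILE_B), len(FILE_A))
fuzzy = evaluate(link(FILE_A, FILE_B, name_threshold=0.85, year_band=1), len(FILE_A))
print("exact:", exact)  # (0.5, 1.0)
print("fuzzy:", fuzzy)  # (1.0, 0.75)
```

In this constructed example, loosening the name and birth-year criteria links every record but also admits one false link (ANNE LEE to ANNA LEE, who carry different benchmark IDs). This is the pattern the abstract reports: a higher linkage rate with possibly lower accuracy. The numbers are artifacts of the toy data and say nothing about the magnitudes found in the paper.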

Suggested Citation

  • Catherine G. Massey, 2016. "Playing with Matches: An Assessment of Accuracy in Linked Historical Data," CARRA Working Papers 2016-05, Center for Economic Studies, U.S. Census Bureau.
  • Handle: RePEc:cen:cpaper:2016-05

    Download full text from publisher

    File URL: https://www.census.gov/content/dam/Census/library/working-papers/2016/adrm/carra-wp-2016-05.pdf
    File Function: First version, 2016
    Download Restriction: no


    Citations


    Cited by:

    1. Dupont, Brandon & Rosenbloom, Joshua L., 2018. "The economic origins of the postwar southern elite," Explorations in Economic History, Elsevier, vol. 68(C), pages 119-131.

