
Automated Linking of Historical Data

Authors

  • Ran Abramitzky
  • Leah Platt Boustan
  • Katherine Eriksson
  • James J. Feigenbaum
  • Santiago Pérez

Abstract

The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5%) false positive rates. The automated methods trace out a frontier illustrating the tradeoff between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates of false positives. Second, when human linkers and algorithms use the same linking variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are very similar when using linked samples based on each of the different automated methods. We provide code and Stata commands to implement the various automated methods.
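The tradeoff described in the abstract, between the false positive rate and the (true) match rate, can be illustrated with a minimal sketch. This is not the authors' code: the function name `evaluate_links`, the dictionary representation of links, and the rule that a link counts as correct only when a benchmark (for example, a hand-linked sample) contains the same source-target pair are all assumptions made for illustration.

```python
def evaluate_links(auto_links, benchmark_links, n_source_records):
    """Summarize an automated link set against a benchmark link set.

    auto_links and benchmark_links map a source record id to the target
    record id it was linked to (hypothetical representation).
    n_source_records is the number of source records eligible for linking.
    """
    if n_source_records <= 0:
        raise ValueError("n_source_records must be positive")

    n_links = len(auto_links)
    # Assumption: a link is correct only if the benchmark links the same
    # source record to the same target record.
    n_correct = sum(
        1 for src, tgt in auto_links.items() if benchmark_links.get(src) == tgt
    )
    n_false = n_links - n_correct

    return {
        # Share of source records that the algorithm linked at all.
        "match_rate": n_links / n_source_records,
        # Share of source records that the algorithm linked correctly.
        "true_match_rate": n_correct / n_source_records,
        # Share of the algorithm's links that disagree with the benchmark.
        "false_positive_rate": n_false / n_links if n_links else 0.0,
    }


# Toy data: five source records, four hand links, three automated links.
benchmark = {1: "a", 2: "b", 3: "c", 4: "d"}
automated = {1: "a", 2: "b", 3: "x"}
summary = evaluate_links(automated, benchmark, n_source_records=5)
```

Under these assumptions, a more conservative algorithm would make fewer links (a lower match rate) while a larger share of them would agree with the benchmark (a lower false positive rate), which is the frontier the abstract refers to.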

Suggested Citation

  • Ran Abramitzky & Leah Platt Boustan & Katherine Eriksson & James J. Feigenbaum & Santiago Pérez, 2019. "Automated Linking of Historical Data," NBER Working Papers 25825, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:25825
    Note: DAE TWP

    Download full text from publisher

    File URL: http://www.nber.org/papers/w25825.pdf
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ran Abramitzky & Roy Mill & Santiago Pérez, 2020. "Linking individuals across historical sources: A fully automated approach," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 53(2), pages 94-111, April.
    2. Collins, William J. & Zimran, Ariell, 2019. "The economic assimilation of Irish Famine migrants to the United States," Explorations in Economic History, Elsevier, vol. 74(C).
    3. Martha J. Bailey & Connor Cole & Morgan Henderson & Catherine Massey, 2020. "How Well Do Automated Linking Methods Perform? Lessons from US Historical Data," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 997-1044, December.
    4. Karen Clay & Ethan J. Schmick & Werner Troesken, 2020. "The Boll Weevil’s Impact on Racial Income Gaps in the Early Twentieth Century," NBER Working Papers 27101, National Bureau of Economic Research, Inc.
    5. Zachary Ward, 2019. "Internal Migration, Education and Upward Rank Mobility: Evidence from American History," CEH Discussion Papers 04, Centre for Economic History, Research School of Economics, Australian National University.
    6. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    7. Dribe, Martin & Eriksson, Björn & Scalone, Francesco, 2019. "Migration, marriage and social mobility: Women in Sweden 1880–1900," Explorations in Economic History, Elsevier, vol. 71(C), pages 93-111.
    8. Zachary Ward, 2019. "Intergenerational Mobility in American History: Accounting for Race and Measurement Error," CEH Discussion Papers 10, Centre for Economic History, Research School of Economics, Australian National University.
    9. Abramitzky, Ran & Boustan, Leah & Catron, Peter & Connor, Dylan & Voigt, Rob, 2021. "Refugees without Assistance: English-Language Attainment and Economic Outcomes in the Early Twentieth Century," SocArXiv 429jp, Center for Open Science.
    10. Dylan Shane Connor & Michael Storper, 2020. "The changing geography of social mobility in the United States," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 117(48), pages 30309-30317, December.
    11. Ran Abramitzky, 2015. "Economics and the Modern Economic Historian," NBER Working Papers 21636, National Bureau of Economic Research, Inc.
    12. James J. Feigenbaum & Hui Ren Tan, 2019. "The Return to Education in the Mid-20th Century: Evidence from Twins," NBER Working Papers 26407, National Bureau of Economic Research, Inc.
    13. Ran Abramitzky & Leah Platt Boustan & Elisa Jácome & Santiago Pérez, 2019. "Intergenerational Mobility of Immigrants over Two Centuries," Working Papers 2019-6, Princeton University. Economics Department.
    14. Catherine G. Massey, 2016. "Playing with Matches: An Assessment of Accuracy in Linked Historical Data," CARRA Working Papers 2016-05, Center for Economic Studies, U.S. Census Bureau.
    15. Krzysztof Karbownik & Anthony Wray, 2019. "Long-Run Consequences of Exposure to Natural Disasters," Journal of Labor Economics, University of Chicago Press, vol. 37(3), pages 949-1007.
    16. Catron, Peter, 2017. "The Citizenship Advantage: Immigrant Socioeconomic Attainment across Generations in the First Half of the Twentieth Century," SocArXiv c7k45, Center for Open Science.
    17. Elisa Jácome & Ilyana Kuziemko & Suresh Naidu, 2021. "Mobility for All: Representative Intergenerational Mobility Estimates over the 20th Century," Working Papers 302, Princeton University, Department of Economics, Center for Economic Policy Studies.
    18. Cavit Baran & Eric Chyn & Bryan A. Stuart, 2022. "The Great Migration and Educational Opportunity," Upjohn Working Papers 22-367, W.E. Upjohn Institute for Employment Research.
    19. David Andersson & Mounir Karadja & Erik Prawitz, 2022. "Mass Migration and Technological Change [“Immigration in American Economic History.”]," Journal of the European Economic Association, European Economic Association, vol. 20(5), pages 1859-1896.

    More about this item

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • N0 - Economic History - - General
