Printed from https://ideas.repec.org/p/nbr/nberwo/24324.html

Linking Individuals Across Historical Sources: a Fully Automated Approach

Authors

  • Ran Abramitzky
  • Roy Mill
  • Santiago Pérez

Abstract

Linking individuals across historical datasets relies on information such as name and age that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to find the correct match with certainty. In the first part of the paper, we suggest a fully automated probabilistic method for linking historical datasets that enables researchers to create samples at the frontier of minimizing type I (false positive) and type II (false negative) errors. The first step guides researchers in choosing which variables to use for linking. The second step uses the Expectation-Maximization (EM) algorithm, a standard tool in statistics, to compute the probability that each pair of records corresponds to the same individual. The third step suggests how to use these estimated probabilities to choose which records to use in the analysis. In the second part of the paper, we apply the method to link historical population censuses in the US and Norway, and use these samples to estimate measures of intergenerational occupational mobility. The estimates using our method are remarkably similar to those obtained using IPUMS's method, which relies on hand linking to create a training sample. We provide R code and a Stata command that implement this method.
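
The second and third steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' R or Stata implementation, and their model may differ. It assumes a simple two-class mixture (match versus non-match) in which each candidate pair is summarized by binary agreement indicators (for example first name agrees, last name agrees, age within one year) that are independent within each class. The function names, the input format, the starting values, and the two thresholds `p_high` and `p_low` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def em_match_probabilities(patterns, n_iter=200, tol=1e-9):
    """Fit a two-class mixture (match / non-match) by EM.

    patterns: list of equal-length tuples of 0/1 agreement indicators,
    one tuple per candidate record pair (hypothetical input format).
    Returns (posteriors, (p, m, u)), where posteriors[i] is the estimated
    probability that pair i refers to the same individual.
    """
    n, k = len(patterns), len(patterns[0])
    eps = 1e-6  # keeps estimated probabilities strictly inside (0, 1)

    def clamp(x):
        return min(max(x, eps), 1.0 - eps)

    # Assumed starting values: matches are rare and usually agree.
    p = 0.1          # share of candidate pairs that are true matches
    m = [0.9] * k    # P(field agrees | match)
    u = [0.1] * k    # P(field agrees | non-match)
    post = [p] * n
    for _ in range(n_iter):
        # E-step: posterior match probability of each pair.
        new_post = []
        for g in patterns:
            like_m, like_u = p, 1.0 - p
            for j, agree in enumerate(g):
                like_m *= m[j] if agree else 1.0 - m[j]
                like_u *= u[j] if agree else 1.0 - u[j]
            new_post.append(like_m / (like_m + like_u))
        # M-step: re-estimate parameters as posterior-weighted averages.
        w = min(max(sum(new_post), eps), n - eps)
        p = clamp(w / n)
        for j in range(k):
            agree_m = sum(r * g[j] for r, g in zip(new_post, patterns))
            agree_u = sum((1.0 - r) * g[j] for r, g in zip(new_post, patterns))
            m[j] = clamp(agree_m / w)
            u[j] = clamp(agree_u / (n - w))
        change = max(abs(a - b) for a, b in zip(new_post, post))
        post = new_post
        if change < tol:
            break
    return post, (p, m, u)


def select_links(scored_pairs, p_high=0.9, p_low=0.5):
    """Keep a link only if it is likely and has no close competitor.

    scored_pairs: iterable of (id_a, id_b, probability) triples.
    The two thresholds are illustrative assumptions.
    """
    by_a = defaultdict(list)
    for id_a, id_b, prob in scored_pairs:
        by_a[id_a].append((prob, id_b))
    links = {}
    for id_a, cands in by_a.items():
        cands.sort(reverse=True)
        best_prob, best_b = cands[0]
        runner_up = cands[1][0] if len(cands) > 1 else 0.0
        if best_prob >= p_high and runner_up <= p_low:
            links[id_a] = best_b
    return links


# Made-up agreement patterns for 100 candidate pairs.
patterns = ([(1, 1, 1)] * 20 + [(1, 1, 0)] * 5 + [(1, 0, 0)] * 10
            + [(0, 1, 0)] * 5 + [(0, 0, 0)] * 60)
posteriors, (p_hat, m_hat, u_hat) = em_match_probabilities(patterns)
```

Raising `p_high` or lowering `p_low` in this sketch trades fewer false positives for more missed links, which is the type I versus type II trade-off the abstract refers to. A fuller version would also check uniqueness from the side of the second dataset.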

Suggested Citation

  • Ran Abramitzky & Roy Mill & Santiago Pérez, 2018. "Linking Individuals Across Historical Sources: a Fully Automated Approach," NBER Working Papers 24324, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:24324
    Note: AG DAE LS

    Download full text from publisher

    File URL: http://www.nber.org/papers/w24324.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Ferrie, Joseph P., 1997. "The Entry into the U.S. Labor Market of Antebellum European Immigrants, 1840-1860," Explorations in Economic History, Elsevier, vol. 34(3), pages 295-330, July.
    2. Jason Long & Joseph Ferrie, 2013. "Intergenerational Occupational Mobility in Great Britain and the United States since 1850: Reply," American Economic Review, American Economic Association, vol. 103(5), pages 2041-2049, August.
    3. Shari Eli & Laura Salisbury & Allison Shertzer, 2016. "Migration Responses to Conflict: Evidence from the Border of the American Civil War," NBER Working Papers 22591, National Bureau of Economic Research, Inc.
    4. Kosack, Edward & Ward, Zachary, 2014. "Who Crossed the Border? Self-Selection of Mexican Migrants in the Early Twentieth Century," The Journal of Economic History, Cambridge University Press, vol. 74(4), pages 1015-1044, December.
    5. Hoyt Bleakley & Joseph P. Ferrie, 2013. "Up from Poverty? The 1832 Cherokee Land Lottery and the Long-run Distribution of Wealth," NBER Working Papers 19175, National Bureau of Economic Research, Inc.
    6. William J. Collins & Marianne H. Wanamaker, 2014. "Selection and Economic Gains in the Great Migration of African Americans: New Evidence from Linked Census Data," American Economic Journal: Applied Economics, American Economic Association, vol. 6(1), pages 220-252, January.
    7. Richard Hornbeck & Suresh Naidu, 2014. "When the Levee Breaks: Black Migration and Economic Development in the American South," American Economic Review, American Economic Association, vol. 104(3), pages 963-990, March.
    8. Catherine G. Massey, 2017. "Playing with matches: An assessment of accuracy in linked historical data," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 50(3), pages 129-143, July.
    9. Ran Abramitzky & Leah Platt Boustan & Katherine Eriksson, 2012. "Europe's Tired, Poor, Huddled Masses: Self-Selection and Economic Outcomes in the Age of Mass Migration," American Economic Review, American Economic Association, vol. 102(5), pages 1832-1856, August.
    10. Jason Long & Joseph Ferrie, 2013. "Intergenerational Occupational Mobility in Great Britain and the United States since 1850," American Economic Review, American Economic Association, vol. 103(4), pages 1109-1137, June.
    11. Anna Aizer & Shari Eli & Joseph Ferrie & Adriana Lleras-Muney, 2016. "The Long-Run Impact of Cash Transfers to Poor Families," American Economic Review, American Economic Association, vol. 106(4), pages 935-971, April.
    12. Anna, Petrenko, 2016. "Мaркування готової продукції як складова частина інформаційного забезпечення маркетингової діяльності підприємств овочепродуктового підкомплексу," Agricultural and Resource Economics: International Scientific E-Journal, Agricultural and Resource Economics: International Scientific E-Journal, vol. 2(1), March.
    13. Salisbury, Laura, 2014. "Selective migration, wages, and occupational mobility in nineteenth century America," Explorations in Economic History, Elsevier, vol. 53(C), pages 40-63.
    14. Long, Jason, 2006. "The Socioeconomic Return to Primary Schooling in Victorian England," The Journal of Economic History, Cambridge University Press, vol. 66(4), pages 1026-1053, December.
    15. Martha J. Bailey & Connor Cole & Morgan Henderson & Catherine Massey, 2020. "How Well Do Automated Linking Methods Perform? Lessons from US Historical Data," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 997-1044, December.
    16. Parman, John, 2015. "Childhood health and sibling outcomes: Nurture Reinforcing nature during the 1918 influenza pandemic," Explorations in Economic History, Elsevier, vol. 58(C), pages 22-43.
    17. Emily Nix & Nancy Qian, 2015. "The Fluidity of Race: “Passing” in the United States, 1880-1940," NBER Working Papers 20828, National Bureau of Economic Research, Inc.

    Citations



    Cited by:

    1. Michele Baggio & Metin Cosgel, 2023. "Racial Diversity and Team Performance: Evidence from the American Offshore Whaling Industry," Working papers 2023-04, University of Connecticut, Department of Economics, revised Feb 2024.
    2. Joseph Price & Kasey Buckles & Jacob Van Leeuwen & Isaac Riley, 2019. "Combining Family History and Machine Learning to Link Historical Records," NBER Working Papers 26227, National Bureau of Economic Research, Inc.
    3. Valerie Michelman & Joseph Price & Seth D Zimmerman, 2022. "Old Boys’ Clubs and Upward Mobility Among the Educational Elite," The Quarterly Journal of Economics, Oxford University Press, vol. 137(2), pages 845-909.
    4. Bergeaud, Antonin & Verluise, Cyril, 2024. "A new dataset to study a century of innovation in Europe and in the US," Research Policy, Elsevier, vol. 53(1).
    5. Dupraz, Yannick & Ferrara, Andreas, 2021. "Fatherless: The Long-Term Effects of Losing a Father in the U.S. Civil War," CAGE Online Working Paper Series 538, Competitive Advantage in the Global Economy (CAGE).
    6. Dahl, Christian M. & Johansen, Torben S.D. & Sørensen, Emil N. & Wittrock, Simon, 2023. "HANA: A handwritten name database for offline handwritten text recognition," Explorations in Economic History, Elsevier, vol. 87(C).
    7. Anna Aizer & Shari Eli & Adriana Lleras-Muney & Keyoung Lee, 2020. "Do Youth Employment Programs Work? Evidence from the New Deal," NBER Working Papers 27103, National Bureau of Economic Research, Inc.
    8. Narciso, Gaia & Severgnini, Battista, 2023. "The deep roots of rebellion," Journal of Development Economics, Elsevier, vol. 160(C).
    9. Luque de Haro, Víctor A. & Pujadas-Mora, Joana M. & García-Gómez, José J., 2021. "Inequality in mortality in pre-industrial southern Europe during an epidemic episode: socio-economic determinants (eighteenth - nineteenth centuries Spain)," Economics & Human Biology, Elsevier, vol. 40(C).
    10. Bennett, Robert J. & Montebruno, Piero & Van Lieshout, Carry & Smith, Harry, 2022. "Business entry and exit: career changes of proprietors in England and Wales (1851-81) using record-linkage," LSE Research Online Documents on Economics 113867, London School of Economics and Political Science, LSE Library.
    11. Price, Joseph & Buckles, Kasey & Van Leeuwen, Jacob & Riley, Isaac, 2021. "Combining family history and machine learning to link historical records: The Census Tree data set," Explorations in Economic History, Elsevier, vol. 80(C).
    12. Alexander, Monica, 2018. "Deaths without denominators: using a matched dataset to study mortality patterns in the United States," SocArXiv q79ye, Center for Open Science.
    13. Anbinder, Tyler & Connor, Dylan & O Grada, Cormac & Wegge, Simone, 2021. "The Problem of False Positives in Automated Census Linking: Evidence from Nineteenth-Century New York's Irish Immigrants," CAGE Online Working Paper Series 568, Competitive Advantage in the Global Economy (CAGE).
    14. Zhu, Ziming, 2022. "Like father like son? Intergenerational immobility in England, 1851-1911," Economic History Working Papers 117588, London School of Economics and Political Science, Department of Economic History.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ran Abramitzky & Leah Boustan & Katherine Eriksson & James Feigenbaum & Santiago Pérez, 2021. "Automated Linking of Historical Data," Journal of Economic Literature, American Economic Association, vol. 59(3), pages 865-918, September.
    2. Krzysztof Karbownik & Anthony Wray, 2019. "Long-Run Consequences of Exposure to Natural Disasters," Journal of Labor Economics, University of Chicago Press, vol. 37(3), pages 949-1007.
    3. Collins, William J. & Zimran, Ariell, 2019. "The economic assimilation of Irish Famine migrants to the United States," Explorations in Economic History, Elsevier, vol. 74(C).
    4. Catherine G. Massey, 2016. "Playing with Matches: An Assessment of Accuracy in Linked Historical Data," CARRA Working Papers 2016-05, Center for Economic Studies, U.S. Census Bureau.
    5. Krzysztof Karbownik & Anthony Wray, 2019. "Educational, Labor-market and Intergenerational Consequences of Poor Childhood Health," NBER Working Papers 26368, National Bureau of Economic Research, Inc.
    6. Ran Abramitzky, 2015. "Economics and the Modern Economic Historian," NBER Working Papers 21636, National Bureau of Economic Research, Inc.
    7. Inwood, Kris & Minns, Chris & Summerfield, Fraser, 2019. "Occupational income scores and immigrant assimilation. Evidence from the Canadian census," Explorations in Economic History, Elsevier, vol. 72(C), pages 114-122.
    8. Martha J. Bailey & Connor Cole & Morgan Henderson & Catherine Massey, 2020. "How Well Do Automated Linking Methods Perform? Lessons from US Historical Data," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 997-1044, December.
    9. Karen Clay & Ethan J. Schmick, 2020. "The Impact of an Environmental Shock on Black-White Inequality: Evidence from the Boll Weevil," NBER Working Papers 27101, National Bureau of Economic Research, Inc.
    10. Dora Costa & CoraLee Lewis & Noelle Yetter, 2022. "Children and Grandchildren of Union Army Veterans: New Data Collections to Study the Persistence of Longevity and Socioeconomic Status Across Generations," NBER Working Papers 30747, National Bureau of Economic Research, Inc.
    11. Dribe, Martin & Eriksson, Björn & Scalone, Francesco, 2019. "Migration, marriage and social mobility: Women in Sweden 1880–1900," Explorations in Economic History, Elsevier, vol. 71(C), pages 93-111.
    12. Zachary Ward, 2019. "Internal Migration, Education and Upward Rank Mobility: Evidence from American History," CEH Discussion Papers 04, Centre for Economic History, Research School of Economics, Australian National University.
    13. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    14. Catron, Peter, 2017. "The Citizenship Advantage: Immigrant Socioeconomic Attainment across Generations in the First Half of the Twentieth Century," SocArXiv c7k45, Center for Open Science.
    15. Dylan Shane Connor & Michael Storper, 2020. "The changing geography of social mobility in the United States," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 117(48), pages 30309-30317, December.
    16. Bennett, Robert J. & Montebruno, Piero & Van Lieshout, Carry & Smith, Harry, 2022. "Business entry and exit: career changes of proprietors in England and Wales (1851-81) using record-linkage," LSE Research Online Documents on Economics 113867, London School of Economics and Political Science, LSE Library.
    17. Elisa Jácome & Ilyana Kuziemko & Suresh Naidu, 2021. "Mobility for All: Representative Intergenerational Mobility Estimates over the 20th Century," Working Papers 302, Princeton University, Department of Economics, Center for Economic Policy Studies.
    18. David Andersson & Mounir Karadja & Erik Prawitz, 2022. "Mass Migration and Technological Change," Journal of the European Economic Association, European Economic Association, vol. 20(5), pages 1859-1896.
    19. Julián Costas-Fernández & José-Alberto Guerra & Myra Mohnen, 2020. "Train to Opportunity: the Effect of Infrastructure on Intergenerational Mobility," Documentos CEDE 18591, Universidad de los Andes, Facultad de Economía, CEDE.
    20. Giacomin Favre, 2019. "Bias in social mobility estimates with historical data: evidence from Swiss microdata," ECON - Working Papers 329, Department of Economics - University of Zurich.

    More about this item

    JEL classification:

    • C10 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - General
    • J01 - Labor and Demographic Economics - - General - - - Labor Economics: General
    • J10 - Labor and Demographic Economics - - Demographic Economics - - - General
    • N00 - Economic History - - General - - - General
