Printed from https://ideas.repec.org/a/aea/jeclit/v59y2021i3p865-918.html

Automated Linking of Historical Data

Author

Listed:
  • Ran Abramitzky
  • Leah Boustan
  • Katherine Eriksson
  • James Feigenbaum
  • Santiago Pérez

Abstract

The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5 percent) false positive rates. The automated methods trace out a frontier illustrating the trade-off between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates of false positives. Second, when human linkers and algorithms use the same linking variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are very similar when using linked samples based on each of the different automated methods. We provide code and Stata commands to implement the various automated methods.
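The trade-off between the false positive rate and the (true) match rate described in the abstract can be illustrated with a minimal sketch. The definitions used here are assumptions for illustration, not taken from the paper's code: the false positive rate is the share of links made that disagree with a ground-truth link, and the true match rate is the share of all base records that are correctly linked. The function name `linkage_rates` and the toy data are hypothetical.

```python
def linkage_rates(links, truth, n_records):
    """Return (true_match_rate, false_positive_rate) for a set of links.

    links: dict mapping base record id -> linked target id (only records the method linked)
    truth: dict mapping base record id -> correct target id (assumed ground truth)
    n_records: number of records in the base sample
    """
    made = len(links)
    # A link counts as a false positive if it disagrees with the ground truth.
    wrong = sum(1 for rec, tgt in links.items() if truth.get(rec) != tgt)
    false_positive_rate = wrong / made if made else 0.0
    true_match_rate = (made - wrong) / n_records
    return true_match_rate, false_positive_rate


# Toy ground truth for five base records (hypothetical data).
truth = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}

# A conservative method links few records but makes no mistakes.
conservative = {1: "a", 2: "b"}
# A more aggressive method links more records, one of them incorrectly.
aggressive = {1: "a", 2: "b", 3: "c", 4: "x"}

for name, links in [("conservative", conservative), ("aggressive", aggressive)]:
    tmr, fpr = linkage_rates(links, truth, n_records=len(truth))
    print(f"{name}: true match rate = {tmr:.2f}, false positive rate = {fpr:.2f}")
```

In this toy case the aggressive method links more records correctly but at a higher false positive rate, which is the shape of the frontier the abstract refers to.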

Suggested Citation

  • Ran Abramitzky & Leah Boustan & Katherine Eriksson & James Feigenbaum & Santiago Pérez, 2021. "Automated Linking of Historical Data," Journal of Economic Literature, American Economic Association, vol. 59(3), pages 865-918, September.
  • Handle: RePEc:aea:jeclit:v:59:y:2021:i:3:p:865-918
    DOI: 10.1257/jel.20201599

    Download full text from publisher

    File URL: https://www.aeaweb.org/doi/10.1257/jel.20201599
    Download Restriction: no

    File URL: https://www.aeaweb.org/journals/data/icpsr-unavailable
    Download Restriction: no

    File URL: https://www.aeaweb.org/doi/10.1257/jel.20201599.ds
    Download Restriction: Access to full text is restricted to AEA members and institutional subscribers.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ran Abramitzky & Roy Mill & Santiago Pérez, 2020. "Linking individuals across historical sources: A fully automated approach," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 53(2), pages 94-111, April.
    2. Collins, William J. & Zimran, Ariell, 2019. "The economic assimilation of Irish Famine migrants to the United States," Explorations in Economic History, Elsevier, vol. 74(C).
    3. Zachary Ward, 2019. "Internal Migration, Education and Upward Rank Mobility: Evidence from American History," CEH Discussion Papers 04, Centre for Economic History, Research School of Economics, Australian National University.
    4. Martha J. Bailey & Connor Cole & Morgan Henderson & Catherine Massey, 2020. "How Well Do Automated Linking Methods Perform? Lessons from US Historical Data," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 997-1044, December.
    5. Karen Clay & Ethan J. Schmick & Werner Troesken, 2020. "The Boll Weevil’s Impact on Racial Income Gaps in the Early Twentieth Century," NBER Working Papers 27101, National Bureau of Economic Research, Inc.
    6. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    7. Dribe, Martin & Eriksson, Björn & Scalone, Francesco, 2019. "Migration, marriage and social mobility: Women in Sweden 1880–1900," Explorations in Economic History, Elsevier, vol. 71(C), pages 93-111.
    8. Zachary Ward, 2019. "Intergenerational Mobility in American History: Accounting for Race and Measurement Error," CEH Discussion Papers 10, Centre for Economic History, Research School of Economics, Australian National University.
    9. Abramitzky, Ran & Boustan, Leah & Catron, Peter & Connor, Dylan & Voigt, Rob, 2021. "Refugees without Assistance: English-Language Attainment and Economic Outcomes in the Early Twentieth Century," SocArXiv 429jp, Center for Open Science.
    10. Dylan Shane Connor & Michael Storper, 2020. "The changing geography of social mobility in the United States," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 117(48), pages 30309-30317, December.
    11. Catherine G. Massey, 2016. "Playing with Matches: An Assessment of Accuracy in Linked Historical Data," CARRA Working Papers 2016-05, Center for Economic Studies, U.S. Census Bureau.
    12. Ran Abramitzky, 2015. "Economics and the Modern Economic Historian," NBER Working Papers 21636, National Bureau of Economic Research, Inc.
    13. James J. Feigenbaum & Hui Ren Tan, 2019. "The Return to Education in the Mid-20th Century: Evidence from Twins," NBER Working Papers 26407, National Bureau of Economic Research, Inc.
    14. Ran Abramitzky & Leah Platt Boustan & Elisa Jácome & Santiago Pérez, 2019. "Intergenerational Mobility of Immigrants over Two Centuries," Working Papers 2019-6, Princeton University. Economics Department..
    15. Krzysztof Karbownik & Anthony Wray, 2019. "Long-Run Consequences of Exposure to Natural Disasters," Journal of Labor Economics, University of Chicago Press, vol. 37(3), pages 949-1007.
    16. Catron, Peter, 2017. "The Citizenship Advantage: Immigrant Socioeconomic Attainment across Generations in the First Half of the Twentieth Century," SocArXiv c7k45, Center for Open Science.
    17. Catherine G. Massey, 2014. "Creating Linked Historical Data: An Assessment of the Census Bureau’s Ability to Assign Protected Identification Keys to the 1960 Census," CARRA Working Papers 2014-12, Center for Economic Studies, U.S. Census Bureau.
    18. Elisa Jácome & Ilyana Kuziemko & Suresh Naidu, 2021. "Mobility for All: Representative Intergenerational Mobility Estimates over the 20th Century," Working Papers 302, Princeton University, Department of Economics, Center for Economic Policy Studies..
    19. Inwood, Kris & Minns, Chris & Summerfield, Fraser, 2019. "Occupational income scores and immigrant assimilation. Evidence from the Canadian census," Explorations in Economic History, Elsevier, vol. 72(C), pages 114-122.

    More about this item

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C83 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Survey Methods; Sampling Methods
    • N01 - Economic History - - General - - - Development of the Discipline: Historiographical; Sources and Methods
    • N31 - Economic History - - Labor and Consumers, Demography, Education, Health, Welfare, Income, Wealth, Religion, and Philanthropy - - - U.S.; Canada: Pre-1913
    • N32 - Economic History - - Labor and Consumers, Demography, Education, Health, Welfare, Income, Wealth, Religion, and Philanthropy - - - U.S.; Canada: 1913-
