
Examining the role of training data for supervised methods of automated record linkage: Lessons for best practice in economic history

Authors
  • Feigenbaum, James J
  • Helgertz, Jonas
  • Price, Joseph

Abstract

During the past decade, scholars have produced a vast amount of research using linked historical individual-level data, shaping and changing our understanding of the past. This linked data revolution has been powered by methodological and computational advances, partly focused on supervised machine-learning methods that rely on training data. How much the quality of the training data matters for the performance of the record linkage algorithm, however, remains largely unknown. This paper comprehensively examines the role of training data and, by extension, improves our understanding of best practices in supervised methods of probabilistic record linkage. First, we compare the speed and costs of building training data using different methods. Second, we document high rates of conditional accuracy across the training data sets, rates that are especially high when the data are built with access to more information. Third, we show that data constructed by record linking algorithms learning from different training-data-generation methods do not substantially differ in their accuracy, either overall or across demographic groups, though algorithms tend to perform best when their feature space aligns with the features used to build the training data. Lastly, we introduce errors in the training data and find that the examined record linking algorithms are remarkably capable of making accurate links even when working with flawed training data.

Suggested Citation

  • Feigenbaum, James J & Helgertz, Jonas & Price, Joseph, 2025. "Examining the role of training data for supervised methods of automated record linkage: Lessons for best practice in economic history," Explorations in Economic History, Elsevier, vol. 96(C).
  • Handle: RePEc:eee:exehis:v:96:y:2025:i:c:s0014498325000038
    DOI: 10.1016/j.eeh.2025.101656

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0014498325000038
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hwang, Sam Il Myoung & Squires, Munir, 2024. "Linked samples and measurement error in historical US census data," Explorations in Economic History, Elsevier, vol. 93(C).
    2. Krzysztof Karbownik & Anthony Wray, 2019. "Educational, Labor-market and Intergenerational Consequences of Poor Childhood Health," NBER Working Papers 26368, National Bureau of Economic Research, Inc.
    3. Torsten Santavirta & Jan Stuhler, 2024. "Name-Based Estimators of Intergenerational Mobility," The Economic Journal, Royal Economic Society, vol. 134(663), pages 2982-3016.
    4. Collins, William J. & Zimran, Ariell, 2019. "The economic assimilation of Irish Famine migrants to the United States," Explorations in Economic History, Elsevier, vol. 74(C).
    5. Juliana Jaramillo-Echeverri, 2024. "Movilidad social en la educación: el caso de la Universidad de los Andes en Colombia entre 1949 y 2018," Cuadernos de Historia Económica 61, Banco de la Republica de Colombia.
    6. Zachary Ward, 2023. "Intergenerational Mobility in American History: Accounting for Race and Measurement Error," American Economic Review, American Economic Association, vol. 113(12), pages 3213-3248, December.
    7. Martha J. Bailey & Peter Z. Lin, 2024. "Marital Matching and Women’s Intergenerational Mobility in the Late 19th and Early 20th Century US," NBER Chapters, in: The Economic History of American Inequality: New Evidence and Perspectives, pages 165-196, National Bureau of Economic Research, Inc.
    8. Eric S. M. Protzer & Sultan Orazbayev & Andres Gomez-Lievano & Matte Hartog & Frank Neffke, 2024. "A New Algorithm to Efficiently Match U.S. Census Records and Balance Representativity with Match Quality," Growth Lab Working Papers 238, Harvard's Growth Lab.
    9. Elisa Jácome & Ilyana Kuziemko & Suresh Naidu, 2021. "Mobility for All: Representative Intergenerational Mobility Estimates over the 20th Century," Working Papers 302, Princeton University, Department of Economics, Center for Economic Policy Studies.
    10. Ran Abramitzky & Leah Boustan & Katherine Eriksson & James Feigenbaum & Santiago Pérez, 2021. "Automated Linking of Historical Data," Journal of Economic Literature, American Economic Association, vol. 59(3), pages 865-918, September.
    11. Berger, Thor & Engzell, Per & Eriksson, Björn & Molinder, Jakob, 2023. "Social Mobility in Sweden before the Welfare State," The Journal of Economic History, Cambridge University Press, vol. 83(2), pages 431-463, June.
    12. Dora L. Costa & Coralee Lewis & Noelle Yetter, 2023. "Children and grandchildren of Union Army veterans: New data collections to study the persistence of longevity and socioeconomic status across generations," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 56(4), pages 223-239, October.
    13. Ran Abramitzky & Leah Platt Boustan & Elisa Jácome & Santiago Pérez, 2019. "Intergenerational Mobility of Immigrants over Two Centuries," Working Papers 2019-6, Princeton University, Economics Department.
    14. Ran Abramitzky & Leah Platt Boustan & Elisa Jácome & Santiago Pérez, 2019. "Intergenerational Mobility of Immigrants in the US over Two Centuries," NBER Working Papers 26408, National Bureau of Economic Research, Inc.
    15. Ran Abramitzky & Roy Mill & Santiago Pérez, 2020. "Linking individuals across historical sources: A fully automated approach," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 53(2), pages 94-111, April.
    16. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    17. Daniel Aaronson & Jonathan Davis & Karl Schulze, 2018. "Internal Immigrant Mobility in the Early 20th Century: Experimental Evidence from Galveston Immigrants," Working Paper Series WP-2018-4, Federal Reserve Bank of Chicago.
    18. Catron, Peter, 2017. "The Citizenship Advantage: Immigrant Socioeconomic Attainment across Generations in the First Half of the Twentieth Century," SocArXiv c7k45, Center for Open Science.
    19. Dylan Shane Connor & Michael Storper, 2020. "The changing geography of social mobility in the United States," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 117(48), pages 30309-30317, December.
    20. Bennett, Robert J. & Montebruno, Piero & Van Lieshout, Carry & Smith, Harry, 2022. "Business entry and exit: career changes of proprietors in England and Wales (1851-81) using record-linkage," LSE Research Online Documents on Economics 113867, London School of Economics and Political Science, LSE Library.
