Printed from https://ideas.repec.org/a/eee/exehis/v80y2021ics0014498321000024.html

Combining family history and machine learning to link historical records: The Census Tree data set

Author

  • Price, Joseph
  • Buckles, Kasey
  • Van Leeuwen, Jacob
  • Riley, Isaac

Abstract

A key challenge for research on many questions in the social sciences is that it is difficult to link records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we contribute to recent efforts to create these links with a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. We use these “true” links both to inform the decisions one needs to make when using automated methods to link records and as a training data set for use in a supervised machine learning approach. We describe our procedure and illustrate its potential by linking individuals across the 100% samples of the US censuses from 1900, 1910, and 1920. When linking adjacent censuses, we obtain an overall match rate of 62-65 percent (for over 88.9 million matches), with a false positive rate that is around 6-7 percent and with links that are similar to the population along observable characteristics. Thus, our method allows us to link records with a combination of a high match rate, precision, and representativeness that is beyond the current frontier. Finally, we demonstrate the potential of the data by estimating the degree of intergenerational transmission of literacy between father-son and mother-daughter pairs.

Suggested Citation

  • Price, Joseph & Buckles, Kasey & Van Leeuwen, Jacob & Riley, Isaac, 2021. "Combining family history and machine learning to link historical records: The Census Tree data set," Explorations in Economic History, Elsevier, vol. 80(C).
  • Handle: RePEc:eee:exehis:v:80:y:2021:i:c:s0014498321000024
    DOI: 10.1016/j.eeh.2021.101391

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0014498321000024
    Download Restriction: Full text for ScienceDirect subscribers only



    References listed on IDEAS

    1. Catherine G. Massey, 2017. "Playing with matches: An assessment of accuracy in linked historical data," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 50(3), pages 129-143, July.
    2. Fouka, Vasiliki, 2019. "How Do Immigrants Respond to Discrimination? The Case of Germans in the US During World War I," American Political Science Review, Cambridge University Press, vol. 113(2), pages 405-422, May.
    3. Raj Chetty & John N. Friedman & Emmanuel Saez & Nicholas Turner & Danny Yagan, 2017. "Mobility Report Cards: The Role of Colleges in Intergenerational Mobility," NBER Working Papers 23618, National Bureau of Economic Research, Inc.
    4. Raj Chetty & Nathaniel Hendren, 2018. "The Impacts of Neighborhoods on Intergenerational Mobility I: Childhood Exposure Effects," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 133(3), pages 1107-1162.
    5. Collins, William J. & Wanamaker, Marianne H., 2015. "The Great Migration in Black and White: New Evidence on the Selection and Sorting of Southern Migrants," The Journal of Economic History, Cambridge University Press, vol. 75(4), pages 947-992, December.
    6. Ran Abramitzky & Roy Mill & Santiago Pérez, 2020. "Linking individuals across historical sources: A fully automated approach," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 53(2), pages 94-111, April.
    7. Claudia Olivetti & M. Daniele Paserman, 2015. "In the Name of the Son (and the Daughter): Intergenerational Mobility in the United States, 1850-1940," American Economic Review, American Economic Association, vol. 105(8), pages 2695-2724, August.
    8. Ran Abramitzky & Leah Boustan & Katherine Eriksson & James Feigenbaum & Santiago Pérez, 2021. "Automated Linking of Historical Data," Journal of Economic Literature, American Economic Association, vol. 59(3), pages 865-918, September.
    9. James J. Feigenbaum, 2018. "Multiple Measures of Historical Intergenerational Mobility: Iowa 1915 to 1940," Economic Journal, Royal Economic Society, vol. 128(612), pages 446-481, July.
    10. Alexander, Rohan & Ward, Zachary, 2018. "Age at Arrival and Assimilation During the Age of Mass Migration," The Journal of Economic History, Cambridge University Press, vol. 78(3), pages 904-937, September.
    11. Bhashkar Mazumder & Jonathan M. V. Davis, 2013. "Parental Earnings and Children's Well-Being: An Analysis of the Survey of Income and Program Participation Matched to Social Security Administration Earnings Data," Economic Inquiry, Western Economic Association International, vol. 51(3), pages 1795-1808, July.
    12. James Feigenbaum & Daniel P. Gross, 2020. "Answering the Call of Automation: How the Labor Market Adjusted to Mechanizing Telephone Operation," NBER Working Papers 28061, National Bureau of Economic Research, Inc.
    13. Beach, Brian & Ferrie, Joseph & Saavedra, Martin & Troesken, Werner, 2016. "Typhoid Fever, Water Quality, and Human Capital Formation," The Journal of Economic History, Cambridge University Press, vol. 76(1), pages 41-75, March.
    14. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    15. Joseph Price & Kasey Buckles & Jacob Van Leeuwen & Isaac Riley, 2019. "Combining Family History and Machine Learning to Link Historical Records," NBER Working Papers 26227, National Bureau of Economic Research, Inc.
    16. Ran Abramitzky & Leah Platt Boustan & Katherine Eriksson, 2014. "A Nation of Immigrants: Assimilation and Economic Outcomes in the Age of Mass Migration," Journal of Political Economy, University of Chicago Press, vol. 122(3), pages 467-506.
    17. Solon, Gary, 1992. "Intergenerational Income Mobility in the United States," American Economic Review, American Economic Association, vol. 82(3), pages 393-408, June.
    18. Mary F. Evans & Eric Helland & Jonathan Klick & Ashwin Patel, 2016. "The Developmental Effect of State Alcohol Prohibitions at the Turn of the Twentieth Century," Economic Inquiry, Western Economic Association International, vol. 54(2), pages 762-777, April.

    Citations

    Cited by:

    1. Wolfgang Keller & Carol H. Shiue, 2023. "Intergenerational Mobility of Daughters and Marital Sorting: New Evidence from Imperial China," NBER Working Papers 31695, National Bureau of Economic Research, Inc.
    2. Youssouf Merouani & Faustine Perrin, 2022. "Gender and the long-run development process. A survey of the literature," European Review of Economic History, European Historical Economics Society, vol. 26(4), pages 612-641.
    3. Postel, Hannah M., 2022. "Record Linkage for Character-Based Surnames: Evidence from Chinese Exclusion," SocArXiv rckjp, Center for Open Science.
    4. Postel, Hannah M., 2023. "Record linkage for character-based surnames: Evidence from Chinese exclusion," Explorations in Economic History, Elsevier, vol. 87(C).
    5. Anbinder, Tyler & Connor, Dylan & O Grada, Cormac & Wegge, Simone, 2021. "The Problem of False Positives in Automated Census Linking: Evidence from Nineteenth-Century New York's Irish Immigrants," CAGE Online Working Paper Series 568, Competitive Advantage in the Global Economy (CAGE).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joseph Price & Kasey Buckles & Jacob Van Leeuwen & Isaac Riley, 2019. "Combining Family History and Machine Learning to Link Historical Records," NBER Working Papers 26227, National Bureau of Economic Research, Inc.
    2. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    3. Dahl, Christian M. & Johansen, Torben S.D. & Sørensen, Emil N. & Wittrock, Simon, 2023. "HANA: A handwritten name database for offline handwritten text recognition," Explorations in Economic History, Elsevier, vol. 87(C).
    4. Collins, William J. & Zimran, Ariell, 2019. "The economic assimilation of Irish Famine migrants to the United States," Explorations in Economic History, Elsevier, vol. 74(C).
    5. Krzysztof Karbownik & Anthony Wray, 2019. "Educational, Labor-market and Intergenerational Consequences of Poor Childhood Health," NBER Working Papers 26368, National Bureau of Economic Research, Inc.
    6. Elisa Jácome & Ilyana Kuziemko & Suresh Naidu, 2021. "Mobility for All: Representative Intergenerational Mobility Estimates over the 20th Century," Working Papers 302, Princeton University, Department of Economics, Center for Economic Policy Studies.
    7. Ran Abramitzky & Leah Boustan & Katherine Eriksson & James Feigenbaum & Santiago Pérez, 2021. "Automated Linking of Historical Data," Journal of Economic Literature, American Economic Association, vol. 59(3), pages 865-918, September.
    8. Inwood, Kris & Minns, Chris & Summerfield, Fraser, 2019. "Occupational income scores and immigrant assimilation. Evidence from the Canadian census," Explorations in Economic History, Elsevier, vol. 72(C), pages 114-122.
    9. Zhu, Ziming, 2022. "Like father like son? Intergenerational immobility in England, 1851-1911," Economic History Working Papers 117588, London School of Economics and Political Science, Department of Economic History.
    10. Martha J. Bailey & Connor Cole & Morgan Henderson & Catherine Massey, 2020. "How Well Do Automated Linking Methods Perform? Lessons from US Historical Data," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 997-1044, December.
    11. Ran Abramitzky & Leah Platt Boustan & Elisa Jácome & Santiago Pérez, 2019. "Intergenerational Mobility of Immigrants over Two Centuries," Working Papers 2019-6, Princeton University. Economics Department.
    12. Zachary Ward, 2023. "Intergenerational Mobility in American History: Accounting for Race and Measurement Error," American Economic Review, American Economic Association, vol. 113(12), pages 3213-3248, December.
    13. Saavedra, Martin & Twinam, Tate, 2020. "A machine learning approach to improving occupational income scores," Explorations in Economic History, Elsevier, vol. 75(C).
    14. Chong Lu, 2022. "The effect of migration on rural residents’ intergenerational subjective social status mobility in China," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(5), pages 3279-3308, October.
    15. Abramitzky, Ran & Boustan, Leah & Catron, Peter & Connor, Dylan & Voigt, Rob, 2021. "Refugees without Assistance: English-Language Attainment and Economic Outcomes in the Early Twentieth Century," SocArXiv 429jp, Center for Open Science.
    16. Ager, Philipp & Abramitzky, Ran & Boustan, Leah & Cohen, Elior David & Hansen, Casper Worm, 2019. "The Effects of Immigration on the Economy: Lessons from the 1920s Border Closure," CEPR Discussion Papers 14165, C.E.P.R. Discussion Papers.
    17. Margo, Robert A., 2016. "Obama, Katrina, and the Persistence of Racial Inequality," The Journal of Economic History, Cambridge University Press, vol. 76(2), pages 301-341, June.
    18. Zimran, Ariell, 2022. "US immigrants’ secondary migration and geographic assimilation during the Age of Mass Migration," Explorations in Economic History, Elsevier, vol. 85(C).

    More about this item

    Keywords

    Record linking; Genealogy data; Machine learning; Intergenerational transmission;

    JEL classification:

    • N01 - Economic History / General / Development of the Discipline: Historiographical; Sources and Methods
    • N11 - Economic History / Macroeconomics and Monetary Economics; Industrial Structure; Growth; Fluctuations / U.S.; Canada: Pre-1913
    • N12 - Economic History / Macroeconomics and Monetary Economics; Industrial Structure; Growth; Fluctuations / U.S.; Canada: 1913-
    • C8 - Mathematical and Quantitative Methods / Data Collection and Data Estimation Methodology; Computer Programs

