
HANA: A handwritten name database for offline handwritten text recognition

Authors

  • Dahl, Christian M.
  • Johansen, Torben S.D.
  • Sørensen, Emil N.
  • Wittrock, Simon

Abstract

Methods for linking individuals across historical data sets, typically in combination with AI-based transcription models, are developing rapidly. Perhaps the single most important identifier for linking is personal names. However, personal names are prone to enumeration and transcription errors, and although modern linking methods are designed to handle such challenges, these sources of error are critical and should be minimized. For this purpose, improved transcription methods and large-scale databases are crucial components. This paper describes and provides documentation for HANA, a newly constructed large-scale database which consists of more than 3.3 million names. The database contains more than 105 thousand unique names with a total of more than 1.1 million images of personal names, which proves useful for transfer learning to other settings. We provide three examples of this, obtaining significantly improved transcription accuracy on both Danish and US census data. In addition, we present benchmark results for deep learning models automatically transcribing the personal names from the scanned documents. By making more challenging large-scale databases publicly available, we hope to foster more sophisticated, accurate, and robust models for handwritten text recognition.

Suggested Citation

  • Dahl, Christian M. & Johansen, Torben S.D. & Sørensen, Emil N. & Wittrock, Simon, 2023. "HANA: A handwritten name database for offline handwritten text recognition," Explorations in Economic History, Elsevier, vol. 87(C).
  • Handle: RePEc:eee:exehis:v:87:y:2023:i:c:s0014498322000511
    DOI: 10.1016/j.eeh.2022.101473

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0014498322000511
    Download Restriction: Full text for ScienceDirect subscribers only



