
The University of California ClioMetric History Project and Formatted Optical Character Recognition

Author

  • Bleemer, Zachary

Abstract

In what ways, and to what degree, have universities contributed to the long-run growth, health, economic mobility, and gender/ethnic equity of their students’ communities and home states? The University of California ClioMetric History Project (UC-CHP), based at the Center for Studies in Higher Education, extends prior research on this question in two ways. First, we have developed a novel digitization protocol, formatted optical character recognition (fOCR), which transforms scanned structured and semi-structured texts like university directories and catalogs into high-quality computer-readable databases. We use fOCR to produce annual databases of students (1890s to 1940s), faculty (1900 to present), course descriptions (1900 to present), and detailed budgets (1911 to 2012) for many California universities. Digitized student records, for example, illuminate the high proportion of 1900s university students who were female and from rural areas, as well as large family income differences between male and female students and between students at public and private universities. Second, UC-CHP is working to photograph, process with fOCR, and analyze restricted student administrative records to construct a comprehensive database of California university students and their enrollment behavior. This paper describes UC-CHP’s methodology and provides technical documentation for the project, while also presenting examples of the range of data the project is exploring and prospects for future research.

Suggested Citation

  • Bleemer, Zachary, 2018. "The University of California ClioMetric History Project and Formatted Optical Character Recognition," University of California at Berkeley, Center for Studies in Higher Education qt1xp6g8nj, Center for Studies in Higher Education, UC Berkeley.
  • Handle: RePEc:cdl:cshedu:qt1xp6g8nj

    Download full text from publisher

    File URL: https://www.escholarship.org/uc/item/1xp6g8nj.pdf;origin=repeccitec
    Download Restriction: no

    References listed on IDEAS

    1. Raj Chetty & John N. Friedman & Emmanuel Saez & Nicholas Turner & Danny Yagan, 2017. "Mobility Report Cards: The Role of Colleges in Intergenerational Mobility," Working Papers 2017-059, Human Capital and Economic Opportunity Working Group.
    2. Bleemer, Zachary, 2016. "Role Model Effects of Female STEM Teachers and Doctors on Early 20th Century University Enrollment in California," University of California at Berkeley, Center for Studies in Higher Education qt8nq0z4wb, Center for Studies in Higher Education, UC Berkeley.
    3. Matthew Gentzkow & Jesse Shapiro & Matt Taddy, 2016. "Measuring Polarization in High-Dimensional Data: Method and Application to Congressional Speech," Working Papers id:11114, eSocialSciences.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Bleemer, Zachary & Mehta, Aashish, 2021. "College Major Restrictions and Student Stratification," University of California at Berkeley, Center for Studies in Higher Education qt513249vg, Center for Studies in Higher Education, UC Berkeley.
    2. Bleemer, Zachary, 2024. "How Helpful Are Average Wage-by-Major Statistics in Choosing a Field of Study?," University of California at Berkeley, Center for Studies in Higher Education qt0pf717bh, Center for Studies in Higher Education, UC Berkeley.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bleemer, Zachary, 2018. "The UC ClioMetric History Project and Formatted Optical Character Recognition by Zachary Bleemer, UC Berkeley CSHE 3.18 (February 2018)," University of California at Berkeley, Center for Studies in Higher Education qt2bd8d25p, Center for Studies in Higher Education, UC Berkeley.
    2. Bleemer, Zachary, 2018. "The UC ClioMetric History Project and Formatted Optical Character Recognition," University of California at Berkeley, Center for Studies in Higher Education qt9xz1748q, Center for Studies in Higher Education, UC Berkeley.
    3. Zhifeng Cai & Jonathan Heathcote, 2022. "College Tuition and Income Inequality," American Economic Review, American Economic Association, vol. 112(1), pages 81-121, January.
    4. Charles Angelucci & Julia Cage & Michael Sinkinson, 2020. "Media Competition and News Diets," Sciences Po publications 2020-03, Sciences Po.
    5. Walker, Ian & Zhu, Yu, 2018. "University selectivity and the relative returns to higher education: Evidence from the UK," Labour Economics, Elsevier, vol. 53(C), pages 230-249.
    6. Cagé, Julia, 2020. "Media competition, information provision and political participation: Evidence from French local newspapers and elections, 1944–2014," Journal of Public Economics, Elsevier, vol. 185(C).
    7. Joshua Goodman & Oded Gurantz & Jonathan Smith, 2020. "Take Two! SAT Retaking and College Enrollment Gaps," American Economic Journal: Economic Policy, American Economic Association, vol. 12(2), pages 115-158, May.
    8. Piketty, Thomas & Bozio, Antoine & Garbinti, Bertrand & Goupille-Lebret, Jonathan & Guillot, Malka, 2020. "Predistribution vs. Redistribution: Evidence from France and the U.S.," CEPR Discussion Papers 15415, C.E.P.R. Discussion Papers.
    9. Adam Altmejd & Andrés Barrios-Fernández & Marin Drlje & Joshua Goodman & Michael Hurwitz & Dejan Kovac & Christine Mulhern & Christopher Neilson & Jonathan Smith, 2021. "O Brother, Where Start Thou? Sibling Spillovers on College and Major Choice in Four Countries," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 136(3), pages 1831-1886.
    10. Niklas Potrafke, 2018. "Government ideology and economic policy-making in the United States—a survey," Public Choice, Springer, vol. 174(1), pages 145-207, January.
    11. Qiong Zhu & Junghee Choi & Yi Meng, 2021. "The Impact of No-Loan Policies on Student Economic Diversity at Public Colleges and Universities," Research in Higher Education, Springer;Association for Institutional Research, vol. 62(6), pages 733-764, September.
    12. Amy Hsin & Francesc Ortega, 2018. "The Effects of Deferred Action for Childhood Arrivals on the Educational Outcomes of Undocumented Students," Demography, Springer;Population Association of America (PAA), vol. 55(4), pages 1487-1506, August.
    13. Benjamin Ogden, 2017. "The Imperfect Beliefs Voting Model," Working Papers ECARES ECARES 2017-20, ULB -- Universite Libre de Bruxelles.
    14. Laajaj, Rachid & Moya, Andrés & Sánchez, Fabio, 2022. "Equality of opportunity and human capital accumulation: Motivational effect of a nationwide scholarship in Colombia," Journal of Development Economics, Elsevier, vol. 154(C).
    15. Bertrand Garbinti & Frédérique Savignac, 2020. "Accounting for Intergenerational Wealth Mobility in France over the 20th Century: Method and Estimations," Working papers 776, Banque de France.
    16. Matthew Gentzkow & Jesse M. Shapiro & Matt Taddy, 2019. "Measuring Group Differences in High‐Dimensional Choices: Method and Application to Congressional Speech," Econometrica, Econometric Society, vol. 87(4), pages 1307-1340, July.
    17. Jo Blanden & Matthias Doepke & Jan Stuhler, 2022. "Education inequality," CEP Discussion Papers dp1849, Centre for Economic Performance, LSE.
    18. Weinstein, Russell, 2018. "Employer screening costs, recruiting strategies, and labor market outcomes: An equilibrium analysis of on-campus recruiting," Labour Economics, Elsevier, vol. 55(C), pages 282-299.
    19. Christopher Bennett & Brent Evans & Christopher Marsicano, 2021. "Taken for Granted? Effects of Loan-Reduction Initiatives on Student Borrowing, Admission Metrics, and Campus Diversity," Research in Higher Education, Springer;Association for Institutional Research, vol. 62(5), pages 569-599, August.
    20. Eleanor Wiske Dillon & Jeffrey Andrew Smith, 2020. "The Consequences of Academic Match between Students and Colleges," Journal of Human Resources, University of Wisconsin Press, vol. 55(3), pages 767-808.

    More about this item

    Keywords

    Education; Social and Behavioral Sciences; History of Higher Education; Big Data; Natural Language Processing; University of California;

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.