IDEAS home Printed from https://ideas.repec.org/p/nbr/nberwo/32327.html
   My bibliography  Save this paper

Recovering Overlooked Information in Categorical Variables with LLMs: An Application to Labor Market Mismatch

Author

Listed:
  • Yi Chen
  • Hanming Fang
  • Yi Zhao
  • Zibo Zhao

Abstract

Categorical variables have no intrinsic ordering, and researchers often adopt a fixed-effect (FE) approach in empirical analysis. However, this approach has two significant limitations: it overlooks textual labels associated with the categorical variables; and it produces unstable results when there are only limited observations in a category. In this paper, we propose a novel method that utilizes recent advances in large language models (LLMs) to recover overlooked information in categorical variables. We apply this method to investigate labor market mismatch. Specifically, we task LLMs with simulating the role of a human resources specialist to assess the suitability of an applicant with specific characteristics for a given job. Our main findings can be summarized in three parts. First, using comprehensive administrative data from an online job posting platform, we show that our new match quality measure is positively correlated with several traditional measures in the literature, and at the same time, we highlight the LLM's capability to provide additional information conditional on the traditional measures. Second, we demonstrate the broad applicability of the new method with a survey data containing significantly less information than the administrative data, which makes it impossible to compute most of the traditional match quality measures. Our LLM measure successfully replicates most of the salient patterns observed in a hard-to-access administrative dataset using easily accessible survey data. Third, we investigate the gender gap in match quality and explore whether there exists gender stereotypes in the hiring process. We simulate an audit study, examining whether revealing gender information to LLMs influences their assessment. We show that when gender information is disclosed to the GPT, the model deems females better suited for traditionally female-dominated roles.

Suggested Citation

  • Yi Chen & Hanming Fang & Yi Zhao & Zibo Zhao, 2024. "Recovering Overlooked Information in Categorical Variables with LLMs: An Application to Labor Market Mismatch," NBER Working Papers 32327, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:32327
    Note: LS PE
    as

    Download full text from publisher

    File URL: http://www.nber.org/papers/w32327.pdf
    Download Restriction: Access to the full text is generally limited to series subscribers, however if the top level domain of the client browser is in a developing country or transition economy free access is provided. More information about subscriptions and free access is available at http://www.nber.org/wwphelp.html. Free access is also available to older working papers.
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    More about this item

    JEL classification:

    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • J16 - Labor and Demographic Economics - - Demographic Economics - - - Economics of Gender; Non-labor Discrimination
    • J24 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Human Capital; Skills; Occupational Choice; Labor Productivity
    • J31 - Labor and Demographic Economics - - Wages, Compensation, and Labor Costs - - - Wage Level and Structure; Wage Differentials

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:nbr:nberwo:32327. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: the person in charge (email available below). General contact details of provider: https://edirc.repec.org/data/nberrus.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.