Recovering Overlooked Information in Categorical Variables with LLMs: An Application to Labor Market Mismatch

Authors

  • Yi Chen (ShanghaiTech University)
  • Hanming Fang (University of Pennsylvania)
  • Yi Zhao (Tsinghua University)
  • Zibo Zhao (ShanghaiTech University)

Abstract

Categorical variables have no intrinsic ordering, and researchers often adopt a fixed-effect (FE) approach in empirical analysis. However, this approach has two significant limitations: it overlooks the textual information associated with the categorical variables, and it produces unstable results when a category contains only a few observations. In this paper, we propose a novel method that uses recent advances in large language models (LLMs) to recover overlooked information in categorical variables. We apply this method to investigate labor market mismatch. Specifically, we task LLMs with simulating the role of a human resources specialist who assesses the suitability of an applicant with specific characteristics for a given job. Our main findings can be summarized in three parts. First, using comprehensive administrative data from an online job posting platform, we show that our new match quality measure is positively correlated with several traditional measures in the literature, and we highlight the LLM’s capability to provide additional information beyond that contained in the traditional measures. Second, we demonstrate the broad applicability of the new method with survey data that contain significantly less information than the administrative data, which makes it impossible to compute most of the traditional match quality measures. Using the easily accessible survey data, our LLM measure successfully replicates most of the salient patterns observed in the hard-to-access administrative dataset. Third, we investigate the gender gap in match quality and explore whether gender stereotypes exist in the hiring process. We simulate an audit study, examining whether revealing gender information to LLMs influences their assessment. We show that when gender information is disclosed, the LLM deems females better suited for traditionally female-dominated roles.
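
The assessment and audit steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the prompt wording, the applicant fields, the rating scale, or the model used, so `Applicant`, `build_prompt`, `audit_gap`, the 1 to 10 scale, and the `ask_llm` callable are all hypothetical names and choices. `stub_llm` is a placeholder that returns a constant; a real study would replace it with a call to an actual LLM.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Applicant:
    # Hypothetical applicant characteristics; the paper's actual fields are not listed in the abstract.
    education: str
    major: str
    experience_years: int
    gender: Optional[str] = None  # only shown to the model in the "revealed" arm


def build_prompt(applicant: Applicant, job_title: str, reveal_gender: bool) -> str:
    """Build an HR-specialist style prompt (assumed wording, for illustration only)."""
    lines = [
        "You are a human resources specialist.",
        f"Job: {job_title}",
        f"Applicant education: {applicant.education}",
        f"Applicant major: {applicant.major}",
        f"Applicant experience (years): {applicant.experience_years}",
    ]
    if reveal_gender and applicant.gender is not None:
        lines.append(f"Applicant gender: {applicant.gender}")
    lines.append(
        "Rate the applicant's suitability for this job from 1 to 10. "
        "Reply with a number only."
    )
    return "\n".join(lines)


def audit_gap(
    applicants: Iterable[Applicant],
    job_title: str,
    ask_llm: Callable[[str], float],
) -> float:
    """Mean score change when gender is revealed versus withheld, for the same applicants."""
    diffs = []
    for applicant in applicants:
        blind = ask_llm(build_prompt(applicant, job_title, reveal_gender=False))
        revealed = ask_llm(build_prompt(applicant, job_title, reveal_gender=True))
        diffs.append(revealed - blind)
    return mean(diffs)


def stub_llm(prompt: str) -> float:
    # Placeholder scorer: returns a constant, so any gap computed with it is zero by construction.
    return 5.0
```

Passing the scorer in as a callable keeps the paired design (same applicant, same job, gender shown or hidden) separate from whichever model produces the score, so the comparison logic can be checked without network access.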

Suggested Citation

  • Yi Chen & Hanming Fang & Yi Zhao & Zibo Zhao, 2024. "Recovering Overlooked Information in Categorical Variables with LLMs: An Application to Labor Market Mismatch," PIER Working Paper Archive 24-017, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
  • Handle: RePEc:pen:papers:24-017

    Download full text from publisher

    File URL: https://economics.sas.upenn.edu/system/files/working-papers/24-017%20PIER%20Paper%20Submission.pdf
    Download Restriction: no

    Citations

    Cited by:

    1. Hanming Fang & Ming Li & Guangli Lu, 2025. "Decoding China’s Industrial Policies," PIER Working Paper Archive 25-012, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
    2. Herbert Dawid & Philipp Harting & Hankui Wang & Zhongli Wang & Jiachen Yi, 2025. "Agentic Workflows for Economic Research: Design and Implementation," Papers 2504.09736, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Christoph Riedl & Eric Bogert, 2024. "Effects of AI Feedback on Learning, the Skill Gap, and Intellectual Diversity," Papers 2409.18660, arXiv.org.
    2. Carvajal, Daniel & Franco, Catalina & Isaksson, Siri, 2024. "Will Artificial Intelligence Get in the Way of Achieving Gender Equality?," Discussion Paper Series in Economics 3/2024, Norwegian School of Economics, Department of Economics, revised 28 Apr 2025.
    3. Naomi Hausman & Oren Rigbi & Sarit Weisburd, 2025. "Generative AI’s Impact on Student Achievement and Implications for Worker Productivity," CESifo Working Paper Series 11843, CESifo.
    4. Evangelos Katsamakas & Oleg V. Pavlov & Ryan Saklad, 2024. "Artificial intelligence and the transformation of higher education institutions," Papers 2402.08143, arXiv.org.
    5. Yang Shen, 2024. "Future jobs: analyzing the impact of artificial intelligence on employment and its mechanisms," Economic Change and Restructuring, Springer, vol. 57(2), pages 1-33, April.
    6. Jai Vipra & Anton Korinek, 2023. "Market Concentration Implications of Foundation Models," Papers 2311.01550, arXiv.org.
    7. Kristina McElheran & J. Frank Li & Erik Brynjolfsson & Zachary Kroff & Emin Dinlersoz & Lucia Foster & Nikolas Zolas, 2024. "AI adoption in America: Who, what, and where," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 33(2), pages 375-415, March.
    8. Leonardo Banh & Gero Strobel, 2023. "Generative artificial intelligence," Electronic Markets, Springer;IIM University of St. Gallen, vol. 33(1), pages 1-17, December.
    9. Draca, Mirko & Nathan, Max & Nguyen-Tien, Viet & Oliveira-Cunha, Juliana & Rosso, Anna & Valero, Anna, 2024. "The New Wave? The Role of Human Capital and STEM Skills in Technology Adoption in the UK," The Warwick Economics Research Paper Series (TWERPS) 1521, University of Warwick, Department of Economics.
    10. Berlinski, Elise & Morales, Jérémy & Sponem, Samuel, 2024. "Artificial imaginaries: Generative AIs as an advanced form of capitalism," CRITICAL PERSPECTIVES ON ACCOUNTING, Elsevier, vol. 99(C).
    11. Lan Chen & Yufei Ji & Xichen Yao & Hengshu Zhu, 2024. "Occupation Life Cycle," Papers 2406.15373, arXiv.org.
    12. Acar, Oguz A., 2024. "Commentary: Reimagining marketing education in the age of generative AI," International Journal of Research in Marketing, Elsevier, vol. 41(3), pages 489-495.
    13. Frank M. Fossen & Trevor McLemore & Alina Sorgner, 2024. "Artificial Intelligence and Entrepreneurship," Foundations and Trends(R) in Entrepreneurship, now publishers, vol. 20(8), pages 781-904, December.
    14. Caleb Peppiatt, 2024. "The Future of Work: Inequality, Artificial Intelligence, and What Can Be Done About It. A Literature Review," Papers 2408.13300, arXiv.org.
    15. D'Al, Francesco & Santarelli, Enrico & Vivarelli, Marco, 2024. "The KSTE+I approach and the advent of AI technologies: evidence from the European regions," GLO Discussion Paper Series 1473, Global Labor Organization (GLO).
    16. Amali Matharaarachchi & Wishmitha Mendis & Kanishka Randunu & Daswin De Silva & Gihan Gamage & Harsha Moraliyage & Nishan Mills & Andrew Jennings, 2024. "Optimizing Generative AI Chatbots for Net-Zero Emissions Energy Internet-of-Things Infrastructure," Energies, MDPI, vol. 17(8), pages 1-19, April.
    17. Anna Davies & Betsy Donald & Mia Gray, 2023. "The power of platforms—precarity and place," Cambridge Journal of Regions, Economy and Society, Cambridge Political Economy Society, vol. 16(2), pages 245-256.
    18. D'Allesandro, Francesco & Santarelli, Enrico & Vivarelli, Marco, 2024. "The KSTE+I approach and the AI technologies," MERIT Working Papers 2024-016, United Nations University - Maastricht Economic and Social Research Institute on Innovation and Technology (MERIT).
    19. Ylenia Curci & Nathalie Greenan & Silvia Napolitano, 2024. "Innovating for the good or for the bad. An EU-wide analysis of the impact of technological transformation on job polarisation and unemployment," TEPP Working Paper 2024-02, TEPP.
    20. Ege Erdil & Andrei Potlogea & Tamay Besiroglu & Edu Roldan & Anson Ho & Jaime Sevilla & Matthew Barnett & Matej Vrzla & Robert Sandler, 2025. "GATE: An Integrated Assessment Model for AI Automation," Papers 2503.04941, arXiv.org, revised Mar 2025.

    More about this item

    Keywords

    Large Language Models; Categorical Variables; Labor Market Mismatch;

    JEL classification:

    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • J16 - Labor and Demographic Economics - - Demographic Economics - - - Economics of Gender; Non-labor Discrimination
    • J24 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Human Capital; Skills; Occupational Choice; Labor Productivity
    • J31 - Labor and Demographic Economics - - Wages, Compensation, and Labor Costs - - - Wage Level and Structure; Wage Differentials
