Printed from https://ideas.repec.org/p/ecl/stabus/4188.html

Labor-LLM: Language-Based Occupational Representations with Large Language Models

Authors

  • Du, Tianyu (Stanford U)
  • Kanodia, Ayush (Stanford U)
  • Brunborg, Herman (Stanford U)
  • Vafa, Keyon (Harvard U)
  • Athey, Susan (Stanford U)

Abstract

Many empirical studies of labor market questions rely on estimating relatively simple predictive models using small, carefully constructed longitudinal survey datasets based on hand-engineered features. Large Language Models (LLMs), trained on massive datasets, encode vast quantities of world knowledge and can be used for the next job prediction problem. However, while an off-the-shelf LLM produces plausible career trajectories when prompted, the probability with which an LLM predicts a particular job transition conditional on career history will not, in general, align with the true conditional probability in a given population. Recently, Vafa et al. (2024) introduced a transformer-based “foundation model”, CAREER, trained using a large, unrepresentative resume dataset, that predicts transitions between jobs; the paper further demonstrated how transfer learning techniques can be used to leverage the foundation model to build better predictive models of both transitions and wages that reflect conditional transition probabilities found in nationally representative survey datasets. This paper considers an alternative where the fine-tuning of the CAREER foundation model is replaced by fine-tuning LLMs. For the task of next job prediction, we demonstrate that models trained with our approach outperform several alternatives in terms of predictive performance on the survey data, including traditional econometric models, CAREER, and LLMs with in-context learning, even though the LLM can in principle predict job titles that are not allowed in the survey data. Further, we show that our fine-tuned LLM-based models’ predictions are more representative of the career trajectories of various workforce subpopulations than off-the-shelf LLM models and CAREER. We conduct experiments and analyses that highlight the sources of the gains in the performance of our models for representative predictions.
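
The fine-tuning pipeline the abstract describes requires turning each survey respondent's career history into text an LLM can be trained on. As a rough illustration of that serialization step (the record format, template wording, and the `serialize_career` helper below are assumptions for exposition, not the paper's actual implementation), a prompt/target pair might be built like this:

```python
def serialize_career(history):
    """Turn a chronological list of (year, occupation) records into a
    (prompt, target) pair for next-job fine-tuning.

    The final record becomes the prediction target; all earlier records
    are serialized into the conditioning prompt. Illustrative template only.
    """
    *past, (target_year, target_job) = history
    lines = [f"In {year}, the worker was employed as: {job}." for year, job in past]
    prompt = "\n".join(lines) + f"\nIn {target_year}, the worker was employed as:"
    target = f" {target_job}."
    return prompt, target


# Hypothetical career history drawn from a survey-style record.
history = [
    (2018, "retail salesperson"),
    (2020, "customer service representative"),
    (2022, "office manager"),
]
prompt, target = serialize_career(history)
print(prompt)
print(target)
```

During fine-tuning, the model would be trained to produce the target continuation given the prompt, so that its conditional next-token probabilities over job titles come to reflect the transition frequencies observed in the survey data rather than those of its pretraining corpus.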

Suggested Citation

  • Du, Tianyu & Kanodia, Ayush & Brunborg, Herman & Vafa, Keyon & Athey, Susan, 2024. "Labor-LLM: Language-Based Occupational Representations with Large Language Models," Research Papers 4188, Stanford University, Graduate School of Business.
  • Handle: RePEc:ecl:stabus:4188

    Download full text from publisher

    File URL: https://www.gsb.stanford.edu/faculty-research/working-papers/labor-llm-language-based-occupational-representations-large
    Download Restriction: no

Citations

Cited by:

    1. Nutarelli, Federico & Edet, Samuel & Gnecco, Giorgio & Riccaboni, Massimo, 2025. "Predicting the technological complexity of global cities based on unsupervised and supervised machine learning methods," Journal of Economic Behavior & Organization, Elsevier, vol. 234(C).
2. Mealy, Penelope Ann & Bucker, Joris Joseph Johannes Hendrik & Moura, Fernanda Senra de & Knudsen, Camilla, 2025. "Beyond Green Jobs: Advancing Metrics and Modeling Approaches for a Changing Labor Market," Policy Research Working Paper Series 11262, The World Bank.
    3. Lukas B. Freund & Lukas F. Mann, 2025. "Job Transformation, Specialization, and the Labor Market Effects of AI," CESifo Working Paper Series 12072, CESifo.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.