Printed from https://ideas.repec.org/p/pra/mprapa/126336.html

Extracting O*NET Features from the NLx Corpus to Build Public Use Aggregate Labor Market Data

Authors

  • Meisenbacher, Stephen
  • Nestorov, Svetlozar
  • Norlander, Peter

Abstract

Data from online job postings are difficult to access and are not built in a standard or transparent manner. Data in the standard occupational taxonomy and information database (O*NET) are updated infrequently and based on small survey samples. We adopt O*NET as a framework for building natural language processing tools that extract structured information from job postings. We publish the Job Ad Analysis Toolkit (JAAT), a collection of open-source tools built for this purpose, and demonstrate its reliability and accuracy in out-of-sample and LLM-as-judge testing. We extract more than 10 billion data points from more than 155 million online job ads provided by the National Labor Exchange (NLx) Research Hub, including O*NET tasks, occupation codes, tools, and technologies, as well as wages, skills, industry, and other features. We describe the construction of a dataset of occupation-, state-, and industry-level features aggregated by monthly active jobs from 2015 to 2025. We illustrate the potential for research and for future uses in education and workforce development.
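One way to read "aggregated by monthly active jobs" is that each posting counts once toward every calendar month in which it is open. A minimal sketch under that assumption follows, rolling postings up to occupation-by-state-by-month cells. The `Posting` fields (`posted`, `expires`), the function names, and the toy records are hypothetical and for illustration only; they are not the authors' JAAT code or the NLx schema, and the SOC codes are merely example values.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Posting:
    # Hypothetical fields; the real NLx/JAAT schema may differ.
    soc_code: str   # occupation code assigned by an extraction step
    state: str
    posted: date    # assumed first day the ad is live
    expires: date   # assumed last day the ad is live (inclusive)


def months_active(start: date, end: date) -> list[tuple[int, int]]:
    """Return every (year, month) that the interval [start, end] touches."""
    months = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        months.append((y, m))
        m += 1
        if m == 13:  # roll over into January of the next year
            y, m = y + 1, 1
    return months


def monthly_active_jobs(postings) -> Counter:
    """Count active postings per (occupation, state, year, month) cell."""
    counts = Counter()
    for p in postings:
        for y, m in months_active(p.posted, p.expires):
            counts[(p.soc_code, p.state, y, m)] += 1
    return counts


# Toy records, not real NLx data.
ads = [
    Posting("15-1252", "NJ", date(2024, 1, 20), date(2024, 3, 5)),
    Posting("15-1252", "NJ", date(2024, 2, 1), date(2024, 2, 28)),
    Posting("29-1141", "IL", date(2024, 2, 10), date(2024, 2, 15)),
]
for key, n in sorted(monthly_active_jobs(ads).items()):
    print(key, n)
```

Under this convention, a posting that spans January to March contributes to all three months, so a monthly cell measures the stock of open ads rather than the flow of new ones; counting only the posting month would instead give a flow measure.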

Suggested Citation

  • Meisenbacher, Stephen & Nestorov, Svetlozar & Norlander, Peter, 2025. "Extracting O*NET Features from the NLx Corpus to Build Public Use Aggregate Labor Market Data," MPRA Paper 126336, University Library of Munich, Germany.
  • Handle: RePEc:pra:mprapa:126336

    Download full text from publisher

    File URL: https://mpra.ub.uni-muenchen.de/126336/1/MPRA_paper_126336.pdf
    File Function: original version
    Download Restriction: no

    More about this item

    JEL classification:

    • J23 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Labor Demand
    • J24 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Human Capital; Skills; Occupational Choice; Labor Productivity
    • J63 - Labor and Demographic Economics - - Mobility, Unemployment, Vacancies, and Immigrant Workers - - - Turnover; Vacancies; Layoffs
