Printed from https://ideas.repec.org/p/pra/mprapa/126336.html

Extracting O*NET Features from the NLx Corpus to Build Public Use Aggregate Labor Market Data

Authors
  • Meisenbacher, Stephen
  • Nestorov, Svetlozar
  • Norlander, Peter

Abstract

Data from online job postings are difficult to access and are not built in a standard or transparent manner. Data in the Occupational Information Network (O*NET), the standard occupational taxonomy and information database, are updated infrequently and based on small survey samples. We adopt O*NET as a framework for building natural language processing tools that extract structured information from job postings. We publish the Job Ad Analysis Toolkit (JAAT), a collection of open-source tools built for this purpose, and demonstrate its reliability and accuracy in out-of-sample and LLM-as-judge testing. We extract more than 10 billion data points from more than 155 million online job ads provided by the National Labor Exchange (NLx) Research Hub, including O*NET tasks, occupation codes, tools, and technologies, as well as wages, skills, industry, and other features. We describe the construction of a dataset of occupation-, state-, and industry-level features aggregated by monthly active jobs from 2015 to 2025. We illustrate its potential for research and for future uses in education and workforce development.
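The abstract mentions aggregating features by monthly active jobs but does not define the term on this page. The following is a minimal Python sketch under one stated assumption: a posting counts once as "active" in every calendar month between the date it was first observed and the date it was last observed. The `Posting` fields, the example occupation code, and the function names are hypothetical illustrations, not part of JAAT or the NLx data schema, and the paper's actual definition may differ.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Posting:
    # Hypothetical fields; the real NLx/JAAT schema may differ.
    soc_code: str  # occupation code extracted from the ad
    state: str     # two-letter state abbreviation
    posted: date   # first date the ad was observed (assumed)
    expired: date  # last date the ad was observed (assumed)


def months_active(start: date, end: date) -> list:
    """Return every (year, month) from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        # Advance one calendar month, rolling over at December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def monthly_active_jobs(postings) -> Counter:
    """Count postings active in each (occupation, state, year, month) cell.

    Assumption: a posting counts once in every month it overlaps.
    """
    counts = Counter()
    for p in postings:
        for year, month in months_active(p.posted, p.expired):
            counts[(p.soc_code, p.state, year, month)] += 1
    return counts


# Usage with two made-up postings for the same occupation and state.
postings = [
    Posting("15-1252", "OH", date(2024, 1, 20), date(2024, 3, 5)),
    Posting("15-1252", "OH", date(2024, 2, 1), date(2024, 2, 28)),
]
counts = monthly_active_jobs(postings)
print(counts[("15-1252", "OH", 2024, 2)])  # both postings are active in February
```

Under this assumption a long-lived posting contributes to several months, so monthly active counts can exceed the number of new postings in a month; a stock measure like this differs from a flow measure that counts each ad only in the month it first appears.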

Suggested Citation

  • Meisenbacher, Stephen & Nestorov, Svetlozar & Norlander, Peter, 2025. "Extracting O*NET Features from the NLx Corpus to Build Public Use Aggregate Labor Market Data," MPRA Paper 126336, University Library of Munich, Germany.
  • Handle: RePEc:pra:mprapa:126336

    Download full text from publisher

    File URL: https://mpra.ub.uni-muenchen.de/126336/1/MPRA_paper_126336.pdf
    File Function: original version
    Download Restriction: no

    References listed on IDEAS

    1. Bloom, Nicholas & Davis, Steven J. & Hansen, Stephen & Lambert, Peter John & Sadun, Raffaella & Taska, Bledi, 2023. "Remote work across jobs, companies and space," LSE Research Online Documents on Economics 121302, London School of Economics and Political Science, LSE Library.
    2. Mark DesJardine & Pratima Bansal, 2019. "One Step Forward, Two Steps Back: How Negative External Evaluations Can Shorten Organizational Time Horizons," Organization Science, INFORMS, vol. 30(4), pages 761-780, July.
    3. Dingel, Jonathan I. & Neiman, Brent, 2020. "How many jobs can be done at home?," Journal of Public Economics, Elsevier, vol. 189(C).
    4. Brad Hershbein & Lisa B. Kahn, 2018. "Do Recessions Accelerate Routine-Biased Technological Change? Evidence from Vacancy Postings," American Economic Review, American Economic Association, vol. 108(7), pages 1737-1772, July.
    5. Andreas I Mueller & Damian Osterwalder & Josef Zweimüller & Andreas Kettemann, 2024. "Vacancy Durations and Entry Wages: Evidence from Linked Vacancy–Employer–Employee Data," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 91(3), pages 1807-1841.
    6. Randall W. Eberts & Christopher J. O'Leary, 2003. "A Frontline Decision Support System for Georgia Career Centers," Book chapters authored by Upjohn Institute researchers, in: Joshua Riley & Aquila Branch & Stephen Wandner & Wayne Gordon (ed.), A Compilation of Selected Papers from the Employment and Training Administration's 2003 Biennial National Research Conference, ETA Occasional Paper 20, pages 80-129, W.E. Upjohn Institute for Employment Research.
    7. Ioana Marinescu & Ronald Wolthoff, 2020. "Opening the Black Box of the Matching Function: The Power of Words," Journal of Labor Economics, University of Chicago Press, vol. 38(2), pages 535-568.
    8. Ciao-Wei Chen & Laura Yue Li, 2023. "Is hiring fast a good sign? The informativeness of job vacancy duration for future firm profitability," Review of Accounting Studies, Springer, vol. 28(3), pages 1316-1353, September.
    9. Nada Wasi & Aaron Flaaen, 2015. "Record linkage using Stata: Preprocessing, linking, and reviewing utilities," Stata Journal, StataCorp LLC, vol. 15(3), pages 672-697, September.
    10. Honey Batra & Amanda Michaud & Simon Mongey, 2023. "Online Job Posts Contain Very Little Wage Information," NBER Working Papers 31984, National Bureau of Economic Research, Inc.
    11. DeVaro, Jed & Norlander, Peter, 2021. "Wage Theft, Economic Conditions, and Market Power: The Case of H-1B Workers," GLO Discussion Paper Series 855, Global Labor Organization (GLO).
    12. David J. Deming, 2017. "The Growing Importance of Social Skills in the Labor Market," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 132(4), pages 1593-1640.
    13. Christina Gathmann & Felix Grimm & Erwin Winkler, 2024. "AI, Task Changes in Jobs, and Worker Reallocation," CESifo Working Paper Series 11585, CESifo.
    14. Steven J. Davis & R. Jason Faberman & John C. Haltiwanger, 2013. "The Establishment-Level Behavior of Vacancies and Hiring," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 128(2), pages 581-622.
    15. Prithwiraj Choudhury & Evan Starr & Rajshree Agarwal, 2020. "Machine learning and human capital complementarities: Experimental evidence on bias mitigation," Strategic Management Journal, Wiley Blackwell, vol. 41(8), pages 1381-1411, August.
    16. Jeffrey Clemens & Lisa B. Kahn & Jonathan Meer, 2021. "Dropouts Need Not Apply? The Minimum Wage and Skill Upgrading," Journal of Labor Economics, University of Chicago Press, vol. 39(S1), pages 107-149.
    17. Meisenbacher, Stephen & Norlander, Peter, 2022. "Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML)," GLO Discussion Paper Series 1214, Global Labor Organization (GLO).
    18. Alan S. Blinder, 2009. "How Many US Jobs Might be Offshorable?," World Economics, World Economics, vol. 10(2), pages 41-78, April.
    19. Francois Brochet & Maria Loumioti & George Serafeim, 2015. "Speaking of the short-term: disclosure horizon and managerial myopia," Review of Accounting Studies, Springer, vol. 20(3), pages 1122-1163, September.
    20. Moss, Todd W. & Renko, Maija & Block, Emily & Meyskens, Moriah, 2018. "Funding the story of hybrid ventures: Crowdfunder lending preferences and linguistic hybridity," Journal of Business Venturing, Elsevier, vol. 33(5), pages 643-659.
    21. Matthew Dey & Susan Houseman, 2025. "The Rise of the Contract Workforce in US Manufacturing," NBER Chapters, in: The Changing Nature of Work, National Bureau of Economic Research, Inc.
    22. Shelley Stall & Lynn Yarmey & Joel Cutcher-Gershenfeld & Brooks Hanson & Kerstin Lehnert & Brian Nosek & Mark Parsons & Erin Robinson & Lesley Wyborn, 2019. "Make scientific data FAIR," Nature, Nature, vol. 570(7759), pages 27-29, June.
    23. Atalay, Enghin & Phongthiengtham, Phai & Sotelo, Sebastian & Tannenbaum, Daniel, 2018. "New technologies and the labor market," Journal of Monetary Economics, Elsevier, vol. 97(C), pages 48-67.
    24. David E. Balducchi & Randall W. Eberts & Christopher J. O'Leary (ed.), 2004. "Labor Exchange Policy in the United States," Books from Upjohn Press, W.E. Upjohn Institute for Employment Research, number lep.
    25. Bell, Alex, 2020. "Job Amenities & Earnings Inequality," MPRA Paper 118516, University Library of Munich, Germany.
    26. Jonathon Hazell & Christina Patterson & Heather Sarsons & Bledi Taska, 2022. "National Wage Setting," NBER Working Papers 30623, National Bureau of Economic Research, Inc.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alex Chernoff & Casey Warman, 2023. "COVID-19 and implications for automation," Applied Economics, Taylor & Francis Journals, vol. 55(17), pages 1939-1957, April.
    2. Abi Adams & Tom Waters & Maria Balgova & Matthias Qian, 2023. "Firm concentration & job design: the case of schedule flexible work arrangements," IFS Working Papers W23/14, Institute for Fiscal Studies.
    3. Bhuller, Manudeep & Kostøl, Andreas & Vigtel, Trond Christian, 2019. "How Broadband Internet Affects Labor Market Matching," Memorandum 10/2019, Oslo University, Department of Economics.
    4. Bagger, Jesper & Fontaine, Francois & Galenianos, Manolis & Trapeznikova, Ija, 2022. "Vacancies, employment outcomes and firm growth: Evidence from Denmark," Labour Economics, Elsevier, vol. 75(C).
    5. Emilio Colombo & Alberto Marcato, 2021. "Skill Demand and Labour Market Concentration: Theory and Evidence from Italian Vacancies," DISEIS - Quaderni del Dipartimento di Economia internazionale, delle istituzioni e dello sviluppo dis2104, Università Cattolica del Sacro Cuore, Dipartimento di Economia internazionale, delle istituzioni e dello sviluppo (DISEIS).
    6. Blanas, Sotiris & Oikonomou, Rigas, 2023. "COVID-induced economic uncertainty, tasks and occupational demand," Labour Economics, Elsevier, vol. 81(C).
    7. Alejandra Bellatin & Gabriela Galassi, 2022. "What COVID-19 May Leave Behind: Technology-Related Job Postings in Canada," Staff Working Papers 22-17, Bank of Canada.
    8. Bloom, Nicholas & Davis, Steven J. & Hansen, Stephen & Lambert, Peter John & Sadun, Raffaella & Taska, Bledi, 2023. "Remote work across jobs, companies and space," LSE Research Online Documents on Economics 121302, London School of Economics and Political Science, LSE Library.
    9. Hayato Kanayama & Suguru Otani, 2024. "Nonparametric Estimation of Matching Efficiency and Elasticity in a Spot Gig Work Platform: 2019-2023," Papers 2412.19024, arXiv.org, revised Mar 2025.
    10. Andreas I Mueller & Damian Osterwalder & Josef Zweimüller & Andreas Kettemann, 2024. "Vacancy Durations and Entry Wages: Evidence from Linked Vacancy–Employer–Employee Data," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 91(3), pages 1807-1841.
    11. Carlos Carrillo-Tudela & Hermann Gartner & Leo Kaas, 2023. "Recruitment Policies, Job-Filling Rates, and Matching Efficiency," Journal of the European Economic Association, European Economic Association, vol. 21(6), pages 2413-2459.
    12. Consoli, Davide & Marin, Giovanni & Rentocchini, Francesco & Vona, Francesco, 2023. "Routinization, within-occupation task changes and long-run employment dynamics," Research Policy, Elsevier, vol. 52(1).
    13. Koomen, Miriam & Backes-Gellner, Uschi, 2022. "Occupational tasks and wage inequality in West Germany: A decomposition analysis," Labour Economics, Elsevier, vol. 79(C).
    14. Daniel Kreisman & Jonathan Smith & Bondi Arifin, 2023. "Labor Market Signaling and the Value of College: Evidence from Resumes and the Truth," Journal of Human Resources, University of Wisconsin Press, vol. 58(6), pages 1820-1849.
    15. David Deming & Lisa B. Kahn, 2018. "Skill Requirements across Firms and Labor Markets: Evidence from Job Postings for Professionals," Journal of Labor Economics, University of Chicago Press, vol. 36(S1), pages 337-369.
    16. Pater, Robert & Szkola, Jaroslaw & Kozak, Marcin, 2019. "A method for measuring detailed demand for workers' competences," Economics - The Open-Access, Open-Assessment E-Journal (2007-2020), Kiel Institute for the World Economy (IfW Kiel), vol. 13, pages 1-30.
    17. Schultheiss, Tobias & Pfister, Curdin & Gnehm, Ann-Sophie & Backes-Gellner, Uschi, 2023. "Education expansion and high-skill job opportunities for workers: Does a rising tide lift all boats?," Labour Economics, Elsevier, vol. 82(C).
    18. Sugat Chaturvedi & Kanika Mahajan & Zahra Siddique, 2024. "Using Domain-Specific Word Embeddings to Examine the Demand for Skills," Research in Labor Economics, in: Big Data Applications in Labor Economics, Part B, volume 52, pages 171-223, Emerald Group Publishing Limited.
    19. Maciej Beręsewicz & Herman Cherniaiev & Andrzej Mantaj & Robert Pater, 2024. "Text analysis of job offers for mismatch of educational characteristics to labour market demands," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(2), pages 1799-1825, April.
    20. Suguru Otani, 2024. "Nonparametric Estimation of Matching Efficiency and Elasticity on a Private On-the-Job Search Platform: Evidence from Japan, 2014-2024," Papers 2410.17011, arXiv.org, revised Nov 2025.

    More about this item

    JEL classification:

    • J23 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Labor Demand
    • J24 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Human Capital; Skills; Occupational Choice; Labor Productivity
    • J63 - Labor and Demographic Economics - - Mobility, Unemployment, Vacancies, and Immigrant Workers - - - Turnover; Vacancies; Layoffs

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:pra:mprapa:126336. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Joachim Winter. General contact details of provider: https://edirc.repec.org/data/vfmunde.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.