Printed from https://ideas.repec.org/p/dis/wpaper/dis2601.html

VEUCTOR : Training and Selecting Best Vector Space Models from Online Job Ads for European Countries

Author

Listed:
  • Emilio Colombo

  • Simone D'Amico

  • Fabio Mercorio

  • Mario Mezzanzanica

Abstract

Over the last decade, word embeddings have enabled machines to represent words and sentences as vectors, allowing researchers to reason about text in tasks such as semantic similarity, contextual understanding, and machine translation. However, the synthesis of embeddings involves domain-specific parameters that affect semantic accuracy and contextual relevance, often leading to unpredictable biases and inconsistent comparisons. This issue is particularly relevant in labor market analysis, where different embeddings yield different results, so selecting the most appropriate model becomes a key step. This paper addresses these challenges by (i) proposing a methodology to train, select, and align vector space models for a target taxonomy, ensuring comparability across dimensions and languages; (ii) applying this approach to 4.5 million job ads in 28 languages, aligning country-specific embeddings using the ESCO taxonomy; (iii) generating over 3,000 models over 142 machine days and making the best-performing ones publicly available via VEUCTOR; and (iv) showing that model choice significantly affects labor market analysis, with substantial variation in occupational skill bundles across embeddings.

Suggested Citation

  • Emilio Colombo & Simone D'Amico & Fabio Mercorio & Mario Mezzanzanica, 2026. "VEUCTOR : Training and Selecting Best Vector Space Models from Online Job Ads for European Countries," DISEIS - Quaderni del Dipartimento di Economia internazionale, delle istituzioni e dello sviluppo dis2601, Università Cattolica del Sacro Cuore, Dipartimento di Economia internazionale, delle istituzioni e dello sviluppo (DISEIS).
  • Handle: RePEc:dis:wpaper:dis2601

    Download full text from publisher

    File URL: http://dipartimenti.unicatt.it/diseis-wp_2601.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Alicia Sasser Modestino & Daniel Shoag & Joshua Ballance, 2020. "Upskilling: Do Employers Demand Greater Skill When Workers Are Plentiful?," The Review of Economics and Statistics, MIT Press, vol. 102(4), pages 793-805, October.
    2. Gu, Ran & Zhong, Ling, 2023. "Effects of stay-at-home orders on skill requirements in vacancy postings," Labour Economics, Elsevier, vol. 82(C).
    3. Goldman, Matt & Kaplan, David M., 2018. "Comparing distributions by multiple testing across quantiles or CDF values," Journal of Econometrics, Elsevier, vol. 206(1), pages 143-166.
    4. Emilio Colombo & Alberto Marcato, 2023. "Skill demand and labour market concentration: evidence from Italian vacancies," International Journal of Manpower, Emerald Group Publishing Limited, vol. 44(9), pages 156-198, October.
    5. Arthur Turrell & Bradley Speigner & Jyldyz Djumalieva & David Copple & James Thurgood, 2018. "Using job vacancies to understand the effects of labour market mismatch on UK output and productivity," Bank of England working papers 737, Bank of England.
    6. Colombo, Emilio & Mercorio, Fabio & Mezzanzanica, Mario, 2019. "AI meets labor market: Exploring the link between automation and skills," Information Economics and Policy, Elsevier, vol. 47(C), pages 27-37.
    7. Azar, José & Marinescu, Ioana & Steinbaum, Marshall & Taska, Bledi, 2020. "Concentration in US labor markets: Evidence from online vacancy data," Labour Economics, Elsevier, vol. 66(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arendt, Lukasz & Gałecka-Burdziak, Ewa & Núñez, Fernando & Pater, Robert & Usabiaga, Carlos, 2023. "Skills requirements across task-content groups in Poland: What online job offers tell us," Technological Forecasting and Social Change, Elsevier, vol. 187(C).
    2. Maciej Beręsewicz & Herman Cherniaiev & Robert Pater, 2021. "Estimating the number of entities with vacancies using administrative and online data," Papers 2106.03263, arXiv.org.
    3. Pham, Tho & Talavera, Oleksandr & Wu, Zhuangchen, 2023. "Labor markets during war time: Evidence from online job advertisements," Journal of Comparative Economics, Elsevier, vol. 51(4), pages 1316-1333.
    4. Melo, Grace & Palma, Marco & Chomali, Laura & Ribera, Luis, 2025. "Are experts overoptimistic about the success of market labeling information?," 2025 AAEA & WAEA Joint Annual Meeting, July 27-29, 2025, Denver, CO 360812, Agricultural and Applied Economics Association.
    5. Blemings, Benjamin T. & Bock, Margaret & Scarcioffolo, Alexandre, 2022. "Hoggin' the Road: Negative Road Externalities of Pork Slaughterhouses," 2022 Annual Meeting, July 31-August 2, Anaheim, California 322466, Agricultural and Applied Economics Association.
    6. repec:ags:aaea22:343870 is not listed on IDEAS
    7. Gay, Victor, 2023. "Culture: An Empirical Investigation of Beliefs, Work, and Fertility. A Verification and Reproduction of Fernández and Fogli (American Economic Journal: Macroeconomics, 2009)," Journal of Comments and Replications in Economics (JCRE), ZBW - Leibniz Information Centre for Economics, vol. 2, pages 1-15.
    8. Chung, EunYi & Olivares, Mauricio, 2021. "Permutation test for heterogeneous treatment effects with a nuisance parameter," Journal of Econometrics, Elsevier, vol. 225(2), pages 148-174.
    9. Melo, Grace & Palma, Marco A. & Ribera, Luis A., 2024. "Are experts overoptimistic about the success of food market labeling information?," 2024 Annual Meeting, July 28-30, New Orleans, LA 343870, Agricultural and Applied Economics Association.
    10. Carlos Madeira, 2025. "How accurately do consumers report their debts in household surveys?," BIS Working Papers 1258, Bank for International Settlements.
    11. Wang, Duoyu & Cleary, Rebecca, 2023. "What contributes to the gap in nutritional quality across food security status?," 2023 Annual Meeting, July 23-25, Washington D.C. 335552, Agricultural and Applied Economics Association.
    12. Huang, Wei & Li, Teng & Pan, Yinghao & Ren, Jinyang, 2023. "Teacher characteristics and student performance: Evidence from random teacher-student assignments in China," Journal of Economic Behavior & Organization, Elsevier, vol. 214(C), pages 747-781.
    13. Caetano, Carolina & Caetano, Gregorio & Nielsen, Eric, 2024. "Are children spending too much time on enrichment activities?," Economics of Education Review, Elsevier, vol. 98(C).
    14. Xavier Cirera & Diego A. Comin & Marcio Cruz & Kyung Min Lee, 2020. "Anatomy of Technology in the Firm," NBER Working Papers 28080, National Bureau of Economic Research, Inc.
    15. repec:ags:aaea22:343960 is not listed on IDEAS
    16. Michael Hanemann & Jon Krosnick & Lisanne Wichgers & Jeffrey Wooldridge & Stephanie Lampron & Daniel Schneider & Eric M. Shaeffer & Trevor Tompson & Penny Visser, 2025. "Ben Franklin’s Whistle, Cost Expectations, and the Choice of Valuation Format," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 88(5), pages 1375-1406, May.
    17. John Mullahy, 2020. "Discovering Treatment Effectiveness via Median Treatment Effects—Applications to COVID-19 Clinical Trials," NBER Working Papers 27895, National Bureau of Economic Research, Inc.
    18. Anastasios Evgenidis & Apostolos Fasianos, 2021. "Unconventional Monetary Policy and Wealth Inequalities in Great Britain," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 83(1), pages 115-175, February.
    19. Kochhar, Nishtha & Knippenberg, Erwin Willem Yvonnick Leon, 2023. "Droughts and Welfare in Afghanistan," Policy Research Working Paper Series 10272, The World Bank.
    20. David M. Kaplan, 2024. "Inference on Consensus Ranking of Distributions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 839-850, July.
    21. David M. Kaplan, 2019. "distcomp: Comparing distributions," Stata Journal, StataCorp LLC, vol. 19(4), pages 832-848, December.
    22. Liverpool-Tasie, Lenis Saweda O & Dillon, Andrew & Bloem, Jeffrey R. & Adjognon, Guigonan Serge, 2025. "Private sector promotion of agricultural technologies: Experimental evidence from Nigeria," Journal of Environmental Economics and Management, Elsevier, vol. 133(C).

    More about this item

    JEL classification:

    • J63 - Labor and Demographic Economics - - Mobility, Unemployment, Vacancies, and Immigrant Workers - - - Turnover; Vacancies; Layoffs
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis

