Printed from https://ideas.repec.org/p/nsr/escoed/escoe-dp-2018-13.html

An Open and Data-driven Taxonomy of Skills Extracted from Online Job Adverts

Author

Listed:
  • Jyldyz Djumalieva
  • Cath Sleeman

Abstract

In this work we offer an open and data-driven skills taxonomy, which is independent of ESCO and O*NET, two popular available taxonomies that are expert-derived. Since the taxonomy is created in an algorithmic way without expert elicitation, it can be quickly updated to reflect changes in labour demand and provide timely insights to support labour market decision-making. Our proposed taxonomy also captures links between skills, aggregated job titles, and the salaries mentioned in the millions of UK job adverts used in this analysis. To generate the taxonomy, we employ machine learning methods, such as word embeddings, network community detection algorithms and consensus clustering. We model skills as a graph with individual skills as vertices and their co-occurrences in job adverts as edges. The strength of the relationships between the skills is measured using both the frequency of actual co-occurrences of skills in the same advert and their shared context, based on a trained word embeddings model. Once skills are represented as a network, we hierarchically group them into clusters. To ensure the stability of the resulting clusters, we introduce bootstrapping and consensus clustering stages into the methodology. While we share initial results and describe the skill clusters, the main purpose of this paper is to outline the methodology for building the taxonomy.
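
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the adverts are made-up toy data, all function names are hypothetical, connected components stand in for the network community detection algorithm (which the abstract does not name), and the word-embedding component of the edge weights is omitted. The sketch only shows the general shape of the idea: count skill co-occurrences, cluster on bootstrapped samples of adverts, and keep the groupings that recur across samples.

```python
import itertools
import random
from collections import Counter

# Toy adverts (hypothetical data): each advert is the set of skills it mentions.
ADVERTS = [
    {"python", "sql", "statistics"},
    {"python", "sql"},
    {"sql", "statistics"},
    {"python", "statistics"},
    {"nursing", "patient care"},
    {"nursing", "patient care", "first aid"},
    {"patient care", "first aid"},
    {"nursing", "first aid"},
]


def cooccurrence_edges(adverts):
    """Count how many adverts mention each unordered pair of skills."""
    edges = Counter()
    for skills in adverts:
        for a, b in itertools.combinations(sorted(skills), 2):
            edges[(a, b)] += 1
    return edges


def components(nodes, edges, min_weight=1):
    """Stand-in clustering step: connected components over edges whose
    weight is at least min_weight (a real pipeline would use a community
    detection algorithm here)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for (a, b), weight in edges.items():
        if weight >= min_weight:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def consensus_clusters(adverts, n_boot=50, agree=0.5, seed=0):
    """Cluster bootstrapped samples of adverts, then group skills that
    share a cluster in at least `agree` of the bootstrap rounds."""
    rng = random.Random(seed)
    nodes = sorted(set().union(*adverts))
    together = Counter()
    for _ in range(n_boot):
        # Resample adverts with replacement.
        sample = [rng.choice(adverts) for _ in adverts]
        present = sorted(set().union(*sample))
        for group in components(present, cooccurrence_edges(sample)):
            for a, b in itertools.combinations(sorted(group), 2):
                together[(a, b)] += 1
    # Fraction of rounds in which each pair of skills was clustered together.
    consensus = {pair: n / n_boot for pair, n in together.items()}
    return components(nodes, consensus, min_weight=agree)


if __name__ == "__main__":
    for cluster in consensus_clusters(ADVERTS):
        print(sorted(cluster))
```

In this toy data the data-analysis skills and the care skills never appear in the same advert, so they cannot be merged in any bootstrap round; the consensus step only matters for deciding which within-group links are stable enough to keep.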

Suggested Citation

  • Jyldyz Djumalieva & Cath Sleeman, 2018. "An Open and Data-driven Taxonomy of Skills Extracted from Online Job Adverts," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2018-13, Economic Statistics Centre of Excellence (ESCoE).
  • Handle: RePEc:nsr:escoed:escoe-dp-2018-13

    Download full text from publisher

    File URL: https://escoe-website.s3.amazonaws.com/wp-content/uploads/2020/07/13161304/ESCoE-DP-2018-13.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Martin Rosvall & Carl T Bergstrom, 2010. "Mapping Change in Large Networks," PLOS ONE, Public Library of Science, vol. 5(1), pages 1-7, January.
    2. Jyldyz Djumalieva & Antonio Lima & Cath Sleeman, 2018. "Classifying Occupations According to Their Skill Requirements in Job Advertisements," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2018-04, Economic Statistics Centre of Excellence (ESCoE).

    Citations

    Cited by:

    1. Josh Martin & Rebecca Riley, 2025. "Productivity measurement: Reassessing the production function from micro to macro," Journal of Economic Surveys, Wiley Blackwell, vol. 39(1), pages 246-279, February.
    2. Stef Garasto & Jyldyz Djumalieva & Karlis Kanders & Rachel Wilcock & Cath Sleeman, 2021. "Developing experimental estimates of regional skill demand," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2021-02, Economic Statistics Centre of Excellence (ESCoE).
    3. Seifried, Mareike & Jurowetzki, Roman & Kretschmer, Tobias, 2020. "Career paths in online labor markets: Same, same but different?," ZEW Discussion Papers 20-090, ZEW - Leibniz Centre for European Economic Research.
    4. Seifried, Mareike, 2021. "Transitions from offline to online labor markets: The relationship between freelancers' prior offline and online work experience," ZEW Discussion Papers 21-101, ZEW - Leibniz Centre for European Economic Research.
    5. Marios Kokkodis, 2023. "Adjusting Skillset Cohesion in Online Labor Markets: Reputation Gains and Opportunity Losses," Information Systems Research, INFORMS, vol. 34(3), pages 1245-1258, September.
    6. Brenčič, Vera & McGee, Andrew, 2023. "Employers' Demand for Personality Traits," IZA Discussion Papers 16083, Institute of Labor Economics (IZA).
    7. Eggenberger, Christian & Backes-Gellner, Uschi, 2023. "IT skills, occupation specificity and job separations," Economics of Education Review, Elsevier, vol. 92(C).
    8. Leonardo Fabio Morales & Carlos Ospino & Nicole Amaral, 2021. "Online Vacancies and its Role in Labor Market Performance," Borradores de Economia 1174, Banco de la Republica de Colombia.
    9. Jyldyz Djumalieva & Stef Garasto & Cath Sleeman, 2020. "Evaluating a new earnings indicator. Can we improve the timeliness of existing statistics on earnings by using salary information from online job adverts?," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2020-19, Economic Statistics Centre of Excellence (ESCoE).
    10. Jagjit S. Chadha & Richard Barwell, 2019. "Renewing our Monetary Vows: Open Letters to the Governor of the Bank of England," National Institute of Economic and Social Research (NIESR) Occasional Papers 58, National Institute of Economic and Social Research.
    11. Mónica Santana & Mirta Díaz-Fernández, 2023. "Competencies for the artificial intelligence age: visualisation of the state of the art and future perspectives," Review of Managerial Science, Springer, vol. 17(6), pages 1971-2004, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ali Najmi & Taha H. Rashidi & Alireza Abbasi & S. Travis Waller, 2017. "Reviewing the transport domain: an evolutionary bibliometrics and network analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(2), pages 843-865, February.
    2. Jimi Adams & Ryan Light, 2014. "Mapping Interdisciplinary Fields: Efficiencies, Gaps and Redundancies in HIV/AIDS Research," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-13, December.
    3. Melissa Haller & David L. Rigby, 2020. "The geographic evolution of optics technologies in the United States, 1976–2010," Papers in Regional Science, Wiley Blackwell, vol. 99(6), pages 1539-1559, December.
    4. Luis Lorenzo & Javier Arroyo, 2022. "Analysis of the cryptocurrency market using different prototype-based clustering techniques," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-46, December.
    5. Benjamin Allen & Christine Sample & Yulia Dementieva & Ruben C Medeiros & Christopher Paoletti & Martin A Nowak, 2015. "The Molecular Clock of Neutral Evolution Can Be Accelerated or Slowed by Asymmetric Spatial Structure," PLOS Computational Biology, Public Library of Science, vol. 11(2), pages 1-32, February.
    6. Bech, Morten L. & Bergstrom, Carl T. & Rosvall, Martin & Garratt, Rodney J., 2015. "Mapping change in the overnight money market," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 424(C), pages 44-51.
    7. Nicolo Musmeci & Tomaso Aste & Tiziana Di Matteo, 2014. "Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods," Papers 1406.0496, arXiv.org, revised Jan 2015.
    8. Caglayan, Mustafa & Talavera, Oleksandr & Xiong, Lin, 2022. "Female small business owners in China: Discouraged, not discriminated," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 80(C).
    9. Wenyuan Liu & Andrea Nanetti & Siew Ann Cheong, 2017. "Knowledge evolution in physics research: An analysis of bibliographic coupling networks," PLOS ONE, Public Library of Science, vol. 12(9), pages 1-19, September.
    10. Günter Wallner & Simone Kriglstein & Edward Chung & Syeed Anta Kashfi, 2018. "Visualisation of trip chaining behaviour and mode choice using household travel survey data," Public Transport, Springer, vol. 10(3), pages 427-453, December.
    11. Faryna, Oleksandr & Pham, Tho & Talavera, Oleksandr & Tsapin, Andriy, 2020. "Wage Setting and Unemployment: Evidence from Online Job Vacancy Data," GLO Discussion Paper Series 503, Global Labor Organization (GLO).
    12. Vincent Labatut & Jean-Michel Balasque, 2012. "Detection and Interpretation of Communities in Complex Networks: Methods and Practical Application," Post-Print hal-00633653, HAL.
    13. Bogdan Walek & Ondrej Pektor, 2021. "Data Mining of Job Requirements in Online Job Advertisements Using Machine Learning and SDCA Logistic Regression," Mathematics, MDPI, vol. 9(19), pages 1-32, October.
    14. P. Dorta-González & M. I. Dorta-González, 2013. "Comparing journals from different fields of science and social science through a JCR subject categories normalized impact factor," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(2), pages 645-672, May.
    15. Ying Lu & Walter Timo de Vries, 2021. "A Bibliometric and Visual Analysis of Rural Development Research," Sustainability, MDPI, vol. 13(11), pages 1-21, May.
    16. Benatti, Alexandre & Ferraz de Arruda, Henrique & Nascimento Silva, Filipi & da Fontoura Costa, Luciano, 2021. "Enriching and analyzing small citation networks: A case study on transistor’s history," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 573(C).
    17. Cecily Josten & Grace Lordan, 2022. "Automation and the changing nature of work," PLOS ONE, Public Library of Science, vol. 17(5), pages 1-15, May.
    18. Daniel Pélissier, 2022. "Le temps dans le discours, expérimentation d’un protocole d’observation des caractéristiques temporelles d’un corpus d’avis de salariés," Post-Print hal-04554013, HAL.
    19. Alvin Vista, 2020. "Data-Driven Identification of Skills for the Future: 21st-Century Skills for the 21st-Century Workforce," SAGE Open, vol. 10(2), pages 21582440209, April.
    20. Loet Leydesdorff & Caroline S. Wagner & Lutz Bornmann, 2018. "Discontinuities in citation relations among journals: self-organized criticality as a model of scientific revolutions and change," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 623-644, July.

    More about this item

    Keywords

    Skills; Skills taxonomy; Labour demand; Online job adverts; Big data; Machine learning; Word embeddings;

    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • J23 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Labor Demand
    • J24 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Human Capital; Skills; Occupational Choice; Labor Productivity
