
An Open and Data-driven Taxonomy of Skills Extracted from Online Job Adverts


  • Jyldyz Djumalieva
  • Cath Sleeman


In this work we offer an open and data-driven skills taxonomy, which is independent of ESCO and O*NET, two popular available taxonomies that are expert-derived. Since the taxonomy is created in an algorithmic way without expert elicitation, it can be quickly updated to reflect changes in labour demand and provide timely insights to support labour market decision-making. Our proposed taxonomy also captures links between skills, aggregated job titles, and the salaries mentioned in the millions of UK job adverts used in this analysis. To generate the taxonomy, we employ machine learning methods, such as word embeddings, network community detection algorithms and consensus clustering. We model skills as a graph with individual skills as vertices and their co-occurrences in job adverts as edges. The strength of the relationships between the skills is measured using both the frequency of actual co-occurrences of skills in the same advert and their shared context, based on a trained word embeddings model. Once skills are represented as a network, we hierarchically group them into clusters. To ensure the stability of the resulting clusters, we introduce bootstrapping and consensus clustering stages into the methodology. While we share initial results and describe the skill clusters, the main purpose of this paper is to outline the methodology for building the taxonomy.
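
The graph representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only raw co-occurrence counts as edge weights (no word embeddings), and it replaces community detection and consensus clustering with a simple connected-components grouping over edges above a threshold. The toy adverts, the function names `cooccurrence_edges` and `group_skills`, and the `min_weight` parameter are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_edges(adverts):
    """Count how often each unordered pair of skills appears in the same advert."""
    edges = Counter()
    for skills in adverts:
        # Deduplicate and sort so each pair has one canonical (a, b) key.
        for a, b in combinations(sorted(set(skills)), 2):
            edges[(a, b)] += 1
    return edges


def group_skills(edges, min_weight=2):
    """Group skills connected by edges of weight >= min_weight.

    Connected components are a stand-in for community detection here,
    chosen only to keep the sketch dependency-free.
    """
    neighbours = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical adverts, each reduced to the list of skills it mentions.
adverts = [
    ["python", "sql", "statistics"],
    ["python", "sql"],
    ["python", "statistics"],
    ["nursing", "patient care"],
    ["nursing", "patient care", "sql"],
]

edges = cooccurrence_edges(adverts)
groups = group_skills(edges, min_weight=2)
print(groups)
```

On this toy input, pairs seen in only one advert (such as "nursing" and "sql") fall below the threshold, so the skills split into a data-related group and a care-related group. A fuller pipeline along the lines of the abstract would combine these counts with embedding-based similarity and repeat the clustering over bootstrap samples to check stability.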

Suggested Citation

  • Jyldyz Djumalieva & Cath Sleeman, 2018. "An Open and Data-driven Taxonomy of Skills Extracted from Online Job Adverts," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2018-13, Economic Statistics Centre of Excellence (ESCoE).
  • Handle: RePEc:nsr:escoed:escoe-dp-2018-13


    References listed on IDEAS

    1. Martin Rosvall & Carl T Bergstrom, 2010. "Mapping Change in Large Networks," PLOS ONE, Public Library of Science, vol. 5(1), pages 1-7, January.
    2. Jyldyz Djumalieva & Antonio Lima & Cath Sleeman, 2018. "Classifying Occupations According to Their Skill Requirements in Job Advertisements," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2018-04, Economic Statistics Centre of Excellence (ESCoE).



    Cited by:

    1. Stef Garasto & Jyldyz Djumalieva & Karlis Kanders & Rachel Wilcock & Cath Sleeman, 2021. "Developing experimental estimates of regional skill demand," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2021-02, Economic Statistics Centre of Excellence (ESCoE).
    2. Seifried, Mareike & Jurowetzki, Roman & Kretschmer, Tobias, 2020. "Career paths in online labor markets: Same, same but different?," ZEW Discussion Papers 20-090, ZEW - Leibniz Centre for European Economic Research.
    3. Seifried, Mareike, 2021. "Transitions from offline to online labor markets: The relationship between freelancers' prior offline and online work experience," ZEW Discussion Papers 21-101, ZEW - Leibniz Centre for European Economic Research.
    4. Brenčič, Vera & McGee, Andrew, 2023. "Employers' Demand for Personality Traits," IZA Discussion Papers 16083, Institute of Labor Economics (IZA).
    5. Eggenberger, Christian & Backes-Gellner, Uschi, 2023. "IT skills, occupation specificity and job separations," Economics of Education Review, Elsevier, vol. 92(C).
    6. Leonardo Fabio Morales & Carlos Ospino & Nicole Amaral, 2021. "Online Vacancies and its Role in Labor Market Performance," Borradores de Economia 1174, Banco de la Republica de Colombia.
    7. Josh Martin & Rebecca Riley, 2023. "Productivity measurement - Reassessing the production function from micro to macro," Working Papers 033, The Productivity Institute.
    8. Jyldyz Djumalieva & Stef Garasto & Cath Sleeman, 2020. "Evaluating a new earnings indicator. Can we improve the timeliness of existing statistics on earnings by using salary information from online job adverts?," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2020-19, Economic Statistics Centre of Excellence (ESCoE).
    9. Jagjit S. Chadha & Richard Barwell, 2019. "Renewing our Monetary Vows: Open Letters to the Governor of the Bank of England," National Institute of Economic and Social Research (NIESR) Occasional Papers 58, National Institute of Economic and Social Research.
    10. Mónica Santana & Mirta Díaz-Fernández, 2023. "Competencies for the artificial intelligence age: visualisation of the state of the art and future perspectives," Review of Managerial Science, Springer, vol. 17(6), pages 1971-2004, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shiji Chen & Clément Arsenault & Yves Gingras & Vincent Larivière, 2015. "Exploring the interdisciplinary evolution of a discipline: the case of Biochemistry and Molecular Biology," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(2), pages 1307-1323, February.
    2. Ali Najmi & Taha H. Rashidi & Alireza Abbasi & S. Travis Waller, 2017. "Reviewing the transport domain: an evolutionary bibliometrics and network analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(2), pages 843-865, February.
    3. Jimi Adams & Ryan Light, 2014. "Mapping Interdisciplinary Fields: Efficiencies, Gaps and Redundancies in HIV/AIDS Research," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-13, December.
    4. Nicola Melluso & Andrea Bonaccorsi & Filippo Chiarello & Gualtiero Fantoni, 2021. "Rapid detection of fast innovation under the pressure of COVID-19," Papers 2102.00197, arXiv.org.
    5. Melissa Haller & David L. Rigby, 2020. "The geographic evolution of optics technologies in the United States, 1976–2010," Papers in Regional Science, Wiley Blackwell, vol. 99(6), pages 1539-1559, December.
    6. Raghu Raman & Nava Subramaniam & Vinith Kumar Nair & Avinash Shivdas & Krishnashree Achuthan & Prema Nedungadi, 2022. "Women Entrepreneurship and Sustainable Development: Bibliometric Analysis and Emerging Research Trends," Sustainability, MDPI, vol. 14(15), pages 1-31, July.
    7. Yang Li & Yongcheng Qi, 2020. "Asymptotic distribution of modularity in networks," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(4), pages 467-484, May.
    8. Ziqiao Ao & Gergely Horvath & Chunyuan Sheng & Yifan Song & Yutong Sun, 2022. "Skill requirements in job advertisements: A comparison of skill-categorization methods based on explanatory power in wage regressions," Papers 2207.12834, arXiv.org.
    9. Jinyang Dong & Jiamou Liu & Tiezhong Liu, 2021. "The impact of top scientists on the community development of basic research directed by government funding: evidence from program 973 in China," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(10), pages 8561-8579, October.
    10. Ying Liu & Naizhuo Zhao & Jennifer K Vanos & Guofeng Cao, 2018. "Visualizing changes in nationally averaged PM2.5 concentrations by an alluvial diagram," Environment and Planning A, vol. 50(2), pages 259-261, March.
    11. Luis Lorenzo & Javier Arroyo, 2022. "Analysis of the cryptocurrency market using different prototype-based clustering techniques," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-46, December.
    12. Lovro Šubelj & Nees Jan van Eck & Ludo Waltman, 2016. "Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-23, April.
    13. Weiwei Pan & Lirong Jian & Tao Liu, 2019. "Grey system theory trends from 1991 to 2018: a bibliometric analysis and visualization," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1407-1434, December.
    14. Chae, Bongsug (Kevin), 2019. "A General framework for studying the evolution of the digital innovation ecosystem: The case of big data," International Journal of Information Management, Elsevier, vol. 45(C), pages 83-94.
    15. Benjamin Allen & Christine Sample & Yulia Dementieva & Ruben C Medeiros & Christopher Paoletti & Martin A Nowak, 2015. "The Molecular Clock of Neutral Evolution Can Be Accelerated or Slowed by Asymmetric Spatial Structure," PLOS Computational Biology, Public Library of Science, vol. 11(2), pages 1-32, February.
    16. Bech, Morten L. & Bergstrom, Carl T. & Rosvall, Martin & Garratt, Rodney J., 2015. "Mapping change in the overnight money market," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 424(C), pages 44-51.
    17. de Nooy, Wouter & Leydesdorff, Loet, 2015. "The dynamics of triads in aggregated journal–journal citation relations: Specialty developments at the above-journal level," Journal of Informetrics, Elsevier, vol. 9(3), pages 542-554.
    18. Nicolo Musmeci & Tomaso Aste & Tiziana Di Matteo, 2014. "Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods," Papers 1406.0496, arXiv.org, revised Jan 2015.
    19. Mingers, John & Leydesdorff, Loet, 2015. "A review of theory and practice in scientometrics," European Journal of Operational Research, Elsevier, vol. 246(1), pages 1-19.
    20. Ke, Qing, 2018. "Comparing scientific and technological impact of biomedical research," Journal of Informetrics, Elsevier, vol. 12(3), pages 706-717.

    More about this item


    Keywords: Skills; Skills taxonomy; Labour demand; Online job adverts; Big data; Machine learning; Word embeddings

    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • J23 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Labor Demand
    • J24 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Human Capital; Skills; Occupational Choice; Labor Productivity
