Printed from https://ideas.repec.org/a/eee/infome/v7y2013i4p874-886.html

Research literature clustering using diffusion maps

Authors
  • Nieminen, Paavo
  • Pölönen, Ilkka
  • Sipola, Tuomo

Abstract

We apply the knowledge discovery process to the mapping of current topics in a particular field of science. We are interested in how articles form clusters and what the resulting clusters contain. A framework involving web scraping, keyword extraction, dimensionality reduction and clustering using the diffusion map algorithm is presented. We use publicly available information about articles in high-impact journals. The method should be of use to practitioners or scientists who want an overview of recent research in a field of science. As a case study, we map the topics in data mining literature in the year 2011.
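
The dimensionality-reduction and clustering stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy document-by-keyword matrix, the Gaussian kernel, the bandwidth `epsilon = 2.0`, and the sign-based two-way split are all assumptions chosen for illustration, and the function name `diffusion_map` is hypothetical.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components=1):
    """Return diffusion coordinates and eigenvalues for the rows of X."""
    # Pairwise squared Euclidean distances between documents.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian affinity kernel (assumed choice of kernel and bandwidth).
    K = np.exp(-d2 / epsilon)
    d = K.sum(axis=1)
    # Symmetric normalization S = D^-1/2 K D^-1/2 shares its eigenvalues
    # with the Markov matrix P = D^-1 K but can be handled by eigh.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]          # sort eigenvalues descending
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P are recovered as D^-1/2 v.
    psi = vecs / np.sqrt(d)[:, None]
    # Skip the trivial first eigenpair (eigenvalue 1, constant vector).
    coords = psi[:, 1:n_components + 1] * vals[1:n_components + 1]
    return coords, vals

# Hypothetical toy data: 6 documents x 6 keywords (1 = keyword present).
# Documents 0-2 share one vocabulary, documents 3-5 another.
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

coords, eigvals = diffusion_map(X, epsilon=2.0, n_components=1)

# Simplest possible clustering step: split on the sign of the first
# non-trivial diffusion coordinate (a stand-in for a real clusterer).
labels = (coords[:, 0] > 0).astype(int)
print(labels)
```

On real keyword data one would typically keep several diffusion coordinates and run a general-purpose clustering algorithm on them; the sign split is used here only because the toy data has two groups.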

Suggested Citation

  • Nieminen, Paavo & Pölönen, Ilkka & Sipola, Tuomo, 2013. "Research literature clustering using diffusion maps," Journal of Informetrics, Elsevier, vol. 7(4), pages 874-886.
  • Handle: RePEc:eee:infome:v:7:y:2013:i:4:p:874-886
    DOI: 10.1016/j.joi.2013.08.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157713000680
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Chaomei Chen, 2006. "CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(3), pages 359-377, February.
    2. Ismael Rafols & Alan L. Porter & Loet Leydesdorff, 2010. "Science overlay maps: A new tool for research policy and library management," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 61(9), pages 1871-1887, September.
    3. David J. Hand & Heikki Mannila & Padhraic Smyth, 2001. "Principles of Data Mining," MIT Press Books, The MIT Press, edition 1, volume 1, number 026208290x, December.
    4. Yuen-Hsien Tseng & Ming-Yueh Tsay, 2013. "Journal clustering of library and information science for subfield delineation using the bibliometric analysis toolkit: CATAR," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(2), pages 503-528, May.
    5. Kevin W. Boyack & Richard Klavans & Katy Börner, 2005. "Mapping the backbone of science," Scientometrics, Springer;Akadémiai Kiadó, vol. 64(3), pages 351-374, August.
    6. Henry Small, 1973. "Co‐citation in the scientific literature: A new measure of the relationship between two documents," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 24(4), pages 265-269, July.
    7. Loet Leydesdorff & Stephen Carley & Ismael Rafols, 2013. "Global maps of science based on the new Web-of-Science categories," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(2), pages 589-593, February.
    8. Loet Leydesdorff & Ismael Rafols, 2009. "A global map of science based on the ISI subject categories," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 60(2), pages 348-362, February.
    9. Ismael Rafols & Alan Porter & Loet Leydesdorff, 2009. "Overlay Maps of Science: a New Tool for Research Policy," SPRU Working Paper Series 179, SPRU - Science Policy Research Unit, University of Sussex Business School.
    10. Waltman, Ludo & van Eck, Nees Jan & Noyons, Ed C.M., 2010. "A unified approach to mapping and clustering of bibliometric networks," Journal of Informetrics, Elsevier, vol. 4(4), pages 629-635.

    Citations

    Cited by:

    1. Zhang, Yi & Lu, Jie & Liu, Feng & Liu, Qian & Porter, Alan & Chen, Hongshu & Zhang, Guangquan, 2018. "Does deep learning help topic extraction? A kernel k-means clustering method with word embedding," Journal of Informetrics, Elsevier, vol. 12(4), pages 1099-1117.
    2. Cena, Anna & Gagolewski, Marek & Mesiar, Radko, 2015. "Problems and challenges of information resources producers’ clustering," Journal of Informetrics, Elsevier, vol. 9(2), pages 273-284.
    3. Jong Hwan Suh, 2019. "SocialTERM-Extractor: Identifying and Predicting Social-Problem-Specific Key Noun Terms from a Large Number of Online News Articles Using Text Mining and Machine Learning Techniques," Sustainability, MDPI, vol. 11(1), pages 1-44, January.
    4. Dejing Kong & Jianzhong Yang & Lingfeng Li, 2020. "Early identification of technological convergence in numerical control machine tool: a deep learning approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 1983-2009, December.
    5. Zhichao Wang & Valentin Zelenyuk, 2021. "Performance Analysis of Hospitals in Australia and its Peers: A Systematic Review," CEPA Working Papers Series WP012021, School of Economics, University of Queensland, Australia.
    6. Skrjanc, T. & Mihalic, R. & Rudez, U., 2023. "A systematic literature review on under-frequency load shedding protection using clustering methods," Renewable and Sustainable Energy Reviews, Elsevier, vol. 180(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ying Huang & Wolfgang Glänzel & Lin Zhang, 2021. "Tracing the development of mapping knowledge domains," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 6201-6224, July.
    2. Jielan Ding & Per Ahlgren & Liying Yang & Ting Yue, 2018. "Disciplinary structures in Nature, Science and PNAS: journal and country levels," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1817-1852, September.
    3. Miguel R. Guevara & Dominik Hartmann & Manuel Aristarán & Marcelo Mendoza & César A. Hidalgo, 2016. "The research space: using career paths to predict the evolution of the research output of individuals, institutions, and nations," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1695-1709, December.
    4. Hric, Darko & Kaski, Kimmo & Kivelä, Mikko, 2018. "Stochastic block model reveals maps of citation patterns and their evolution in time," Journal of Informetrics, Elsevier, vol. 12(3), pages 757-783.
    5. Yuxian Liu & Ewelina Biskup & Yueqian Wang & Fengfeng Cai & Xiaoyan Zhang, 2020. "A new territory and its pioneer: opening up a dominant research stream for a translational research area," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1213-1228, November.
    6. Xiaozan Lyu & Ping Zhou & Loet Leydesdorff, 2020. "Eco-system mapping of techno-science linkages at the level of scholarly journals and fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 2037-2055, September.
    7. Andrea Bonaccorsi & Nicola Melluso & Francesco Alessandro Massucci, 2022. "Exploring the antecedents of interdisciplinarity at the European Research Council: a topic modeling approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 6961-6991, December.
    8. Payam Hanafizadeh & Seyedali Marjaie, 2020. "Trends and turning points of banking: a timespan view," Review of Managerial Science, Springer, vol. 14(6), pages 1183-1219, December.
    9. Sun, Xiaoling & Ding, Kun & Lin, Yuan, 2016. "Mapping the evolution of scientific fields based on cross-field authors," Journal of Informetrics, Elsevier, vol. 10(3), pages 750-761.
    10. Andrea Bonaccorsi & Filippo Chiarello & Gualtiero Fantoni, 2021. "Impact for whom? Mapping the users of public research with lexicon-based text mining," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1745-1774, February.
    11. Leydesdorff, Loet & Rafols, Ismael, 2012. "Interactive overlays: A new method for generating global journal maps from Web-of-Science data," Journal of Informetrics, Elsevier, vol. 6(2), pages 318-332.
    12. Yanto Chandra, 2018. "Mapping the evolution of entrepreneurship as a field of research (1990–2013): A scientometric analysis," PLOS ONE, Public Library of Science, vol. 13(1), pages 1-24, January.
    13. Balland, Pierre-Alexandre & Boschma, Ron, 2022. "Do scientific capabilities in specific domains matter for technological diversification in European regions?," Research Policy, Elsevier, vol. 51(10).
    14. Giovanni Abramo & Ciriaco Andrea D'Angelo & Flavia Costa, 2012. "Identifying interdisciplinarity through the disciplinary classification of coauthors of scientific publications," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(11), pages 2206-2222, November.
    15. Rafols, Ismael & Leydesdorff, Loet & O’Hare, Alice & Nightingale, Paul & Stirling, Andy, 2012. "How journal rankings can suppress interdisciplinary research: A comparison between Innovation Studies and Business & Management," Research Policy, Elsevier, vol. 41(7), pages 1262-1282.
    16. Jianhua Hou, 2017. "Exploration into the evolution and historical roots of citation analysis by referenced publication year spectroscopy," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1437-1452, March.
    17. Ludo Waltman & Nees Jan van Eck, 2012. "A new methodology for constructing a publication-level classification system of science," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(12), pages 2378-2392, December.
    18. Xie, Yundong & Wu, Qiang & Zhang, Peng & Li, Xingchen, 2020. "Information Science and Library Science (IS-LS) journal subject categorisation and comparison based on editorship information," Journal of Informetrics, Elsevier, vol. 14(4).
    19. Lin Zhang & Beibei Sun & Fei Shu & Ying Huang, 2022. "Comparing paper level classifications across different methods and systems: an investigation of Nature publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7633-7651, December.
    20. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.