IDEAS home Printed from https://ideas.repec.org/a/eee/infome/v14y2020i4s1751157720301978.html

Return to basics: Clustering of scientific literature using structural information

Author

Listed:
  • Yun, Jinhyuk
  • Ahn, Sejung
  • Lee, June Young

Abstract

Scholars frequently employ relatedness measures to estimate the similarity between two different items (e.g., documents, authors, and institutes). Such relatedness measures are commonly based on overlapping references (i.e., bibliographic coupling) or citations (i.e., co-citation) and can then be used with cluster analysis to find boundaries between research fields. Unfortunately, calculating a relatedness measure is challenging, especially for a large number of items, because the computational complexity is greater than linear. We propose an alternative method for identifying research fronts that uses direct citation inspired by relatedness measures. Our novel approach simply replicates a node into two distinct nodes: a citing node and cited node. We then apply typical clustering methods to the modified network. Clusters of citing nodes should emulate those from the bibliographic coupling relatedness network, while clusters of cited nodes should act like those from the co-citation relatedness network. In validation tests, our proposed method demonstrated high levels of similarity with conventional relatedness-based methods. We also found that the clustering results of the proposed method outperformed those of conventional relatedness-based measures regarding similarity with natural language processing-based classification.
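
The node-splitting idea described in the abstract can be illustrated with a minimal sketch. It assumes the citation data is a list of (citing, cited) pairs; the function names, the "citing"/"cited" labels, and the toy data are hypothetical and are not taken from the paper. The sketch only builds the modified network and counts shared neighbors. It does not reproduce the authors' clustering step, which the abstract does not specify.

```python
from collections import defaultdict
from itertools import combinations


def split_nodes(citations):
    """Map each direct citation (a cites b) to an edge between
    the citing copy of a and the cited copy of b."""
    return [(("citing", a), ("cited", b)) for a, b in citations]


def shared_neighbors(split_edges, side):
    """Count the neighbors shared by each pair of nodes on one side
    ("citing" or "cited") of the split network."""
    neighbors = defaultdict(set)
    for citing, cited in split_edges:
        if side == "citing":
            neighbors[citing[1]].add(cited[1])
        else:
            neighbors[cited[1]].add(citing[1])
    counts = {}
    for u, v in combinations(sorted(neighbors), 2):
        shared = len(neighbors[u] & neighbors[v])
        if shared:
            counts[(u, v)] = shared
    return counts


# Hypothetical toy data: (citing paper, cited paper)
citations = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "Z")]
split_edges = split_nodes(citations)

# Citing copies that share cited neighbors correspond to bibliographic coupling.
coupling = shared_neighbors(split_edges, "citing")
# Cited copies that share citing neighbors correspond to co-citation.
cocitation = shared_neighbors(split_edges, "cited")
```

In this toy network, papers A and B share two references, and papers X and Y are cited together twice. A clustering method applied to the split network would therefore tend to group the citing copies of A and B, and the cited copies of X and Y, without the pairwise relatedness matrix ever being computed explicitly.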

Suggested Citation

  • Yun, Jinhyuk & Ahn, Sejung & Lee, June Young, 2020. "Return to basics: Clustering of scientific literature using structural information," Journal of Informetrics, Elsevier, vol. 14(4).
  • Handle: RePEc:eee:infome:v:14:y:2020:i:4:s1751157720301978
    DOI: 10.1016/j.joi.2020.101099

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157720301978
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.joi.2020.101099?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item



    Citations

    Cited by:

    1. Yun, Jinhyuk, 2022. "Generalization of bibliographic coupling and co-citation using the node split network," Journal of Informetrics, Elsevier, vol. 16(2).
    2. Skrjanc, T. & Mihalic, R. & Rudez, U., 2023. "A systematic literature review on under-frequency load shedding protection using clustering methods," Renewable and Sustainable Energy Reviews, Elsevier, vol. 180(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yun, Jinhyuk, 2022. "Generalization of bibliographic coupling and co-citation using the node split network," Journal of Informetrics, Elsevier, vol. 16(2).
    2. García-Lillo, Francisco & Seva-Larrosa, Pedro & Sánchez-García, Eduardo, 2023. "What is going on in entrepreneurship research? A bibliometric and SNA analysis," Journal of Business Research, Elsevier, vol. 158(C).
    3. Kraker, Peter & Schlögl, Christian & Jack, Kris & Lindstaedt, Stefanie, 2015. "Visualization of co-readership patterns from an online reference management system," Journal of Informetrics, Elsevier, vol. 9(1), pages 169-182.
    4. Ying Huang & Wolfgang Glänzel & Lin Zhang, 2021. "Tracing the development of mapping knowledge domains," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 6201-6224, July.
    5. Jun-Ping Qiu & Ke Dong & Hou-Qiang Yu, 2014. "Comparative study on structure and correlation among author co-occurrence networks in bibliometrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1345-1360, November.
    6. Song Yanhui & Wu Lijuan & Qiu Junping, 2021. "A comparative study of first and all-author bibliographic coupling analysis based on Scientometrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1125-1147, February.
    7. Sjögårde, Peter & Ahlgren, Per, 2018. "Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics," Journal of Informetrics, Elsevier, vol. 12(1), pages 133-152.
    8. Wang, Feifei & Jia, Chenran & Wang, Xiaohan & Liu, Junwan & Xu, Shuo & Liu, Yang & Yang, Chenyuyan, 2019. "Exploring all-author tripartite citation networks: A case study of gene editing," Journal of Informetrics, Elsevier, vol. 13(3), pages 856-873.
    9. Prathap, Gangan & Ujum, Ephrance Abu & Kumar, Sameer & Ratnavelu, Kuru, 2021. "Scoring the resourcefulness of researchers using bibliographic coupling patterns," Journal of Informetrics, Elsevier, vol. 15(3).
    10. Nassiri, Isar & Masoudi-Nejad, Ali & Jalili, Mahdi & Moeini, Ali, 2013. "Normalized Similarity Index: An adjusted index to prioritize article citations," Journal of Informetrics, Elsevier, vol. 7(1), pages 91-98.
    11. Jochen Gläser & Wolfgang Glänzel & Andrea Scharnhorst, 2017. "Same data—different results? Towards a comparative approach to the identification of thematic structures in science," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 981-998, May.
    12. Ignacio Rodríguez-Rodríguez & José-Víctor Rodríguez & Niloofar Shirvanizadeh & Andrés Ortiz & Domingo-Javier Pardo-Quiles, 2021. "Applications of Artificial Intelligence, Machine Learning, Big Data and the Internet of Things to the COVID-19 Pandemic: A Scientometric Review Using Text Mining," IJERPH, MDPI, vol. 18(16), pages 1-29, August.
    13. Liu, Yunmei & Yang, Liu & Chen, Min, 2021. "A new citation concept: Triangular citation in the literature," Journal of Informetrics, Elsevier, vol. 15(2).
    14. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    15. Wenceslao Arroyo-Machado & Daniel Torres-Salinas & Nicolas Robinson-Garcia, 2021. "Identifying and characterizing social media communities: a socio-semantic network approach to altmetrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(11), pages 9267-9289, November.
    16. Xu, Shuo & Hao, Liyuan & Yang, Guancan & Lu, Kun & An, Xin, 2021. "A topic models based framework for detecting and forecasting emerging technologies," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    17. Takano, Yasutomo & Kajikawa, Yuya, 2019. "Extracting commercialization opportunities of the Internet of Things: Measuring text similarity between papers and patents," Technological Forecasting and Social Change, Elsevier, vol. 138(C), pages 45-68.
    18. Mingchun Cao & Ilan Alon, 2020. "Intellectual Structure of the Belt and Road Initiative Research: A Scientometric Analysis and Suggestions for a Future Research Agenda," Sustainability, MDPI, vol. 12(17), pages 1-40, August.
    19. Myriam Ertz & Sébastien Leblanc-Proulx, 2019. "Review of a proposed methodology for bibliometric and visualization analyses for organizations: application to the collaboration economy," Journal of Marketing Analytics, Palgrave Macmillan, vol. 7(2), pages 84-93, June.
    20. Hsiao, Chun Hua & Yang, Chyan, 2011. "The intellectual development of the technology acceptance model: A co-citation analysis," International Journal of Information Management, Elsevier, vol. 31(2), pages 128-136.
