
Hybrid self-optimized clustering model based on citation links and textual features to detect research topics

Author

Listed:
  • Dejian Yu
  • Wanru Wang
  • Shuai Zhang
  • Wenyu Zhang
  • Rongyu Liu

Abstract

The challenge of detecting research topics in a specific research field has attracted attention from researchers in the bibliometrics community. In this study, to solve two problems of clustering papers, i.e., the influence of different distributions of citation links and involved textual features on similarity computation, the authors propose a hybrid self-optimized clustering model to detect research topics by extending the hybrid clustering model to identify “core documents”. First, the Amsler network, consisting of bibliographic coupling and co-citation links, is created to calculate the citation-based similarity based on the cosine angle of papers. Second, the cosine similarity is also used to compute the text-based similarity, which consists of the textual statistical and topological features. Then, the cosine angle of the linear combination of citation- and text-based similarity is considered as the hybrid similarity. Finally, the Louvain method is applied to cluster papers, and the terms based on term frequency are used to label clusters. To test the performance of the proposed model, a dataset related to the data envelopment analysis field is used for comparison and analysis of clustering results. Based on the benchmark built, different clustering methods with different citation links or textual features are compared according to evaluation measures. The results show that the proposed model can obtain reasonable and effective clustering results, and the research topics of data envelopment analysis field are also analyzed based on the proposed model. As different features are considered in the proposed model compared with previous hybrid clustering models, the proposed clustering model can provide inspiration for further studies on topic identification by other researchers.
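The similarity steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy papers, the weight `lam`, and the helper names (`cosine`, `similarity_matrix`, `amsler_vector`) are all assumptions, plain term counts stand in for the paper's textual statistical and topological features, and the Amsler representation (references plus citing papers as one feature vector) is only one plausible reading of the abstract.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(vectors):
    """Square matrix of pairwise cosine similarities."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical toy data: reference lists and short texts for four papers.
papers = ["p1", "p2", "p3", "p4"]
refs = {
    "p1": {"a", "b"},
    "p2": {"a", "b", "p1"},
    "p3": {"c", "d"},
    "p4": {"c", "p3"},
}
texts = {
    "p1": "dea efficiency model",
    "p2": "dea efficiency bank",
    "p3": "network clustering topic",
    "p4": "topic clustering citation",
}


def amsler_vector(paper):
    """Assumed Amsler features: items the paper cites plus papers citing it.

    Shared "ref" features correspond to bibliographic coupling,
    shared "citer" features correspond to co-citation.
    """
    cited = Counter(("ref", r) for r in refs[paper])
    citing = Counter(("citer", q) for q in papers if paper in refs[q])
    return cited + citing


# Step 1: citation-based similarity from the Amsler features.
citation_sim = similarity_matrix([amsler_vector(p) for p in papers])

# Step 2: text-based similarity (term counts only, as a stand-in).
text_sim = similarity_matrix([Counter(texts[p].split()) for p in papers])

# Step 3: linear combination, then cosine between rows of the combined matrix.
lam = 0.5  # assumed weight; the abstract does not state a value
combined = [
    [lam * c + (1 - lam) * t for c, t in zip(c_row, t_row)]
    for c_row, t_row in zip(citation_sim, text_sim)
]
hybrid = similarity_matrix([dict(enumerate(row)) for row in combined])
```

In this toy setup, `p1` and `p2` end up with a high hybrid similarity while papers from different groups score zero. The resulting matrix could then be treated as a weighted graph and passed to a Louvain implementation (for example, the one shipped with NetworkX) to obtain clusters, which would be labeled by their most frequent terms.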

Suggested Citation

  • Dejian Yu & Wanru Wang & Shuai Zhang & Wenyu Zhang & Rongyu Liu, 2017. "Hybrid self-optimized clustering model based on citation links and textual features to detect research topics," PLOS ONE, Public Library of Science, vol. 12(10), pages 1-21, October.
  • Handle: RePEc:plo:pone00:0187164
    DOI: 10.1371/journal.pone.0187164

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0187164
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0187164&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0187164?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Xinhai Liu & Shi Yu & Frizo Janssens & Wolfgang Glänzel & Yves Moreau & Bart De Moor, 2010. "Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 61(6), pages 1105-1119, June.
    2. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    3. Fabian Meyer-Brötz & Edgar Schiebel & Leo Brecht, 2017. "Experimental evaluation of parameter settings in calculation of hybrid similarities: effects of first- and second-order similarity, edge cutting, and weighting factors," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1307-1325, June.
    4. Dejian Yu & Wanru Wang & Shuai Zhang & Wenyu Zhang & Rongyu Liu, 2017. "A multiple-link, mutually reinforced journal-ranking model to measure the prestige of journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(1), pages 521-542, April.
    5. Amancio, Diego R. & Nunes, Maria G.V. & Oliveira, Osvaldo N. & Costa, Luciano da F., 2012. "Extractive summarization using complex networks and syntactic dependency," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(4), pages 1855-1864.
    6. Wolfgang Glänzel & Bart Thijs, 2017. "Using hybrid methods and ‘core documents’ for the representation of clusters and topics: the astronomy dataset," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 1071-1087, May.
    7. M. M. Kessler, 1963. "Bibliographic coupling between scientific papers," American Documentation, Wiley Blackwell, vol. 14(1), pages 10-25, January.
    8. Robert R. Braam & Henk F. Moed & Anthony F. J. van Raan, 1991. "Mapping of science by combined co‐citation and word analysis. II: Dynamical aspects," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 42(4), pages 252-266, May.
    9. Xiangfeng Meng & Xinhai Liu & YunHai Tong & Wolfgang Glänzel & Shaohua Tan, 2015. "Multi-view clustering with exemplars for scientific mapping," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1527-1552, December.
    10. Henry Small, 1973. "Co‐citation in the scientific literature: A new measure of the relationship between two documents," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 24(4), pages 265-269, July.
    11. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    12. Silva, Filipi N. & Amancio, Diego R. & Bardosova, Maria & Costa, Luciano da F. & Oliveira, Osvaldo N., 2016. "Using network science and text analytics to produce surveys in a scientific topic," Journal of Informetrics, Elsevier, vol. 10(2), pages 487-502.
    13. José M. Merigó & Christian A. Cancino & Freddy Coronado & David Urbano, 2016. "Academic research in innovation: a country analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(2), pages 559-593, August.
    14. Frizo Janssens & Wolfgang Glänzel & Bart De Moor, 2008. "A hybrid mapping of information science," Scientometrics, Springer;Akadémiai Kiadó, vol. 75(3), pages 607-631, June.
    15. Xinhai Liu & Shi Yu & Frizo Janssens & Wolfgang Glänzel & Yves Moreau & Bart De Moor, 2010. "Weighted hybrid clustering by combining text mining and bibliometrics on a large‐scale journal database," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 61(6), pages 1105-1119, June.
    16. Rey-Long Liu, 2015. "Passage-Based Bibliographic Coupling: An Inter-Article Similarity Measure for Biomedical Articles," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-22, October.
    17. Cristian Colliander & Per Ahlgren, 2012. "Experimental comparison of first and second-order similarities in a scientometric context," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(2), pages 675-685, February.
    18. Diego Raphael Amancio, 2015. "A Complex Network Approach to Stylometry," PLOS ONE, Public Library of Science, vol. 10(8), pages 1-21, August.
    19. Thiago Salles & Leonardo Rocha & Marcos André Gonçalves & Jussara M. Almeida & Fernando Mourão & Wagner Meira Jr. & Felipe Viegas, 2016. "A quantitative analysis of the temporal effects on automatic text classification," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(7), pages 1639-1667, July.
    20. Jinseok Kim & Jana Diesner, 2016. "Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(6), pages 1446-1461, June.
    21. Ahlgren, Per & Colliander, Cristian, 2009. "Document–document similarity approaches and science mapping: Experimental comparison of five approaches," Journal of Informetrics, Elsevier, vol. 3(1), pages 49-63.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. S. Lozano & L. Calzada-Infante & B. Adenso-Díaz & S. García, 2019. "Complex network analysis of keywords co-occurrence in the recent efficiency analysis literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 609-629, August.
    2. Corrêa, Edilson A. & Amancio, Diego R., 2019. "Word sense induction using word embeddings and community detection in complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 180-190.
    3. Dejian Yu & Sun Meng, 2018. "An overview of biomass energy research with bibliometric indicators," Energy & Environment, vol. 29(4), pages 576-590, June.
    4. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.
    5. Wang, Feifei & Jia, Chenran & Wang, Xiaohan & Liu, Junwan & Xu, Shuo & Liu, Yang & Yang, Chenyuyan, 2019. "Exploring all-author tripartite citation networks: A case study of gene editing," Journal of Informetrics, Elsevier, vol. 13(3), pages 856-873.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    2. Ying Huang & Wolfgang Glänzel & Lin Zhang, 2021. "Tracing the development of mapping knowledge domains," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 6201-6224, July.
    3. Yeow Chong Goh & Xin Qing Cai & Walter Theseira & Giovanni Ko & Khiam Aik Khor, 2020. "Evaluating human versus machine learning performance in classifying research abstracts," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1197-1212, November.
    4. Rey-Long Liu, 2015. "Passage-Based Bibliographic Coupling: An Inter-Article Similarity Measure for Biomedical Articles," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-22, October.
    5. Rons, Nadine, 2018. "Bibliometric approximation of a scientific specialty by combining key sources, title words, authors and references," Journal of Informetrics, Elsevier, vol. 12(1), pages 113-132.
    6. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.
    7. Guan-Can Yang & Gang Li & Chun-Ya Li & Yun-Hua Zhao & Jing Zhang & Tong Liu & Dar-Zen Chen & Mu-Hsuan Huang, 2015. "Using the comprehensive patent citation network (CPC) to evaluate patent value," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1319-1346, December.
    8. Rey-Long Liu, 2017. "A new bibliographic coupling measure with descriptive capability," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(2), pages 915-935, February.
    9. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    10. Leslier Valenzuela-Fernández & Manuel Escobar-Farfán, 2022. "Zero-Waste Management and Sustainable Consumption: A Comprehensive Bibliometric Mapping Analysis," Sustainability, MDPI, vol. 14(23), pages 1-24, December.
    11. Yun, Jinhyuk & Ahn, Sejung & Lee, June Young, 2020. "Return to basics: Clustering of scientific literature using structural information," Journal of Informetrics, Elsevier, vol. 14(4).
    12. Maruša Premru & Matej Černe & Saša Batistič, 2022. "The Road to the Future: A Multi-Technique Bibliometric Review and Development Projections of the Leader–Member Exchange (LMX) Research," SAGE Open, vol. 12(2), pages 21582440221, May.
    13. Sigifredo Laengle & Nikunja Mohan Modak & Jose M. Merigo & Gustavo Zurita, 2018. "Twenty-Five Years of Group Decision and Negotiation: A Bibliometric Overview," Group Decision and Negotiation, Springer, vol. 27(4), pages 505-542, August.
    14. Laengle, Sigifredo & Merigó, José M. & Miranda, Jaime & Słowiński, Roman & Bomze, Immanuel & Borgonovo, Emanuele & Dyson, Robert G. & Oliveira, José Fernando & Teunter, Ruud, 2017. "Forty years of the European Journal of Operational Research: A bibliometric overview," European Journal of Operational Research, Elsevier, vol. 262(3), pages 803-816.
    15. José M. Merigó & Claudio Muller & Nikunja Mohan Modak & Sigifredo Laengle, 2019. "Research in Production and Operations Management: A University-Based Bibliometric Analysis," Global Journal of Flexible Systems Management, Springer;Global Institute of Flexible Systems Management, vol. 20(1), pages 1-29, March.
    16. Guadalupe Palacios-Núñez & Gabriel Vélez-Cuartas & Juan D. Botero, 2018. "Developmental tendencies in the academic field of intellectual property through the identification of invisible colleges," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(3), pages 1561-1574, June.
    17. Fabian Meyer-Brötz & Edgar Schiebel & Leo Brecht, 2017. "Experimental evaluation of parameter settings in calculation of hybrid similarities: effects of first- and second-order similarity, edge cutting, and weighting factors," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1307-1325, June.
    18. Nelson Andrade-Valbuena & Hugo Baier-Fuentes & Magaly Gaviria-Marin, 2022. "An Overview of Sustainable Entrepreneurship in Tourism, Destination, and Hospitality Research Based on the Web of Science," Sustainability, MDPI, vol. 14(22), pages 1-26, November.
    19. Hugo Baier-Fuentes & José M. Merigó & José Ernesto Amorós & Magaly Gaviria-Marín, 2019. "International entrepreneurship: a bibliometric overview," International Entrepreneurship and Management Journal, Springer, vol. 15(2), pages 385-429, June.
    20. Yun, Jinhyuk, 2022. "Generalization of bibliographic coupling and co-citation using the node split network," Journal of Informetrics, Elsevier, vol. 16(2).
