
Identifying the driving factors of word co-occurrence: a perspective of semantic relations

Authors
  • Yiming Zhao

    (Wuhan University)

  • Jiaying Yin

    (Wuhan University)

  • Jin Zhang

    (University of Wisconsin Milwaukee)

  • Linrong Wu

    (Wuhan University)

Abstract

This study aims to investigate and identify the driving factors of word co-occurrence from the perspective of the semantic relations between frequently co-occurring words. Natural sentences in a corpus of news articles were used as co-occurrence windows to extract co-occurring word pairs, with no limit on the distance between the two words. ConceptNet, a semantic knowledge base, was used to annotate the semantic relation between co-occurring words. Because some co-occurring word pairs cannot be matched to a direct semantic relation in ConceptNet, we proposed a relation annotation method that connects them through an intermediate word. The results showed that six semantic relations in ConceptNet (RelatedTo, IsA, Synonym, HasContext, Antonym, and MannerOf) were important factors directly inducing word co-occurrence, and that combinations of some of these relations were an important factor indirectly driving it. We also combined syntactic analysis with lexical semantic theories to analyze the direct and indirect semantic relations. This analysis showed that the factors driving word co-occurrence in sentences can be classified into three relation categories: collocation and modification, hyponymy, and synonym and antonym. These findings can help explain the phenomenon of word co-occurrence and improve the method and application of co-word analysis.
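The pipeline summarized in the abstract (sentence-level co-occurrence windows with no distance limit, direct relation lookup, and a fallback through an intermediate word) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy edge list `TOY_EDGES` is a hypothetical stand-in for ConceptNet assertions, the stopword list and regex sentence splitting are simplifying assumptions, and the helpers `sentence_pairs`, `build_index`, and `annotate` are hypothetical names. The sketch also ignores edge direction, which is an assumption: ConceptNet relations such as IsA are directed.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy triples standing in for ConceptNet assertions (start, relation, end).
TOY_EDGES = [
    ("cat", "IsA", "animal"),
    ("dog", "IsA", "animal"),
    ("big", "Antonym", "small"),
    ("car", "RelatedTo", "road"),
    ("road", "RelatedTo", "traffic"),
]

# Hypothetical minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "on", "was", "is", "of", "in", "to"}


def sentence_pairs(text):
    """Count unordered word pairs that co-occur in the same sentence, at any distance."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        # Each distinct word counts once per sentence; sorting makes pairs canonical.
        words = sorted({w for w in re.findall(r"[a-z]+", sentence) if w not in STOPWORDS})
        counts.update(combinations(words, 2))
    return counts


def build_index(edges):
    """Map each unordered word pair to the relations linking it (direction ignored)."""
    index = {}
    for start, rel, end in edges:
        index.setdefault(frozenset((start, end)), set()).add(rel)
    return index


def annotate(pair, index, vocab):
    """Label a pair with direct relations, else with paths through one intermediate word."""
    a, b = pair
    direct = index.get(frozenset((a, b)))
    if direct:
        return ("direct", sorted(direct))
    paths = []
    for mid in sorted(vocab - {a, b}):
        first = index.get(frozenset((a, mid)))
        second = index.get(frozenset((mid, b)))
        if first and second:
            # Record every relation combination along the two-hop path a-mid-b.
            paths.extend((mid, r1, r2) for r1 in sorted(first) for r2 in sorted(second))
    return ("indirect", paths) if paths else ("none", [])


if __name__ == "__main__":
    text = "The dog and the cat slept. The car was on the road. A big dog met a small cat."
    index = build_index(TOY_EDGES)
    vocab = {w for start, _, end in TOY_EDGES for w in (start, end)}
    for pair, n in sentence_pairs(text).most_common():
        print(pair, n, annotate(pair, index, vocab))
```

With these toy edges, the pair ("big", "small") is labeled directly with Antonym, while ("cat", "dog") has no direct edge and is linked indirectly through "animal" by an IsA + IsA path. Ignoring direction is what lets such a shared-target path be found; a more faithful sketch would keep direction and treat each relation combination separately.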

Suggested Citation

  • Yiming Zhao & Jiaying Yin & Jin Zhang & Linrong Wu, 2023. "Identifying the driving factors of word co-occurrence: a perspective of semantic relations," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(12), pages 6471-6494, December.
  • Handle: RePEc:spr:scient:v:128:y:2023:i:12:d:10.1007_s11192-023-04851-x
    DOI: 10.1007/s11192-023-04851-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-023-04851-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-023-04851-x?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to a location where you can access this item through your library subscription.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qikai Cheng & Jiamin Wang & Wei Lu & Yong Huang & Yi Bu, 2020. "Keyword-citation-keyword network: a new perspective of discipline knowledge structure analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1923-1943, September.
    2. Kai Hu & Huayi Wu & Kunlun Qi & Jingmin Yu & Siluo Yang & Tianxing Yu & Jie Zheng & Bo Liu, 2018. "A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 1031-1068, March.
    3. Xiang Zhu & Yunqiu Zhang, 2020. "Co-word analysis method based on meta-path of subject knowledge network," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 753-766, May.
    4. Chen, Guo & Hong, Siqi & Du, Chenxin & Wang, Panting & Yang, Zeyu & Xiao, Lu, 2024. "Comparing semantic representation methods for keyword analysis in bibliometric research," Journal of Informetrics, Elsevier, vol. 18(3).
    5. Qiang Gao & Xiao Huang & Ke Dong & Zhentao Liang & Jiang Wu, 2022. "Semantic-enhanced topic evolution analysis: a combination of the dynamic topic model and word2vec," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(3), pages 1543-1563, March.
    6. Ananthan Nambiar & Tobias Rubel & James McCaull & Jon deVries & Mark Bedau, 2021. "Dropping diversity of products of large US firms: Models and measures," Papers 2110.08367, arXiv.org.
    7. Lu, Wei & Ren, Yan & Huang, Yong & Bu, Yi & Zhang, Yuehan, 2021. "Scientific collaboration and career stages: An ego-centric perspective," Journal of Informetrics, Elsevier, vol. 15(4).
    8. Marek Kwiek & Wojciech Roszka, 2022. "Academic vs. biological age in research on academic careers: a large-scale study with implications for scientifically developing systems," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3543-3575, June.
    9. Sumbol Fiaz & Muhammad Azeem Qureshi, 2021. "How perceived organizational politics cause work-to-family conflict? Scoping and systematic review of literature," Future Business Journal, Springer, vol. 7(1), pages 1-18, December.
    10. Jason Youn & Navneet Rai & Ilias Tagkopoulos, 2022. "Knowledge integration and decision support for accelerated discovery of antibiotic resistance genes," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    11. João M. Santos & Hugo Horta & Shihui Feng, 2024. "Homophily and its effects on collaborations and repeated collaborations: a study across scientific fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(3), pages 1801-1823, March.
    12. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.
    13. Chaocheng He & Jiang Wu & Qingpeng Zhang, 2021. "Characterizing research leadership on geographically weighted collaboration network," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4005-4037, May.
    14. Zongrui Pei & Junqi Yin & Peter K. Liaw & Dierk Raabe, 2023. "Toward the design of ultrahigh-entropy alloys via mining six million texts," Nature Communications, Nature, vol. 14(1), pages 1-8, December.
    15. Jesús Frutos-Belizón & Natalia García-Carbonell & Félix Guerrero-Alba & Gonzalo Sánchez-Gardey, 2024. "An empirical analysis of individual and collective determinants of international research collaboration," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(5), pages 2749-2770, May.
    16. Shaoshuo Li & Baixing Chen & Hao Chen & Zhen Hua & Yang Shao & Heng Yin & Jianwei Wang, 2021. "Analysis of potential genetic biomarkers and molecular mechanism of smoking-related postmenopausal osteoporosis using weighted gene co-expression network analysis and machine learning," PLOS ONE, Public Library of Science, vol. 16(9), pages 1-18, September.
    17. Gang Du & Xi Liang & Xiaoling Ouyang & Chunming Wang, 0. "Risk prediction of hypertension complications based on the intelligent algorithm optimized Bayesian network," Journal of Combinatorial Optimization, Springer, vol. 0, pages 1-22.
    18. John Dagdelen & Alexander Dunn & Sanghoon Lee & Nicholas Walker & Andrew S. Rosen & Gerbrand Ceder & Kristin A. Persson & Anubhav Jain, 2024. "Structured information extraction from scientific text with large language models," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    19. Korzeniowska Dominika & Brescia Valerio & Fijałkowska Justyna, 2022. "Behavioral Accounting: A Bibliometric Analysis of Literature Outputs in 2013–2022," Journal of Intercultural Management, Sciendo, vol. 14(3), pages 17-40, September.
    20. Xiaoyan Wang & Guocai Wang & Yanhui Zhao & Wyatt A. Schrock, 2024. "The Intellectual Structure of Sales Ethics Research: A Multi-method Bibliometric Analysis," Journal of Business Ethics, Springer, vol. 193(1), pages 133-157, August.
