Printed from https://ideas.repec.org/a/spr/scient/v114y2018i3d10.1007_s11192-017-2574-9.html

A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model

Author

Listed:
  • Kai Hu

    (Wuhan University)

  • Huayi Wu

    (Wuhan University)

  • Kunlun Qi

    (China University of Geosciences (Wuhan))

  • Jingmin Yu

    (Changjiang Spatial Information Technology Engineering CO., LTD)

  • Siluo Yang

    (Wuhan University)

  • Tianxing Yu

    (Wuhan University)

  • Jie Zheng

    (Wuhan University)

  • Bo Liu

    (East China Institute of Technology)

Abstract

In bibliometric research, keyword analysis of publications provides an effective way not only to investigate the knowledge structure of research domains, but also to explore the developing trends within domains. To identify the most representative keywords, many approaches have been proposed. Most of them focus on using statistical regularities, syntax, grammar, or network-based characteristics to select representative keywords for the domain analysis. In this paper, we argue that the domain knowledge is reflected by the semantic meanings behind keywords rather than the keywords themselves. We apply the Google Word2Vec model, a model of a word distribution using deep learning, to represent the semantic meanings of the keywords. Based on this work, we propose a new domain knowledge approach, the Semantic Frequency-Semantic Active Index, similar to Term Frequency-Inverse Document Frequency, to link domain and background information and identify infrequent but important keywords. We adopt a semantic similarity measuring process before statistical computation to compute the frequencies of “semantic units” rather than keyword frequencies. Semantic units are generated by word vector clustering, while the Inverse Document Frequency is extended to include the semantic inverse document frequency; thus only words in the inverse documents with a certain similarity will be counted. Taking geographical natural hazards as the domain and natural hazards as the background discipline, we identify the domain-specific knowledge that distinguishes geographical natural hazards from other types of natural hazards. We compare and discuss the advantages and disadvantages of the proposed method in relation to existing methods, finding that by introducing the semantic meaning of the keywords, our method supports more effective domain knowledge analysis.
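
The pipeline described in the abstract can be illustrated with a rough sketch. The abstract does not give the exact Semantic Frequency-Semantic Active Index formula, so everything below is an assumption made for illustration: the toy 3-D vectors stand in for Word2Vec embeddings, a greedy cosine-threshold grouping stands in for the paper's word vector clustering, and the score is a plain TF-IDF-style product with a smoothed logarithm. The function names and the threshold value are hypothetical.

```python
import math
import numpy as np

# Toy 3-D vectors standing in for Word2Vec embeddings (hypothetical values).
vectors = {
    "landslide": np.array([1.0, 0.1, 0.0]),
    "debris flow": np.array([0.9, 0.2, 0.0]),
    "earthquake": np.array([0.0, 1.0, 0.1]),
    "seismic": np.array([0.1, 0.9, 0.0]),
    "GIS": np.array([0.0, 0.0, 1.0]),
    "remote sensing": np.array([0.1, 0.0, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_units(vectors, threshold):
    """Greedy grouping (an assumption, not the paper's clustering):
    a keyword joins the first unit whose seed keyword is similar enough,
    otherwise it starts a new unit."""
    units = []
    for word, vec in vectors.items():
        for unit in units:
            if cosine(vec, vectors[unit[0]]) >= threshold:
                unit.append(word)
                break
        else:
            units.append([word])
    return units

def semantic_frequency(unit, docs):
    """Count keyword occurrences in docs that fall inside the semantic unit."""
    members = set(unit)
    return sum(1 for doc in docs for kw in doc if kw in members)

def semantic_idf(unit, docs, vectors, threshold):
    """Smoothed IDF where a background document counts only if it holds
    a keyword at least `threshold` similar to the unit's seed keyword."""
    seed = vectors[unit[0]]
    df = sum(
        1 for doc in docs
        if any(cosine(vectors[kw], seed) >= threshold for kw in doc)
    )
    return math.log((1 + len(docs)) / (1 + df))

# Keyword lists per publication: domain vs. background (made-up data).
domain_docs = [["landslide", "GIS"], ["debris flow", "remote sensing"],
               ["earthquake", "GIS"]]
background_docs = [["earthquake"], ["seismic", "landslide"],
                   ["earthquake", "seismic"], ["seismic"]]

THRESHOLD = 0.9  # hypothetical similarity cut-off
units = build_units(vectors, THRESHOLD)
scores = {
    unit[0]: semantic_frequency(unit, domain_docs)
    * semantic_idf(unit, background_docs, vectors, THRESHOLD)
    for unit in units
}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for seed, score in ranked:
    print(f"{seed}: {score:.3f}")
```

In this toy data the unit seeded by "GIS" ranks first because its members appear in the domain documents but in no background document, while the "earthquake" unit scores zero because a similar keyword occurs in every background document. With real data the vectors would come from a trained embedding model rather than hand-written arrays.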

Suggested Citation

  • Kai Hu & Huayi Wu & Kunlun Qi & Jingmin Yu & Siluo Yang & Tianxing Yu & Jie Zheng & Bo Liu, 2018. "A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 1031-1068, March.
  • Handle: RePEc:spr:scient:v:114:y:2018:i:3:d:10.1007_s11192-017-2574-9
    DOI: 10.1007/s11192-017-2574-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-017-2574-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    2. Chaomei Chen, 2006. "CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(3), pages 359-377, February.
    3. Hsin-Ning Su & Pei-Chun Lee, 2010. "Mapping knowledge structure by keyword co-occurrence: a first look at journal papers in Technology Foresight," Scientometrics, Springer;Akadémiai Kiadó, vol. 85(1), pages 65-79, October.
    4. Jia Feng & Yun Qiu Zhang & Hao Zhang, 2017. "Improving the co-word analysis method based on semantic distance," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1521-1531, June.
    5. Guo Chen & Lu Xiao & Chang-ping Hu & Xue-qin Zhao, 2015. "Identifying the research focus of Library and Information Science institutions in China with institution-specific keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(2), pages 707-724, May.
    6. Yang, Siluo & Han, Ruizhen & Wolfram, Dietmar & Zhao, Yuehua, 2016. "Visualizing the intellectual structure of information science (2006–2015): Introducing author keyword coupling analysis," Journal of Informetrics, Elsevier, vol. 10(1), pages 132-150.
    7. Zhong-Yi Wang & Gang Li & Chun-Ya Li & Ang Li, 2012. "Research on the semantic-based co-word analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(3), pages 855-875, March.

    Citations


    Cited by:

    1. Dabić, Marina & Marzi, Giacomo & Vlačić, Božidar & Daim, Tugrul U. & Vanhaverbeke, Wim, 2021. "40 years of excellence: An overview of Technovation and a roadmap for future research," Technovation, Elsevier, vol. 106(C).
    2. Karol Król & Dariusz Zdonek, 2023. "Cultural Heritage Topics in Online Queries: A Comparison between English- and Polish-Speaking Internet Users," Sustainability, MDPI, vol. 15(6), pages 1-20, March.
    3. Chiarello, Filippo & Fantoni, Gualtiero & Hogarth, Terence & Giordano, Vito & Baltina, Liga & Spada, Irene, 2021. "Towards ESCO 4.0 – Is the European classification of skills in line with Industry 4.0? A text mining approach," Technological Forecasting and Social Change, Elsevier, vol. 173(C).
    4. Lijie Feng & Yuxiang Niu & Zhenfeng Liu & Jinfeng Wang & Ke Zhang, 2019. "Discovering Technology Opportunity by Keyword-Based Patent Analysis: A Hybrid Approach of Morphology Analysis and USIT," Sustainability, MDPI, vol. 12(1), pages 1-35, December.
    5. Xiaoyu Liu & Xuefeng Wang & Donghua Zhu, 2022. "Reviewer recommendation method for scientific research proposals: a case for NSFC," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3343-3366, June.
    6. Lu Huang & Yijie Cai & Erdong Zhao & Shengting Zhang & Yue Shu & Jiao Fan, 2022. "Measuring the interdisciplinarity of Information and Library Science interactions using citation analysis and semantic analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6733-6761, November.
    7. Xiang Zhu & Yunqiu Zhang, 2020. "Co-word analysis method based on meta-path of subject knowledge network," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 753-766, May.
    8. Lu Huang & Xiang Chen & Yi Zhang & Changtian Wang & Xiaoli Cao & Jiarun Liu, 2022. "Identification of topic evolution: network analytics with piecewise linear representation and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5353-5383, September.
    9. Yongcong Luo & Jing Ma & Chi Li, 2020. "Entity name recognition of cross-border e-commerce commodity titles based on TWs-LSTM," Electronic Commerce Research, Springer, vol. 20(2), pages 405-426, June.
    10. Xinyuan Zhang & Qing Xie & Chaemin Song & Min Song, 2022. "Mining the evolutionary process of knowledge through multiple relationships between keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 2023-2053, April.
    11. Lu Huang & Xiang Chen & Yi Zhang & Yihe Zhu & Suyi Li & Xingxing Ni, 2021. "Dynamic network analytics for recommending scientific collaborators," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(11), pages 8789-8814, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qikai Cheng & Jiamin Wang & Wei Lu & Yong Huang & Yi Bu, 2020. "Keyword-citation-keyword network: a new perspective of discipline knowledge structure analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1923-1943, September.
    2. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    3. Munan Li, 2018. "Classifying and ranking topic terms based on a novel approach: role differentiation of author keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 77-100, July.
    4. Guan, Jiancheng & Yan, Yan & Zhang, Jing Jing, 2017. "The impact of collaboration and knowledge networks on citations," Journal of Informetrics, Elsevier, vol. 11(2), pages 407-422.
    5. Manuel Castriotta & Michela Loi & Elona Marku & Luca Naitana, 2019. "What’s in a name? Exploring the conceptual structure of emerging organizations," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 407-437, February.
    6. Xingwen Chen & Li Zhu & Chao Liu & Chunhua Chen & Jun Liu & Dongxia Huo, 2023. "Workplace Diversity in the Asia-Pacific Region: A Review of Literature and Directions for Future Research," Asia Pacific Journal of Management, Springer, vol. 40(3), pages 1021-1045, September.
    7. Yucheng Zhang & Zhiling Wang & Lin Xiao & Lijun Wang & Pei Huang, 2023. "Discovering the evolution of online reviews: A bibliometric review," Electronic Markets, Springer;IIM University of St. Gallen, vol. 33(1), pages 1-22, December.
    8. Wang, Xiaoguang & He, Jing & Huang, Han & Wang, Hongyu, 2022. "MatrixSim: A new method for detecting the evolution paths of research topics," Journal of Informetrics, Elsevier, vol. 16(4).
    9. Xiang Zhu & Yunqiu Zhang, 2020. "Co-word analysis method based on meta-path of subject knowledge network," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 753-766, May.
    10. Liu Yang & Keping Li & Hangfei Huang, 2018. "A new network model for extracting text keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 339-361, July.
    11. Hao Wang & Sanhong Deng & Xinning Su, 2016. "A study on construction and analysis of discipline knowledge structure of Chinese LIS based on CSSCI," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1725-1759, December.
    12. Osman Issah & Lúcia Lima Rodrigues, 2021. "Corporate Social Responsibility and Corporate Tax Aggressiveness: A Scientometric Analysis of the Existing Literature to Map the Future," Sustainability, MDPI, vol. 13(11), pages 1-23, June.
    13. Chaker Jebari & Enrique Herrera-Viedma & Manuel Jesus Cobo, 2021. "The use of citation context to detect the evolution of research topics: a large-scale analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(4), pages 2971-2989, April.
    14. Rui Yang & Xin An & Yingwen Chen & Xiuli Yang, 2023. "The Knowledge Analysis of Panel Vector Autoregression: A Systematic Review," SAGE Open, vol. 13(4), pages 21582440231, December.
    15. Kai Hu & Kunlun Qi & Siluo Yang & Shengyu Shen & Xiaoqiang Cheng & Huayi Wu & Jie Zheng & Stephen McClure & Tianxing Yu, 2018. "Identifying the “Ghost City” of domain topics in a keyword semantic space combining citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 1141-1157, March.
    16. Wenting Yang & Jiantong Zhang & Ruolin Ma, 2020. "The Prediction of Infectious Diseases: A Bibliometric Analysis," IJERPH, MDPI, vol. 17(17), pages 1-19, August.
    17. Lin, Boqiang & Su, Tong, 2020. "Mapping the oil price-stock market nexus researches: A scientometric review," International Review of Economics & Finance, Elsevier, vol. 67(C), pages 133-147.
    18. Jianhua Hou & Xiucai Yang & Chaomei Chen, 2018. "Emerging trends and new developments in information science: a document co-citation analysis (2009–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(2), pages 869-892, May.
    19. Boutheina Fhoula & Majed Hadid & Adel Elomri & Laoucine Kerbache & Anas Hamad & Mohammed Hamad J. Al Thani & Raed M. Al-Zoubi & Abdulla Al-Ansari & Omar M. Aboumarzouk & Abdelfatteh El Omri, 2022. "Home Cancer Care Research: A Bibliometric and Visualization Analysis (1990–2021)," IJERPH, MDPI, vol. 19(20), pages 1-25, October.
    20. Zhiyi Tao & Xiangdong Zhu & Guoqiang Xu & Dezhi Zou & Guo Li, 2023. "A Comparative Analysis of Outdoor Thermal Comfort Indicators Applied in China and Other Countries," Sustainability, MDPI, vol. 15(22), pages 1-36, November.
