
A co-occurrence based approach of automatic keyword expansion using mass diffusion

Authors
  • Xicheng Yin

    (Tongji University)

  • Hongwei Wang

    (Tongji University)

  • Pei Yin

    (University of Shanghai for Science and Technology)

  • Hengmin Zhu

    (Nanjing University of Posts and Telecommunications)

  • Zhenyu Zhang

    (Tongji University)

Abstract

Prior keyword expansion methods often rely on external knowledge to improve performance. Given a set of initial keywords, this paper proposes a novel method that uses mass diffusion to expand semantically or conceptually related keywords from a domain corpus. A bipartite word network is constructed from the co-occurrence relations between initial keywords and candidate words, and the expanded keywords are identified by a two-step mass diffusion carried out on this network. Experimental results show that the proposed method outperforms both a typical statistics-based approach and a graph-based approach. The research is expected to complement the theoretical framework of keyword expansion and is applicable to query expansion, thesaurus construction, and text clustering.
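
The abstract does not give the diffusion equations, so the following is only a generic sketch of two-step mass diffusion on a weighted bipartite co-occurrence network, in the style commonly used for network-based recommendation. The direction of the two steps (candidates to keywords, then back to candidates), the uniform initial resource, the document-level co-occurrence counting, the toy corpus, and every function and variable name are assumptions made for illustration, not the authors' formulation.

```python
from collections import defaultdict


def build_bipartite(docs, keywords):
    """Count document-level co-occurrences between initial keywords and
    all other words (the candidates). Returns weights[keyword][candidate]."""
    keywords = set(keywords)
    weights = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        tokens = set(doc.lower().split())  # naive whitespace tokenizer (assumption)
        for k in tokens & keywords:
            for c in tokens - keywords:
                weights[k][c] += 1
    return weights


def two_step_mass_diffusion(weights):
    """Assumed scheme: every candidate starts with one unit of resource.
    Step 1 sends it to the keywords, step 2 sends it back to the candidates,
    each time split in proportion to the co-occurrence weights."""
    # Weighted degree of each candidate word.
    cand_degree = defaultdict(int)
    for nbrs in weights.values():
        for c, w in nbrs.items():
            cand_degree[c] += w

    # Step 1: candidates -> keywords.
    kw_resource = {
        k: sum(w / cand_degree[c] for c, w in nbrs.items())
        for k, nbrs in weights.items()
    }

    # Step 2: keywords -> candidates.
    scores = defaultdict(float)
    for k, nbrs in weights.items():
        kw_degree = sum(nbrs.values())
        for c, w in nbrs.items():
            scores[c] += kw_resource[k] * w / kw_degree
    return dict(scores)


# Hypothetical toy corpus and initial keywords.
docs = [
    "keyword expansion improves query retrieval",
    "query expansion uses thesaurus terms",
    "thesaurus construction needs domain terms",
    "retrieval systems rank documents",
]
initial_keywords = ["query", "thesaurus"]

scores = two_step_mass_diffusion(build_bipartite(docs, initial_keywords))
ranked = sorted(scores, key=scores.get, reverse=True)
for word in ranked:
    print(f"{word}\t{scores[word]:.3f}")
```

In this toy setting the total resource is conserved across both steps, and candidates that co-occur with several initial keywords end up with the largest share, which is the intuition behind ranking candidates by their final resource.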

Suggested Citation

  • Xicheng Yin & Hongwei Wang & Pei Yin & Hengmin Zhu & Zhenyu Zhang, 2020. "A co-occurrence based approach of automatic keyword expansion using mass diffusion," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1885-1905, September.
  • Handle: RePEc:spr:scient:v:124:y:2020:i:3:d:10.1007_s11192-020-03601-7
    DOI: 10.1007/s11192-020-03601-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03601-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Citations

    Cited by:

    1. Ying Lian & Xiaofeng Lin & Xuefan Dong & Shengjie Hou, 2022. "A Normalized Rich-Club Connectivity-Based Strategy for Keyword Selection in Social Media Analysis," Sustainability, MDPI, vol. 14(13), pages 1-19, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Benjamin Davies & David C. Maré, 2020. "Delineating functional labour market areas with estimable classification stabilities," Working Papers 20_08, Motu Economic and Public Policy Research.
    2. Xin Xu & Yang Lu & Yupeng Zhou & Zhiguo Fu & Yanjie Fu & Minghao Yin, 2021. "An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks," Mathematics, MDPI, vol. 9(15), pages 1-14, July.
    3. Yu Tian & Sebastian Lautz & Alisdiar O. G. Wallis & Renaud Lambiotte, 2021. "Extracting Complements and Substitutes from Sales Data: A Network Perspective," Papers 2103.02042, arXiv.org, revised Aug 2021.
    4. Yang, Jinqing & Bu, Yi & Lu, Wei & Huang, Yong & Hu, Jiming & Huang, Shengzhi & Zhang, Li, 2022. "Identifying keyword sleeping beauties: A perspective on the knowledge diffusion process," Journal of Informetrics, Elsevier, vol. 16(1).
    5. Liu Yang & Keping Li & Dan Zhao & Shuang Gu & Dongyang Yan, 2019. "A Network Method for Identifying the Root Cause of High-Speed Rail Faults Based on Text Data," Energies, MDPI, vol. 12(10), pages 1-17, May.
    6. Tianwei Mu & Yaqi Li & Ziyi Li & Luyue Wang & Haoqiang Tan & Chengzhi Zheng, 2021. "Improved Network Reliability Optimization Model with Head Loss for Water Distribution System," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 35(7), pages 2101-2114, May.
    7. Chengzhi Zhang & Lei Zhao & Mengyuan Zhao & Yingyi Zhang, 2022. "Enhancing keyphrase extraction from academic articles with their reference information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 703-731, February.
    8. Tingting Zhang & Baozhen Lee & Qinghua Zhu, 2019. "Semantic measure of plagiarism using a hierarchical graph model," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 209-239, October.
    9. Dietmar Wolfram, 2015. "The symbiotic relationship between information retrieval and informetrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2201-2214, March.
    10. Wylie, Peter J., 1995. "Partial equilibrium estimates of manufacturing trade creation and diversion due to NAFTA," The North American Journal of Economics and Finance, Elsevier, vol. 6(1), pages 65-84.
    11. Veda C. Storey & Andrew Burton-Jones & Vijayan Sugumaran & Sandeep Purao, 2008. "CONQUER: A Methodology for Context-Aware Query Processing on the World Wide Web," Information Systems Research, INFORMS, vol. 19(1), pages 3-25, March.
    12. Arnaud Adam & Jean-Charles Delvenne & Isabelle Thomas, 2018. "Detecting communities with the multi-scale Louvain method: robustness test on the metropolitan area of Brussels," Journal of Geographical Systems, Springer, vol. 20(4), pages 363-386, October.
    13. Shamina Imran Pathan & Silvia Scibetta & Chiara Grassi & Giacomo Pietramellara & Simone Orlandini & Maria Teresa Ceccherini & Marco Napoli, 2020. "Response of Soil Bacterial Community to Application of Organic and Inorganic Phosphate Based Fertilizers under Vicia faba L. Cultivation at Two Different Phenological Stages," Sustainability, MDPI, vol. 12(22), pages 1-20, November.
    14. Tomeczek, Artur F., 2022. "The evolution of Japanese keiretsu networks: A review and text network analysis of their perceptions in economics," Japan and the World Economy, Elsevier, vol. 62(C).
    15. Tianwei Mu & Manhong Huang & Shi Tang & Rui Zhang & Gang Chen & Baiyi Jiang, 2022. "Sensor Partitioning Placements via Random Walk and Water Quality and Leakage Detection Models within Water Distribution Systems," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 36(13), pages 5297-5311, October.
    16. Tingting Zhang & Baozhen Lee & Qinghua Zhu & Xi Han & Ke Chen, 2023. "Document keyword extraction based on semantic hierarchical graph model," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2623-2647, May.
    17. Leleux, Pierre & Courtain, Sylvain & Françoisse, Kevin & Saerens, Marco, 2022. "Design of biased random walks on a graph with application to collaborative recommendation," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 590(C).
    18. Tianwei Mu & Yan Lu & Haoqiang Tan & Haowen Zhang & Chengzhi Zheng, 2021. "Random Walks Partitioning and Network Reliability Assessing in Water Distribution System," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 35(8), pages 2325-2341, June.
    19. YiJun Liu & Li Zhang & Xiaoli Lian, 2020. "A document-structure-based complex network model for extracting text keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 1765-1791, September.
    20. Jesus S. Alejandro-Cruz & Rosa M. Rio-Belver & Yara C. Almanza-Arjona & Alejandro Rodriguez-Andara, 2019. "Towards a Science Map on Sustainability in Higher Education," Sustainability, MDPI, vol. 11(13), pages 1-18, June.
