
Enhancing keyphrase extraction from academic articles with their reference information

Author

Listed:
  • Chengzhi Zhang

    (Nanjing University of Science and Technology)

  • Lei Zhao

    (Nanjing University of Science and Technology)

  • Mengyuan Zhao

    (Nanjing University of Science and Technology)

  • Yingyi Zhang

    (Nanjing University of Science and Technology)

Abstract

With the development of Internet technology, information overload is becoming increasingly apparent, and users spend a lot of time finding the information they need. Keyphrases that concisely summarize a document help users find and understand documents quickly. For academic resources, most existing studies extract keyphrases from the title and abstract of a paper. We find that the titles in a paper's references also contain author-assigned keyphrases. This article therefore uses reference information and applies two typical unsupervised extraction methods (TF*IDF and TextRank), two representative traditional supervised learning algorithms (Naïve Bayes and Conditional Random Field), and a supervised deep learning model (BiLSTM-CRF) to analyze the effect of reference information on keyphrase extraction. The aim is to improve the quality of keyphrase recognition by expanding the source text. The experimental results show that reference information can increase the precision, recall, and F1 of automatic keyphrase extraction to a certain extent. This indicates that reference information is useful for keyphrase extraction from academic papers and suggests a new direction for research on automatic keyphrase extraction.
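
The idea of expanding the source text with reference titles can be illustrated with a small TF*IDF sketch. This is not the authors' implementation: the toy texts, the stopword list, the function names, and the smoothed IDF formula are all assumptions made for illustration, and the sketch scores single terms only, not multi-word keyphrases.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "for", "of", "we", "with", "the"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_scores(target_tokens, background_docs):
    """Score each term of the target by TF * smoothed IDF.

    The IDF corpus is the background documents plus the target itself.
    The smoothing ln((1 + N) / (1 + df)) + 1 is an assumed variant.
    """
    corpus = background_docs + [target_tokens]
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    tf = Counter(target_tokens)
    total = len(target_tokens)
    return {
        term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }

def top_terms(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Toy data (invented for this sketch).
title = "Graph ranking for documents"
abstract = "We rank candidate terms with a graph model."
reference_titles = [
    "Keyphrase extraction with graph ranking",
    "Unsupervised keyphrase extraction",
]
background = [
    tokenize("A study of graph databases"),
    tokenize("Ranking web documents"),
]

# Baseline source text: title and abstract only.
base_tokens = tokenize(title + " " + abstract)
# Expanded source text: title, abstract, and reference titles.
expanded_tokens = tokenize(" ".join([title, abstract] + reference_titles))

base_scores = tfidf_scores(base_tokens, background)
expanded_scores = tfidf_scores(expanded_tokens, background)

print(top_terms(base_scores, 3))
print(top_terms(expanded_scores, 3))
```

In this toy case the term "keyphrase" never occurs in the title or abstract, so it can only become a candidate once the reference titles are added to the source text.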

Suggested Citation

  • Chengzhi Zhang & Lei Zhao & Mengyuan Zhao & Yingyi Zhang, 2022. "Enhancing keyphrase extraction from academic articles with their reference information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 703-731, February.
  • Handle: RePEc:spr:scient:v:127:y:2022:i:2:d:10.1007_s11192-021-04230-4
    DOI: 10.1007/s11192-021-04230-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-021-04230-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-021-04230-4?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    2. Liu Yang & Keping Li & Hangfei Huang, 2018. "A new network model for extracting text keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 339-361, July.
    3. Shimelis G. Assefa & Abebe Rorissa, 2013. "A bibliometric mapping of the structure of STEM education using co‐word analysis," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 64(12), pages 2513-2536, December.
    4. Shimelis G. Assefa & Abebe Rorissa, 2013. "A bibliometric mapping of the structure of STEM education using co-word analysis," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 64(12), pages 2513-2536, December.
    5. Scott Deerwester & Susan T. Dumais & George W. Furnas & Thomas K. Landauer & Richard Harshman, 1990. "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 41(6), pages 391-407, September.
    6. Yingyi Zhang & Chengzhi Zhang, 2021. "Enhancing keyphrase extraction from microblogs using human reading time," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(5), pages 611-626, May.

    Citations



    Cited by:

    1. Mohammed Azmi Al-Betar & Ammar Kamal Abasi & Ghazi Al-Naymat & Kamran Arshad & Sharif Naser Makhadmeh, 2023. "Optimization of scientific publications clustering with ensemble approach for topic extraction," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2819-2877, May.
    2. Ebadi, Ashkan & Auger, Alain & Gauthier, Yvan, 2022. "Detecting emerging technologies and their evolution using deep learning and weak signal analysis," Journal of Informetrics, Elsevier, vol. 16(4).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marie Katsurai & Shunsuke Ono, 2019. "TrendNets: mapping emerging research trends from dynamic co-word networks via sparse representation," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1583-1598, December.
    2. Jamali, Seyedh Mahboobeh & Nader, Ale Ebrahim & Jamali, Fatemeh, 2021. "The Role of STEM Education in Improving the Quality of Education: A Bibliometric Study," MPRA Paper 114214, University Library of Munich, Germany, revised 02 May 2022.
    3. Víctor Meseguer-Sánchez & Emilio Abad-Segura & Luis Jesús Belmonte-Ureña & Valentín Molina-Moreno, 2020. "Examining the Research Evolution on the Socio-Economic and Environmental Dimensions on University Social Responsibility," IJERPH, MDPI, vol. 17(13), pages 1-30, July.
    4. Vibhav Singh & Surabhi Verma & Sushil S. Chaurasia, 2020. "Mapping the themes and intellectual structure of corporate university: co-citation and cluster analyses," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1275-1302, March.
    5. Guan, Jiancheng & Yan, Yan & Zhang, Jing Jing, 2017. "The impact of collaboration and knowledge networks on citations," Journal of Informetrics, Elsevier, vol. 11(2), pages 407-422.
    6. Chen, Guo & Xiao, Lu, 2016. "Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods," Journal of Informetrics, Elsevier, vol. 10(1), pages 212-223.
    7. Hae Ok Choi, 2020. "An Evolutionary Approach to Technology Innovation of Cadastre for Smart Land Management Policy," Land, MDPI, vol. 9(2), pages 1-19, February.
    8. Alan L. Porter & David J. Schoeneck & Jan Youtie & Gregg E. A. Solomon & Seokbeom Kwon & Stephen F. Carley, 2019. "Learning about learning: patterns of sharing of research knowledge among Education, Border, and Cognitive Science fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(3), pages 1093-1117, March.
    9. Guo Chen & Lu Xiao & Chang-ping Hu & Xue-qin Zhao, 2015. "Identifying the research focus of Library and Information Science institutions in China with institution-specific keywords," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(2), pages 707-724, May.
    10. Liang Zhuang & Chao Ye & Scott N. Lieske, 2020. "Intertwining globality and locality: bibliometric analysis based on the top geography annual conferences in America and China," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 1075-1096, February.
    11. Ping Liu & Qiong Wu & Xiangming Mu & Kaipeng Yu & Yiting Guo, 2015. "Detecting the intellectual structure of library and information science based on formal concept analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 104(3), pages 737-762, September.
    12. Irina Wedel & Michael Palk & Stefan Voß, 2022. "A Bilingual Comparison of Sentiment and Topics for a Product Event on Twitter," Information Systems Frontiers, Springer, vol. 24(5), pages 1635-1646, October.
    13. Mohammed Salem Binwahlan, 2023. "Polynomial Networks Model for Arabic Text Summarization," International Journal of Research and Scientific Innovation, International Journal of Research and Scientific Innovation (IJRSI), vol. 10(2), pages 74-84, February.
    14. Curci, Ylenia & Mongeau Ospina, Christian A., 2016. "Investigating biofuels through network analysis," Energy Policy, Elsevier, vol. 97(C), pages 60-72.
    15. Chao Wei & Senlin Luo & Xincheng Ma & Hao Ren & Ji Zhang & Limin Pan, 2016. "Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation," PLOS ONE, Public Library of Science, vol. 11(1), pages 1-20, January.
    16. Kai Hu & Huayi Wu & Kunlun Qi & Jingmin Yu & Siluo Yang & Tianxing Yu & Jie Zheng & Bo Liu, 2018. "A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 1031-1068, March.
    17. Mikel Alayo & Txomin Iturralde & Amaia Maseda & Gloria Aparicio, 2021. "Mapping family firm internationalization research: bibliometric and literature review," Review of Managerial Science, Springer, vol. 15(6), pages 1517-1560, August.
    18. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    19. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    20. Klaus Gugler & Florian Szücs & Ulrich Wohak, 2023. "Start-up Acquisitions, Venture Capital and Innovation: A Comparative Study of Google, Apple, Facebook, Amazon and Microsoft," Department of Economics Working Papers wuwp340, Vienna University of Economics and Business, Department of Economics.
