Printed from https://ideas.repec.org/a/bla/jinfst/v74y2023i7p759-774.html

Generating keyphrases for readers: A controllable keyphrase generation framework

Authors

  • Yi Jiang
  • Rui Meng
  • Yong Huang
  • Wei Lu
  • Jiawei Liu

Abstract

Keyphrases are widely used in Information Retrieval (IR) and Natural Language Processing (NLP) tasks, and automatic keyphrase prediction has emerged in response. However, these statistically important phrases contribute less and less to the related tasks, because end-to-end learning lets models learn the important semantic information of the text directly. Similarly, keyphrases do little to help readers quickly grasp a paper's main idea, because the relationship between a keyphrase and the paper is not explicit to readers. We therefore propose generating keyphrases with specific functions for readers, to bridge the semantic gap between readers and information producers, and we verify with a user experiment that the keyphrase function assists users' comprehension. We propose a controllable keyphrase generation framework (CKPG) that uses the keyphrase function as a control code to generate categorized keyphrases, and implement it on Transformer, BART, and T5. For the Computer Science domain, the macro-averaged P@5, R@5, and F1@5 on the Paper with Code dataset reach up to 0.680, 0.535, and 0.558, respectively. These experimental results indicate the effectiveness of the CKPG models.
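
To make the control-code idea and the reported metrics concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the control code is a function label prepended to the input text (the `<method>` tag format and the helper `with_control_code` are hypothetical), that precision at k is divided by the number of phrases actually returned, and that the macro average is taken over evaluation items, which the abstract does not specify.

```python
from typing import List, Sequence, Tuple


def with_control_code(function_label: str, text: str) -> str:
    """Prepend a keyphrase-function label to the source text.

    The "<label> text" format is an assumption for illustration only;
    the abstract does not say how the control code is encoded.
    """
    return f"<{function_label}> {text}"


def prf_at_k(predicted: Sequence[str], gold: Sequence[str],
             k: int = 5) -> Tuple[float, float, float]:
    """Return (P@k, R@k, F1@k) for one item using exact, case-insensitive matching."""
    # De-duplicate predictions while preserving rank order, then keep the top k.
    top_k = list(dict.fromkeys(p.strip().lower() for p in predicted))[:k]
    gold_set = {g.strip().lower() for g in gold}
    if not top_k or not gold_set:
        return 0.0, 0.0, 0.0
    hits = sum(1 for p in top_k if p in gold_set)
    precision = hits / len(top_k)   # assumption: divide by phrases returned, not by k
    recall = hits / len(gold_set)
    f1 = 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average(items: List[Tuple[Sequence[str], Sequence[str]]],
                  k: int = 5) -> Tuple[float, float, float]:
    """Average per-item P@k, R@k, and F1@k with equal weight per item."""
    if not items:
        return 0.0, 0.0, 0.0
    scores = [prf_at_k(pred, gold, k) for pred, gold in items]
    n = len(scores)
    return (sum(s[0] for s in scores) / n,
            sum(s[1] for s in scores) / n,
            sum(s[2] for s in scores) / n)


# Toy data, invented purely for illustration.
toy_items = [
    (["transformer", "bart", "t5", "keyphrase generation", "lstm"],
     ["transformer", "bart", "t5"]),
    (["bert", "svm"], ["bert", "crf", "svm", "cnn"]),
]
macro_p, macro_r, macro_f1 = macro_average(toy_items, k=5)
prompt = with_control_code("method", "We fine-tune T5 for keyphrase generation.")
```

Averaging per-item F1, rather than computing F1 from the averaged precision and recall, appears consistent with the figures in the abstract: the harmonic mean of 0.680 and 0.535 is about 0.599, not the reported 0.558.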

Suggested Citation

  • Yi Jiang & Rui Meng & Yong Huang & Wei Lu & Jiawei Liu, 2023. "Generating keyphrases for readers: A controllable keyphrase generation framework," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(7), pages 759-774, July.
  • Handle: RePEc:bla:jinfst:v:74:y:2023:i:7:p:759-774
    DOI: 10.1002/asi.24749

    Download full text from publisher

    File URL: https://doi.org/10.1002/asi.24749
    Download Restriction: no

    File URL: https://libkey.io/10.1002/asi.24749?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ying Lian & Xiaofeng Lin & Xuefan Dong & Shengjie Hou, 2022. "A Normalized Rich-Club Connectivity-Based Strategy for Keyword Selection in Social Media Analysis," Sustainability, MDPI, vol. 14(13), pages 1-19, June.
    2. Reynaldo Gustavo Rivera & Carlos Orellana Fantoni & Eunice Gálvez & Priscilla Jimenez-Pazmino & Carmen Karina Vaca Ruiz & Arturo Fitz Herbert, 2024. "Using scientometrics to mapping Latin American research networks in emerging fields: the field networking index," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(4), pages 2309-2335, April.
    3. Bowen Ma & Chengzhi Zhang & Yuzhuo Wang & Sanhong Deng, 2022. "Enhancing identification of structure function of academic articles using contextual information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 885-925, February.
    4. Guillaume Cabanac & Ingo Frommholz & Philipp Mayr, 2018. "Bibliometric-enhanced information retrieval: preface," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1225-1227, August.
    5. Yonghe Lu & Jiayi Luo & Ying Xiao & Hou Zhu, 2021. "Text representation model of scientific papers based on fusing multi-viewpoint information and its quality assessment," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6937-6963, August.
    6. Saeed-Ul Hassan & Naif R. Aljohani & Mudassir Shabbir & Umair Ali & Sehrish Iqbal & Raheem Sarwar & Eugenio Martínez-Cámara & Sebastián Ventura & Francisco Herrera, 2020. "Tweet Coupling: a social media methodology for clustering scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 973-991, August.
    7. Iqra Safder & Saeed-Ul Hassan, 2019. "Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 257-277, April.
    8. Pengcheng Li & Wei Lu & Qikai Cheng, 2022. "Generating a related work section for scientific papers: an optimized approach with adopting problem and method information," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4397-4417, August.
    9. Nasrin Asadi & Kambiz Badie & Maryam Tayefeh Mahmoudi, 2019. "Automatic zone identification in scientific papers via fusion techniques," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 845-862, May.
    10. Yuzhuo Wang & Chengzhi Zhang & Kai Li, 2022. "A review on method entities in the academic literature: extraction, evaluation, and application," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2479-2520, May.
    11. Dong Liu & Yu Peng Zhu, 2023. "Evolution of Knowledge Structure in an Emerging Field Based on a Triple Helix Model: the Case of Smart Factory," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 14(4), pages 4583-4607, December.
    12. Zhuoran Luo & Wei Lu & Jiangen He & Yuqi Wang, 2022. "Combination of research questions and methods: A new measurement of scientific novelty," Journal of Informetrics, Elsevier, vol. 16(2).
    13. Dongin Nam & Jiwon Kim & Jeeyoung Yoon & Chaemin Song & Seongdeok Kim & Min Song, 2024. "Examining knowledge entities and its relationships based on citation sentences using a multi-anchor bipartite network," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(11), pages 7197-7228, November.
    14. Lu Huang & Xiang Chen & Yi Zhang & Changtian Wang & Xiaoli Cao & Jiarun Liu, 2022. "Identification of topic evolution: network analytics with piecewise linear representation and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5353-5383, September.
    15. Biao Zhang & Yunwei Chen, 2024. "Automated recognition of innovative sentences in academic articles: semi-automatic annotation for cost reduction and SAO reconstruction for enhanced data," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(9), pages 5403-5432, September.
    16. Yingyi Zhang & Chengzhi Zhang, 2024. "Extracting problem and method sentence from scientific papers: a context-enhanced transformer using formulaic expression desensitization," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(6), pages 3433-3468, June.
    17. Qiang Gao & Xiao Huang & Ke Dong & Zhentao Liang & Jiang Wu, 2022. "Semantic-enhanced topic evolution analysis: a combination of the dynamic topic model and word2vec," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(3), pages 1543-1563, March.
    18. Shiyun Wang & Jin Mao & Yujie Cao & Gang Li, 2022. "Integrated knowledge content in an interdisciplinary field: identification, classification, and application," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6581-6614, November.
    19. Youngok Choi & Sue Yeon Syn, 2016. "Characteristics of tagging behavior in digitized humanities online collections," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(5), pages 1089-1104, May.
    20. Gaizka Garechana & Rosa Río-Belver & Enara Zarrabeitia & Izaskun Alvarez-Meaza, 2022. "TeknoAssistant: a domain specific tech mining approach for technical problem-solving support," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5459-5473, September.
