
An in-text citation classification predictive model for a scholarly search system

Authors

  • Naif Radi Aljohani (King Abdulaziz University)
  • Ayman Fayoumi (King Abdulaziz University)
  • Saeed-Ul Hassan (Information Technology University)

Abstract

We argue that citations in scholarly documents do not always perform equivalent functions or possess equal importance. To address this problem, we worked with a corpus of over 21 k citations from the Association for Computational Linguistics, from which 465 citations were randomly annotated by experts as either important or unimportant. We used an array of machine-learning models on these annotated citations: Random Forest (RF); Support Vector Machine (SVM); and Decision Tree (DT). For the classification task, the selected models employed 15 novel features: contextual; quantitative; and qualitative. We show that the RF model outperformed the comparative model by 9.52%, achieving a 92% precision-recall area under the curve. We present a prototype of a scientific publication search system based on the RF prediction model for feature engineering. This was used on a dataset of 4138 full-text articles indexed by PLOS ONE that consists of 31,839 unique references. The empirical evaluation shows that the proposed search system improves visibility of a given scientific document by including, along with its index terms, terms from the works that it cites that are predicted to be important. Overall, this yields improved search results against the queries by the user.
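The indexing idea in the abstract, in which a document becomes findable through terms from the works it cites when those citations are predicted to be important, can be illustrated with a small sketch. This is not the authors' implementation: the whitespace tokenizer, the function names, the toy documents, and the `is_important` callable (which stands in for the trained RF classifier) are all assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real index-term extraction)."""
    return set(text.lower().split())


def build_index(docs, citations, is_important):
    """Build an inverted index mapping each term to the set of matching doc ids.

    docs:         {doc_id: text}
    citations:    {citing_id: [cited_id, ...]}
    is_important: callable (citing_id, cited_id) -> bool; hypothetical
                  placeholder for a trained classifier's prediction.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        terms = tokenize(text)
        # Add terms from cited works whose citation is predicted important.
        for cited_id in citations.get(doc_id, []):
            if cited_id in docs and is_important(doc_id, cited_id):
                terms |= tokenize(docs[cited_id])
        for term in terms:
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that match every query term."""
    term_sets = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*term_sets) if term_sets else set()


# Toy data (invented for this example).
docs = {
    "A": "citation classification with random forest",
    "B": "support vector machine kernels",
    "C": "scholarly search engines",
}
citations = {"A": ["B", "C"]}
important = {("A", "B")}  # pretend the classifier flagged only A -> B

index = build_index(docs, citations, lambda citing, cited: (citing, cited) in important)
```

In this toy setup, document A is retrievable for "kernels" because its citation of B is flagged as important, but not for "engines", since its citation of C is not.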

Suggested Citation

  • Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.
  • Handle: RePEc:spr:scient:v:126:y:2021:i:7:d:10.1007_s11192-021-03986-z
    DOI: 10.1007/s11192-021-03986-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-021-03986-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Saeed-Ul Hassan & Iqra Safder & Anam Akram & Faisal Kamiran, 2018. "A novel machine-learning approach to measuring scientific knowledge flows using citation context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 973-996, August.
    2. Lutz Bornmann & Robin Haunschild & Sven E. Hug, 2018. "Visualizing the context of citations referencing papers published by Eugene Garfield: a new type of keyword co-occurrence analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 427-437, February.
    3. Dangzhi Zhao & Andreas Strotmann, 2020. "Deep and narrow impact: introducing location filtered citation counting," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(1), pages 503-517, January.
    4. Samaneh Karimi & Luis Moraes & Avisha Das & Azadeh Shakery & Rakesh Verma, 2018. "Citance-based retrieval and summarization using IR and machine learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1331-1366, August.
    5. Boyack, Kevin W. & van Eck, Nees Jan & Colavizza, Giovanni & Waltman, Ludo, 2018. "Characterizing in-text citations in scientific articles: A large-scale analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 59-73.
    6. Susan Bonzi, 1982. "Characteristics of a Literature as Predictors of Relatedness Between Cited and Citing Works," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 33(4), pages 208-216, July.
    7. Shutian Ma & Jin Xu & Chengzhi Zhang, 2018. "Automatic identification of cited text spans: a multi-classifier approach over imbalanced dataset," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1303-1330, August.
    8. Hu, Zhigang & Chen, Chaomei & Liu, Zeyuan, 2013. "Where are citations located in the body of scientific articles? A study of the distributions of citation locations," Journal of Informetrics, Elsevier, vol. 7(4), pages 887-896.
    9. Saeed-Ul Hassan & Peter Haddawy, 2013. "Measuring international knowledge flows and scholarly impact of scientific research," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(1), pages 163-179, January.
    10. Lutz Bornmann & K. Brad Wray & Robin Haunschild, 2020. "Citation concept analysis (CCA): a new form of citation analysis revealing the usefulness of concepts for other researchers illustrated by exemplary case studies including classic books by Thomas S. K," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 1051-1074, February.
    11. Patricia A. Hooten, 1991. "Frequency and functional use of cited documents in information science," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 42(6), pages 397-404, July.
    12. Ying Ding & Guo Zhang & Tamy Chambers & Min Song & Xiaolong Wang & Chengxiang Zhai, 2014. "Content-based citation analysis: The next generation of citation analysis," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(9), pages 1820-1833, September.
    13. Xiaodan Zhu & Peter Turney & Daniel Lemire & André Vellino, 2015. "Measuring academic influence: Not all citations are equal," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 66(2), pages 408-427, February.
    14. Tahamtan, Iman & Bornmann, Lutz, 2018. "Core elements in the process of citing publications: Conceptual overview of the literature," Journal of Informetrics, Elsevier, vol. 12(1), pages 203-216.
    15. Shahzad Nazir & Muhammad Asif & Shahbaz Ahmad & Faisal Bukhari & Muhammad Tanvir Afzal & Hanan Aljuaid, 2020. "Important citation identification by exploiting content and section-wise in-text citation count," PLOS ONE, Public Library of Science, vol. 15(3), pages 1-19, March.
    16. Lutz Bornmann & K. Brad Wray & Robin Haunschild, 2020. "Correction to: Citation concept analysis (CCA): a new form of citation analysis revealing the usefulness of concepts for other researchers illustrated by exemplary case studies including classic books," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 2737-2737, September.
    17. V. Cano, 1989. "Citation behavior: Classification, utility, and location," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 40(4), pages 284-290, July.
    18. Faiza Qayyum & Muhammad Tanvir Afzal, 2019. "Identification of important citations by exploiting research articles’ metadata and cue-terms from content," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 21-43, January.
    19. Dorte Drongstrup & Shafaq Malik & Naif Radi Aljohani & Salem Alelyani & Iqra Safder & Saeed-Ul Hassan, 2020. "Can social media usage of scientific literature predict journal indices of AJG, SNIP and JCR? An altmetric study of economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1541-1558, November.
    20. Iqra Safder & Saeed-Ul Hassan, 2019. "Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 257-277, April.
    21. Saeed-Ul Hassan & Mubashir Imran & Sehrish Iqbal & Naif Radi Aljohani & Raheel Nawaz, 2018. "Deep context of citations using machine-learning models in scholarly full-text articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1645-1662, December.
    22. Saeed-Ul Hassan & Peter Haddawy, 2015. "Analyzing knowledge flows of scientific literature through semantic links: a case study in the field of energy," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(1), pages 33-46, April.
    23. Small, Henry, 2018. "Characterizing highly cited method and non-method papers using citation contexts: The role of uncertainty," Journal of Informetrics, Elsevier, vol. 12(2), pages 461-480.
    24. Shutian Ma & Chengzhi Zhang & Xiaozhong Liu, 2020. "A review of citation recommendation: from textual content to enriched context," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1445-1472, March.
    25. Charles Oppenheim & Susan P. Renn, 1978. "Highly cited old papers and the reasons why they continue to be cited," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 29(5), pages 225-231, September.

    Citations



    Cited by:

    1. Xiaorui Jiang & Jingqiang Chen, 2023. "Contextualised segment-wise citation function classification," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5117-5158, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    2. Dongqing Lyu & Xuanmin Ruan & Juan Xie & Ying Cheng, 2021. "The classification of citing motivations: a meta-synthesis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(4), pages 3243-3264, April.
    3. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
    4. Shengzhi Huang & Jiajia Qian & Yong Huang & Wei Lu & Yi Bu & Jinqing Yang & Qikai Cheng, 2022. "Disclosing the relationship between citation structure and future impact of a publication," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(7), pages 1025-1042, July.
    5. Iman Tahamtan & Lutz Bornmann, 2019. "What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1635-1684, December.
    6. Mingyang Wang & Jiaqi Zhang & Shijia Jiao & Xiangrong Zhang & Na Zhu & Guangsheng Chen, 2020. "Important citation identification by exploiting the syntactic and contextual information of citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2109-2129, December.
    7. Saeed-Ul Hassan & Iqra Safder & Anam Akram & Faisal Kamiran, 2018. "A novel machine-learning approach to measuring scientific knowledge flows using citation context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 973-996, August.
    8. Zhang, Chengzhi & Liu, Lifan & Wang, Yuzhuo, 2021. "Characterizing references from different disciplines: A perspective of citation content analysis," Journal of Informetrics, Elsevier, vol. 15(2).
    9. Tahamtan, Iman & Bornmann, Lutz, 2018. "Core elements in the process of citing publications: Conceptual overview of the literature," Journal of Informetrics, Elsevier, vol. 12(1), pages 203-216.
    10. Boyack, Kevin W. & van Eck, Nees Jan & Colavizza, Giovanni & Waltman, Ludo, 2018. "Characterizing in-text citations in scientific articles: A large-scale analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 59-73.
    11. Faiza Qayyum & Harun Jamil & Naeem Iqbal & DoHyeun Kim & Muhammad Tanvir Afzal, 2022. "Toward potential hybrid features evaluation using MLP-ANN binary classification model to tackle meaningful citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6471-6499, November.
    12. Dangzhi Zhao & Andreas Strotmann, 2020. "Telescopic and panoramic views of library and information science research 2011–2018: a comparison of four weighting schemes for author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 255-270, July.
    13. Lutz Bornmann & K. Brad Wray & Robin Haunschild, 2020. "Citation concept analysis (CCA): a new form of citation analysis revealing the usefulness of concepts for other researchers illustrated by exemplary case studies including classic books by Thomas S. K," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(2), pages 1051-1074, February.
    14. Liyue Chen & Jielan Ding & Vincent Larivière, 2022. "Measuring the citation context of national self‐references," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(5), pages 671-686, May.
    15. Hamid R. Jamali & Majid Nabavi & Saeid Asadi, 2018. "How video articles are cited, the case of JoVE: Journal of Visualized Experiments," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1821-1839, December.
    16. Xin An & Xin Sun & Shuo Xu, 2022. "Important citations identification with semi-supervised classification model," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6533-6555, November.
    17. Saeed-Ul Hassan & Naif R. Aljohani & Mudassir Shabbir & Umair Ali & Sehrish Iqbal & Raheem Sarwar & Eugenio Martínez-Cámara & Sebastián Ventura & Francisco Herrera, 2020. "Tweet Coupling: a social media methodology for clustering scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 973-991, August.
    18. Bikun Chen & Dannan Deng & Zhouyan Zhong & Chengzhi Zhang, 2020. "Exploring linguistic characteristics of highly browsed and downloaded academic articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1769-1790, March.
    19. Weibin Wang & Zheng Wang & Tian Yu & CholMyong Pak & Guang Yu, 2020. "Research on citation mention times and contributions using a neural network," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2383-2400, December.
    20. Iqra Safder & Saeed-Ul Hassan, 2019. "Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 257-277, April.
