
Ensembling approaches to citation function classification and important citation screening

Author

Listed:
  • Xiaorui Jiang

    (The University of Sheffield)

Abstract

Compared with feature engineering, deep learning approaches to citation context analysis have yet to fully leverage the myriad design options for modeling the in-text citation, the citation sentence, and the citation context. In fact, no single modeling option universally excels on all citation function classes or annotation schemes, which implies untapped potential for combining diverse modeling approaches to further improve the performance of citation context analysis. Motivated by this insight, the current paper undertook a systematic exploration of ensemble methods for citation context analysis. To obtain a more diverse set of base classifiers, I examined three sources of classifier diversity, incorporated five diversity measures, and introduced two novel diversity re-ranking methods. Then, I conducted a comprehensive examination of both voting and stacking approaches for constructing classifier ensembles. I also proposed a novel weighting method that considers each individual classifier’s performance, resulting in superior voting outcomes. Although simple, voting approaches faced significant challenges in determining the optimal number of base classifiers to combine. Several strategies were proposed to address this limitation, including meta-classification on base classifiers and utilising deeper ensemble architectures. The latter involved hierarchical voting on a filtered set of meta-classifiers and stacked meta-classification. All proposed methods demonstrated state-of-the-art results, with the best performances achieving improvements of more than 5% and 4% on the 11-class and 6-class schemes of citation function classification, respectively, and 3% on important citation screening. These promising empirical results validate the potential of the proposed ensembling approaches for citation context analysis.
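
As a rough illustration of two ideas the abstract mentions, performance-weighted voting and classifier diversity, the following minimal sketch may help. It is not the paper's method: the abstract does not specify the novel weighting scheme or which five diversity measures were used, so the sketch assumes generic score-weighted voting and the standard pairwise disagreement measure. The function names, class labels, and weights are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight for one instance.

    predictions: one predicted label per base classifier
    weights: one non-negative weight per base classifier, e.g. a
             validation score (an assumption, not the paper's scheme)
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(scores), key=lambda label: scores[label])


def disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers predict different labels."""
    differing = sum(a != b for a, b in zip(preds_a, preds_b))
    return differing / len(preds_a)


def mean_pairwise_disagreement(all_preds):
    """Average disagreement over every pair of base classifiers."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical predictions of three base classifiers on four citations.
clf_a = ["Uses", "Background", "Comparison", "Uses"]
clf_b = ["Uses", "Uses", "Comparison", "Background"]
clf_c = ["Background", "Uses", "Comparison", "Background"]
weights = [0.62, 0.58, 0.55]  # hypothetical validation scores

ensemble = [weighted_vote(list(column), weights)
            for column in zip(clf_a, clf_b, clf_c)]
diversity = mean_pairwise_disagreement([clf_a, clf_b, clf_c])
```

With weights this close together the result matches a plain majority vote; a single strongly weighted classifier can overrule two weaker ones, for example `weighted_vote(["A", "B", "B"], [0.9, 0.3, 0.3])` returns `"A"`. A higher mean pairwise disagreement indicates a more diverse set of base classifiers.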

Suggested Citation

  • Xiaorui Jiang, 2025. "Ensembling approaches to citation function classification and important citation screening," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(3), pages 1371-1419, March.
  • Handle: RePEc:spr:scient:v:130:y:2025:i:3:d:10.1007_s11192-025-05265-7
    DOI: 10.1007/s11192-025-05265-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-025-05265-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-025-05265-7?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiaorui Jiang & Jingqiang Chen, 2023. "Contextualised segment-wise citation function classification," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5117-5158, September.
    2. Ruihua Qi & Jia Wei & Zhen Shao & Zhengguang Li & Heng Chen & Yunhao Sun & Shaohua Li, 2023. "Multi-task learning model for citation intent classification in scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(12), pages 6335-6355, December.
    3. Faiza Qayyum & Harun Jamil & Naeem Iqbal & DoHyeun Kim & Muhammad Tanvir Afzal, 2022. "Toward potential hybrid features evaluation using MLP-ANN binary classification model to tackle meaningful citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6471-6499, November.
    4. Krittin Chatrinan & Thanapon Noraset & Suppawong Tuarob, 2025. "GAN-CITE: leveraging semi-supervised generative adversarial networks for citation function classification with limited data," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(2), pages 679-703, February.
    5. Liu, Xiaojuan & Wang, Chenlin & Chen, Dar-Zen & Huang, Mu-Hsuan, 2022. "Exploring perception of retraction based on mentioned status in post-retraction citations," Journal of Informetrics, Elsevier, vol. 16(3).
    6. Naif Radi Aljohani & Ayman Fayoumi & Saeed-Ul Hassan, 2021. "An in-text citation classification predictive model for a scholarly search system," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5509-5529, July.
    7. Indra Budi & Yaniasih Yaniasih, 2023. "Understanding the meanings of citations using sentiment, role, and citation function classifications," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 735-759, January.
    8. Kai Nishikawa, 2023. "How and why are citations between disciplines made? A citation context analysis focusing on natural sciences and social sciences and humanities," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2975-2997, May.
    9. Wang, Shiyun & Mao, Jin & Lu, Kun & Cao, Yujie & Li, Gang, 2021. "Understanding interdisciplinary knowledge integration through citance analysis: A case study on eHealth," Journal of Informetrics, Elsevier, vol. 15(4).
    10. Frederique Bordignon, 2022. "Critical citations in knowledge construction and citation analysis: from paradox to definition," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(2), pages 959-972, February.
    11. Chao Lu & Ying Ding & Chengzhi Zhang, 2017. "Understanding the impact change of a highly cited article: a content-based citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(2), pages 927-945, August.
    12. Jiang, Xiaorui & Zhuge, Hai, 2019. "Forward search path count as an alternative indirect citation impact indicator," Journal of Informetrics, Elsevier, vol. 13(4).
    13. Kong, Ling & Zhang, Wei & Hu, Haotian & Liang, Zhu & Han, Yonggang & Wang, Dongbo & Song, Min, 2024. "Transdisciplinary fine-grained citation content analysis: A multi-task learning perspective for citation aspect and sentiment classification," Journal of Informetrics, Elsevier, vol. 18(3).
    14. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    15. Kai Nishikawa & Hitoshi Koshiba, 2024. "Exploring the applicability of large language models to citation context analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(11), pages 6751-6777, November.
    16. Lutz Bornmann & Robin Haunschild & Sven E. Hug, 2018. "Visualizing the context of citations referencing papers published by Eugene Garfield: a new type of keyword co-occurrence analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 427-437, February.
    17. Adilson Vital & Diego R. Amancio, 2022. "A comparative analysis of local similarity metrics and machine learning approaches: application to link prediction in author citation networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(10), pages 6011-6028, October.
    18. Hamid R. Jamali & Majid Nabavi & Saeid Asadi, 2018. "How video articles are cited, the case of JoVE: Journal of Visualized Experiments," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 1821-1839, December.
    19. Witting Antje, 2015. "Measuring the Use of Knowledge in Policy Development," Central European Journal of Public Policy, Sciendo, vol. 9(2), pages 54-62, December.
    20. Tahamtan, Iman & Bornmann, Lutz, 2018. "Core elements in the process of citing publications: Conceptual overview of the literature," Journal of Informetrics, Elsevier, vol. 12(1), pages 203-216.
