
Scientific text citation analysis using CNN features and ensemble learning model

Author

Listed:
  • Khaled Alnowaiser

Abstract

A citation links a citing document to a cited one. Citations are used to assess achievements such as a journal's impact factor, an author's ranking, and peers' judgment. When these metrics are computed, however, every citation is given the same weight, although academics contend that not all citations deserve equal weight. Such rankings rest predominantly on quantitative measures and ignore the qualitative aspect entirely. A fair evaluation needs qualitative assessment of citations in addition to quantitative assessment. Many existing works that use qualitative evaluation treat the task as binary and categorize citations as important or unimportant. This study considers multi-class citation sentiment on imbalanced data and presents a novel framework for sentiment analysis of in-text citations in research articles. In the proposed technique, features are extracted using a convolutional neural network (CNN), and classification is performed by a voting classifier that combines Logistic Regression (LR) and Stochastic Gradient Descent (SGD). Class imbalance is handled with the synthetic minority oversampling technique (SMOTE). To evaluate its efficacy for citation analysis, the proposed approach is compared in extensive experiments with machine learning models trained on SMOTE-generated data using term frequency (TF) and term frequency-inverse document frequency (TF-IDF) features. The proposed voting classifier using CNN features achieves 0.99 on each of accuracy, precision, recall, and F1 score. This work not only advances sentiment analysis of academic citations but also underscores the importance of incorporating qualitative aspects when evaluating the impact and sentiment conveyed through citations.
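
The oversampling step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line between an existing minority sample and one of its nearest minority neighbours. This is not the paper's implementation; the function name `smote`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority neighbours (illustrative sketch only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


# Hypothetical 2-D feature vectors for an under-represented class.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
new_points = smote(minority, n_new=4)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies, which is why the method is usually preferred over duplicating samples verbatim.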

Suggested Citation

  • Khaled Alnowaiser, 2024. "Scientific text citation analysis using CNN features and ensemble learning model," PLOS ONE, Public Library of Science, vol. 19(5), pages 1-19, May.
  • Handle: RePEc:plo:pone00:0302304
    DOI: 10.1371/journal.pone.0302304

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0302304
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0302304&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0302304?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Faiza Qayyum & Harun Jamil & Naeem Iqbal & DoHyeun Kim & Muhammad Tanvir Afzal, 2022. "Toward potential hybrid features evaluation using MLP-ANN binary classification model to tackle meaningful citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6471-6499, November.
    2. Lathabai, Hiran H., 2020. "ψ-index: A new overall productivity index for actors of science and technology," Journal of Informetrics, Elsevier, vol. 14(4).
    3. Mingyang Wang & Jiaqi Zhang & Shijia Jiao & Xiangrong Zhang & Na Zhu & Guangsheng Chen, 2020. "Important citation identification by exploiting the syntactic and contextual information of citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2109-2129, December.
    4. Muhammad Touseef Ikram & Muhammad Tanvir Afzal, 2019. "Aspect based citation sentiment analysis using linguistic patterns for better comprehension of scientific knowledge," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 73-95, April.
    5. Mark Levene & Trevor Fenner & Judit Bar-Ilan, 2019. "Characterisation of the χ-index and the rec-index," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 885-896, August.
    6. Xiaorui Jiang & Jingqiang Chen, 2023. "Contextualised segment-wise citation function classification," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(9), pages 5117-5158, September.
    7. Xin An & Xin Sun & Shuo Xu, 2022. "Important citations identification with semi-supervised classification model," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6533-6555, November.
    8. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    9. Hou Zhu & Li Shuhuai, 2024. "Instant prediction of scientific paper cited potential based on semantic and metadata features: Taking artificial intelligence field as an example," PLOS ONE, Public Library of Science, vol. 19(12), pages 1-20, December.
    10. Setio Basuki & Masatoshi Tsuchiya, 2022. "SDCF: semi-automatically structured dataset of citation functions," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(8), pages 4569-4608, August.
    11. Liwei Cai & Jiahao Tian & Jiaying Liu & Xiaomei Bai & Ivan Lee & Xiangjie Kong & Feng Xia, 2019. "Scholarly impact assessment: a survey of citation weighting solutions," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 453-478, February.
    12. Sehrish Iqbal & Saeed-Ul Hassan & Naif Radi Aljohani & Salem Alelyani & Raheel Nawaz & Lutz Bornmann, 2021. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6551-6599, August.
    13. Kehan Wang & Wenxuan Shi & Junsong Bai & Xiaoping Zhao & Liying Zhang, 2021. "Prediction and application of article potential citations based on nonlinear citation-forecasting combined model," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6533-6550, August.
    14. Deming Lin & Tianhui Gong & Wenbin Liu & Martin Meyer, 2020. "An entropy-based measure for the evolution of h index research," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2283-2298, December.
    15. Chao Min & Qingyu Chen & Erjia Yan & Yi Bu & Jianjun Sun, 2021. "Citation cascade and the evolution of topic relevance," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(1), pages 110-127, January.
    16. Marion Schmidt, 2024. "Why do some retracted articles continue to get cited?," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(12), pages 7535-7563, December.
    17. Liu, Xiaojuan & Wang, Chenlin & Chen, Dar-Zen & Huang, Mu-Hsuan, 2022. "Exploring perception of retraction based on mentioned status in post-retraction citations," Journal of Informetrics, Elsevier, vol. 16(3).
    18. repec:plo:pone00:0155097 is not listed on IDEAS
    19. Yi Bu & Binglu Wang & Win-bin Huang & Shangkun Che & Yong Huang, 2018. "Using the appearance of citations in full text on author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 275-289, July.
    20. Zheng Yan & Wenqian Robertson & Yaosheng Lou & Tom W. Robertson & Sung Yong Park, 2021. "Finding leading scholars in mobile phone behavior: a mixed-method analysis of an emerging interdisciplinary field," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9499-9517, December.
    21. Zhenbin Yan & Qiang Wu & Xingchen Li, 2016. "Do Hirsch-type indices behave the same in assessing single publications? An empirical study of 29 bibliometric indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1815-1833, December.
