Instant prediction of scientific paper cited potential based on semantic and metadata features: Taking artificial intelligence field as an example

Author

Listed:
  • Hou Zhu
  • Li Shuhuai

Abstract

As the number of academic researchers grows, the volume of scientific papers is also increasing rapidly, and identifying the papers with the greatest potential academic impact in this large pool has received increasing attention. Citation frequency is often used as an objective indicator of a paper's academic influence. In previous studies, citation frequency prediction based on historical citation data achieves high accuracy, but it can only be performed after a paper has been published for some time, and this delay hinders the timely discovery of papers likely to be highly cited. In this paper, we propose a novel method for predicting the cited potential of a paper from its metadata and semantic information, which can predict the cited potential of an academic paper as soon as it is published. Specifically, semantic information, such as the abstract, semantic span and semantic inflection, is extracted to strengthen the machine-learning-based prediction model. To demonstrate the effectiveness and rationality of the cited potential prediction model, we conduct two experiments to validate the model and to find the most effective combination of input information. The experiments show that the prediction accuracy of the proposed model can reach 88% for instant citation prediction.

Suggested Citation

  • Hou Zhu & Li Shuhuai, 2024. "Instant prediction of scientific paper cited potential based on semantic and metadata features: Taking artificial intelligence field as an example," PLOS ONE, Public Library of Science, vol. 19(12), pages 1-20, December.
  • Handle: RePEc:plo:pone00:0312945
    DOI: 10.1371/journal.pone.0312945

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0312945
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0312945&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Henk F. Moed & Lisa Colledge & Jan Reedijk & Felix Moya-Anegon & Vicente Guerrero-Bote & Andrew Plume & Mayur Amin, 2012. "Citation-based metrics are appropriate tools in journal assessment provided that they are accurate and used in an informed way," Scientometrics, Springer;Akadémiai Kiadó, vol. 92(2), pages 367-376, August.
    2. Bedoor K. AlShebli & Talal Rahwan & Wei Lee Woon, 2018. "The preeminence of ethnic diversity in scientific collaboration," Nature Communications, Nature, vol. 9(1), pages 1-10, December.
    3. Babak Sohrabi & Hamideh Iraj, 2017. "The effect of keyword repetition in abstract and keyword frequency per journal in predicting citation counts," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(1), pages 243-251, January.
    4. Hamid R. Jamali & Mahsa Nikzad, 2011. "Article title type and its relation with the number of downloads and citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 88(2), pages 653-661, August.
    5. Salim Moussa, 2021. "Are FT50 journals really leading? A comment on Fassin," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9613-9622, December.
    6. Anqi Ma & Yu Liu & Xiujuan Xu & Tao Dong, 2021. "A deep-learning based citation count prediction model with paper metadata semantic features," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6803-6823, August.
    7. Tian Yu & Guang Yu & Peng-Yu Li & Liang Wang, 2014. "Citation impact prediction for scientific papers using stepwise regression analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1233-1252, November.
    8. Lorna Wildgaard & Jesper W. Schneider & Birger Larsen, 2014. "A review of the characteristics of 108 author-level bibliometric indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 125-158, October.
    9. Kathy McKeown & Hal Daume III & Snigdha Chaturvedi & John Paparrizos & Kapil Thadani & Pablo Barrio & Or Biran & Suvarna Bothe & Michael Collins & Kenneth R. Fleischmann & Luis Gravano & Rahul Jha & B, 2016. "Predicting the impact of scientific concepts using full-text features," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(11), pages 2684-2696, November.
    10. Lawrence D. Fu & Constantin F. Aliferis, 2010. "Using content-based and bibliometric features for machine learning models to predict citation counts in the biomedical literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 85(1), pages 257-270, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wan Siti Nur Aiza & Liyana Shuib & Norisma Idris & Nur Baiti Afini Normadhi, 2024. "Features, techniques and evaluation in predicting articles’ citations: a review from years 2010–2023," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(1), pages 1-29, January.
    2. Kayvan Kousha & Mike Thelwall, 2024. "Factors associating with or predicting more cited or higher quality journal articles: An Annual Review of Information Science and Technology (ARIST) paper," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 75(3), pages 215-244, March.
    3. Wanjun Xia & Tianrui Li & Chongshou Li, 2023. "A review of scientific impact prediction: tasks, features and methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 543-585, January.
    4. Martorell Cunil, Onofre & Otero González, Luis & Durán Santomil, Pablo & Mulet Forteza, Carlos, 2023. "How to accomplish a highly cited paper in the tourism, leisure and hospitality field," Journal of Business Research, Elsevier, vol. 157(C).
    5. Iman Tahamtan & Askar Safipour Afshar & Khadijeh Ahamdzadeh, 2016. "Factors affecting number of citations: a comprehensive review of the literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1195-1225, June.
    6. Zhengang Zhang & Chuanming Yu & Jingnan Wang & Lu An, 2025. "A temporal evolution and fine-grained information aggregation model for citation count prediction," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(4), pages 2069-2091, April.
    7. Kehan Wang & Wenxuan Shi & Junsong Bai & Xiaoping Zhao & Liying Zhang, 2021. "Prediction and application of article potential citations based on nonlinear citation-forecasting combined model," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6533-6550, August.
    8. Ajiferuke, Isola & Famoye, Felix, 2015. "Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models," Journal of Informetrics, Elsevier, vol. 9(3), pages 499-513.
    9. Anqi Ma & Yu Liu & Xiujuan Xu & Tao Dong, 2021. "A deep-learning based citation count prediction model with paper metadata semantic features," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6803-6823, August.
    10. Khaled Alnowaiser, 2024. "Scientific text citation analysis using CNN features and ensemble learning model," PLOS ONE, Public Library of Science, vol. 19(5), pages 1-19, May.
    11. Bai, Xiaomei & Zhang, Fuli & Lee, Ivan, 2019. "Predicting the citations of scholarly paper," Journal of Informetrics, Elsevier, vol. 13(1), pages 407-418.
    12. Yubing Nie & Yifan Zhu & Qika Lin & Sifan Zhang & Pengfei Shi & Zhendong Niu, 2019. "Academic rising star prediction via scholar’s evaluation model and machine learning techniques," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 461-476, August.
    13. Fang Zhang & Shengli Wu, 2024. "Predicting citation impact of academic papers across research areas using multiple models and early citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4137-4166, July.
    14. Rafał Zbonikowski, 2025. "What influences the number of citations of scientific articles? Study on colloid and interface science," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(5), pages 2577-2593, May.
    15. Hu, Ya-Han & Tai, Chun-Tien & Liu, Kang Ernest & Cai, Cheng-Fang, 2020. "Identification of highly-cited papers using topic-model-based and bibliometric features: the consideration of keyword popularity," Journal of Informetrics, Elsevier, vol. 14(1).
    16. Yezhu Wang & Yundong Xie & Dong Wang & Lu Guo & Rongting Zhou, 2022. "Do cover papers get better citations and usage counts? An analysis of 42 journals in cell biology," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(7), pages 3793-3813, July.
    17. Sergio Jimenez & Youlin Avila & George Dueñas & Alexander Gelbukh, 2020. "Automatic prediction of citability of scientific articles by stylometry of their titles and abstracts," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 3187-3232, December.
    18. Zhao, Qihang & Feng, Xiaodong, 2022. "Utilizing citation network structure to predict paper citation counts: A Deep learning approach," Journal of Informetrics, Elsevier, vol. 16(1).
    19. Bornmann, Lutz & Haunschild, Robin & Mutz, Rüdiger, 2020. "Should citations be field-normalized in evaluative bibliometrics? An empirical analysis based on propensity score matching," Journal of Informetrics, Elsevier, vol. 14(4).
    20. Stegehuis, Clara & Litvak, Nelly & Waltman, Ludo, 2015. "Predicting the long-term citation impact of recent publications," Journal of Informetrics, Elsevier, vol. 9(3), pages 642-657.
