IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0312945.html

Instant prediction of scientific paper cited potential based on semantic and metadata features: Taking artificial intelligence field as an example

Author

Listed:
  • Hou Zhu
  • Li Shuhuai

Abstract

As the number of academic researchers grows, the volume of scientific papers is increasing rapidly, and identifying the papers with the greatest potential academic impact in this large pool has received increasing attention. A paper's citation frequency is often used as an objective indicator of its academic influence. Previous studies that predict citation frequency from historical citation data can achieve high accuracy, but such prediction is only possible after a paper has been published for some time, and this delay hinders the timely discovery of papers likely to be highly cited. In this paper, we propose a novel method that predicts the cited potential of an academic paper from its metadata and semantic information as soon as it is published. Specifically, semantic information such as the abstract, semantic span and semantic inflection is extracted to strengthen a machine-learning prediction model. To demonstrate the effectiveness and soundness of the cited potential prediction model, we conduct two experiments that validate the model and identify the most effective combination of input information. The experiments show that the proposed model can reach a prediction accuracy of 88% for instant citation prediction.
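
The general idea in the abstract, combining features from a paper's text with its metadata and feeding them to a machine-learning classifier at publication time, can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify the classifier, the metadata fields, or how semantic span and semantic inflection are computed, so none of those are reproduced here. The logistic regression, the bag-of-words text features, the `n_authors` and `n_references` fields, and the six toy papers and their labels are all assumptions made up for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(abstracts):
    """Sorted list of every distinct token seen in the training abstracts."""
    return sorted({tok for text in abstracts for tok in tokenize(text)})


def featurize(paper, vocab):
    """Binary bag-of-words over the abstract plus two scaled metadata values.

    The metadata fields and their scaling are hypothetical choices.
    """
    tokens = set(tokenize(paper["abstract"]))
    text_part = [1.0 if term in tokens else 0.0 for term in vocab]
    meta_part = [
        min(paper["n_authors"], 10) / 10.0,
        min(paper["n_references"], 100) / 100.0,
    ]
    return text_part + meta_part


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that a paper belongs to the 'high cited potential' class."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(X, y, epochs=2000, lr=0.1):
    """Fit logistic regression with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


# Toy, invented records: label 1 = high cited potential, 0 = low.
PAPERS = [
    {"abstract": "transformer model improves translation benchmark", "n_authors": 6, "n_references": 60},
    {"abstract": "transformer pretraining for image benchmark", "n_authors": 8, "n_references": 75},
    {"abstract": "efficient transformer attention for long sequences", "n_authors": 5, "n_references": 50},
    {"abstract": "survey of expert systems", "n_authors": 1, "n_references": 20},
    {"abstract": "short survey on fuzzy logic", "n_authors": 2, "n_references": 15},
    {"abstract": "survey note on rule induction", "n_authors": 1, "n_references": 10},
]
LABELS = [1, 1, 1, 0, 0, 0]

vocab = build_vocab(p["abstract"] for p in PAPERS)
X = [featurize(p, vocab) for p in PAPERS]
weights, bias = train(X, LABELS)

# Accuracy on the training records themselves (not a held-out estimate).
train_accuracy = sum(
    int(predict_proba(weights, bias, x) >= 0.5) == label for x, label in zip(X, LABELS)
) / len(LABELS)

# Score a newly published paper using only information available at publication.
new_paper = {"abstract": "sparse transformer for speech benchmark", "n_authors": 4, "n_references": 55}
new_score = predict_proba(weights, bias, featurize(new_paper, vocab))
print(f"training accuracy: {train_accuracy:.2f}, new paper score: {new_score:.2f}")
```

The point of the sketch is only that every input is known on the day of publication, so no citation history is required. The accuracy it prints is measured on six invented records and says nothing about the 88% reported in the abstract.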

Suggested Citation

  • Hou Zhu & Li Shuhuai, 2024. "Instant prediction of scientific paper cited potential based on semantic and metadata features: Taking artificial intelligence field as an example," PLOS ONE, Public Library of Science, vol. 19(12), pages 1-20, December.
  • Handle: RePEc:plo:pone00:0312945
    DOI: 10.1371/journal.pone.0312945

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0312945
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0312945&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0312945?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wan Siti Nur Aiza & Liyana Shuib & Norisma Idris & Nur Baiti Afini Normadhi, 2024. "Features, techniques and evaluation in predicting articles’ citations: a review from years 2010–2023," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(1), pages 1-29, January.
    2. Martorell Cunil, Onofre & Otero González, Luis & Durán Santomil, Pablo & Mulet Forteza, Carlos, 2023. "How to accomplish a highly cited paper in the tourism, leisure and hospitality field," Journal of Business Research, Elsevier, vol. 157(C).
    3. Kayvan Kousha & Mike Thelwall, 2024. "Factors associating with or predicting more cited or higher quality journal articles: An Annual Review of Information Science and Technology (ARIST) paper," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 75(3), pages 215-244, March.
    4. Kehan Wang & Wenxuan Shi & Junsong Bai & Xiaoping Zhao & Liying Zhang, 2021. "Prediction and application of article potential citations based on nonlinear citation-forecasting combined model," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6533-6550, August.
    5. Ajiferuke, Isola & Famoye, Felix, 2015. "Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models," Journal of Informetrics, Elsevier, vol. 9(3), pages 499-513.
    6. Anqi Ma & Yu Liu & Xiujuan Xu & Tao Dong, 2021. "A deep-learning based citation count prediction model with paper metadata semantic features," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6803-6823, August.
    7. Khaled Alnowaiser, 2024. "Scientific text citation analysis using CNN features and ensemble learning model," PLOS ONE, Public Library of Science, vol. 19(5), pages 1-19, May.
    8. Bai, Xiaomei & Zhang, Fuli & Lee, Ivan, 2019. "Predicting the citations of scholarly paper," Journal of Informetrics, Elsevier, vol. 13(1), pages 407-418.
    9. Yubing Nie & Yifan Zhu & Qika Lin & Sifan Zhang & Pengfei Shi & Zhendong Niu, 2019. "Academic rising star prediction via scholar’s evaluation model and machine learning techniques," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(2), pages 461-476, August.
    10. Wanjun Xia & Tianrui Li & Chongshou Li, 2023. "A review of scientific impact prediction: tasks, features and methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 543-585, January.
    11. Sergio Jimenez & Youlin Avila & George Dueñas & Alexander Gelbukh, 2020. "Automatic prediction of citability of scientific articles by stylometry of their titles and abstracts," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 3187-3232, December.
    12. Bornmann, Lutz & Haunschild, Robin & Mutz, Rüdiger, 2020. "Should citations be field-normalized in evaluative bibliometrics? An empirical analysis based on propensity score matching," Journal of Informetrics, Elsevier, vol. 14(4).
    13. Iman Tahamtan & Askar Safipour Afshar & Khadijeh Ahamdzadeh, 2016. "Factors affecting number of citations: a comprehensive review of the literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1195-1225, June.
    14. Bikun Chen & Dannan Deng & Zhouyan Zhong & Chengzhi Zhang, 2020. "Exploring linguistic characteristics of highly browsed and downloaded academic articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1769-1790, March.
    15. Juan Xie & Kaile Gong & Jiang Li & Qing Ke & Hyonchol Kang & Ying Cheng, 2019. "A probe into 66 factors which are possibly associated with the number of citations an article received," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1429-1454, June.
    16. Sepideh Fahimifar & Khadijeh Mousavi & Fatemeh Mozaffari & Marcel Ausloos, 2023. "Identification of the most important external features of highly cited scholarly papers through 3 (i.e., Ridge, Lasso, and Boruta) feature selection data mining methods," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3685-3712, August.
    17. Tehmina Amjad & Nafeesa Shahid & Ali Daud & Asma Khatoon, 2022. "Citation burst prediction in a bibliometric network," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2773-2790, May.
    18. Akella, Akhil Pandey & Alhoori, Hamed & Kondamudi, Pavan Ravikanth & Freeman, Cole & Zhou, Haiming, 2021. "Early indicators of scientific impact: Predicting citations with altmetrics," Journal of Informetrics, Elsevier, vol. 15(2).
    19. Nunkoo, Robin & Hall, C. Michael & Rughoobur-Seetah, Soujata & Teeroovengadum, Viraiyan, 2019. "Citation practices in tourism research: Toward a gender conscientious engagement," Annals of Tourism Research, Elsevier, vol. 79(C).
    20. Lakshmi Balachandran Nair & Michael Gibbert, 2016. "What makes a ‘good’ title and (how) does it matter for citations? A review and general model of article title attributes in management science," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1331-1359, June.
