A deep-learning based citation count prediction model with paper metadata semantic features

Authors

  • Anqi Ma (Dalian University of Technology)
  • Yu Liu (Dalian University of Technology)
  • Xiujuan Xu (Dalian University of Technology)
  • Tao Dong (Dalian University of Technology)

Abstract

Predicting the impact of academic papers can help scholars quickly identify high-quality papers in their field. How to develop an efficient predictive model for evaluating the potential of papers has attracted increasing attention in academia. Many studies have shown that early citations improve predictions of a paper's long-term impact. Besides early citations, some bibliometric and altmetric features have also been explored for predicting the impact of academic papers. Furthermore, paper metadata text such as the title, abstract, and keywords contains valuable information that affects a paper's citation count. However, existing studies ignore the semantic information contained in the metadata text. In this paper, we propose a novel citation prediction model based on paper metadata text to predict the long-term citation count; the core of our model is to obtain the semantic information from the metadata text. We use deep learning techniques to encode the metadata text and then extract high-level semantic features for learning the citation prediction task. We also integrate early citations to improve the prediction performance of the model. We show that our proposed model outperforms state-of-the-art models in predicting the long-term citation count of papers, and that metadata semantic features are effective for improving the accuracy of citation prediction models.
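
The abstract describes the pipeline only at a high level: encode the metadata text, extract semantic features, and fuse them with early citations. The following is a minimal Python sketch of that data flow, not the authors' architecture. Everything in it is an assumption made for illustration: the hash-based `token_vector` stands in for learned word embeddings, mean pooling stands in for the deep encoder, the log scaling of early citation counts is one common choice rather than something the abstract states, and the names `EMBED_DIM`, `encode_text`, `build_features`, and `predict_log_citations` are hypothetical.

```python
import math
import zlib
from typing import List, Sequence

EMBED_DIM = 8  # hypothetical embedding size, chosen only for illustration


def token_vector(token: str) -> List[float]:
    """Deterministic pseudo-embedding (assumed stand-in for learned embeddings)."""
    vec = []
    for i in range(EMBED_DIM):
        h = zlib.crc32(f"{i}:{token}".encode("utf-8"))  # unsigned 32-bit hash
        vec.append(h / 0xFFFFFFFF * 2.0 - 1.0)  # rescale to [-1, 1]
    return vec


def encode_text(text: str) -> List[float]:
    """Mean-pool token vectors; a real model would use a trained neural encoder."""
    tokens = text.lower().split()
    if not tokens:
        return [0.0] * EMBED_DIM
    vectors = [token_vector(t) for t in tokens]
    return [sum(column) / len(tokens) for column in zip(*vectors)]


def build_features(title: str, abstract: str, keywords: Sequence[str],
                   early_citations: Sequence[int]) -> List[float]:
    """Concatenate one encoding per metadata field with log-scaled early citations."""
    features: List[float] = []
    for field in (title, abstract, " ".join(keywords)):
        features.extend(encode_text(field))
    features.extend(math.log1p(c) for c in early_citations)
    return features


def predict_log_citations(features: Sequence[float],
                          weights: Sequence[float], bias: float) -> float:
    """Linear output layer over the fused features (weights here are untrained)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Usage: two years of early citation counts fused with three text fields.
example_features = build_features(
    "A citation count prediction model",
    "We encode metadata text and integrate early citations.",
    ["citation prediction", "deep learning"],
    [3, 7],
)
example_weights = [0.0] * len(example_features)  # placeholder, not trained
example_prediction = predict_log_citations(example_features, example_weights, 0.0)
```

In a trained version of this sketch, the placeholder weights would be fitted against observed long-term citation counts, and the pooling step would be replaced by whatever encoder the full paper specifies.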

Suggested Citation

  • Anqi Ma & Yu Liu & Xiujuan Xu & Tao Dong, 2021. "A deep-learning based citation count prediction model with paper metadata semantic features," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6803-6823, August.
  • Handle: RePEc:spr:scient:v:126:y:2021:i:8:d:10.1007_s11192-021-04033-7
    DOI: 10.1007/s11192-021-04033-7

    Citations

    Cited by:

    1. Li, Xin & Tang, Xuli & Cheng, Qikai, 2022. "Predicting the clinical citation count of biomedical papers using multilayer perceptron neural network," Journal of Informetrics, Elsevier, vol. 16(4).
    2. Wanjun Xia & Tianrui Li & Chongshou Li, 2023. "A review of scientific impact prediction: tasks, features and methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 543-585, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wanjun Xia & Tianrui Li & Chongshou Li, 2023. "A review of scientific impact prediction: tasks, features and methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 543-585, January.
    2. Martorell Cunil, Onofre & Otero González, Luis & Durán Santomil, Pablo & Mulet Forteza, Carlos, 2023. "How to accomplish a highly cited paper in the tourism, leisure and hospitality field," Journal of Business Research, Elsevier, vol. 157(C).
    3. Wumei Du & Zheng Xie & Yiqin Lv, 2021. "Predicting publication productivity for authors: Shallow or deep architecture?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5855-5879, July.
    4. Hu, Ya-Han & Tai, Chun-Tien & Liu, Kang Ernest & Cai, Cheng-Fang, 2020. "Identification of highly-cited papers using topic-model-based and bibliometric features: the consideration of keyword popularity," Journal of Informetrics, Elsevier, vol. 14(1).
    5. Chowdhury, K.P., 2021. "Functional analysis of generalized linear models under non-linear constraints with applications to identifying highly-cited papers," Journal of Informetrics, Elsevier, vol. 15(1).
    6. Zhang, Xinyuan & Xie, Qing & Song, Min, 2021. "Measuring the impact of novelty, bibliometric, and academic-network factors on citation count using a neural network," Journal of Informetrics, Elsevier, vol. 15(2).
    7. Wang, Xing & Zhang, Zhihui, 2020. "Improving the reliability of short-term citation impact indicators by taking into account the correlation between short- and long-term citation impact," Journal of Informetrics, Elsevier, vol. 14(2).
    8. Andrea Fronzetti Colladon & Ciriaco Andrea D’Angelo & Peter A. Gloor, 2020. "Predicting the future success of scientific publications through social network and semantic analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 357-377, July.
    9. Xie, Zheng, 2020. "Predicting publication productivity for researchers: A piecewise Poisson model," Journal of Informetrics, Elsevier, vol. 14(3).
    10. Ruan, Xuanmin & Zhu, Yuanyang & Li, Jiang & Cheng, Ying, 2020. "Predicting the citation counts of individual papers via a BP neural network," Journal of Informetrics, Elsevier, vol. 14(3).
    11. Sepideh Fahimifar & Khadijeh Mousavi & Fatemeh Mozaffari & Marcel Ausloos, 2023. "Identification of the most important external features of highly cited scholarly papers through 3 (i.e., Ridge, Lasso, and Boruta) feature selection data mining methods," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3685-3712, August.
    12. Kehan Wang & Wenxuan Shi & Junsong Bai & Xiaoping Zhao & Liying Zhang, 2021. "Prediction and application of article potential citations based on nonlinear citation-forecasting combined model," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6533-6550, August.
    13. Akella, Akhil Pandey & Alhoori, Hamed & Kondamudi, Pavan Ravikanth & Freeman, Cole & Zhou, Haiming, 2021. "Early indicators of scientific impact: Predicting citations with altmetrics," Journal of Informetrics, Elsevier, vol. 15(2).
    14. Mingyue Sun & Tingcan Ma & Lewei Zhou & Mingliang Yue, 2023. "Analysis of the relationships among paper citation and its influencing factors: a Bayesian network-based approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 3017-3033, May.
    15. Zhao, Qihang & Feng, Xiaodong, 2022. "Utilizing citation network structure to predict paper citation counts: A Deep learning approach," Journal of Informetrics, Elsevier, vol. 16(1).
    16. Shengzhi Huang & Jiajia Qian & Yong Huang & Wei Lu & Yi Bu & Jinqing Yang & Qikai Cheng, 2022. "Disclosing the relationship between citation structure and future impact of a publication," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(7), pages 1025-1042, July.
    17. Xiaomei Bai & Fuli Zhang & Jinzhou Li & Zhong Xu & Zeeshan Patoli & Ivan Lee, 2021. "Quantifying scientific collaboration impact by exploiting collaboration-citation network," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7993-8008, September.
    18. Gao, Qiang & Liang, Zhentao & Wang, Ping & Hou, Jingrui & Chen, Xiuxiu & Liu, Manman, 2021. "Potential index: Revealing the future impact of research topics based on current knowledge networks," Journal of Informetrics, Elsevier, vol. 15(3).
    19. Kong, Ling & Wang, Dongbo, 2020. "Comparison of citations and attention of cover and non-cover papers," Journal of Informetrics, Elsevier, vol. 14(4).
    20. Sato, Ryoma & Yamada, Makoto & Kashima, Hisashi, 2022. "Poincare: Recommending Publication Venues via Treatment Effect Estimation," Journal of Informetrics, Elsevier, vol. 16(2).
