IDEAS home Printed from https://ideas.repec.org/a/eee/infome/v17y2023i1s1751157723000019.html

Deep representation learning of scientific paper reveals its potential scholarly impact

Author

Listed:
  • Jiang, Zhuoren
  • Lin, Tianqianjin
  • Huang, Cui

Abstract

Citations and citation-based metrics are traditionally used to quantify the scholarly impact of scientific papers. However, citation-based metrics are unavailable for documents without citation data, i.e., newly published papers. By leveraging deep representation techniques, we propose a text-content-based approach that may reveal the scholarly impact of papers without requiring human domain-specific knowledge. Specifically, a large-scale Pre-Trained Model (PTM) with 110 million parameters is used to automatically encode each paper into a vector representation. Two indicators, τ (Topicality) and σ (Originality), are then proposed based on the learned representations. These indicators leverage the spatial relations of paper representations in the semantic space to capture the impact-related characteristics of a scientific paper. Extensive experiments were conducted on a COVID-19 open research dataset with 1,056,660 papers. The experimental results demonstrate that the deep representation learning method can better capture the scientific content in the published literature, and that the proposed indicators are positively and significantly associated with a paper’s potential scholarly impact. In the multivariate regression analysis of a paper's potential impact, the coefficients of σ and τ are 5.4915 (P<0.001) and 6.6879 (P<0.001), respectively, for the next-6-months prediction, and 12.9964 (P<0.001) and 13.8678 (P<0.001), respectively, for the next-12-months prediction. The proposed framework may facilitate the study of how scholarly impact is generated, from a textual representation perspective.
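The abstract does not give the formulas for τ and σ, so the following is only a minimal sketch of the general idea of deriving indicators from the spatial relations of paper vectors. It is not the authors' method. The function names `topicality_proxy` and `originality_proxy`, the centroid-based and nearest-neighbour-based definitions, and the toy two-dimensional vectors (standing in for real PTM embeddings) are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def topicality_proxy(paper, corpus):
    """Hypothetical proxy: similarity of a paper to the corpus centroid.

    Assumption for illustration only; the paper's tau may be defined differently.
    """
    return cosine(paper, centroid(corpus))


def originality_proxy(paper, corpus):
    """Hypothetical proxy: one minus the similarity to the closest prior paper.

    Assumption for illustration only; the paper's sigma may be defined differently.
    """
    return 1.0 - max(cosine(paper, other) for other in corpus)


# Toy embeddings of previously published papers (assumed data).
corpus = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
mainstream = [1.0, 0.05]  # points in roughly the same direction as the corpus
outlier = [0.0, 1.0]      # points in a direction no prior paper covers

for name, vec in [("mainstream", mainstream), ("outlier", outlier)]:
    print(name,
          round(topicality_proxy(vec, corpus), 3),
          round(originality_proxy(vec, corpus), 3))
```

In this toy setting the mainstream vector scores high on the topicality proxy and low on the originality proxy, while the outlier shows the opposite pattern. Real use would replace the toy vectors with embeddings produced by a pre-trained language model and the proxies with the definitions given in the full paper.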

Suggested Citation

  • Jiang, Zhuoren & Lin, Tianqianjin & Huang, Cui, 2023. "Deep representation learning of scientific paper reveals its potential scholarly impact," Journal of Informetrics, Elsevier, vol. 17(1).
  • Handle: RePEc:eee:infome:v:17:y:2023:i:1:s1751157723000019
    DOI: 10.1016/j.joi.2023.101376

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157723000019
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Malte Hückstädt, 2023. "Ten reasons why research collaborations succeed—a random forest approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(3), pages 1923-1950, March.
    2. Cinzia Daraio & Simone Di Leo & Loet Leydesdorff, 2022. "Using the Leiden Rankings as a Heuristics: Evidence from Italian universities in the European landscape," LEM Papers Series 2022/08, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
    3. Weihua Li & Sam Zhang & Zhiming Zheng & Skyler J. Cranmer & Aaron Clauset, 2022. "Untangling the network effects of productivity and prominence among scientists," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    4. Liyin Zhang & Yuchen Qian & Chao Ma & Jiang Li, 2023. "Continued collaboration shortens the transition period of scientists who move to another institution," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(3), pages 1765-1784, March.
    5. Michael Färber & Melissa Coutinho & Shuzhou Yuan, 2023. "Biases in scholarly recommender systems: impact, prevalence, and mitigation," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2703-2736, May.
    6. Marek Kwiek & Wojciech Roszka, 2022. "Academic vs. biological age in research on academic careers: a large-scale study with implications for scientifically developing systems," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3543-3575, June.
    7. Wu, Lingfei & Kittur, Aniket & Youn, Hyejin & Milojević, Staša & Leahey, Erin & Fiore, Stephen M. & Ahn, Yong-Yeol, 2022. "Metrics and mechanisms: Measuring the unmeasurable in the science of science," Journal of Informetrics, Elsevier, vol. 16(2).
    8. Zhuanlan Sun & C. Clark Cao & Sheng Liu & Yiwei Li & Chao Ma, 2024. "Behavioral consequences of second-person pronouns in written communications between authors and reviewers of scientific papers," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    9. Pierre Pelletier & Kevin Wirtz, 2023. "Sails and Anchors: The Complementarity of Exploratory and Exploitative Scientists in Knowledge Creation," Papers 2312.10476, arXiv.org.
    10. Thomas, Duncan Andrew & Ramos-Vielba, Irene, 2022. "Reframing study of research(er) funding towards configurations and trails," SocArXiv uty2v, Center for Open Science.
    11. Cinzia Daraio & Simone Di Leo & Loet Leydesdorff, 2023. "A heuristic approach based on Leiden rankings to identify outliers: evidence from Italian universities in the European landscape," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 483-510, January.
    12. Liang, Zhentao & Ba, Zhichao & Mao, Jin & Li, Gang, 2023. "Research complexity increases with scientists’ academic age: Evidence from library and information science," Journal of Informetrics, Elsevier, vol. 17(1).
    13. Manuel Goyanes & Márton Demeter & Aurea Grané & Tamás Tóth & Homero Gil Zúñiga, 2023. "Research patterns in communication (2009–2019): testing female representation and productivity differences, within the most cited authors and the field," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 137-156, January.
    14. Gao, Qiang & Liang, Zhentao & Wang, Ping & Hou, Jingrui & Chen, Xiuxiu & Liu, Manman, 2021. "Potential index: Revealing the future impact of research topics based on current knowledge networks," Journal of Informetrics, Elsevier, vol. 15(3).
    15. Katchanov, Yurij L. & Markova, Yulia V. & Shmatko, Natalia A., 2023. "Uncited papers in the structure of scientific communication," Journal of Informetrics, Elsevier, vol. 17(2).
    16. JingJing Zhang & Jiancheng Guan, 2017. "Scientific relatedness and intellectual base: a citation analysis of un-cited and highly-cited papers in the solar energy field," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(1), pages 141-162, January.
    17. Lu Liu & Benjamin F. Jones & Brian Uzzi & Dashun Wang, 2023. "Data, measurement and empirical methods in the science of science," Nature Human Behaviour, Nature, vol. 7(7), pages 1046-1058, July.
    18. Eitan Frachtenberg, 2022. "Multifactor Citation Analysis over Five Years: A Case Study of SIGMETRICS Papers," Publications, MDPI, vol. 10(4), pages 1-16, December.
    19. Lutz Bornmann & Robin Haunschild & Rüdiger Mutz, 2021. "Growth rates of modern science: a latent piecewise growth curve approach to model publication numbers from established and new literature databases," Palgrave Communications, Palgrave Macmillan, vol. 8(1), pages 1-15, December.
    20. Siluo Yang & Feng Ma & Yanhui Song & Junping Qiu, 2010. "A longitudinal analysis of citation distribution breadth for Chinese scholars," Scientometrics, Springer;Akadémiai Kiadó, vol. 85(3), pages 755-765, December.
