Printed from https://ideas.repec.org/a/spr/scient/v127y2022i10d10.1007_s11192-022-04484-6.html

A comparative analysis of local similarity metrics and machine learning approaches: application to link prediction in author citation networks

Authors

Listed:
  • Adilson Vital

    (University of São Paulo)

  • Diego R. Amancio

    (University of São Paulo)

Abstract

Understanding the evolution of paper and author citations is of paramount importance for the design of research policies and evaluation criteria that can promote and accelerate scientific discoveries. Recently, many studies on the evolution of science have been conducted in the context of the emergent Science of Science field. While many studies have probed the link prediction problem in citation networks, only a few works have analyzed the temporal nature of link prediction in author citation networks. In this study, we compared the performance of 10 well-known local network similarity measurements with four machine learning models to predict future links in author citation networks. Unlike in traditional link prediction methods, the temporal nature of the predicted links is relevant for our approach. Our analysis revealed that the Jaccard coefficient was among the most relevant measurements. The preferential attachment measurement, conversely, displayed the worst performance. We also found that extending the local measurements to their weighted versions did not significantly improve the performance of predicting citations. Finally, we found that an XGBoost and neural network approach summarizing the information from all 10 considered similarity measurements was able to provide the highest AUC performance and competitive precision values.
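To make the measurements named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes an undirected toy network (real author citation networks are directed), hypothetical author labels `a` to `e`, and a hypothetical set of future links. It scores every unconnected pair in a past snapshot with the Jaccard coefficient and with preferential attachment, then computes AUC as the probability that a pair that later becomes linked outranks one that does not.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Jaccard coefficient: shared neighbors divided by all neighbors."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def preferential_attachment(adj, u, v):
    """Preferential attachment: product of the two node degrees."""
    return len(adj[u]) * len(adj[v])


def auc(pos_scores, neg_scores):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy data: links observed in the past snapshot ...
past_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
# ... and links that appear only in the later snapshot (assumed for illustration).
future_edges = {frozenset(("a", "d"))}

adj = build_adjacency(past_edges)

# Candidate pairs are all node pairs not yet connected in the past snapshot.
candidates = [
    (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
]

for name, score in [("jaccard", jaccard), ("pref_attach", preferential_attachment)]:
    pos = [score(adj, u, v) for u, v in candidates if frozenset((u, v)) in future_edges]
    neg = [score(adj, u, v) for u, v in candidates if frozenset((u, v)) not in future_edges]
    print(name, "AUC =", auc(pos, neg))
```

The toy data is far too small to say anything about which measurement performs better; it only shows how a past snapshot is scored and then evaluated against links that appear later.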

Suggested Citation

  • Adilson Vital & Diego R. Amancio, 2022. "A comparative analysis of local similarity metrics and machine learning approaches: application to link prediction in author citation networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(10), pages 6011-6028, October.
  • Handle: RePEc:spr:scient:v:127:y:2022:i:10:d:10.1007_s11192-022-04484-6
    DOI: 10.1007/s11192-022-04484-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-022-04484-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-022-04484-6?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Citations

    Cited by:

    1. Brian, Kieran & Stella, Massimo, 2023. "Introducing mindset streams to investigate stances towards STEM in high school students and experts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 626(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jorge A. V. Tohalino & Laura V. C. Quispe & Diego R. Amancio, 2021. "Analyzing the relationship between text features and grants productivity," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4255-4275, May.
    2. Yichi Zhang & Zhiliang Dong & Sen Liu & Peixiang Jiang & Cuizhi Zhang & Chao Ding, 2021. "Forecast of International Trade of Lithium Carbonate Products in Importing Countries and Small-Scale Exporting Countries," Sustainability, MDPI, vol. 13(3), pages 1-23, January.
    3. Corrêa Jr., Edilson A. & Silva, Filipi N. & da F. Costa, Luciano & Amancio, Diego R., 2017. "Patterns of authors contribution in scientific manuscripts," Journal of Informetrics, Elsevier, vol. 11(2), pages 498-510.
    4. Diego R. Amancio & Osvaldo N. Oliveira jr & Luciano F. Costa, 2015. "Topological-collaborative approach for disambiguating authors’ names in collaborative networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(1), pages 465-485, January.
    5. Xiomara S. Q. Chacon & Thiago C. Silva & Diego R. Amancio, 2020. "Comparing the impact of subfields in scientific journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 625-639, October.
    6. Jing Ma & Yaohui Pan & Chih-Yi Su, 2022. "Organization-oriented technology opportunities analysis based on predicting patent networks: a case of Alzheimer’s disease," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5497-5517, September.
    7. Lingling Zhang & Jing Li & Qiuliu Zhang & Fan Meng & Weili Teng, 2019. "Domain Knowledge-Based Link Prediction in Customer-Product Bipartite Graph for Product Recommendation," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(01), pages 311-338, January.
    8. Tofighy, Sajjad & Charkari, Nasrollah Moghadam & Ghaderi, Foad, 2022. "Link prediction in multiplex networks using intralayer probabilistic distance and interlayer co-evolving factors," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 606(C).
    9. Lutz Bornmann & Robin Haunschild & Sven E. Hug, 2018. "Visualizing the context of citations referencing papers published by Eugene Garfield: a new type of keyword co-occurrence analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 427-437, February.
    10. Zhao, Qihang & Feng, Xiaodong, 2022. "Utilizing citation network structure to predict paper citation counts: A Deep learning approach," Journal of Informetrics, Elsevier, vol. 16(1).
    11. Wang, Feifei & Dong, Jiaxin & Lu, Wanzhao & Xu, Shuo, 2023. "Collaboration prediction based on multilayer all-author tripartite citation networks: A case study of gene editing," Journal of Informetrics, Elsevier, vol. 17(1).
    12. Tahamtan, Iman & Bornmann, Lutz, 2018. "Core elements in the process of citing publications: Conceptual overview of the literature," Journal of Informetrics, Elsevier, vol. 12(1), pages 203-216.
    13. Nazim Choudhury & Shahadat Uddin, 2016. "Time-aware link prediction to explore network effects on temporal knowledge evolution," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(2), pages 745-776, August.
    14. Lee, Yan-Li & Dong, Qiang & Zhou, Tao, 2021. "Link prediction via controlling the leading eigenvector," Applied Mathematics and Computation, Elsevier, vol. 411(C).
    15. Yang, Jinqing & Liu, Zhifeng, 2022. "The effect of citation behaviour on knowledge diffusion and intellectual structure," Journal of Informetrics, Elsevier, vol. 16(1).
    16. Brito, Ana C.M. & Silva, Filipi N. & Amancio, Diego R., 2021. "Associations between author-level metrics in subsequent time periods," Journal of Informetrics, Elsevier, vol. 15(4).
    17. Yongchang Wei & Lei Chen & Yu Qi & Can Wang & Fei Li & Haorong Wang & Fangyu Chen, 2019. "A Complex Network Method in Criticality Evaluation of Air Quality Standards," Sustainability, MDPI, vol. 11(14), pages 1-15, July.
    18. Yan, Erjia & Guns, Raf, 2014. "Predicting and recommending collaborations: An author-, institution-, and country-level analysis," Journal of Informetrics, Elsevier, vol. 8(2), pages 295-309.
    19. Ana C. M. Brito & Filipi N. Silva & Diego R. Amancio, 2023. "Analyzing the influence of prolific collaborations on authors productivity and visibility," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(4), pages 2471-2487, April.
    20. Bian, Tian & Hu, Jiantao & Deng, Yong, 2017. "Identifying influential nodes in complex networks based on AHP," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 479(C), pages 422-436.
