
Quantifying and suppressing ranking bias in a large citation network

Author

Listed:
  • Vaccario, Giacomo
  • Medo, Matúš
  • Wider, Nicolas
  • Mariani, Manuel Sebastian

Abstract

It is widely recognized that citation counts for papers from different fields cannot be directly compared because different scientific fields adopt different citation practices. Citation counts are also strongly biased by paper age, since older papers have had more time to attract citations. Various procedures aim to suppress these biases and give rise to new normalized indicators, such as the relative citation count. We use a large citation dataset from Microsoft Academic Graph and a new statistical framework based on the Mahalanobis distance to show that the rankings by well-known indicators, including the relative citation count and Google's PageRank score, are significantly biased by paper field and age. Our statistical framework for assessing ranking bias allows us to quantify exactly the contribution of each individual field to the overall bias of a given ranking. We propose a general normalization procedure, motivated by the z-score, which produces much less biased rankings when applied to citation count and PageRank score.
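
To make the z-score idea in the abstract concrete, here is a minimal sketch of group-wise z-score normalization. It is not the authors' implementation: the abstract does not specify how papers are grouped, so grouping by exact (field, year) pairs is an assumption, and the function name `zscore_normalize` and the dictionary keys are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_normalize(papers):
    """Return {paper id: z-score of its citation count within its group}.

    Assumption: a paper's comparison group is every paper sharing its
    (field, year) pair. The keys 'id', 'field', 'year' and 'citations'
    are hypothetical names chosen for this sketch.
    """
    # Collect the raw citation counts of each (field, year) group.
    groups = defaultdict(list)
    for p in papers:
        groups[(p["field"], p["year"])].append(p["citations"])

    # Mean and population standard deviation per group.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

    scores = {}
    for p in papers:
        mu, sigma = stats[(p["field"], p["year"])]
        # A group with no spread carries no ranking information; use 0.0.
        scores[p["id"]] = (p["citations"] - mu) / sigma if sigma > 0 else 0.0
    return scores


# Toy data: physics papers collect far more citations than math papers.
papers = [
    {"id": "a", "field": "physics", "year": 2000, "citations": 10},
    {"id": "b", "field": "physics", "year": 2000, "citations": 30},
    {"id": "c", "field": "math", "year": 2000, "citations": 1},
    {"id": "d", "field": "math", "year": 2000, "citations": 3},
]
scores = zscore_normalize(papers)
```

In this toy data the top paper of each field receives the same normalized score even though the raw counts differ tenfold, which is the kind of field bias suppression the abstract describes. The same rescaling could in principle be applied to any per-paper score, such as PageRank, in place of the citation count.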

Suggested Citation

  • Vaccario, Giacomo & Medo, Matúš & Wider, Nicolas & Mariani, Manuel Sebastian, 2017. "Quantifying and suppressing ranking bias in a large citation network," Journal of Informetrics, Elsevier, vol. 11(3), pages 766-782.
  • Handle: RePEc:eee:infome:v:11:y:2017:i:3:p:766-782
    DOI: 10.1016/j.joi.2017.05.014

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157717300974
    Download Restriction: Full text for ScienceDirect subscribers only


    Citations

    Cited by:

    1. Dunaiski, Marcel & Geldenhuys, Jaco & Visser, Willem, 2019. "Globalised vs averaged: Bias and ranking performance on the author level," Journal of Informetrics, Elsevier, vol. 13(1), pages 299-313.
    2. Pan, Raj K. & Petersen, Alexander M. & Pammolli, Fabio & Fortunato, Santo, 2018. "The memory of science: Inflation, myopia, and the knowledge network," Journal of Informetrics, Elsevier, vol. 12(3), pages 656-678.
    3. Xipeng Liu & Xinmiao Li, 2022. "Early Identification of Significant Patents Using Heterogeneous Applicant-Citation Networks Based on the Chinese Green Patent Data," Sustainability, MDPI, vol. 14(21), pages 1-27, October.
    4. Xu, Shuqi & Mariani, Manuel Sebastian & Lü, Linyuan & Medo, Matúš, 2020. "Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data," Journal of Informetrics, Elsevier, vol. 14(1).
    5. Mariani, Manuel Sebastian & Medo, Matúš & Lafond, François, 2019. "Early identification of important patents: Design and validation of citation network metrics," Technological Forecasting and Social Change, Elsevier, vol. 146(C), pages 644-654.
    6. Yuanyuan Liu & Qiang Wu & Shijie Wu & Yong Gao, 2021. "Weighted citation based on ranking-related contribution: a new index for evaluating article impact," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(10), pages 8653-8672, October.
    7. Dunaiski, Marcel & Geldenhuys, Jaco & Visser, Willem, 2019. "On the interplay between normalisation, bias, and performance of paper impact metrics," Journal of Informetrics, Elsevier, vol. 13(1), pages 270-290.
    8. Liwei Cai & Jiahao Tian & Jiaying Liu & Xiaomei Bai & Ivan Lee & Xiangjie Kong & Feng Xia, 2019. "Scholarly impact assessment: a survey of citation weighting solutions," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 453-478, February.
    9. Zhou, Yanbo & Li, Qu & Yang, Xuhua & Cheng, Hongbing, 2021. "Predicting the popularity of scientific publications by an age-based diffusion model," Journal of Informetrics, Elsevier, vol. 15(4).
    10. Wang, Jingjing & Xu, Shuqi & Mariani, Manuel S. & Lü, Linyuan, 2021. "The local structure of citation networks uncovers expert-selected milestone papers," Journal of Informetrics, Elsevier, vol. 15(4).
    11. Eyal Eckhaus & Nitza Davidovitch, 2021. "Academic Rank and Position Effect on Academic Research Output – A Case Study of Ariel University," International Journal of Higher Education, Sciedu Press, vol. 10(1), pages 295-295, February.
    12. Zahid Halim & Shafaq Khan, 2019. "A data science-based framework to categorize academic journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(1), pages 393-423, April.
    13. Sven E. Hug & Martin P. Brändle, 2017. "The coverage of Microsoft Academic: analyzing the publication output of a university," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1551-1571, December.
    14. Ren, Zhuo-Ming, 2019. "Age preference of metrics for identifying significant nodes in growing citation networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 513(C), pages 325-332.
    15. Yuhao Zhou & Ruijie Wang & An Zeng, 2022. "Predicting the impact and publication date of individual scientists’ future papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 1867-1882, April.
    16. Petersen, Alexander M. & Pan, Raj K. & Pammolli, Fabio & Fortunato, Santo, 2019. "Methods to account for citation inflation in research evaluation," Research Policy, Elsevier, vol. 48(7), pages 1855-1865.
    17. Wang, Xing & Zhang, Zhihui, 2020. "Improving the reliability of short-term citation impact indicators by taking into account the correlation between short- and long-term citation impact," Journal of Informetrics, Elsevier, vol. 14(2).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dunaiski, Marcel & Geldenhuys, Jaco & Visser, Willem, 2019. "On the interplay between normalisation, bias, and performance of paper impact metrics," Journal of Informetrics, Elsevier, vol. 13(1), pages 270-290.
    2. Dunaiski, Marcel & Geldenhuys, Jaco & Visser, Willem, 2019. "Globalised vs averaged: Bias and ranking performance on the author level," Journal of Informetrics, Elsevier, vol. 13(1), pages 299-313.
    3. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    4. Ludo Waltman & Nees Jan Eck, 2013. "Source normalized indicators of citation impact: an overview of different approaches and an empirical comparison," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(3), pages 699-716, September.
    5. Liwei Cai & Jiahao Tian & Jiaying Liu & Xiaomei Bai & Ivan Lee & Xiangjie Kong & Feng Xia, 2019. "Scholarly impact assessment: a survey of citation weighting solutions," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 453-478, February.
    6. Bouyssou, Denis & Marchant, Thierry, 2016. "Ranking authors using fractional counting of citations: An axiomatic approach," Journal of Informetrics, Elsevier, vol. 10(1), pages 183-199.
    7. Waltman, Ludo & van Eck, Nees Jan, 2013. "A systematic empirical comparison of different approaches for normalizing citation impact indicators," Journal of Informetrics, Elsevier, vol. 7(4), pages 833-849.
    8. Ludo Waltman & Erjia Yan & Nees Jan Eck, 2011. "A recursive field-normalized bibliometric performance indicator: an application to the field of library and information science," Scientometrics, Springer;Akadémiai Kiadó, vol. 89(1), pages 301-314, October.
    9. Ruiz-Castillo, Javier & Waltman, Ludo, 2015. "Field-normalized citation impact indicators using algorithmically constructed classification systems of science," Journal of Informetrics, Elsevier, vol. 9(1), pages 102-117.
    10. Xu, Shuqi & Mariani, Manuel Sebastian & Lü, Linyuan & Medo, Matúš, 2020. "Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data," Journal of Informetrics, Elsevier, vol. 14(1).
    11. Zhihui Zhang & Ying Cheng & Nian Cai Liu, 2015. "Improving the normalization effect of mean-based method from the perspective of optimization: optimization-based linear methods and their performance," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(1), pages 587-607, January.
    12. Jiang, Xiaorui & Zhuge, Hai, 2019. "Forward search path count as an alternative indirect citation impact indicator," Journal of Informetrics, Elsevier, vol. 13(4).
    13. Tolga Yuret, 2018. "Author-weighted impact factor and reference return ratio: can we attain more equality among fields?," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 2097-2111, September.
    14. Tahamtan, Iman & Bornmann, Lutz, 2018. "Creativity in science and the link to cited references: Is the creative potential of papers reflected in their cited references?," Journal of Informetrics, Elsevier, vol. 12(3), pages 906-930.
    15. Yanbo Zhou & Xin-Li Xu & Xu-Hua Yang & Qu Li, 2022. "The influence of disruption on evaluating the scientific significance of papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(10), pages 5931-5945, October.
    16. T. S. Evans & N. Hopkins & B. S. Kaube, 2012. "Universality of performance indicators based on citation and reference counts," Scientometrics, Springer;Akadémiai Kiadó, vol. 93(2), pages 473-495, November.
    17. Mariani, Manuel Sebastian & Medo, Matúš & Zhang, Yi-Cheng, 2016. "Identification of milestone papers through time-balanced network centrality," Journal of Informetrics, Elsevier, vol. 10(4), pages 1207-1223.
    18. Javier Ruiz-Castillo, 2013. "The role of statistics in establishing the similarity of citation distributions in a static and a dynamic context," Scientometrics, Springer;Akadémiai Kiadó, vol. 96(1), pages 173-181, July.
    19. Yu Zhang & Min Wang & Morteza Saberi & Elizabeth Chang, 2022. "Analysing academic paper ranking algorithms using test data and benchmarks: an investigation," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(7), pages 4045-4074, July.
    20. Abramo, Giovanni & Cicero, Tindaro & D’Angelo, Ciriaco Andrea, 2012. "How important is choice of the scaling factor in standardizing citations?," Journal of Informetrics, Elsevier, vol. 6(4), pages 645-654.
