
The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression

Author

Listed:
  • Thelwall, Mike

Abstract

Identifying the statistical distribution that best fits citation data is important to allow robust and powerful quantitative analyses. Whilst previous studies have suggested that both the hooked power law and discretised lognormal distributions fit better than the power law and negative binomial distributions, no comparisons so far have covered all articles within a discipline, including those that are uncited. Based on an analysis of 26 different Scopus subject areas in seven different years, this article reports comparisons of the discretised lognormal and the hooked power law with citation data, adding 1 to citation counts in order to include zeros. The hooked power law fits better in two thirds of the subject/year combinations tested for journal articles that are at least three years old, including most medical, life and natural sciences, and for virtually all subject areas for younger articles. Conversely, the discretised lognormal tends to fit best for arts, humanities, social science and engineering fields. The difference between the fits of the distributions is mostly small, however, and so either could reasonably be used for modelling citation data. For regression analyses the best option is to use ordinary least squares regression applied to the natural logarithm of citation counts plus one, especially for sets of younger articles, because of the increased precision of the parameters.
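
The regression recommendation in the abstract can be illustrated with a minimal sketch: ordinary least squares fitted to ln(1 + citations) against a single covariate. The function name `fit_log_ols`, the single-covariate setup, and the toy data are assumptions made for illustration only and are not taken from the article, which does not specify an implementation here.

```python
import math

def fit_log_ols(citations, covariate):
    """Simple OLS of ln(1 + citations) on one covariate.

    Returns (intercept, slope). Names and setup are illustrative only.
    """
    # Adding 1 before taking logs keeps uncited articles (0 citations) in the data.
    y = [math.log1p(c) for c in citations]
    x = [float(v) for v in covariate]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Closed-form least-squares estimates for a single predictor.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical toy data, not taken from the article.
toy_citations = [0, 1, 3, 7]
toy_covariate = [0, 1, 2, 3]
intercept, slope = fit_log_ols(toy_citations, toy_covariate)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Under this model, a one-unit increase in the covariate multiplies the expected value of (1 + citations) by roughly exp(slope) in geometric-mean terms, so coefficients are read as proportional rather than additive effects.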

Suggested Citation

  • Thelwall, Mike, 2016. "The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression," Journal of Informetrics, Elsevier, vol. 10(2), pages 336-346.
  • Handle: RePEc:eee:infome:v:10:y:2016:i:2:p:336-346
    DOI: 10.1016/j.joi.2015.12.007

    References listed on IDEAS

    1. Ali Gazni & Fereshteh Didegah, 2011. "Investigating different types of research collaboration and citation impact: a case study of Harvard University’s publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 87(2), pages 251-265, May.
    2. Bruce G Charlton & Peter Andras, 2007. "Evaluating universities using simple scientometric research-output metrics: Total citation counts per university for a retrospective seven-year rolling sample," Science and Public Policy, Oxford University Press, vol. 34(8), pages 555-563, October.
    3. S. Redner, 1998. "How popular is your paper? An empirical study of the citation distribution," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 4(2), pages 131-134, July.
    4. Fairclough, Ruth & Thelwall, Mike, 2015. "National research impact indicators from Mendeley readers," Journal of Informetrics, Elsevier, vol. 9(4), pages 845-859.
    5. Hanssen, Thor-Erik Sandberg & Jørgensen, Finn, 2015. "The value of experience in research," Journal of Informetrics, Elsevier, vol. 9(1), pages 16-24.
    6. Didegah, Fereshteh & Thelwall, Mike, 2013. "Which factors help authors produce the highest impact research? Collaboration, journal and document properties," Journal of Informetrics, Elsevier, vol. 7(4), pages 861-873.
    7. Thelwall, Mike & Wilson, Paul, 2014. "Distributions for cited articles from individual subjects and years," Journal of Informetrics, Elsevier, vol. 8(4), pages 824-839.
    8. T. S. Evans & N. Hopkins & B. S. Kaube, 2012. "Universality of performance indicators based on citation and reference counts," Scientometrics, Springer;Akadémiai Kiadó, vol. 93(2), pages 473-495, November.
    9. Fairclough, Ruth & Thelwall, Mike, 2015. "More precise methods for national research citation impact comparisons," Journal of Informetrics, Elsevier, vol. 9(4), pages 895-906.
    10. Waltman, Ludo & van Eck, Nees Jan & van Leeuwen, Thed N. & Visser, Martijn S. & van Raan, Anthony F.J., 2011. "Towards a new crown indicator: Some theoretical considerations," Journal of Informetrics, Elsevier, vol. 5(1), pages 37-47.
    11. Jonathan Adams, 2005. "Early citation counts correlate with accumulated impact," Scientometrics, Springer;Akadémiai Kiadó, vol. 63(3), pages 567-581, June.
    12. Young-Ho Eom & Santo Fortunato, 2011. "Characterizing and Modeling Citation Dynamics," PLOS ONE, Public Library of Science, vol. 6(9), pages 1-7, September.
    13. Thelwall, Mike, 2016. "The precision of the arithmetic mean, geometric mean and percentiles for citation data: An experimental simulation modelling approach," Journal of Informetrics, Elsevier, vol. 10(1), pages 110-123.
    14. Anthony F. J. van Raan, 2006. "Comparison of the Hirsch-index with standard bibliometric indicators and with peer judgment for 147 chemistry research groups," Scientometrics, Springer;Akadémiai Kiadó, vol. 67(3), pages 491-502, June.
    15. Éric Archambault & Étienne Vignola-Gagné & Grégoire Côté & Vincent Larivière & Yves Gingras, 2006. "Benchmarking scientific output in the social sciences and humanities: The limits of existing databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 68(3), pages 329-342, September.
    16. Vuong, Quang H, 1989. "Likelihood Ratio Tests for Model Selection and Non-nested Hypotheses," Econometrica, Econometric Society, vol. 57(2), pages 307-333, March.
    17. Lokman I. Meho & Kiduk Yang, 2007. "Impact of data sources on citation counts and rankings of LIS faculty: Web of science versus scopus and google scholar," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 58(13), pages 2105-2125, November.
    18. van Raan, Anthony F.J., 2001. "Two-step competition process leads to quasi power-law income distributions," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 298(3), pages 530-536.
    19. Natsuo Onodera & Fuyuki Yoshikane, 2015. "Factors affecting citation rates of research articles," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 66(4), pages 739-764, April.
    20. Thelwall, Mike & Wilson, Paul, 2014. "Regression for citation data: An evaluation of different methods," Journal of Informetrics, Elsevier, vol. 8(4), pages 963-971.
    21. Diana Hicks, 1999. "The difficulty of achieving full coverage of international social science literature and the bibliometric consequences," Scientometrics, Springer;Akadémiai Kiadó, vol. 44(2), pages 193-215, February.
    22. Thed N. Van Leeuwen & Henk F. Moed & Robert J. W. Tijssen & Martijn S. Visser & Anthony F. J. Van Raan, 2001. "Language biases in the coverage of the Science Citation Index and its consequences for international comparisons of national research performance," Scientometrics, Springer;Akadémiai Kiadó, vol. 51(1), pages 335-346, April.
    23. Mark J. McCabe & Christopher M. Snyder, 2015. "Does Online Availability Increase Citations? Theory and Evidence from a Panel of Economics and Business Journals," The Review of Economics and Statistics, MIT Press, vol. 97(1), pages 144-165, March.
    24. P R Chandy & Thomas G E Williams, 1994. "The Impact of Journals and Authors on International Business Research: A Citational Analysis of JIBS Articles," Journal of International Business Studies, Palgrave Macmillan;Academy of International Business, vol. 25(4), pages 715-728, December.
    25. Tian Yu & Guang Yu & Peng-Yu Li & Liang Wang, 2014. "Citation impact prediction for scientific papers using stepwise regression analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1233-1252, November.
    26. Derek De Solla Price, 1976. "A general theory of bibliometric and other cumulative advantage processes," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 27(5), pages 292-306, September.
    27. Björn Hellqvist, 2010. "Referencing in the humanities and its implications for citation analysis," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 61(2), pages 310-318, February.
    28. Ajiferuke, Isola & Famoye, Felix, 2015. "Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models," Journal of Informetrics, Elsevier, vol. 9(3), pages 499-513.

    Citations

    Cited by:

    1. Mike Thelwall & Kayvan Kousha, 2017. "ResearchGate versus Google Scholar: Which finds more early citations?," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(2), pages 1125-1131, August.
    2. Thelwall, Mike, 2016. "Citation count distributions for large monodisciplinary journals," Journal of Informetrics, Elsevier, vol. 10(3), pages 863-874.
    3. Thelwall, Mike, 2017. "Three practical field normalised alternative indicator formulae for research evaluation," Journal of Informetrics, Elsevier, vol. 11(1), pages 128-151.
    4. Vîiu, Gabriel-Alexandru, 2018. "The lognormal distribution explains the remarkable pattern documented by characteristic scores and scales in scientometrics," Journal of Informetrics, Elsevier, vol. 12(2), pages 401-415.
    5. Akella, Akhil Pandey & Alhoori, Hamed & Kondamudi, Pavan Ravikanth & Freeman, Cole & Zhou, Haiming, 2021. "Early indicators of scientific impact: Predicting citations with altmetrics," Journal of Informetrics, Elsevier, vol. 15(2).
    6. Copiello, Sergio, 2019. "Peer and neighborhood effects: Citation analysis using a spatial autoregressive model and pseudo-spatial data," Journal of Informetrics, Elsevier, vol. 13(1), pages 238-254.
    7. Żogała-Siudem, Barbara & Cena, Anna & Siudem, Grzegorz & Gagolewski, Marek, 2023. "Interpretable reparameterisations of citation models," Journal of Informetrics, Elsevier, vol. 17(1).
    8. Zahedi, Zohreh & Haustein, Stefanie, 2018. "On the relationships between bibliographic characteristics of scientific documents and citation and Mendeley readership counts: A large-scale analysis of Web of Science publications," Journal of Informetrics, Elsevier, vol. 12(1), pages 191-202.
    9. Thelwall, Mike & Nevill, Tamara, 2018. "Could scientists use Altmetric.com scores to predict longer term citation counts?," Journal of Informetrics, Elsevier, vol. 12(1), pages 237-248.
    10. Mike Thelwall, 2016. "Interpreting correlations between citation counts and other indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(1), pages 337-347, July.
    11. Banshal, Sumit Kumar & Gupta, Solanki & Lathabai, Hiran H & Singh, Vivek Kumar, 2022. "Power Laws in altmetrics: An empirical analysis," Journal of Informetrics, Elsevier, vol. 16(3).
    12. Guillermo Armando Ronda-Pupo & J. Sylvan Katz, 2018. "The power law relationship between citation impact and multi-authorship patterns in articles in Information Science & Library Science journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 919-932, March.
    13. Thelwall, Mike, 2016. "Are there too many uncited articles? Zero inflated variants of the discretised lognormal and hooked power law distributions," Journal of Informetrics, Elsevier, vol. 10(2), pages 622-633.
    14. Thelwall, Mike, 2016. "Are the discretised lognormal and hooked power law distributions plausible for citation data?," Journal of Informetrics, Elsevier, vol. 10(2), pages 454-470.
    15. Donner, Paul, 2018. "Effect of publication month on citation impact," Journal of Informetrics, Elsevier, vol. 12(1), pages 330-343.
    16. Mrowinski, Maciej J. & Gagolewski, Marek & Siudem, Grzegorz, 2022. "Accidentality in journal citation patterns," Journal of Informetrics, Elsevier, vol. 16(4).
    17. Mike Thelwall, 2017. "Avoiding obscure topics and generalising findings produces higher impact research," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(1), pages 307-320, January.
    18. Cena, Anna & Gagolewski, Marek & Siudem, Grzegorz & Żogała-Siudem, Barbara, 2022. "Validating citation models by proxy indices," Journal of Informetrics, Elsevier, vol. 16(2).
    19. Guillermo Armando Ronda-Pupo & J. Sylvan Katz, 2017. "The scaling relationship between degree centrality of countries and their citation-based performance on Management Information Systems," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1285-1299, September.
    20. Guillermo Armando Ronda-Pupo, 2017. "The citation-based impact of complex innovation systems scales with the size of the system," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(1), pages 141-151, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Thelwall, Mike, 2016. "Are there too many uncited articles? Zero inflated variants of the discretised lognormal and hooked power law distributions," Journal of Informetrics, Elsevier, vol. 10(2), pages 622-633.
    2. Thelwall, Mike, 2016. "The precision of the arithmetic mean, geometric mean and percentiles for citation data: An experimental simulation modelling approach," Journal of Informetrics, Elsevier, vol. 10(1), pages 110-123.
    3. Thelwall, Mike, 2016. "Citation count distributions for large monodisciplinary journals," Journal of Informetrics, Elsevier, vol. 10(3), pages 863-874.
    4. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    5. Thelwall, Mike, 2016. "Are the discretised lognormal and hooked power law distributions plausible for citation data?," Journal of Informetrics, Elsevier, vol. 10(2), pages 454-470.
    6. Thelwall, Mike & Sud, Pardeep, 2016. "National, disciplinary and temporal variations in the extent to which articles with more authors have more impact: Evidence from a geometric field normalised citation indicator," Journal of Informetrics, Elsevier, vol. 10(1), pages 48-61.
    7. Mike Thelwall, 2016. "Interpreting correlations between citation counts and other indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(1), pages 337-347, July.
    8. Thelwall, Mike, 2017. "Three practical field normalised alternative indicator formulae for research evaluation," Journal of Informetrics, Elsevier, vol. 11(1), pages 128-151.
    9. Iman Tahamtan & Askar Safipour Afshar & Khadijeh Ahamdzadeh, 2016. "Factors affecting number of citations: a comprehensive review of the literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1195-1225, June.
    10. Stegehuis, Clara & Litvak, Nelly & Waltman, Ludo, 2015. "Predicting the long-term citation impact of recent publications," Journal of Informetrics, Elsevier, vol. 9(3), pages 642-657.
    11. S. R. Goldberg & H. Anthony & T. S. Evans, 2015. "Modelling citation networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1577-1604, December.
    12. Guan, Jiancheng & Yan, Yan & Zhang, Jing Jing, 2017. "The impact of collaboration and knowledge networks on citations," Journal of Informetrics, Elsevier, vol. 11(2), pages 407-422.
    13. Brito, Ricardo & Navarro, Alonso Rodríguez, 2021. "The inconsistency of h-index: A mathematical analysis," Journal of Informetrics, Elsevier, vol. 15(1).
    14. Thelwall, Mike & Fairclough, Ruth, 2017. "The accuracy of confidence intervals for field normalised indicators," Journal of Informetrics, Elsevier, vol. 11(2), pages 530-540.
    15. Fairclough, Ruth & Thelwall, Mike, 2015. "More precise methods for national research citation impact comparisons," Journal of Informetrics, Elsevier, vol. 9(4), pages 895-906.
    16. Zahedi, Zohreh & Haustein, Stefanie, 2018. "On the relationships between bibliographic characteristics of scientific documents and citation and Mendeley readership counts: A large-scale analysis of Web of Science publications," Journal of Informetrics, Elsevier, vol. 12(1), pages 191-202.
    17. Mingyang Wang & Zhenyu Wang & Guangsheng Chen, 2019. "Which can better predict the future success of articles? Bibliometric indices or alternative metrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1575-1595, June.
    18. Tol, Richard S.J., 2013. "The Matthew effect for cohorts of economists," Journal of Informetrics, Elsevier, vol. 7(2), pages 522-527.
    19. Wan Jing Low & Paul Wilson & Mike Thelwall, 2016. "Stopped sum models and proposed variants for citation data," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(2), pages 369-384, May.
    20. Amalia Mas-Bleda & Mike Thelwall, 2016. "Can alternative indicators overcome language biases in citation counts? A comparison of Spanish and UK research," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 2007-2030, December.
