Regression for citation data: An evaluation of different methods

Author

Listed:
  • Thelwall, Mike
  • Wilson, Paul

Abstract

Citations are increasingly used for research evaluations. It is therefore important to identify factors affecting citation scores that are unrelated to scholarly quality or usefulness so that these can be taken into account. Regression is the most powerful statistical technique to identify these factors and hence it is important to identify the best regression strategy for citation data. Citation counts tend to follow a discrete lognormal distribution and, in the absence of alternatives, have been investigated with negative binomial regression. Using simulated discrete lognormal data (continuous lognormal data rounded to the nearest integer) this article shows that a better strategy is to add one to the citations, take their log and then use the general linear (ordinary least squares) model for regression (e.g., multiple linear regression, ANOVA), or to use the generalised linear model without the log. Reasonable results can also be obtained if all the zero citations are discarded, the log is taken of the remaining citation counts and then the general linear model is used, or if the generalised linear model is used with the continuous lognormal distribution. Similar approaches are recommended for altmetric data, if it proves to be lognormally distributed.
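The two ordinary least squares strategies described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' simulation: the sample size, the binary `factor` variable, the log-scale parameters (1.0, 0.5, sigma 1.0) and the helper `ols` are all assumptions made for the example. Only the data construction (continuous lognormal values rounded to the nearest integer) and the two transformations (log of citations plus one, and log of non-zero citations) come from the abstract.

```python
import numpy as np

# All names and parameter values below are illustrative assumptions,
# not taken from the article.
rng = np.random.default_rng(42)
n = 5000

# Hypothetical binary factor attached to each simulated article.
factor = rng.integers(0, 2, size=n).astype(float)

# Continuous lognormal draws whose log-scale mean depends on the factor,
# rounded to the nearest integer to give discrete lognormal "citations".
log_mean = 1.0 + 0.5 * factor
citations = np.rint(rng.lognormal(mean=log_mean, sigma=1.0)).astype(int)


def ols(y, x):
    """Ordinary least squares fit of y on an intercept and x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, slope]


# Strategy 1: add one to the citations, take the log, then fit by OLS.
coef_log1p = ols(np.log(1 + citations), factor)

# Strategy 2: discard zero counts, take the log, then fit by OLS.
cited = citations > 0
coef_nonzero = ols(np.log(citations[cited]), factor[cited])

print("log(1 + c):   ", coef_log1p)
print("zeros dropped:", coef_nonzero)
```

Both slopes are on the log scale, so they describe multiplicative rather than additive differences in citation counts. Because of the rounding, the added one and the discarded zeros, neither slope should be expected to match the assumed value of 0.5 exactly.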

Suggested Citation

  • Thelwall, Mike & Wilson, Paul, 2014. "Regression for citation data: An evaluation of different methods," Journal of Informetrics, Elsevier, vol. 8(4), pages 963-971.
  • Handle: RePEc:eee:infome:v:8:y:2014:i:4:p:963-971
    DOI: 10.1016/j.joi.2014.09.011

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157714000923
    Download Restriction: Full text for ScienceDirect subscribers only

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mingyang Wang & Zhenyu Wang & Guangsheng Chen, 2019. "Which can better predict the future success of articles? Bibliometric indices or alternative metrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1575-1595, June.
    2. Iman Tahamtan & Askar Safipour Afshar & Khadijeh Ahamdzadeh, 2016. "Factors affecting number of citations: a comprehensive review of the literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1195-1225, June.
    3. Mike Thelwall, 2016. "Interpreting correlations between citation counts and other indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(1), pages 337-347, July.
    4. Thelwall, Mike, 2016. "The precision of the arithmetic mean, geometric mean and percentiles for citation data: An experimental simulation modelling approach," Journal of Informetrics, Elsevier, vol. 10(1), pages 110-123.
    5. Abramo, Giovanni & D’Angelo, Ciriaco Andrea, 2015. "The relationship between the number of authors of a publication, its citations and the impact factor of the publishing journal: Evidence from Italy," Journal of Informetrics, Elsevier, vol. 9(4), pages 746-761.
    6. Yaşar Tonta & Müge Akbulut, 2020. "Does monetary support increase citation impact of scholarly papers?," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1617-1641, November.
    7. Tian Yu & Guang Yu & Peng-Yu Li & Liang Wang, 2014. "Citation impact prediction for scientific papers using stepwise regression analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1233-1252, November.
    8. Thelwall, Mike & Fairclough, Ruth, 2015. "The influence of time and discipline on the magnitude of correlations between citation counts and quality scores," Journal of Informetrics, Elsevier, vol. 9(3), pages 529-541.
    9. Bornmann, Lutz & Leydesdorff, Loet, 2015. "Does quality and content matter for citedness? A comparison with para-textual factors and over time," Journal of Informetrics, Elsevier, vol. 9(3), pages 419-429.
    10. Bornmann, Lutz, 2014. "Do altmetrics point to the broader impact of research? An overview of benefits and disadvantages of altmetrics," Journal of Informetrics, Elsevier, vol. 8(4), pages 895-903.
    11. Maksym Polyakov & Serhiy Polyakov & Md Sayed Iftekhar, 2017. "Does academic collaboration equally benefit impact of research across topics? The case of agricultural, resource, environmental and ecological economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1385-1405, December.
    12. Mike Thelwall & Paul Wilson, 2016. "Does research with statistics have more impact? The citation rank advantage of structural equation modeling," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(5), pages 1233-1244, May.
    13. Kaile Gong & Juan Xie & Ying Cheng & Vincent Larivière & Cassidy R. Sugimoto, 2019. "The citation advantage of foreign language references for Chinese social science papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(3), pages 1439-1460, September.
    14. Hu, Zhigang & Tian, Wencan & Xu, Shenmeng & Zhang, Chunbo & Wang, Xianwen, 2018. "Four pitfalls in normalizing citation indicators: An investigation of ESI’s selection of highly cited papers," Journal of Informetrics, Elsevier, vol. 12(4), pages 1133-1145.
    15. Vîiu, Gabriel-Alexandru, 2018. "The lognormal distribution explains the remarkable pattern documented by characteristic scores and scales in scientometrics," Journal of Informetrics, Elsevier, vol. 12(2), pages 401-415.
    16. Ruan, Xuanmin & Zhu, Yuanyang & Li, Jiang & Cheng, Ying, 2020. "Predicting the citation counts of individual papers via a BP neural network," Journal of Informetrics, Elsevier, vol. 14(3).
    17. Thelwall, Mike & Fairclough, Ruth, 2015. "Geometric journal impact factors correcting for individual highly cited articles," Journal of Informetrics, Elsevier, vol. 9(2), pages 263-272.
    18. Thelwall, Mike & Fairclough, Ruth, 2017. "The accuracy of confidence intervals for field normalised indicators," Journal of Informetrics, Elsevier, vol. 11(2), pages 530-540.
    19. Copiello, Sergio, 2019. "Peer and neighborhood effects: Citation analysis using a spatial autoregressive model and pseudo-spatial data," Journal of Informetrics, Elsevier, vol. 13(1), pages 238-254.
    20. Thelwall, Mike & Sud, Pardeep, 2016. "National, disciplinary and temporal variations in the extent to which articles with more authors have more impact: Evidence from a geometric field normalised citation indicator," Journal of Informetrics, Elsevier, vol. 10(1), pages 48-61.
