Printed from https://ideas.repec.org/a/eee/infome/v8y2014i4p963-971.html

Regression for citation data: An evaluation of different methods

Authors

  • Thelwall, Mike
  • Wilson, Paul

Abstract

Citations are increasingly used for research evaluations. It is therefore important to identify factors affecting citation scores that are unrelated to scholarly quality or usefulness so that these can be taken into account. Regression is the most powerful statistical technique to identify these factors and hence it is important to identify the best regression strategy for citation data. Citation counts tend to follow a discrete lognormal distribution and, in the absence of alternatives, have been investigated with negative binomial regression. Using simulated discrete lognormal data (continuous lognormal data rounded to the nearest integer) this article shows that a better strategy is to add one to the citations, take their log and then use the general linear (ordinary least squares) model for regression (e.g., multiple linear regression, ANOVA), or to use the generalised linear model without the log. Reasonable results can also be obtained if all the zero citations are discarded, the log is taken of the remaining citation counts and then the general linear model is used, or if the generalised linear model is used with the continuous lognormal distribution. Similar approaches are recommended for altmetric data, if it proves to be lognormally distributed.
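As an illustration only, the following Python sketch mimics the kind of set-up the abstract describes. It simulates discrete lognormal citation counts (continuous lognormal draws rounded to the nearest integer) for a hypothetical two-group design. It then estimates the group difference with two of the strategies the abstract mentions: ordinary least squares on log(1 + citations), and ordinary least squares on the log of the non-zero counts. This is not the authors' simulation code. The group means (1.0 and 1.5 on the log scale), σ = 1, sample sizes, seed and all function names are assumptions. Only the Python standard library is used, so the OLS fit is written out by hand. With a single 0/1 predictor, the OLS slope equals the difference between the two group means, which corresponds to a one-way ANOVA comparison.

```python
import math
import random


def simulate_discrete_lognormal(n, mu, sigma, rng):
    """Draw n continuous lognormal values and round each to the nearest integer
    (the abstract's definition of simulated discrete lognormal data)."""
    return [math.floor(rng.lognormvariate(mu, sigma) + 0.5) for _ in range(n)]


def ols_simple(x, y):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def log1p_ols(groups, citations):
    """Strategy 1: add one to each citation count, take the natural log, fit OLS."""
    y = [math.log1p(c) for c in citations]
    return ols_simple(groups, y)


def log_nonzero_ols(groups, citations):
    """Strategy 2: discard zero counts, take the log of the rest, fit OLS."""
    kept = [(g, c) for g, c in zip(groups, citations) if c > 0]
    return ols_simple([g for g, _ in kept], [math.log(c) for _, c in kept])


# Hypothetical design: group 1 has a log-scale mean 0.5 higher than group 0.
rng = random.Random(42)
n_per_group = 10_000
groups = [0] * n_per_group + [1] * n_per_group
citations = (simulate_discrete_lognormal(n_per_group, 1.0, 1.0, rng)
             + simulate_discrete_lognormal(n_per_group, 1.5, 1.0, rng))

intercept, slope = log1p_ols(groups, citations)
print(f"log(1+c) OLS: intercept={intercept:.3f}, group effect={slope:.3f}")

nz_intercept, nz_slope = log_nonzero_ols(groups, citations)
print(f"log(c), zeros dropped: intercept={nz_intercept:.3f}, group effect={nz_slope:.3f}")
```

Because the response is log(1 + c), or log c after discarding zeros, rather than the underlying continuous log value, the estimated group effect need not equal the 0.5 used to generate the data. Comparing estimates like these against known simulation parameters over many repetitions is the general kind of evaluation the abstract reports, though the paper's exact procedure may differ.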

Suggested Citation

  • Thelwall, Mike & Wilson, Paul, 2014. "Regression for citation data: An evaluation of different methods," Journal of Informetrics, Elsevier, vol. 8(4), pages 963-971.
  • Handle: RePEc:eee:infome:v:8:y:2014:i:4:p:963-971
    DOI: 10.1016/j.joi.2014.09.011

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157714000923
    Download Restriction: Full text for ScienceDirect subscribers only

    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis. RePEc uses bibliographic data supplied by the respective publishers.