Regression for citation data: An evaluation of different methods

Author

Thelwall, Mike
Wilson, Paul

Abstract

Citations are increasingly used for research evaluations. It is therefore important to identify factors affecting citation scores that are unrelated to scholarly quality or usefulness so that these can be taken into account. Regression is the most powerful statistical technique to identify these factors and hence it is important to identify the best regression strategy for citation data. Citation counts tend to follow a discrete lognormal distribution and, in the absence of alternatives, have been investigated with negative binomial regression. Using simulated discrete lognormal data (continuous lognormal data rounded to the nearest integer) this article shows that a better strategy is to add one to the citations, take their log and then use the general linear (ordinary least squares) model for regression (e.g., multiple linear regression, ANOVA), or to use the generalised linear model without the log. Reasonable results can also be obtained if all the zero citations are discarded, the log is taken of the remaining citation counts and then the general linear model is used, or if the generalised linear model is used with the continuous lognormal distribution. Similar approaches are recommended for altmetric data, if it proves to be lognormally distributed.
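The strategies compared in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' simulation design: the sample size, the coefficients `beta0` and `beta1`, the value of `sigma`, and the single binary covariate `x` are all assumptions chosen for the example. It generates discrete lognormal counts (continuous lognormal values rounded to the nearest integer) and then fits ordinary least squares to ln(1 + citations), and alternatively to ln(citations) after discarding the zeros.

```python
import numpy as np


def simulate_citations(n, beta0, beta1, sigma, rng):
    """Simulate discrete lognormal counts driven by one binary factor.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    x = rng.integers(0, 2, size=n)            # hypothetical binary covariate
    log_mean = beta0 + beta1 * x               # mean of the underlying normal
    continuous = rng.lognormal(mean=log_mean, sigma=sigma)
    citations = np.rint(continuous).astype(int)  # round to the nearest integer
    return x, citations


def ols_log_plus_one(x, citations):
    """OLS of ln(1 + citations) on an intercept and x."""
    y = np.log1p(citations)
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef                                # [intercept, slope]


def ols_log_drop_zeros(x, citations):
    """OLS of ln(citations) after discarding articles with zero citations."""
    keep = citations > 0
    y = np.log(citations[keep])
    design = np.column_stack([np.ones(keep.sum()), x[keep]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef                                # [intercept, slope]


rng = np.random.default_rng(0)
x, c = simulate_citations(20000, beta0=1.0, beta1=0.5, sigma=1.0, rng=rng)
coef_plus_one = ols_log_plus_one(x, c)
coef_drop = ols_log_drop_zeros(x, c)
print("ln(1+c) OLS:", coef_plus_one)
print("ln(c), zeros dropped:", coef_drop)
```

Note that the slope fitted on ln(1 + citations) is on a different scale from `beta1`, which acts on the log of the unrounded values, so the two numbers are not expected to match. What matters for the comparison in the abstract is whether a strategy detects real factors without producing false positives.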

Suggested Citation

Thelwall, Mike & Wilson, Paul, 2014. "Regression for citation data: An evaluation of different methods," Journal of Informetrics, Elsevier, vol. 8(4), pages 963-971.

Handle: RePEc:eee:infome:v:8:y:2014:i:4:p:963-971
DOI: 10.1016/j.joi.2014.09.011

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Mingyang Wang & Zhenyu Wang & Guangsheng Chen, 2019. "Which can better predict the future success of articles? Bibliometric indices or alternative metrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1575-1595, June.
Kayvan Kousha & Mike Thelwall, 2024. "Factors associating with or predicting more cited or higher quality journal articles: An Annual Review of Information Science and Technology (ARIST) paper," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 75(3), pages 215-244, March.
Iman Tahamtan & Askar Safipour Afshar & Khadijeh Ahamdzadeh, 2016. "Factors affecting number of citations: a comprehensive review of the literature," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(3), pages 1195-1225, June.
Ajiferuke, Isola & Famoye, Felix, 2015. "Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models," Journal of Informetrics, Elsevier, vol. 9(3), pages 499-513.
Martorell Cunil, Onofre & Otero González, Luis & Durán Santomil, Pablo & Mulet Forteza, Carlos, 2023. "How to accomplish a highly cited paper in the tourism, leisure and hospitality field," Journal of Business Research, Elsevier, vol. 157(C).
Elizabeth S. Vieira, 2023. "The influence of research collaboration on citation impact: the countries in the European Innovation Scoreboard," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(6), pages 3555-3579, June.
Fan, Lingxu & Guo, Lei & Wang, Xinhua & Xu, Liancheng & Liu, Fangai, 2022. "Does the author’s collaboration mode lead to papers’ different citation impacts? An empirical analysis based on propensity score matching," Journal of Informetrics, Elsevier, vol. 16(4).
Zhang, Xinyuan & Xie, Qing & Song, Min, 2021. "Measuring the impact of novelty, bibliometric, and academic-network factors on citation count using a neural network," Journal of Informetrics, Elsevier, vol. 15(2).
Uddin, Shahadat & Khan, Arif, 2016. "The impact of author-selected keywords on citation counts," Journal of Informetrics, Elsevier, vol. 10(4), pages 1166-1177.
Thelwall, Mike & Sud, Pardeep, 2016. "National, disciplinary and temporal variations in the extent to which articles with more authors have more impact: Evidence from a geometric field normalised citation indicator," Journal of Informetrics, Elsevier, vol. 10(1), pages 48-61.
Thelwall, Mike, 2016. "The precision of the arithmetic mean, geometric mean and percentiles for citation data: An experimental simulation modelling approach," Journal of Informetrics, Elsevier, vol. 10(1), pages 110-123.
Yaşar Tonta & Müge Akbulut, 2020. "Does monetary support increase citation impact of scholarly papers?," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1617-1641, November.
Abramo, Giovanni & D’Angelo, Ciriaco Andrea, 2015. "The relationship between the number of authors of a publication, its citations and the impact factor of the publishing journal: Evidence from Italy," Journal of Informetrics, Elsevier, vol. 9(4), pages 746-761.
Mingyang Wang & Shi Li & Guangsheng Chen, 2017. "Detecting latent referential articles based on their vitality performance in the latest 2 years," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1557-1571, September.
Yifan Qian & Wenge Rong & Nan Jiang & Jie Tang & Zhang Xiong, 2017. "Citation regression analysis of computer science publications in different ranking categories and subfields," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1351-1374, March.
Tian Yu & Guang Yu & Peng-Yu Li & Liang Wang, 2014. "Citation impact prediction for scientific papers using stepwise regression analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1233-1252, November.
Thelwall, Mike & Fairclough, Ruth, 2015. "The influence of time and discipline on the magnitude of correlations between citation counts and quality scores," Journal of Informetrics, Elsevier, vol. 9(3), pages 529-541.
Bornmann, Lutz, 2014. "Do altmetrics point to the broader impact of research? An overview of benefits and disadvantages of altmetrics," Journal of Informetrics, Elsevier, vol. 8(4), pages 895-903.
Didegah, Fereshteh & Thelwall, Mike, 2013. "Which factors help authors produce the highest impact research? Collaboration, journal and document properties," Journal of Informetrics, Elsevier, vol. 7(4), pages 861-873.
Vîiu, Gabriel-Alexandru, 2018. "The lognormal distribution explains the remarkable pattern documented by characteristic scores and scales in scientometrics," Journal of Informetrics, Elsevier, vol. 12(2), pages 401-415.
