
Sample size in bibliometric analysis

Authors

  • Gordon Rogers (Institute for Scientific Information, Clarivate Analytics)
  • Martin Szomszor (Institute for Scientific Information, Clarivate Analytics)
  • Jonathan Adams (Institute for Scientific Information, Clarivate Analytics; King’s College London)

Abstract

While bibliometric analysis is normally able to rely on complete publication sets, this is not universally the case. For example, Australia (in ERA) and the UK (in the RAE/REF) use institutional research assessment that may rely on small or fractional parts of researcher output. Using the Category Normalised Citation Impact (CNCI) for the publications of ten universities with similar output (21,000–28,000 articles and reviews) indexed in the Web of Science for 2014–2018, we explore the extent to which a ‘sample’ of institutional data can accurately represent the averages and/or the correct relative status of the population CNCIs. Starting with full institutional data, we find a high variance in average CNCI across 10,000 institutional samples of fewer than 200 papers, which we suggest may be an analytical minimum, although smaller samples may be acceptable for qualitative review. When considering the ‘top’ CNCI paper in researcher sets represented by DAIS-ID clusters, we find that samples of 1000 papers provide a good guide to relative (but not absolute) institutional citation performance, which is driven by the abundance of high-performing individuals. However, such samples may be perturbed by scarce ‘highly cited’ papers in smaller or less research-intensive units. We draw attention to the significance of this for assessment processes and the further evidence that university rankings are innately unstable and generally unreliable.
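The resampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the population below is synthetic, drawn from a lognormal distribution chosen only as a plausible skewed stand-in for CNCI values, and the function name `sample_mean_spread`, the sample sizes, and the 1,000 repetitions (the paper reports 10,000) are assumptions made for the illustration.

```python
import random
import statistics


def sample_mean_spread(population, sample_size, n_samples=1000, seed=0):
    """Draw repeated random samples (without replacement) and return the
    mean and standard deviation of the resulting sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.fmean(means), statistics.stdev(means)


# Hypothetical population: 25,000 skewed values standing in for one
# institution's CNCI scores (assumption: lognormal, not real data).
rng = random.Random(42)
population = [rng.lognormvariate(-0.5, 1.0) for _ in range(25_000)]
population_mean = statistics.fmean(population)

spread = {}
for size in (50, 200, 1000):
    mean_of_means, sd_of_means = sample_mean_spread(population, size)
    spread[size] = sd_of_means
    print(f"n={size:5d}  mean of sample means={mean_of_means:.3f}  "
          f"sd of sample means={sd_of_means:.3f}")
```

Under these assumptions the spread of the sample means shrinks as the sample grows, which is the behaviour the abstract describes when it contrasts samples of fewer than 200 papers with samples of 1000.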

Suggested Citation

  • Gordon Rogers & Martin Szomszor & Jonathan Adams, 2020. "Sample size in bibliometric analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 777-794, October.
  • Handle: RePEc:spr:scient:v:125:y:2020:i:1:d:10.1007_s11192-020-03647-7
    DOI: 10.1007/s11192-020-03647-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03647-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations



    Cited by:

    1. Ricardo Rodrigues & Carlos Sampaio & Paulo Duarte & José Manuel Hernández-Mogollón, 2022. "Cross-Border Innovation: Assessing Concepts, Contexts, and Content," Sustainability, MDPI, vol. 14(23), pages 1-18, November.
    2. Eugene Seo & Sanghee Lee, 2023. "Implications of Aging in Place in the Context of the Residential Environment: Bibliometric Analysis and Literature Review," IJERPH, MDPI, vol. 20(20), pages 1-30, October.
    3. Ruihua Chen & Yafu Gong & Yanghe Liu & Wen Cheng, 2023. "A Bibliometric and Content Analysis of Strategy-Based Instruction in Second or Foreign Language Teaching From 2000 to 2021," SAGE Open, vol. 13(1), pages 21582440231, March.
    4. Araceli Martin-Candilejo & Francisco J. Martin-Carrasco & Ana Iglesias & Luis Garrote, 2023. "Heading into the Unknown? Exploring Sustainable Drought Management in the Mediterranean Region," Sustainability, MDPI, vol. 16(1), pages 1-18, December.
    5. Satish Kumar & Weng Marc Lim & Nitesh Pandey & J. Christopher Westland, 2021. "20 years of Electronic Commerce Research," Electronic Commerce Research, Springer, vol. 21(1), pages 1-40, March.
    6. Jonathan Adams & Jo Johnson & Jonathan Grant, 2022. "The rise of UK–China research collaboration: Trends, opportunities and challenges [The West Should Start Sending Its Scientists to China]," Science and Public Policy, Oxford University Press, vol. 49(1), pages 132-147.
    7. Weisheng Chiu & Thomas Chun Man Fan & Sang-Back Nam & Ping-Hung Sun, 2021. "Knowledge Mapping and Sustainable Development of eSports Research: A Bibliometric and Visualized Analysis," Sustainability, MDPI, vol. 13(18), pages 1-17, September.
    8. Huichen Gao & Shijuan Wang, 2022. "The Intellectual Structure of Research on Rural-to-Urban Migrants: A Bibliometric Analysis," IJERPH, MDPI, vol. 19(15), pages 1-19, August.
    9. Alencar Bravo & Darli Vieira & Thais Ayres Rebello, 2022. "The Origins, Evolution, Current State, and Future of Green Products and Consumer Research: A Bibliometric Analysis," Sustainability, MDPI, vol. 14(17), pages 1-25, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    2. Giovanni Abramo & Ciriaco Andrea D’Angelo & Flavia Costa, 2023. "Correlating article citedness and journal impact: an empirical investigation by field on a large-scale dataset," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(3), pages 1877-1894, March.
    3. Perianes-Rodriguez, Antonio & Waltman, Ludo & van Eck, Nees Jan, 2016. "Constructing bibliometric networks: A comparison between full and fractional counting," Journal of Informetrics, Elsevier, vol. 10(4), pages 1178-1195.
    4. Xu, Shuo & Hao, Liyuan & Yang, Guancan & Lu, Kun & An, Xin, 2021. "A topic models based framework for detecting and forecasting emerging technologies," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    5. Gerson Pech & Catarina Delgado, 2020. "Percentile and stochastic-based approach to the comparison of the number of citations of articles indexed in different bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(1), pages 223-252, April.
    6. Zhesi Shen & Liying Yang & Zengru Di & Jinshan Wu, 2019. "Large enough sample size to rank two groups of data reliably according to their means," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 653-671, February.
    7. Antonoyiannakis, Manolis, 2018. "Impact Factors and the Central Limit Theorem: Why citation averages are scale dependent," Journal of Informetrics, Elsevier, vol. 12(4), pages 1072-1088.
    8. Bo Liu & Wei Song & Qian Sun, 2022. "Status, Trend, and Prospect of Global Farmland Abandonment Research: A Bibliometric Analysis," IJERPH, MDPI, vol. 19(23), pages 1-30, November.
    9. Lin Zhang & Ronald Rousseau & Gunnar Sivertsen, 2017. "Science deserves to be judged by its contents, not by its wrapping: Revisiting Seglen's work on journal impact and research evaluation," PLOS ONE, Public Library of Science, vol. 12(3), pages 1-18, March.
    10. Yong Huang & Yi Bu & Ying Ding & Wei Lu, 2018. "Number versus structure: towards citing cascades," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(3), pages 2177-2193, December.
    11. Thelwall, Mike, 2016. "Are there too many uncited articles? Zero inflated variants of the discretised lognormal and hooked power law distributions," Journal of Informetrics, Elsevier, vol. 10(2), pages 622-633.
    12. Tsung Teng Chen, 2012. "The development and empirical study of a literature review aiding system," Scientometrics, Springer;Akadémiai Kiadó, vol. 92(1), pages 105-116, July.
    13. Gaviria-Marin, Magaly & Merigó, José M. & Baier-Fuentes, Hugo, 2019. "Knowledge management: A global examination based on bibliometric analysis," Technological Forecasting and Social Change, Elsevier, vol. 140(C), pages 194-220.
    14. Pamela E. Sandstrom, 2001. "Scholarly communication as a socioecological system," Scientometrics, Springer;Akadémiai Kiadó, vol. 51(3), pages 573-605, July.
    15. Dixit, Aasheesh & Jakhar, Suresh Kumar, 2021. "Airport capacity management: A review and bibliometric analysis," Journal of Air Transport Management, Elsevier, vol. 91(C).
    16. Zhong, Xiang & Liu, Jiajun & Gao, Yong & Wu, Lun, 2017. "Analysis of co-occurrence toponyms in web pages based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 466(C), pages 462-475.
    17. A Cecile J W Janssens & Michael Goodman & Kimberly R Powell & Marta Gwinn, 2017. "A critical evaluation of the algorithm behind the Relative Citation Ratio (RCR)," PLOS Biology, Public Library of Science, vol. 15(10), pages 1-5, October.
    18. Jianhua Hou, 2017. "Exploration into the evolution and historical roots of citation analysis by referenced publication year spectroscopy," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1437-1452, March.
    19. Bonaccorsi, Andrea & Haddawy, Peter & Cicero, Tindaro & Hassan, Saeed-Ul, 2017. "The solitude of stars. An analysis of the distributed excellence model of European universities," Journal of Informetrics, Elsevier, vol. 11(2), pages 435-454.
    20. Thelwall, Mike, 2018. "Dimensions: A competitor to Scopus and the Web of Science?," Journal of Informetrics, Elsevier, vol. 12(2), pages 430-435.
