How much data is sufficient for reliable bibliometric domain analysis? A multi-scenario experimental approach

Authors

  • Guo Chen

    (Nanjing University of Science and Technology)

  • Shuya Chen

    (Nanjing University of Science and Technology)

  • Zhili Chen

    (Nanjing University of Science and Technology)

  • Lu Xiao

    (Nanjing University of Finance and Economics)

  • Jiming Hu

    (Wuhan University)

Abstract

Determining the adequate data size for bibliometric domain analysis is a crucial yet unresolved issue in bibliometric research. In this paper, we propose a systematic approach to this problem that considers multiple task scenarios and runs sampling experiments on five domains. We introduce two indexes that quantify how reliably sub-datasets of different sample sizes approximate the complete bibliographic datasets, focusing on the impact of scale on dataset completeness. We find that although larger datasets tend to yield better results, returns diminish as dataset size increases, because the additional data brings higher costs and time investments. Some analysis tasks, such as subject category and country analysis (including their co-occurrence relationships), can be conducted with smaller datasets, whereas analyzing authors and their co-occurrence relationships requires a larger dataset. More generally, different analysis scenarios require different dataset sizes, particularly depending on whether result ranking matters, whether co-occurrence relationships are analyzed, and how many top high-frequency elements are considered. We also find that the appropriate dataset scale for analyzing different elements depends on their power-law distribution in the bibliographic dataset. Our findings offer practical guidance for researchers in selecting an appropriate dataset size for their specific analysis tasks, taking into account factors such as domain size, analyzed objects, the number of top values to be analyzed, and result ranking requirements.
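
The sampling experiments described in the abstract can be illustrated with a minimal sketch. The abstract does not define the paper's two indexes, so the measure used here (the share of the full dataset's top-k elements that reappear in a random sample's top-k) is an assumption chosen for illustration, not the authors' method. The function names, the record layout (each record is a list of elements such as countries or authors), and all parameters are hypothetical.

```python
import random
from collections import Counter


def top_k(records, k):
    """Return the k most frequent elements across records, in rank order."""
    # Count each element once per record (document frequency).
    counts = Counter(item for rec in records for item in set(rec))
    # Sort by descending frequency, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


def top_k_overlap(full, sample, k):
    """Share of the full dataset's top-k elements also in the sample's top-k.

    Illustrative measure only; it is not one of the paper's two indexes.
    """
    full_top = top_k(full, k)
    if not full_top:
        return 0.0
    return len(set(full_top) & set(top_k(sample, k))) / len(full_top)


def sampling_experiment(records, fractions, k, trials=20, seed=0):
    """Average top-k overlap for random samples at each sampling fraction."""
    rng = random.Random(seed)
    results = {}
    for frac in fractions:
        n = max(1, round(frac * len(records)))
        scores = [
            top_k_overlap(records, rng.sample(records, n), k)
            for _ in range(trials)
        ]
        results[frac] = sum(scores) / trials
    return results
```

Calling `sampling_experiment(records, [0.1, 0.25, 0.5, 1.0], k=10)` on a list of per-paper element lists returns one averaged overlap score per fraction. Plotting those scores against the fraction is one way to look for the point of diminishing returns the abstract mentions. A rank-sensitive measure would be needed for scenarios where result ranking matters.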

Suggested Citation

  • Guo Chen & Shuya Chen & Zhili Chen & Lu Xiao & Jiming Hu, 2025. "How much data is sufficient for reliable bibliometric domain analysis? A multi-scenario experimental approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(5), pages 2923-2946, May.
  • Handle: RePEc:spr:scient:v:130:y:2025:i:5:d:10.1007_s11192-025-05335-w
    DOI: 10.1007/s11192-025-05335-w

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-025-05335-w
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. María Pinto & Rosaura Fernández-Pascual & David Caballero-Mariscal & Dora Sales, 2020. "Information literacy trends in higher education (2006–2019): visualizing the emerging field of mobile information literacy," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1479-1510, August.
    2. Yuen-Hsien Tseng & Ming-Yueh Tsay, 2013. "Journal clustering of library and information science for subfield delineation using the bibliometric analysis toolkit: CATAR," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(2), pages 503-528, May.
    3. Abhijit Thakuria & Dipen Deka, 2024. "A decadal study on identifying latent topics and research trends in open access LIS journals using topic modeling approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 3841-3869, July.
    4. Yu-Wei Chang, 2018. "Examining interdisciplinarity of library and information science (LIS) based on LIS articles contributed by non-LIS authors," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1589-1613, September.
    5. Pin Li & Guoli Yang & Chuanqi Wang, 2019. "Visual topical analysis of library and information science," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1753-1791, December.
    6. Ping Liu & Qiong Wu & Xiangming Mu & Kaipeng Yu & Yiting Guo, 2015. "Detecting the intellectual structure of library and information science based on formal concept analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 104(3), pages 737-762, September.
    7. Carlos G. Figuerola & Francisco Javier García Marco & María Pinto, 2017. "Mapping the evolution of library and information science (1978–2014) using topic modeling on LISA," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1507-1535, September.
    8. Guo Chen & Jing Chen & Yu Shao & Lu Xiao, 2023. "Automatic noise reduction of domain-specific bibliographic datasets using positive-unlabeled learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(2), pages 1187-1204, February.
    9. Rons, Nadine, 2018. "Bibliometric approximation of a scientific specialty by combining key sources, title words, authors and references," Journal of Informetrics, Elsevier, vol. 12(1), pages 113-132.
    10. Belén Ribeiro-Navarrete & José Ramón Saura & Virginia Simón-Moya, 2024. "Setting the development of digitalization: state-of-the-art and potential for future research in cooperatives," Review of Managerial Science, Springer, vol. 18(5), pages 1459-1488, May.
    11. Karmen Stopar & Tomaž Bartol, 2019. "Digital competences, computer skills and information literacy in secondary education: mapping and visualization of trends and concepts," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 479-498, February.
    12. Jian Xu & Yi Bu & Ying Ding & Sinan Yang & Hongli Zhang & Chen Yu & Lin Sun, 2018. "Understanding the formation of interdisciplinary research from the perspective of keyword evolution: a case study on joint attention," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(2), pages 973-995, November.
    13. Shiji Chen & Clément Arsenault & Yves Gingras & Vincent Larivière, 2015. "Exploring the interdisciplinary evolution of a discipline: the case of Biochemistry and Molecular Biology," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(2), pages 1307-1323, February.
    14. repec:plo:pone00:0189137 is not listed on IDEAS
    15. Yi Bu & Binglu Wang & Win-bin Huang & Shangkun Che & Yong Huang, 2018. "Using the appearance of citations in full text on author co-citation analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 275-289, July.
    16. Yang, Siluo & Han, Ruizhen & Wolfram, Dietmar & Zhao, Yuehua, 2016. "Visualizing the intellectual structure of information science (2006–2015): Introducing author keyword coupling analysis," Journal of Informetrics, Elsevier, vol. 10(1), pages 132-150.
    17. Lin Zhang & Beibei Sun & Fei Shu & Ying Huang, 2022. "Comparing paper level classifications across different methods and systems: an investigation of Nature publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7633-7651, December.
    18. Rodrigo Dorantes-Gilardi & Aurora A. Ramírez-Álvarez & Diana Terrazas-Santamaría, 2023. "Is there a differentiated gender effect of collaboration with super-cited authors? Evidence from junior researchers in economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(4), pages 2317-2336, April.
    19. Shesen Guo & Ganzhou Zhang, 2017. "Analyzing concept complexity, knowledge ageing and diffusion pattern of Mooc," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(1), pages 413-430, July.
    20. Sabrina Petersohn & Thomas Heinze, 2018. "Professionalization of bibliometric research assessment. Insights from the history of the Leiden Centre for Science and Technology Studies (CWTS)," Science and Public Policy, Oxford University Press, vol. 45(4), pages 565-578.
    21. Hao Wang & Sanhong Deng & Xinning Su, 2016. "A study on construction and analysis of discipline knowledge structure of Chinese LIS based on CSSCI," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 1725-1759, December.
