Methods for estimating the size of Google Scholar

Authors
  • Enrique Orduna-Malea

    (Polytechnic University of Valencia)

  • Juan M. Ayllón

    (Universidad de Granada)

  • Alberto Martín-Martín

    (Universidad de Granada)

  • Emilio Delgado López-Cózar

    (Universidad de Granada)

Abstract

The emergence of academic search engines (mainly Google Scholar and Microsoft Academic Search) that aspire to index the entirety of current academic knowledge has revived and increased interest in the size of the academic web. The main objective of this paper is to propose various methods to estimate the current size (number of indexed documents) of Google Scholar (May 2014) and to determine its validity, precision and reliability. To do this, we present, apply and discuss three empirical methods: an external estimate based on empirical studies of Google Scholar coverage, and two internal estimate methods based on direct, empty and absurd queries, respectively. The results, despite providing disparate values, place the estimated size of Google Scholar at around 160–165 million documents. However, all the methods show considerable limitations and uncertainties due to inconsistencies in the Google Scholar search functionalities.
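The abstract names the estimation approaches but does not give their formulas. The minimal sketch below is only an illustration of the kind of arithmetic such estimates can rest on; it is not the authors' procedure. It assumes an internal estimate that sums the hit counts reported for a set of disjoint queries (for example, one query per publication year), and an external estimate that scales an assumed universe of documents by an observed coverage rate. The function names (`internal_estimate`, `external_estimate`, `relative_spread`) and all inputs are hypothetical.

```python
from typing import Mapping, Sequence


def internal_estimate(hits_per_query: Mapping[str, int]) -> int:
    """Sum the hit counts reported for a set of queries assumed to be disjoint.

    Hypothetical illustration: e.g. one query per publication year. Overlapping
    queries or rounded/inconsistent hit counts would bias this sum.
    """
    if any(h < 0 for h in hits_per_query.values()):
        raise ValueError("hit counts must be non-negative")
    return sum(hits_per_query.values())


def external_estimate(universe_size: float, coverage_rate: float) -> float:
    """Scale an assumed universe of documents by an observed coverage rate in [0, 1]."""
    if not 0.0 <= coverage_rate <= 1.0:
        raise ValueError("coverage_rate must be in [0, 1]")
    return universe_size * coverage_rate


def relative_spread(estimates: Sequence[float]) -> float:
    """(max - min) / mean: a crude measure of how far the methods disagree."""
    if not estimates:
        raise ValueError("need at least one estimate")
    mean = sum(estimates) / len(estimates)
    return (max(estimates) - min(estimates)) / mean


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the paper.
    internal = internal_estimate({"2012": 10, "2013": 12, "2014": 8})
    external = external_estimate(universe_size=50, coverage_rate=0.5)
    print(internal, external, round(relative_spread([internal, external]), 3))
```

The spread measure is included because the abstract stresses that the methods give disparate values; any such sum also inherits whatever inconsistencies exist in the hit counts the search engine reports.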

Suggested Citation

  • Enrique Orduna-Malea & Juan M. Ayllón & Alberto Martín-Martín & Emilio Delgado López-Cózar, 2015. "Methods for estimating the size of Google Scholar," Scientometrics, Springer;Akadémiai Kiadó, vol. 104(3), pages 931-949, September.
  • Handle: RePEc:spr:scient:v:104:y:2015:i:3:d:10.1007_s11192-015-1614-6
    DOI: 10.1007/s11192-015-1614-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-015-1614-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Wallace Koehler, 1999. "An analysis of web page and web site constancy and permanence," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 50(2), pages 162-180.
    2. Nigel Payne & Mike Thelwall, 2007. "A longitudinal study of academic webs: Growth and stabilisation," Scientometrics, Springer;Akadémiai Kiadó, vol. 71(3), pages 523-539, June.
    3. Réka Albert & Hawoong Jeong & Albert-László Barabási, 1999. "Diameter of the World-Wide Web," Nature, Nature, vol. 401(6749), pages 130-131, September.
    4. Joost C. F. de Winter & Amir A. Zadpoor & Dimitra Dodou, 2014. "The expansion of Google Scholar versus Web of Science: a longitudinal study," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(2), pages 1547-1565, February.
    5. Isidro F. Aguillo, 2012. "Is Google Scholar useful for bibliometrics? A webometric analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(2), pages 343-351, May.
    6. Anne-Wil Harzing, 2014. "A longitudinal study of Google Scholar coverage between 2012 and 2013," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(1), pages 565-575, January.
    7. Steve Lawrence & C. Lee Giles, 1999. "Accessibility of information on the web," Nature, Nature, vol. 400(6740), pages 107-107, July.
    8. Kayvan Kousha & Mike Thelwall, 2008. "Sources of Google Scholar citations outside the Science Citation Index: A comparison between four science disciplines," Scientometrics, Springer;Akadémiai Kiadó, vol. 74(2), pages 273-294, February.
    9. Lokman I. Meho & Kiduk Yang, 2007. "Impact of data sources on citation counts and rankings of LIS faculty: Web of Science versus Scopus and Google Scholar," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 58(13), pages 2105-2125, November.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Susanne Mikki & Øyvind L. Gjesdal & Tormod E. Strømme, 2018. "Grades of Openness: Open and Closed Articles in Norway," Publications, MDPI, vol. 6(4), pages 1-12, November.
    2. Alvarez-Meaza, Izaskun & Zarrabeitia-Bilbao, Enara & Rio-Belver, Rosa-María & Garechana-Anacabe, Gaizka, 2021. "Green scheduling to achieve green manufacturing: Pursuing a research agenda by mapping science," Technology in Society, Elsevier, vol. 67(C).
    3. Martín-Martín, Alberto & Costas, Rodrigo & van Leeuwen, Thed & Delgado López-Cózar, Emilio, 2018. "Evidence of open access of scientific publications in Google Scholar: A large-scale analysis," Journal of Informetrics, Elsevier, vol. 12(3), pages 819-841.
    4. John P A Ioannidis, 2018. "Meta-research: Why research on research matters," PLOS Biology, Public Library of Science, vol. 16(3), pages 1-6, March.
    5. Alberto Martín-Martín & Enrique Orduna-Malea & Emilio Delgado López-Cózar, 2018. "A novel method for depicting academic disciplines through Google Scholar Citations: The case of Bibliometrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 1251-1273, March.
    6. Sven E. Hug & Martin P. Brändle, 2017. "The coverage of Microsoft Academic: analyzing the publication output of a university," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1551-1571, December.
    7. Simone Belli & Carlos Gonzalo-Penela, 2020. "Science, research, and innovation infospheres in Google results of the Ibero-American countries," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 635-653, May.
    8. Michael Lang, 2020. "Business Model Innovation Approaches: A Systematic Literature Review," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 68(2), pages 435-449.
    9. Martín-Martín, Alberto & Orduna-Malea, Enrique & Thelwall, Mike & Delgado López-Cózar, Emilio, 2018. "Google Scholar, Web of Science, and Scopus: A systematic comparison of citations in 252 subject categories," Journal of Informetrics, Elsevier, vol. 12(4), pages 1160-1177.
    10. Enrique Orduña-Malea & Rodrigo Costas, 2021. "Link-based approach to study scientific software usage: the case of VOSviewer," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 8153-8186, September.
    11. Halevi, Gali & Moed, Henk & Bar-Ilan, Judit, 2017. "Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation—Review of the Literature," Journal of Informetrics, Elsevier, vol. 11(3), pages 823-834.
    12. Alberto Martín-Martín & Enrique Orduna-Malea & Emilio Delgado López-Cózar, 2018. "Coverage of highly-cited documents in Google Scholar, Web of Science, and Scopus: a multidisciplinary comparison," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 2175-2188, September.
    13. Siviwe Bangani, 2018. "The impact of electronic theses and dissertations: a study of the institutional repository of a university in South Africa," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(1), pages 131-151, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Martin-Martin, Alberto & Orduna-Malea, Enrique & Harzing, Anne-Wil & Delgado López-Cózar, Emilio, 2017. "Can we use Google Scholar to identify highly-cited documents?," Journal of Informetrics, Elsevier, vol. 11(1), pages 152-163.
    2. Hamid R. Jamali & Majid Nabavi, 2015. "Open access and sources of full-text articles in Google Scholar in different subject fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1635-1651, December.
    3. Michael Gusenbauer, 2019. "Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 177-214, January.
    4. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    5. Moed, Henk F. & Bar-Ilan, Judit & Halevi, Gali, 2016. "A new methodology for comparing Google Scholar and Scopus," Journal of Informetrics, Elsevier, vol. 10(2), pages 533-551.
    6. Sergio Copiello, 2019. "The open access citation premium may depend on the openness and inclusiveness of the indexing database, but the relationship is controversial because it is ambiguous where the open access boundary lie," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(2), pages 995-1018, November.
    7. Martín-Martín, Alberto & Orduna-Malea, Enrique & Thelwall, Mike & Delgado López-Cózar, Emilio, 2018. "Google Scholar, Web of Science, and Scopus: A systematic comparison of citations in 252 subject categories," Journal of Informetrics, Elsevier, vol. 12(4), pages 1160-1177.
    8. Antonio Cavacini, 2015. "What is the best database for computer science journal articles?," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2059-2071, March.
    9. Anne-Wil Harzing & Satu Alakangas, 2016. "Google Scholar, Scopus and the Web of Science: a longitudinal and cross-disciplinary comparison," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(2), pages 787-804, February.
    10. Enrique Orduna-Malea & Selenay Aytac & Clara Y. Tran, 2019. "Universities through the eyes of bibliographic databases: a retroactive growth comparison of Google Scholar, Scopus and Web of Science," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 433-450, October.
    11. Cristòfol Rovira & Lluís Codina & Frederic Guerrero-Solé & Carlos Lopezosa, 2019. "Ranking by Relevance and Citation Counts, a Comparative Study: Google Scholar, Microsoft Academic, WoS and Scopus," Future Internet, MDPI, vol. 11(9), pages 1-21, September.
    12. Cristòfol Rovira & Lluís Codina & Carlos Lopezosa, 2021. "Language Bias in the Google Scholar Ranking Algorithm," Future Internet, MDPI, vol. 13(2), pages 1-17, January.
    13. Muhammad Raheel & Samreen Ayaz & Muhammad Tanvir Afzal, 2018. "Evaluation of h-index, its variants and extensions based on publication age & citation intensity in civil engineering," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 1107-1127, March.
    14. Halevi, Gali & Moed, Henk & Bar-Ilan, Judit, 2017. "Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation—Review of the Literature," Journal of Informetrics, Elsevier, vol. 11(3), pages 823-834.
    15. Muhammad Salman & Mohammad Masroor Ahmed & Muhammad Tanvir Afzal, 2021. "Assessment of author ranking indices based on multi-authorship," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4153-4172, May.
    16. Simone Belli & Carlos Gonzalo-Penela, 2020. "Science, research, and innovation infospheres in Google results of the Ibero-American countries," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 635-653, May.
    17. Isidro F. Aguillo, 2012. "Is Google Scholar useful for bibliometrics? A webometric analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(2), pages 343-351, May.
    18. Loizides, Orestis-Stavros & Koutsakis, Polychronis, 2017. "On evaluating the quality of a computer science/computer engineering conference," Journal of Informetrics, Elsevier, vol. 11(2), pages 541-552.
    19. Judit Bar-Ilan, 2001. "Data collection methods on the Web for infometric purposes — A review and analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 50(1), pages 7-32, January.
    20. Peder Olesen Larsen & Markus von Ins, 2010. "The rate of growth in scientific publication and the decline in coverage provided by Science Citation Index," Scientometrics, Springer;Akadémiai Kiadó, vol. 84(3), pages 575-603, September.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.