
Why are these publications missing? Uncovering the reasons behind the exclusion of documents in free‐access scholarly databases

Authors

Listed:
  • Lorena Delgado‐Quirós
  • Isidro F. Aguillo
  • Alberto Martín‐Martín
  • Emilio Delgado López‐Cózar
  • Enrique Orduña‐Malea
  • José Luis Ortega

Abstract

This study analyses the coverage of seven free‐access bibliographic databases (Crossref, Dimensions—non‐subscription version, Google Scholar, Lens, Microsoft Academic, Scilit, and Semantic Scholar) to identify the reasons that might cause the exclusion of scholarly documents and how those reasons influence coverage. To do this, 116 k randomly selected bibliographic records from Crossref were used as a baseline, and each database was queried through API endpoints and web scraping. The results show that coverage differences are mainly caused by the way each service builds its database. While classic bibliographic databases ingest almost exactly the same content as Crossref (Lens and Scilit miss only 0.1% and 0.2% of the records, respectively), academic search engines show lower coverage (Google Scholar misses 9.8% of the records, Semantic Scholar 10%, and Microsoft Academic 12%). Their coverage gaps are mainly attributed to external factors, such as web accessibility and robot exclusion policies (39.2%–46%), and to internal requirements that exclude secondary content (6.5%–11.6%). In the case of Dimensions, the classic bibliographic database with the lowest coverage (7.6% of records missing), internal selection criteria, such as indexing full books instead of book chapters (65%) and excluding secondary content (15%), are the main reasons why publications are missing.
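As a rough illustration of the coverage check described in the abstract, the snippet below draws a random sample of Crossref DOIs and tests whether another service can resolve them. It is a minimal sketch, not the authors' actual pipeline: it assumes the public Crossref REST API (its sample parameter) and the Semantic Scholar Graph API DOI lookup, and the sample size, field list, and rate limiting are illustrative choices.

import time
import requests

CROSSREF_WORKS = "https://api.crossref.org/works"
S2_PAPER = "https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}"

def sample_crossref_dois(n=20, mailto="name@example.org"):
    """Draw a random sample of up to 100 Crossref records and return their DOIs."""
    resp = requests.get(CROSSREF_WORKS, params={"sample": n, "mailto": mailto}, timeout=30)
    resp.raise_for_status()
    return [item["DOI"] for item in resp.json()["message"]["items"]]

def found_in_semantic_scholar(doi):
    """Return True if the Semantic Scholar Graph API resolves this DOI."""
    resp = requests.get(S2_PAPER.format(doi=doi), params={"fields": "title"}, timeout=30)
    if resp.status_code == 404:  # DOI not indexed by the service
        return False
    resp.raise_for_status()
    return True

if __name__ == "__main__":
    dois = sample_crossref_dois()
    missing = []
    for doi in dois:
        if not found_in_semantic_scholar(doi):
            missing.append(doi)
        time.sleep(1)  # stay well under the public rate limits
    print(f"{len(missing)} of {len(dois)} sampled Crossref DOIs were not found in Semantic Scholar")
    for doi in missing:
        print("  missing:", doi)

The same per-DOI check could in principle be repeated against other services that expose public lookup endpoints; Google Scholar, which offers no official API, would require the web-scraping approach mentioned in the abstract.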

Suggested Citation

  • Lorena Delgado‐Quirós & Isidro F. Aguillo & Alberto Martín‐Martín & Emilio Delgado López‐Cózar & Enrique Orduña‐Malea & José Luis Ortega, 2024. "Why are these publications missing? Uncovering the reasons behind the exclusion of documents in free‐access scholarly databases," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 75(1), pages 43-58, January.
  • Handle: RePEc:bla:jinfst:v:75:y:2024:i:1:p:43-58
    DOI: 10.1002/asi.24839

    Download full text from publisher

    File URL: https://doi.org/10.1002/asi.24839
    Download Restriction: no

    File URL: https://libkey.io/10.1002/asi.24839?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Anne-Wil Harzing & Satu Alakangas, 2017. "Microsoft Academic: is the phoenix getting wings?," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(1), pages 371-383, January.
    2. Dalibor Fiala, 2011. "Mining citation information from CiteSeer data," Scientometrics, Springer;Akadémiai Kiadó, vol. 86(3), pages 553-562, March.
    3. José Luis Ortega & Isidro F. Aguillo, 2014. "Microsoft academic search and Google scholar citations: Comparative analysis of author profiles," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(6), pages 1149-1156, June.
    4. Philippe Mongeon & Adèle Paul-Hus, 2016. "The journal coverage of Web of Science and Scopus: a comparative analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(1), pages 213-228, January.
    5. Michael Gusenbauer, 2019. "Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 177-214, January.
    6. Thelwall, Mike, 2018. "Dimensions: A competitor to Scopus and the Web of Science?," Journal of Informetrics, Elsevier, vol. 12(2), pages 430-435.
    7. Kayvan Kousha & Mike Thelwall & Somayeh Rezaie, 2011. "Assessing the citation impact of books: The role of Google Books, Google Scholar, and Scopus," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 62(11), pages 2147-2164, November.
    8. Lokman I. Meho & Kiduk Yang, 2007. "Impact of data sources on citation counts and rankings of LIS faculty: Web of science versus scopus and google scholar," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 58(13), pages 2105-2125, November.
    9. Anne-Wil Harzing & Satu Alakangas, 2017. "Microsoft Academic is one year old: the Phoenix is ready to leave the nest," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1887-1894, September.
    10. Sven E. Hug & Martin P. Brändle, 2017. "The coverage of Microsoft Academic: analyzing the publication output of a university," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1551-1571, December.
    11. Vivek Kumar Singh & Prashasti Singh & Mousumi Karmakar & Jacqueline Leta & Philipp Mayr, 2021. "The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(6), pages 5113-5142, June.
    12. M. Ryan Haley, 2014. "Ranking top economics and finance journals using Microsoft academic search versus Google scholar: How does the new publish or perish option compare?," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(5), pages 1079-1084, May.
    13. Mu‐hsuan Huang & Yu‐wei Chang, 2008. "Characteristics of research output in social sciences and humanities: From a research evaluation perspective," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 59(11), pages 1819-1828, September.
    14. Alberto Martín-Martín & Mike Thelwall & Enrique Orduna-Malea & Emilio Delgado López-Cózar, 2021. "Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: a multidisciplinary comparison of coverage via citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 871-906, January.
    15. Anne-Wil Harzing, 2019. "Two new kids on the block: How do Crossref and Dimensions compare with Google Scholar, Microsoft Academic, Scopus and the Web of Science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 341-349, July.
    16. Alberto Martín-Martín & Mike Thelwall & Enrique Orduna-Malea & Emilio Delgado López-Cózar, 2021. "Correction to: Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: a multidisciplinary comparison of coverage via citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 907-908, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Gusenbauer, 2022. "Search where you will find most: Comparing the disciplinary coverage of 56 bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2683-2745, May.
    2. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    3. Zhentao Liang & Jin Mao & Kun Lu & Gang Li, 2021. "Finding citations for PubMed: a large-scale comparison between five freely available bibliographic data sources," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9519-9542, December.
    4. Kayvan Kousha & Mike Thelwall, 2024. "Factors associating with or predicting more cited or higher quality journal articles: An Annual Review of Information Science and Technology (ARIST) paper," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 75(3), pages 215-244, March.
    5. Kousha, Kayvan & Thelwall, Mike, 2018. "Can Microsoft Academic help to assess the citation impact of academic books?," Journal of Informetrics, Elsevier, vol. 12(3), pages 972-984.
    6. Gabriel Alves Vieira & Jacqueline Leta, 2024. "biblioverlap: an R package for document matching across bibliographic datasets," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4513-4527, July.
    7. Mike Thelwall, 2018. "Does Microsoft Academic find early citations?," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(1), pages 325-334, January.
    8. Alberto Martín-Martín & Mike Thelwall & Enrique Orduna-Malea & Emilio Delgado López-Cózar, 2021. "Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: a multidisciplinary comparison of coverage via citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 871-906, January.
    9. Irina Gerasimov & Binita KC & Armin Mehrabian & James Acker & Michael P. McGuire, 2024. "Comparison of datasets citation coverage in Google Scholar, Web of Science, Scopus, Crossref, and DataCite," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 3681-3704, July.
    10. Mike Thelwall, 2021. "Alternative medicines worth researching? Citation analyses of acupuncture, chiropractic, homeopathy, and osteopathy 1996–2017," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(10), pages 8731-8747, October.
    11. Dušan Nikolić & Dragan Ivanović & Lidija Ivanović, 2024. "An open-source tool for merging data from multiple citation databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4573-4595, July.
    12. Thelwall, Mike, 2018. "Microsoft Academic automatic document searches: Accuracy for journal articles and suitability for citation analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 1-9.
    13. Abdelghani Maddi & Aouatif De La Laurencie, 2018. "La dynamique des SHS françaises dans le Web of Science," CEPN Working Papers 2018-05, Centre d'Economie de l'Université de Paris Nord.
    14. Anne-Wil Harzing, 2019. "Two new kids on the block: How do Crossref and Dimensions compare with Google Scholar, Microsoft Academic, Scopus and the Web of Science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 341-349, July.
    15. Qingqing Zhou, 2024. "Evaluating book impacts via integrating multi-source reviews," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(11), pages 6931-6946, November.
    16. Shir Aviv-Reuven & Ariel Rosenfeld, 2023. "A logical set theory approach to journal subject classification analysis: intra-system irregularities and inter-system discrepancies in Web of Science and Scopus," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(1), pages 157-175, January.
    17. Steve J. Bickley & Ho Fai Chan & Benno Torgler, 2022. "Artificial intelligence in the field of economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 2055-2084, April.
    18. Halevi, Gali & Moed, Henk & Bar-Ilan, Judit, 2017. "Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation—Review of the Literature," Journal of Informetrics, Elsevier, vol. 11(3), pages 823-834.
    19. Adriana Ana Maria Davidescu & Margareta-Stela Florescu & Liviu Cosmin Mosora & Mihaela Hrisanta Mosora & Eduard Mihai Manta, 2022. "A Bibliometric Analysis of Research Publications of the Bucharest University of Economic Studies in Time of Pandemics: Implications for Teachers’ Professional Publishing Activity," IJERPH, MDPI, vol. 19(14), pages 1-36, July.
    20. Mike Thelwall & Nabeil Maflahi, 2020. "Academic collaboration rates and citation associations vary substantially between countries and fields," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(8), pages 968-978, August.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:jinfst:v:75:y:2024:i:1:p:43-58. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows your profile to be linked to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Wiley Content Delivery (email available below). General contact details of provider: http://www.asis.org.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.