
Finding citations for PubMed: a large-scale comparison between five freely available bibliographic data sources

Author

Listed:
  • Zhentao Liang (Wuhan University)
  • Jin Mao (Wuhan University)
  • Kun Lu (University of Oklahoma)
  • Gang Li (Wuhan University)

Abstract

As an important biomedical database, PubMed provides users with free access to abstracts of its documents. However, citations between these documents need to be collected from external data sources. Although previous studies have investigated the coverage of various data sources, the quality of citations is underexplored. In response, this study compares the coverage and citation quality of five freely available data sources on 30 million PubMed documents, including the OpenCitations Index of Crossref open DOI-to-DOI citations (COCI), Dimensions, Microsoft Academic Graph (MAG), the National Institutes of Health’s Open Citation Collection (NIH-OCC), and the Semantic Scholar Open Research Corpus (S2ORC). Three gold standards and five metrics are introduced to evaluate the correctness and completeness of citations. Our results indicate that Dimensions is the most comprehensive data source, providing references for 62.4% of PubMed documents and outperforming the official NIH-OCC dataset (56.7%). Over 90% of citation links in other data sources can also be found in Dimensions. The coverage of MAG, COCI, and S2ORC is 59.6%, 34.7%, and 23.5%, respectively. Regarding citation quality, Dimensions and NIH-OCC achieve the best overall results. Almost all data sources have a precision higher than 90%, but their recall is much lower. All databases perform better on recent publications than on earlier ones. Meanwhile, the gaps between different data sources have diminished for documents published in recent years. This study provides evidence to help researchers choose suitable citation sources for PubMed, and it is also helpful for evaluating the citation quality of free bibliographic databases.
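
The evaluation described in the abstract rests on two kinds of measures: coverage (whether a data source provides any references for a PubMed document at all) and the correctness and completeness of the citation links themselves, reported as precision and recall against gold standards. As a rough illustration only — a minimal Python sketch under my own assumptions, not the authors' code; the function names, identifiers, and toy data are hypothetical — citation links can be modelled as (citing, cited) pairs and compared with set operations:

    # Illustrative sketch: evaluating a citation data source against a gold standard.
    # Citations are modelled as (citing_id, cited_id) pairs; identifiers are hypothetical.

    def coverage(source_refs_by_doc, pubmed_docs):
        """Share of PubMed documents for which the source lists at least one reference."""
        covered = sum(1 for doc in pubmed_docs if source_refs_by_doc.get(doc))
        return covered / len(pubmed_docs)

    def precision_recall(source_links, gold_links):
        """Correctness (precision) and completeness (recall) of citation links."""
        source_links, gold_links = set(source_links), set(gold_links)
        true_positives = len(source_links & gold_links)
        precision = true_positives / len(source_links) if source_links else 0.0
        recall = true_positives / len(gold_links) if gold_links else 0.0
        return precision, recall

    # Toy example: the source finds two of the four gold-standard links and reports no spurious ones.
    gold = {("pmid:1", "pmid:2"), ("pmid:1", "pmid:3"), ("pmid:4", "pmid:2"), ("pmid:4", "pmid:5")}
    found = {("pmid:1", "pmid:2"), ("pmid:4", "pmid:2")}
    print(precision_recall(found, gold))  # (1.0, 0.5)

The toy numbers are chosen only to echo the pattern reported in the abstract: precision close to 1 (nearly all links a source reports are correct) combined with noticeably lower recall (many gold-standard links missing from the source).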

Suggested Citation

  • Zhentao Liang & Jin Mao & Kun Lu & Gang Li, 2021. "Finding citations for PubMed: a large-scale comparison between five freely available bibliographic data sources," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9519-9542, December.
  • Handle: RePEc:spr:scient:v:126:y:2021:i:12:d:10.1007_s11192-021-04191-8
    DOI: 10.1007/s11192-021-04191-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-021-04191-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-021-04191-8?utm_source=ideas
LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Vivek Kumar Singh & Prashasti Singh & Mousumi Karmakar & Jacqueline Leta & Philipp Mayr, 2021. "The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(6), pages 5113-5142, June.
    2. Qi Wang, 2018. "A bibliometric model for identifying emerging research topics," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 69(2), pages 290-304, February.
    3. Thelwall, Mike, 2017. "Microsoft Academic: A multidisciplinary comparison of citation counts with Scopus and Mendeley for 29 journals," Journal of Informetrics, Elsevier, vol. 11(4), pages 1201-1212.
    4. Anne-Wil Harzing, 2016. "Microsoft Academic (Search): a Phoenix arisen from the ashes?," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(3), pages 1637-1647, September.
    5. Anne-Wil Harzing & Satu Alakangas, 2017. "Microsoft Academic: is the phoenix getting wings?," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(1), pages 371-383, January.
    6. Joost C. F. Winter & Amir A. Zadpoor & Dimitra Dodou, 2014. "The expansion of Google Scholar versus Web of Science: a longitudinal study," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(2), pages 1547-1565, February.
    7. Philippe Mongeon & Adèle Paul-Hus, 2016. "The journal coverage of Web of Science and Scopus: a comparative analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(1), pages 213-228, January.
    8. Small, Henry & Boyack, Kevin W. & Klavans, Richard, 2014. "Identifying emerging topics in science and technology," Research Policy, Elsevier, vol. 43(8), pages 1450-1467.
    9. Mei Hsiu-Ching Ho & John S. Liu, 2021. "The swift knowledge development path of COVID-19 research: the first 150 days," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2391-2399, March.
    10. Xianwen Wang & Chen Liu & Wenli Mao & Zhichao Fang, 2015. "Erratum to: The open access advantage considering citation, article usage and social media attention," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(3), pages 1149-1149, June.
    11. Anne-Wil Harzing & Satu Alakangas, 2017. "Microsoft Academic is one year old: the Phoenix is ready to leave the nest," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1887-1894, September.
    12. Ivan Heibi & Silvio Peroni & David Shotton, 2019. "Software review: COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(2), pages 1213-1228, November.
    13. Tahamtan, Iman & Bornmann, Lutz, 2018. "Core elements in the process of citing publications: Conceptual overview of the literature," Journal of Informetrics, Elsevier, vol. 12(1), pages 203-216.
    14. Sven E. Hug & Michael Ochsner & Martin P. Brändle, 2017. "Citation analysis with microsoft academic," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(1), pages 371-378, April.
    15. Chen, Baitong & Tsutsui, Satoshi & Ding, Ying & Ma, Feicheng, 2017. "Understanding the topic evolution in a scientific domain: An exploratory study for the field of information retrieval," Journal of Informetrics, Elsevier, vol. 11(4), pages 1175-1189.
    16. Xianwen Wang & Chen Liu & Wenli Mao & Zhichao Fang, 2015. "The open access advantage considering citation, article usage and social media attention," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(2), pages 555-564, May.
    17. Junwen Zhu & Weishu Liu, 2020. "A tale of two databases: the use of Web of Science and Scopus in academic papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(1), pages 321-335, April.
    18. Anne-Wil Harzing, 2019. "Two new kids on the block: How do Crossref and Dimensions compare with Google Scholar, Microsoft Academic, Scopus and the Web of Science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 341-349, July.
    19. Martín-Martín, Alberto & Orduna-Malea, Enrique & Thelwall, Mike & Delgado López-Cózar, Emilio, 2018. "Google Scholar, Web of Science, and Scopus: A systematic comparison of citations in 252 subject categories," Journal of Informetrics, Elsevier, vol. 12(4), pages 1160-1177.
    20. Robin Haunschild & Sven E. Hug & Martin P. Brändle & Lutz Bornmann, 2018. "The number of linked references of publications in Microsoft Academic in comparison with the Web of Science," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(1), pages 367-370, January.
    21. Sven E. Hug & Martin P. Brändle, 2017. "The coverage of Microsoft Academic: analyzing the publication output of a university," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1551-1571, December.
    22. Björn Hammarfelt, 2011. "Interdisciplinarity and the intellectual base of literature studies: citation analysis of highly cited monographs," Scientometrics, Springer;Akadémiai Kiadó, vol. 86(3), pages 705-725, March.
    23. Hu, Xiaojun & Rousseau, Ronald & Chen, Jin, 2011. "On the definition of forward and backward citation generations," Journal of Informetrics, Elsevier, vol. 5(1), pages 27-36.
    24. Yi Zhang & Xiaojing Cai & Caroline V. Fry & Mengjia Wu & Caroline S. Wagner, 2021. "Topic evolution, disruption and resilience in early COVID-19 research," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4225-4253, May.
    25. Ghassan Abdul-Majeed & Wissam Mahmood & Nasri S. M. Namer, 2021. "Measuring research performance of Iraqi universities using Scopus data," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(3), pages 2349-2363, March.
    26. Xiaoyao Han, 2020. "Evolution of research topics in LIS between 1996 and 2019: an analysis based on latent Dirichlet allocation topic model," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2561-2595, December.
    27. Mike Thelwall, 2016. "Interpreting correlations between citation counts and other indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(1), pages 337-347, July.
    28. Teja Koler-Povh & Primož Južnič & Goran Turk, 2014. "Impact of open access on citation of scholarly publications in the field of civil engineering," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(2), pages 1033-1045, February.
    Full references (including those not matched with items on IDEAS)

    Citations

Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Li, Xin & Tang, Xuli & Cheng, Qikai, 2022. "Predicting the clinical citation count of biomedical papers using multilayer perceptron neural network," Journal of Informetrics, Elsevier, vol. 16(4).
    2. Yuyan Jiang & Xueli Liu, 2023. "A construction and empirical research of the journal disruption index based on open citation data," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(7), pages 3935-3958, July.
    3. Zhentao Liang & Jin Mao & Gang Li, 2023. "Bias against scientific novelty: A prepublication perspective," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(1), pages 99-114, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alberto Martín-Martín & Mike Thelwall & Enrique Orduna-Malea & Emilio Delgado López-Cózar, 2021. "Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: a multidisciplinary comparison of coverage via citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 871-906, January.
    2. Thelwall, Mike, 2018. "Microsoft Academic automatic document searches: Accuracy for journal articles and suitability for citation analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 1-9.
    3. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    4. Anne-Wil Harzing, 2019. "Two new kids on the block: How do Crossref and Dimensions compare with Google Scholar, Microsoft Academic, Scopus and the Web of Science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 341-349, July.
    5. Kousha, Kayvan & Thelwall, Mike, 2018. "Can Microsoft Academic help to assess the citation impact of academic books?," Journal of Informetrics, Elsevier, vol. 12(3), pages 972-984.
    6. Kousha, Kayvan & Thelwall, Mike & Abdoli, Mahshid, 2018. "Can Microsoft Academic assess the early citation impact of in-press articles? A multi-discipline exploratory analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 287-298.
    7. Mike Thelwall, 2018. "Does Microsoft Academic find early citations?," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(1), pages 325-334, January.
    8. Michael Gusenbauer, 2022. "Search where you will find most: Comparing the disciplinary coverage of 56 bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2683-2745, May.
    9. Michael Thelwall, 2018. "Can Microsoft Academic be used for citation analysis of preprint archives? The case of the Social Science Research Network," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(2), pages 913-928, May.
    10. Toluwase Victor Asubiaro & Sodiq Onaolapo, 2023. "A comparative study of the coverage of African journals in Web of Science, Scopus, and CrossRef," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(7), pages 745-758, July.
    11. Robin Haunschild & Sven E. Hug & Martin P. Brändle & Lutz Bornmann, 2018. "The number of linked references of publications in Microsoft Academic in comparison with the Web of Science," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(1), pages 367-370, January.
    12. Vivek Kumar Singh & Prashasti Singh & Mousumi Karmakar & Jacqueline Leta & Philipp Mayr, 2021. "The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(6), pages 5113-5142, June.
    13. Dunaiski, Marcel & Geldenhuys, Jaco & Visser, Willem, 2019. "On the interplay between normalisation, bias, and performance of paper impact metrics," Journal of Informetrics, Elsevier, vol. 13(1), pages 270-290.
    14. Abdelghani Maddi & Aouatif de La Laurencie, 2018. "La dynamique des SHS françaises dans le Web of Science : un manque de représentativité ou de visibilité internationale ?," Working Papers hal-01922266, HAL.
    15. Chunli Wei & Jingyi Zhao & Jue Ni & Jiang Li, 2023. "What does open peer review bring to scientific articles? Evidence from PLoS journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(5), pages 2763-2776, May.
16. Sergio Copiello, 2019. "The open access citation premium may depend on the openness and inclusiveness of the indexing database, but the relationship is controversial because it is ambiguous where the open access boundary lies," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(2), pages 995-1018, November.
    17. Abdelghani Maddi & Aouatif De La Laurencie, 2018. "La dynamique des SHS françaises dans le Web of Science," CEPN Working Papers 2018-05, Centre d'Economie de l'Université de Paris Nord.
    18. Cristòfol Rovira & Lluís Codina & Frederic Guerrero-Solé & Carlos Lopezosa, 2019. "Ranking by Relevance and Citation Counts, a Comparative Study: Google Scholar, Microsoft Academic, WoS and Scopus," Future Internet, MDPI, vol. 11(9), pages 1-21, September.
    19. Sven E. Hug & Martin P. Brändle, 2017. "The coverage of Microsoft Academic: analyzing the publication output of a university," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1551-1571, December.
    20. Michael Gusenbauer, 2019. "Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 177-214, January.
