
Types of DOI errors of cited references in Web of Science with a cleaning method

Author

Listed:
  • Shuo Xu

    (Beijing University of Technology)

  • Liyuan Hao

    (Beijing University of Technology)

  • Xin An

    (Beijing Forestry University)

  • Dongsheng Zhai

    (Beijing University of Technology)

  • Hongshen Pang

    (Library, Shenzhen University)

Abstract

Although bibliographic databases such as Web of Science (WoS) have greatly promoted the development of scientometrics and informetrics, these databases are not free of errors. The main purpose of this work is to determine which types of DOI errors exist in cited references, how often each type of error occurs, and whether these errors can be corrected automatically. After careful analysis, several classic DOI errors of cited references, such as prefix-, suffix- and other-type errors, are identified. Then, a cleaning method based on regular expressions is put forward. Experimental results on bibliographic data in the gene editing field from the WoS database indicate that our cleaning approach can substantially improve the quality of the DOI names of cited references.
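
The abstract does not give the actual cleaning rules, so the following is only a minimal sketch of what a regular-expression-based DOI cleaner could look like, not the authors' method. It assumes that a prefix-type error means extra text before the DOI name (for example a "DOI" label or a resolver URL) and that a suffix-type error means stray trailing punctuation. The function name `clean_doi` and all three patterns are hypothetical.

```python
import re

# Hypothetical patterns for illustration; the paper's real rules are not
# stated in the abstract.
# Everything before the first "10.<registrant>/" is treated as prefix noise.
PREFIX_NOISE = re.compile(r"^.*?(?=10\.\d{4,9}/)", re.DOTALL)
# Trailing whitespace and common sentence punctuation are treated as suffix noise.
SUFFIX_NOISE = re.compile(r"[\s.,;]+$")
# Rough shape of a DOI name: "10.", 4-9 digits, "/", then a non-empty suffix.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def clean_doi(raw):
    """Return a cleaned DOI name, or None if no DOI-like core is found."""
    text = raw.strip()
    # Assumed prefix-type fix: drop labels or resolver URLs before the DOI.
    text = PREFIX_NOISE.sub("", text, count=1)
    # Assumed suffix-type fix: drop trailing punctuation and whitespace.
    text = SUFFIX_NOISE.sub("", text)
    # Anything that still does not look like a DOI is left for manual review.
    return text if DOI_SHAPE.match(text) else None


if __name__ == "__main__":
    for sample in (
        "DOI 10.1007/s11192-019-03162-4.",
        "http://dx.doi.org/10.1007/s11192-019-03162-4",
        "not a doi",
    ):
        print(repr(sample), "->", clean_doi(sample))
```

Returning None rather than guessing keeps the sketch conservative: strings that match neither assumed error type are flagged instead of silently rewritten, which is where the "other-type" errors mentioned in the abstract would need separate handling.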

Suggested Citation

  • Shuo Xu & Liyuan Hao & Xin An & Dongsheng Zhai & Hongshen Pang, 2019. "Types of DOI errors of cited references in Web of Science with a cleaning method," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(3), pages 1427-1437, September.
  • Handle: RePEc:spr:scient:v:120:y:2019:i:3:d:10.1007_s11192-019-03162-4
    DOI: 10.1007/s11192-019-03162-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-019-03162-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations



    Cited by:

    1. Shuo Xu & Mengjia An & Xin An, 2021. "Do scientific publications by editorial board members have shorter publication delays and then higher influence?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6697-6713, August.
    2. Alessia Cioffi & Sara Coppini & Arcangelo Massari & Arianna Moretti & Silvio Peroni & Cristian Santini & Nooshin Shahidzadeh Asadi, 2022. "Identifying and correcting invalid citations due to DOI errors in Crossref data," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3593-3612, June.
    3. Xu, Shuo & Hao, Liyuan & An, Xin & Yang, Guancan & Wang, Feifei, 2019. "Emerging research topics detection with multiple machine learning models," Journal of Informetrics, Elsevier, vol. 13(4).
    4. Junwan Liu & Rui Wang & Shuo Xu, 2021. "What academic mobility configurations contribute to high performance: an fsQCA analysis of CSC-funded visiting scholars," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1079-1100, February.
    5. Shuo Xu & Ling Li & Xin An & Liyuan Hao & Guancan Yang, 2021. "An approach for detecting the commonality and specialty between scientific publications and patents," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7445-7475, September.
    6. Gerson Pech & Catarina Delgado, 2020. "Assessing the publication impact using citation data from both Scopus and WoS databases: an approach validated in 15 research fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 909-924, November.
    7. Abdelghani Maddi & Lesya Baudoin, 2022. "The quality of the web of science data: a longitudinal study on the completeness of authors-addresses links," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6279-6292, November.
    8. Weishu Liu & Meiting Huang & Haifeng Wang, 2021. "Same journal but different numbers of published records indexed in Scopus and Web of Science Core Collection: causes, consequences, and solutions," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4541-4550, May.
    9. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    10. Weishu Liu, 2020. "Accuracy of funding information in Scopus: a comparative case study," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 803-811, July.
    11. Wang, Feifei & Dong, Jiaxin & Lu, Wanzhao & Xu, Shuo, 2023. "Collaboration prediction based on multilayer all-author tripartite citation networks: A case study of gene editing," Journal of Informetrics, Elsevier, vol. 17(1).
    12. Xu, Shuo & Hao, Liyuan & Yang, Guancan & Lu, Kun & An, Xin, 2021. "A topic models based framework for detecting and forecasting emerging technologies," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    13. Wang, Feifei & Jia, Chenran & Wang, Xiaohan & Liu, Junwan & Xu, Shuo & Liu, Yang & Yang, Chenyuyan, 2019. "Exploring all-author tripartite citation networks: A case study of gene editing," Journal of Informetrics, Elsevier, vol. 13(3), pages 856-873.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    2. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    3. Junwen Zhu & Fang Liu & Weishu Liu, 2019. "The secrets behind Web of Science’s DOI search," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1745-1753, June.
    4. Shirley Ainsworth & Jane M. Russell, 2018. "Has hosting on science direct improved the visibility of Latin American scholarly journals? A preliminary analysis of data quality," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(3), pages 1463-1484, June.
    5. Junwen Zhu & Guangyuan Hu & Weishu Liu, 2019. "DOI errors and possible solutions for Web of Science," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(2), pages 709-718, February.
    6. Weishu Liu, 2020. "Accuracy of funding information in Scopus: a comparative case study," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 803-811, July.
    7. Xiaoling Huang & Lei Wang & Weishu Liu, 2023. "Identification of national research output using Scopus/Web of Science Core Collection: a revisit and further investigation," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(4), pages 2337-2347, April.
    8. Alessia Cioffi & Sara Coppini & Arcangelo Massari & Arianna Moretti & Silvio Peroni & Cristian Santini & Nooshin Shahidzadeh Asadi, 2022. "Identifying and correcting invalid citations due to DOI errors in Crossref data," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3593-3612, June.
    9. Weishu Liu & Meiting Huang & Haifeng Wang, 2021. "Same journal but different numbers of published records indexed in Scopus and Web of Science Core Collection: causes, consequences, and solutions," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4541-4550, May.
    10. Thelwall, Mike, 2018. "Microsoft Academic automatic document searches: Accuracy for journal articles and suitability for citation analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 1-9.
    11. Weishu Liu, 2021. "A matter of time: publication dates in Web of Science Core Collection," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 849-857, January.
    12. Hui Li & Weishu Liu, 2020. "Same same but different: self-citations identified through Scopus and Web of Science Core Collection," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(3), pages 2723-2732, September.
    13. Abdelghani Maddi & Lesya Baudoin, 2022. "The quality of the web of science data: a longitudinal study on the completeness of authors-addresses links," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6279-6292, November.
    14. Weishu Liu & Li Tang & Guangyuan Hu, 2020. "Funding information in Web of Science: an updated overview," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1509-1524, March.
    15. Guangyuan Hu & Lei Wang & Rong Ni & Weishu Liu, 2020. "Which h-index? An exploration within the Web of Science," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(3), pages 1225-1233, June.
    16. Weishu Liu, 2019. "The data source of this study is Web of Science Core Collection? Not enough," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1815-1824, December.
    17. Fiorenzo Franceschini & Domenico Maisano & Luca Mastrogiacomo, 2016. "Do Scopus and WoS correct “old” omitted citations?," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(2), pages 321-335, May.
    18. Franceschini, Fiorenzo & Maisano, Domenico & Mastrogiacomo, Luca, 2016. "Empirical analysis and classification of database errors in Scopus and Web of Science," Journal of Informetrics, Elsevier, vol. 10(4), pages 933-953.
    19. Christophe Boudry & Ghislaine Chartron, 2017. "Availability of digital object identifiers in publications archived by PubMed," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1453-1469, March.
    20. Sergio Copiello, 2019. "The open access citation premium may depend on the openness and inclusiveness of the indexing database, but the relationship is controversial because it is ambiguous where the open access boundary lie," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(2), pages 995-1018, November.
