Printed from https://ideas.repec.org/a/spr/scient/v124y2020i1d10.1007_s11192-020-03430-8.html

How many preprints have actually been printed and why: a case study of computer science preprints on arXiv

Author

Listed:
  • Jialiang Lin (Xiamen University)
  • Yao Yu (Xiamen University)
  • Yu Zhou (Xiamen University)
  • Zhiyang Zhou (Xiamen University)
  • Xiaodong Shi (Xiamen University)

Abstract

Preprints play an increasingly critical role in academic communities. There are many reasons driving researchers to post their manuscripts to preprint servers before formal submission to journals or conferences, but the use of preprints has also sparked considerable controversy, especially surrounding the claim of priority. In this paper, a case study of computer science preprints submitted to arXiv from 2008 to 2017 is conducted to quantify how many preprints have eventually been printed in peer-reviewed venues. Among those published manuscripts, some are published under different titles and without an update to their preprints on arXiv. In the case of these manuscripts, the traditional fuzzy matching method is incapable of mapping the preprint to the final published version. In view of this issue, we introduce a semantics-based mapping method with the employment of Bidirectional Encoder Representations from Transformers (BERT). With this new mapping method and a plurality of data sources, we find that 66% of all sampled preprints are published under unchanged titles and 11% are published under different titles and with other modifications. A further analysis was then performed to investigate why these preprints but not others were accepted for publication. Our comparison reveals that in the field of computer science, published preprints feature adequate revisions, multiple authorship, detailed abstract and introduction, extensive and authoritative references and available source code.
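
The title-based fuzzy matching that the abstract describes as insufficient can be illustrated with a minimal Python sketch. This is not the authors' implementation: the normalization rule, the use of the standard library's difflib.SequenceMatcher, the 0.9 threshold, and the function names are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def title_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(preprint_title, candidates, threshold=0.9):
    """Return (candidate, score) for the most similar candidate title,
    or None if no candidate reaches the (assumed) threshold."""
    scored = [(c, title_similarity(preprint_title, c)) for c in candidates]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None
```

A matcher of this kind links a preprint to a published record only when the two titles are nearly identical as strings. A paper that was retitled before publication would typically fall below the threshold and go unmatched, which is the gap the semantics-based mapping described in the abstract is meant to close.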

Suggested Citation

  • Jialiang Lin & Yao Yu & Yu Zhou & Zhiyang Zhou & Xiaodong Shi, 2020. "How many preprints have actually been printed and why: a case study of computer science preprints on arXiv," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(1), pages 555-574, July.
  • Handle: RePEc:spr:scient:v:124:y:2020:i:1:d:10.1007_s11192-020-03430-8
    DOI: 10.1007/s11192-020-03430-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03430-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Martin-Martin, Alberto & Orduna-Malea, Enrique & Harzing, Anne-Wil & Delgado López-Cózar, Emilio, 2017. "Can we use Google Scholar to identify highly-cited documents?," Journal of Informetrics, Elsevier, vol. 11(1), pages 152-163.
    2. Philip M. Davis & Michael J. Fromerth, 2007. "Does the arXiv lead to higher citations and reduced publisher downloads for mathematics articles?," Scientometrics, Springer;Akadémiai Kiadó, vol. 71(2), pages 203-215, May.
    3. Björk, Bo-Christer & Solomon, David, 2013. "The publishing delay in scholarly peer-reviewed journals," Journal of Informetrics, Elsevier, vol. 7(4), pages 914-923.
    4. George Vrettas & Mark Sanderson, 2015. "Conferences versus journals in computer science," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 66(12), pages 2674-2684, December.
    5. Antonio Cavacini, 2015. "What is the best database for computer science journal articles?," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2059-2071, March.
    6. Vincent Larivière & Cassidy R. Sugimoto & Benoit Macaluso & Staša Milojević & Blaise Cronin & Mike Thelwall, 2014. "arXiv E-prints and the journal of record: An analysis of roles and relationships," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(6), pages 1157-1169, June.
    7. Tim Brody & Stevan Harnad & Leslie Carr, 2006. "Earlier Web usage statistics as predictors of later citation impact," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(8), pages 1060-1072, June.
    8. Kayvan Kousha & Mike Thelwall, 2008. "Sources of Google Scholar citations outside the Science Citation Index: A comparison between four science disciplines," Scientometrics, Springer;Akadémiai Kiadó, vol. 74(2), pages 273-294, February.
    9. Martín-Martín, Alberto & Orduna-Malea, Enrique & Thelwall, Mike & Delgado López-Cózar, Emilio, 2018. "Google Scholar, Web of Science, and Scopus: A systematic comparison of citations in 252 subject categories," Journal of Informetrics, Elsevier, vol. 12(4), pages 1160-1177.
    10. Paul Ginsparg, 2011. "ArXiv at 20," Nature, Nature, vol. 476(7359), pages 145-147, August.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Tang, Xuli & Li, Xin & Ding, Ying & Song, Min & Bu, Yi, 2020. "The pace of artificial intelligence innovations: Speed, talent, and trial-and-error," Journal of Informetrics, Elsevier, vol. 14(4).
    2. Guillaume Cabanac & Theodora Oikonomidi & Isabelle Boutron, 2021. "Day-to-day discovery of preprint–publication links," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(6), pages 5285-5304, June.
    3. Jialiang Lin & Yao Yu & Jiaxin Song & Xiaodong Shi, 2022. "Detecting and analyzing missing citations to published scientific entities," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2395-2412, May.
    4. Akbaritabar, Aliakbar & Stephen, Dimity & Squazzoni, Flaminio, 2022. "A study of referencing changes in preprint-publication pairs across multiple fields," Journal of Informetrics, Elsevier, vol. 16(2).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Zhiqi & Chen, Yue & Glänzel, Wolfgang, 2020. "Preprints as accelerator of scholarly communication: An empirical analysis in Mathematics," Journal of Informetrics, Elsevier, vol. 14(4).
    2. Zhiqi Wang & Wolfgang Glänzel & Yue Chen, 2020. "The impact of preprints in Library and Information Science: an analysis of citations, usage and social attention indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(2), pages 1403-1423, November.
    3. Tanya Araújo & Elsa Fontainha, 2018. "Are scientific memes inherited differently from gendered authorship?," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(2), pages 953-972, November.
    4. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    5. Sergio Copiello, 2019. "The open access citation premium may depend on the openness and inclusiveness of the indexing database, but the relationship is controversial because it is ambiguous where the open access boundary lies," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(2), pages 995-1018, November.
    6. Liwei Zhang & Jue Wang, 2021. "What affects publications’ popularity on Twitter?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(11), pages 9185-9198, November.
    7. Michael Gusenbauer, 2022. "Search where you will find most: Comparing the disciplinary coverage of 56 bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2683-2745, May.
    8. Matthew Cobb, 2017. "The prehistory of biology preprints: A forgotten experiment from the 1960s," PLOS Biology, Public Library of Science, vol. 15(11), pages 1-12, November.
    9. Vivek Kumar Singh & Satya Swarup Srichandan & Hiran H. Lathabai, 2022. "ResearchGate and Google Scholar: how much do they differ in publications, citations and different metrics and why?," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(3), pages 1515-1542, March.
    10. Rongying Zhao & Mingkun Wei, 2017. "Academic impact evaluation of Wechat in view of social media perspective," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1777-1791, September.
    11. Michael Gusenbauer, 2019. "Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 177-214, January.
    12. Alberto Martín-Martín & Enrique Orduna-Malea & Emilio Delgado López-Cózar, 2018. "Coverage of highly-cited documents in Google Scholar, Web of Science, and Scopus: a multidisciplinary comparison," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 2175-2188, September.
    13. Frandsen, Tove Faber, 2009. "The effects of open access on un-published documents: A case study of economics working papers," Journal of Informetrics, Elsevier, vol. 3(2), pages 124-133.
    14. Alberto Martín-Martín & Mike Thelwall & Enrique Orduna-Malea & Emilio Delgado López-Cózar, 2021. "Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: a multidisciplinary comparison of coverage via citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 871-906, January.
    15. Csomós, György, 2020. "Introducing recalibrated academic performance indicators in the evaluation of individuals’ research performance: A case study from Eastern Europe," Journal of Informetrics, Elsevier, vol. 14(4).
    16. Casey Eaton & Amanda Banks & Kristin Weger & Bryan Mesmer & Robert Moreland, 2023. "Understanding perceived influencers on project outcomes and quantifying disciplinary similarities in academic literature," Systems Research and Behavioral Science, Wiley Blackwell, vol. 40(3), pages 460-487, May.
    17. David Jancsics & Salvador Espinosa & Jonathan Carlos, 2023. "Organizational noncompliance: an interdisciplinary review of social and organizational factors," Management Review Quarterly, Springer, vol. 73(3), pages 1273-1301, September.
    18. Jingqi Gao & Xiang Wu & Xiaowei Luo & Shukai Guan, 2021. "Scientometric Analysis of Safety Sign Research: 1990–2019," IJERPH, MDPI, vol. 18(1), pages 1-15, January.
    19. Cristina Robledo-Ardila & Juan Pablo Román-Calderón, 2022. "Potential: in search for meaning, theory and avenues for future research a systematic review," Management Review Quarterly, Springer, vol. 72(1), pages 149-186, February.
    20. Citron, Daniel T. & Way, Samuel F., 2018. "Network assembly of scientific communities of varying size and specificity," Journal of Informetrics, Elsevier, vol. 12(1), pages 181-190.
