Printed from https://ideas.repec.org/a/bla/jinfst/v72y2021i12p1461-1476.html

Prevalence of nonsensical algorithmically generated papers in the scientific literature

Authors

  • Guillaume Cabanac
  • Cyril Labbé

Abstract

In 2014, leading publishers withdrew more than 120 nonsensical publications automatically generated with the SCIgen program. Casual observations suggested that similar problematic papers are still published and sold, without follow-up retractions. No systematic screening has been performed, and the prevalence of such nonsensical publications in the scientific literature is unknown. Our contribution is twofold. First, we designed a detector that combs the scientific literature for grammar-based computer-generated papers. Applied to SCIgen, it has a precision of 83.6%. Second, we performed a scientometric study of the 243 detected SCIgen papers from 19 publishers. We estimate the prevalence of SCIgen papers to be 75 per million papers in Information and Computing Sciences. Only 19% of the 243 problematic papers were dealt with, through formal retraction (12) or silent removal (34). Publishers still serve and sometimes sell the remaining 197 papers without any caveat. We found evidence of citation manipulation via edited SCIgen bibliographies. This work reveals metric gaming to the point of absurdity: fraudsters publish nonsensical algorithmically generated papers featuring genuine references. It stresses the need to screen papers for nonsense before peer review and to chase citation manipulation in published papers. Overall, this is yet another illustration of the harmful effects of the pressure to publish or perish.
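
The counts quoted in the abstract can be cross-checked with simple arithmetic. The following is a minimal Python sketch and is not taken from the paper: the variable names are illustrative, and the only inputs are the figures stated above (243 detected papers, 12 formal retractions, 34 silent removals).

```python
# Figures quoted in the abstract; variable names are illustrative only.
detected = 243          # SCIgen papers detected across 19 publishers
retracted = 12          # formally retracted
silently_removed = 34   # removed without a retraction notice

# Papers that publishers acted on, and those still being served.
dealt_with = retracted + silently_removed
still_served = detected - dealt_with
share_dealt_with = dealt_with / detected

print(f"dealt with: {dealt_with} ({share_dealt_with:.1%})")  # dealt with: 46 (18.9%)
print(f"still served: {still_served}")                       # still served: 197
```

The 46 papers that were retracted or removed are 18.9% of the 243, which the abstract rounds to 19%, and the remainder matches the 197 papers reported as still served.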

Suggested Citation

  • Guillaume Cabanac & Cyril Labbé, 2021. "Prevalence of nonsensical algorithmically generated papers in the scientific literature," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 72(12), pages 1461-1476, December.
  • Handle: RePEc:bla:jinfst:v:72:y:2021:i:12:p:1461-1476
    DOI: 10.1002/asi.24495

    References listed on IDEAS

    1. Paul Ginsparg, 2014. "ArXiv screens spot fake papers," Nature, Nature, vol. 508(7494), pages 44-44, April.
    2. Philip Ball, 2005. "Computer conference welcomes gobbledegook paper," Nature, Nature, vol. 434(7036), pages 946-946, April.
    3. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    4. Priyanka Pulla, 2019. "The plan to mine the world’s research papers," Nature, Nature, vol. 571(7765), pages 316-318, July.
    5. Anne-Wil Harzing, 2019. "Two new kids on the block: How do Crossref and Dimensions compare with Google Scholar, Microsoft Academic, Scopus and the Web of Science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 341-349, July.
    6. Guillaume Cabanac, 2016. "Bibliogifts in LibGen? A study of a text-sharing platform driven by biblioleaks and crowdsourcing," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(4), pages 874-884, April.

    Citations

    Cited by:

    1. Ludovic Jeanne, 2024. "Textual imitations and artificial intelligence : a prospective essay on academic fraud," Post-Print hal-04794323, HAL.
    2. Fang Lei & Liang Du & Min Dong & Xuemei Liu, 2024. "Global retractions due to randomly generated content: Characterization and trends," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(12), pages 7943-7958, December.
    3. Howell, Bronwyn E. & Potgieter, Petrus H., 2023. "AI-generated lemons: a sour outlook for content producers?," 32nd European Regional ITS Conference, Madrid 2023: Realising the digital decade in the European Union – Easier said than done? 277971, International Telecommunications Society (ITS).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jennifer A. Byrne & Cyril Labbé, 2017. "Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(3), pages 1471-1493, March.
    2. Nguyen Minh Tien & Cyril Labbé, 2018. "Detecting automatically generated sentences with grammatical structure similarity," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 1247-1271, August.
    3. Hamid R. Jamali, 2017. "Copyright compliance and infringement in ResearchGate full-text journal articles," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(1), pages 241-254, July.
    4. Kyle J. Burghardt & Bradley H. Howlett & Audrey S. Khoury & Stephanie M. Fern & Paul R. Burghardt, 2020. "Three Commonly Utilized Scholarly Databases and a Social Network Site Provide Different, But Related, Metrics of Pharmacy Faculty Publication," Publications, MDPI, vol. 8(2), pages 1-10, April.
    5. Adélie Ranville & Marcos Barros, 2022. "Towards Normative Theories of Social Entrepreneurship. A Review of the Top Publications of the Field," Journal of Business Ethics, Springer, vol. 180(2), pages 407-438, October.
    6. Marek Kwiek & Wojciech Roszka, 2022. "Academic vs. biological age in research on academic careers: a large-scale study with implications for scientifically developing systems," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3543-3575, June.
    7. Vivek Kumar Singh & Prashasti Singh & Mousumi Karmakar & Jacqueline Leta & Philipp Mayr, 2021. "The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(6), pages 5113-5142, June.
    8. Kilian Buehling & Matthias Geissler & Dorothea Strecker, 2022. "Free access to scientific literature and its influence on the publishing activity in developing countries: The effect of Sci‐Hub in the field of mathematics," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(9), pages 1336-1355, September.
    9. Toluwase Victor Asubiaro & Sodiq Onaolapo, 2023. "A comparative study of the coverage of African journals in Web of Science, Scopus, and CrossRef," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(7), pages 745-758, July.
    10. Juan Andrés Cabral & Florencia Iara Pucci, 2020. "¿Cuál es el alcance de la revolución de la credibilidad?," Asociación Argentina de Economía Política: Working Papers 4318, Asociación Argentina de Economía Política.
    11. de Arruda, Henrique F. & Silva, Filipi N. & Comin, Cesar H. & Amancio, Diego R. & Costa, Luciano da F., 2019. "Connecting network science and information theory," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 515(C), pages 641-648.
    12. Antonio Miceli & Birgit Hagen & Maria Pia Riccardi & Francesco Sotti & Davide Settembre-Blundo, 2021. "Thriving, Not Just Surviving in Changing Times: How Sustainability, Agility and Digitalization Intertwine with Organizational Resilience," Sustainability, MDPI, vol. 13(4), pages 1-17, February.
    13. Steve J. Bickley & Ho Fai Chan & Benno Torgler, 2022. "Artificial intelligence in the field of economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 2055-2084, April.
    14. Diego Raphael Amancio, 2015. "Comparing the topological properties of real and artificially generated scientific manuscripts," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1763-1779, December.
    15. Ahmed Idi Kato, 2023. "Unlocking the Potential of Microfinance Solutions on Urban Woman Entrepreneurship Development in East Africa: A Bibliometric Analysis Perspective," Sustainability, MDPI, vol. 15(20), pages 1-22, October.
    16. Corrêa, Edilson A. & Amancio, Diego R., 2019. "Word sense induction using word embeddings and community detection in complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 180-190.
    17. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    18. Corrêa, Edilson A. & Marinho, Vanessa Q. & Amancio, Diego R., 2020. "Semantic flow in language networks discriminates texts by genre and publication date," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 557(C).
    19. Mehdi Toloo & Rouhollah Khodabandelou & Amar Oukil, 2022. "A Comprehensive Bibliometric Analysis of Fractional Programming (1965–2020)," Mathematics, MDPI, vol. 10(11), pages 1-21, May.
    20. Hunter Bennett & Flynn Slattery, 2023. "Graphical abstracts are associated with greater Altmetric attention scores, but not citations, in sport science," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(6), pages 3793-3804, June.
