Author
Listed:
- Tulsi Suchak
- Anietie E Aliu
- Charlie Harrison
- Reyer Zwiggelaar
- Nophar Geifman
- Matt Spick
Abstract
With the growth of artificial intelligence (AI)-ready datasets such as the National Health and Nutrition Examination Survey (NHANES), new opportunities for data-driven research are being created, but also generating risks of data exploitation by paper mills. In this work, we focus on two areas of potential concern for AI-supported research efforts. First, we describe the production of large numbers of formulaic single-factor analyses, relating single predictors to specific health conditions, where multifactorial approaches would be more appropriate. Employing AI-supported single-factor approaches removes context from research, fails to capture interactions, avoids false discovery correction, and is an approach that can easily be adopted by paper mills. Second, we identify risks of selective data usage, such as analyzing limited date ranges or cohort subsets without clear justification, suggestive of data dredging, and post-hoc hypothesis formation. Using a systematic literature search for single-factor analyses, we identified 341 NHANES-derived research papers published over the past decade, each proposing an association between a predictor and a health condition from the wide range contained within NHANES. We found evidence that research failed to take account of multifactorial relationships, that manuscripts did not account for the risks of false discoveries, and that researchers selectively extracted data from NHANES rather than utilizing the full range of data available. Given the explosion of AI-assisted productivity in published manuscripts (the systematic search strategy used here identified an average of 4 papers per annum from 2014 to 2021, but 190 in 2024–9 October alone), we highlight a set of best practices to address these concerns, aimed at researchers, data controllers, publishers, and peer reviewers, to encourage improved statistical practices and mitigate the risks of paper mills using AI-assisted workflows to introduce low-quality manuscripts to the scientific literature.The combination of AI and national health databases offers opportunities, but may also be exploited by unethical agents. This study shows that there has been an explosion of formulaic research articles, including inappropriate study designs and false discoveries, that use data from the US NHANES resource, providing a case study in new unethical research practices, including paper mills.
Suggested Citation
Tulsi Suchak & Anietie E Aliu & Charlie Harrison & Reyer Zwiggelaar & Nophar Geifman & Matt Spick, 2025.
"Explosion of formulaic research articles, including inappropriate study designs and false discoveries, based on the NHANES US national health database,"
PLOS Biology, Public Library of Science, vol. 23(5), pages 1-15, May.
Handle:
RePEc:plo:pbio00:3003152
DOI: 10.1371/journal.pbio.3003152
Download full text from publisher
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pbio00:3003152. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosbiology (email available below). General contact details of provider: https://journals.plos.org/plosbiology/ .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.