Explosion of formulaic research articles, including inappropriate study designs and false discoveries, based on the NHANES US national health database

Authors

Tulsi Suchak
Anietie E Aliu
Charlie Harrison
Reyer Zwiggelaar
Nophar Geifman
Matt Spick

Abstract

The growth of artificial intelligence (AI)-ready datasets such as the National Health and Nutrition Examination Survey (NHANES) creates new opportunities for data-driven research, but also creates a risk that paper mills will exploit the data. In this work, we focus on two areas of potential concern for AI-supported research efforts. First, we describe the production of large numbers of formulaic single-factor analyses, each relating a single predictor to a specific health condition, where multifactorial approaches would be more appropriate. AI-supported single-factor approaches remove context from research, fail to capture interactions, avoid false discovery correction, and can easily be adopted by paper mills. Second, we identify risks of selective data usage, such as analyzing limited date ranges or cohort subsets without clear justification, which is suggestive of data dredging and post-hoc hypothesis formation. Using a systematic literature search for single-factor analyses, we identified 341 NHANES-derived research papers published over the past decade, each proposing an association between a predictor and a health condition from the wide range contained within NHANES. We found evidence that research failed to take account of multifactorial relationships, that manuscripts did not account for the risks of false discoveries, and that researchers selectively extracted data from NHANES rather than utilizing the full range of data available. Given the explosion of AI-assisted productivity in published manuscripts (the systematic search strategy used here identified an average of 4 papers per annum from 2014 to 2021, but 190 in 2024 up to 9 October alone), we highlight a set of best practices to address these concerns, aimed at researchers, data controllers, publishers, and peer reviewers, to encourage improved statistical practices and mitigate the risks of paper mills using AI-assisted workflows to introduce low-quality manuscripts to the scientific literature.

The combination of AI and national health databases offers opportunities, but may also be exploited by unethical agents. This study shows that there has been an explosion of formulaic research articles, including inappropriate study designs and false discoveries, that use data from the US NHANES resource, providing a case study in new unethical research practices, including paper mills.
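The abstract's concern about false discovery correction can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure as the correction; the abstract does not say which method the authors have in mind, and the p-values below are hypothetical, not taken from the paper. The sketch treats ten single-factor associations with the same outcome as one family of tests and compares the naive count of "significant" results with the count that survives correction.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from ten separate single-predictor analyses
# of the same health outcome (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

naive = [p < 0.05 for p in p_values]        # each paper judged in isolation
corrected = benjamini_hochberg(p_values)    # all ten treated as one family

print("significant without correction:", sum(naive))
print("significant after Benjamini-Hochberg:", sum(corrected))
```

In this illustration, five associations pass the conventional 0.05 threshold when each is reported on its own, but only two survive once the ten tests are corrected together, which is the kind of gap that publishing one predictor per manuscript hides.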

Suggested Citation

Tulsi Suchak & Anietie E Aliu & Charlie Harrison & Reyer Zwiggelaar & Nophar Geifman & Matt Spick, 2025. "Explosion of formulaic research articles, including inappropriate study designs and false discoveries, based on the NHANES US national health database," PLOS Biology, Public Library of Science, vol. 23(5), pages 1-15, May.

Handle: RePEc:plo:pbio00:3003152
DOI: 10.1371/journal.pbio.3003152
