Printed from https://ideas.repec.org/a/plo/pone00/0330050.html

Resampling methods for class imbalance in clinical prediction models: A scoping review protocol

Author

Listed:
  • Osama Abdelhay
  • Adam Shatnawi
  • Hassan Najadat
  • Taghreed Altamimi

Abstract

Introduction: Class imbalance, where clinically important “positive” cases make up less than 30% of the dataset, systematically reduces the sensitivity and fairness of medical prediction models. Data-level techniques (random oversampling, random undersampling, SMOTE) and algorithm-level approaches such as cost-sensitive learning are widely used, but the empirical evidence on when these corrections improve model performance remains scattered across diseases and modelling frameworks. This protocol outlines a scoping systematic review with meta-regression that will map and quantitatively summarise 15 years of research on resampling strategies in imbalanced clinical datasets, addressing a key methodological gap in reliable medical AI.

Methods and analysis: We will search MEDLINE, EMBASE, Scopus, Web of Science Core Collection, and IEEE Xplore, along with grey literature sources (medRxiv, arXiv, bioRxiv), for primary studies (2009 to 31 December 2024) that apply at least one resampling or cost-sensitive strategy to binary clinical prediction tasks with a minority-class prevalence of less than 30%. There will be no language restrictions. Two reviewers will screen records, extract data using a piloted form, and document the process in a PRISMA flow diagram. A descriptive synthesis will catalogue the clinical domain, sample size, imbalance ratio, resampling strategy, model type, and performance metrics. Where 10 or more studies report compatible AUCs, a random-effects (mixed-effects) meta-regression of logit-transformed AUC will examine the effect of moderators, including imbalance ratio, resampling strategy, model family, and sample size. Small-study effects will be assessed with funnel plots, Egger’s test, trim-and-fill, and weight-function models; influence diagnostics and leave-one-out analyses will evaluate robustness.
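
To make the logit-transformed AUC synthesis concrete, the following is a minimal sketch of the intercept-only case (pooling without moderators). It rests on assumptions not stated in the abstract: the DerSimonian-Laird estimator for between-study variance, delta-method standard errors on the logit scale, and four invented AUC values used purely for illustration (the protocol itself requires 10 or more compatible studies). The function name `pool_logit_auc` is hypothetical.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the log-odds scale."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit: map log-odds back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_auc(aucs, ses):
    """Random-effects pooling of AUCs on the logit scale.

    Assumption: DerSimonian-Laird tau^2; the protocol does not name an estimator.
    Returns (pooled logit AUC, its standard error, tau^2).
    """
    y = [logit(a) for a in aucs]
    # Delta method: SE(logit AUC) = SE(AUC) / (AUC * (1 - AUC))
    v = [(se / (a * (1.0 - a))) ** 2 for a, se in zip(aucs, ses)]
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q and the DerSimonian-Laird moment estimator of tau^2
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    return mu, se_mu, tau2


# Hypothetical study-level AUCs and standard errors (illustration only)
aucs = [0.78, 0.85, 0.71, 0.90]
ses = [0.03, 0.02, 0.04, 0.015]

mu, se_mu, tau2 = pool_logit_auc(aucs, ses)
pooled_auc = expit(mu)
ci_low, ci_high = expit(mu - 1.96 * se_mu), expit(mu + 1.96 * se_mu)
print(f"pooled AUC = {pooled_auc:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f}), tau^2 = {tau2:.3f}")
```

Working on the logit scale keeps the back-transformed estimate and its interval inside (0, 1). A meta-regression with moderators would extend this by replacing the single pooled mean with a weighted regression on study-level covariates.
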
Because this is a methodological review, formal clinical risk-of-bias tools are optional; design-level screening, influence diagnostics, and sensitivity analyses will instead support transparency.

Discussion: By combining a comprehensive conceptual framework with quantitative estimates, this review aims to determine when data-level versus algorithm-level balancing leads to genuine improvements in discrimination, calibration, and cost-sensitive metrics across medical fields. The findings will help researchers select concise, evidence-based methods for addressing imbalance, inform journal and regulatory reporting standards, and identify research gaps, such as the under-reporting of calibration and misclassification costs, that must be addressed before balanced models can be reliably trusted in clinical practice.

Systematic review registration: INPLASY202550026.
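
The distinction between data-level and algorithm-level balancing can be illustrated with a small sketch. It is not taken from the protocol: the helper names (`oversample_indices`, `undersample_indices`, `balanced_class_weights`) are hypothetical, the toy labels are invented, and inverse-frequency weighting is only one common form of cost-sensitive learning.

```python
import random
from collections import Counter


def oversample_indices(y, minority=1, seed=0):
    """Data-level: duplicate random minority rows until the classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == minority]
    neg = [i for i, label in enumerate(y) if label != minority]
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    return neg + pos + extra


def undersample_indices(y, minority=1, seed=0):
    """Data-level: keep a random majority subset the same size as the minority class.

    Assumes `minority` really is the smaller class.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == minority]
    neg = [i for i, label in enumerate(y) if label != minority]
    return rng.sample(neg, len(pos)) + pos


def balanced_class_weights(y):
    """Algorithm-level: inverse-frequency weights; the data are left unchanged."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Toy labels: 4 positives in 20 records (20% prevalence, below the 30% threshold)
y = [1] * 4 + [0] * 16

over = Counter(y[i] for i in oversample_indices(y))
under = Counter(y[i] for i in undersample_indices(y))
weights = balanced_class_weights(y)

print("oversampled class counts:", dict(over))
print("undersampled class counts:", dict(under))
print("class weights:", weights)
```

Resampling changes the training set itself, whereas class weights change how much each misclassification costs during fitting. In either case the correction should be applied to training data only, since resampling before a train/test split leaks duplicated records into the evaluation set.
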

Suggested Citation

  • Osama Abdelhay & Adam Shatnawi & Hassan Najadat & Taghreed Altamimi, 2025. "Resampling methods for class imbalance in clinical prediction models: A scoping review protocol," PLOS ONE, Public Library of Science, vol. 20(11), pages 1-11, November.
  • Handle: RePEc:plo:pone00:0330050
    DOI: 10.1371/journal.pone.0330050

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0330050
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0330050&type=printable
    Download Restriction: no
