Author
Listed:
- Michael Owusu-Adjei
- James Ben Hayfron-Acquah
- Twum Frimpong
- Gaddafi Abdul-Salaam
Abstract
Most research studies focus extensively on predictive algorithms and their performance evaluation in order to determine the best or most appropriate predictive model, with the optimum prediction solution indicated by metrics such as prediction accuracy score, precision, recall, and F1 score. Prediction accuracy score has been used extensively as the main metric for performance recommendation. It is one of the most widely used metrics for identifying an optimal prediction solution, irrespective of the nature of the dataset or the distribution of the output class between minority and majority classes. The key research question, however, is how class inequality affects prediction accuracy score in datasets with output class distribution imbalance, as compared with balanced accuracy score, when determining model performance in healthcare and other real-world application systems. Answering this question requires an appraisal of the current state of knowledge on the use of both prediction accuracy score and balanced accuracy score in real-world applications where class distribution is unequal. A review of related works that highlight the use of imbalanced class distribution datasets with evaluation metrics will assist in contextualizing this systematic review.
Author summary: The incidence of unequal class distribution in real-world applications, in healthcare and in non-medical settings, continues to receive attention because machine learning techniques have difficulty with the minority class contribution in datasets with imbalanced class distribution. One such challenge is the discounting of the minority class contribution, which may be the subject of interest. Evaluating predictive models on such datasets with prediction accuracy score, which does not take into account variation in dataset class distribution, could create an erroneous impression of a supposedly high-performing machine learning technique because it discounts the minority class contribution. Estimating predictive model performance with balanced accuracy score, which incorporates other important metrics such as true positives, true positive rates, true negatives, true negative rates, false positives, false positive rates, false negatives, and false negative rates, could help assess machine learning model performance more adequately and accurately.
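The contrast the abstract draws can be illustrated with a small worked example. The sketch below is not taken from the paper: the data are hypothetical (95 negative and 5 positive cases, with a model that always predicts the majority class), the function names are illustrative, and balanced accuracy is computed with the common binary definition, the mean of the true positive rate and the true negative rate.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of true positive rate and true negative rate (binary case)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity / recall
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    return (tpr + tnr) / 2


# Hypothetical imbalanced dataset: 95 majority (0) and 5 minority (1) cases.
y_true = [0] * 95 + [1] * 5
# Hypothetical model that ignores the minority class entirely.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)            # 0.95
bal = balanced_accuracy(y_true, y_pred)   # 0.5

print(f"accuracy          = {acc:.2f}")
print(f"balanced accuracy = {bal:.2f}")
```

Under these assumptions the model scores 0.95 on accuracy while never identifying a minority case, whereas balanced accuracy falls to 0.5, the value expected from a classifier with no discriminating ability.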
Suggested Citation
Michael Owusu-Adjei & James Ben Hayfron-Acquah & Twum Frimpong & Gaddafi Abdul-Salaam, 2023.
"Imbalanced class distribution and performance evaluation metrics: A systematic review of prediction accuracy for determining model performance in healthcare systems,"
PLOS Digital Health, Public Library of Science, vol. 2(11), pages 1-19, November.
Handle:
RePEc:plo:pdig00:0000290
DOI: 10.1371/journal.pdig.0000290