Authors listed:
- Samantha Kanny
- Grisha Post
- Patricia Carbajales-Dale
- William Cummings
- Janet Evatt
- Windsor Westbrook Sherrill
Abstract
Approximately 11.6% of Americans have diabetes, and South Carolina has one of the highest rates of diabetes among adults. Diabetes self-management programs have been shown to be effective in promoting weight loss and improving diabetes knowledge and self-care behaviors. Retaining vulnerable individuals in these programs is critical to supporting the growing diabetic population. The use of machine learning is growing in healthcare settings. The objective of this study is to assess the effectiveness of several machine learning methods in predicting attrition from a diabetes self-management program, using participant demographics and various evaluation measures. Data were collected from participants enrolled in Health Extension for Diabetes (HED). Descriptive statistics were used to examine HED participant demographics, while Mann-Whitney U tests and chi-square tests were used to examine relationships between demographics and pre-program evaluation measures. Across the analyses, health-related measures (specifically the SF-12 quality of life scores and the Distressed Communities Index (DCI) score), along with demographic factors (race, age, height, and educational attainment) and spatial variables (drive time to the nearest grocery store), emerged as influential predictors of attrition. However, the machine learning models showed poor overall performance, with AUC values ranging from 0.53 to 0.64 and F1 scores between 0.19 and 0.36, indicating low predictive power. Among the models tested, XGBoost with downsampling yielded the highest AUC (0.64) and a slightly higher F1 score (0.36). To enhance model interpretability, SHAP (SHapley Additive exPlanations) was applied. While these models are not suitable for accurately predicting individual attrition risk in diabetes self-management programs, they identify potential factors influencing dropout rates.
These findings underscore how difficult it is for models to accurately predict health behavior outcomes and highlight the need for future research to improve predictive modeling in support of patient engagement and retention.
Author summary
Approximately 11.6% of Americans have diabetes, and South Carolina has one of the highest diabetes rates among adults. Diabetes self-management programs have proven effective, making it critical to retain vulnerable individuals in these programs to address the growing diabetic population. Machine learning is gaining popularity in healthcare settings due to its ability to predict patient outcomes. This study examined the effectiveness of machine learning in predicting attrition from a diabetes self-management program, using participant demographics and evaluation measures. Although none of the models could accurately predict individual attrition risk in diabetes self-management programs, they did identify quality of life scores, the Distressed Communities Index (DCI) score, race, age, height, and drive time to the nearest grocery store as influential features in predicting attrition. In healthcare, datasets are commonly imbalanced because behavior-based outcomes such as treatment nonadherence or program attrition have low prevalence, which poses challenges for training machine learning models. While balancing the dataset slightly improved attrition prediction, the models still showed weak predictive ability. These findings underscore how difficult it is for models to accurately predict health behavior outcomes and highlight the need for future research to improve predictive modeling in support of patient engagement and retention.
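The summary describes attrition as a low-prevalence outcome, notes that the dataset was balanced (the abstract names downsampling), and reports performance as AUC and F1. The following is a minimal, stdlib-only Python sketch of those three ideas under stated assumptions: labels are 0/1 with 1 meaning a participant dropped out, `scores` are hypothetical predicted probabilities from some classifier, and the helper names `downsample`, `auc`, and `f1` are illustrative, not the study's code. It does not reproduce the authors' XGBoost or SHAP pipeline. AUC is computed through its Mann-Whitney U interpretation: the fraction of (dropout, completer) pairs in which the dropout receives the higher score.

```python
import random

# Assumption: labels are 0/1, where 1 marks a participant who dropped out (attrition)
# and 0 marks a completer. All data below is hypothetical toy data, not study data.


def downsample(labels, seed=0):
    """Return sorted row indices of a class-balanced subset.

    Keeps every minority-class row and a random sample of majority-class rows
    of the same size (random undersampling).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    return sorted(kept)


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic.

    Equals the fraction of (positive, negative) pairs in which the positive
    row gets the higher score; ties count one half. Both classes must be present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    u = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u / (len(pos) * len(neg))


def f1(labels, scores, threshold=0.5):
    """F1 score for the positive (attrition) class after thresholding scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # recall is zero; precision is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: 6 completers, 2 dropouts (an imbalanced outcome).
    labels = [0, 0, 0, 0, 0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]

    balanced = downsample(labels, seed=0)
    print("balanced rows:", balanced)             # 2 dropouts + 2 sampled completers
    print("AUC:", round(auc(labels, scores), 3))  # 0.583
    print("F1:", round(f1(labels, scores), 3))    # 0.4
```

In this toy example, 7 of the 12 (dropout, completer) pairs are ranked correctly, giving an AUC of about 0.58. An AUC of 0.5 corresponds to random ranking, which is why values in the 0.53 to 0.64 range indicate weak discrimination. In common practice, downsampling is applied only to the training split so that evaluation metrics reflect the original class balance; whether that matches the study's protocol is an assumption here.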
Suggested Citation
Samantha Kanny & Grisha Post & Patricia Carbajales-Dale & William Cummings & Janet Evatt & Windsor Westbrook Sherrill, 2025.
"A comparative approach of machine learning models to predict attrition in a diabetes management program,"
PLOS Digital Health, Public Library of Science, vol. 4(7), pages 1-20, July.
Handle: RePEc:plo:pdig00:0000930
DOI: 10.1371/journal.pdig.0000930