Authors
Listed:
- Yilin Ning
- Siqi Li
- Marcus Eng Hock Ong
- Feng Xie
- Bibhas Chakraborty
- Daniel Shu Wei Ting
- Nan Liu
Abstract
Risk scores are widely used for clinical decision making and commonly generated from logistic regression models. Machine-learning-based methods may work well for identifying important predictors to create parsimonious scores, but such ‘black box’ variable selection limits interpretability, and variable importance evaluated from a single model can be biased. We propose a robust and interpretable variable selection approach using the recently developed Shapley variable importance cloud (ShapleyVIC) that accounts for variability in variable importance across models. Our approach evaluates and visualizes overall variable contributions for in-depth inference and transparent variable selection, and filters out non-significant contributors to simplify model building steps. We derive an ensemble variable ranking from variable contributions across models, which is easily integrated with an automated and modularized risk score generator, AutoScore, for convenient implementation. In a study of early death or unplanned readmission after hospital discharge, ShapleyVIC selected 6 variables from 41 candidates to create a well-performing risk score, which had similar performance to a 16-variable model from machine-learning-based ranking. Our work contributes to the recent emphasis on interpretability of prediction models for high-stakes decision making, providing a disciplined solution to detailed assessment of variable importance and transparent development of parsimonious clinical risk scores.
Author summary
Risk scores help clinicians quickly assess the risk for a patient by adding up a few scores associated with key predictors. Given the simplicity of such scores, shortlisting the most important predictors is key to predictive performance, but traditional methods are sometimes insufficient when there are many candidates to choose from.
As a rising area of research, machine learning provides a growing toolkit for variable selection, but as many machine learning models are complex ‘black boxes’ that differ considerably from risk scores, directly plugging machine learning tools into risk score development can harm both interpretability and predictive performance. We propose a robust and interpretable variable selection mechanism that is tailored to risk scores, and integrate it with an automated framework for convenient risk score development. In a clinical example, we demonstrated how our proposed method can help researchers understand the contribution of 41 candidate variables to outcome prediction through visualizations, filter out 20 variables with non-significant contribution and build a well-performing risk score using only 6 variables, whereas a machine-learning-based method selected 16 variables to achieve a similar performance. We have thus presented a useful tool to support transparent high-stakes decision making.
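The ensemble ranking idea described above can be illustrated with a minimal conceptual sketch (not the authors' implementation, which is provided in the ShapleyVIC and AutoScore packages): variable importance is evaluated across a set of models rather than a single one, variables whose average contribution is not significantly positive are filtered out, and the remainder are ranked by mean importance. Variable names, importance values, and the significance rule below are all illustrative assumptions.

```python
# Conceptual sketch of ensemble variable ranking across models, in the
# spirit of ShapleyVIC. NOT the authors' implementation; the filtering
# rule (normal-approximation lower confidence bound > 0) is an assumption
# for illustration only.
from statistics import mean, stdev

def ensemble_rank(importance_by_model, z=1.96):
    """importance_by_model: dict mapping variable name -> list of
    importance values, one per model in the ensemble. Returns the
    variables with significantly positive mean importance, ranked by
    mean importance in descending order."""
    kept = {}
    for var, vals in importance_by_model.items():
        m = mean(vals)
        se = stdev(vals) / len(vals) ** 0.5
        # Filter out non-significant contributors: keep a variable only
        # if the lower confidence bound of its mean importance is > 0.
        if m - z * se > 0:
            kept[var] = m
    return sorted(kept, key=kept.get, reverse=True)

# Toy example: 3 candidate variables, importance from 5 models.
scores = {
    "age":        [0.30, 0.28, 0.35, 0.31, 0.29],
    "heart_rate": [0.10, 0.12, 0.09, 0.11, 0.10],
    "noise_var":  [0.02, -0.03, 0.01, -0.02, 0.00],
}
print(ensemble_rank(scores))  # noise_var is filtered out
```

The resulting shortlist of ranked variables can then feed a modularized score generator such as AutoScore, which handles the remaining model-building steps.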
Suggested Citation
Yilin Ning & Siqi Li & Marcus Eng Hock Ong & Feng Xie & Bibhas Chakraborty & Daniel Shu Wei Ting & Nan Liu, 2022.
"A novel interpretable machine learning system to generate clinical risk scores: An application for predicting early mortality or unplanned readmission in a retrospective cohort study,"
PLOS Digital Health, Public Library of Science, vol. 1(6), pages 1-20, June.
Handle:
RePEc:plo:pdig00:0000062
DOI: 10.1371/journal.pdig.0000062