Printed from https://ideas.repec.org/a/plo/pdig00/0000712.html

Comparing imputation approaches to handle systematically missing inputs in risk calculators

Authors

  • Anja Mühlemann
  • Philip Stange
  • Antoine Faul
  • Serena Lozza-Fiacco
  • Rowan Iskandar
  • Manuela Moraru
  • Susanne Theis
  • Petra Stute
  • Ben D Spycher
  • David Ginsbourger

Abstract

Risk calculators based on statistical and/or mechanistic models have flourished and are increasingly available for a variety of diseases. However, in day-to-day practice, their use may be hampered by missing input variables. Certain measurements needed to calculate disease risk may be difficult to acquire, e.g. because they require blood draws, and may therefore be systematically missing in the population of interest. We compare several deterministic and probabilistic imputation approaches for obtaining surrogate predictions from risk calculators while accounting for the uncertainty due to systematically missing inputs. The considered approaches predict the missing inputs from the available ones. In the case of probabilistic imputation, this leads to a probabilistic prediction of the risk. We compare the methods using scoring techniques for forecast evaluation, with a focus on the Brier score and the continuous ranked probability score (CRPS). We also discuss the classification of patients into risk groups defined by thresholding predicted probabilities. While the considered procedures are not meant to replace fully informed risk calculations, using them to obtain first indications of the risk distribution when at least one input parameter is missing may prove useful in medical practice. To illustrate this, we use the SCORE2 risk calculator for cardiovascular disease and a data set including medical data from 359 women, obtained from the gynecology department at the Inselspital in Bern, Switzerland. Using this data set, we mimic the situation where some input parameters (blood lipids and blood pressure) are systematically missing, and compute the SCORE2 risk by probabilistic imputation of the missing variables based on the remaining input variables.
We compare this approach to established imputation techniques such as MICE (multiple imputation by chained equations) by means of scoring rules, and visualize how probabilistic imputation can be used in sample size considerations.

Author summary

Risk calculators, and computer codes more generally, play an important part in digital health. Given patient information, they provide, for instance, estimates of the probability of developing certain diseases. Yet when part of the required patient information is missing, e.g. because some of the risk factors could not be measured, performing risk calculations may require imputing the missing values. We compare different imputation approaches and argue that probabilistic imputation is worth the effort compared to deterministic approaches. In essence, propagating the uncertainty in the imputed risk factors leads to probabilistic risk predictions. Using the risk of developing cardiovascular disease in a cohort of patients from a menopause clinic in Bern, Switzerland, we illustrate how the considered probabilistic approaches outperform deterministic ones in terms of forecast evaluation scores, and how such probabilistic risk predictions may be used in medical practice, highlighting the resulting trade-offs between type I and type II errors.
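To make the scoring idea more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' code and does not implement SCORE2. A hypothetical logistic function `toy_risk` stands in for the risk calculator, and a hypothetical Gaussian conditional model `impute_sbp` plays the role of probabilistic imputation of a missing systolic blood pressure from age. Deterministic imputation plugs in the conditional mean; probabilistic imputation pushes samples through the calculator, giving an ensemble of risks that is scored against the fully informed risk with the sample-based CRPS identity E|X − y| − ½E|X − X′|. Applying the Brier score to the event "true risk exceeds a threshold", with the fraction of ensemble members above the threshold as forecast probability, is our assumption about one possible setup; the threshold value is hypothetical too.

```python
import math
import random
from statistics import fmean


def crps_ensemble(samples: list[float], y: float) -> float:
    """CRPS of the empirical distribution of `samples` at observation y.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'| (lower is better).
    """
    n = len(samples)
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return fmean([abs(x - y) for x in samples]) - 0.5 * spread


def brier(prob: float, event: bool) -> float:
    """Brier score of one probability forecast for a binary event."""
    return (prob - float(event)) ** 2


def toy_risk(age: float, sbp: float) -> float:
    """HYPOTHETICAL logistic risk model standing in for a risk calculator (not SCORE2)."""
    return 1.0 / (1.0 + math.exp(-(-9.0 + 0.06 * age + 0.02 * sbp)))


def impute_sbp(age: float, n: int, rng: random.Random) -> list[float]:
    """HYPOTHETICAL conditional model: SBP | age ~ Normal(100 + 0.5 * age, sd = 12)."""
    return [rng.gauss(100.0 + 0.5 * age, 12.0) for _ in range(n)]


def compare(age: float, true_sbp: float, threshold: float = 0.075,
            n: int = 500, seed: int = 1) -> dict[str, float]:
    """Score deterministic vs probabilistic imputation for one patient."""
    rng = random.Random(seed)
    y = toy_risk(age, true_sbp)  # fully informed risk (SBP actually observed)
    event = y > threshold        # reference high-risk classification

    # Deterministic imputation: plug in the conditional mean of SBP.
    point = toy_risk(age, 100.0 + 0.5 * age)
    # Probabilistic imputation: propagate SBP samples through the calculator.
    ensemble = [toy_risk(age, s) for s in impute_sbp(age, n, rng)]
    p_high = sum(r > threshold for r in ensemble) / n

    return {
        "crps_deterministic": crps_ensemble([point], y),  # reduces to |point - y|
        "crps_probabilistic": crps_ensemble(ensemble, y),
        "brier_deterministic": brier(float(point > threshold), event),
        "brier_probabilistic": brier(p_high, event),
    }


if __name__ == "__main__":
    print(compare(age=60.0, true_sbp=150.0))
```

For a single point forecast the CRPS collapses to the absolute error, which is why deterministic and probabilistic imputations can be compared on the same scale. A single patient only illustrates the mechanics; ranking imputation methods would mean averaging such scores over a cohort.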

Suggested Citation

  • Anja Mühlemann & Philip Stange & Antoine Faul & Serena Lozza-Fiacco & Rowan Iskandar & Manuela Moraru & Susanne Theis & Petra Stute & Ben D Spycher & David Ginsbourger, 2025. "Comparing imputation approaches to handle systematically missing inputs in risk calculators," PLOS Digital Health, Public Library of Science, vol. 4(1), pages 1-26, January.
  • Handle: RePEc:plo:pdig00:0000712
    DOI: 10.1371/journal.pdig.0000712

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000712
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000712&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xia Li, 2024. "Unveiling Portfolio Resilience: Harnessing Asymmetric Copulas for Dynamic Risk Assessment in the Knowledge Economy," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 15(3), pages 10200-10226, September.
    2. Vasiliki Christou & Konstantinos Fokianos, 2014. "Quasi-Likelihood Inference For Negative Binomial Time Series Models," Journal of Time Series Analysis, Wiley Blackwell, vol. 35(1), pages 55-78, January.
    3. James Mitchell & Martin Weale, 2023. "Censored density forecasts: Production and evaluation," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(5), pages 714-734, August.
    4. Lu, Ye & Suthaharan, Neyavan, 2023. "Electricity price spike clustering: A zero-inflated GARX approach," Energy Economics, Elsevier, vol. 124(C).
    5. Fokianos, Konstantinos & Fried, Roland & Kharin, Yuriy & Voloshko, Valeriy, 2022. "Statistical analysis of multivariate discrete-valued time series," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    6. Snyder, Ralph D. & Ord, J. Keith & Beaumont, Adrian, 2012. "Forecasting the intermittent demand for slow-moving inventories: A modelling approach," International Journal of Forecasting, Elsevier, vol. 28(2), pages 485-496.
    7. Diebold, Francis X. & Shin, Minchul & Zhang, Boyuan, 2023. "On the aggregation of probability assessments: Regularized mixtures of predictive densities for Eurozone inflation and real interest rates," Journal of Econometrics, Elsevier, vol. 237(2).
    8. Nicholas G. Reich & Justin Lessler & Krzysztof Sakrejda & Stephen A. Lauer & Sopon Iamsirithaworn & Derek A. T. Cummings, 2016. "Case Study in Evaluating Time Series Prediction Models Using the Relative Mean Absolute Error," The American Statistician, Taylor & Francis Journals, vol. 70(3), pages 285-292, July.
    9. Dag Tjøstheim, 2012. "Some recent theory for autoregressive count time series," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 21(3), pages 413-438, September.
    10. Antonio Bracale & Pasquale De Falco, 2015. "An Advanced Bayesian Method for Short-Term Probabilistic Forecasting of the Generation of Wind Power," Energies, MDPI, vol. 8(9), pages 1-22, September.
    11. David Harris & Gael M. Martin & Indeewara Perera & Don S. Poskitt, 2017. "Construction and visualization of optimal confidence sets for frequentist distributional forecasts," Monash Econometrics and Business Statistics Working Papers 9/17, Monash University, Department of Econometrics and Business Statistics.
    12. Birgit Schrödle & Leonhard Held, 2011. "A primer on disease mapping and ecological regression using INLA," Computational Statistics, Springer, vol. 26(2), pages 241-258, June.
    13. Xueli Wang & Moqin Zhou & Jinzhu Jia & Zhi Geng & Gexin Xiao, 2018. "A Bayesian Approach to Real-Time Monitoring and Forecasting of Chinese Foodborne Diseases," IJERPH, MDPI, vol. 15(8), pages 1-13, August.
    14. Moritz Berger & Gerhard Tutz, 2021. "Transition models for count data: a flexible alternative to fixed distribution models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(4), pages 1259-1283, October.
    15. Kolassa, Stephan, 2016. "Evaluating predictive count data distributions in retail sales forecasting," International Journal of Forecasting, Elsevier, vol. 32(3), pages 788-803.
    16. Fabian Krüger & Sebastian Lerch & Thordis Thorarinsdottir & Tilmann Gneiting, 2021. "Predictive Inference Based on Markov Chain Monte Carlo Output," International Statistical Review, International Statistical Institute, vol. 89(2), pages 274-301, August.
    17. Braun, Julia & Sabanés Bové, Daniel & Held, Leonhard, 2014. "Choice of generalized linear mixed models using predictive crossvalidation," Computational Statistics & Data Analysis, Elsevier, vol. 75(C), pages 190-202.
    18. Kim M Pepin & Jia Wang & Colleen T Webb & Jennifer A Hoeting & Mary Poss & Peter J Hudson & Wenshan Hong & Huachen Zhu & Yi Guan & Steven Riley, 2013. "Anticipating the Prevalence of Avian Influenza Subtypes H9 and H5 in Live-Bird Markets," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-8, February.
    19. Christopher Gelpi & Nazli Avdan, 2018. "Democracies at risk? A forecasting analysis of regime type and the risk of terrorist attack," Conflict Management and Peace Science, Peace Science Society (International), vol. 35(1), pages 18-42, January.
    20. Gergely Ganics & Barbara Rossi & Tatevik Sekhposyan, 2024. "From Fixed‐Event to Fixed‐Horizon Density Forecasts: Obtaining Measures of Multihorizon Uncertainty from Survey Density Forecasts," Journal of Money, Credit and Banking, Blackwell Publishing, vol. 56(7), pages 1675-1704, October.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.