Comparing imputation approaches to handle systematically missing inputs in risk calculators

Authors
  • Anja Mühlemann
  • Philip Stange
  • Antoine Faul
  • Serena Lozza-Fiacco
  • Rowan Iskandar
  • Manuela Moraru
  • Susanne Theis
  • Petra Stute
  • Ben D Spycher
  • David Ginsbourger

Abstract

Risk calculators based on statistical and/or mechanistic models have flourished and are increasingly available for a variety of diseases. However, their use in day-to-day practice may be hampered by missing input variables. Certain measurements needed to calculate disease risk may be difficult to acquire, e.g. because they require blood draws, and may be systematically missing in the population of interest. We compare several deterministic and probabilistic imputation approaches for obtaining surrogate predictions from risk calculators while accounting for the uncertainty due to systematically missing inputs. The considered approaches predict missing inputs from the available ones; in the case of probabilistic imputation, this leads to a probabilistic prediction of the risk. We compare the methods using scoring techniques for forecast evaluation, with a focus on the Brier score and the continuous ranked probability score (CRPS). We also discuss the classification of patients into risk groups defined by thresholding predicted probabilities. While the considered procedures are not meant to replace fully-informed risk calculations, using them to obtain a first indication of the risk distribution when at least one input parameter is missing may find useful applications in medical practice. To illustrate this, we use the SCORE2 risk calculator for cardiovascular disease and a data set including medical data from 359 women, obtained from the gynecology department at the Inselspital in Bern, Switzerland. Using this data set, we mimic the situation where some input parameters (blood lipids and blood pressure) are systematically missing, and compute the SCORE2 risk by probabilistic imputation of the missing variables based on the remaining input variables.
We compare this approach to established imputation techniques such as MICE by means of scoring rules, and also visualize how probabilistic imputation can be used in sample size considerations.

Author summary: Risk calculators, and computer codes more generally, play an important part in digital health. Given patient information, they allow, for instance, estimating the probability of developing certain diseases. Yet when part of the required patient information is missing, e.g. because some of the risk factors could not be measured, performing risk calculations may require imputing the missing values. We compare different imputation approaches and make the case that probabilistic imputation is worth the effort compared to deterministic approaches. In essence, propagating the uncertainty on the imputed risk factors leads to probabilistic risk predictors. Using the risk of developing cardiovascular disease in a cohort of patients from a menopause clinic in Bern, Switzerland, we illustrate how the considered probabilistic approaches outperform deterministic ones in terms of forecast evaluation scores, and how such probabilistic risk predictions may be used in medical practice, highlighting the resulting trade-offs between type I and type II errors.
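The abstract describes predicting a missing input from the available ones, propagating that uncertainty through the risk calculator, and scoring the resulting risk distribution with the CRPS. The Python sketch below illustrates that idea under loudly stated assumptions: `toy_risk` is a hypothetical logistic risk function with made-up coefficients (not SCORE2), the missing input is systolic blood pressure imputed from age with a hypothetical Gaussian model (not the paper's fitted models), the deterministic baseline is plain conditional-mean imputation, and the reference value for scoring is assumed to be the risk computed from the complete (hidden) inputs. For a point forecast the CRPS reduces to the absolute error, so both approaches are scored on the same scale.

```python
import math
import random
import statistics

# Hypothetical toy risk calculator (NOT SCORE2): logistic model in age and
# systolic blood pressure (SBP) with made-up coefficients.
def toy_risk(age: float, sbp: float) -> float:
    eta = -12.0 + 0.08 * age + 0.03 * sbp
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical imputation model for the missing input:
# SBP | age ~ Normal(90 + 0.7 * age, 15^2). In practice this would be fitted
# on data where SBP is observed.
SBP_SD = 15.0

def sbp_mean(age: float) -> float:
    return 90.0 + 0.7 * age

def deterministic_risk(age: float) -> float:
    """Deterministic imputation: plug in the conditional mean of SBP."""
    return toy_risk(age, sbp_mean(age))

def probabilistic_risk(age: float, n_draws: int, rng: random.Random) -> list[float]:
    """Probabilistic imputation: push each SBP draw through the calculator,
    giving a sample from the predictive distribution of the risk."""
    return [toy_risk(age, rng.gauss(sbp_mean(age), SBP_SD)) for _ in range(n_draws)]

def crps_from_samples(samples: list[float], y: float) -> float:
    """CRPS of the empirical distribution of `samples` at y:
    E|X - y| - 0.5 * E|X - X'|. With one sample this is |x - y|."""
    xs = sorted(samples)
    m = len(xs)
    mean_abs_err = sum(abs(x - y) for x in xs) / m
    # Sum of |x_i - x_j| over all ordered pairs, in O(m log m) via sorting.
    pair_sum = 2.0 * sum((2 * k - m + 1) * x for k, x in enumerate(xs))
    return mean_abs_err - 0.5 * pair_sum / (m * m)

# Simulated cohort: the "true" SBP is hidden from the imputation step and
# used only to compute the fully-informed reference risk y.
rng = random.Random(42)
det_scores, prob_scores = [], []
for _ in range(500):
    age = rng.uniform(40.0, 70.0)
    y = toy_risk(age, rng.gauss(sbp_mean(age), SBP_SD))
    det_scores.append(crps_from_samples([deterministic_risk(age)], y))
    prob_scores.append(crps_from_samples(probabilistic_risk(age, 200, rng), y))

print("mean CRPS, deterministic:", statistics.mean(det_scores))
print("mean CRPS, probabilistic:", statistics.mean(prob_scores))
```

Because the CRPS is a proper scoring rule, a correctly specified probabilistic imputation should achieve a lower mean CRPS than the point forecast in expectation; with a misspecified imputation model, or a small cohort, that advantage is not guaranteed.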

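The author summary mentions classifying patients into risk groups by thresholding, and the resulting trade-off between type I and type II errors. The sketch below reuses the same hypothetical toy model (redefined so the block stands alone; it is not SCORE2). It turns the imputed risk sample into an estimated probability that the risk exceeds a group threshold, scores that probability with the Brier score against the fully-informed classification, and counts false positives and false negatives for several decision cut-offs. The 5% threshold, the cut-offs, and all function names are illustrative assumptions. Reading "the patient is not high-risk" as the null hypothesis, false positives play the role of type I errors and false negatives of type II errors.

```python
import math
import random

# Same hypothetical toy model as a stand-in for a real calculator (NOT SCORE2).
def toy_risk(age: float, sbp: float) -> float:
    return 1.0 / (1.0 + math.exp(-(-12.0 + 0.08 * age + 0.03 * sbp)))

def sbp_mean(age: float) -> float:
    # Hypothetical imputation model: SBP | age ~ Normal(90 + 0.7 * age, 15^2).
    return 90.0 + 0.7 * age

SBP_SD = 15.0
THRESHOLD = 0.05  # hypothetical risk-group cut-off, not a SCORE2 category

def prob_high_risk(age: float, n_draws: int, rng: random.Random) -> float:
    """Fraction of imputed risk draws above THRESHOLD, i.e. an estimate of
    P(risk > THRESHOLD | observed inputs)."""
    draws = (toy_risk(age, rng.gauss(sbp_mean(age), SBP_SD)) for _ in range(n_draws))
    return sum(r > THRESHOLD for r in draws) / n_draws

def brier(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def error_counts(probs: list[float], outcomes: list[int], cutoff: float) -> tuple[int, int]:
    """Flag a patient as high-risk when the predicted probability is >= cutoff.
    Returns (false positives, false negatives)."""
    fp = sum(p >= cutoff and o == 0 for p, o in zip(probs, outcomes))
    fn = sum(p < cutoff and o == 1 for p, o in zip(probs, outcomes))
    return fp, fn

rng = random.Random(7)
ages = [rng.uniform(40.0, 70.0) for _ in range(300)]
# Fully-informed classification, using an SBP value hidden from imputation.
outcomes = [int(toy_risk(a, rng.gauss(sbp_mean(a), SBP_SD)) > THRESHOLD) for a in ages]
# Deterministic (conditional-mean) imputation yields a hard 0/1 "probability".
det_probs = [float(toy_risk(a, sbp_mean(a)) > THRESHOLD) for a in ages]
prob_probs = [prob_high_risk(a, 500, rng) for a in ages]

print("Brier, deterministic:", brier(det_probs, outcomes))
print("Brier, probabilistic:", brier(prob_probs, outcomes))
for cutoff in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"cutoff {cutoff}: (false positives, false negatives) =",
          error_counts(prob_probs, outcomes, cutoff))
```

Raising the cut-off can only remove patients from the high-risk group, so false positives never increase and false negatives never decrease; choosing the cut-off is therefore where the type I/type II trade-off is made. For the 0/1 deterministic predictions, the Brier score is simply the misclassification rate.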
Suggested Citation

  • Anja Mühlemann & Philip Stange & Antoine Faul & Serena Lozza-Fiacco & Rowan Iskandar & Manuela Moraru & Susanne Theis & Petra Stute & Ben D Spycher & David Ginsbourger, 2025. "Comparing imputation approaches to handle systematically missing inputs in risk calculators," PLOS Digital Health, Public Library of Science, vol. 4(1), pages 1-26, January.
  • Handle: RePEc:plo:pdig00:0000712
    DOI: 10.1371/journal.pdig.0000712

    Full text from publisher:
    https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000712
    https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000712&type=printable (printable version)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. James Mitchell & Martin Weale, 2023. "Censored density forecasts: Production and evaluation," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(5), pages 714-734, August.
    2. Lu, Ye & Suthaharan, Neyavan, 2023. "Electricity price spike clustering: A zero-inflated GARX approach," Energy Economics, Elsevier, vol. 124(C).
    3. Snyder, Ralph D. & Ord, J. Keith & Beaumont, Adrian, 2012. "Forecasting the intermittent demand for slow-moving inventories: A modelling approach," International Journal of Forecasting, Elsevier, vol. 28(2), pages 485-496.
    4. Diebold, Francis X. & Shin, Minchul & Zhang, Boyuan, 2023. "On the aggregation of probability assessments: Regularized mixtures of predictive densities for Eurozone inflation and real interest rates," Journal of Econometrics, Elsevier, vol. 237(2).
    5. Antonio Bracale & Pasquale De Falco, 2015. "An Advanced Bayesian Method for Short-Term Probabilistic Forecasting of the Generation of Wind Power," Energies, MDPI, vol. 8(9), pages 1-22, September.
    6. Xueli Wang & Moqin Zhou & Jinzhu Jia & Zhi Geng & Gexin Xiao, 2018. "A Bayesian Approach to Real-Time Monitoring and Forecasting of Chinese Foodborne Diseases," IJERPH, MDPI, vol. 15(8), pages 1-13, August.
    7. Moritz Berger & Gerhard Tutz, 2021. "Transition models for count data: a flexible alternative to fixed distribution models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(4), pages 1259-1283, October.
    8. Fabian Krüger & Sebastian Lerch & Thordis Thorarinsdottir & Tilmann Gneiting, 2021. "Predictive Inference Based on Markov Chain Monte Carlo Output," International Statistical Review, International Statistical Institute, vol. 89(2), pages 274-301, August.
    9. Braun, Julia & Sabanés Bové, Daniel & Held, Leonhard, 2014. "Choice of generalized linear mixed models using predictive crossvalidation," Computational Statistics & Data Analysis, Elsevier, vol. 75(C), pages 190-202.
    10. Christopher Gelpi & Nazli Avdan, 2018. "Democracies at risk? A forecasting analysis of regime type and the risk of terrorist attack," Conflict Management and Peace Science, Peace Science Society (International), vol. 35(1), pages 18-42, January.
    11. Wei Wei & Leonhard Held, 2014. "Calibration tests for count data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 23(4), pages 787-805, December.
    12. Yuhyeong Jang & Raanju R. Sundararajan & Wagner Barreto-Souza & Elizabeth Wheaton-Paramo, 2024. "Determining economic factors for sex trafficking in the United States using count time series regression," Empirical Economics, Springer, vol. 67(1), pages 337-354, July.
    13. Kheifets, Igor & Velasco, Carlos, 2017. "New goodness-of-fit diagnostics for conditional discrete response models," Journal of Econometrics, Elsevier, vol. 200(1), pages 135-149.
    14. Rossi, Barbara & Ganics, Gergely & Sekhposyan, Tatevik, 2020. "From Fixed-event to Fixed-horizon Density Forecasts: Obtaining Measures of Multi-horizon Uncertainty from Survey Density Foreca," CEPR Discussion Papers 14267, C.E.P.R. Discussion Papers.
    15. Frank van Berkum & Katrien Antonio & Michel Vellekoop, 2021. "Quantifying longevity gaps using micro‐level lifetime data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(2), pages 548-570, April.
    16. Shi, Peng & Zhao, Zifeng, 2024. "Enhanced pricing and management of bundled insurance risks with dependence-aware prediction using pair copula construction," Journal of Econometrics, Elsevier, vol. 240(1).
    17. Gao, Lisa & Shi, Peng, 2022. "Leveraging high-resolution weather information to predict hail damage claims: A spatial point process for replicated point patterns," Insurance: Mathematics and Economics, Elsevier, vol. 107(C), pages 161-179.
    18. Emily S Nightingale & Lloyd A C Chapman & Sridhar Srikantiah & Swaminathan Subramanian & Purushothaman Jambulingam & Johannes Bracher & Mary M Cameron & Graham F Medley, 2020. "A spatio-temporal approach to short-term prediction of visceral leishmaniasis diagnoses in India," PLOS Neglected Tropical Diseases, Public Library of Science, vol. 14(7), pages 1-21, July.
    19. Roel Verbelen & Katrien Antonio & Gerda Claeskens, 2018. "Unravelling the predictive power of telematics data in car insurance pricing," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 67(5), pages 1275-1304, November.
    20. Boris Aleksandrov & Christian H. Weiß, 2020. "Testing the dispersion structure of count time series using Pearson residuals," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 104(3), pages 325-361, September.