Printed from https://ideas.repec.org/a/plo/pone00/0237724.html

A new analytical framework for missing data imputation and classification with uncertainty: Missing data imputation and heart failure readmission prediction

Authors
  • Zhiyong Hu
  • Dongping Du

Abstract

Background: The wide adoption of electronic health record (EHR) systems has provided vast opportunities to advance health care services. However, the prevalence of missing values in EHR systems poses a great challenge for the data analysis that supports clinical decision-making. The objective of this study is to develop a new methodological framework that addresses the missing data challenge and provides a reliable tool for predicting hospital readmission among heart failure patients.

Methods: We used a Gaussian Process Latent Variable Model (GPLVM) to impute the missing values. Specifically, a lower-dimensional embedding was learned from a small complete dataset and then used to impute the missing values in the incomplete dataset. The GPLVM-based imputation provides both a mean estimate and the uncertainty associated with that estimate. To incorporate this uncertainty into prediction, a constrained support vector machine (cSVM) was developed to obtain robust predictions. We first sampled multiple datasets from the distributions of the input uncertainty and trained a support vector machine on each dataset. An optimal classifier was then identified by selecting the support vectors that maximize the separation margin on a newly sampled dataset and minimize the similarity with the pre-trained support vectors.

Results: The proposed model was derived and validated using the PhysioNet MIMIC-III clinical database. The GPLVM imputation achieved normalized mean absolute errors of 0.11 and 0.12 when 20% and 30% of instances, respectively, contained missing values, and the confidence bounds of the estimates captured 97% of the true values. The cSVM model achieved an average area under the curve (AUC) of 0.68, a 7% improvement in prediction accuracy compared with some existing classifiers.

Conclusions: The proposed method provides accurate imputation of missing values and has better prediction performance than existing models that can only handle deterministic inputs.
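The abstract reports imputation quality as a normalized mean absolute error and as the share of true values falling inside the estimates' confidence bounds, and the cSVM begins by sampling datasets from the imputation uncertainty. The Python sketch below illustrates those three steps on toy numbers; it is not the authors' code. It assumes that the error is normalized by the range of the true values (the abstract does not state the normalization), that the confidence bounds are 95% Gaussian intervals (mean ± 1.96 standard deviations), and that imputed entries are sampled as independent Gaussians. The function names are hypothetical, and the per-dataset SVM training and support-vector selection steps of the cSVM are not shown.

```python
import random
from statistics import fmean


def nmae(true_vals, imputed_means):
    """Normalized mean absolute error of the imputed entries.

    ASSUMPTION: normalization by the range of the true values; the
    abstract does not say which normalization the authors used.
    """
    mae = fmean(abs(t - m) for t, m in zip(true_vals, imputed_means))
    return mae / (max(true_vals) - min(true_vals))


def interval_coverage(true_vals, imputed_means, imputed_stds, z=1.96):
    """Fraction of true values inside mean +/- z * std.

    ASSUMPTION: z = 1.96, i.e. a 95% interval under a Gaussian predictive
    distribution; the abstract does not state the confidence level.
    """
    inside = sum(
        1
        for t, m, s in zip(true_vals, imputed_means, imputed_stds)
        if m - z * s <= t <= m + z * s
    )
    return inside / len(true_vals)


def sample_datasets(imputed_means, imputed_stds, n_datasets, seed=0):
    """Draw n_datasets realizations of the imputed entries.

    ASSUMPTION: each entry is sampled independently from N(mean, std^2).
    In the cSVM, one SVM would then be trained per sampled dataset (not shown).
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(m, s) for m, s in zip(imputed_means, imputed_stds)]
        for _ in range(n_datasets)
    ]


# Toy values for illustration only; these are not MIMIC-III data.
true_vals = [1.0, 2.0, 3.0, 4.0]
means = [1.1, 1.8, 3.3, 4.0]
stds = [0.2, 0.2, 0.1, 0.1]

print(round(nmae(true_vals, means), 4))                  # 0.05
print(interval_coverage(true_vals, means, stds))         # 0.75 (3.0 lies outside 3.3 +/- 0.196)
print(len(sample_datasets(means, stds, n_datasets=10)))  # 10
```

Keeping the metrics separate from the sampler makes each step checkable on its own; a fuller sketch of the cSVM would pass each sampled dataset to an SVM trainer and then apply the margin-and-similarity selection described in the Methods paragraph.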

Suggested Citation

  • Zhiyong Hu & Dongping Du, 2020. "A new analytical framework for missing data imputation and classification with uncertainty: Missing data imputation and heart failure readmission prediction," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-15, September.
  • Handle: RePEc:plo:pone00:0237724
    DOI: 10.1371/journal.pone.0237724

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0237724
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0237724&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0237724?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Yu-Wei Lin & Yuqian Zhou & Faraz Faghri & Michael J Shaw & Roy H Campbell, 2019. "Analysis and prediction of unplanned intensive care unit readmission using recurrent neural networks with long short-term memory," PLOS ONE, Public Library of Science, vol. 14(7), pages 1-22, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Indy Man Kit Ho & Anthony Weldon & Jason Tze Ho Yong & Candy Tze Tim Lam & Jaime Sampaio, 2023. "Using Machine Learning Algorithms to Pool Data from Meta-Analysis for the Prediction of Countermovement Jump Improvement," IJERPH, MDPI, vol. 20(10), pages 1-15, May.
    2. Zixian Liu & Guansan Du & Shuai Zhou & Haifeng Lu & Han Ji, 2022. "Analysis of Internet Financial Risks Based on Deep Learning and BP Neural Network," Computational Economics, Springer;Society for Computational Economics, vol. 59(4), pages 1481-1499, April.
    3. Daren Zhao & Huiwu Zhang & Qing Cao & Zhiyi Wang & Sizhang He & Minghua Zhou & Ruihua Zhang, 2022. "The research of ARIMA, GM(1,1), and LSTM models for prediction of TB cases in China," PLOS ONE, Public Library of Science, vol. 17(2), pages 1-18, February.
    4. Indy Man Kit Ho & Kai Yuen Cheong & Anthony Weldon, 2021. "Predicting student satisfaction of emergency remote learning in higher education during COVID-19 using machine learning techniques," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-27, April.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.