
Assessing in-hospital mortality risk in ICU lung cancer patients using machine learning: An analysis based on the MIMIC-IV database

Authors

  • Jianwei Wang
  • Lizhen Lin
  • Li-ping Qiu
  • Li-lan Zheng
  • Lu-xi Wu
  • Hui Lv
  • Haihua Xie

Abstract

Background: Patients with advanced lung cancer admitted to the intensive care unit (ICU) face a substantially elevated risk of in-hospital mortality. Early identification of high-risk individuals is essential to support timely clinical decision-making. This study aimed to develop and validate a predictive model using machine learning (ML) techniques to estimate in-hospital mortality in this patient population.

Methods: Clinical data were obtained from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression, and the selected features were used to construct eight ML models: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), artificial neural network (ANN), extreme gradient boosting (XGBoost), k-nearest neighbors (k-NN), adaptive boosting (AdaBoost), and random forest (RF). Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1 score; discrimination, calibration, and clinical utility were also evaluated. The final model incorporated 27 clinically interpretable variables, including not only established severity scores (e.g., the Simplified Acute Physiology Score II, SAPS II) but also dynamic treatment factors (e.g., vasopressin use, mechanical ventilation duration) that reflect real-world ICU practice. SHapley Additive exPlanations (SHAP) analysis was used to enhance interpretability, allowing clinicians to see both the magnitude and the direction of each key predictor's effect, an improvement over the black-box ML applications of prior studies.

Results: Among the 1,755 patients included, 368 (21%) died during hospitalization in the training cohort. Notably, older patients, particularly those of Caucasian descent, had a higher risk of in-hospital death. LASSO regression identified 27 variables associated with in-hospital mortality, including gender and hospital stay duration. The XGBoost model achieved the highest predictive performance, with an accuracy of 0.783, an F1 score of 0.595, and an AUC of 0.865 (95% CI: 0.840–0.891) in the training cohort. The test cohort showed similar trends, with an accuracy of 0.719, an F1 score of 0.543, and an AUC of 0.790 (95% CI: 0.741–0.840). Key predictors identified consistently across models (LR, SVM, ANN, and XGBoost) included hospital stay duration, SAPS II, norepinephrine and vasopressin use, prothrombin time (PT), mechanical ventilation duration, white blood cell count (WBC), and blood urea nitrogen (BUN). The SHAP summary plot further illustrated the direction and magnitude of influence of the top 15 predictors.

Conclusion: The XGBoost-based model showed the best performance in predicting in-hospital mortality among critically ill lung cancer patients. Hospital stay duration and SAPS II score emerged as the most influential predictors, which can serve as the basis for a simplified clinical risk score. These findings may support early risk stratification and guide clinical decision-making in the ICU. Because the analysis relied exclusively on internal splits of the MIMIC-IV data, the model's generalizability, and thus its applicability in broader clinical contexts, remains limited.
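
The abstract evaluates the models with accuracy, sensitivity, specificity, F1 score and AUC, and reports each AUC with a 95% confidence interval. Below is a minimal, dependency-free Python sketch of how such figures are commonly computed from predicted probabilities. It is not the authors' code, and the function names are hypothetical. Three choices are assumptions, since the abstract does not state them: the 0.5 decision threshold, the rank-based (Mann–Whitney) AUC, and the percentile-bootstrap CI (DeLong's method is a common alternative). Outcomes are assumed to be coded 1 = died, 0 = survived. Calibration and clinical-utility analyses are not covered.

```python
import random
from typing import Dict, List, Sequence, Tuple


def threshold_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 at one probability cut-off.

    The 0.5 default is an assumption; the abstract does not state a threshold.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_death = p >= threshold
        if predicted_death and y == 1:
            tp += 1
        elif predicted_death:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall for deaths
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": (tp + tn) / len(y_true), "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores get mid-ranks."""
    n = len(y_prob)
    order = sorted(range(n), key=lambda i: y_prob[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and y_prob[order[j + 1]] == y_prob[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one death and one survivor")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true: Sequence[int], y_prob: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap (1 - alpha) confidence interval for the AUC."""
    auc(y_true, y_prob)  # fail fast if only one outcome class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auc(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

For example, with toy labels `y = [0, 0, 1, 1, 0, 1, 0, 0]` and scores `p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.05]`, `threshold_metrics(y, p)` gives an accuracy of 0.75 and a specificity of 0.8. `auc(y, p)` returns 13/15 ≈ 0.867, because 13 of the 15 death/survivor pairs are ranked correctly. The rank-based AUC sorts once per call (O(n log n)) instead of comparing every pair, which keeps bootstrap resampling practical in pure Python.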

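SHAP attributes each individual prediction to the input features using Shapley values from cooperative game theory, and the SHAP summary plot described in the abstract aggregates these per-patient attributions. The brute-force Python sketch below computes exact Shapley values for a small model by enumerating every feature coalition. It illustrates the idea only and is not the authors' pipeline. Assumptions: a feature "absent" from a coalition is replaced by a single baseline value (SHAP implementations usually average over a background dataset), and because the cost grows as 2^n, practical tools rely on faster estimators such as Tree SHAP for tree ensembles like XGBoost. `toy_logit` and its coefficients are hypothetical and unrelated to the paper's fitted model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features outside a coalition are set to their baseline value (an assumed,
    single-reference definition of a feature being "absent").
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical risk score in log-odds with one interaction term (invented coefficients).
def toy_logit(z: Sequence[float]) -> float:
    return -3.0 + 0.5 * z[0] + 1.2 * z[1] + 0.8 * z[0] * z[2]


if __name__ == "__main__":
    x, base = [2.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    phi = shapley_values(toy_logit, x, base)
    print(phi)
    # Efficiency property: the attributions sum to the change in model output.
    print(sum(phi), toy_logit(x) - toy_logit(base))
```

For `toy_logit` at `x = [2, 1, 1]`, the interaction credit (0.8 × 2 × 1 = 1.6) is split evenly between features 0 and 2, giving attributions of roughly [1.8, 1.2, 0.8]. These sum to the 3.8 change in log-odds relative to the baseline. The sign of each value gives the direction of a predictor's effect and its absolute size gives the magnitude, which is what a SHAP summary plot displays across patients.
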
Suggested Citation

  • Jianwei Wang & Lizhen Lin & Li-ping Qiu & Li-lan Zheng & Lu-xi Wu & Hui Lv & Haihua Xie, 2026. "Assessing in-hospital mortality risk in ICU lung cancer patients using machine learning: An analysis based on the MIMIC-IV database," PLOS ONE, Public Library of Science, vol. 21(1), pages 1-16, January.
  • Handle: RePEc:plo:pone00:0341259
    DOI: 10.1371/journal.pone.0341259

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0341259
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0341259&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0341259?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.