A framework to predict second primary lung cancer patients by using ensemble models

Author

Listed:
  • Yen-Chun Huang

    (Tamkang University)

  • Chieh-Wen Ho

    (Department of Biology, Texas A&M University)

  • Wen-Ru Chou

    (Fu Jen Catholic University)

  • Mingchih Chen

    (Fu Jen Catholic University)

Abstract

Machine learning (ML) prediction models, which have recently been widely used in the healthcare industry, serve as tools that help users make quick decisions. Their predictions could improve treatment outcomes and reduce medical expenses. This research proposes an ML-based decision tool to predict the probability of second primary lung cancer among lung cancer patients. The tool comprises four stages. The first stage is data processing, in which the target patients are selected from the National Health Insurance Research Database for the 2011 to 2016 study period. The second stage uses the synthetic minority oversampling technique (SMOTE) to balance the data. The third stage is feature selection. In the final stage, five ML algorithms, namely Logistic Regression (LGR), Decision Tree, Random Forests (RF), multivariate adaptive regression splines (MARS), and extreme gradient boosting (XGBoost), are applied with the optimal features and then used to build ensemble models. The results show that, after feature selection, the ensemble models yield an accuracy of 0.932. The types of therapy (chemotherapy (CH), radiotherapy (RT), and tyrosine kinase inhibitor (TKI)), the clinical stage, and the epidermal growth factor receptor (EGFR) status were the top five features associated with developing second primary lung cancer. This study can help physicians identify lung cancer patients who are likely to develop second primary lung cancer and make complete treatment plans for them.
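
The abstract names two techniques without giving details: SMOTE oversampling and an ensemble of several classifiers. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation. The toy data points, the function names `smote` and `soft_vote`, the neighbour count `k`, and the choice of soft voting (averaging predicted probabilities) as the ensemble rule are all assumptions made for illustration; the abstract does not state how the ensemble models were combined. Feature selection and the five base learners are not shown.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper).

    Each synthetic point is a random interpolation between a minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def soft_vote(probabilities):
    """Average the positive-class probabilities of several models.

    Soft voting is an assumed ensemble rule, used here for illustration.
    """
    return sum(probabilities) / len(probabilities)


# Toy, made-up data: six majority-class and three minority-class points.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points

# Combine made-up positive-class probabilities from three models.
ensemble_probability = soft_vote([0.9, 0.8, 0.7])
```

Because every synthetic point lies on a segment between two existing minority samples, oversampling of this kind fills in the minority region rather than duplicating records. In practice, oversampling is normally applied to the training split only, so that synthetic records do not leak into the evaluation data.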

Suggested Citation

  • Yen-Chun Huang & Chieh-Wen Ho & Wen-Ru Chou & Mingchih Chen, 2025. "A framework to predict second primary lung cancer patients by using ensemble models," Annals of Operations Research, Springer, vol. 348(1), pages 373-397, May.
  • Handle: RePEc:spr:annopr:v:348:y:2025:i:1:d:10.1007_s10479-023-05691-x
    DOI: 10.1007/s10479-023-05691-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10479-023-05691-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.