Author
Listed:
- Thomas Callender
- Fergus Imrie
- Bogdan Cebere
- Nora Pashayan
- Neal Navani
- Mihaela van der Schaar
- Sam M Janes
Abstract
Background: Risk-based screening for lung cancer is currently being considered in several countries; however, the optimal approach to determine eligibility remains unclear. Ensemble machine learning could support the development of highly parsimonious prediction models that maintain the performance of more complex models while maximising simplicity and generalisability, supporting the widespread adoption of personalised screening. In this work, we aimed to develop and validate ensemble machine learning models to determine eligibility for risk-based lung cancer screening. Methods and findings: For model development, we used data from 216,714 ever-smokers recruited between 2006 and 2010 to the UK Biobank prospective cohort and 26,616 high-risk ever-smokers recruited between 2002 and 2004 to the control arm of the US National Lung Screening (NLST) randomised controlled trial. The NLST trial randomised high-risk smokers from 33 US centres with at least a 30 pack-year smoking history and fewer than 15 quit-years to annual CT or chest radiography screening for lung cancer. We externally validated our models among 49,593 participants in the chest radiography arm and all 80,659 ever-smoking participants in the US Prostate, Lung, Colorectal and Ovarian (PLCO) Screening Trial. The PLCO trial, recruiting from 1993 to 2001, analysed the impact of chest radiography or no chest radiography for lung cancer screening. We primarily validated in the PLCO chest radiography arm such that we could benchmark against comparator models developed within the PLCO control arm. Models were developed to predict the risk of 2 outcomes within 5 years from baseline: diagnosis of lung cancer and death from lung cancer. We assessed model discrimination (area under the receiver operating curve, AUC), calibration (calibration curves and expected/observed ratio), overall performance (Brier scores), and net benefit with decision curve analysis. Conclusions: We present parsimonious ensemble machine learning models to predict the risk of lung cancer in ever-smokers, demonstrating a novel approach that could simplify the implementation of risk-based lung cancer screening in multiple settings. Thomas Callender and colleagues present a development and validation study using parsimonious ensemble machine learning models to assess eligibility for lung cancer screening.Why was this study done?: What did the researchers do and find?: What do these findings mean?:
Suggested Citation
Thomas Callender & Fergus Imrie & Bogdan Cebere & Nora Pashayan & Neal Navani & Mihaela van der Schaar & Sam M Janes, 2023.
"Assessing eligibility for lung cancer screening using parsimonious ensemble machine learning models: A development and validation study,"
PLOS Medicine, Public Library of Science, vol. 20(10), pages 1-18, October.
Handle:
RePEc:plo:pmed00:1004287
DOI: 10.1371/journal.pmed.1004287
Download full text from publisher
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pmed00:1004287. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosmedicine (email available below). General contact details of provider: https://journals.plos.org/plosmedicine/ .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.