Printed from https://ideas.repec.org/a/plo/pone00/0214365.html

Prediction of premature all-cause mortality: A prospective general population cohort study comparing machine-learning and standard epidemiological approaches

Author

Listed:
  • Stephen F Weng
  • Luis Vaz
  • Nadeem Qureshi
  • Joe Kai

Abstract

Background: Prognostic modelling using standard methods is well-established, particularly for predicting risk of single diseases. Machine-learning may offer potential to explore outcomes of even greater complexity, such as premature death. This study aimed to develop novel prediction algorithms using machine-learning, in addition to standard survival modelling, to predict premature all-cause mortality.

Methods: A prospective population cohort of 502,628 participants aged 40–69 years was recruited to the UK Biobank from 2006–2010 and followed up until 2016. Participants were assessed on a range of demographic, biometric, clinical and lifestyle factors. Mortality data coded by ICD-10 were obtained from linkage to the Office for National Statistics. Models were developed using deep learning, random forest and Cox regression. Calibration was assessed by comparing observed with predicted risks, and discrimination by the area under the receiver operating characteristic curve (AUC).

Findings: 14,418 deaths (2.9%) occurred over a total follow-up time of 3,508,454 person-years. A simple age and gender Cox model was the least predictive (AUC 0.689, 95% CI 0.681–0.699). A multivariate Cox regression model significantly improved discrimination by 6.2% (AUC 0.751, 95% CI 0.748–0.767); this and the following improvements are absolute differences in AUC. Relative to the multivariate Cox model, machine-learning algorithms further improved discrimination by 3.2% using random forest (AUC 0.783, 95% CI 0.776–0.791) and 3.9% using deep learning (AUC 0.790, 95% CI 0.783–0.797). Relative to the simple age and gender Cox regression model, these ML algorithms improved discrimination by 9.4% and 10.1% respectively. Random forest and deep learning achieved similar levels of discrimination with no significant difference. Machine-learning algorithms were well-calibrated, while Cox regression models consistently over-predicted risk.

Conclusions: Machine-learning significantly improved accuracy of prediction of premature all-cause mortality in this middle-aged population, compared to standard methods. This study illustrates the value of machine-learning for risk prediction within a traditional epidemiological study design, and how this approach might be reported to assist scientific verification.
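
The two evaluation measures named in the abstract, and the arithmetic behind the reported gains, can be illustrated with a short sketch. This is not the authors' code: the function names (`auc_gain_points`, `auc`, `observed_to_predicted`) are hypothetical, the toy labels and risks are invented for illustration and are not UK Biobank data, and the abstract does not say how the study itself estimated AUC or its confidence intervals. The only figures taken from the source are the four AUC point estimates.

```python
from itertools import product

# AUC point estimates as reported in the abstract.
REPORTED_AUC = {
    "cox_age_gender": 0.689,
    "cox_multivariate": 0.751,
    "random_forest": 0.783,
    "deep_learning": 0.790,
}


def auc_gain_points(model: str, baseline: str) -> float:
    """Absolute AUC difference, in percentage points, rounded to 1 decimal."""
    return round((REPORTED_AUC[model] - REPORTED_AUC[baseline]) * 100, 1)


def auc(labels, risks):
    """Rank-based AUC: the share of (death, survivor) pairs in which the
    death received the higher predicted risk; ties count as half."""
    events = [r for y, r in zip(labels, risks) if y == 1]
    non_events = [r for y, r in zip(labels, risks) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e, n in product(events, non_events)
    )
    return wins / (len(events) * len(non_events))


def observed_to_predicted(labels, risks):
    """Crude calibration check: observed event rate over mean predicted risk.
    A ratio below 1 means the model over-predicts risk on average."""
    observed = sum(labels) / len(labels)
    predicted = sum(risks) / len(risks)
    return observed / predicted


# Reproduce the gains quoted in the abstract from the reported AUCs.
for model, baseline in [
    ("cox_multivariate", "cox_age_gender"),
    ("random_forest", "cox_multivariate"),
    ("deep_learning", "cox_multivariate"),
    ("random_forest", "cox_age_gender"),
    ("deep_learning", "cox_age_gender"),
]:
    print(f"{model} vs {baseline}: +{auc_gain_points(model, baseline)} points")

# Toy example (invented data): 1 = died during follow-up, 0 = survived.
toy_labels = [0, 0, 0, 1, 0, 1]
toy_risks = [0.1, 0.2, 0.5, 0.4, 0.3, 0.9]
print("toy AUC:", auc(toy_labels, toy_risks))
print("toy observed/predicted:", round(observed_to_predicted(toy_labels, toy_risks), 3))
```

On the toy data, 7 of the 8 death and survivor pairs are ranked correctly, so the AUC is 0.875, while the observed-to-predicted ratio is about 0.83, the kind of average over-prediction the abstract reports for the Cox models. The pairwise AUC is quadratic in sample size, so it is suitable only for small illustrations; a cohort of this size would normally call for a sorted-rank implementation from an established library.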

Suggested Citation

  • Stephen F Weng & Luis Vaz & Nadeem Qureshi & Joe Kai, 2019. "Prediction of premature all-cause mortality: A prospective general population cohort study comparing machine-learning and standard epidemiological approaches," PLOS ONE, Public Library of Science, vol. 14(3), pages 1-22, March.
  • Handle: RePEc:plo:pone00:0214365
    DOI: 10.1371/journal.pone.0214365

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0214365
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0214365&type=printable
    Download Restriction: no



    Citations

    Cited by:

    1. George Papantonopoulos & Chryssa Delatola & Keiso Takahashi & Marja L Laine & Bruno G Loos, 2019. "Hidden noise in immunologic parameters might explain rapid progression in early-onset periodontitis," PLOS ONE, Public Library of Science, vol. 14(11), pages 1-14, November.
    2. Salvatore Tedesco & Martina Andrulli & Markus Åkerlund Larsson & Daniel Kelly & Antti Alamäki & Suzanne Timmons & John Barton & Joan Condell & Brendan O’Flynn & Anna Nordström, 2021. "Comparison of Machine Learning Techniques for Mortality Prediction in a Prospective Cohort of Older Adults," IJERPH, MDPI, vol. 18(23), pages 1-18, December.
    3. Qiufen Sun & Liyun Zhao & Yuxiang Yang & Yinqi Ding & Canqing Yu & Dianjianyi Sun & Yuanjie Pang & Pei Pei & Ling Yang & Yiping Chen & Huaidong Du & Ranran Du & Maxim Barnard & Junshi Chen & Zhengming, 2025. "A simulation study of the impact of population-wide lifestyle modifications on life expectancy in the Chinese population," Nature Communications, Nature, vol. 16(1), pages 1-11, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Emily J MacKay & Michael D Stubna & Corey Chivers & Michael E Draugelis & William J Hanson & Nimesh D Desai & Peter W Groeneveld, 2021. "Application of machine learning approaches to administrative claims data to predict clinical outcomes in medical and surgical patient populations," PLOS ONE, Public Library of Science, vol. 16(6), pages 1-14, June.
    2. Majd Oteibi & Adam Tamimi & Kaneez Abbas & Gabriel Tamimi & Danesh Khazaei & Hadi Khazaei, 2024. "Advancing Digital Health using AI and Machine Learning Solutions for Early Ultrasonic Detection of Breast Disorders in Women," International Journal of Research and Scientific Innovation, International Journal of Research and Scientific Innovation (IJRSI), vol. 11(11), pages 518-527, November.
    3. Syed Ibrar Hussain & Elena Toscano, 2025. "Enhancing Recognition and Categorization of Skin Lesions with Tailored Deep Convolutional Networks and Robust Data Augmentation Techniques," Mathematics, MDPI, vol. 13(9), pages 1-36, April.
    4. von Walter, Benjamin & Wentzel, Daniel & Raff, Stefan, 2023. "Should service firms introduce algorithmic advice to their existing customers? The moderating effect of service relationships," Journal of Retailing, Elsevier, vol. 99(2), pages 280-296.
    5. Mirza Rizwan Sajid & Bader A. Almehmadi & Waqas Sami & Mansour K. Alzahrani & Noryanti Muhammad & Christophe Chesneau & Asif Hanif & Arshad Ali Khan & Ahmad Shahbaz, 2021. "Development of Nonlaboratory-Based Risk Prediction Models for Cardiovascular Diseases Using Conventional and Machine Learning Approaches," IJERPH, MDPI, vol. 18(23), pages 1-16, November.
    6. Sidra Mehboob & Maryam Bukhari & Yaser Ali Shah & Salabat Khan & Muhammad Sharif, 2025. "Enhanced Skin Cancer Classification with MobileNetV3 and Morphological Preprocessing: A Deep Learning-Based Extension," International Journal of Innovations in Science & Technology, 50sea, vol. 7(7), pages 1-12, May.
    7. Han Li & Feng Tian, 2026. "Advancing Decision-Making through AI-Human Collaboration: A Systematic Review and Conceptual Framework," Group Decision and Negotiation, Springer, vol. 35(2), pages 1-24, June.
    8. Riccardo Zanardelli, 2025. "Navigating the safe harbor paradox in human-machine systems," Papers 2509.14057, arXiv.org, revised Jan 2026.
    9. repec:bjc:journl:v:12:y:2025:i:9:p:2881-2888 is not listed on IDEAS
    10. Lin Lu & Laurent Dercle & Binsheng Zhao & Lawrence H. Schwartz, 2021. "Deep learning for the prediction of early on-treatment response in metastatic colorectal cancer from serial medical imaging," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    11. Sangwon Chae & Sungjun Kwon & Donghyun Lee, 2018. "Predicting Infectious Disease Using Deep Learning and Big Data," IJERPH, MDPI, vol. 15(8), pages 1-20, July.
    12. Salvatore Tedesco & Martina Andrulli & Markus Åkerlund Larsson & Daniel Kelly & Antti Alamäki & Suzanne Timmons & John Barton & Joan Condell & Brendan O’Flynn & Anna Nordström, 2021. "Comparison of Machine Learning Techniques for Mortality Prediction in a Prospective Cohort of Older Adults," IJERPH, MDPI, vol. 18(23), pages 1-18, December.
    13. Kita-Wojciechowska Kinga & Kidziński Łukasz, 2019. "Google Street View image predicts car accident risk," Central European Economic Journal, Sciendo, vol. 6(53), pages 151-163, January.
    14. Zheng Yan & Wenqian Robertson & Yaosheng Lou & Tom W. Robertson & Sung Yong Park, 2021. "Finding leading scholars in mobile phone behavior: a mixed-method analysis of an emerging interdisciplinary field," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9499-9517, December.
    15. Songhee Cheon & Jungyoon Kim & Jihye Lim, 2019. "The Use of Deep Learning to Predict Stroke Patient Mortality," IJERPH, MDPI, vol. 16(11), pages 1-12, May.
    16. Marcus Buckmann & Andy Haldane & Anne-Caroline Hüser, 2021. "Comparing minds and machines: implications for financial stability," Oxford Review of Economic Policy, Oxford University Press and Oxford Review of Economic Policy Limited, vol. 37(3), pages 479-508.
    17. Ajay Dev & Sanjay Kumar Malik, 2021. "Artificial Bee Colony Optimized Deep Neural Network Model for Handling Imbalanced Stroke Data: ABC-DNN for Prediction of Stroke," International Journal of E-Health and Medical Communications (IJEHMC), IGI Global Scientific Publishing, vol. 12(5), pages 67-83, September.
    18. Sourov Ahmed & Marjan Akter Badhon & Mahmudul Hassan Maruf, 2025. "A Survey-Driven Ensemble Approach to Predicting Sovereign Debt Distress in Bangladesh," International Journal of Scientific Research and Modern Technology, Prasu Publications, vol. 4(10), pages 103-114.
    19. Freddy Gabbay & Rotem Lev Aharoni & Ori Schweitzer, 2022. "Deep Neural Network Memory Performance and Throughput Modeling and Simulation Framework," Mathematics, MDPI, vol. 10(21), pages 1-20, November.
    20. Matteo D’Antonio & Wilfredo G. Gonzalez Rivera & Robert A. Greenes & Melissa Gymrek & Kelly A. Frazer, 2025. "A highly accurate risk factor-based XGBoost multiethnic model for identifying patients with skin cancer," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
    21. Sebastian Gehrmann & Franck Dernoncourt & Yeran Li & Eric T Carlson & Joy T Wu & Jonathan Welt & John Foote Jr. & Edward T Moseley & David W Grant & Patrick D Tyler & Leo A Celi, 2018. "Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives," PLOS ONE, Public Library of Science, vol. 13(2), pages 1-19, February.
