Printed from https://ideas.repec.org/a/plo/pdig00/0000276.html

AutoPrognosis 2.0: Democratizing diagnostic and prognostic modeling in healthcare with automated machine learning

Authors
  • Fergus Imrie
  • Bogdan Cebere
  • Eoin F McKinney
  • Mihaela van der Schaar

Abstract

Diagnostic and prognostic models are increasingly important in medicine and inform many clinical decisions. Recently, machine learning approaches have shown improvement over conventional modeling techniques by better capturing complex interactions between patient covariates in a data-driven manner. However, the use of machine learning introduces technical and practical challenges that have thus far restricted widespread adoption of such techniques in clinical settings. To address these challenges and empower healthcare professionals, we present an open-source machine learning framework, AutoPrognosis 2.0, to facilitate the development of diagnostic and prognostic models. AutoPrognosis leverages state-of-the-art advances in automated machine learning to develop optimized machine learning pipelines, incorporates model explainability tools, and enables deployment of clinical demonstrators, without requiring significant technical expertise. To demonstrate AutoPrognosis 2.0, we provide an illustrative application where we construct a prognostic risk score for diabetes using the UK Biobank, a prospective study of 502,467 individuals. The models produced by our automated framework achieve greater discrimination for diabetes than expert clinical risk scores. We have implemented our risk score as a web-based decision support tool, which can be publicly accessed by patients and clinicians. By open-sourcing our framework as a tool for the community, we aim to provide clinicians and other medical practitioners with an accessible resource to develop new risk scores, personalized diagnostics, and prognostics using machine learning techniques.

Software: https://github.com/vanderschaarlab/AutoPrognosis

Author summary: Previous studies have reported promising applications of machine learning (ML) approaches in healthcare. However, there remain significant challenges to using ML for diagnostic and prognostic modeling, particularly for non-ML experts, that currently prevent broader adoption of these approaches. We developed an open-source tool, AutoPrognosis 2.0, to address these challenges and make modern statistical and machine learning methods available to expert and non-expert ML users. AutoPrognosis configures and optimizes ML pipelines using automated machine learning to develop powerful predictive models, while also providing interpretability methods to allow users to understand and debug these models. This study illustrates the application of AutoPrognosis to diabetes risk prediction using data from UK Biobank. The risk score developed using AutoPrognosis outperforms existing risk scores and has been implemented as a web-based decision support tool that can be publicly accessed by patients and clinicians. This study suggests that AutoPrognosis 2.0 can be used by healthcare experts to create new clinical tools and predictive pipelines across various clinical outcomes, employing advanced machine learning techniques.
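
Illustrative note: the abstract reports "greater discrimination" than existing clinical risk scores. This listing does not say which metric was used; discrimination is commonly summarized by the area under the ROC curve (AUROC), the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case. The sketch below is a minimal plain-Python illustration under that assumption. It does not use the AutoPrognosis API, and the function name, variable names, and toy data are all hypothetical.

```python
def auroc(labels, scores):
    """Concordance estimate of AUROC: the fraction of (case, non-case) pairs
    in which the case has the higher score; tied pairs count as one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical toy data (not from the study): 1 = developed the outcome.
outcomes = [0, 0, 1, 0, 1, 1]
baseline_score = [0.2, 0.6, 0.5, 0.3, 0.7, 0.4]   # stand-in for an existing risk score
candidate_score = [0.1, 0.4, 0.6, 0.2, 0.9, 0.5]  # stand-in for a newly fitted model

print("baseline AUROC: ", round(auroc(outcomes, baseline_score), 3))
print("candidate AUROC:", round(auroc(outcomes, candidate_score), 3))
```

On this toy data the candidate score ranks every case above every non-case, while the baseline misorders two of the nine pairs. A real comparison would be made on held-out data and reported with uncertainty estimates.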

Suggested Citation

  • Fergus Imrie & Bogdan Cebere & Eoin F McKinney & Mihaela van der Schaar, 2023. "AutoPrognosis 2.0: Democratizing diagnostic and prognostic modeling in healthcare with automated machine learning," PLOS Digital Health, Public Library of Science, vol. 2(6), pages 1-21, June.
  • Handle: RePEc:plo:pdig00:0000276
    DOI: 10.1371/journal.pdig.0000276

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000276
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000276&type=printable
    Download Restriction: no
