
Training and testing of a gradient boosted machine learning model to predict adverse outcome in patients presenting to emergency departments with suspected covid-19 infection in a middle-income setting

Author

Listed:
  • Gordon Ward Fuller
  • Madina Hasan
  • Peter Hodkinson
  • David McAlpine
  • Steve Goodacre
  • Peter A Bath
  • Laura Sbaffi
  • Yasein Omer
  • Lee Wallis
  • Carl Marincowitz

Abstract

COVID-19 infection rates remain high in South Africa. Clinical prediction models may be helpful for rapid triage, and for supporting clinical decision making, in patients with suspected COVID-19 infection. The Western Cape, South Africa, has integrated electronic health care data, facilitating large-scale linked routine datasets. The aim of this study was to develop a machine learning model, suitable for use in a middle-income setting, to predict adverse outcome in patients presenting with suspected COVID-19. A retrospective cohort study was conducted using linked routine data from patients presenting with suspected COVID-19 infection to public-sector emergency departments (EDs) in the Western Cape, South Africa, between 27th August 2020 and 31st October 2021. The primary outcome was death or critical care admission at 30 days. An XGBoost machine learning model was trained and internally tested using split-sample validation. External validation was performed in 3 test cohorts: Western Cape patients presenting during the Omicron COVID-19 wave, a UK cohort during the ancestral COVID-19 wave, and a Sudanese cohort during ancestral and Eta waves. A total of 282,051 cases were included in a complete case training dataset. The prevalence of 30-day adverse outcome was 4.0%. The most important features for predicting adverse outcome were the requirement for supplemental oxygen, peripheral oxygen saturations, level of consciousness and age. Internal validation using split-sample test data revealed excellent discrimination (C-statistic 0.91, 95% CI 0.90 to 0.91) and calibration (CITL of 1.05). The model achieved C-statistics of 0.84 (95% CI 0.84 to 0.85), 0.72 (95% CI 0.71 to 0.73), and 0.62 (95% CI 0.59 to 0.65) in the Omicron, UK, and Sudanese test cohorts, respectively. Results were materially unchanged in sensitivity analyses examining missing data.
An XGBoost machine learning model achieved good discrimination and calibration in prediction of adverse outcome in patients presenting with suspected COVID-19 to Western Cape EDs. Performance was reduced in temporal and geographical external validation.

Author summary: The coronavirus disease 2019 (COVID-19) pandemic continues, with ongoing high infection rates. Clinical prediction models are tools that compute the risk of a given patient outcome based on a set of individual characteristics. Such models may be helpful for rapid triage, and for supporting clinical decision making, in patients with suspected COVID-19 infection. In machine learning, data are provided to a computer algorithm to produce a mathematical model for predicting future outcomes, such as a clinical prediction model. We developed a machine learning algorithm using data from a large number of patients with suspected COVID-19 infection from the Western Cape, South Africa, during the initial pandemic wave there. We then tested it in three other groups of patients: Western Cape patients presenting during the Omicron COVID-19 wave, a UK cohort during the ancestral COVID-19 wave, and a Sudanese cohort during ancestral and Eta waves. We found that the most important features for predicting adverse outcome were the requirement for supplemental oxygen, peripheral oxygen saturations, level of consciousness and age. Our model performed well in Western Cape patients during the initial COVID-19 pandemic wave: it was good at identifying patients who subsequently died or required intensive care treatment. However, performance was reduced in the other settings.
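
The C-statistic reported above measures discrimination: the probability that a randomly chosen patient who had the adverse outcome was assigned a higher predicted risk than a randomly chosen patient who did not. The following is a minimal illustrative sketch of that definition in plain Python, not the authors' code. The function name `c_statistic` and the toy outcomes and risks are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Concordance (C) statistic for a binary outcome.

    Counts, over every (outcome, no-outcome) pair of patients, how often the
    patient with the outcome received the higher predicted risk.
    Tied predictions count as half a concordant pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the outcome")

    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data: 1 = death or critical care admission at 30 days.
outcomes = [0, 0, 1, 0, 1, 0, 1, 0]
risks = [0.02, 0.10, 0.35, 0.05, 0.80, 0.40, 0.30, 0.01]

toy_c = c_statistic(outcomes, risks)
print(f"C-statistic on toy data: {toy_c:.3f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. This pairwise loop is quadratic in the number of patients, so for a cohort the size of the one described here a rank-based implementation would be the practical choice.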

Suggested Citation

  • Gordon Ward Fuller & Madina Hasan & Peter Hodkinson & David McAlpine & Steve Goodacre & Peter A Bath & Laura Sbaffi & Yasein Omer & Lee Wallis & Carl Marincowitz, 2023. "Training and testing of a gradient boosted machine learning model to predict adverse outcome in patients presenting to emergency departments with suspected covid-19 infection in a middle-income setting," PLOS Digital Health, Public Library of Science, vol. 2(9), pages 1-18, September.
  • Handle: RePEc:plo:pdig00:0000309
    DOI: 10.1371/journal.pdig.0000309

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000309
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000309&type=printable
    Download Restriction: no

