
Severity Index for Suspected Arbovirus (SISA): Machine learning for accurate prediction of hospitalization in subjects suspected of arboviral infection

Author

Listed:
  • Rachel Sippy
  • Daniel F Farrell
  • Daniel A Lichtenstein
  • Ryan Nightingale
  • Megan A Harris
  • Joseph Toth
  • Paris Hantztidiamantis
  • Nicholas Usher
  • Cinthya Cueva Aponte
  • Julio Barzallo Aguilar
  • Anthony Puthumana
  • Christina D Lupone
  • Timothy Endy
  • Sadie J Ryan
  • Anna M Stewart Ibarra

Abstract

Background: Dengue, chikungunya, and Zika are arboviruses of major global health concern. Decisions regarding the clinical management of suspected arboviral infection are challenging in resource-limited settings, particularly when deciding on patient hospitalization. The objective of this study was to determine if hospitalization of individuals with suspected arboviral infections could be predicted using subject intake data.

Methodology/Principal findings: Two prediction models were developed using data from a surveillance study in Machala, a city in southern coastal Ecuador with a high burden of arboviral infections. Data were obtained from subjects who presented at sentinel medical centers with suspected arboviral infection (November 2013 to September 2017). The first prediction model, called the Severity Index for Suspected Arbovirus (SISA), used only demographic and symptom data. The second prediction model, called the Severity Index for Suspected Arbovirus with Laboratory (SISAL), incorporated laboratory data. These models were selected by comparing the prediction ability of seven machine learning algorithms; the area under the receiver operating characteristic curve (AUC) from the prediction of a test dataset was used to select the final algorithm for each model. After eliminating those with missing data, the SISA dataset had 534 subjects, and the SISAL dataset had 98 subjects. For SISA, the best prediction algorithm was the generalized boosting model, with an AUC of 0.91. For SISAL, the best prediction algorithm was the elastic net, with an AUC of 0.94. A sensitivity analysis revealed that SISA and SISAL are not directly comparable to one another.

Conclusions/Significance: Both SISA and SISAL were able to predict arbovirus hospitalization with a high degree of accuracy in our dataset. These algorithms will need to be tested and validated on new data from future patients. Machine learning is a powerful prediction tool and provides an excellent option for new management tools and clinical assessment of arboviral infection.

Author summary: Patient triage is a critical decision for clinicians. Patients with suspected arbovirus infection are difficult to diagnose as symptoms can be vague and molecular testing can be expensive or unavailable. Determining whether these patients should be hospitalized or not can be challenging, especially in resource-limited settings. Our study included data from 543 subjects with a diagnosis of suspected dengue, chikungunya, or Zika infection. Using a machine learning approach, we tested the ability of seven algorithms to predict hospitalization status based on the signs, symptoms, and laboratory data that would be available to a clinician at patient intake. Using only signs and symptoms, we were able to predict hospitalization with high accuracy (94%). Including laboratory data also resulted in highly accurate prediction of hospitalization (92%). This tool should be tested in future studies with new subject data. Upon further development, we envision a simple mobile application to aid in the decision-making process for clinicians in areas with limited resources.
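
The selection step described in the abstract (score each candidate algorithm on a held-out test set and keep the one with the highest area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' code: the function names `roc_auc` and `select_best_model`, the model labels, and the toy labels and scores are all hypothetical, and only two stand-in candidates are shown instead of seven.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    This equals the probability that a randomly chosen positive case
    (1 = hospitalized) receives a higher score than a randomly chosen
    negative case (0 = not hospitalized), counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def select_best_model(
    labels: Sequence[int], scores_by_model: Dict[str, Sequence[float]]
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the model with the highest test-set AUC and all AUCs."""
    aucs = {name: roc_auc(labels, s) for name, s in scores_by_model.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs


# Hypothetical test-set outcomes and predicted probabilities (illustration only).
y_test = [1, 0, 1, 0, 1, 0]
candidate_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.4, 0.7, 0.1],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.8, 0.3],
}
best_name, auc_table = select_best_model(y_test, candidate_scores)
print(best_name, auc_table)
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for datasets of the size reported here; a rank-based implementation from an established library would be the usual choice in practice.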

Suggested Citation

  • Rachel Sippy & Daniel F Farrell & Daniel A Lichtenstein & Ryan Nightingale & Megan A Harris & Joseph Toth & Paris Hantztidiamantis & Nicholas Usher & Cinthya Cueva Aponte & Julio Barzallo Aguilar & An, 2020. "Severity Index for Suspected Arbovirus (SISA): Machine learning for accurate prediction of hospitalization in subjects suspected of arboviral infection," PLOS Neglected Tropical Diseases, Public Library of Science, vol. 14(2), pages 1-20, February.
  • Handle: RePEc:plo:pntd00:0007969
    DOI: 10.1371/journal.pntd.0007969

    Download full text from publisher

    File URL: https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0007969
    Download Restriction: no

    File URL: https://journals.plos.org/plosntds/article/file?id=10.1371/journal.pntd.0007969&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Kuhn, Max, 2008. "Building Predictive Models in R Using the caret Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i05).
    2. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    3. Jimena Barbeito-Andrés & Lavínia Schuler-Faccini & Patricia Pestana Garcez, 2018. "Why is congenital Zika syndrome asymmetrically distributed among human populations?," PLOS Biology, Public Library of Science, vol. 16(8), pages 1-11, August.
    4. Simon N. Wood, 2003. "Thin plate regression splines," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(1), pages 95-114, February.
    5. Simon N. Wood, 2011. "Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 73(1), pages 3-36, January.
    6. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    7. Karatzoglou, Alexandros & Smola, Alexandros & Hornik, Kurt & Zeileis, Achim, 2004. "kernlab - An S4 Package for Kernel Methods in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 11(i09).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bellotti, Anthony & Brigo, Damiano & Gambetti, Paolo & Vrins, Frédéric, 2021. "Forecasting recovery rates on non-performing loans with machine learning," International Journal of Forecasting, Elsevier, vol. 37(1), pages 428-444.
    2. Maria-Carmen García-Centeno & Román Mínguez-Salido & Raúl del Pozo-Rubio, 2021. "The Classification of Profiles of Financial Catastrophe Caused by Out-of-Pocket Payments: A Methodological Approach," Mathematics, MDPI, vol. 9(11), pages 1-20, May.
    3. Fitzpatrick, Trevor & Mues, Christophe, 2016. "An empirical comparison of classification algorithms for mortgage default prediction: evidence from a distressed mortgage market," European Journal of Operational Research, Elsevier, vol. 249(2), pages 427-439.
    4. Paul Ghelasi & Florian Ziel, 2024. "From day-ahead to mid and long-term horizons with econometric electricity price forecasting models," Papers 2406.00326, arXiv.org, revised Aug 2024.
    5. Longhi, Christian & Musolesi, Antonio & Baumont, Catherine, 2014. "Modeling structural change in the European metropolitan areas during the process of economic integration," Economic Modelling, Elsevier, vol. 37(C), pages 395-407.
    6. Štefan Lyócsa & Petra Vašaničová & Branka Hadji Misheva & Marko Dávid Vateha, 2022. "Default or profit scoring credit systems? Evidence from European and US peer-to-peer lending markets," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-21, December.
    7. Roberto Basile & Luigi Benfratello & Davide Castellani, 2012. "Geoadditive models for regional count data: an application to industrial location," ERSA conference papers ersa12p83, European Regional Science Association.
    8. E. Zanini & E. Eastoe & M. J. Jones & D. Randell & P. Jonathan, 2020. "Flexible covariate representations for extremes," Environmetrics, John Wiley & Sons, Ltd., vol. 31(5), August.
    9. Ji, Shujuan & Liu, Xiaojie & Wang, Yuanqing, 2024. "The role of road infrastructures in the usage of bikeshare and private bicycle," Transport Policy, Elsevier, vol. 149(C), pages 234-246.
    10. Van Belle, Jente & Guns, Tias & Verbeke, Wouter, 2021. "Using shared sell-through data to forecast wholesaler demand in multi-echelon supply chains," European Journal of Operational Research, Elsevier, vol. 288(2), pages 466-479.
    11. Ronald E. Gangnon & Natasha K. Stout & Oguzhan Alagoz & John M. Hampton & Brian L. Sprague & Amy Trentham-Dietz, 2018. "Contribution of Breast Cancer to Overall Mortality for US Women," Medical Decision Making, vol. 38(1_suppl), pages 24-31, April.
    12. Marra, Giampiero & Wood, Simon N., 2011. "Practical variable selection for generalized additive models," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2372-2387, July.
    13. Andrea S Martinez-Vernon & James A Covington & Ramesh P Arasaradnam & Siavash Esfahani & Nicola O’Connell & Ioannis Kyrou & Richard S Savage, 2018. "An improved machine learning pipeline for urinary volatiles disease detection: Diagnosing diabetes," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-20, September.
    14. Jun Wang & Jinyong Huang & Yunlong Hu & Qianwen Guo & Shasha Zhang & Jinglin Tian & Yanqin Niu & Ling Ji & Yuzhong Xu & Peijun Tang & Yaqin He & Yuna Wang & Shuya Zhang & Hao Yang & Kang Kang & Xinchu, 2024. "Terminal modifications independent cell-free RNA sequencing enables sensitive early cancer detection and classification," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    15. Basile, Roberto & Durbán, María & Mínguez, Román & María Montero, Jose & Mur, Jesús, 2014. "Modeling regional economic dynamics: Spatial dependence, spatial heterogeneity and nonlinearities," Journal of Economic Dynamics and Control, Elsevier, vol. 48(C), pages 229-245.
    16. Yagli, Gokhan Mert & Yang, Dazhi & Srinivasan, Dipti, 2019. "Automatic hourly solar forecasting using machine learning models," Renewable and Sustainable Energy Reviews, Elsevier, vol. 105(C), pages 487-498.
    17. Feuillet, Thierry & Bulteau, Julie & Dantan, Sophie, 2021. "Modelling context-specific relationships between neighbourhood socioeconomic disadvantage and private car use," Journal of Transport Geography, Elsevier, vol. 93(C).
    18. Shailendra Gurjar & Usha Ananthakumar, 2023. "The economics of art: price determinants and returns on investment in Indian paintings," International Journal of Social Economics, Emerald Group Publishing Limited, vol. 50(6), pages 839-859, January.
    19. Distaso, Walter & Roccazzella, Francesco & Vrins, Frédéric, 2023. "Business cycle and realized losses in the consumer credit industry," LIDAM Discussion Papers LFIN 2023007, Université catholique de Louvain, Louvain Finance (LFIN).
    20. Gressani, Oswaldo & Lambert, Philippe, 2021. "Laplace approximations for fast Bayesian inference in generalized additive models based on P-splines," Computational Statistics & Data Analysis, Elsevier, vol. 154(C).
