
Hybrid Machine Learning Approach to Zero-Inflated Data Improves Accuracy of Dengue Prediction

Author

Listed:
  • Micanaldo Ernesto Francisco
  • Thaddeus M Carvajal
  • Kozo Watanabe

Abstract

Background: Spatiotemporal dengue forecasting using machine learning (ML) can contribute to the development of prevention and control strategies for impending dengue outbreaks. However, because cases are rare, training data for dengue incidence may contain a large share of zero values, which lowers prediction accuracy. This study had three aims: to understand how the spatiotemporal resolution of the data influences the accuracy of dengue incidence prediction with ML models, to understand how that influence differs between quantitative and qualitative predictions of dengue incidence, and to improve the accuracy of dengue incidence prediction with zero-inflated data.

Methodology: We predicted dengue incidence at six spatiotemporal resolutions and compared the prediction accuracy. Six ML algorithms were compared: generalized additive models, random forests, conditional inference forest, artificial neural networks, support vector machines and regression, and extreme gradient boosting. Data from 2009 to 2012 were used for training, and data from 2013 were used for model validation with quantitative and qualitative dengue variables. To address the inaccuracy in the quantitative prediction of dengue incidence due to zero-inflated data at fine spatiotemporal scales, we developed a hybrid approach in which the second-stage quantitative prediction is performed only when and where the first-stage qualitative model predicts the occurrence of dengue cases.

Principal findings: At higher resolutions, the dengue incidence data were zero-inflated, leaving too little information for ML to extract quantitative relationships between dengue incidence and environmental variables. Qualitative models, which treat dengue occurrence as a binary variable, eased the effect of the data distribution. Our novel hybrid approach of combining qualitative and quantitative predictions demonstrated high potential for predicting zero-inflated or rare phenomena, such as dengue.

Significance: Our research contributes valuable insights to the field of spatiotemporal dengue prediction and provides an alternative way to enhance prediction accuracy in zero-inflated data where hurdle or zero-inflated models cannot be applied.

Author summary: In our study, we tackled the complex challenge of predicting dengue fever outbreaks, a crucial task in epidemiology. Dengue prediction is complicated because it relies on the quality of the data, which may be affected by the temporal and spatial resolution. We explored different machine learning algorithms across various spatial resolutions (village, city and region) and temporal resolutions (weekly and monthly). A key obstacle was the high frequency of zero values in reported dengue cases, a common issue known as zero-inflated data. This phenomenon makes accurate prediction difficult, especially at finer resolutions. To overcome it, we first made qualitative predictions about the presence or absence of dengue cases. Then, in scenarios indicating disease presence, we estimated the magnitude of cases quantitatively. We designated this method the hybrid approach; it significantly enhanced prediction accuracy in zero-inflated data. The approach can be applied to continuous data where zero-inflated or hurdle models cannot be applied. Our findings have implications beyond dengue prediction, shedding light on the challenges of dealing with zero-inflated data in various real-world situations. By improving the understanding of these complexities, our research benefits scientists working in epidemiology and has practical applications in public health strategies, supporting more effective and targeted interventions.
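
The two-stage logic described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the study used ML algorithms such as random forests and gradient boosting, whereas this sketch substitutes a single-feature threshold rule for the qualitative stage and an ordinary least-squares line for the quantitative stage, purely to keep the example self-contained. All function names, the single predictor `x`, and the training numbers are hypothetical and made up for illustration; none of them come from the study.

```python
# Sketch of a hybrid (two-stage) predictor for zero-inflated incidence data.
# Stand-in models only: a threshold rule (stage 1) and a least-squares
# line (stage 2). Names and data are hypothetical, not from the paper.

def fit_occurrence_threshold(x, y):
    """Stage 1 (qualitative): choose the cut-off on x that best separates
    zero incidence from non-zero incidence in the training data."""
    present = [yi > 0 for yi in y]
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(x)):
        hits = sum((xi >= cut) == p for xi, p in zip(x, present))
        acc = hits / len(x)
        if acc > best_acc:  # keep the first cut-off with the best accuracy
            best_cut, best_acc = cut, acc
    return best_cut

def fit_positive_line(x, y):
    """Stage 2 (quantitative): least-squares line fitted only to rows
    with non-zero incidence, so the zeros do not distort the fit."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi > 0]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx

def hybrid_predict(x_new, cut, slope, intercept):
    """Run stage 2 only where stage 1 predicts occurrence; otherwise
    predict zero incidence."""
    return [
        max(0.0, slope * xi + intercept) if xi >= cut else 0.0
        for xi in x_new
    ]

# Made-up training data: half of the incidence values are zero.
x_train = [1, 2, 3, 4, 5, 6, 7, 8]
y_train = [0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 6.0, 8.0]

cut = fit_occurrence_threshold(x_train, y_train)
slope, intercept = fit_positive_line(x_train, y_train)
predictions = hybrid_predict([3, 9], cut, slope, intercept)
print(cut, slope, intercept, predictions)
```

In this toy setting, a single line fitted to all eight rows would be pulled toward the zeros and would predict non-zero incidence where none occurs; gating the quantitative stage on the qualitative stage avoids that. Any classifier and any regressor could be slotted into the same structure.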

Suggested Citation

  • Micanaldo Ernesto Francisco & Thaddeus M Carvajal & Kozo Watanabe, 2024. "Hybrid Machine Learning Approach to Zero-Inflated Data Improves Accuracy of Dengue Prediction," PLOS Neglected Tropical Diseases, Public Library of Science, vol. 18(10), pages 1-22, October.
  • Handle: RePEc:plo:pntd00:0012599
    DOI: 10.1371/journal.pntd.0012599

    Download full text from publisher

    File URL: https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0012599
    Download Restriction: no

    File URL: https://journals.plos.org/plosntds/article/file?id=10.1371/journal.pntd.0012599&type=printable
    Download Restriction: no

