Prediction of hepatitis E using machine learning models

Author

Listed:

Yanhui Guo
Yi Feng
Fuli Qu
Li Zhang
Bingyu Yan
Jingjing Lv

Abstract

Background: Accurate and reliable predictions of infectious disease can be valuable to public health organizations that plan interventions to decrease or prevent disease transmission. A great variety of models have been developed for this task, but their performance varies across data series. Hepatitis E, an acute liver disease, has been a major public health problem. Which model is most appropriate for predicting the incidence of hepatitis E? In this paper, three methods are applied and their performance is compared.

Methods: Autoregressive integrated moving average (ARIMA), support vector machine (SVM) and long short-term memory (LSTM) recurrent neural network models were adopted and compared. ARIMA was implemented in Python with the statsmodels library. SVM was implemented in MATLAB with the LIBSVM library. We designed the LSTM model ourselves with Keras, a deep learning library. To tackle the overfitting caused by the limited number of training samples, we adopted dropout and regularization strategies in the LSTM model. The experimental data were the monthly incidence and cases number of hepatitis E from January 2005 to December 2017 in Shandong province, China. We used the data from July 2015 to December 2017 to validate the models and the rest as the training set. Three metrics were applied to compare model performance: root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE).

Results: Based on analysis of the data, we selected ARIMA(1, 1, 1) as the monthly incidence prediction model and ARIMA(3, 1, 2) as the cases number prediction model. Cross-validation and grid search were used to optimize the SVM parameters: the penalty coefficient C and kernel function parameter g were set to 8 and 0.125 for incidence prediction, and to 22 and 0.01 for cases number prediction. The LSTM has 4 nodes, with the dropout and L2 regularization parameters set to 0.15 and 0.001, respectively. In terms of RMSE, ARIMA, SVM and LSTM obtained 0.022, 0.0204 and 0.01 for incidence prediction, and 22.25, 20.0368 and 11.75 for cases number prediction. In the same model order, MAPE was 23.5%, 21.7% and 15.08% for incidence prediction, and 23.6%, 21.44% and 13.6% for cases number prediction. MAE was 0.018, 0.0167 and 0.011 for incidence prediction, and 18.003, 16.5815 and 9.984 for cases number prediction.

Conclusions: Comparing ARIMA, SVM and LSTM, we found that the nonlinear models (SVM, LSTM) outperform the linear model (ARIMA). LSTM obtained the best performance on all three metrics (RMSE, MAPE, MAE). Hence, of the three models, LSTM is the most suitable for predicting hepatitis E monthly incidence and cases number.
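As a minimal illustration of the evaluation setup described in the abstract, the Python sketch below splits a January 2005 to December 2017 monthly series at July 2015 and computes RMSE, MAPE and MAE from their standard definitions. The function names and the toy numbers are assumptions made for illustration only; they are not the authors' code or data.

```python
import math

def month_range(start, end):
    """Return (year, month) tuples from start to end, inclusive."""
    months = []
    year, month = start
    while (year, month) <= end:
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

def mae(actual, predicted):
    """Mean absolute error."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Chronological split matching the periods stated in the abstract.
months = month_range((2005, 1), (2017, 12))
train = [m for m in months if m < (2015, 7)]    # January 2005 to June 2015
valid = [m for m in months if m >= (2015, 7)]   # July 2015 to December 2017
print(len(train), len(valid))  # 126 30

# Toy values (hypothetical, not from the paper) to exercise the metrics.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(rmse(actual, predicted), mape(actual, predicted), mae(actual, predicted))
```

Splitting by date rather than at random keeps every validation month later than every training month, which is the usual choice for time series forecasts.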

Suggested Citation

Yanhui Guo & Yi Feng & Fuli Qu & Li Zhang & Bingyu Yan & Jingjing Lv, 2020. "Prediction of hepatitis E using machine learning models," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-12, September.

Handle: RePEc:plo:pone00:0237750
DOI: 10.1371/journal.pone.0237750

References listed on IDEAS

Anna L Buczak & Benjamin Baugher & Linda J Moniz & Thomas Bagley & Steven M Babin & Erhan Guven, 2018. "Ensemble method for dengue prediction," PLOS ONE, Public Library of Science, vol. 13(1), pages 1-23, January.

Citations

Cited by:

Tassallah Abdullahi & Geoff Nitschke & Neville Sweijd, 2022. "Predicting diarrhoea outbreaks with climate change," PLOS ONE, Public Library of Science, vol. 17(4), pages 1-18, April.
Minghui Wang & Tong Li, 2025. "Pest and Disease Prediction and Management for Sugarcane Using a Hybrid Autoregressive Integrated Moving Average—A Long Short-Term Memory Model," Agriculture, MDPI, vol. 15(5), pages 1-16, February.
Daren Zhao & Huiwu Zhang & Qing Cao & Zhiyi Wang & Sizhang He & Minghua Zhou & Ruihua Zhang, 2022. "The research of ARIMA, GM(1,1), and LSTM models for prediction of TB cases in China," PLOS ONE, Public Library of Science, vol. 17(2), pages 1-18, February.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Zhichao Li, 2022. "Forecasting Weekly Dengue Cases by Integrating Google Earth Engine-Based Risk Predictor Generation and Google Colab-Based Deep Learning Modeling in Fortaleza and the Federal District, Brazil," IJERPH, MDPI, vol. 19(20), pages 1-16, October.
Panja, Madhurima & Chakraborty, Tanujit & Nadim, Sk Shahid & Ghosh, Indrajit & Kumar, Uttam & Liu, Nan, 2023. "An ensemble neural network approach to forecast Dengue outbreak based on climatic condition," Chaos, Solitons & Fractals, Elsevier, vol. 167(C).
Soudeep Deb & Sougata Deb, 2022. "An ensemble method for early prediction of dengue outbreak," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(1), pages 84-101, January.
Prashant Rangarajan & Sandeep K Mody & Madhav Marathe, 2019. "Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data," PLOS Computational Biology, Public Library of Science, vol. 15(11), pages 1-24, November.
