
An evaluation of various data pre-processing techniques with machine learning models for water level prediction

Authors

  • Ervin Shan Khai Tiu

    (Universiti Tunku Abdul Rahman)

  • Yuk Feng Huang

    (Universiti Tunku Abdul Rahman)

  • Jing Lin Ng

    (UCSI University)

  • Nouar AlDahoul

    (Multimedia University)

  • Ali Najah Ahmed

    (University Tenaga Nasional (UNITEN))

  • Ahmed Elshafie

    (University of Malaya)

Abstract

Floods are the most frequent type of natural disaster. They destroy wildlife habitat, damage bridges, railways, roads, and property, and put millions of people at risk. Flood detection systems have therefore been developed to monitor changes in water level and raise an alarm when danger is imminent. River water level prediction is an important task in flood mitigation planning and floodplain management. Feeding raw rainfall series directly into machine learning (ML) regression methods usually does not yield sufficiently accurate predictions; the raw data should first be pre-processed to improve their quality before being passed to the prediction methods. This paper addresses the problem by applying five data pre-processing techniques, namely Variational Mode Decomposition (VMD), Bagging, Boosting, Bagging-VMD, and Boosting-VMD, to improve the quality of the input data and thereby the model accuracy. The five techniques were applied to the observed daily rainfall series of the Dungun river basin, Malaysia, for the Northeast Monsoon months (November to February) from 1996 to 2016. Two ML models, the artificial neural network (ANN) and support vector regression (SVR), served as the base models (Ori) and were used in conjunction with the pre-processing methods. The ML methods were compared with and without data pre-processing. Water level predictions from SVR and ANN combined with Boosting-VMD were superior to those from the base models (Ori) alone. The advantage of the enhanced models (founded on SVR and ANN, respectively) over the original SVR and ANN models is reflected in the performance statistics: root mean square error (RMSE) of (0.42, 0.20 vs 1.85, 1.82), mean absolute percentage error (MAPE) of (4.36, 2.82 vs 18.89, 22.56), mean absolute error (MAE) of (0.28, 0.16 vs 1.25, 1.41), and Nash–Sutcliffe efficiency coefficient (NSE) of (0.96, 0.99 vs 0.25, 0.27) for the respective models. Hydrographs, residual hydrographs, peak estimates, and box-and-whisker plots were also used to compare the pre-processing techniques. The experimental results showed that the Boosting and Boosting-VMD methods outperformed the other techniques. The Boosting-ANN model performed best at predicting river water levels, with the lowest RMSE (0.19), MAPE (2.72), and MAE (0.15) and the highest NSE (0.99).
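
For readers unfamiliar with the four performance statistics named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not the authors' code: the function names and the sample values are hypothetical, and it assumes MAPE is expressed in percent, which the abstract does not state explicitly.

```python
import math

def rmse(obs, pred):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def mape(obs, pred):
    """Mean absolute percentage error, in percent (assumed unit).
    Undefined if any observed value is zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the residual sum of
    squares to the sum of squared deviations of the observations from their mean.
    A value of 1 is a perfect fit; 0 is no better than predicting the mean."""
    mean_obs = sum(obs) / len(obs)
    residual_ss = sum((o - p) ** 2 for o, p in zip(obs, pred))
    total_ss = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual_ss / total_ss

# Hypothetical water levels for illustration only; not data from the paper.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(rmse(observed, predicted))  # 0.5
print(mae(observed, predicted))   # 0.5
print(nse(observed, predicted))   # 0.95
```

Lower RMSE, MAE, and MAPE indicate smaller errors, while NSE approaches 1 as predictions track the observations more closely, which is the direction of improvement reported for the enhanced models.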

Suggested Citation

  • Ervin Shan Khai Tiu & Yuk Feng Huang & Jing Lin Ng & Nouar AlDahoul & Ali Najah Ahmed & Ahmed Elshafie, 2022. "An evaluation of various data pre-processing techniques with machine learning models for water level prediction," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 110(1), pages 121-153, January.
  • Handle: RePEc:spr:nathaz:v:110:y:2022:i:1:d:10.1007_s11069-021-04939-8
    DOI: 10.1007/s11069-021-04939-8



