Printed from https://ideas.repec.org/a/spr/nathaz/v110y2022i1d10.1007_s11069-021-04939-8.html

An evaluation of various data pre-processing techniques with machine learning models for water level prediction

Author

Listed:
  • Ervin Shan Khai Tiu

    (Universiti Tunku Abdul Rahman)

  • Yuk Feng Huang

    (Universiti Tunku Abdul Rahman)

  • Jing Lin Ng

    (UCSI University)

  • Nouar AlDahoul

    (Multimedia University)

  • Ali Najah Ahmed

    (Universiti Tenaga Nasional (UNITEN))

  • Ahmed Elshafie

    (University of Malaya)

Abstract

Floods are the most frequent type of natural disaster. They destroy wildlife habitat, damage bridges, railways, roads, and property, and put millions of people at risk. As such, flood detection systems have been developed to monitor changes in water level and raise an alarm should there be imminent danger. River water level prediction is a significant task in flood mitigation planning and floodplain management. Using raw rainfall series data directly with machine learning (ML) regression methods usually does not yield sufficiently good prediction accuracy. The raw data should be pre-processed using specific techniques to enhance their quality prior to being applied to the prediction methods. This paper addresses the stated problem by utilizing various data pre-processing techniques, namely Variational Mode Decomposition (VMD), Bagging, Boosting, Bagging-VMD, and Boosting-VMD, to enhance the quality of the input data and thereby improve model accuracy. The five proposed pre-processing techniques were applied to the observed daily rainfall series of the Dungun river basin, Malaysia, for the Northeast Monsoon period (November to February) from 1996 to 2016. Two machine learning models, the artificial neural network (ANN) and the support vector regression (SVR), served as the base models (Ori) and were used in conjunction with the data pre-processing methods. The ML methods were compared with and without data pre-processing. Prediction of water levels with SVR and ANN combined with Boosting-VMD was found to be superior to the results obtained with the base original models (Ori) alone. The advantage of the enhanced models (founded on SVR and ANN, respectively) over the original models (SVR and ANN) is best reflected in the performance statistics.
Numerical results in terms of root mean square error (RMSE) of (0.42, 0.20 vs 1.85, 1.82), mean absolute percentage error (MAPE) of (4.36, 2.82 vs 18.89, 22.56), mean absolute error (MAE) of (0.28, 0.16 vs 1.25, 1.41), and Nash–Sutcliffe efficiency coefficient (NSE) of (0.96, 0.99 vs 0.25, 0.27) were obtained for the respective models. Additionally, data visualization graphs such as hydrographs, residual hydrographs, peak estimates, and box-and-whisker plots were used to compare the data pre-processing techniques. The experimental results showed that both the Boosting and the Boosting-VMD methods outperformed the other techniques. The Boosting-ANN model was found to be the best model for predicting river water levels, with the lowest RMSE (0.19), MAPE (2.72), and MAE (0.15) and the highest NSE (0.99).
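
For readers unfamiliar with the four performance statistics named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not the authors' code. The abstract does not state the exact formulas used, so the percentage scaling of MAPE, the function names, and the sample values `observed` and `predicted` are assumptions made purely for illustration.

```python
import math


def rmse(obs, pred):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error, in percent.

    Assumes no observed value is zero (division by the observation).
    """
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the residual sum of
    squares to the sum of squared deviations of the observations from their mean.
    A value of 1 is a perfect fit; 0 means no better than predicting the mean."""
    mean_obs = sum(obs) / len(obs)
    residual_ss = sum((o - p) ** 2 for o, p in zip(obs, pred))
    total_ss = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual_ss / total_ss


# Hypothetical water levels (illustrative values only, not data from the paper).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(rmse(observed, predicted))  # 0.5
print(mae(observed, predicted))   # 0.5
print(nse(observed, predicted))   # 0.95
```

Lower RMSE, MAE, and MAPE indicate a closer fit, whereas NSE improves as it approaches 1, which is why the abstract reports the lowest error values alongside the highest NSE.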

Suggested Citation

  • Ervin Shan Khai Tiu & Yuk Feng Huang & Jing Lin Ng & Nouar AlDahoul & Ali Najah Ahmed & Ahmed Elshafie, 2022. "An evaluation of various data pre-processing techniques with machine learning models for water level prediction," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 110(1), pages 121-153, January.
  • Handle: RePEc:spr:nathaz:v:110:y:2022:i:1:d:10.1007_s11069-021-04939-8
    DOI: 10.1007/s11069-021-04939-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11069-021-04939-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Wen-chuan Wang & Kwok-wing Chau & Dong-mei Xu & Xiao-Yun Chen, 2015. "Improving Forecasting Accuracy of Annual Runoff Time Series Using ARIMA Based on EEMD Decomposition," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 29(8), pages 2655-2675, June.
    2. Lahmiri, Salim, 2015. "Long memory in international financial markets trends and short movements during 2008 financial crisis based on variational mode decomposition and detrended fluctuation analysis," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 437(C), pages 130-138.
    3. Taymoor Awchi, 2014. "River Discharges Forecasting In Northern Iraq Using Different ANN Techniques," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 28(3), pages 801-814, February.
    4. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    5. Bahrudin Hrnjica & Ognjen Bonacci, 2019. "Lake Level Prediction using Feed Forward and Recurrent Neural Networks," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 33(7), pages 2471-2484, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mansoor, Umer & Jamal, Arshad & Su, Junbiao & Sze, N.N. & Chen, Anthony, 2023. "Investigating the risk factors of motorcycle crash injury severity in Pakistan: Insights and policy recommendations," Transport Policy, Elsevier, vol. 139(C), pages 21-38.
    2. Lahmiri, Salim & Bekiros, Stelios, 2017. "Disturbances and complexity in volatility time series," Chaos, Solitons & Fractals, Elsevier, vol. 105(C), pages 38-42.
    3. Bissan Ghaddar & Ignacio Gómez-Casares & Julio González-Díaz & Brais González-Rodríguez & Beatriz Pateiro-López & Sofía Rodríguez-Ballesteros, 2023. "Learning for Spatial Branching: An Algorithm Selection Approach," INFORMS Journal on Computing, INFORMS, vol. 35(5), pages 1024-1043, September.
    4. Shahzad, Syed Jawad Hussain & Nor, Safwan Mohd & Kumar, Ronald Ravinesh & Mensi, Walid, 2017. "Interdependence and contagion among industry-level US credit markets: An application of wavelet and VMD based copula approaches," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 466(C), pages 310-324.
    5. Akash Malhotra, 2018. "A hybrid econometric-machine learning approach for relative importance analysis: Prioritizing food policy," Papers 1806.04517, arXiv.org, revised Aug 2020.
    6. Kagiso Samuel More & Christian Wolkersdorfer, 2022. "Predicting and Forecasting Mine Water Parameters Using a Hybrid Intelligent System," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 36(8), pages 2813-2826, June.
    7. Nahushananda Chakravarthy H G & Karthik M Seenappa & Sujay Raghavendra Naganna & Dayananda Pruthviraja, 2023. "Machine Learning Models for the Prediction of the Compressive Strength of Self-Compacting Concrete Incorporating Incinerated Bio-Medical Waste Ash," Sustainability, MDPI, vol. 15(18), pages 1-22, September.
    8. Tim Voigt & Martin Kohlhase & Oliver Nelles, 2021. "Incremental DoE and Modeling Methodology with Gaussian Process Regression: An Industrially Applicable Approach to Incorporate Expert Knowledge," Mathematics, MDPI, vol. 9(19), pages 1-26, October.
    9. Wen, Shaoting & Buyukada, Musa & Evrendilek, Fatih & Liu, Jingyong, 2020. "Uncertainty and sensitivity analyses of co-combustion/pyrolysis of textile dyeing sludge and incense sticks: Regression and machine-learning models," Renewable Energy, Elsevier, vol. 151(C), pages 463-474.
    10. Rana Muhammad Adnan & Zhongmin Liang & Xiaohui Yuan & Ozgur Kisi & Muhammad Akhlaq & Binquan Li, 2019. "Comparison of LSSVR, M5RT, NF-GP, and NF-SC Models for Predictions of Hourly Wind Speed and Wind Power Based on Cross-Validation," Energies, MDPI, vol. 12(2), pages 1-22, January.
    11. Zhu, Haibin & Bai, Lu & He, Lidan & Liu, Zhi, 2023. "Forecasting realized volatility with machine learning: Panel data perspective," Journal of Empirical Finance, Elsevier, vol. 73(C), pages 251-271.
    12. Spiliotis, Evangelos & Makridakis, Spyros & Kaltsounis, Anastasios & Assimakopoulos, Vassilios, 2021. "Product sales probabilistic forecasting: An empirical evaluation using the M5 competition data," International Journal of Production Economics, Elsevier, vol. 240(C).
    13. Zhang, Ning & Li, Zhiying & Zou, Xun & Quiring, Steven M., 2019. "Comparison of three short-term load forecast models in Southern California," Energy, Elsevier, vol. 189(C).
    14. Smyl, Slawek & Hua, N. Grace, 2019. "Machine learning methods for GEFCom2017 probabilistic load forecasting," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1424-1431.
    15. Barzin, Samira & Avner, Paolo & Maruyama Rentschler, Jun Erik & O’Clery, Neave, 2022. "Where Are All the Jobs? A Machine Learning Approach for High Resolution Urban Employment Prediction in Developing Countries," Policy Research Working Paper Series 9979, The World Bank.
    16. Eike Emrich & Christian Pierdzioch, 2016. "Volunteering, Match Quality, and Internet Use," Schmollers Jahrbuch : Journal of Applied Social Science Studies / Zeitschrift für Wirtschafts- und Sozialwissenschaften, Duncker & Humblot, Berlin, vol. 136(2), pages 199-226.
    17. Lahmiri, Salim, 2016. "Image characterization by fractal descriptors in variational mode decomposition domain: Application to brain magnetic resonance," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 456(C), pages 235-243.
    18. Salman Sharifazari & Shahab Araghinejad, 2015. "Development of a Nonparametric Model for Multivariate Hydrological Monthly Series Simulation Considering Climate Change Impacts," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 29(14), pages 5309-5322, November.
    19. Kusiak, Andrew & Zheng, Haiyang & Song, Zhe, 2009. "On-line monitoring of power curves," Renewable Energy, Elsevier, vol. 34(6), pages 1487-1493.
    20. Zhu, Siying & Zhu, Feng, 2019. "Cycling comfort evaluation with instrumented probe bicycle," Transportation Research Part A: Policy and Practice, Elsevier, vol. 129(C), pages 217-231.
