Printed from https://ideas.repec.org/a/gam/jsusta/v18y2026i3p1200-d1848054.html

Ensemble Machine Learning for Operational Water Quality Monitoring Using Weighted Model Fusion for pH Forecasting

Author

Listed:
  • Wenwen Chen

    (College of Management and Engineering, Xuzhou University of Technology, Xuzhou 221018, China
    These authors contributed equally to this work.)

  • Yinzi Shao

    (College of Saint Petersburg Joint Engineering, Xuzhou University of Technology, Xuzhou 221018, China
    These authors contributed equally to this work.)

  • Zhicheng Xu

    (School of Chemistry and Life Sciences, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)

  • Bing Zhou

    (College of Saint Petersburg Joint Engineering, Xuzhou University of Technology, Xuzhou 221018, China)

  • Shuhe Cui

    (College of Design and Engineering, National University of Singapore, Singapore 119077, Singapore)

  • Zhenxiang Dai

    (College of Saint Petersburg Joint Engineering, Xuzhou University of Technology, Xuzhou 221018, China)

  • Shuai Yin

    (College of Saint Petersburg Joint Engineering, Xuzhou University of Technology, Xuzhou 221018, China)

  • Yuewen Gao

    (College of Management and Engineering, Xuzhou University of Technology, Xuzhou 221018, China)

  • Lili Liu

    (College of Saint Petersburg Joint Engineering, Xuzhou University of Technology, Xuzhou 221018, China)

Abstract

Water quality monitoring faces growing challenges from accelerating industrialization and urbanization, which demand accurate, real-time, and reliable prediction technologies. This study presents a novel ensemble learning framework that integrates Gaussian Process Regression, Support Vector Regression, and Random Forest algorithms for high-precision prediction of water pH. The research used a comprehensive spatiotemporal dataset comprising 11 water quality parameters from 37 monitoring stations across Georgia, USA, spanning 705 days from January 2016 to January 2018. The ensemble model employed a dynamic weight allocation strategy based on cross-validation error performance, assigning optimal weights of 34.27% to Random Forest, 33.26% to Support Vector Regression, and 32.47% to Gaussian Process Regression. The integrated approach achieved superior predictive performance, with a mean absolute error (MAE) of 0.0062 and a coefficient of determination of 0.8533, outperforming the individual base learners across multiple evaluation metrics. Statistical significance testing using Wilcoxon signed-rank tests with a Bonferroni correction confirmed that the ensemble significantly outperforms all individual models (p < 0.001). Comparison with state-of-the-art models (LightGBM, XGBoost, TabNet) demonstrated competitive or superior ensemble performance. Comprehensive ablation experiments revealed that removing Random Forest causes the largest performance degradation (a 4.43% increase in MAE). Feature importance analysis identified the dissolved oxygen maximum and the conductance mean as the most influential predictors, contributing 22.1% and 17.5%, respectively. Cross-validation results demonstrated robust model stability, with an MAE of 0.0053 ± 0.0002, while bootstrap confidence intervals confirmed narrow uncertainty bounds of 0.0060 to 0.0066. Spatiotemporal analysis identified station-specific performance variations, with MAE ranging from 0.0036 to 0.0150. High-error stations (12, 29, 33) were analyzed to identify their distinguishing characteristics, including higher pH variability and potential upstream pollution influences. An integrated software platform was developed, featuring an intuitive interface, real-time prediction, and comprehensive visualization tools for environmental monitoring applications.
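
The weighted fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the inverse-error rule below is an assumption, and the function names, error values, and pH predictions are hypothetical placeholders rather than figures from the paper.

```python
from typing import Dict, List, Sequence


def inverse_error_weights(cv_mae: Dict[str, float]) -> Dict[str, float]:
    """Turn per-model cross-validation MAE values into weights that sum to 1.

    Assumption: each weight is proportional to 1 / MAE, so a model with
    lower cross-validation error receives a larger share.
    """
    if any(err <= 0 for err in cv_mae.values()):
        raise ValueError("each cross-validation error must be positive")
    inverse = {name: 1.0 / err for name, err in cv_mae.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def fuse_predictions(
    predictions: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Return the weighted average of the base learners, sample by sample."""
    names = list(weights)
    n_samples = len(predictions[names[0]])
    if any(len(predictions[name]) != n_samples for name in names):
        raise ValueError("all models must predict the same number of samples")
    return [
        sum(weights[name] * predictions[name][i] for name in names)
        for i in range(n_samples)
    ]


# Hypothetical cross-validation errors for the three base learners.
cv_mae = {"random_forest": 0.0050, "svr": 0.0052, "gpr": 0.0054}
weights = inverse_error_weights(cv_mae)

# Hypothetical pH predictions for two samples.
predictions = {
    "random_forest": [7.40, 7.10],
    "svr": [7.42, 7.08],
    "gpr": [7.38, 7.12],
}
fused = fuse_predictions(predictions, weights)

for name, weight in weights.items():
    print(f"{name}: {weight:.4f}")
print([round(value, 4) for value in fused])
```

Under this rule, base learners with similar cross-validation errors receive nearly equal weights, which is at least consistent with the reported weights all lying within about one percentage point of a uniform one-third split.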

Suggested Citation

  • Wenwen Chen & Yinzi Shao & Zhicheng Xu & Bing Zhou & Shuhe Cui & Zhenxiang Dai & Shuai Yin & Yuewen Gao & Lili Liu, 2026. "Ensemble Machine Learning for Operational Water Quality Monitoring Using Weighted Model Fusion for pH Forecasting," Sustainability, MDPI, vol. 18(3), pages 1-20, January.
  • Handle: RePEc:gam:jsusta:v:18:y:2026:i:3:p:1200-:d:1848054

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/18/3/1200/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/18/3/1200/
    Download Restriction: no
