Printed from https://ideas.repec.org/a/gam/jftint/v14y2022i11p324-d966701.html

A Machine Learning Predictive Model to Detect Water Quality and Pollution

Authors
  • Xiaoting Xu

    (School of Computer Science, The University of Sydney, Camperdown, NSW 2006, Australia)

  • Tin Lai

    (School of Computer Science, The University of Sydney, Camperdown, NSW 2006, Australia)

  • Sayka Jahan

    (Department of Environmental Sciences, Macquarie University, Sydney, NSW 2109, Australia)

  • Farnaz Farid

    (School of Social Sciences, Western Sydney University, Penrith, NSW 2751, Australia)

  • Abubakar Bello

    (School of Social Sciences, Western Sydney University, Penrith, NSW 2751, Australia)

Abstract

The increasing prevalence of marine pollution over the past few decades has motivated research aimed at mitigating the problem. Typical water quality assessment requires continuous monitoring of water and sediments at remote locations, together with labour-intensive laboratory tests to determine the degree of pollution. We propose an automated water quality assessment framework in which a machine learning predictive model infers water quality and the level of pollution from collected water and sediment samples. First, because samples are collected at only a sparse set of locations, the number of water and sediment samples is limited and the dataset is incomplete. We therefore carried out an extensive investigation of the performance of various data imputation methods on water and sediment datasets with different missing-data rates, and chose the best-performing method to fill in the missing data. Next, each sample is labelled with one of four pollution levels according to pollution guidelines, and a machine learning classification model is trained to learn the relationship between the sample measurements and the assigned pollution level. The model's predictions are then compared with the true labels to check whether the model is sound and its predictions accurate. Finally, we offer suggestions for improvement based on the modelling results. Empirically, we show that our best model achieves an accuracy of 75% despite 57% of the data being missing. Experimentally, we show that our model can assist in automated water quality screening based on possibly incomplete real-world data.
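The imputation-selection step described in the abstract (hiding data at several missing-data rates, imputing it, and keeping the method with the lowest error) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: column-mean and column-median imputation are stand-ins because the abstract does not name the candidate methods, root-mean-square error (RMSE) on the artificially hidden cells is an assumed selection criterion, and `TOY_SAMPLES` holds hypothetical values rather than data from the paper.

```python
import random
from math import sqrt
from statistics import mean, median

# Hypothetical, fully observed samples (rows) with three measured variables
# (columns). Illustrative values only, not data from the paper.
TOY_SAMPLES = [
    [0.8, 12.0, 3.1],
    [1.1, 15.5, 2.9],
    [0.9, 11.2, 3.4],
    [1.4, 30.1, 2.7],
    [1.0, 13.8, 3.0],
    [2.6, 55.0, 4.8],
    [1.2, 14.1, 3.3],
    [0.7, 10.9, 2.8],
    [3.1, 61.7, 5.2],
    [1.0, 12.6, 3.1],
]


def mask_cells(rows, rate, seed=0):
    """Hide about `rate` of each column's values; return (masked copy, hidden positions)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    # Leave at least one observed value per column so every imputer can run.
    k = min(round(rate * n_rows), n_rows - 1)
    masked = [list(r) for r in rows]
    hidden = []
    for j in range(n_cols):
        for i in rng.sample(range(n_rows), k):
            masked[i][j] = None
            hidden.append((i, j))
    return masked, hidden


def impute(rows, stat):
    """Fill each None with `stat` (e.g. mean or median) of that column's observed values."""
    fills = [stat([v for v in col if v is not None]) for col in zip(*rows)]
    return [[f if v is None else v for v, f in zip(r, fills)] for r in rows]


def rmse_on_hidden(true_rows, imputed_rows, hidden):
    """Root-mean-square error computed only over the artificially hidden cells."""
    errs = [(true_rows[i][j] - imputed_rows[i][j]) ** 2 for i, j in hidden]
    return sqrt(mean(errs))


def compare_imputers(rows, rates, imputers, seed=0):
    """Return {rate: {imputer name: RMSE}}; each rate must hide at least one cell."""
    results = {}
    for rate in rates:
        masked, hidden = mask_cells(rows, rate, seed)
        results[rate] = {
            name: rmse_on_hidden(rows, impute(masked, stat), hidden)
            for name, stat in imputers.items()
        }
    return results


if __name__ == "__main__":
    scores = compare_imputers(TOY_SAMPLES, [0.1, 0.3, 0.57], {"mean": mean, "median": median})
    for rate, by_method in scores.items():
        best = min(by_method, key=by_method.get)
        print(f"missing rate {rate:.0%}: best = {best}, RMSE = {by_method}")
```

Masking a fixed number of cells per column keeps at least one observed value in every column, so each imputer always has something to summarise. A fuller comparison would repeat the masking over several random seeds and substitute the candidate methods the paper actually evaluated.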

Suggested Citation

  • Xiaoting Xu & Tin Lai & Sayka Jahan & Farnaz Farid & Abubakar Bello, 2022. "A Machine Learning Predictive Model to Detect Water Quality and Pollution," Future Internet, MDPI, vol. 14(11), pages 1-14, November.
  • Handle: RePEc:gam:jftint:v:14:y:2022:i:11:p:324-:d:966701

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/14/11/324/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/14/11/324/
    Download Restriction: no
