Printed from https://ideas.repec.org/a/gam/jsusta/v17y2025i17p7853-d1738780.html

Application of Machine Learning Approaches to Predict Soil Element Background Concentration at Large Region Scale

Author

Listed:
  • Jiao Li

    (Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China)

  • Linglong Meng

    (Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China)

  • Tianran Li

    (Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China)

  • Pengli Xue

    (Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China)

  • Hejing Wang

    (Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China)

  • Jie Hua

    (Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China)

Abstract

Soil element background concentration is foundational data for environmental quality assessment, contamination diagnosis, and sustainable land management. However, existing investigation-based methods are time-consuming and inefficient. Machine learning (ML) methods have demonstrated excellent performance in predicting soil heavy metal concentrations. In this study, nine soil-forming environmental variables recorded at 210 soil monitoring points (elevation, pH, organic matter, soil type, parent material, plant cover, land use type, topography, and soil texture) were used as inputs to decision tree (DT), random forest (RF), extreme gradient boosting (XGB), and support vector machine (SVM) models to predict the background concentrations of eleven soil elements. Of these, the SVM and RF models effectively predicted the background concentrations of all soil heavy metals. The SVM outperformed XGB and DT for every heavy metal except cadmium (Cd) and manganese (Mn). Although the key factors affecting background concentrations vary among soil elements, organic matter, soil type, and altitude play a crucial role in the accurate prediction of soil element background concentration. This study provides simple and efficient ML models for predicting soil element background concentration at the large regional scale. The results can be used to distinguish natural geochemical processes from human-induced pollution.
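
The model comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic placeholders (only the count of 210 points and the nine predictor names come from the abstract), scikit-learn's `GradientBoostingRegressor` stands in for XGBoost, and the 5-fold cross-validated R² metric, the hyperparameters, and all variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_points = 210  # number of monitoring points stated in the abstract

# SYNTHETIC placeholders for the three continuous predictors.
elevation = rng.uniform(0, 3000, n_points)
ph = rng.uniform(4.5, 8.5, n_points)
organic_matter = rng.uniform(5, 60, n_points)

# SYNTHETIC integer codes for the six categorical predictors: soil type,
# parent material, plant cover, land use type, topography, soil texture.
categorical = rng.integers(0, 4, size=(n_points, 6)).astype(float)

X = np.column_stack([elevation, ph, organic_matter, categorical])

# SYNTHETIC target standing in for one element's background concentration.
y = (20 + 0.3 * organic_matter + 0.004 * elevation
     + 3.0 * categorical[:, 0] + rng.normal(0, 2, n_points))

numeric_cols = [0, 1, 2]
categorical_cols = list(range(3, 9))


def make_pipeline(model):
    """Scale continuous columns, one-hot encode categorical ones, then fit."""
    preprocess = ColumnTransformer(
        [
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ],
        sparse_threshold=0,  # always hand the models a dense array
    )
    return Pipeline([("preprocess", preprocess), ("model", model)])


# Hyperparameters are illustrative assumptions, not values from the paper.
models = {
    "DT": DecisionTreeRegressor(max_depth=5, random_state=0),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "GB": GradientBoostingRegressor(random_state=0),  # stand-in for XGB
    "SVM": SVR(kernel="rbf", C=10.0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: float(
        cross_val_score(make_pipeline(model), X, y, cv=cv, scoring="r2").mean()
    )
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean cross-validated R^2 = {score:.3f}")
```

Because the data here are random placeholders, the resulting ranking of models says nothing about the study's findings; the sketch only shows how mixed continuous and categorical soil-forming variables might be fed to the four model families and compared on a common metric.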

Suggested Citation

  • Jiao Li & Linglong Meng & Tianran Li & Pengli Xue & Hejing Wang & Jie Hua, 2025. "Application of Machine Learning Approaches to Predict Soil Element Background Concentration at Large Region Scale," Sustainability, MDPI, vol. 17(17), pages 1-22, August.
  • Handle: RePEc:gam:jsusta:v:17:y:2025:i:17:p:7853-:d:1738780

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/17/17/7853/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/17/17/7853/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mansoor, Umer & Jamal, Arshad & Su, Junbiao & Sze, N.N. & Chen, Anthony, 2023. "Investigating the risk factors of motorcycle crash injury severity in Pakistan: Insights and policy recommendations," Transport Policy, Elsevier, vol. 139(C), pages 21-38.
    2. Ylinen, Mika & Ranta, Mikko, 2025. "Predicting corporate innovation using machine learning and social media data," Technovation, Elsevier, vol. 148(C).
    3. Bissan Ghaddar & Ignacio Gómez-Casares & Julio González-Díaz & Brais González-Rodríguez & Beatriz Pateiro-López & Sofía Rodríguez-Ballesteros, 2023. "Learning for Spatial Branching: An Algorithm Selection Approach," INFORMS Journal on Computing, INFORMS, vol. 35(5), pages 1024-1043, September.
    4. Akash Malhotra, 2018. "A hybrid econometric-machine learning approach for relative importance analysis: Prioritizing food policy," Papers 1806.04517, arXiv.org, revised Aug 2020.
    5. Nahushananda Chakravarthy H G & Karthik M Seenappa & Sujay Raghavendra Naganna & Dayananda Pruthviraja, 2023. "Machine Learning Models for the Prediction of the Compressive Strength of Self-Compacting Concrete Incorporating Incinerated Bio-Medical Waste Ash," Sustainability, MDPI, vol. 15(18), pages 1-22, September.
    6. Tim Voigt & Martin Kohlhase & Oliver Nelles, 2021. "Incremental DoE and Modeling Methodology with Gaussian Process Regression: An Industrially Applicable Approach to Incorporate Expert Knowledge," Mathematics, MDPI, vol. 9(19), pages 1-26, October.
    7. Wen, Shaoting & Buyukada, Musa & Evrendilek, Fatih & Liu, Jingyong, 2020. "Uncertainty and sensitivity analyses of co-combustion/pyrolysis of textile dyeing sludge and incense sticks: Regression and machine-learning models," Renewable Energy, Elsevier, vol. 151(C), pages 463-474.
    8. Zhu, Haibin & Bai, Lu & He, Lidan & Liu, Zhi, 2023. "Forecasting realized volatility with machine learning: Panel data perspective," Journal of Empirical Finance, Elsevier, vol. 73(C), pages 251-271.
    9. Spiliotis, Evangelos & Makridakis, Spyros & Kaltsounis, Anastasios & Assimakopoulos, Vassilios, 2021. "Product sales probabilistic forecasting: An empirical evaluation using the M5 competition data," International Journal of Production Economics, Elsevier, vol. 240(C).
    10. Zhang, Ning & Li, Zhiying & Zou, Xun & Quiring, Steven M., 2019. "Comparison of three short-term load forecast models in Southern California," Energy, Elsevier, vol. 189(C).
    11. Smyl, Slawek & Hua, N. Grace, 2019. "Machine learning methods for GEFCom2017 probabilistic load forecasting," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1424-1431.
    12. Barzin, Samira & Avner, Paolo & Maruyama Rentschler, Jun Erik & O’Clery, Neave, 2022. "Where Are All the Jobs? A Machine Learning Approach for High Resolution Urban Employment Prediction in Developing Countries," Policy Research Working Paper Series 9979, The World Bank.
    13. Wu, Jishi & Feng, Tao & Jia, Peng, 2025. "Revealing the built environment impacts on curbside freight parking demand using a deep generalized additive modeling framework," Transport Policy, Elsevier, vol. 174(C).
    14. Kusiak, Andrew & Zheng, Haiyang & Song, Zhe, 2009. "On-line monitoring of power curves," Renewable Energy, Elsevier, vol. 34(6), pages 1487-1493.
    15. Zhu, Siying & Zhu, Feng, 2019. "Cycling comfort evaluation with instrumented probe bicycle," Transportation Research Part A: Policy and Practice, Elsevier, vol. 129(C), pages 217-231.
    16. Catherine Ikae & Jacques Savoy, 2022. "Gender identification on Twitter," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(1), pages 58-69, January.
    17. Cao, Jason & Tao, Tao, 2025. "Can an identified environmental correlate of car ownership serve as a practical planning tool?," Transportation Research Part A: Policy and Practice, Elsevier, vol. 191(C).
    18. Barkan, Oren & Benchimol, Jonathan & Caspi, Itamar & Cohen, Eliya & Hammer, Allon & Koenigstein, Noam, 2023. "Forecasting CPI inflation components with Hierarchical Recurrent Neural Networks," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 39(3), pages 1145-1162.
    19. Martijn Kagie & Michiel Van Wezel, 2007. "Hedonic price models and indices based on boosting applied to the Dutch housing market," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 15(3‐4), pages 85-106, July.
    20. Matthias Bogaert & Michel Ballings & Dirk Van den Poel, 2018. "Evaluating the importance of different communication types in romantic tie prediction on social media," Annals of Operations Research, Springer, vol. 263(1), pages 501-527, April.

