IDEAS home Printed from https://ideas.repec.org/a/gam/jsusta/v17y2025i13p6168-d1695199.html

Spatial Prediction of Soil Organic Carbon Based on a Multivariate Feature Set and Stacking Ensemble Algorithm: A Case Study of Wei-Ku Oasis in China

Authors

  • Zuming Cao

    (College of Geographic Science and Tourism, Xinjiang Normal University, Urumqi 830017, China
    Xinjiang Arid Zone Lake Environment and Resources Laboratory, Urumqi 830017, China)

  • Xiaowei Luo

    (College of Geographic Science and Tourism, Xinjiang Normal University, Urumqi 830017, China
    Xinjiang Arid Zone Lake Environment and Resources Laboratory, Urumqi 830017, China)

  • Xuemei Wang

    (College of Geographic Science and Tourism, Xinjiang Normal University, Urumqi 830017, China
    Xinjiang Arid Zone Lake Environment and Resources Laboratory, Urumqi 830017, China)

  • Dun Li

    (College of Geographic Science and Tourism, Xinjiang Normal University, Urumqi 830017, China
    Xinjiang Arid Zone Lake Environment and Resources Laboratory, Urumqi 830017, China)

Abstract

Accurate estimation of soil organic carbon (SOC) content is crucial for assessing terrestrial ecosystem carbon stocks. Although traditional methods offer relatively high estimation accuracy, they are slow and costly. Combining measured data, remote sensing technology, and machine learning (ML) algorithms enables rapid, efficient, and accurate large-scale prediction. However, single ML models often suffer from highly redundant feature variables and weak generalization ability, problems that ensemble models can effectively overcome. This study focuses on the Weigan–Kuqa River oasis (Wei-Ku Oasis), a typical arid oasis in northwest China. It integrates Sentinel-2A multispectral imagery, a digital elevation model, ERA5 meteorological reanalysis data, soil attribute data, and land use (LU) data to estimate SOC. The Boruta algorithm, Lasso regression, and their combination were used to screen feature variables and construct a multidimensional feature space. Ensemble models, namely Random Forest (RF), Gradient Boosting Machine (GBM), and a Stacking model, were then built. Results show that the Stacking model built on the combined screened variable sets achieved the best prediction accuracy (test set R² = 0.61, RMSE = 2.17 g·kg⁻¹, RPD = 1.61), reducing the prediction error by 9% compared with single-model prediction. Difference Vegetation Index (DVI), Bare Soil Evapotranspiration (BSE), and type of land use (TLU) have a substantial multidimensional synergistic influence on the spatial differentiation pattern of SOC. Including TLU substantially improved the model's estimation performance, increasing the test set R² by 24%. Combining Boruta–Lasso screening with Stacking thus supports the construction of a high-precision SOC content estimation model. This model can provide technical support for precision fertilization in arid-zone oases and for the management of regional carbon sinks.
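
The abstract reports three accuracy measures (R², RMSE, RPD) without defining them. The sketch below shows one common way to compute them, assuming R² is the coefficient of determination (1 minus residual over total sum of squares) and RPD is the sample standard deviation of the observed values divided by the RMSE. These definitions, the function name `regression_metrics`, and the sample values are illustrative assumptions, not taken from the article.

```python
import math
from statistics import mean, stdev


def regression_metrics(observed, predicted):
    """Return (r2, rmse, rpd) for paired observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    # Sum of squared prediction errors.
    residual_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Sum of squared deviations of the observations from their mean.
    obs_mean = mean(observed)
    total_ss = sum((o - obs_mean) ** 2 for o in observed)
    r2 = 1.0 - residual_ss / total_ss
    rmse = math.sqrt(residual_ss / len(observed))
    # Assumption: RPD uses the sample SD (n - 1 denominator) of the observations.
    rpd = stdev(observed) / rmse
    return r2, rmse, rpd


# Hypothetical SOC values in g/kg, for illustration only (not data from the study).
observed = [4.0, 6.0, 8.0, 10.0, 12.0]
predicted = [5.0, 6.0, 7.0, 11.0, 11.0]
r2, rmse, rpd = regression_metrics(observed, predicted)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f} g/kg, RPD = {rpd:.2f}")
```

Under these definitions RPD is roughly 1/sqrt(1 - R²) for larger samples, so the reported R² = 0.61 would imply an RPD of about 1.60, which is in line with the reported 1.61.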

Suggested Citation

  • Zuming Cao & Xiaowei Luo & Xuemei Wang & Dun Li, 2025. "Spatial Prediction of Soil Organic Carbon Based on a Multivariate Feature Set and Stacking Ensemble Algorithm: A Case Study of Wei-Ku Oasis in China," Sustainability, MDPI, vol. 17(13), pages 1-25, July.
  • Handle: RePEc:gam:jsusta:v:17:y:2025:i:13:p:6168-:d:1695199

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/17/13/6168/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/17/13/6168/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tong, Jianfeng & Liu, Zhenxing & Zhang, Yong & Zheng, Xiujuan & Jin, Junyang, 2023. "Improved multi-gate mixture-of-experts framework for multi-step prediction of gas load," Energy, Elsevier, vol. 282(C).
    2. Re Cecconi, F. & Moretti, N. & Tagliabue, L.C., 2019. "Application of artificial neutral network and geographic information system to evaluate retrofit potential in public school buildings," Renewable and Sustainable Energy Reviews, Elsevier, vol. 110(C), pages 266-277.
    3. Asma Shaheen & Javed Iqbal, 2018. "Spatial Distribution and Mobility Assessment of Carcinogenic Heavy Metals in Soil Profiles Using Geostatistics and Random Forest, Boruta Algorithm," Sustainability, MDPI, vol. 10(3), pages 1-20, March.
    4. Ramón Ferri-García & María del Mar Rueda, 2022. "Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys," Statistical Papers, Springer, vol. 63(6), pages 1829-1881, December.
    5. Jiang, Wei & Wang, Teng & Yuan, Dongdong & Sha, Aimin & Zhang, Shuo & Zhang, Yufei & Xiao, Jingjing & Xing, Chengwei, 2024. "Available solar resources and photovoltaic system planning strategy for highway," Renewable and Sustainable Energy Reviews, Elsevier, vol. 203(C).
    6. Yvan Devaux & Lu Zhang & Andrew I. Lumley & Kanita Karaduzovic-Hadziabdic & Vincent Mooser & Simon Rousseau & Muhammad Shoaib & Venkata Satagopam & Muhamed Adilovic & Prashant Kumar Srivastava & Costa, 2024. "Development of a long noncoding RNA-based machine learning model to predict COVID-19 in-hospital mortality," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    7. Ghosh, Indranil & Chaudhuri, Tamal Datta & Alfaro-Cortés, Esteban & Gámez, Matías & García, Noelia, 2022. "A hybrid approach to forecasting futures prices with simultaneous consideration of optimality in ensemble feature selection and advanced artificial intelligence," Technological Forecasting and Social Change, Elsevier, vol. 181(C).
    8. Yang Zhao & Denise Gorse, 2024. "Earthquake prediction from seismic indicators using tree-based ensemble learning," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 120(3), pages 2283-2309, February.
    9. Ruilin Bai & Yu Yao & Qiaosong Lin & Lize Wu & Zhen Li & Huijuan Wang & Mingze Ma & Di Mu & Lingxiang Hu & Hai Yang & Weihan Li & Shaolong Zhu & Xiaojun Wu & Xianhong Rui & Yan Yu, 2025. "Preferable single-atom catalysts enabled by natural language processing for high energy density Na-S batteries," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
    10. Conor Waldock & Bernhard Wegscheider & Dario Josi & Bárbara Borges Calegari & Jakob Brodersen & Luiz Jardim de Queiroz & Ole Seehausen, 2024. "Deconstructing the geography of human impacts on species’ natural distribution," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    11. Zhongen Niu & Huimin Yan & Fang Liu, 2020. "Decreasing Cropping Intensity Dominated the Negative Trend of Cropland Productivity in Southern China in 2000–2015," Sustainability, MDPI, vol. 12(23), pages 1-14, December.
    12. Zhenghu Zhou & Chengjie Ren & Chuankuan Wang & Manuel Delgado-Baquerizo & Yiqi Luo & Zhongkui Luo & Zhenggang Du & Biao Zhu & Yuanhe Yang & Shuo Jiao & Fazhu Zhao & Andong Cai & Gaihe Yang & Gehong We, 2024. "Global turnover of soil mineral-associated and particulate organic carbon," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    13. Manuel J. García Rodríguez & Vicente Rodríguez Montequín & Francisco Ortega Fernández & Joaquín M. Villanueva Balsera, 2019. "Public Procurement Announcements in Spain: Regulations, Data Analysis, and Award Price Estimator Using Machine Learning," Complexity, Hindawi, vol. 2019, pages 1-20, November.
    14. Sangjin Kim & Jong-Min Kim, 2019. "Two-Stage Classification with SIS Using a New Filter Ranking Method in High Throughput Data," Mathematics, MDPI, vol. 7(6), pages 1-16, May.
    15. Baihan Wang & Alfred Pozarickij & Mohsen Mazidi & Neil Wright & Pang Yao & Saredo Said & Andri Iona & Christiana Kartsonaki & Hannah Fry & Kuang Lin & Yiping Chen & Huaidong Du & Daniel Avery & Dan Sc, 2025. "Comparative studies of 2168 plasma proteins measured by two affinity-based platforms in 4000 Chinese adults," Nature Communications, Nature, vol. 16(1), pages 1-13, December.
    16. Arjan S. Gosal & Janine A. McMahon & Katharine M. Bowgen & Catherine H. Hoppe & Guy Ziv, 2021. "Identifying and Mapping Groups of Protected Area Visitors by Environmental Awareness," Land, MDPI, vol. 10(6), pages 1-14, May.
    17. repec:plo:pone00:0185380 is not listed on IDEAS
    18. Cao, Liang & Su, Jianping & Saddler, Jack & Cao, Yankai & Wang, Yixiu & Lee, Gary & Siang, Lim C. & Luo, Yi & Pinchuk, Robert & Li, Jin & Gopaluni, R. Bhushan, 2025. "Machine learning for real-time green carbon dioxide tracking in refinery processes," Renewable and Sustainable Energy Reviews, Elsevier, vol. 213(C).
    19. Foutzopoulos, Giorgos & Pandis, Nikolaos & Tsagris, Michail, 2024. "Predicting full retirement attainment of NBA players," MPRA Paper 121540, University Library of Munich, Germany.
    20. Zhao-Yue Chen & Hervé Petetin & Raúl Fernando Méndez Turrubiates & Hicham Achebak & Carlos Pérez García-Pando & Joan Ballester, 2024. "Population exposure to multiple air pollutants and its compound episodes in Europe," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    21. Schrader, Silja & Graham, Sonia & Campbell, Rebecca & Height, Kaitlyn & Hawkes, Gina, 2024. "Grower attitudes and practices toward area-wide management of cropping weeds in Australia," Land Use Policy, Elsevier, vol. 137(C).
