IDEAS home Printed from https://ideas.repec.org/a/gam/jlands/v11y2022i11p2098-d979647.html

Simple Optimal Sampling Algorithm to Strengthen Digital Soil Mapping Using the Spatial Distribution of Machine Learning Predictive Uncertainty: A Case Study for Field Capacity Prediction

Author

Listed:
  • Hyunje Yang

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Honggeun Lim

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Haewon Moon

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Qiwen Li

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Sooyoun Nam

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Jaehoon Kim

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Hyung Tae Choi

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

Abstract

Machine learning models are now capable of delivering coveted digital soil mapping (DSM) benefits (e.g., field capacity (FC) prediction); therefore, determining the optimal sample sites and sample size is essential to maximize the training efficacy. We solve this with a novel optimal sampling algorithm that allows the authentic augmentation of insufficient soil features using machine learning predictive uncertainty. Nine hundred and fifty-three forest soil samples and geographically referenced forest information were used to develop predictive models, and FCs in South Korea were estimated with six predictor set hierarchies. Random forest and gradient boosting models were used for estimation since tree-based models had better predictive performance than other machine learning algorithms. There was a significant relationship between model predictive uncertainties and training data distribution, where higher uncertainties were distributed in the data scarcity area. Further, we confirmed that the predictive uncertainties decreased when additional sample sites were added to the training data. Environmental covariate information of each grid cell in South Korea was then used to select the sampling sites. Optimal sites were coordinated at the cell having the highest predictive uncertainty, and the sample size was determined using the predictable rate. This intuitive method can be generalized to improve global DSM.
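The site-selection step described in the abstract (placing new sample sites at the grid cells with the highest predictive uncertainty) can be illustrated with a minimal sketch. It assumes that per-cell uncertainty is measured as the standard deviation of predictions across ensemble members, such as the individual trees of a random forest. The abstract does not specify the uncertainty measure or define the "predictable rate", so the function names, the toy data, and the fixed `n_sites` argument below are all hypothetical stand-ins rather than the authors' method.

```python
from statistics import pstdev


def cell_uncertainty(ensemble_preds):
    """Return one uncertainty value per grid cell.

    ensemble_preds[m][c] is the field-capacity prediction of ensemble
    member m (e.g. one tree) for grid cell c. Uncertainty is ASSUMED to be
    the population standard deviation across members; the paper may use a
    different measure.
    """
    n_cells = len(ensemble_preds[0])
    return [pstdev([member[c] for member in ensemble_preds])
            for c in range(n_cells)]


def pick_sampling_sites(ensemble_preds, n_sites):
    """Return indices of the n_sites cells with the highest uncertainty.

    n_sites is a hypothetical stand-in for the sample size that the paper
    derives from its "predictable rate".
    """
    unc = cell_uncertainty(ensemble_preds)
    ranked = sorted(range(len(unc)), key=lambda c: unc[c], reverse=True)
    return ranked[:n_sites]


# Toy data (invented for illustration): 3 ensemble members, 4 grid cells.
preds = [
    [0.30, 0.25, 0.40, 0.10],
    [0.31, 0.35, 0.20, 0.10],
    [0.29, 0.30, 0.60, 0.10],
]
sites = pick_sampling_sites(preds, 2)
print(sites)
```

In practice one would presumably retrain the model after each batch of new samples and recompute the uncertainty map, since the abstract reports that uncertainties decrease once additional sites are added to the training data.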

Suggested Citation

  • Hyunje Yang & Honggeun Lim & Haewon Moon & Qiwen Li & Sooyoun Nam & Jaehoon Kim & Hyung Tae Choi, 2022. "Simple Optimal Sampling Algorithm to Strengthen Digital Soil Mapping Using the Spatial Distribution of Machine Learning Predictive Uncertainty: A Case Study for Field Capacity Prediction," Land, MDPI, vol. 11(11), pages 1-18, November.
  • Handle: RePEc:gam:jlands:v:11:y:2022:i:11:p:2098-:d:979647

    Download full text from publisher

    File URL: https://www.mdpi.com/2073-445X/11/11/2098/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2073-445X/11/11/2098/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qingfei Pan & Liang Ding & Siarhei Hladyshau & Xiangyu Yao & Jiayu Zhou & Lei Yan & Yogesh Dhungana & Hao Shi & Chenxi Qian & Xinran Dong & Chad Burdyshaw & Joao Pedro Veloso & Alireza Khatamian & Zhe, 2025. "scMINER: a mutual information-based framework for clustering and hidden driver inference from single-cell transcriptomics data," Nature Communications, Nature, vol. 16(1), pages 1-20, December.
    2. Lulu Shang & Jennifer A Smith & Xiang Zhou, 2020. "Leveraging gene co-expression patterns to infer trait-relevant tissues in genome-wide association studies," PLOS Genetics, Public Library of Science, vol. 16(4), pages 1-30, April.
    3. Cecilia Pessoa Rodrigues & Aindrila Chatterjee & Meike Wiese & Thomas Stehle & Witold Szymanski & Maria Shvedunova & Asifa Akhtar, 2021. "Histone H4 lysine 16 acetylation controls central carbon metabolism and diet-induced obesity in mice," Nature Communications, Nature, vol. 12(1), pages 1-21, December.
    4. Marius Arend & Yizhong Yuan & M. Águila Ruiz-Sola & Nooshin Omranian & Zoran Nikoloski & Dimitris Petroutsos, 2023. "Widening the landscape of transcriptional regulation of green algal photoprotection," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    5. Ruonan Wu & Michelle R. Davison & William C. Nelson & Montana L. Smith & Mary S. Lipton & Janet K. Jansson & Ryan S. McClure & Jason E. McDermott & Kirsten S. Hofmockel, 2023. "Hi-C metagenome sequencing reveals soil phage–host interactions," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    6. Fei Liu & Shao-Wu Zhang & Wei-Feng Guo & Ze-Gang Wei & Luonan Chen, 2016. "Inference of Gene Regulatory Network Based on Local Bayesian Networks," PLOS Computational Biology, Public Library of Science, vol. 12(8), pages 1-17, August.
    7. Yu Chang & Yujie Fang & Jiahan Liu & Tiantian Ye & Xiaokai Li & Haifu Tu & Ying Ye & Yao Wang & Lizhong Xiong, 2024. "Stress-induced nuclear translocation of ONAC023 improves drought and heat tolerance through multiple processes in rice," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    8. Edoardo Bertolini & Brian R. Rice & Max Braud & Jiani Yang & Sarah Hake & Josh Strable & Alexander E. Lipka & Andrea L. Eveland, 2025. "Regulatory variation controlling architectural pleiotropy in maize," Nature Communications, Nature, vol. 16(1), pages 1-18, December.
    9. Mingyi Wang & Jerome Verdier & Vagner A Benedito & Yuhong Tang & Jeremy D Murray & Yinbing Ge & Jörg D Becker & Helena Carvalho & Christian Rogers & Michael Udvardi & Ji He, 2013. "LegumeGRN: A Gene Regulatory Network Prediction Server for Functional and Comparative Studies," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-7, July.
    10. Qiao Wen Tan & Peng Ken Lim & Zhong Chen & Asher Pasha & Nicholas Provart & Marius Arend & Zoran Nikoloski & Marek Mutwil, 2023. "Cross-stress gene expression atlas of Marchantia polymorpha reveals the hierarchy and regulatory principles of abiotic stress responses," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    11. Alfonso Monaco & Nicola Amoroso & Loredana Bellantuono & Eufemia Lella & Angela Lombardi & Anna Monda & Andrea Tateo & Roberto Bellotti & Sabina Tangaro, 2019. "Shannon entropy approach reveals relevant genes in Alzheimer’s disease," PLOS ONE, Public Library of Science, vol. 14(12), pages 1-29, December.
    12. Bastien Lextrait, 2021. "Scaling up SME's credit scoring scope with LightGBM," EconomiX Working Papers 2021-25, University of Paris Nanterre, EconomiX.
    13. Maghsoodi, Masoume, 2016. "A New Method to Build Gene Regulation Network Based on Fuzzy Hierarchical Clustering Methods," MPRA Paper 79743, University Library of Munich, Germany.
    14. Ze Yan & Ji Yang & Wen-Tian Wei & Ming-Liang Zhou & Dong-Xin Mo & Xing Wan & Rui Ma & Mei-Ming Wu & Jia-Hui Huang & Ya-Jing Liu & Feng-Hua Lv & Meng-Hua Li, 2024. "A time-resolved multi-omics atlas of transcriptional regulation in response to high-altitude hypoxia across whole-body tissues," Nature Communications, Nature, vol. 15(1), pages 1-22, December.
    15. Rachael M. Zemek & Wee Loong Chin & Vanessa S. Fear & Ben Wylie & Thomas H. Casey & Cath Forbes & Caitlin M. Tilsed & Louis Boon & Belinda B. Guo & Anthony Bosco & Alistair R. R. Forrest & Michael J. , 2022. "Temporally restricted activation of IFNβ signaling underlies response to immune checkpoint therapy in mice," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    16. Dongsheng Chen & Jian Sun & Jiacheng Zhu & Xiangning Ding & Tianming Lan & Xiran Wang & Weiying Wu & Zhihua Ou & Linnan Zhu & Peiwen Ding & Haoyu Wang & Lihua Luo & Rong Xiang & Xiaoling Wang & Jiayin, 2021. "Single cell atlas for 11 non-model mammals, reptiles and birds," Nature Communications, Nature, vol. 12(1), pages 1-17, December.
    17. Hinako M Takase & Tappei Mishina & Tetsutaro Hayashi & Mika Yoshimura & Mariko Kuse & Itoshi Nikaido & Tomoya S Kitajima, 2024. "Transcriptomic signatures of WNT-driven pathways and granulosa cell-oocyte interactions during primordial follicle activation," PLOS ONE, Public Library of Science, vol. 19(10), pages 1-26, October.
    18. Jie Xiong & Tong Zhou, 2012. "Gene Regulatory Network Inference from Multifactorial Perturbation Data Using both Regression and Correlation Analyses," PLOS ONE, Public Library of Science, vol. 7(9), pages 1-13, September.
    19. Marco Grimaldi & Roberto Visintainer & Giuseppe Jurman, 2011. "RegnANN: Reverse Engineering Gene Networks Using Artificial Neural Networks," PLOS ONE, Public Library of Science, vol. 6(12), pages 1-19, December.
    20. Dmitry Chernykh & Roman Biryukov & Andrey Bondarovich & Lilia Lubenets & Anatoly Pavlenko & Kamilla Rakhymbek & Denis Revenko & Zheniskul Zhantassova, 2025. "Spatiotemporal Analysis of Soil Moisture Variability and Precipitation Response Across Soil Texture Classes in East Kazakhstan," Land, MDPI, vol. 14(6), pages 1-20, May.
