IDEAS home Printed from https://ideas.repec.org/a/gam/jlands/v11y2022i11p2098-d979647.html

Simple Optimal Sampling Algorithm to Strengthen Digital Soil Mapping Using the Spatial Distribution of Machine Learning Predictive Uncertainty: A Case Study for Field Capacity Prediction

Author

Listed:
  • Hyunje Yang

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Honggeun Lim

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Haewon Moon

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Qiwen Li

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Sooyoun Nam

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Jaehoon Kim

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

  • Hyung Tae Choi

    (Forest Environment and Conservation Department, National Institute of Forest Science, Seoul 02455, Republic of Korea)

Abstract

Machine learning models are now capable of delivering coveted digital soil mapping (DSM) benefits (e.g., field capacity (FC) prediction); therefore, determining the optimal sample sites and sample size is essential to maximize the training efficacy. We solve this with a novel optimal sampling algorithm that allows the authentic augmentation of insufficient soil features using machine learning predictive uncertainty. Nine hundred and fifty-three forest soil samples and geographically referenced forest information were used to develop predictive models, and FCs in South Korea were estimated with six predictor set hierarchies. Random forest and gradient boosting models were used for estimation since tree-based models had better predictive performance than other machine learning algorithms. There was a significant relationship between model predictive uncertainties and training data distribution, where higher uncertainties were distributed in the data scarcity area. Further, we confirmed that the predictive uncertainties decreased when additional sample sites were added to the training data. Environmental covariate information of each grid cell in South Korea was then used to select the sampling sites. Optimal sites were coordinated at the cell having the highest predictive uncertainty, and the sample size was determined using the predictable rate. This intuitive method can be generalized to improve global DSM.
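The site-selection step described in the abstract (placing the next sample at the grid cell with the highest predictive uncertainty) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that uncertainty is measured as the standard deviation of predictions across ensemble members (for example, the individual trees of a random forest), which the abstract does not specify. The function names, cell identifiers, and toy values are hypothetical, and the sample-size rule based on the "predictable rate" is not covered because the abstract does not define it.

```python
from statistics import pstdev


def cell_uncertainty(member_predictions):
    """Spread of the ensemble members' predictions for one grid cell.

    Assumption: uncertainty = population standard deviation across members.
    """
    return pstdev(member_predictions)


def rank_cells_by_uncertainty(predictions_by_cell):
    """Return cell ids ordered from most to least uncertain."""
    return sorted(
        predictions_by_cell,
        key=lambda cell: cell_uncertainty(predictions_by_cell[cell]),
        reverse=True,
    )


def pick_next_site(predictions_by_cell):
    """Pick the single cell with the highest predictive uncertainty."""
    return rank_cells_by_uncertainty(predictions_by_cell)[0]


# Hypothetical field capacity predictions from four ensemble members per cell.
toy_predictions = {
    "cell_a": [0.30, 0.31, 0.29, 0.30],
    "cell_b": [0.20, 0.35, 0.28, 0.41],
    "cell_c": [0.25, 0.27, 0.24, 0.26],
}

ranking = rank_cells_by_uncertainty(toy_predictions)
next_site = pick_next_site(toy_predictions)
print(next_site)
```

In an iterative workflow, one would presumably sample the selected cell, add the observation to the training data, refit the model, and repeat; that loop is omitted here because its stopping criterion depends on details only the full paper provides.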

Suggested Citation

  • Hyunje Yang & Honggeun Lim & Haewon Moon & Qiwen Li & Sooyoun Nam & Jaehoon Kim & Hyung Tae Choi, 2022. "Simple Optimal Sampling Algorithm to Strengthen Digital Soil Mapping Using the Spatial Distribution of Machine Learning Predictive Uncertainty: A Case Study for Field Capacity Prediction," Land, MDPI, vol. 11(11), pages 1-18, November.
  • Handle: RePEc:gam:jlands:v:11:y:2022:i:11:p:2098-:d:979647

    Download full text from publisher

    File URL: https://www.mdpi.com/2073-445X/11/11/2098/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2073-445X/11/11/2098/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cecilia Pessoa Rodrigues & Aindrila Chatterjee & Meike Wiese & Thomas Stehle & Witold Szymanski & Maria Shvedunova & Asifa Akhtar, 2021. "Histone H4 lysine 16 acetylation controls central carbon metabolism and diet-induced obesity in mice," Nature Communications, Nature, vol. 12(1), pages 1-21, December.
    2. Jie Xiong & Tong Zhou, 2012. "Gene Regulatory Network Inference from Multifactorial Perturbation Data Using both Regression and Correlation Analyses," PLOS ONE, Public Library of Science, vol. 7(9), pages 1-13, September.
    3. Marco Grimaldi & Roberto Visintainer & Giuseppe Jurman, 2011. "RegnANN: Reverse Engineering Gene Networks Using Artificial Neural Networks," PLOS ONE, Public Library of Science, vol. 6(12), pages 1-19, December.
    4. Marius Arend & Yizhong Yuan & M. Águila Ruiz-Sola & Nooshin Omranian & Zoran Nikoloski & Dimitris Petroutsos, 2023. "Widening the landscape of transcriptional regulation of green algal photoprotection," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    5. Takeshi Hase & Samik Ghosh & Ryota Yamanaka & Hiroaki Kitano, 2013. "Harnessing Diversity towards the Reconstructing of Large Scale Gene Regulatory Networks," PLOS Computational Biology, Public Library of Science, vol. 9(11), pages 1-16, November.
    6. Ruonan Wu & Michelle R. Davison & William C. Nelson & Montana L. Smith & Mary S. Lipton & Janet K. Jansson & Ryan S. McClure & Jason E. McDermott & Kirsten S. Hofmockel, 2023. "Hi-C metagenome sequencing reveals soil phage–host interactions," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    7. Kinzy Tyler G. & Starr Timothy K. & Tseng George C. & Ho Yen-Yi, 2019. "Meta-analytic framework for modeling genetic coexpression dynamics," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(1), pages 1-13, February.
    8. Li, Jiawen & Meng, Lu & Zhang, Zelin & Yang, Kejia, 2023. "Low-frequency, high-impact: Discovering important rare events from UGC," Journal of Retailing and Consumer Services, Elsevier, vol. 70(C).
    9. Fei Liu & Shao-Wu Zhang & Wei-Feng Guo & Ze-Gang Wei & Luonan Chen, 2016. "Inference of Gene Regulatory Network Based on Local Bayesian Networks," PLOS Computational Biology, Public Library of Science, vol. 12(8), pages 1-17, August.
    10. Lingfei Wang & Tom Michoel, 2017. "Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data," PLOS Computational Biology, Public Library of Science, vol. 13(8), pages 1-26, August.
    11. Mingyi Wang & Jerome Verdier & Vagner A Benedito & Yuhong Tang & Jeremy D Murray & Yinbing Ge & Jörg D Becker & Helena Carvalho & Christian Rogers & Michael Udvardi & Ji He, 2013. "LegumeGRN: A Gene Regulatory Network Prediction Server for Functional and Comparative Studies," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-7, July.
    12. Fei Wang & Peiwen Ding & Xue Liang & Xiangning Ding & Camilla Blunk Brandt & Evelina Sjöstedt & Jiacheng Zhu & Saga Bolund & Lijing Zhang & Laura P. M. H. Rooij & Lihua Luo & Yanan Wei & Wandong Zhao , 2022. "Endothelial cell heterogeneity and microglia regulons revealed by a pig cell landscape at single-cell level," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    13. Qiao Wen Tan & Peng Ken Lim & Zhong Chen & Asher Pasha & Nicholas Provart & Marius Arend & Zoran Nikoloski & Marek Mutwil, 2023. "Cross-stress gene expression atlas of Marchantia polymorpha reveals the hierarchy and regulatory principles of abiotic stress responses," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    14. Alfonso Monaco & Nicola Amoroso & Loredana Bellantuono & Eufemia Lella & Angela Lombardi & Anna Monda & Andrea Tateo & Roberto Bellotti & Sabina Tangaro, 2019. "Shannon entropy approach reveals relevant genes in Alzheimer’s disease," PLOS ONE, Public Library of Science, vol. 14(12), pages 1-29, December.
    15. Bastien Lextrait, 2021. "Scaling up SME's credit scoring scope with LightGBM," EconomiX Working Papers 2021-25, University of Paris Nanterre, EconomiX.
    16. Maghsoodi, Masoume, 2016. "A New Method to Build Gene Regulation Network Based on Fuzzy Hierarchical Clustering Methods," MPRA Paper 79743, University Library of Munich, Germany.
    17. Natalie M. Clark & Trevor M. Nolan & Ping Wang & Gaoyuan Song & Christian Montes & Conner T. Valentine & Hongqing Guo & Rosangela Sozzani & Yanhai Yin & Justin W. Walley, 2021. "Integrated omics networks reveal the temporal signaling events of brassinosteroid response in Arabidopsis," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    18. Holger Weishaupt & Patrik Johansson & Christopher Engström & Sven Nelander & Sergei Silvestrov & Fredrik J Swartling, 2017. "Loss of Conservation of Graph Centralities in Reverse-engineered Transcriptional Regulatory Networks," Methodology and Computing in Applied Probability, Springer, vol. 19(4), pages 1089-1105, December.
    19. Meichen Dong & Yiping He & Yuchao Jiang & Fei Zou, 2023. "Joint gene network construction by single‐cell RNA sequencing data," Biometrics, The International Biometric Society, vol. 79(2), pages 915-925, June.
    20. Natalie Jane de Vries & Rodrigo Reis & Pablo Moscato, 2015. "Clustering Consumers Based on Trust, Confidence and Giving Behaviour: Data-Driven Model Building for Charitable Involvement in the Australian Not-For-Profit Sector," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-28, April.
