IDEAS home Printed from https://ideas.repec.org/a/gam/jlands/v14y2025i3p545-d1605846.html
   My bibliography  Save this article

Post-hoc Evaluation of Sample Size in a Regional Digital Soil Mapping Project

Author

Listed:
  • Daniel D. Saurette

    (School of Environmental Sciences, University of Guelph, 50 Stone Rd East, Guelph, ON N1G 2W1, Canada
    Ontario Ministry of Agriculture, Food and Agribusiness, 1 Stone Rd West, Guelph, ON N1G 2Y4, Canada)

  • Richard J. Heck

    (School of Environmental Sciences, University of Guelph, 50 Stone Rd East, Guelph, ON N1G 2W1, Canada)

  • Adam W. Gillespie

    (School of Environmental Sciences, University of Guelph, 50 Stone Rd East, Guelph, ON N1G 2W1, Canada)

  • Aaron A. Berg

    (Department of Geography, Environment & Geomatics, University of Guelph, 50 Stone Rd East, Guelph, ON N1G 2W1, Canada)

  • Asim Biswas

    (School of Environmental Sciences, University of Guelph, 50 Stone Rd East, Guelph, ON N1G 2W1, Canada)

Abstract

The transition from conventional soil mapping (CSM) to digital soil mapping (DSM) not only affects the final map products, but it also affects the concepts of scale, resolution, and sampling intensity. This is critical because in the CSM approach, sampling intensity is intricately linked to the desired scale of soil map publication, which provided standardization of sampling. This is not the case for DSM where sample size varies widely by project, and sampling design studies have largely focused on where to sample without due consideration for sample size. Using a regional soil survey dataset with 1791 sampled and described soil profiles, we first extracted an external validation dataset using the conditioned Latin hypercube sampling (cLHS) algorithm and then created repeated ( n = 10) sample plans of increasing size from the remaining calibration sites using the cLHS, feature space coverage sampling (FSCS), and simple random sampling (SRS). We then trained random forest (RF) models for four soil properties: pH, CEC, clay content, and SOC at five different depths. We identified the effective sample size based on the model learning curves and compared it to the optimal sample size determined from the Jensen–Shannon divergence (D JS ) applied to the environmental covariates. Maps were then generated from models that used all the calibration points (reference maps) and from models that used the optimal sample size (optimal maps) for comparison. Our findings revealed that the optimal sample sizes based on the D JS analysis were closely aligned with the effective sample sizes from the model learning curves (815 for cLHS, 832 for FSCS, and 847 for SRS). Furthermore, the comparison of the optimal maps to the reference maps showed little difference in the global statistics (concordance correlation coefficient and root mean square error) and spatial trends of the data, confirming that the optimal sample size was sufficient for creating predictions of similar accuracy to the full calibration dataset. Finally, we conclude that the Ottawa soil survey project could have saved between CAD 330,500 and CAD 374,000 (CAD = Canadian dollars) if the determination of optimal sample size tools presented herein existed during the project planning phase. This clearly illustrates the need for additional research in determining an optimal sample size for DSM and demonstrates that operationalization of DSM in public institutions requires a sound scientific basis for determining sample size.

Suggested Citation

  • Daniel D. Saurette & Richard J. Heck & Adam W. Gillespie & Aaron A. Berg & Asim Biswas, 2025. "Post-hoc Evaluation of Sample Size in a Regional Digital Soil Mapping Project," Land, MDPI, vol. 14(3), pages 1-22, March.
  • Handle: RePEc:gam:jlands:v:14:y:2025:i:3:p:545-:d:1605846
    as

    Download full text from publisher

    File URL: https://www.mdpi.com/2073-445X/14/3/545/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2073-445X/14/3/545/
    Download Restriction: no
    ---><---

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jlands:v:14:y:2025:i:3:p:545-:d:1605846. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.