
Techniques to Improve Ecological Interpretability of Black-Box Machine Learning Models

Author

Listed:
  • Thomas Welchowski

    (University of Bonn)

  • Kelly O. Maloney

    (U.S. Geological Survey (USGS) Eastern Ecological Science Center at the Leetown Research Laboratory)

  • Richard Mitchell

    (U.S. Environmental Protection Agency Office of Water Washington)

  • Matthias Schmid

    (University of Bonn)

Abstract

Statistical modeling of ecological data is often faced with a large number of variables as well as possible nonlinear relationships and higher-order interaction effects. Gradient boosted trees (GBT) have been successful in addressing these issues and have shown a good predictive performance in modeling nonlinear relationships, in particular in classification settings with a categorical response variable. They also tend to be robust against outliers. However, their black-box nature makes it difficult to interpret these models. We introduce several recently developed statistical tools to the environmental research community in order to advance interpretation of these black-box models. To analyze the properties of the tools, we applied gradient boosted trees to investigate biological health of streams within the contiguous USA, as measured by a benthic macroinvertebrate biotic index. Based on these data and a simulation study, we demonstrate the advantages and limitations of partial dependence plots (PDP), individual conditional expectation (ICE) curves and accumulated local effects (ALE) in their ability to identify covariate–response relationships. Additionally, interaction effects were quantified according to interaction strength (IAS) and Friedman’s $H^2$ statistic. Interpretable machine learning techniques are useful tools to open the black-box of gradient boosted trees in the environmental sciences. This finding is supported by our case study on the effect of impervious surface on the benthic condition, which agrees with previous results in the literature. Overall, the most important variables were ecoregion, bed stability, watershed area, riparian vegetation and catchment slope. These variables were also present in most identified interaction effects. In conclusion, graphical tools (PDP, ICE, ALE) enable visualization and easier interpretation of GBT but should be supported by analytical statistical measures. Future methodological research is needed to investigate the properties of interaction tests. Supplementary materials accompanying this paper appear on-line.
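
To make the tools named in the abstract concrete, here is a minimal sketch of ICE curves, a PDP, and a pairwise version of Friedman's $H^2$ statistic for an arbitrary prediction function. It is not the authors' implementation: the two toy "models", the four-row data set, and all function names are hypothetical stand-ins for a fitted gradient boosted tree model, and ALE and IAS are not covered.

```python
from statistics import mean


def ice_curves(predict, rows, feature, grid):
    """One curve per observation: predictions as `feature` sweeps over `grid`
    while the observation's other covariates stay at their observed values."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, rows, feature, grid):
    """PDP: pointwise average of the ICE curves over all observations."""
    curves = ice_curves(predict, rows, feature, grid)
    return [mean(column) for column in zip(*curves)]


def centered_pd_at_data(predict, rows, features):
    """Partial dependence on `features`, evaluated at every observed row
    and centered to have mean zero over the data."""
    values = []
    for target in rows:
        preds = []
        for row in rows:
            modified = list(row)
            for f in features:
                modified[f] = target[f]
            preds.append(predict(modified))
        values.append(mean(preds))
    center = mean(values)
    return [v - center for v in values]


def friedman_h2(predict, rows, j, k):
    """Pairwise H^2: share of the variation in the joint partial dependence
    of features j and k that is not explained by their individual effects."""
    pd_jk = centered_pd_at_data(predict, rows, [j, k])
    pd_j = centered_pd_at_data(predict, rows, [j])
    pd_k = centered_pd_at_data(predict, rows, [k])
    numerator = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    denominator = sum(a ** 2 for a in pd_jk)
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical prediction functions standing in for a fitted model.
def additive_model(x):
    return 2.0 * x[0] + x[1]      # no interaction between x[0] and x[1]


def interaction_model(x):
    return x[0] * x[1]            # pure interaction


# Toy data (assumption): four observations of two covariates.
rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 0.0], [3.0, 2.0]]
grid = [0.0, 1.0, 2.0, 3.0]

pdp_additive = partial_dependence(additive_model, rows, 0, grid)
ice_interaction = ice_curves(interaction_model, rows, 0, grid)
h2_additive = friedman_h2(additive_model, rows, 0, 1)
h2_interaction = friedman_h2(interaction_model, rows, 0, 1)

print("PDP (additive model):", pdp_additive)
print("ICE (interaction model):", ice_interaction)
print("H^2 additive:", h2_additive, "H^2 interaction:", h2_interaction)
```

In the interaction model the ICE curves have different slopes for different observations, which a single averaged PDP line would hide; this is the kind of heterogeneity that motivates checking graphical tools against a numerical interaction measure such as $H^2$.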

Suggested Citation

  • Thomas Welchowski & Kelly O. Maloney & Richard Mitchell & Matthias Schmid, 2022. "Techniques to Improve Ecological Interpretability of Black-Box Machine Learning Models," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 27(1), pages 175-197, March.
  • Handle: RePEc:spr:jagbes:v:27:y:2022:i:1:d:10.1007_s13253-021-00479-7
    DOI: 10.1007/s13253-021-00479-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13253-021-00479-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lorenzo Fattorini & Timothy G. Gregoire & Sara Trentini, 2018. "The Use of Calibration Weighting for Variance Estimation Under Systematic Sampling: Applications to Forest Cover Assessment," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 23(3), pages 358-373, September.
    2. Brian D. Williamson & Peter B. Gilbert & Marco Carone & Noah Simon, 2021. "Nonparametric variable importance assessment using machine learning techniques," Biometrics, The International Biometric Society, vol. 77(1), pages 9-22, March.
    3. Anton Grafström & Niklas L. P. Lundström & Lina Schelin, 2012. "Spatially Balanced Sampling through the Pivotal Method," Biometrics, The International Biometric Society, vol. 68(2), pages 514-520, June.
    4. Rose Sherri & van der Laan Mark J., 2008. "Simple Optimal Weighting of Cases and Controls in Case-Control Studies," The International Journal of Biostatistics, De Gruyter, vol. 4(1), pages 1-24, September.
    5. Xin Zhao & Anton Grafström, 2020. "A sample coordination method to monitor totals of environmental variables," Environmetrics, John Wiley & Sons, Ltd., vol. 31(6), September.
    6. Linda Altieri & Daniela Cocchi, 2021. "Spatial Sampling for Non‐compact Patterns," International Statistical Review, International Statistical Institute, vol. 89(3), pages 532-549, December.
    7. Matthew Nahorniak & David P Larsen & Carol Volk & Chris E Jordan, 2015. "Using Inverse Probability Bootstrap Sampling to Eliminate Sample Induced Bias in Model Based Analysis of Unequal Probability Samples," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-19, June.
    8. Alcasena, Fermín J. & Salis, Michele & Nauslar, Nicholas J. & Aguinaga, A. Eduardo & Vega-García, Cristina, 2016. "Quantifying economic losses from wildfires in black pine afforestations of northern Spain," Forest Policy and Economics, Elsevier, vol. 73(C), pages 153-167.
    9. Kathryn M. Irvine & T. J. Rodhouse & Ilai N. Keren, 2016. "Extending Ordinal Regression with a Latent Zero-Augmented Beta Distribution," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 21(4), pages 619-640, December.
    10. B. L. Robertson & J. A. Brown & T. McDonald & P. Jaksons, 2013. "BAS: Balanced Acceptance Sampling of Natural Resources," Biometrics, The International Biometric Society, vol. 69(3), pages 776-784, September.
    11. Shepherd, Keith D. & Shepherd, Gemma & Walsh, Markus G., 2015. "Land health surveillance and response: A framework for evidence-informed land management," Agricultural Systems, Elsevier, vol. 132(C), pages 93-106.
    12. Antoine Chambaz & Mark J. Laan, 2014. "Inference in Targeted Group-Sequential Covariate-Adjusted Randomized Clinical Trials," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(1), pages 104-140, March.
    13. Lorenzo Fattorini & Marzia Marcheselli & Caterina Pisani & Luca Pratelli, 2022. "Design‐based properties of the nearest neighbor spatial interpolator and its bootstrap mean squared error estimator," Biometrics, The International Biometric Society, vol. 78(4), pages 1454-1463, December.
    14. Robertson, B.L. & McDonald, T. & Price, C.J. & Brown, J.A., 2017. "A modification of balanced acceptance sampling," Statistics & Probability Letters, Elsevier, vol. 129(C), pages 107-112.
    15. Maria Michela Dickson & Yves Tillé, 2016. "Ordered spatial sampling by means of the traveling salesman problem," Computational Statistics, Springer, vol. 31(4), pages 1359-1372, December.
    16. Xuli Zan & Zuliang Zhao & Wei Liu & Xiaodong Zhang & Zhe Liu & Shaoming Li & Dehai Zhu, 2019. "The Layout of Maize Variety Test Sites Based on the Spatiotemporal Classification of the Planting Environment," Sustainability, MDPI, vol. 11(13), pages 1-15, July.
    17. McHugh, Peter A. & Saunders, W. Carl & Bouwes, Nicolaas & Wall, C. Eric & Bangen, Sara & Wheaton, Joseph M. & Nahorniak, Matthew & Ruzycki, James R. & Tattam, Ian A. & Jordan, Chris E., 2017. "Linking models across scales to assess the viability and restoration potential of a threatened population of steelhead (Oncorhynchus mykiss) in the Middle Fork John Day River, Oregon, USA," Ecological Modelling, Elsevier, vol. 355(C), pages 24-38.
    18. Guillaume Chauvet & Ronan Le Gleut, 2021. "Inference under pivotal sampling: Properties, variance estimation, and application to tesselation for spatial sampling," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(1), pages 108-131, March.
    19. Steen Magnussen & Johannes Breidenbach, 2020. "Retrieval of among-stand variances from one observation per stand," Journal of Forest Science, Czech Academy of Agricultural Sciences, vol. 66(4), pages 133-149.
    20. Gideon Minua Kwaku Ampofo & Prosper Basommi Laari & Emmanuel Opoku Ware & Williams Shaw, 2023. "Further investigation of the total natural resource rents and economic growth nexus in resource-abundant sub-Saharan African countries," Mineral Economics, Springer;Raw Materials Group (RMG);Luleå University of Technology, vol. 36(1), pages 97-121, January.
