
Techniques to Improve Ecological Interpretability of Black-Box Machine Learning Models

Author

Listed:
  • Thomas Welchowski

    (University of Bonn)

  • Kelly O. Maloney

    (U.S. Geological Survey (USGS) Eastern Ecological Science Center at the Leetown Research Laboratory)

  • Richard Mitchell

    (U.S. Environmental Protection Agency Office of Water Washington)

  • Matthias Schmid

    (University of Bonn)

Abstract

Statistical modeling of ecological data is often faced with a large number of variables as well as possible nonlinear relationships and higher-order interaction effects. Gradient boosted trees (GBT) have been successful in addressing these issues and have shown a good predictive performance in modeling nonlinear relationships, in particular in classification settings with a categorical response variable. They also tend to be robust against outliers. However, their black-box nature makes it difficult to interpret these models. We introduce several recently developed statistical tools to the environmental research community in order to advance interpretation of these black-box models. To analyze the properties of the tools, we applied gradient boosted trees to investigate biological health of streams within the contiguous USA, as measured by a benthic macroinvertebrate biotic index. Based on these data and a simulation study, we demonstrate the advantages and limitations of partial dependence plots (PDP), individual conditional expectation (ICE) curves and accumulated local effects (ALE) in their ability to identify covariate–response relationships. Additionally, interaction effects were quantified according to interaction strength (IAS) and Friedman’s $H^2$ statistic. Interpretable machine learning techniques are useful tools to open the black-box of gradient boosted trees in the environmental sciences. This finding is supported by our case study on the effect of impervious surface on the benthic condition, which agrees with previous results in the literature. Overall, the most important variables were ecoregion, bed stability, watershed area, riparian vegetation and catchment slope. These variables were also present in most identified interaction effects. In conclusion, graphical tools (PDP, ICE, ALE) enable visualization and easier interpretation of GBT but should be supported by analytical statistical measures. Future methodological research is needed to investigate the properties of interaction tests. Supplementary materials accompanying this paper appear on-line.
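To make the tools named in the abstract more concrete, the following is a minimal sketch of how ICE curves, a PDP and Friedman's $H^2$ statistic can be computed for any prediction function. It is not the authors' implementation. The function `predict` is a hypothetical stand-in for a fitted gradient boosted tree model, the data are simulated, and all function and variable names are illustrative assumptions. ALE and IAS are not shown.

```python
import random

def predict(x):
    """Hypothetical stand-in for a fitted black-box model (e.g. a GBT).
    Additive in x1, with an interaction between x2 and x3."""
    x1, x2, x3 = x
    return 2.0 * x1 + x2 * x3

def ice_curves(model, data, feature, grid):
    """ICE: one curve per observation, obtained by sweeping `feature`
    over `grid` while all other covariates keep their observed values."""
    curves = []
    for row in data:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves

def partial_dependence(model, data, feature, grid):
    """PDP: pointwise average of the ICE curves."""
    curves = ice_curves(model, data, feature, grid)
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]

def pd_at(model, data, features, values):
    """Partial dependence at one point: fix `features` to `values`
    and average the predictions over the observed data."""
    total = 0.0
    for row in data:
        modified = list(row)
        for f, v in zip(features, values):
            modified[f] = v
        total += model(modified)
    return total / len(data)

def centered(values):
    """Subtract the mean so the partial dependence values average to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def friedman_h2(model, data, j, k):
    """Pairwise Friedman H^2: share of the variance of the joint partial
    dependence of (j, k) not explained by the two one-dimensional ones."""
    pd_j = centered([pd_at(model, data, [j], [r[j]]) for r in data])
    pd_k = centered([pd_at(model, data, [k], [r[k]]) for r in data])
    pd_jk = centered([pd_at(model, data, [j, k], [r[j], r[k]]) for r in data])
    num = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    den = sum(a ** 2 for a in pd_jk)
    return num / den if den > 0 else 0.0

# Simulated covariates (assumption: three independent uniform variables).
random.seed(0)
data = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]

ice_x2 = ice_curves(predict, data, 1, grid)          # slopes differ per row
pdp_x1 = partial_dependence(predict, data, 0, grid)  # linear in x1
h2_additive = friedman_h2(predict, data, 0, 1)       # no x1-x2 interaction
h2_interact = friedman_h2(predict, data, 1, 2)       # x2-x3 interaction

print("PDP for x1:", [round(v, 3) for v in pdp_x1])
print("H^2(x1, x2):", h2_additive)
print("H^2(x2, x3):", h2_interact)
```

In this toy setting the ICE curves for x2 have different slopes for different observations, which the averaged PDP alone would hide. That is one reason graphical tools may benefit from being paired with a numerical measure such as $H^2$.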

Suggested Citation

  • Thomas Welchowski & Kelly O. Maloney & Richard Mitchell & Matthias Schmid, 2022. "Techniques to Improve Ecological Interpretability of Black-Box Machine Learning Models," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 27(1), pages 175-197, March.
  • Handle: RePEc:spr:jagbes:v:27:y:2022:i:1:d:10.1007_s13253-021-00479-7
    DOI: 10.1007/s13253-021-00479-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13253-021-00479-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tomasz Bąk, 2021. "Spatial sampling methods modified by model use," Statistics in Transition New Series, Polish Statistical Association, vol. 22(2), pages 143-154, June.
    2. Lorenzo Fattorini & Timothy G. Gregoire & Sara Trentini, 2018. "The Use of Calibration Weighting for Variance Estimation Under Systematic Sampling: Applications to Forest Cover Assessment," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 23(3), pages 358-373, September.
    3. Brian D. Williamson & Peter B. Gilbert & Marco Carone & Noah Simon, 2021. "Nonparametric variable importance assessment using machine learning techniques," Biometrics, The International Biometric Society, vol. 77(1), pages 9-22, March.
    4. Pommerening, Arne & Szmyt, Janusz & Zhang, Gongqiao, 2020. "A new nearest-neighbour index for monitoring spatial size diversity: The hyperbolic tangent index," Ecological Modelling, Elsevier, vol. 435(C).
    5. Tuglus Catherine & van der Laan Mark J., 2011. "Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-31, January.
    6. Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2017. "Doubly robust uniform confidence band for the conditional average treatment effect function," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 32(7), pages 1207-1225, November.
    7. Anton Grafström & Niklas L. P. Lundström & Lina Schelin, 2012. "Spatially Balanced Sampling through the Pivotal Method," Biometrics, The International Biometric Society, vol. 68(2), pages 514-520, June.
    8. Rose Sherri & van der Laan Mark J., 2008. "Simple Optimal Weighting of Cases and Controls in Case-Control Studies," The International Journal of Biostatistics, De Gruyter, vol. 4(1), pages 1-24, September.
    9. Raphaël Jauslin & Bardia Panahbehagh & Yves Tillé, 2022. "Sequential spatially balanced sampling," Environmetrics, John Wiley & Sons, Ltd., vol. 33(8), December.
    10. Xin Zhao & Anton Grafström, 2020. "A sample coordination method to monitor totals of environmental variables," Environmetrics, John Wiley & Sons, Ltd., vol. 31(6), September.
    11. Huan Xie & Fang Wang & Yali Gong & Xiaohua Tong & Yanmin Jin & Ang Zhao & Chao Wei & Xinyi Zhang & Shicheng Liao, 2022. "Spatially Balanced Sampling for Validation of GlobeLand30 Using Landscape Pattern-Based Inclusion Probability," Sustainability, MDPI, vol. 14(5), pages 1-19, February.
    12. Linda Altieri & Daniela Cocchi, 2021. "Spatial Sampling for Non‐compact Patterns," International Statistical Review, International Statistical Institute, vol. 89(3), pages 532-549, December.
    13. Sara Franceschi & Rosa Maria Di Biase & Agnese Marcelli & Lorenzo Fattorini, 2022. "Some Empirical Results on Nearest-Neighbour Pseudo-populations for Resampling from Spatial Populations," Stats, MDPI, vol. 5(2), pages 1-16, April.
    14. Matthew Nahorniak & David P Larsen & Carol Volk & Chris E Jordan, 2015. "Using Inverse Probability Bootstrap Sampling to Eliminate Sample Induced Bias in Model Based Analysis of Unequal Probability Samples," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-19, June.
    15. Alcasena, Fermín J. & Salis, Michele & Nauslar, Nicholas J. & Aguinaga, A. Eduardo & Vega-García, Cristina, 2016. "Quantifying economic losses from wildfires in black pine afforestations of northern Spain," Forest Policy and Economics, Elsevier, vol. 73(C), pages 153-167.
    16. Kathryn M. Irvine & T. J. Rodhouse & Ilai N. Keren, 2016. "Extending Ordinal Regression with a Latent Zero-Augmented Beta Distribution," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 21(4), pages 619-640, December.
    17. B. L. Robertson & J. A. Brown & T. McDonald & P. Jaksons, 2013. "BAS: Balanced Acceptance Sampling of Natural Resources," Biometrics, The International Biometric Society, vol. 69(3), pages 776-784, September.
    18. Shepherd, Keith D. & Shepherd, Gemma & Walsh, Markus G., 2015. "Land health surveillance and response: A framework for evidence-informed land management," Agricultural Systems, Elsevier, vol. 132(C), pages 93-106.
    19. B. L. Robertson & O. Ozturk & O. Kravchuk & J. A. Brown, 2022. "Spatially Balanced Sampling with Local Ranking," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 27(4), pages 622-639, December.
    20. Jacopo Paglia & Jo Eidsvik & Juha Karvanen, 2022. "Efficient spatial designs using Hausdorff distances and Bayesian optimization," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(3), pages 1060-1084, September.
