Printed from https://ideas.repec.org/a/eee/ecomod/v477y2023ics0304380022003465.html

Testing the effect of sample prevalence and sampling methods on probability- and favourability-based SDMs

Author

Listed:
  • Marchetto, Elisa
  • Da Re, Daniele
  • Tordoni, Enrico
  • Bazzichetto, Manuele
  • Zannini, Piero
  • Celebrin, Simone
  • Chieffallo, Ludovico
  • Malavasi, Marco
  • Rocchini, Duccio

Abstract

Predicting the occurrence probability of species is intrinsically dependent on the quality of the training dataset and, in particular, on the sample prevalence (i.e., the ratio between presences and absences). Whenever the numbers of presences and absences in the training dataset are not equal, the predictions deviate towards higher values as the sample prevalence increases, and vice versa. As a result, probability models of species occurrence with different sample prevalences cannot be directly compared. The favourability concept was introduced to address this limitation. Indeed, favourability (i.e., the variation in the probability of occurrence regardless of the sample prevalence) could reduce the degree of uncertainty when comparing species distributions despite different sample prevalences. To test this hypothesis, we simulated 50 virtual species and compared the predictive performance of four probability-based and favourability-based Species Distribution Models (GLM, GAM, RF, BRT) under a set of different prevalence values and sampling strategies (i.e., random and stratified sampling). Favourability-based models performed slightly better than probability-based models in predicting the species distribution over geographic space, also confirming their capability to reduce the variability of the predictions across different degrees of sample prevalence.
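
A minimal sketch of how favourability can remove the effect of sample prevalence, assuming the favourability function commonly attributed to Real et al. (2006): F = [P/(1 - P)] / [n1/n0 + P/(1 - P)], where P is the predicted probability and n1 and n0 are the numbers of presences and absences in the training data. The formula is not given in the abstract, so it is an assumption here. The function names, and the idealised logistic shift used to mimic different sample prevalences (`probability_at_prevalence`), are hypothetical illustrations and not the authors' simulation code.

```python
import math


def favourability(p, n_presences, n_absences):
    """Convert an occurrence probability into a favourability value.

    Assumed formula (not stated in the abstract):
        F = odds / (n_presences / n_absences + odds), with odds = p / (1 - p).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n_presences <= 0 or n_absences <= 0:
        raise ValueError("both presences and absences are required")
    if p == 1.0:
        return 1.0  # limit of the formula as the odds grow without bound
    odds = p / (1.0 - p)
    return odds / (n_presences / n_absences + odds)


def probability_at_prevalence(env_logit, n_presences, n_absences):
    """Hypothetical helper: probability from an idealised logistic model.

    Assumes that changing the presence/absence ratio only shifts the
    intercept by ln(n_presences / n_absences) relative to a balanced sample.
    """
    shifted = env_logit + math.log(n_presences / n_absences)
    return 1.0 / (1.0 + math.exp(-shifted))


if __name__ == "__main__":
    env_logit = 0.3  # arbitrary environmental suitability on the logit scale
    for n1, n0 in [(20, 80), (50, 50), (80, 20)]:
        p = probability_at_prevalence(env_logit, n1, n0)
        f = favourability(p, n1, n0)
        print(f"presences/absences={n1}/{n0}  P={p:.3f}  F={f:.3f}")
```

Under these assumptions, P rises with the share of presences in the sample while F stays the same for the same environmental conditions, and F equals 0.5 wherever P equals the proportion of presences n1/(n1 + n0).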

Suggested Citation

  • Marchetto, Elisa & Da Re, Daniele & Tordoni, Enrico & Bazzichetto, Manuele & Zannini, Piero & Celebrin, Simone & Chieffallo, Ludovico & Malavasi, Marco & Rocchini, Duccio, 2023. "Testing the effect of sample prevalence and sampling methods on probability- and favourability-based SDMs," Ecological Modelling, Elsevier, vol. 477(C).
  • Handle: RePEc:eee:ecomod:v:477:y:2023:i:c:s0304380022003465
    DOI: 10.1016/j.ecolmodel.2022.110248

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0304380022003465
    Download Restriction: Full text for ScienceDirect subscribers only

    Citations

    Cited by:

    1. Amaro, George & Fidelis, Elisangela Gomes & da Silva, Ricardo Siqueira & Marchioro, Cesar Augusto, 2023. "Effect of study area extent on the potential distribution of Species: A case study with models for Raoiella indica Hirst (Acari: Tenuipalpidae)," Ecological Modelling, Elsevier, vol. 483(C).
    2. Benkendorf, Donald J. & Schwartz, Samuel D. & Cutler, D. Richard & Hawkins, Charles P., 2023. "Correcting for the effects of class imbalance improves the performance of machine-learning based species distribution models," Ecological Modelling, Elsevier, vol. 483(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mariana Oliveira & Luís Torgo & Vítor Santos Costa, 2021. "Evaluation Procedures for Forecasting with Spatiotemporal Data," Mathematics, MDPI, vol. 9(6), pages 1-27, March.
    2. Alsamadisi, Adam G. & Tran, Liem T. & Papeş, Monica, 2020. "Employing inferences across scales: Integrating spatial data with different resolutions to enhance Maxent models," Ecological Modelling, Elsevier, vol. 415(C).
    3. Guo, Chuanbo & Chen, Yushun & Liu, Han & Lu, Yin & Qu, Xiao & Yuan, Hui & Lek, Sovan & Xie, Songguang, 2019. "Modelling fish communities in relation to water quality in the impounded lakes of China’s South-to-North Water Diversion Project," Ecological Modelling, Elsevier, vol. 397(C), pages 25-35.
    4. Arjan S. Gosal & Janine A. McMahon & Katharine M. Bowgen & Catherine H. Hoppe & Guy Ziv, 2021. "Identifying and Mapping Groups of Protected Area Visitors by Environmental Awareness," Land, MDPI, vol. 10(6), pages 1-14, May.
    5. Albert Stuart Reece & Gary Kenneth Hulse, 2022. "European Epidemiological Patterns of Cannabis- and Substance-Related Congenital Neurological Anomalies: Geospatiotemporal and Causal Inferential Study," IJERPH, MDPI, vol. 20(1), pages 1-35, December.
    6. Liang, Wanwan & Papeş, Monica & Tran, Liem & Grant, Jerome & Washington-Allen, Robert & Stewart, Scott & Wiggins, Gregory, 2018. "The effect of pseudo-absence selection method on transferability of species distribution models in the context of non-adaptive niche shift," Ecological Modelling, Elsevier, vol. 388(C), pages 1-9.
    7. Michael Parzinger & Lucia Hanfstaengl & Ferdinand Sigg & Uli Spindler & Ulrich Wellisch & Markus Wirnsberger, 2020. "Residual Analysis of Predictive Modelling Data for Automated Fault Detection in Building’s Heating, Ventilation and Air Conditioning Systems," Sustainability, MDPI, vol. 12(17), pages 1-18, August.
    8. Amaro, George & Fidelis, Elisangela Gomes & da Silva, Ricardo Siqueira & Marchioro, Cesar Augusto, 2023. "Effect of study area extent on the potential distribution of Species: A case study with models for Raoiella indica Hirst (Acari: Tenuipalpidae)," Ecological Modelling, Elsevier, vol. 483(C).
    9. Van Belle, Jente & Guns, Tias & Verbeke, Wouter, 2021. "Using shared sell-through data to forecast wholesaler demand in multi-echelon supply chains," European Journal of Operational Research, Elsevier, vol. 288(2), pages 466-479.
    10. Christian König & Patrick Weigelt & Julian Schrader & Amanda Taylor & Jens Kattge & Holger Kreft, 2019. "Biodiversity data integration—the significance of data resolution and domain," PLOS Biology, Public Library of Science, vol. 17(3), pages 1-16, March.
    11. Albert Stuart Reece & Gary Kenneth Hulse, 2022. "European Epidemiological Patterns of Cannabis- and Substance-Related Body Wall Congenital Anomalies: Geospatiotemporal and Causal Inferential Study," IJERPH, MDPI, vol. 19(15), pages 1-38, July.
    12. Philipp Bach & Victor Chernozhukov & Malte S. Kurz & Martin Spindler & Sven Klaassen, 2021. "DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R," Papers 2103.09603, arXiv.org, revised Feb 2024.
    13. Barker, Justin R. & MacIsaac, Hugh J., 2022. "Species distribution models: Administrative boundary centroid occurrences require careful interpretation," Ecological Modelling, Elsevier, vol. 472(C).
    14. Jorge Luis Andrade & José Luis Valencia, 2022. "A Fuzzy Random Survival Forest for Predicting Lapses in Insurance Portfolios Containing Imprecise Data," Mathematics, MDPI, vol. 11(1), pages 1-16, December.
    15. Eeva-Katri Kumpula & Pauline Norris & Adam C Pomerleau, 2020. "Stocks of paracetamol products stored in urban New Zealand households: A cross-sectional study," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-11, June.
    16. Michael Bucker & Gero Szepannek & Alicja Gosiewska & Przemyslaw Biecek, 2020. "Transparency, Auditability and eXplainability of Machine Learning Models in Credit Scoring," Papers 2009.13384, arXiv.org.
    17. Jian Lu & Raheel Ahmad & Thomas Nguyen & Jeffrey Cifello & Humza Hemani & Jiangyuan Li & Jinguo Chen & Siyi Li & Jing Wang & Achouak Achour & Joseph Chen & Meagan Colie & Ana Lustig & Christopher Dunn, 2022. "Heterogeneity and transcriptome changes of human CD8+ T cells across nine decades of life," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    18. Timo Schulte & Tillmann Wurz & Oliver Groene & Sabine Bohnet-Joschko, 2023. "Big Data Analytics to Reduce Preventable Hospitalizations—Using Real-World Data to Predict Ambulatory Care-Sensitive Conditions," IJERPH, MDPI, vol. 20(6), pages 1-16, March.
    19. Guarino, Ernestino de Souza Gomes & Barbosa, Ana Márcia & Waechter, Jorge Luiz, 2012. "Occurrence and abundance models of threatened plant species: Applications to mitigate the impact of hydroelectric power dams," Ecological Modelling, Elsevier, vol. 230(C), pages 22-33.
    20. Fogliato Riccardo & Oliveira Natalia L. & Yurko Ronald, 2021. "TRAP: a predictive framework for the Assessment of Performance in Trail Running," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 17(2), pages 129-143, June.
