
Occupancy Modeling for Rare Species Using Large Datasets: A Subsampling Approach

Authors

  • Johanna de Haan‐Ward
  • Simon J. Bonner
  • Douglas G. Woolford

Abstract

Citizen science monitoring programs, such as the Breeding Bird Survey, provide a wealth of data for understanding species abundance and distribution. However, traditional approaches for occupancy modeling of rare species can be difficult to apply to large, imbalanced datasets. We propose a new method for occupancy modeling where the original dataset is subsampled seasonally, keeping all sites with at least one detection along with a random sample of sites with no detections. Occupancy models cannot be fit directly to these subsampled data because the assumption of binomial sampling no longer holds. However, we show that the occupancy probability is adjusted by an offset, meaning inference on the effects of predictors is still valid. We propose a method for model fitting via direct maximum likelihood and demonstrate via simulation that this leads to computational gains. We illustrate our method using data on Canada Warblers (Cardellina canadensis) from the Breeding Bird Survey in Ontario, Canada from 1997 to 2018, where 95% of sites have zero detections annually, demonstrating that we can accurately estimate the occupancy and detection parameters, including estimating the effects of habitat covariates, using just 10% of the original dataset.
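The subsampling step described in the abstract can be sketched as follows. This is only a minimal illustration, assuming a hypothetical record layout of `(season, site_id, detections)` and a hypothetical `zero_fraction` parameter; it is not the authors' implementation, and it does not include the offset adjustment to the occupancy probability or the maximum likelihood fit.

```python
import random
from collections import defaultdict


def subsample_sites(records, zero_fraction, seed=0):
    """Subsample site records season by season.

    records: list of (season, site_id, detections) tuples, where
        detections is a list of 0/1 outcomes, one per visit
        (hypothetical layout, chosen for illustration).
    zero_fraction: share of zero-detection sites to keep in each season.
    """
    rng = random.Random(seed)

    # Group records by season so sampling is done within each season.
    by_season = defaultdict(list)
    for rec in records:
        by_season[rec[0]].append(rec)

    kept = []
    for season in sorted(by_season):
        detected = [r for r in by_season[season] if any(r[2])]
        undetected = [r for r in by_season[season] if not any(r[2])]
        # Keep every site with at least one detection ...
        kept.extend(detected)
        # ... plus a random sample of the sites with no detections.
        n_keep = round(zero_fraction * len(undetected))
        kept.extend(rng.sample(undetected, n_keep))
    return kept


# Toy data: 2 seasons, 45 sites each, 5 sites per season with a detection.
records = [
    (season, site, [1, 0, 0] if site < 5 else [0, 0, 0])
    for season in (1997, 1998)
    for site in range(45)
]
kept = subsample_sites(records, zero_fraction=0.25)
print(len(records), len(kept))
```

Because the retained sites are no longer a binomial sample, a standard occupancy model should not be fit to `kept` without the adjustment the abstract describes.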

Suggested Citation

  • Johanna de Haan‐Ward & Simon J. Bonner & Douglas G. Woolford, 2025. "Occupancy Modeling for Rare Species Using Large Datasets: A Subsampling Approach," Environmetrics, John Wiley & Sons, Ltd., vol. 36(5), July.
  • Handle: RePEc:wly:envmet:v:36:y:2025:i:5:n:e70023
    DOI: 10.1002/env.70023

    Download full text from publisher

    File URL: https://doi.org/10.1002/env.70023
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Angel M. Morales & Patrick Tarwater & Indika Mallawaarachchi & Alok Kumar Dwivedi & Juan B. Figueroa-Casas, 2015. "Multinomial logistic regression approach for the evaluation of binary diagnostic test in medical research," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 16(2), pages 203-222, June.
    2. F. Gauthier & D. Germain & B. Hétu, 2017. "Logistic models as a forecasting tool for snow avalanches in a cold maritime climate: northern Gaspésie, Québec, Canada," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 89(1), pages 201-232, October.
    3. Douglas Cumming & Lars Hornuf & Moein Karami & Denis Schweizer, 2023. "Disentangling Crowdfunding from Fraudfunding," Journal of Business Ethics, Springer, vol. 182(4), pages 1103-1128, February.
    4. Cristian David Correa-Álvarez & Juan Carlos Salazar-Uribe & Luis Raúl Pericchi-Guerra, 2023. "Bayesian multilevel logistic regression models: a case study applied to the results of two questionnaires administered to university students," Computational Statistics, Springer, vol. 38(4), pages 1791-1810, December.
    5. Eunae Yoo & Elliot Rabinovich & Bin Gu, 2020. "The Growth of Follower Networks on Social Media Platforms for Humanitarian Operations," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2696-2715, December.
    6. Lo Turco, Alessia & Maggioni, Daniela, 2018. "Effects of Islamic religiosity on bilateral trust in trade: The case of Turkish exports," Journal of Comparative Economics, Elsevier, vol. 46(4), pages 947-965.
    7. Matija Kovacic & Claudio Zoli, 2021. "Ethnic distribution, effective power and conflict," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 57(2), pages 257-299, August.
    8. Blackman, Allen & Guerrero, Santiago, 2012. "What drives voluntary eco-certification in Mexico?," Journal of Comparative Economics, Elsevier, vol. 40(2), pages 256-268.
    9. Jacob Ausderan, 2018. "Reassessing the democratic advantage in interstate wars using k-adic datasets," Conflict Management and Peace Science, Peace Science Society (International), vol. 35(5), pages 451-473, September.
    10. Paul Poast, 2013. "Issue linkage and international cooperation: An empirical investigation," Conflict Management and Peace Science, Peace Science Society (International), vol. 30(3), pages 286-303, July.
    11. Yerko Rojas, 2017. "Evictions and short-term all-cause mortality: a 3-year follow-up study of a middle-aged Swedish population," International Journal of Public Health, Springer;Swiss School of Public Health (SSPH+), vol. 62(3), pages 343-351, April.
    12. Mehrez Ben Slama & Dhafer Saidane & Hassouna Fedhila, 2012. "How to identify targets in the M&A banking operations? Case of cross-border strategies in Europe by line of activity," Review of Quantitative Finance and Accounting, Springer, vol. 38(2), pages 209-240, February.
    13. Marcin Chlebus, 2014. "One-day prediction of state of turbulence for financial instrument based on models for binary dependent variable," Ekonomia journal, Faculty of Economic Sciences, University of Warsaw, vol. 37.
    14. Lorenzo Cassi & Anne Plunket, 2014. "Proximity, network formation and inventive performance: in search of the proximity paradox," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 53(2), pages 395-422, September.
    15. Trent Geisler & Herman Ray & Ying Xie, 2023. "Finding the Proverbial Needle: Improving Minority Class Identification Under Extreme Class Imbalance," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 192-212, April.
    16. Dugan, Spencer August & Utne, Ingrid Bouwer, 2025. "Improved identification of maritime risk-influencing factors using AIS data in regression analysis," Reliability Engineering and System Safety, Elsevier, vol. 262(C).
    17. Adriana Bruscato Bortoluzzo & Danny Pimentel Claro & Marco Antonio Leonel Caetano & Rinaldo Artes, 2009. "Estimating Claim Size and Probability in the Auto-insurance Industry: The Zero-adjusted Inverse Gaussian (ZAIG) Distribution," Business and Economics Working Papers 056, Unidade de Negocios e Economia, Insper.
    18. Wegenast, Tim, 2013. "The Impact of Fuel Ownership on Intrastate Violence," GIGA Working Papers 225, GIGA German Institute of Global and Area Studies.
    19. Xinfu Xing & Chenglong Wu & Jinhui Li & Xueyou Li & Limin Zhang & Rongjie He, 2021. "Susceptibility assessment for rainfall-induced landslides using a revised logistic regression method," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 106(1), pages 97-117, March.
    20. Hwang, Seokyoun & Sarath, Bharat & Han, Seung-youb, 2022. "Auditor independence: The effect of auditors’ quality control efforts and corporate governance," Journal of International Accounting, Auditing and Taxation, Elsevier, vol. 47(C).
