
A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa

Author

Listed:
  • Freeman, Elizabeth A.
  • Moisen, Gretchen G.

Abstract

Modelling techniques used in binary classification problems often result in a predicted probability surface, which is then translated into a presence–absence classification map. However, this translation requires a (possibly subjective) choice of threshold above which the variable of interest is predicted to be present. The selection of this threshold value can have dramatic effects on model accuracy as well as the predicted prevalence for the variable (the overall proportion of locations where the variable is predicted to be present). The traditional default is to simply use a threshold of 0.5 as the cut-off, but this does not necessarily preserve the observed prevalence or result in the highest prediction accuracy, especially for data sets with very high or very low observed prevalence. Alternatively, the thresholds can be chosen to optimize map accuracy, as judged by various criteria. Here we examine the effect of 11 of these potential criteria on predicted prevalence, prediction accuracy, and the resulting map output. Comparisons are made using output from presence–absence models developed for 13 tree species in the northern mountains of Utah. We found that species with poor model quality or low prevalence were most sensitive to the choice of threshold. For these species, a 0.5 cut-off was unreliable, sometimes resulting in substantially lower kappa and underestimated prevalence, with possible detrimental effects on a management decision. If a management objective requires a map to portray unbiased estimates of species prevalence, then the best results were obtained from thresholds deliberately chosen so that the predicted prevalence equaled the observed prevalence, followed closely by thresholds chosen to maximize kappa. These were also the two criteria with the highest mean kappa from our independent test data. For particular management applications, the special cases of user-specified required accuracy may be most appropriate. Ultimately, maps will typically have multiple and somewhat conflicting management applications. Therefore, providing users with a continuous probability surface may be the most versatile and powerful method, allowing threshold choice to be matched with each map's intended use.
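
To make the threshold criteria in the abstract concrete, the following is a minimal Python sketch of three of them: the default 0.5 cut-off, the threshold that maximizes kappa, and the threshold at which predicted prevalence matches observed prevalence. It is an illustration under stated assumptions, not the authors' implementation: the function names, the threshold grid, and the ten-point toy data set are all hypothetical and are not taken from the paper.

```python
import numpy as np

def cohen_kappa(observed, predicted):
    """Cohen's kappa for two binary (0/1 or bool) arrays."""
    observed = np.asarray(observed, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    p_agree = np.mean(observed == predicted)           # observed agreement
    p_obs, p_pred = observed.mean(), predicted.mean()  # the two prevalences
    # Agreement expected by chance, given the two marginal prevalences.
    p_chance = p_obs * p_pred + (1 - p_obs) * (1 - p_pred)
    if p_chance == 1.0:  # degenerate case: kappa is undefined, report 0
        return 0.0
    return float((p_agree - p_chance) / (1 - p_chance))

def threshold_max_kappa(observed, prob, grid):
    """First threshold in `grid` that gives the highest kappa."""
    kappas = [cohen_kappa(observed, prob >= t) for t in grid]
    return float(grid[int(np.argmax(kappas))])

def threshold_match_prevalence(observed, prob, grid):
    """First threshold in `grid` whose predicted prevalence is closest
    to the observed prevalence."""
    obs_prev = np.mean(observed)
    gaps = [abs(np.mean(prob >= t) - obs_prev) for t in grid]
    return float(grid[int(np.argmin(gaps))])

# Hypothetical low-prevalence toy data (2 presences out of 10 locations).
observed = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
prob = np.array([0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.40, 0.10])
grid = np.round(np.arange(0.01, 1.00, 0.01), 2)  # assumed candidate thresholds

for name, t in [
    ("default 0.5", 0.5),
    ("max kappa", threshold_max_kappa(observed, prob, grid)),
    ("match prevalence", threshold_match_prevalence(observed, prob, grid)),
]:
    pred = prob >= t
    print(f"{name}: threshold={t:.2f}, "
          f"predicted prevalence={pred.mean():.2f}, "
          f"kappa={cohen_kappa(observed, pred):.2f}")
```

In this toy data no predicted probability reaches 0.5, so the default cut-off predicts the species absent everywhere. This is the kind of underestimated prevalence the abstract reports for low-prevalence species. In practice the thresholds would be chosen on training data and the resulting kappa checked on independent test data, as the abstract describes.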

Suggested Citation

  • Freeman, Elizabeth A. & Moisen, Gretchen G., 2008. "A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa," Ecological Modelling, Elsevier, vol. 217(1), pages 48-58.
  • Handle: RePEc:eee:ecomod:v:217:y:2008:i:1:p:48-58
    DOI: 10.1016/j.ecolmodel.2008.05.015

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0304380008002275
    Download Restriction: Full text for ScienceDirect subscribers only




