
A probabilistic nearest neighbour method for statistical pattern recognition

Author

Listed:
  • C. C. Holmes
  • N. M. Adams

Abstract

Summary. Nearest neighbour algorithms are among the most popular methods used in statistical pattern recognition. The models are conceptually simple and empirical studies have shown that their performance is highly competitive against other techniques. However, the lack of a formal framework for choosing the size of the neighbourhood k is problematic. Furthermore, the method can only make discrete predictions by reporting the relative frequency of the classes in the neighbourhood of the prediction point. We present a probabilistic framework for the k‐nearest‐neighbour method that largely overcomes these difficulties. Uncertainty is accommodated via a prior distribution on k as well as in the strength of the interaction between neighbours. These prior distributions propagate uncertainty through to proper probabilistic predictions that have continuous support on (0, 1). The method makes no assumptions about the distribution of the predictor variables. The method is also fully automatic with no user‐set parameters and empirically it proves to be highly accurate on many bench‐mark data sets.
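The contrast drawn in the abstract, between relative-frequency predictions and predictions with continuous support on (0, 1), can be illustrated with a small sketch. This is not the authors' model or algorithm: the abstract gives no formulas, so the softmax form of `class_probs`, the uniform prior over a small grid of `k` and `beta` values, the leave-one-out pseudo-likelihood weighting, and the toy data are all assumptions made here for illustration, and every function name is hypothetical.

```python
import math

def neighbours(x, points, k, skip=None):
    """Indices of the k points closest to x (Euclidean), optionally skipping one index."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(x, points[i]),
    )
    return order[:k]

def knn_frequency(x, X, y, classes, k):
    """Plain k-NN output: class fractions among the k neighbours (multiples of 1/k)."""
    idx = neighbours(x, X, k)
    return [sum(y[i] == c for i in idx) / k for c in classes]

def class_probs(x, X, y, classes, k, beta, skip=None):
    """ASSUMED form: softmax over neighbour class fractions, scaled by strength beta."""
    idx = neighbours(x, X, k, skip)
    scores = [beta * sum(y[i] == c for i in idx) / k for c in classes]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def log_pseudo_likelihood(X, y, classes, k, beta):
    """Sum of leave-one-out log-probabilities assigned to the observed labels."""
    return sum(
        math.log(class_probs(X[i], X, y, classes, k, beta, skip=i)[classes.index(y[i])])
        for i in range(len(X))
    )

def predictive(x, X, y, classes, ks, betas):
    """Average class_probs over a (k, beta) grid.

    ASSUMPTION: uniform prior on the grid, so each weight is proportional to the
    pseudo-likelihood. A grid stands in for whatever inference scheme the paper uses.
    """
    grid = [(k, b) for k in ks for b in betas]
    log_w = [log_pseudo_likelihood(X, y, classes, k, b) for k, b in grid]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    z = sum(w)
    out = [0.0] * len(classes)
    for (k, b), wi in zip(grid, w):
        p = class_probs(x, X, y, classes, k, b)
        out = [o + (wi / z) * pj for o, pj in zip(out, p)]
    return out

# Hypothetical toy data: two well-separated clusters with labels 0 and 1.
X_TOY = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
Y_TOY = [0, 0, 0, 1, 1, 1]
CLASSES = [0, 1]
KS = [1, 3, 5]            # each k must be at most len(X_TOY) - 1 for leave-one-out
BETAS = [0.5, 1.0, 2.0, 4.0]

if __name__ == "__main__":
    query = (0.1, 0.1)
    print("relative frequency, k=3:", knn_frequency(query, X_TOY, Y_TOY, CLASSES, 3))
    print("grid-averaged predictive:", predictive(query, X_TOY, Y_TOY, CLASSES, KS, BETAS))
```

In this sketch the plain k-NN output can only take values that are multiples of 1/k, whereas the averaged predictive lies strictly inside (0, 1) and reflects uncertainty about both `k` and `beta`. The grid keeps the example short; it is a convenience, not a claim about how the paper carries out inference.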

Suggested Citation

  • C. C. Holmes & N. M. Adams, 2002. "A probabilistic nearest neighbour method for statistical pattern recognition," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(2), pages 295-306, May.
  • Handle: RePEc:bla:jorssb:v:64:y:2002:i:2:p:295-306
    DOI: 10.1111/1467-9868.00338

    Download full text from publisher

    File URL: https://doi.org/10.1111/1467-9868.00338
    Download Restriction: no


    References listed on IDEAS

    1. D. J. Hand & W. E. Henley, 1997. "Statistical Classification Methods in Consumer Credit Scoring: a Review," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 160(3), pages 523-541, September.
    2. Julian Besag & Jeremy York & Annie Mollié, 1991. "Bayesian image restoration, with two applications in spatial statistics," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 43(1), pages 1-20, March.

    Citations


    Cited by:

    1. Ghosh, Anil K., 2006. "On optimum choice of k in nearest neighbor classification," Computational Statistics & Data Analysis, Elsevier, vol. 50(11), pages 3113-3123, July.
    2. Ha-Thu Nguyen, 2015. "How is credit scoring used to predict default in China?," EconomiX Working Papers 2015-1, University of Paris Nanterre, EconomiX.
    3. Giuseppe Nuti, 2019. "An Efficient Algorithm for Bayesian Nearest Neighbours," Methodology and Computing in Applied Probability, Springer, vol. 21(4), pages 1251-1258, December.
    4. Ha Thu Nguyen, 2015. "How is credit scoring used to predict default in China?," Working Papers hal-04133309, HAL.
    5. Subhajit Dutta & Anil K. Ghosh, 2017. "Discussion," International Statistical Review, International Statistical Institute, vol. 85(1), pages 40-43, April.
    6. Pya Arnqvist, Natalya & Ngendangenzwa, Blaise & Lindahl, Eric & Nilsson, Leif & Yu, Jun, 2021. "Efficient surface finish defect detection using reduced rank spline smoothers and probabilistic classifiers," Econometrics and Statistics, Elsevier, vol. 18(C), pages 89-105.
    7. Francesco CHELLI & Chiara GIGLIARANO & Elvio MATTIOLI, 2009. "The Impact of Inflation on Heterogeneous Groups of Households: an Application to Italy," Working Papers 329, Universita' Politecnica delle Marche (I), Dipartimento di Scienze Economiche e Sociali.
    8. Yinglei Lai & Baolin Wu & Hongyu Zhao, 2011. "A permutation test approach to the choice of size k for the nearest neighbors classifier," Journal of Applied Statistics, Taylor & Francis Journals, vol. 38(10), pages 2289-2302.
    9. Manahov, Viktor & Hudson, Robert & Gebka, Bartosz, 2014. "Does high frequency trading affect technical analysis and market efficiency? And if so, how?," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 28(C), pages 131-157.
    10. Jing Quan & Xuelian Sun, 2024. "Credit risk assessment using the factorization machine model with feature interactions," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-10, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    2. Jonathan K. Budd & Peter G. Taylor, 2015. "Calculating optimal limits for transacting credit card customers," Papers 1506.05376, arXiv.org, revised Aug 2015.
    3. Dinh, K. & Kleimeier, S., 2006. "Credit scoring for Vietnam's retail banking market : implementation and implications for transactional versus relationship lending," Research Memorandum 012, Maastricht University, Maastricht Research School of Economics of Technology and Organization (METEOR).
    4. Katie Wilson & Jon Wakefield, 2022. "A probabilistic model for analyzing summary birth history data," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 47(11), pages 291-344.
    5. Ulf Römer & Oliver Musshoff, 2017. "Can agricultural credit scoring for microfinance institutions be implemented and improved by weather data?," Agricultural Finance Review, Emerald Group Publishing Limited, vol. 78(1), pages 83-97, December.
    6. Eibich, Peter & Ziebarth, Nicolas, 2014. "Examining the Structure of Spatial Health Effects in Germany Using Hierarchical Bayes Models," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 49, pages 305-320.
    7. Shreosi Sanyal & Thierry Rochereau & Cara Nichole Maesano & Laure Com-Ruelle & Isabella Annesi-Maesano, 2018. "Long-Term Effect of Outdoor Air Pollution on Mortality and Morbidity: A 12-Year Follow-Up Study for Metropolitan France," IJERPH, MDPI, vol. 15(11), pages 1-8, November.
    8. Mayer Alvo & Jingrui Mu, 2023. "COVID-19 Data Analysis Using Bayesian Models and Nonparametric Geostatistical Models," Mathematics, MDPI, vol. 11(6), pages 1-13, March.
    9. Gil, Guilherme Dôco Roberti & Costa, Marcelo Azevedo & Lopes, Ana Lúcia Miranda & Mayrink, Vinícius Diniz, 2017. "Spatial statistical methods applied to the 2015 Brazilian energy distribution benchmarking model: Accounting for unobserved determinants of inefficiencies," Energy Economics, Elsevier, vol. 64(C), pages 373-383.
    10. Kraft, Holger & Kroisandt, Gerald & Müller, Marlene, 2002. "Assessing the discriminatory power of credit scores," SFB 373 Discussion Papers 2002,67, Humboldt University of Berlin, Interdisciplinary Research Project 373: Quantification and Simulation of Economic Processes.
    11. Chen Ying & Härdle Wolfgang K. & He Qiang & Majer Piotr, 2018. "Risk related brain regions detection and individual risk classification with 3D image FPCA," Statistics & Risk Modeling, De Gruyter, vol. 35(3-4), pages 89-110, July.
    12. Sun, Weixin & Zhang, Xuantao & Li, Minghao & Wang, Yong, 2023. "Interpretable high-stakes decision support system for credit default forecasting," Technological Forecasting and Social Change, Elsevier, vol. 196(C).
    13. Vanessa Santos-Sánchez & Juan Antonio Córdoba-Doña & Javier García-Pérez & Antonio Escolar-Pujolar & Lucia Pozzi & Rebeca Ramis, 2020. "Cancer Mortality and Deprivation in the Proximity of Polluting Industrial Facilities in an Industrial Region of Spain," IJERPH, MDPI, vol. 17(6), pages 1-15, March.
    14. Berti, Patrizia & Dreassi, Emanuela & Rigo, Pietro, 2014. "Compatibility results for conditional distributions," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 190-203.
    15. Louise Choo & Stephen G. Walker, 2008. "A new approach to investigating spatial variations of disease," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 171(2), pages 395-405, April.
    16. Thomas Wainwright, 2011. "Elite Knowledges: Framing Risk and the Geographies of Credit," Environment and Planning A, vol. 43(3), pages 650-665, March.
    17. Young‐Geun Choi & Lawrence P. Hanrahan & Derek Norton & Ying‐Qi Zhao, 2022. "Simultaneous spatial smoothing and outlier detection using penalized regression, with application to childhood obesity surveillance from electronic health records," Biometrics, The International Biometric Society, vol. 78(1), pages 324-336, March.
    18. Roy Cerqueti & Francesca Pampurini & Annagiulia Pezzola & Anna Grazia Quaranta, 2022. "Dangerous liasons and hot customers for banks," Review of Quantitative Finance and Accounting, Springer, vol. 59(1), pages 65-89, July.
    19. Zhengyi Zhou & David S. Matteson & Dawn B. Woodard & Shane G. Henderson & Athanasios C. Micheas, 2015. "A Spatio-Temporal Point Process Model for Ambulance Demand," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 6-15, March.
    20. R T Stewart, 2011. "A profit-based scoring system in consumer credit: making acquisition decisions for credit cards," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(9), pages 1719-1725, September.
