IDEAS home Printed from https://ideas.repec.org/a/spr/advdac/v12y2018i2d10.1007_s11634-017-0286-x.html
   My bibliography  Save this article

Local generalized quadratic distance metrics: application to the k-nearest neighbors classifier

Author

Listed:
  • Karim Abou-Moustafa

    (University of Alberta)

  • Frank P. Ferrie

    (McGill University)

Abstract

Finding the set of nearest neighbors for a query point of interest appears in a variety of algorithms for machine learning and pattern recognition. Examples include k nearest neighbor classification, information retrieval, case-based reasoning, manifold learning, and nonlinear dimensionality reduction. In this work, we propose a new approach for determining a distance metric from the data for finding such neighboring points. For a query point of interest, our approach learns a generalized quadratic distance (GQD) metric based on the statistical properties in a “small” neighborhood for the point of interest. The locally learned GQD metric captures information such as the density, curvature, and the intrinsic dimensionality for the points falling in this particular neighborhood. Unfortunately, learning the GQD parameters under such a local learning mechanism is a challenging problem with a high computational overhead. To address these challenges, we estimate the GQD parameters using the minimum volume covering ellipsoid (MVCE) for a set of points. The advantage of the MVCE is two-fold. First, the MVCE together with the local learning approach approximate the functionality of a well known robust estimator for covariance matrices. Second, computing the MVCE is a convex optimization problem which, in addition to having a unique global solution, can be efficiently solved using a first order optimization algorithm. We validate our metric learning approach on a large variety of datasets and show that the proposed metric has promising results when compared with five algorithms from the literature for supervised metric learning.

Suggested Citation

  • Karim Abou-Moustafa & Frank P. Ferrie, 2018. "Local generalized quadratic distance metrics: application to the k-nearest neighbors classifier," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 341-363, June.
  • Handle: RePEc:spr:advdac:v:12:y:2018:i:2:d:10.1007_s11634-017-0286-x
    DOI: 10.1007/s11634-017-0286-x
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-017-0286-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-017-0286-x?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. D. M. Titterington, 1978. "Estimation of Correlation Coefficients by Ellipsoidal Trimming," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 27(3), pages 227-234, November.
    2. Roger Shepard, 1962. "The analysis of proximities: Multidimensional scaling with an unknown distance function. II," Psychometrika, Springer;The Psychometric Society, vol. 27(3), pages 219-246, September.
    3. J. Kruskal, 1964. "Nonmetric multidimensional scaling: A numerical method," Psychometrika, Springer;The Psychometric Society, vol. 29(2), pages 115-129, June.
    4. Roger Shepard, 1962. "The analysis of proximities: Multidimensional scaling with an unknown distance function. I," Psychometrika, Springer;The Psychometric Society, vol. 27(2), pages 125-140, June.
    5. Peng Sun & Robert M. Freund, 2004. "Computation of Minimum-Volume Covering Ellipsoids," Operations Research, INFORMS, vol. 52(5), pages 690-706, October.
    6. P. Kumar & E. A. Yildirim, 2005. "Minimum-Volume Enclosing Ellipsoids and Core Sets," Journal of Optimization Theory and Applications, Springer, vol. 126(1), pages 1-21, July.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jerzy Grobelny & Rafal Michalski & Gerhard-Wilhelm Weber, 2021. "Modeling human thinking about similarities by neuromatrices in the perspective of fuzzy logic," WORking papers in Management Science (WORMS) WORMS/21/09, Department of Operations Research and Business Intelligence, Wroclaw University of Science and Technology.
    2. Phipps Arabie, 1991. "Was euclid an unnecessarily sophisticated psychologist?," Psychometrika, Springer;The Psychometric Society, vol. 56(4), pages 567-587, December.
    3. Verniest, Fabien & Greulich, Sabine, 2019. "Methods for assessing the effects of environmental parameters on biological communities in long-term ecological studies - A literature review," Ecological Modelling, Elsevier, vol. 414(C).
    4. J. Carroll, 1985. "Review," Psychometrika, Springer;The Psychometric Society, vol. 50(1), pages 133-140, March.
    5. Albert Maydeu-Olivares & Ishwar Sethi & Phipps Arabie & A. Tanguiane & K. Klauer & Pierre Hansen & Klaas Sijtsma & M. Windham, 1995. "Book reviews," Journal of Classification, Springer;The Classification Society, vol. 12(1), pages 137-158, March.
    6. Aurea Grané & Rosario Romera, 2018. "On Visualizing Mixed-Type Data," Sociological Methods & Research, , vol. 47(2), pages 207-239, March.
    7. Lawrence Hubert, 1972. "Some extensions of Johnson's hierarchical clustering algorithms," Psychometrika, Springer;The Psychometric Society, vol. 37(3), pages 261-274, September.
    8. Groenen, P.J.F. & Borg, I., 2013. "The Past, Present, and Future of Multidimensional Scaling," Econometric Institute Research Papers EI 2013-07, Erasmus University Rotterdam, Erasmus School of Economics (ESE), Econometric Institute.
    9. Warren Torgerson, 1986. "Scaling and Psychometrika: Spatial and alternative representations of similarity data," Psychometrika, Springer;The Psychometric Society, vol. 51(1), pages 57-63, March.
    10. David Klahr, 1969. "A monte carlo investigation of the statistical significance of Kruskal's nonmetric scaling procedure," Psychometrika, Springer;The Psychometric Society, vol. 34(3), pages 319-330, September.
    11. Jacqueline Meulman & Peter Verboon, 1993. "Points of view analysis revisited: Fitting multidimensional structures to optimal distance components with cluster restrictions on the variables," Psychometrika, Springer;The Psychometric Society, vol. 58(1), pages 7-35, March.
    12. Patrick Groenen & Willem Heiser, 1996. "The tunneling method for global optimization in multidimensional scaling," Psychometrika, Springer;The Psychometric Society, vol. 61(3), pages 529-550, September.
    13. Wayne DeSarbo & Ajay Manrai & Raymond Burke, 1990. "A nonspatial methodology for the analysis of two-way proximity data incorporating the distance-density hypothesis," Psychometrika, Springer;The Psychometric Society, vol. 55(2), pages 229-253, June.
    14. la Grange, Anthony & le Roux, Niël & Gardner-Lubbe, Sugnet, 2009. "BiplotGUI: Interactive Biplots in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 30(i12).
    15. Mark Davison, 1979. "Testing a unidimensional, qualitative unfolding model for attitudinal or developmental data," Psychometrika, Springer;The Psychometric Society, vol. 44(2), pages 179-194, June.
    16. Giovanni De Luca & Paola Zuccolotto, 2011. "A tail dependence-based dissimilarity measure for financial time series clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 5(4), pages 323-340, December.
    17. Hossein Safizadeh, M. & McKenna, David R., 1996. "Application of multidimensional scaling techniques to facilities layout," European Journal of Operational Research, Elsevier, vol. 92(1), pages 54-62, July.
    18. Rosa, Samuel & Harman, Radoslav, 2022. "Computing minimum-volume enclosing ellipsoids for large datasets," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).
    19. Yoshio Takane, 1981. "Multidimensional successive categories scaling: A maximum likelihood method," Psychometrika, Springer;The Psychometric Society, vol. 46(1), pages 9-28, March.
    20. Antonio Fernández-Cano & Ángel Bueno, 2002. "Multivariate evaluation of Spanish educational research journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 55(1), pages 87-102, September.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:advdac:v:12:y:2018:i:2:d:10.1007_s11634-017-0286-x. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.