
The k-NN algorithm for compositional data: a revised approach with and without zero values present

Author

  • Tsagris, Michail

Abstract

In compositional data, an observation is a vector with non-negative components that sum to a constant, typically 1. Data of this type arise in many areas, such as geology, archaeology, biology, economics and political science. The goal of this paper is to extend the taxicab metric and a newly suggested metric for compositional data by employing a power transformation. Both metrics can be used in the k-nearest neighbours algorithm regardless of whether zeros are present. Examples with real data are presented.
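The abstract describes the method only in words. A minimal Python sketch follows, under loudly stated assumptions: the power transformation is taken to be z_i = x_i^α / Σ_j x_j^α with α > 0, the "newly suggested metric" is assumed to be the ES-OV metric (the square root of twice the Jensen-Shannon divergence, in the spirit of Österreicher & Vajda, 2003, listed among the references), and every function name is hypothetical. The paper's exact definitions, for example any scaling constants, may differ.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Composition = Sequence[float]


def power_transform(x: Composition, alpha: float) -> list[float]:
    """Assumed power transformation: z_i = x_i**alpha / sum_j x_j**alpha.

    Requires alpha > 0, so zero components map to zero (no zero replacement).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive in this sketch")
    powered = [xi ** alpha for xi in x]
    total = sum(powered)
    return [p / total for p in powered]


def taxicab(x: Composition, y: Composition, alpha: float) -> float:
    """Taxicab (L1) distance between the power-transformed compositions."""
    zx, zy = power_transform(x, alpha), power_transform(y, alpha)
    return sum(abs(a - b) for a, b in zip(zx, zy))


def es_ov(x: Composition, y: Composition, alpha: float) -> float:
    """Assumed ES-OV metric: square root of twice the Jensen-Shannon divergence
    between the power-transformed compositions. Terms whose component is zero
    are skipped, i.e. 0 * log(0) is taken as 0."""
    zx, zy = power_transform(x, alpha), power_transform(y, alpha)
    total = 0.0
    for a, b in zip(zx, zy):
        m = a + b
        if a > 0:
            total += a * math.log(2 * a / m)
        if b > 0:
            total += b * math.log(2 * b / m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


Metric = Callable[[Composition, Composition, float], float]


def knn_predict(train_x: Sequence[Composition], train_y: Sequence[str],
                new_x: Composition, k: int, alpha: float,
                metric: Metric = es_ov) -> str:
    """Classify new_x by a simple majority vote among its k nearest training points."""
    ranked = sorted(
        ((metric(new_x, xi, alpha), yi) for xi, yi in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy, made-up 3-part compositions (not data from the paper); one contains a zero.
    train_x = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.0, 0.3, 0.7]]
    train_y = ["A", "A", "B", "B"]
    print(knn_predict(train_x, train_y, [0.65, 0.25, 0.1], k=3, alpha=0.5))
    print(knn_predict(train_x, train_y, [0.0, 0.2, 0.8], k=3, alpha=0.5, metric=taxicab))
```

Because α > 0 keeps zero components at zero and the 0·log 0 terms are skipped, both distances remain defined for compositions that contain zeros, which is the setting the abstract emphasises; no zero replacement is performed in this sketch.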

Suggested Citation

  • Tsagris, Michail, 2014. "The k-NN algorithm for compositional data: a revised approach with and without zero values present," MPRA Paper 65866, University Library of Munich, Germany.
  • Handle: RePEc:pra:mprapa:65866

    Download full text from publisher

    File URL: https://mpra.ub.uni-muenchen.de/65866/1/MPRA_paper_65866.pdf
    File Function: original version
    Download Restriction: no

    References listed on IDEAS

    1. Ferdinand Österreicher & Igor Vajda, 2003. "A new class of metric divergences on probability spaces and its applicability in statistics," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 55(3), pages 639-653, September.
    2. Martín-Fernández, J.A. & Hron, K. & Templ, M. & Filzmoser, P. & Palarea-Albaladejo, J., 2012. "Model-based replacement of rounded zeros in compositional data: Classical and robust approaches," Computational Statistics & Data Analysis, Elsevier, vol. 56(9), pages 2688-2704.
    3. J. L. Scealy & A. H. Welsh, 2011. "Regression for compositional data by using distributions defined on the hypersphere," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 73(3), pages 351-375, June.
    4. Paulo Rodrigues & Ana Lima, 2009. "Analysis of an European union election using principal component analysis," Statistical Papers, Springer, vol. 50(4), pages 895-904, August.
    5. Jane Fry & Tim Fry & Keith McLaren, 2000. "Compositional data analysis and zeros in micro data," Applied Economics, Taylor & Francis Journals, vol. 32(8), pages 953-959.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tsagris, Michail & Preston, Simon & Wood, Andrew T.A., 2016. "Improved classification for compositional data using the α-transformation," MPRA Paper 67657, University Library of Munich, Germany.
    2. Michail Tsagris & Simon Preston & Andrew T. A. Wood, 2016. "Improved Classification for Compositional Data Using the α-transformation," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 243-261, July.
    3. Juan José Egozcue & Vera Pawlowsky-Glahn, 2019. "Compositional data: the sample space and its structure," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(3), pages 599-638, September.
    4. M. Templ & K. Hron & P. Filzmoser, 2017. "Exploratory tools for outlier detection in compositional data with structural zeros," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(4), pages 734-752, March.
    5. Tsagris, Michail & Preston, Simon & Wood, Andrew T.A., 2016. "Nonparametric hypothesis testing for equality of means on the simplex," MPRA Paper 72771, University Library of Munich, Germany.
    6. Tsagris, Michail, 2015. "A novel, divergence based, regression for compositional data," MPRA Paper 72769, University Library of Munich, Germany.
    7. Tsagris, Michail, 2015. "Regression analysis with compositional data containing zero values," MPRA Paper 67868, University Library of Munich, Germany.
    8. Theophilus K. Agbenyezi & Gordon Foli & Simon K. Y. Gawu, 2020. "Geochemical Characteristics Of Gold-Bearing Granitoids At Ayanfuri In The Kumasi Basin, Southwestern Ghana: Implications For The Orogenic Related Gold Systems," Earth Sciences Malaysia (ESMY), Zibeline International Publishing, vol. 4(2), pages 127-134, June.
    9. Germà Coenders & Núria Arimany Serrat, 2023. "Accounting statement analysis at industry level. A gentle introduction to the compositional approach," Papers 2305.16842, arXiv.org, revised Feb 2024.
    10. Napoleón Vargas Jurado & Kent M. Eskridge & Stephen D. Kachman & Ronald M. Lewis, 2018. "Using a Bayesian Hierarchical Linear Mixing Model to Estimate Botanical Mixtures," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 23(2), pages 190-207, June.
    11. Jack Gregory & David I. Stern, 2012. "Fuel Choices in Rural Maharashtra," CCEP Working Papers 1207, Centre for Climate & Energy Policy, Crawford School of Public Policy, The Australian National University.
    12. Morais, Joanna & Simioni, Michel & Thomas-Agnan, Christine, 2016. "A tour of regression models for explaining shares," TSE Working Papers 16-742, Toulouse School of Economics (TSE).
    13. Sagron, Ruth & Pugatch, Rami, 2021. "Universal distribution of batch completion times and time-cost tradeoff in a production line with arbitrary buffer size," European Journal of Operational Research, Elsevier, vol. 293(3), pages 980-989.
    14. Margaret A. Abernethy & Jan Bouwens & Laurence Van Lent, 2013. "The Role of Performance Measures in the Intertemporal Decisions of Business Unit Managers," Contemporary Accounting Research, John Wiley & Sons, vol. 30(3), pages 925-961, September.
    15. Takahiro Yoshida & Morito Tsutsumi, 2018. "On the effects of spatial relationships in spatial compositional multivariate models," Letters in Spatial and Resource Sciences, Springer, vol. 11(1), pages 57-70, March.
    16. Maria Anna Di Palma & Michele Gallo, 2019. "External Information Model in a Compositional Perspective: Evaluation of Campania Adolescents’ Preferences in the Allocation of Leisure-Time," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 146(1), pages 117-133, November.
    17. Kristoffer H. Hellton, 2023. "Penalized angular regression for personalized predictions," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 50(1), pages 184-212, March.
    18. Panagiotis Papastamoulis & Magnus Rattray, 2017. "Bayesian estimation of differential transcript usage from RNA-seq data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 367-386, December.
    19. Yu, Xisheng, 2021. "A unified entropic pricing framework of option: Using Cressie-Read family of divergences," The North American Journal of Economics and Finance, Elsevier, vol. 58(C).
    20. Wesley Wehde & Matthew C Nowlin, 2021. "Public Attribution of Responsibility for Disaster Preparedness across Three Levels of Government and the Public: Lessons from a Survey of Residents of the U.S. South Atlantic and Gulf Coast," Publius: The Journal of Federalism, CSF Associates Inc., vol. 51(2), pages 212-237.

    More about this item

    Keywords

    compositional data; entropy; k-NN algorithm; metric; supervised classification;

    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
