IDEAS home Printed from https://ideas.repec.org/a/eee/jmvana/v132y2014icp9-24.html

The analysis of distance of grouped data with categorical variables: Categorical canonical variate analysis

Author

Listed:
  • Le Roux, Niël J.
  • Gardner-Lubbe, Sugnet
  • Gower, John C.

Abstract

We use generalised biplots to develop the important special case in which (i) all variables are categorical and (ii) the samples fall into K recognised groups. We term this Categorical Canonical Variate Analysis (CatCVA) because it shares characteristics with Rao’s Canonical Variate Analysis (CVA), especially its visual aspects. It allows group centroids to be displayed in increasing numbers of dimensions, together with information on within-group sample variation. Variables are represented by category-level-points (CLPs), which are the counterpart of numerically calibrated biplot axes for quantitative variables. Mechanisms are provided for relating the samples to their category levels, for constructing convex regions that help predict categories, and for adding new samples. Inter-sample distance may be measured by any Euclidean-embeddable distance. Computation is minimised by working in the (K−1)-dimensional space containing the group centroids.
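
The last two sentences of the abstract can be illustrated with a rough sketch. This is not the authors' CatCVA implementation: it assumes the extended matching coefficient (the fraction of variables on which two samples disagree) as one possible Euclidean-embeddable squared distance, and the function names `emc_sq_dist`, `centroid_sq_dist` and `classical_scaling` are hypothetical. It only shows how squared distances between group centroids can be recovered from inter-sample distances and then embedded in at most K−1 dimensions; category-level-points, prediction regions and the placement of new samples are not covered.

```python
import numpy as np

def emc_sq_dist(X):
    """Squared distance between samples: fraction of variables that mismatch.

    Assumed choice of distance (extended matching coefficient); any other
    Euclidean-embeddable distance could be substituted here.
    """
    X = np.asarray(X)
    return (X[:, None, :] != X[None, :, :]).mean(axis=2)

def centroid_sq_dist(D2, groups):
    """Squared distances between group centroids, from sample distances only.

    Uses the identity
      d^2(c_a, c_b) = mean_ab(d^2) - 0.5 * mean_aa(d^2) - 0.5 * mean_bb(d^2),
    which holds whenever the sample distances are Euclidean embeddable.
    """
    groups = np.asarray(groups)
    labels = np.unique(groups)
    G = (groups[:, None] == labels[None, :]).astype(float)  # n x K indicator
    n_k = G.sum(axis=0)                                      # group sizes
    mean_d2 = (G.T @ D2 @ G) / np.outer(n_k, n_k)            # K x K block means
    within = np.diag(mean_d2)
    return mean_d2 - 0.5 * (within[:, None] + within[None, :]), labels

def classical_scaling(D2):
    """Classical scaling of K points into at most K-1 dimensions."""
    K = D2.shape[0]
    J = np.eye(K) - np.ones((K, K)) / K        # centring matrix
    B = -0.5 * J @ D2 @ J                      # doubly centred inner products
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:K - 1]     # largest K-1 eigenvalues
    vals = np.clip(vals[order], 0.0, None)     # guard against rounding error
    return vecs[:, order] * np.sqrt(vals)

# Toy data: 6 samples, 2 categorical variables, K = 3 groups (made up).
X = np.array([["a", "x"], ["a", "y"], ["b", "x"],
              ["b", "y"], ["c", "x"], ["c", "x"]])
groups = np.array([0, 0, 1, 1, 2, 2])

C2, labels = centroid_sq_dist(emc_sq_dist(X), groups)
Y = classical_scaling(C2)   # one row of coordinates per group centroid
print(Y.shape)
```

Working with the K×K matrix of centroid distances rather than the full n×n sample matrix is what keeps the eigen-decomposition small in this sketch.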

Suggested Citation

  • Le Roux, Niël J. & Gardner-Lubbe, Sugnet & Gower, John C., 2014. "The analysis of distance of grouped data with categorical variables: Categorical canonical variate analysis," Journal of Multivariate Analysis, Elsevier, vol. 132(C), pages 9-24.
  • Handle: RePEc:eee:jmvana:v:132:y:2014:i:c:p:9-24
    DOI: 10.1016/j.jmva.2014.07.014

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X14001717
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Gardner, Sugnet & Gower, John C. & le Roux, N.J., 2006. "A synthesis of canonical variate analysis, generalised canonical correlation and Procrustes analysis," Computational Statistics & Data Analysis, Elsevier, vol. 50(1), pages 107-134, January.
    2. J. Gower & P. Legendre, 1986. "Metric and Euclidean properties of dissimilarity coefficients," Journal of Classification, Springer;The Classification Society, vol. 3(1), pages 5-48, March.
    3. John Gower & Niel Roux & Sugnet Gardner-Lubbe, 2014. "The Canonical Analysis of Distance," Journal of Classification, Springer;The Classification Society, vol. 31(1), pages 107-128, April.

    Citations

    Cited by:

    1. John C. Gower & Niël J. Le Roux & Sugnet Gardner-Lubbe, 2022. "Properties of individual differences scaling and its interpretation," Statistical Papers, Springer, vol. 63(4), pages 1221-1245, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Guohuan Su & Adam Mertel & Sébastien Brosse & Justin M. Calabrese, 2023. "Species invasiveness and community invasibility of North American freshwater fish fauna revealed via trait-based analysis," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    2. la Grange, Anthony & le Roux, Niël & Gardner-Lubbe, Sugnet, 2009. "BiplotGUI: Interactive Biplots in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 30(i12).
    3. Michael Brusco & J Dennis Cradit & Douglas Steinley, 2021. "A comparison of 71 binary similarity coefficients: The effect of base rates," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-19, April.
    4. Balepur, Prashant Narayan, 1998. "Impacts of Computer-Mediated Communication on Travel and Communication Patterns: The Davis Community Network Study," Institute of Transportation Studies, Research Reports, Working Papers, Proceedings qt6cb1f85c, Institute of Transportation Studies, UC Berkeley.
    5. Niemann, Helen & Moehrle, Martin G. & Frischkorn, Jonas, 2017. "Use of a new patent text-mining and visualization method for identifying patenting patterns over time: Concept, method and test application," Technological Forecasting and Social Change, Elsevier, vol. 115(C), pages 210-220.
    6. Michael J. Greenacre & Patrick J. F. Groenen, 2016. "Weighted Euclidean Biplots," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 442-459, October.
    7. Douglas L. Steinley & M. J. Brusco, 2019. "Using an Iterative Reallocation Partitioning Algorithm to Verify Test Multidimensionality," Journal of Classification, Springer;The Classification Society, vol. 36(3), pages 397-413, October.
    8. Matthijs Warrens, 2008. "Bounds of Resemblance Measures for Binary (Presence/Absence) Variables," Journal of Classification, Springer;The Classification Society, vol. 25(2), pages 195-208, November.
    9. Anna Maria D’Arcangelis & Giulia Rotundo, 2016. "Complex Networks in Finance," Lecture Notes in Economics and Mathematical Systems, in: Pasquale Commendatore & Mariano Matilla-García & Luis M. Varela & Jose S. Cánovas (ed.), Complex Networks and Dynamics, pages 209-235, Springer.
    10. Carla Coltharp & Rene P Kessler & Jie Xiao, 2012. "Accurate Construction of Photoactivated Localization Microscopy (PALM) Images for Quantitative Measurements," PLOS ONE, Public Library of Science, vol. 7(12), pages 1-15, December.
    11. Letizia Mencarini & Raffaella Piccarreta & Marco Le Moglie, 2022. "Life‐course perspective on personality traits and fertility with sequence analysis," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(3), pages 1344-1369, July.
    12. Vines, S.K., 2015. "Predictive nonlinear biplots: Maps and trajectories," Journal of Multivariate Analysis, Elsevier, vol. 140(C), pages 47-59.
    13. Rizzi, Alfredo & Vichi, Maurizio, 1995. "Representation, synthesis, variability and data preprocessing of a three-way data set," Computational Statistics & Data Analysis, Elsevier, vol. 19(2), pages 203-222, February.
    14. Hennig, Christian, 2008. "Dissolution point and isolation robustness: Robustness criteria for general cluster analysis methods," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1154-1176, July.
    15. S. T. Buckland & Y. Yuan & E. Marcon, 2017. "Measuring temporal trends in biodiversity," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 101(4), pages 461-474, October.
    16. Patrick Groenen & Niël Roux & Sugnet Gardner-Lubbe, 2015. "Spline-based nonlinear biplots," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(2), pages 219-238, June.
    17. Ricotta, Carlo & Szeidl, Laszlo, 2009. "Diversity partitioning of Rao’s quadratic entropy," Theoretical Population Biology, Elsevier, vol. 76(4), pages 299-302.
    18. A. Gordon, 1990. "Constructing dissimilarity measures," Journal of Classification, Springer;The Classification Society, vol. 7(2), pages 257-269, September.
    19. Fan, Cheng & Xiao, Fu & Yan, Chengchu & Liu, Chengliang & Li, Zhengdao & Wang, Jiayuan, 2019. "A novel methodology to explain and evaluate data-driven building energy performance models based on interpretable machine learning," Applied Energy, Elsevier, vol. 235(C), pages 1551-1560.
    20. Matthijs Warrens, 2008. "On the Indeterminacy of Resemblance Measures for Binary (Presence/Absence) Data," Journal of Classification, Springer;The Classification Society, vol. 25(1), pages 125-136, June.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.