IDEAS home Printed from https://ideas.repec.org/p/bge/wpaper/555.html

A Simple Permutation Test for Clusteredness

Author

Listed:
  • Michael Greenacre

Abstract

Hierarchical clustering is a popular method for finding structure in multivariate data, resulting in a binary tree constructed on the particular objects of the study, usually sampling units. The user must decide where to cut the binary tree to determine the number of clusters to interpret, and various ad hoc rules exist for making that decision. A simple permutation test is presented that diagnoses whether non-random levels of clustering are present in the set of objects and, if so, indicates the specific level at which the tree can be cut. The test is validated against random matrices to verify the type I error probability, and a power study is performed on data sets with known clusteredness to study the type II error.
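
The abstract does not spell out the procedure, so the following is only a minimal sketch of what a permutation test on a cluster tree could look like, not the paper's method. It rests on three assumptions that do not come from the source: the null distribution is generated by shuffling each variable (column) independently, the tree is built with complete linkage on Euclidean distances, and a p-value for each level is the share of permuted trees whose merge height is at least as low as the observed one. The names `merge_heights` and `permutation_pvalues` are hypothetical.

```python
import random
from itertools import combinations


def euclid(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def merge_heights(rows):
    """Complete-linkage agglomerative clustering (assumed linkage).

    Returns the heights of the n-1 merges in the order they occur.
    """
    clusters = [[i] for i in range(len(rows))]
    heights = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest complete-linkage distance.
        for i, j in combinations(range(len(clusters)), 2):
            d = max(euclid(rows[a], rows[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return heights


def permutation_pvalues(rows, n_perm=99, seed=0):
    """Lower-tail permutation p-value for each level of the tree.

    Assumed null: each column is shuffled independently, which keeps the
    marginal distributions but breaks any joint cluster structure.
    Returns {k: p}, where k is the number of clusters left after a merge.
    """
    rng = random.Random(seed)
    n = len(rows)
    observed = merge_heights(rows)
    counts = [0] * len(observed)
    columns = [list(col) for col in zip(*rows)]
    for _ in range(n_perm):
        shuffled = [rng.sample(col, len(col)) for col in columns]
        perm_rows = list(zip(*shuffled))
        for idx, h in enumerate(merge_heights(perm_rows)):
            if h <= observed[idx]:
                counts[idx] += 1
    return {n - 1 - idx: (1 + c) / (1 + n_perm)
            for idx, c in enumerate(counts)}


# Illustrative data (made up): two well-separated groups of six points.
rng = random.Random(1)
data = ([(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(6)]
        + [(rng.gauss(10, 0.3), rng.gauss(10, 0.3)) for _ in range(6)])
pvals = permutation_pvalues(data)
print(pvals[2])  # p-value for the two-cluster level
```

Under these assumptions, a small p-value at level k suggests that the k-cluster partition forms at an unusually low height compared with structureless data, which is one way to read "the specific level at which the tree can be cut".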

Suggested Citation

  • Michael Greenacre, 2011. "A Simple Permutation Test for Clusteredness," Working Papers 555, Barcelona School of Economics.
  • Handle: RePEc:bge:wpaper:555

    Download full text from publisher

    File URL: http://www.barcelonagse.eu/sites/default/files/working_paper_pdfs/555.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Gordon, A. D., 1994. "Identifying genuine clusters in a classification," Computational Statistics & Data Analysis, Elsevier, vol. 18(5), pages 561-581, December.
    2. Michael Greenacre, 2008. "Correspondence analysis of raw data," Economics Working Papers 1112, Department of Economics and Business, Universitat Pompeu Fabra, revised Jul 2009.

    Citations



    Cited by:

    1. Christian Haedo & Michel Mouchart, 2022. "Two-mode clustering through profiles of regions and sectors," Empirical Economics, Springer, vol. 63(4), pages 1971-1996, October.
    2. Lucie Aulus-Giacosa & Sébastien Ollier & Cleo Bertelsmeier, 2024. "Non-native ants are breaking down biogeographic boundaries and homogenizing community assemblages," Nature Communications, Nature, vol. 15(1), pages 1-11, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eric Beh & Luigi D’Ambra, 2009. "Some Interpretative Tools for Non-Symmetrical Correspondence Analysis," Journal of Classification, Springer;The Classification Society, vol. 26(1), pages 55-76, April.
    2. Pilar García Gómez & Ángel López Nicolás, 2005. "Socio-economic inequalities in health in Catalonia," Hacienda Pública Española / Review of Public Economics, IEF, vol. 175(4), pages 103-121, december.
    3. David Bholat & Stephen Hans & Pedro Santos & Cheryl Schonhardt-Bailey, 2015. "Text mining for central banks," Handbooks, Centre for Central Banking Studies, Bank of England, number 33, April.
    4. Michael Greenacre, 2012. "Fuzzy coding in constrained ordinations," Economics Working Papers 1325, Department of Economics and Business, Universitat Pompeu Fabra.
    5. Rémi Bazillier & Nicolas Sirven, 2006. "Les normes fondamentales du travail contribuent-elles à réduire les inégalités ?," Revue Française d'Économie, Programme National Persée, vol. 21(2), pages 111-146.
    6. Alfonso Gambardella & Walter Garcia Fontes, 1996. "European research funding and regional technological capabilities: Network composition analysis," Economics Working Papers 174, Department of Economics and Business, Universitat Pompeu Fabra.
    7. Paul Green & Jonathan Kim & Frank Carmone, 1990. "A preliminary study of optimal variable weighting in k-means clustering," Journal of Classification, Springer;The Classification Society, vol. 7(2), pages 271-285, September.
    8. Michael J. Greenacre & Patrick J. F. Groenen, 2016. "Weighted Euclidean Biplots," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 442-459, October.
    9. Malcolm Dow & Peter Willett & Roderick McDonald & Belver Griffith & Michael Greenacre & Peter Bryant & Daniel Wartenberg & Ove Frank, 1987. "Book reviews," Journal of Classification, Springer;The Classification Society, vol. 4(2), pages 245-278, September.
    10. Vartan Choulakian, 1988. "Exploratory analysis of contingency tables by loglinear formulation and generalizations of correspondence analysis," Psychometrika, Springer;The Psychometric Society, vol. 53(2), pages 235-250, June.
    11. W. Krzanowski & Gregory Cermak & Jan Leeuw & Fionn Murtagh & Peter Bryant & Bernard Monjardet & Chikio Hayashi, 1985. "Book reviews," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 277-299, December.
    12. François Bavaud, 2011. "On the Schoenberg Transformations in Data Analysis: Theory and Illustrations," Journal of Classification, Springer;The Classification Society, vol. 28(3), pages 297-314, October.
    13. Maura Vásquez & Guillermo Ramírez & Alberto Camardiel & Tomás Aluja, 2008. "A Biplot graphical tool to model the relationships between two sets of variables," Economía, Instituto de Investigaciones Económicas y Sociales (IIES). Facultad de Ciencias Económicas y Sociales. Universidad de Los Andes. Mérida, Venezuela, vol. 33(25), pages 117-130, january-j.
    14. Jurlin, Kresimir & Malekovic, Sanja & Puljiz, Jaksa & Cziraky, Dario & Polic, Mario, 2002. "Covariance structure analysis of regional development data: an application to municipality development assessment," ERSA conference papers ersa02p469, European Regional Science Association.
    15. Robert Boik, 1996. "An efficient algorithm for joint correspondence analysis," Psychometrika, Springer;The Psychometric Society, vol. 61(2), pages 255-269, June.
    16. Jos Berge, 1995. "Review," Psychometrika, Springer;The Psychometric Society, vol. 60(2), pages 313-315, June.
    17. Evert Meijers, 2005. "High-level consumer services in polycentric urban regions - hospital care and higher education between duplication and complementarity," ERSA conference papers ersa05p208, European Regional Science Association.
    18. Laurent Lesnard & Thibaut Saint Pol, 2009. "Patterns of Workweek Schedules in France," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 93(1), pages 171-176, August.
    19. Warrens, Matthijs J. & Heiser, Willem J., 2009. "Diagnostics for regression dependence in tables re-ordered by the dominant correspondence analysis solution," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3139-3144, June.
    20. Nappi-Choulet, Ingrid & Décamps, Aurélien, 2011. "Is Sustainability Attractive for Corporate Real Estate Decisions ?," ESSEC Working Papers WP1106, ESSEC Research Center, ESSEC Business School.

    More about this item

    Keywords

    Hierarchical clustering; distance; permutation test

    JEL classification:

    • C19 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Other
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
