This file is part of IDEAS, which uses RePEc data



New tools for evaluating the results of cluster analyses

Author Info
Hildegard Schaeper (HIS)
Abstract

Clustering methods are designed for finding groups in data, i.e., for grouping similar objects (variables or observations) into the same cluster and dissimilar objects into separate clusters. Although the main idea is rather simple, carrying out a cluster analysis remains a challenging task. The number of different clustering methods is huge and clustering includes many choices, such as the decision between basic approaches (e.g., hierarchical and partitioning methods), the choice of a dissimilarity or similarity measure, the selection of a particular linkage method when performing a hierarchical agglomerative cluster analysis, the choice of an initial partition when carrying out a partitioning cluster analysis, and the determination of the appropriate number of clusters. Each of these decisions can affect the classification results.

Apart from two commands for determining the number of clusters (cluster stop, cluster dendrogram), Stata has no built-in tools that allow examination of clustering results. We therefore developed some simple tools that provide further evaluation criteria:

* programs assisting in determining the number of clusters (Mojena’s stopping rules for hierarchical clustering techniques, PRE coefficient, F-Max statistic and Beale’s F values for a partitioning cluster analysis),
* a program for testing the stability of classifications produced by different cluster analyses (Rand index), and
* a program that computes ETA2 to assess how well the clustering variables separate the clusters.

The presentation will compare these programs with other cluster-analysis tools (agglomeration schedule, scree diagram).
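To make two of the criteria named in the abstract concrete, here is a minimal Python sketch of the Rand index and of eta squared. It is not the author's Stata code. It assumes that "ETA2" means the usual eta-squared statistic (between-cluster sum of squares divided by total sum of squares) and that the Rand index is the unadjusted version (the share of object pairs on which two classifications agree). The function names `rand_index` and `eta_squared` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs treated the same way by two classifications.

    A pair counts as an agreement if both classifications put the two
    objects in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifications must cover the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def eta_squared(values, labels):
    """Between-cluster sum of squares divided by total sum of squares."""
    if len(values) != len(labels) or not values:
        raise ValueError("need one cluster label per observed value")
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("variable is constant; eta squared is undefined")
    # Collect the values of the clustering variable by cluster.
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    ss_between = sum(
        len(members) * (sum(members) / len(members) - grand_mean) ** 2
        for members in groups.values()
    )
    return ss_between / ss_total


if __name__ == "__main__":
    # Hypothetical toy data: two cluster solutions for four objects.
    solution_1 = [0, 0, 1, 1]
    solution_2 = [1, 1, 0, 0]  # same partition, different label names
    print(rand_index(solution_1, solution_2))  # prints 1.0

    # One clustering variable that separates the two clusters perfectly.
    print(eta_squared([1.0, 1.0, 5.0, 5.0], solution_1))  # prints 1.0
```

A Rand index near 1 suggests that two cluster analyses yield nearly the same classification regardless of how the clusters are labelled. An eta squared near 1 suggests that a clustering variable differs strongly between clusters and little within them.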

Download Info

File URL: http://fmwww.bc.edu/repec/dsug2006/schaeper_pres_short.ppt
Download Restriction: no

Publisher Info
Paper provided by Stata Users Group in its series German Stata Users' Group Meetings 2006 with number 08.

Date of creation: 24 May 2006
Handle: RePEc:boc:dsug06:08

Contact details of provider:
Postal: Administration Building, 140 Commonwealth Avenue, Chestnut Hill MA 02467
Phone: 617-552-3670
Fax: 617-552-2308
Web page: http://www.stata.com/meeting/4german

For technical questions regarding this item, or to correct its listing, contact: (Christopher F Baum).


This page was last updated on 2009-10-23.

