Authors
- Baoying Wang
- Imad Rahal
- Aijuan Dong
Abstract
There have been many attempts at clustering categorical data such as market basket datasets. However, most categorical clustering approaches are partitional and therefore require at least one input parameter (e.g., the minimum intra-cluster similarity or the desired number of clusters). In this paper, we propose a parallelised hierarchical clustering approach for categorical data (PH-clustering) using vertical data structures. To minimise the impact of low-support items, we devise a weighted confidence (WC) affinity function to compute the similarity between clusters. Based on our analysis of the major clustering steps, we adopt a partially local, partially global approach to reduce computation time while keeping network communication to a minimum. Load-balancing issues are addressed, especially during the data partitioning phase. Our experimental results on standardised market basket data show that the proposed weighted confidence affinity measure is more accurate than other contemporary affinity measures in the literature and that our parallel clustering approach provides orders of magnitude of time improvement over sequential clustering, especially on larger datasets. Our results also indicate that the number of items/attributes in the dataset has a more drastic impact on performance than the number of transactions/tuples.
Suggested Citation
Baoying Wang & Imad Rahal & Aijuan Dong, 2011.
"Parallel hierarchical clustering using weighted confidence affinity,"
International Journal of Data Mining, Modelling and Management, Inderscience Enterprises Ltd, vol. 3(2), pages 110-129.
Handle:
RePEc:ids:ijdmmm:v:3:y:2011:i:2:p:110-129