
Parallel hierarchical clustering using weighted confidence affinity

Authors

  • Baoying Wang
  • Imad Rahal
  • Aijuan Dong

Abstract

There have been many attempts at clustering categorical data such as market basket datasets. However, most categorical clustering approaches are partitional, and partitional clustering requires at least one input parameter (e.g., the minimum intra-cluster similarity or the desired number of clusters). In this paper, we propose a parallelised hierarchical clustering approach for categorical data (PH-clustering) using vertical data structures. In order to minimise the impact of low-support items, we devise a weighted confidence (WC) affinity function to compute the similarity between clusters. Based on our analysis of the major clustering steps, we adopt a partially local, partially global approach to reduce computation time while keeping network communication to a minimum. Load-balancing issues are addressed, especially during the data partitioning phase. Our experimental results on standardised market basket data show that the proposed weighted confidence affinity measure is more accurate than other contemporary affinity measures in the literature, and that our parallel clustering approach provides orders-of-magnitude improvements in running time over sequential clustering, especially on larger datasets. Our results also indicate that the number of items/attributes in the dataset has a more drastic impact on performance than the number of transactions/tuples.
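The abstract does not give the formula for the weighted confidence (WC) affinity, so what follows is only a minimal sketch, under stated assumptions, of the two standard ingredients it names: a vertical (item-wise) layout of transactions and association-rule confidence computed from that layout. Plain Python integers serve as uncompressed bit vectors and stand in for whatever vertical structure the paper actually uses. The function names `build_vertical`, `support` and `confidence`, and the toy baskets, are hypothetical; nothing here reproduces the authors' WC weighting or their clustering procedure.

```python
from typing import Dict, FrozenSet, List

# Hypothetical helpers for illustration only; not the paper's PH-clustering code.

def build_vertical(transactions: List[FrozenSet[str]]) -> Dict[str, int]:
    """Vertical layout: map each item to a bit vector where bit t is set
    if transaction t contains that item."""
    vertical: Dict[str, int] = {}
    for t, basket in enumerate(transactions):
        for item in basket:
            vertical[item] = vertical.get(item, 0) | (1 << t)
    return vertical

def support(vertical: Dict[str, int], itemset: FrozenSet[str], n_transactions: int) -> int:
    """Number of transactions containing every item in `itemset`:
    AND the items' bit vectors together, then count the set bits."""
    mask = (1 << n_transactions) - 1  # start with all transactions (empty itemset matches all)
    for item in itemset:
        mask &= vertical.get(item, 0)  # unknown items have an all-zero bit vector
    return bin(mask).count("1")

def confidence(vertical: Dict[str, int], antecedent: FrozenSet[str],
               consequent: FrozenSet[str], n_transactions: int) -> float:
    """Standard association-rule confidence sup(A ∪ B) / sup(A); 0.0 if sup(A) is 0."""
    sup_a = support(vertical, antecedent, n_transactions)
    if sup_a == 0:
        return 0.0
    return support(vertical, antecedent | consequent, n_transactions) / sup_a

# Toy market basket data (hypothetical).
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
vert = build_vertical(baskets)
n = len(baskets)
print(support(vert, frozenset({"bread", "milk"}), n))                   # 2
print(confidence(vert, frozenset({"bread"}), frozenset({"milk"}), n))   # 0.6666666666666666
print(confidence(vert, frozenset({"butter"}), frozenset({"bread"}), n)) # 1.0
```

Note that the rule butter → bread reaches confidence 1.0 while resting on only two baskets. Plain confidence can be high for low-support antecedents like this, which is the kind of effect the abstract says the WC function is designed to reduce; how the paper weights it is not stated here.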

Suggested Citation

  • Baoying Wang & Imad Rahal & Aijuan Dong, 2011. "Parallel hierarchical clustering using weighted confidence affinity," International Journal of Data Mining, Modelling and Management, Inderscience Enterprises Ltd, vol. 3(2), pages 110-129.
  • Handle: RePEc:ids:ijdmmm:v:3:y:2011:i:2:p:110-129

    Download full text from publisher

    File URL: http://www.inderscience.com/link.php?id=41491
    Download Restriction: Access to full text is restricted to subscribers.

    As access to this document is restricted, you may want to search for a different version of it.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:ids:ijdmmm:v:3:y:2011:i:2:p:110-129. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sarah Parker. General contact details of the provider: http://www.inderscience.com/browse/index.php?journalID=342 .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.