IDEAS home Printed from https://ideas.repec.org/a/sae/intdis/v5y2009i1p81-81.html
   My bibliography  Save this article

Privacy-Preserving Hierarchical-k-Means Clustering on Horizontally Partitioned Data

Author

Listed:
  • Anrong Xue
  • Dongjie Jiang
  • Shiguang Ju
  • Weihe Chen
  • Handa Ma

Abstract

Privacy preserving mining of distributed data is an important direction for data mining, and privacy preserving clustering is one of the main researches. Privacy-preserving data mining techniques enable knowledge discovery without requiring disclosure of private data. The existing privacy preserving algorithms mainly concentrated on association rules and classification, only few algorithms on privacy preserving clustering, and these algorithms mainly concentrated on centralized and vertically partitioned data. So we proposed privacy preserving hierarchical k-means clustering algorithm on horizontally partitioned data, denoted as HPPHKC. The complexity on k-means clustering algorithm is only O(n), so most existing privacy preserving clustering algorithms are concentrated on k-means and based on two parties and the trusted third party, these algorithms have the drawbacks of inaccurate results because of choosing initial clustering centers randomly and applying to multi-party difficult and revealing privacy because of depending on the third party excessively. By introduction of three protocols for secure multi-party computation: distance computation, cluster center computation, and standardization and combination of the merits of hierarchical and k-means clustering, we presented a privacy-preserving hierarchical-k-means clustering algorithm on horizontally partitioned data for semi-honest parties using some secure multi-party computation protocols. The algorithm uses the security protocol mentioned above to achieve the protection of the privacy data, and uses the hierarchical clustering algorithm to obtain k cluster centers, then uses the k-means clustering algorithm to obtain the final k clusters. We introduce the clustering feature and the clustering feature tree, which are used to summarize the cluster representations. A clustering feature (CF) is a three-dimensional vector summarizing information about clusters of objects. The i th clustering feature is CF i = (cn i ,cc i ,cp i ), where cn i is the number of th clusters, denoted as the size of th cluster, cc i is the center of the cni objects, and cp i is the pointer of the list of cni objects. The algorithm has two phases: the first phase, every object can be as a cluster, a secure computation protocol is used to compute the dissimilarity matrix and the most similar clusters will be merged. This process is repeated until we get the assigned clusters number k and get k clustering centers. In the second phase, the semi-honest third party and all data involved parties use the k-means algorithm refine the results of the first phase and get the final clustering results. Finally, we give the proof of security of the algorithm and analysis of communication costs, and we show that our scheme is secure and complete with good efficiency.

Suggested Citation

  • Anrong Xue & Dongjie Jiang & Shiguang Ju & Weihe Chen & Handa Ma, 2009. "Privacy-Preserving Hierarchical-k-Means Clustering on Horizontally Partitioned Data," International Journal of Distributed Sensor Networks, , vol. 5(1), pages 81-81, January.
  • Handle: RePEc:sae:intdis:v:5:y:2009:i:1:p:81-81
    DOI: 10.1080/15501320802571863
    as

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1080/15501320802571863
    Download Restriction: no

    File URL: https://libkey.io/10.1080/15501320802571863?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:sae:intdis:v:5:y:2009:i:1:p:81-81. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: SAGE Publications (email available below). General contact details of provider: .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.