
Self-Adaptive K-Means Based on a Covering Algorithm

Author

Listed:
  • Yiwen Zhang
  • Yuanyuan Zhou
  • Xing Guo
  • Jintao Wu
  • Qiang He
  • Xiao Liu
  • Yun Yang

Abstract

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number K in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA. CA self-organizes and recognizes the number of clusters based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a "blind" feature, that is, K is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms existing algorithms in both accuracy and efficiency under sequential and parallel conditions.
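
The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the details of the covering algorithm, so `covering_init` below is a hypothetical stand-in, not the authors' CA: it greedily covers the data with balls of a fixed `radius` (an assumed parameter introduced only for this sketch) and returns one center per cover, so the number of clusters emerges from the data. The second function, `lloyd`, is the standard Lloyd iteration started from those centers. All function and parameter names are illustrative.

```python
import math


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def covering_init(points, radius):
    """Hypothetical stand-in for the covering phase (NOT the paper's CA).

    Repeatedly takes the first uncovered point as a seed, covers every
    uncovered point within `radius` of it, and records the mean of the
    covered points as a center. The number of centers is not preselected.
    """
    uncovered = list(points)
    centers = []
    while uncovered:
        seed = uncovered[0]
        covered = [p for p in uncovered if math.dist(seed, p) <= radius]
        centers.append(mean(covered))
        uncovered = [p for p in uncovered if math.dist(seed, p) > radius]
    return centers


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def lloyd(points, centers, max_iter=100):
    """Standard Lloyd iteration: assign points, recompute means, repeat."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels


def covering_kmeans(points, radius, max_iter=100):
    """Phase 1 (covering-style initialization) followed by phase 2 (Lloyd)."""
    return lloyd(points, covering_init(points, radius), max_iter)
```

For example, calling `covering_kmeans` on two well-separated groups of points with a radius smaller than the gap between them yields two centers without the caller ever passing a value of K. A fixed radius is a simplification made here; the abstract states only that CA derives the clusters from similarities in the data.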

Suggested Citation

  • Yiwen Zhang & Yuanyuan Zhou & Xing Guo & Jintao Wu & Qiang He & Xiao Liu & Yun Yang, 2018. "Self-Adaptive K-Means Based on a Covering Algorithm," Complexity, Hindawi, vol. 2018, pages 1-16, August.
  • Handle: RePEc:hin:complx:7698274
    DOI: 10.1155/2018/7698274

    Download full text from publisher

    File URL: http://downloads.hindawi.com/journals/8503/2018/7698274.pdf
    Download Restriction: no

    File URL: http://downloads.hindawi.com/journals/8503/2018/7698274.xml
    Download Restriction: no

