IDEAS home Printed from https://ideas.repec.org/a/eee/chsofr/v153y2021ip1s0960077921008481.html

FKMAWCW: Categorical fuzzy k-modes clustering with automated attribute-weight and cluster-weight learning

Author

Listed:
  • Golzari Oskouei, Amin
  • Balafar, Mohammad Ali
  • Motamed, Cina

Abstract

Fuzzy k-modes (FKM) is a popular method for clustering categorical data. Its main weakness is high sensitivity to the initialization of the cluster centers: inappropriate initial centers lead to poor local optima. A second problem is that FKM gives all attributes equal importance during clustering, whereas in real applications some attributes are more important than others. Several FKM variants in the literature each address one of these problems. In this paper, we propose a new clustering method (FKMAWCW) that addresses both at the same time. A local attribute weighting mechanism weights the attributes of each cluster appropriately, and a cluster weighting mechanism is proposed to address the initialization sensitivity. Attribute weights and cluster weights are learned simultaneously and automatically during the clustering process. In addition, a new distance function is proposed to reduce noise sensitivity, so the proposed algorithm can tolerate noisy environments. Extensive experiments on 11 benchmark datasets and an artificially generated dataset show that the proposed algorithm performs better than the state-of-the-art algorithms. The paper presents mathematical analyses to derive the update functions and provides a convergence proof for the algorithm. The implementation source code of FKMAWCW is publicly available at https://github.com/Amin-Golzari-Oskouei/FKMAWCW.
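
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of plain fuzzy k-modes, assuming integer-coded categorical data and the simple matching dissimilarity. It is not the FKMAWCW algorithm: the attribute-weight and cluster-weight updates and the new distance function are not given in the abstract and are deliberately left out. All function names, the `alpha` fuzzifier default, and the `init_modes` parameter are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np

def matching_distance(X, modes):
    """Simple matching dissimilarity: number of attributes that differ.
    X is (n, m) integer-coded categories, modes is (k, m); returns (n, k)."""
    return (X[:, None, :] != modes[None, :, :]).sum(axis=2).astype(float)

def update_memberships(dist, alpha=1.5):
    """Fuzzy membership update for a fuzzifier alpha > 1."""
    U = np.zeros_like(dist)
    zero = dist == 0
    has_zero = zero.any(axis=1)
    # A point identical to a mode gets full membership there
    # (shared equally if several modes coincide; a choice made for this sketch).
    U[has_zero] = zero[has_zero] / zero[has_zero].sum(axis=1, keepdims=True)
    # Otherwise membership is proportional to d^(-1/(alpha-1)).
    inv = dist[~has_zero] ** (-1.0 / (alpha - 1.0))
    U[~has_zero] = inv / inv.sum(axis=1, keepdims=True)
    return U

def update_modes(X, U, alpha=1.5):
    """Per cluster and attribute, pick the category with the largest
    total of memberships raised to the power alpha."""
    W = U ** alpha
    modes = np.empty((U.shape[1], X.shape[1]), dtype=X.dtype)
    for l in range(X.shape[1]):
        cats = np.unique(X[:, l])
        onehot = (X[:, l][:, None] == cats[None, :]).astype(float)  # (n, C)
        scores = onehot.T @ W                                       # (C, k)
        modes[:, l] = cats[scores.argmax(axis=0)]
    return modes

def fuzzy_k_modes(X, k, alpha=1.5, n_iter=50, seed=0, init_modes=None):
    """Alternate membership and mode updates until the modes stop changing."""
    rng = np.random.default_rng(seed)
    if init_modes is None:
        # Random initial modes: the source of the sensitivity noted above.
        modes = X[rng.choice(len(X), size=k, replace=False)]
    else:
        modes = np.array(init_modes)
    for _ in range(n_iter):
        U = update_memberships(matching_distance(X, modes), alpha)
        new_modes = update_modes(X, U, alpha)
        if np.array_equal(new_modes, modes):
            break
        modes = new_modes
    U = update_memberships(matching_distance(X, modes), alpha)
    return U, modes
```

In this sketch every attribute contributes equally to `matching_distance` and the result depends on the initial modes, which are the two limitations the abstract says FKMAWCW is designed to address.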

Suggested Citation

  • Golzari Oskouei, Amin & Balafar, Mohammad Ali & Motamed, Cina, 2021. "FKMAWCW: Categorical fuzzy k-modes clustering with automated attribute-weight and cluster-weight learning," Chaos, Solitons & Fractals, Elsevier, vol. 153(P1).
  • Handle: RePEc:eee:chsofr:v:153:y:2021:i:p1:s0960077921008481
    DOI: 10.1016/j.chaos.2021.111494

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0960077921008481
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Sami Naouali & Semeh Ben Salem & Zied Chtourou, 2020. "Clustering Categorical Data: A Survey," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 19(01), pages 49-96, February.
    2. Asgari-Chenaghlu, Meysam & Feizi-Derakhshi, Mohammad-Reza & Farzinvash, Leili & Balafar, Mohammad-Ali & Motamed, Cina, 2021. "TopicBERT: A cognitive approach for topic detection from multimodal post stream using BERT and memory–graph," Chaos, Solitons & Fractals, Elsevier, vol. 151(C).
    3. Wayne DeSarbo & J. Carroll & Linda Clark & Paul Green, 1984. "Synthesized clustering: A method for amalgamating alternative clustering bases with differential weighting of variables," Psychometrika, Springer;The Psychometric Society, vol. 49(1), pages 57-78, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Renato Cordeiro Amorim, 2016. "A Survey on Feature Weighting Based K-Means Algorithms," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 210-242, July.
    2. Balepur, Prashant Narayan, 1998. "Impacts of Computer-Mediated Communication on Travel and Communication Patterns: The Davis Community Network Study," Institute of Transportation Studies, Research Reports, Working Papers, Proceedings qt6cb1f85c, Institute of Transportation Studies, UC Berkeley.
    3. Paul Green & Jonathan Kim & Frank Carmone, 1990. "A preliminary study of optimal variable weighting in k-means clustering," Journal of Classification, Springer;The Classification Society, vol. 7(2), pages 271-285, September.
    4. Geert Soete & Wayne DeSarbo & J. Carroll, 1985. "Optimal variable weighting for hierarchical clustering: An alternating least-squares algorithm," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 173-192, December.
    5. Renato Amorim, 2015. "Feature Relevance in Ward’s Hierarchical Clustering Using the Lp Norm," Journal of Classification, Springer;The Classification Society, vol. 32(1), pages 46-62, April.
    6. J. Fernando Vera & Rodrigo Macías, 2021. "On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling," Psychometrika, Springer;The Psychometric Society, vol. 86(2), pages 489-513, June.
    7. A. Gordon, 1990. "Constructing dissimilarity measures," Journal of Classification, Springer;The Classification Society, vol. 7(2), pages 257-269, September.
    8. Douglas Steinley & Michael Brusco, 2008. "Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures," Psychometrika, Springer;The Psychometric Society, vol. 73(1), pages 125-144, March.
    9. Yaling Deng & Shuliang Zou & Daming You, 2018. "Theoretical Guidance on Evacuation Decisions after a Big Nuclear Accident under the Assumption That Evacuation Is Desirable," Sustainability, MDPI, vol. 10(9), pages 1-14, August.
    10. Nikzad-Khasmakhi, N. & Balafar, M.A. & Reza Feizi-Derakhshi, M. & Motamed, Cina, 2021. "BERTERS: Multimodal representation learning for expert recommendation system with transformers and graph embeddings," Chaos, Solitons & Fractals, Elsevier, vol. 151(C).
    11. Maarten M. Kampert & Jacqueline J. Meulman & Jerome H. Friedman, 2017. "rCOSA: A Software Package for Clustering Objects on Subsets of Attributes," Journal of Classification, Springer;The Classification Society, vol. 34(3), pages 514-547, October.
    12. Wayne DeSarbo & Richard Oliver & Arvind Rangaswamy, 1989. "A simulated annealing methodology for clusterwise linear regression," Psychometrika, Springer;The Psychometric Society, vol. 54(4), pages 707-736, September.
    13. Dolnicar, Sara & Grün, Bettina & Leisch, Friedrich, 2016. "Increasing sample size compensates for data problems in segmentation studies," Journal of Business Research, Elsevier, vol. 69(2), pages 992-999.
    14. Gao, Jinxin & Hitchcock, David B., 2010. "James-Stein shrinkage to improve k-means cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(9), pages 2113-2127, September.
    15. Tsionas, Mike G., 2023. "Clustering and meta-envelopment in data envelopment analysis," European Journal of Operational Research, Elsevier, vol. 304(2), pages 763-778.
    16. J. Fernando Vera & Rodrigo Macías, 2017. "Variance-Based Cluster Selection Criteria in a K-Means Framework for One-Mode Dissimilarity Data," Psychometrika, Springer;The Psychometric Society, vol. 82(2), pages 275-294, June.
    17. Tsai, Chieh-Yuan & Chiu, Chuang-Cheng, 2008. "Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4658-4672, June.
    18. Shi, Lingyuan & Yang, Xin & Chang, Ximing & Wu, Jianjun & Sun, Huijun, 2023. "An improved density peaks clustering algorithm based on k nearest neighbors and turning point for evaluating the severity of railway accidents," Reliability Engineering and System Safety, Elsevier, vol. 233(C).
    19. Stef Buuren & Willem Heiser, 1989. "Clustering n objects into k groups under optimal scaling of variables," Psychometrika, Springer;The Psychometric Society, vol. 54(4), pages 699-706, September.
    20. Myung-Hoe Huh & Yong Lim, 2009. "Weighting variables in K-means clustering," Journal of Applied Statistics, Taylor & Francis Journals, vol. 36(1), pages 67-78.
