IDEAS home Printed from https://ideas.repec.org/a/vrs/stintr/v20y2019i2p33-47n1.html

The Effect Of Binary Data Transformation In Categorical Data Clustering

Authors

Listed:
  • Cibulková Jana

    (Department of Statistics and Probability, University of Economics, Prague, Czech Republic)

  • Šulc Zdeněk

    (Department of Statistics and Probability, University of Economics, Prague, Czech Republic)

  • Sirota Sergej

    (Department of Statistics and Probability, University of Economics, Prague, Czech Republic)

  • Řezanková Hana

    (Department of Statistics and Probability, University of Economics, Prague, Czech Republic)

Abstract

This paper focuses on hierarchical clustering of categorical data and compares two approaches to the task. The first, an extremely common approach, is to transform the categorical variables into sets of binary dummy variables and then apply similarity measures suited to binary data. These similarity measures are well examined and are available in both commercial and non-commercial software. However, a binary transformation may cause a loss of information in the data or slow down the computations. The second approach uses similarity measures developed for categorical data. These measures are not as well examined as the binary ones, and they are not implemented in commercial software. The two approaches are compared on generated data sets with categorical variables, and the results are evaluated using both internal and external evaluation criteria. The purpose of this paper is to show that the binary transformation is not necessary when clustering categorical data, since the second approach leads to clustering results at least comparable to those of the first approach.
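
The two approaches described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy data set, the choice of the Jaccard coefficient as the binary measure, the choice of simple matching (overlap) as the categorical measure, average linkage, and all function names. It uses only the Python standard library.

```python
from itertools import combinations

# Toy data (hypothetical): four objects described by three categorical variables.
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]

def dummy_encode(rows):
    """Approach 1: replace each categorical variable by a set of 0/1 dummy variables."""
    n_vars = len(rows[0])
    levels = [sorted({row[j] for row in rows}) for j in range(n_vars)]
    return [
        [int(row[j] == level) for j in range(n_vars) for level in levels[j]]
        for row in rows
    ]

def jaccard_dissimilarity(x, y):
    """Binary measure: 1 - (shared 1s) / (positions where either vector has a 1)."""
    both = sum(a and b for a, b in zip(x, y))
    either = sum(a or b for a, b in zip(x, y))
    return 1 - both / either if either else 0.0

def overlap_dissimilarity(x, y):
    """Approach 2: share of variables on which two objects disagree (simple matching)."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def average_linkage(n, dist, k):
    """Naive agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(n)]

    def between(c1, c2):
        # Mean pairwise dissimilarity between the members of two clusters.
        return sum(dist(i, j) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters; combinations() guarantees a < b.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: between(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

binary = dummy_encode(data)

# Approach 1: cluster the dummy-coded vectors with a binary measure.
clusters_binary = average_linkage(
    len(data), lambda i, j: jaccard_dissimilarity(binary[i], binary[j]), 2
)

# Approach 2: cluster the original categories with a categorical measure.
clusters_categorical = average_linkage(
    len(data), lambda i, j: overlap_dissimilarity(data[i], data[j]), 2
)

print(len(data[0]), "categorical variables ->", len(binary[0]), "dummy variables")
print(clusters_binary, clusters_categorical)
```

On this toy data the three categorical variables expand into six dummy variables, and both approaches return the same two clusters. That agreement holds for this small example only and says nothing about the generated data sets or the measures evaluated in the paper.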

Suggested Citation

  • Cibulková Jana & Šulc Zdeněk & Sirota Sergej & Řezanková Hana, 2019. "The Effect Of Binary Data Transformation In Categorical Data Clustering," Statistics in Transition New Series, Polish Statistical Association, vol. 20(2), pages 33-47, June.
  • Handle: RePEc:vrs:stintr:v:20:y:2019:i:2:p:33-47:n:1
    DOI: 10.21307/stattrans-2019-013

    Download full text from publisher

    File URL: https://doi.org/10.21307/stattrans-2019-013
    Download Restriction: no

