IDEAS home Printed from https://ideas.repec.org/a/spr/advdac/v15y2021i2d10.1007_s11634-020-00425-4.html

Clustering of modal-valued symbolic data

Author

Listed:
  • Nataša Kejžar

    (University of Ljubljana)

  • Simona Korenjak-Černe

    (University of Ljubljana)

  • Vladimir Batagelj

    (Institute of Mathematics, Physics and Mechanics;
    University of Primorska;
    National Research University Higher School of Economics)

Abstract

Symbolic data analysis is based on special descriptions of data known as symbolic objects (SOs). Such descriptions preserve more detailed information about units and their clusters than the usual representations with mean values. A special type of SO is a representation with frequency or probability distributions (modal values). This representation enables us to consider variables of all measurement types simultaneously during the clustering process. In this paper, we present the theoretical basis for compatible leaders and agglomerative clustering methods with alternative dissimilarities for modal-valued SOs. The leaders method efficiently solves clustering problems with large numbers of units, while the agglomerative method can be applied either on its own to a small data set or to the leaders obtained from the compatible leaders clustering method. We focus on (a) the inclusion of weights, which enables clustering representatives to retain the same structure as if only first-order units were clustered, and (b) the selection of relative dissimilarities that produce more interpretable, i.e., more meaningful, optimal clustering representatives. The usefulness of the proposed methods and adaptations was assessed in carefully constructed simulation settings and demonstrated on three real-world data sets, where interpretability improved through the use of weights (population pyramids and ESS data) or relative dissimilarity (US patents data).
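
The ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which dissimilarities the paper uses, so plain squared Euclidean distance is assumed here purely for illustration, and the names to_modal, sq_dist and leaders_clustering are hypothetical. The sketch turns raw categorical observations into modal values (relative frequencies) and runs a k-means-like leaders iteration in which each leader is the weighted mean of its members, with unit sizes as weights.

```python
from collections import Counter


def to_modal(values, categories):
    """Describe one unit by relative frequencies over `categories` plus its size."""
    counts = Counter(values)
    n = len(values)
    return [counts[c] / n for c in categories], n


def sq_dist(p, q):
    """Squared Euclidean distance (an assumed choice, not taken from the paper)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def leaders_clustering(units, weights, leaders, max_iter=100):
    """Alternate between assigning units to the nearest leader and updating leaders.

    units   : list of probability vectors (modal values)
    weights : list of non-negative unit weights, e.g. unit sizes
    leaders : initial leaders, one probability vector per cluster
    """
    leaders = [list(l) for l in leaders]  # do not mutate the caller's list
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(leaders)), key=lambda k: sq_dist(u, leaders[k]))
            for u in units
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the leaders are too
        assignment = new_assignment
        for k in range(len(leaders)):
            members = [i for i, a in enumerate(assignment) if a == k]
            if not members:
                continue  # keep the old leader for an empty cluster
            total = sum(weights[i] for i in members)
            # The weighted mean minimises the weighted sum of squared distances.
            leaders[k] = [
                sum(weights[i] * units[i][j] for i in members) / total
                for j in range(len(leaders[k]))
            ]
    return assignment, leaders


categories = ["a", "b", "c"]
raw = [
    ["a", "a", "a", "b"],
    ["a"] * 8 + ["b"] * 2,
    ["c", "c", "c", "b"],
    ["c"] * 9 + ["a"],
]
described = [to_modal(values, categories) for values in raw]
units = [p for p, _ in described]
sizes = [n for _, n in described]

assignment, leaders = leaders_clustering(units, sizes, [units[0], units[2]])
print(assignment)
print(leaders)
```

With unit sizes as weights, each leader in this toy example coincides with the frequency distribution of the pooled raw observations in its cluster, which is one way to read the abstract's remark that weights let representatives keep the same structure as when only first-order units are clustered. An agglomerative step could then be applied to the resulting leaders instead of to all units.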

Suggested Citation

  • Nataša Kejžar & Simona Korenjak-Černe & Vladimir Batagelj, 2021. "Clustering of modal-valued symbolic data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(2), pages 513-541, June.
  • Handle: RePEc:spr:advdac:v:15:y:2021:i:2:d:10.1007_s11634-020-00425-4
    DOI: 10.1007/s11634-020-00425-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-020-00425-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. A. Pedro Duarte Silva & Peter Filzmoser & Paula Brito, 2018. "Outlier detection in interval data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 785-822, September.
    2. Fei Liu & L. Billard, 2022. "Partition of Interval-Valued Observations Using Regression," Journal of Classification, Springer;The Classification Society, vol. 39(1), pages 55-77, March.
    3. Maia, André Luis Santiago & de Carvalho, Francisco de A.T., 2011. "Holt's exponential smoothing and neural network models for forecasting interval-valued time series," International Journal of Forecasting, Elsevier, vol. 27(3), pages 740-759, July.
    4. Maia, André Luis Santiago & de Carvalho, Francisco de A.T., 2011. "Holt’s exponential smoothing and neural network models for forecasting interval-valued time series," International Journal of Forecasting, Elsevier, vol. 27(3), pages 740-759.
    5. Guo, Junpeng & Li, Wenhua & Li, Chenhua & Gao, Sa, 2012. "Standardization of interval symbolic data based on the empirical descriptive statistics," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 602-610.
    6. Soroosh Shalileh, 2023. "An Effective Partitional Crisp Clustering Method Using Gradient Descent Approach," Mathematics, MDPI, vol. 11(12), pages 1-23, June.
    7. M. Rosário Oliveira & Margarida Azeitona & António Pacheco & Rui Valadas, 2022. "Association measures for interval variables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(3), pages 491-520, September.
    8. Ana Belén Ramos-Guajardo, 2022. "A hierarchical clustering method for random intervals based on a similarity measure," Computational Statistics, Springer, vol. 37(1), pages 229-261, March.
    9. Manuel Ammann & Philipp Horsch & David Oesch, 2016. "Competing with Superstars," Management Science, INFORMS, vol. 62(10), pages 2842-2858, October.
    10. Hickfang, Michael & Holder, Ulrike, 2018. "The impact of stock options on risk-taking: Founder-CEOs and innovation," Discussion Papers of the Institute for Organisational Economics 12/2018, University of Münster, Institute for Organisational Economics.
    11. Ufuk Akcigit & Murat Celik & Daron Acemoglu, 2014. "Young, Restless and Creative: Openness to Disruption and Creative Innovations," 2014 Meeting Papers 377, Society for Economic Dynamics.
    12. Panayotis Dessyllas & Alan Hughes, 2005. "R&D and Patenting Activity and the Propensity to Acquire in High Technology Industries," Industrial Organization 0507008, University Library of Munich, Germany.
    13. Miguel de Carvalho & Gabriel Martos, 2022. "Modeling interval trendlines: Symbolic singular spectrum analysis for interval time series," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(1), pages 167-180, January.
    14. Galasso, Alberto & Schankerman, Mark, 2013. "Patents and Cumulative Innovation:Causal Evidence from the Courts," IIR Working Paper 13-16, Institute of Innovation Research, Hitotsubashi University.
    15. Guan-Can Yang & Gang Li & Chun-Ya Li & Yun-Hua Zhao & Jing Zhang & Tong Liu & Dar-Zen Chen & Mu-Hsuan Huang, 2015. "Using the comprehensive patent citation network (CPC) to evaluate patent value," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1319-1346, December.
    16. José Lobo & Charlotta Mellander & Kevin Stolarick & Deborah Strumsky, 2014. "The Inventive, the Educated and the Creative: How Do They Affect Metropolitan Productivity?," Industry and Innovation, Taylor & Francis Journals, vol. 21(2), pages 155-177, February.
    17. Sheikh, Shahbaz, 2018. "The impact of market competition on the relation between CEO power and firm innovation," Journal of Multinational Financial Management, Elsevier, vol. 44(C), pages 36-50.
    18. Pauly, Stefan & Stipanicic, Fernando, 2021. "The creation and diffusion of knowledge: Evidence from the Jet Age," CEPREMAP Working Papers (Docweb) 2112, CEPREMAP.
    19. Suma Athreye & Martha Prevezer, 2008. "R&D offshoring and the domestic science base in India and China," Working Papers 26, Queen Mary, University of London, School of Business and Management, Centre for Globalisation Research.
    20. Florent Silve & Alexander Plekhanov, 2018. "Institutions, innovation and growth : Evidence from industry data," The Economics of Transition, The European Bank for Reconstruction and Development, vol. 26(3), pages 335-362, July.
