Printed from https://ideas.repec.org/a/ags/aolpei/157585.html

Hierarchical Cluster Analysis – Various Approaches to Data Preparation

Authors

  • Pacáková, Z.
  • Poláčková, J.

Abstract

The article deals with two different approaches to data preparation that avoid multicollinearity. Its aim is to find similarities among EU states in their level of e-communication using hierarchical cluster analysis. First, the original set of fourteen indicators was reduced on the basis of correlation analysis: whenever two indicators were highly correlated, the one with higher variability was retained for further analysis. Second, the data were transformed by principal component analysis, since principal components are mutually uncorrelated; five principal components explaining about 92% of the variance were selected for further analysis. Hierarchical cluster analysis was then performed on both the reduced data set and the principal component scores. In both cases three clusters were chosen according to the pseudo t-squared and pseudo F statistics, but the resulting clusters were not identical. An important criterion for comparing the two results was the proportion of variance accounted for by the clusters, which was about ten percentage points higher for the principal component scores (57.8% compared with 47%). Therefore, when principal component scores are used as input variables for cluster analysis and the proportion of variance they explain is high enough (about 92% in our analysis), the loss of information is lower than with data reduction based on correlation analysis.
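The two preparation routes and the comparison criterion described in the abstract can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' code: variability is measured by the coefficient of variation, the correlation cut-off of 0.9 is hypothetical, principal components are extracted from the correlation matrix, and Ward linkage is assumed because the abstract does not name the linkage method. The helper names (`reduce_by_correlation`, `pca_scores`, `ward_clusters`, `r_squared`, `pseudo_f`) are hypothetical. The proportion of variance accounted for by the clusters is computed as R² = 1 − W/T (within-cluster over total sum of squares), and the pseudo F statistic is derived from R²; the pseudo t-squared statistic is omitted for brevity. The demo matrix is random and only stands in for the fourteen indicators.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def standardize(X):
    """Z-score each column (indicator) with the sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def reduce_by_correlation(X, threshold=0.9):
    """Route 1 (greedy sketch): from highly correlated pairs keep the more variable one.

    Columns are visited in order of decreasing coefficient of variation (assumed
    measure of variability); a column is dropped when |r| >= threshold with a
    column already kept. Returns the sorted indices of the retained columns.
    """
    X = np.asarray(X, dtype=float)
    r = np.corrcoef(X, rowvar=False)
    cv = X.std(axis=0, ddof=1) / np.abs(X.mean(axis=0))
    kept = []
    for j in np.argsort(-cv):
        if all(abs(r[j, k]) < threshold for k in kept):
            kept.append(int(j))
    return sorted(kept)


def pca_scores(X, target=0.92):
    """Route 2: principal component scores from the correlation matrix.

    Keeps the fewest components whose cumulative share of variance reaches
    `target`; returns the scores and the share actually explained.
    """
    Z = standardize(X)
    eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigval)[::-1]  # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    cum = np.cumsum(eigval) / eigval.sum()
    k = min(int(np.searchsorted(cum, target)) + 1, len(eigval))
    return Z @ eigvec[:, :k], float(cum[k - 1])


def ward_clusters(data, n_clusters=3):
    """Ward agglomerative clustering (assumed linkage) cut into n_clusters groups."""
    tree = linkage(np.asarray(data, dtype=float), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def r_squared(data, labels):
    """Share of total variance accounted for by the clusters: 1 - W / T."""
    data, labels = np.asarray(data, dtype=float), np.asarray(labels)
    total = ((data - data.mean(axis=0)) ** 2).sum()
    within = 0.0
    for g in np.unique(labels):
        members = data[labels == g]
        within += ((members - members.mean(axis=0)) ** 2).sum()
    return 1.0 - within / total


def pseudo_f(r2, n, k):
    """Pseudo F (Calinski-Harabasz) from R^2, n objects and k clusters."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


if __name__ == "__main__":
    # Hypothetical input: 28 states x 14 positive indicators, random for illustration.
    X = np.random.default_rng(0).lognormal(size=(28, 14))
    reduced = standardize(X[:, reduce_by_correlation(X)])
    scores, share = pca_scores(X)
    print(f"{scores.shape[1]} components explain {share:.1%} of the variance")
    for name, data in [("reduced set", reduced), ("PC scores", scores)]:
        labels = ward_clusters(data, 3)
        r2 = r_squared(data, labels)
        print(f"{name}: R^2 = {r2:.3f}, pseudo F = {pseudo_f(r2, len(data), 3):.2f}")
```

In this sketch the principal component scores are clustered unscaled, so each component keeps a weight proportional to its eigenvalue; whether the original analysis rescaled them is not stated in the abstract.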

Suggested Citation

  • Pacáková, Z. & Poláčková, J., 2013. "Hierarchical Cluster Analysis – Various Approaches to Data Preparation," AGRIS on-line Papers in Economics and Informatics, Czech University of Life Sciences Prague, Faculty of Economics and Management, vol. 5(3), pages 1-11, September.
  • Handle: RePEc:ags:aolpei:157585
    DOI: 10.22004/ag.econ.157585

    Download full text from publisher

    File URL: https://ageconsearch.umn.edu/record/157585/files/agris_on-line_2013_3_pacakova_polackova-2.pdf
    Download Restriction: no


    References listed on IDEAS

    1. Glenn Milligan & Martha Cooper, 1985. "An examination of procedures for determining the number of clusters in a data set," Psychometrika, Springer;The Psychometric Society, vol. 50(2), pages 159-179, June.
    2. H. Bock, 1985. "On some significance tests in cluster analysis," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 77-108, December.
    3. H.H. Bock, 1985. "On some significance tests in cluster analysis," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 300-300, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. María Gallegos & Gunter Ritter, 2009. "Trimming algorithms for clustering contaminated grouped data and their robustness," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 3(2), pages 135-167, September.
    2. Gallegos, María Teresa & Ritter, Gunter, 2010. "Using combinatorial optimization in model-based trimmed clustering with cardinality constraints," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 637-654, March.
    3. Véronique Cariou & Stéphane Verdun & Emmanuelle Diaz & El Qannari & Evelyne Vigneau, 2009. "Comparison of three hypothesis testing approaches for the selection of the appropriate number of clusters of variables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 3(3), pages 227-241, December.
    4. Jacques-Antoine Gauthier & Eric D. Widmer & Philipp Bucher & Cédric Notredame, 2009. "How Much Does It Cost?," Sociological Methods & Research, , vol. 38(1), pages 197-231, August.
    5. María Teresa Gallegos & Gunter Ritter, 2018. "Probabilistic clustering via Pareto solutions and significance tests," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 179-202, June.
    6. Álvarez Gil, M. J. & Burgos Jiménez, J. & Céspedes Lorente, J. J., 2001. "An analysis of environmental management, organizational context and performance of Spanish hotels," Omega, Elsevier, vol. 29(6), pages 457-471, December.
    7. Liu, Pei-chen Barry & Hansen, Mark & Mukherjee, Avijit, 2008. "Scenario-based air traffic flow management: From theory to practice," Transportation Research Part B: Methodological, Elsevier, vol. 42(7-8), pages 685-702, August.
    8. Hélène Syed Zwick & S. Ali Shah Syed, 2017. "The polarization impact of the crisis on the Eurozone labour markets: a hierarchical cluster analysis," Applied Economics Letters, Taylor & Francis Journals, vol. 24(7), pages 472-476, April.
    9. Lingsong Meng & Dorina Avram & George Tseng & Zhiguang Huo, 2022. "Outcome‐guided sparse K‐means for disease subtype discovery via integrating phenotypic data with high‐dimensional transcriptomic data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(2), pages 352-375, March.
    10. repec:hal:spmain:info:hdl:2441/5ahh4t5kfl8nprei89ignlk5nl is not listed on IDEAS
    11. Goethner, Maximilian & Hornuf, Lars & Regner, Tobias, 2021. "Protecting investors in equity crowdfunding: An empirical analysis of the small investor protection act," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    12. Lee, Keun & Shin, Hochul, 2021. "Varieties of capitalism and East Asia: Long-term evolution, structural change, and the end of East Asian capitalism," Structural Change and Economic Dynamics, Elsevier, vol. 56(C), pages 431-437.
    13. Marcel Clermont & Julia Schaefer, 2019. "Identification of Outliers in Data Envelopment Analysis," Schmalenbach Business Review, Springer;Schmalenbach-Gesellschaft, vol. 71(4), pages 475-496, October.
    14. Cees H. Elzinga & Aart C. Liefbroer, 2007. "De-standardization of Family-Life Trajectories of Young Adults: A Cross-National Comparison Using Sequence Analysis," European Journal of Population, Springer;European Association for Population Studies, vol. 23(3), pages 225-250, October.
    15. Vivien Kana Zeumo & Alexis Tsoukiàs & Blaise Some, 2012. "A new methodology for multidimensional poverty measurement based on the capability approach," Working Papers hal-00874627, HAL.
    16. Michele Cincera, 2005. "Firms' productivity growth and R&D spillovers: An analysis of alternative technological proximity measures," Economics of Innovation and New Technology, Taylor & Francis Journals, vol. 14(8), pages 657-682.
    17. Marcelo Yoshio Takam, 2014. "Short-Term Drivers Of Sovereign Cds Spreads," Anais do XLI Encontro Nacional de Economia [Proceedings of the 41st Brazilian Economics Meeting] 130, ANPEC - Associação Nacional dos Centros de Pós-Graduação em Economia [Brazilian Association of Graduate Programs in Economics].
    18. Pennings, J.S.J. & van Kranenburg, H.L. & Hagedoorn, J., 2005. "Past, present and future of the telecommunications industry," Research Memorandum 016, Maastricht University, Maastricht Research School of Economics of Technology and Organization (METEOR).
    19. Balakrishnan, P. V. (Sundar) & Cooper, Martha C. & Jacob, Varghese S. & Lewis, Phillip A., 1996. "Comparative performance of the FSCL neural net and K-means algorithm for market segmentation," European Journal of Operational Research, Elsevier, vol. 93(2), pages 346-357, September.
    20. Anis Hoayek & Didier Rullière, 2024. "Assessing clustering methods using Shannon's entropy," Post-Print hal-03812055, HAL.
    21. Li-Xuan Qin & Steven G. Self, 2006. "The Clustering of Regression Models Method with Applications in Gene Expression Data," Biometrics, The International Biometric Society, vol. 62(2), pages 526-533, June.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.