
Clustering of imbalanced high-dimensional media data

Author

Listed:
  • Šárka Brodinová

    (TU Wien)

  • Maia Zaharieva

    (TU Wien; University of Vienna)

  • Peter Filzmoser

    (TU Wien)

  • Thomas Ortner

    (University of Vienna; TU Wien)

  • Christian Breiteneder

    (TU Wien)

Abstract

Media content in large repositories usually exhibits multiple groups of strongly varying sizes. Media of potential interest often form notably smaller groups. Such groups differ so much from the remaining data that it may be worthwhile to examine them in more detail. In contrast, media with popular content appear in larger groups. Identifying groups of varying sizes is the task of clustering imbalanced data. Clustering highly imbalanced media groups is further complicated by the high dimensionality of the underlying features. In this paper, we present the imbalanced clustering (IClust) algorithm, designed to reveal group structures in high-dimensional media data. IClust uses an existing clustering method to find a large initial set of potentially highly pure clusters, which are then successively merged. The main advantages of IClust are that the number of clusters does not have to be pre-specified and that no specific assumptions about the cluster or data characteristics need to be made. Experiments on real-world media data demonstrate that, in comparison to existing methods, IClust better identifies media groups, especially small ones.
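The two-stage idea in the abstract (over-cluster first, then merge) can be illustrated with a minimal sketch. This is not the IClust implementation: the abstract does not name the initial clustering method or the merging criterion, so the sketch assumes plain k-means with a deterministic farthest-first start for the first stage and a single-linkage distance with a hypothetical `threshold` stopping rule for the merging stage. All function names and the toy data are illustrative.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def farthest_first(points, k):
    """Deterministic seeding: repeatedly pick the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=10):
    """Stage 1 (assumed method): Lloyd's k-means with deliberately large k."""
    centers = farthest_first(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to become empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def single_link(a, b):
    """Smallest distance between any point of a and any point of b."""
    return min(math.dist(p, q) for p in a for q in b)


def merge_clusters(clusters, threshold):
    """Stage 2 (assumed rule): merge the closest pair until the gap reaches threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        pairs = [(single_link(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        gap, i, j = min(pairs)
        if gap >= threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy imbalanced data: one large group (64 points) and one small group (5 points).
big = [(0.5 * x, 0.5 * y) for x in range(8) for y in range(8)]
small = [(10.0, 10.0), (10.2, 10.0), (10.0, 10.2), (10.2, 10.2), (10.1, 10.1)]

initial = kmeans(big + small, k=10)               # many small, pure clusters
final = merge_clusters(initial, threshold=1.0)    # number of groups not fixed in advance
sizes = sorted(len(c) for c in final)
print(sizes)  # prints [5, 64]
```

In this toy setting the small group survives as its own cluster because the number of final groups is decided by the merging rule rather than fixed up front. The threshold used here is only a stand-in for whatever merging criterion the paper actually uses.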

Suggested Citation

  • Šárka Brodinová & Maia Zaharieva & Peter Filzmoser & Thomas Ortner & Christian Breiteneder, 2018. "Clustering of imbalanced high-dimensional media data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 261-284, June.
  • Handle: RePEc:spr:advdac:v:12:y:2018:i:2:d:10.1007_s11634-017-0292-z
    DOI: 10.1007/s11634-017-0292-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-017-0292-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



