Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i2p346-d1323394.html

A Formalization of Multilabel Classification in Terms of Lattice Theory and Information Theory: Concerning Datasets

Author

Listed:
  • Francisco J. Valverde-Albacete

    (Department of Signal Theory and Communications, Telematic Systems and Computation, Universidad Rey Juan Carlos, 28942 Fuenlabrada, Madrid, Spain
    These authors contributed equally to this work.)

  • Carmen Peláez-Moreno

    (Department of Signal Theory and Communications, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
    These authors contributed equally to this work.)

Abstract

Multilabel classification is a recently conceptualized task in machine learning. Contrary to most of the research that has so far focused on classification machinery, we take a data-centric approach and provide an integrative framework that blends qualitative and quantitative descriptions of multilabel data sources. By combining lattice theory, in the form of formal concept analysis, and entropy triangles, obtained from information theory, we explain from first principles the fundamental issues of multilabel datasets such as the dependencies of the labels, their imbalances, or the effects of the presence of hapaxes. This allows us to provide guidelines for resampling and new data collection and their relationship with broad modelling approaches. We have empirically validated our framework using 56 open datasets, challenging previous characterizations and showing that our formalization brings useful insights into the task of multilabel classification. Further work will consider the extension of this formalization to understand the relationship between the data sources, the classification methods, and ways to assess their performance.
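
The dataset issues named in the abstract (label imbalance, hapaxes, entropy of the label source) can be illustrated with a minimal Python sketch. This is not the authors' method or code: the function name `labelset_stats`, the toy data, and the reading of a "hapax" as a label set that occurs exactly once in the dataset are assumptions made here for illustration only.

```python
from collections import Counter
from math import log2


def labelset_stats(rows):
    """Summarize a multilabel dataset given one label set per instance.

    `rows` is a hypothetical input format: an iterable of sets of labels.
    """
    rows = [frozenset(r) for r in rows]
    n = len(rows)

    # How often each distinct label set occurs.
    labelset_counts = Counter(rows)

    # Assumption: a hapax is a label set seen exactly once.
    hapaxes = [set(ls) for ls, c in labelset_counts.items() if c == 1]

    # Shannon entropy (in bits) of the empirical label-set distribution.
    entropy = -sum((c / n) * log2(c / n) for c in labelset_counts.values())

    # Per-label frequencies expose imbalance between individual labels.
    label_freq = Counter(label for r in rows for label in r)

    return {
        "n_instances": n,
        "n_labelsets": len(labelset_counts),
        "n_hapaxes": len(hapaxes),
        "labelset_entropy_bits": entropy,
        "label_freq": dict(label_freq),
    }


if __name__ == "__main__":
    toy = [{"a"}, {"a"}, {"a", "b"}, {"b", "c"}]
    print(labelset_stats(toy))
```

On the toy data, two of the three distinct label sets occur only once, so half of the instances carry a label set that a classifier would see a single time; this is the kind of situation the abstract's resampling and data-collection guidelines are concerned with.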

Suggested Citation

  • Francisco J. Valverde-Albacete & Carmen Peláez-Moreno, 2024. "A Formalization of Multilabel Classification in Terms of Lattice Theory and Information Theory: Concerning Datasets," Mathematics, MDPI, vol. 12(2), pages 1-31, January.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:2:p:346-:d:1323394

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/2/346/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/2/346/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Assaf Almog & Ferry Besamusca & Mel MacMahon & Diego Garlaschelli, 2015. "Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-16, July.
    2. Juan Lucio & Raúl Mínguez & Asier Minondo & Francisco Requena, 2016. "Networks and the Dynamics of Firms' Export Portfolio: Evidence for Mexico," The World Economy, Wiley Blackwell, vol. 39(5), pages 708-736, May.
    3. Assaf Almog & Ferry Besamusca & Mel MacMahon & Diego Garlaschelli, 2015. "Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations," Papers 1504.00590, arXiv.org.
    4. Damien A Fair & Alexander L Cohen & Jonathan D Power & Nico U F Dosenbach & Jessica A Church & Francis M Miezin & Bradley L Schlaggar & Steven E Petersen, 2009. "Functional Brain Networks Develop from a “Local to Distributed” Organization," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-14, May.
    5. Alessandro Chessa & Pierpaolo D’Urso & Livia Giovanni & Vincenzina Vitale & Alfonso Gebbia, 2023. "Complex networks for community detection of basketball players," Annals of Operations Research, Springer, vol. 325(1), pages 363-389, June.
    6. Piccardi, Carlo & Calatroni, Lisa & Bertoni, Fabio, 2010. "Communities in Italian corporate networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(22), pages 5247-5258.
    7. Luciana Crosilla & Marco Malgarini, 2011. "Behavioural models for manufacturing firms: analysing survey data," ECONOMIA E POLITICA INDUSTRIALE, FrancoAngeli Editore, vol. 2011(4), pages 139-163.
    8. Claudio Conversano & Massimo Cannas & Francesco Mola & Emiliano Sironi, 2019. "Random effects clustering in multilevel modeling: choosing a proper partition," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 279-301, March.
    9. Lou, Hao & Li, Shenghong & Zhao, Yuxin, 2013. "Detecting community structure using label propagation with weighted coherent neighborhood propinquity," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(14), pages 3095-3105.
    10. Francisco de A. T. Carvalho & Antonio Irpino & Rosanna Verde & Antonio Balzanella, 2022. "Batch Self-Organizing Maps for Distributional Data with an Automatic Weighting of Variables and Components," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 343-375, July.
    11. Neave O'Clery & Samuel Heroy & Francois Hulot & Mariano Beguerisse-Díaz, 2019. "Unravelling the forces underlying urban industrial agglomeration," Papers 1903.09279, arXiv.org, revised Jun 2019.
    12. Efstratios K Kosmidis & Vasiliki Moschou & Georgios Ziogas & Ioannis Boukovinas & Maria Albani & Nikolaos A Laskaris, 2014. "Functional Aspects of the EGF-Induced MAP Kinase Cascade: A Complex Self-Organizing System Approach," PLOS ONE, Public Library of Science, vol. 9(11), pages 1-12, November.
    13. Isabella Morlini & Sergio Zani, 2012. "Dissimilarity and similarity measures for comparing dendrograms and their applications," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(2), pages 85-105, July.
    14. Kemmawadee Preedalikit & Daniel Fernández & Ivy Liu & Louise McMillan & Marta Nai Ruscone & Roy Costilla, 2024. "Row mixture-based clustering with covariates for ordinal responses," Computational Statistics, Springer, vol. 39(5), pages 2511-2555, July.
    15. Ekaterina Kovaleva & Boris Mirkin, 2015. "Bisecting K-Means and 1D Projection Divisive Clustering: A Unified Framework and Experimental Comparison," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 414-442, October.
    16. Julian Maluck & Reik V Donner, 2015. "A Network of Networks Perspective on Global Trade," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-24, July.
    17. Christian Hennig, 2022. "An empirical comparison and characterisation of nine popular clustering methods," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 201-229, March.
    18. Zema, Sebastiano Michele, 2022. "Uncovering the network structure of non-centrally cleared derivative markets: evidences from regulatory data," Working Paper Series 2721, European Central Bank.
    19. Ronaldo F. Zampolo & Frederico H. R. Lopes & Rodrigo M. S. de Oliveira & Martim F. Fernandes & Victor Dmitriev, 2024. "Dimensionality Reduction and Clustering Strategies for Label Propagation in Partial Discharge Data Sets," Energies, MDPI, vol. 17(23), pages 1-18, November.
    20. Huaylla, Claudia A. & Kuperman, Marcelo N. & Garibaldi, Lucas A., 2024. "Comparison of two statistical measures of complexity applied to ecological bipartite networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 642(C).
