
Comparing clusterings--an information based distance


  • Meila, Marina


This paper proposes an information-theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of information lost and gained in changing from clustering C to clustering C'. The basic properties of VI are presented and discussed. We focus on two kinds of properties: (1) those that help one build intuition about the new criterion (in particular, it is shown that VI is a true metric on the space of clusterings), and (2) those that pertain to the comparability of VI values over different experimental conditions. As the latter properties have rarely been discussed explicitly before, other existing comparison criteria are also examined in their light. Finally, we present VI from an axiomatic point of view, showing that it is the only "sensible" criterion for comparing partitions that is both aligned to the lattice and convexly additive. As a consequence, we prove an impossibility result for comparing partitions: there is no criterion for comparing partitions that simultaneously satisfies the above two desirable properties and is bounded.
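The abstract describes VI only informally. In the paper, VI is defined as VI(C, C') = H(C) + H(C') - 2 I(C, C'), where H is the entropy of a clustering's cluster-size distribution and I is the mutual information between the two clusterings. The Python sketch that follows computes it from two label lists over the same points. The function names are hypothetical, natural logarithms are assumed, and this is an illustration of the formula, not the author's code.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (natural log) of the cluster-size distribution of one clustering."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two clusterings given as label lists."""
    if len(a) != len(b):
        raise ValueError("both clusterings must label the same points")
    n = len(a)
    size_a = Counter(a)
    size_b = Counter(b)
    joint = Counter(zip(a, b))  # nonzero cells of the contingency table
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((size_a[x] / n) * (size_b[y] / n)))
    return mi


def variation_of_information(a, b):
    """VI(C, C') = H(C) + H(C') - 2 I(C, C'), assuming natural logarithms."""
    vi = entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)
    return max(vi, 0.0)  # clip tiny negative values caused by round-off


if __name__ == "__main__":
    c1 = [0, 0, 0, 1, 1, 1]
    c2 = [0, 0, 1, 1, 2, 2]
    print(variation_of_information(c1, c2))
    # One all-inclusive cluster versus n singletons on n points
    n = 8
    print(variation_of_information([0] * n, list(range(n))), math.log(n))
```

Renaming the labels of a clustering leaves VI at 0, since only the partition matters. In the last comparison, VI equals log n, so its value grows with the number of points; this illustrates that VI itself is not bounded by a constant independent of n.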

Suggested Citation

  • Meila, Marina, 2007. "Comparing clusterings--an information based distance," Journal of Multivariate Analysis, Elsevier, vol. 98(5), pages 873-895, May.
  • Handle: RePEc:eee:jmvana:v:98:y:2007:i:5:p:873-895




    Citations are extracted by the CitEc Project.

    Cited by:

    1. Juan Lucio & Raúl Mínguez & Asier Minondo & Francisco Requena, 2016. "Networks and the Dynamics of Firms' Export Portfolio: Evidence for Mexico," The World Economy, Wiley Blackwell, vol. 39(5), pages 708-736, May.
    2. Lou, Hao & Li, Shenghong & Zhao, Yuxin, 2013. "Detecting community structure using label propagation with weighted coherent neighborhood propinquity," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(14), pages 3095-3105.
    3. Ekaterina Kovaleva & Boris Mirkin, 2015. "Bisecting K-Means and 1D Projection Divisive Clustering: A Unified Framework and Experimental Comparison," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 414-442, October.
    4. Isabella Morlini & Sergio Zani, 2012. "Dissimilarity and similarity measures for comparing dendrograms and their applications," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(2), pages 85-105, July.
    5. Assaf Almog & Ferry Besamusca & Mel MacMahon & Diego Garlaschelli, 2015. "Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations," Papers 1504.00590.
    6. Piccardi, Carlo & Calatroni, Lisa & Bertoni, Fabio, 2010. "Communities in Italian corporate networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(22), pages 5247-5258.
    7. Luciana Crosilla & Marco Malgarini, 2011. "Behavioural models for manufacturing firms: analysing survey data," ECONOMIA E POLITICA INDUSTRIALE, FrancoAngeli Editore, vol. 2011(4), pages 139-163.
    8. Alan Lee & Bobby Willcox, 2014. "Minkowski Generalizations of Ward’s Method in Hierarchical Clustering," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 194-218, July.
    9. O’Hagan, Adrian & Murphy, Thomas Brendan & Gormley, Isobel Claire & McNicholas, Paul D. & Karlis, Dimitris, 2016. "Clustering with the multivariate normal inverse Gaussian distribution," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 18-30.
    10. Miloš Gligorić & Zoran Gligorić & Čedomir Beljić & Slavko Torbica & Svetlana Štrbac Savić & Jasmina Nedeljković Ostojić, 2016. "Multi-Attribute Technological Modeling of Coal Deposits Based on the Fuzzy TOPSIS and C-Mean Clustering Algorithms," Energies, MDPI, Open Access Journal, vol. 9(12), pages 1-23, December.



    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis. RePEc uses bibliographic data supplied by the respective publishers.