IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0159161.html
   My bibliography  Save this article

Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

Author

Listed:
  • Scott Emmons
  • Stephen Kobourov
  • Mike Gallant
  • Katy Börner

Abstract

Overview: Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms—Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics: We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms: Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.

Suggested Citation

  • Scott Emmons & Stephen Kobourov & Mike Gallant & Katy Börner, 2016. "Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale," PLOS ONE, Public Library of Science, vol. 11(7), pages 1-18, July.
  • Handle: RePEc:plo:pone00:0159161
    DOI: 10.1371/journal.pone.0159161
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0159161
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0159161&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0159161?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Ludo Waltman & Nees Eck, 2013. "A smart local moving algorithm for large-scale modularity-based community detection," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 86(11), pages 1-14, November.
    2. Barabási, Albert-László & Ravasz, Erzsébet & Vicsek, Tamás, 2001. "Deterministic scale-free networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 299(3), pages 559-564.
    3. Franke, R., 2016. "CHIMERA: Top-down model for hierarchical, overlapping and directed cluster structures in directed and weighted complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 461(C), pages 384-408.
    4. Michael T Schaub & Jean-Charles Delvenne & Sophia N Yaliraki & Mauricio Barahona, 2012. "Markov Dynamics as a Zooming Lens for Multiscale Community Detection: Non Clique-Like Communities and the Field-of-View Limit," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-11, February.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Kevin W. Boyack, 2017. "Investigating the effect of global data on topic detection," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 999-1015, May.
    2. Andrej Srakar, 2017. "Prevalence of Diseases and Health Care Utilization ofthe Self-Employed Artists and TheirEmpirical Determinants: Evidence From a Slovenian Survey," ACEI Working Paper Series AWP-08-2017, Association for Cultural Economics International, revised Sep 2017.
    3. Katy Börner & Adam H. Simpson & Andreas Bueckle & Robert L. Goldstone, 2018. "Science map metaphors: a comparison of network versus hexmap-based visualizations," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 409-426, February.
    4. Marlen Komorowski & Ruxandra Lupu & Sara Pepper & Justin Lewis, 2021. "Joining the Dots—Understanding the Value Generation of Creative Networks for Sustainability in Local Creative Ecosystems," Sustainability, MDPI, vol. 13(22), pages 1-16, November.
    5. Fabio Bento & Marco Tagliabue & Flora Lorenzo, 2020. "Organizational Silos: A Scoping Review Informed by a Behavioral Perspective on Systems and Networks," Societies, MDPI, vol. 10(3), pages 1-27, July.
    6. Laura Freeman & Abdul Rahman & Feras A. Batarseh, 2021. "Enabling Artificial Intelligence Adoption through Assurance," Social Sciences, MDPI, vol. 10(9), pages 1-15, August.
    7. Olga Nechaeva & Valentina Mazzoli & Raffaele Donvito, 2023. "Brand engagement into self-concept and culture: a literature review for a future research agenda," Journal of Brand Management, Palgrave Macmillan, vol. 30(5), pages 414-431, September.
    8. Chen, Hailiang & Chen, Bin & Ai, Chuan & Zhu, Mengna & Qiu, Xiaogang, 2022. "The evolving network model with community size and distance preferences," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 596(C).
    9. Olivia Fischer & Loris T. Jeitziner & Dirk U. Wulff, 2024. "Affect in science communication: a data-driven analysis of TED Talks on YouTube," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-9, December.
    10. Sitaram Devarakonda & Dmitriy Korobskiy & Tandy Warnow & George Chacko, 2020. "Viewing computer science through citation analysis: Salton and Bergmark Redux," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 271-287, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lutz Bornmann & Robin Haunschild & Sven E. Hug, 2018. "Visualizing the context of citations referencing papers published by Eugene Garfield: a new type of keyword co-occurrence analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 427-437, February.
    2. Natalya Ivanova & Ekaterina Zolotova, 2023. "Landolt Indicator Values in Modern Research: A Review," Sustainability, MDPI, vol. 15(12), pages 1-22, June.
    3. Nina Sakinah Ahmad Rofaie & Seuk Wai Phoong & Muzalwana Abdul Talib & Ainin Sulaiman, 2023. "Light-emitting diode (LED) research: A bibliometric analysis during 2003–2018," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(1), pages 173-191, February.
    4. Giovanni Matteo & Pierfrancesco Nardi & Stefano Grego & Caterina Guidi, 2018. "Bibliometric analysis of Climate Change Vulnerability Assessment research," Environment Systems and Decisions, Springer, vol. 38(4), pages 508-516, December.
    5. Yi-Ming Wei & Jin-Wei Wang & Tianqi Chen & Bi-Ying Yu & Hua Liao, 2018. "Frontiers of Low-Carbon Technologies: Results from Bibliographic Coupling with Sliding Window," CEEP-BIT Working Papers 116, Center for Energy and Environmental Policy Research (CEEP), Beijing Institute of Technology.
    6. Loredana Canfora & Corrado Costa & Federico Pallottino & Stefano Mocali, 2021. "Trends in Soil Microbial Inoculants Research: A Science Mapping Approach to Unravel Strengths and Weaknesses of Their Application," Agriculture, MDPI, vol. 11(2), pages 1-21, February.
    7. Zhong, Xiang & Liu, Jiajun & Gao, Yong & Wu, Lun, 2017. "Analysis of co-occurrence toponyms in web pages based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 466(C), pages 462-475.
    8. Evi Sachini & Nikolaos Karampekios & Pierpaolo Brutti & Konstantinos Sioumalas-Christodoulou, 2020. "Should I stay or should I go? Using bibliometrics to identify the international mobility of highly educated Greek manpower," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 641-663, October.
    9. Blagus, Neli & Šubelj, Lovro & Bajec, Marko, 2012. "Self-similar scaling of density in complex real-world networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(8), pages 2794-2802.
    10. Vanessa Ioannoni & Tommaso Vitale & Corrado Costa & Iris Elliott, 2020. "Depicting communities of Romani studies: on the who, when and where of Roma related scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(3), pages 1473-1490, March.
    11. Claudio M. Rocco & Kash Barker & Jose Moronta, 2022. "Determining the best algorithm to detect community structures in networks: application to power systems," Environment Systems and Decisions, Springer, vol. 42(2), pages 251-264, June.
    12. Razdan, Ashok, 2013. "Networks in extensive air showers," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(4), pages 982-986.
    13. Gao, Yan & Liu, Gengyuan & Casazza, Marco & Hao, Yan & Zhang, Yan & Giannetti, Biagio F., 2018. "Economy-pollution nexus model of cities at river basin scale based on multi-agent simulation: A conceptual framework," Ecological Modelling, Elsevier, vol. 379(C), pages 22-38.
    14. Jensen, Scott & Liu, Xiaozhong & Yu, Yingying & Milojevic, Staša, 2016. "Generation of topic evolution trees from heterogeneous bibliographic networks," Journal of Informetrics, Elsevier, vol. 10(2), pages 606-621.
    15. Blasi, Monica Francesca & Casorelli, Ida & Colosimo, Alfredo & Blasi, Francesco Simone & Bignami, Margherita & Giuliani, Alessandro, 2005. "A recursive network approach can identify constitutive regulatory circuits in gene expression data," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 348(C), pages 349-370.
    16. Collins C. Okolie & Gideon Danso-Abbeam & Okechukwu Groupson-Paul & Abiodun A. Ogundeji, 2022. "Climate-Smart Agriculture Amidst Climate Change to Enhance Agricultural Production: A Bibliometric Analysis," Land, MDPI, vol. 12(1), pages 1-23, December.
    17. Oleg E. Karpov & Elena N. Pitsik & Semen A. Kurkin & Vladimir A. Maksimenko & Alexander V. Gusev & Natali N. Shusharina & Alexander E. Hramov, 2023. "Analysis of Publication Activity and Research Trends in the Field of AI Medical Applications: Network Approach," IJERPH, MDPI, vol. 20(7), pages 1-17, March.
    18. Tokuda, Eric K. & Comin, Cesar H. & Costa, Luciano da F., 2022. "Revisiting agglomerative clustering," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 585(C).
    19. Gurzki, Hannes & Woisetschläger, David M., 2017. "Mapping the luxury research landscape: A bibliometric citation analysis," Journal of Business Research, Elsevier, vol. 77(C), pages 147-166.
    20. Zhong, Sheng & Verspagen, Bart, 2016. "The role of technological trajectories in catching-up-based development: An application to energy efficiency technologies," MERIT Working Papers 2016-013, United Nations University - Maastricht Economic and Social Research Institute on Innovation and Technology (MERIT).

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0159161. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.