FastEnsemble: Scalable ensemble clustering on large networks

Authors

  • Yasamin Tabatabaee
  • Eleanor Wedell
  • Minhyuk Park
  • Tandy Warnow

Abstract

Many community detection algorithms are inherently stochastic, leading to variations in their output depending on input parameters and random seeds. This variability makes the results of a single run of these algorithms less reliable. Moreover, different clustering algorithms, optimization criteria (e.g., modularity and the Constant Potts model), and resolution values can result in substantially different partitions on the same network. Consensus clustering methods, such as Ensemble Clustering for Graphs (ECG) and FastConsensus, have been proposed to reduce the instability of non-deterministic algorithms and improve their accuracy by combining a set of partitions resulting from multiple runs of a clustering algorithm. In Complex Networks and their Applications 2024, we introduced FastEnsemble, a new consensus clustering method; here we present a more extensive evaluation of this method. Our results on both real-world and synthetic networks show that FastEnsemble produces more accurate clusterings than two other consensus clustering methods, ECG and FastConsensus, for many model conditions. Furthermore, FastEnsemble is fast enough to be used on networks with more than 3 million nodes, and so improves on the speed and scalability of FastConsensus. Finally, we showcase the utility of consensus clustering methods in mitigating the effect of the resolution limit and in clustering networks that are only partially covered by communities.

Author summary

Consensus (ensemble) clustering methods, such as FastConsensus and Ensemble Clustering for Graphs (ECG), combine partitions from multiple runs of the same clustering algorithm to improve the stability and accuracy of the output partition. In this study, we present a new ensemble clustering method, FastEnsemble, and show that it provides improved accuracy under many conditions compared to FastConsensus and ECG.
We show results using FastEnsemble with Leiden (optimizing either modularity or the Constant Potts model, CPM) and with the Louvain algorithm. We show that FastEnsemble and other consensus clustering methods can reduce the effect of the resolution limit for both modularity and CPM optimization. Finally, we demonstrate that consensus clustering methods can improve community detection relative to Leiden modularity optimization on networks with both clusterable and unclusterable regions.
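The abstract describes consensus clustering only as combining partitions from multiple runs of a clustering algorithm. The Python sketch below illustrates one common way to do this, co-classification weighting: each edge is weighted by the fraction of input partitions that place its two endpoints in the same cluster, and edges below an agreement threshold are discarded. It is a generic illustration under stated assumptions, not the FastEnsemble implementation: the function names, the `threshold` parameter, the toy partitions, and the final connected-components step (used here in place of re-running a clustering algorithm such as Leiden or Louvain on the weighted network) are all hypothetical.

```python
from collections import defaultdict


def co_clustering_weights(edges, partitions):
    """Weight each edge (u, v) by the fraction of partitions that put u and v
    in the same cluster. Each partition is a dict mapping node -> cluster label."""
    weights = {}
    for u, v in edges:
        together = sum(1 for p in partitions if p[u] == p[v])
        weights[(u, v)] = together / len(partitions)
    return weights


def consensus_clusters(nodes, edges, partitions, threshold):
    """Hypothetical consensus step: keep edges whose co-clustering weight is at
    least `threshold`, then report connected components of the kept edges.
    (A simplified stand-in for re-clustering the weighted network.)"""
    weights = co_clustering_weights(edges, partitions)
    parent = {n: n for n in nodes}  # union-find forest over the nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), w in weights.items():
        if w >= threshold:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    # Sort for a deterministic, readable result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy input (hypothetical): two triangles joined by the bridge edge (2, 3),
# and three partitions as if produced by three runs with different seeds.
NODES = [0, 1, 2, 3, 4, 5]
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
PARTITIONS = [
    {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"},
    {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "b"},  # node 2 moved
    {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"},
]

if __name__ == "__main__":
    print(consensus_clusters(NODES, EDGES, PARTITIONS, threshold=0.5))
    print(consensus_clusters(NODES, EDGES, PARTITIONS, threshold=0.8))
```

In this toy example, a threshold of 0.5 recovers the two triangles, [[0, 1, 2], [3, 4, 5]], while a threshold of 0.8 leaves node 2, whose assignment varied across runs, as a singleton: [[0, 1], [2], [3, 4, 5]]. This shows, in miniature, how requiring stronger agreement across runs trades coverage for stability.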

Suggested Citation

  • Yasamin Tabatabaee & Eleanor Wedell & Minhyuk Park & Tandy Warnow, 2025. "FastEnsemble: Scalable ensemble clustering on large networks," PLOS Complex Systems, Public Library of Science, vol. 2(10), pages 1-29, October.
  • Handle: RePEc:plo:pcsy00:0000069
    DOI: 10.1371/journal.pcsy.0000069

    Download full text from publisher

    File URL: https://journals.plos.org/complexsystems/article?id=10.1371/journal.pcsy.0000069
    Download Restriction: no

    File URL: https://journals.plos.org/complexsystems/article/file?id=10.1371/journal.pcsy.0000069&type=printable
    Download Restriction: no
