
Cluster Forests

Author

Listed:
  • Yan, Donghui
  • Chen, Aiyou
  • Jordan, Michael I.

Abstract

With inspiration from Random Forests (RF) in the context of classification, a new clustering ensemble method, Cluster Forests (CF), is proposed. Geometrically, CF randomly probes a high-dimensional data cloud to obtain “good local clusterings” and then aggregates them via spectral clustering to obtain cluster assignments for the whole dataset. The search for good local clusterings is guided by a cluster quality measure kappa. CF progressively improves each local clustering in a fashion that resembles tree growth in RF. Empirical studies on several real-world datasets under two different performance metrics show that CF compares favorably to its competitors. Theoretical analysis reveals that the kappa measure makes it possible to grow the local clustering in a desirable way: it is “noise-resistant”. A closed-form expression is obtained for the mis-clustering rate of spectral clustering under a perturbation model, which yields new insights into some aspects of spectral clustering.
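
The pipeline outlined in the abstract (grow local clusterings on randomly probed features, then aggregate with spectral clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not define kappa, so the within-to-between sum-of-squares ratio below is an assumed stand-in, and the function names (`kmeans`, `kappa`, `grow_local_clustering`, `cluster_forest`), the use of k-means as the base clusterer, the co-membership affinity, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one integer label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every point to every centre.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def kappa(X, labels):
    """Assumed quality score: within / between sum of squares (lower is better)."""
    overall = X.mean(axis=0)
    within = between = 0.0
    for j in np.unique(labels):
        pts = X[labels == j]
        center = pts.mean(axis=0)
        within += ((pts - center) ** 2).sum()
        between += len(pts) * ((center - overall) ** 2).sum()
    return within / max(between, 1e-12)


def grow_local_clustering(X, k, rng, n_tries=10):
    """Start from one random feature; keep a candidate feature only if kappa improves."""
    n_features = X.shape[1]
    feats = [int(rng.integers(n_features))]
    labels = kmeans(X[:, feats], k, rng)
    best = kappa(X[:, feats], labels)
    for _ in range(n_tries):
        cand = int(rng.integers(n_features))
        if cand in feats:
            continue
        trial = feats + [cand]
        trial_labels = kmeans(X[:, trial], k, rng)
        score = kappa(X[:, trial], trial_labels)
        if score < best:
            feats, labels, best = trial, trial_labels, score
    return labels


def cluster_forest(X, k, n_trees=20, seed=0):
    """Aggregate local clusterings into an affinity matrix, then cluster it spectrally."""
    rng = np.random.default_rng(seed)
    n = len(X)
    affinity = np.zeros((n, n))
    for _ in range(n_trees):
        labels = grow_local_clustering(X, k, rng)
        # Entry (i, j) counts how often points i and j share a local cluster.
        affinity += (labels[:, None] == labels[None, :])
    affinity /= n_trees
    # Symmetrically normalised affinity; degrees are positive (diagonal is 1).
    deg = affinity.sum(axis=1)
    normalized = affinity / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(normalized)  # eigenvalues in ascending order
    embedding = vecs[:, -k:]              # top-k eigenvectors
    row_norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    embedding = embedding / np.maximum(row_norms, 1e-12)
    return kmeans(embedding, k, rng)


# Toy data: two well-separated groups of 40 points in 5 dimensions.
data_rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 40)
X = data_rng.normal(scale=0.5, size=(80, 5)) + 10.0 * truth[:, None]
labels = cluster_forest(X, k=2)
```

Cluster labels are arbitrary, so any comparison of `labels` against `truth` should allow for the two labels being swapped. The greedy accept-if-kappa-improves loop is only one possible reading of "progressively improves each local clustering"; the paper itself should be consulted for the actual growth rule.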

Suggested Citation

  • Yan, Donghui & Chen, Aiyou & Jordan, Michael I., 2013. "Cluster Forests," Computational Statistics & Data Analysis, Elsevier, vol. 66(C), pages 178-192.
  • Handle: RePEc:eee:csdana:v:66:y:2013:i:c:p:178-192
    DOI: 10.1016/j.csda.2013.04.010

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947313001400
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Nowicki K. & Snijders T. A. B., 2001. "Estimation and Prediction for Stochastic Blockstructures," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1077-1087, September.

    Citations

    Cited by:

    1. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yunpeng Zhao & Qing Pan & Chengan Du, 2019. "Logistic regression augmented community detection for network data with application in identifying autism‐related gene pathways," Biometrics, The International Biometric Society, vol. 75(1), pages 222-234, March.
    2. Falk Bräuning & Siem Jan Koopman, 2016. "The dynamic factor network model with an application to global credit risk," Working Papers 16-13, Federal Reserve Bank of Boston.
    3. Leto Peel & Tiago P. Peixoto & Manlio De Domenico, 2022. "Statistical inference links data and theory in network science," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    4. Tom Britton, 2020. "Epidemic models on social networks—With inference," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 74(3), pages 222-241, August.
    5. Cristiano Varin & Manuela Cattelan & David Firth, 2016. "Statistical modelling of citation exchange between statistics journals," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 179(1), pages 1-63, January.
    6. Xiu Xu & Weining Wang & Yongcheol Shin & Chaowen Zheng, 2021. "Dynamic Network Quantile Regression Model," Papers 2111.07633, arXiv.org.
    7. Xu, Xiu & Wang, Weining & Shin, Yongcheol, 2020. "Dynamic Spatial Network Quantile Autoregression," IRTG 1792 Discussion Papers 2020-024, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    8. Arora, Saurabh & Sanditov, Bulat, 2009. "Caste as Community? Networks of social affinity in a South Indian village," MERIT Working Papers 2009-037, United Nations University - Maastricht Economic and Social Research Institute on Innovation and Technology (MERIT).
    9. Irene Crimaldi & Michela Del Vicario & Greg Morrison & Walter Quattrociocchi & Massimo Riccaboni, 2015. "Homophily and Triadic Closure in Evolving Social Networks," Working Papers 3/2015, IMT School for Advanced Studies Lucca, revised May 2015.
    10. Teague R. Henry & Kathleen M. Gates & Mitchell J. Prinstein & Douglas Steinley, 2020. "Modeling Heterogeneous Peer Assortment Effects Using Finite Mixture Exponential Random Graph Models," Psychometrika, Springer;The Psychometric Society, vol. 85(1), pages 8-34, March.
    11. Li Guo & Wolfgang Karl Hardle & Yubo Tao, 2018. "A Time-Varying Network for Cryptocurrencies," Papers 1802.03708, arXiv.org, revised Nov 2022.
    12. Prasenjit Ghosh & Debdeep Pati & Anirban Bhattacharya, 2020. "Posterior Contraction Rates for Stochastic Block Models," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 82(2), pages 448-476, August.
    13. Saint‐Clair Chabert‐Liddell & Pierre Barbillon & Sophie Donnet, 2022. "Impact of the mesoscale structure of a bipartite ecological interaction network on its robustness through a probabilistic modeling," Environmetrics, John Wiley & Sons, Ltd., vol. 33(2), March.
    14. Boyuan Zhang, 2022. "Incorporating Prior Knowledge of Latent Group Structure in Panel Data Models," Papers 2211.16714, arXiv.org, revised Oct 2023.
    15. Tracy M. Sweet, 2015. "Incorporating Covariates Into Stochastic Blockmodels," Journal of Educational and Behavioral Statistics, vol. 40(6), pages 635-664, December.
    16. Hledik, Juraj & Rastelli, Riccardo, 2020. "A dynamic network model to measure exposure diversification in the Austrian interbank market," ESRB Working Paper Series 109, European Systemic Risk Board.
    17. Dragana M. Pavlović & Bryan R.L. Guillaume & Soroosh Afyouni & Thomas E. Nichols, 2020. "Multi‐subject stochastic blockmodels with mixed effects for adaptive analysis of individual differences in human brain network cluster structure," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 74(3), pages 363-396, August.
    18. ter Braak, Cajo J.F. & Kourmpetis, Yiannis & Kiers, Henk A.L. & Bink, Marco C.A.M., 2009. "Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3183-3193, June.
    19. Michael Brusco & Douglas Steinley, 2011. "A Tabu-Search Heuristic for Deterministic Two-Mode Blockmodeling of Binary Network Matrices," Psychometrika, Springer;The Psychometric Society, vol. 76(4), pages 612-633, October.
    20. Michael Schweinberger, 2020. "Statistical inference for continuous‐time Markov processes with block structure based on discrete‐time network data," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 74(3), pages 342-362, August.
