Covariate-assisted spectral clustering

Author

Listed:
  • N. Binkiewicz
  • J. T. Vogelstein
  • K. Rohe

Abstract

Biological and social systems consist of myriad interacting units. The interactions can be represented in the form of a graph or network. Measurements of these graphs can reveal the underlying structure of these interactions, which provides insight into the systems that generated the graphs. Moreover, in applications such as connectomics, social networks, and genomics, graph data are accompanied by contextualizing measures on each node. We utilize these node covariates to help uncover latent communities in a graph, using a modification of spectral clustering. Statistical guarantees are provided under a joint mixture model that we call the node-contextualized stochastic blockmodel, including a bound on the misclustering rate. The bound is used to derive conditions for achieving perfect clustering. For most simulated cases, covariate-assisted spectral clustering yields results superior both to regularized spectral clustering without node covariates and to an adaptation of canonical correlation analysis. We apply our clustering method to large brain graphs derived from diffusion MRI data, using the node locations or neurological region membership as covariates. In both cases, covariate-assisted spectral clustering yields clusters that are easier to interpret neurologically.
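
The abstract does not spell out the algorithm. As a minimal sketch of one way node covariates can be folded into spectral clustering, the Python below takes the leading eigenvectors of L_tau L_tau + alpha X X^T, where L_tau is a degree-regularized graph Laplacian and X is the node covariate matrix, and then runs k-means on the rows of the eigenvector matrix. The choice of similarity matrix, the default tau (average degree), the tuning weight alpha, the hand-rolled k-means, and the toy data are all assumptions made for illustration and are not taken from this record.

```python
import numpy as np


def regularized_laplacian(A, tau=None):
    """Return D_tau^{-1/2} A D_tau^{-1/2}, where D_tau = D + tau * I."""
    deg = A.sum(axis=1)
    if tau is None:
        tau = deg.mean()  # assumption: average degree as the regularizer
    d = 1.0 / np.sqrt(deg + tau)
    return A * d[:, None] * d[None, :]


def kmeans(U, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-point initialization (illustrative helper)."""
    rng = np.random.default_rng(seed)
    centers = [U[rng.integers(len(U))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center.
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def casc(A, X, k, alpha, tau=None):
    """Cluster nodes from adjacency A (n x n) and covariates X (n x p)."""
    L = regularized_laplacian(A, tau)
    S = L @ L + alpha * (X @ X.T)   # graph term plus weighted covariate term
    _, vecs = np.linalg.eigh(S)     # eigenvalues come back in ascending order
    U = vecs[:, -k:]                # keep the k leading eigenvectors
    return kmeans(U, k)


# Toy data (hypothetical): two 10-node cliques joined by two cross edges,
# with a one-hot covariate that matches the planted blocks.
n, k = 20, 2
z = np.repeat([0, 1], n // 2)
A = (z[:, None] == z[None, :]).astype(float)
np.fill_diagonal(A, 0.0)
for i, j in [(0, 10), (1, 11)]:
    A[i, j] = A[j, i] = 1.0
X = np.eye(k)[z]

labels = casc(A, X, k=k, alpha=0.1)
print(labels)
```

Setting alpha to 0 reduces the sketch to spectral clustering on the regularized graph alone, while a very large alpha lets the covariates dominate, so in practice alpha would need to be tuned to balance the two sources of information.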

Suggested Citation

  • N. Binkiewicz & J. T. Vogelstein & K. Rohe, 2017. "Covariate-assisted spectral clustering," Biometrika, Biometrika Trust, vol. 104(2), pages 361-377.
  • Handle: RePEc:oup:biomet:v:104:y:2017:i:2:p:361-377.

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1093/biomet/asx008
    Download Restriction: Access to full text is restricted to subscribers.

    Citations

    Cited by:

    1. Fengqin Tang & Xuejing Zhao & Cuixia Li, 2023. "Community Detection in Multilayer Networks Based on Matrix Factorization and Spectral Embedding Method," Mathematics, MDPI, vol. 11(7), pages 1-19, March.
    2. Heather Mathews & Alexander Volfovsky, 2023. "Community informed experimental design," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(4), pages 1141-1166, October.
    3. Junlong Zhao & Xiumin Liu & Hansheng Wang & Chenlei Leng, 2022. "Dimension reduction for covariates in network data," Biometrika, Biometrika Trust, vol. 109(1), pages 85-102.
    4. Fengqin Tang & Chunning Wang & Jinxia Su & Yuanyuan Wang, 2020. "Spectral clustering-based community detection using graph distance and node attributes," Computational Statistics, Springer, vol. 35(1), pages 69-94, March.
    5. Guo, Li & Tao, Yubo & Härdle, Wolfgang Karl, 2019. "Dynamic Network Perspective of Cryptocurrencies," IRTG 1792 Discussion Papers 2019-009, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    6. Qiuping Wang & Yuan Zhang & Ting Yan, 2023. "Asymptotic theory in network models with covariates and a growing number of node parameters," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 75(2), pages 369-392, April.
    7. Ma, Shujie & Su, Liangjun & Zhang, Yichong, 2020. "Detecting Latent Communities in Network Formation Models," Economics and Statistics Working Papers 12-2020, Singapore Management University, School of Economics.
    8. Li Guo & Wolfgang Karl Hardle & Yubo Tao, 2018. "A Time-Varying Network for Cryptocurrencies," Papers 1802.03708, arXiv.org, revised Nov 2022.
    9. Junhui Cai & Dan Yang & Wu Zhu & Haipeng Shen & Linda Zhao, 2021. "Network regression and supervised centrality estimation," Papers 2111.12921, arXiv.org.
    10. Lucy L. Gao & Daniela Witten & Jacob Bien, 2022. "Testing for association in multiview network data," Biometrics, The International Biometric Society, vol. 78(3), pages 1018-1030, September.
    11. S Chandna & S C Olhede & P J Wolfe, 2022. "Local linear graphon estimation using covariates," Biometrika, Biometrika Trust, vol. 109(3), pages 721-734.
    12. Babkin, Sergii & Stewart, Jonathan R. & Long, Xiaochen & Schweinberger, Michael, 2020. "Large-scale estimation of random graph models with local dependence," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
