Exact clustering in tensor block model: Statistical optimality and computational limit

Author

Listed:
  • Rungang Han
  • Yuetian Luo
  • Miaoyan Wang
  • Anru R. Zhang

Abstract

High‐order clustering aims to identify heterogeneous substructures in multiway datasets that arise commonly in neuroimaging, genomics, social network studies, etc. The non‐convex and discontinuous nature of this problem poses significant challenges in both statistics and computation. In this paper, we propose a tensor block model and two computationally efficient methods, the high‐order Lloyd algorithm (HLloyd) and high‐order spectral clustering (HSC), for high‐order clustering. Convergence guarantees and statistical optimality are established for the proposed procedures under a mild sub‐Gaussian noise assumption. Under the Gaussian tensor block model, we completely characterise the statistical‐computational trade‐off for achieving high‐order exact clustering based on three different signal‐to‐noise ratio regimes. The analysis relies on new techniques of high‐order spectral perturbation analysis and a ‘singular‐value‐gap‐free’ error bound in tensor estimation, which are substantially different from the matrix spectral analyses in the literature. Finally, we show the merits of the proposed procedures via extensive experiments on both synthetic and real datasets.
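
As a rough illustration of the kind of model and iteration the abstract refers to, the sketch below simulates an order-3 tensor whose entries are block means plus noise, then runs a Lloyd-style update that alternates between averaging within blocks and reassigning labels along each mode. This is a minimal sketch under stated assumptions, not the authors' HLloyd or HSC implementation: the function names (`simulate_tbm`, `block_means`, `averaging_weights`, `lloyd_step`), the parallel update of all modes, the hand-perturbed starting labels (used here in place of a spectral initialisation), and the toy sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_tbm(core, labels, noise_sd, rng):
    """Draw Y[i, j, k] = core[z1[i], z2[j], z3[k]] + Gaussian noise."""
    signal = core[np.ix_(*labels)]
    return signal + noise_sd * rng.standard_normal(signal.shape)


def block_means(Y, labels, ranks):
    """Average the entries of Y inside each block defined by the labels."""
    sums = np.zeros(ranks)
    counts = np.zeros(ranks)
    idx = np.ix_(*labels)
    np.add.at(sums, idx, Y)        # accumulate entries block by block
    np.add.at(counts, idx, 1.0)    # number of entries in each block
    return sums / np.maximum(counts, 1.0)  # empty blocks keep mean 0


def averaging_weights(z, r):
    """One-hot membership matrix with columns scaled to average a cluster."""
    M = np.eye(r)[z]               # shape (n, r)
    return M / np.maximum(M.sum(axis=0), 1.0)


def lloyd_step(Y, labels, ranks):
    """One update of all three label vectors from the current labels."""
    means = block_means(Y, labels, ranks)
    W = [averaging_weights(z, r) for z, r in zip(labels, ranks)]
    new_labels = []
    for m in range(3):
        a, b = [k for k in range(3) if k != m]
        Ym = np.moveaxis(Y, m, 0)
        # Average each mode-m slice over the other two modes' clusters.
        agg = np.einsum("ijk,ja,kb->iab", Ym, W[a], W[b])
        centers = np.moveaxis(means, m, 0)
        # Squared distance from every aggregated slice to every center.
        d = ((agg[:, None] - centers[None]) ** 2).sum(axis=(2, 3))
        new_labels.append(d.argmin(axis=1))
    return new_labels


# Toy, noise-free example (all values are assumptions for illustration).
rng = np.random.default_rng(0)
core = np.arange(8, dtype=float).reshape(2, 2, 2)
truth = [np.array([0, 0, 0, 1, 1, 1])] * 3
Y = simulate_tbm(core, truth, noise_sd=0.0, rng=rng)
labels = [np.array([0, 0, 1, 1, 1, 1])] * 3  # one wrong label per mode
for _ in range(3):
    labels = lloyd_step(Y, labels, (2, 2, 2))
```

Averaging each slice over the other modes' current clusters before comparing it with the block means keeps every distance computation at the size of the core rather than the full tensor; whether this matches the paper's procedure in every detail is not claimed here.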

Suggested Citation

  • Rungang Han & Yuetian Luo & Miaoyan Wang & Anru R. Zhang, 2022. "Exact clustering in tensor block model: Statistical optimality and computational limit," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 1666-1698, November.
  • Handle: RePEc:bla:jorssb:v:84:y:2022:i:5:p:1666-1698
    DOI: 10.1111/rssb.12547

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssb.12547
    Download Restriction: no

    References listed on IDEAS

    1. J. Carroll & Jih-Jie Chang, 1970. "Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition," Psychometrika, Springer;The Psychometric Society, vol. 35(3), pages 283-319, September.
    2. Anru Zhang & Rungang Han, 2019. "Optimal Sparse Singular Value Decomposition for High-Dimensional High-Order Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(528), pages 1708-1725, October.
    3. Hua Zhou & Lexin Li & Hongtu Zhu, 2013. "Tensor Regression with Applications in Neuroimaging Data Analysis," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(502), pages 540-552, June.
    4. Carl Eckart & Gale Young, 1936. "The approximation of one matrix by another of lower rank," Psychometrika, Springer;The Psychometric Society, vol. 1(3), pages 211-218, September.

    Citations

    Cited by:

    1. Gong, Tingnan & Zhang, Weiping & Chen, Yu, 2023. "Uncovering block structures in large rectangular matrices," Journal of Multivariate Analysis, Elsevier, vol. 198(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Monica Billio & Roberto Casarin & Matteo Iacopini & Sylvia Kaufmann, 2023. "Bayesian Dynamic Tensor Regression," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 41(2), pages 429-439, April.
    2. Yoshio Takane & Forrest Young & Jan Leeuw, 1977. "Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features," Psychometrika, Springer;The Psychometric Society, vol. 42(1), pages 7-67, March.
    3. Will Wei Sun & Junwei Lu & Han Liu & Guang Cheng, 2017. "Provable sparse tensor decomposition," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(3), pages 899-916, June.
    4. Peter Schönemann, 1970. "On metric multidimensional unfolding," Psychometrika, Springer;The Psychometric Society, vol. 35(3), pages 349-366, September.
    5. Jos Berge & Henk Kiers, 1991. "Some clarifications of the CANDECOMP algorithm applied to INDSCAL," Psychometrika, Springer;The Psychometric Society, vol. 56(2), pages 317-326, June.
    6. Richard Sands & Forrest Young, 1980. "Component models for three-way data: An alternating least squares algorithm with optimal scaling features," Psychometrika, Springer;The Psychometric Society, vol. 45(1), pages 39-67, March.
    7. Paolo Giordani & Roberto Rocci & Giuseppe Bove, 2020. "Factor Uniqueness of the Structural Parafac Model," Psychometrika, Springer;The Psychometric Society, vol. 85(3), pages 555-574, September.
    8. Alwin Stegeman & Tam Lam, 2014. "Three-Mode Factor Analysis by Means of Candecomp/Parafac," Psychometrika, Springer;The Psychometric Society, vol. 79(3), pages 426-443, July.
    9. Köhn, Hans-Friedrich, 2010. "Representation of individual differences in rectangular proximity data through anti-Q matrix decomposition," Computational Statistics & Data Analysis, Elsevier, vol. 54(10), pages 2343-2357, October.
    10. Minghui Ding & Yimin Wei & Pengpeng Xie, 2023. "A Randomized Singular Value Decomposition for Third-Order Oriented Tensors," Journal of Optimization Theory and Applications, Springer, vol. 197(1), pages 358-382, April.
    11. Philip T. Reiss & Jeff Goldsmith & Han Lin Shang & R. Todd Ogden, 2017. "Methods for Scalar-on-Function Regression," International Statistical Review, International Statistical Institute, vol. 85(2), pages 228-249, August.
    12. Vivek F. Farias & Andrew A. L, 2019. "Learning Preferences with Side Information," Management Science, INFORMS, vol. 65(7), pages 3131-3149, July.
    13. Alwin Stegeman, 2018. "Simultaneous Component Analysis by Means of Tucker3," Psychometrika, Springer;The Psychometric Society, vol. 83(1), pages 21-47, March.
    14. John C. Gower & Niël J. Le Roux & Sugnet Gardner-Lubbe, 2022. "Properties of individual differences scaling and its interpretation," Statistical Papers, Springer, vol. 63(4), pages 1221-1245, August.
    15. Schoonees, P.C. & Groenen, P.J.F. & van de Velden, M., 2015. "Least-squares Bilinear Clustering of Three-way Data," Econometric Institute Research Papers EI2014-23, Erasmus University Rotterdam, Erasmus School of Economics (ESE), Econometric Institute.
    16. Pieter C. Schoonees & Patrick J. F. Groenen & Michel Velden, 2022. "Least-squares bilinear clustering of three-way data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(4), pages 1001-1037, December.
    17. Mariela González-Narváez & María José Fernández-Gómez & Susana Mendes & José-Luis Molina & Omar Ruiz-Barzola & Purificación Galindo-Villardón, 2021. "Study of Temporal Variations in Species–Environment Association through an Innovative Multivariate Method: MixSTATICO," Sustainability, MDPI, vol. 13(11), pages 1-25, May.
    18. Lin Liu, 2021. "Matrix‐based introduction to multivariate data analysis, by Kohei Adachi 2nd edition. Singapore: Springer Nature, 2020. pp. 457," Biometrics, The International Biometric Society, vol. 77(4), pages 1498-1500, December.
    19. Sewell, Daniel K., 2018. "Visualizing data through curvilinear representations of matrices," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 255-270.
    20. Kohei Adachi & Nickolay T. Trendafilov, 2016. "Sparse principal component analysis subject to prespecified cardinality of loadings," Computational Statistics, Springer, vol. 31(4), pages 1403-1427, December.
