
Model-based clustering for covariance matrices via penalized Wishart mixture models

Authors

  • Cappozzo, Andrea
  • Casa, Alessandro

Abstract

Covariance matrices provide a valuable source of information about complex interactions and dependencies within the data. However, from a clustering perspective, this information has often been underutilized and overlooked. Indeed, commonly adopted distance-based approaches tend to rely primarily on mean levels to characterize and differentiate between groups. Recently, there have been promising efforts to cluster covariance matrices directly, thereby distinguishing groups solely based on the relationships between variables. From a model-based perspective, a probabilistic formalization has been provided by considering a mixture model with component densities following a Wishart distribution. Notwithstanding, this approach faces challenges when dealing with a large number of variables, as the number of parameters to be estimated increases quadratically. To address this issue, a sparse Wishart mixture model is proposed, which assumes that the component scale matrices possess a cluster-dependent degree of sparsity. Model estimation is performed by maximizing a penalized log-likelihood, enforcing a covariance graphical lasso penalty on the component scale matrices. This penalty not only reduces the number of non-zero parameters, mitigating the challenges of high-dimensional settings, but also enhances the interpretability of results by emphasizing the most relevant relationships among variables. The proposed methodology is tested on both simulated and real data, demonstrating its ability to unravel the complexities of neuroimaging data and effectively cluster subjects based on the relational patterns among distinct brain regions.
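
The penalized objective described in the abstract can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function and argument names are hypothetical, a single penalty strength `lam` is assumed to be shared by all components, and only off-diagonal entries of each scale matrix are assumed to be penalized. The abstract states only that a covariance graphical lasso penalty is placed on the component scale matrices. The helper `n_scale_params` shows the quadratic growth mentioned above: a symmetric p x p scale matrix has p(p+1)/2 free entries per component.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import wishart


def n_scale_params(p, K):
    """Free entries in K symmetric p x p scale matrices (grows like p**2)."""
    return K * p * (p + 1) // 2


def penalized_wishart_mixture_loglik(S, weights, dfs, scales, lam):
    """Penalized log-likelihood of a Wishart mixture (illustrative sketch).

    S       : array (n, p, p) of observed positive-definite matrices
    weights : array (K,) of mixing proportions summing to one
    dfs     : array (K,) of degrees of freedom, each at least p
    scales  : array (K, p, p) of component scale matrices
    lam     : non-negative penalty strength (assumed shared across components)
    """
    n, p = S.shape[0], S.shape[1]
    K = len(weights)

    # Log-density of every observed matrix under every Wishart component.
    log_dens = np.empty((n, K))
    for k in range(K):
        for i in range(n):
            log_dens[i, k] = wishart.logpdf(S[i], df=dfs[k], scale=scales[k])

    # Mixture log-likelihood, summed over observations.
    loglik = logsumexp(log_dens + np.log(weights), axis=1).sum()

    # L1 penalty on off-diagonal scale-matrix entries (assumed form).
    off_diag = ~np.eye(p, dtype=bool)
    penalty = lam * sum(np.abs(scales[k][off_diag]).sum() for k in range(K))

    return loglik - penalty
```

A larger `lam` lowers the objective for scale matrices with many non-zero off-diagonal entries, which is what pushes a maximizer toward sparse, cluster-specific scale matrices. Maximizing this objective (for example within an EM-type algorithm) is not shown here.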

Suggested Citation

  • Cappozzo, Andrea & Casa, Alessandro, 2025. "Model-based clustering for covariance matrices via penalized Wishart mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 212(C).
  • Handle: RePEc:eee:csdana:v:212:y:2025:i:c:s0167947325001082
    DOI: 10.1016/j.csda.2025.108232

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947325001082
    Download Restriction: Full text for ScienceDirect subscribers only.




