Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i7p1091-d1370085.html

Simultaneous Bayesian Clustering and Model Selection with Mixture of Robust Factor Analyzers

Authors

Listed:
  • Shan Feng

    (School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an 710129, China;
    College of Statistics, Xi’an University of Finance and Economics, Xi’an 710100, China)

  • Wenxian Xie

    (School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an 710129, China)

  • Yufeng Nie

    (School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an 710129, China)

Abstract

Finite Gaussian mixture models are powerful tools for modeling distributions of random phenomena and are widely used for clustering tasks. However, their interpretability and efficiency are often degraded by redundancy and noise, especially on high-dimensional datasets. In this work, we propose a generative graphical model for parsimonious modeling of Gaussian mixtures and robust unsupervised learning. The model assumes that the data are independent and identically distributed draws from a finite mixture of robust factor analyzers, where the features’ salience is adjusted by an active set of latent factors, which permits violations of the local independence assumption. For model inference, we propose a structured variational Bayes framework that performs clustering, model selection and outlier processing simultaneously. The performance of the proposed algorithm is evaluated in experiments on artificial and real-world datasets. Moreover, an application to the high-dimensional machine learning task of handwritten alphabet recognition is presented.
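
To make the generative assumption in the abstract concrete, the following is a minimal sketch of drawing data from a finite mixture of heavy-tailed factor analyzers. It is not the authors' model or inference algorithm: the abstract does not say which robust distribution is used, so the sketch assumes multivariate t components (a common choice for robust factor analyzers) built through the standard Gamma scale-mixture representation. The function name and all parameter names are hypothetical, and the feature-salience mechanism and variational inference are not covered.

```python
import numpy as np


def sample_mixture_t_factor_analyzers(n, weights, means, loadings, noise_vars, dof, rng):
    """Draw n points from a finite mixture of t-distributed factor analyzers.

    Assumption (not from the abstract): robustness is modeled with
    multivariate t components. All names are hypothetical. Shapes:
      weights    (K,)       mixing proportions, summing to 1
      means      (K, p)     component means
      loadings   (K, p, q)  factor loading matrices
      noise_vars (K, p)     diagonal noise variances
      dof        (K,)       degrees of freedom (small values give heavy tails)
    """
    K, p, q = loadings.shape
    # Pick a mixture component for every observation.
    labels = rng.choice(K, size=n, p=weights)
    # Gamma scale-mixture representation of the multivariate t:
    # u ~ Gamma(shape=dof/2, rate=dof/2), and the point is Gaussian given u
    # with its covariance divided by u.
    u = rng.gamma(shape=dof[labels] / 2.0, scale=2.0 / dof[labels])
    scale = 1.0 / np.sqrt(u)[:, None]
    factors = rng.standard_normal((n, q)) * scale
    noise = rng.standard_normal((n, p)) * np.sqrt(noise_vars[labels]) * scale
    # Observation = component mean + loadings @ latent factors + noise.
    x = means[labels] + np.einsum("npq,nq->np", loadings[labels], factors) + noise
    return x, labels
```

Calling the function with, for example, two components, five features and two latent factors returns an array of shape (n, 5) and the component label of each row; lowering `dof` produces more extreme points, which is the kind of outlier behavior a robust mixture is meant to absorb.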

Suggested Citation

  • Shan Feng & Wenxian Xie & Yufeng Nie, 2024. "Simultaneous Bayesian Clustering and Model Selection with Mixture of Robust Factor Analyzers," Mathematics, MDPI, vol. 12(7), pages 1-23, April.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:7:p:1091-:d:1370085

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/7/1091/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/7/1091/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Paul D. McNicholas, 2016. "Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 331-373, October.
    2. Murray, Paula M. & Browne, Ryan P. & McNicholas, Paul D., 2017. "A mixture of SDB skew-t factor analyzers," Econometrics and Statistics, Elsevier, vol. 3(C), pages 160-168.
    3. Cristina Tortora & Paul D. McNicholas & Ryan P. Browne, 2016. "A mixture of generalized hyperbolic factor analyzers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 423-440, December.
    4. Hauzenberger, Niko & Huber, Florian & Klieber, Karin & Marcellino, Massimiliano, 2025. "Bayesian neural networks for macroeconomic analysis," Journal of Econometrics, Elsevier, vol. 249(PC).
    5. Miao He & Yanhong Guo, 2022. "Systemic Risk Contributions of Financial Institutions during the Stock Market Crash in China," Sustainability, MDPI, vol. 14(9), pages 1-14, April.
    6. Faicel Chamroukhi, 2016. "Piecewise Regression Mixture for Simultaneous Functional Data Clustering and Optimal Segmentation," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 374-411, October.
    7. Conti, Gabriella & Frühwirth-Schnatter, Sylvia & Heckman, James J. & Piatek, Rémi, 2014. "Bayesian exploratory factor analysis," Journal of Econometrics, Elsevier, vol. 183(1), pages 31-57.
    8. Niko Hauzenberger & Maximilian Bock & Michael Pfarrhofer & Anna Stelzer & Gregor Zens, 2018. "Implications of macroeconomic volatility in the Euro area," Papers 1801.02925, arXiv.org, revised Jun 2018.
    9. Chaofeng Yuan & Wensheng Zhu & Xuming He & Jianhua Guo, 2019. "A mixture factor model with applications to microarray data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 60-76, March.
    10. Matthew W. Wheeler, 2019. "Bayesian additive adaptive basis tensor product models for modeling high dimensional surfaces: an application to high‐throughput toxicity testing," Biometrics, The International Biometric Society, vol. 75(1), pages 193-201, March.
    11. Benati, S. & Conde, E., 2022. "A relative robust approach on expected returns with bounded CVaR for portfolio selection," European Journal of Operational Research, Elsevier, vol. 296(1), pages 332-352.
    12. Joshua C. C. Chan, 2024. "BVARs and stochastic volatility," Chapters, in: Michael P. Clements & Ana Beatriz Galvão (ed.), Handbook of Research Methods and Applications in Macroeconomic Forecasting, chapter 3, pages 43-67, Edward Elgar Publishing.
    13. Durante, Daniele, 2017. "A note on the multiplicative gamma process," Statistics & Probability Letters, Elsevier, vol. 122(C), pages 198-204.
    14. Lin, Tsung-I & McNicholas, Paul D. & Ho, Hsiu J., 2014. "Capturing patterns via parsimonious t mixture models," Statistics & Probability Letters, Elsevier, vol. 88(C), pages 80-87.
    15. Yuan Liao & Xinjie Ma & Andreas Neuhierl & Zhentao Shi, 2023. "Economic Forecasts Using Many Noises," Papers 2312.05593, arXiv.org, revised Dec 2023.
    16. Loperfido, Nicola, 2018. "Skewness-based projection pursuit: A computational approach," Computational Statistics & Data Analysis, Elsevier, vol. 120(C), pages 42-57.
    17. Zeyu Wu & Cheng Wang & Weidong Liu, 2023. "A unified precision matrix estimation framework via sparse column-wise inverse operator under weak sparsity," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 75(4), pages 619-648, August.
    18. Kenneth D Harris & Hannah Hochgerner & Nathan G Skene & Lorenza Magno & Linda Katona & Carolina Bengtsson Gonzales & Peter Somogyi & Nicoletta Kessaris & Sten Linnarsson & Jens Hjerling-Leffler, 2018. "Classes and continua of hippocampal CA1 inhibitory neurons revealed by single-cell transcriptomics," PLOS Biology, Public Library of Science, vol. 16(6), pages 1-37, June.
    19. Tao Sun, 2024. "Bundle Choice Model with Endogenous Regressors: An Application to Soda Tax," Papers 2412.05794, arXiv.org.
    20. Simon Beyeler & Sylvia Kaufmann, 2021. "Reduced‐form factor augmented VAR—Exploiting sparsity to include meaningful factors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(7), pages 989-1012, November.

