Printed from https://ideas.repec.org/a/eee/csdana/v124y2018icp220-234.html

Overfitting Bayesian mixtures of factor analyzers with an unknown number of components

Author

Listed:
  • Papastamoulis, Panagiotis

Abstract

Recent advances in overfitting Bayesian mixture models provide a solid and straightforward approach for inferring the underlying number of clusters and model parameters in heterogeneous datasets. The applicability of such a framework to clustering correlated high-dimensional data is demonstrated. For this purpose an overfitting mixture of factor analyzers is introduced, assuming that the number of factors is fixed. A Markov chain Monte Carlo (MCMC) sampler combined with a prior parallel tempering scheme is used to estimate the posterior distribution of model parameters. The optimal number of factors is estimated using information criteria. Identifiability issues related to the label switching problem are dealt with by post-processing the simulated MCMC sample with relabeling algorithms. The method is benchmarked against state-of-the-art software for maximum likelihood estimation of mixtures of factor analyzers using an extensive simulation study. Finally, the applicability of the method is illustrated on publicly available data.
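
As a rough illustration of the model class named in the abstract, and not the paper's implementation, the sketch below simulates data from a standard mixture of factor analyzers, where component k generates x = mu_k + Lambda_k y + e with standard normal factors y and diagonal noise e. It also shows one common way an overfitted mixture is summarized: the number of clusters is taken as the most frequent number of non-empty components across MCMC allocation draws. That summary rule is an assumption drawn from the general overfitting-mixture literature, and the function names `simulate_mfa` and `modal_nonempty_count` are hypothetical.

```python
import numpy as np
from collections import Counter


def simulate_mfa(n, weights, means, loadings, noise_vars, rng):
    """Draw n observations from a mixture of factor analyzers.

    weights:    (K,)      mixing proportions, summing to 1
    means:      (K, p)    component means
    loadings:   (K, p, q) factor loading matrices
    noise_vars: (K, p)    diagonal idiosyncratic variances
    """
    K = len(weights)
    p, q = loadings[0].shape
    z = rng.choice(K, size=n, p=weights)   # latent component labels
    y = rng.standard_normal((n, q))        # latent factors, N(0, I_q)
    # Idiosyncratic noise with component-specific diagonal variance.
    e = rng.standard_normal((n, p)) * np.sqrt(noise_vars[z])
    # x_i = mu_{z_i} + Lambda_{z_i} y_i + e_i for every observation i.
    x = means[z] + np.einsum("npq,nq->np", loadings[z], y) + e
    return x, z


def modal_nonempty_count(allocation_draws):
    """Most frequent number of non-empty components across MCMC draws.

    Assumed summary rule (not confirmed by the abstract): each draw is a
    vector of component labels, and a component is non-empty if at least
    one observation is allocated to it.
    """
    counts = [len(np.unique(z)) for z in allocation_draws]
    return Counter(counts).most_common(1)[0][0]
```

In an overfitted model the sampler is run with more components than expected clusters, so most draws leave the surplus components empty; `modal_nonempty_count` would then be applied to the stored allocation vectors after burn-in.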

Suggested Citation

  • Papastamoulis, Panagiotis, 2018. "Overfitting Bayesian mixtures of factor analyzers with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 220-234.
  • Handle: RePEc:eee:csdana:v:124:y:2018:i:c:p:220-234
    DOI: 10.1016/j.csda.2018.03.007

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947318300550
    Download Restriction: Full text for ScienceDirect subscribers only.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Briana J. K. Stephenson & Amy H. Herring & Andrew F. Olshan, 2022. "Derivation of maternal dietary patterns accounting for regional heterogeneity," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1957-1977, November.
    2. Wan-Lun Wang & Tsung-I Lin, 2020. "Automated learning of mixtures of factor analysis models with missing information," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(4), pages 1098-1124, December.
    3. Roy Costilla & Ivy Liu & Richard Arnold & Daniel Fernández, 2019. "Bayesian model-based clustering for longitudinal ordinal data," Computational Statistics, Springer, vol. 34(3), pages 1015-1038, September.
    4. Kai Yang & Qingqing Zhang & Xinyang Yu & Xiaogang Dong, 2023. "Bayesian inference for a mixture double autoregressive model," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 77(2), pages 188-207, May.
    5. Kelvyn Jones & David Manley & Ron Johnston & Dewi Owen, 2018. "Modelling residential segregation as unevenness and clustering: A multilevel modelling approach incorporating spatial dependence and tackling the MAUP," Environment and Planning B, vol. 45(6), pages 1122-1141, November.
    6. Park, Byung-Jung & Zhang, Yunlong & Lord, Dominique, 2010. "Bayesian mixture modeling approach to account for heterogeneity in speed data," Transportation Research Part B: Methodological, Elsevier, vol. 44(5), pages 662-673, June.
    7. Montanari, Angela & Viroli, Cinzia, 2011. "Maximum likelihood estimation of mixtures of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2712-2723, September.
    8. Royce Anders & William Batchelder, 2015. "Cultural Consensus Theory for the Ordinal Data Case," Psychometrika, Springer;The Psychometric Society, vol. 80(1), pages 151-181, March.
    9. Shuhui Guo & Lihua Xiong & Jie Chen & Shenglian Guo & Jun Xia & Ling Zeng & Chong-Yu Xu, 2023. "Nonstationary Regional Flood Frequency Analysis Based on the Bayesian Method," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 37(2), pages 659-681, January.
    10. Lu, Xiaosun & Huang, Yangxin & Zhu, Yiliang, 2016. "Finite mixture of nonlinear mixed-effects joint models in the presence of missing and mismeasured covariate, with application to AIDS studies," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 119-130.
    11. Lubrano, Michel & Ndoye, Abdoul Aziz Junior, 2016. "Income inequality decomposition using a finite mixture of log-normal distributions: A Bayesian approach," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 830-846.
    12. Simon Beyeler & Sylvia Kaufmann, 2021. "Reduced‐form factor augmented VAR—Exploiting sparsity to include meaningful factors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(7), pages 989-1012, November.
    13. Yuan Fang & Dimitris Karlis & Sanjeena Subedi, 2022. "Infinite Mixtures of Multivariate Normal-Inverse Gaussian Distributions for Clustering of Skewed Data," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 510-552, November.
    14. Kazuhiko Kakamu, 2022. "Bayesian analysis of mixtures of lognormal distribution with an unknown number of components from grouped data," Papers 2210.05115, arXiv.org, revised Sep 2023.
    15. Terrance Savitsky & Daniel McCaffrey, 2014. "Bayesian Hierarchical Multivariate Formulation with Factor Analysis for Nested Ordinal Data," Psychometrika, Springer;The Psychometric Society, vol. 79(2), pages 275-302, April.
    16. Rufo, M.J. & Martín, J. & Pérez, C.J., 2010. "New approaches to compute Bayes factor in finite mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3324-3335, December.
    17. Emanuele Gramuglia & Geir Storvik & Morten Stakkeland, 2021. "Clustering and automatic labelling within time series of categorical observations—with an application to marine log messages," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(3), pages 714-732, June.
    18. Hazelton, Martin L. & Parry, Katharina, 2016. "Statistical methods for comparison of day-to-day traffic models," Transportation Research Part B: Methodological, Elsevier, vol. 92(PA), pages 22-34.
    19. Muhammed Semakula & François Niragire & Christel Faes, 2020. "Bayesian spatio-temporal modeling of malaria risk in Rwanda," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-16, September.
    20. Fabian Krüger & Sebastian Lerch & Thordis Thorarinsdottir & Tilmann Gneiting, 2021. "Predictive Inference Based on Markov Chain Monte Carlo Output," International Statistical Review, International Statistical Institute, vol. 89(2), pages 274-301, August.
