
Cluster-weighted t-factor analyzers for robust model-based clustering and dimension reduction

Author

Listed:
  • Sanjeena Subedi
  • Antonio Punzo
  • Salvatore Ingrassia
  • Paul McNicholas

Abstract

Cluster-weighted models represent a convenient approach for model-based clustering, especially when the covariates contribute to defining the cluster-structure of the data. However, applicability may be limited when the number of covariates is high and performance may be affected by noise and outliers. To overcome these problems, common/uncommon t-factor analyzers for the covariates, and a t-distribution for the response variable, are here assumed in each mixture component. A family of twenty parsimonious variants of this model is also presented and the alternating expectation-conditional maximization algorithm, for maximum likelihood estimation of the parameters of all models in the family, is described. Artificial and real data show that these models have very good clustering performance and that the algorithm is able to recover the parameters very well. Copyright Springer-Verlag Berlin Heidelberg 2015
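
The component structure described in the abstract can be sketched as a mixture density. The notation below is an assumption made for illustration and is not taken from the paper: G components with weights \pi_g, a univariate response y whose location depends linearly on the d-dimensional covariate vector x (the linear form is itself an assumption), and a factor-analytic scale matrix with loadings \Lambda_g and diagonal noise \Psi_g for the covariates.

```latex
p(\mathbf{x}, y)
  = \sum_{g=1}^{G} \pi_g \,
    t\!\left(y;\ \beta_{0g} + \boldsymbol{\beta}_{1g}^{\top}\mathbf{x},\ \sigma_g^{2},\ \zeta_g\right)
    t_d\!\left(\mathbf{x};\ \boldsymbol{\mu}_g,\
      \boldsymbol{\Lambda}_g \boldsymbol{\Lambda}_g^{\top} + \boldsymbol{\Psi}_g,\ \nu_g\right),
\qquad \pi_g > 0,\quad \sum_{g=1}^{G} \pi_g = 1 .
```

Here t(.; location, scale, degrees of freedom) denotes a univariate t density and t_d its d-variate counterpart. In a sketch of this kind, the heavy tails of both factors give robustness to outliers, while a loading matrix with fewer columns than d gives the dimension reduction. Parsimonious variants would plausibly arise from constraining some of these parameters, though the exact constraints used in the paper are not stated in the abstract.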

Suggested Citation

  • Sanjeena Subedi & Antonio Punzo & Salvatore Ingrassia & Paul McNicholas, 2015. "Cluster-weighted t-factor analyzers for robust model-based clustering and dimension reduction," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(4), pages 623-649, November.
  • Handle: RePEc:spr:stmapp:v:24:y:2015:i:4:p:623-649
    DOI: 10.1007/s10260-015-0298-7

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s10260-015-0298-7
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Sanjeena Subedi & Antonio Punzo & Salvatore Ingrassia & Paul McNicholas, 2013. "Clustering and classification via cluster-weighted factor analyzers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(1), pages 5-40, March.
    2. Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini & Simona Minotti, 2015. "Erratum to: The Generalized Linear Mixed Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 32(2), pages 327-355, July.
    3. Leisch, Friedrich, 2004. "FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 11(i08).
    4. Grün, Bettina & Leisch, Friedrich, 2008. "FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i04).
    5. Ingrassia, Salvatore & Minotti, Simona C. & Punzo, Antonio, 2014. "Model-based clustering via linear cluster-weighted models," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 159-182.
    6. Wayne DeSarbo & William Cron, 1988. "A maximum likelihood methodology for clusterwise linear regression," Journal of Classification, Springer;The Classification Society, vol. 5(2), pages 249-282, September.
    7. Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini & Simona Minotti, 2015. "The Generalized Linear Mixed Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 32(1), pages 85-113, April.
    8. Dankmar Böhning & Ekkehart Dietz & Rainer Schaub & Peter Schlattmann & Bruce Lindsay, 1994. "The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 46(2), pages 373-388, June.
    9. McLachlan, G.J. & Bean, R.W. & Ben-Tovim Jones, L., 2007. "Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5327-5338, July.
    10. Salvatore Ingrassia & Simona Minotti & Giorgio Vittadini, 2012. "Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions," Journal of Classification, Springer;The Classification Society, vol. 29(3), pages 363-401, October.
    11. G. J. McLachlan, 1987. "On Bootstrapping the Likelihood Ratio Test Statistic for the Number of Components in a Normal Mixture," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 36(3), pages 318-324, November.
    12. Ehrlich, Isaac, 1973. "Participation in Illegitimate Activities: A Theoretical and Empirical Investigation," Journal of Political Economy, University of Chicago Press, vol. 81(3), pages 521-565, May-June.

    Citations



    Cited by:

    1. Antonio Punzo & Paul D. McNicholas, 2017. "Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 34(2), pages 249-293, July.
    2. Michael P. B. Gallaugher & Salvatore D. Tomarchio & Paul D. McNicholas & Antonio Punzo, 2022. "Multivariate cluster weighted models using skewed distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 93-124, March.
    3. Maruotti, Antonello & Punzo, Antonio, 2017. "Model-based time-varying clustering of multivariate longitudinal data with covariates and outliers," Computational Statistics & Data Analysis, Elsevier, vol. 113(C), pages 475-496.
    4. Leila Amiri & Mojtaba Khazaei & Mojtaba Ganjali, 2017. "General location model with factor analyzer covariance matrix structure and its applications," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(3), pages 593-609, September.
    5. Antonio Punzo & Salvatore Ingrassia & Antonello Maruotti, 2021. "Multivariate hidden Markov regression models: random covariates and heavy-tailed distributions," Statistical Papers, Springer, vol. 62(3), pages 1519-1555, June.
    6. Diani, Cecilia & Galimberti, Giuliano & Soffritti, Gabriele, 2022. "Multivariate cluster-weighted models based on seemingly unrelated linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).
    7. Sanjeena Subedi & Paul D. McNicholas, 2021. "A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 89-108, April.
    8. Paul D. McNicholas, 2016. "Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 331-373, October.
    9. Angelo Mazza & Antonio Punzo, 2020. "Mixtures of multivariate contaminated normal regression models," Statistical Papers, Springer, vol. 61(2), pages 787-822, April.
    10. Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
    11. Luis Angel García-Escudero & Alfonso Gordaliza & Francesca Greselin & Salvatore Ingrassia & Agustín Mayo-Iscar, 2018. "Eigenvalues and constraints in mixture modeling: geometric and computational issues," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 203-233, June.
    12. Perthame, Emeline & Forbes, Florence & Deleforge, Antoine, 2018. "Inverse regression approach to robust nonlinear high-to-low dimensional mapping," Journal of Multivariate Analysis, Elsevier, vol. 163(C), pages 1-14.
    13. Michael P. B. Gallaugher & Paul D. McNicholas, 2019. "On Fractionally-Supervised Classification: Weight Selection and Extension to the Multivariate t-Distribution," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 232-265, July.
    14. Utkarsh J. Dang & Antonio Punzo & Paul D. McNicholas & Salvatore Ingrassia & Ryan P. Browne, 2017. "Multivariate Response and Parsimony for Gaussian Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 34(1), pages 4-34, April.
    15. Počuča, Nikola & Jevtić, Petar & McNicholas, Paul D. & Miljkovic, Tatjana, 2020. "Modeling frequency and severity of claims with the zero-inflated generalized cluster-weighted models," Insurance: Mathematics and Economics, Elsevier, vol. 94(C), pages 79-93.
    16. Yang, Yu-Chen & Lin, Tsung-I & Castro, Luis M. & Wang, Wan-Lun, 2020. "Extending finite mixtures of t linear mixed-effects models with concomitant covariates," Computational Statistics & Data Analysis, Elsevier, vol. 148(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Utkarsh J. Dang & Antonio Punzo & Paul D. McNicholas & Salvatore Ingrassia & Ryan P. Browne, 2017. "Multivariate Response and Parsimony for Gaussian Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 34(1), pages 4-34, April.
    2. Salvatore Ingrassia & Antonio Punzo, 2020. "Cluster Validation for Mixtures of Regressions via the Total Sum of Squares Decomposition," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 526-547, July.
    3. Angelo Mazza & Antonio Punzo, 2020. "Mixtures of multivariate contaminated normal regression models," Statistical Papers, Springer, vol. 61(2), pages 787-822, April.
    4. Salvatore D. Tomarchio & Paul D. McNicholas & Antonio Punzo, 2021. "Matrix Normal Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 38(3), pages 556-575, October.
    5. Paul D. McNicholas, 2016. "Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 331-373, October.
    6. Antonio Punzo & Paul D. McNicholas, 2017. "Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 34(2), pages 249-293, July.
    7. Diani, Cecilia & Galimberti, Giuliano & Soffritti, Gabriele, 2022. "Multivariate cluster-weighted models based on seemingly unrelated linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).
    8. Michael P. B. Gallaugher & Salvatore D. Tomarchio & Paul D. McNicholas & Antonio Punzo, 2022. "Multivariate cluster weighted models using skewed distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 93-124, March.
    9. Gabriele Soffritti, 2021. "Estimating the Covariance Matrix of the Maximum Likelihood Estimator Under Linear Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 38(3), pages 594-625, October.
    10. Xiaoqiong Fang & Andy W. Chen & Derek S. Young, 2023. "Predictors with measurement error in mixtures of polynomial regressions," Computational Statistics, Springer, vol. 38(1), pages 373-401, March.
    11. Keefe Murphy & Thomas Brendan Murphy, 2020. "Gaussian parsimonious clustering models with covariates and a noise component," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 293-325, June.
    12. Paolo Berta & Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini, 2016. "Multilevel cluster-weighted models for the evaluation of hospitals," METRON, Springer;Sapienza Università di Roma, vol. 74(3), pages 275-292, December.
    13. Rainer Schlittgen, 2011. "A weighted least-squares approach to clusterwise regression," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 95(2), pages 205-217, June.
    14. Počuča, Nikola & Jevtić, Petar & McNicholas, Paul D. & Miljkovic, Tatjana, 2020. "Modeling frequency and severity of claims with the zero-inflated generalized cluster-weighted models," Insurance: Mathematics and Economics, Elsevier, vol. 94(C), pages 79-93.
    15. Sanjeena Subedi & Antonio Punzo & Salvatore Ingrassia & Paul McNicholas, 2013. "Clustering and classification via cluster-weighted factor analyzers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(1), pages 5-40, March.
    16. Yang, Yu-Chen & Lin, Tsung-I & Castro, Luis M. & Wang, Wan-Lun, 2020. "Extending finite mixtures of t linear mixed-effects models with concomitant covariates," Computational Statistics & Data Analysis, Elsevier, vol. 148(C).
    17. Roberto Mari & Salvatore Ingrassia & Antonio Punzo, 2023. "Local and Overall Deviance R-Squared Measures for Mixtures of Generalized Linear Models," Journal of Classification, Springer;The Classification Society, vol. 40(2), pages 233-266, July.
    18. Michael P. B. Gallaugher & Paul D. McNicholas, 2019. "On Fractionally-Supervised Classification: Weight Selection and Extension to the Multivariate t-Distribution," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 232-265, July.
    19. Maruotti, Antonello & Punzo, Antonio, 2017. "Model-based time-varying clustering of multivariate longitudinal data with covariates and outliers," Computational Statistics & Data Analysis, Elsevier, vol. 113(C), pages 475-496.
    20. Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini & Simona Minotti, 2015. "The Generalized Linear Mixed Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 32(1), pages 85-113, April.
