IDEAS record: https://ideas.repec.org/a/eee/jmvana/v125y2014icp100-120.html

Assessment of the number of components in Gaussian mixture models in the presence of multiple local maximizers

Authors

  • Kim, Daeyoung
  • Seo, Byungtae

Abstract

Gaussian mixtures are very flexible in representing the underlying structure in data. However, likelihood inference for Gaussian mixtures with unrestricted covariance matrices is theoretically and practically challenging, because the likelihood function is unbounded and often has multiple local maximizers. As the numerical studies in this paper show, the presence of multiple local maximizers, including spurious ones, affects the performance of the model selection criteria used to choose the number of components. In this paper we propose a new type of likelihood-based estimator for Gaussian mixture models, the gradient-based k-deleted maximum likelihood estimator. The proposed estimator is designed to avoid spurious local maximizers and to choose a statistically desirable local maximizer when several are present. We first prove the consistency of the proposed estimator and then, using a real-data example and simulation studies, examine how it performs within the likelihood-based model selection criteria commonly used to assess the number of components in Gaussian mixture models.
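The unboundedness mentioned in the abstract can already be seen in the simplest univariate case. The sketch below is an illustration under stated assumptions, not the authors' gradient-based k-deleted estimator: it evaluates a two-component normal mixture log-likelihood on a small dataset invented for this example, with one component's mean placed exactly on a data point while its standard deviation shrinks toward zero. The helper names (`normal_pdf`, `mixture_loglik`, `bic`) are hypothetical, and BIC is used only as one common example of a likelihood-based criterion for choosing the number of components.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_loglik(data, weights, means, sds):
    """Log-likelihood of a univariate Gaussian mixture with
    component-specific (unrestricted) standard deviations."""
    total = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, s)
                   for w, m, s in zip(weights, means, sds))
        total += math.log(dens)
    return total


def bic(loglik, n_components, n_obs):
    """BIC = -2 log L + p log n, where a univariate mixture with g
    components has p = 3g - 1 free parameters
    (g means, g variances, g - 1 mixing weights)."""
    p = 3 * n_components - 1
    return -2.0 * loglik + p * math.log(n_obs)


# Toy data, invented purely for illustration (hypothetical).
data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5, 2.2, 2.8]

# Center component 1 exactly on the first observation and shrink its
# standard deviation; component 2 stays fixed at N(1, 1), which keeps
# every observation's mixture density strictly positive.
sds = [1.0, 1e-2, 1e-4, 1e-8]
logliks = [mixture_loglik(data, [0.5, 0.5], [data[0], 1.0], [sd, 1.0])
           for sd in sds]
bics = [bic(ll, 2, len(data)) for ll in logliks]

for sd, ll, b in zip(sds, logliks, bics):
    print(f"sd={sd:g}  loglik={ll:.2f}  BIC={b:.2f}")
```

Each shrinking step raises the log-likelihood and lowers the BIC, so a criterion that simply trusts the largest likelihood would prefer this degenerate, spurious fit. Avoiding such solutions is what the abstract says the proposed estimator is designed to do.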

Suggested Citation

  • Kim, Daeyoung & Seo, Byungtae, 2014. "Assessment of the number of components in Gaussian mixture models in the presence of multiple local maximizers," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 100-120.
  • Handle: RePEc:eee:jmvana:v:125:y:2014:i:c:p:100-120
    DOI: 10.1016/j.jmva.2013.11.018

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X13002625
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Seo, Byungtae & Lindsay, Bruce G., 2010. "A computational strategy for doubly smoothed MLE exemplified in the normal mixture model," Computational Statistics & Data Analysis, Elsevier, vol. 54(8), pages 1930-1941, August.
    2. Biernacki, Christophe & Chrétien, Stéphane, 2003. "Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with EM," Statistics & Probability Letters, Elsevier, vol. 61(4), pages 373-382, February.
    3. Chen, Jiahua & Tan, Xianming, 2009. "Inference for multivariate normal mixtures," Journal of Multivariate Analysis, Elsevier, vol. 100(7), pages 1367-1383, August.
    4. Wilfried Seidel & Hana Ševčíková, 2004. "Types of likelihood maxima in mixture models and their implication on the performance of tests," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 56(4), pages 631-654, December.
    5. Gabriela Ciuperca & Andrea Ridolfi & Jérôme Idier, 2003. "Penalized Maximum Likelihood Estimator for Normal Mixtures," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 30(1), pages 45-59, March.
    6. Gilles Celeux & Gilda Soromenho, 1996. "An entropy criterion for assessing the number of clusters in a mixture model," Journal of Classification, Springer;The Classification Society, vol. 13(2), pages 195-212, September.
    7. Dankmar Böhning & Ekkehart Dietz & Rainer Schaub & Peter Schlattmann & Bruce Lindsay, 1994. "The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 46(2), pages 373-388, June.
    8. Hamparsum Bozdogan, 1987. "Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions," Psychometrika, Springer;The Psychometric Society, vol. 52(3), pages 345-370, September.
    9. Chris Fraley & Adrian E. Raftery, 2007. "Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 24(2), pages 155-181, September.
    10. Christian Hennig, 2010. "Methods for merging Gaussian mixture components," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 4(1), pages 3-34, April.
    11. Fraley C. & Raftery A.E., 2002. "Model-Based Clustering, Discriminant Analysis, and Density Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 611-631, June.
    12. Seo, Byungtae & Kim, Daeyoung, 2012. "Root selection in normal mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 56(8), pages 2454-2470.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Sangkon Oh & Byungtae Seo, 2023. "Merging Components in Linear Gaussian Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 25-51, April.
    2. Shiyao Liu & Huaiqing Wu & William Q. Meeker, 2015. "Understanding and Addressing the Unbounded "Likelihood" Problem," The American Statistician, Taylor & Francis Journals, vol. 69(3), pages 191-200, August.
    3. Roberto Di Mari & Roberto Rocci & Stefano Antonio Gattone, 2020. "Scale-constrained approaches for maximum likelihood estimation and model selection of clusterwise linear regression models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(1), pages 49-78, March.
    4. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    5. Heather Shappell & Sean L. Simpson, 2022. "Discussion on “Distributional independent component analysis for diverse neuroimaging modalities” by Ben Wu, Subhadip Pal, Jian Kang, and Ying Guo," Biometrics, The International Biometric Society, vol. 78(3), pages 1106-1108, September.
    6. Derek S. Young & Xi Chen & Dilrukshi C. Hewage & Ricardo Nilo-Poyanco, 2019. "Finite mixture-of-gamma distributions: estimation, inference, and model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 1053-1082, December.
    7. Xuwen Zhu & Volodymyr Melnykov, 2015. "Probabilistic assessment of model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 395-422, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Seo, Byungtae & Kim, Daeyoung, 2012. "Root selection in normal mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 56(8), pages 2454-2470.
    2. Nicolas Depraetere & Martina Vandebroek, 2014. "Order selection in finite mixtures of linear regressions," Statistical Papers, Springer, vol. 55(3), pages 871-911, August.
    3. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    4. Luis Angel García-Escudero & Alfonso Gordaliza & Francesca Greselin & Salvatore Ingrassia & Agustín Mayo-Iscar, 2018. "Eigenvalues and constraints in mixture modeling: geometric and computational issues," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 203-233, June.
    5. Roberto Di Mari & Roberto Rocci & Stefano Antonio Gattone, 2020. "Scale-constrained approaches for maximum likelihood estimation and model selection of clusterwise linear regression models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(1), pages 49-78, March.
    6. Ingrassia, Salvatore & Rocci, Roberto, 2011. "Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1715-1725, April.
    7. Tin Lok James Ng & Thomas Brendan Murphy, 2021. "Model-based Clustering of Count Processes," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 188-211, July.
    8. Ingrassia, Salvatore & Rocci, Roberto, 2007. "Constrained monotone EM algorithms for finite mixture of multivariate Gaussians," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5339-5351, July.
    9. Omar N. Solinger & Woody van Olffen & Robert A. Roe & Joeri Hofmans, 2013. "On Becoming (Un)Committed: A Taxonomy and Test of Newcomer Onboarding Scenarios," Organization Science, INFORMS, vol. 24(6), pages 1640-1661, December.
    10. Sakyajit Bhattacharya & Paul McNicholas, 2014. "A LASSO-penalized BIC for mixture model selection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(1), pages 45-61, March.
    11. Mateen Shaikh & Paul D. McNicholas & Anthony F. Desmond, 2010. "A Pseudo-EM Algorithm for Clustering Incomplete Longitudinal Data," The International Journal of Biostatistics, De Gruyter, vol. 6(1), pages 1-17, March.
    12. Galimberti, Giuliano & Soffritti, Gabriele, 2014. "A multivariate linear regression analysis using finite mixtures of t distributions," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 138-150.
    13. Ana Oliveira-Brochado & Francisco Vitorino Martins, 2008. "Determining the Number of Market Segments Using an Experimental Design," FEP Working Papers 263, Universidade do Porto, Faculdade de Economia do Porto.
    14. Julian Rossbroich & Jeffrey Durieux & Tom F. Wilderjans, 2022. "Model Selection Strategies for Determining the Optimal Number of Overlapping Clusters in Additive Overlapping Partitional Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 264-301, July.
    15. Seo, Byungtae & Lindsay, Bruce G., 2010. "A computational strategy for doubly smoothed MLE exemplified in the normal mixture model," Computational Statistics & Data Analysis, Elsevier, vol. 54(8), pages 1930-1941, August.
    16. Kim, Daeyoung & Kim, Jong-Min & Liao, Shu-Min & Jung, Yoon-Sung, 2013. "Mixture of D-vine copulas for modeling dependence," Computational Statistics & Data Analysis, Elsevier, vol. 64(C), pages 1-19.
    17. Ana Oliveira-Brochado & Francisco Vitorino Martins, 2014. "Identifying Small Market Segments with Mixture Regression Models," International Journal of Finance, Insurance and Risk Management, International Journal of Finance, Insurance and Risk Management, vol. 4(4), pages 812-812.
    18. Scrucca, Luca, 2016. "Identifying connected components in Gaussian finite mixture models for clustering," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 5-17.
    19. Marianna Virtanen & Jussi Vahtera & Jenny Head & Rosemary Dray-Spira & Annaleena Okuloff & Adam G Tabak & Marcel Goldberg & Jenni Ervasti & Markus Jokela & Archana Singh-Manoux & Jaana Pentti & Marie , 2015. "Work Disability among Employees with Diabetes: Latent Class Analysis of Risk Factors in Three Prospective Cohort Studies," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-14, November.
    20. Danks, Nicholas P. & Sharma, Pratyush N. & Sarstedt, Marko, 2020. "Model selection uncertainty and multimodel inference in partial least squares structural equation modeling (PLS-SEM)," Journal of Business Research, Elsevier, vol. 113(C), pages 13-24.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.