A comparison of the $L_2$ minimum distance estimator and the EM-algorithm when fitting $k$-component univariate normal mixtures

Authors

Listed:
  • Brenton R. Clarke (Murdoch University)
  • Thomas Davidson (Australian Bureau of Statistics)
  • Robert Hammarstrand (Murdoch University)

Abstract

Maximum likelihood via the EM-algorithm has been the accepted method for fitting finite mixtures of normal distributions ever since it was shown to be superior to the method of moments, and recent books testify to this. The method has nevertheless been criticised for this problem. The main criticism is that when the variances of the component distributions are unequal the likelihood is in fact unbounded, and there can be multiple local maxima. Another major criticism is that the maximum likelihood estimator is not robust. Several alternative minimum distance estimators have since been proposed as a way of dealing with the first problem. This paper deals with one of these estimators, the $L_2$ minimum distance estimator, which is not only superior because of its robustness but can in fact have an advantage in numerical studies even at the model distribution. Importantly, robust alternatives to the EM-algorithm, which ostensibly fit t distributions when the data are in fact mixtures of normals, are also not competitive at the normal mixture model when compared with the chosen minimum distance estimator. It is argued, for instance, that natural processes should lead to mixtures whose component distributions are normal as a result of the Central Limit Theorem. On the other hand, data can be contaminated by extraneous sources, as is typically assumed in robustness studies. This calls for a robust estimator.
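The unboundedness criticism mentioned in the abstract can be illustrated with a small sketch. This is not code from the paper: the sample values, the equal mixing weights and the helper names (`normal_pdf`, `mixture_loglik`, `degenerate_loglik`) are all assumptions chosen for illustration. The idea is to centre one component of a two-component normal mixture exactly on a data point and let its standard deviation shrink, while the other component keeps every remaining observation at a positive density.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_loglik(data, weights, means, sigmas):
    """Log-likelihood of a univariate normal mixture with unequal variances."""
    total = 0.0
    for x in data:
        density = sum(
            w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)
        )
        total += math.log(density)
    return total


# Hypothetical sample, made up purely for this illustration.
data = [-1.3, -0.4, 0.1, 0.7, 1.9, 2.5]


def degenerate_loglik(sigma1):
    """Component 1 sits on the first observation with std. dev. sigma1;
    component 2 (mean 0.5, std. dev. 1) covers the rest of the sample."""
    return mixture_loglik(data, [0.5, 0.5], [data[0], 0.5], [sigma1, 1.0])


# Shrinking sigma1 keeps increasing the log-likelihood without limit.
shrinking_sigmas = (1.0, 1e-2, 1e-4, 1e-8)
logliks = [degenerate_loglik(s) for s in shrinking_sigmas]
for s, ll in zip(shrinking_sigmas, logliks):
    print(f"sigma1 = {s:g}: log-likelihood = {ll:.3f}")
```

Once the first component is narrow enough, each tenfold reduction in its standard deviation adds roughly log 10 to the log-likelihood, so no finite maximum exists along this path. An unconstrained EM run can drift towards such degenerate solutions, which is one reason a criterion that stays bounded, such as a minimum distance criterion, is of interest.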

Suggested Citation

  • Brenton R. Clarke & Thomas Davidson & Robert Hammarstrand, 2017. "A comparison of the $L_2$ minimum distance estimator and the EM-algorithm when fitting $k$-component univariate normal mixtures," Statistical Papers, Springer, vol. 58(4), pages 1247-1266, December.
  • Handle: RePEc:spr:stpapr:v:58:y:2017:i:4:d:10.1007_s00362-016-0747-x
    DOI: 10.1007/s00362-016-0747-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-016-0747-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Biernacki, Christophe & Chrétien, Stéphane, 2003. "Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with EM," Statistics & Probability Letters, Elsevier, vol. 61(4), pages 373-382, February.
    2. Clarke, Brenton R. & Futschik, Andreas, 2007. "On the convergence of Newton's method when estimating higher dimensional parameters," Journal of Multivariate Analysis, Elsevier, vol. 98(5), pages 916-931, May.
    3. Clarke, Brenton R., 1989. "An unbiased minimum distance estimator of the proportion parameter in a mixture of two normal distributions," Statistics & Probability Letters, Elsevier, vol. 7(4), pages 275-281, February.
    4. Wilfried Seidel & Hana Ševčíková, 2004. "Types of likelihood maxima in mixture models and their implication on the performance of tests," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 56(4), pages 631-654, December.
    5. Wilfried Seidel & Karl Mosler & Manfred Alker, 2000. "A Cautionary Note on Likelihood Ratio Tests in Mixture Models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 52(3), pages 481-487, September.
    6. B. Clarke & C. Heathcote, 1994. "Robust estimation of k-component univariate normal mixtures," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 46(1), pages 83-93, March.
    7. Sharon Lee & Geoffrey McLachlan, 2013. "On mixtures of skew normal and skew $t$-distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(3), pages 241-266, September.
    8. Klar, Bernhard & Meintanis, Simos G., 2005. "Tests for normal mixtures based on the empirical characteristic function," Computational Statistics & Data Analysis, Elsevier, vol. 49(1), pages 227-242, April.
    9. Nicolas Depraetere & Martina Vandebroek, 2014. "Order selection in finite mixtures of linear regressions," Statistical Papers, Springer, vol. 55(3), pages 871-911, August.

    Citations

    Cited by:

    1. Angelo Mazza & Antonio Punzo, 2020. "Mixtures of multivariate contaminated normal regression models," Statistical Papers, Springer, vol. 61(2), pages 787-822, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Brenton Clarke & Peter McKinnon & Geoff Riley, 2012. "A fast robust method for fitting gamma distributions," Statistical Papers, Springer, vol. 53(4), pages 1001-1014, November.
    2. Seo, Byungtae & Kim, Daeyoung, 2012. "Root selection in normal mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 56(8), pages 2454-2470.
    3. Garel, Bernard, 2007. "Recent asymptotic results in testing for mixtures," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5295-5304, July.
    4. Kim, Daeyoung & Seo, Byungtae, 2014. "Assessment of the number of components in Gaussian mixture models in the presence of multiple local maximizers," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 100-120.
    5. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    6. Redivo, Edoardo & Nguyen, Hien D. & Gupta, Mayetri, 2020. "Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    7. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    8. Jabłońska-Sabuka, Matylda & Teuerle, Marek & Wyłomańska, Agnieszka, 2017. "Bivariate sub-Gaussian model for stock index returns," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 486(C), pages 628-637.
    9. Azzalini, Adelchi & Browne, Ryan P. & Genton, Marc G. & McNicholas, Paul D., 2016. "On nomenclature for, and the relative merits of, two formulations of skew distributions," Statistics & Probability Letters, Elsevier, vol. 110(C), pages 201-206.
    10. Lee, Sharon X. & McLachlan, Geoffrey J., 2022. "An overview of skew distributions in model-based clustering," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    11. Jiménez-Gamero, M. Dolores & Kim, Hyoung-Moon, 2015. "Fast goodness-of-fit tests based on the characteristic function," Computational Statistics & Data Analysis, Elsevier, vol. 89(C), pages 172-191.
    12. Marek Śmieja & Magdalena Wiercioch, 2017. "Constrained clustering with a complex cluster structure," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(3), pages 493-518, September.
    13. Bhat, Chandra R., 2018. "New matrix-based methods for the analytic evaluation of the multivariate cumulative normal distribution function," Transportation Research Part B: Methodological, Elsevier, vol. 109(C), pages 238-256.
    14. Yana Melnykov & Xuwen Zhu & Volodymyr Melnykov, 2021. "Transformation mixture modeling for skewed data groups with heavy tails and scatter," Computational Statistics, Springer, vol. 36(1), pages 61-78, March.
    15. Nicolas Depraetere & Martina Vandebroek, 2014. "Order selection in finite mixtures of linear regressions," Statistical Papers, Springer, vol. 55(3), pages 871-911, August.
    16. Meintanis, Simos G. & Iliopoulos, George, 2008. "Fourier methods for testing multivariate independence," Computational Statistics & Data Analysis, Elsevier, vol. 52(4), pages 1884-1895, January.
    17. Tran, Thanh N. & Wehrens, Ron & Buydens, Lutgarde M.C., 2006. "KNN-kernel density-based clustering for high-dimensional multivariate data," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 513-525, November.
    18. Andrews, Jeffrey L., 2018. "Addressing overfitting and underfitting in Gaussian model-based clustering," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 160-171.
    19. F. Bartolucci & A. Farcomeni & F. Pennoni, 2014. "Latent Markov models: a review of a general framework for the analysis of longitudinal data with covariates," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 23(3), pages 433-465, September.
    20. Chauveau, Didier & Hoang, Vy Thuy Lynh, 2016. "Nonparametric mixture models with conditionally independent multivariate component densities," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 1-16.
