Initializing the EM algorithm in Gaussian mixture models with an unknown number of components
An approach is proposed for initializing the expectation–maximization (EM) algorithm in multivariate Gaussian mixture models with an unknown number of components. Because the EM algorithm is often sensitive to the choice of the initial parameter vector, effective initialization is an important preliminary step toward convergence of the algorithm to the best local maximum of the likelihood function. The proposed strategy initializes mean vectors by choosing points with higher concentrations of neighbors and uses a truncated normal distribution for the preliminary estimation of dispersion matrices. The approach is illustrated on examples and compared with several other initialization methods.
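The abstract does not give the details of the procedure, so the following is only a rough sketch of the general idea of seeding mean vectors at points with many nearby neighbors. It is not the authors' algorithm. The function name `init_from_dense_points` and the parameters `radius`, `min_neighbors`, and `max_components` are hypothetical choices made for illustration. The covariance step uses the plain sample covariance of the points inside each ball; because those points are truncated to the ball, this tends to understate the true dispersion, and the truncated-normal correction mentioned in the abstract is not reproduced here.

```python
import numpy as np

def init_from_dense_points(X, radius, min_neighbors=5, max_components=10):
    """Greedy, illustrative initialization (an assumption, not the paper's method).

    Repeatedly picks the unclaimed point with the most unclaimed neighbors
    within `radius`, uses it as an initial mean, and removes its neighbors.
    Assumes X has shape (n, d) with d >= 2.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    within = dists <= radius  # neighbor indicator; each point counts itself
    available = np.ones(len(X), dtype=bool)
    means, covs = [], []
    while len(means) < max_components and available.any():
        # Count only neighbors not already claimed by an earlier center.
        counts = (within & available[None, :]).sum(axis=1)
        counts[~available] = -1  # claimed points cannot become centers
        i = int(np.argmax(counts))
        if counts[i] < min_neighbors:
            break  # no remaining region is dense enough; stop adding components
        members = within[i] & available
        means.append(X[i])
        # Uncorrected sample covariance of the points inside the ball.
        covs.append(np.cov(X[members], rowvar=False))
        available &= ~members
    return np.array(means), np.array(covs)
```

Under these assumptions the number of components is not fixed in advance: the loop stops when no unclaimed point has at least `min_neighbors` points within `radius`. The returned means and covariances could then be passed as starting values to any EM implementation for Gaussian mixtures.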
RePEc handle: RePEc:eee:csdana:v:56:y:2012:i:6:p:1381-1395.