Printed from https://ideas.repec.org/a/spr/stpapr/v55y2014i3p871-911.html

Order selection in finite mixtures of linear regressions

Author

Listed:
  • Nicolas Depraetere
  • Martina Vandebroek

Abstract

Finite mixture models can adequately model population heterogeneity when this heterogeneity arises from a finite number of relatively homogeneous clusters. An example of such a situation is market segmentation. Order selection in mixture models, i.e. selecting the correct number of components, however, is a problem which has not been satisfactorily resolved. Existing simulation results in the literature do not completely agree with each other. Moreover, it appears that the performance of different selection methods is affected by the type of model and the parameter values. Furthermore, most existing results are based on simulations where the true generating model is identical to one of the models in the candidate set. In order to partly fill this gap we carried out a (relatively) large simulation study for finite mixture models of normal linear regressions. We included several types of model (mis)specification to study the robustness of 18 order selection methods. Furthermore, we compared the performance of these selection methods based on unpenalized and penalized estimates of the model parameters. The results indicate that order selection based on penalized estimates greatly improves the success rates of all order selection methods. The most successful methods were MDL2, MRC, MRC_k, ICL-BIC, ICL, CAIC, BIC and CLC, but not one method was consistently good or best for all types of model (mis)specification. Copyright Springer-Verlag Berlin Heidelberg 2014
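To make the notion of order selection concrete, the following is a minimal sketch, not the authors' simulation code. It fits mixtures of normal linear regressions with a plain EM loop and scores each candidate number of components with BIC and CAIC, two of the criteria named in the abstract. The function names (`em_loglik`, `n_params`), the synthetic data, the random initialization and the variance floor are all assumptions made for illustration. In particular, the variance floor is only a crude stand-in for the penalized estimates studied in the paper.

```python
import numpy as np

def em_loglik(X, y, k, n_iter=200, seed=0):
    """Run EM for a k-component mixture of normal linear regressions.
    Returns the log-likelihood at the final parameter estimates."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    resp = rng.random((n, k))                    # random soft initial memberships
    resp /= resp.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        dens = np.empty((n, k))
        for j in range(k):
            w = resp[:, j] + 1e-12               # guard against empty components
            # M-step: weighted least squares, weighted variance, mixing weight
            Xw = X * w[:, None]
            beta = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(p), Xw.T @ y)
            resid = y - X @ beta
            var = max(np.sum(w * resid**2) / np.sum(w), 1e-3)  # assumed floor
            pi_j = np.mean(w)
            dens[:, j] = pi_j * np.exp(-0.5 * resid**2 / var) / np.sqrt(2 * np.pi * var)
        total = dens.sum(axis=1) + 1e-300        # mixture density per observation
        resp = dens / total[:, None]             # E-step: posterior memberships
    return float(np.log(total).sum())

def n_params(k, p):
    """Free parameters: k*p coefficients, k variances, k-1 mixing weights."""
    return k * p + k + (k - 1)

# Synthetic data (hypothetical): two well-separated regression lines.
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(-1, 1, n)
z = rng.integers(0, 2, n)
y = np.where(z == 0, 5 + 3 * x, -5 - 3 * x) + rng.normal(0, 0.3, n)
X = np.column_stack([np.ones(n), x])

scores = {}
for k in (1, 2, 3):
    ll = em_loglik(X, y, k)
    d = n_params(k, X.shape[1])
    scores[k] = {
        "BIC": -2 * ll + d * np.log(n),          # Schwarz criterion
        "CAIC": -2 * ll + d * (np.log(n) + 1),   # Bozdogan's consistent AIC
    }

best_bic = min(scores, key=lambda k: scores[k]["BIC"])
print(scores)
print("order selected by BIC:", best_bic)
```

The candidate with the smallest criterion value is selected. A single random start, as used here, can end in a poor local maximum, so a more careful study would use several starts per candidate order.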

Suggested Citation

  • Nicolas Depraetere & Martina Vandebroek, 2014. "Order selection in finite mixtures of linear regressions," Statistical Papers, Springer, vol. 55(3), pages 871-911, August.
  • Handle: RePEc:spr:stpapr:v:55:y:2014:i:3:p:871-911
    DOI: 10.1007/s00362-013-0534-x

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s00362-013-0534-x
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Naik, Prasad A. & Shi, Peide & Tsai, Chih-Ling, 2007. "Extending the Akaike Information Criterion to Mixture Regression Models," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 244-254, March.
    2. Gilles Celeux & Gilda Soromenho, 1996. "An entropy criterion for assessing the number of clusters in a mixture model," Journal of Classification, Springer;The Classification Society, vol. 13(2), pages 195-212, September.
    3. Hawkins, Dollena S. & Allen, David M. & Stromberg, Arnold J., 2001. "Determining the number of components in mixtures of linear models," Computational Statistics & Data Analysis, Elsevier, vol. 38(1), pages 15-48, November.
    4. Joseph E. Cavanaugh, 2004. "Criteria for Linear Model Selection Based on Kullback's Symmetric Divergence," Australian & New Zealand Journal of Statistics, Australian Statistical Publishing Association Inc., vol. 46(2), pages 257-274, June.
    5. Garel, Bernard, 2007. "Recent asymptotic results in testing for mixtures," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5295-5304, July.
    6. Cavanaugh, Joseph E., 1999. "A large-sample model selection criterion based on Kullback's symmetric divergence," Statistics & Probability Letters, Elsevier, vol. 42(4), pages 333-343, May.
    7. Dankmar Böhning & Ekkehart Dietz & Rainer Schaub & Peter Schlattmann & Bruce Lindsay, 1994. "The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 46(2), pages 373-388, June.
    8. Hathaway, Richard J., 1986. "Another interpretation of the EM algorithm for mixture distributions," Statistics & Probability Letters, Elsevier, vol. 4(2), pages 53-56, March.
    9. Hafidi, Bezza & Mkhadri, Abdallah, 2010. "The Kullback information criterion for mixture regression models," Statistics & Probability Letters, Elsevier, vol. 80(9-10), pages 807-815, May.
    10. Yang, Chih-Chien, 2006. "Evaluating latent class analysis models in qualitative phenotype identification," Computational Statistics & Data Analysis, Elsevier, vol. 50(4), pages 1090-1104, February.
    11. Chih-Chien Yang & Chih-Chiang Yang, 2007. "Separating Latent Classes by Information Criteria," Journal of Classification, Springer;The Classification Society, vol. 24(2), pages 183-203, September.
    12. Allen Fleishman, 1978. "A method for simulating non-normal distributions," Psychometrika, Springer;The Psychometric Society, vol. 43(4), pages 521-532, December.
    13. Yuhong Yang, 2005. "Can the strengths of AIC and BIC be shared? A conflict between model indentification and regression estimation," Biometrika, Biometrika Trust, vol. 92(4), pages 937-950, December.
    14. Wilfried Seidel & Hana Ševčíková, 2004. "Types of likelihood maxima in mixture models and their implication on the performance of tests," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 56(4), pages 631-654, December.
    15. Gabriela Ciuperca & Andrea Ridolfi & Jérôme Idier, 2003. "Penalized Maximum Likelihood Estimator for Normal Mixtures," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 30(1), pages 45-59, March.
    16. Ana Oliveira-Brochado & Francisco Vitorino Martins, 2008. "Determining the Number of Market Segments Using an Experimental Design," FEP Working Papers 263, Universidade do Porto, Faculdade de Economia do Porto.
    17. Hamparsum Bozdogan, 1987. "Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions," Psychometrika, Springer;The Psychometric Society, vol. 52(3), pages 345-370, September.
    18. Kamel Jedidi & Harsharanjeet S. Jagpal & Wayne S. DeSarbo, 1997. "Finite-Mixture Structural Equation Models for Response-Based Segmentation and Unobserved Heterogeneity," Marketing Science, INFORMS, vol. 16(1), pages 39-59.
    19. Wayne DeSarbo & William Cron, 1988. "A maximum likelihood methodology for clusterwise linear regression," Journal of Classification, Springer;The Classification Society, vol. 5(2), pages 249-282, September.
    20. Chen, Jiahua & Tan, Xianming, 2009. "Inference for multivariate normal mixtures," Journal of Multivariate Analysis, Elsevier, vol. 100(7), pages 1367-1383, August.
    21. Biernacki, Christophe & Celeux, Gilles & Govaert, Gerard, 2003. "Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 561-575, January.
    22. Karlis, Dimitris & Xekalaki, Evdokia, 2003. "Choosing initial values for the EM algorithm for finite mixtures," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 577-590, January.
    23. Stanley Sclove, 1987. "Application of model-selection criteria to some problems in multivariate analysis," Psychometrika, Springer;The Psychometric Society, vol. 52(3), pages 333-343, September.
    24. Headrick, Todd C., 2002. "Fast fifth-order polynomial transforms for generating univariate and multivariate nonnormal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 40(4), pages 685-711, October.

    Citations



    Cited by:

    1. Gabriele Perrone & Gabriele Soffritti, 2023. "Seemingly unrelated clusterwise linear regression for contaminated data," Statistical Papers, Springer, vol. 64(3), pages 883-921, June.
    2. Brenton R. Clarke & Thomas Davidson & Robert Hammarstrand, 2017. "A comparison of the $L_2$ minimum distance estimator and the EM-algorithm when fitting $k$-component univariate normal mixtures," Statistical Papers, Springer, vol. 58(4), pages 1247-1266, December.
    3. Camila Borelli Zeller & Celso Rômulo Barbosa Cabral & Víctor Hugo Lachos & Luis Benites, 2019. "Finite mixture of regression models for censored data based on scale mixtures of normal distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 89-116, March.
    4. Nguyen, Hien D. & McLachlan, Geoffrey J., 2016. "Linear mixed models with marginally symmetric nonparametric random effects," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 151-169.
    5. Angelo Mazza & Antonio Punzo, 2020. "Mixtures of multivariate contaminated normal regression models," Statistical Papers, Springer, vol. 61(2), pages 787-822, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kim, Daeyoung & Seo, Byungtae, 2014. "Assessment of the number of components in Gaussian mixture models in the presence of multiple local maximizers," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 100-120.
    2. Ana Oliveira-Brochado & Francisco Vitorino Martins, 2014. "Identifying Small Market Segments with Mixture Regression Models," International Journal of Finance, Insurance and Risk Management, International Journal of Finance, Insurance and Risk Management, vol. 4(4), pages 812-812.
    3. Morgan, Grant B. & Hodge, Kari J. & Baggett, Aaron R., 2016. "Latent profile analysis with nonnormal mixtures: A Monte Carlo examination of model selection using fit indices," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 146-161.
    4. Seo, Byungtae & Kim, Daeyoung, 2012. "Root selection in normal mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 56(8), pages 2454-2470.
    5. Ana Oliveira-Brochado & Francisco Vitorino Martins, 2008. "Determining the Number of Market Segments Using an Experimental Design," FEP Working Papers 263, Universidade do Porto, Faculdade de Economia do Porto.
    6. O’Hagan, Adrian & Murphy, Thomas Brendan & Gormley, Isobel Claire, 2012. "Computational aspects of fitting mixture models via the expectation–maximization algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 56(12), pages 3843-3864.
    7. Antonio Punzo & Paul. D. McNicholas, 2017. "Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 34(2), pages 249-293, July.
    8. Angelo Mazza & Antonio Punzo, 2020. "Mixtures of multivariate contaminated normal regression models," Statistical Papers, Springer, vol. 61(2), pages 787-822, April.
    9. Sarstedt, Marko & Salcher, André, 2007. "Modellselektion in Finite Mixture PLS-Modellen," Discussion Papers in Business Administration 1394, University of Munich, Munich School of Management.
    10. Francesco BARTOLUCCI & Silvia BACCI & Claudia PIGINI, 2015. "A Misspecification Test for Finite-Mixture Logistic Models for Clustered Binary and Ordered Responses," Working Papers 410, Universita' Politecnica delle Marche (I), Dipartimento di Scienze Economiche e Sociali.
    11. Hung Tong & Cristina Tortora, 2022. "Model-based clustering and outlier detection with missing data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 5-30, March.
    12. Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini & Simona Minotti, 2015. "Erratum to: The Generalized Linear Mixed Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 32(2), pages 327-355, July.
    13. Paolo Berta & Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini, 2016. "Multilevel cluster-weighted models for the evaluation of hospitals," METRON, Springer;Sapienza Università di Roma, vol. 74(3), pages 275-292, December.
    14. Meng Li & Sijia Xiang & Weixin Yao, 2016. "Robust estimation of the number of components for mixtures of linear regression models," Computational Statistics, Springer, vol. 31(4), pages 1539-1555, December.
    15. Papastamoulis, Panagiotis & Martin-Magniette, Marie-Laure & Maugis-Rabusseau, Cathy, 2016. "On the estimation of mixtures of Poisson regression models with large number of components," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 97-106.
    16. Kerekes, Monika, 2012. "Growth miracles and failures in a Markov switching classification model of growth," Journal of Development Economics, Elsevier, vol. 98(2), pages 167-177.
    17. Salvatore Ingrassia & Antonio Punzo, 2020. "Cluster Validation for Mixtures of Regressions via the Total Sum of Squares Decomposition," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 526-547, July.
    18. Kerekes, Monika, 2009. "Growth miracles and failures in a Markov switching classification model of growth," Discussion Papers 2009/11, Free University Berlin, School of Business & Economics.
    19. Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
    20. Marhuenda, Yolanda & Morales, Domingo & del Carmen Pardo, María, 2014. "Information criteria for Fay–Herriot model selection," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 268-280.
