Confidence set for mixture order selection

Authors

Casa, Alessandro
Ferrari, Davide

Abstract

A fundamental challenge in approximating an unknown density using finite Gaussian mixture models is selecting the number of mixture components, also known as order. Traditional approaches choose a single best model using information criteria. However, often models with different orders yield similar fits, leading to substantial model selection uncertainty and making it challenging to identify the optimal number of components. In this paper, we introduce the Model Selection Confidence Set (MSCS) for order selection in Gaussian mixtures – a set-valued estimator that, with a predefined confidence level, includes the true mixture order across repeated samples. Rather than selecting a single model, our MSCS identifies all plausible orders by determining whether each candidate model is at least as plausible as the best-selected one, using a screening based on a penalized likelihood ratio statistic. We provide theoretical guarantees for asymptotic coverage, and demonstrate its practical advantages through simulations and real data analysis.
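The screening idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes univariate Gaussian mixtures, a BIC-type penalty, and a user-supplied `cutoff`, whereas the paper calibrates its statistic to reach a stated confidence level. The function names, the example log-likelihoods, and the cutoff value are all hypothetical.

```python
import math


def n_params_univariate_gmm(order):
    # Assumption: univariate mixture with unequal variances, so there are
    # `order` means, `order` variances, and `order - 1` free mixing weights.
    return 3 * order - 1


def penalized_loglik(loglik, n_params, n_obs):
    # BIC-type penalized log-likelihood (equal to -BIC / 2).
    return loglik - 0.5 * n_params * math.log(n_obs)


def screen_orders(logliks, n_obs, cutoff):
    """Return the best order and every order that survives the screening.

    logliks: dict mapping each candidate order to its maximized log-likelihood.
    cutoff:  hypothetical threshold on the penalized likelihood ratio statistic.
    """
    scores = {
        k: penalized_loglik(ll, n_params_univariate_gmm(k), n_obs)
        for k, ll in logliks.items()
    }
    best = max(scores, key=scores.get)
    # Keep order k when twice its penalized log-likelihood gap to the best
    # model does not exceed the cutoff.
    kept = sorted(k for k, s in scores.items() if 2.0 * (scores[best] - s) <= cutoff)
    return best, kept


if __name__ == "__main__":
    # Made-up log-likelihoods for orders 1 to 4 fitted to 200 observations.
    example = {1: -520.0, 2: -480.0, 3: -478.5, 4: -478.0}
    print(screen_orders(example, n_obs=200, cutoff=15.0))
```

With a cutoff of zero the set collapses to the single best order, which is what a conventional information criterion would report; a larger cutoff admits more orders.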

Suggested Citation

Casa, Alessandro & Ferrari, Davide, 2025. "Confidence set for mixture order selection," Statistics & Probability Letters, Elsevier, vol. 226(C).

Handle: RePEc:eee:stapro:v:226:y:2025:i:c:s0167715225001543
DOI: 10.1016/j.spl.2025.110509

References listed on IDEAS

Sylvia Richardson & Peter J. Green, 1997. "On Bayesian Analysis of Mixtures with an Unknown Number of Components (with discussion)," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 59(4), pages 731-792.
Vuong, Quang H, 1989. "Likelihood Ratio Tests for Model Selection and Non-nested Hypotheses," Econometrica, Econometric Society, vol. 57(2), pages 307-333, March.
Wichitchan, Supawadee & Yao, Weixin & Yang, Guangren, 2019. "Hypothesis testing for finite mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 180-189.
Chen, Jiahua & Khalili, Abbas, 2009. "Order Selection in Finite Mixture Models With a Nonsmooth Penalty," Journal of the American Statistical Association, American Statistical Association, vol. 104(485), pages 187-196.
G. J. McLachlan, 1987. "On Bootstrapping the Likelihood Ratio Test Statistic for the Number of Components in a Normal Mixture," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 36(3), pages 318-324, November.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Lo, Yungtai, 2005. "Likelihood ratio tests of the number of components in a normal mixture with unequal variances," Statistics & Probability Letters, Elsevier, vol. 71(3), pages 225-235, March.
Derek S. Young & Xi Chen & Dilrukshi C. Hewage & Ricardo Nilo-Poyanco, 2019. "Finite mixture-of-gamma distributions: estimation, inference, and model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 1053-1082, December.
Daniel McNeish & Jeffrey R. Harring, 2017. "The Effect of Model Misspecification on Growth Mixture Model Class Enumeration," Journal of Classification, Springer;The Classification Society, vol. 34(2), pages 223-248, July.
Lo, Yungtai, 2011. "Bias from misspecification of the component variances in a normal mixture," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2739-2747, September.
Roy Levy & Gregory R. Hancock, 2011. "An Extended Model Comparison Framework for Covariance and Mean Structure Models, Accommodating Multiple Groups and Latent Mixtures," Sociological Methods & Research, vol. 40(2), pages 256-278, May.
Polymenis, A. & Titterington, D. M., 1998. "On the determination of the number of components in a mixture," Statistics & Probability Letters, Elsevier, vol. 38(4), pages 295-298, July.
Roberto Zelli & Maria Grazia Pittau, 2006. "Empirical evidence of income dynamics across EU regions," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 21(5), pages 605-628.
- Maria Grazia Pittau & Roberto Zelli, 2006. "Empirical evidence of income dynamics across EU regions," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 21(5), pages 605-628, July.
Cong, Lin & Yao, Weixin, 2021. "A Likelihood Ratio Test of a Homoscedastic Multivariate Normal Mixture Against a Heteroscedastic Multivariate Normal Mixture," Econometrics and Statistics, Elsevier, vol. 18(C), pages 79-88.
Daniel Fernández & Richard Arnold & Shirley Pledger & Ivy Liu & Roy Costilla, 2019. "Finite mixture biclustering of discrete type multivariate data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 117-143, March.
Vaidehi Dixit & Ryan Martin, 2022. "Estimating a Mixing Distribution on the Sphere Using Predictive Recursion," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 84(2), pages 596-626, November.
Arun Gopalakrishnan & Eric T. Bradlow & Peter S. Fader, 2017. "A Cross-Cohort Changepoint Model for Customer-Base Analysis," Marketing Science, INFORMS, vol. 36(2), pages 195-213, March.
Bettina Grün & Gertraud Malsiner-Walli & Sylvia Frühwirth-Schnatter, 2022. "How many data clusters are in the Galaxy data set?," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 325-349, June.
Fabrice Gilles & Sabina Issehnane & Florent Sari, 2022. "Using short-term jobs as a way to find a regular job. What kind of role for local context?," TEPP Working Paper 2022-07, TEPP.
Vipin Arora & Shuping Shi, 2016. "Nonlinearities and tests of asset price bubbles," Empirical Economics, Springer, vol. 50(4), pages 1421-1433, June.
Luiz Paulo Fávero & Joseph F. Hair & Rafael de Freitas Souza & Matheus Albergaria & Talles V. Brugni, 2021. "Zero-Inflated Generalized Linear Mixed Models: A Better Way to Understand Data Relationships," Mathematics, MDPI, vol. 9(10), pages 1-28, May.
Da Fonseca José & Grasselli Martino & Ielpo Florian, 2014. "Estimating the Wishart Affine Stochastic Correlation Model using the empirical characteristic function," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 18(3), pages 253-289, May.
Das, Marcel & van Soest, Arthur, 1999. "A panel data model for subjective information on household income growth," Journal of Economic Behavior & Organization, Elsevier, vol. 40(4), pages 409-426, December.
- Das, J.W.M. & van Soest, A.H.O., 1996. "A Panel Data Model for Subjective Information on Household Income Growth," Discussion Paper 1996-75, Tilburg University, Center for Economic Research.
- Das, J.W.M. & van Soest, A.H.O., 1996. "A Panel Data Model for Subjective Information on Household Income Growth," Other publications TiSEM 9111e0db-3678-40a7-ab7f-1, Tilburg University, School of Economics and Management.
Gillespie, Colin S., 2015. "Fitting Heavy Tailed Distributions: The poweRlaw Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 64(i02).
Luis Garicano & Thomas N. Hubbard, 2016. "The Returns to Knowledge Hierarchies," The Journal of Law, Economics, and Organization, Oxford University Press, vol. 32(4), pages 653-684.
- Thomas Hubbard & Luis Garicano, 2007. "The Return to Knowledge Hierarchies," Working Papers 07-01, Center for Economic Studies, U.S. Census Bureau.
- Luis Garicano & Thomas N. Hubbard, 2007. "The Return to Knowledge Hierarchies," NBER Working Papers 12815, National Bureau of Economic Research, Inc.
- Hubbard, Thomas N. & Garicano, Luis, 2007. "The Return to Knowledge Hierarchies," CEPR Discussion Papers 6077, Centre for Economic Policy Research.
- Garicano, Luis & Hubbard, Thomas N., 2016. "The returns to knowledge hierarchies," LSE Research Online Documents on Economics 68590, London School of Economics and Political Science, LSE Library.
Yen, Steven T. & Chern, Wen S. & Lee, Hwang-Jaw, "undated". "Effects Of Income Sources On Household Food Expenditures," 1991 Annual Meeting, August 4-7, Manhattan, Kansas 271167, American Agricultural Economics Association (New Name 2008: Agricultural and Applied Economics Association).
