Printed from https://ideas.repec.org/a/eee/stapro/v226y2025ics0167715225001543.html

Confidence set for mixture order selection

Author

Listed:
  • Casa, Alessandro
  • Ferrari, Davide

Abstract

A fundamental challenge in approximating an unknown density using finite Gaussian mixture models is selecting the number of mixture components, also known as the order. Traditional approaches choose a single best model using information criteria. However, models with different orders often yield similar fits, leading to substantial model selection uncertainty and making it challenging to identify the optimal number of components. In this paper, we introduce the Model Selection Confidence Set (MSCS) for order selection in Gaussian mixtures – a set-valued estimator that, with a predefined confidence level, includes the true mixture order across repeated samples. Rather than selecting a single model, our MSCS identifies all plausible orders by determining whether each candidate model is at least as plausible as the best-selected one, using a screening based on a penalized likelihood ratio statistic. We provide theoretical guarantees for asymptotic coverage, and demonstrate its practical advantages through simulations and real data analysis.
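
The screening idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes univariate Gaussian mixtures, uses the BIC difference as a stand-in penalized likelihood ratio statistic, and takes the cutoff `threshold` as a user-supplied number, whereas the paper calibrates its screening to reach a stated confidence level. The function names and the log-likelihood values are hypothetical and chosen only for illustration.

```python
import math


def n_params_univariate_gmm(k):
    # Assumption: univariate mixture with k means, k variances
    # and k - 1 free mixing weights.
    return 3 * k - 1


def bic(loglik, k, n):
    # Penalized fit: -2 * maximized log-likelihood plus a complexity penalty.
    # Lower values indicate a better trade-off.
    return -2.0 * loglik + n_params_univariate_gmm(k) * math.log(n)


def screen_orders(logliks, n, threshold):
    """Return (best_order, retained_orders).

    logliks maps each candidate order k to its maximized log-likelihood.
    An order is retained when its penalized statistic exceeds that of the
    best-selected order by at most `threshold` (an assumed cutoff, not the
    paper's calibrated critical value).
    """
    scores = {k: bic(ll, k, n) for k, ll in logliks.items()}
    best = min(scores, key=scores.get)
    retained = sorted(
        k for k, s in scores.items() if s - scores[best] <= threshold
    )
    return best, retained


# Made-up maximized log-likelihoods for orders 1 to 4 on n = 200 points.
logliks = {1: -520.0, 2: -480.0, 3: -476.0, 4: -475.5}
best, retained = screen_orders(logliks, n=200, threshold=10.0)
print(best, retained)
```

With these illustrative inputs, order 2 is the single best model, while order 3 fits almost as well and stays in the set. This is the situation the abstract describes, where reporting one order alone would hide the selection uncertainty.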

Suggested Citation

  • Casa, Alessandro & Ferrari, Davide, 2025. "Confidence set for mixture order selection," Statistics & Probability Letters, Elsevier, vol. 226(C).
  • Handle: RePEc:eee:stapro:v:226:y:2025:i:c:s0167715225001543
    DOI: 10.1016/j.spl.2025.110509

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167715225001543
    Download Restriction: Full text for ScienceDirect subscribers only

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lo, Yungtai, 2005. "Likelihood ratio tests of the number of components in a normal mixture with unequal variances," Statistics & Probability Letters, Elsevier, vol. 71(3), pages 225-235, March.
    2. Derek S. Young & Xi Chen & Dilrukshi C. Hewage & Ricardo Nilo-Poyanco, 2019. "Finite mixture-of-gamma distributions: estimation, inference, and model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 1053-1082, December.
    3. Daniel McNeish & Jeffrey R. Harring, 2017. "The Effect of Model Misspecification on Growth Mixture Model Class Enumeration," Journal of Classification, Springer;The Classification Society, vol. 34(2), pages 223-248, July.
    4. Lo, Yungtai, 2011. "Bias from misspecification of the component variances in a normal mixture," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2739-2747, September.
    5. Roy Levy & Gregory R. Hancock, 2011. "An Extended Model Comparison Framework for Covariance and Mean Structure Models, Accommodating Multiple Groups and Latent Mixtures," Sociological Methods & Research, vol. 40(2), pages 256-278, May.
    6. Polymenis, A. & Titterington, D. M., 1998. "On the determination of the number of components in a mixture," Statistics & Probability Letters, Elsevier, vol. 38(4), pages 295-298, July.
    7. Roberto Zelli & Maria Grazia Pittau, 2006. "Empirical evidence of income dynamics across EU regions," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 21(5), pages 605-628.
    8. Cong, Lin & Yao, Weixin, 2021. "A Likelihood Ratio Test of a Homoscedastic Multivariate Normal Mixture Against a Heteroscedastic Multivariate Normal Mixture," Econometrics and Statistics, Elsevier, vol. 18(C), pages 79-88.
    9. Daniel Fernández & Richard Arnold & Shirley Pledger & Ivy Liu & Roy Costilla, 2019. "Finite mixture biclustering of discrete type multivariate data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 117-143, March.
    10. Vaidehi Dixit & Ryan Martin, 2022. "Estimating a Mixing Distribution on the Sphere Using Predictive Recursion," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 84(2), pages 596-626, November.
    11. Arun Gopalakrishnan & Eric T. Bradlow & Peter S. Fader, 2017. "A Cross-Cohort Changepoint Model for Customer-Base Analysis," Marketing Science, INFORMS, vol. 36(2), pages 195-213, March.
    12. Bettina Grün & Gertraud Malsiner-Walli & Sylvia Frühwirth-Schnatter, 2022. "How many data clusters are in the Galaxy data set?," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 325-349, June.
    13. Fabrice Gilles & Sabina Issehnane & Florent Sari, 2022. "Using short-term jobs as a way to find a regular job. What kind of role for local context?," TEPP Working Paper 2022-07, TEPP.
    14. Vipin Arora & Shuping Shi, 2016. "Nonlinearities and tests of asset price bubbles," Empirical Economics, Springer, vol. 50(4), pages 1421-1433, June.
    15. Luiz Paulo Fávero & Joseph F. Hair & Rafael de Freitas Souza & Matheus Albergaria & Talles V. Brugni, 2021. "Zero-Inflated Generalized Linear Mixed Models: A Better Way to Understand Data Relationships," Mathematics, MDPI, vol. 9(10), pages 1-28, May.
    16. Da Fonseca José & Grasselli Martino & Ielpo Florian, 2014. "Estimating the Wishart Affine Stochastic Correlation Model using the empirical characteristic function," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 18(3), pages 253-289, May.
    17. Das, Marcel & van Soest, Arthur, 1999. "A panel data model for subjective information on household income growth," Journal of Economic Behavior & Organization, Elsevier, vol. 40(4), pages 409-426, December.
    18. Gillespie, Colin S., 2015. "Fitting Heavy Tailed Distributions: The poweRlaw Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 64(i02).
    19. Luis Garicano & Thomas N. Hubbard, 2016. "The Returns to Knowledge Hierarchies," The Journal of Law, Economics, and Organization, Oxford University Press, vol. 32(4), pages 653-684.
    20. Yen, Steven T. & Chern, Wen S. & Lee, Hwang-Jaw, "undated". "Effects Of Income Sources On Household Food Expenditures," 1991 Annual Meeting, August 4-7, Manhattan, Kansas 271167, American Agricultural Economics Association (New Name 2008: Agricultural and Applied Economics Association).
