
Mixture Models With a Prior on the Number of Components

Authors
  • Jeffrey W. Miller
  • Matthew T. Harrison

Abstract

A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with symmetric Dirichlet weights, and put a prior on the number of components—that is, to use a mixture of finite mixtures (MFM). The most commonly used method of inference for MFMs is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces. Meanwhile, there are samplers for Dirichlet process mixture (DPM) models that are relatively simple and are easily adapted to new applications. It turns out that, in fact, many of the essential properties of DPMs are also exhibited by MFMs—an exchangeable partition distribution, restaurant process, random measure representation, and stick-breaking representation—and crucially, the MFM analogues are simple enough that they can be used much like the corresponding DPM properties. Consequently, many of the powerful methods developed for inference in DPMs can be directly applied to MFMs as well; this simplifies the implementation of MFMs and can substantially improve mixing. We illustrate with real and simulated data, including high-dimensional gene expression data used to discriminate cancer subtypes. Supplementary materials for this article are available online.
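The generative structure described in the abstract (a prior on the number of components, symmetric Dirichlet weights, then component assignments) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name `sample_mfm`, the shifted-Poisson prior on the number of components, the Dirichlet parameter `gamma`, and the univariate Gaussian components with fixed scales are all assumptions chosen for the example and are not specified in the abstract.

```python
import numpy as np


def sample_mfm(n, gamma=1.0, poisson_rate=1.0, seed=0):
    """Draw n observations from a toy mixture of finite mixtures (MFM).

    Assumed for illustration only: K - 1 ~ Poisson(poisson_rate) and
    unit-variance Gaussian components with means drawn from N(0, 5^2).
    """
    rng = np.random.default_rng(seed)

    # Prior on the number of components (hypothetical choice of prior).
    k = 1 + rng.poisson(poisson_rate)

    # Symmetric Dirichlet weights over the k components.
    weights = rng.dirichlet(np.full(k, gamma))

    # Component assignment for each observation.
    z = rng.choice(k, size=n, p=weights)

    # Component parameters and data (hypothetical Gaussian likelihood).
    means = rng.normal(0.0, 5.0, size=k)
    x = rng.normal(means[z], 1.0)

    return k, weights, z, x


k, weights, z, x = sample_mfm(n=50)
# The number of occupied components (clusters) can be smaller than k.
n_clusters = len(np.unique(z))
```

Note that the sketch only simulates from the prior; posterior inference, which is the subject of the article, is not shown.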

Suggested Citation

  • Jeffrey W. Miller & Matthew T. Harrison, 2018. "Mixture Models With a Prior on the Number of Components," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(521), pages 340-356, January.
  • Handle: RePEc:taf:jnlasa:v:113:y:2018:i:521:p:340-356
    DOI: 10.1080/01621459.2016.1255636

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2016.1255636
    Download Restriction: Access to full text is restricted to subscribers.


    Citations



    Cited by:

    1. Betancourt, Brenda & Sosa, Juan & Rodríguez, Abel, 2022. "A prior for record linkage based on allelic partitions," Computational Statistics & Data Analysis, Elsevier, vol. 172(C).
    2. Zaheer Ahmed & Alberto Cassese & Gerard Breukelen & Jan Schepers, 2021. "REMAXINT: a two-mode clustering-based method for statistical inference on two-way interaction," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 987-1013, December.
    3. Zaheer Ahmed & Alberto Cassese & Gerard Breukelen & Jan Schepers, 2023. "E-ReMI: Extended Maximal Interaction Two-mode Clustering," Journal of Classification, Springer;The Classification Society, vol. 40(2), pages 298-331, July.
    4. Jiao, Jieying & Hu, Guanyu & Yan, Jun, 2021. "A Bayesian marked spatial point processes model for basketball shot chart," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 17(2), pages 77-90, June.
    5. Ludkin, Matthew, 2020. "Inference for a generalised stochastic block model with unknown number of blocks and non-conjugate edge models," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    6. Jieying Jiao & Guanyu Hu & Jun Yan, 2021. "Heterogeneity pursuit for spatial point pattern with application to tree locations: A Bayesian semiparametric recourse," Environmetrics, John Wiley & Sons, Ltd., vol. 32(7), November.
    7. Guanyu Hu & Yishu Xue & Zhihua Ma, 2020. "Bayesian Clustered Coefficients Regression with Auxiliary Covariates Assistant Random Effects," Papers 2004.12022, arXiv.org, revised Aug 2021.
    8. Miller, Jeffrey W., 2023. "Consistency of mixture models with a prior on the number of components," Dependence Modeling, De Gruyter, vol. 11(1), pages 1-9, January.
    9. Zhenke Wu & Livia Casciola‐Rosen & Antony Rosen & Scott L. Zeger, 2021. "A Bayesian approach to restricted latent class models for scientifically structured clustering of multivariate binary outcomes," Biometrics, The International Biometric Society, vol. 77(4), pages 1431-1444, December.
    10. Burghardt, Elliot & Sewell, Daniel & Cavanaugh, Joseph, 2022. "Agglomerative and divisive hierarchical Bayesian clustering," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    11. Im, Yunju & Tan, Aixin, 2021. "Bayesian subgroup analysis in regression using mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 162(C).
    12. Bettina Grün & Gertraud Malsiner-Walli & Sylvia Frühwirth-Schnatter, 2022. "How many data clusters are in the Galaxy data set?," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 325-349, June.
    13. Han, Ningren & Ram, Rajeev J., 2020. "Bayesian modeling and computation for analyte quantification in complex mixtures using Raman spectroscopy," Computational Statistics & Data Analysis, Elsevier, vol. 143(C).
    14. Laura D'Angelo & Antonio Canale & Zhaoxia Yu & Michele Guindani, 2023. "Bayesian nonparametric analysis for the detection of spikes in noisy calcium imaging data," Biometrics, The International Biometric Society, vol. 79(2), pages 1370-1382, June.
    15. Sylvia Frühwirth-Schnatter & Gertraud Malsiner-Walli, 2019. "From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 33-64, March.
    16. Buschbom, Jutta, 2018. "Exploring and validating statistical reliability in forensic conservation genetics," Thünen Reports 63, Johann Heinrich von Thünen Institute, Federal Research Institute for Rural Areas, Forestry and Fisheries.
    17. L Schiavon & A Canale & D B Dunson, 2022. "Generalized infinite factorization models," Biometrika, Biometrika Trust, vol. 109(3), pages 817-835.
    18. Grazian, Clara & Villa, Cristiano & Liseo, Brunero, 2020. "On a loss-based prior for the number of components in mixture models," Statistics & Probability Letters, Elsevier, vol. 158(C).
