
Clustering using objective functions and stochastic search

Authors

  • James G. Booth
  • George Casella
  • James P. Hobert

Abstract

Summary. A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share cluster‐specific random effects. The inclusion of cluster‐specific random effects allows parsimonious departure from an assumed base model for cluster mean profiles. This departure is captured statistically via the posterior expectation, or best linear unbiased predictor. One of the parameters in the model is the true underlying partition of the data, and the posterior distribution of this parameter, which is known up to a normalizing constant, is used to cluster the data. The problem of finding partitions with high posterior probability is not amenable to deterministic methods such as the EM algorithm. Thus, we propose a stochastic search algorithm that is driven by a Markov chain that is a mixture of two Metropolis–Hastings algorithms—one that makes small scale changes to individual objects and another that performs large scale moves involving entire clusters. The methodology proposed is fundamentally different from the well‐known finite mixture model approach to clustering, which does not explicitly include the partition as a parameter, and involves an independent and identically distributed structure.
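The search strategy in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and its objective is a stand-in chosen for brevity. The sketch assumes one-dimensional data in which each cluster shares a normal random mean, so members of a cluster are correlated, echoing the random-effects idea above. It also assumes a uniform prior over labelings with at most `k_max` labels. The function names, the variances `sigma2` and `tau2`, and the cluster-level proposal (pool two clusters and reallocate their members at random) are illustrative assumptions; the moves used in the paper may differ.

```python
import math
import random


def cluster_log_marginal(ys, sigma2, tau2):
    """Log marginal density of one cluster's observations, assuming (hypothetically)
    y_i = mu + e_i with a shared random mean mu ~ N(0, tau2) and e_i ~ N(0, sigma2).
    Integrating mu out gives y ~ N(0, sigma2*I + tau2*J), so members are correlated."""
    n = len(ys)
    if n == 0:
        return 0.0  # an empty label contributes nothing
    s = sum(ys)
    ss = sum(y * y for y in ys)
    denom = sigma2 + n * tau2
    # det(sigma2*I + tau2*J) = sigma2**(n - 1) * (sigma2 + n*tau2)
    log_det = (n - 1) * math.log(sigma2) + math.log(denom)
    # y' (sigma2*I + tau2*J)^{-1} y via the Sherman-Morrison formula
    quad = (ss - tau2 * s * s / denom) / sigma2
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)


def log_score(labels, data, k_max, sigma2=1.0, tau2=4.0):
    """Hypothetical objective: log posterior of the partition up to a constant,
    under a uniform prior on labelings with k_max labels."""
    groups = [[] for _ in range(k_max)]
    for y, z in zip(data, labels):
        groups[z].append(y)
    return sum(cluster_log_marginal(g, sigma2, tau2) for g in groups)


def stochastic_search(data, k_max, n_iter=5000, p_small=0.8, seed=0):
    """Run a mixture of two Metropolis-Hastings kernels and return the
    highest-scoring labeling visited, together with its score."""
    if k_max < 2:
        raise ValueError("k_max must be at least 2")
    rng = random.Random(seed)
    labels = [rng.randrange(k_max) for _ in data]
    current = log_score(labels, data, k_max)
    best_labels, best = list(labels), current
    for _ in range(n_iter):
        proposal = list(labels)
        if rng.random() < p_small:
            # Small-scale move: give one object a uniformly chosen label.
            i = rng.randrange(len(data))
            proposal[i] = rng.randrange(k_max)
        else:
            # Large-scale move: pool two clusters and reallocate each pooled
            # object to one of the two labels at random (includes merges and splits).
            a, b = rng.sample(range(k_max), 2)
            for i, z in enumerate(labels):
                if z in (a, b):
                    proposal[i] = rng.choice((a, b))
        new = log_score(proposal, data, k_max)
        # Both proposals are symmetric, so the MH ratio is exp(new - current).
        if new >= current or rng.random() < math.exp(new - current):
            labels, current = proposal, new
            if current > best:
                best_labels, best = list(labels), current
    return best_labels, best
```

For example, `stochastic_search([-5.1, -4.8, -5.3, -4.9, 5.2, 4.7, 5.0, 5.4], k_max=3)` should place the four negative values under one label and the four positive values under another. Because both proposals are symmetric, the Metropolis–Hastings acceptance ratio reduces to the difference in log scores. The chain serves purely as a search device: the best labeling visited is returned, not a posterior summary.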

Suggested Citation

  • James G. Booth & George Casella & James P. Hobert, 2008. "Clustering using objective functions and stochastic search," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 119-139, February.
  • Handle: RePEc:bla:jorssb:v:70:y:2008:i:1:p:119-139
    DOI: 10.1111/j.1467-9868.2007.00629.x

    Download full text from publisher

    File URL: https://doi.org/10.1111/j.1467-9868.2007.00629.x
    Download Restriction: no

    File URL: https://libkey.io/10.1111/j.1467-9868.2007.00629.x?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    References listed on IDEAS

    1. Ruppert, David & Wand, M. P. & Carroll, R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, numbers 9780521780506 and 9780521785167, August.
    2. Hitchcock, David B. & Casella, George & Booth, James G., 2006. "Improved Estimation of Dissimilarities by Presmoothing Functional Data," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 211-222, March.
    3. Heard, Nicholas A. & Holmes, Christopher C. & Stephens, David A., 2006. "A Quantitative Study of Gene Regulation Involved in the Immune Response of Anopheline Mosquitoes: An Application of Bayesian Hierarchical Clustering of Curves," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 18-29, March.
    4. Celeux, Gilles & Govaert, Gerard, 1992. "A classification EM algorithm for clustering and two stochastic versions," Computational Statistics & Data Analysis, Elsevier, vol. 14(3), pages 315-332, October.
    5. Serban, Nicoleta & Wasserman, Larry, 2005. "CATS: Clustering After Transformation and Smoothing," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 990-999, September.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Yu Fei & Rongli Li & Zhouhong Li & Guoqi Qian, 2024. "Clustering Longitudinal Data for Growth Curve Modelling by Gibbs Sampler and Information Criterion," Journal of Classification, Springer;The Classification Society, vol. 41(2), pages 371-401, July.
    2. Xu, Peirong & Peng, Heng & Huang, Tao, 2018. "Unsupervised learning of mixture regression models for longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 125(C), pages 44-56.
    3. Sam Hui & Eric Bradlow, 2012. "Bayesian multi-resolution spatial analysis with applications to marketing," Quantitative Marketing and Economics (QME), Springer, vol. 10(4), pages 419-452, December.
    4. Yongsung Joo & George Casella & James Hobert, 2010. "Bayesian model-based tight clustering for time course data," Computational Statistics, Springer, vol. 25(1), pages 17-38, March.
    5. Nicoleta Serban & Huijing Jiang, 2012. "Multilevel Functional Clustering Analysis," Biometrics, The International Biometric Society, vol. 68(3), pages 805-814, September.
    6. Wan-Lun Wang, 2019. "Mixture of multivariate t nonlinear mixed models for multiple longitudinal data with heterogeneity and missing values," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 196-222, March.
    7. Charlotte Articus & Jan Pablo Burgard, 2014. "A Finite Mixture Fay Herriot-type model for estimating regional rental prices in Germany," Research Papers in Economics 2014-14, University of Trier, Department of Economics.
    8. Chen Sui-Pi & Huang Guan-Hua, 2014. "A Bayesian clustering approach for detecting gene-gene interactions in high-dimensional genotype data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(3), pages 275-297, June.
    9. Wan-Lun Wang & Yu-Chen Yang & Tsung-I Lin, 2024. "Extending finite mixtures of nonlinear mixed-effects models with covariate-dependent mixing weights," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(2), pages 271-307, June.
    10. Francisco H. C. Alencar & Larissa A Matos & Víctor H. Lachos, 2022. "Finite Mixture of Censored Linear Mixed Models for Irregularly Observed Longitudinal Data," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 463-486, November.
    11. Yang, Yu-Chen & Lin, Tsung-I & Castro, Luis M. & Wang, Wan-Lun, 2020. "Extending finite mixtures of t linear mixed-effects models with concomitant covariates," Computational Statistics & Data Analysis, Elsevier, vol. 148(C).
    12. Jessie J Hsu & Dianne M Finkelstein & David A Schoenfeld, 2015. "Outcome-Driven Cluster Analysis with Application to Microarray Data," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-15, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Huaihou Chen & Philip T. Reiss & Thaddeus Tarpey, 2014. "Optimally weighted L-super-2 distance for functional data," Biometrics, The International Biometric Society, vol. 70(3), pages 516-525, September.
    2. Otto-Sobotka, Fabian & Salvati, Nicola & Ranalli, Maria Giovanna & Kneib, Thomas, 2019. "Adaptive semiparametric M-quantile regression," Econometrics and Statistics, Elsevier, vol. 11(C), pages 116-129.
    3. Mestekemper, Thomas & Windmann, Michael & Kauermann, Göran, 2010. "Functional hourly forecasting of water temperature," International Journal of Forecasting, Elsevier, vol. 26(4), pages 684-699, October.
    4. Naschold, Felix, 2012. "“The Poor Stay Poor”: Household Asset Poverty Traps in Rural Semi-Arid India," World Development, Elsevier, vol. 40(10), pages 2033-2043.
    5. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Econométrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
    6. Hyunju Son & Youyi Fong, 2021. "Fast grid search and bootstrap‐based inference for continuous two‐phase polynomial regression models," Environmetrics, John Wiley & Sons, Ltd., vol. 32(3), May.
    7. Welham, S.J. & Thompson, R., 2009. "A note on bimodality in the log-likelihood function for penalized spline mixed models," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 920-931, February.
    8. Longhi, Christian & Musolesi, Antonio & Baumont, Catherine, 2014. "Modeling structural change in the European metropolitan areas during the process of economic integration," Economic Modelling, Elsevier, vol. 37(C), pages 395-407.
    9. Kuhlenkasper, Torben & Kauermann, Göran, 2010. "Female wage profiles: An additive mixed model approach to employment breaks due to childcare," HWWI Research Papers 2-18, Hamburg Institute of International Economics (HWWI).
    10. Strasak, Alexander M. & Umlauf, Nikolaus & Pfeiffer, Ruth M. & Lang, Stefan, 2011. "Comparing penalized splines and fractional polynomials for flexible modelling of the effects of continuous predictor variables," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1540-1551, April.
    11. Christian Schluter & Jackline Wahba, 2012. "Abstract: Illegal Migration, Wages, and Remittances: Semi-Parametric Estimation of Illegality Effects," Norface Discussion Paper Series 2012037, Norface Research Programme on Migration, Department of Economics, University College London.
    12. Zi Ye & Giles Hooker & Stephen P. Ellner, 2021. "Generalized Single Index Models and Jensen Effects on Reproduction and Survival," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(3), pages 492-512, September.
    13. Esra Kürüm & Danh V. Nguyen & Qi Qian & Sudipto Banerjee & Connie M. Rhee & Damla Şentürk, 2024. "Spatiotemporal multilevel joint modeling of longitudinal and survival outcomes in end-stage kidney disease," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 30(4), pages 827-852, October.
    14. Daniel Edinam Wormenor & Sampson Twumasi-Ankrah & Accam Burnett Tetteh, 2025. "Comparison of Semiparametric Models in the Presence of Noise and Outliers," Journal of Applied Mathematics, John Wiley & Sons, vol. 2025(1).
    15. Ferraccioli, Federico & Sangalli, Laura M. & Finos, Livio, 2022. "Some first inferential tools for spatial regression with differential regularization," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    16. Vahid Goodarzi Vanani & Davood Shahsavani & Mohammad Kazemi, 2025. "A robust partial linear model combining modified Huber loss function and variable selection," Statistical Papers, Springer, vol. 66(6), pages 1-28, October.
    17. Blöchl, Andreas, 2014. "Trend Estimation with Penalized Splines as Mixed Models for Series with Structural Breaks," Discussion Papers in Economics 18446, University of Munich, Department of Economics.
    18. Akdeniz Duran, Esra & Härdle, Wolfgang Karl & Osipenko, Maria, 2012. "Difference based ridge and Liu type estimators in semiparametric regression models," Journal of Multivariate Analysis, Elsevier, vol. 105(1), pages 164-175.
    19. Jullion, Astrid & Lambert, Philippe, 2007. "Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models," Computational Statistics & Data Analysis, Elsevier, vol. 51(5), pages 2542-2558, February.
    20. Skaug, Hans J. & Fournier, David A., 2006. "Automatic approximation of the marginal likelihood in non-Gaussian hierarchical models," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 699-709, November.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.