
Clustering using objective functions and stochastic search

Author

Listed:
  • James G. Booth
  • George Casella
  • James P. Hobert

Abstract

Summary. A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share cluster‐specific random effects. The inclusion of cluster‐specific random effects allows parsimonious departure from an assumed base model for cluster mean profiles. This departure is captured statistically via the posterior expectation, or best linear unbiased predictor. One of the parameters in the model is the true underlying partition of the data, and the posterior distribution of this parameter, which is known up to a normalizing constant, is used to cluster the data. The problem of finding partitions with high posterior probability is not amenable to deterministic methods such as the EM algorithm. Thus, we propose a stochastic search algorithm that is driven by a Markov chain that is a mixture of two Metropolis–Hastings algorithms—one that makes small scale changes to individual objects and another that performs large scale moves involving entire clusters. The methodology proposed is fundamentally different from the well‐known finite mixture model approach to clustering, which does not explicitly include the partition as a parameter, and involves an independent and identically distributed structure.
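The stochastic search described in the abstract, a chain that mixes small moves of individual objects with large moves involving entire clusters, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `log_score` is a stand-in objective (negative within-cluster sum of squares on one-dimensional data) rather than the log posterior from the mixed model, the number of labels `k` is fixed at 2 or more, and the names `propose_small`, `propose_large` and `stochastic_search` are hypothetical. Both proposals are built to be symmetric over label vectors, so a plain Metropolis acceptance rule applies.

```python
import math
import random


def log_score(points, labels):
    """Stand-in objective (assumption): negative within-cluster sum of squares."""
    groups = {}
    for x, z in zip(points, labels):
        groups.setdefault(z, []).append(x)
    total = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        total += sum((x - mean) ** 2 for x in members)
    return -total


def propose_small(labels, k, rng):
    """Small-scale move: send one random object to a different random label."""
    new = list(labels)
    i = rng.randrange(len(new))
    new[i] = rng.choice([c for c in range(k) if c != new[i]])
    return new


def propose_large(labels, k, rng):
    """Large-scale move: pick two labels and reshuffle all of their members
    between them with fair coin flips (this can merge or split clusters)."""
    a, b = rng.sample(range(k), 2)
    return [rng.choice((a, b)) if z in (a, b) else z for z in labels]


def stochastic_search(points, k, n_iter=5000, p_small=0.8, seed=0):
    """Run the mixture chain and return the best-scoring labelling visited."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    current = log_score(points, labels)
    best, best_score = list(labels), current
    for _ in range(n_iter):
        # Mixture of two kernels: small move with probability p_small.
        move = propose_small if rng.random() < p_small else propose_large
        cand = move(labels, k, rng)
        cand_score = log_score(points, cand)
        # Symmetric proposals: accept with probability min(1, exp(difference)).
        if cand_score >= current or rng.random() < math.exp(cand_score - current):
            labels, current = cand, cand_score
            if current > best_score:
                best, best_score = list(labels), current
    return best, best_score
```

Calling `stochastic_search([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], k=2)` returns a label list and its score. The search tracks the best state visited rather than summarising the chain, which matches the stated goal of finding partitions with high posterior probability; substituting a real log posterior for `log_score` would be the first change to make.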

Suggested Citation

  • James G. Booth & George Casella & James P. Hobert, 2008. "Clustering using objective functions and stochastic search," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 119-139, February.
  • Handle: RePEc:bla:jorssb:v:70:y:2008:i:1:p:119-139
    DOI: 10.1111/j.1467-9868.2007.00629.x

    Download full text from publisher

    File URL: https://doi.org/10.1111/j.1467-9868.2007.00629.x
    Download Restriction: no


    References listed on IDEAS

    1. Hitchcock, David B. & Casella, George & Booth, James G., 2006. "Improved Estimation of Dissimilarities by Presmoothing Functional Data," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 211-222, March.
    2. Ruppert, David & Wand, M. P. & Carroll, R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521785167.
    3. Heard, Nicholas A. & Holmes, Christopher C. & Stephens, David A., 2006. "A Quantitative Study of Gene Regulation Involved in the Immune Response of Anopheline Mosquitoes: An Application of Bayesian Hierarchical Clustering of Curves," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 18-29, March.
    4. Celeux, Gilles & Govaert, Gerard, 1992. "A classification EM algorithm for clustering and two stochastic versions," Computational Statistics & Data Analysis, Elsevier, vol. 14(3), pages 315-332, October.
    5. Ruppert, David & Wand, M. P. & Carroll, R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521780506.
    6. Serban, Nicoleta & Wasserman, Larry, 2005. "CATS: Clustering After Transformation and Smoothing," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 990-999, September.

    Citations

    Cited by:

    1. Yongsung Joo & George Casella & James Hobert, 2010. "Bayesian model-based tight clustering for time course data," Computational Statistics, Springer, vol. 25(1), pages 17-38, March.
    2. Nicoleta Serban & Huijing Jiang, 2012. "Multilevel Functional Clustering Analysis," Biometrics, The International Biometric Society, vol. 68(3), pages 805-814, September.
    3. Wan-Lun Wang, 2019. "Mixture of multivariate t nonlinear mixed models for multiple longitudinal data with heterogeneity and missing values," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 196-222, March.
    4. Chen Sui-Pi & Huang Guan-Hua, 2014. "A Bayesian clustering approach for detecting gene-gene interactions in high-dimensional genotype data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(3), pages 1-23, June.
    5. Jessie J Hsu & Dianne M Finkelstein & David A Schoenfeld, 2015. "Outcome-Driven Cluster Analysis with Application to Microarray Data," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-15, November.
    6. Xu, Peirong & Peng, Heng & Huang, Tao, 2018. "Unsupervised learning of mixture regression models for longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 125(C), pages 44-56.
    7. Sam Hui & Eric Bradlow, 2012. "Bayesian multi-resolution spatial analysis with applications to marketing," Quantitative Marketing and Economics (QME), Springer, vol. 10(4), pages 419-452, December.
    8. Charlotte Articus & Jan Pablo Burgard, 2014. "A Finite Mixture Fay Herriot-type model for estimating regional rental prices in Germany," Research Papers in Economics 2014-14, University of Trier, Department of Economics.
    9. Francisco H. C. Alencar & Larissa A Matos & Víctor H. Lachos, 2022. "Finite Mixture of Censored Linear Mixed Models for Irregularly Observed Longitudinal Data," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 463-486, November.
    10. Yang, Yu-Chen & Lin, Tsung-I & Castro, Luis M. & Wang, Wan-Lun, 2020. "Extending finite mixtures of t linear mixed-effects models with concomitant covariates," Computational Statistics & Data Analysis, Elsevier, vol. 148(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Huaihou Chen & Philip T. Reiss & Thaddeus Tarpey, 2014. "Optimally weighted L² distance for functional data," Biometrics, The International Biometric Society, vol. 70(3), pages 516-525, September.
    2. Zanin, Luca & Marra, Giampiero, 2012. "Assessing the functional relationship between CO2 emissions and economic development using an additive mixed model approach," Economic Modelling, Elsevier, vol. 29(4), pages 1328-1337.
    3. Ni, Xiao & Zhang, Hao Helen & Zhang, Daowen, 2009. "Automatic model selection for partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 100(9), pages 2100-2111, October.
    4. Proietti, Tommaso, 2010. "Trend Estimation," MPRA Paper 21607, University Library of Munich, Germany.
    5. Otto-Sobotka, Fabian & Salvati, Nicola & Ranalli, Maria Giovanna & Kneib, Thomas, 2019. "Adaptive semiparametric M-quantile regression," Econometrics and Statistics, Elsevier, vol. 11(C), pages 116-129.
    6. Javier Parada Gómez Urquiza & Alejandro López-Feldman, 2013. "Poverty dynamics in rural Mexico: What does the future hold?," Ensayos Revista de Economia, Universidad Autonoma de Nuevo Leon, Facultad de Economia, vol. 0(2), pages 55-74, November.
    7. Bethany Everett & David Rehkopf & Richard Rogers, 2013. "The Nonlinear Relationship Between Education and Mortality: An Examination of Cohort, Race/Ethnic, and Gender Differences," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 32(6), pages 893-917, December.
    8. Tatiyana V. Apanasovich & David Ruppert & Joanne R. Lupton & Natasa Popovic & Nancy D. Turner & Robert S. Chapkin & Raymond J. Carroll, 2008. "Aberrant Crypt Foci and Semiparametric Modeling of Correlated Binary Data," Biometrics, The International Biometric Society, vol. 64(2), pages 490-500, June.
    9. Eduardo L. Montoya & Wendy Meiring, 2016. "An F-type test for detecting departure from monotonicity in a functional linear model," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 28(2), pages 322-337, June.
    10. Yu, Jun, 2012. "A semiparametric stochastic volatility model," Journal of Econometrics, Elsevier, vol. 167(2), pages 473-482.
    11. Timothy K.M. Beatty & Erling Røed Larsen, 2005. "Using Engel curves to estimate bias in the Canadian CPI as a cost of living index," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 38(2), pages 482-499, May.
    12. Mestekemper, Thomas & Windmann, Michael & Kauermann, Göran, 2010. "Functional hourly forecasting of water temperature," International Journal of Forecasting, Elsevier, vol. 26(4), pages 684-699, October.
    13. Naschold, Felix, 2012. "“The Poor Stay Poor”: Household Asset Poverty Traps in Rural Semi-Arid India," World Development, Elsevier, vol. 40(10), pages 2033-2043.
    14. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Econométrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
    15. Jaroslaw Harezlak & Louise M. Ryan & Jay N. Giedd & Nicholas Lange, 2005. "Individual and Population Penalized Regression Splines for Accelerated Longitudinal Designs," Biometrics, The International Biometric Society, vol. 61(4), pages 1037-1048, December.
    16. Hyunju Son & Youyi Fong, 2021. "Fast grid search and bootstrap‐based inference for continuous two‐phase polynomial regression models," Environmetrics, John Wiley & Sons, Ltd., vol. 32(3), May.
    17. Welham, S.J. & Thompson, R., 2009. "A note on bimodality in the log-likelihood function for penalized spline mixed models," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 920-931, February.
    18. Michael Wegener & Göran Kauermann, 2017. "Forecasting in nonlinear univariate time series using penalized splines," Statistical Papers, Springer, vol. 58(3), pages 557-576, September.
    19. Dlugosz, Stephan & Mammen, Enno & Wilke, Ralf A., 2017. "Generalized partially linear regression with misclassified data and an application to labour market transitions," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 145-159.
    20. Bernhard Baumgartner & Daniel Guhl & Thomas Kneib & Winfried J. Steiner, 2018. "Flexible estimation of time-varying effects for frequently purchased retail goods: a modeling approach based on household panel data," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 40(4), pages 837-873, October.
