Printed from https://ideas.repec.org/a/eee/csdana/v177y2023ics0167947322001633.html

Nonparametric bagging clustering methods to identify latent structures from a sequence of dependent categorical data

Authors

  • Abramowicz, Konrad
  • Sjöstedt de Luna, Sara
  • Strandberg, Johan

Abstract

Nonparametric bagging clustering methods are studied and compared for identifying latent structures in a sequence of dependent categorical data observed along a one-dimensional (discrete) time domain. The frequencies of the observed categories are assumed to be generated by a (slowly varying) latent signal, according to latent state-specific probability distributions. Within a bagging framework, the methods recover the latent signal by drawing random tessellations (partitions) of the time domain and clustering the category frequencies of the observed data in the tessellation cells. New and existing ways of generating the tessellations and of clustering are discussed and combined into different bagging clustering methods. Edge tessellations and adaptive tessellations are the newly proposed ways of forming partitions. Composite methods are also introduced that use (automated) decision rules based on entropy measures to choose among the proposed bagging clustering methods. The performance of all the methods is compared in a simulation study, which shows that local and global entropy measures are powerful tools for improving the recovery of the latent signal, both in the adaptive tessellation strategies (local entropy) and in the design of composite methods (global entropy). The composite methods are robust and improve overall performance, in particular the composite method using adaptive (edge) tessellations.
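The general bagging loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions: one categorical observation per time point; plain random Voronoi tessellations (nuclei drawn uniformly among the time points) instead of the paper's edge or adaptive tessellations; a simple k-means on the cell frequency vectors; label switching resolved by matching each replicate to the votes accumulated so far; and a majority vote per time point. Here "local entropy" is taken to be the Shannon entropy of the vote distribution at each time point and "global entropy" its average over time; the paper's exact entropy definitions may differ. All function names (`voronoi_cells`, `cell_frequencies`, `kmeans`, `bagging_voronoi`) are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import permutations


def sq_dist(p, q):
    """Squared Euclidean distance between two frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def voronoi_cells(n_points, n_nuclei, rng):
    """Random 1-D Voronoi tessellation of the time indices 0..n_points-1.

    Nuclei are distinct time indices drawn uniformly; each index joins its
    nearest nucleus, so every cell is a non-empty contiguous interval.
    """
    nuclei = sorted(rng.sample(range(n_points), n_nuclei))
    return [min(range(n_nuclei), key=lambda j: abs(t - nuclei[j]))
            for t in range(n_points)]


def cell_frequencies(seq, cells, n_cells, n_categories):
    """Relative frequency of each category inside each cell."""
    counts = [[0] * n_categories for _ in range(n_cells)]
    for category, cell in zip(seq, cells):
        counts[cell][category] += 1
    return [[c / sum(row) for c in row] for row in counts]


def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd k-means with farthest-point initialisation (an assumption)."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def entropy(votes, total):
    """Shannon entropy (nats) of one time point's label-vote distribution."""
    return -sum((c / total) * math.log(c / total) for c in votes.values() if c)


def bagging_voronoi(seq, n_categories, k, n_nuclei, n_boot=30, seed=0):
    """Hypothetical bagging clustering of a categorical sequence.

    Returns (majority labels, local entropy per time point, global entropy).
    """
    rng = random.Random(seed)
    n = len(seq)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_boot):
        cells = voronoi_cells(n, n_nuclei, rng)
        freqs = cell_frequencies(seq, cells, n_nuclei, n_categories)
        cell_labels = kmeans(freqs, k, rng)
        labels = [cell_labels[c] for c in cells]
        # Label switching: relabel this replicate to agree as much as
        # possible with the votes accumulated so far (brute force over k!).
        perm = max(permutations(range(k)),
                   key=lambda pm: sum(votes[t][pm[lab]] for t, lab in enumerate(labels)))
        for t, lab in enumerate(labels):
            votes[t][perm[lab]] += 1
    final = [v.most_common(1)[0][0] for v in votes]
    local = [entropy(v, n_boot) for v in votes]
    return final, local, sum(local) / n
```

For a toy sequence whose latent state switches once, e.g. `seq = [0] * 100 + [2] * 100`, calling `bagging_voronoi(seq, n_categories=3, k=2, n_nuclei=10)` should recover two contiguous label blocks, with nonzero local entropy concentrated around the change point at t = 100. Averaging over many random partitions is what smooths out the arbitrary placement of any single tessellation's cell boundaries.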

Suggested Citation

  • Abramowicz, Konrad & Sjöstedt de Luna, Sara & Strandberg, Johan, 2023. "Nonparametric bagging clustering methods to identify latent structures from a sequence of dependent categorical data," Computational Statistics & Data Analysis, Elsevier, vol. 177(C).
  • Handle: RePEc:eee:csdana:v:177:y:2023:i:c:s0167947322001633
    DOI: 10.1016/j.csda.2022.107583

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322001633
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2022.107583?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Menafoglio, Alessandra & Secchi, Piercesare, 2017. "Statistical analysis of complex and spatially dependent data: A review of Object Oriented Spatial Statistics," European Journal of Operational Research, Elsevier, vol. 258(2), pages 401-410.
    2. Stephens, Matthew, 2000. "Dealing with label switching in mixture models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 62(4), pages 795-809.
    3. Lemmens, A. & Croux, C. & Stremersch, S., 2012. "Dynamics in international market segmentation of new product growth," Other publications TiSEM 306086bd-670f-48d2-97d1-3, Tilburg University, School of Economics and Management.
    4. Lemmens, Aurélie & Croux, Christophe & Stremersch, Stefan, 2012. "Dynamics in the international market segmentation of new product growth," International Journal of Research in Marketing, Elsevier, vol. 29(1), pages 81-92.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jiang, Yuanchun & Liu, Yezheng & Shang, Jennifer & Yildirim, Pinar & Zhang, Qingfu, 2018. "Optimizing online recurring promotions for dual-channel retailers: Segmented markets with multiple objectives," European Journal of Operational Research, Elsevier, vol. 267(2), pages 612-627.
    2. Gelper, Sarah & Stremersch, Stefan, 2014. "Variable selection in international diffusion models," International Journal of Research in Marketing, Elsevier, vol. 31(4), pages 356-367.
    3. Moon, Sangkil & Jalali, Nima & Erevelles, Sunil, 2021. "Segmentation of both reviewers and businesses on social media," Journal of Retailing and Consumer Services, Elsevier, vol. 61(C).
    4. Kappe, Eelco & Stadler Blank, Ashley & DeSarbo, Wayne S., 2018. "A random coefficients mixture hidden Markov model for marketing research," International Journal of Research in Marketing, Elsevier, vol. 35(3), pages 415-431.
    5. Ruslan Ilyasov, 2014. "About the Method of Analysis of Economic Correlations by Differentiation of Spline Models," Modern Applied Science, Canadian Center of Science and Education, vol. 8(5), pages 197-197, October.
    6. Amirali Kani & Wayne S. DeSarbo & Duncan K. H. Fong, 2018. "A Factorial Hidden Markov Model for the Analysis of Temporal Change in Choice Models," Customer Needs and Solutions, Springer; Institute for Sustainable Innovation and Growth (iSIG), vol. 5(3), pages 162-177, December.
    7. Guhl, Daniel & Baumgartner, Bernhard & Kneib, Thomas & Steiner, Winfried J., 2018. "Estimating time-varying parameters in brand choice models: A semiparametric approach," International Journal of Research in Marketing, Elsevier, vol. 35(3), pages 394-414.
    8. Yao, Weixin & Wei, Yan & Yu, Chun, 2014. "Robust mixture regression using the t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 116-127.
    9. Jeong Eun Lee & Christian Robert, 2013. "Importance Sampling Schemes for Evidence Approximation in Mixture Models," Working Papers 2013-42, Center for Research in Economics and Statistics.
    10. Aßmann, Christian & Boysen-Hogrefe, Jens & Pape, Markus, 2012. "The directional identification problem in Bayesian factor analysis: An ex-post approach," Kiel Working Papers 1799, Kiel Institute for the World Economy (IfW Kiel).
    11. Sun-Joo Cho & Allan S. Cohen, 2010. "A Multilevel Mixture IRT Model With an Application to DIF," Journal of Educational and Behavioral Statistics, vol. 35(3), pages 336-370, June.
    12. Brian Hartley, 2020. "Corridor stability of the Kaleckian growth model: a Markov-switching approach," Working Papers 2013, New School for Social Research, Department of Economics, revised Nov 2020.
    13. Papastamoulis, Panagiotis, 2018. "Overfitting Bayesian mixtures of factor analyzers with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 220-234.
    14. Simen Alexander Linge Johnsen & Jörg Bollmann, 2020. "Coccolith mass and morphology of different Emiliania huxleyi morphotypes: A critical examination using Canary Islands material," PLOS ONE, Public Library of Science, vol. 15(3), pages 1-29, March.
    15. Nichole E. Carlson & Timothy D. Johnson & Morton B. Brown, 2009. "A Bayesian Approach to Modeling Associations Between Pulsatile Hormones," Biometrics, The International Biometric Society, vol. 65(2), pages 650-659, June.
    16. Montanari, Angela & Viroli, Cinzia, 2011. "Maximum likelihood estimation of mixtures of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2712-2723, September.
    17. Stéphane Bonhomme & Koen Jochmans & Jean-Marc Robin, 2016. "Non-parametric estimation of finite mixtures from repeated measurements," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(1), pages 211-229, January.
    18. Xue, Jiacheng & Yao, Weixin, 2022. "Machine Learning Embedded Semiparametric Mixtures of Regressions with Covariate-Varying Mixing Proportions," Econometrics and Statistics, Elsevier, vol. 22(C), pages 159-171.
    19. Liqun Wang & James Fu, 2007. "A practical sampling approach for a Bayesian mixture model with unknown number of components," Statistical Papers, Springer, vol. 48(4), pages 631-653, October.
    20. Royce Anders & William Batchelder, 2015. "Cultural Consensus Theory for the Ordinal Data Case," Psychometrika, Springer; The Psychometric Society, vol. 80(1), pages 151-181, March.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:177:y:2023:i:c:s0167947322001633. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/csda

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.