Nonparametric bagging clustering methods to identify latent structures from a sequence of dependent categorical data

Author

Listed:
  • Abramowicz, Konrad
  • Sjöstedt de Luna, Sara
  • Strandberg, Johan

Abstract

Nonparametric bagging clustering methods are studied and compared for identifying latent structures from a sequence of dependent categorical data observed along a one-dimensional (discrete) time domain. The frequency of the observed categories is assumed to be generated by a (slowly varying) latent signal, according to latent state-specific probability distributions. To recover the latent signal, the bagging clustering methods use random tessellations (partitions) of the time domain and cluster the category frequencies of the observed data within the tessellation cells, all inside a bagging framework. New and existing ways of generating the tessellations and of clustering are discussed and combined into different bagging clustering methods. Edge tessellations and adaptive tessellations are the newly proposed ways of forming partitions. Composite methods are also introduced; these use (automated) decision rules based on entropy measures to choose among the proposed bagging clustering methods. The performance of all the methods is compared in a simulation study. The simulation study shows that local and global entropy measures are powerful tools for improving the recovery of the latent signal, both via the adaptive tessellation strategies (local entropy) and in designing composite methods (global entropy). The composite methods are robust and improve overall performance, in particular the composite method using adaptive (edge) tessellations.
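
The general idea described in the abstract (random tessellations of the time domain, clustering of the cell-wise category frequencies, and aggregation over many replicates) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy simulation, the uniform random breakpoints, the plain k-means step, the label alignment by first-category frequency, and all function names are assumptions made for illustration only. The edge tessellations, adaptive tessellations, and entropy-based composite rules from the paper are not reproduced here.

```python
import random
from collections import Counter

def simulate(rng, block=200):
    """Toy data (assumed): latent signal 0,1,0 in blocks, 3 observed categories."""
    dists = {0: [0.1, 0.1, 0.8], 1: [0.8, 0.1, 0.1]}  # state-specific probabilities
    signal = [0] * block + [1] * block + [0] * block
    seq = [rng.choices(range(3), weights=dists[s])[0] for s in signal]
    return signal, seq

def random_tessellation(n, n_cells, rng):
    """Partition time points 0..n-1 into contiguous cells at random breakpoints."""
    cuts = sorted(rng.sample(range(1, n), n_cells - 1))
    bounds = [0] + cuts + [n]
    return list(zip(bounds[:-1], bounds[1:]))

def cell_frequencies(seq, cells, n_cat):
    """Relative frequency of each category within each cell [a, b)."""
    out = []
    for a, b in cells:
        counts = Counter(seq[a:b])
        out.append([counts[c] / (b - a) for c in range(n_cat)])
    return out

def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    def d2(p, c):
        return sum((x - y) ** 2 for x, y in zip(p, c))
    return [min(range(len(centers)), key=lambda j: d2(p, centers[j])) for p in points]

def kmeans(points, k, rng, iters=20):
    """Plain k-means on frequency vectors (a stand-in for the clustering step)."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers), centers

def bagging_cluster(seq, n_cat, k, n_cells, n_bags, rng):
    """Majority vote over clusterings obtained from repeated random tessellations."""
    n = len(seq)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_bags):
        cells = random_tessellation(n, n_cells, rng)
        labels, centers = kmeans(cell_frequencies(seq, cells, n_cat), k, rng)
        # Assumed label alignment: rank clusters by frequency of category 0,
        # so that labels are comparable across replicates.
        order = sorted(range(k), key=lambda j: centers[j][0])
        rank = {j: r for r, j in enumerate(order)}
        for (a, b), lab in zip(cells, labels):
            for t in range(a, b):
                votes[t][rank[lab]] += 1
    return [v.most_common(1)[0][0] for v in votes]

if __name__ == "__main__":
    rng = random.Random(0)
    signal, seq = simulate(rng)
    est = bagging_cluster(seq, n_cat=3, k=2, n_cells=30, n_bags=50, rng=rng)
    acc = sum(e == s for e, s in zip(est, signal)) / len(signal)
    print(f"share of time points with correctly recovered state: {acc:.3f}")
```

Aligning labels by a single category frequency is only workable in this toy setting, where the two states differ strongly in that category; in general, label switching across replicates needs a more careful matching step.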

Suggested Citation

  • Abramowicz, Konrad & Sjöstedt de Luna, Sara & Strandberg, Johan, 2023. "Nonparametric bagging clustering methods to identify latent structures from a sequence of dependent categorical data," Computational Statistics & Data Analysis, Elsevier, vol. 177(C).
  • Handle: RePEc:eee:csdana:v:177:y:2023:i:c:s0167947322001633
    DOI: 10.1016/j.csda.2022.107583

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322001633
    Download Restriction: Full text for ScienceDirect subscribers only.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kappe, Eelco & Stadler Blank, Ashley & DeSarbo, Wayne S., 2018. "A random coefficients mixture hidden Markov model for marketing research," International Journal of Research in Marketing, Elsevier, vol. 35(3), pages 415-431.
    2. Jiang, Yuanchun & Liu, Yezheng & Shang, Jennifer & Yildirim, Pinar & Zhang, Qingfu, 2018. "Optimizing online recurring promotions for dual-channel retailers: Segmented markets with multiple objectives," European Journal of Operational Research, Elsevier, vol. 267(2), pages 612-627.
    3. Ruslan Ilyasov, 2014. "About the Method of Analysis of Economic Correlations by Differentiation of Spline Models," Modern Applied Science, Canadian Center of Science and Education, vol. 8(5), pages 197-197, October.
    4. Gelper, Sarah & Stremersch, Stefan, 2014. "Variable selection in international diffusion models," International Journal of Research in Marketing, Elsevier, vol. 31(4), pages 356-367.
    5. Amirali Kani & Wayne S. DeSarbo & Duncan K. H. Fong, 2018. "A Factorial Hidden Markov Model for the Analysis of Temporal Change in Choice Models," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 5(3), pages 162-177, December.
    6. Moon, Sangkil & Jalali, Nima & Erevelles, Sunil, 2021. "Segmentation of both reviewers and businesses on social media," Journal of Retailing and Consumer Services, Elsevier, vol. 61(C).
    7. Guhl, Daniel & Baumgartner, Bernhard & Kneib, Thomas & Steiner, Winfried J., 2018. "Estimating time-varying parameters in brand choice models: A semiparametric approach," International Journal of Research in Marketing, Elsevier, vol. 35(3), pages 394-414.
    8. Wan-Lun Wang, 2019. "Mixture of multivariate t nonlinear mixed models for multiple longitudinal data with heterogeneity and missing values," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 196-222, March.
    9. Mark S. Handcock & Adrian E. Raftery & Jeremy M. Tantrum, 2007. "Model‐based clustering for social networks," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(2), pages 301-354, March.
    10. Arman Oganisian & Nandita Mitra & Jason A. Roy, 2021. "A Bayesian nonparametric model for zero‐inflated outcomes: Prediction, clustering, and causal estimation," Biometrics, The International Biometric Society, vol. 77(1), pages 125-135, March.
    11. Yao, Weixin & Wei, Yan & Yu, Chun, 2014. "Robust mixture regression using the t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 116-127.
    12. Rufo, M.J. & Pérez, C.J. & Martín, J., 2009. "Local parametric sensitivity for mixture models of lifetime distributions," Reliability Engineering and System Safety, Elsevier, vol. 94(7), pages 1238-1244.
    13. Jeong Eun Lee & Christian Robert, 2013. "Importance Sampling Schemes for Evidence Approximation in Mixture Models," Working Papers 2013-42, Center for Research in Economics and Statistics.
    14. Aßmann, Christian & Boysen-Hogrefe, Jens & Pape, Markus, 2012. "The directional identification problem in Bayesian factor analysis: An ex-post approach," Kiel Working Papers 1799, Kiel Institute for the World Economy (IfW Kiel).
    15. Sphiwe B. Skhosana & Salomon M. Millard & Frans H. J. Kanfer, 2023. "A Novel EM-Type Algorithm to Estimate Semi-Parametric Mixtures of Partially Linear Models," Mathematics, MDPI, vol. 11(5), pages 1-20, February.
    16. Sun-Joo Cho & Allan S. Cohen, 2010. "A Multilevel Mixture IRT Model With an Application to DIF," Journal of Educational and Behavioral Statistics, vol. 35(3), pages 336-370, June.
    17. Ungolo, Francesco & Kleinow, Torsten & Macdonald, Angus S., 2020. "A hierarchical model for the joint mortality analysis of pension scheme data with missing covariates," Insurance: Mathematics and Economics, Elsevier, vol. 91(C), pages 68-84.
    18. Ioannis Ntzoufras & Claudia Tarantola, 2012. "Conjugate and Conditional Conjugate Bayesian Analysis of Discrete Graphical Models of Marginal Independence," Quaderni di Dipartimento 178, University of Pavia, Department of Economics and Quantitative Methods.
    19. Brian Hartley, 2020. "Corridor stability of the Kaleckian growth model: a Markov-switching approach," Working Papers 2013, New School for Social Research, Department of Economics, revised Nov 2020.
    20. Park, Byung-Jung & Zhang, Yunlong & Lord, Dominique, 2010. "Bayesian mixture modeling approach to account for heterogeneity in speed data," Transportation Research Part B: Methodological, Elsevier, vol. 44(5), pages 662-673, June.
