Printed from https://ideas.repec.org/a/bla/jorssc/v70y2021i3p714-732.html

Clustering and automatic labelling within time series of categorical observations—with an application to marine log messages

Authors

  • Emanuele Gramuglia
  • Geir Storvik
  • Morten Stakkeland

Abstract

System logs or log files containing textual messages with associated time stamps are generated by many technologies and systems. The clustering technique proposed in this paper provides a tool to discover and identify patterns or macrolevel events in this data. The motivating application is logs generated by frequency converters in the propulsion system on a ship, while the general setting is fault identification and classification in complex industrial systems. The paper introduces an offline approach for dividing a time series of log messages into a series of discrete segments of random lengths. These segments are clustered into a limited set of states. A state is assumed to correspond to a specific operation or condition of the system, and can be a fault mode or a normal operation. Each of the states can be associated with a specific, limited set of messages, where messages appear in a random or semi‐structured order within the segments. These structures are in general not defined a priori. We propose a Bayesian hierarchical model where the states are characterised both by the temporal frequency and the type of messages within each segment. An algorithm for inference based on reversible jump MCMC is proposed. The performance of the method is assessed by both simulations and operational data.
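
The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' model: it assumes exponentially distributed segment lengths, a homogeneous Poisson process for message times within a segment, and an independent categorical draw for each message type. The function name `simulate_log`, the state names, the message names, and all rates and probabilities are hypothetical. Inference (the reversible jump MCMC step) is not shown.

```python
import random


def simulate_log(states, n_segments, mean_length, seed=0):
    """Simulate a segmented log of (timestamp, message) pairs.

    states: dict mapping a state name to (rate, {message: probability}),
            where rate is the expected number of messages per time unit.
    Returns (segments, log); segments is a list of (start, end, state).
    """
    rng = random.Random(seed)
    names = sorted(states)
    segments, log = [], []
    t_start = 0.0
    for _ in range(n_segments):
        # Assumption: each segment's state is drawn uniformly at random.
        state = rng.choice(names)
        rate, msg_probs = states[state]
        # Assumption: segment lengths are exponential with the given mean.
        t_end = t_start + rng.expovariate(1.0 / mean_length)
        segments.append((t_start, t_end, state))
        # Poisson process inside the segment: exponential waiting times.
        t = t_start + rng.expovariate(rate)
        while t < t_end:
            msg = rng.choices(list(msg_probs),
                              weights=list(msg_probs.values()))[0]
            log.append((t, msg))
            t += rng.expovariate(rate)
        t_start = t_end
    return segments, log


# Hypothetical states: each has its own message rate and message set.
STATES = {
    "normal": (0.2, {"heartbeat": 0.9, "info": 0.1}),
    "fault": (2.0, {"overcurrent": 0.6, "trip": 0.3, "info": 0.1}),
}

segments, log = simulate_log(STATES, n_segments=5, mean_length=20.0)
```

In this sketch a state is characterised by exactly the two ingredients the abstract names: how often messages arrive (the rate) and which messages appear (the categorical probabilities). The clustering problem is the reverse direction: given only `log`, recover `segments` and the per-state parameters.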

Suggested Citation

  • Emanuele Gramuglia & Geir Storvik & Morten Stakkeland, 2021. "Clustering and automatic labelling within time series of categorical observations—with an application to marine log messages," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(3), pages 714-732, June.
  • Handle: RePEc:bla:jorssc:v:70:y:2021:i:3:p:714-732
    DOI: 10.1111/rssc.12483


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Papastamoulis, Panagiotis, 2018. "Overfitting Bayesian mixtures of factor analyzers with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 220-234.
    2. You, Na & Dai, Hongsheng & Wang, Xueqin & Yu, Qingyun, 2024. "Sequential estimation for mixture of regression models for heterogeneous population," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
    3. Kensuke Okada & Shin-ichi Mayekawa, 2018. "Post-processing of Markov chain Monte Carlo output in Bayesian latent variable models with application to multidimensional scaling," Computational Statistics, Springer, vol. 33(3), pages 1457-1473, September.
    4. Louit, Sydney & Clark, Evan A. & Gelbard, Alexander H. & Vivek, Niketna & Yan, Jun & Zhang, Panpan, 2025. "CALF-SBM: A covariate-assisted latent factor stochastic block model," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 667(C).
    5. Kazuhiro Yamaguchi & Jonathan Templin, 2022. "A Gibbs Sampling Algorithm with Monotonicity Constraints for Diagnostic Classification Models," Journal of Classification, Springer;The Classification Society, vol. 39(1), pages 24-54, March.
    6. Yao, Weixin & Wei, Yan & Yu, Chun, 2014. "Robust mixture regression using the t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 116-127.
    7. Jeong Eun Lee & Christian Robert, 2013. "Importance Sampling Schemes for Evidence Approximation in Mixture Models," Working Papers 2013-42, Center for Research in Economics and Statistics.
    8. Aßmann, Christian & Boysen-Hogrefe, Jens & Pape, Markus, 2012. "The directional identification problem in Bayesian factor analysis: An ex-post approach," Kiel Working Papers 1799, Kiel Institute for the World Economy (IfW Kiel).
    9. Sun-Joo Cho & Allan S. Cohen, 2010. "A Multilevel Mixture IRT Model With an Application to DIF," Journal of Educational and Behavioral Statistics, vol. 35(3), pages 336-370, June.
    10. Brian Hartley, 2020. "Corridor stability of the Kaleckian growth model: a Markov-switching approach," Working Papers 2013, New School for Social Research, Department of Economics, revised Nov 2020.
    11. Simen Alexander Linge Johnsen & Jörg Bollmann, 2020. "Coccolith mass and morphology of different Emiliania huxleyi morphotypes: A critical examination using Canary Islands material," PLOS ONE, Public Library of Science, vol. 15(3), pages 1-29, March.
    12. Nichole E. Carlson & Timothy D. Johnson & Morton B. Brown, 2009. "A Bayesian Approach to Modeling Associations Between Pulsatile Hormones," Biometrics, The International Biometric Society, vol. 65(2), pages 650-659, June.
    13. Montanari, Angela & Viroli, Cinzia, 2011. "Maximum likelihood estimation of mixtures of factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2712-2723, September.
    14. Stéphane Bonhomme & Koen Jochmans & Jean-Marc Robin, 2016. "Non-parametric estimation of finite mixtures from repeated measurements," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(1), pages 211-229, January.
    15. Ogawa, Ryo & Engler, Jan O. & Cord, Anna F., 2024. "Functional responses in habitat selection as a management tool to evaluate agri-environment schemes for farmland birds," Ecological Modelling, Elsevier, vol. 494(C).
    16. Xue, Jiacheng & Yao, Weixin, 2022. "Machine Learning Embedded Semiparametric Mixtures of Regressions with Covariate-Varying Mixing Proportions," Econometrics and Statistics, Elsevier, vol. 22(C), pages 159-171.
    17. Liqun Wang & James Fu, 2007. "A practical sampling approach for a Bayesian mixture model with unknown number of components," Statistical Papers, Springer, vol. 48(4), pages 631-653, October.
    18. Royce Anders & William Batchelder, 2015. "Cultural Consensus Theory for the Ordinal Data Case," Psychometrika, Springer;The Psychometric Society, vol. 80(1), pages 151-181, March.
    19. Lu, Xiaosun & Huang, Yangxin & Zhu, Yiliang, 2016. "Finite mixture of nonlinear mixed-effects joint models in the presence of missing and mismeasured covariate, with application to AIDS studies," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 119-130.
    20. Bilancia, Massimo & Dačević, Rade, 2025. "A Dirichlet-Multinomial mixture model of Statistical Science: Mapping the shift of a paradigm," Journal of Informetrics, Elsevier, vol. 19(1).
