
Clustering longitudinal life‐course sequences using mixtures of exponential‐distance models

Author

Listed:
  • Keefe Murphy
  • T. Brendan Murphy
  • Raffaella Piccarreta
  • I. Claire Gormley

Abstract

Sequence analysis is an increasingly popular approach for analysing life courses represented by ordered collections of activities experienced by subjects over time. Here, we analyse a survey data set containing information on the career trajectories of a cohort of Northern Irish youths tracked between the ages of 16 and 22. We propose a novel, model‐based clustering approach suited to the analysis of such data from a holistic perspective, with the aims of estimating the number of typical career trajectories, identifying the relevant features of these patterns, and assessing the extent to which such patterns are shaped by background characteristics. Several criteria exist for measuring pairwise dissimilarities among categorical sequences. Typically, dissimilarity matrices are employed as input to heuristic clustering algorithms. The family of methods we develop instead clusters sequences directly using mixtures of exponential‐distance models. Basing the models on weighted variants of the Hamming distance metric permits closed‐form expressions for parameter estimation. Simultaneously allowing the component membership probabilities to depend on fixed covariates and accommodating sampling weights in the clustering process yields new insights on the Northern Irish data. In particular, we find that school examination performance is the single most important predictor of cluster membership.
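The closed-form property mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single mixture component whose density is proportional to exp(-d(s, θ)), where d is a Hamming distance weighted by one precision per time point. The function names, the precision values and the state labels are all hypothetical. Under these assumptions the normalising constant factorises over time points, and the central sequence θ that maximises the weighted likelihood is the position-wise weighted mode.

```python
import math
from collections import defaultdict


def weighted_hamming(seq, central, precisions):
    """Sum of per-position precisions over positions where the sequences differ."""
    if not (len(seq) == len(central) == len(precisions)):
        raise ValueError("sequence, central sequence and precisions must align")
    return sum(lam for x, y, lam in zip(seq, central, precisions) if x != y)


def log_density(seq, central, precisions, n_states):
    """Log-density of one exponential-distance component (illustrative assumption).

    Summing exp(-distance) over all possible sequences factorises by position,
    so the normalising constant is prod_t (1 + (n_states - 1) * exp(-lam_t)).
    """
    log_norm = sum(math.log1p((n_states - 1) * math.exp(-lam)) for lam in precisions)
    return -weighted_hamming(seq, central, precisions) - log_norm


def central_sequence(seqs, sample_weights=None):
    """Position-wise weighted mode, which minimises the total Hamming distance."""
    if sample_weights is None:
        sample_weights = [1.0] * len(seqs)
    central = []
    for t in range(len(seqs[0])):
        totals = defaultdict(float)
        for s, w in zip(seqs, sample_weights):
            totals[s[t]] += w  # accumulate sampling weight per observed state
        central.append(max(totals, key=totals.get))
    return central
```

In a full mixture, the sample weights passed to `central_sequence` would plausibly be the product of the survey sampling weights and the posterior component membership probabilities from an EM-type algorithm; that step, and the dependence of membership probabilities on covariates, is omitted here.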

Suggested Citation

  • Keefe Murphy & T. Brendan Murphy & Raffaella Piccarreta & I. Claire Gormley, 2021. "Clustering longitudinal life‐course sequences using mixtures of exponential‐distance models," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1414-1451, October.
  • Handle: RePEc:bla:jorssa:v:184:y:2021:i:4:p:1414-1451
    DOI: 10.1111/rssa.12712

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssa.12712
    Download Restriction: no


    References listed on IDEAS

    1. Matthias Studer & Gilbert Ritschard, 2016. "What matters in differences between life trajectories: a comparative review of sequence dissimilarity measures," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 179(2), pages 481-511, February.
    2. Bakk, Zsuzsa & Kuha, Jouni, 2018. "Two-step estimation of models between latent classes and external variables," LSE Research Online Documents on Economics 85161, London School of Economics and Political Science, LSE Library.
    3. Hahsler, Michael & Hornik, Kurt & Buchta, Christian, 2008. "Getting Things in Order: An Introduction to the R Package seriation," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 25(i03).
    4. Melnykov, Volodymyr, 2016. "ClickClust: An R Package for Model-Based Clustering of Categorical Sequences," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i09).
    5. Keefe Murphy & Thomas Brendan Murphy, 2020. "Gaussian parsimonious clustering models with covariates and a noise component," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 293-325, June.
    6. Linzer, Drew A. & Lewis, Jeffrey B., 2011. "poLCA: An R Package for Polytomous Variable Latent Class Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 42(i10).
    7. Zsuzsa Bakk & Jouni Kuha, 2018. "Two-Step Estimation of Models Between Latent Classes and External Variables," Psychometrika, Springer;The Psychometric Society, vol. 83(4), pages 871-892, December.
    8. Dankmar Böhning & Ekkehart Dietz & Rainer Schaub & Peter Schlattmann & Bruce Lindsay, 1994. "The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 46(2), pages 373-388, June.
    9. Celeux, Gilles & Govaert, Gerard, 1992. "A classification EM algorithm for clustering and two stochastic versions," Computational Statistics & Data Analysis, Elsevier, vol. 14(3), pages 315-332, October.
    10. Fernando Muñoz-Bullón & Miguel A. Malo, 2003. "Employment status mobility from a life-cycle perspective," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 9(7), pages 119-162.
    11. Gabadinho, Alexis & Ritschard, Gilbert & Müller, Nicolas S & Studer, Matthias, 2011. "Analyzing and Visualizing State Sequences in R with TraMineR," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i04).
    12. Duncan McVicar & Michael Anyadike‐Danes, 2002. "Predicting successful and unsuccessful transitions from school to work by using sequence methods," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 165(2), pages 317-334, June.
    13. Adrian O’Hagan & Thomas Brendan Murphy & Luca Scrucca & Isobel Claire Gormley, 2019. "Investigation of parameter uncertainty in clustering using a Gaussian mixture model via jackknife, bootstrap and weighted likelihood bootstrap," Computational Statistics, Springer, vol. 34(4), pages 1779-1813, December.
    14. Murphy, Thomas Brendan & Martin, Donal, 2003. "Mixtures of distance-based models for ranking data," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 645-655, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Piccarreta, Raffaella & Bonetti, Marco, 2019. "Assessing and comparing models for sequence data by microsimulation (with Supplementary Material)," SocArXiv 3mcfp, Center for Open Science.
    2. Giorgio Eduardo Montanari & Marco Doretti & Maria Francesca Marino, 2022. "Model-based two-way clustering of second-level units in ordinal multilevel latent Markov models," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 457-485, June.
    3. Piccarreta, Raffaella & Struffolino, Emanuela, 2019. "An Integrated Heuristic for Validation in Sequence Analysis," SocArXiv v7mj8, Center for Open Science.
    4. Erofili Grapsa & Dorrit Posel, 2016. "Sequencing the real time of the elderly: Evidence from South Africa," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 35(25), pages 711-744.
    5. Raffaella Piccarreta & Marco Bonetti & Stefano Lombardi, 2018. "Comparing models for sequence data: prediction and dissimilarities," Working Papers 113, "Carlo F. Dondena" Centre for Research on Social Dynamics (DONDENA), Università Commerciale Luigi Bocconi.
    6. Struffolino, Emanuela, 2019. "Navigating the early career: The social stratification of young workers’ employment trajectories in Italy," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 63, pages 1-17.
    7. Adrian O’Hagan & Arthur White, 2019. "Improved model-based clustering performance using Bayesian initialization averaging," Computational Statistics, Springer, vol. 34(1), pages 201-231, March.
    8. Marcel Raab & Emanuela Struffolino, 2020. "The Heterogeneity of Partnership Trajectories to Childlessness in Germany," European Journal of Population, Springer;European Association for Population Studies, vol. 36(1), pages 53-70, March.
    9. Júlia Mikolai & Hill Kulu, 2019. "Union dissolution and housing trajectories in Britain," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 41(7), pages 161-196.
    10. repec:jss:jstsof:40:i04 is not listed on IDEAS
    11. Babette Bühler & Katja Möhring & Andreas P. Weiland, 2022. "Assessing dissimilarity of employment history information from survey and administrative data using sequence analysis techniques," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(6), pages 4747-4774, December.
    12. Marc A. Scott & Kaushik Mohan & Jacques‐Antoine Gauthier, 2020. "Model‐based clustering and analysis of life history data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(3), pages 1231-1251, June.
    13. Devillanova, Carlo & Raitano, Michele & Struffolino, Emanuela, 2019. "Longitudinal employment trajectories and health in middle life: Insights from linked administrative and survey data," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, pages 1375-1412.
    14. Andy Dickerson & Emily McDool & Damon Morris, 2023. "Post-compulsory education pathways and labour market outcomes," Education Economics, Taylor & Francis Journals, vol. 31(3), pages 326-352, May.
    15. Suh, Ellie, 2022. "Can't save or won't save: financial resilience and discretionary retirement saving among British adults in their thirties and forties," LSE Research Online Documents on Economics 110492, London School of Economics and Political Science, LSE Library.
    16. Michael Anyadike-Danes & Duncan McVicar, 2010. "My Brilliant Career: Characterizing the Early Labor Market Trajectories of British Women From Generation X," Sociological Methods & Research, vol. 38(3), pages 482-512, February.
    17. Bakk, Zsuzsa & Kuha, Jouni, 2020. "Relating latent class membership to external variables: an overview," LSE Research Online Documents on Economics 107564, London School of Economics and Political Science, LSE Library.
    18. Cees H. Elzinga & Matthias Studer, 2019. "Normalization of Distance and Similarity in Sequence Analysis," Sociological Methods & Research, vol. 48(4), pages 877-904, November.
    19. Lisa Toczek & Hans Bosma & Richard Peter, 2022. "Early retirement intentions: the impact of employment biographies, work stress and health among a baby-boomer generation," European Journal of Ageing, Springer, vol. 19(4), pages 1479-1491, December.
    20. Kandt, Jens & Leak, Alistair, 2019. "Examining inclusive mobility through smartcard data: What shall we make of senior citizens' declining bus patronage in the West Midlands?," Journal of Transport Geography, Elsevier, vol. 79(C), pages 1-1.
    21. Mathias Voigt & Antonio Abellán & Julio Pérez & Diego Ramiro, 2020. "The effects of socioeconomic conditions on old-age mortality within shared disability pathways," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-17, September.
