Printed from https://ideas.repec.org/a/bla/jorssa/v184y2021i4p1414-1451.html

Clustering longitudinal life‐course sequences using mixtures of exponential‐distance models

Author

Listed:
  • Keefe Murphy
  • T. Brendan Murphy
  • Raffaella Piccarreta
  • I. Claire Gormley

Abstract

Sequence analysis is an increasingly popular approach for analysing life courses represented by ordered collections of activities experienced by subjects over time. Here, we analyse a survey data set containing information on the career trajectories of a cohort of Northern Irish youths tracked between the ages of 16 and 22. We propose a novel, model‐based clustering approach suited to the analysis of such data from a holistic perspective, with the aims of estimating the number of typical career trajectories, identifying the relevant features of these patterns, and assessing the extent to which such patterns are shaped by background characteristics. Several criteria exist for measuring pairwise dissimilarities among categorical sequences. Typically, dissimilarity matrices are employed as input to heuristic clustering algorithms. The family of methods we develop instead clusters sequences directly using mixtures of exponential‐distance models. Basing the models on weighted variants of the Hamming distance metric permits closed‐form expressions for parameter estimation. Simultaneously allowing the component membership probabilities to depend on fixed covariates and accommodating sampling weights in the clustering process yields new insights on the Northern Irish data. In particular, we find that school examination performance is the single most important predictor of cluster membership.
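
To make the closed-form claim concrete, here is a minimal sketch of one mixture component built on the plain (unweighted) Hamming distance. It is an illustration under stated assumptions, not the authors' implementation: the function names, the state labels "S", "E" and "U", and the use of a single precision parameter `lam` per component are all hypothetical choices made here. Under those assumptions the density of a sequence s of length T over v states is exp(-lam * d(s, centre)) / (1 + (v - 1) * exp(-lam))^T, the central sequence is estimated by the weighted modal state at each time point, and the precision has the closed form lam = log((v - 1)(1 - p) / p), where p is the weighted mean proportion of mismatched positions.

```python
import math
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def log_density(seq, centre, lam, n_states):
    """Log-density of one exponential-distance component (Hamming distance).

    The normalising constant factorises over time points, giving
    (1 + (n_states - 1) * exp(-lam)) ** T for sequences of length T.
    """
    log_norm = len(centre) * math.log1p((n_states - 1) * math.exp(-lam))
    return -lam * hamming(seq, centre) - log_norm


def fit_component(seqs, weights, n_states):
    """Closed-form weighted estimates of the central sequence and precision."""
    length = len(seqs[0])
    centre = []
    for t in range(length):
        tally = Counter()
        for s, w in zip(seqs, weights):
            tally[s[t]] += w
        # Weighted modal state; ties are broken by sorted label order.
        centre.append(max(sorted(tally), key=tally.get))
    # p = weighted mean proportion of positions that differ from the centre.
    mismatches = sum(w * hamming(s, centre) for s, w in zip(seqs, weights))
    p = mismatches / (sum(weights) * length)
    if p == 0:
        lam = math.inf  # degenerate case: every sequence equals the centre
    elif p >= (n_states - 1) / n_states:
        lam = 0.0  # no concentration around the centre: uniform component
    else:
        lam = math.log((n_states - 1) * (1 - p) / p)
    return tuple(centre), lam


def responsibilities(seq, centres, lams, mix, n_states):
    """Posterior component membership probabilities for one sequence."""
    logs = [math.log(pi) + log_density(seq, c, lam, n_states)
            for pi, c, lam in zip(mix, centres, lams)]
    top = max(logs)  # subtract the maximum for numerical stability
    expd = [math.exp(x - top) for x in logs]
    total = sum(expd)
    return [e / total for e in expd]


if __name__ == "__main__":
    toy = ["SSEE", "SSEE", "SSEU", "SEEE"]  # hypothetical four-period careers
    centre, lam = fit_component(toy, [1.0] * len(toy), n_states=3)
    print(centre, lam)
    print(responsibilities("SSEU", [centre, tuple("UUUU")], [lam, lam],
                           [0.5, 0.5], n_states=3))
```

The `weights` argument can carry either sampling weights or the current membership probabilities from an EM-style iteration, which is why the estimates are written in weighted form. Letting the precision vary by time point, or letting the mixing proportions depend on covariates, would extend this sketch but is not shown here.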

Suggested Citation

  • Keefe Murphy & T. Brendan Murphy & Raffaella Piccarreta & I. Claire Gormley, 2021. "Clustering longitudinal life‐course sequences using mixtures of exponential‐distance models," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1414-1451, October.
  • Handle: RePEc:bla:jorssa:v:184:y:2021:i:4:p:1414-1451
    DOI: 10.1111/rssa.12712


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Piccarreta, Raffaella & Bonetti, Marco, 2019. "Assessing and comparing models for sequence data by microsimulation (with Supplementary Material)," SocArXiv 3mcfp, Center for Open Science.
    2. Giorgio Eduardo Montanari & Marco Doretti & Maria Francesca Marino, 2022. "Model-based two-way clustering of second-level units in ordinal multilevel latent Markov models," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 457-485, June.
    3. Marco Raffaella Piccarreta & Marco Bonetti & Stefano Lombardi, 2018. "Comparing models for sequence data: prediction and dissimilarities," Working Papers 113, "Carlo F. Dondena" Centre for Research on Social Dynamics (DONDENA), Università Commerciale Luigi Bocconi.
    4. Struffolino, Emanuela, 2019. "Navigating the early career: The social stratification of young workers’ employment trajectories in Italy," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 63, pages 1-17.
    5. Piccarreta, Raffaella & Struffolino, Emanuela, 2019. "An Integrated Heuristic for Validation in Sequence Analysis," SocArXiv v7mj8, Center for Open Science.
    6. Erofili Grapsa & Dorrit Posel, 2016. "Sequencing the real time of the elderly: Evidence from South Africa," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 35(25), pages 711-744.
    7. Adrian O’Hagan & Arthur White, 2019. "Improved model-based clustering performance using Bayesian initialization averaging," Computational Statistics, Springer, vol. 34(1), pages 201-231, March.
    8. Marcel Raab & Emanuela Struffolino, 2020. "The Heterogeneity of Partnership Trajectories to Childlessness in Germany," European Journal of Population, Springer;European Association for Population Studies, vol. 36(1), pages 53-70, March.
    9. Yajing Zhu & Fiona Steele & Irini Moustaki, 2020. "A multilevel structural equation model for the interrelationships between multiple latent dimensions of childhood socio‐economic circumstances, partnership transitions and mid‐life health," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(3), pages 1029-1050, June.
    10. Julia Mikolai & Hill Kulu, 2019. "Union dissolution and housing trajectories in Britain," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 41(7), pages 161-196.
    11. Lin, Tsung-I, 2014. "Learning from incomplete data via parameterized t mixture models through eigenvalue decomposition," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 183-195.
    12. Yoav Bergner & Alina A. von Davier, 2019. "Process Data in NAEP: Past, Present, and Future," Journal of Educational and Behavioral Statistics, vol. 44(6), pages 706-732, December.
    13. Babette Bühler & Katja Möhring & Andreas P. Weiland, 2022. "Assessing dissimilarity of employment history information from survey and administrative data using sequence analysis techniques," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(6), pages 4747-4774, December.
    14. Marc A. Scott & Kaushik Mohan & Jacques‐Antoine Gauthier, 2020. "Model‐based clustering and analysis of life history data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(3), pages 1231-1251, June.
    15. Estelle McLean & Amelia C Crampin & Rebecca Sear & Maria Sironi & Emma Slaymaker & Albert Dube, 2024. "Transitions to adulthood in men and women in rural Malawi in the 21st century using sequence analysis: Some evidence of delay," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 51(14), pages 459-500.
    16. Di Mari, Roberto & Bakk, Zsuzsa & Oser, Jennifer & Kuha, Jouni, 2023. "A two-step estimator for multilevel latent class analysis with covariates," LSE Research Online Documents on Economics 119994, London School of Economics and Political Science, LSE Library.
    17. Olga Czeranowska & Dominika Winogrodzka, 2024. "Socio-occupational Paths of Polish and Lithuanian Returning Migrants: Sequence Analysis of Survey Data with the Use of TraMineR for R," Journal of International Migration and Integration, Springer, vol. 25(2), pages 997-1025, June.
    18. Montorsi, Carlotta & Fusco, Alessio & Van Kerm, Philippe & Bordas, Stéphane P.A., 2024. "Predicting depression in old age: Combining life course data with machine learning," Economics & Human Biology, Elsevier, vol. 52(C).
    19. Roberto Mari & Zsuzsa Bakk & Jennifer Oser & Jouni Kuha, 2023. "A two-step estimator for multilevel latent class analysis with covariates," Psychometrika, Springer;The Psychometric Society, vol. 88(4), pages 1144-1170, December.
    20. Devillanova, Carlo & Raitano, Michele & Struffolino, Emanuela, 2019. "Longitudinal employment trajectories and health in middle life: Insights from linked administrative and survey data," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 40, pages 1375-1412.
