IDEAS home Printed from https://ideas.repec.org/a/spr/psycho/v85y2020i3d10.1007_s11336-020-09725-2.html

Latent Theme Dictionary Model for Finding Co-occurrent Patterns in Process Data

Authors

  • Guanhua Fang (Columbia University)
  • Zhiliang Ying (Columbia University)

Abstract

Process data, which are temporally ordered sequences of categorical observations, are of recent interest due to their increasing abundance and the desire to extract useful information from them. A process is a collection of time-stamped events of different types, recording how an individual behaves in a given time period. Process data are too complex in terms of size and irregularity for classical psychometric models to be directly applicable and, consequently, new approaches to modeling and analysis are needed. We introduce a latent theme dictionary model for processes that identifies co-occurrent event patterns and individuals with similar behavioral patterns. Theoretical properties are established under certain regularity conditions for likelihood-based estimation and inference. A nonparametric Bayes algorithm using the Markov chain Monte Carlo method is proposed for computation. Simulation studies show that the proposed approach performs well in a range of situations. The proposed method is applied to an item in the 2012 Programme for International Student Assessment, yielding interpretable findings.
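To make the notion of process data concrete, the following is a minimal sketch of how time-stamped event sequences might be stored and how often pairs of event types appear together in the same process. It is a simple counting illustration only and is not the latent theme dictionary model or the authors' estimation algorithm. The student identifiers, event names, and the helper `cooccurrence_counts` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each process is a list of
# (timestamp_in_seconds, event_type) pairs for one individual.
processes = {
    "student_a": [(0.0, "start"), (2.5, "click_map"),
                  (4.1, "select_route"), (9.0, "submit")],
    "student_b": [(0.0, "start"), (1.2, "click_map"), (3.3, "reset"),
                  (5.0, "click_map"), (7.7, "submit")],
    "student_c": [(0.0, "start"), (6.4, "submit")],
}


def cooccurrence_counts(processes):
    """For each unordered pair of event types, count the processes containing both."""
    counts = Counter()
    for events in processes.values():
        # Distinct event types in this process, sorted so each pair has one canonical order.
        types = sorted({event_type for _, event_type in events})
        counts.update(combinations(types, 2))
    return counts


counts = cooccurrence_counts(processes)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Raw counts like these ignore event order and timing, and they do not group individuals. The model described in the abstract addresses those aspects through latent structure rather than simple tabulation.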

Suggested Citation

  • Guanhua Fang & Zhiliang Ying, 2020. "Latent Theme Dictionary Model for Finding Co-occurrent Patterns in Process Data," Psychometrika, Springer;The Psychometric Society, vol. 85(3), pages 775-811, September.
  • Handle: RePEc:spr:psycho:v:85:y:2020:i:3:d:10.1007_s11336-020-09725-2
    DOI: 10.1007/s11336-020-09725-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11336-020-09725-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kunihama, T. & Herring, A.H. & Halpern, C.T. & Dunson, D.B., 2016. "Nonparametric Bayes modeling with sample survey weights," Statistics & Probability Letters, Elsevier, vol. 113(C), pages 41-48.
    2. Mahsa Samsami & Ralf Wagner, 2021. "Investment Decisions with Endogeneity: A Dirichlet Tree Analysis," JRFM, MDPI, vol. 14(7), pages 1-19, July.
    3. Guanhua Fang & Jingchen Liu & Zhiliang Ying, 2019. "On the Identifiability of Diagnostic Classification Models," Psychometrika, Springer;The Psychometric Society, vol. 84(1), pages 19-40, March.
    4. Durante, Daniele, 2017. "A note on the multiplicative gamma process," Statistics & Probability Letters, Elsevier, vol. 122(C), pages 198-204.
    5. HyungJun Cho & Jaewoo Kang & Jae Lee, 2009. "Empirical Bayes analysis of unreplicated microarray data," Computational Statistics, Springer, vol. 24(3), pages 393-408, August.
    6. Geert Soete & Willem Heiser, 1993. "A latent class unfolding model for analyzing single stimulus preference ratings," Psychometrika, Springer;The Psychometric Society, vol. 58(4), pages 545-565, December.
    7. Yajuan Si & Jerome P. Reiter, 2013. "Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys," Journal of Educational and Behavioral Statistics, vol. 38(5), pages 499-521, October.
    8. Hiroyuki Kasahara & Katsumi Shimotsu, 2014. "Non-parametric identification and estimation of the number of components in multivariate mixtures," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(1), pages 97-111, January.
    9. Koop, Gary & Korobilis, Dimitris, 2016. "Model uncertainty in Panel Vector Autoregressive models," European Economic Review, Elsevier, vol. 81(C), pages 115-131.
    10. Alfò, Marco & Rocchetti, Irene, 2013. "A flexible approach to finite mixture regression models for multivariate mixed responses," Statistics & Probability Letters, Elsevier, vol. 83(7), pages 1754-1758.
    11. Dazard, Jean-Eudes & Sunil Rao, J., 2012. "Joint adaptive mean–variance regularization and variance stabilization of high dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2317-2333.
    12. James Jackson & Robin Mitra & Brian Francis & Iain Dove, 2022. "Using saturated count models for user‐friendly synthesis of large confidential administrative databases," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1613-1643, October.
    13. Humera Razzak & Christian Heumann, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    14. Razzak Humera & Heumann Christian, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    15. Steven Andrew Culpepper, 2019. "An Exploratory Diagnostic Model for Ordinal Responses with Binary Attributes: Identifiability and Estimation," Psychometrika, Springer;The Psychometric Society, vol. 84(4), pages 921-940, December.
    16. Olli Kiviruusu & Noora Berg & Taina Huurre & Hillevi Aro & Mauri Marttunen & Ari Haukkala, 2016. "Interpersonal Conflicts and Development of Self-Esteem from Adolescence to Mid-Adulthood. A 26-Year Follow-Up," PLOS ONE, Public Library of Science, vol. 11(10), pages 1-17, October.
    17. Jing Zhou & Anirban Bhattacharya & Amy H. Herring & David B. Dunson, 2015. "Bayesian Factorizations of Big Sparse Tensors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1562-1576, December.
    18. Motonori Oka & Kensuke Okada, 2023. "Scalable Bayesian Approach for the Dina Q-Matrix Estimation Combining Stochastic Optimization and Variational Inference," Psychometrika, Springer;The Psychometric Society, vol. 88(1), pages 302-331, March.
    19. Ishwaran, Hemant & Sunil Rao, J., 2008. "Clustering gene expression profile data by selective shrinkage," Statistics & Probability Letters, Elsevier, vol. 78(12), pages 1490-1497, September.
    20. Hang J. Kim & Jörg Drechsler & Katherine J. Thompson, 2021. "Synthetic microdata for establishment surveys under informative sampling," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 255-281, January.
