IDEAS home Printed from https://ideas.repec.org/a/spr/testjl/v34y2025i3d10.1007_s11749-025-00973-x.html

Mixed membership estimation for categorical data with weighted responses

Author

Listed:
  • Huan Qing (Chongqing University of Technology)

Abstract

The Grade-of-Membership (GoM) model, which allows subjects to belong to multiple latent classes, is a powerful tool for inferring latent classes in categorical data. However, its application is limited to categorical data with nonnegative integer responses, as it assumes that the response matrix is generated from Bernoulli or Binomial distributions, making it inappropriate for datasets with continuous or negative weighted responses. To address this, this paper proposes a novel model named the Weighted-Grade-of-Membership (WGoM) model. Our WGoM is more general than GoM because it relaxes GoM’s distribution constraint by allowing the response matrix to be generated from distributions like Bernoulli, Binomial, Normal, and Uniform as long as the expected response matrix has a block structure related to subjects’ mixed memberships under the distribution. We show that WGoM can describe any response matrix with finite distinct elements. We then propose an algorithm to estimate the latent mixed memberships and other WGoM parameters. We derive the error bounds of the estimated parameters and show that the algorithm is statistically consistent. We also propose an efficient method for determining the number of latent classes K for categorical data with weighted responses by maximizing fuzzy-weighted modularity. The performance of our methods is validated through both synthetic and real-world datasets. The results demonstrate the accuracy and efficiency of our algorithm for estimating latent mixed memberships, as well as the high accuracy of our method for estimating K, indicating their high potential for practical applications.
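The data-generating idea in the abstract can be illustrated with a minimal sketch. It assumes the expected response matrix takes the usual GoM-style form E[R] = Π Θᵀ, where Π holds the subjects' mixed memberships and Θ holds per-class item parameters. The abstract does not spell out this parameterization, so the names `Pi` and `Theta`, the function `simulate_weighted_responses`, the Dirichlet draw for memberships, and the Normal noise are all illustrative assumptions, not the paper's notation or algorithm.

```python
import numpy as np


def simulate_weighted_responses(n_subjects=200, n_items=30, n_classes=3,
                                noise_sd=1.0, seed=0):
    """Draw a response matrix whose expectation is Pi @ Theta.T (assumed form)."""
    rng = np.random.default_rng(seed)

    # Mixed memberships: each row is nonnegative and sums to 1, so a subject
    # can belong partially to several latent classes.
    Pi = rng.dirichlet(np.ones(n_classes), size=n_subjects)

    # Hypothetical item parameters, one column per latent class. Negative
    # values are allowed here, which a Bernoulli/Binomial model could not use.
    Theta = rng.uniform(-2.0, 2.0, size=(n_items, n_classes))

    # Expected responses have rank at most n_classes by construction.
    expected = Pi @ Theta.T

    # Normal noise yields continuous, possibly negative "weighted" responses.
    R = expected + rng.normal(0.0, noise_sd, size=expected.shape)
    return Pi, Theta, expected, R


if __name__ == "__main__":
    Pi, Theta, expected, R = simulate_weighted_responses()
    print("R shape:", R.shape)
    print("rank of E[R]:", np.linalg.matrix_rank(expected))
```

Swapping the Normal draw for another distribution with the same mean would keep the expectation unchanged, which is the sense in which such a model is agnostic about the response distribution. Estimating Π and the number of classes K from R is the paper's contribution and is not attempted here.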

Suggested Citation

  • Huan Qing, 2025. "Mixed membership estimation for categorical data with weighted responses," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 34(3), pages 612-659, September.
  • Handle: RePEc:spr:testjl:v:34:y:2025:i:3:d:10.1007_s11749-025-00973-x
    DOI: 10.1007/s11749-025-00973-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11749-025-00973-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qing, Huan, 2025. "Discovering overlapping communities in multi-layer directed networks," Chaos, Solitons & Fractals, Elsevier, vol. 194(C).
    2. Ling Chen & Yuqi Gu, 2024. "A Spectral Method for Identifiable Grade of Membership Analysis with Binary Responses," Psychometrika, Springer;The Psychometric Society, vol. 89(2), pages 626-657, June.
    3. Adrian O’Hagan & Arthur White, 2019. "Improved model-based clustering performance using Bayesian initialization averaging," Computational Statistics, Springer, vol. 34(1), pages 201-231, March.
    4. Yan Feng & Erpeng Liu & Zhang Yue & Qilin Zhang & Tiankuo Han, 2019. "The Evolutionary Trends of Health Behaviors in Chinese Elderly and the Influencing Factors of These Trends: 2005–2014," IJERPH, MDPI, vol. 16(10), pages 1-17, May.
    5. Hwan Chung & Brian P. Flaherty & Joseph L. Schafer, 2006. "Latent class logistic regression: application to marijuana use and attitudes among high school seniors," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 169(4), pages 723-743, October.
    6. Lucia Taraborrelli & Yasin Şenbabaoğlu & Lifen Wang & Junghyun Lim & Kerrigan Blake & Noelyn Kljavin & Sarah Gierke & Alexis Scherl & James Ziai & Erin McNamara & Mark Owyong & Shilpa Rao & Aslihan Ka, 2023. "Tumor-intrinsic expression of the autophagy gene Atg16l1 suppresses anti-tumor immunity in colorectal cancer," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    7. Alexandre Cunha & Diana Sawyer & José Monteiro da Silva & Luca Lazzarini & Marília Rocha & Tamara Santos, 2020. "Country selection for a comparative case study on labour law systems," Policy Research Brief 65, International Policy Centre.
    8. Sarrias, Mauricio, 2021. "A two recursive equation model to correct for endogeneity in latent class binary probit models," Journal of choice modelling, Elsevier, vol. 40(C).
    9. Beatrice Franzolini & Giovanni Rebaudo, 2024. "Entropy regularization in probabilistic clustering," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 33(1), pages 37-60, March.
    10. Gioacchino Fazio & Francesca Giambona & Erasmo Vassallo & Elli Vassiliadis, 2018. "A Measure of Trust: The Italian Regional Divide in a Latent Class Approach," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 140(1), pages 209-242, November.
    11. Weicong Lyu & Jee-Seon Kim & Youmi Suk, 2023. "Estimating Heterogeneous Treatment Effects Within Latent Class Multilevel Models: A Bayesian Approach," Journal of Educational and Behavioral Statistics, vol. 48(1), pages 3-36, February.
    12. Artur Pokropek, 2016. "Grade of Membership Response Time Model for Detecting Guessing Behaviors," Journal of Educational and Behavioral Statistics, vol. 41(3), pages 300-325, June.
    13. Subtil, Ana & de Oliveira, M. Rosário & Gonçalves, Luzia, 2012. "Conditional dependence diagnostic in the latent class model: A simulation study," Statistics & Probability Letters, Elsevier, vol. 82(7), pages 1407-1412.
    14. Kenneth W. Griffin & Lawrence M. Scheier & Bianca Acevedo & Jerry L. Grenard & Gilbert J. Botvin, 2011. "Long-Term Effects of Self-Control on Alcohol Use and Sexual Behavior among Urban Minority Young Women," IJERPH, MDPI, vol. 9(1), pages 1-23, December.
    15. Chia-Yi Chiu & Yan Sun & Yanhong Bian, 2018. "Cognitive Diagnosis for Small Educational Programs: The General Nonparametric Classification Method," Psychometrika, Springer;The Psychometric Society, vol. 83(2), pages 355-375, June.
    16. Brian Neelon & A. James O'Malley & Sharon-Lise T. Normand, 2011. "A Bayesian Two-Part Latent Class Model for Longitudinal Medical Expenditure Data: Assessing the Impact of Mental Health and Substance Abuse Parity," Biometrics, The International Biometric Society, vol. 67(1), pages 280-289, March.
    17. Ye, Mao & Zhang, Peng & Nie, Lizhen, 2018. "Clustering sparse binary data with hierarchical Bayesian Bernoulli mixture model," Computational Statistics & Data Analysis, Elsevier, vol. 123(C), pages 32-49.
    18. Baiguo An & Guozhong Feng & Jianhua Guo, 2022. "Interaction Identification and Clique Screening for Classification with Ultra-high Dimensional Discrete Features," Journal of Classification, Springer;The Classification Society, vol. 39(1), pages 122-146, March.
    19. Lyu Ni & Fang Fang & Fangjiao Wan, 2017. "Adjusted Pearson Chi-Square feature screening for multi-classification with ultrahigh dimensional data," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 80(6), pages 805-828, November.
    20. Xiaotian Wu & Hao Wu & Zhijin Wu, 2021. "Penalized Latent Dirichlet Allocation Model in Single-Cell RNA Sequencing," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(3), pages 543-562, December.
