IDEAS home Printed from https://ideas.repec.org/a/spr/jclass/v42y2025i2d10.1007_s00357-024-09492-0.html

Studying Hierarchical Latent Structures in Heterogeneous Populations with Missing Information

Author

Listed:
  • Francesca Greselin

    (University of Milano-Bicocca)

  • Giorgia Zaccaria

    (University of Milano-Bicocca)

Abstract

An ultrametric Gaussian mixture model is a powerful tool for modeling hierarchical relationships among latent concepts, making it ideal for studying complex phenomena in diverse and potentially heterogeneous populations. However, in many cases, only an incomplete set of observations is available on the phenomenon under study. To address this issue, we propose MissUGMM, an ultrametric Gaussian mixture model which takes into account the missing at random mechanism for the unobserved values. Our approach is estimated using the expectation-maximization algorithm and achieves favorable results in comparison to other existing mixture models in simulations conducted with synthetic and benchmark data sets, even without a theorized ultrametric structure underlying the data. Furthermore, MissUGMM is applied to a real-world problem for exploring the sustainable development of cities across countries starting from incomplete information provided by municipalities. Overall, our results demonstrate that MissUGMM is a powerful and versatile model in dealing with missing data and is applicable to a broader range of real-world problems.
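
As a rough illustration of how an expectation-maximization step can handle values that are missing at random in a Gaussian mixture, the sketch below computes, for one incomplete observation, the posterior component probabilities from the observed coordinates only and the conditional mean of the missing coordinates under each component. It is a generic sketch and not the authors' MissUGMM: the function names `log_gauss` and `e_step_row` and the toy parameters are assumptions, the covariance matrices are unconstrained (no ultrametric structure is imposed), and at least one coordinate is assumed to be observed.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log-density of a multivariate normal evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)


def e_step_row(x, weights, means, covs):
    """E-step quantities for one observation; NaN marks a missing entry.

    Returns the posterior component probabilities (based on the observed
    coordinates only) and, for each component, the observation with its
    missing coordinates replaced by their conditional mean.
    Assumes at least one coordinate of x is observed.
    """
    obs = ~np.isnan(x)
    mis = ~obs
    n_comp = len(weights)
    log_post = np.empty(n_comp)
    filled = np.tile(x, (n_comp, 1))
    for g in range(n_comp):
        mu, cov = means[g], covs[g]
        cov_oo = cov[np.ix_(obs, obs)]
        # Marginal density of the observed block under component g.
        log_post[g] = np.log(weights[g]) + log_gauss(x[obs], mu[obs], cov_oo)
        if mis.any():
            # Conditional mean: mu_m + S_mo S_oo^{-1} (x_o - mu_o).
            adj = np.linalg.solve(cov_oo, x[obs] - mu[obs])
            filled[g, mis] = mu[mis] + cov[np.ix_(mis, obs)] @ adj
    # Normalize in log space for numerical stability.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post, filled


# Toy example (hypothetical parameters): two components in two dimensions.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
covs = np.array([[[1.0, 0.5], [0.5, 1.0]],
                 [[1.0, 0.5], [0.5, 1.0]]])
x = np.array([5.0, np.nan])  # second coordinate is missing
post, filled = e_step_row(x, weights, means, covs)
```

In a full EM iteration these quantities would feed the M-step updates of the weights, means, and covariances; the conditional covariance of the missing block would also be needed for the covariance update and is omitted here for brevity.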

Suggested Citation

  • Francesca Greselin & Giorgia Zaccaria, 2025. "Studying Hierarchical Latent Structures in Heterogeneous Populations with Missing Information," Journal of Classification, Springer;The Classification Society, vol. 42(2), pages 284-310, July.
  • Handle: RePEc:spr:jclass:v:42:y:2025:i:2:d:10.1007_s00357-024-09492-0
    DOI: 10.1007/s00357-024-09492-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-024-09492-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



