
Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution

Author

Listed:
  • Francisco H. C. Alencar

    (Universidade Estadual de Campinas)

  • Christian E. Galarza

    (Escuela Superior Politécnica del Litoral, ESPOL)

  • Larissa A. Matos

    (Universidade Estadual de Campinas)

  • Victor H. Lachos

    (University of Connecticut)

Abstract

Finite mixture models have been widely used to model and analyze data from heterogeneous populations. Such data can be missing or subject to upper and/or lower detection limits because of the constraints of experimental apparatus. Another complication arises when the measurements from each population depart significantly from normality, for instance by exhibiting asymmetric behavior. For such data structures, we propose a robust model for censored and/or missing data based on finite mixtures of multivariate skew-normal distributions. This approach allows us to model data with great flexibility, accommodating multimodality and skewness simultaneously, depending on the structure of the mixture components. We develop an analytically simple yet efficient EM-type algorithm for maximum likelihood estimation of the parameters. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of truncated multivariate skew-normal distributions. Furthermore, a general information-based method for approximating the asymptotic covariance matrix of the estimators is presented. Results from the analysis of both simulated and real datasets are reported to demonstrate the effectiveness of the proposed method. The proposed algorithm and method are implemented in the new R package CensMFM.
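To make the censoring idea in the abstract concrete, here is a minimal sketch of the observed-data log-likelihood for a mixture of skew-normal components with left-censored observations. It is a univariate simplification written for illustration only: it is not the paper's multivariate EM algorithm and not the CensMFM interface, and every function name in it is hypothetical. An observed value contributes the mixture density, while a value censored at a detection limit contributes the mixture probability of falling at or below that limit. The skew-normal CDF uses the standard identity F(z) = Φ(z) − 2T(z, α), with Owen's T function approximated by Simpson's rule, which is adequate for moderate shape values.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def owens_t(h, a, n=200):
    """Owen's T function T(h, a), approximated by Simpson's rule on [0, a].

    n must be even. A signed interval handles negative a automatically.
    """
    if a == 0.0:
        return 0.0

    def integrand(x):
        return math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)

    step = a / n
    total = integrand(0.0) + integrand(a)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * step)
    return total * step / 3.0 / (2.0 * math.pi)


def sn_pdf(y, loc, scale, shape):
    """Univariate skew-normal density with location, scale and shape."""
    z = (y - loc) / scale
    return 2.0 / scale * norm_pdf(z) * norm_cdf(shape * z)


def sn_cdf(y, loc, scale, shape):
    """Univariate skew-normal CDF via Phi(z) - 2 * T(z, shape)."""
    z = (y - loc) / scale
    return norm_cdf(z) - 2.0 * owens_t(z, shape)


def censored_mixture_loglik(data, components):
    """Log-likelihood of a skew-normal mixture under left-censoring.

    data       : list of (value, is_left_censored); for a censored entry,
                 value is the detection limit.
    components : list of (weight, loc, scale, shape); weights sum to one.
    """
    loglik = 0.0
    for value, is_censored in data:
        contribution = 0.0
        for weight, loc, scale, shape in components:
            if is_censored:
                # Only "true value <= limit" is known: use the CDF.
                contribution += weight * sn_cdf(value, loc, scale, shape)
            else:
                # Fully observed: use the density.
                contribution += weight * sn_pdf(value, loc, scale, shape)
        loglik += math.log(contribution)
    return loglik
```

A call such as `censored_mixture_loglik([(1.3, False), (0.5, True)], [(0.6, 0.0, 1.0, 2.0), (0.4, 3.0, 1.5, -1.0)])` evaluates a two-component fit in which the second observation is only known to lie below the limit 0.5. An EM-type algorithm of the kind the abstract describes would increase this quantity iteratively; in the multivariate case the E-step needs moments of truncated skew-normal distributions rather than a one-dimensional CDF.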

Suggested Citation

  • Francisco H. C. Alencar & Christian E. Galarza & Larissa A. Matos & Victor H. Lachos, 2022. "Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(3), pages 521-557, September.
  • Handle: RePEc:spr:advdac:v:16:y:2022:i:3:d:10.1007_s11634-021-00448-5
    DOI: 10.1007/s11634-021-00448-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-021-00448-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.




    Cited by:

    1. Valeriano, Katherine A.L. & Galarza, Christian E. & Matos, Larissa A. & Lachos, Victor H., 2023. "Likelihood-based inference for the multivariate skew-t regression with censored or missing responses," Journal of Multivariate Analysis, Elsevier, vol. 196(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Azzalini, Adelchi, 2022. "An overview on the progeny of the skew-normal family— A personal perspective," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    2. Sharon Lee & Geoffrey McLachlan, 2013. "On mixtures of skew normal and skew t-distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(3), pages 241-266, September.
    3. Camila Borelli Zeller & Celso Rômulo Barbosa Cabral & Víctor Hugo Lachos & Luis Benites, 2019. "Finite mixture of regression models for censored data based on scale mixtures of normal distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 89-116, March.
    4. Víctor H. Lachos & Celso R. B. Cabral & Marcos O. Prates & Dipak K. Dey, 2019. "Flexible regression modeling for censored data based on mixtures of student-t distributions," Computational Statistics, Springer, vol. 34(1), pages 123-152, March.
    5. Libin Jin & Sung Nok Chiu & Jianhua Zhao & Lixing Zhu, 2023. "A constrained maximum likelihood estimation for skew normal mixtures," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 86(4), pages 391-419, May.
    6. Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
    7. Mirfarah, Elham & Naderi, Mehrdad & Chen, Ding-Geng, 2021. "Mixture of linear experts model for censored data: A novel approach with scale-mixture of normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    8. Sharon Lee & Geoffrey McLachlan, 2013. "Model-based clustering and classification with non-normal mixture distributions," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 22(4), pages 427-454, November.
    9. Lachos, Víctor H. & Moreno, Edgar J. López & Chen, Kun & Cabral, Celso Rômulo Barbosa, 2017. "Finite mixture modeling of censored data using the multivariate Student-t distribution," Journal of Multivariate Analysis, Elsevier, vol. 159(C), pages 151-167.
    10. Christian E. Galarza & Larissa A. Matos & Victor H. Lachos, 2022. "An EM algorithm for estimating the parameters of the multivariate skew-normal distribution with censored responses," METRON, Springer;Sapienza Università di Roma, vol. 80(2), pages 231-253, August.
    11. Wan-Lun Wang & Luis M. Castro & Yen-Ting Chang & Tsung-I Lin, 2019. "Mixtures of restricted skew-t factor analyzers with common factor loadings," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(2), pages 445-480, June.
    12. Wan-Lun Wang & Tsung-I Lin, 2022. "Robust clustering of multiply censored data via mixtures of t factor analyzers," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 22-53, March.
    13. Lee, Sharon X. & McLachlan, Geoffrey J., 2022. "An overview of skew distributions in model-based clustering," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    14. Cabral, Celso Rômulo Barbosa & Lachos, Víctor Hugo & Zeller, Camila Borelli, 2014. "Multivariate measurement error models using finite mixtures of skew-Student t distributions," Journal of Multivariate Analysis, Elsevier, vol. 124(C), pages 179-198.
    15. Naderi, Mehrdad & Hung, Wen-Liang & Lin, Tsung-I & Jamalizadeh, Ahad, 2019. "A novel mixture model using the multivariate normal mean–variance mixture of Birnbaum–Saunders distributions and its application to extrasolar planets," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 126-138.
    16. Valeriano, Katherine A.L. & Galarza, Christian E. & Matos, Larissa A. & Lachos, Victor H., 2023. "Likelihood-based inference for the multivariate skew-t regression with censored or missing responses," Journal of Multivariate Analysis, Elsevier, vol. 196(C).
    17. Olcay Arslan, 2015. "Variance-mean mixture of the multivariate skew normal distribution," Statistical Papers, Springer, vol. 56(2), pages 353-378, May.
    18. Wan-Lun Wang & Ahad Jamalizadeh & Tsung-I Lin, 2020. "Finite mixtures of multivariate scale-shape mixtures of skew-normal distributions," Statistical Papers, Springer, vol. 61(6), pages 2643-2670, December.
    19. Tsung-I Lin & I-An Chen & Wan-Lun Wang, 2023. "A robust factor analysis model based on the canonical fundamental skew-t distribution," Statistical Papers, Springer, vol. 64(2), pages 367-393, April.
    20. Wan-Lun Wang & Tsung-I Lin, 2023. "Model-based clustering via mixtures of unrestricted skew normal factor analyzers with complete and incomplete data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(3), pages 787-817, September.
