
Model-based clustering and classification with non-normal mixture distributions

Author

Listed:
  • Sharon Lee
  • Geoffrey McLachlan

Abstract

Non-normal mixture distributions have received increasing attention in recent years. Finite mixtures of multivariate skew-symmetric distributions, in particular, the skew normal and skew $t$-mixture models, are emerging as promising extensions to the traditional normal and $t$-mixture models. Most of these parametric families of skew distributions are closely related, and can be classified into four forms under a recently proposed scheme, namely, the restricted, unrestricted, extended, and generalised forms. In this paper, we consider some of these existing proposals of multivariate non-normal mixture models and illustrate their practical use in several real applications. We first discuss the characterizations along with a brief account of some distributions belonging to the above classification scheme, then references for software implementation of EM-type algorithms for the estimation of the model parameters are given. We then compare the relative performance of restricted and unrestricted skew mixture models in clustering, discriminant analysis, and density estimation on six real datasets from flow cytometry, finance, and image analysis. We also compare the performance of mixtures of skew normal and $t$-component distributions with other non-normal component distributions, including mixtures with multivariate normal-inverse-Gaussian distributions, shifted asymmetric Laplace distributions and generalized hyperbolic distributions. Copyright Springer-Verlag Berlin Heidelberg 2013
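
As a rough illustration of the model-based clustering idea the abstract describes, the sketch below evaluates a two-component mixture of univariate skew normal densities (Azzalini's form) and assigns a point to the component with the largest posterior probability. It is a simplified, univariate stand-in and not the authors' multivariate restricted or unrestricted models or their EM implementation; the function names, the weights, and the component parameters are all hypothetical values chosen for illustration.

```python
import math


def skew_normal_pdf(x, xi, omega, alpha):
    """Univariate skew normal density: location xi, scale omega > 0, shape alpha."""
    z = (x - xi) / omega
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)    # standard normal pdf at z
    Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))   # standard normal cdf at alpha*z
    return 2.0 / omega * phi * Phi


def mixture_pdf(x, weights, params):
    """Finite mixture density: weighted sum of the component densities."""
    return sum(w * skew_normal_pdf(x, *p) for w, p in zip(weights, params))


def posterior_probs(x, weights, params):
    """Posterior probability that x belongs to each component (Bayes' rule)."""
    joint = [w * skew_normal_pdf(x, *p) for w, p in zip(weights, params)]
    total = sum(joint)
    return [j / total for j in joint]


def assign_cluster(x, weights, params):
    """Index of the component with the largest posterior probability."""
    post = posterior_probs(x, weights, params)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical mixing proportions and (xi, omega, alpha) per component,
# not estimates from any dataset in the paper.
weights = [0.6, 0.4]
params = [(0.0, 1.0, 3.0), (5.0, 1.5, -2.0)]
```

Setting `alpha = 0` recovers the ordinary normal density, which is why skew normal mixtures are described as extensions of normal mixtures. In practice the weights and component parameters would be estimated from data, for example with an EM-type algorithm, rather than fixed by hand as they are here.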

Suggested Citation

  • Sharon Lee & Geoffrey McLachlan, 2013. "Model-based clustering and classification with non-normal mixture distributions," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 22(4), pages 427-454, November.
  • Handle: RePEc:spr:stmapp:v:22:y:2013:i:4:p:427-454
    DOI: 10.1007/s10260-013-0237-4

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s10260-013-0237-4
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Vrbik, I. & McNicholas, P.D., 2012. "Analytic calculations for the EM algorithm for multivariate skew-t mixture models," Statistics & Probability Letters, Elsevier, vol. 82(6), pages 1169-1174.
    2. Barry Arnold & Robert Beaver & Richard Groeneveld & William Meeker, 1993. "The nontruncated marginal of a truncated bivariate normal distribution," Psychometrika, Springer;The Psychometric Society, vol. 58(3), pages 471-488, September.
    3. Nadarajah, Saralees & Kotz, Samuel, 2003. "Skewed distributions generated by the normal kernel," Statistics & Probability Letters, Elsevier, vol. 65(3), pages 269-277, November.
    4. Gupta, Arjun K. & González-Farías, Graciela & Domínguez-Molina, J. Armando, 2004. "A multivariate skew normal distribution," Journal of Multivariate Analysis, Elsevier, vol. 89(1), pages 181-190, April.
    5. Edward I. Altman, 1968. "Financial Ratios, Discriminant Analysis And The Prediction Of Corporate Bankruptcy," Journal of Finance, American Finance Association, vol. 23(4), pages 589-609, September.
    6. Christoffersen, Peter F, 1998. "Evaluating Interval Forecasts," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 39(4), pages 841-862, November.
    7. Pilsun Choi & Insik Min, 2011. "A Comparison Of Conditional And Unconditional Approaches In Value‐At‐Risk Estimation," The Japanese Economic Review, Japanese Economic Association, vol. 62(1), pages 99-115, March.
    8. Edward I. Altman, 1968. "The Prediction Of Corporate Bankruptcy: A Discriminant Analysis," Journal of Finance, American Finance Association, vol. 23(1), pages 193-194, March.
    9. A. Azzalini & A. Capitanio, 1999. "Statistical applications of the multivariate skew normal distribution," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(3), pages 579-602.
    10. Branco, Márcia D. & Dey, Dipak K., 2001. "A General Class of Multivariate Skew-Elliptical Distributions," Journal of Multivariate Analysis, Elsevier, vol. 79(1), pages 99-113, October.
    11. Basso, Rodrigo M. & Lachos, Víctor H. & Cabral, Celso Rômulo Barbosa & Ghosh, Pulak, 2010. "Robust mixture modeling based on scale mixtures of skew-normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 2926-2941, December.
    12. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    13. Cabral, Celso Rômulo Barbosa & Lachos, Víctor Hugo & Prates, Marcos O., 2012. "Multivariate mixture modeling using skew-normal independent distributions," Computational Statistics & Data Analysis, Elsevier, vol. 56(1), pages 126-142, January.
    14. Arellano-Valle, Reinaldo B. & Genton, Marc G., 2005. "On fundamental skew distributions," Journal of Multivariate Analysis, Elsevier, vol. 96(1), pages 93-116, September.
    15. Karlis, Dimitris & Xekalaki, Evdokia, 2003. "Choosing initial values for the EM algorithm for finite mixtures," Computational Statistics & Data Analysis, Elsevier, vol. 41(3-4), pages 577-590, January.
    16. Liseo, Brunero & Loperfido, Nicola, 2003. "A Bayesian interpretation of the multivariate skew-normal distribution," Statistics & Probability Letters, Elsevier, vol. 61(4), pages 395-401, February.
    17. Arellano-Valle, R. B. & del Pino, G. & San Martín, E., 2002. "Definition and probabilistic properties of skew-distributions," Statistics & Probability Letters, Elsevier, vol. 58(2), pages 111-121, June.

    Citations


    Cited by:

    1. Wan-Lun Wang & Tsung-I Lin, 2015. "Robust model-based clustering via mixtures of skew-t distributions with missing information," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 423-445, December.
    2. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    3. Volodymyr Melnykov & Xuwen Zhu, 2019. "An extension of the K-means algorithm to clustering skewed data," Computational Statistics, Springer, vol. 34(1), pages 373-394, March.
    4. Faicel Chamroukhi, 2016. "Piecewise Regression Mixture for Simultaneous Functional Data Clustering and Optimal Segmentation," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 374-411, October.
    5. Maruotti, Antonello & Punzo, Antonio, 2017. "Model-based time-varying clustering of multivariate longitudinal data with covariates and outliers," Computational Statistics & Data Analysis, Elsevier, vol. 113(C), pages 475-496.
    6. Murray, Paula M. & Browne, Ryan P. & McNicholas, Paul D., 2017. "Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 141-156.
    7. Wan-Lun Wang & Tsung-I Lin, 2023. "Model-based clustering via mixtures of unrestricted skew normal factor analyzers with complete and incomplete data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(3), pages 787-817, September.
    8. Derek S. Young & Xi Chen & Dilrukshi C. Hewage & Ricardo Nilo-Poyanco, 2019. "Finite mixture-of-gamma distributions: estimation, inference, and model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 1053-1082, December.
    9. Melnykov, Volodymyr & Zhu, Xuwen, 2018. "On model-based clustering of skewed matrix data," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 181-194.
    10. Lee, Sharon X. & McLachlan, Geoffrey J., 2022. "An overview of skew distributions in model-based clustering," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    11. Salvatore D. Tomarchio & Luca Bagnato & Antonio Punzo, 2022. "Model-based clustering via new parsimonious mixtures of heavy-tailed distributions," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(2), pages 315-347, June.
    12. Sylvia Frühwirth-Schnatter & Gertraud Malsiner-Walli, 2019. "From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 33-64, March.
    13. Xuwen Zhu, 2019. "Probability of misclassification in model-based clustering," Computational Statistics, Springer, vol. 34(3), pages 1427-1442, September.
    14. Ryan Janicki & Tucker S. McElroy, 2016. "Hermite expansion and estimation of monotonic transformations of Gaussian data," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 28(1), pages 207-234, March.
    15. Nicola Loperfido, 2019. "Finite mixtures, projection pursuit and tensor rank: a triangulation," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 145-173, March.
    16. McLachlan, Geoffrey J. & Lee, Sharon X., 2016. "Comment on “On nomenclature, and the relative merits of two formulations of skew distributions” by A. Azzalini, R. Browne, M. Genton, and P. McNicholas," Statistics & Probability Letters, Elsevier, vol. 116(C), pages 1-5.
    17. Wraith, Darren & Forbes, Florence, 2015. "Location and scale mixtures of Gaussians with flexible tail behaviour: Properties, inference and application to multivariate clustering," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 61-73.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sharon Lee & Geoffrey McLachlan, 2013. "On mixtures of skew normal and skew $t$-distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(3), pages 241-266, September.
    2. Azzalini, Adelchi, 2022. "An overview on the progeny of the skew-normal family— A personal perspective," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    3. Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
    4. Samuel Kotz & Donatella Vicari, 2005. "Survey of developments in the theory of continuous skewed distributions," Metron - International Journal of Statistics, Dipartimento di Statistica, Probabilità e Statistiche Applicate - University of Rome, vol. 0(2), pages 225-261.
    5. Arellano-Valle, Reinaldo B. & Genton, Marc G., 2005. "On fundamental skew distributions," Journal of Multivariate Analysis, Elsevier, vol. 96(1), pages 93-116, September.
    6. Cabral, Celso Rômulo Barbosa & Lachos, Víctor Hugo & Zeller, Camila Borelli, 2014. "Multivariate measurement error models using finite mixtures of skew-Student t distributions," Journal of Multivariate Analysis, Elsevier, vol. 124(C), pages 179-198.
    7. Hossein Negarestani & Ahad Jamalizadeh & Sobhan Shafiei & Narayanaswamy Balakrishnan, 2019. "Mean mixtures of normal distributions: properties, inference and application," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 82(4), pages 501-528, May.
    8. Hok Shing Kwong & Saralees Nadarajah, 2022. "A New Robust Class of Skew Elliptical Distributions," Methodology and Computing in Applied Probability, Springer, vol. 24(3), pages 1669-1691, September.
    9. Lin, Tsung-I & McLachlan, Geoffrey J. & Lee, Sharon X., 2016. "Extending mixtures of factor models using the restricted multivariate skew-normal distribution," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 398-413.
    10. Cristina Tortora & Brian C. Franczak & Ryan P. Browne & Paul D. McNicholas, 2019. "A Mixture of Coalesced Generalized Hyperbolic Distributions," Journal of Classification, Springer;The Classification Society, vol. 36(1), pages 26-57, April.
    11. Arellano-Valle, Reinaldo B. & Ferreira, Clécio S. & Genton, Marc G., 2018. "Scale and shape mixtures of multivariate skew-normal distributions," Journal of Multivariate Analysis, Elsevier, vol. 166(C), pages 98-110.
    12. Wan-Lun Wang & Tsung-I Lin, 2023. "Model-based clustering via mixtures of unrestricted skew normal factor analyzers with complete and incomplete data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(3), pages 787-817, September.
    13. Kim, Hyoung-Moon & Ryu, Duchwan & Mallick, Bani K. & Genton, Marc G., 2014. "Mixtures of skewed Kalman filters," Journal of Multivariate Analysis, Elsevier, vol. 123(C), pages 228-251.
    14. Dey, Dipak K. & Liu, Junfeng, 2005. "A new construction for skew multivariate distributions," Journal of Multivariate Analysis, Elsevier, vol. 95(2), pages 323-344, August.
    15. Arellano-Valle, R.B. & Ozan, S. & Bolfarine, H. & Lachos, V.H., 2005. "Skew normal measurement error models," Journal of Multivariate Analysis, Elsevier, vol. 96(2), pages 265-281, October.
    16. Wan-Lun Wang & Tsung-I Lin, 2022. "Robust clustering of multiply censored data via mixtures of t factor analyzers," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 22-53, March.
    17. Francisco H. C. Alencar & Christian E. Galarza & Larissa A. Matos & Victor H. Lachos, 2022. "Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(3), pages 521-557, September.
    18. Mauro Bernardi & Roy Cerqueti & Arsen Palestini, 2020. "The Skew Normal multivariate risk measurement framework," Computational Management Science, Springer, vol. 17(1), pages 105-119, January.
    19. Wraith, Darren & Forbes, Florence, 2015. "Location and scale mixtures of Gaussians with flexible tail behaviour: Properties, inference and application to multivariate clustering," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 61-73.
    20. Reinaldo B. Arellano-Valle & Marc G. Genton, 2010. "Multivariate extended skew-t distributions and related families," Metron - International Journal of Statistics, Dipartimento di Statistica, Probabilità e Statistiche Applicate - University of Rome, vol. 0(3), pages 201-234.
