IDEAS home Printed from https://ideas.repec.org/a/jss/jstsof/v055i12.html

EMMIXuskew: An R Package for Fitting Mixtures of Multivariate Skew t Distributions via the EM Algorithm

Author

Listed:
  • McLachlan, Geoff
  • Lee, Sharon X

Abstract

This paper describes an algorithm for fitting finite mixtures of unrestricted multivariate skew t (FM-uMST) distributions. The package EMMIXuskew implements a closed-form expectation-maximization (EM) algorithm for computing the maximum likelihood (ML) estimates of the parameters of the (unrestricted) FM-uMST model in R. EMMIXuskew also supports visualization of fitted contours in two and three dimensions, and random sample generation from a specified FM-uMST distribution. Finite mixtures of skew t distributions have proven to be useful in modelling heterogeneous data with asymmetric and heavy-tailed behaviour, for example, datasets from flow cytometry. In recent years, various versions of mixtures with multivariate skew t (MST) distributions have been proposed. However, these models adopted restricted characterizations of the component MST distributions so that the E-step of the EM algorithm could be evaluated in closed form. This paper focuses on mixtures with unrestricted MST components, and describes an iterative algorithm for computing the ML estimates of the model parameters. Its implementation in R is presented with the package EMMIXuskew. The usefulness of the proposed algorithm is demonstrated in three applications to real datasets. The first example illustrates the use of the main function fmmst in the package by fitting an MST distribution to a bivariate unimodal flow cytometric sample. The second example fits a mixture of MST distributions to the Australian Institute of Sport (AIS) data, and demonstrates that EMMIXuskew can provide better clustering results than mixtures with restricted MST components. In the third example, EMMIXuskew is applied to classify cells in a trivariate flow cytometric dataset. Comparisons with some other available methods suggest that EMMIXuskew achieves a lower misclassification rate with respect to the labels given by benchmark gating analysis.

Suggested Citation

  • McLachlan, Geoff & Lee, Sharon X, 2013. "EMMIXuskew: An R Package for Fitting Mixtures of Multivariate Skew t Distributions via the EM Algorithm," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 55(i12).
  • Handle: RePEc:jss:jstsof:v:055:i12
    DOI: http://hdl.handle.net/10.18637/jss.v055.i12

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v055i12/v55i12.pdf
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v055i12/EMMIXuskew_0.11-5.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v055i12/v55i12.R
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v055i12/DLBCL.zip
    Download Restriction: no


    Citations


    Cited by:

    1. Palczewski, Andrzej & Palczewski, Jan, 2019. "Black–Litterman model for continuous distributions," European Journal of Operational Research, Elsevier, vol. 273(2), pages 708-720.
    2. Badi H. Baltagi & Georges Bresson & Anoop Chaturvedi & Guy Lacroix, 2021. "Robust Dynamic Panel Data Models Using ε-Contamination," Center for Policy Research Working Papers 240, Center for Policy Research, Maxwell School, Syracuse University.
    3. Rachid Laajaj & Duncan Webb & Danilo Aristizabal & Eduardo Behrentz & Raquel Bernal & Giancarlo Buitrago & Zulma Cucunubá & Fernando de la Hoz, 2021. "Understanding how socioeconomic inequalities drive inequalities in SARS-CoV-2 infections," Documentos CEDE 19241, Universidad de los Andes, Facultad de Economía, CEDE.
    4. Wan-Lun Wang & Ahad Jamalizadeh & Tsung-I Lin, 2020. "Finite mixtures of multivariate scale-shape mixtures of skew-normal distributions," Statistical Papers, Springer, vol. 61(6), pages 2643-2670, December.
    5. Lee, Sharon X. & McLachlan, Geoffrey J., 2022. "An overview of skew distributions in model-based clustering," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    6. Lin, Tsung-I & McLachlan, Geoffrey J. & Lee, Sharon X., 2016. "Extending mixtures of factor models using the restricted multivariate skew-normal distribution," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 398-413.
    7. Ahad Jamalizadeh & Tsung-I Lin, 2017. "A general class of scale-shape mixtures of skew-normal distributions: properties and estimation," Computational Statistics, Springer, vol. 32(2), pages 451-474, June.
    8. José E. Chacón, 2019. "Mixture model modal clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(2), pages 379-404, June.
    9. Baltagi, Badi H. & Bresson, Georges & Chaturvedi, Anoop & Lacroix, Guy, 2022. "Robust Dynamic Space-Time Panel Data Models Using ε-Contamination: An Application to Crop Yields and Climate Change," IZA Discussion Papers 15815, Institute of Labor Economics (IZA).
    10. Antonio Parisi & B. Liseo, 2018. "Objective Bayesian analysis for the multivariate skew-t model," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(2), pages 277-295, June.
    11. Murray, Paula M. & Browne, Ryan P. & McNicholas, Paul D., 2017. "Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 141-156.
    12. McLachlan, Geoffrey J. & Lee, Sharon X., 2016. "Comment on “On nomenclature, and the relative merits of two formulations of skew distributions” by A. Azzalini, R. Browne, M. Genton, and P. McNicholas," Statistics & Probability Letters, Elsevier, vol. 116(C), pages 1-5.
    13. Wraith, Darren & Forbes, Florence, 2015. "Location and scale mixtures of Gaussians with flexible tail behaviour: Properties, inference and application to multivariate clustering," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 61-73.
    14. Azzalini, Adelchi & Browne, Ryan P. & Genton, Marc G. & McNicholas, Paul D., 2016. "On nomenclature for, and the relative merits of, two formulations of skew distributions," Statistics & Probability Letters, Elsevier, vol. 110(C), pages 201-206.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ahad Jamalizadeh & Tsung-I Lin, 2017. "A general class of scale-shape mixtures of skew-normal distributions: properties and estimation," Computational Statistics, Springer, vol. 32(2), pages 451-474, June.
    2. Mehdi Amiri & Ahad Jamalizadeh & Mina Towhidi, 2015. "Inference and further probabilistic properties of the $SUN_{n,2}$-distribution," Statistical Papers, Springer, vol. 56(4), pages 1071-1098, November.
    3. Komárek, Arnošt & Komárková, Lenka, 2014. "Capabilities of R Package mixAK for Clustering Based on Multivariate Continuous and Discrete Longitudinal Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 59(i12).
    4. Wan-Lun Wang & Min Liu & Tsung-I Lin, 2017. "Robust skew-t factor analysis models for handling missing data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 26(4), pages 649-672, November.
    5. Tao Lu, 2017. "Bayesian inference on longitudinal-survival data with multiple features," Computational Statistics, Springer, vol. 32(3), pages 845-866, September.
    6. Wan-Lun Wang & Tsung-I Lin, 2015. "Robust model-based clustering via mixtures of skew-t distributions with missing information," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 423-445, December.
    7. Yangxin Huang & Tao Lu, 2017. "Bayesian inference on partially linear mixed-effects joint models for longitudinal data with multiple features," Computational Statistics, Springer, vol. 32(1), pages 179-196, March.
    8. John Marangos & Charles J. Whalen, 2011. "Evolution without fundamental change: the Washington Consensus on economic development," Chapters, in: Charles J. Whalen (ed.), Financial Instability and Economic Security after the Great Recession, chapter 8, pages 153-178, Edward Elgar Publishing.
    9. Alvaro Cuervo-Cazurra & Luis Alfonso Dau, 2009. "Structural Reform and Firm Exports," Management International Review, Springer, vol. 49(4), pages 479-507, September.
    10. Catarina Figueira & David Parker, 2011. "Infrastructure Liberalization: Challenges to the New Economic Paradigm in the Context of Developing Countries," Chapters, in: Matthias Finger & Rolf W. Künneke (ed.), International Handbook of Network Industries, chapter 27, Edward Elgar Publishing.
    11. Matthias Finger & Rolf W. Künneke (ed.), 2011. "International Handbook of Network Industries," Books, Edward Elgar Publishing, number 12961.
    12. Maria Carolina Basso, 2016. "A Economia Brasileira Sob Restrição Do Balanço De Pagamentos: Uma Análise Empírica Da Lei De Thirlwall No Boom Das Commodities," Anais do XLII Encontro Nacional de Economia [Proceedings of the 42nd Brazilian Economics Meeting] 089, ANPEC - Associação Nacional dos Centros de Pós-Graduação em Economia [Brazilian Association of Graduate Programs in Economics].
    13. Mahdi Salehi & Ahad Jamalizadeh & Mahdi Doostparast, 2014. "A generalized skew two-piece skew-elliptical distribution," Statistical Papers, Springer, vol. 55(2), pages 409-429, May.
    14. Ryo Kinoshita, 2015. "Asset allocation under higher moments with the GARCH filter," Empirical Economics, Springer, vol. 49(1), pages 235-254, August.
    15. Fatma Zehra Doğru & Olcay Arslan, 2021. "Finite mixtures of skew Laplace normal distributions with random skewness," Computational Statistics, Springer, vol. 36(1), pages 423-447, March.
    16. Olcay Arslan, 2015. "Variance-mean mixture of the multivariate skew normal distribution," Statistical Papers, Springer, vol. 56(2), pages 353-378, May.
    17. Byungsoo Kim & Sangyeol Lee, 2014. "Minimum density power divergence estimator for covariance matrix based on skew $t$ distribution," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 23(4), pages 565-575, November.
    18. Azzalini, Adelchi, 2022. "An overview on the progeny of the skew-normal family— A personal perspective," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    19. Eugenia Correa, 2012. "Money and Institutions: The Long Path of the Latin American Financial Reforms," Chapters, in: Claude Gnos & Louis-Philippe Rochon & Domenica Tropeano (ed.), Employment, Growth and Development, chapter 11, Edward Elgar Publishing.
    20. Abbas Mahdavi & Vahid Amirzadeh & Ahad Jamalizadeh & Tsung-I Lin, 2021. "Maximum likelihood estimation for scale-shape mixtures of flexible generalized skew normal distributions via selection representation," Computational Statistics, Springer, vol. 36(3), pages 2201-2230, September.
