
Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions

Authors

  • Morris, Katherine
  • Punzo, Antonio
  • McNicholas, Paul D.
  • Browne, Ryan P.

Abstract

Mixtures of multivariate contaminated shifted asymmetric Laplace distributions are developed for handling asymmetric clusters in the presence of outliers (also referred to as bad points herein). In addition to the parameters of the related non-contaminated mixture, for each (asymmetric) cluster, our model has one parameter controlling the proportion of outliers and another specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, adding a flexibility to our approach that is absent from other approaches such as trimming. Moreover, each observation is given an a posteriori probability of belonging to a particular cluster, and of being an outlier or not; advantageously, this allows for the automatic detection of outliers. An expectation–conditional maximization algorithm is outlined for parameter estimation and various implementation issues are discussed. The behavior of the proposed model is investigated, and compared with well-established finite mixture approaches, on artificial and real data.
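The outlier-detection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: to stay short and dependency-free it swaps in a univariate contaminated Gaussian for the multivariate shifted asymmetric Laplace density, and it takes the parameters as given instead of estimating them with the expectation-conditional maximization algorithm. The names `weight`, `alpha` (taken here as the proportion of good points, so `1 - alpha` is the proportion of outliers), `eta` (an assumed variance-inflation factor standing in for the degree of contamination), and all function names are hypothetical. The sketch only shows how each observation receives an a posteriori probability of belonging to a cluster and of being good or bad within it.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate Gaussian density; a stand-in for the SAL density of the paper."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_memberships(x, clusters):
    """Return (z, v) for one observation x.

    z[g] is the posterior probability that x belongs to cluster g.
    v[g] is the posterior probability that x is a good point, given cluster g.
    Each cluster is a dict with hypothetical keys:
    weight, mean, var, alpha (share of good points), eta (inflation > 1).
    """
    # Good part of each cluster: alpha times the cluster's own density.
    good = [c["alpha"] * normal_pdf(x, c["mean"], c["var"]) for c in clusters]
    # Bad part: (1 - alpha) times the same density with inflated variance.
    bad = [
        (1.0 - c["alpha"]) * normal_pdf(x, c["mean"], c["eta"] * c["var"])
        for c in clusters
    ]
    # Mixing weight times the contaminated density of each cluster.
    joint = [c["weight"] * (g + b) for c, g, b in zip(clusters, good, bad)]
    total = sum(joint)
    z = [j / total for j in joint]
    v = [g / (g + b) for g, b in zip(good, bad)]
    return z, v


def classify(x, clusters):
    """Assign x to its most probable cluster and flag it as an outlier
    when the posterior probability of being good is below one half."""
    z, v = posterior_memberships(x, clusters)
    g = max(range(len(z)), key=z.__getitem__)
    return g, v[g] < 0.5


# Illustrative parameter values only; they are not taken from the paper.
clusters = [
    {"weight": 0.5, "mean": 0.0, "var": 1.0, "alpha": 0.95, "eta": 20.0},
    {"weight": 0.5, "mean": 10.0, "var": 1.0, "alpha": 0.95, "eta": 20.0},
]

for x in (0.3, 5.0, 10.2):
    cluster, is_outlier = classify(x, clusters)
    print(x, cluster, is_outlier)
```

In this sketch a point near a cluster center is labeled good, while a point far from both centers is still assigned to a cluster but flagged as an outlier, because the inflated component explains it better than the uncontaminated one.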

Suggested Citation

  • Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
  • Handle: RePEc:eee:csdana:v:132:y:2019:i:c:p:145-166
    DOI: 10.1016/j.csda.2018.12.001

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947318302809
    Download Restriction: Full text for ScienceDirect subscribers only.


