IDEAS home Printed from https://ideas.repec.org/a/eee/ecosta/v22y2022icp136-158.html

Vine copula mixture models and clustering for non-Gaussian data

Author

Listed:
  • Sahin, Özge
  • Czado, Claudia

Abstract

Most finite mixture models do not allow asymmetric tail dependence within components and cannot capture non-elliptical clusters in clustering applications. Since vine copulas are very flexible in capturing these dependencies, a novel vine copula mixture model for continuous data is proposed. Model selection and parameter estimation are discussed, and a new model-based clustering algorithm is formulated. The use of vine copulas in clustering allows for a range of shapes and dependency structures for the clusters. Simulation experiments show a significant gain in clustering accuracy when notably asymmetric tail dependencies and/or non-Gaussian margins exist within the components. The proposed method is also applied to real data sets. The model-based clustering algorithm with vine copula mixture models outperforms others, especially for non-Gaussian multivariate data.
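
For orientation only, the kind of model the abstract describes can be sketched in generic notation. The symbols below (K, \pi_k, g_k, c_k, F_{k,j}, f_{k,j}, r_{ik}) are assumptions introduced here and are not taken from the paper; the decomposition of each component into margins and a copula density follows from Sklar's theorem.

```latex
% Assumed notation for a copula-based finite mixture; not quoted from the paper.
% Mixture density with K components and weights \pi_k:
g(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, g_k(\mathbf{x}),
\qquad \pi_k > 0, \quad \sum_{k=1}^{K} \pi_k = 1

% Component density: copula density times marginal densities (Sklar's theorem):
g_k(\mathbf{x}) = c_k\bigl(F_{k,1}(x_1), \dots, F_{k,d}(x_d)\bigr) \prod_{j=1}^{d} f_{k,j}(x_j)

% Posterior probability that observation \mathbf{x}_i belongs to component k:
r_{ik} = \frac{\pi_k \, g_k(\mathbf{x}_i)}{\sum_{l=1}^{K} \pi_l \, g_l(\mathbf{x}_i)}
```

Under these assumptions, c_k would be a d-dimensional vine copula density for component k, and a model-based clustering rule would typically assign observation \mathbf{x}_i to the component with the largest r_{ik}. Asymmetric tail dependence within a cluster would enter through the pair copulas that make up c_k, and non-Gaussian margins through the f_{k,j}.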

Suggested Citation

  • Sahin, Özge & Czado, Claudia, 2022. "Vine copula mixture models and clustering for non-Gaussian data," Econometrics and Statistics, Elsevier, vol. 22(C), pages 136-158.
  • Handle: RePEc:eee:ecosta:v:22:y:2022:i:c:p:136-158
    DOI: 10.1016/j.ecosta.2021.08.011

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S2452306221001052
    Download Restriction: Full text for ScienceDirect subscribers only. Contains open access articles



    Citations

    Cited by:

    1. Colubi, Ana & Ramos-Guajardo, Ana Belén, 2023. "Fuzzy sets and (fuzzy) random sets in Econometrics and Statistics," Econometrics and Statistics, Elsevier, vol. 26(C), pages 84-98.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
    2. Alessandro Casa & Andrea Cappozzo & Michael Fop, 2022. "Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 648-674, November.
    3. Weiß, Gregor N.F. & Scheffer, Marcus, 2015. "Mixture pair-copula-constructions," Journal of Banking & Finance, Elsevier, vol. 54(C), pages 175-191.
    4. Wang, Fan & Li, Heng & Dong, Chao, 2021. "Understanding near-miss count data on construction sites using greedy D-vine copula marginal regression," Reliability Engineering and System Safety, Elsevier, vol. 213(C).
    5. Müller, Dominik & Czado, Claudia, 2019. "Dependence modelling in ultra high dimensions with vine copulas and the Graphical Lasso," Computational Statistics & Data Analysis, Elsevier, vol. 137(C), pages 211-232.
    6. Weiping Zhang & MengMeng Zhang & Yu Chen, 2020. "A Copula-Based GLMM Model for Multivariate Longitudinal Data with Mixed-Types of Responses," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 82(2), pages 353-379, November.
    7. Eugen Ivanov & Aleksey Min & Franz Ramsauer, 2017. "Copula-Based Factor Models for Multivariate Asset Returns," Econometrics, MDPI, vol. 5(2), pages 1-24, May.
    8. Wei, Yuhong & Tang, Yang & McNicholas, Paul D., 2019. "Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 130(C), pages 18-41.
    9. Jakob Stöber & Ulf Schepsmeier, 2013. "Estimating standard errors in regular vine copula models," Computational Statistics, Springer, vol. 28(6), pages 2679-2707, December.
    10. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    11. Paula M. Murray & Ryan P. Browne & Paul D. McNicholas, 2020. "Mixtures of Hidden Truncation Hyperbolic Factor Analyzers," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 366-379, July.
    12. Mazo, Gildas & Averyanov, Yaroslav, 2019. "Constraining kernel estimators in semiparametric copula mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 138(C), pages 170-189.
    13. Aristidis Nikoloulopoulos & Harry Joe, 2015. "Factor Copula Models for Item Response Data," Psychometrika, Springer;The Psychometric Society, vol. 80(1), pages 126-150, March.
    14. Nagler, Thomas & Czado, Claudia, 2016. "Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas," Journal of Multivariate Analysis, Elsevier, vol. 151(C), pages 69-89.
    15. Apergis, Nicholas & Gozgor, Giray & Lau, Chi Keung Marco & Wang, Shixuan, 2020. "Dependence structure in the Australian electricity markets: New evidence from regular vine copulae," Energy Economics, Elsevier, vol. 90(C).
    16. Maziar Sahamkhadam & Andreas Stephan, 2019. "Portfolio optimization based on forecasting models using vine copulas: An empirical assessment for the financial crisis," Papers 1912.10328, arXiv.org.
    17. Brechmann, Eike & Czado, Claudia & Paterlini, Sandra, 2014. "Flexible dependence modeling of operational risk losses and its impact on total capital requirements," Journal of Banking & Finance, Elsevier, vol. 40(C), pages 271-285.
    18. Himchan Jeong & Dipak Dey, 2020. "Application of a Vine Copula for Multi-Line Insurance Reserving," Risks, MDPI, vol. 8(4), pages 1-23, October.
    19. Chang, Bo & Joe, Harry, 2019. "Prediction based on conditional distributions of vine copulas," Computational Statistics & Data Analysis, Elsevier, vol. 139(C), pages 45-63.
    20. Utkarsh J. Dang & Michael P.B. Gallaugher & Ryan P. Browne & Paul D. McNicholas, 2023. "Model-Based Clustering and Classification Using Mixtures of Multivariate Skewed Power Exponential Distributions," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 145-167, April.
