
On Model-Based Clustering of Directional Data with Heavy Tails

Author

Listed:
  • Yingying Zhang

    (Western Michigan University)

  • Volodymyr Melnykov

    (University of Alabama)

  • Igor Melnykov

    (University of Minnesota Duluth)

Abstract

Directional statistics deals with data that can be naturally expressed in the form of vector directions. The von Mises-Fisher distribution is one of the most fundamental parametric models to describe directional data. Mixtures of von Mises-Fisher distributions represent a popular approach to handling heterogeneous populations. However, components of such models can be affected by the presence of mild outliers or cluster tails heavier than what can be accommodated by means of a von Mises-Fisher distribution. To relax these model limitations, a mixture of contaminated von Mises-Fisher distributions is proposed. The performance of the proposed methodology is tested on synthetic data and applied to text and genetics data. The obtained results demonstrate the importance of the proposed procedure and its superiority over the traditional mixture of von Mises-Fisher distributions in the presence of heavy tails.
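As a rough illustration of the idea in the abstract, the sketch below evaluates a two-part "contaminated" von Mises-Fisher density on the unit sphere in three dimensions. The parametrization is an assumption made by analogy with the contaminated normal distribution and is not taken from the paper: a proportion alpha of typical points follows a von Mises-Fisher distribution with concentration kappa, and the remaining points share the same mean direction but have the reduced concentration kappa / eta with eta > 1, which produces heavier tails. The function names are hypothetical.

```python
import math


def vmf3_logpdf(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere in R^3.

    x and mu are unit vectors of length 3; kappa > 0 is the concentration.
    """
    dot = sum(a * b for a, b in zip(x, mu))
    # Normalizing constant in 3 dimensions: kappa / (4 * pi * sinh(kappa)),
    # rewritten as kappa / (2 * pi * exp(kappa) * (1 - exp(-2 * kappa)))
    # so that large kappa does not overflow.
    log_norm = (math.log(kappa) - math.log(2.0 * math.pi) - kappa
                - math.log1p(-math.exp(-2.0 * kappa)))
    return log_norm + kappa * dot


def contaminated_parts(x, mu, kappa, alpha, eta):
    """Weighted densities of the typical and the contaminated part.

    Assumed parametrization (not from the paper): 0 < alpha < 1 is the
    share of typical points, eta > 1 divides the concentration.
    """
    good = alpha * math.exp(vmf3_logpdf(x, mu, kappa))
    bad = (1.0 - alpha) * math.exp(vmf3_logpdf(x, mu, kappa / eta))
    return good, bad


def contaminated_vmf3_pdf(x, mu, kappa, alpha, eta):
    """Density of the assumed contaminated von Mises-Fisher model."""
    good, bad = contaminated_parts(x, mu, kappa, alpha, eta)
    return good + bad


def outlier_probability(x, mu, kappa, alpha, eta):
    """Posterior probability that x comes from the contaminated part."""
    good, bad = contaminated_parts(x, mu, kappa, alpha, eta)
    return bad / (good + bad)


if __name__ == "__main__":
    mu = (0.0, 0.0, 1.0)
    near = (0.0, 0.0, 1.0)   # a point at the mean direction
    far = (1.0, 0.0, 0.0)    # a point 90 degrees away from it
    for point in (near, far):
        print(point, outlier_probability(point, mu, kappa=20.0, alpha=0.9, eta=10.0))
```

Under these assumptions, a point far from the mean direction receives a higher probability of belonging to the contaminated part than a point near it, which is the mechanism that lets such a component flag mild outliers. In a full mixture, each cluster would carry its own set of these parameters, typically estimated with an EM-type algorithm.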

Suggested Citation

  • Yingying Zhang & Volodymyr Melnykov & Igor Melnykov, 2023. "On Model-Based Clustering of Directional Data with Heavy Tails," Journal of Classification, Springer;The Classification Society, vol. 40(3), pages 527-551, November.
  • Handle: RePEc:spr:jclass:v:40:y:2023:i:3:d:10.1007_s00357-023-09445-z
    DOI: 10.1007/s00357-023-09445-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-023-09445-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alessio Farcomeni & Antonio Punzo, 2020. "Robust model-based clustering with mild and gross outliers," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 29(4), pages 989-1007, December.
    2. Morris, Katherine & Punzo, Antonio & McNicholas, Paul D. & Browne, Ryan P., 2019. "Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 145-166.
    3. Xuwen Zhu & Yana Melnykov & Angelina S. Kolomoytseva, 2024. "Contamination transformation matrix mixture modeling for skewed data groups with heavy tails and scatter," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(1), pages 85-101, March.
    4. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    5. Tortora, Cristina & Franczak, Brian C. & Bagnato, Luca & Punzo, Antonio, 2024. "A Laplace-based model with flexible tail behavior," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    6. Abbas Mahdavi & Anthony F. Desmond & Ahad Jamalizadeh & Tsung-I Lin, 2024. "Skew Multiple Scaled Mixtures of Normal Distributions with Flexible Tail Behavior and Their Application to Clustering," Journal of Classification, Springer;The Classification Society, vol. 41(3), pages 620-649, November.
    7. Ryan P. Browne & Luca Bagnato & Antonio Punzo, 2024. "Parsimony and parameter estimation for mixtures of multivariate leptokurtic-normal distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(3), pages 597-625, September.
    8. Naderi, Mehrdad & Hung, Wen-Liang & Lin, Tsung-I & Jamalizadeh, Ahad, 2019. "A novel mixture model using the multivariate normal mean–variance mixture of Birnbaum–Saunders distributions and its application to extrasolar planets," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 126-138.
    9. Yang, Yu-Chen & Lin, Tsung-I & Castro, Luis M. & Wang, Wan-Lun, 2020. "Extending finite mixtures of t linear mixed-effects models with concomitant covariates," Computational Statistics & Data Analysis, Elsevier, vol. 148(C).
    10. Melnykov, Volodymyr & Zhu, Xuwen, 2018. "On model-based clustering of skewed matrix data," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 181-194.
    11. Salvatore D. Tomarchio & Antonio Punzo & Antonello Maruotti, 2024. "Matrix-Variate Hidden Markov Regression Models: Fixed and Random Covariates," Journal of Classification, Springer;The Classification Society, vol. 41(3), pages 429-454, November.
    12. Wan-Lun Wang & Tsung-I Lin, 2023. "Model-based clustering via mixtures of unrestricted skew normal factor analyzers with complete and incomplete data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(3), pages 787-817, September.
    13. Sugasawa, Shonosuke & Kobayashi, Genya, 2022. "Robust fitting of mixture models using weighted complete estimating equations," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    14. Sharon Lee & Geoffrey McLachlan, 2013. "Model-based clustering and classification with non-normal mixture distributions," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 22(4), pages 427-454, November.
    15. Wan-Lun Wang & Luis M. Castro & Yen-Ting Chang & Tsung-I Lin, 2019. "Mixtures of restricted skew-t factor analyzers with common factor loadings," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(2), pages 445-480, June.
    16. Wan-Lun Wang & Tsung-I Lin, 2022. "Robust clustering of multiply censored data via mixtures of t factor analyzers," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 22-53, March.
    17. Xavier Bry & Lionel Cucala, 2022. "A von Mises–Fisher mixture model for clustering numerical and categorical variables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 429-455, June.
    18. Vrbik, Irene & McNicholas, Paul D., 2014. "Parsimonious skew mixture models for model-based clustering and classification," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 196-210.
    19. Lee, Sharon X. & McLachlan, Geoffrey J., 2022. "An overview of skew distributions in model-based clustering," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    20. Salvatore D. Tomarchio & Luca Bagnato & Antonio Punzo, 2022. "Model-based clustering via new parsimonious mixtures of heavy-tailed distributions," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(2), pages 315-347, June.
