
Incorporating Clustering Techniques into GAMLSS

Author

Listed:
  • Thiago G. Ramires

    (Campus Apucarana, Universidade Tecnológica Federal do Paraná, Apucarana 86812-460, Brazil
    These authors contributed equally to this work.)

  • Luiz R. Nakamura

    (Departamento de Informática e Estatística, Universidade Federal de Santa Catarina, Florianópolis 88040-900, Brazil
    These authors contributed equally to this work.)

  • Ana J. Righetto

    (Alvaz Agritech, Londrina 86050-268, Brazil
    These authors contributed equally to this work.)

  • Andréa C. Konrath

    (Departamento de Informática e Estatística, Universidade Federal de Santa Catarina, Florianópolis 88040-900, Brazil
    These authors contributed equally to this work.)

  • Carlos A. B. Pereira

    (Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo 05508-090, Brazil
    These authors contributed equally to this work.)

Abstract

A method for the statistical analysis of multimodal and/or highly distorted data is presented. The methodology combines different clustering methods with the GAMLSS (generalized additive models for location, scale, and shape) framework, and is therefore called c-GAMLSS, for “clustering GAMLSS.” In this extended structure, a latent variable (the cluster) is created to help explain the response variable (target). Any or all parameters of the response distribution can then be modeled as functions of this new covariate, together with any other available features. The features to be used are selected in stages, through a step-based procedure. A simulation study covering multiple scenarios compares c-GAMLSS with existing Gaussian mixture models. Four data applications show that, whether or not other genuine explanatory variables are available, the c-GAMLSS structure outperforms mixture models, some recently developed complex distributions, cluster-weighted models, and a mixture-of-experts model. Although the examples use simple distributions, more sophisticated distributions can also be used to explain the response variable.
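
The core idea of the abstract (create a cluster label, then let the distribution parameters depend on it) can be illustrated with a minimal sketch. This is not the authors' implementation, which builds on the GAMLSS framework; it assumes a one-dimensional response, a plain k-means step as the clustering method, and a normal distribution whose mean and standard deviation are each a function of the cluster factor only. The data are simulated, and the helper names (`kmeans_1d`, `fit_by_cluster`, `normal_loglik`) are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def kmeans_1d(y, k=2, iters=50):
    """Plain 1-D k-means (Lloyd's algorithm); returns one cluster label per observation."""
    ys = sorted(y)
    # Deterministic start: place the initial centers at evenly spaced sample quantiles.
    centers = [ys[(2 * j + 1) * len(ys) // (2 * k)] for j in range(k)]
    labels = [0] * len(y)
    for _ in range(iters):
        # Assignment step: each observation goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in y]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(y, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def fit_by_cluster(y, labels):
    """ML fit of a normal whose mu and sigma both depend on the cluster factor."""
    params = {}
    for j in set(labels):
        members = [v for v, lab in zip(y, labels) if lab == j]
        params[j] = (mean(members), pstdev(members))
    return params


def normal_loglik(y, mu, sigma):
    """Sum of normal log-densities with observation-specific mu and sigma."""
    return sum(
        -0.5 * math.log(2 * math.pi * s * s) - (v - m) ** 2 / (2 * s * s)
        for v, m, s in zip(y, mu, sigma)
    )


# Simulated bimodal response (assumption: illustrative data, not from the paper).
random.seed(1)
y = [random.gauss(2.0, 0.3) for _ in range(100)] + \
    [random.gauss(4.5, 0.5) for _ in range(150)]

# Step 1: create the cluster covariate.
labels = kmeans_1d(y, k=2)

# Step 2: model mu and sigma as functions of the cluster covariate.
params = fit_by_cluster(y, labels)
ll_clustered = normal_loglik(
    y, [params[l][0] for l in labels], [params[l][1] for l in labels]
)

# Baseline: a single normal with constant mu and sigma.
ll_single = normal_loglik(y, [mean(y)] * len(y), [pstdev(y)] * len(y))

print("cluster parameters (mu, sigma):", params)
print("log-likelihood, single normal:", round(ll_single, 2))
print("log-likelihood, cluster as covariate:", round(ll_clustered, 2))
```

The second log-likelihood is conditional on the estimated labels, so it is not directly comparable to a mixture-model likelihood; the sketch only shows how a cluster covariate lets a simple distribution follow a bimodal response. In a full GAMLSS setting the cluster factor would enter the predictors alongside other features, and the terms would be chosen by the stepwise selection the abstract describes.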

Suggested Citation

  • Thiago G. Ramires & Luiz R. Nakamura & Ana J. Righetto & Andréa C. Konrath & Carlos A. B. Pereira, 2021. "Incorporating Clustering Techniques into GAMLSS," Stats, MDPI, vol. 4(4), pages 1-15, November.
  • Handle: RePEc:gam:jstats:v:4:y:2021:i:4:p:53-930:d:677377

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/4/4/53/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/4/4/53/
    Download Restriction: no

    References listed on IDEAS

    1. Keefe Murphy & Thomas Brendan Murphy, 2020. "Gaussian parsimonious clustering models with covariates and a noise component," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 293-325, June.
    2. Fionn Murtagh & Pierre Legendre, 2014. "Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion?," Journal of Classification, Springer;The Classification Society, vol. 31(3), pages 274-295, October.
    3. R. A. Rigby & D. M. Stasinopoulos, 2005. "Generalized additive models for location, scale and shape," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 54(3), pages 507-554, June.
    4. Stasinopoulos, D. Mikis & Rigby, Robert A., 2007. "Generalized Additive Models for Location Scale and Shape (GAMLSS) in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 23(i07).
    5. A. Azzalini & A.W. Bowman, 1990. "A Look at Some Data on the Old Faithful Geyser," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 39(3), pages 357-365, November.

    Citations



    Cited by:

    1. Cleary, Sean & Willcott, Neal, 2022. "The cost of delaying to invest: A Canadian perspective," Finance Research Letters, Elsevier, vol. 50(C).
    2. Cheng, Fangwei & Luo, Hongxi & Jenkins, Jesse D. & Larson, Eric D., 2023. "The value of low- and negative-carbon fuels in the transition to net-zero emission economies: Lifecycle greenhouse gas emissions and cost assessments across multiple fuel types," Applied Energy, Elsevier, vol. 331(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yixuan Wang & Jianzhu Li & Ping Feng & Rong Hu, 2015. "A Time-Dependent Drought Index for Non-Stationary Precipitation Series," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 29(15), pages 5631-5647, December.
    2. Panayi, Efstathios & Peters, Gareth W. & Danielsson, Jon & Zigrand, Jean-Pierre, 2018. "Designating market maker behaviour in limit order book markets," Econometrics and Statistics, Elsevier, vol. 5(C), pages 20-44.
    3. Gauss Cordeiro & Josemar Rodrigues & Mário Castro, 2012. "The exponential COM-Poisson distribution," Statistical Papers, Springer, vol. 53(3), pages 653-664, August.
    4. Christian Kleiber & Achim Zeileis, 2016. "Visualizing Count Data Regressions Using Rootograms," The American Statistician, Taylor & Francis Journals, vol. 70(3), pages 296-303, July.
    5. Matteo Malavasi & Gareth W. Peters & Pavel V. Shevchenko & Stefan Truck & Jiwook Jang & Georgy Sofronov, 2021. "Cyber Risk Frequency, Severity and Insurance Viability," Papers 2111.03366, arXiv.org, revised Mar 2022.
    6. Lucio Masserini & Matilde Bini & Monica Pratesi, 2017. "Effectiveness of non-selective evaluation test scores for predicting first-year performance in university career: a zero-inflated beta regression approach," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(2), pages 693-708, March.
    7. Tong, Edward N.C. & Mues, Christophe & Thomas, Lyn, 2013. "A zero-adjusted gamma model for mortgage loan loss given default," International Journal of Forecasting, Elsevier, vol. 29(4), pages 548-562.
    8. Alexander Silbersdorff & Kai Sebastian Schneider, 2019. "Distributional Regression Techniques in Socioeconomic Research on the Inequality of Health with an Application on the Relationship between Mental Health and Income," IJERPH, MDPI, vol. 16(20), pages 1-28, October.
    9. Tong, Edward N.C. & Mues, Christophe & Brown, Iain & Thomas, Lyn C., 2016. "Exposure at default models with and without the credit conversion factor," European Journal of Operational Research, Elsevier, vol. 252(3), pages 910-920.
    10. Michał Narajewski & Florian Ziel, 2020. "Ensemble Forecasting for Intraday Electricity Prices: Simulating Trajectories," Papers 2005.01365, arXiv.org, revised Aug 2020.
    11. D. Chiru Naik & Sagar Rohidas Chavan & P. Sonali, 2023. "Incorporating the climate oscillations in the computation of meteorological drought over India," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 117(3), pages 2617-2646, July.
    12. Shuhui Guo & Lihua Xiong & Jie Chen & Shenglian Guo & Jun Xia & Ling Zeng & Chong-Yu Xu, 2023. "Nonstationary Regional Flood Frequency Analysis Based on the Bayesian Method," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 37(2), pages 659-681, January.
    13. Maike Hohberg & Katja Landau & Thomas Kneib & Stephan Klasen & Walter Zucchini, 2018. "Vulnerability to poverty revisited: Flexible modeling and better predictive performance," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 16(3), pages 439-454, September.
    14. Kuntz, Laura-Chloé, 2020. "Beta dispersion and market timing," Journal of Empirical Finance, Elsevier, vol. 59(C), pages 235-256.
    15. Epstein, Leonardo D. & Inostroza-Quezada, Ignacio E. & Goodstein, Ronald C. & Choi, S. Chan, 2021. "Dynamic effects of store promotions on purchase conversion: Expanding technology applications with innovative analytics," Journal of Business Research, Elsevier, vol. 128(C), pages 279-289.
    16. Serinaldi, Francesco, 2011. "Distributional modeling and short-term forecasting of electricity prices by Generalized Additive Models for Location, Scale and Shape," Energy Economics, Elsevier, vol. 33(6), pages 1216-1226.
    17. Yolanda M. Gómez & Diego I. Gallardo & Marcelo Bourguignon & Eduardo Bertolli & Vinicius F. Calsavara, 2023. "A general class of promotion time cure rate models with a new biological interpretation," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(1), pages 66-86, January.
    18. Christophe Croux & Irène Gijbels & Ilaria Prosdocimi, 2012. "Robust Estimation of Mean and Dispersion Functions in Extended Generalized Additive Models," Biometrics, The International Biometric Society, vol. 68(1), pages 31-44, March.
    19. I. Gijbels & I. Prosdocimi & G. Claeskens, 2010. "Nonparametric estimation of mean and dispersion functions in extended generalized linear models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 19(3), pages 580-608, November.
    20. Groll, Andreas & Hambuckers, Julien & Kneib, Thomas & Umlauf, Nikolaus, 2019. "LASSO-type penalization in the framework of generalized additive models for location, scale and shape," Computational Statistics & Data Analysis, Elsevier, vol. 140(C), pages 59-73.
