Semiautomatic robust regression clustering of international trade data

Authors

  • Francesca Torti

    (Joint Research Centre (JRC))

  • Marco Riani

    (University of Parma)

  • Gianluca Morelli

    (University of Parma)

Abstract

The purpose of this paper is to show how, in regression clustering, to choose the most relevant solutions, analyze their stability, and provide information about the best combinations of the number of groups, the restriction factor on the error variances across groups, and the trimming level. The procedure is based on two steps. First, we generalize the information criteria of constrained robust multivariate clustering to the case of cluster-weighted models. Unlike traditional approaches, which select the best solution by minimizing an information criterion such as BIC, we concentrate on the so-called optimal stable solutions. In the second step, using the monitoring approach, we select the best value of the trimming factor. Finally, we validate the solution using a confirmatory forward search approach. A motivating example based on a novel dataset on European Union trade in face masks shows the limitations of existing procedures. The suggested approach is first applied to a set of well-known datasets from the robust regression clustering literature. We then focus on a set of international trade datasets and provide a novel, informative way of updating the subset in the random start approach. The Supplementary material, in the spirit of the Special Issue, extends the analysis of the trade data and compares the suggested approach with the existing ones in the literature.
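The abstract contrasts the proposed procedure with the traditional practice of picking a single solution by minimizing BIC. As a rough illustration of that baseline only (not the authors' method, and with neither the trimming nor the restriction factor studied in the paper), the sketch below fits plain Gaussian mixtures of simple linear regressions by EM with several random starts for each candidate number of groups k, and keeps the k with the smallest BIC = -2 log L + (number of free parameters) * log n. The function names `fit_mixreg` and `bic`, the 1e-6 variance floor and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def fit_mixreg(x, y, k, n_iter=100, n_starts=5, seed=0):
    """EM fit of a k-group Gaussian mixture of simple linear regressions.

    Hypothetical baseline: no trimming and no restriction on the ratio of
    error variances. Returns (best log-likelihood over the random starts,
    number of free parameters).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n), x])  # design matrix with intercept
    p = X.shape[1]
    best_ll = -np.inf
    for _ in range(n_starts):
        # Random start: hard assignment of every unit to one of the k groups
        resp = np.eye(k)[rng.integers(0, k, size=n)]
        for _ in range(n_iter):
            # M-step: mixing weights, weighted least squares, error variances
            w_sum = resp.sum(axis=0) + 1e-12
            pis = w_sum / n
            betas = np.empty((k, p))
            sig2 = np.empty(k)
            for j in range(k):
                w = resp[:, j]
                XtW = X.T * w  # X' W, with W = diag(responsibilities of group j)
                betas[j] = np.linalg.lstsq(XtW @ X, XtW @ y, rcond=None)[0]
                res = y - X @ betas[j]
                # Crude floor: the unconstrained likelihood is unbounded
                sig2[j] = max((w * res**2).sum() / w_sum[j], 1e-6)
            # E-step: log of pi_j * N(y_i; x_i' beta_j, sig2_j) for every unit and group
            res = y[:, None] - X @ betas.T  # n x k residuals
            logd = np.log(pis) - 0.5 * np.log(2 * np.pi * sig2) - res**2 / (2 * sig2)
            ll_i = np.logaddexp.reduce(logd, axis=1)  # per-unit log-likelihood
            resp = np.exp(logd - ll_i[:, None])
        best_ll = max(best_ll, ll_i.sum())
    n_par = k * p + k + (k - 1)  # coefficients + variances + free mixing weights
    return best_ll, n_par


def bic(x, y, k, **kw):
    """BIC in the 'smaller is better' convention."""
    ll, n_par = fit_mixreg(x, y, k, **kw)
    return -2 * ll + n_par * np.log(len(y))


# Usage: simulated data from two crossing regression lines
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
g = rng.integers(0, 2, 200)
y = np.where(g == 0, 1 + 2 * x, 20 - 1.5 * x) + rng.normal(0, 0.5, 200)
scores = {k: bic(x, y, k) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Because the likelihood of an unconstrained mixture is unbounded (a group can collapse onto a few points with near-zero variance), the variance floor above is only a crude guard; bounding the ratio of error variances across groups, as the restriction factor does, is the principled remedy. Trimmed variants would additionally set aside the fraction of units with the smallest likelihood contributions before each M-step.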

Suggested Citation

  • Francesca Torti & Marco Riani & Gianluca Morelli, 2021. "Semiautomatic robust regression clustering of international trade data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(3), pages 863-894, September.
  • Handle: RePEc:spr:stmapp:v:30:y:2021:i:3:d:10.1007_s10260-021-00569-3
    DOI: 10.1007/s10260-021-00569-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10260-021-00569-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Marco Riani & Andrea Cerioli & Domenico Perrotta & Francesca Torti, 2015. "Simulating mixtures of multivariate data with fixed cluster overlap in FSDA library," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 461-481, December.
    2. N. Gershenfeld & B. Schoner & E. Metois, 1999. "Cluster-weighted modelling for time-series analysis," Nature, Nature, vol. 397(6717), pages 329-332, January.
    3. Andrea Cerioli & Alessio Farcomeni & Marco Riani, 2019. "Wild adaptive trimming for robust estimation and cluster analysis," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 46(1), pages 235-256, March.
    4. Marco Riani & Aldo Corbellini & Anthony C. Atkinson, 2018. "The Use of Prior Information in Very Robust Regression for Fraud Detection," International Statistical Review, International Statistical Institute, vol. 86(2), pages 205-218, August.
    5. Francesca Torti & Aldo Corbellini & Anthony C. Atkinson, 2021. "fsdaSAS: a package for robust regression for very large datasets including the batch forward search," LSE Research Online Documents on Economics 109895, London School of Economics and Political Science, LSE Library.
    6. Anthony Atkinson & Marco Riani, 2004. "The forward search and data visualisation," Computational Statistics, Springer, vol. 19(1), pages 29-54, February.
    7. Peter Rousseeuw & Domenico Perrotta & Marco Riani & Mia Hubert, 2019. "Robust Monitoring of Time Series with Application to Fraud Detection," Econometrics and Statistics, Elsevier, vol. 9(C), pages 108-121.
    8. Wayne DeSarbo & William Cron, 1988. "A maximum likelihood methodology for clusterwise linear regression," Journal of Classification, Springer;The Classification Society, vol. 5(2), pages 249-282, September.
    9. L.A. García-Escudero & A. Gordaliza & A. Mayo-Iscar & R. San Martín, 2010. "Robust clusterwise linear regression through trimming," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3057-3069, December.
    10. Bettina Grün & Friedrich Leisch, 2007. "Fitting finite mixtures of generalized linear regressions in R," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5247-5252, July.
    11. Andrea Cerioli & Domenico Perrotta, 2014. "Robust clustering around regression lines with high density regions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(1), pages 5-26, March.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Tingting Wang & Linjie Qin & Chao Dai & Zhen Wang & Chenqi Gong, 2023. "Heterogeneous Learning of Functional Clustering Regression and Application to Chinese Air Pollution Data," IJERPH, MDPI, vol. 20(5), pages 1-21, February.
    2. Lucio Barabesi & Andrea Cerioli & Domenico Perrotta, 2021. "Forum on Benford’s law and statistical methods for the detection of frauds," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(3), pages 767-778, September.
    3. Andrea Cappozzo & Luis Angel García Escudero & Francesca Greselin & Agustín Mayo-Iscar, 2021. "Parameter Choice, Stability and Validity for Robust Cluster Weighted Modeling," Stats, MDPI, vol. 4(3), pages 1-14, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Francesca Torti & Domenico Perrotta & Marco Riani & Andrea Cerioli, 2019. "Assessing trimming methodologies for clustering linear regression data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 227-257, March.
    2. Lucio Barabesi & Andrea Cerioli & Domenico Perrotta, 2021. "Forum on Benford’s law and statistical methods for the detection of frauds," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(3), pages 767-778, September.
    3. Qiang Wu & Weixin Yao, 2016. "Mixtures of quantile regressions," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 162-176.
    4. Salvatore Ingrassia & Simona Minotti & Giorgio Vittadini, 2012. "Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions," Journal of Classification, Springer;The Classification Society, vol. 29(3), pages 363-401, October.
    5. Adil M. Bagirov & Julien Ugon & Hijran G. Mirzayeva, 2015. "Nonsmooth Optimization Algorithm for Solving Clusterwise Linear Regression Problems," Journal of Optimization Theory and Applications, Springer, vol. 164(3), pages 755-780, March.
    6. Rainer Schlittgen, 2011. "A weighted least-squares approach to clusterwise regression," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 95(2), pages 205-217, June.
    7. Salvatore D. Tomarchio & Paul D. McNicholas & Antonio Punzo, 2021. "Matrix Normal Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 38(3), pages 556-575, October.
    8. Francesco Dotto & Alessio Farcomeni & Luis Angel García-Escudero & Agustín Mayo-Iscar, 2017. "A fuzzy approach to robust regression clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(4), pages 691-710, December.
    9. Luke R. Lloyd-Jones & Hien D. Nguyen & Geoffrey J. McLachlan, 2018. "A globally convergent algorithm for lasso-penalized mixture of linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 119(C), pages 19-38.
    10. Marco Riani & Anthony Curtis Atkinson & Aldo Corbellini & Alessio Farcomeni & Fabrizio Laurini, 2024. "Information Criteria for Outlier Detection Avoiding Arbitrary Significance Levels," Econometrics and Statistics, Elsevier, vol. 29(C), pages 189-205.
    11. Brenton R. Clarke & Andrew Grose, 2023. "A further study comparing forward search multivariate outlier methods including ATLA with an application to clustering," Statistical Papers, Springer, vol. 64(2), pages 395-420, April.
    12. Elham Mirfarah & Mehrdad Naderi & Ding-Geng Chen, 2021. "Mixture of linear experts model for censored data: A novel approach with scale-mixture of normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    13. Kaisa Joki & Adil M. Bagirov & Napsu Karmitsa & Marko M. Mäkelä & Sona Taheri, 2020. "Clusterwise support vector linear regression," European Journal of Operational Research, Elsevier, vol. 287(1), pages 19-35.
    14. Antonio Punzo & Paul D. McNicholas, 2017. "Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 34(2), pages 249-293, July.
    15. Angelo Mazza & Antonio Punzo, 2020. "Mixtures of multivariate contaminated normal regression models," Statistical Papers, Springer, vol. 61(2), pages 787-822, April.
    16. Luca Greco, 2022. "Robust fitting of mixtures of GLMs by weighted likelihood," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(1), pages 25-48, March.
    17. Marco Riani & Anthony C. Atkinson & Andrea Cerioli & Aldo Corbellini, 2019. "Comments on: Data science, big data and statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 349-352, June.
    18. Adil M. Bagirov & Julien Ugon & Hijran Mirzayeva, 2013. "Nonsmooth nonconvex optimization approach to clusterwise linear regression problems," European Journal of Operational Research, Elsevier, vol. 229(1), pages 132-142.
    19. Gianfranco Di Vaio & Michele Battisti, 2010. "A Spatially-Filtered Mixture of Beta-Convergence Regression for EU Regions, 1980-2002," Regional and Urban Modeling 284100013, EcoMod.
    20. Frenkel Ter Hofstede & Michel Wedel & Jan-Benedict E.M. Steenkamp, 2002. "Identifying Spatial Segments in International Markets," Marketing Science, INFORMS, vol. 21(2), pages 160-177, July.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.