
Piecewise Regression Mixture for Simultaneous Functional Data Clustering and Optimal Segmentation

Author

Listed:
  • Faicel Chamroukhi

    (Université de Toulon, CNRS, LSIS, UMR 7296
    Aix Marseille Université, CNRS, ENSAM, LSIS, UMR 7296
    Laboratoire Paul Painlevé, CNRS, UMR 8524)

Abstract

This paper introduces a novel mixture model-based approach to the simultaneous clustering and optimal segmentation of functional data, here curves that exhibit regime changes. The proposed model consists of a finite mixture of piecewise polynomial regression models. Each piecewise polynomial regression model is associated with a cluster, and within each cluster, each piecewise polynomial component is associated with a regime (i.e., a segment). We derive two approaches to learning the model parameters. The first is an estimation approach that maximizes the observed-data likelihood via a dedicated expectation-maximization (EM) algorithm, yielding at convergence a fuzzy partition of the curves into K clusters, obtained by maximizing the posterior cluster probabilities. The second is a classification approach that optimizes a specific classification likelihood criterion through a dedicated classification expectation-maximization (CEM) algorithm. The optimal curve segmentation is performed using dynamic programming. In the classification approach, the curve clustering and the optimal segmentation are performed simultaneously as the CEM learning proceeds. We show that the classification approach is a probabilistic version generalizing the deterministic K-means-like algorithm proposed in Hébrail, Hugueney, Lechevallier, and Rossi (2010). The proposed approach is evaluated using simulated curves and real-world curves. Comparisons with alternatives, including regression mixture models and the K-means-like algorithm for piecewise regression, demonstrate the effectiveness of the proposed approach.
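
To make the dynamic-programming segmentation step mentioned in the abstract more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single curve, a plain least-squares cost per segment, and a fixed number of segments; in the paper the segmentation is learned jointly with the cluster assignments across many curves. The function names `segment_cost` and `optimal_segmentation` and the toy data are hypothetical, introduced only for illustration.

```python
import numpy as np


def segment_cost(x, y, degree):
    """Residual sum of squares of a least-squares polynomial fit on one segment."""
    design = np.vander(x, degree + 1)  # columns: x^degree, ..., x, 1
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def optimal_segmentation(x, y, n_segments, degree=1):
    """Split one curve into n_segments contiguous pieces minimizing total RSS.

    Returns the start index of each segment and the minimal total cost.
    Assumption of this sketch: every segment holds at least degree + 1 points.
    """
    n = len(x)
    min_len = degree + 1
    # cost[i, j]: RSS of a polynomial fit to points i .. j-1 (inf if too short)
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            cost[i, j] = segment_cost(x[i:j], y[i:j], degree)
    # best[r, j]: minimal cost of covering the first j points with r segments
    best = np.full((n_segments + 1, n + 1), np.inf)
    arg = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for r in range(1, n_segments + 1):
        for j in range(1, n + 1):
            candidates = best[r - 1, :j] + cost[:j, j]
            i = int(np.argmin(candidates))
            best[r, j] = candidates[i]
            arg[r, j] = i
    # Backtrack from the last point to recover the segment start indices
    starts, j = [], n
    for r in range(n_segments, 0, -1):
        j = int(arg[r, j])
        starts.append(j)
    return starts[::-1], float(best[n_segments, n])


# Toy noise-free curve with three linear regimes of ten points each
x = np.arange(30, dtype=float)
y = np.concatenate([np.zeros(10), 5.0 + 0.5 * x[10:20], np.full(10, -3.0)])
starts, total_cost = optimal_segmentation(x, y, n_segments=3, degree=1)
print(starts, total_cost)
```

This sketch precomputes every segment cost, which takes a number of least-squares fits quadratic in the curve length; it is meant to show the recursion, not to be efficient.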

Suggested Citation

  • Faicel Chamroukhi, 2016. "Piecewise Regression Mixture for Simultaneous Functional Data Clustering and Optimal Segmentation," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 374-411, October.
  • Handle: RePEc:spr:jclass:v:33:y:2016:i:3:d:10.1007_s00357-016-9212-8
    DOI: 10.1007/s00357-016-9212-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-016-9212-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations


    Cited by:

    1. Snježana Majstorović & Kristian Sabo & Johannes Jung & Matija Klarić, 2018. "Spectral methods for growth curve clustering," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 26(3), pages 715-737, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Paul D. McNicholas, 2016. "Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 331-373, October.
    2. Utkarsh J. Dang & Antonio Punzo & Paul D. McNicholas & Salvatore Ingrassia & Ryan P. Browne, 2017. "Multivariate Response and Parsimony for Gaussian Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 34(1), pages 4-34, April.
    3. Salvatore Ingrassia & Antonio Punzo, 2020. "Cluster Validation for Mixtures of Regressions via the Total Sum of Squares Decomposition," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 526-547, July.
    4. Michael P. B. Gallaugher & Paul D. McNicholas, 2019. "On Fractionally-Supervised Classification: Weight Selection and Extension to the Multivariate t-Distribution," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 232-265, July.
    5. Antonio Punzo & Paul D. McNicholas, 2017. "Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model," Journal of Classification, Springer;The Classification Society, vol. 34(2), pages 249-293, July.
    6. Roberto Mari & Salvatore Ingrassia & Antonio Punzo, 2023. "Local and Overall Deviance R-Squared Measures for Mixtures of Generalized Linear Models," Journal of Classification, Springer;The Classification Society, vol. 40(2), pages 233-266, July.
    7. Maruotti, Antonello & Punzo, Antonio, 2017. "Model-based time-varying clustering of multivariate longitudinal data with covariates and outliers," Computational Statistics & Data Analysis, Elsevier, vol. 113(C), pages 475-496.
    8. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    9. Diani, Cecilia & Galimberti, Giuliano & Soffritti, Gabriele, 2022. "Multivariate cluster-weighted models based on seemingly unrelated linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).
    10. Salvatore D. Tomarchio & Paul D. McNicholas & Antonio Punzo, 2021. "Matrix Normal Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 38(3), pages 556-575, October.
    11. Yang, Yu-Chen & Lin, Tsung-I & Castro, Luis M. & Wang, Wan-Lun, 2020. "Extending finite mixtures of t linear mixed-effects models with concomitant covariates," Computational Statistics & Data Analysis, Elsevier, vol. 148(C).
    12. Michael P. B. Gallaugher & Salvatore D. Tomarchio & Paul D. McNicholas & Antonio Punzo, 2022. "Multivariate cluster weighted models using skewed distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(1), pages 93-124, March.
    13. Melnykov, Volodymyr & Zhu, Xuwen, 2018. "On model-based clustering of skewed matrix data," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 181-194.
    14. Volodymyr Melnykov & Semhar Michael, 2020. "Clustering Large Datasets by Merging K-Means Solutions," Journal of Classification, Springer;The Classification Society, vol. 37(1), pages 97-123, April.
    15. Naderi, Mehrdad & Mirfarah, Elham & Wang, Wan-Lun & Lin, Tsung-I, 2023. "Robust mixture regression modeling based on the normal mean-variance mixture distributions," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    16. Sangkon Oh & Byungtae Seo, 2023. "Merging Components in Linear Gaussian Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 25-51, April.
    17. Paolo Berta & Salvatore Ingrassia & Antonio Punzo & Giorgio Vittadini, 2016. "Multilevel cluster-weighted models for the evaluation of hospitals," METRON, Springer;Sapienza Università di Roma, vol. 74(3), pages 275-292, December.
    18. Gabriele Soffritti, 2021. "Estimating the Covariance Matrix of the Maximum Likelihood Estimator Under Linear Cluster-Weighted Models," Journal of Classification, Springer;The Classification Society, vol. 38(3), pages 594-625, October.
    19. Xiaoqiong Fang & Andy W. Chen & Derek S. Young, 2023. "Predictors with measurement error in mixtures of polynomial regressions," Computational Statistics, Springer, vol. 38(1), pages 373-401, March.
    20. Angelo Mazza & Antonio Punzo, 2020. "Mixtures of multivariate contaminated normal regression models," Statistical Papers, Springer, vol. 61(2), pages 787-822, April.
