
Robust generalized canonical correlation analysis based on scatter matrices

Authors

  • Kudraszow, Nadia L.
  • Vahnovan, Alejandra V.
  • Ferrario, Julieta
  • Fasano, M. Victoria

Abstract

Generalized Canonical Correlation Analysis (GCCA) is a powerful tool for analyzing and understanding linear relationships between multiple sets of variables. However, its classical estimates are highly sensitive to outliers, which can significantly affect the results of the analysis. A functional version of GCCA is proposed, based on scatter matrices, leading to robust and Fisher consistent estimators for appropriate choices of the scatter matrix. In cases where scatter matrices are ill-conditioned, a modification based on an estimate of the precision matrix is introduced. A procedure to identify influential observations is also developed. A simulation study evaluates the finite-sample performance of the proposed methods under clean and contaminated samples. The advantages of the influential data detection approach are demonstrated through an application to a real dataset.
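The central idea in the abstract, computing GCCA from a scatter matrix instead of the sample covariance, can be illustrated with a minimal sketch. It assumes a MAXVAR-type formulation in which the leading generalized canonical direction solves S a = λ D a, where S is the p x p scatter matrix of the stacked variables and D keeps only its within-block (diagonal) blocks; the paper's exact functional may differ. Tyler's shape estimator is used here purely as an illustrative stand-in for a robust scatter matrix, and the function names gcca_from_scatter and tyler_shape, as well as the simulated data, are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def gcca_from_scatter(S, block_sizes):
    """Leading GCCA eigenvalue and direction from a p x p scatter matrix S.

    Solves S a = lam * D a, where D = blockdiag(S_11, ..., S_KK) keeps only
    the within-block parts of S (MAXVAR-type criterion; an assumption, not
    necessarily the functional used in the paper).
    """
    S = np.asarray(S, dtype=float)
    D = np.zeros_like(S)
    start = 0
    for size in block_sizes:
        stop = start + size
        D[start:stop, start:stop] = S[start:stop, start:stop]
        start = stop
    vals, vecs = eigh(S, D)  # generalized symmetric problem, ascending order
    return vals[-1], vecs[:, -1]


def tyler_shape(X, n_iter=100, tol=1e-8):
    """Tyler's M-estimator of shape, centered at the coordinatewise median.

    Hypothetical stand-in for a robust scatter matrix. It is defined only up
    to scale, so each iterate is normalized to have trace p.
    """
    Z = X - np.median(X, axis=0)
    n, p = Z.shape
    V = np.eye(p)
    for _ in range(n_iter):
        # squared Mahalanobis-type distances z_i' V^{-1} z_i
        d = np.einsum("ij,jk,ik->i", Z, np.linalg.inv(V), Z)
        d = np.maximum(d, 1e-12)  # guard against points at the center
        V_new = (p / n) * (Z.T / d) @ Z  # weighted sum of z_i z_i' / d_i
        V_new *= p / np.trace(V_new)
        if np.linalg.norm(V_new - V) < tol:
            return V_new
        V = V_new
    return V


# Simulated data (hypothetical): three 2-variable blocks sharing one latent factor.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
blocks = [
    np.column_stack([latent + 0.5 * rng.normal(size=n), rng.normal(size=n)])
    for _ in range(3)
]
X = np.hstack(blocks)
X_bad = X.copy()
X_bad[:25] = 20.0 * rng.normal(size=(25, X.shape[1]))  # replace 5% of rows with gross outliers

for label, data in [("clean", X), ("contaminated", X_bad)]:
    lam_cov, _ = gcca_from_scatter(np.cov(data, rowvar=False), [2, 2, 2])
    lam_tyl, _ = gcca_from_scatter(tyler_shape(data), [2, 2, 2])
    print(f"{label}: classical {lam_cov:.2f}, Tyler {lam_tyl:.2f}")
```

Because S a = λ D a is unchanged when S is multiplied by a constant, a shape matrix defined only up to scale is sufficient for this criterion. With two blocks, the largest eigenvalue equals 1 + ρ₁, where ρ₁ is the first canonical correlation, which gives a quick sanity check; comparing the printed values for the clean and contaminated samples gives a rough sense of how strongly each scatter choice reacts to the outliers.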

Suggested Citation

  • Kudraszow, Nadia L. & Vahnovan, Alejandra V. & Ferrario, Julieta & Fasano, M. Victoria, 2025. "Robust generalized canonical correlation analysis based on scatter matrices," Computational Statistics & Data Analysis, Elsevier, vol. 206(C).
  • Handle: RePEc:eee:csdana:v:206:y:2025:i:c:s0167947325000027
    DOI: 10.1016/j.csda.2025.108126

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947325000027
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2025.108126?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Ledoit, Olivier & Wolf, Michael, 2004. "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 88(2), pages 365-411, February.
    2. Adrover, Jorge G. & Donato, Stella M., 2015. "A robust predictive approach for canonical correlation analysis," Journal of Multivariate Analysis, Elsevier, vol. 133(C), pages 356-376.
    3. Maronna, Ricardo A. & Yohai, Victor J., 2017. "Robust and efficient estimation of multivariate scatter and location," Computational Statistics & Data Analysis, Elsevier, vol. 109(C), pages 64-75.
    4. N. Locantore & J. Marron & D. Simpson & N. Tripoli & J. Zhang & K. Cohen & Graciela Boente & Ricardo Fraiman & Babette Brumback & Christophe Croux & Jianqing Fan & Alois Kneip & John Marden & Daniel P, 1999. "Robust principal component analysis for functional data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 8(1), pages 1-73, June.
    5. Michel Tenenhaus & Arthur Tenenhaus & Patrick J. F. Groenen, 2017. "Regularized Generalized Canonical Correlation Analysis: A Framework for Sequential Multiblock Component Methods," Psychometrika, Springer;The Psychometric Society, vol. 82(3), pages 737-777, September.
    6. Schäfer, Juliane & Strimmer, Korbinian, 2005. "A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 4(1), pages 1-32, November.
    7. Salibian-Barrera, Matias & Van Aelst, Stefan & Willems, Gert, 2006. "Principal Components Analysis Based on Multivariate MM Estimators With Fast and Robust Bootstrap," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1198-1211, September.
    8. Taskinen, Sara & Croux, Christophe & Kankainen, Annaliisa & Ollila, Esa & Oja, Hannu, 2006. "Influence functions and efficiencies of the canonical correlation and vector estimates based on scatter and shape matrices," Journal of Multivariate Analysis, Elsevier, vol. 97(2), pages 359-384, February.
    9. Graciela Boente & Matías Salibian-Barrera, 2015. "S -Estimators for Functional Principal Component Analysis," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 1100-1111, September.
    10. Michel Tenenhaus, 2011. "Regularized generalized canonical correlation analysis," Post-Print hal-00578321, HAL.
    11. Arthur Tenenhaus & Michel Tenenhaus, 2011. "Regularized Generalized Canonical Correlation Analysis," Psychometrika, Springer;The Psychometric Society, vol. 76(2), pages 257-284, April.
    12. Michel Tenenhaus & Arthur Tenenhaus, 2011. "Regularized Generalized Canonical Correlation Analysis," Post-Print hal-00609220, HAL.
    13. Xiuli Du & Xiaohu Jiang & Jinguan Lin, 2023. "Multinomial Logistic Factor Regression for Multi-source Functional Block-wise Missing Data," Psychometrika, Springer;The Psychometric Society, vol. 88(3), pages 975-1001, September.
    14. Hubert, M. & Vandervieren, E., 2008. "An adjusted boxplot for skewed distributions," Computational Statistics & Data Analysis, Elsevier, vol. 52(12), pages 5186-5201, August.
    15. Ming Yuan & Yi Lin, 2007. "Model selection and estimation in the Gaussian graphical model," Biometrika, Biometrika Trust, vol. 94(1), pages 19-35.
    16. Andreas Alfons & Christophe Croux & Peter Filzmoser, 2017. "Robust Maximum Association Estimators," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(517), pages 436-445, January.
    17. N. A. Campbell, 1982. "Robust Procedures in Multivariate Analysis II. Robust Canonical Variate Analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 31(1), pages 1-8, March.
    18. Tarr, G. & Müller, S. & Weber, N.C., 2016. "Robust estimation of precision matrices under cellwise contamination," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 404-420.
    19. Jorge G. Adrover & Stella M. Donato, 2023. "Aspects of robust canonical correlation analysis, principal components and association," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(2), pages 623-650, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alvarez, Agustín & Boente, Graciela & Kudraszow, Nadia, 2019. "Robust sieve estimators for functional canonical correlation analysis," Journal of Multivariate Analysis, Elsevier, vol. 170(C), pages 46-62.
    2. Anna L. Tyler & J. Matthew Mahoney & Mark P. Keller & Candice N. Baker & Margaret Gaca & Anuj Srivastava & Isabela Gerdes Gyuricza & Madeleine J. Braun & Nadia A. Rosenthal & Alan D. Attie & Gary A. C, 2025. "Transcripts with high distal heritability mediate genetic effects on complex metabolic traits," Nature Communications, Nature, vol. 16(1), pages 1-21, December.
    3. Tenenhaus, Arthur & Philippe, Cathy & Frouin, Vincent, 2015. "Kernel Generalized Canonical Correlation Analysis," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 114-131.
    4. Michel Tenenhaus & Arthur Tenenhaus & Patrick J. F. Groenen, 2017. "Regularized Generalized Canonical Correlation Analysis: A Framework for Sequential Multiblock Component Methods," Psychometrika, Springer;The Psychometric Society, vol. 82(3), pages 737-777, September.
    5. Olivier Ledoit & Michael Wolf, 2019. "Quadratic shrinkage for large covariance matrices," ECON - Working Papers 335, Department of Economics - University of Zurich, revised Dec 2020.
    6. Sera Şanlı, 2023. "Untapped potentials on a well‐endowed plate: A sustainable future catalogue for the harmony of renewable technologies with the water‐energy‐climate‐SDGs nexus," Natural Resources Forum, Blackwell Publishing, vol. 47(4), pages 672-698, November.
    7. Boyi Guo & Hannah D. Holscher & Loretta S. Auvil & Michael E. Welge & Colleen B. Bushell & Janet A. Novotny & David J. Baer & Nicholas A. Burd & Naiman A. Khan & Ruoqing Zhu, 2023. "Estimating Heterogeneous Treatment Effect on Multivariate Responses Using Random Forests," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 15(3), pages 545-561, December.
    8. Cruz-Cano, Raul & Lee, Mei-Ling Ting, 2014. "Fast regularized canonical correlation analysis," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 88-100.
    9. Xiuli Du & Xiaohu Jiang & Jinguan Lin, 2023. "Multinomial Logistic Factor Regression for Multi-source Functional Block-wise Missing Data," Psychometrika, Springer;The Psychometric Society, vol. 88(3), pages 975-1001, September.
    10. Heungsun Hwang & Gyeongcheol Cho, 2020. "Global Least Squares Path Modeling: A Full-Information Alternative to Partial Least Squares Path Modeling," Psychometrika, Springer;The Psychometric Society, vol. 85(4), pages 947-972, December.
    11. Wang, Wenjia & Zhou, Yi-Hui, 2021. "Eigenvector-based sparse canonical correlation analysis: Fast computation for estimation of multiple canonical vectors," Journal of Multivariate Analysis, Elsevier, vol. 185(C).
    12. Lam, Clifford, 2020. "High-dimensional covariance matrix estimation," LSE Research Online Documents on Economics 101667, London School of Economics and Political Science, LSE Library.
    13. Gautam Sabnis & Debdeep Pati & Anirban Bhattacharya, 2019. "Compressed Covariance Estimation with Automated Dimension Learning," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 81(2), pages 466-481, December.
    14. Joseph F. Hair & G. Tomas M. Hult & Christian M. Ringle & Marko Sarstedt & Kai Oliver Thiele, 2017. "Mirror, mirror on the wall: a comparative evaluation of composite-based structural equation modeling methods," Journal of the Academy of Marketing Science, Springer, vol. 45(5), pages 616-632, September.
    15. Lukáš Malec & Vladimír Janovský, 2020. "Connecting the multivariate partial least squares with canonical analysis: a path-following approach," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(3), pages 589-609, September.
    16. Stéphanie Bougeard & Hervé Abdi & Gilbert Saporta & Ndèye Niang, 2018. "Clusterwise analysis for multiblock component methods," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 285-313, June.
    17. Cristina Davino & Pasquale Dolce & Stefania Taralli & Domenico Vistocco, 2022. "Composite-Based Path Modeling for Conditional Quantiles Prediction. An Application to Assess Health Differences at Local Level in a Well-Being Perspective," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 161(2), pages 907-936, June.
    18. Bailey, Natalia & Pesaran, M. Hashem & Smith, L. Vanessa, 2019. "A multiple testing approach to the regularisation of large sample correlation matrices," Journal of Econometrics, Elsevier, vol. 208(2), pages 507-534.
    19. Avagyan, Vahe & Alonso Fernández, Andrés Modesto & Nogales, Francisco J., 2014. "Improving the graphical lasso estimation for the precision matrix through roots of the sample covariance matrix," DES - Working Papers. Statistics and Econometrics. WS ws141208, Universidad Carlos III de Madrid. Departamento de Estadística.
    20. Vahe Avagyan & Andrés M. Alonso & Francisco J. Nogales, 2018. "D-trace estimation of a precision matrix using adaptive Lasso penalties," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 425-447, June.



    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:206:y:2025:i:c:s0167947325000027. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. Registering allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/csda.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.