IDEAS home Printed from https://ideas.repec.org/a/spr/sankha/v79y2017i2d10.1007_s13171-017-0106-6.html

New Asymptotic Results in Principal Component Analysis

Author

Listed:
  • Vladimir Koltchinskii

    (Georgia Institute of Technology)

  • Karim Lounici

    (Georgia Institute of Technology
    Université Côte d’Azur)

Abstract

Let $X$ be a mean zero Gaussian random vector in a separable Hilbert space ${\mathbb H}$ with covariance operator ${\Sigma}:={\mathbb E}(X\otimes X)$. Let ${\Sigma}=\sum_{r\geq 1}\mu_{r} P_{r}$ be the spectral decomposition of $\Sigma$ with distinct eigenvalues $\mu_{1}>\mu_{2}>\dots$ and the corresponding spectral projectors $P_{1}, P_{2}, \dots$. Given a sample $X_{1},\dots,X_{n}$ of size $n$ of i.i.d. copies of $X$, the sample covariance operator is defined as $\hat{\Sigma}_{n} := n^{-1}\sum_{j=1}^{n} X_{j}\otimes X_{j}$. The main goal of principal component analysis is to estimate spectral projectors $P_{1}, P_{2}, \dots$ by their empirical counterparts $\hat P_{1}, \hat P_{2}, \dots$ properly defined in terms of spectral decomposition of the sample covariance operator $\hat{\Sigma}_{n}$. The aim of this paper is to study asymptotic distributions of important statistics related to this problem, in particular, of statistic $\|\hat P_{r}-P_{r}\|_{2}^{2}$, where $\|\cdot\|_{2}^{2}$ is the squared Hilbert–Schmidt norm. This is done in a “high-complexity” asymptotic framework in which the so called effective rank $\mathbf{r}({\Sigma}):=\frac{\text{tr}({\Sigma})}{\|{\Sigma}\|_{\infty}}$ ($\text{tr}(\cdot)$ being the trace and $\|\cdot\|_{\infty}$ being the operator norm) of the true covariance $\Sigma$ is becoming large simultaneously with the sample size $n$, but $\mathbf{r}({\Sigma}) = o(n)$ as $n\to\infty$. In this setting, we prove that, in the case of one-dimensional spectral projector $P_{r}$, the properly centered and normalized statistic $\|\hat P_{r}-P_{r}\|_{2}^{2}$ with data-dependent centering and normalization converges in distribution to a Cauchy type limit.
The proofs of this and other related results rely on perturbation analysis and Gaussian concentration.
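
The quantities defined in the abstract can be illustrated numerically in finite dimension. The following is a minimal sketch, not the authors' procedure: it assumes ℍ is replaced by $\mathbb{R}^p$, and the helper names (`effective_rank`, `spectral_projector`, `projector_error`) and the example spectrum are hypothetical choices made for illustration. It computes the effective rank, the sample covariance, and the squared Hilbert–Schmidt error $\|\hat P_{r}-P_{r}\|_{2}^{2}$ for a simple (one-dimensional) eigenvalue. It does not implement the data-dependent centering and normalization studied in the paper.

```python
import numpy as np


def effective_rank(sigma):
    """Effective rank r(Sigma) = tr(Sigma) / ||Sigma||_op."""
    return np.trace(sigma) / np.linalg.norm(sigma, ord=2)


def spectral_projector(sigma, r):
    """Rank-one projector onto the eigenvector of the r-th largest
    eigenvalue (r is 1-based). Assumes that eigenvalue is simple."""
    _, eigvecs = np.linalg.eigh(sigma)  # eigenvalues in ascending order
    v = eigvecs[:, -r]
    return np.outer(v, v)


def projector_error(sigma, n, r, rng):
    """Draw n i.i.d. N(0, Sigma) vectors and return the squared
    Hilbert-Schmidt (Frobenius) norm of P_hat_r - P_r."""
    p = sigma.shape[0]
    X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    sigma_hat = X.T @ X / n  # mean is known to be zero, so no centering
    diff = spectral_projector(sigma_hat, r) - spectral_projector(sigma, r)
    return np.linalg.norm(diff, ord="fro") ** 2


# Hypothetical example spectrum: two leading eigenvalues plus a slowly
# decaying tail, so that the top eigenvalue is simple.
rng = np.random.default_rng(0)
mu = np.concatenate(([10.0, 5.0], np.linspace(1.0, 0.5, 48)))
sigma = np.diag(mu)

print("effective rank:", effective_rank(sigma))
print("squared HS error, r=1:", projector_error(sigma, n=500, r=1, rng=rng))
```

For two rank-one projectors the squared Hilbert–Schmidt distance equals $2 - 2\langle \hat\theta_r, \theta_r\rangle^2$, where $\hat\theta_r$ and $\theta_r$ are the corresponding unit eigenvectors, so the printed error always lies between 0 and 2.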

Suggested Citation

  • Vladimir Koltchinskii & Karim Lounici, 2017. "New Asymptotic Results in Principal Component Analysis," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 79(2), pages 254-297, August.
  • Handle: RePEc:spr:sankha:v:79:y:2017:i:2:d:10.1007_s13171-017-0106-6
    DOI: 10.1007/s13171-017-0106-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13171-017-0106-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s13171-017-0106-6?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Johnstone, Iain M. & Lu, Arthur Yu, 2009. "On Consistency and Sparsity for Principal Components Analysis in High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 682-693.
    2. Dauxois, J. & Pousse, A. & Romain, Y., 1982. "Asymptotic theory for the principal component analysis of a vector random function: Some applications to statistical inference," Journal of Multivariate Analysis, Elsevier, vol. 12(1), pages 136-154, March.

    Citations



    Cited by:

    1. Mike Ludkovski & Glen Swindle & Eric Grannan, 2022. "Large Scale Probabilistic Simulation of Renewables Production," Papers 2205.04736, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Candelon, B. & Hurlin, C. & Tokpavi, S., 2012. "Sampling error and double shrinkage estimation of minimum variance portfolios," Journal of Empirical Finance, Elsevier, vol. 19(4), pages 511-527.
    2. Mingotti, Nicola & Lillo Rodríguez, Rosa Elvira & Romo, Juan, 2015. "A Random Walk Test for Functional Time Series," DES - Working Papers. Statistics and Econometrics. WS ws1506, Universidad Carlos III de Madrid. Departamento de Estadística.
    3. Yata, Kazuyoshi & Aoshima, Makoto, 2013. "PCA consistency for the power spiked model in high-dimensional settings," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 334-354.
    4. Asai, Manabu & McAleer, Michael, 2015. "Forecasting co-volatilities via factor models with asymmetry and long memory in realized covariance," Journal of Econometrics, Elsevier, vol. 189(2), pages 251-262.
    5. María Edo & Walter Sosa Escudero & Marcela Svarc, 2021. "A multidimensional approach to measuring the middle class," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 19(1), pages 139-162, March.
    6. Guangxing Wang & Sisheng Liu & Fang Han & Chong‐Zhi Di, 2023. "Robust functional principal component analysis via a functional pairwise spatial sign operator," Biometrics, The International Biometric Society, vol. 79(2), pages 1239-1253, June.
    7. Wang, Shao-Hsuan & Huang, Su-Yun, 2022. "Perturbation theory for cross data matrix-based PCA," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    8. Silin, Igor & Spokoiny, Vladimir, 2018. "Bayesian inference for spectral projectors of covariance matrix," IRTG 1792 Discussion Papers 2018-027, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    9. Barigozzi, Matteo & Trapani, Lorenzo, 2020. "Sequential testing for structural stability in approximate factor models," Stochastic Processes and their Applications, Elsevier, vol. 130(8), pages 5149-5187.
    10. Qi, Xin & Zhao, Hongyu, 2011. "Some theoretical properties of Silverman's method for Smoothed functional principal component analysis," Journal of Multivariate Analysis, Elsevier, vol. 102(4), pages 741-767, April.
    11. Ci-Ren Jiang & John A. D. Aston & Jane-Ling Wang, 2016. "A Functional Approach to Deconvolve Dynamic Neuroimaging Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 1-13, March.
    12. Delsol, Laurent & Ferraty, Frédéric & Vieu, Philippe, 2011. "Structural test in regression on functional variables," Journal of Multivariate Analysis, Elsevier, vol. 102(3), pages 422-447, March.
    13. Maillet, Bertrand & Tokpavi, Sessi & Vaucher, Benoit, 2015. "Global minimum variance portfolio optimisation under some model risk: A robust regression-based approach," European Journal of Operational Research, Elsevier, vol. 244(1), pages 289-299.
    14. Steland, Ansgar, 2020. "Testing and estimating change-points in the covariance matrix of a high-dimensional time series," Journal of Multivariate Analysis, Elsevier, vol. 177(C).
    15. Michal Benko & Alois Kneip, 2005. "Common functional component modelling," SFB 649 Discussion Papers SFB649DP2005-016, Sonderforschungsbereich 649, Humboldt University, Berlin, Germany.
    16. Lam, Clifford & Yao, Qiwei & Bathia, Neil, 2011. "Estimation of latent factors for high-dimensional time series," LSE Research Online Documents on Economics 31549, London School of Economics and Political Science, LSE Library.
    17. Kristoffer Herland Hellton & Magne Thoresen, 2014. "The Impact of Measurement Error on Principal Component Analysis," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(4), pages 1051-1063, December.
    18. Ziwei Zhu & Tengyao Wang & Richard J. Samworth, 2022. "High‐dimensional principal component analysis with heterogeneous missingness," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 2000-2031, November.
    19. Beran, Jan & Liu, Haiyan, 2016. "Estimation of eigenvalues, eigenvectors and scores in FDA models with dependent errors," Journal of Multivariate Analysis, Elsevier, vol. 147(C), pages 218-233.
    20. Chung Chang & Yakuan Chen & R. Ogden, 2014. "Functional data classification: a wavelet approach," Computational Statistics, Springer, vol. 29(6), pages 1497-1513, December.
