Large covariance estimation by thresholding principal orthogonal complements

Author

Listed:
  • Fan, Jianqing
  • Liao, Yuan
  • Mincheva, Martina

Abstract

This paper deals with the estimation of a high-dimensional covariance matrix with a conditional sparsity structure, which is the composition of a low-rank matrix plus a sparse matrix. By assuming a sparse error covariance matrix in a multi-factor model, we allow for the presence of cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights into when factor analysis is approximately the same as principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms, including the spectral norm. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies.
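The low-rank plus sparse idea in the abstract can be illustrated with a short sketch. This is not the authors' implementation. It assumes that the number of factors is chosen by the user, that one constant soft threshold is applied to all off-diagonal residual entries (the adaptive version uses entry-dependent thresholds), and that the diagonal is left unthresholded. The names `poet_sketch`, `soft_threshold`, `n_factors` and `tau` are hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink every entry of x toward zero by tau (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def poet_sketch(sample_cov, n_factors, tau):
    """Illustrative low-rank plus thresholded-residual covariance estimate.

    Assumptions (not taken from the paper's text): n_factors is given,
    and tau is one constant threshold for all off-diagonal entries.
    """
    # Eigendecomposition of the symmetric sample covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(sample_cov)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Low-rank part: the leading n_factors principal components.
    top_vecs = eigvecs[:, :n_factors]
    low_rank = (top_vecs * eigvals[:n_factors]) @ top_vecs.T

    # Principal orthogonal complement: what the leading components miss.
    residual = sample_cov - low_rank

    # Threshold the off-diagonal residual entries, keep the diagonal.
    sparse = soft_threshold(residual, tau)
    np.fill_diagonal(sparse, np.diag(residual))

    return low_rank + sparse


# Usage on synthetic data with two common factors (illustrative only).
rng = np.random.default_rng(0)
factors = rng.standard_normal((200, 2))
loadings = rng.standard_normal((10, 2))
data = factors @ loadings.T + rng.standard_normal((200, 10))
sample_cov = np.cov(data, rowvar=False)
estimate = poet_sketch(sample_cov, n_factors=2, tau=0.1)
```

In this sketch, `tau=0` returns the sample covariance matrix, a very large `tau` leaves a diagonal residual (a factor-based estimator), and `n_factors=0` reduces to plain thresholding of the sample covariance, which mirrors the special cases listed in the abstract.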

Suggested Citation

  • Fan, Jianqing & Liao, Yuan & Mincheva, Martina, 2011. "Large covariance estimation by thresholding principal orthogonal complements," MPRA Paper 38697, University Library of Munich, Germany.
  • Handle: RePEc:pra:mprapa:38697

    Download full text from publisher

    File URL: https://mpra.ub.uni-muenchen.de/38697/1/MPRA_paper_38697.pdf
    File Function: original version
    Download Restriction: no

    References listed on IDEAS

    1. Ledoit, Olivier & Wolf, Michael, 2004. "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 88(2), pages 365-411, February.
    2. Fama, Eugene F & French, Kenneth R, 1992. "The Cross-Section of Expected Stock Returns," Journal of Finance, American Finance Association, vol. 47(2), pages 427-465, June.
    3. Peter Hall & J. S. Marron & Amnon Neeman, 2005. "Geometric representation of high dimension, low sample size data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(3), pages 427-444.
    4. Johnstone, Iain M. & Lu, Arthur Yu, 2009. "On Consistency and Sparsity for Principal Components Analysis in High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 682-693.
    5. Forni, Mario & Lippi, Marco, 2001. "The Generalized Dynamic Factor Model: Representation Theory," Econometric Theory, Cambridge University Press, vol. 17(06), pages 1113-1141, December.
    6. H. Wang, 2012. "Factor profiled sure independence screening," Biometrika, Biometrika Trust, vol. 99(1), pages 15-28.
    7. Forni, Mario & Hallin, Marc & Lippi, Marco & Reichlin, Lucrezia, 2004. "The generalized dynamic factor model consistency and rates," Journal of Econometrics, Elsevier, vol. 119(2), pages 231-255, April.
    8. Jushan Bai & Serena Ng, 2002. "Determining the Number of Factors in Approximate Factor Models," Econometrica, Econometric Society, vol. 70(1), pages 191-221, January.
    9. Kourtis, Apostolos & Dotsis, George & Markellos, Raphael N., 2012. "Parameter uncertainty in portfolio selection: Shrinking the inverse covariance matrix," Journal of Banking & Finance, Elsevier, vol. 36(9), pages 2522-2531.
    10. Ross, Stephen A., 1976. "The arbitrage theory of capital asset pricing," Journal of Economic Theory, Elsevier, vol. 13(3), pages 341-360, December.
    11. Shen, Haipeng & Huang, Jianhua Z., 2008. "Sparse principal component analysis via regularized low rank matrix approximation," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1015-1034, July.
    12. Ravi Jagannathan & Tongshu Ma, 2003. "Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps," Journal of Finance, American Finance Association, vol. 58(4), pages 1651-1684, August.
    13. Kaufman, Cari G. & Schervish, Mark J. & Nychka, Douglas W., 2008. "Covariance Tapering for Likelihood-Based Estimation in Large Spatial Data Sets," Journal of the American Statistical Association, American Statistical Association, vol. 103(484), pages 1545-1555.
    14. Boivin, Jean & Ng, Serena, 2006. "Are more data always better for factor analysis?," Journal of Econometrics, Elsevier, vol. 132(1), pages 169-194, May.
    15. Shen, Dan & Shen, Haipeng & Marron, J.S., 2013. "Consistency of sparse PCA in High Dimension, Low Sample Size contexts," Journal of Multivariate Analysis, Elsevier, vol. 115(C), pages 317-333.
    16. Xi Luo, 2011. "Recovering Model Structures from Large Low Rank and Sparse Covariance Matrix Estimation," Papers 1111.1133, arXiv.org, revised Mar 2013.
    17. Joong-Ho Won & Johan Lim & Seung-Jean Kim & Bala Rajaratnam, 2013. "Condition-number-regularized covariance estimation," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(3), pages 427-450, June.
    18. Liu, Yufeng & Hayes, David Neil & Nobel, Andrew & Marron, J. S, 2008. "Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data," Journal of the American Statistical Association, American Statistical Association, vol. 103(483), pages 1281-1293.
    19. Doz, Catherine & Giannone, Domenico & Reichlin, Lucrezia, 2011. "A two-step estimator for large approximate dynamic factor models based on Kalman filtering," Journal of Econometrics, Elsevier, vol. 164(1), pages 188-205, September.
    20. Alexander Chudik & M. Hashem Pesaran & Elisa Tosetti, 2011. "Weak and strong cross‐section dependence and estimation of large panels," Econometrics Journal, Royal Economic Society, vol. 14(1), pages 45-90, February.
    21. Jianqing Fan & Jingjin Zhang & Ke Yu, 2008. "Asset Allocation and Risk Assessment with Gross Exposure Constraints for Vast Portfolios," Papers 0812.2604, arXiv.org.
    22. repec:hal:journl:peer-00844811 is not listed on IDEAS
    23. Alexei Onatski, 2009. "Testing Hypotheses About the Number of Factors in Large Factor Models," Econometrica, Econometric Society, vol. 77(5), pages 1447-1479, September.
    24. Clifford Lam & Qiwei Yao & Neil Bathia, 2011. "Estimation of latent factors for high-dimensional time series," Biometrika, Biometrika Trust, vol. 98(4), pages 901-918.
    25. Jianqing Fan & Jingjin Zhang & Ke Yu, 2012. "Vast Portfolio Selection With Gross-Exposure Constraints," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(498), pages 592-606, June.
    26. Hallin, Marc & Liska, Roman, 2007. "Determining the Number of Factors in the General Dynamic Factor Model," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 603-617, June.
    27. Lam, Clifford & Yao, Qiwei & Bathia, Neil, 2011. "Estimation of latent factors for high-dimensional time series," LSE Research Online Documents on Economics 31549, London School of Economics and Political Science, LSE Library.
    28. Jushan Bai & Shuzhong Shi, 2011. "Estimating High Dimensional Covariance Matrices and its Applications," Annals of Economics and Finance, Society for AEF, vol. 12(2), pages 199-215, November.
    29. Lingzhou Xue & Shiqian Ma & Hui Zou, 2012. "Positive-Definite ℓ1-Penalized Estimation of Large Covariance Matrices," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(500), pages 1480-1491, December.
    30. Jushan Bai, 2003. "Inferential Theory for Factor Models of Large Dimensions," Econometrica, Econometric Society, vol. 71(1), pages 135-171, January.
    31. Mario Forni & Marc Hallin & Marco Lippi & Lucrezia Reichlin, 2000. "The Generalized Dynamic-Factor Model: Identification And Estimation," The Review of Economics and Statistics, MIT Press, vol. 82(4), pages 540-554, November.
    32. Ledoit, Olivier & Wolf, Michael, 2003. "Improved estimation of the covariance matrix of stock returns with an application to portfolio selection," Journal of Empirical Finance, Elsevier, vol. 10(5), pages 603-621, December.
    33. Bouveyron, C. & Girard, S. & Schmid, C., 2007. "High-dimensional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 502-519, September.
    34. M. Hashem Pesaran, 2006. "Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure," Econometrica, Econometric Society, vol. 74(4), pages 967-1012, July.
    35. Kapetanios, George, 2010. "A Testing Procedure for Determining the Number of Factors in Approximate Factor Models With Large Datasets," Journal of Business & Economic Statistics, American Statistical Association, vol. 28(3), pages 397-409.
    36. Jung, Sungkyu & Sen, Arusharka & Marron, J.S., 2012. "Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 190-203.
    37. Chamberlain, Gary & Rothschild, Michael, 1983. "Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets," Econometrica, Econometric Society, vol. 51(5), pages 1281-1304, September.
    38. Fama, Eugene F. & French, Kenneth R., 1993. "Common risk factors in the returns on stocks and bonds," Journal of Financial Economics, Elsevier, vol. 33(1), pages 3-56, February.
    39. William F. Sharpe, 1964. "Capital Asset Prices: A Theory Of Market Equilibrium Under Conditions Of Risk," Journal of Finance, American Finance Association, vol. 19(3), pages 425-442, September.
    40. Cai, Tony & Liu, Weidong, 2011. "Adaptive Thresholding for Sparse Covariance Matrix Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 672-684.
    41. Stock, James H & Watson, Mark W, 2002. "Macroeconomic Forecasting Using Diffusion Indexes," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(2), pages 147-162, April.
    42. Alexei Onatski, 2010. "Determining the Number of Factors from Empirical Distribution of Eigenvalues," The Review of Economics and Statistics, MIT Press, vol. 92(4), pages 1004-1016, November.
    43. Lam, Clifford & Fan, Jianqing, 2009. "Sparsistency and rates of convergence in large covariance matrix estimation," LSE Research Online Documents on Economics 31540, London School of Economics and Political Science, LSE Library.
    44. Baik, Jinho & Silverstein, Jack W., 2006. "Eigenvalues of large sample covariance matrices of spiked population models," Journal of Multivariate Analysis, Elsevier, vol. 97(6), pages 1382-1408, July.
    45. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911.
    46. Enrique Sentana, 2009. "The econometrics of mean-variance efficiency tests: a survey," Econometrics Journal, Royal Economic Society, vol. 12(3), pages 65-101, November.
    47. John Stephen Yap & Jianqing Fan & Rongling Wu, 2009. "Nonparametric Modeling of Longitudinal Covariance Structure in Functional Mapping of Quantitative Trait Loci," Biometrics, The International Biometric Society, vol. 65(4), pages 1068-1077, December.
    48. Ahn, Seung Chan & Hoon Lee, Young & Schmidt, Peter, 2001. "GMM estimation of linear panel data models with time-varying individual effects," Journal of Econometrics, Elsevier, vol. 101(2), pages 219-255, April.
    49. Rothman, Adam J. & Levina, Elizaveta & Zhu, Ji, 2009. "Generalized Thresholding of Large Covariance Matrices," Journal of the American Statistical Association, American Statistical Association, vol. 104(485), pages 177-186.
    50. Bai, Jushan & Ng, Serena, 2008. "Large Dimensional Factor Analysis," Foundations and Trends(R) in Econometrics, now publishers, vol. 3(2), pages 89-163, June.
    51. Lam, Clifford & Yao, Qiwei, 2012. "Factor modeling for high-dimensional time series: inference for the number of factors," LSE Research Online Documents on Economics 45684, London School of Economics and Political Science, LSE Library.
    52. Stock J.H. & Watson M.W., 2002. "Forecasting Using Principal Components From a Large Number of Predictors," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 1167-1179, December.
    53. repec:hal:journl:hal-00638009 is not listed on IDEAS
    54. Alessi, Lucia & Barigozzi, Matteo & Capasso, Marco, 2010. "Improved penalization for determining the number of factors in approximate factor models," Statistics & Probability Letters, Elsevier, vol. 80(23-24), pages 1806-1813, December.
    55. Fan, Jianqing & Fan, Yingying & Lv, Jinchi, 2008. "High dimensional covariance matrix estimation using a factor model," Journal of Econometrics, Elsevier, vol. 147(1), pages 186-197, November.
    56. Carvalho, Carlos M. & Chang, Jeffrey & Lucas, Joseph E. & Nevins, Joseph R. & Wang, Quanli & West, Mike, 2008. "High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics," Journal of the American Statistical Association, American Statistical Association, vol. 103(484), pages 1438-1456.
    57. Efron, Bradley, 2010. "Correlated z-Values and the Accuracy of Large-Scale Statistical Estimates," Journal of the American Statistical Association, American Statistical Association, vol. 105(491), pages 1042-1055.
    58. Efron, Bradley, 2007. "Correlation and Large-Scale Simultaneous Significance Testing," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 93-103, March.
    59. Hallin, Marc & Liska, Roman, 2011. "Dynamic factors in the presence of blocks," Journal of Econometrics, Elsevier, vol. 163(1), pages 29-41, July.

    More about this item

    Keywords

    High dimensionality; approximate factor model; unknown factors; principal components; sparse matrix; low-rank matrix; thresholding; cross-sectional correlation;

    JEL classification:

    • C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
    • C01 - Mathematical and Quantitative Methods - - General - - - Econometrics
