
Large covariance estimation by thresholding principal orthogonal complements

Authors

  • Fan, Jianqing
  • Liao, Yuan
  • Mincheva, Martina

Abstract

This paper deals with the estimation of high-dimensional covariance matrices with a conditional sparsity structure, that is, the sum of a low-rank matrix and a sparse matrix. By assuming a sparse error covariance matrix in a multi-factor model, we allow for cross-sectional correlation even after the common but unobservable factors are taken out. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insight into when factor analysis is approximately the same as principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms, including the spectral norm. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies.
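The abstract describes POET in words only. The Python sketch below illustrates the general recipe it names (keep the leading principal components of the sample covariance, then threshold the principal orthogonal complement), under assumptions that do not come from this page: the number of factors K is treated as known, the residual is soft-thresholded with an entry-dependent threshold proportional to sqrt(r_ii * r_jj), and both the constant C and the rate w_T are illustrative choices. The function name `poet` is hypothetical, and the exact threshold rule and choice of K should be taken from the full text.

```python
import numpy as np


def poet(Y: np.ndarray, K: int, C: float = 0.5) -> np.ndarray:
    """Illustrative POET-style estimate of a p x p covariance matrix.

    Y: p x T data matrix (rows are variables, columns are observations).
    K: number of principal components kept; assumed known in this sketch.
    C: hypothetical threshold constant (not a value taken from the paper).
    """
    p, T = Y.shape
    Yc = Y - Y.mean(axis=1, keepdims=True)  # demean each variable
    S = Yc @ Yc.T / T                       # sample covariance, divisor T

    # Spectral decomposition of S; eigh returns ascending eigenvalues,
    # so reorder them from largest to smallest.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Low-rank part built from the K leading eigenvalue/eigenvector pairs.
    V, lam = eigvecs[:, :K], eigvals[:K]
    low_rank = (V * lam) @ V.T

    # Principal orthogonal complement: the remaining p - K components.
    R = S - low_rank

    # Assumed entry-dependent threshold tau_ij = C * w_T * sqrt(r_ii * r_jj)
    # with the assumed rate w_T = 1/sqrt(p) + sqrt(log(p) / T).
    w_T = 1.0 / np.sqrt(p) + np.sqrt(np.log(p) / T)
    d = np.sqrt(np.clip(np.diag(R), 0.0, None))
    tau = C * w_T * np.outer(d, d)

    # Soft-threshold every entry, then restore the untouched diagonal.
    R_thr = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))

    return low_rank + R_thr


# Usage on simulated data from a hypothetical 3-factor model.
rng = np.random.default_rng(0)
p, T, K = 50, 200, 3
B = rng.normal(size=(p, K))   # factor loadings
F = rng.normal(size=(K, T))   # unobserved factors
U = rng.normal(size=(p, T))   # idiosyncratic errors
Sigma_hat = poet(B @ F + U, K=K)
print(Sigma_hat.shape)
```

In this sketch, setting C = 0 returns the sample covariance exactly, and K = 0 thresholds the sample covariance itself, in the spirit of the thresholding estimators listed in the abstract as special cases. Soft thresholding was chosen here only because it is simple; hard thresholding would drop in by replacing the shrinkage line.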

Suggested Citation

  • Fan, Jianqing & Liao, Yuan & Mincheva, Martina, 2011. "Large covariance estimation by thresholding principal orthogonal complements," MPRA Paper 38697, University Library of Munich, Germany.
  • Handle: RePEc:pra:mprapa:38697

    Download full text from publisher

    File URL: https://mpra.ub.uni-muenchen.de/38697/1/MPRA_paper_38697.pdf
    File Function: original version
    Download Restriction: no

    References listed on IDEAS

    1. Ledoit, Olivier & Wolf, Michael, 2004. "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 88(2), pages 365-411, February.
    2. Fama, Eugene F & French, Kenneth R, 1992. "The Cross-Section of Expected Stock Returns," Journal of Finance, American Finance Association, vol. 47(2), pages 427-465, June.
    3. Peter Hall & J. S. Marron & Amnon Neeman, 2005. "Geometric representation of high dimension, low sample size data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(3), pages 427-444, June.
    4. Johnstone, Iain M. & Lu, Arthur Yu, 2009. "On Consistency and Sparsity for Principal Components Analysis in High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 682-693.
    5. Kapetanios, George, 2010. "A Testing Procedure for Determining the Number of Factors in Approximate Factor Models With Large Datasets," Journal of Business & Economic Statistics, American Statistical Association, vol. 28(3), pages 397-409.
    6. Forni, Mario & Lippi, Marco, 2001. "The Generalized Dynamic Factor Model: Representation Theory," Econometric Theory, Cambridge University Press, vol. 17(6), pages 1113-1141, December.
    7. H. Wang, 2012. "Factor profiled sure independence screening," Biometrika, Biometrika Trust, vol. 99(1), pages 15-28.
    8. Forni, Mario & Hallin, Marc & Lippi, Marco & Reichlin, Lucrezia, 2004. "The generalized dynamic factor model consistency and rates," Journal of Econometrics, Elsevier, vol. 119(2), pages 231-255, April.
    9. Doz, Catherine & Giannone, Domenico & Reichlin, Lucrezia, 2011. "A two-step estimator for large approximate dynamic factor models based on Kalman filtering," Journal of Econometrics, Elsevier, vol. 164(1), pages 188-205, September.
    10. Jushan Bai & Serena Ng, 2002. "Determining the Number of Factors in Approximate Factor Models," Econometrica, Econometric Society, vol. 70(1), pages 191-221, January.
    11. Kourtis, Apostolos & Dotsis, George & Markellos, Raphael N., 2012. "Parameter uncertainty in portfolio selection: Shrinking the inverse covariance matrix," Journal of Banking & Finance, Elsevier, vol. 36(9), pages 2522-2531.
    12. Stephen A. Ross, 2013. "The Arbitrage Theory of Capital Asset Pricing," World Scientific Book Chapters, in: Leonard C MacLean & William T Ziemba (ed.), Handbook of the Fundamentals of Financial Decision Making, Part I, chapter 1, pages 11-30, World Scientific Publishing Co. Pte. Ltd.
    13. Shen, Haipeng & Huang, Jianhua Z., 2008. "Sparse principal component analysis via regularized low rank matrix approximation," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1015-1034, July.
    14. Kaufman, Cari G. & Schervish, Mark J. & Nychka, Douglas W., 2008. "Covariance Tapering for Likelihood-Based Estimation in Large Spatial Data Sets," Journal of the American Statistical Association, American Statistical Association, vol. 103(484), pages 1545-1555.
    15. Boivin, Jean & Ng, Serena, 2006. "Are more data always better for factor analysis?," Journal of Econometrics, Elsevier, vol. 132(1), pages 169-194, May.
    16. Shen, Dan & Shen, Haipeng & Marron, J.S., 2013. "Consistency of sparse PCA in High Dimension, Low Sample Size contexts," Journal of Multivariate Analysis, Elsevier, vol. 115(C), pages 317-333.
    17. Xi Luo, 2011. "Recovering Model Structures from Large Low Rank and Sparse Covariance Matrix Estimation," Papers 1111.1133, arXiv.org, revised Mar 2013.
    18. Joong-Ho Won & Johan Lim & Seung-Jean Kim & Bala Rajaratnam, 2013. "Condition-number-regularized covariance estimation," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(3), pages 427-450, June.
    19. Liu, Yufeng & Hayes, David Neil & Nobel, Andrew & Marron, J. S, 2008. "Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data," Journal of the American Statistical Association, American Statistical Association, vol. 103(483), pages 1281-1293.
    20. Enrique Sentana, 2009. "The econometrics of mean-variance efficiency tests: a survey," Econometrics Journal, Royal Economic Society, vol. 12(3), pages 65-101, November.
    21. Alexander Chudik & M. Hashem Pesaran & Elisa Tosetti, 2011. "Weak and strong cross‐section dependence and estimation of large panels," Econometrics Journal, Royal Economic Society, vol. 14(1), pages 45-90, February.
    22. Jianqing Fan & Jingjin Zhang & Ke Yu, 2008. "Asset Allocation and Risk Assessment with Gross Exposure Constraints for Vast Portfolios," Papers 0812.2604, arXiv.org.
    23. repec:hal:journl:peer-00844811 is not listed on IDEAS
    24. Alexei Onatski, 2009. "Testing Hypotheses About the Number of Factors in Large Factor Models," Econometrica, Econometric Society, vol. 77(5), pages 1447-1479, September.
    25. Clifford Lam & Qiwei Yao & Neil Bathia, 2011. "Estimation of latent factors for high-dimensional time series," Biometrika, Biometrika Trust, vol. 98(4), pages 901-918.
    26. Jianqing Fan & Jingjin Zhang & Ke Yu, 2012. "Vast Portfolio Selection With Gross-Exposure Constraints," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(498), pages 592-606, June.
    27. Hallin, Marc & Liska, Roman, 2007. "Determining the Number of Factors in the General Dynamic Factor Model," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 603-617, June.
    28. Lam, Clifford & Yao, Qiwei & Bathia, Neil, 2011. "Estimation of latent factors for high-dimensional time series," LSE Research Online Documents on Economics 31549, London School of Economics and Political Science, LSE Library.
    29. Jushan Bai & Shuzhong Shi, 2011. "Estimating High Dimensional Covariance Matrices and its Applications," Annals of Economics and Finance, Society for AEF, vol. 12(2), pages 199-215, November.
    30. Lingzhou Xue & Shiqian Ma & Hui Zou, 2012. "Positive-Definite ℓ1-Penalized Estimation of Large Covariance Matrices," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(500), pages 1480-1491, December.
    31. Jushan Bai, 2003. "Inferential Theory for Factor Models of Large Dimensions," Econometrica, Econometric Society, vol. 71(1), pages 135-171, January.
    32. Mario Forni & Marc Hallin & Marco Lippi & Lucrezia Reichlin, 2000. "The Generalized Dynamic-Factor Model: Identification And Estimation," The Review of Economics and Statistics, MIT Press, vol. 82(4), pages 540-554, November.
    33. Ledoit, Olivier & Wolf, Michael, 2003. "Improved estimation of the covariance matrix of stock returns with an application to portfolio selection," Journal of Empirical Finance, Elsevier, vol. 10(5), pages 603-621, December.
    34. Bouveyron, C. & Girard, S. & Schmid, C., 2007. "High-dimensional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 502-519, September.
    35. M. Hashem Pesaran, 2006. "Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure," Econometrica, Econometric Society, vol. 74(4), pages 967-1012, July.
    36. Jung, Sungkyu & Sen, Arusharka & Marron, J.S., 2012. "Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 190-203.
    37. Chamberlain, Gary & Rothschild, Michael, 1983. "Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets," Econometrica, Econometric Society, vol. 51(5), pages 1281-1304, September.
    38. Fama, Eugene F. & French, Kenneth R., 1993. "Common risk factors in the returns on stocks and bonds," Journal of Financial Economics, Elsevier, vol. 33(1), pages 3-56, February.
    39. William F. Sharpe, 1964. "Capital Asset Prices: A Theory Of Market Equilibrium Under Conditions Of Risk," Journal of Finance, American Finance Association, vol. 19(3), pages 425-442, September.
    40. Cai, Tony & Liu, Weidong, 2011. "Adaptive Thresholding for Sparse Covariance Matrix Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 672-684.
    41. Stock, James H & Watson, Mark W, 2002. "Macroeconomic Forecasting Using Diffusion Indexes," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(2), pages 147-162, April.
    42. Alexei Onatski, 2010. "Determining the Number of Factors from Empirical Distribution of Eigenvalues," The Review of Economics and Statistics, MIT Press, vol. 92(4), pages 1004-1016, November.
    43. Lam, Clifford & Fan, Jianqing, 2009. "Sparsistency and rates of convergence in large covariance matrix estimation," LSE Research Online Documents on Economics 31540, London School of Economics and Political Science, LSE Library.
    44. Baik, Jinho & Silverstein, Jack W., 2006. "Eigenvalues of large sample covariance matrices of spiked population models," Journal of Multivariate Analysis, Elsevier, vol. 97(6), pages 1382-1408, July.
    45. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    46. John Stephen Yap & Jianqing Fan & Rongling Wu, 2009. "Nonparametric Modeling of Longitudinal Covariance Structure in Functional Mapping of Quantitative Trait Loci," Biometrics, The International Biometric Society, vol. 65(4), pages 1068-1077, December.
    47. Ahn, Seung Chan & Lee, Young Hoon & Schmidt, Peter, 2001. "GMM estimation of linear panel data models with time-varying individual effects," Journal of Econometrics, Elsevier, vol. 101(2), pages 219-255, April.
    48. Ravi Jagannathan & Tongshu Ma, 2003. "Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps," Journal of Finance, American Finance Association, vol. 58(4), pages 1651-1684, August.
    49. Rothman, Adam J. & Levina, Elizaveta & Zhu, Ji, 2009. "Generalized Thresholding of Large Covariance Matrices," Journal of the American Statistical Association, American Statistical Association, vol. 104(485), pages 177-186.
    50. Bai, Jushan & Ng, Serena, 2008. "Large Dimensional Factor Analysis," Foundations and Trends(R) in Econometrics, now publishers, vol. 3(2), pages 89-163, June.
    51. Lam, Clifford & Yao, Qiwei, 2012. "Factor modeling for high-dimensional time series: inference for the number of factors," LSE Research Online Documents on Economics 45684, London School of Economics and Political Science, LSE Library.
    52. Stock, J.H. & Watson, M.W., 2002. "Forecasting Using Principal Components From a Large Number of Predictors," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 1167-1179, December.
    53. Pesaran, M. Hashem & Yamagata, Takashi, 2012. "Testing CAPM with a Large Number of Assets," IZA Discussion Papers 6469, Institute of Labor Economics (IZA).
    54. Alessi, Lucia & Barigozzi, Matteo & Capasso, Marco, 2010. "Improved penalization for determining the number of factors in approximate factor models," Statistics & Probability Letters, Elsevier, vol. 80(23-24), pages 1806-1813, December.
    55. Fan, Jianqing & Fan, Yingying & Lv, Jinchi, 2008. "High dimensional covariance matrix estimation using a factor model," Journal of Econometrics, Elsevier, vol. 147(1), pages 186-197, November.
    56. Carvalho, Carlos M. & Chang, Jeffrey & Lucas, Joseph E. & Nevins, Joseph R. & Wang, Quanli & West, Mike, 2008. "High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics," Journal of the American Statistical Association, American Statistical Association, vol. 103(484), pages 1438-1456.
    57. Efron, Bradley, 2010. "Correlated z-Values and the Accuracy of Large-Scale Statistical Estimates," Journal of the American Statistical Association, American Statistical Association, vol. 105(491), pages 1042-1055.
    58. Efron, Bradley, 2007. "Correlation and Large-Scale Simultaneous Significance Testing," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 93-103, March.
    59. Hallin, Marc & Liska, Roman, 2011. "Dynamic factors in the presence of blocks," Journal of Econometrics, Elsevier, vol. 163(1), pages 29-41, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Aït-Sahalia, Yacine & Xiu, Dacheng, 2017. "Using principal component analysis to estimate a high dimensional factor model with high-frequency data," Journal of Econometrics, Elsevier, vol. 201(2), pages 384-399.
    2. Jianqing Fan & Yuan Liao & Han Liu, 2016. "An overview of the estimation of large covariance and precision matrices," Econometrics Journal, Royal Economic Society, vol. 19(1), pages 1-32, February.
    3. Bodnar, Taras & Reiß, Markus, 2016. "Exact and asymptotic tests on a factor model in low and large dimensions with applications," Journal of Multivariate Analysis, Elsevier, vol. 150(C), pages 125-151.
    4. Bai, Jushan & Liao, Yuan, 2016. "Efficient estimation of approximate factor models via penalized maximum likelihood," Journal of Econometrics, Elsevier, vol. 191(1), pages 1-18.
    5. Gagliardini, Patrick & Ossola, Elisa & Scaillet, Olivier, 2019. "A diagnostic criterion for approximate factor structure," Journal of Econometrics, Elsevier, vol. 212(2), pages 503-521.
    6. Dai, Chaoxing & Lu, Kun & Xiu, Dacheng, 2019. "Knowing factors or factor loadings, or neither? Evaluating estimators of large covariance matrices with noisy and asynchronous data," Journal of Econometrics, Elsevier, vol. 208(1), pages 43-79.
    7. Fan, Jianqing & Liao, Yuan & Shi, Xiaofeng, 2015. "Risks of large portfolios," Journal of Econometrics, Elsevier, vol. 186(2), pages 367-387.
    8. Barigozzi, Matteo & Trapani, Lorenzo, 2020. "Sequential testing for structural stability in approximate factor models," Stochastic Processes and their Applications, Elsevier, vol. 130(8), pages 5149-5187.
    9. Lam, Clifford, 2020. "High-dimensional covariance matrix estimation," LSE Research Online Documents on Economics 101667, London School of Economics and Political Science, LSE Library.
    10. Matteo Luciani, 2015. "Monetary Policy and the Housing Market: A Structural Factor Analysis," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 30(2), pages 199-218, March.
    11. Jörg Breitung & In Choi, 2013. "Factor models," Chapters, in: Nigar Hashimzade & Michael A. Thornton (ed.), Handbook of Research Methods and Applications in Empirical Macroeconomics, chapter 11, pages 249-265, Edward Elgar Publishing.
      • In Choi & Jorg Breitung, 2011. "Factor models," Working Papers 1121, Research Institute for Market Economy, Sogang University, revised Dec 2011.
    12. Fan, Jianqing & Ke, Yuan & Liao, Yuan, 2021. "Augmented factor models with applications to validating market risk factors and forecasting bond risk premia," Journal of Econometrics, Elsevier, vol. 222(1), pages 269-294.
    13. Matteo Barigozzi & Antonio M. Conti & Matteo Luciani, 2014. "Do Euro Area Countries Respond Asymmetrically to the Common Monetary Policy?," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 76(5), pages 693-714, October.
    14. Stock, J.H. & Watson, M.W., 2016. "Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics," Handbook of Macroeconomics, in: J. B. Taylor & Harald Uhlig (ed.), Handbook of Macroeconomics, edition 1, volume 2, chapter 0, pages 415-525, Elsevier.
    15. Bailey, Natalia & Pesaran, M. Hashem & Smith, L. Vanessa, 2019. "A multiple testing approach to the regularisation of large sample correlation matrices," Journal of Econometrics, Elsevier, vol. 208(2), pages 507-534.
    16. Heaton, Chris & Solo, Victor, 2012. "Estimation of high-dimensional linear factor models with grouped variables," Journal of Multivariate Analysis, Elsevier, vol. 105(1), pages 348-367.
    17. Pilar Poncela & Esther Ruiz, 2016. "Small- Versus Big-Data Factor Extraction in Dynamic Factor Models: An Empirical Assessment," Advances in Econometrics, in: Eric Hillebrand & Siem Jan Koopman (ed.), Dynamic Factor Models, volume 35, pages 401-434, Emerald Publishing Ltd.
    18. Fan, Jianqing & Xue, Lingzhou & Yao, Jiawei, 2017. "Sufficient forecasting using factor models," Journal of Econometrics, Elsevier, vol. 201(2), pages 292-306.
    19. Smeekes, Stephan & Wijler, Etienne, 2018. "Macroeconomic forecasting using penalized regression methods," International Journal of Forecasting, Elsevier, vol. 34(3), pages 408-430.
    20. Helmut Lütkepohl, 2014. "Structural Vector Autoregressive Analysis in a Data Rich Environment: A Survey," Discussion Papers of DIW Berlin 1351, DIW Berlin, German Institute for Economic Research.

    More about this item

    Keywords

    High dimensionality; approximate factor model; unknown factors; principal components; sparse matrix; low-rank matrix; thresholding; cross-sectional correlation

    JEL classification:

    • C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
    • C01 - Mathematical and Quantitative Methods - - General - - - Econometrics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:pra:mprapa:38697. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Joachim Winter. General contact details of provider: https://edirc.repec.org/data/vfmunde.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis. RePEc uses bibliographic data supplied by the respective publishers.