Printed from https://ideas.repec.org/p/arx/papers/2503.15186.html

Optimal Data Splitting for Holdout Cross-Validation in Large Covariance Matrix Estimation

Author

Listed:
  • Lamia Lamrani
  • Christian Bongiorno
  • Marc Potters

Abstract

Cross-validation is a statistical tool that can be used to improve large covariance matrix estimation. Although its efficiency is observed in practical applications, the theoretical reasons behind it remain largely intuitive, and formal proofs are currently lacking. To make analytical progress, we focus on the holdout method, a single iteration of cross-validation, rather than the traditional $k$-fold approach. We derive a closed-form expression for the estimation error when the population matrix follows a white inverse Wishart distribution, and we observe that the optimal train-test split scales as the square root of the matrix dimension. For general population matrices, we connect the error to the variance of the eigenvalue distribution, although approximations are necessary. Interestingly, in the high-dimensional asymptotic regime, both the holdout and $k$-fold cross-validation methods converge to the optimal estimator when the train-test ratio scales with the square root of the matrix dimension.
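
To make the holdout idea concrete, here is a minimal sketch of one common formulation, assumed here rather than taken from the paper: eigenvectors are computed on the training rows, and each eigenvalue is replaced by the variance measured along that eigenvector on the held-out rows. The function name `holdout_covariance`, the parameter `n_test`, the zero-mean data and the identity population covariance are all illustrative assumptions; the paper's own setting uses a white inverse Wishart population matrix, and the split used below is arbitrary, not the optimal one derived by the authors.

```python
import numpy as np


def holdout_covariance(X, n_test):
    """Hypothetical holdout estimator sketch.

    X      : (T, N) array of zero-mean observations (rows are samples).
    n_test : number of trailing rows held out as the test set.
    """
    T, N = X.shape
    if not 0 < n_test < T:
        raise ValueError("n_test must satisfy 0 < n_test < T")

    X_train, X_test = X[: T - n_test], X[T - n_test:]

    # Sample covariances; no centering because the data are assumed zero-mean.
    E_train = X_train.T @ X_train / X_train.shape[0]
    E_test = X_test.T @ X_test / X_test.shape[0]

    # Eigenvectors of the training covariance are the columns of U.
    _, U = np.linalg.eigh(E_train)

    # Out-of-sample variance along each training eigenvector: u_i^T E_test u_i.
    xi = np.einsum("ji,jk,ki->i", U, E_test, U)

    # Recombine: U diag(xi) U^T.
    return (U * xi) @ U.T


# Illustrative use with an identity population covariance (an assumption).
rng = np.random.default_rng(0)
N, T = 50, 200
X = rng.standard_normal((T, N))
Sigma_hat = holdout_covariance(X, n_test=40)
```

Because the eigenvectors are orthonormal, the trace of the estimate equals the trace of the test-set sample covariance, which gives a quick sanity check on the implementation. Varying `n_test` for a fixed `T` and `N` is one way to explore numerically how the estimation error depends on the split.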

Suggested Citation

  • Lamia Lamrani & Christian Bongiorno & Marc Potters, 2025. "Optimal Data Splitting for Holdout Cross-Validation in Large Covariance Matrix Estimation," Papers 2503.15186, arXiv.org.
  • Handle: RePEc:arx:papers:2503.15186

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2503.15186
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Esra Ulasan & A. Özlem Önder, 2023. "Large portfolio optimisation approaches," Journal of Asset Management, Palgrave Macmillan, vol. 24(6), pages 485-497, October.
    2. Francesco Lautizi, 2015. "Large Scale Covariance Estimates for Portfolio Selection," CEIS Research Paper 353, Tor Vergata University, CEIS, revised 07 Aug 2015.
    3. Yan Zhang & Jiyuan Tao & Zhixiang Yin & Guoqiang Wang, 2022. "Improved Large Covariance Matrix Estimation Based on Efficient Convex Combination and Its Application in Portfolio Optimization," Mathematics, MDPI, vol. 10(22), pages 1-15, November.
    4. Jin Yuan & Xianghui Yuan, 2023. "A Best Linear Empirical Bayes Method for High-Dimensional Covariance Matrix Estimation," SAGE Open, vol. 13(2), pages 21582440231, June.
    5. Joel Bun & Jean-Philippe Bouchaud & Marc Potters, 2016. "Cleaning large correlation matrices: tools from random matrix theory," Papers 1610.08104, arXiv.org.
    6. Ikeda, Yuki & Kubokawa, Tatsuya, 2016. "Linear shrinkage estimation of large covariance matrices using factor models," Journal of Multivariate Analysis, Elsevier, vol. 152(C), pages 61-81.
    7. Fan, Jianqing & Liao, Yuan & Shi, Xiaofeng, 2015. "Risks of large portfolios," Journal of Econometrics, Elsevier, vol. 186(2), pages 367-387.
    8. Seyoung Park & Eun Ryung Lee & Sungchul Lee & Geonwoo Kim, 2019. "Dantzig Type Optimization Method with Applications to Portfolio Selection," Sustainability, MDPI, vol. 11(11), pages 1-32, June.
    9. Nikolaus Hautsch & Lada M. Kyj & Peter Malec, 2015. "Do High‐Frequency Data Improve High‐Dimensional Portfolio Allocations?," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 30(2), pages 263-290, March.
    10. Hafner, Christian M. & Linton, Oliver B. & Tang, Haihan, 2020. "Estimation of a multiplicative correlation structure in the large dimensional case," Journal of Econometrics, Elsevier, vol. 217(2), pages 431-470.
    11. Pesaran, M. Hashem & Yamagata, Takashi, 2012. "Testing CAPM with a Large Number of Assets," IZA Discussion Papers 6469, Institute of Labor Economics (IZA).
    12. Tae-Hwy Lee & Ekaterina Seregina, 2024. "Optimal Portfolio Using Factor Graphical Lasso," Journal of Financial Econometrics, Oxford University Press, vol. 22(3), pages 670-695.
    13. Francisco Peñaranda & Enrique Sentana, 2024. "Portfolio management with big data," Working Papers wp2024_2411, CEMFI.
    14. Tu, Jun & Zhou, Guofu, 2011. "Markowitz meets Talmud: A combination of sophisticated and naive diversification strategies," Journal of Financial Economics, Elsevier, vol. 99(1), pages 204-215, January.
    15. Arco van Oord & Martin Martens & Herman K. van Dijk, 2009. "Robust Optimization of the Equity Momentum Strategy," Tinbergen Institute Discussion Papers 09-011/4, Tinbergen Institute.
    16. Yilie Huang & Yanwei Jia & Xun Yu Zhou, 2024. "Mean--Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study," Papers 2412.16175, arXiv.org.
    17. Pun, Chi Seng & Wong, Hoi Ying, 2019. "A linear programming model for selection of sparse high-dimensional multiperiod portfolios," European Journal of Operational Research, Elsevier, vol. 273(2), pages 754-771.
    18. Christian Bongiorno & Damien Challet, 2023. "The Oracle estimator is suboptimal for global minimum variance portfolio optimisation," Post-Print hal-03491913, HAL.
    19. Jianqing Fan & Jingjin Zhang & Ke Yu, 2008. "Asset Allocation and Risk Assessment with Gross Exposure Constraints for Vast Portfolios," Papers 0812.2604, arXiv.org.
    20. Wei Lan & Ronghua Luo & Chih-Ling Tsai & Hansheng Wang & Yunhong Yang, 2015. "Testing the Diagonality of a Large Covariance Matrix in a Regression Setting," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 33(1), pages 76-86, January.
