Printed from https://ideas.repec.org/a/eee/csdana/v55y2011i11p2975-2990.html

Two-group classification with high-dimensional correlated data: A factor model approach

Author

Listed:
  • Pedro Duarte Silva, A.

Abstract

A class of linear classification rules, specifically designed for high-dimensional problems, is proposed. The new rules are based on Gaussian factor models and successfully incorporate the information contained in the sample correlations. Asymptotic results, which allow the number of variables to grow faster than the number of observations, show that the worst-case expected error rate of the proposed rules converges to the error of the optimal Bayes rule when the postulated model is true, and to a slightly larger constant when this model is a reasonable approximation to the data-generating process. Numerical comparisons suggest that, when combined with appropriate variable selection strategies, rules derived from one-factor models perform comparably to, or better than, the most successful extant alternatives under the conditions they were designed for. The proposed methods are implemented in an R package named HiDimDA, available from the CRAN repository.
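The abstract describes linear rules built on a one-factor Gaussian covariance model, Sigma = diag(psi) + lambda lambda^T, whose inverse stays cheap even when the number of variables far exceeds the number of observations. The Python sketch below is an illustrative approximation only, not the HiDimDA implementation: the function names (`one_factor_lda`, `sigma_inv_dot`) are hypothetical, and the factor loading is fitted by a crude SVD heuristic rather than the paper's estimators. It shows the key computational idea: applying a Fisher-type linear rule with Sigma inverted via the Sherman-Morrison identity, at cost linear in the number of variables.

```python
import numpy as np

def one_factor_lda(X1, X2):
    """Illustrative two-group linear rule under a one-factor covariance model.

    Sigma is modelled as diag(psi) + lam lam^T (one common factor plus
    variable-specific noise), so Sigma^{-1} v is computed in O(p) via the
    Sherman-Morrison identity instead of inverting a p x p matrix.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pool the centred samples from both groups to estimate the covariance.
    Z = np.vstack([X1 - mu1, X2 - mu2])
    n, p = Z.shape
    psi = Z.var(axis=0) + 1e-6            # variable-specific variances
    # Crude one-factor fit: leading singular direction of the standardised data.
    R = Z / np.sqrt(psi)
    _, s, Vt = np.linalg.svd(R, full_matrices=False)
    lam = np.sqrt(max(s[0] ** 2 / n - 1.0, 0.0)) * np.sqrt(psi) * Vt[0]
    # Sherman-Morrison: (D + lam lam^T)^{-1} v = D^{-1} v - correction term.
    Dinv = 1.0 / psi
    denom = 1.0 + lam @ (Dinv * lam)
    def sigma_inv_dot(v):
        u = Dinv * v
        return u - Dinv * lam * (lam @ u) / denom
    w = sigma_inv_dot(mu1 - mu2)          # discriminant direction
    b = -0.5 * (mu1 + mu2) @ w            # threshold at the midpoint
    return lambda X: np.where(X @ w + b > 0, 1, 2)
```

On two well-separated Gaussian samples this rule recovers the group labels; its only expensive step is one thin SVD, so it scales to p much larger than n, which is the regime the paper targets.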

Suggested Citation

  • Pedro Duarte Silva, A., 2011. "Two-group classification with high-dimensional correlated data: A factor model approach," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2975-2990, November.
  • Handle: RePEc:eee:csdana:v:55:y:2011:i:11:p:2975-2990

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S016794731100168X
    Download Restriction: Full text for ScienceDirect subscribers only.

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Ledoit, Olivier & Wolf, Michael, 2004. "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 88(2), pages 365-411, February.
    2. Xu, Ping & Brock, Guy N. & Parrish, Rudolph S., 2009. "Modified linear discriminant analysis approaches for classification of high-dimensional microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1674-1687, March.
    3. Chakraborty, Sounak & Guo, Ruixin, 2011. "A Bayesian hybrid Huberized support vector machine and its applications in high-dimensional medical data," Computational Statistics & Data Analysis, Elsevier, vol. 55(3), pages 1342-1356, March.
    4. Schäfer, Juliane & Strimmer, Korbinian, 2005. "A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 4(1), pages 1-32, November.
    5. Choi, Hosik & Yeo, Donghwa & Kwon, Sunghoon & Kim, Yongdai, 2011. "Gene selection and prediction for cancer classification using support vector machines with a reject option," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1897-1908, May.
    6. Duarte Silva, António Pedro, 2001. "Efficient Variable Screening for Multivariate Analysis," Journal of Multivariate Analysis, Elsevier, vol. 76(1), pages 35-62, January.
    7. Dudoit, S. & Fridlyand, J. & Speed, T.P., 2002. "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 77-87, March.
    8. Efron, Bradley, 2004. "Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 96-104, January.
    9. Fisher, Thomas J. & Sun, Xiaoqian, 2011. "Improved Stein-type shrinkage estimators for the high-dimensional multivariate normal covariance matrix," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1909-1918, May.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project.
    Cited by:

    1. Ledoit, Olivier & Wolf, Michael, 2015. "Spectrum estimation: A unified framework for covariance matrix estimation and PCA in large dimensions," Journal of Multivariate Analysis, Elsevier, vol. 139(C), pages 360-384.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Touloumis, Anestis, 2015. "Nonparametric Stein-type shrinkage covariance matrix estimators in high-dimensional settings," Computational Statistics & Data Analysis, Elsevier, vol. 83(C), pages 251-261.
    2. van Wieringen, Wessel N. & Stam, Koen A. & Peeters, Carel F.W. & van de Wiel, Mark A., 2020. "Updating of the Gaussian graphical model through targeted penalized estimation," Journal of Multivariate Analysis, Elsevier, vol. 178(C).
    3. Brett Naul & Bala Rajaratnam & Dario Vincenzi, 2016. "The role of the isotonizing algorithm in Stein’s covariance matrix estimator," Computational Statistics, Springer, vol. 31(4), pages 1453-1476, December.
    4. Arnab Chakrabarti & Rituparna Sen, 2018. "Some Statistical Problems with High Dimensional Financial data," Papers 1808.02953, arXiv.org.
    5. Xu, Ping & Brock, Guy N. & Parrish, Rudolph S., 2009. "Modified linear discriminant analysis approaches for classification of high-dimensional microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1674-1687, March.
    6. Fisher, Thomas J. & Sun, Xiaoqian, 2011. "Improved Stein-type shrinkage estimators for the high-dimensional multivariate normal covariance matrix," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1909-1918, May.
    7. Ikeda, Yuki & Kubokawa, Tatsuya & Srivastava, Muni S., 2016. "Comparison of linear shrinkage estimators of a large covariance matrix in normal and non-normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 95(C), pages 95-108.
    8. Shen, Yanfeng & Lin, Zhengyan & Zhu, Jun, 2011. "Shrinkage-based regularization tests for high-dimensional data with application to gene set analysis," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2221-2233, July.
    9. Hannart, Alexis & Naveau, Philippe, 2014. "Estimating high dimensional covariance matrices: A new look at the Gaussian conjugate framework," Journal of Multivariate Analysis, Elsevier, vol. 131(C), pages 149-162.
    10. Avagyan, Vahe & Alonso Fernández, Andrés Modesto & Nogales, Francisco J., 2015. "D-trace Precision Matrix Estimation Using Adaptive Lasso Penalties," DES - Working Papers. Statistics and Econometrics. WS 21775, Universidad Carlos III de Madrid. Departamento de Estadística.
    11. Christian Bongiorno, 2020. "Bootstraps Regularize Singular Correlation Matrices," Working Papers hal-02536278, HAL.
    12. Won, Joong-Ho & Lim, Johan & Yu, Donghyeon & Kim, Byung Soo & Kim, Kyunga, 2014. "Monotone false discovery rate," Statistics & Probability Letters, Elsevier, vol. 87(C), pages 86-93.
    13. Tatsuya Kubokawa & Muni S. Srivastava, 2013. "Optimal Ridge-type Estimators of Covariance Matrix in High Dimension," CIRJE F-Series CIRJE-F-906, CIRJE, Faculty of Economics, University of Tokyo.
    14. Zongliang Hu & Zhishui Hu & Kai Dong & Tiejun Tong & Yuedong Wang, 2021. "A shrinkage approach to joint estimation of multiple covariance matrices," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 84(3), pages 339-374, April.
    15. Yang, Guangren & Liu, Yiming & Pan, Guangming, 2019. "Weighted covariance matrix estimation," Computational Statistics & Data Analysis, Elsevier, vol. 139(C), pages 82-98.
    16. Tenenhaus, Arthur & Philippe, Cathy & Frouin, Vincent, 2015. "Kernel Generalized Canonical Correlation Analysis," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 114-131.
    17. Ledoit, Olivier & Wolf, Michael, 2017. "Numerical implementation of the QuEST function," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 199-223.
    18. Tsukuma, Hisayuki, 2016. "Estimation of a high-dimensional covariance matrix with the Stein loss," Journal of Multivariate Analysis, Elsevier, vol. 148(C), pages 1-17.
    19. Sumanjay Dutta & Shashi Jain, 2023. "Precision versus Shrinkage: A Comparative Analysis of Covariance Estimation Methods for Portfolio Allocation," Papers 2305.11298, arXiv.org.
    20. Lam, Clifford, 2020. "High-dimensional covariance matrix estimation," LSE Research Online Documents on Economics 101667, London School of Economics and Political Science, LSE Library.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:55:y:2011:i:11:p:2975-2990. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/csda.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.