Printed from https://ideas.repec.org/a/eee/csdana/v124y2018icp132-150.html

Variable selection for high dimensional Gaussian copula regression model: An adaptive hypothesis testing procedure

Authors
  • He, Yong
  • Zhang, Xinsheng
  • Zhang, Liwen

Abstract

In this paper we consider the variable selection problem for the high-dimensional Gaussian copula regression model. We transform the variable selection problem into a multiple testing problem. Compared with existing methods based on regularization or stepwise algorithms, our method avoids both the ambiguous relationship between the regularization parameter and the number of falsely discovered variables and the need to choose a stopping rule. We exploit nonparametric rank-based correlation coefficient estimators to construct our test statistics, which are robust and adaptive to the unknown monotone marginal transformations. We show that our multiple testing procedure can asymptotically control the false discovery rate (FDR) or the average number of falsely discovered variables (FDV). We also propose a screening multiple testing procedure to deal with the extremely high-dimensional setting. Besides the theoretical analysis, we conduct numerical simulations to compare the variable selection performance of our method with some state-of-the-art methods. The proposed method is also applied to the Communities and Crime Unnormalized data set to illustrate its empirical usefulness.
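The abstract combines two standard ingredients: a rank-based correlation estimator that is unaffected by unknown monotone marginal transformations, and a multiple testing step that controls the FDR. The sketch below is not the authors' test statistic or threshold (those are given in the full text); it only illustrates these ingredients under stated assumptions. It assumes Kendall's tau as the rank-based estimator, uses the known Gaussian copula identity ρ = sin(πτ/2) to map tau to the latent correlation, and uses the Benjamini-Hochberg step-up rule as a generic FDR-controlling step. The function names and the p-values in the demo are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs.

    Assumes continuous data without ties. Only the signs of pairwise
    differences enter, so strictly increasing transforms of x or y
    leave the value unchanged.
    """
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            s += 1      # concordant pair
        elif prod < 0:
            s -= 1      # discordant pair
    return s / (n * (n - 1) / 2)


def latent_correlation(x, y):
    """Estimate the latent Gaussian copula correlation via rho = sin(pi/2 * tau)."""
    return math.sin(math.pi / 2 * kendall_tau(x, y))


def benjamini_hochberg(pvalues, q):
    """Indices (sorted) of hypotheses rejected by the BH step-up rule at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    # Largest rank whose ordered p-value lies below the BH line rank * q / m.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    x = [0.3, -1.2, 2.5, 0.9, -0.4]
    y = [1.1, -0.7, 1.9, 0.2, 0.5]
    # Strictly increasing marginal transforms (exp, cube) leave the estimate unchanged.
    print(latent_correlation(x, y)
          == latent_correlation([math.exp(v) for v in x], [v ** 3 for v in y]))  # True
    # Hypothetical p-values for five candidate variables.
    print(benjamini_hochberg([0.001, 0.2, 0.01, 0.04, 0.9], q=0.05))  # [0, 2]
```

The invariance check is the point of using ranks: since the marginal transformations in a Gaussian copula model are unknown but monotone, an estimator built on pairwise orderings needs no estimate of them.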

Suggested Citation

  • He, Yong & Zhang, Xinsheng & Zhang, Liwen, 2018. "Variable selection for high dimensional Gaussian copula regression model: An adaptive hypothesis testing procedure," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 132-150.
  • Handle: RePEc:eee:csdana:v:124:y:2018:i:c:p:132-150
    DOI: 10.1016/j.csda.2018.03.003

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947318300513
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2018.03.003?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Because access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Rui Song & Wenbin Lu & Shuangge Ma & X. Jessie Jeng, 2014. "Censored rank independence screening for high-dimensional survival data," Biometrika, Biometrika Trust, vol. 101(4), pages 799-814.
    3. Michael Pitt & David Chan & Robert Kohn, 2006. "Efficient Bayesian inference for Gaussian copula regression models," Biometrika, Biometrika Trust, vol. 93(3), pages 537-554, September.
    4. Radchenko, Peter, 2015. "High dimensional single index models," Journal of Multivariate Analysis, Elsevier, vol. 139(C), pages 266-282.
    5. Pradeep Ravikumar & John Lafferty & Han Liu & Larry Wasserman, 2009. "Sparse additive models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(5), pages 1009-1030, November.
    6. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    7. Hohsuk Noh & Anouar El Ghouch & Taoufik Bouezmarni, 2013. "Copula-Based Regression Estimation and Inference," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(502), pages 676-688, June.
    8. He, Yong & Zhang, Xinsheng & Wang, Pingping & Zhang, Liwen, 2017. "High dimensional Gaussian copula graphical model with FDR control," Computational Statistics & Data Analysis, Elsevier, vol. 113(C), pages 457-474.
    9. He, Yong & Zhang, Xinsheng & Wang, Pingping, 2016. "Discriminant analysis on high dimensional Gaussian copula model," Statistics & Probability Letters, Elsevier, vol. 117(C), pages 100-112.
    10. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    11. Zhu, Li-Ping & Zhu, Li-Xing, 2009. "Nonconcave penalized inverse regression in single-index models with high dimensional predictors," Journal of Multivariate Analysis, Elsevier, vol. 100(5), pages 862-875, May.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Stanislav Anatolyev & Vladimir Pyrlik, 2021. "Shrinkage for Gaussian and t Copulas in Ultra-High Dimensions," CERGE-EI Working Papers wp699, The Center for Economic Research and Graduate Education - Economics Institute, Prague.
    2. Li Liu & Yu-Min Liu & Jong-Min Kim & Rui Zhong & Guang-Qian Ren, 2020. "Analysis of Tail Dependence between Sovereign Debt Distress and Bank Non-Performing Loans," Sustainability, MDPI, vol. 12(2), pages 1-20, January.
    3. Nikoloulopoulos, Aristidis K., 2023. "Efficient and feasible inference for high-dimensional normal copula regression models," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    4. Anatolyev, Stanislav & Pyrlik, Vladimir, 2022. "Copula shrinkage and portfolio allocation in ultra-high dimensions," Journal of Economic Dynamics and Control, Elsevier, vol. 143(C).
    5. Yu, Long & He, Yong & Zhang, Xinsheng, 2019. "Robust factor number specification for large-dimensional elliptical factor model," Journal of Multivariate Analysis, Elsevier, vol. 174(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tan, Xin Lu, 2019. "Optimal estimation of slope vector in high-dimensional linear transformation models," Journal of Multivariate Analysis, Elsevier, vol. 169(C), pages 179-204.
    2. He, Yong & Zhang, Liang & Ji, Jiadong & Zhang, Xinsheng, 2019. "Robust feature screening for elliptical copula regression model," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 568-582.
    3. Chen, Xiaolin & Chen, Xiaojing & Wang, Hong, 2018. "Robust feature screening for ultra-high dimensional right censored data via distance correlation," Computational Statistics & Data Analysis, Elsevier, vol. 119(C), pages 118-138.
    4. Zhong, Wei & Wang, Jiping & Chen, Xiaolin, 2021. "Censored mean variance sure independence screening for ultrahigh dimensional survival data," Computational Statistics & Data Analysis, Elsevier, vol. 159(C).
    5. Jing Zhang & Guosheng Yin & Yanyan Liu & Yuanshan Wu, 2018. "Censored cumulative residual independent screening for ultrahigh-dimensional survival data," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 24(2), pages 273-292, April.
    6. Jing Zhang & Haibo Zhou & Yanyan Liu & Jianwen Cai, 2021. "Conditional screening for ultrahigh-dimensional survival data in case-cohort studies," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 27(4), pages 632-661, October.
    7. Zhang, Jing & Liu, Yanyan & Wu, Yuanshan, 2017. "Correlation rank screening for ultrahigh-dimensional survival data," Computational Statistics & Data Analysis, Elsevier, vol. 108(C), pages 121-132.
    8. Yan, Xiaodong & Wang, Hongni & Wang, Wei & Xie, Jinhan & Ren, Yanyan & Wang, Xinjun, 2021. "Optimal model averaging forecasting in high-dimensional survival analysis," International Journal of Forecasting, Elsevier, vol. 37(3), pages 1147-1155.
    9. Liu, Yanyan & Zhang, Jing & Zhao, Xingqiu, 2018. "A new nonparametric screening method for ultrahigh-dimensional survival data," Computational Statistics & Data Analysis, Elsevier, vol. 119(C), pages 74-85.
    10. Kuang-Yao Lee & Bing Li & Hongyu Zhao, 2016. "Variable selection via additive conditional independence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(5), pages 1037-1055, November.
    11. Du, Pang & Cheng, Guang & Liang, Hua, 2012. "Semiparametric regression models with additive nonparametric components and high dimensional parametric components," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 2006-2017.
    12. Li-Pang Chen, 2021. "Feature screening based on distance correlation for ultrahigh-dimensional censored data with covariate measurement error," Computational Statistics, Springer, vol. 36(2), pages 857-884, June.
    13. Guo, Chaohui & Lv, Jing & Wu, Jibo, 2021. "Composite quantile regression for ultra-high dimensional semiparametric model averaging," Computational Statistics & Data Analysis, Elsevier, vol. 160(C).
    14. Yize Zhao & Matthias Chung & Brent A. Johnson & Carlos S. Moreno & Qi Long, 2016. "Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1427-1439, October.
    15. Gaorong Li & Liugen Xue & Heng Lian, 2012. "SCAD-penalised generalised additive models with non-polynomial dimensionality," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(3), pages 681-697.
    16. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    17. Shi Chen & Wolfgang Karl Härdle & Brenda López Cabrera, 2020. "Regularization Approach for Network Modeling of German Power Derivative Market," Papers 2009.09739, arXiv.org.
    18. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    19. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    20. Anders Bredahl Kock, 2012. "On the Oracle Property of the Adaptive Lasso in Stationary and Nonstationary Autoregressions," CREATES Research Papers 2012-05, Department of Economics and Business Economics, Aarhus University.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:124:y:2018:i:c:p:132-150. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help by submitting a correction form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/csda.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.