Printed from https://ideas.repec.org/a/spr/compst/v39y2024i3d10.1007_s00180-023-01346-4.html

Controlling the false discovery rate by a Latent Gaussian Copula Knockoff procedure

Author

Listed:
  • Alejandro Román Vásquez

    (Centro de Investigación en Matemáticas A.C., Unidad Monterrey)

  • José Ulises Márquez Urbina

    (Centro de Investigación en Matemáticas A.C., Unidad Monterrey; Consejo Nacional de Ciencia y Tecnología)

  • Graciela González Farías

    (Centro de Investigación en Matemáticas A.C., Unidad Monterrey)

  • Gabriel Escarela

    (Universidad Autónoma Metropolitana)

Abstract

The Lasso-penalized Cox proportional hazards model has been widely used to identify prognostic biomarkers in high-dimensional settings. However, this method tends to select many false positives, which undermines its interpretability. To improve reproducibility, we develop a knockoff procedure that wraps the Lasso Cox model with the model-X knockoff framework, yielding a powerful variable selection tool that controls the false discovery rate with finite-sample guarantees. We propose a novel approach for sampling valid knockoffs for ordinal and continuous variables whose distributions may be skewed or heavy-tailed. The approach employs a Latent Mixed Gaussian Copula model to account for the dependence structure between the variables, leading to what we call the Latent Gaussian Copula Knockoff (LGCK) procedure. We then combine the LGCK method with the Lasso coefficient difference (LCD) statistic as the importance metric. To our knowledge, our proposal is the first knockoff framework to jointly consider ordinal and continuous data in a non-Gaussian setting and a survival context. We illustrate the effectiveness of the proposed methodology by applying it to a real lung cancer gene expression dataset.
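
The abstract does not give formulas, so the following is only a minimal sketch of the selection step it describes, using the standard definitions from the knockoff literature rather than the authors' own code. It assumes that a Lasso Cox model has already been fitted to the original variables together with their knockoff copies (that fit, and the LGCK sampler itself, are not shown), and that the two coefficient vectors are available. The function names lcd_statistics, knockoff_threshold and select_variables are hypothetical.

```python
import numpy as np


def lcd_statistics(beta_orig, beta_knock):
    """Lasso coefficient difference: W_j = |beta_j| - |beta_knockoff_j|.

    Assumption: both vectors come from one penalized fit on the
    augmented design [X, X_knockoff]; that fit is not shown here.
    """
    beta_orig = np.asarray(beta_orig, dtype=float)
    beta_knock = np.asarray(beta_knock, dtype=float)
    return np.abs(beta_orig) - np.abs(beta_knock)


def knockoff_threshold(W, q=0.1, offset=1):
    """Smallest t with (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    offset=1 is the usual "knockoff+" choice; np.inf is returned when
    no candidate threshold meets the target level q.
    """
    W = np.asarray(W, dtype=float)
    candidates = np.unique(np.abs(W[W != 0]))  # sorted ascending
    for t in candidates:
        fdp_estimate = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_estimate <= q:
            return t
    return np.inf


def select_variables(W, q=0.1, offset=1):
    """Indices of variables whose statistic reaches the threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_threshold(W, q, offset))


# Illustrative statistics only (not data from the paper).
W_demo = np.array([1.9, -0.2, 1.5, -0.3, 0.0, 1.2, 0.8, 1.1])
selected_demo = select_variables(W_demo, q=0.25)
```

A large positive W_j suggests that the original variable entered the model more strongly than its knockoff, while negative values serve as an estimate of how many false positives sit above a given threshold. With few variables and a small q, the threshold can be infinite and nothing is selected.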

Suggested Citation

  • Alejandro Román Vásquez & José Ulises Márquez Urbina & Graciela González Farías & Gabriel Escarela, 2024. "Controlling the false discovery rate by a Latent Gaussian Copula Knockoff procedure," Computational Statistics, Springer, vol. 39(3), pages 1435-1458, May.
  • Handle: RePEc:spr:compst:v:39:y:2024:i:3:d:10.1007_s00180-023-01346-4
    DOI: 10.1007/s00180-023-01346-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-023-01346-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Xiang, Anny & Lapuerta, Pablo & Ryutov, Alex & Buckley, Jonathan & Azen, Stanley, 2000. "Comparison of the performance of neural network methods and Cox regression for censored survival data," Computational Statistics & Data Analysis, Elsevier, vol. 34(2), pages 243-257, August.
    2. Stephen Bates & Emmanuel Candès & Lucas Janson & Wenshuo Wang, 2021. "Metropolized Knockoff Sampling," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(535), pages 1413-1427, July.
    3. Simon, Noah & Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2011. "Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 39(i05).
    4. Grace Yoon & Raymond J Carroll & Irina Gaynanova, 2020. "Sparse semiparametric canonical correlation analysis for data of mixed types," Biometrika, Biometrika Trust, vol. 107(3), pages 609-625.
    5. Kim, Hyoung-Moon & Mallick, Bani K., 2003. "Moments of random vectors with skew t distribution and their quadratic forms," Statistics & Probability Letters, Elsevier, vol. 63(4), pages 417-423, July.
    6. Roberts, S. & Nowak, G., 2014. "Stabilizing the lasso against cross-validation variability," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 198-211.
    7. Jianqing Fan & Han Liu & Yang Ning & Hui Zou, 2017. "High dimensional semiparametric latent graphical model for mixed data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(2), pages 405-421, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yutong Liu & Toni Darville & Xiaojing Zheng & Quefeng Li, 2023. "Decomposition of variation of mixed variables by a latent mixed Gaussian copula model," Biometrics, The International Biometric Society, vol. 79(2), pages 1187-1200, June.
    2. Soave, David & Lawless, Jerald F., 2023. "Regularized regression for two phase failure time studies," Computational Statistics & Data Analysis, Elsevier, vol. 182(C).
    3. Hua Xin & Yuhlong Lio & Hsien-Ching Chen & Tzong-Ru Tsai, 2024. "Zero-Inflated Binary Classification Model with Elastic Net Regularization," Mathematics, MDPI, vol. 12(19), pages 1-17, September.
    4. Zemin Zheng & Jie Zhang & Yang Li, 2022. "L0-Regularized Learning for High-Dimensional Additive Hazards Regression," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2762-2775, September.
    5. Li, Xiao & Matsuda, Takeru & Komaki, Fumiyasu, 2024. "Empirical Bayes Poisson matrix completion," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
    6. Simon Bussy & Mokhtar Z. Alaya & Anne‐Sophie Jannot & Agathe Guilloux, 2022. "Binacox: automatic cut‐point detection in high‐dimensional Cox model with applications in genetics," Biometrics, The International Biometric Society, vol. 78(4), pages 1414-1426, December.
    7. Biagini, Francesca & Groll, Andreas & Widenmann, Jan, 2013. "Intensity-based premium evaluation for unemployment insurance products," Insurance: Mathematics and Economics, Elsevier, vol. 53(1), pages 302-316.
    8. Benedicte Sjo Tislevoll & Monica Hellesøy & Oda Helen Eck Fagerholt & Stein-Erik Gullaksen & Aashish Srivastava & Even Birkeland & Dimitrios Kleftogiannis & Pilar Ayuda-Durán & Laure Piechaczyk & Dagi, 2023. "Early response evaluation by single cell signaling profiling in acute myeloid leukemia," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    9. Fang, B.Q., 2006. "Sample mean, covariance and T2 statistic of the skew elliptical model," Journal of Multivariate Analysis, Elsevier, vol. 97(7), pages 1675-1690, August.
    10. Matthew F Dixon, 2017. "A High Frequency Trade Execution Model for Supervised Learning," Papers 1710.03870, arXiv.org, revised Dec 2017.
    11. Leandro C. Hermida & E. Michael Gertz & Eytan Ruppin, 2022. "Predicting cancer prognosis and drug response from the tumor microbiome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    12. Takumi Saegusa & Tianzhou Ma & Gang Li & Ying Qing Chen & Mei-Ling Ting Lee, 2020. "Variable Selection in Threshold Regression Model with Applications to HIV Drug Adherence Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(3), pages 376-398, December.
    13. Zhixuan Fu & Shuangge Ma & Haiqun Lin & Chirag R. Parikh & Bingqing Zhou, 2017. "Penalized Variable Selection for Multi-center Competing Risks Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(2), pages 379-405, December.
    14. Loperfido, Nicola, 2018. "Skewness-based projection pursuit: A computational approach," Computational Statistics & Data Analysis, Elsevier, vol. 120(C), pages 42-57.
    15. Wenhua Liang & Jianhua Yao & Ailan Chen & Qingquan Lv & Mark Zanin & Jun Liu & SookSan Wong & Yimin Li & Jiatao Lu & Hengrui Liang & Guoqiang Chen & Haiyan Guo & Jun Guo & Rong Zhou & Limin Ou & Niyun, 2020. "Early triage of critically ill COVID-19 patients using deep learning," Nature Communications, Nature, vol. 11(1), pages 1-7, December.
    16. Tatyana Deryugina & Garth Heutel & Nolan H. Miller & David Molitor & Julian Reif, 2019. "The Mortality and Medical Costs of Air Pollution: Evidence from Changes in Wind Direction," American Economic Review, American Economic Association, vol. 109(12), pages 4178-4219, December.
    17. Andreas Groll & Gerhard Tutz, 2017. "Variable selection in discrete survival models including heterogeneity," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(2), pages 305-338, April.
    18. Kevin He & Yue Wang & Xiang Zhou & Han Xu & Can Huang, 2019. "An improved variable selection procedure for adaptive Lasso in high-dimensional survival analysis," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(3), pages 569-585, July.
    19. Yue Zhao & Ingrid Van Keilegom & Shanshan Ding, 2022. "Envelopes for censored quantile regression," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(4), pages 1562-1585, December.
    20. Das Ujjwal & Ebrahimi Nader, 2018. "A New Method For Covariate Selection In Cox Model," Statistics in Transition New Series, Statistics Poland, vol. 19(2), pages 297-314, June.
