
High dimensional controlled variable selection with model-X knockoffs in the AFT model

Authors

  • Baihua He (University of Science and Technology of China)
  • Di Xia (Hubei University)
  • Yingli Pan (Hubei University)

Abstract

Interpretability and stability are two important characteristics required when applying statistical methods to high-dimensional data. Although the former has been addressed to some extent by many existing forecasting methods, the latter, in the sense of controlling the fraction of wrongly discovered features, is still largely underdeveloped. Under the accelerated failure time (AFT) model, this paper introduces a controlled variable selection method within the general framework of Model-X knockoffs to tackle high-dimensional data. We provide theoretical justification for asymptotic false discovery rate (FDR) control. The proposed method has attracted significant interest due to its strong control of the FDR while preserving predictive power. Several simulation examples are conducted to assess the finite-sample performance with the desired interpretability and stability. A real data example from an Acute Myeloid Leukemia study is analyzed to demonstrate the utility of the proposed method in practice.
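
As a rough illustration of the FDR-controlling selection step mentioned in the abstract, the sketch below applies the standard knockoff+ threshold from the Model-X knockoffs framework (Candès et al., 2018) to a vector of feature statistics. It is not the authors' AFT-specific procedure: the statistics `W` are assumed to be already computed (large positive values favoring the original feature over its knockoff copy), and the function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Smallest t among the nonzero |W_j| whose estimated FDP is at most q.

    Estimated FDP at t: (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}).
    Returns np.inf when no candidate threshold satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: distinct nonzero magnitudes, in increasing order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return np.inf


def knockoff_select(W, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    tau = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= tau)


# Hypothetical statistics for ten features, target FDR level q = 0.2.
W_example = [5.0, 4.0, 3.0, 2.5, -1.0, 0.5, 2.0, -0.2, 6.0, 3.5]
selected = knockoff_select(W_example, q=0.2)
```

In this sketch a feature is selected only when its statistic is positive and at least as large as the threshold; the count of large negative statistics serves as a proxy for the number of false discoveries, which is the device that gives the knockoff filter its FDR control.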

Suggested Citation

  • Baihua He & Di Xia & Yingli Pan, 2024. "High dimensional controlled variable selection with model-X knockoffs in the AFT model," Computational Statistics, Springer, vol. 39(4), pages 1993-2009, June.
  • Handle: RePEc:spr:compst:v:39:y:2024:i:4:d:10.1007_s00180-023-01426-5
    DOI: 10.1007/s00180-023-01426-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-023-01426-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Yingying Fan & Jinchi Lv & Mahrad Sharifvaghefi & Yoshimasa Uematsu, 2020. "IPAD: Stable Interpretable Forecasting with Knockoffs Inference," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(532), pages 1822-1834, December.
    2. Ritesh Ramchandani & Dianne M. Finkelstein & David A. Schoenfeld, 2020. "Estimation for an accelerated failure time model with intermediate states as auxiliary information," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 26(1), pages 1-20, January.
    3. Emmanuel Candès & Yingying Fan & Lucas Janson & Jinchi Lv, 2018. "Panning for gold: ‘model‐X’ knockoffs for high dimensional controlled variable selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(3), pages 551-577, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xie, Zilong & Chen, Yunxiao & von Davier, Matthias & Weng, Haolei, 2023. "Variable selection in latent variable models via knockoffs: an application to international large-scale assessment in education," LSE Research Online Documents on Economics 120812, London School of Economics and Political Science, LSE Library.
    2. Challet, Damien & Bongiorno, Christian & Pelletier, Guillaume, 2021. "Financial factors selection with knockoffs: Fund replication, explanatory and prediction networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 580(C).
    3. Pan, Yingli, 2022. "Feature screening and FDR control with knockoff features for ultrahigh-dimensional right-censored data," Computational Statistics & Data Analysis, Elsevier, vol. 173(C).
    4. Guo, Xu & Li, Runze & Liu, Jingyuan & Zeng, Mudong, 2023. "Statistical inference for linear mediation models with high-dimensional mediators and application to studying stock reaction to COVID-19 pandemic," Journal of Econometrics, Elsevier, vol. 235(1), pages 166-179.
    5. Guo, Xu & Li, Runze & Liu, Jingyuan & Zeng, Mudong, 2024. "Reprint: Statistical inference for linear mediation models with high-dimensional mediators and application to studying stock reaction to COVID-19 pandemic," Journal of Econometrics, Elsevier, vol. 239(2).
    6. Pedro Delicado & Daniel Peña, 2023. "Understanding complex predictive models with ghost variables," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(1), pages 107-145, March.
    7. Liu, Jingyuan & Sun, Ao & Ke, Yuan, 2024. "A generalized knockoff procedure for FDR control in structural change detection," Journal of Econometrics, Elsevier, vol. 239(2).
    8. Yoshimasa Uematsu & Takashi Yamagata, 2019. "Estimation of Weak Factor Models," ISER Discussion Paper 1053r, Institute of Social and Economic Research, The University of Osaka, revised Mar 2020.
    9. Dong, Yan & Li, Daoji & Zheng, Zemin & Zhou, Jia, 2022. "Reproducible feature selection in high-dimensional accelerated failure time models," Statistics & Probability Letters, Elsevier, vol. 181(C).
    10. Emmanuel Candès & Chiara Sabatti, 2020. "Discussion of the Paper “Prediction, Estimation, and Attribution” by B. Efron," International Statistical Review, International Statistical Institute, vol. 88(S1), pages 60-63, December.
    11. Zihuai He & Linxi Liu & Michael E. Belloy & Yann Guen & Aaron Sossin & Xiaoxia Liu & Xinran Qi & Shiyang Ma & Prashnna K. Gyawali & Tony Wyss-Coray & Hua Tang & Chiara Sabatti & Emmanuel Candès & Mich, 2022. "GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    12. Rajchert, Andrew & Keich, Uri, 2023. "Controlling the false discovery rate via competition: Is the +1 needed?," Statistics & Probability Letters, Elsevier, vol. 197(C).
    13. Yumei Ren & Guoqiang Tang & Xin Li & Xuchang Chen, 2023. "A Study of Multifactor Quantitative Stock-Selection Strategies Incorporating Knockoff and Elastic Net-Logistic Regression," Mathematics, MDPI, vol. 11(16), pages 1-20, August.
    14. Yi Liu & Veronika Ročková & Yuexi Wang, 2021. "Variable selection with ABC Bayesian forests," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 453-481, July.
    15. Laura Freijeiro‐González & Manuel Febrero‐Bande & Wenceslao González‐Manteiga, 2022. "A Critical Review of LASSO and Its Derivatives for Variable Selection Under Dependence Among Covariates," International Statistical Review, International Statistical Institute, vol. 90(1), pages 118-145, April.
    16. Ran Hu & Di Xia & Haoyu Wang & Caixu Xu & Yingli Pan, 2025. "Model-X Knockoffs for high-dimensional controlled variable selection under the proportional hazards model with heterogeneity parameter," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 88(4), pages 501-521, May.
    17. Dae Woong Ham & Jiaze Qiu, 2023. "Hypothesis testing in adaptively sampled data: ART to maximize power beyond iid sampling," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(3), pages 998-1037, September.
    18. Jeng, X. Jessie & Chen, Xiongzhi, 2019. "Predictor ranking and false discovery proportion control in high-dimensional regression," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 163-175.
    19. L Bottolo & S Richardson, 2019. "Discussion of ‘Gene hunting with hidden Markov model knockoffs’," Biometrika, Biometrika Trust, vol. 106(1), pages 19-22.
    20. Wen, Xin & Li, Yang & Zheng, Zemin, 2024. "Scalable efficient reproducible multi-task learning via data splitting," Statistics & Probability Letters, Elsevier, vol. 208(C).
