
A robust knockoff filter for sparse regression analysis of microbiome compositional data

Authors
  • Gianna Serafina Monti

    (University of Milano Bicocca)

  • Peter Filzmoser

    (TU Wien)

Abstract

Microbiome data analysis often relies on identifying a subset of potential biomarkers associated with a clinical outcome of interest. Robust ZeroSum regression, an elastic-net penalized compositional regression built on the least trimmed squares estimator, is a variable selection procedure that can cope with the high dimensionality of these data and their compositional nature, while also guaranteeing robustness against outliers. The need to discover “true” effects and to improve the quality and reproducibility of clinical research has motivated us to propose a two-step robust compositional knockoff filter procedure. It selects the relevant biomarkers, that is, those among the many measured features that have a nonzero effect on the response, while controlling the expected fraction of false positives. We demonstrate the effectiveness of our proposal in an extensive simulation study and illustrate its usefulness in an application to intestinal microbiome analysis.
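As a minimal sketch of three ingredients named in the abstract, the Python code below is not the authors' implementation: it does not fit a robust ZeroSum model or construct compositional knockoff copies. It shows, under stated assumptions, (1) an LTS-type trimmed loss that keeps only the h smallest squared residuals, (2) a log-contrast predictor whose zero-sum coefficients make it invariant to the overall scale of the composition (the constraint used in compositional regression such as Lin et al., 2014), and (3) the generic knockoff+ threshold of Barber and Candès (2015), one standard way a knockoff filter controls the false discovery rate at a target level q. All function names and the toy statistics W are hypothetical.

```python
import math
from typing import List, Sequence


def trimmed_squares_loss(residuals: Sequence[float], h: int) -> float:
    """LTS-type objective: sum of the h smallest squared residuals.

    The n - h largest squared residuals (potential outliers) are ignored.
    """
    if not 1 <= h <= len(residuals):
        raise ValueError("h must be between 1 and the number of residuals")
    squared = sorted(r * r for r in residuals)
    return sum(squared[:h])


def log_contrast_predict(x: Sequence[float], beta: Sequence[float]) -> float:
    """Linear predictor sum_j beta_j * log(x_j); x must be strictly positive.

    If sum(beta) == 0, rescaling x by a constant c adds log(c) * sum(beta) = 0,
    so the prediction depends only on the relative abundances.
    """
    return sum(b * math.log(v) for b, v in zip(beta, x))


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Generic knockoff+ threshold (Barber & Candès, 2015) at target FDR q.

    Returns the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q, or +inf if none.
    """
    candidates = sorted({abs(v) for v in w if v != 0})
    for t in candidates:
        false_estimate = 1 + sum(1 for v in w if v <= -t)
        n_selected = max(1, sum(1 for v in w if v >= t))
        if false_estimate / n_selected <= q:
            return t
    return math.inf


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices j with W_j >= T, where T is the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, v in enumerate(w) if v >= t]


if __name__ == "__main__":
    # Toy knockoff statistics: a large positive W_j is evidence of a real effect.
    w = [5.0, 4.0, 3.0, 2.0, -1.0, 0.5, 6.0, 7.0]
    print(knockoff_select(w, q=0.2))  # [0, 1, 2, 3, 6, 7]
    print(knockoff_select(w, q=0.1))  # []
    # Zero-sum coefficients: rescaling the composition leaves the prediction unchanged.
    beta, x = [0.5, -0.2, -0.3], [0.2, 0.3, 0.5]
    print(log_contrast_predict(x, beta), log_contrast_predict([10 * v for v in x], beta))
    # The LTS-type loss ignores the largest residual (10.0) when h = 3.
    print(trimmed_squares_loss([1.0, -2.0, 0.5, 10.0], h=3))  # 5.25
```

The +1 in the numerator is what distinguishes knockoff+ from the plain knockoff threshold. It makes the procedure more conservative, which is why the same toy statistics yield no selections at q = 0.1.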

Suggested Citation

  • Gianna Serafina Monti & Peter Filzmoser, 2024. "A robust knockoff filter for sparse regression analysis of microbiome compositional data," Computational Statistics, Springer, vol. 39(1), pages 271-288, February.
  • Handle: RePEc:spr:compst:v:39:y:2024:i:1:d:10.1007_s00180-022-01268-7
    DOI: 10.1007/s00180-022-01268-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-022-01268-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s00180-022-01268-7?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. M Sesia & C Sabatti & E J Candès, 2019. "Rejoinder: ‘Gene hunting with hidden Markov model knockoffs’," Biometrika, Biometrika Trust, vol. 106(1), pages 35-45.
    2. Wei Lin & Pixu Shi & Rui Feng & Hongzhe Li, 2014. "Variable selection in regression with compositional covariates," Biometrika, Biometrika Trust, vol. 101(4), pages 785-797.
    3. Stephen Bates & Emmanuel Candès & Lucas Janson & Wenshuo Wang, 2021. "Metropolized Knockoff Sampling," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(535), pages 1413-1427, July.
    4. Arun Srinivasan & Lingzhou Xue & Xiang Zhan, 2021. "Compositional knockoff filter for high‐dimensional regression analysis of microbiome data," Biometrics, The International Biometric Society, vol. 77(3), pages 984-995, September.
    5. M Sesia & C Sabatti & E J Candès, 2019. "Gene hunting with hidden Markov model knockoffs," Biometrika, Biometrika Trust, vol. 106(1), pages 1-18.
    6. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    7. Runze Li & Wei Zhong & Liping Zhu, 2012. "Feature Screening via Distance Correlation Learning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(499), pages 1129-1139, September.
    8. Xiaoyi Zhu & Yuhong Yang, 2015. "Variable selection after screening: with or without data splitting?," Computational Statistics, Springer, vol. 30(1), pages 191-203, March.
    9. Jacob T. Nearing & Gavin M. Douglas & Molly G. Hayes & Jocelyn MacDonald & Dhwani K. Desai & Nicole Allward & Casey M. A. Jones & Robyn J. Wright & Akhilesh S. Dhanani & André M. Comeau & Morgan G. I., 2022. "Microbiome differential abundance methods produce different results across 38 datasets," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    10. John D. Storey, 2002. "A direct approach to false discovery rates," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(3), pages 479-498, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arun Srinivasan & Lingzhou Xue & Xiang Zhan, 2023. "Identification of microbial features in multivariate regression under false discovery rate control," Computational Statistics & Data Analysis, Elsevier, vol. 181(C).
    2. Panxu Yuan & Yinfei Kong & Gaorong Li, 2024. "FDR control and power analysis for high-dimensional logistic regression via StabKoff," Statistical Papers, Springer, vol. 65(5), pages 2719-2749, July.
    3. Arun Srinivasan & Lingzhou Xue & Xiang Zhan, 2021. "Compositional knockoff filter for high‐dimensional regression analysis of microbiome data," Biometrics, The International Biometric Society, vol. 77(3), pages 984-995, September.
    4. Yi Liu & Qihua Wang, 2018. "Model-free feature screening for ultrahigh-dimensional data conditional on some variables," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 70(2), pages 283-301, April.
    5. Jing Zhang & Qihua Wang & Xuan Wang, 2022. "Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(2), pages 379-397, April.
    6. Sarah J.C. Craig & Ana M. Kenney & Junli Lin & Ian M. Paul & Leann L. Birch & Jennifer S. Savage & Michele E. Marini & Francesca Chiaromonte & Matthew L. Reimherr & Kateryna D. Makova, 2023. "Constructing a polygenic risk score for childhood obesity using functional data analysis," Econometrics and Statistics, Elsevier, vol. 25(C), pages 66-86.
    7. Zhaoyu Xing & Yang Wan & Juan Wen & Wei Zhong, 2024. "GOLFS: feature selection via combining both global and local information for high dimensional clustering," Computational Statistics, Springer, vol. 39(5), pages 2651-2675, July.
    8. Christina Dan Wang & Zhao Chen & Yimin Lian & Min Chen, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    9. Yuexiao Dong & Zhou Yu & Liping Zhu, 2020. "Model-free variable selection for conditional mean in regression," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    10. Shuaishuai Chen & Jun Lu, 2023. "Quantile-Composited Feature Screening for Ultrahigh-Dimensional Data," Mathematics, MDPI, vol. 11(10), pages 1-21, May.
    11. Linh H. Nghiem & Francis K.C. Hui & Samuel Müller & A.H. Welsh, 2023. "Screening methods for linear errors‐in‐variables models in high dimensions," Biometrics, The International Biometric Society, vol. 79(2), pages 926-939, June.
    12. Hung Hung & Su‐Yun Huang, 2019. "Sufficient dimension reduction via random‐partitions for the large‐p‐small‐n problem," Biometrics, The International Biometric Society, vol. 75(1), pages 245-255, March.
    13. Jing Zhang & Haibo Zhou & Yanyan Liu & Jianwen Cai, 2021. "Feature screening for case‐cohort studies with failure time outcome," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(1), pages 349-370, March.
    14. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    15. Jiujing Wu & Hengjian Cui, 2024. "Model-free feature screening based on Hellinger distance for ultrahigh dimensional data," Statistical Papers, Springer, vol. 65(9), pages 5903-5930, December.
    16. Xuejun Ma & Jingxiao Zhang, 2016. "Robust model-free feature screening via quantile correlation," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 472-480.
    17. Chuan Hong & Yang Ning & Shuang Wang & Hao Wu & Raymond J. Carroll & Yong Chen, 2017. "PLEMT: A Novel Pseudolikelihood-Based EM Test for Homogeneity in Generalized Exponential Tilt Mixture Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(520), pages 1393-1404, October.
    18. Jingyuan Liu & Runze Li & Rongling Wu, 2014. "Feature Selection for Varying Coefficient Models With Ultrahigh-Dimensional Covariates," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 266-274, March.
    19. Jing Zhang & Qihua Wang & Jian Kang, 2020. "Feature screening under missing indicator imputation with non-ignorable missing response," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    20. Nikolaos Ignatiadis & Wolfgang Huber, 2021. "Covariate powered cross‐weighted multiple testing," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(4), pages 720-751, September.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.