
Bayesian variable selection for high‐dimensional rank data

Author

Listed:
  • Can Cui
  • Susheela P. Singh
  • Ana‐Maria Staicu
  • Brian J. Reich

Abstract

The study of microbiomes has become a topic of intense interest in the last several decades as the development of new sequencing technologies has made DNA data accessible across disciplines. In this paper, we analyze a global dataset to investigate environmental factors that affect the topsoil microbiome. To date, much related work has focused on linking indicators of microbial health to specific outcomes in various fields, rather than on understanding how external factors may influence the microbiome composition itself. This is partly due to the limited statistical methods available for modeling abundance counts. The counts are high‐dimensional, overdispersed, often zero‐inflated, and exhibit complex dependence structures. Additionally, the raw counts are often noisy and compositional, and thus are not directly comparable across samples. Practitioners often transform the counts to presence–absence indicators, but this transformation discards much of the data. As an alternative, we propose transforming to taxa ranks and develop a Bayesian variable selection model that uses ranks to identify covariates that influence microbiome composition. We show by simulation that the proposed model outperforms competitors across various settings, with particular improvement in recall for small‐magnitude and low‐prevalence covariates. When applied to the topsoil data, the proposed method identifies several factors that affect microbiome composition.
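
To make the contrast between the two transformations concrete, the following is a minimal sketch and not the authors' implementation. It assumes a samples-by-taxa count matrix, ranks taxa within each sample, and breaks ties by average rank; the tie rule, the function names, and the toy counts are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def presence_absence(counts):
    """Return 1 where a taxon was observed in a sample, 0 otherwise."""
    return (np.asarray(counts) > 0).astype(int)


def within_sample_ranks(counts):
    """Rank taxa within each sample (row); rank 1 is the least abundant.

    Ties share the average of the ranks they span (an assumed tie rule).
    """
    counts = np.asarray(counts)
    ranks = np.empty(counts.shape, dtype=float)
    for i, row in enumerate(counts):
        # Unique values come back sorted ascending; `inverse` maps each
        # taxon to the index of its value in that sorted list.
        _, inverse, freq = np.unique(row, return_inverse=True, return_counts=True)
        last = np.cumsum(freq)       # last 1-based position of each value
        first = last - freq + 1      # first 1-based position of each value
        ranks[i] = ((first + last) / 2.0)[inverse]
    return ranks


# Hypothetical counts: 2 samples (rows) by 4 taxa (columns).
toy_counts = np.array([[0, 5, 0, 120],
                       [3, 0, 40, 40]])

pa = presence_absence(toy_counts)
rk = within_sample_ranks(toy_counts)

# Multiplying a sample by a constant (for example, a deeper sequencing
# run) changes the raw counts but leaves the within-sample ranks unchanged.
rk_scaled = within_sample_ranks(toy_counts * 10)
```

In this toy example the presence–absence indicators treat counts of 5 and 120 identically, while the ranks keep their ordering. The ranks also do not change when a sample's counts are rescaled, which is one reason a rank transformation can sidestep the comparability problem of raw compositional counts.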

Suggested Citation

  • Can Cui & Susheela P. Singh & Ana‐Maria Staicu & Brian J. Reich, 2021. "Bayesian variable selection for high‐dimensional rank data," Environmetrics, John Wiley & Sons, Ltd., vol. 32(7), November.
  • Handle: RePEc:wly:envmet:v:32:y:2021:i:7:n:e2682
    DOI: 10.1002/env.2682

