
A Bayesian nonparametric analysis for zero‐inflated multivariate count data with application to microbiome study

Author

Listed:
  • Kurtis Shuler
  • Samuel Verbanic
  • Irene A. Chen
  • Juhee Lee

Abstract

High‐throughput sequencing technology has enabled researchers to profile microbial communities from a variety of environments, but analysis of multivariate taxon count data remains challenging. We develop a Bayesian nonparametric (BNP) regression model with zero inflation to analyse multivariate count data from microbiome studies. A BNP approach flexibly models microbial associations with covariates, such as environmental factors and clinical characteristics. The model produces estimates for probability distributions which relate microbial diversity and differential abundance to covariates, and facilitates community comparisons beyond those provided by simple statistical tests. We compare the model to simpler models and popular alternatives in simulation studies, showing that, beyond providing these community‐level insights, it yields superior parameter estimates and model fit in various settings. The model's utility is demonstrated by applying it to a chronic wound microbiome data set and a Human Microbiome Project data set, where it is used to compare microbial communities present in different environments.

Suggested Citation

  • Kurtis Shuler & Samuel Verbanic & Irene A. Chen & Juhee Lee, 2021. "A Bayesian nonparametric analysis for zero‐inflated multivariate count data with application to microbiome study," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 961-979, August.
  • Handle: RePEc:bla:jorssc:v:70:y:2021:i:4:p:961-979
    DOI: 10.1111/rssc.12493

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssc.12493
    Download Restriction: no



