A marginalized two-part Beta regression model for microbiome compositional data

Author

Listed:
  • Haitao Chai
  • Hongmei Jiang
  • Lu Lin
  • Lei Liu

Abstract

In microbiome studies, an important goal is to detect differential abundance of microbes across clinical conditions and treatment options. However, microbiome compositional data (quantified by relative abundance) are highly skewed, bounded in [0, 1), and often contain many zeros. A two-part model is commonly used to separate zeros and positive values explicitly with two submodels: in Part I, a logistic model for the probability that a species is present, and in Part II, a Beta regression model for the relative abundance conditional on the presence of the species. However, the regression coefficients in Part II cannot provide a marginal (unconditional) interpretation of covariate effects on the microbial abundance, which is of great interest in many applications. In this paper, we propose a marginalized two-part Beta regression model which captures the zero-inflation and skewness of microbiome data and also allows investigators to examine covariate effects on the marginal (unconditional) mean. We demonstrate its practical performance using simulation studies and apply the model to a real metagenomic dataset on mouse skin microbiota. We find that under the proposed marginalized model, without loss in power, the likelihood ratio test performs better in controlling the type I error than those under conventional methods.

Author summary: Semi-continuous compositional data are typically analyzed using two-part models which separately describe the probability of zero values and the distribution of positive values. The second part of the model provides a conditional interpretation of covariate effects on the positive response. However, it is of great interest in many applications to assess the covariate effect on the marginal mean of the response. For this purpose, we propose a marginalized two-part model by reparameterizing the marginal mean in Part II. We show by simulation studies that the proposed marginalized two-part model outperforms conventional methods in terms of controlling the Type I error and maximizing the power. We apply our method to a microbiota dataset and find results consistent with our simulation studies.
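
The relationship between the conditional and marginal parameterizations described in the abstract can be illustrated with a minimal sketch. It assumes a logit link for both the presence probability and the marginal mean, and a mean-precision Beta parameterization with precision phi. These choices, along with every function and variable name below, are illustrative assumptions and are not confirmed by the abstract. The key identity is that the marginal mean equals the presence probability times the conditional mean, so the conditional mean used in Part II is recovered as nu / p.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_to_conditional(x, alpha, beta):
    """Return (p, nu, mu) for one covariate vector x.

    p  = P(Y > 0), modeled in Part I (logit link, assumed)
    nu = E[Y], the marginal mean (logit link, assumed)
    mu = E[Y | Y > 0] = nu / p, the conditional Beta mean
    """
    p = expit(sum(a * xi for a, xi in zip(alpha, x)))
    nu = expit(sum(b * xi for b, xi in zip(beta, x)))
    mu = nu / p
    if not 0.0 < mu < 1.0:
        # The marginal mean must stay below the presence probability.
        raise ValueError("need nu < p so that the conditional mean is in (0, 1)")
    return p, nu, mu


def log_likelihood(y, X, alpha, beta, phi):
    """Log-likelihood of zero-inflated Beta data under the sketch above."""
    total = 0.0
    for yi, xi in zip(y, X):
        p, _, mu = marginal_to_conditional(xi, alpha, beta)
        if yi == 0.0:
            total += math.log(1.0 - p)  # zero part
        else:
            a, b = mu * phi, (1.0 - mu) * phi  # Beta shape parameters
            log_beta_pdf = (
                math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + (a - 1.0) * math.log(yi)
                + (b - 1.0) * math.log(1.0 - yi)
            )
            total += math.log(p) + log_beta_pdf  # positive part
    return total
```

In this parameterization the coefficients in beta describe covariate effects on the unconditional mean directly, whereas a conventional two-part model would place the regression on mu. A likelihood ratio test of a covariate effect would compare the maximized value of log_likelihood with and without the corresponding coefficient; the optimizer is left out of this sketch.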

Suggested Citation

  • Haitao Chai & Hongmei Jiang & Lu Lin & Lei Liu, 2018. "A marginalized two-part Beta regression model for microbiome compositional data," PLOS Computational Biology, Public Library of Science, vol. 14(7), pages 1-16, July.
  • Handle: RePEc:plo:pcbi00:1006329
    DOI: 10.1371/journal.pcbi.1006329

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006329
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006329&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. J. L. Scealy & A. H. Welsh, 2011. "Regression for compositional data by using distributions defined on the hypersphere," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 73(3), pages 351-375, June.
    2. Ospina, Raydonal & Ferrari, Silvia L.P., 2012. "A general class of zero-or-one inflated beta regression models," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1609-1623.
    3. Liu, Lei & Strawderman, Robert L. & Cowen, Mark E. & Shih, Ya-Chen T., 2010. "A flexible two-part random effects model for correlated medical costs," Journal of Health Economics, Elsevier, vol. 29(1), pages 110-123, January.

    Citations

    Cited by:

    1. Tugba Akkaya Hocagil & Richard J. Cook & Sandra W. Jacobson & Joseph L. Jacobson & Louise M. Ryan, 2021. "Propensity score analysis for a semi‐continuous exposure variable: a study of gestational alcohol exposure and childhood cognition," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1390-1413, October.
    2. Haixiang Zhang & Jun Chen & Zhigang Li & Lei Liu, 2021. "Testing for Mediation Effect with Application to Human Microbiome Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 313-328, July.
    3. Jian Wang & Cielito C. Reyes-Gibby & Sanjay Shete, 2021. "An Approach to Analyze Longitudinal Zero-Inflated Microbiome Count Data Using Two-Stage Mixed Effects Models," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 267-290, July.
    4. Meier Richard & Thompson Jeffrey A. & Koestler Devin C. & Chung Mei & Zhao Naisi & Michaud Dominique S. & Kelsey Karl T., 2019. "A Bayesian framework for identifying consistent patterns of microbial abundance between body sites," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(6), pages 1-15, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Enrico Bergamini & Georg Zachmann, 2020. "Exploring EU’s Regional Potential in Low-Carbon Technologies," Sustainability, MDPI, vol. 13(1), pages 1-28, December.
    2. Gourieroux, Christian & Lu, Yang, 2019. "Least impulse response estimator for stress test exercises," Journal of Banking & Finance, Elsevier, vol. 103(C), pages 62-77.
    3. Guillermo Martínez-Flórez & Artur J. Lemonte & Germán Moreno-Arenas & Roger Tovar-Falón, 2022. "The Bivariate Unit-Sinh-Normal Distribution and Its Related Regression Model," Mathematics, MDPI, vol. 10(17), pages 1-26, August.
    4. Xiongtao Dai & Zhenhua Lin & Hans‐Georg Müller, 2021. "Modeling sparse longitudinal data on Riemannian manifolds," Biometrics, The International Biometric Society, vol. 77(4), pages 1328-1341, December.
    5. Lucio Masserini & Matilde Bini & Monica Pratesi, 2017. "Effectiveness of non-selective evaluation test scores for predicting first-year performance in university career: a zero-inflated beta regression approach," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(2), pages 693-708, March.
    6. Harald Oberhofer & Michael Pfaffermayr, 2014. "Two-Part Models for Fractional Responses Defined as Ratios of Integers," Econometrics, MDPI, vol. 2(3), pages 1-22, September.
    7. Jinji, Naoto & Zhang, Xingyuan & Haruna, Shoji, 2019. "Does a firm with higher Tobin’s q prefer foreign direct investment to foreign outsourcing?," The North American Journal of Economics and Finance, Elsevier, vol. 50(C).
    8. Ricardo Ocaña-Riola & Carmen Pérez-Romero & Mª Isabel Ortega-Díaz & José Jesús Martín-Martín, 2021. "Multilevel Zero-One Inflated Beta Regression Model for the Analysis of the Relationship between Exogenous Health Variables and Technical Efficiency in the Spanish National Health System Hospitals," IJERPH, MDPI, vol. 18(19), pages 1-18, September.
    9. Silvia Noirjean & Mario Biggeri & Laura Forastiere & Fabrizia Mealli & Maria Nannini, 2023. "Estimating causal effects of community health financing via principal stratification," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(4), pages 1317-1350, October.
    10. Carlos Rojas & Bernardo Riffo & Ernesto Guerra, 2023. "Word Retrieval After the 80s: Evidence From Specific and Multiple Words Naming Tasks," SAGE Open, vol. 13(2), pages 21582440231, May.
    11. Napoleón Vargas Jurado & Kent M. Eskridge & Stephen D. Kachman & Ronald M. Lewis, 2018. "Using a Bayesian Hierarchical Linear Mixing Model to Estimate Botanical Mixtures," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 23(2), pages 190-207, June.
    12. Ehsan Bahrami Samani & Elham Tabrizi, 2023. "Joint Linear Modeling of Mixed Data and Its Application to Email Analysis," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 85(1), pages 175-209, May.
    13. Cristine Rauber & Francisco Cribari-Neto & Fábio M. Bayer, 2020. "Improved testing inferences for beta regressions with parametric mean link function," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 104(4), pages 687-717, December.
    14. Aaron J. Staples & Trey Malone & J. Robert Sirrine, 2021. "Hopping on the localness craze: What brewers want from state‐grown hops," Managerial and Decision Economics, John Wiley & Sons, Ltd., vol. 42(2), pages 463-473, March.
    15. Murilo Wohlgemuth & Carlos Ernani Fries & Ângelo Márcio Oliveira Sant’Anna & Ricardo Giglio & Diego Castro Fettermann, 2020. "Assessment of the technical efficiency of Brazilian logistic operators using data envelopment analysis and one inflated beta regression," Annals of Operations Research, Springer, vol. 286(1), pages 703-717, March.
    16. Reboul, E. & Guérin, I. & Nordman, C.J., 2021. "The gender of debt and credit: Insights from rural Tamil Nadu," World Development, Elsevier, vol. 142(C).
    17. Tsagris, Michail & Preston, Simon & T.A. Wood, Andrew, 2016. "Improved classification for compositional data using the $\alpha$-transformation," MPRA Paper 67657, University Library of Munich, Germany.
    18. Y. T. Hwang & C. H. Huang & W. L. Yeh & Y. D. Shen, 2017. "The weighted general linear model for longitudinal medical cost data – an application in colorectal cancer," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(2), pages 288-307, January.
    19. Diego Ramos Canterle & Fábio Mariano Bayer, 2019. "Variable dispersion beta regressions with parametric link functions," Statistical Papers, Springer, vol. 60(5), pages 1541-1567, October.
    20. Yury R. Benites & Vicente G. Cancho & Edwin M. M. Ortega & Roberto Vila & Gauss M. Cordeiro, 2022. "A New Regression Model on the Unit Interval: Properties, Estimation, and Application," Mathematics, MDPI, vol. 10(17), pages 1-17, September.
