
Microbiome subcommunity learning with logistic‐tree normal latent Dirichlet allocation

Author

Listed:
  • Patrick LeBlanc
  • Li Ma

Abstract

Mixed‐membership (MM) models such as latent Dirichlet allocation (LDA) have been applied to microbiome compositional data to identify latent subcommunities of microbial species. These subcommunities are informative for understanding the biological interplay of microbes and for predicting health outcomes. However, microbiome compositions typically display substantial cross‐sample heterogeneity in subcommunity compositions (that is, variability across samples in the proportions of microbes within shared subcommunities), which is not accounted for in prior analyses. As a result, LDA can produce inference that is highly sensitive to the specified number of subcommunities and often divides a single subcommunity into multiple artificial ones. To address this limitation, we incorporate the logistic‐tree normal (LTN) model into LDA to form a new MM model. This model allows cross‐sample variation in the composition of each subcommunity around some “centroid” composition that defines the subcommunity. Incorporating auxiliary Pólya‐Gamma variables enables a computationally efficient collapsed blocked Gibbs sampler to carry out Bayesian inference under this model. By accounting for such heterogeneity, our new model restores the robustness of the inference to the specification of the number of subcommunities and allows meaningful subcommunities to be identified.
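The generative idea summarized in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the authors' implementation: the balanced four-taxon tree, the independent normal deviations with a common standard deviation `sigma`, the symmetric Dirichlet prior `alpha`, and the function names `ltn_composition` and `simulate` are all assumptions made for illustration. It draws read counts from an LDA-style mixture in which each sample's subcommunity compositions vary around centroid log-odds placed on the internal nodes of a tree.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ltn_composition(psi):
    """Map log-odds at the 3 internal nodes of a balanced 4-leaf binary tree
    (hypothetical tree) to a composition over the 4 leaves (taxa).
    psi[0]: root split, psi[1]: left child split, psi[2]: right child split."""
    t = sigmoid(np.asarray(psi, dtype=float))
    return np.array([
        t[0] * t[1],              # root -> left  -> left
        t[0] * (1 - t[1]),        # root -> left  -> right
        (1 - t[0]) * t[2],        # root -> right -> left
        (1 - t[0]) * (1 - t[2]),  # root -> right -> right
    ])

def simulate(n_samples, n_reads, mu, sigma, alpha, rng):
    """Simulate count data with K subcommunities.
    mu:    (K, 3) centroid log-odds, one row per subcommunity.
    sigma: sd of the per-sample deviation around each centroid
           (independent across nodes; a simplifying assumption).
    alpha: symmetric Dirichlet parameter for subcommunity proportions."""
    K = mu.shape[0]
    counts = np.zeros((n_samples, 4), dtype=int)
    for d in range(n_samples):
        # Sample-level mixing proportions over subcommunities, as in LDA.
        phi = rng.dirichlet(alpha * np.ones(K))
        # Sample-specific log-odds: centroid plus normal noise.
        psi = mu + sigma * rng.standard_normal(mu.shape)
        # Sample-specific composition of each subcommunity, shape (K, 4).
        beta = np.vstack([ltn_composition(p) for p in psi])
        # Summing out per-read subcommunity labels gives a multinomial
        # over taxa with probabilities phi @ beta.
        counts[d] = rng.multinomial(n_reads, phi @ beta)
    return counts

rng = np.random.default_rng(0)
mu = np.array([[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]])
counts = simulate(n_samples=5, n_reads=100, mu=mu, sigma=0.5, alpha=1.0, rng=rng)
print(counts)
```

Setting `sigma` to zero makes every sample share the same subcommunity compositions, which corresponds to the plain LDA setting; larger values introduce the cross‐sample heterogeneity the abstract describes.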

Suggested Citation

  • Patrick LeBlanc & Li Ma, 2023. "Microbiome subcommunity learning with logistic‐tree normal latent Dirichlet allocation," Biometrics, The International Biometric Society, vol. 79(3), pages 2321-2332, September.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:3:p:2321-2332
    DOI: 10.1111/biom.13772

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13772
    Download Restriction: no


    References listed on IDEAS

    1. Ian Holmes & Keith Harris & Christopher Quince, 2012. "Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-15, February.
    2. Pratheepa Jeganathan & Susan P. Holmes, 2021. "A Statistical Perspective on the Challenges in Molecular Microbial Biology," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(2), pages 131-160, June.
    3. Nicholas G. Polson & James G. Scott & Jesse Windle, 2013. "Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(504), pages 1339-1349, December.
    4. Jialiang Mao & Yuhan Chen & Li Ma, 2020. "Bayesian Graphical Compositional Regression for Microbiome Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(530), pages 610-624, April.
    5. Jingru Zhang & Wei Lin, 2019. "Scalable estimation and regularization for the logistic normal multinomial model," Biometrics, The International Biometric Society, vol. 75(4), pages 1098-1108, December.
    6. Tao Wang & Hongyu Zhao, 2017. "A Dirichlet-tree multinomial regression model for associating dietary nutrients with gut microorganisms," Biometrics, The International Biometric Society, vol. 73(3), pages 792-801, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhao, Xin & Zhang, Jingru & Lin, Wei, 2023. "Clustering multivariate count data via Dirichlet-multinomial network fusion," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    2. Yaru Song & Hongyu Zhao & Tao Wang, 2020. "An adaptive independence test for microbiome community data," Biometrics, The International Biometric Society, vol. 76(2), pages 414-426, June.
    3. Matthew D. Koslovsky, 2023. "A Bayesian zero‐inflated Dirichlet‐multinomial regression model for multivariate compositional count data," Biometrics, The International Biometric Society, vol. 79(4), pages 3239-3251, December.
    4. Buddhavarapu, Prasad & Bansal, Prateek & Prozzi, Jorge A., 2021. "A new spatial count data model with time-varying parameters," Transportation Research Part B: Methodological, Elsevier, vol. 150(C), pages 566-586.
    5. Niko Hauzenberger & Florian Huber, 2020. "Model instability in predictive exchange rate regressions," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(2), pages 168-186, March.
    6. Anindya Bhadra & Arvind Rao & Veerabhadran Baladandayuthapani, 2018. "Inferring network structure in non‐normal and mixed discrete‐continuous genomic data," Biometrics, The International Biometric Society, vol. 74(1), pages 185-195, March.
    7. Haoying Wang & Guohui Wu, 2022. "Modeling discrete choices with large fine-scale spatial data: opportunities and challenges," Journal of Geographical Systems, Springer, vol. 24(3), pages 325-351, July.
    8. Sahar Zarmehri & Ephraim M. Hanks & Lin Lin, 2021. "A Sample Covariance-Based Approach For Spatial Binary Data," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(2), pages 220-249, June.
    9. Carter Allen & Yuzhou Chang & Brian Neelon & Won Chang & Hang J. Kim & Zihai Li & Qin Ma & Dongjun Chung, 2023. "A Bayesian multivariate mixture model for high throughput spatial transcriptomics," Biometrics, The International Biometric Society, vol. 79(3), pages 1775-1787, September.
    10. Laura Anderlucci & Cinzia Viroli, 2020. "Mixtures of Dirichlet-Multinomial distributions for supervised and unsupervised classification of short text data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(4), pages 759-770, December.
    11. Wang, Xin & Roy, Vivekananda, 2018. "Analysis of the Pólya-Gamma block Gibbs sampler for Bayesian logistic linear mixed models," Statistics & Probability Letters, Elsevier, vol. 137(C), pages 251-256.
    12. Kihyun Lee & Sebastien Raguideau & Kimmo Sirén & Francesco Asnicar & Fabio Cumbo & Falk Hildebrand & Nicola Segata & Chang-Jun Cha & Christopher Quince, 2023. "Population-level impacts of antibiotic usage on the human gut microbiome," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    13. Li He & Yu-Bo Wang & William C. Bridges & Zhulin He & S. Megan Che, 2023. "Bayesian Framework for Causal Inference with Principal Stratification and Clusters," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 15(1), pages 114-140, April.
    14. Matthew W. Wheeler, 2019. "Bayesian additive adaptive basis tensor product models for modeling high dimensional surfaces: an application to high‐throughput toxicity testing," Biometrics, The International Biometric Society, vol. 75(1), pages 193-201, March.
    15. Toryn L. J. Schafer & Christopher K. Wikle & Jay A. VonBank & Bart M. Ballard & Mitch D. Weegman, 2020. "A Bayesian Markov Model with Pólya-Gamma Sampling for Estimating Individual Behavior Transition Probabilities from Accelerometer Classifications," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 25(3), pages 365-382, September.
    16. Paul A. Parker & Scott H. Holan, 2023. "A Bayesian functional data model for surveys collected under informative sampling with application to mortality estimation using NHANES," Biometrics, The International Biometric Society, vol. 79(2), pages 1397-1408, June.
    17. Reem Aljarallah & Samer A Kharroubi, 2021. "Use of Bayesian Markov Chain Monte Carlo Methods to Model Kuwait Medical Genetic Center Data: An Application to Down Syndrome and Mental Retardation," Mathematics, MDPI, vol. 9(3), pages 1-11, January.
    18. James Joseph Balamuta & Steven Andrew Culpepper, 2022. "Exploratory Restricted Latent Class Models with Monotonicity Requirements under Pólya–Gamma Data Augmentation," Psychometrika, Springer;The Psychometric Society, vol. 87(3), pages 903-945, September.
    19. Zhehan Jiang & Jonathan Templin, 2019. "Gibbs Samplers for Logistic Item Response Models via the Pólya–Gamma Distribution: A Computationally Efficient Data-Augmentation Strategy," Psychometrika, Springer;The Psychometric Society, vol. 84(2), pages 358-374, June.
    20. Bansal, Prateek & Krueger, Rico & Graham, Daniel J., 2021. "Fast Bayesian estimation of spatial count data models," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
