
A Comprehensive Performance Comparison Study of Various Statistical Models that Accommodate Challenges of the Gut Microbiome Data

Author

Listed:
  • Morteza Hajihosseini

    (University of Alberta)

  • Payam Amini

    (Keele University)

  • Alireza Saidi-Mehrabad

    (Division of Hydrological Sciences)

  • Nastaran Hajizadeh

    (University of Alberta)

  • Anita L. Kozyrskyj

    (University of Alberta)

  • Irina Dinu

    (University of Alberta)

Abstract

The human gut microbiome comprises trillions of symbiotic bacteria that colonize the human gut after birth and play an essential role in maintaining human health. Various factors can influence the human microbiome, delaying the normal maturation of the gut microbiota and leading to the onset of various diseases. Studying gut microbiome composition therefore offers evidence for early disease detection and opportunities for intervention. Stool samples analyzed by 16S ribosomal RNA high-throughput sequencing usually yield a count table (number of reads) of detected species per sample in the form of amplicon sequence variants (ASVs). ASV count data have several inherent challenges, such as over-dispersion, within-sample correlation, and a large number of zeros. Appropriate statistical methods are needed to measure the effect of important factors on the gut microbial community while addressing these challenges. This paper compares the behavior of the most common statistical methods that accommodate the challenges of gut microbiome data in a comprehensive simulation study. In 67% of our simulation scenarios, the zero-inflated negative binomial model had a lower mean squared error than the other methods, and the zero-inflated Gaussian mixture model had better statistical power. The real-data application to the SKOT Cohorts dataset showed the effect of maternal obesity on infant taxon abundance at the 9- and 18-month assessments. Our study showed that some of the more recent methods can adequately accommodate the challenges of gut microbiome data without requiring data transformation or normalization.
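To make the two count-data challenges named in the abstract concrete (excess zeros and over-dispersion), here is a minimal Python sketch of a zero-inflated negative binomial probability mass function. It is an illustration only, not the authors' implementation: the mean/dispersion parameterization (`mu`, `theta`), the zero-inflation probability `pi`, and all function names are assumptions chosen for this example.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and dispersion (size) theta.

    Assumed parameterization: variance = mu + mu**2 / theta.
    """
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, pi):
    """Zero-inflated NB: a structural zero with probability pi, else an NB count."""
    base = (1.0 - pi) * nb_pmf(y, mu, theta)
    return pi + base if y == 0 else base


def zinb_moments(mu, theta, pi, y_max=5000):
    """Mean, variance, and total mass by direct summation of the pmf up to y_max."""
    probs = [zinb_pmf(y, mu, theta, pi) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var, sum(probs)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the effect.
    mean, var, total = zinb_moments(mu=20.0, theta=0.5, pi=0.3)
    print(f"P(Y=0)={zinb_pmf(0, 20.0, 0.5, 0.3):.4f} mean={mean:.2f} var={var:.2f}")
```

Under this parameterization the mean is `(1 - pi) * mu` and the variance is `(1 - pi) * mu * (1 + mu * (pi + 1 / theta))`. The variance therefore exceeds the mean whenever `pi > 0` or `theta` is finite, which is the over-dispersion that a plain Poisson model cannot capture.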

Suggested Citation

  • Morteza Hajihosseini & Payam Amini & Alireza Saidi-Mehrabad & Nastaran Hajizadeh & Anita L. Kozyrskyj & Irina Dinu, 2025. "A Comprehensive Performance Comparison Study of Various Statistical Models that Accommodate Challenges of the Gut Microbiome Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 17(1), pages 216-231, April.
  • Handle: RePEc:spr:stabio:v:17:y:2025:i:1:d:10.1007_s12561-024-09435-8
    DOI: 10.1007/s12561-024-09435-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s12561-024-09435-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Partha Deb & Pravin K. Trivedi & David M. Zimmer, 2014. "Cost‐Offsets Of Prescription Drug Expenditures: Data Analysis Via A Copula‐Based Bivariate Dynamic Hurdle Model," Health Economics, John Wiley & Sons, Ltd., vol. 23(10), pages 1242-1259, October.
    2. Petropoulos, Fotios & Apiletti, Daniele & Assimakopoulos, Vassilios & Babai, Mohamed Zied & Barrow, Devon K. & Ben Taieb, Souhaib & Bergmeir, Christoph & Bessa, Ricardo J. & Bijak, Jakub & Boylan, Joh, 2022. "Forecasting: theory and practice," International Journal of Forecasting, Elsevier, vol. 38(3), pages 705-871.
      • Fotios Petropoulos & Daniele Apiletti & Vassilios Assimakopoulos & Mohamed Zied Babai & Devon K. Barrow & Souhaib Ben Taieb & Christoph Bergmeir & Ricardo J. Bessa & Jakub Bijak & John E. Boylan & Jet, 2020. "Forecasting: theory and practice," Papers 2012.03854, arXiv.org, revised Jan 2022.
    3. Mozhaeva, Irina, 2022. "Inequalities in utilization of institutional care among older people in Estonia," Health Policy, Elsevier, vol. 126(7), pages 704-714.
    4. Koen Decancq, 2020. "Measuring cumulative deprivation and affluence based on the diagonal dependence diagram," METRON, Springer;Sapienza Università di Roma, vol. 78(2), pages 103-117, August.
    5. Yonatan Berman & Branko Milanovic, 2024. "Homoploutia: Top Labor and Capital Incomes in the United States, 1950–2020," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 70(3), pages 766-784, September.
    6. Jörg Schwiebert, 2016. "Multinomial choice models based on Archimedean copulas," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 100(3), pages 333-354, July.
    7. Lu Yang & Claudia Czado, 2022. "Two‐part D‐vine copula models for longitudinal insurance claim data," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(4), pages 1534-1561, December.
    8. Raj Chetty & Nathaniel Hendren & Patrick Kline & Emmanuel Saez, 2014. "Where is the land of Opportunity? The Geography of Intergenerational Mobility in the United States," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 129(4), pages 1553-1623.
    9. Salmon, Claire & Tanguy, Jeremy, 2016. "Rural Electrification and Household Labor Supply: Evidence from Nigeria," World Development, Elsevier, vol. 82(C), pages 48-68.
    10. Marc Gronwald & Janina Ketterer & Stefan Trück, 2011. "The Dependence Structure between Carbon Emission Allowances and Financial Markets - A Copula Analysis," CESifo Working Paper Series 3418, CESifo.
    11. Terence C. Cheng & Pravin K. Trivedi, 2015. "Attrition Bias in Panel Data: A Sheep in Wolf's Clothing? A Case Study Based on the Mabel Survey," Health Economics, John Wiley & Sons, Ltd., vol. 24(9), pages 1101-1117, September.
    12. Genius, Margarita & Stefanou, Spiro E. & Tzouvelekas, Vangelis, 2012. "Measuring productivity growth under factor non-substitution: An application to US steam-electric power generation utilities," European Journal of Operational Research, Elsevier, vol. 220(3), pages 844-852.
    13. Lee, Richard J. & Sener, Ipek N. & Mokhtarian, Patricia L. & Handy, Susan L., 2017. "Relationships between the online and in-store shopping frequency of Davis, California residents," Transportation Research Part A: Policy and Practice, Elsevier, vol. 100(C), pages 40-52.
    14. Chandra Bhat & Ipek Sener, 2009. "A copula-based closed-form binary logit choice model for accommodating spatial correlation across observational units," Journal of Geographical Systems, Springer, vol. 11(3), pages 243-272, September.
    15. Azam, Kazim & Pitt, Michael, 2014. "Bayesian Inference for a Semi-Parametric Copula-based Markov Chain," The Warwick Economics Research Paper Series (TWERPS) 1051, University of Warwick, Department of Economics.
    16. repec:cfe:wpcefa:2013_12 is not listed on IDEAS
    17. Jeffrey Racine, 2015. "Mixed data kernel copulas," Empirical Economics, Springer, vol. 48(1), pages 37-59, February.
    18. Shasha Liu & Toshiyuki Yamamoto & Enjian Yao, 2023. "Joint modeling of mode choice and travel distance with intra-household interactions," Transportation, Springer, vol. 50(5), pages 1527-1552, October.
    19. Erlend Bø & Peter Lambert & Thor Thoresen, 2012. "Horizontal inequity under a dual income tax system: principles and measurement," International Tax and Public Finance, Springer;International Institute of Public Finance, vol. 19(5), pages 625-640, October.
    20. Trinh Thi, Huong & Simioni, Michel & Thomas-Agnan, Christine, 2018. "Decomposition of changes in the consumption of macronutrients in Vietnam between 2004 and 2014," Economics & Human Biology, Elsevier, vol. 31(C), pages 259-275.
    21. J. Christopher Westland, 2015. "Economics of eBay’s buyer protection plan," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 1(1), pages 1-20, December.
