
Addressing erroneous scale assumptions in microbe and gene set enrichment analysis

Author

Listed:
  • Kyle C McGovern
  • Michelle Pistner Nixon
  • Justin D Silverman

Abstract

By applying Differential Set Analysis (DSA) to sequence count data, researchers can determine whether groups of microbes or genes are differentially enriched. Yet sequence count data suffer from a scale limitation: these data lack information about the scale (i.e., size) of the biological system under study, leading some authors to call these data compositional (i.e., proportional). In this article, we show that commonly used DSA methods that rely on normalization make strong, implicit assumptions about the unmeasured system scale. We show that even small errors in these scale assumptions can lead to positive predictive values as low as 9%. To address this problem, we take three novel approaches. First, we introduce a sensitivity analysis framework to identify when modeling results are robust to such errors and when they are suspect. Unlike standard benchmarking studies, this framework does not require ground-truth knowledge and can therefore be applied to both simulated and real data. Second, we introduce a statistical test that provably controls Type-I error at a nominal rate despite errors in scale assumptions. Finally, we discuss how the impact of scale limitations depends on a researcher’s scientific goals and provide tools that researchers can use to evaluate whether their goals are at risk from erroneous scale assumptions. Overall, the goal of this article is to catalyze future research into the impact of scale limitations in analyses of sequence count data; to illustrate that scale limitations can lead to inferential errors in practice; yet to also show that rigorous and reproducible scale reliant inference is possible if done carefully.

Author summary: A common task in the analysis of DNA sequence count data is to determine whether sets of biologically related genes or microbes are differentially enriched between two experimental conditions (Differential Set Analysis; DSA). Yet DSA can be confounded by the non-biological (i.e., technical) variation in sequencing depth. To address this issue, many researchers use normalization techniques to remove this variation. The choice of normalization can dominate modeling results yet we lack tools for properly validating this decision. Here we develop statistical and computational tools that allow researchers to quantify the robustness of analytical results to the choice of normalization. These methods aim to improve the rigor and reproducibility of commonly performed set enrichment analyses.
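
The idea of checking how a set-level result responds to an error in the assumed scale can be illustrated with a small sketch. This is not the authors' framework or their test; it is a toy example under stated assumptions: the normalization is a centered log-ratio (CLR) transform with a pseudocount of 0.5, the set score is simply the mean log-fold change (LFC) of the set members, and an error in the scale assumption is modeled as a constant shift `eps` added to every LFC. The count matrix, the set membership, and all function names are hypothetical.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples-by-entities count matrix."""
    logged = np.log(counts + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)


def lfc_under_clr(counts, is_treated):
    """Per-entity log-fold change (treated minus control) after CLR."""
    z = clr(counts)
    return z[is_treated].mean(axis=0) - z[~is_treated].mean(axis=0)


def set_score_by_scale_error(lfc, in_set, epsilons):
    """Mean LFC of the set after shifting every LFC by each scale error eps."""
    return np.array([(lfc[in_set] + eps).mean() for eps in epsilons])


# Hypothetical toy data: 6 samples (3 control, 3 treated) x 5 entities.
counts = np.array([
    [100, 120,  80, 50, 50],
    [110, 100,  90, 60, 40],
    [ 90, 110, 100, 40, 60],
    [200, 220,  80, 50, 50],
    [210, 200,  90, 60, 40],
    [190, 210, 100, 40, 60],
], dtype=float)
is_treated = np.array([False, False, False, True, True, True])
in_set = np.array([True, True, False, False, False])  # assumed set membership

lfc = lfc_under_clr(counts, is_treated)

# Grid of assumed errors in the log-scale difference between conditions.
epsilons = np.linspace(-1.0, 1.0, 9)
scores = set_score_by_scale_error(lfc, in_set, epsilons)

for eps, score in zip(epsilons, scores):
    direction = "enriched" if score > 0 else "depleted"
    print(f"eps={eps:+.2f}  set mean LFC={score:+.3f}  ({direction})")
```

In this toy setting the direction of the set-level conclusion depends on `eps`: the conclusion is stable only over the range of scale errors for which the sign of the set score does not change. Reporting that range, rather than a single result at `eps = 0`, is the kind of robustness check the abstract describes, here in its simplest possible form.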

Suggested Citation

  • Kyle C McGovern & Michelle Pistner Nixon & Justin D Silverman, 2023. "Addressing erroneous scale assumptions in microbe and gene set enrichment analysis," PLOS Computational Biology, Public Library of Science, vol. 19(11), pages 1-16, November.
  • Handle: RePEc:plo:pcbi00:1011659
    DOI: 10.1371/journal.pcbi.1011659

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011659
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1011659&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Miles C. Andrews & Junna Oba & Chang-Jiun Wu & Haifeng Zhu & Tatiana Karpinets & Caitlin A. Creasy & Marie-Andrée Forget & Xiaoxing Yu & Xingzhi Song & Xizeng Mao & A. Gordon Robertson & Gabriele Roma, 2022. "Multi-modal molecular programs regulate melanoma cell state," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    2. Bingjie Guan & Youdong Liu & Bowen Xie & Senlin Zhao & Abudushalamu Yalikun & Weiwei Chen & Menghua Zhou & Qi Gu & Dongwang Yan, 2024. "Mitochondrial genome transfer drives metabolic reprogramming in adjacent colonic epithelial cells promoting TGFβ1-mediated tumor progression," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    3. Yukinari Haraoka & Yuki Akieda & Yuri Nagai & Chihiro Mogi & Tohru Ishitani, 2022. "Zebrafish imaging reveals TP53 mutation switching oncogene-induced senescence from suppressor to driver in primary tumorigenesis," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    4. Guillaume Harmange & Raúl A. Reyes Hueros & Dylan L. Schaff & Benjamin Emert & Michael Saint-Antoine & Laura C. Kim & Zijian Niu & Shivani Nellore & Mitchell E. Fane & Gretchen M. Alicea & Ashani T. W, 2023. "Disrupting cellular memory to overcome drug resistance," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    5. Claudia Capparelli & Timothy J. Purwin & McKenna Glasheen & Signe Caksa & Manoela Tiago & Nicole Wilski & Danielle Pomante & Sheera Rosenbaum & Mai Q. Nguyen & Weijia Cai & Janusz Franco-Barraza & Ric, 2022. "Targeting SOX10-deficient cells to reduce the dormant-invasive phenotype state in melanoma," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    6. Marc A. Vittoria & Nathan Kingston & Kristyna Kotynkova & Eric Xia & Rui Hong & Lee Huang & Shayna McDonald & Andrew Tilston-Lunel & Revati Darp & Joshua D. Campbell & Deborah Lang & Xiaowei Xu & Crai, 2022. "Inactivation of the Hippo tumor suppressor pathway promotes melanoma," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    7. Julia Velz & Lena K. Freudenmann & Gioele Medici & Marissa Dubbelaar & Malte Mohme & David R. Ghasemi & Jonas Scheid & Daniel J. Kowalewski & Angelica B. Patterson & Anna M. Zeitlberger & Katrin Lamsz, 2025. "Mapping naturally presented T cell antigens in medulloblastoma based on integrative multi-omics," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    8. Nan Li & Alex Quan & Dan Li & Jiajia Pan & Hua Ren & Gerard Hoeltzel & Natalia Val & Dana Ashworth & Weiming Ni & Jing Zhou & Sean Mackay & Stephen M. Hewitt & Raul Cachau & Mitchell Ho, 2023. "The IgG4 hinge with CD28 transmembrane domain improves VHH-based CAR T cells targeting a membrane-distal epitope of GPC1 in pancreatic cancer," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    9. Saverio Ranciati & Alberto Roverato & Alessandra Luati, 2021. "Fused graphical lasso for brain networks with symmetries," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(5), pages 1299-1322, November.
    10. Michael F. Emmons & Richard L. Bennett & Alberto Riva & Kanchan Gupta & Larissa Anastasio Da Costa Carvalho & Chao Zhang & Robert Macaulay & Daphne Dupéré-Richér & Bin Fang & Edward Seto & John M. Koo, 2023. "HDAC8-mediated inhibition of EP300 drives a transcriptional state that increases melanoma brain metastasis," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    11. Caitriona M. McEvoy & Julia M. Murphy & Lin Zhang & Sergi Clotet-Freixas & Jessica A. Mathews & James An & Mehran Karimzadeh & Delaram Pouyabahar & Shenghui Su & Olga Zaslaver & Hannes Röst & Rangi Ar, 2022. "Single-cell profiling of healthy human kidney reveals features of sex-based transcriptional programs and tissue-specific immunity," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    12. Weilin Pu & Xiao Shi & Pengcheng Yu & Meiying Zhang & Zhiyan Liu & Licheng Tan & Peizhen Han & Yu Wang & Dongmei Ji & Hualei Gan & Wenjun Wei & Zhongwu Lu & Ning Qu & Jiaqian Hu & Xiaohua Hu & Zaili L, 2021. "Single-cell transcriptomic analysis of the tumor ecosystems underlying initiation and progression of papillary thyroid carcinoma," Nature Communications, Nature, vol. 12(1), pages 1-18, December.
    13. Igor Dolgalev & Hua Zhou & Nina Murrell & Hortense Le & Theodore Sakellaropoulos & Nicolas Coudray & Kelsey Zhu & Varshini Vasudevaraja & Anna Yeaton & Chandra Goparaju & Yonghua Li & Imran Sulaiman &, 2023. "Inflammation in the tumor-adjacent lung as a predictor of clinical outcome in lung adenocarcinoma," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    14. Dianne Lumaquin-Yin & Emily Montal & Eleanor Johns & Arianna Baggiolini & Ting-Hsiang Huang & Yilun Ma & Charlotte LaPlante & Shruthy Suresh & Lorenz Studer & Richard M. White, 2023. "Lipid droplets are a metabolic vulnerability in melanoma," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
