Printed from https://ideas.repec.org/a/plo/pcbi00/1014102.html

Multidimensional scaling informed by F-statistic: Visualizing grouped microbiome data with inference

Authors

  • Hyungseok Kim
  • Soobin Kim
  • Jeffrey A Kimbrel
  • Megan M Morris
  • Xavier Mayali
  • Cullen R Buie

Abstract

Multidimensional scaling (MDS) is a widely used dimensionality reduction technique in microbial ecology that captures the multivariate structure of the data while preserving pairwise distances between samples. Although refinements of MDS have improved its ability to reveal group-specific patterns, these MDS-based methods require prior assumptions for inference, which limits their use in general microbiome analysis. In this study, we introduce a new MDS-based ordination method, “F-informed MDS,” which configures the data distribution based on the F-statistic: the ratio of the dispersion between groups of samples with different characteristics to the dispersion within groups of samples sharing common characteristics. Using semisynthetic datasets, we demonstrate that the proposed method is robust to hyperparameter selection while maintaining statistical significance throughout the ordination process. Various quality metrics for evaluating dimensionality reduction confirm that F-informed MDS is comparable to state-of-the-art methods in preserving both local and global data structures. Its application to a diatom-associated bacterial community illustrates how the method can help interpret the community’s response to its host. Our approach offers a well-founded refinement of MDS whose ordinations align with statistical test results, which can benefit broader multidimensional data analyses in microbiology and ecology. This new visualization tool can be incorporated into standard microbiome data analyses.

Author summary

Multidimensional scaling (MDS), also known as principal coordinate analysis, is a fundamental step in exploratory data analysis for interpreting microbial community samples processed via high-throughput sequencing. Interpreting MDS results often involves linking the patterns obtained from MDS with experimental treatments applied to the samples, such as environmental conditions or host phenotypes. However, these patterns are not guaranteed to survive ordination, because MDS does not use group information during its learning process. This limitation reduces the effectiveness of conventional MDS, particularly for general microbiome datasets, where maintaining meaningful biological patterns is crucial. To address this gap, we present a robust statistical framework that represents microbiome datasets in a lower dimension while preserving the results of hypothesis tests for group differences in the original dimension. Our approach, which relies on sample dispersion measured by the F-statistic, provides more stable and reliable performance than existing ordination methods. By incorporating statistical rigor into the ordination process, the framework improves the visualization of microbial community data and allows configurations to be adjusted within reasonable limits. This advancement gives researchers a more effective tool for analyzing and interpreting complex microbiome data and for drawing well-supported conclusions.
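The abstract rests on two quantities: the pairwise-distance configuration that MDS reproduces, and an F-statistic comparing between-group with within-group dispersion. The sketch below is not the authors' F-informed MDS. It assumes a PERMANOVA-style pseudo-F computed directly from a distance matrix, and it uses plain classical (Torgerson) MDS, with eigenvectors found by power iteration so that only the Python standard library is needed. The function names pseudo_f, classical_mds and euclidean_dist, the seed and iteration parameters, and the toy points are all hypothetical. The script computes the statistic on the original distances and again on the distances within a k-dimensional embedding. According to the abstract, F-informed MDS is designed to keep this kind of comparison consistent with the original test results.

```python
import math
import random


def pseudo_f(dist, labels):
    """PERMANOVA-style pseudo-F computed directly from a distance matrix.

    dist: symmetric list-of-lists with a zero diagonal.
    labels: one group label per sample (at least two groups, N > number of groups).
    """
    n = len(dist)
    groups = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    a = len(groups)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same sum restricted to each group, divided by its size.
    ss_within = 0.0
    for members in groups.values():
        m = len(members)
        ss_within += sum(
            dist[members[p]][members[q]] ** 2
            for p in range(m)
            for q in range(p + 1, m)
        ) / m
    ss_between = ss_total - ss_within
    # Ratio of between-group to within-group mean squares.
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical (Torgerson) MDS; eigenvectors via power iteration with deflation."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J (column means equal row means for symmetric D).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # B v is zero: no variance left to extract
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        # Negative eigenvalues (non-Euclidean distances) are clamped to zero.
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][c] = v[i] * scale
        # Deflate so the next pass converges to the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def euclidean_dist(points):
    """Pairwise Euclidean distance matrix for a list of coordinate tuples or lists."""
    return [[math.dist(p, q) for q in points] for p in points]


points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["A", "A", "A", "B", "B", "B"]
d = euclidean_dist(points)
embedding = classical_mds(d, k=2)
print(pseudo_f(d, labels), pseudo_f(euclidean_dist(embedding), labels))
```

Because the toy points lie exactly in two dimensions, the two printed values agree up to rounding. When the data need more dimensions than are kept, the two values generally differ. Power iteration converges to the largest-magnitude eigenvalue, so this sketch assumes a Euclidean-embeddable distance matrix. For real microbiome distances (for example, non-Euclidean dissimilarities), a library eigensolver would be the safer choice.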

Suggested Citation

  • Hyungseok Kim & Soobin Kim & Jeffrey A Kimbrel & Megan M Morris & Xavier Mayali & Cullen R Buie, 2026. "Multidimensional scaling informed by F-statistic: Visualizing grouped microbiome data with inference," PLOS Computational Biology, Public Library of Science, vol. 22(4), pages 1-22, April.
  • Handle: RePEc:plo:pcbi00:1014102
    DOI: 10.1371/journal.pcbi.1014102

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014102
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1014102&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1014102?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.