
Coherent cross-modal generation of synthetic biomedical data to advance multimodal precision medicine

Authors
  • Raffaele Marchesi
  • Nicolò Lazzaro
  • Walter Endrizzi
  • Gianluca Leonardi
  • Matteo Pozzi
  • Flavio Ragni
  • Stefano Bovo
  • Monica Moroni
  • Venet Osmani
  • Giuseppe Jurman

Abstract

Integration of multimodal, multi-omics data is critical for advancing precision medicine, yet its application is frequently limited by incomplete datasets in which one or more modalities are missing. To address this challenge, we developed a generative framework capable of synthesizing any missing modality from an arbitrary subset of available modalities. We introduce Coherent Denoising, a novel ensemble-based generative diffusion method that aggregates predictions from multiple specialized, single-condition models and enforces consensus during the sampling process. We compare this approach against a multi-condition generative model that uses a flexible masking strategy to handle arbitrary subsets of inputs. The results show that our architectures generate high-fidelity data that preserve the complex biological signals required for downstream tasks. We demonstrate that the generated synthetic data can be used to maintain the performance of predictive models on incomplete patient profiles and can support counterfactual analysis to guide the prioritization of diagnostic tests. We validated the framework's efficacy on a large-scale multimodal, multi-omics cohort of over 10,000 samples from The Cancer Genome Atlas (TCGA), spanning 20 tumor types and including data modalities such as copy-number alterations (CNA), transcriptomics (RNA-Seq), proteomics (RPPA), and histopathology (WSI). This work establishes a robust and flexible generative framework to address sparsity in multimodal datasets, providing a key step toward improving precision oncology.

Author summary: To make precision medicine a reality, doctors need to understand a patient's status from many angles, using different data types such as genetic information (omics) and tissue slide images (histopathology). The problem is that most patient records are incomplete, with one or more of these data types missing, which can limit the effectiveness of powerful predictive tools.
We have built a generative AI system designed to learn the complex biological patterns that connect all these different data types. Using the patient data that is available, our system can then generate a realistic, synthetic version of any missing piece. We developed a novel method called Coherent Denoising to do this, which is flexible and helps protect patient privacy. We validated this approach on a large dataset of over 10,000 cancer patient profiles. We show that our AI-generated data is high-fidelity and can complete these sparse patient profiles, allowing AI models for crucial tasks such as cancer staging and survival prediction to maintain their performance even with incomplete patient data. We also demonstrate how this tool can be used to evaluate the potential impact of new tests, helping to prioritize which expensive diagnostic tests would be most beneficial for a patient.
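The consensus idea behind Coherent Denoising, as summarized in the abstract, can be illustrated with a minimal sketch. It assumes that "enforcing consensus" means averaging the noise predictions of the single-condition models at every reverse step of a standard DDPM sampler, so that all models steer one shared trajectory. The paper's actual aggregation rule, network interfaces, and noise schedule are not described here. The names `consensus_sample` and `make_point_mass_model`, the modality keys `"rna"` and `"cna"`, and the linear beta schedule are hypothetical. The toy "models" return the exact noise for a fixed target so that the result can be checked by hand.

```python
import math
import random
from typing import Callable, Dict, List, Optional

Vector = List[float]
# Hypothetical interface: a single-condition denoiser predicts the noise in x_t
# from the timestep index and ONE observed conditioning modality.
NoisePredictor = Callable[[Vector, int, Vector], Vector]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_t (a common DDPM default)."""
    step = (beta_end - beta_start) / max(num_steps - 1, 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bars(betas: List[float]) -> List[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def consensus_sample(models: Dict[str, NoisePredictor],
                     conditions: Dict[str, Vector],
                     dim: int,
                     num_steps: int = 50,
                     weights: Optional[Dict[str, float]] = None,
                     seed: int = 0) -> Vector:
    """DDPM ancestral sampling in which every reverse step uses the weighted
    mean of the noise predictions from all models whose conditioning
    modality is observed (an assumed form of the consensus step)."""
    available = [name for name in conditions if name in models]
    if not available:
        raise ValueError("need at least one observed modality with a model")
    w = {name: (weights or {}).get(name, 1.0) for name in available}
    total_w = sum(w.values())

    betas = linear_beta_schedule(num_steps)
    alpha_bars = cumulative_alpha_bars(betas)
    rng = random.Random(seed)

    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        # Consensus: one shared noise estimate from all single-condition models.
        eps = [0.0] * dim
        for name in available:
            pred = models[name](x, t, conditions[name])
            for i in range(dim):
                eps[i] += w[name] * pred[i] / total_w
        # Standard DDPM posterior-mean update using the shared estimate.
        alpha_t = 1.0 - betas[t]
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        x = [(x[i] - coef * eps[i]) / math.sqrt(alpha_t) for i in range(dim)]
        if t > 0:  # no noise is injected at the final step
            sigma = math.sqrt(betas[t])
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x


def make_point_mass_model(mapping: Callable[[Vector], Vector],
                          num_steps: int = 50) -> NoisePredictor:
    """Toy stand-in for a trained network: returns the exact noise for a
    target that is a fixed function of the condition."""
    alpha_bars = cumulative_alpha_bars(linear_beta_schedule(num_steps))

    def predict(x_t: Vector, t: int, cond: Vector) -> Vector:
        ab = alpha_bars[t]
        return [(xi - math.sqrt(ab) * mi) / math.sqrt(1.0 - ab)
                for xi, mi in zip(x_t, mapping(cond))]
    return predict


if __name__ == "__main__":
    models = {
        "rna": make_point_mass_model(lambda c: [2.0 * v for v in c]),
        "cna": make_point_mass_model(lambda c: [v + 1.0 for v in c]),
    }
    # Targets: rna -> [2, 0], cna -> [2, 4]; the consensus sample ends near their mean.
    print(consensus_sample(models, {"rna": [1.0, 0.0], "cna": [1.0, 3.0]}, dim=2))
```

Averaging in noise space keeps every model conditioned on the same intermediate state `x`, which is one plausible way to obtain a single coherent sample rather than several independent ones. Dropping a key from `conditions` simply removes that model from the ensemble, mirroring generation from an arbitrary subset of available modalities.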

Suggested Citation

  • Raffaele Marchesi & Nicolò Lazzaro & Walter Endrizzi & Gianluca Leonardi & Matteo Pozzi & Flavio Ragni & Stefano Bovo & Monica Moroni & Venet Osmani & Giuseppe Jurman, 2026. "Coherent cross-modal generation of synthetic biomedical data to advance multimodal precision medicine," PLOS Computational Biology, Public Library of Science, vol. 22(4), pages 1-23, April.
  • Handle: RePEc:plo:pcbi00:1013455
    DOI: 10.1371/journal.pcbi.1013455

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013455
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1013455&type=printable
    Download Restriction: no

