Tissue-specific transfer learning improves functional variant and therapeutic target discoveries in breast and prostate cancer

Author

Listed:
  • Qing Li
  • Dinghao Wang
  • Zilong Zhang
  • Deshan Perera
  • Zhishan Chen
  • Wanqing Wen
  • M Ethan MacDonald
  • Weijia Cai
  • Jun Yan
  • Xiao-Ou Shu
  • Wei Zheng
  • Xingyi Guo
  • Quan Long

Abstract

DNA foundation models trained on large-scale genomic and epigenomic datasets have shown promise for regulatory variant interpretation, yet their application to tissue-specific contexts remains limited. Here, we present a transfer learning (TL) framework to adapt Enformer, a deep neural network trained on 5,313 multi-omics tracks, to breast and prostate cancer using 275 and 357 tissue-specific transcription factor (TF) ChIP–seq tracks, respectively. We computed tissue-specific cis-regulatory activity (tCRA) scores for millions of single-nucleotide variants (SNVs) in genome-wide association study (GWAS) datasets and prioritized high-impact SNV subsets (1M, 1.5M, and 2M). These TL-prioritized variants demonstrated consistently greater enrichment in tissue-specific enhancers, cancer GWAS risk variants, and ClinVar pathogenic variants compared to the original Enformer model. Transcriptome-wide association studies (TWAS) using TL-based SNVs identified more cancer-relevant genes, many of which exhibited functional essentiality (DepMap), therapeutic tractability (drug databases), and disease relevance (DisGeNET). Notably, TL models outperformed the base model in identifying genes enriched for drug targets and clinically relevant disease associations. Our results show that TL-derived tCRA scores enhance regulatory variant prioritization and improve susceptibility gene discovery in a tissue-specific manner. Our study provides a generalizable framework for tailoring foundation models to disease-relevant contexts, with implications for variant interpretation, therapeutic target discovery, and precision medicine.

Author summary: Understanding how genetic changes contribute to cancer remains a central challenge in human genetics. While powerful deep learning models like Enformer can predict how DNA variants might affect gene regulation, they are often trained on very broad data and may not capture the tissue-specific mechanisms relevant to specific cancers. In this study, we developed a transfer learning (TL) approach to adapt Enformer for breast and prostate cancer by retraining it on datasets specific to each cancer type. This allowed us to compute regulatory scores for millions of genetic variants and identify those most likely to affect cancer risk. We found that our TL-enhanced models perform better at highlighting genetic variants located in tissue-specific regulatory regions. Using these high-priority variants, we linked genes to cancer risk through transcriptome-wide association studies (TWAS) and showed that many of the identified genes are important for cancer cell growth and are potential drug targets. Our findings demonstrate how adapting existing models to more disease-relevant data can significantly improve our ability to uncover genes and variants involved in cancer. This work provides a new tool for researchers aiming to understand genetic risk and discover future therapies.
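
The prioritization and enrichment steps summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each variant already has a numeric score, that variants are ranked by absolute score (an assumption, since the abstract does not state the ranking rule), and that enrichment is measured as a simple fold change over the background rate (a stand-in for whatever statistic the study used). The function names `top_k_variants` and `fold_enrichment`, the variant IDs, and the scores are all hypothetical.

```python
import heapq


def top_k_variants(scores, k):
    """Return the IDs of the k variants with the largest absolute score.

    Ranking by absolute value is an assumption made for this sketch.
    """
    return heapq.nlargest(k, scores, key=lambda vid: abs(scores[vid]))


def fold_enrichment(selected, annotated, n_total):
    """Share of selected variants inside `annotated`, divided by the background share."""
    selected = set(selected)
    hit_rate = len(selected & annotated) / len(selected)
    background_rate = len(annotated) / n_total
    return hit_rate / background_rate


# Toy data: hypothetical variant IDs and scores, not taken from the study.
scores = {"rs1": 0.9, "rs2": -0.1, "rs3": -0.8, "rs4": 0.2, "rs5": 0.05, "rs6": 0.4}
enhancer_variants = {"rs1", "rs3", "rs5"}  # hypothetical annotation set

top = top_k_variants(scores, 2)
enrichment = fold_enrichment(top, enhancer_variants, len(scores))
print(top, enrichment)
```

In the study the selected subsets hold 1M, 1.5M, and 2M SNVs rather than two, so a real implementation would more likely rely on array-based sorting than on a dictionary and a heap.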

Suggested Citation

  • Qing Li & Dinghao Wang & Zilong Zhang & Deshan Perera & Zhishan Chen & Wanqing Wen & M Ethan MacDonald & Weijia Cai & Jun Yan & Xiao-Ou Shu & Wei Zheng & Xingyi Guo & Quan Long, 2026. "Tissue-specific transfer learning improves functional variant and therapeutic target discoveries in breast and prostate cancer," PLOS Genetics, Public Library of Science, vol. 22(5), pages 1-19, May.
  • Handle: RePEc:plo:pgen00:1012145
    DOI: 10.1371/journal.pgen.1012145

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1012145
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1012145&type=printable
    Download Restriction: no

