Printed from https://ideas.repec.org/a/plo/pgen00/1010597.html

Transfer learning with false negative control improves polygenic risk prediction

Author

Listed:
  • Xinge Jessie Jeng
  • Yifei Hu
  • Vaishnavi Venkat
  • Tzu-Pin Lu
  • Jung-Ying Tzeng

Abstract

A polygenic risk score (PRS) aggregates the effects of variants across the genome to estimate an individual’s genetic predisposition for a given trait. PRS analysis typically takes two input data sets: base data for effect size estimation and target data for individual-level prediction. Given the availability of large-scale base data, it is increasingly common for the ancestral backgrounds of the base and target data not to match perfectly. In this paper, we treat the GWAS summary information obtained from the base data as knowledge learned from a pre-trained model, and adopt a transfer learning framework that leverages this knowledge, whether or not the base data share the ancestral background of the target samples, to build prediction models for target individuals. Our proposed transfer learning framework consists of two main steps: (1) conducting false negative control (FNC) marginal screening to extract useful knowledge from the base data; and (2) performing joint model training to integrate the knowledge extracted from the base data with the target training data for accurate trans-data prediction. This new approach can significantly enhance the computational and statistical efficiency of joint-model training, alleviate over-fitting, and facilitate more accurate trans-data prediction whether the level of heterogeneity between the target and base data sets is low or high.

Author summary

A polygenic risk score (PRS) quantifies the genetic predisposition for a trait. PRS construction typically takes two input datasets: base data for variant-effect estimation and target data for individual-level prediction. Given the availability of large-scale base data, it is increasingly common for the ancestral backgrounds of the base and target data not to match perfectly. In this paper, we introduce a PRS method under a transfer learning framework that leverages the knowledge learned from the base data, whether or not they share the background of the target samples, to build prediction models for target individuals. Our method first uses a unique false-negative control strategy to extract useful information from the base data while retaining a high proportion of true signals; it then applies the extracted information to re-train PRS models in a statistically and computationally efficient fashion. Numerical studies based on simulated and real data show that the proposed method can increase the accuracy and robustness of polygenic prediction across different levels of heterogeneity between base and target data and across sample sizes, reduce the computational cost of model re-training, and yield more parsimonious models that can facilitate PRS interpretation and/or the exploration of complex, non-additive PRS models.
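
The two-step idea described above (screen variants using base-data summary statistics, then re-train a joint model on the target training data) can be illustrated with a minimal sketch. It is not the authors' implementation. The screening step is a plain top-fraction p-value cut-off standing in for the FNC procedure, whose details this abstract does not give, and ridge regression stands in for the joint model. All function names, the `keep_fraction` and `penalty` parameters, and the simulated data are hypothetical.

```python
import numpy as np


def polygenic_score(genotypes, weights):
    """PRS as a weighted sum of allele counts: one score per individual."""
    return genotypes @ weights


def screen_variants(base_pvalues, keep_fraction):
    """Step 1 stand-in: keep the variants with the smallest base-data p-values.

    keep_fraction is a hypothetical tuning knob; it is NOT how the FNC
    procedure in the paper chooses its cut-off.
    """
    n_keep = max(1, int(round(keep_fraction * base_pvalues.size)))
    return np.sort(np.argsort(base_pvalues)[:n_keep])


def fit_ridge(genotypes, phenotype, penalty):
    """Step 2 stand-in: closed-form ridge regression on the retained variants."""
    x_mean = genotypes.mean(axis=0)
    y_mean = phenotype.mean()
    xc = genotypes - x_mean
    yc = phenotype - y_mean
    gram = xc.T @ xc + penalty * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


# Simulated data only (no real genotypes): 10 causal variants out of 500.
rng = np.random.default_rng(0)
n_train, n_test, n_variants = 200, 100, 500
causal = np.arange(10)
beta = np.zeros(n_variants)
beta[causal] = 0.5
geno = rng.binomial(2, 0.3, size=(n_train + n_test, n_variants)).astype(float)
pheno = geno @ beta + rng.normal(size=n_train + n_test)

# Simulated base-data p-values: tiny for causal variants, uniform otherwise.
base_p = rng.uniform(size=n_variants)
base_p[causal] = rng.uniform(0.0, 1e-6, size=causal.size)

# Step 1: screening reduces 500 candidate variants to 25.
kept = screen_variants(base_p, keep_fraction=0.05)

# Step 2: joint model trained on target training samples, kept variants only.
weights, intercept = fit_ridge(geno[:n_train][:, kept], pheno[:n_train], penalty=1.0)

# Predict for held-out target individuals and measure accuracy.
pred = polygenic_score(geno[n_train:][:, kept], weights) + intercept
test_corr = np.corrcoef(pred, pheno[n_train:])[0, 1]
print(f"variants kept: {kept.size}, held-out correlation: {test_corr:.2f}")
```

In this toy setting, fitting 25 weights instead of 500 on 200 training samples is what makes the joint model cheap to train and less prone to over-fitting. A screening step that controls false negatives aims to keep that reduction from discarding true signals.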

Suggested Citation

  • Xinge Jessie Jeng & Yifei Hu & Vaishnavi Venkat & Tzu-Pin Lu & Jung-Ying Tzeng, 2023. "Transfer learning with false negative control improves polygenic risk prediction," PLOS Genetics, Public Library of Science, vol. 19(11), pages 1-17, November.
  • Handle: RePEc:plo:pgen00:1010597
    DOI: 10.1371/journal.pgen.1010597

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010597
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1010597&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pgen.1010597?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jiacheng Miao & Hanmin Guo & Gefei Song & Zijie Zhao & Lin Hou & Qiongshi Lu, 2023. "Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    2. Qile Dai & Geyu Zhou & Hongyu Zhao & Urmo Võsa & Lude Franke & Alexis Battle & Alexander Teumer & Terho Lehtimäki & Olli T. Raitakari & Tõnu Esko & Michael P. Epstein & Jingjing Yang, 2023. "OTTERS: a powerful TWAS framework leveraging summary-level reference data," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    3. Clara Albiñana & Zhihong Zhu & Andrew J. Schork & Andrés Ingason & Hugues Aschard & Isabell Brikell & Cynthia M. Bulik & Liselotte V. Petersen & Esben Agerbo & Jakob Grove & Merete Nordentoft & David , 2023. "Multi-PGS enhances polygenic prediction by combining 937 polygenic scores," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    4. Ruth Heller & Saharon Rosset, 2021. "Optimal control of false discovery criteria in the two‐group model," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(1), pages 133-155, February.
    5. Chen Wang & Havell Markus & Avantika R. Diwadkar & Chachrit Khunsriraksakul & Laura Carrel & Bingshan Li & Xue Zhong & Xingyan Wang & Xiaowei Zhan & Galen T. Foulke & Nancy J. Olsen & Dajiang J. Liu &, 2025. "Integrating electronic health records and GWAS summary statistics to predict the progression of autoimmune diseases from preclinical stages," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    6. Yash Patel & Jean Shin & Eeva Sliz & Ariana Tang & Aniket Mishra & Rui Xia & Edith Hofer & Hema Sekhar Reddy Rajula & Ruiqi Wang & Frauke Beyer & Katrin Horn & Max Riedl & Jing Yu & Henry Völzke & Rob, 2024. "Genetic risk factors underlying white matter hyperintensities and cortical atrophy," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    7. Jordi Manuello & Joosung Min & Paul McCarthy & Fidel Alfaro-Almagro & Soojin Lee & Stephen Smith & Lloyd T. Elliott & Anderson M. Winkler & Gwenaëlle Douaud, 2024. "The effects of genetic and modifiable risk factors on brain regions vulnerable to ageing and disease," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    8. Ruoyu Tian & Tian Ge & Hyeokmoon Kweon & Daniel B. Rocha & Max Lam & Jimmy Z. Liu & Kritika Singh & Daniel F. Levey & Joel Gelernter & Murray B. Stein & Ellen A. Tsai & Hailiang Huang & Christopher F., 2024. "Whole-exome sequencing in UK Biobank reveals rare genetic architecture for depression," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    9. Tuomo Hartonen & Bradley Jermy & Hanna Sõnajalg & Pekka Vartiainen & Kristi Krebs & Andrius Vabalas & Tuija Leino & Hanna Nohynek & Jonas Sivelä & Reedik Mägi & Mark Daly & Hanna M. Ollila & Lili Mila, 2023. "Nationwide health, socio-economic and genetic predictors of COVID-19 vaccination status in Finland," Nature Human Behaviour, Nature, vol. 7(7), pages 1069-1083, July.
    10. Adrienne Tin & Pascal Schlosser & Pamela R. Matias-Garcia & Chris H. L. Thio & Roby Joehanes & Hongbo Liu & Zhi Yu & Antoine Weihs & Anselm Hoppmann & Franziska Grundner-Culemann & Josine L. Min & Vic, 2021. "Epigenome-wide association study of serum urate reveals insights into urate co-regulation and the SLC2A9 locus," Nature Communications, Nature, vol. 12(1), pages 1-18, December.
    11. James P. Pirruccello & Paolo Achille & Seung Hoan Choi & Joel T. Rämö & Shaan Khurshid & Mahan Nekoui & Sean J. Jurgens & Victor Nauffal & Shinwan Kany & Kenney Ng & Samuel F. Friedman & Puneet Batra , 2024. "Deep learning of left atrial structure and function provides link to atrial fibrillation risk," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    12. Mattia Cordioli & Andrea Corbetta & Hanna Maria Kariis & Sakari Jukarainen & Pekka Vartiainen & Tuomo Kiiskinen & Matteo Ferro & Markus Perola & Mikko Niemi & Samuli Ripatti & Kelli Lehto & Lili Milan, 2024. "Socio-demographic and genetic risk factors for drug adherence and persistence across 5 common medication classes," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    13. Sara J Cromer & Victoria Chen & Christopher Han & William Marshall & Shekina Emongo & Evelyn Greaux & Tim Majarian & Jose C Florez & Josep Mercader & Miriam S Udler, 2022. "Algorithmic identification of atypical diabetes in electronic health record (EHR) systems," PLOS ONE, Public Library of Science, vol. 17(12), pages 1-13, December.
    14. Bingxin Zhao & Yujue Li & Zirui Fan & Zhenyi Wu & Juan Shu & Xiaochen Yang & Yilin Yang & Xifeng Wang & Bingxuan Li & Xiyao Wang & Carlos Copana & Yue Yang & Jinjie Lin & Yun Li & Jason L. Stein & Joa, 2024. "Eye-brain connections revealed by multimodal retinal and brain imaging genetics," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    15. Eugene Lin & Yu-Ting Yan & Mu-Hong Chen & Albert C. Yang & Po-Hsiu Kuo & Shih-Jen Tsai, 2025. "Gene clusters linked to insulin resistance identified in a genome-wide study of the Taiwan Biobank population," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    16. Magdalena Zimoń & Yunfeng Huang & Anthi Trasta & Aliaksandr Halavatyi & Jimmy Z. Liu & Chia-Yen Chen & Peter Blattmann & Bernd Klaus & Christopher D. Whelan & David Sexton & Sally John & Wolfgang Hube, 2021. "Pairwise effects between lipid GWAS genes modulate lipid plasma levels and cellular uptake," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    17. Jingning Zhang & Jianan Zhan & Jin Jin & Cheng Ma & Ruzhang Zhao & Jared O’Connell & Yunxuan Jiang & Bertram L. Koelsch & Haoyu Zhang & Nilanjan Chatterjee, 2024. "An ensemble penalized regression method for multi-ancestry polygenic risk prediction," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    18. Song Zhai & Hong Zhang & Devan V. Mehrotra & Judong Shen, 2022. "Pharmacogenomics polygenic risk score for drug response prediction using PRS-PGx methods," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    19. repec:plo:pone00:0071494 is not listed on IDEAS
    20. Rikifumi Ohta & Yosuke Tanigawa & Yuta Suzuki & Manolis Kellis & Shinichi Morishita, 2024. "A polygenic score method boosted by non-additive models," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    21. Geyu Zhou & Hongyu Zhao, 2021. "A fast and robust Bayesian nonparametric method for prediction of complex traits using summary statistics," PLOS Genetics, Public Library of Science, vol. 17(7), pages 1-17, July.
