Author
Listed:
- Xinge Jessie Jeng
- Yifei Hu
- Vaishnavi Venkat
- Tzu-Pin Lu
- Jung-Ying Tzeng
Abstract
Polygenic risk score (PRS) is a quantity that aggregates the effects of variants across the genome and estimates an individual’s genetic predisposition for a given trait. PRS analysis typically contains two input data sets: base data for effect size estimation and target data for individual-level prediction. Given the availability of large-scale base data, it becomes more common that the ancestral background of base and target data do not perfectly match. In this paper, we treat the GWAS summary information obtained in the base data as knowledge learned from a pre-trained model, and adopt a transfer learning framework to effectively leverage the knowledge learned from the base data that may or may not have similar ancestral background as the target samples to build prediction models for target individuals. Our proposed transfer learning framework consists of two main steps: (1) conducting false negative control (FNC) marginal screening to extract useful knowledge from the base data; and (2) performing joint model training to integrate the knowledge extracted from base data with the target training data for accurate trans-data prediction. This new approach can significantly enhance the computational and statistical efficiency of joint-model training, alleviate over-fitting, and facilitate more accurate trans-data prediction when heterogeneity level between target and base data sets is small or high.Author summary: Polygenic risk score (PRS) can quantify the genetic predisposition for a trait. PRS construction typically contains two input datasets: base data for variant-effect estimation and target data for individual-level prediction. Given the availability of large-scale base data, it becomes common that the ancestral background of base and target data do not perfectly match. In this paper, we introduce a PRS method under a transfer learning framework to effectively leverage the knowledge learned from the base data that may or may not have similar background as the target samples to build prediction models for target individuals. Our method first utilizes a unique false-negative control strategy to extract useful information from base data while ensuring to retain a high proportion of true signals; it then applies the extracted information to re-train PRS models in a statistically and computationally efficient fashion. We use numerical studies based on simulated and real data to show that the proposed method can increase the accuracy and robustness of polygenic prediction across different ranges of heterogeneities between base and target data and sample sizes, reduce computational cost in model re-training, and result in more parsimonious models that can facilitate PRS interpretation and/or exploration of complex, non-additive PRS models.
Suggested Citation
Xinge Jessie Jeng & Yifei Hu & Vaishnavi Venkat & Tzu-Pin Lu & Jung-Ying Tzeng, 2023.
"Transfer learning with false negative control improves polygenic risk prediction,"
PLOS Genetics, Public Library of Science, vol. 19(11), pages 1-17, November.
Handle:
RePEc:plo:pgen00:1010597
DOI: 10.1371/journal.pgen.1010597
Download full text from publisher
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pgen00:1010597. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosgenetics (email available below). General contact details of provider: https://journals.plos.org/plosgenetics/ .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.