Author
Listed:
- Fabio Morgante
- Peter Carbonetto
- Gao Wang
- Yuxin Zou
- Abhishek Sarkar
- Matthew Stephens
Abstract
Predicting phenotypes from genotypes is a fundamental task in quantitative genetics. With technological advances, it is now possible to measure multiple phenotypes in large samples. Multiple phenotypes can share their genetic component; therefore, modeling these phenotypes jointly may improve prediction accuracy by leveraging effects that are shared across phenotypes. However, effects can be shared across phenotypes in a variety of ways, so computationally efficient statistical methods are needed that can accurately and flexibly capture patterns of effect sharing. Here, we describe new Bayesian multivariate, multiple regression methods that, by using flexible priors, are able to model and adapt to different patterns of effect sharing and specificity across phenotypes. Simulation results show that these new methods are fast and improve prediction accuracy compared with existing methods in a wide range of settings where effects are shared. Further, in settings where effects are not shared, our methods still perform competitively with state-of-the-art methods. In real data analyses of expression data in the Genotype Tissue Expression (GTEx) project, our methods improve prediction performance on average for all tissues, with the greatest gains in tissues where effects are strongly shared, and in the tissues with smaller sample sizes. While we use gene expression prediction to illustrate our methods, the methods are generally applicable to any multi-phenotype applications, including prediction of polygenic scores and breeding values. Thus, our methods have the potential to provide improvements across fields and organisms.Author summary: Predicting phenotypes from genotypes is a fundamental problem in quantitative genetics. Thanks to recent advances, it is increasingly feasible to collect data on many phenotypes and genome-wide genotypes in large samples. Here, we tackle the problem of predicting multiple phenotypes from genotypes using a new method based on a multivariate, multiple linear regression model. Although the use of a multivariate, multiple linear regression model is not new, in this paper we introduce a flexible and computationally efficient empirical Bayes approach based on this model. This approach uses a prior that captures how the effects of genotypes on phenotypes are shared across the different phenotypes, and then the prior is adapted to the data in order to capture the most prominent sharing patterns present in the data. We assess the benefits of this flexible Bayesian approach in simulated genetic data sets, and we illustrate its application in predicting gene expression measured in multiple human tissues. We show that our methods can outperform competing methods in terms of prediction accuracy, and the computations involved in fitting the model and making the predictions scale well to large data sets.
Suggested Citation
Fabio Morgante & Peter Carbonetto & Gao Wang & Yuxin Zou & Abhishek Sarkar & Matthew Stephens, 2023.
"A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes,"
PLOS Genetics, Public Library of Science, vol. 19(7), pages 1-21, July.
Handle:
RePEc:plo:pgen00:1010539
DOI: 10.1371/journal.pgen.1010539
Download full text from publisher
References listed on IDEAS
- repec:plo:pgen00:1005048 is not listed on IDEAS
Full references (including those not matched with items on IDEAS)
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pgen00:1010539. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosgenetics (email available below). General contact details of provider: https://journals.plos.org/plosgenetics/ .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.