Printed from https://ideas.repec.org/a/plo/pgen00/1003657.html

GUESS-ing Polygenic Associations with Multiple Phenotypes Using a GPU-Based Evolutionary Stochastic Search Algorithm

Author

Listed:
  • Leonardo Bottolo
  • Marc Chadeau-Hyam
  • David I Hastie
  • Tanja Zeller
  • Benoit Liquet
  • Paul Newcombe
  • Loic Yengo
  • Philipp S Wild
  • Arne Schillert
  • Andreas Ziegler
  • Sune F Nielsen
  • Adam S Butterworth
  • Weang Kee Ho
  • Raphaële Castagné
  • Thomas Munzel
  • David Tregouet
  • Mario Falchi
  • François Cambien
  • Børge G Nordestgaard
  • Fredéric Fumeron
  • Anne Tybjærg-Hansen
  • Philippe Froguel
  • John Danesh
  • Enrico Petretto
  • Stefan Blankenberg
  • Laurence Tiret
  • Sylvia Richardson

Abstract

Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex Linkage Disequilibrium patterns between SNPs and correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations and study genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175), when compared with the largest published meta-GWAS (n>100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Amongst the new findings provided by GUESS, we revealed a strong association of SORT1 with TG-APOB and LIPC with TG-HDL phenotypic groups, which were overlooked in the larger meta-GWAS and not revealed by competing approaches, associations that we replicated in two independent cohorts. Moreover, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian, in a simulation study that mimics real-case scenarios. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multi-phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. 
This provides a powerful tool for the analysis of diverse genomic features, for instance including gene expression and exome sequencing data, where complex dependencies are present in the predictor space.

Author Summary: Nowadays, the availability of cheaper and accurate assays to quantify multiple (endo)phenotypes in large population cohorts allows multi-trait studies. However, these studies are limited by the lack of flexible models integrated with efficient computational tools for genome-wide multi SNPs-traits analyses. To overcome this problem, we propose a novel Bayesian analysis strategy and a new algorithmic implementation which exploits parallel processing architecture for fully multivariate modeling of groups of correlated phenotypes at the genome-wide scale. In addition to increased power of our algorithm over alternative Bayesian and well-established non-Bayesian multi-phenotype methods, we provide an application to a real case study of several blood lipid traits, and show how our method recovered most of the major associations and is better at refining multi-trait polygenic associations than alternative methods. We reveal and replicate in independent cohorts new associations with two phenotypic groups that were not detected by competing multivariate approaches and not noticed by a large meta-GWAS. We also discuss the applicability of the proposed method to large meta-analyses involving hundreds of thousands of individuals and to diverse genomic datasets where complex dependencies in the predictor space are present.
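The general idea of stochastically searching over sets of SNPs for a group of correlated traits can be illustrated with a toy sketch. The code below is not the GUESS implementation. It assumes a single-chain Metropolis search with one-SNP add/delete moves (the published method is described as an evolutionary stochastic search running on GPUs) and scores each SNP set with a BIC-style criterion based on the determinant of the joint residual covariance, used here as a stand-in for the Bayesian model described in the abstract. All function names, the simulated data, and the effect sizes are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def residuals(columns, targets):
    """Remove from each target its projection onto span(columns) (Gram-Schmidt)."""
    basis = []
    for col in columns:
        v = list(col)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-10:  # skip columns (nearly) collinear with earlier ones
            basis.append([vi / norm for vi in v])
    out = []
    for t in targets:
        r = list(t)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        out.append(r)
    return out

def log_det_spd(m):
    """Log-determinant of a small symmetric positive-definite matrix."""
    a = [row[:] for row in m]
    q = len(a)
    total = 0.0
    for i in range(q):
        total += math.log(a[i][i])
        for j in range(i + 1, q):
            f = a[j][i] / a[i][i]
            for k in range(i, q):
                a[j][k] -= f * a[i][k]
    return total

def score(gamma, X_cols, Y_cols):
    """BIC-style score of a SNP set for all traits jointly (assumed stand-in)."""
    n, q = len(Y_cols[0]), len(Y_cols)
    chosen = [X_cols[j] for j in range(len(X_cols)) if gamma[j]]
    res = residuals(chosen, Y_cols)
    S = [[dot(res[a], res[b]) / n for b in range(q)] for a in range(q)]
    return -0.5 * n * log_det_spd(S) - 0.5 * sum(gamma) * q * math.log(n)

def stochastic_search(X_cols, Y_cols, n_iter=1000, seed=0):
    """Metropolis search over inclusion indicators; returns inclusion frequencies."""
    rng = random.Random(seed)
    p = len(X_cols)
    gamma = [0] * p
    current = score(gamma, X_cols, Y_cols)
    visits = [0] * p
    for _ in range(n_iter):
        j = rng.randrange(p)
        proposal = gamma[:]
        proposal[j] = 1 - proposal[j]  # add or delete one SNP
        cand = score(proposal, X_cols, Y_cols)
        if cand >= current or rng.random() < math.exp(cand - current):
            gamma, current = proposal, cand
        for k in range(p):
            visits[k] += gamma[k]
    return [v / n_iter for v in visits]

# Simulated example: 8 SNPs coded 0/1/2 and two correlated traits.
# SNP 2 affects both traits, SNP 5 affects only the second trait.
rng = random.Random(1)
n, p = 150, 8
X_cols = [center([rng.choice([0, 1, 2]) for _ in range(n)]) for _ in range(p)]
shared = [rng.gauss(0, 1) for _ in range(n)]  # induces correlation between traits
y1 = center([X_cols[2][i] + shared[i] + rng.gauss(0, 0.5) for i in range(n)])
y2 = center([0.8 * X_cols[2][i] + 0.8 * X_cols[5][i] + shared[i]
             + rng.gauss(0, 0.5) for i in range(n)])
freq = stochastic_search(X_cols, [y1, y2])
print([round(f, 2) for f in freq])
```

The returned frequencies are the fraction of iterations in which each SNP was part of the current set. In this simulation the two SNPs with true effects should be visited far more often than the others. Scoring the traits jointly through the residual covariance determinant is what lets a SNP that affects only one trait stand out once the shared variation between traits is accounted for.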

Suggested Citation

  • Leonardo Bottolo & Marc Chadeau-Hyam & David I Hastie & Tanja Zeller & Benoit Liquet & Paul Newcombe & Loic Yengo & Philipp S Wild & Arne Schillert & Andreas Ziegler & Sune F Nielsen & Adam S Butterwo, 2013. "GUESS-ing Polygenic Associations with Multiple Phenotypes Using a GPU-Based Evolutionary Stochastic Search Algorithm," PLOS Genetics, Public Library of Science, vol. 9(8), pages 1-17, August.
  • Handle: RePEc:plo:pgen00:1003657
    DOI: 10.1371/journal.pgen.1003657

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003657
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1003657&type=printable
    Download Restriction: no

    Citations

    Cited by:

    1. Jose A Seoane & Colin Campbell & Ian N M Day & Juan P Casas & Tom R Gaunt, 2014. "Canonical Correlation Analysis for Gene-Based Pleiotropy Discovery," PLOS Computational Biology, Public Library of Science, vol. 10(10), pages 1-13, October.
    2. L Bottolo & S Richardson, 2019. "Discussion of ‘Gene hunting with hidden Markov model knockoffs’," Biometrika, Biometrika Trust, vol. 106(1), pages 19-22.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Riccardo (Jack) Lucchetti & Luca Pedini, 2020. "ParMA: Parallelised Bayesian Model Averaging for Generalised Linear Models," Working Papers 2020:28, Department of Economics, University of Venice "Ca' Foscari".
    2. Kan Shao & Mitchell J. Small, 2011. "Potential Uncertainty Reduction in Model‐Averaged Benchmark Dose Estimates Informed by an Additional Dose Study," Risk Analysis, John Wiley & Sons, vol. 31(10), pages 1561-1575, October.
    3. Stephan Wachtel & Thomas Otter, 2013. "Successive Sample Selection and Its Relevance for Management Decisions," Marketing Science, INFORMS, vol. 32(1), pages 170-185, September.
    4. Gary M. Koop, 2013. "Forecasting with Medium and Large Bayesian VARS," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 28(2), pages 177-203, March.
    5. Doris A. Oberdabernig & Stefan Humer & Jesus Crespo Cuaresma, 2018. "Democracy, Geography and Model Uncertainty," Scottish Journal of Political Economy, Scottish Economic Society, vol. 65(2), pages 154-185, May.
    6. Bernardi, Mauro & Costola, Michele, 2019. "High-dimensional sparse financial networks through a regularised regression model," SAFE Working Paper Series 244, Leibniz Institute for Financial Research SAFE.
    7. Spiliopoulos, Leonidas, 2010. "The determinants of macroeconomic volatility: A Bayesian model averaging approach," MPRA Paper 26832, University Library of Munich, Germany.
    8. Camba-Méndez, Gonzalo & Serwa, Dobromił, 2016. "Market perception of sovereign credit risk in the euro area during the financial crisis," The North American Journal of Economics and Finance, Elsevier, vol. 37(C), pages 168-189.
    9. Debamita Kundu & Riten Mitra & Jeremy T. Gaskins, 2021. "Bayesian variable selection for multioutcome models through shared shrinkage," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(1), pages 295-320, March.
    10. Wang, Hao, 2010. "Sparse seemingly unrelated regression modelling: Applications in finance and econometrics," Computational Statistics & Data Analysis, Elsevier, vol. 54(11), pages 2866-2877, November.
    11. Gilles Celeux & Mohammed El Anbari & Jean-Michel Marin & Christian P. Robert, 2010. "Regularization in Regression : Comparing Bayesian and Frequentist Methods in a Poorly Informative Situation," Working Papers 2010-43, Center for Research in Economics and Statistics.
    12. Theo S. Eicher & Chris Papageorgiou & Adrian E. Raftery, 2011. "Default priors and predictive performance in Bayesian model averaging, with application to growth determinants," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 26(1), pages 30-55, January/F.
    13. Dimitris Korobilis, 2008. "Forecasting in vector autoregressions with many predictors," Advances in Econometrics, in: Bayesian Econometrics, pages 403-431, Emerald Group Publishing Limited.
    14. Brock, William A. & Durlauf, Steven N. & West, Kenneth D., 2007. "Model uncertainty and policy evaluation: Some theory and empirics," Journal of Econometrics, Elsevier, vol. 136(2), pages 629-664, February.
    15. Dovern, Jonas & Feldkircher, Martin & Huber, Florian, 2016. "Does joint modelling of the world economy pay off? Evaluating global forecasts from a Bayesian GVAR," Journal of Economic Dynamics and Control, Elsevier, vol. 70(C), pages 86-100.
    16. Njindan Iyke, Bernard, 2015. "Macro Determinants of the Real Exchange Rate in a Small Open Small Island Economy: Evidence from Mauritius via BMA," MPRA Paper 68968, University Library of Munich, Germany.
    17. Min Wang & Xiaoqian Sun & Tao Lu, 2015. "Bayesian structured variable selection in linear regression models," Computational Statistics, Springer, vol. 30(1), pages 205-229, March.
    18. Ishwaran, Hemant & Sunil Rao, J., 2011. "Consistency of spike and slab regression," Statistics & Probability Letters, Elsevier, vol. 81(12), pages 1920-1928.
    19. Daniel Spencer & Rajarshi Guhaniyogi & Raquel Prado, 2020. "Joint Bayesian Estimation of Voxel Activation and Inter-regional Connectivity in fMRI Experiments," Psychometrika, Springer;The Psychometric Society, vol. 85(4), pages 845-869, December.
    20. Robert Kohn & Rachida Ouysse, 2007. "Bayesian Variable Selection of Risk Factors in the APT Model," Discussion Papers 2007-32, School of Economics, The University of New South Wales.
