
Simultaneous Discovery, Estimation and Prediction Analysis of Complex Traits Using a Bayesian Mixture Model

Author

Listed:
  • Gerhard Moser
  • Sang Hong Lee
  • Ben J Hayes
  • Michael E Goddard
  • Naomi R Wray
  • Peter M Visscher

Abstract

Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Wellcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.

Author Summary: Most genome-wide association studies performed to date have focused on testing individual genetic markers for associations with phenotype. Recently, methods that analyse the joint effects of multiple markers on genetic variation have provided further insights into the genetic basis of complex human traits. In addition, there is increasing interest in using genotype data for genetic risk prediction of disease. Often disparate analytical methods are used for each of these tasks. We propose a flexible novel approach that, within a single statistical model, simultaneously identifies susceptibility loci, infers the genetic architecture and provides polygenic risk prediction. We illustrate the broad applicability of the approach by considering both simulated and real data. In the analysis of seven common diseases we show large differences in the proportion of genetic variation due to loci with different effect sizes and differences in prediction accuracy between complex traits. These findings are important for future studies and the understanding of the complex genetic architecture of common diseases.

Suggested Citation

  • Gerhard Moser & Sang Hong Lee & Ben J Hayes & Michael E Goddard & Naomi R Wray & Peter M Visscher, 2015. "Simultaneous Discovery, Estimation and Prediction Analysis of Complex Traits Using a Bayesian Mixture Model," PLOS Genetics, Public Library of Science, vol. 11(4), pages 1-22, April.
  • Handle: RePEc:plo:pgen00:1004969
    DOI: 10.1371/journal.pgen.1004969

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004969
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1004969&type=printable
    Download Restriction: no



    Citations


    Cited by:

    1. Cox Lwaka Tamba & Yuan-Li Ni & Yuan-Ming Zhang, 2017. "Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies," PLOS Computational Biology, Public Library of Science, vol. 13(1), pages 1-20, January.
    2. Katharina B Böndel & Susanne A Kraemer & Toby Samuels & Deirdre McClean & Josianne Lachapelle & Rob W Ness & Nick Colegrave & Peter D Keightley, 2019. "Inferring the distribution of fitness effects of spontaneous mutations in Chlamydomonas reinhardtii," PLOS Biology, Public Library of Science, vol. 17(6), pages 1-24, June.
    3. Malka Gorfine & Sonja I Berndt & Jenny Chang-Claude & Michael Hoffmeister & Loic Le Marchand & John Potter & Martha L Slattery & Nir Keret & Ulrike Peters & Li Hsu, 2017. "Heritability Estimation using a Regularized Regression Approach (HERRA): Applicable to continuous, dichotomous or age-at-onset outcome," PLOS ONE, Public Library of Science, vol. 12(8), pages 1-19, August.
    4. M. S. Clark & J. I. Hoffman & L. S. Peck & L. Bargelloni & D. Gande & C. Havermans & B. Meyer & T. Patarnello & T. Phillips & K. R. Stoof-Leichsenring & D. L. J. Vendrami & A. Beck & G. Collins & M. W, 2023. "Multi-omics for studying and understanding polar life," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    5. Theo Meuwissen & Ben Hayes & Iona MacLeod & Michael Goddard, 2022. "Identification of Genomic Variants Causing Variation in Quantitative Traits: A Review," Agriculture, MDPI, vol. 12(10), pages 1-11, October.
    6. Gao Wang & Abhishek Sarkar & Peter Carbonetto & Matthew Stephens, 2020. "A simple new approach to variable selection in regression, with application to genetic fine mapping," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(5), pages 1273-1300, December.
    7. Marion Patxot & Daniel Trejo Banos & Athanasios Kousathanas & Etienne J. Orliac & Sven E. Ojavee & Gerhard Moser & Alexander Holloway & Julia Sidorenko & Zoltan Kutalik & Reedik Mägi & Peter M. Vissch, 2021. "Probabilistic inference of the genetic architecture underlying functional enrichment of complex traits," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    8. Ye, Mao & Zhang, Peng & Nie, Lizhen, 2018. "Clustering sparse binary data with hierarchical Bayesian Bernoulli mixture model," Computational Statistics & Data Analysis, Elsevier, vol. 123(C), pages 32-49.
    9. Carla Márquez-Luna & Steven Gazal & Po-Ru Loh & Samuel S. Kim & Nicholas Furlotte & Adam Auton & Alkes L. Price, 2021. "Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets," Nature Communications, Nature, vol. 12(1), pages 1-11, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Frommlet, Florian & Ruhaltinger, Felix & Twaróg, Piotr & Bogdan, Małgorzata, 2012. "Modified versions of Bayesian Information Criterion for genome-wide association studies," Computational Statistics & Data Analysis, Elsevier, vol. 56(5), pages 1038-1051.
    2. Ahmed Ismaïl & Hartikainen Anna-Liisa & Järvelin Marjo-Riitta & Richardson Sylvia, 2011. "False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-20, November.
    3. Szefer Elena & Graham Jinko & Lu Donghuan & Beg Mirza Faisal & Nathoo Farouk, 2017. "Multivariate association between single-nucleotide polymorphisms in Alzgene linkage regions and structural changes in the brain: discovery, refinement and validation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 349-365, December.
    4. Lee Anthony & Caron Francois & Doucet Arnaud & Holmes Chris, 2012. "Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-31, January.
    5. Hai-Yan Lü & Xiao-Fen Liu & Shi-Ping Wei & Yuan-Ming Zhang, 2011. "Epistatic Association Mapping in Homozygous Crop Cultivars," PLOS ONE, Public Library of Science, vol. 6(3), pages 1-10, March.
    6. Claude Renaux & Laura Buzdugan & Markus Kalisch & Peter Bühlmann, 2020. "Hierarchical inference for genome-wide association studies: a view on methodology with software," Computational Statistics, Springer, vol. 35(1), pages 1-40, March.
    7. Silver Matt & Montana Giovanni & Alzheimer's Disease Neuroimaging Initiative, 2012. "Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(1), pages 1-43, January.
    8. Gabriel E Hoffman & Benjamin A Logsdon & Jason G Mezey, 2013. "PUMA: A Unified Framework for Penalized Multiple Regression Analysis of GWAS Data," PLOS Computational Biology, Public Library of Science, vol. 9(6), pages 1-19, June.
    9. Laura N Anderson & Laurent Briollais & Helen C Atkinson & Julie A Marsh & Jingxiong Xu & Kristin L Connor & Stephen G Matthews & Craig E Pennell & Stephen J Lye, 2014. "Investigation of Genetic Variants, Birthweight and Hypothalamic-Pituitary-Adrenal Axis Function Suggests a Genetic Variant in the SERPINA6 Gene Is Associated with Corticosteroid Binding Globulin in th," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-8, April.
    10. Tomi Peltola & Pekka Marttinen & Aki Vehtari, 2012. "Finite Adaptation and Multistep Moves in the Metropolis-Hastings Algorithm for Variable Selection in Genome-Wide Association Analysis," PLOS ONE, Public Library of Science, vol. 7(11), pages 1-11, November.
    11. Castro, Bruno M. & Lemes, Renan B. & Cesar, Jonatas & Hünemeier, Tábita & Leonardi, Florencia, 2018. "A model selection approach for multiple sequence segmentation and dimensionality reduction," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 319-330.
    12. Gao Wang & Abhishek Sarkar & Peter Carbonetto & Matthew Stephens, 2020. "A simple new approach to variable selection in regression, with application to genetic fine mapping," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(5), pages 1273-1300, December.
