IDEAS home Printed from https://ideas.repec.org/a/plo/pgen00/1007856.html

Bayesian multiple logistic regression for case-control GWAS

Author

Listed:
  • Saikat Banerjee
  • Lingyao Zeng
  • Heribert Schunkert
  • Johannes Söding

Abstract

Genetic variants in genome-wide association studies (GWAS) are tested for disease association mostly using simple regression, one variant at a time. Standard approaches to improve power in detecting disease-associated SNPs use multiple regression with Bayesian variable selection, in which a sparsity-enforcing prior on effect sizes is used to avoid overtraining and all effect sizes are integrated out for posterior inference. For binary traits, the logistic model has not yielded clear improvements over the linear model. For multi-SNP analysis, the logistic model required costly and technically challenging MCMC sampling to perform the integration. Here, we introduce the quasi-Laplace approximation to solve the integral and avoid MCMC sampling. We expect the logistic model to perform much better than multiple linear regression except when predicted disease risks are spread closely around 0.5, because only close to its inflection point can the logistic function be well approximated by a linear function. Indeed, in extensive benchmarks with simulated phenotypes and real genotypes, our Bayesian multiple LOgistic REgression method (B-LORE) showed considerable improvements (1) when regressing on many variants in multiple loci at heritabilities ≥ 0.4 and (2) for unbalanced case-control ratios. B-LORE also enables meta-analysis by approximating the likelihood functions of individual studies by multivariate normal distributions, using their means and covariance matrices as summary statistics. Our work should make sparse multiple logistic regression attractive also for other applications with binary target variables. B-LORE is freely available from: https://github.com/soedinglab/b-lore.

Author summary

In recent years, genome-wide association studies (GWAS) have become the primary approach for identifying genetic variants associated with the origination of complex diseases. In case-control GWAS, the genotypes of roughly equal numbers of diseased (“cases”) and healthy (“controls”) people are compared to determine which genetic variants are significantly more frequent among cases. From the disease-associated variants we hope to get insights into how the disease develops. To find the disease-associated variants, a linear relationship between the disease risk and the number of minor alleles at the variant sites has usually been assumed, because the more appropriate sigmoid relationship requires slow and cumbersome sampling techniques. We found an efficient analytical approximation that renders sampling unnecessary and makes our multiple logistic regression model easy to train. We show that it outperforms the usually employed multiple linear regression model whenever nonlinearities become strong, which is the case, for example, when the numbers of case and control patients differ significantly. Therefore, novel genetic disease-associated variants could be found by adding controls to existing case-control GWAS and reanalyzing them with B-LORE.
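
The abstract's point that the logistic function is well approximated by a linear function only near its inflection point can be checked numerically. The following is a minimal illustrative sketch, not part of B-LORE: the function names are hypothetical, and the linear model is assumed here to be the first-order Taylor expansion of the logistic function around 0 (where the predicted risk is 0.5).

```python
import math


def logistic(x: float) -> float:
    """Logistic (sigmoid) function mapping a linear predictor to a risk in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tangent_at_inflection(x: float) -> float:
    """First-order Taylor expansion of the logistic function around x = 0.

    At the inflection point the value is 0.5 and the slope is 0.25.
    This stands in for a "linear model" purely for illustration.
    """
    return 0.5 + 0.25 * x


def approximation_error(x: float) -> float:
    """Absolute gap between the logistic curve and its tangent line at x."""
    return abs(logistic(x) - tangent_at_inflection(x))


if __name__ == "__main__":
    # Linear predictors increasingly far from the inflection point.
    for x in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(
            f"x={x:4.1f}  risk={logistic(x):.4f}  "
            f"linear={tangent_at_inflection(x):.4f}  "
            f"error={approximation_error(x):.4f}"
        )
```

The gap is zero at a risk of 0.5 and grows as the predictor moves away from it, which is consistent with the authors' expectation that the linear model degrades when predicted risks are far from 0.5, for example under unbalanced case-control ratios.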

Suggested Citation

  • Saikat Banerjee & Lingyao Zeng & Heribert Schunkert & Johannes Söding, 2018. "Bayesian multiple logistic regression for case-control GWAS," PLOS Genetics, Public Library of Science, vol. 14(12), pages 1-27, December.
  • Handle: RePEc:plo:pgen00:1007856
    DOI: 10.1371/journal.pgen.1007856

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007856
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1007856&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Xiang Zhou & Peter Carbonetto & Matthew Stephens, 2013. "Polygenic Modeling with Bayesian Sparse Linear Mixed Models," PLOS Genetics, Public Library of Science, vol. 9(2), pages 1-14, February.

    Citations


    Cited by:

    1. Lorena Alonso & Ignasi Morán & Cecilia Salvoro & David Torrents, 2021. "In Search of Complex Disease Risk through Genome Wide Association Studies," Mathematics, MDPI, vol. 9(23), pages 1-26, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dominic Holland & Oleksandr Frei & Rahul Desikan & Chun-Chieh Fan & Alexey A Shadrin & Olav B Smeland & V S Sundar & Paul Thompson & Ole A Andreassen & Anders M Dale, 2020. "Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model," PLOS Genetics, Public Library of Science, vol. 16(5), pages 1-30, May.
    2. Yiming Hu & Qiongshi Lu & Ryan Powles & Xinwei Yao & Can Yang & Fang Fang & Xinran Xu & Hongyu Zhao, 2017. "Leveraging functional annotations in genetic risk prediction for human complex diseases," PLOS Computational Biology, Public Library of Science, vol. 13(6), pages 1-16, June.
    3. Carla Márquez-Luna & Steven Gazal & Po-Ru Loh & Samuel S. Kim & Nicholas Furlotte & Adam Auton & Alkes L. Price, 2021. "Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    4. Yanyi Song & Xiang Zhou & Min Zhang & Wei Zhao & Yongmei Liu & Sharon L. R. Kardia & Ana V. Diez Roux & Belinda L. Needham & Jennifer A. Smith & Bhramar Mukherjee, 2020. "Bayesian shrinkage estimation of high dimensional causal mediation effects in omics studies," Biometrics, The International Biometric Society, vol. 76(3), pages 700-710, September.
    5. Hui Li & Rahul Mazumder & Xihong Lin, 2023. "Accurate and efficient estimation of local heritability using summary statistics and the linkage disequilibrium matrix," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    6. McMahan Christopher & Bridges William & Joyner Chase & Lund Robert & Baurley James & Kacamarga Muhamad Fitra & Pardamean Carissa & Pardamean Bens, 2017. "A Bayesian hierarchical model for identifying significant polygenic effects while controlling for confounding and repeated measures," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 407-419, December.
    7. Gao Wang & Abhishek Sarkar & Peter Carbonetto & Matthew Stephens, 2020. "A simple new approach to variable selection in regression, with application to genetic fine mapping," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(5), pages 1273-1300, December.
    8. Heather E Wheeler & Kaanan P Shah & Jonathon Brenner & Tzintzuni Garcia & Keston Aquino-Michaels & GTEx Consortium & Nancy J Cox & Dan L Nicolae & Hae Kyung Im, 2016. "Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues," PLOS Genetics, Public Library of Science, vol. 12(11), pages 1-23, November.
    9. Lulu Shang & Wei Zhao & Yi Zhe Wang & Zheng Li & Jerome J. Choi & Minjung Kho & Thomas H. Mosley & Sharon L. R. Kardia & Jennifer A. Smith & Xiang Zhou, 2023. "meQTL mapping in the GENOA study reveals genetic determinants of DNA methylation in African Americans," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    10. Abrahamsen, Tavis & Hobert, James P., 2019. "Fast Monte Carlo Markov chains for Bayesian shrinkage models with random effects," Journal of Multivariate Analysis, Elsevier, vol. 169(C), pages 61-80.
    11. Niloy Biswas & Anirban Bhattacharya & Pierre E. Jacob & James E. Johndrow, 2022. "Coupling‐based convergence assessment of some Gibbs samplers for high‐dimensional Bayesian regression with shrinkage priors," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(3), pages 973-996, July.
    12. Brieuc Lehmann & Maxine Mackintosh & Gil McVean & Chris Holmes, 2023. "Optimal strategies for learning multi-ancestry polygenic scores vary across traits," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    13. Yiming Hu & Qiongshi Lu & Wei Liu & Yuhua Zhang & Mo Li & Hongyu Zhao, 2017. "Joint modeling of genetically correlated diseases and functional annotations increases accuracy of polygenic risk prediction," PLOS Genetics, Public Library of Science, vol. 13(6), pages 1-22, June.
