Printed from https://ideas.repec.org/a/nat/natcom/v14y2023i1d10.1038_s41467-023-38930-7.html

Optimal strategies for learning multi-ancestry polygenic scores vary across traits

Author

Listed:
  • Brieuc Lehmann

    (University College London)

  • Maxine Mackintosh

    (Genomics England
    The Alan Turing Institute)

  • Gil McVean

    (University of Oxford)

  • Chris Holmes

    (The Alan Turing Institute
    University of Oxford)

Abstract

Polygenic scores (PGSs) are individual-level measures that aggregate the genome-wide genetic predisposition to a given trait. As PGSs have predominantly been developed using European-ancestry samples, trait prediction using such European-ancestry-derived PGSs is less accurate in individuals of non-European ancestry. Although there has been recent progress in combining multiple PGSs trained on distinct populations, the problem of how to maximize performance given a multiple-ancestry cohort is largely unexplored. Here, we investigate the effect of sample size and ancestry composition on PGS performance for fifteen traits in UK Biobank. For some traits, PGSs estimated using a relatively small African-ancestry training set outperformed, on an African-ancestry test set, PGSs estimated using a much larger European-ancestry-only training set. We observe similar, but not identical, results when considering other minority-ancestry groups within UK Biobank. Our results emphasise the importance of targeted data collection from underrepresented groups in order to address existing disparities in PGS performance.

Suggested Citation

  • Brieuc Lehmann & Maxine Mackintosh & Gil McVean & Chris Holmes, 2023. "Optimal strategies for learning multi-ancestry polygenic scores vary across traits," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-38930-7
    DOI: 10.1038/s41467-023-38930-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-38930-7
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Armin P. Schoech & Daniel M. Jordan & Po-Ru Loh & Steven Gazal & Luke J. O’Connor & Daniel J. Balick & Pier F. Palamara & Hilary K. Finucane & Shamil R. Sunyaev & Alkes L. Price, 2019. "Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection," Nature Communications, Nature, vol. 10(1), pages 1-10, December.
    2. Clare Bycroft & Colin Freeman & Desislava Petkova & Gavin Band & Lloyd T. Elliott & Kevin Sharp & Allan Motyer & Damjan Vukcevic & Olivier Delaneau & Jared O’Connell & Adrian Cortes & Samantha Welsh &, 2018. "The UK Biobank resource with deep phenotyping and genomic data," Nature, Nature, vol. 562(7726), pages 203-209, October.
    3. Marco Scutari & Ian Mackay & David Balding, 2016. "Using Genetic Distance to Infer the Accuracy of Genomic Prediction," PLOS Genetics, Public Library of Science, vol. 12(9), pages 1-19, September.
    4. Ying Wang & Jing Guo & Guiyan Ni & Jian Yang & Peter M. Visscher & Loic Yengo, 2020. "Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations," Nature Communications, Nature, vol. 11(1), pages 1-9, December.
    5. Xiang Zhou & Peter Carbonetto & Matthew Stephens, 2013. "Polygenic Modeling with Bayesian Sparse Linear Mixed Models," PLOS Genetics, Public Library of Science, vol. 9(2), pages 1-14, February.
    6. Jerome Kelleher & Alison M Etheridge & Gilean McVean, 2016. "Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-22, May.
    7. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    8. Junyang Qian & Yosuke Tanigawa & Wenfei Du & Matthew Aguirre & Chris Chang & Robert Tibshirani & Manuel A Rivas & Trevor Hastie, 2020. "A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank," PLOS Genetics, Public Library of Science, vol. 16(10), pages 1-30, October.
    9. Sebastian Okser & Tapio Pahikkala & Antti Airola & Tapio Salakoski & Samuli Ripatti & Tero Aittokallio, 2014. "Regularized Machine Learning in the Genetic Prediction of Complex Traits," PLOS Genetics, Public Library of Science, vol. 10(11), pages 1-9, November.
    10. Alice B. Popejoy & Stephanie M. Fullerton, 2016. "Genomics is failing on diversity," Nature, Nature, vol. 538(7624), pages 161-164, October.
    11. L. Duncan & H. Shen & B. Gelaye & J. Meijsen & K. Ressler & M. Feldman & R. Peterson & B. Domingue, 2019. "Analysis of polygenic risk score usage and performance in diverse human populations," Nature Communications, Nature, vol. 10(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Carla Márquez-Luna & Steven Gazal & Po-Ru Loh & Samuel S. Kim & Nicholas Furlotte & Adam Auton & Alkes L. Price, 2021. "Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    2. Yosuke Tanigawa & Junyang Qian & Guhan Venkataraman & Johanne Marie Justesen & Ruilin Li & Robert Tibshirani & Trevor Hastie & Manuel A Rivas, 2022. "Significant sparse polygenic risk scores across 813 traits in UK Biobank," PLOS Genetics, Public Library of Science, vol. 18(3), pages 1-21, March.
    3. Jiacheng Miao & Hanmin Guo & Gefei Song & Zijie Zhao & Lin Hou & Qiongshi Lu, 2023. "Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    4. Zhen Qiao & Julia Sidorenko & Joana A. Revez & Angli Xue & Xueling Lu & Katri Pärna & Harold Snieder & Peter M. Visscher & Naomi R. Wray & Loic Yengo, 2023. "Estimation and implications of the genetic architecture of fasting and non-fasting blood glucose," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    5. The Tien Mai, 2023. "Reliable Genetic Correlation Estimation via Multiple Sample Splitting and Smoothing," Mathematics, MDPI, vol. 11(9), pages 1-13, May.
    6. Pereira, Rita & Biroli, Pietro & von hinke, stephanie & Van Kippersluis, Hans & Galama, Titus & Rietveld, Niels & Thom, Kevin, 2022. "Gene-Environment Interplay in the Social Sciences," OSF Preprints d96z3, Center for Open Science.
    7. Hui Li & Rahul Mazumder & Xihong Lin, 2023. "Accurate and efficient estimation of local heritability using summary statistics and the linkage disequilibrium matrix," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    8. H. Serhat Tetikol & Deniz Turgut & Kubra Narci & Gungor Budak & Ozem Kalay & Elif Arslan & Sinem Demirkaya-Budak & Alexey Dolgoborodov & Duygu Kabakci-Zorlu & Vladimir Semenyuk & Amit Jain & Brandi N., 2022. "Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    9. Qin Qin Huang & Neneh Sallah & Diana Dunca & Bhavi Trivedi & Karen A. Hunt & Sam Hodgson & Samuel A. Lambert & Elena Arciero & John Wright & Chris Griffiths & Richard C. Trembath & Harry Hemingway & M, 2022. "Transferability of genetic loci and polygenic scores for cardiometabolic traits in British Pakistani and Bangladeshi individuals," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    10. Junyang Qian & Yosuke Tanigawa & Wenfei Du & Matthew Aguirre & Chris Chang & Robert Tibshirani & Manuel A Rivas & Trevor Hastie, 2020. "A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank," PLOS Genetics, Public Library of Science, vol. 16(10), pages 1-30, October.
    11. Ananyo Choudhury & Jean-Tristan Brandenburg & Tinashe Chikowore & Dhriti Sengupta & Palwende Romuald Boua & Nigel J. Crowther & Godfred Agongo & Gershim Asiki & F. Xavier Gómez-Olivé & Isaac Kisiangan, 2022. "Meta-analysis of sub-Saharan African studies provides insights into genetic architecture of lipid traits," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    12. Heather E Wheeler & Kaanan P Shah & Jonathon Brenner & Tzintzuni Garcia & Keston Aquino-Michaels & GTEx Consortium & Nancy J Cox & Dan L Nicolae & Hae Kyung Im, 2016. "Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues," PLOS Genetics, Public Library of Science, vol. 12(11), pages 1-23, November.
    13. Laurin Charles & Boomsma Dorret & Lubke Gitta, 2016. "The use of vector bootstrapping to improve variable selection precision in Lasso models," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(4), pages 305-320, August.
    14. Clara Albiñana & Zhihong Zhu & Andrew J. Schork & Andrés Ingason & Hugues Aschard & Isabell Brikell & Cynthia M. Bulik & Liselotte V. Petersen & Esben Agerbo & Jakob Grove & Merete Nordentoft & David , 2023. "Multi-PGS enhances polygenic prediction by combining 937 polygenic scores," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    15. Brandy M Mapes & Christopher S Foster & Sheila V Kusnoor & Marcia I Epelbaum & Mona AuYoung & Gwynne Jenkins & Maria Lopez-Class & Dara Richardson-Heron & Ahmed Elmi & Karl Surkan & Robert M Cronin & , 2020. "Diversity and inclusion for the All of Us research program: A scoping review," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-14, July.
    16. Niloy Biswas & Anirban Bhattacharya & Pierre E. Jacob & James E. Johndrow, 2022. "Coupling‐based convergence assessment of some Gibbs samplers for high‐dimensional Bayesian regression with shrinkage priors," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(3), pages 973-996, July.
    17. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    18. Matteo Di Scipio & Mohammad Khan & Shihong Mao & Michael Chong & Conor Judge & Nazia Pathan & Nicolas Perrot & Walter Nelson & Ricky Lali & Shuang Di & Robert Morton & Jeremy Petch & Guillaume Paré, 2023. "A versatile, fast and unbiased method for estimation of gene-by-environment interaction effects on biobank-scale datasets," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    19. Dominic Holland & Oleksandr Frei & Rahul Desikan & Chun-Chieh Fan & Alexey A Shadrin & Olav B Smeland & V S Sundar & Paul Thompson & Ole A Andreassen & Anders M Dale, 2020. "Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model," PLOS Genetics, Public Library of Science, vol. 16(5), pages 1-30, May.
    20. Ernesto Carrella & Richard M. Bailey & Jens Koed Madsen, 2018. "Indirect inference through prediction," Papers 1807.01579, arXiv.org.
