Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-025-57885-5.html

Diverse ancestral representation improves genetic intolerance metrics

Author

Listed:
  • Alexander L. Han

    (Baylor College of Medicine
    Texas Children’s Hospital)

  • Chloe F. Sands

    (Baylor College of Medicine
    Texas Children’s Hospital)

  • Dorota Matelska

    (AstraZeneca)

  • Jessica C. Butts

    (Rice University)

  • Vida Ravanmehr

    (Baylor College of Medicine
    Texas Children’s Hospital)

  • Fengyuan Hu

    (AstraZeneca)

  • Esmeralda Villavicencio Gonzalez

    (Texas Children’s Hospital
    Baylor College of Medicine)

  • Nicholas Katsanis

    (Galatea Bio, Inc)

  • Carlos D. Bustamante

    (Galatea Bio, Inc)

  • Quanli Wang

    (AstraZeneca)

  • Slavé Petrovski

    (AstraZeneca
    University of Melbourne)

  • Dimitrios Vitsios

    (AstraZeneca)

  • Ryan S. Dhindsa

    (Baylor College of Medicine
    Texas Children’s Hospital)

Abstract

The unprecedented scale of genomic databases has revolutionized our ability to identify regions in the human genome intolerant to variation—regions often implicated in disease. However, these datasets remain constrained by limited ancestral diversity. Here, we analyze whole-exome sequencing data from 460,551 UK Biobank and 125,748 Genome Aggregation Database (gnomAD) participants across multiple ancestries to test several key intolerance metrics, including the Residual Variance Intolerance Score (RVIS), Missense Tolerance Ratio (MTR), and Loss-of-Function Observed/Expected ratio (LOF O/E). We demonstrate that increasing ancestral representation, rather than sample size alone, critically drives their performance. Scores trained on variation observed in African and Admixed American ancestral groups show higher resolution in detecting haploinsufficient and neurodevelopmental disease risk genes compared to scores trained on European ancestry groups. Most strikingly, MTR trained on 43,000 multi-ancestry exomes demonstrates greater predictive power than when trained on a nearly 10-fold larger dataset of 440,000 non-Finnish European exomes. We further find that European ancestry group-based scores are likely approaching saturation. These findings highlight the need for enhanced population representation in genomic resources to fully realize the potential of precision medicine and drug discovery. Ancestry group-specific scores are publicly available through an interactive portal: http://intolerance.public.cgr.astrazeneca.com/ .
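The abstract names the metrics but does not give their formulas. As a rough illustration only, the sketch below uses the definitions commonly given in the literature: MTR as the observed missense fraction divided by the missense fraction among all possible single-nucleotide changes in the same sequence, and LOF O/E as observed loss-of-function variants divided by the number expected under a mutation model. The class name, function names, and toy counts are hypothetical, and the paper's actual pipeline (ancestry stratification, expected-count modeling, windowing) is not described here and is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class VariantCounts:
    """Observed and possible variant counts for one gene or sequence window.

    Hypothetical container for illustration; not taken from the paper.
    """
    missense_obs: int
    synonymous_obs: int
    missense_possible: int
    synonymous_possible: int


def missense_tolerance_ratio(c: VariantCounts) -> float:
    """MTR under the commonly cited definition (an assumption here).

    Values below 1 suggest fewer missense variants than expected
    from sequence context, i.e. possible intolerance.
    """
    observed_total = c.missense_obs + c.synonymous_obs
    possible_total = c.missense_possible + c.synonymous_possible
    if observed_total == 0 or c.missense_possible == 0:
        raise ValueError("MTR is undefined without observed variants "
                         "or possible missense changes")
    observed_fraction = c.missense_obs / observed_total
    expected_fraction = c.missense_possible / possible_total
    return observed_fraction / expected_fraction


def lof_observed_expected(observed_lof: int, expected_lof: float) -> float:
    """LOF O/E: observed loss-of-function count over the expected count."""
    if expected_lof <= 0:
        raise ValueError("expected LoF count must be positive")
    return observed_lof / expected_lof


# Toy counts, invented for illustration only.
toy = VariantCounts(missense_obs=10, synonymous_obs=10,
                    missense_possible=300, synonymous_possible=100)
toy_mtr = missense_tolerance_ratio(toy)      # 0.5 / 0.75
toy_oe = lof_observed_expected(3, 12.0)      # 3 / 12
print(f"MTR = {toy_mtr:.3f}, LOF O/E = {toy_oe:.2f}")
```

Under these definitions, the observed counts come from whichever cohort the score is trained on, which is where the ancestry composition discussed in the abstract would enter the calculation.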

Suggested Citation

  • Alexander L. Han & Chloe F. Sands & Dorota Matelska & Jessica C. Butts & Vida Ravanmehr & Fengyuan Hu & Esmeralda Villavicencio Gonzalez & Nicholas Katsanis & Carlos D. Bustamante & Quanli Wang & Slav, 2025. "Diverse ancestral representation improves genetic intolerance metrics," Nature Communications, Nature, vol. 16(1), pages 1-9, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-57885-5
    DOI: 10.1038/s41467-025-57885-5

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-57885-5
    File Function: Abstract
    Download Restriction: no



