
A probabilistic graphical model for estimating selection coefficients of nonsynonymous variants from human population sequence data

Author

Listed:
  • Yige Zhao

    (Columbia University Irving Medical Center
    Columbia University)

  • Tian Lan

    (Columbia University Irving Medical Center)

  • Guojie Zhong

    (Columbia University Irving Medical Center
    Columbia University)

  • Jake Hagen

    (Columbia University Irving Medical Center
    Boston Children’s Hospital and Harvard Medical School)

  • Hongbing Pan

    (Columbia University Irving Medical Center)

  • Wendy K. Chung

    (Boston Children’s Hospital and Harvard Medical School)

  • Yufeng Shen

    (Columbia University Irving Medical Center
    Columbia University)

Abstract

Accurately predicting the effect of missense variants is important in discovering disease risk genes and clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture the quantitative impact on fitness in humans. We develop a method, MisFit, to estimate missense fitness effect using a graphical model. MisFit jointly models the effect at a molecular level ($d$) and a population level (selection coefficient, $s$), assuming that in the same gene, missense variants with similar $d$ have similar $s$. We train it by maximizing probability of observed allele counts in 236,017 individuals of European ancestry. We show that $s$ is informative in predicting allele frequency across ancestries and consistent with the fraction of de novo mutations in sites under strong selection. Further, $s$ outperforms previous methods in prioritizing de novo missense variants in individuals with neurodevelopmental disorders. In conclusion, MisFit accurately predicts $s$ and yields new insights from genomic data.
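
The idea of fitting a selection coefficient by maximizing the probability of observed allele counts can be illustrated with a deliberately simplified toy model. The sketch below is not MisFit's graphical model. It assumes (these are assumptions of the illustration, not statements from the abstract) that each variant sits at deterministic mutation-selection balance with expected allele frequency mu / s, that allele counts are Poisson distributed, and that a group of variants shares one heterozygous selection coefficient s. The function names and the per-site mutation rates are hypothetical.

```python
import math


def poisson_log_likelihood(s, allele_counts, mutation_rates, n_chromosomes):
    """Log-probability of the observed allele counts for a shared s.

    Toy assumption: expected count per variant = n_chromosomes * mu / s
    (mutation-selection balance), with Poisson sampling noise.
    """
    total = 0.0
    for count, mu in zip(allele_counts, mutation_rates):
        lam = n_chromosomes * mu / s
        total += count * math.log(lam) - lam - math.lgamma(count + 1)
    return total


def estimate_shared_s(allele_counts, mutation_rates, n_chromosomes):
    """Maximum-likelihood s under the toy model, capped at 1.

    Setting the derivative of the log-likelihood to zero gives
    s = n_chromosomes * sum(mu) / sum(counts).
    """
    total_count = sum(allele_counts)
    if total_count == 0:
        # No observed alleles: the likelihood increases with s, so use the cap.
        return 1.0
    return min(1.0, n_chromosomes * sum(mutation_rates) / total_count)


# 236,017 diploid individuals give 472,034 sampled chromosomes.
n_chromosomes = 2 * 236_017
allele_counts = [0, 1, 2]      # hypothetical counts for three similar variants
mutation_rates = [1e-8] * 3    # hypothetical per-site mutation rates

s_hat = estimate_shared_s(allele_counts, mutation_rates, n_chromosomes)
print(f"estimated shared s: {s_hat:.6f}")
```

Pooling variants before estimating s mirrors, in a very crude way, the abstract's assumption that variants with similar $d$ in the same gene have similar $s$; a single rare variant carries too little information on its own for a stable estimate.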

Suggested Citation

  • Yige Zhao & Tian Lan & Guojie Zhong & Jake Hagen & Hongbing Pan & Wendy K. Chung & Yufeng Shen, 2025. "A probabilistic graphical model for estimating selection coefficients of nonsynonymous variants from human population sequence data," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59937-2
    DOI: 10.1038/s41467-025-59937-2

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-59937-2
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-025-59937-2?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kian Hong Kock & Patrick K. Kimes & Stephen S. Gisselbrecht & Sachi Inukai & Sabrina K. Phanor & James T. Anderson & Gayatri Ramakrishnan & Colin H. Lipper & Dongyuan Song & Jesse V. Kurland & Julia M, 2024. "DNA binding analysis of rare variants in homeodomains reveals homeodomain specificity-determining residues," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    2. Sheng Wang & Belinda Wang & Vanessa Drury & Sam Drake & Nawei Sun & Hasan Alkhairo & Juan Arbelaez & Clif Duhn & Vanessa H. Bal & Kate Langley & Joanna Martin & Pieter J. Hoekstra & Andrea Dietrich & , 2023. "Rare X-linked variants carry predominantly male risk in autism, Tourette syndrome, and ADHD," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    3. Ada J. S. Chan & Worrawat Engchuan & Miriam S. Reuter & Zhuozhi Wang & Bhooma Thiruvahindrapuram & Brett Trost & Thomas Nalpathamkalam & Carol Negrijn & Sylvia Lamoureux & Giovanna Pellecchia & Rohan , 2022. "Genome-wide rare variant score associates with morphological subtypes of autism spectrum disorder," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    4. Bian Li & Dan M. Roden & John A. Capra, 2022. "The 3D mutational constraint on amino acid sites in the human proteome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    5. Ruoyu Tian & Tian Ge & Hyeokmoon Kweon & Daniel B. Rocha & Max Lam & Jimmy Z. Liu & Kritika Singh & Daniel F. Levey & Joel Gelernter & Murray B. Stein & Ellen A. Tsai & Hailiang Huang & Christopher F., 2024. "Whole-exome sequencing in UK Biobank reveals rare genetic architecture for depression," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    6. Iker Núñez-Carpintero & Maria Rigau & Mattia Bosio & Emily O’Connor & Sally Spendiff & Yoshiteru Azuma & Ana Topf & Rachel Thompson & Peter A. C. ’t Hoen & Teodora Chamova & Ivailo Tournev & Velina Gu, 2024. "Rare disease research workflow using multilayer networks elucidates the molecular determinants of severity in Congenital Myasthenic Syndromes," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    7. Panagiotis Katsonis & Olivier Lichtarge, 2025. "Meta-EA: a gene-specific combination of available computational tools for predicting missense variant effects," Nature Communications, Nature, vol. 16(1), pages 1-13, December.
    8. Ricky Lali & Michael Chong & Arghavan Omidi & Pedrum Mohammadi-Shemirani & Ann Le & Edward Cui & Guillaume Paré, 2021. "Calibrated rare variant genetic risk scores for complex disease prediction using large exome sequence repositories," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    9. Tetsuo Shoda & Kenneth M. Kaufman & Ting Wen & Julie M. Caldwell & Garrett A. Osswald & Pathre Purnima & Nives Zimmermann & Margaret H. Collins & Kira Rehn & Heather Foote & Michael D. Eby & Wenying Z, 2021. "Desmoplakin and periplakin genetically and functionally contribute to eosinophilic esophagitis," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    10. Jörn Bethune & April Kleppe & Søren Besenbacher, 2022. "A method to build extended sequence context models of point mutations and indels," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    11. Kellan P. Weston & Xiaoyi Gao & Jinghan Zhao & Kwang-Soo Kim & Susan E. Maloney & Jill Gotoff & Sumit Parikh & Yen-Chen Leu & Kuen-Phon Wu & Marwan Shinawi & Joshua P. Steimel & Joseph S. Harrison & J, 2021. "Identification of disease-linked hyperactivating mutations in UBE3A through large-scale functional variant analysis," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    12. Minhui Chen & Andy Dahl, 2024. "A robust model for cell type-specific interindividual variation in single-cell RNA sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    13. Noah Dukler & Mehreen R. Mughal & Ritika Ramani & Yi-Fei Huang & Adam Siepel, 2022. "Extreme purifying selection against point mutations in the human genome," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    14. Lukas Gerasimavicius & Benjamin J. Livesey & Joseph A. Marsh, 2022. "Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    15. Sally J. Adua & Anna Arnal-Estapé & Minghui Zhao & Bowen Qi & Zongzhi Z. Liu & Carolyn Kravitz & Heather Hulme & Nicole Strittmatter & Francesc López-Giráldez & Sampada Chande & Alexandra E. Albert & , 2022. "Brain metastatic outgrowth and osimertinib resistance are potentiated by RhoA in EGFR-mutant lung cancer," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    16. Scott D. Findlay & Lindsay Romo & Christopher B. Burge, 2024. "Quantifying negative selection in human 3ʹ UTRs uncovers constrained targets of RNA-binding proteins," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    17. Jeffrey D. Wall & J. Fah Sathirapongsasuti & Ravi Gupta & Asif Rasheed & Radha Venkatesan & Saurabh Belsare & Ramesh Menon & Sameer Phalke & Anuradha Mittal & John Fang & Deepak Tanneeru & Manjari Des, 2023. "South Asian medical cohorts reveal strong founder effects and high rates of homozygosity," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    18. Federica Luppino & Ivan A. Adzhubei & Christopher A. Cassa & Agnes Toth-Petroczy, 2023. "DeMAG predicts the effects of variants in clinically actionable genes by integrating structural and evolutionary epistatic features," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    19. Mofan Feng & Xiaoxi Wei & Xi Zheng & Liangjie Liu & Lin Lin & Manying Xia & Guang He & Yi Shi & Qing Lu, 2024. "Decoding Missense Variants by Incorporating Phase Separation via Machine Learning," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    20. Yiqin Wang & Xiaoxian Guo & Xiumei Hong & Guoying Wang & Colleen Pearson & Barry Zuckerman & Andrew G. Clark & Kimberly O. O’Brien & Xiaobin Wang & Zhenglong Gu, 2022. "Association of mitochondrial DNA content, heteroplasmies and inter-generational transmission with autism," Nature Communications, Nature, vol. 13(1), pages 1-14, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.