Predicting hotspots for disease-causing single nucleotide variants using sequences-based coevolution, network analysis, and machine learning

Author

Listed:
  • Wenjun Zheng

Abstract

To enable personalized medicine, it is important yet highly challenging to accurately predict disease-causing mutations in target proteins at high throughput. Previous computational methods combine evolutionary information with various biochemical and structural features of protein residues to discriminate neutral from deleterious mutations. However, the power of these methods is often limited because they either assume known protein structures or treat residues independently without fully considering their interactions. To address these limitations, we build upon recent progress in machine learning, network analysis, and protein language models, and develop a sequences-based variant site prediction workflow based on protein residue contact networks:

  1. We employ and integrate various methods of building protein residue networks using state-of-the-art coevolution analysis tools (RaptorX, DeepMetaPSICOV, and SPOT-Contact) powered by deep learning.
  2. We use machine learning algorithms (Random Forest, Gradient Boosting, and Extreme Gradient Boosting) to optimally combine 20 network centrality scores to jointly predict key residues as hot spots for disease mutations.
  3. Using a dataset of 107 proteins rich in disease mutations, we rigorously evaluate the network scores individually and collectively (via machine learning).

This work supports a promising strategy of combining an ensemble of network scores based on different coevolution analysis methods (and optionally predictive scores from other methods) via machine learning to predict hotspot sites of disease mutations, which will inform downstream applications of disease diagnosis and targeted drug design.
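
The idea of turning predicted residue contacts into a network and scoring residues by centrality can be illustrated with a minimal sketch. It is not the author's implementation: the toy contact probabilities, the 0.5 probability threshold, the minimum sequence separation of 3, and the choice of degree and closeness centrality are all assumptions made for illustration. The abstract does not name the 20 centrality scores used in the paper. In the actual workflow the contact probabilities would come from tools such as RaptorX, DeepMetaPSICOV, or SPOT-Contact, and the per-residue scores would be passed to a classifier such as Random Forest.

```python
from collections import deque


def build_contact_network(contact_prob, threshold=0.5, min_separation=3):
    """Link residues i and j when their predicted contact probability reaches
    the threshold and they are at least min_separation apart in sequence.
    Both cutoffs are illustrative assumptions, not values from the paper."""
    n = len(contact_prob)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + min_separation, n):
            if contact_prob[i][j] >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other residues that each residue contacts."""
    n = len(adjacency)
    if n < 2:
        return {v: 0.0 for v in adjacency}
    return {v: len(neighbors) / (n - 1) for v, neighbors in adjacency.items()}


def closeness_centrality(adjacency):
    """Inverse mean shortest-path distance to reachable residues, scaled by
    the fraction of residues reached so disconnected networks are handled."""
    n = len(adjacency)
    scores = {}
    for source in adjacency:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        reachable = len(dist) - 1
        total = sum(dist.values())
        if total == 0:
            scores[source] = 0.0  # isolated residue
        else:
            scores[source] = (reachable / total) * (reachable / (n - 1))
    return scores


# Hypothetical contact probabilities for a 6-residue toy protein.
n_residues = 6
toy_pairs = {(0, 1): 0.95, (0, 3): 0.9, (0, 4): 0.8,
             (0, 5): 0.7, (1, 4): 0.6, (2, 5): 0.3}
contact_prob = [[0.0] * n_residues for _ in range(n_residues)]
for (i, j), p in toy_pairs.items():
    contact_prob[i][j] = contact_prob[j][i] = p

adjacency = build_contact_network(contact_prob)
degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency)

# One feature vector per residue; a classifier would be trained on these.
features = {r: [degree[r], closeness[r]] for r in adjacency}
ranked = sorted(adjacency, key=lambda r: degree[r], reverse=True)
for r in ranked:
    print(r, round(degree[r], 3), round(closeness[r], 3))
```

In this toy network residue 0 has the most contacts, so it ranks first by degree. Combining several such scores in a trained model, rather than ranking by a single score, is the strategy the abstract describes.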

Suggested Citation

  • Wenjun Zheng, 2024. "Predicting hotspots for disease-causing single nucleotide variants using sequences-based coevolution, network analysis, and machine learning," PLOS ONE, Public Library of Science, vol. 19(5), pages 1-21, May.
  • Handle: RePEc:plo:pone00:0302504
    DOI: 10.1371/journal.pone.0302504

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0302504
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0302504&type=printable
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peicong Lin & Yumeng Yan & Huanyu Tao & Sheng-You Huang, 2023. "Deep transfer learning for inter-chain contact predictions of transmembrane protein complexes," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    2. Yingying Zhang & Alden K. Leung & Jin Joo Kang & Yu Sun & Guanxi Wu & Le Li & Jiayang Sun & Lily Cheng & Tian Qiu & Junke Zhang & Shayne D. Wierbowski & Shagun Gupta & James G. Booth & Haiyuan Yu, 2025. "A multiscale functional map of somatic mutations in cancer integrating protein structure and network topology," Nature Communications, Nature, vol. 16(1), pages 1-18, December.
    3. Jeffrey A. Ruffolo & Lee-Shin Chu & Sai Pooja Mahajan & Jeffrey J. Gray, 2023. "Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    4. Lei Wang & Jiangguo Zhang & Dali Wang & Chen Song, 2022. "Membrane contact probability: An essential and predictive character for the structural and functional studies of membrane proteins," PLOS Computational Biology, Public Library of Science, vol. 18(3), pages 1-27, March.
    5. Ivan Koludarov & Tobias Senoner & Timothy N. W. Jackson & Daniel Dashevsky & Michael Heinzinger & Steven D. Aird & Burkhard Rost, 2023. "Domain loss enabled evolution of novel functions in the snake three-finger toxin gene superfamily," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    6. Zhiye Guo & Jian Liu & Jeffrey Skolnick & Jianlin Cheng, 2022. "Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    7. Junhui Peng & Li Zhao, 2024. "The origin and structural evolution of de novo genes in Drosophila," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    8. Nicolae Sapoval & Amirali Aghazadeh & Michael G. Nute & Dinler A. Antunes & Advait Balaji & Richard Baraniuk & C. J. Barberan & Ruth Dannenfelser & Chen Dun & Mohammadamin Edrisi & R. A. Leo Elworth &, 2022. "Current progress and open challenges for applying deep learning across the biosciences," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    9. Meghana Kshirsagar & Artur Meller & Ian R. Humphreys & Samuel Sledzieski & Yixi Xu & Rahul Dodhia & Eric Horvitz & Bonnie Berger & Gregory R. Bowman & Juan Lavista Ferres & David Baker & Minkyung Baek, 2025. "Rapid and accurate prediction of protein homo-oligomer symmetry using Seq2Symm," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
    10. David Moi & Shunsuke Nishio & Xiaohui Li & Clari Valansi & Mauricio Langleib & Nicolas G. Brukman & Kateryna Flyak & Christophe Dessimoz & Daniele de Sanctis & Kathryn Tunyasuvunakool & John Jumper & , 2022. "Discovery of archaeal fusexins homologous to eukaryotic HAP2/GCS1 gamete fusion proteins," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    11. Salvatore Daniele Bianco & Luca Parca & Francesco Petrizzelli & Tommaso Biagini & Agnese Giovannetti & Niccolò Liorni & Alessandro Napoli & Massimo Carella & Vincent Procaccio & Marie T. Lott & Shipin, 2023. "APOGEE 2: multi-layer machine-learning model for the interpretable prediction of mitochondrial missense variants," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    12. Bian Li & Dan M. Roden & John A. Capra, 2022. "The 3D mutational constraint on amino acid sites in the human proteome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    13. Julia Koehler Leman & Pawel Szczerbiak & P. Douglas Renfrew & Vladimir Gligorijevic & Daniel Berenberg & Tommi Vatanen & Bryn C. Taylor & Chris Chandler & Stefan Janssen & Andras Pataki & Nick Carrier, 2023. "Sequence-structure-function relationships in the microbial protein universe," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    14. Pantelis Livanos & Choy Kriechbaum & Sophia Remers & Arvid Herrmann & Sabine Müller, 2025. "Kinesin-12 POK2 polarization is a prerequisite for a fully functional division site and aids cell plate positioning," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    15. Surabhi Kokane & Ashutosh Gulati & Pascal F. Meier & Rei Matsuoka & Tanadet Pipatpolkai & Giuseppe Albano & Tin Manh Ho & Lucie Delemotte & Daniel Fuster & David Drew, 2025. "PIP2-mediated oligomerization of the endosomal sodium/proton exchanger NHE9," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    16. Pierre Azoulay & Joshua Krieger & Abhishek Nagaraj, 2024. "Old Moats for New Models: Openness, Control, and Competition in Generative Artificial Intelligence," NBER Chapters, in: Entrepreneurship and Innovation Policy and the Economy, volume 4, pages 7-46, National Bureau of Economic Research, Inc.
    17. Xin Yong & Guowen Jia & Qin Yang & Chunzhuang Zhou & Sitao Zhang & Huaqing Deng & Daniel D. Billadeau & Zhaoming Su & Da Jia, 2025. "Cryo-EM structure of the BLOC-3 complex provides insights into the pathogenesis of Hermansky-Pudlak syndrome," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
    18. Jun-Yu Si & Yuan-Mei Chen & Ye-Hui Sun & Meng-Xue Gu & Mei-Ling Huang & Lu-Lu Shi & Xiao Yu & Xiao Yang & Qing Xiong & Cheng-Bao Ma & Peng Liu & Zheng-Li Shi & Huan Yan, 2024. "Sarbecovirus RBD indels and specific residues dictating multi-species ACE2 adaptiveness," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    19. Deyun Qiu & Jinxin V. Pei & James E. O. Rosling & Vandana Thathy & Dongdi Li & Yi Xue & John D. Tanner & Jocelyn Sietsma Penington & Yi Tong Vincent Aw & Jessica Yi Han Aw & Guoyue Xu & Abhai K. Tripa, 2022. "A G358S mutation in the Plasmodium falciparum Na+ pump PfATP4 confers clinically-relevant resistance to cipargamin," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    20. Shuo-Shuo Liu & Tian-Xia Jiang & Fan Bu & Ji-Lan Zhao & Guang-Fei Wang & Guo-Heng Yang & Jie-Yan Kong & Yun-Fan Qie & Pei Wen & Li-Bin Fan & Ning-Ning Li & Ning Gao & Xiao-Bo Qiu, 2024. "Molecular mechanisms underlying the BIRC6-mediated regulation of apoptosis and autophagy," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
