
PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants

Author

Listed:
  • Zhuoran Xu

    (University of Pennsylvania Perelman School of Medicine
    Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia
    Weill Cornell Medicine)

  • Quan Li

    (Princess Margaret Cancer Centre, University Health Network, University of Toronto)

  • Luigi Marchionni

    (Weill Cornell Medicine)

  • Kai Wang

    (Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia
    University of Pennsylvania)

Abstract

Structural variants (SVs) represent a major source of genetic variation associated with phenotypic diversity and disease susceptibility. While long-read sequencing can discover over 20,000 SVs per human genome, interpreting their functional consequences remains challenging. Existing methods for identifying disease-related SVs focus on deletion/duplication only and cannot prioritize individual genes affected by SVs, especially for noncoding SVs. Here, we introduce PhenoSV, a phenotype-aware machine-learning model that interprets all major types of SVs and genes affected. PhenoSV segments and annotates SVs with diverse genomic features and employs a transformer-based architecture to predict their impacts under a multiple-instance learning framework. With phenotype information, PhenoSV further utilizes gene-phenotype associations to prioritize phenotype-related SVs. Evaluation on extensive human SV datasets covering all SV types demonstrates PhenoSV’s superior performance over competing methods. Applications in diseases suggest that PhenoSV can determine disease-related genes from SVs. A web server and a command-line tool for PhenoSV are available at https://phenosv.wglab.org.
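The multiple-instance idea in the abstract (an SV is treated as a bag of segments, and the bag is scored from its instances) can be illustrated with a minimal sketch. This is not PhenoSV's implementation: the abstract describes a transformer-based model, whereas the sketch below swaps in a simple noisy-OR aggregation as a stand-in. The names `Segment`, `noisy_or`, `score_sv`, and `rank_genes`, the per-segment scores, and the way phenotype association is multiplied in are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    """One piece of a structural variant (hypothetical representation)."""
    gene: str               # gene the segment is assigned to (illustrative)
    pathogenicity: float    # per-segment score in [0, 1] from some instance-level model
    phenotype_match: float  # gene-phenotype association in [0, 1], assumed precomputed


def noisy_or(scores: List[float]) -> float:
    """Bag-level probability that at least one instance is positive."""
    prob_none = 1.0
    for s in scores:
        prob_none *= 1.0 - s
    return 1.0 - prob_none


def score_sv(segments: List[Segment]) -> Tuple[float, float]:
    """Return (general SV score, phenotype-weighted SV score)."""
    general = noisy_or([s.pathogenicity for s in segments])
    # Assumption: phenotype relevance simply down-weights each segment score.
    weighted = noisy_or([s.pathogenicity * s.phenotype_match for s in segments])
    return general, weighted


def rank_genes(segments: List[Segment]) -> List[Tuple[str, float]]:
    """Rank genes by the aggregated phenotype-weighted score of their segments."""
    per_gene: Dict[str, List[float]] = {}
    for s in segments:
        per_gene.setdefault(s.gene, []).append(s.pathogenicity * s.phenotype_match)
    ranked = [(gene, noisy_or(vals)) for gene, vals in per_gene.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy input: two segments of one SV, assigned to made-up genes.
example = [Segment("GENE_A", 0.5, 1.0), Segment("GENE_B", 0.5, 0.0)]
general_score, phenotype_score = score_sv(example)
gene_ranking = rank_genes(example)
```

In this toy case the SV as a whole scores higher than either segment alone, while the phenotype-weighted ranking puts the gene with a phenotype match first. A learned attention-based aggregation would replace `noisy_or` in a real model; the fixed rule is used here only to keep the example self-contained.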

Suggested Citation

  • Zhuoran Xu & Quan Li & Luigi Marchionni & Kai Wang, 2023. "PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-43651-y
    DOI: 10.1038/s41467-023-43651-y

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-43651-y
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alexander Martinez-Fundichely & Austin Dixon & Ekta Khurana, 2022. "Modeling tissue-specific breakpoint proximity of structural variations from whole-genomes to identify cancer drivers," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    2. Parithi Balachandran & Isha A. Walawalkar & Jacob I. Flores & Jacob N. Dayton & Peter A. Audano & Christine R. Beck, 2022. "Transposable element-mediated rearrangements are prevalent in human genomes," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    3. Yirong Shi & Yiwei Niu & Peng Zhang & Huaxia Luo & Shuai Liu & Sijia Zhang & Jiajia Wang & Yanyan Li & Xinyue Liu & Tingrui Song & Tao Xu & Shunmin He, 2023. "Characterization of genome-wide STR variation in 6487 human genomes," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    4. Liyuan Zhou & Qiongzi Qiu & Qing Zhou & Jianwei Li & Mengqian Yu & Kezhen Li & Lingling Xu & Xiaohui Ke & Haiming Xu & Bingjian Lu & Hui Wang & Weiguo Lu & Pengyuan Liu & Yan Lu, 2022. "Long-read sequencing unveils high-resolution HPV integration and its oncogenic progression in cervical cancer," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    5. Andrea Wilderman & Eva D’haene & Machteld Baetens & Tara N. Yankee & Emma Wentworth Winchester & Nicole Glidden & Ellen Roets & Jo Dorpe & Sandra Janssens & Danny E. Miller & Miranda Galey & Kari M. B, 2024. "A distant global control region is essential for normal expression of anterior HOXA genes during mouse and human craniofacial development," Nature Communications, Nature, vol. 15(1), pages 1-23, December.
    6. Oriol Pich & Iker Reyes-Salazar & Abel Gonzalez-Perez & Nuria Lopez-Bigas, 2022. "Discovering the drivers of clonal hematopoiesis," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    7. Ada J. S. Chan & Worrawat Engchuan & Miriam S. Reuter & Zhuozhi Wang & Bhooma Thiruvahindrapuram & Brett Trost & Thomas Nalpathamkalam & Carol Negrijn & Sylvia Lamoureux & Giovanna Pellecchia & Rohan , 2022. "Genome-wide rare variant score associates with morphological subtypes of autism spectrum disorder," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    8. Peter H. Dixon & Adam P. Levine & Inês Cebola & Melanie M. Y. Chan & Aliya S. Amin & Anshul Aich & Monika Mozere & Hannah Maude & Alice L. Mitchell & Jun Zhang & Jenny Chambers & Argyro Syngelaki & Je, 2022. "GWAS meta-analysis of intrahepatic cholestasis of pregnancy implicates multiple hepatic genes and regulatory elements," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    9. Jinlong Shi & Zhilong Jia & Jinxiu Sun & Xiaoreng Wang & Xiaojing Zhao & Chenghui Zhao & Fan Liang & Xinyu Song & Jiawei Guan & Xue Jia & Jing Yang & Qi Chen & Kang Yu & Qian Jia & Jing Wu & Depeng Wa, 2023. "Structural variants involved in high-altitude adaptation detected using single-molecule long-read sequencing," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    10. Fengju Chen & Yiqun Zhang & Darshan S. Chandrashekar & Sooryanarayana Varambally & Chad J. Creighton, 2023. "Global impact of somatic structural variation on the cancer proteome," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    11. Mischan Vali-Pour & Solip Park & Jose Espinosa-Carrasco & Daniel Ortiz-Martínez & Ben Lehner & Fran Supek, 2022. "The impact of rare germline variants on human somatic mutation processes," Nature Communications, Nature, vol. 13(1), pages 1-21, December.
    12. Remo Monti & Pia Rautenstrauch & Mahsa Ghanbari & Alva Rani James & Matthias Kirchler & Uwe Ohler & Stefan Konigorski & Christoph Lippert, 2022. "Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    13. Zhen-Hui Wang & Xin-Feng Wang & Tianyuan Lu & Ming-Rui Li & Peng Jiang & Jing Zhao & Si-Tong Liu & Xue-Qi Fu & Jonathan F. Wendel & Yves Peer & Bao Liu & Lin-Feng Li, 2022. "Reshuffling of the ancestral core-eudicot genome shaped chromatin topology and epigenetic modification in Panax," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    14. Prabal Das & D. A. Sachindra & Kironmala Chanda, 2022. "Machine Learning-Based Rainfall Forecasting with Multiple Non-Linear Feature Selection Algorithms," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 36(15), pages 6043-6071, December.
    15. Paulo Infante & Gonçalo Jacinto & Anabela Afonso & Leonor Rego & Pedro Nogueira & Marcelo Silva & Vitor Nogueira & José Saias & Paulo Quaresma & Daniel Santos & Patrícia Góis & Paulo Rebelo Manuel, 2023. "Factors That Influence the Type of Road Traffic Accidents: A Case Study in a District of Portugal," Sustainability, MDPI, vol. 15(3), pages 1-16, January.
    16. Vincent Michaud & Eulalie Lasseaux & David J. Green & Dave T. Gerrard & Claudio Plaisant & Tomas Fitzgerald & Ewan Birney & Benoît Arveiler & Graeme C. Black & Panagiotis I. Sergouniotis, 2022. "The contribution of common regulatory and protein-coding TYR variants to the genetic architecture of albinism," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
    17. Ephrem Habyarimana & Faheem S Baloch, 2021. "Machine learning models based on remote and proximal sensing as potential methods for in-season biomass yields prediction in commercial sorghum fields," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-23, March.
    18. Alexander Wettstein & Gabriel Jenni & Ida Schneider & Fabienne Kühne & Martin grosse Holtforth & Roberto La Marca, 2023. "Predictors of Psychological Strain and Allostatic Load in Teachers: Examining the Long-Term Effects of Biopsychosocial Risk and Protective Factors Using a LASSO Regression Approach," IJERPH, MDPI, vol. 20(10), pages 1-20, May.
    19. Tang, Kayu & Parsons, David J. & Jude, Simon, 2019. "Comparison of automatic and guided learning for Bayesian networks to analyse pipe failures in the water distribution system," Reliability Engineering and System Safety, Elsevier, vol. 186(C), pages 24-36.
    20. Zhikun Wu & Zehang Jiang & Tong Li & Chuanbo Xie & Liansheng Zhao & Jiaqi Yang & Shuai Ouyang & Yizhi Liu & Tao Li & Zhi Xie, 2021. "Structural variants in the Chinese population and their impact on phenotypes, diseases and population adaptation," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
