Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-025-57756-z.html

SVLearn: a dual-reference machine learning approach enables accurate cross-species genotyping of structural variants

Author

Listed:
  • Qimeng Yang

    (Northwest A&F University)

  • Jianfeng Sun

    (University of Oxford)

  • Xinyu Wang

    (Northwest A&F University)

  • Jiong Wang

    (Northwest A&F University)

  • Quanzhong Liu

    (Northwest A&F University)

  • Jinlong Ru

    (Helmholtz Centre Munich - German Research Centre for Environmental Health)

  • Xin Zhang

    (Northwest A&F University)

  • Sizhe Wang

    (Northwest A&F University)

  • Ran Hao

    (Northwest A&F University)

  • Peipei Bian

    (Northwest A&F University)

  • Xuelei Dai

    (Northwest A&F University
    Yazhouwan National Laboratory)

  • Mian Gong

    (Northwest A&F University
    Chinese Academy of Agricultural Sciences (CAAS))

  • Zhuangbiao Zhang

    (Northwest A&F University)

  • Ao Wang

    (Northwest A&F University)

  • Fengting Bai

    (Northwest A&F University)

  • Ran Li

    (Northwest A&F University)

  • Yudong Cai

    (Northwest A&F University)

  • Yu Jiang

    (Northwest A&F University)

Abstract

Structural variations (SVs) are diverse forms of genetic alteration and drive a wide range of human diseases. Accurately genotyping SVs from short-read sequencing data, particularly those in repetitive genomic regions, remains challenging. Here, we introduce SVLearn, a machine-learning approach for genotyping bi-allelic SVs. It exploits a dual-reference strategy to engineer a curated set of genomic, alignment, and genotyping features based on a reference genome in concert with an allele-based alternative genome. Using 38,613 human-derived SVs, we show that SVLearn significantly outperforms four state-of-the-art tools, with precision improvements of up to 15.61% for insertions and 13.75% for deletions in repetitive regions. On two additional sets of 121,435 cattle SVs and 113,042 sheep SVs, SVLearn generalizes well to cross-species SV genotyping, with a weighted genotype concordance score of up to 90%. Notably, SVLearn genotypes SVs accurately at low sequencing coverage, with accuracy comparable to that at 30× coverage. Our results suggest that SVLearn can accelerate the understanding of associations between genome-scale, high-quality genotyped SVs and diseases across multiple species.
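
The abstract reports a weighted genotype concordance score without defining it. As a rough illustration only, the sketch below assumes the definition commonly used in SV genotyping benchmarks: concordance is computed separately for each true genotype class (0/0, 0/1, 1/1) and the per-class values are averaged, so that the abundant 0/0 class does not dominate the score. The function names, the input dictionaries, and the choice to count a missing call as incorrect are assumptions made for this example and are not taken from SVLearn.

```python
from collections import defaultdict

# Bi-allelic genotype classes, in canonical unphased notation.
GENOTYPE_CLASSES = ("0/0", "0/1", "1/1")


def normalize(gt):
    """Convert a genotype such as '1|0' or '1/0' to unphased, sorted form ('0/1')."""
    alleles = sorted(gt.replace("|", "/").split("/"))
    return "/".join(alleles)


def weighted_genotype_concordance(truth, called):
    """Average the per-class concordance over the genotype classes present in truth.

    truth  -- dict mapping SV id to the true genotype string
    called -- dict mapping SV id to the genotype produced by a genotyper

    Assumption for this sketch: an SV that is missing from `called`
    counts as an incorrect call for its true genotype class.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sv_id, true_gt in truth.items():
        true_gt = normalize(true_gt)
        if true_gt not in GENOTYPE_CLASSES:
            continue  # skip missing ('./.') or multi-allelic truth genotypes
        total[true_gt] += 1
        call = called.get(sv_id)
        if call is not None and normalize(call) == true_gt:
            correct[true_gt] += 1

    per_class = [correct[g] / total[g] for g in GENOTYPE_CLASSES if total[g] > 0]
    if not per_class:
        raise ValueError("no usable truth genotypes")
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Hypothetical toy data, not results from the paper.
    truth = {"sv1": "0/0", "sv2": "0/0", "sv3": "0/1", "sv4": "1/1"}
    called = {"sv1": "0/0", "sv2": "0/1", "sv3": "1|0", "sv4": "1/1"}
    print(weighted_genotype_concordance(truth, called))
```

In the toy data, half of the 0/0 calls and all of the 0/1 and 1/1 calls match, so the score is the mean of 0.5, 1.0 and 1.0 rather than the plain fraction of matching calls.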

Suggested Citation

  • Qimeng Yang & Jianfeng Sun & Xinyu Wang & Jiong Wang & Quanzhong Liu & Jinlong Ru & Xin Zhang & Sizhe Wang & Ran Hao & Peipei Bian & Xuelei Dai & Mian Gong & Zhuangbiao Zhang & Ao Wang & Fengting Bai , 2025. "SVLearn: a dual-reference machine learning approach enables accurate cross-species genotyping of structural variants," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-57756-z
    DOI: 10.1038/s41467-025-57756-z

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-57756-z
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Alexander S. Leonard & Danang Crysnanto & Zih-Hua Fang & Michael P. Heaton & Brian L. Vander Ley & Carolina Herrera & Heinrich Bollwein & Derek M. Bickhart & Kristen L. Kuhn & Timothy P. L. Smith & Be, 2022. "Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    2. Wen-Wei Liao & Mobin Asri & Jana Ebler & Daniel Doerr & Marina Haukness & Glenn Hickey & Shuangjia Lu & Julian K. Lucas & Jean Monlong & Haley J. Abel & Silvia Buonaiuto & Xian H. Chang & Haoyu Cheng , 2023. "A draft human pangenome reference," Nature, Nature, vol. 617(7960), pages 312-324, May.
    3. Ting-Ting Li & Tian Xia & Jia-Qi Wu & Hao Hong & Zhao-Lin Sun & Ming Wang & Fang-Rong Ding & Jing Wang & Shuai Jiang & Jin Li & Jie Pan & Guang Yang & Jian-Nan Feng & Yun-Ping Dai & Xue-Min Zhang & Ta, 2023. "De novo genome assembly depicts the immune genomic characteristics of cattle," Nature Communications, Nature, vol. 14(1), pages 1-14, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sean A. Misek & Aaron Fultineer & Jeremie Kalfon & Javad Noorbakhsh & Isabella Boyle & Priyanka Roy & Joshua Dempster & Lia Petronio & Katherine Huang & Alham Saadat & Thomas Green & Adam Brown & John, 2024. "Germline variation contributes to false negatives in CRISPR-based experiments with varying burden across ancestries," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    2. Leslie A. Smith & James A. Cahill & Ji-Hyun Lee & Kiley Graim, 2025. "Equitable machine learning counteracts ancestral bias in precision medicine," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    3. Xiao Chen & Daniel Baker & Egor Dolzhenko & Joseph M. Devaney & Jessica Noya & April S. Berlyoung & Rhonda Brandon & Kathleen S. Hruska & Lucas Lochovsky & Paul Kruszka & Scott Newman & Emily Farrow &, 2025. "Genome-wide profiling of highly similar paralogous genes using HiFi sequencing," Nature Communications, Nature, vol. 16(1), pages 1-13, December.
    4. Xinfeng Liu & Wenyu Liu & Johannes A. Lenstra & Zeyu Zheng & Xiaoyun Wu & Jiao Yang & Bowen Li & Yongzhi Yang & Qiang Qiu & Hongyu Liu & Kexin Li & Chunnian Liang & Xian Guo & Xiaoming Ma & Richard J., 2023. "Evolutionary origin of genomic structural variations in domestic yaks," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    5. Celine A. Manigbas & Bharati Jadhav & Paras Garg & Mariya Shadrina & William Lee & Gabrielle Altman & Alejandro Martin-Trujillo & Andrew J. Sharp, 2024. "A phenome-wide association study of tandem repeat variation in 168,554 individuals from the UK Biobank," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    6. Jiao Gong & Huiru Sun & Kaiyuan Wang & Yanhui Zhao & Yechao Huang & Qinsheng Chen & Hui Qiao & Yang Gao & Jialin Zhao & Yunchao Ling & Ruifang Cao & Jingze Tan & Qi Wang & Yanyun Ma & Jing Li & Jingch, 2025. "Long-read sequencing of 945 Han individuals identifies structural variants associated with phenotypic diversity and disease susceptibility," Nature Communications, Nature, vol. 16(1), pages 1-21, December.
    7. Sarah A. Mueller & Justin Merondun & Sonja Lečić & Jochen B. W. Wolf, 2025. "Epigenetic variation in light of population genetic practice," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    8. Caitlin Guccione & Lucas Patel & Yoshihiko Tomofuji & Daniel McDonald & Antonio Gonzalez & Gregory D. Sepich-Poore & Kyuto Sonehara & Mohsen Zakeri & Yang Chen & Amanda Hazel Dilmore & Neil Damle & Se, 2025. "Incomplete human reference genomes can drive false sex biases and expose patient-identifying information in metagenomic data," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    9. Cristian Groza & Xun Chen & Travis J. Wheeler & Guillaume Bourque & Clément Goubert, 2024. "A unified framework to analyze transposable element insertion polymorphisms using graph genomes," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    10. Tuomas Hämälä & Christopher Moore & Laura Cowan & Matthew Carlile & David Gopaulchan & Marie K. Brandrud & Siri Birkeland & Matthew Loose & Filip Kolář & Marcus A. Koch & Levi Yant, 2024. "Impact of whole-genome duplications on structural variant evolution in Cochlearia," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    11. Cristian Groza & Carl Schwendinger-Schreck & Warren A. Cheung & Emily G. Farrow & Isabelle Thiffault & Juniper Lake & William B. Rizzo & Gilad Evrony & Tom Curran & Guillaume Bourque & Tomi Pastinen, 2024. "Pangenome graphs improve the analysis of structural variants in rare genetic diseases," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    12. Tobias T. Schmidt & Carly Tyer & Preeyesh Rughani & Candy Haggblom & Jeffrey R. Jones & Xiaoguang Dai & Kelly A. Frazer & Fred H. Gage & Sissel Juul & Scott Hickey & Jan Karlseder, 2024. "High resolution long-read telomere sequencing reveals dynamic mechanisms in aging and cancer," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    13. Oscar Florez-Vargas & Michelle Ho & Maxwell H. Hogshead & Brenen W. Papenberg & Chia-Han Lee & Kaitlin Forsythe & Kristine Jones & Wen Luo & Kedest Teshome & Cornelis Blauwendraat & Kimberly J. Billin, 2025. "Genetic regulation of TERT splicing affects cancer risk by altering cellular longevity and replicative potential," Nature Communications, Nature, vol. 16(1), pages 1-20, December.
    14. Justin Wagner & Nathan D. Olson & Jennifer McDaniel & Lindsay Harris & Brendan J. Pinto & David Jáspez & Adrián Muñoz-Barrera & Luis A. Rubio-Rodríguez & José M. Lorenzo-Salazar & Carlos Flores & Saye, 2025. "Small variant benchmark from a complete assembly of X and Y chromosomes," Nature Communications, Nature, vol. 16(1), pages 1-7, December.
    15. Adam C. English & Fabio Cunial & Ginger A. Metcalf & Richard A. Gibbs & Fritz J. Sedlazeck, 2025. "K-mer analysis of long-read alignment pileups for structural variant genotyping," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
    16. Wolfram Höps & Tobias Rausch & Michael Jendrusch & Jan O. Korbel & Fritz J. Sedlazeck, 2024. "Impact and characterization of serial structural variations across humans and great apes," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    17. Can Luo & Yichen Henry Liu & Xin Maizie Zhou, 2024. "VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
