
Semi-automated assembly of high-quality diploid human reference genomes

Author

Listed:
  • Erich D. Jarvis

    (The Rockefeller University
    Howard Hughes Medical Institute)

  • Giulio Formenti

    (The Rockefeller University)

  • Arang Rhie

    (National Institutes of Health)

  • Andrea Guarracino

    (Viale Rita Levi-Montalcini)

  • Chentao Yang

    (BGI-Shenzhen)

  • Jonathan Wood

    (Wellcome Sanger Institute)

  • Alan Tracey

    (Wellcome Sanger Institute)

  • Francoise Thibaud-Nissen

    (National Institutes of Health)

  • Mitchell R. Vollger

    (University of Washington School of Medicine)

  • David Porubsky

    (University of Washington School of Medicine)

  • Haoyu Cheng

    (Dana-Farber Cancer Institute
    Harvard Medical School)

  • Mobin Asri

    (University of California)

  • Glennis A. Logsdon

    (University of Washington School of Medicine)

  • Paolo Carnevali

    (Chan Zuckerberg Initiative)

  • Mark J. P. Chaisson

    (University of Southern California)

  • Chen-Shan Chin

    (Foundation for Biological Data Science)

  • Sarah Cody

    (Washington University School of Medicine)

  • Joanna Collins

    (Wellcome Sanger Institute)

  • Peter Ebert

    (Heinrich Heine University)

  • Merly Escalona

    (University of California Santa Cruz)

  • Olivier Fedrigo

    (The Rockefeller University)

  • Robert S. Fulton

    (Washington University School of Medicine)

  • Lucinda L. Fulton

    (Washington University School of Medicine)

  • Shilpa Garg

    (University of Copenhagen)

  • Jennifer L. Gerton

    (Stowers Institute for Medical Research)

  • Jay Ghurye

    (Dovetail Genomics)

  • Anastasiya Granat

    (Illumina, Inc.)

  • Richard E. Green

    (University of California)

  • William Harvey

    (University of Washington School of Medicine)

  • Patrick Hasenfeld

    (Genome Biology Unit)

  • Alex Hastie

    (Bionano Genomics)

  • Marina Haukness

    (University of California)

  • Erich B. Jaeger

    (Illumina, Inc.)

  • Miten Jain

    (University of California)

  • Melanie Kirsche

    (Johns Hopkins University)

  • Mikhail Kolmogorov

    (University of California San Diego)

  • Jan O. Korbel

    (Genome Biology Unit)

  • Sergey Koren

    (National Institutes of Health)

  • Jonas Korlach

    (Pacific Biosciences)

  • Joyce Lee

    (Bionano Genomics)

  • Daofeng Li

    (Washington University School of Medicine)

  • Tina Lindsay

    (Washington University School of Medicine)

  • Julian Lucas

    (University of California)

  • Feng Luo

    (Clemson University)

  • Tobias Marschall

    (Heinrich Heine University)

  • Matthew W. Mitchell

    (Coriell Institute for Medical Research)

  • Jennifer McDaniel

    (National Institute of Standards and Technology)

  • Fan Nie

    (Central South University)

  • Hugh E. Olsen

    (University of California)

  • Nathan D. Olson

    (National Institute of Standards and Technology)

  • Trevor Pesout

    (University of California)

  • Tamara Potapova

    (Stowers Institute for Medical Research)

  • Daniela Puiu

    (Johns Hopkins University)

  • Allison Regier

    (DNAnexus)

  • Jue Ruan

    (Chinese Academy of Agricultural Sciences)

  • Steven L. Salzberg

    (Johns Hopkins University)

  • Ashley D. Sanders

    (Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC))

  • Michael C. Schatz

    (Johns Hopkins University)

  • Anthony Schmitt

    (Arima Genomics)

  • Valerie A. Schneider

    (National Institutes of Health)

  • Siddarth Selvaraj

    (Arima Genomics)

  • Kishwar Shafin

    (University of California)

  • Alaina Shumate

    (Johns Hopkins University)

  • Nathan O. Stitziel

    (Washington University School of Medicine)

  • Catherine Stober

    (Genome Biology Unit)

  • James Torrance

    (Wellcome Sanger Institute)

  • Justin Wagner

    (National Institute of Standards and Technology)

  • Jianxin Wang

    (Central South University)

  • Aaron Wenger

    (Pacific Biosciences)

  • Chuanle Xiao

    (Sun Yat-sen University)

  • Aleksey V. Zimin

    (Johns Hopkins University)

  • Guojie Zhang

    (Zhejiang University School of Medicine)

  • Ting Wang

    (Washington University School of Medicine)

  • Heng Li

    (Dana-Farber Cancer Institute)

  • Erik Garrison

    (University of Tennessee Health Science Center)

  • David Haussler

    (Howard Hughes Medical Institute
    University of California Santa Cruz)

  • Ira Hall

    (Yale School of Medicine)

  • Justin M. Zook

    (National Institute of Standards and Technology)

  • Evan E. Eichler

    (Howard Hughes Medical Institute
    University of Washington School of Medicine)

  • Adam M. Phillippy

    (National Institutes of Health)

  • Benedict Paten

    (University of California)

  • Kerstin Howe

    (Wellcome Sanger Institute)

  • Karen H. Miga

    (University of California)

Abstract

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Suggested Citation

  • Erich D. Jarvis & Giulio Formenti & Arang Rhie & Andrea Guarracino & Chentao Yang & Jonathan Wood & Alan Tracey & Francoise Thibaud-Nissen & Mitchell R. Vollger & David Porubsky & Haoyu Cheng & Mobin , 2022. "Semi-automated assembly of high-quality diploid human reference genomes," Nature, Nature, vol. 611(7936), pages 519-531, November.
  • Handle: RePEc:nat:nature:v:611:y:2022:i:7936:d:10.1038_s41586-022-05325-5
    DOI: 10.1038/s41586-022-05325-5
    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-022-05325-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Citations

    Cited by:

    1. Tim Dunn & Satish Narayanasamy, 2023. "vcfdist: accurately benchmarking phased small variant calls in human genomes," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    2. Fan Nie & Peng Ni & Neng Huang & Jun Zhang & Zhenyu Wang & Chuanle Xiao & Feng Luo & Jianxin Wang, 2024. "De novo diploid genome assembly using long noisy reads," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    3. Taotao Li & Duo Du & Dandan Zhang & Yicheng Lin & Jiakang Ma & Mengyu Zhou & Weida Meng & Zelin Jin & Ziqiang Chen & Haozhe Yuan & Jue Wang & Shulong Dong & Shaoyang Sun & Wenjing Ye & Bosen Li & Houb, 2023. "CRISPR-based targeted haplotype-resolved assembly of a megabase region," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
