Printed from https://ideas.repec.org/a/nat/natcom/v9y2018i1d10.1038_s41467-018-06159-4.html

Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects

Authors

Listed:
  • Allison A. Regier

    (McDonnell Genome Institute, Washington University School of Medicine)

  • Yossi Farjoun

    (Broad Institute of MIT and Harvard)

  • David E. Larson

    (McDonnell Genome Institute, Washington University School of Medicine)

  • Olga Krasheninina

    (Human Genome Sequencing Center, Baylor College of Medicine)

  • Hyun Min Kang

    (University of Michigan)

  • Daniel P. Howrigan

    (Broad Institute of MIT and Harvard)

  • Bo-Juen Chen

    (New York Genome Center; Google)

  • Manisha Kher

    (New York Genome Center)

  • Eric Banks

    (Broad Institute of MIT and Harvard)

  • Darren C. Ames

    (DNAnexus Inc)

  • Adam C. English

    (Spiral Genetics)

  • Heng Li

    (Broad Institute of MIT and Harvard)

  • Jinchuan Xing

    (Rutgers University)

  • Yeting Zhang

    (Rutgers University)

  • Tara Matise

    (Rutgers University)

  • Goncalo R. Abecasis

    (University of Michigan)

  • Will Salerno

    (Human Genome Sequencing Center, Baylor College of Medicine)

  • Michael C. Zody

    (New York Genome Center)

  • Benjamin M. Neale

    (Broad Institute of MIT and Harvard; Massachusetts General Hospital)

  • Ira M. Hall

    (McDonnell Genome Institute, Washington University School of Medicine)

Abstract

Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.

Suggested Citation

  • Allison A. Regier & Yossi Farjoun & David E. Larson & Olga Krasheninina & Hyun Min Kang & Daniel P. Howrigan & Bo-Juen Chen & Manisha Kher & Eric Banks & Darren C. Ames & Adam C. English & Heng Li & J, 2018. "Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects," Nature Communications, Nature, vol. 9(1), pages 1-8, December.
  • Handle: RePEc:nat:natcom:v:9:y:2018:i:1:d:10.1038_s41467-018-06159-4
    DOI: 10.1038/s41467-018-06159-4

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-018-06159-4
    File Function: Abstract
    Download Restriction: no


    Citations



    Cited by:

    1. Yuk Yee Leung & Adam C. Naj & Yi-Fan Chou & Otto Valladares & Michael Schmidt & Kara Hamilton-Nelson & Nicholas Wheeler & Honghuang Lin & Prabhakaran Gangadharan & Liming Qu & Kaylyn Clark & Amanda B., 2024. "Human whole-exome genotype data for Alzheimer’s disease," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    2. Wenan Chen & Shuoguo Wang & Saima Sultana Tithi & David W. Ellison & Daniel J. Schaid & Gang Wu, 2022. "A rare variant analysis framework using public genotype summary counts to prioritize disease-predisposition genes," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    3. Nazia Pathan & Wei Q. Deng & Matteo Di Scipio & Mohammad Khan & Shihong Mao & Robert W. Morton & Ricky Lali & Marie Pigeyre & Michael R. Chong & Guillaume Paré, 2024. "A method to estimate the contribution of rare coding variants to complex trait heritability," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    4. Marsha M. Wheeler & Adrienne M. Stilp & Shuquan Rao & Bjarni V. Halldórsson & Doruk Beyter & Jia Wen & Anna V. Mihkaylova & Caitlin P. McHugh & John Lane & Min-Zhi Jiang & Laura M. Raffield & Goo Jun , 2022. "Whole genome sequencing identifies structural variants contributing to hematologic traits in the NHLBI TOPMed program," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
