Printed from https://ideas.repec.org/a/plo/pcbi00/1002604.html

Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation

Author

Listed:
  • Jason Flannick
  • Joshua M Korn
  • Pierre Fontanillas
  • George B Grant
  • Eric Banks
  • Mark A Depristo
  • David Altshuler

Abstract

High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms, when low coverage sequence reads are added to dense genome-wide SNP arrays; the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome-wide association studies and supports development of improved methods for genotype calling.

Author Summary: In this work we address a series of questions prompted by the rise of next-generation sequencing as a data collection strategy for genetic studies. How does low coverage sequencing compare to traditional microarray-based genotyping? Do studies increase sensitivity by collecting both sequencing and array data? What can we learn about technology error modes based on analysis of SNPs for which sequence and array data disagree? To answer these questions, we developed a statistical framework to estimate genotypes from sequence reads, array intensities, and imputation. Through experiments with intensity and read data from the HapMap and 1000 Genomes (1000 G) Projects, we show that 1 M SNP arrays used for genome-wide association studies perform similarly to 1× sequencing. We find that adding low coverage sequence reads to dense array data significantly increases rare variant sensitivity, but adding dense array data to low coverage sequencing has only a small impact. Finally, we describe an improved SNP calling algorithm used in the 1000 G project, inspired by a novel next-generation sequencing error mode identified through analysis of disputed SNPs. These results inform the use of next-generation sequencing in genetic studies and model an approach to further improve genotype calling methods.
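The abstract does not spell out how the joint framework works, so the following is only an illustrative sketch of one common way to combine evidence sources, not the authors' method. It assumes that, given the true genotype, sequence reads and array intensities are conditionally independent, that an imputation step supplies a prior over the three diploid genotypes, and that reads follow a simple binomial error model. All function names, the error rate, and the example numbers are hypothetical. A small helper also shows one possible definition of sensitivity and specificity for non-reference calls; the paper's exact definitions may differ.

```python
def read_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Likelihood of the observed reads for genotypes with 0, 1, or 2 alt alleles.

    Assumes a simple binomial error model (an assumption, not taken from the paper).
    """
    likelihoods = []
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2
        # Chance that a single read shows the alt allele under this genotype.
        p_alt = frac_alt * (1 - error_rate) + (1 - frac_alt) * error_rate
        likelihoods.append(p_alt ** alt_reads * (1 - p_alt) ** ref_reads)
    return likelihoods


def joint_posterior(prior, *likelihoods):
    """Multiply a genotype prior by any number of per-source likelihoods
    and normalize, assuming the sources are conditionally independent."""
    unnormalized = list(prior)
    for source in likelihoods:
        unnormalized = [u * l for u, l in zip(unnormalized, source)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def sensitivity_specificity(called, truth):
    """Treat alt-allele count > 0 as a positive (non-reference) call.

    This is one possible definition, chosen for illustration only.
    """
    pairs = list(zip(called, truth))
    tp = sum(1 for c, t in pairs if c > 0 and t > 0)
    fn = sum(1 for c, t in pairs if c == 0 and t > 0)
    tn = sum(1 for c, t in pairs if c == 0 and t == 0)
    fp = sum(1 for c, t in pairs if c > 0 and t == 0)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical inputs: a rare-variant prior from imputation, one alt read
    # at roughly 1x coverage, and an array likelihood that favors a heterozygote.
    imputation_prior = [0.90, 0.09, 0.01]
    array_likelihood = [0.2, 0.7, 0.1]
    posterior = joint_posterior(
        imputation_prior, read_likelihoods(0, 1), array_likelihood
    )
    print(posterior)
    print(sensitivity_specificity([0, 1, 0, 2], [0, 1, 1, 0]))
```

In this toy setup a single alternate read is weak evidence on its own, but it can shift the posterior once combined with the other sources, which loosely mirrors the abstract's point that low coverage reads add information to dense array data.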

Suggested Citation

  • Jason Flannick & Joshua M Korn & Pierre Fontanillas & George B Grant & Eric Banks & Mark A Depristo & David Altshuler, 2012. "Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation," PLOS Computational Biology, Public Library of Science, vol. 8(7), pages 1-13, July.
  • Handle: RePEc:plo:pcbi00:1002604
    DOI: 10.1371/journal.pcbi.1002604

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002604
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1002604&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. repec:plo:pgen00:1002078 is not listed on IDEAS
    2. Elaine T. Lim & Yingleong Chan & Pepper Dawes & Xiaoge Guo & Serkan Erdin & Derek J. C. Tai & Songlei Liu & Julia M. Reichert & Mannix J. Burns & Ying Kai Chan & Jessica J. Chiang & Katharina Meyer & , 2022. "Orgo-Seq integrates single-cell and bulk transcriptomic data to identify cell type specific-driver genes associated with autism spectrum disorder," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    3. Zura Kakushadze & Willie Yu, 2017. "Mutation Clusters from Cancer Exome," Papers 1707.08504, arXiv.org.
    4. Gyaneshwer Chaubey & Anurag Kadian & Saroj Bala & Vadlamudi Raghavendra Rao, 2015. "Genetic Affinity of the Bhil, Kol and Gond Mentioned in Epic Ramayana," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-11, June.
    5. Thomas L. Schmidt & Nancy M. Endersby-Harshman & Anthony R. J. Rooyen & Michelle Katusele & Rebecca Vinit & Leanne J. Robinson & Moses Laman & Stephan Karl & Ary A. Hoffmann, 2024. "Global, asynchronous partial sweeps at multiple insecticide resistance genes in Aedes mosquitoes," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    6. Gideon S Bradburd & Peter L Ralph & Graham M Coop, 2016. "A Spatial Framework for Understanding Population Structure and Admixture," PLOS Genetics, Public Library of Science, vol. 12(1), pages 1-38, January.
    7. Kevin J Liu & Jingxuan Dai & Kathy Truong & Ying Song & Michael H Kohn & Luay Nakhleh, 2014. "An HMM-Based Comparative Genomic Framework for Detecting Introgression in Eukaryotes," PLOS Computational Biology, Public Library of Science, vol. 10(6), pages 1-13, June.
    8. Kimmel, Marek & Wojdyła, Tomasz, 2016. "Genetic demographic networks: Mathematical model and applications," Theoretical Population Biology, Elsevier, vol. 111(C), pages 75-86.
    9. Juraj Bergman & Rasmus Ø. Pedersen & Erick J. Lundgren & Rhys T. Lemoine & Sophie Monsarrat & Elena A. Pearce & Mikkel H. Schierup & Jens-Christian Svenning, 2023. "Worldwide Late Pleistocene and Early Holocene population declines in extant megafauna are associated with Homo sapiens expansion rather than climate change," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    10. Mason Liang & Mikhail Shishkin & Anastasia Mikhailova & Vladimir Shchur & Rasmus Nielsen, 2022. "Estimating the timing of multiple admixture events using 3-locus linkage disequilibrium," PLOS Genetics, Public Library of Science, vol. 18(7), pages 1-17, July.
    11. Per Unneberg & Mårten Larsson & Anna Olsson & Ola Wallerman & Anna Petri & Ignas Bunikis & Olga Vinnere Pettersson & Chiara Papetti & Astthor Gislason & Henrik Glenner & Joan E. Cartes & Leocadio Blan, 2024. "Ecological genomics in the Northern krill uncovers loci for local adaptation across ocean basins," Nature Communications, Nature, vol. 15(1), pages 1-29, December.
    12. Michael Bridges & Elizabeth A Heron & Colm O'Dushlaine & Ricardo Segurado & The International Schizophrenia Consortium (ISC) & Derek Morris & Aiden Corvin & Michael Gill & Carlos Pinto, 2011. "Genetic Classification of Populations Using Supervised Learning," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-12, May.
    13. Yvonne Willi & Kay Lucek & Olivier Bachmann & Nora Walden, 2022. "Recent speciation associated with range expansion and a shift to self-fertilization in North American Arabidopsis," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    14. Kay Young McChesney, 2015. "Teaching Diversity," SAGE Open, , vol. 5(4), pages 21582440156, October.
    15. Priya Moorjani & Nick Patterson & Joel N Hirschhorn & Alon Keinan & Li Hao & Gil Atzmon & Edward Burns & Harry Ostrer & Alkes L Price & David Reich, 2011. "The History of African Gene Flow into Southern Europeans, Levantines, and Jews," PLOS Genetics, Public Library of Science, vol. 7(4), pages 1-13, April.
    16. Barbara E Engelhardt & Matthew Stephens, 2010. "Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis," PLOS Genetics, Public Library of Science, vol. 6(9), pages 1-12, September.
    17. Yedael Y Waldman & Arjun Biddanda & Natalie R Davidson & Paul Billing-Ross & Maya Dubrovsky & Christopher L Campbell & Carole Oddoux & Eitan Friedman & Gil Atzmon & Eran Halperin & Harry Ostrer & Alon, 2016. "The Genetics of Bene Israel from India Reveals Both Substantial Jewish and Indian Ancestry," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-28, March.
    18. Priya Moorjani & Nick Patterson & Po-Ru Loh & Mark Lipson & Péter Kisfali & Bela I Melegh & Michael Bonin & Ľudevít Kádaši & Olaf Rieß & Bonnie Berger & David Reich & Béla Melegh, 2013. "Reconstructing Roma History from Genome-Wide Data," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-11, March.
    19. Temple, Seth D. & Thompson, Elizabeth A., 2025. "Identity-by-descent segments in large samples," Theoretical Population Biology, Elsevier, vol. 165(C), pages 10-21.
    20. Ya-Mei Ding & Xiao-Xu Pang & Yu Cao & Wei-Ping Zhang & Susanne S. Renner & Da-Yong Zhang & Wei-Ning Bai, 2023. "Genome structure-based Juglandaceae phylogenies contradict alignment-based phylogenies and substitution rates vary with DNA repair genes," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    21. Romain Fournier & Zoi Tsangalidou & David Reich & Pier Francesco Palamara, 2023. "Haplotype-based inference of recent effective population size in modern and ancient DNA samples," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
