
Accurate and scalable variant calling from single cell DNA sequencing data with ProSolo

Authors
  • David Lähnemann

    (Helmholtz Centre for Infection Research
    Technische Universität Braunschweig
    Heinrich Heine University Düsseldorf
    University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf)

  • Johannes Köster

    (University of Duisburg-Essen
    Centrum Wiskunde & Informatica)

  • Ute Fischer

    (University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf)

  • Arndt Borkhardt

    (University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf)

  • Alice C. McHardy

    (Helmholtz Centre for Infection Research
    Technische Universität Braunschweig
    Heinrich Heine University Düsseldorf)

  • Alexander Schönhuth

    (Centrum Wiskunde & Informatica
    Bielefeld University)

Abstract

Accurate single cell mutational profiles can reveal genomic cell-to-cell heterogeneity. However, sequencing libraries suitable for genotyping require whole genome amplification, which introduces allelic bias and copy errors. The resulting data violates assumptions of variant callers developed for bulk sequencing. Thus, only dedicated models accounting for amplification bias and errors can provide accurate calls. We present ProSolo for calling single nucleotide variants from multiple displacement amplified (MDA) single cell DNA sequencing data. ProSolo probabilistically models a single cell jointly with a bulk sequencing sample and integrates all relevant MDA biases in a site-specific and scalable (because computationally efficient) manner. This achieves higher accuracy in calling and genotyping single nucleotide variants in single cells than state-of-the-art tools, and supports imputation of insufficiently covered genotypes when downstream tools cannot handle missing data. Moreover, ProSolo implements the first approach to control the false discovery rate reliably and flexibly. ProSolo is implemented in an extendable framework, with code and usage instructions at: https://github.com/prosolo/prosolo
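To make the idea of Bayesian false discovery rate control concrete, the sketch below shows one generic, textbook-style procedure; it is not taken from ProSolo and may differ from the method the paper describes. It assumes each candidate variant call already carries a posterior error probability (PEP), i.e. the posterior probability that the call is not a true variant. Calls are ranked from most to least confident, and the largest top-ranked set whose mean PEP (its expected FDR) does not exceed a target `alpha` is reported. The function name `bayesian_fdr_select` and its parameters are hypothetical.

```python
from typing import Sequence


def bayesian_fdr_select(pep: Sequence[float], alpha: float) -> list[int]:
    """Return indices of calls to report so that the expected FDR is <= alpha.

    Hypothetical illustration of a generic posterior-based FDR procedure,
    not ProSolo's implementation. pep[i] is the assumed posterior probability
    that call i is a false positive.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Rank candidate calls from most to least confident (smallest PEP first).
    order = sorted(range(len(pep)), key=lambda i: pep[i])
    selected: list[int] = []
    cumulative = 0.0
    for k, i in enumerate(order, start=1):
        cumulative += pep[i]
        # The expected FDR of the top-k set is the mean PEP of its members.
        # With ascending PEPs this mean never decreases, so the first
        # violation marks the end of the largest admissible set.
        if cumulative / k > alpha:
            break
        selected.append(i)
    return selected
```

For example, `bayesian_fdr_select([0.01, 0.5, 0.02, 0.2], 0.05)` returns `[0, 2]`: adding the third-ranked call (PEP 0.2) would raise the mean PEP to about 0.077, above the 0.05 target. Relaxing the target to 0.1 admits that call as well.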

Suggested Citation

  • David Lähnemann & Johannes Köster & Ute Fischer & Arndt Borkhardt & Alice C. McHardy & Alexander Schönhuth, 2021. "Accurate and scalable variant calling from single cell DNA sequencing data with ProSolo," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-26938-w
    DOI: 10.1038/s41467-021-26938-w

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-021-26938-w
    File Function: Abstract
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wei Sun & Chong Jin & Jonathan A. Gelfond & Ming‐Hui Chen & Joseph G. Ibrahim, 2020. "Joint analysis of single‐cell and bulk tissue sequencing data to infer intratumor heterogeneity," Biometrics, The International Biometric Society, vol. 76(3), pages 983-994, September.
    2. Haochen Zhang & Elias-Ramzey Karnoub & Shigeaki Umeda & Ronan Chaligné & Ignas Masilionis & Caitlin A. McIntyre & Palash Sashittal & Akimasa Hayashi & Amanda Zucker & Katelyn Mullen & Jungeui Hong & A, 2023. "Application of high-throughput single-nucleus DNA sequencing in pancreatic cancer," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    3. Seong-Hwan Jun & Hosein Toosi & Jeff Mold & Camilla Engblom & Xinsong Chen & Ciara O’Flanagan & Michael Hagemann-Jensen & Rickard Sandberg & Samuel Aparicio & Johan Hartman & Andrew Roth & Jens Lagerg, 2023. "Reconstructing clonal tree for phylo-phenotypic characterization of cancer using single-cell transcriptomics," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    4. Humberto Contreras-Trujillo & Jiya Eerdeng & Samir Akre & Du Jiang & Jorge Contreras & Basia Gala & Mary C. Vergel-Rodriguez & Yeachan Lee & Aparna Jorapur & Areen Andreasian & Lisa Harton & Charles S, 2021. "Deciphering intratumoral heterogeneity using integrated clonal tracking and single-cell transcriptome analyses," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    5. Ghosh Debashis, 2012. "Incorporating the Empirical Null Hypothesis into the Benjamini-Hochberg Procedure," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-21, July.
    6. Gómez-Villegas Miguel A. & Sanz Luis & Salazar Isabel, 2014. "A Bayesian decision procedure for testing multiple hypotheses in DNA microarray experiments," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(1), pages 49-65, February.
    7. Cheng-Kai Shiau & Lina Lu & Rachel Kieser & Kazutaka Fukumura & Timothy Pan & Hsiao-Yun Lin & Jie Yang & Eric L. Tong & GaHyun Lee & Yuanqing Yan & Jason T. Huse & Ruli Gao, 2023. "High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    8. Edsel Peña & Joshua Habiger & Wensong Wu, 2015. "Classes of multiple decision functions strongly controlling FWER and FDR," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 78(5), pages 563-595, July.
    9. Xiaoquan Wen, 2017. "Robust Bayesian FDR Control Using Bayes Factors, with Applications to Multi-tissue eQTL Discovery," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(1), pages 28-49, June.
    10. Zichen Ma & Shannon W. Davis & Yen‐Yi Ho, 2023. "Flexible copula model for integrating correlated multi‐omics data from single‐cell experiments," Biometrics, The International Biometric Society, vol. 79(2), pages 1559-1572, June.
    11. Noirrit Kiran Chandra & Sourabh Bhattacharya, 2021. "Asymptotic theory of dependent Bayesian multiple testing procedures under possible model misspecification," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(5), pages 891-920, October.
    12. Xiang Ge Luo & Jack Kuipers & Niko Beerenwinkel, 2023. "Joint inference of exclusivity patterns and recurrent trajectories from tumor mutation trees," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    13. Jinhyun Kim & Sungsik Kim & Huiran Yeom & Seo Woo Song & Kyoungseob Shin & Sangwook Bae & Han Suk Ryu & Ji Young Kim & Ahyoun Choi & Sumin Lee & Taehoon Ryu & Yeongjae Choi & Hamin Kim & Okju Kim & Yu, 2023. "Barcoded multiple displacement amplification for high coverage sequencing in spatial genomics," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    14. Brian M Lang & Jack Kuipers & Benjamin Misselwitz & Niko Beerenwinkel, 2020. "Predicting colorectal cancer risk from adenoma detection via a two-type branching process model," PLOS Computational Biology, Public Library of Science, vol. 16(2), pages 1-23, February.
    15. Sudipto Banerjee, 2023. "Discussion of “Optimal test procedures for multiple hypotheses controlling the familywise expected loss” by Willi Maurer, Frank Bretz, and Xiaolei Xun," Biometrics, The International Biometric Society, vol. 79(4), pages 2798-2801, December.
    16. Ashley T. Sendell-Price & Frank J. Tulenko & Mats Pettersson & Du Kang & Margo Montandon & Sylke Winkler & Kathleen Kulb & Gavin P. Naylor & Adam Phillippy & Olivier Fedrigo & Jacquelyn Mountcastle & , 2023. "Low mutation rate in epaulette sharks is consistent with a slow rate of evolution in sharks," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    17. Etienne Sollier & Jack Kuipers & Koichi Takahashi & Niko Beerenwinkel & Katharina Jahn, 2023. "COMPASS: joint copy number and mutation phylogeny reconstruction from amplicon single-cell sequencing data," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    18. Willi Maurer & Frank Bretz & Xiaolei Xun, 2023. "Rejoinder to discussions on “Optimal test procedures for multiple hypotheses controlling the familywise expected loss”," Biometrics, The International Biometric Society, vol. 79(4), pages 2811-2814, December.
    19. Kiranmoy Das, 2016. "A semiparametric Bayesian approach for joint modeling of longitudinal trait and event time," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(15), pages 2850-2865, November.
    20. Jack Jewson & Li Li & Laura Battaglia & Stephen Hansen & David Rossell & Piotr Zwiernik, 2022. "Graphical model inference with external network data," CeMMAP working papers 20/22, Institute for Fiscal Studies.
