Printed from https://ideas.repec.org/a/plo/pone00/0260177.html

A comparative analysis of current phasing and imputation software

Authors

  • Adriano De Marino
  • Abdallah Amr Mahmoud
  • Madhuchanda Bose
  • Karatuğ Ozan Bircan
  • Andrew Terpolovsky
  • Varuna Bamunusinghe
  • Sandra Bohn
  • Umar Khan
  • Biljana Novković
  • Puya G Yazdi

Abstract

Whole-genome data has become significantly more accessible over the last two decades. This can largely be attributed to both reduced sequencing costs and imputation models, which make it possible to obtain nearly whole-genome data from less expensive genotyping methods such as microarray chips. Although there are many different approaches to imputation, the Hidden Markov Model (HMM) remains the most widely used. In this study, we compared the latest versions of the most popular HMM-based tools for phasing and imputation: Beagle5.4, Eagle2.4.1, Shapeit4, Impute5 and Minimac4. We benchmarked them on four input datasets with three levels of chip density. We assessed each imputation tool on accuracy, speed and memory usage, and showed how the choice of imputation accuracy metric can lead to different interpretations. Using a reference-based approach during phasing and the highest-density chip, Beagle5.4 achieved the highest average concordance rate, followed by Impute5 and Minimac4. The IQS and R2 metrics revealed that Impute5 and Minimac4 obtained better results for low-frequency markers, while Beagle5.4 remained more accurate for common markers (MAF > 5%). Computational load, measured as run time, was lower for Beagle5.4 than for Minimac4 and Impute5, while Minimac4 used the least memory of the imputation tools we compared. Among the phasing tools, Shapeit4 used the least memory with genotype chip data, while Eagle2.4.1 used the least memory with WGS data. Finally, we determined the combination of phasing software, imputation software and reference panel best suited to different situations and analysis needs, and created an automated pipeline that lets users design customized chips to optimize their imputation results.
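The abstract contrasts concordance rate, IQS and R2 as accuracy metrics. The sketch below shows one way to compute them for a single marker, to illustrate why they can disagree on low-frequency variants. It assumes genotypes coded as alternate-allele counts (0, 1, 2), imputed dosages in [0, 2], hard calls made by rounding the dosage, IQS defined as a kappa-style chance-corrected agreement, and R2 taken as the squared Pearson correlation between true genotypes and dosages. These are common definitions in the imputation literature but are assumptions here, not necessarily the exact formulas used in the paper; all function names and the toy data are hypothetical.

```python
from collections import Counter


def best_guess(dosages):
    """Convert imputed dosages (expected alt-allele counts in [0, 2]) to hard calls
    by rounding half up. Assumption: real pipelines often pick the most probable
    genotype from genotype probabilities instead."""
    return [min(2, int(d + 0.5)) for d in dosages]


def maf(genotypes):
    """Minor allele frequency from 0/1/2 alt-allele counts."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def concordance(truth, called):
    """Fraction of samples whose called genotype equals the true genotype."""
    return sum(t == c for t, c in zip(truth, called)) / len(truth)


def iqs(truth, called):
    """Kappa-style Imputation Quality Score: observed agreement corrected for the
    agreement expected by chance from the two genotype-frequency distributions."""
    n = len(truth)
    p_obs = concordance(truth, called)
    t_counts, c_counts = Counter(truth), Counter(called)
    p_chance = sum(t_counts[g] * c_counts[g] for g in (0, 1, 2)) / n**2
    if p_chance == 1:
        return float("nan")  # undefined when truth and calls are the same single genotype
    return (p_obs - p_chance) / (1 - p_chance)


def dosage_r2(truth, dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(truth)
    mt, md = sum(truth) / n, sum(dosages) / n
    cov = sum((t - mt) * (d - md) for t, d in zip(truth, dosages))
    var_t = sum((t - mt) ** 2 for t in truth)
    var_d = sum((d - md) ** 2 for d in dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov * cov / (var_t * var_d)


# Hypothetical rare variant: 98 homozygous-reference samples and 2 heterozygotes
# whose imputed dosage (0.4) is too low to be called heterozygous.
truth = [0] * 98 + [1] * 2
dosages = [0.0] * 98 + [0.4] * 2
called = best_guess(dosages)

print(f"MAF={maf(truth):.3f}")
print(f"concordance={concordance(truth, called):.2f}")
print(f"IQS={iqs(truth, called):.2f}")
print(f"dosage r2={dosage_r2(truth, dosages):.2f}")
```

On this toy marker (MAF 0.01) every sample is called homozygous reference, so concordance is 0.98 simply because 98 of 100 samples carry that genotype, while IQS, which discounts agreement expected by chance, is 0. Dosage R2 is close to 1 because the two heterozygotes still receive higher dosages than everyone else, so a dosage-based metric credits information that hard calls discard. This is one reason the metric chosen matters most for rare markers.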

Suggested Citation

  • Adriano De Marino & Abdallah Amr Mahmoud & Madhuchanda Bose & Karatuğ Ozan Bircan & Andrew Terpolovsky & Varuna Bamunusinghe & Sandra Bohn & Umar Khan & Biljana Novković & Puya G Yazdi, 2022. "A comparative analysis of current phasing and imputation software," PLOS ONE, Public Library of Science, vol. 17(10), pages 1-22, October.
  • Handle: RePEc:plo:pone00:0260177
    DOI: 10.1371/journal.pone.0260177

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0260177
    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0260177&type=printable

    References listed on IDEAS

    1. Olivier Delaneau & Jean-François Zagury & Matthew R. Robinson & Jonathan L. Marchini & Emmanouil T. Dermitzakis, 2019. "Accurate, scalable and integrative haplotype estimation," Nature Communications, Nature, vol. 10(1), pages 1-10, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Heng Du & Lei Zhou & Zhen Liu & Yue Zhuo & Meilin Zhang & Qianqian Huang & Shiyu Lu & Kai Xing & Li Jiang & Jian-Feng Liu, 2024. "The 1000 Chinese Indigenous Pig Genomes Project provides insights into the genomic architecture of pigs," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    2. Katherine A. Kentistou & Brandon E. M. Lim & Lena R. Kaisinger & Valgerdur Steinthorsdottir & Luke N. Sharp & Kashyap A. Patel & Vinicius Tragante & Gareth Hawkes & Eugene J. Gardner & Thorhildur Olaf, 2025. "Rare variant associations with birth weight identify genes involved in adipose tissue regulation, placental function and insulin-like growth factor signalling," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
    3. Saedis Saevarsdottir & Kristbjörg Bjarnadottir & Thorsteinn Markusson & Jonas Berglund & Thorunn A. Olafsdottir & Gisli H. Halldorsson & Gudrun Rutsdottir & Kristbjorg Gunnarsdottir & Asgeir Orn Arnth, 2024. "Start codon variant in LAG3 is associated with decreased LAG-3 expression and increased risk of autoimmune thyroid disease," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    4. Kyuto Sonehara & Yoshitaka Yano & Tatsuhiko Naito & Shinobu Goto & Hiroyuki Yoshihara & Takahiro Otani & Fumiko Ozawa & Tamao Kitaori & Koichi Matsuda & Takashi Nishiyama & Yukinori Okada & Mayumi Sug, 2024. "Common and rare genetic variants predisposing females to unexplained recurrent pregnancy loss," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    5. Lijing Tang & Benjamin Swedlund & Sébastien Dupont & Chad Harland & Gabriel Costa Monteiro Moreira & Keith Durkin & Maria Artesi & Eric Mullaart & Arnaud Sartelet & Latifa Karim & Wouter Coppieters & , 2024. "GWAS reveals determinants of mobilization rate and dynamics of an active endogenous retrovirus of cattle," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    6. Robin J. Hofmeister & Simone Rubinacci & Diogo M. Ribeiro & Alfonso Buil & Zoltán Kutalik & Olivier Delaneau, 2022. "Parent-of-Origin inference for biobanks," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    7. Megan C. Lancaster & Hung-Hsin Chen & M. Benjamin Shoemaker & Matthew R. Fleming & Teresa L. Strickland & James T. Baker & Grahame F. Evans & Hannah G. Polikowsky & David C. Samuels & Chad D. Huff & D, 2024. "Detection of distant relatedness in biobanks to identify undiagnosed cases of Mendelian disease as applied to Long QT syndrome," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    8. Junhui Yuan & Sanjie Jiang & Jianbo Jian & Mingyu Liu & Zhen Yue & Jiabao Xu & Juan Li & Chunyan Xu & Lihong Lin & Yi Jing & Xiaoxiao Zhang & Haixin Chen & Linjuan Zhang & Tao Fu & Shuiyan Yu & Zhangy, 2022. "Genomic basis of the giga-chromosomes and giga-genome of tree peony Paeonia ostii," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    9. Xinkai Tong & Dong Chen & Jianchao Hu & Shiyao Lin & Ziqi Ling & Huashui Ai & Zhiyan Zhang & Lusheng Huang, 2023. "Accurate haplotype construction and detection of selection signatures enabled by high quality pig genome sequences," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    10. Bárbara Sousa da Mota & Simone Rubinacci & Diana Ivette Cruz Dávalos & Carlos Eduardo G. Amorim & Martin Sikora & Niels N. Johannsen & Marzena H. Szmyt & Piotr Włodarczak & Anita Szczepanek & Marcin M, 2023. "Imputation of ancient human genomes," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    11. Parker C. Wilson & Yoshiharu Muto & Haojia Wu & Anil Karihaloo & Sushrut S. Waikar & Benjamin D. Humphreys, 2022. "Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression," Nature Communications, Nature, vol. 13(1), pages 1-20, December.
    12. Seppe Goovaerts & Hanne Hoskens & Ryan J. Eller & Noah Herrick & Anthony M. Musolf & Cristina M. Justice & Meng Yuan & Sahin Naqvi & Myoung Keun Lee & Dirk Vandermeulen & Heather L. Szabo-Rogers & Pau, 2023. "Joint multi-ancestry and admixed GWAS reveals the complex genetics behind human cranial vault shape," Nature Communications, Nature, vol. 14(1), pages 1-21, December.
    13. Luca Cornetti & Peter D. Fields & Louis Du Pasquier & Dieter Ebert, 2024. "Long-term balancing selection for pathogen resistance maintains trans-species polymorphisms in a planktonic crustacean," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    14. Gabriel E. Rech & Santiago Radío & Sara Guirao-Rico & Laura Aguilera & Vivien Horvath & Llewellyn Green & Hannah Lindstadt & Véronique Jamilloux & Hadi Quesneville & Josefa González, 2022. "Population-scale long-read sequencing uncovers transposable elements associated with gene expression variation and adaptive signatures in Drosophila," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    15. Mathilde André & Nicolas Brucato & Georgi Hudjasov & Vasili Pankratov & Danat Yermakovich & Francesco Montinaro & Rita Kreevan & Jason Kariwiga & John Muke & Anne Boland & Jean-François Deleuze & Vinc, 2024. "Positive selection in the genomes of two Papua New Guinean populations at distinct altitude levels," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    16. Matteo Sebastianelli & Sifiso M. Lukhele & Simona Secomandi & Stacey G. Souza & Bettina Haase & Michaella Moysi & Christos Nikiforou & Alexander Hutfluss & Jacquelyn Mountcastle & Jennifer Balacco & S, 2024. "A genomic basis of vocal rhythm in birds," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
