Printed from https://ideas.repec.org/a/plo/pone00/0007767.html

BFAST: An Alignment Tool for Large Scale Genome Resequencing

Author

Listed:
  • Nils Homer
  • Barry Merriman
  • Stanley F Nelson

Abstract

Background: The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is of great importance, but equally important is maintaining alignment accuracy for short reads, in the 25–100 base range, in the presence of errors and true biological variation.

Methodology: We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired-end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole-genome indexes to rapidly map reads to candidate alignment locations; an arbitrary number of independent indexes may be used to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels.

Conclusions: We compare BFAST to a selection of large-scale alignment tools (BLAT, MAQ, SHRiMP, and SOAP) in terms of both speed and accuracy, using simulated and real-world datasets. We show that BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and can minimize false mappings, while maintaining adequate speed compared to other current methods. We show that BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
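
The two-stage idea described in the abstract (look up candidate locations in several independent indexes, then score each candidate with a gapped Smith-Waterman alignment) can be illustrated with a toy sketch. This is not BFAST's implementation: the function names, the seed masks, the padding, and the scoring values (match 2, mismatch -1, linear gap -2) are all assumptions chosen for illustration, and the hash-based index here stands in for whatever index structure BFAST actually uses.

```python
from collections import defaultdict


def masked_key(segment, mask):
    """Keep only the bases at positions where the mask has a '1'."""
    return "".join(base for base, bit in zip(segment, mask) if bit == "1")


def build_index(reference, mask):
    """Map each masked k-mer of the reference to its start positions."""
    index = defaultdict(list)
    span = len(mask)
    for pos in range(len(reference) - span + 1):
        index[masked_key(reference[pos:pos + span], mask)].append(pos)
    return index


def candidate_positions(read, indexes):
    """Union of implied read start positions over all (mask, index) pairs."""
    candidates = set()
    for mask, index in indexes:
        span = len(mask)
        for offset in range(len(read) - span + 1):
            key = masked_key(read[offset:offset + span], mask)
            for pos in index.get(key, ()):
                candidates.add(pos - offset)
    return candidates


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a against b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def align(read, reference, indexes, pad=3):
    """Score the read against a padded window around every candidate."""
    best_score, best_start = 0, None
    for start in sorted(candidate_positions(read, indexes)):
        lo = max(0, start - pad)
        hi = min(len(reference), start + len(read) + pad)
        score = smith_waterman(read, reference[lo:hi])
        if score > best_score:
            best_score, best_start = score, lo
    return best_score, best_start


# Hypothetical toy data: a short reference and a read with one substitution.
reference = "TTGACCGTAAGCTAGGCATCGATTACGGATCCAGT"
masks = ["1111", "110011"]  # two independent seed patterns (assumed)
indexes = [(m, build_index(reference, m)) for m in masks]
read = "GCTAGGCAACGATTAC"  # reference[10:26] with one base changed
score, window_start = align(read, reference, indexes)
print(score, window_start)
```

Using more than one mask means a read error that breaks every seed of one pattern can still be found through another, which is the robustness argument the abstract makes for multiple indexes; the padded window leaves room for the small gaps that the final alignment is meant to detect.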

Suggested Citation

  • Nils Homer & Barry Merriman & Stanley F Nelson, 2009. "BFAST: An Alignment Tool for Large Scale Genome Resequencing," PLOS ONE, Public Library of Science, vol. 4(11), pages 1-12, November.
  • Handle: RePEc:plo:pone00:0007767
    DOI: 10.1371/journal.pone.0007767

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007767
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0007767&type=printable
    Download Restriction: no

    Citations

    Cited by:

    1. Lars Hahn & Chris-André Leimeister & Rachid Ounit & Stefano Lonardi & Burkhard Morgenstern, 2016. "rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison," PLOS Computational Biology, Public Library of Science, vol. 12(10), pages 1-18, October.
    2. Swetansu Pattnaik & Srividya Vaidyanathan & Durgad G Pooja & Sa Deepak & Binay Panda, 2012. "Customisation of the Exome Data Analysis Pipeline Using a Combinatorial Approach," PLOS ONE, Public Library of Science, vol. 7(1), pages 1-9, January.
    3. Joshua C Bis & Anita DeStefano & Xiaoming Liu & Jennifer A Brody & Seung Hoan Choi & Benjamin F J Verhaaren & Stéphanie Debette & M Arfan Ikram & Eyal Shahar & Kenneth R Butler Jr & Rebecca F Gottesma, 2014. "Associations of NINJ2 Sequence Variants with Incident Ischemic Stroke in the Cohorts for Heart and Aging in Genomic Epidemiology (CHARGE) Consortium," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-7, June.
    4. Le’an Qu & Zhenjie Chen & Manchun Li, 2019. "CART-RF Classification with Multifilter for Monitoring Land Use Changes Based on MODIS Time-Series Data: A Case Study from Jiangsu Province, China," Sustainability, MDPI, vol. 11(20), pages 1-23, October.
    5. Afonso R. M. Almeida & João L. Neto & Ana Cachucho & Mayara Euzébio & Xiangyu Meng & Rathana Kim & Marta B. Fernandes & Beatriz Raposo & Mariana L. Oliveira & Daniel Ribeiro & Rita Fragoso & Priscila , 2021. "Interleukin-7 receptor α mutational activation can initiate precursor B-cell acute lymphoblastic leukemia," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
