
Accelerated Profile HMM Searches

Author

Listed:
  • Sean R Eddy

Abstract

Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.

Author Summary

Searching sequence databases is one of the most important applications in computational molecular biology. The main workhorse in the field is the BLAST suite of programs. Since the introduction of BLAST in the 1990's, important theoretical advances in homology search methodology have been made using probabilistic inference methods and hidden Markov models (HMMs). However, previous software implementations of these newer probabilistic methods were slower than BLAST by about 100-fold. This hindered their utility, because computation speed is so critical with the rapidly increasing size of modern sequence databases. Here I describe the acceleration methods I implemented in a new, freely available profile HMM software package, HMMER3. HMMER3 makes profile HMM searches about as fast as BLAST, while retaining the power of using probabilistic inference technology.
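
The abstract names "sparse rescaling" but does not say how it works. As a rough illustration only, the sketch below assumes it means running the Forward recursion in ordinary probability space and rescaling a row only when its values get small, rather than on every row or by working in log space throughout. It uses a generic two-state discrete HMM, not the profile HMM architecture, and it is not HMMER3's implementation. The function names, the toy model and the `threshold` cutoff are all hypothetical.

```python
import math

def forward_naive(init, trans, emit, obs):
    """Plain Forward recursion in probability space; underflows on long inputs."""
    n = len(init)
    row = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        row = [emit[s][o] * sum(row[r] * trans[r][s] for r in range(n))
               for s in range(n)]
    return sum(row)

def forward_sparse_rescaling(init, trans, emit, obs, threshold=1e-30):
    """Return (log P(obs), number of rescaling events).

    The row is rescaled only when its largest entry falls below
    `threshold` (a hypothetical cutoff picked for this illustration).
    """
    n = len(init)
    row = [init[s] * emit[s][obs[0]] for s in range(n)]
    log_scale = 0.0   # log of every factor divided out of the row so far
    rescales = 0
    for o in obs[1:]:
        row = [emit[s][o] * sum(row[r] * trans[r][s] for r in range(n))
               for s in range(n)]
        peak = max(row)
        if 0.0 < peak < threshold:
            row = [v / peak for v in row]   # largest entry becomes 1.0
            log_scale += math.log(peak)
            rescales += 1
    total = sum(row)
    if total == 0.0:
        return float("-inf"), rescales      # sequence impossible under the model
    return math.log(total) + log_scale, rescales

# Toy model (assumed values): two hidden states, two observable symbols.
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]

if __name__ == "__main__":
    long_obs = [0, 1] * 2500
    log_p, n_rescales = forward_sparse_rescaling(INIT, TRANS, EMIT, long_obs)
    print("naive probability:", forward_naive(INIT, TRANS, EMIT, long_obs))
    print("log-likelihood:", log_p, "rescaling events:", n_rescales)
```

On a short sequence the two functions agree up to floating-point error. On the 5000-symbol sequence the naive version underflows, while the rescaled version returns a finite log-likelihood after rescaling on only a small fraction of rows.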

Suggested Citation

  • Sean R Eddy, 2011. "Accelerated Profile HMM Searches," PLOS Computational Biology, Public Library of Science, vol. 7(10), pages 1-16, October.
  • Handle: RePEc:plo:pcbi00:1002195
    DOI: 10.1371/journal.pcbi.1002195

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002195
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1002195&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Sean R Eddy, 2008. "A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation," PLOS Computational Biology, Public Library of Science, vol. 4(5), pages 1-14, May.

    Citations

    Cited by:

    1. Bilig Sod & Lei Xu & Yajiao Liu & Fei He & Yanchao Xu & Mingna Li & Tianhui Yang & Ting Gao & Junmei Kang & Qingchuan Yang & Ruicai Long, 2023. "Genome-Wide Identification and Expression Analysis of the CesA/Csl Gene Superfamily in Alfalfa (Medicago sativa L.)," Agriculture, MDPI, vol. 13(9), pages 1-14, August.
    2. Alejandro Ochoa & John D Storey & Manuel Llinás & Mona Singh, 2015. "Beyond the E-Value: Stratified Statistics for Protein Domain Prediction," PLOS Computational Biology, Public Library of Science, vol. 11(11), pages 1-21, November.
    3. Snehal Dilip Karpe & Vikas Tiwari & Sowdhamini Ramanathan, 2021. "InsectOR—Webserver for sensitive identification of insect olfactory receptor genes from non-model genomes," PLOS ONE, Public Library of Science, vol. 16(1), pages 1-15, January.
    4. Damiano Piovesan & Andras Hatos & Giovanni Minervini & Federica Quaglia & Alexander Miguel Monzon & Silvio C E Tosatto, 2020. "Assessing predictors for new post translational modification sites: A case study on hydroxylation," PLOS Computational Biology, Public Library of Science, vol. 16(6), pages 1-15, June.
    5. Binqi Li & Muhammad Moaaz Ali & Tianxin Guo & Shariq Mahmood Alam & Shaista Gull & Junaid Iftikhar & Ahmed Fathy Yousef & Walid F. A. Mosa & Faxing Chen, 2022. "Genome-Wide Identification, In Silico Analysis and Expression Profiling of SWEET Gene Family in Loquat (Eriobotrya japonica Lindl.)," Agriculture, MDPI, vol. 12(9), pages 1-17, August.
    6. Gerry Q Tonkin-Hill & Leily Trianty & Rintis Noviyanti & Hanh H T Nguyen & Boni F Sebayang & Daniel A Lampah & Jutta Marfurt & Simon A Cobbold & Janavi S Rambhatla & Malcolm J McConville & Stephen J R, 2018. "The Plasmodium falciparum transcriptome in severe malaria reveals altered expression of genes involved in important processes including surface antigen–encoding var genes," PLOS Biology, Public Library of Science, vol. 16(3), pages 1-40, March.
    7. Atul Kumar Upadhyay & Ramanathan Sowdhamini, 2016. "Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information," PLOS ONE, Public Library of Science, vol. 11(7), pages 1-20, July.
    8. Amit A Upadhyay & Aaron D Fleetwood & Ogun Adebali & Robert D Finn & Igor B Zhulin, 2016. "Cache Domains That are Homologous to, but Different from PAS Domains Comprise the Largest Superfamily of Extracellular Sensors in Prokaryotes," PLOS Computational Biology, Public Library of Science, vol. 12(4), pages 1-21, April.
    9. William C Nelson & Emily B Graham & Alex R Crump & Sarah J Fansler & Evan V Arntzen & David W Kennedy & James C Stegen, 2020. "Distinct temporal diversity profiles for nitrogen cycling genes in a hyporheic microbiome," PLOS ONE, Public Library of Science, vol. 15(1), pages 1-19, January.
    10. Marco Orlando & Patrick C F Buchholz & Marina Lotti & Jürgen Pleiss, 2021. "The GH19 Engineering Database: Sequence diversity, substrate scope, and evolution in glycoside hydrolase family 19," PLOS ONE, Public Library of Science, vol. 16(10), pages 1-30, October.
    11. Ezequiel A Galpern & María I Freiberger & Diego U Ferreiro, 2020. "Large Ankyrin repeat proteins are formed with similar and energetically favorable units," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-16, June.
    12. David Lee & Sayoni Das & Natalie L Dawson & Dragana Dobrijevic & John Ward & Christine Orengo, 2016. "Novel Computational Protocols for Functionally Classifying and Characterising Serine Beta-Lactamases," PLOS Computational Biology, Public Library of Science, vol. 12(6), pages 1-33, June.
    13. Cuncong Zhong & Anna Edlund & Youngik Yang & Jeffrey S McLean & Shibu Yooseph, 2016. "Metagenome and Metatranscriptome Analyses Using Protein Family Profiles," PLOS Computational Biology, Public Library of Science, vol. 12(7), pages 1-22, July.
    14. Dowan Kim & Myunghee Jung & In Jin Ha & Min Young Lee & Seok-Geun Lee & Younhee Shin & Sathiyamoorthy Subramaniyam & Jaehyeon Oh, 2018. "Transcriptional Profiles of Secondary Metabolite Biosynthesis Genes and Cytochromes in the Leaves of Four Papaver Species," Data, MDPI, vol. 3(4), pages 1-15, November.
    15. Jaume Bonet & Sarah Wehrle & Karen Schriever & Che Yang & Anne Billet & Fabian Sesterhenn & Andreas Scheck & Freyr Sverrisson & Barbora Veselkova & Sabrina Vollers & Roxanne Lourman & Mélanie Villard , 2018. "Rosetta FunFolDes – A general framework for the computational design of functional proteins," PLOS Computational Biology, Public Library of Science, vol. 14(11), pages 1-30, November.
    16. Samantha Petti & Sean R Eddy, 2022. "Constructing benchmark test sets for biological sequence analysis using independent set algorithms," PLOS Computational Biology, Public Library of Science, vol. 18(3), pages 1-14, March.
    17. Sony Malhotra & Ali F Alsulami & Yang Heiyun & Bernardo Montano Ochoa & Harry Jubb & Simon Forbes & Tom L Blundell, 2019. "Understanding the impacts of missense mutations on structures and functions of human cancer-related genes: A preliminary computational analysis of the COSMIC Cancer Gene Census," PLOS ONE, Public Library of Science, vol. 14(7), pages 1-22, July.
    18. Balázs Szalkai & Ildikó Scheer & Kinga Nagy & Beáta G Vértessy & Vince Grolmusz, 2014. "The Metagenomic Telescope," PLOS ONE, Public Library of Science, vol. 9(7), pages 1-9, July.
    19. Jianzhu Ma & Sheng Wang & Zhiyong Wang & Jinbo Xu, 2014. "MRFalign: Protein Homology Detection through Alignment of Markov Random Fields," PLOS Computational Biology, Public Library of Science, vol. 10(3), pages 1-12, March.
    20. Yang Li & Chengxin Zhang & Eric W Bell & Wei Zheng & Xiaogen Zhou & Dong-Jun Yu & Yang Zhang, 2021. "Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks," PLOS Computational Biology, Public Library of Science, vol. 17(3), pages 1-19, March.
    21. Dong-Hyun Kim & Hyun-Sik Yun & Young-Saeng Kim & Jong-Guk Kim, 2021. "Pollutant-Removing Biofilter Strains Associated with High Ammonia and Hydrogen Sulfide Removal Rate in a Livestock Wastewater Treatment Facility," Sustainability, MDPI, vol. 13(13), pages 1-16, June.
    22. Nitish K Mishra & Junil Chang & Patrick X Zhao, 2014. "Prediction of Membrane Transport Proteins and Their Substrate Specificities Using Primary Sequence Information," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-14, June.
    23. Ngaam J Cheung & Wookyung Yu, 2018. "De novo protein structure prediction using ultra-fast molecular dynamics simulation," PLOS ONE, Public Library of Science, vol. 13(11), pages 1-17, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alejandro Ochoa & John D Storey & Manuel Llinás & Mona Singh, 2015. "Beyond the E-Value: Stratified Statistics for Protein Domain Prediction," PLOS Computational Biology, Public Library of Science, vol. 11(11), pages 1-21, November.
    2. Friedrich Torben & Koetschan Christian & Müller Tobias, 2010. "Optimisation of HMM Topologies Enhances DNA and Protein Sequence Modelling," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-27, January.
    3. Angelina Beavogui & Auriane Lacroix & Nicolas Wiart & Julie Poulain & Tom O. Delmont & Lucas Paoli & Patrick Wincker & Pedro H. Oliveira, 2024. "The defensome of complex bacterial communities," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    4. Wing-Cheong Wong & Sebastian Maurer-Stroh & Frank Eisenhaber, 2010. "More Than 1,001 Problems with Protein Domain Databases: Transmembrane Regions, Signal Peptides and the Issue of Sequence Homology," PLOS Computational Biology, Public Library of Science, vol. 6(7), pages 1-19, July.
