
Profile Hidden Markov Models for the Detection of Viruses within Metagenomic Sequence Data

Author

Listed:
  • Peter Skewes-Cox
  • Thomas J Sharpton
  • Katherine S Pollard
  • Joseph L DeRisi

Abstract

Rapid, sensitive, and specific virus detection is an important component of clinical diagnostics. Massively parallel sequencing enables new diagnostic opportunities that complement traditional serological and PCR-based techniques. While massively parallel sequencing promises the benefits of being more comprehensive and less biased than traditional approaches, it presents new analytical challenges, especially with respect to detection of pathogen sequences in metagenomic contexts. To a first approximation, the initial detection of viruses can be achieved simply through alignment of sequence reads or assembled contigs to a reference database of pathogen genomes with tools such as BLAST. However, recognition of highly divergent viral sequences is problematic, and may be further complicated by the inherently high mutation rates of some viral types, especially RNA viruses. In these cases, increased sensitivity may be achieved by leveraging position-specific information during the alignment process. Here, we constructed HMMER3-compatible profile hidden Markov models (profile HMMs) from all the virally annotated proteins in RefSeq in an automated fashion using a custom-built bioinformatic pipeline. We then tested the ability of these viral profile HMMs (“vFams”) to accurately classify sequences as viral or non-viral. Cross-validation experiments with full-length gene sequences showed that the vFams were able to recall 91% of left-out viral test sequences without erroneously classifying any non-viral sequences into viral protein clusters. Thorough reanalysis of previously published metagenomic datasets with a set of the best-performing vFams showed that they were more sensitive than BLAST for detecting sequences originating from more distant relatives of known viruses. To facilitate the use of the vFams for rapid detection of remote viral homologs in metagenomic data, we provide two sets of vFams, comprising more than 4,000 vFams each, in the HMMER3 format.
We also provide the software necessary to build custom profile HMMs or update the vFams as more viruses are discovered (http://derisilab.ucsf.edu/software/vFam).
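
As a rough illustration of the classification step described in the abstract, the sketch below parses HMMER3 tabular output (the kind written by `hmmsearch --tblout`), keeps the best-scoring profile per sequence, and computes recall against a known set of viral sequences. It is not the authors' pipeline: the sample hits, the sequence and model names, the helper names `best_hits` and `recall`, and the E-value cutoff of 1e-5 are all assumptions made for the example.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical excerpt in hmmsearch --tblout layout. All names and scores are
# invented for illustration; they are not results from the paper.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  ...
read_001 - vFam_12 - 1.2e-30 105.3 0.1 2.0e-30 104.6 0.1 1.0 1 0 0 1 1 1 1 hypothetical read
read_002 - vFam_12 - 0.5 10.2 0.0 0.7 9.8 0.0 1.0 1 0 0 1 1 1 1 hypothetical read
read_001 - vFam_87 - 3.0e-08 32.1 0.0 4.1e-08 31.7 0.0 1.0 1 0 0 1 1 1 1 hypothetical read
read_003 - vFam_40 - 4.4e-12 45.0 0.2 5.0e-12 44.8 0.2 1.0 1 0 0 1 1 1 1 hypothetical read
"""


def best_hits(
    tblout_lines: Iterable[str], max_evalue: float = 1e-5
) -> Dict[str, Tuple[str, float]]:
    """Map each sequence ID to its (profile name, E-value) with the lowest E-value.

    Assumes hmmsearch column order: target (sequence) name in column 1,
    query (profile) name in column 3, full-sequence E-value in column 5.
    The default cutoff is an assumed value, not one taken from the paper.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        seq_id, model, evalue = fields[0], fields[2], float(fields[4])
        if evalue > max_evalue:
            continue  # hit too weak to count as viral
        if seq_id not in best or evalue < best[seq_id][1]:
            best[seq_id] = (model, evalue)
    return best


def recall(predicted_viral: Set[str], true_viral: Set[str]) -> float:
    """Fraction of truly viral sequences that were classified as viral."""
    if not true_viral:
        return 0.0
    return len(predicted_viral & true_viral) / len(true_viral)


hits = best_hits(SAMPLE_TBLOUT.splitlines())
for seq_id, (model, evalue) in sorted(hits.items()):
    print(f"{seq_id}\t{model}\t{evalue:.1e}")

# Pretend all three reads are known to be viral (a held-out test set).
print(f"recall = {recall(set(hits), {'read_001', 'read_002', 'read_003'}):.2f}")
```

Because the vFams are protein profiles, nucleotide reads or contigs would need to be translated before searching. In practice the cutoff would be tuned against non-viral sequences so that, as in the cross-validation reported above, no non-viral sequence is assigned to a viral cluster.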

Suggested Citation

  • Peter Skewes-Cox & Thomas J Sharpton & Katherine S Pollard & Joseph L DeRisi, 2014. "Profile Hidden Markov Models for the Detection of Viruses within Metagenomic Sequence Data," PLOS ONE, Public Library of Science, vol. 9(8), pages 1-12, August.
  • Handle: RePEc:plo:pone00:0105067
    DOI: 10.1371/journal.pone.0105067

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0105067
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0105067&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0105067?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Kate E. Jones & Nikkita G. Patel & Marc A. Levy & Adam Storeygard & Deborah Balk & John L. Gittleman & Peter Daszak, 2008. "Global trends in emerging infectious diseases," Nature, Nature, vol. 451(7181), pages 990-993, February.

    Citations

    Cited by:

    1. Ardi Tampuu & Zurab Bzhalava & Joakim Dillner & Raul Vicente, 2019. "ViraMiner: Deep learning on raw DNA sequences for identifying viral genomes in human samples," PLOS ONE, Public Library of Science, vol. 14(9), pages 1-17, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nikolett Orosz & Tünde Tóthné Tóth & Gyöngyi Vargáné Gyuró & Zsoltné Tibor Nábrádi & Klára Hegedűsné Sorosi & Zsuzsa Nagy & Éva Rigó & Ádám Kaposi & Gabriella Gömöri & Cornelia Melinda Adi Santoso & A, 2022. "Comparison of Length of Hospital Stay for Community-Acquired Infections Due to Enteric Pathogens, Influenza Viruses and Multidrug-Resistant Bacteria: A Cross-Sectional Study in Hungary," IJERPH, MDPI, vol. 19(23), pages 1-16, November.
    2. Mudassar Arsalan & Omar Mubin & Fady Alnajjar & Belal Alsinglawi, 2020. "COVID-19 Global Risk: Expectation vs. Reality," IJERPH, MDPI, vol. 17(15), pages 1-10, August.
    3. Ceddia, M.G. & Bardsley, N.O. & Goodwin, R. & Holloway, G.J. & Nocella, G. & Stasi, A., 2013. "A complex system perspective on the emergence and spread of infectious diseases: Integrating economic and ecological aspects," Ecological Economics, Elsevier, vol. 90(C), pages 124-131.
    4. John M Drake & Tobias S Brett & Shiyang Chen & Bogdan I Epureanu & Matthew J Ferrari & Éric Marty & Paige B Miller & Eamon B O’Dea & Suzanne M O’Regan & Andrew W Park & Pejman Rohani, 2019. "The statistics of epidemic transitions," PLOS Computational Biology, Public Library of Science, vol. 15(5), pages 1-14, May.
    5. Ongolo, Symphorien & Giessen, Lukas & Karsenty, Alain & Tchamba, Martin & Krott, Max, 2021. "Forestland policies and politics in Africa: Recent evidence and new challenges," Forest Policy and Economics, Elsevier, vol. 127(C).
    6. Paige, Sarah B. & Malavé, Carly & Mbabazi, Edith & Mayer, Jonathan & Goldberg, Tony L., 2015. "Uncovering zoonoses awareness in an emerging disease ‘hotspot’," Social Science & Medicine, Elsevier, vol. 129(C), pages 78-86.
    7. Jianhua Wang & Guan-Zhu Han, 2023. "Genome mining shows that retroviruses are pervasively invading vertebrate genomes," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    8. Livia Marchetti & Valentina Cattivelli & Claudia Cocozza & Fabio Salbitano & Marco Marchetti, 2020. "Beyond Sustainability in Food Systems: Perspectives from Agroecology and Social Innovation," Sustainability, MDPI, vol. 12(18), pages 1-24, September.
    9. Ivan Montiel & Junghoon Park & Bryan W. Husted & Andres Velez-Calle, 2022. "Tracing the connections between international business and communicable diseases," Journal of International Business Studies, Palgrave Macmillan;Academy of International Business, vol. 53(8), pages 1785-1804, October.
    10. Maxwell B Joseph & William E Stutz & Pieter T J Johnson, 2016. "Multilevel Models for the Distribution of Hosts and Symbionts," PLOS ONE, Public Library of Science, vol. 11(11), pages 1-15, November.
    11. Laure Bonnaud & Nicolas Fortané, 2017. "Serge Morand and Muriel Figuié (eds), 2016, Emergence de maladies infectieuses. Risques et enjeux de société (The emergence of infectious diseases. Societal risks and stakes)," Review of Agricultural, Food and Environmental Studies, Springer, vol. 98(3), pages 225-228, December.
    12. Chen, Xiaowei & Chong, Wing Fung & Feng, Runhuan & Zhang, Linfeng, 2021. "Pandemic risk management: Resources contingency planning and allocation," Insurance: Mathematics and Economics, Elsevier, vol. 101(PB), pages 359-383.
    13. Lin Zhang & Jason Rohr & Ruina Cui & Yusi Xin & Lixia Han & Xiaona Yang & Shimin Gu & Yuanbao Du & Jing Liang & Xuyu Wang & Zhengjun Wu & Qin Hao & Xuan Liu, 2022. "Biological invasions facilitate zoonotic disease emergences," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    14. Elisa Giannone & Nuno Paixao & Xinle Pang, 2021. "The Geography of Pandemic Containment," Staff Working Papers 21-26, Bank of Canada.
    15. Ricardo Aguas & Neil M Ferguson, 2013. "Feature Selection Methods for Identifying Genetic Determinants of Host Species in RNA Viruses," PLOS Computational Biology, Public Library of Science, vol. 9(10), pages 1-10, October.
    16. Katarzyna Kubiak & Hanna Szymańska & Małgorzata Dmitryjuk & Ewa Dzika, 2022. "Abundance of Ixodes ricinus Ticks (Acari: Ixodidae) and the Diversity of Borrelia Species in Northeastern Poland," IJERPH, MDPI, vol. 19(12), pages 1-18, June.
    17. Anna C. Peterson & Himanshu Sharma & Arvind Kumar & Bruno M. Ghersi & Scott J. Emrich & Kurt J. Vandegrift & Amit Kapoor & Michael J. Blum, 2021. "Rodent Virus Diversity and Differentiation across Post-Katrina New Orleans," Sustainability, MDPI, vol. 13(14), pages 1-18, July.
    18. Blanco, Esther & Baier, Alexandra & Holzmeister, Felix & Jaber-Lopez, Tarek & Struwe, Natalie, 2022. "Substitution of social sustainability concerns under the Covid-19 pandemic," Ecological Economics, Elsevier, vol. 192(C).
    19. Rosemary A. McFarlane & Adrian C. Sleigh & Anthony J. McMichael, 2013. "Land-Use Change and Emerging Infectious Disease on an Island Continent," IJERPH, MDPI, vol. 10(7), pages 1-21, June.
    20. Luiza M Karpavicius & Ariaster Chimeli, 2023. "Forest Protection and Human Health: The Case of Malaria in the Brazilian Amazon," Working Papers, Department of Economics 2023_08, University of São Paulo (FEA-USP), revised 26 Jul 2023.
