
ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning

Author

Listed:
  • Olga Mineeva
  • Daniel Danciu
  • Bernhard Schölkopf
  • Ruth E Ley
  • Gunnar Rätsch
  • Nicholas D Youngblut

Abstract

The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. Assembly errors can affect all downstream analyses of the assemblies. Accuracy for the state of the art in reference-free misassembly prediction does not exceed an AUPRC of 0.57, and it is not clear how well these models generalize to real-world data. Here, we present the Residual neural network for Misassembled Contig identification (ResMiCo), a deep learning approach for reference-free identification of misassembled contigs. To develop ResMiCo, we first generated a training dataset of unprecedented size and complexity that can be used for further benchmarking and developments in the field. Through rigorous validation, we show that ResMiCo is substantially more accurate than the state of the art, and the model is robust to novel taxonomic diversity and varying assembly methods. ResMiCo estimated 7% misassembled contigs per metagenome across multiple real-world datasets. We demonstrate how ResMiCo can be used to optimize metagenome assembly hyperparameters to improve accuracy, instead of optimizing solely for contiguity. The accuracy, robustness, and ease-of-use of ResMiCo make the tool suitable for general quality control of metagenome assemblies and assembly methodology optimization.

Author summary

Metagenome assembly quality is fundamental to all downstream analyses of such data. The number of metagenome assemblies, especially metagenome-assembled genomes (MAGs), is rapidly increasing, but tools to assess the quality of these assemblies lack the accuracy needed for robust quality control. Moreover, existing models have been trained on datasets lacking complexity and realism, which may limit their generalization to novel data. Due to the limitations of existing models, most studies forgo such approaches and instead rely on CheckM to assess assembly quality, an approach that only utilizes a small portion of all genomic information and does not identify specific misassemblies. We harnessed existing large genomic datasets and high-performance computing to produce a training dataset of unprecedented size and complexity and thereby trained a deep learning model for predicting misassemblies that can robustly generalize to novel taxonomy and varying assembly methodologies.

Suggested Citation

  • Olga Mineeva & Daniel Danciu & Bernhard Schölkopf & Ruth E Ley & Gunnar Rätsch & Nicholas D Youngblut, 2023. "ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning," PLOS Computational Biology, Public Library of Science, vol. 19(5), pages 1-20, May.
  • Handle: RePEc:plo:pcbi00:1011001
    DOI: 10.1371/journal.pcbi.1011001

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011001
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1011001&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li Zhang & Karen R. Jonscher & Zuyuan Zhang & Yi Xiong & Ryan S. Mueller & Jacob E. Friedman & Chongle Pan, 2022. "Islet autoantibody seroconversion in type-1 diabetes is associated with metagenome-assembled genomes in infant gut microbiomes," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    2. Joe J. Lim & Christian Diener & James Wilson & Jacob J. Valenzuela & Nitin S. Baliga & Sean M. Gibbons, 2023. "Growth phase estimation for abundant bacterial populations sampled longitudinally from human stool metagenomes," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    3. Elio L Herzog & Melania Wäfler & Irene Keller & Sebastian Wolf & Martin S Zinkernagel & Denise C Zysset-Burri, 2021. "The importance of age in compositional and functional profiling of the human intestinal microbiome," PLOS ONE, Public Library of Science, vol. 16(10), pages 1-13, October.
    4. Djawad Radjabzadeh & Jos A. Bosch & André G. Uitterlinden & Aeilko H. Zwinderman & M. Arfan Ikram & Joyce B. J. Meurs & Annemarie I. Luik & Max Nieuwdorp & Anja Lok & Cornelia M. Duijn & Robert Kraaij, 2022. "Gut microbiome-wide association study of depressive symptoms," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    5. J. L. Rolando & M. Kolton & T. Song & Y. Liu & P. Pinamang & R. Conrad & J. T. Morris & K. T. Konstantinidis & J. E. Kostka, 2024. "Sulfur oxidation and reduction are coupled to nitrogen fixation in the roots of the salt marsh foundation plant Spartina alterniflora," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    6. Chan Yeong Kim & Junyeong Ma & Insuk Lee, 2022. "HiFi metagenomic sequencing enables assembly of accurate and complete genomes from human gut microbiota," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    7. Qiuyun Jiang & Lei Cao & Yingchun Han & Shengjie Li & Rui Zhao & Xiaoli Zhang & S. Emil Ruff & Zhuoming Zhao & Jiaxue Peng & Jing Liao & Baoli Zhu & Minxiao Wang & Xianbiao Lin & Xiyang Dong, 2025. "Cold seeps are potential hotspots of deep-sea nitrogen loss driven by microorganisms across 21 phyla," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    8. Adina Howe & Nejc Stopnisek & Shane K. Dooley & Fan Yang & Keara L. Grady & Ashley Shade, 2023. "Seasonal activities of the phyllosphere microbiome of perennial crops," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    9. Bin Ma & Caiyu Lu & Yiling Wang & Jingwen Yu & Kankan Zhao & Ran Xue & Hao Ren & Xiaofei Lv & Ronghui Pan & Jiabao Zhang & Yongguan Zhu & Jianming Xu, 2023. "A genomic catalogue of soil microbiomes boosts mining of biodiversity and genetic resources," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    10. Fiona B. Tamburini & Dylan Maghini & Ovokeraye H. Oduaran & Ryan Brewster & Michaella R. Hulley & Venesa Sahibdeen & Shane A. Norris & Stephen Tollman & Kathleen Kahn & Ryan G. Wagner & Alisha N. Wade, 2022. "Short- and long-read metagenomics of urban and rural South African gut microbiomes reveal a transitional composition and undescribed taxa," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    11. Ziye Wang & Ronghui You & Haitao Han & Wei Liu & Fengzhu Sun & Shanfeng Zhu, 2024. "Effective binning of metagenomic contigs using contrastive multi-view representation learning," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    12. Ji-Woo Park & Yeo-Eun Yun & Jin Ah Cho & Su-In Yoon & Su-A In & Eun-Jin Park & Min-Soo Kim, 2025. "Characterization of the phyllosphere virome of fresh vegetables and potential transfer to the human gut," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    13. Shuqin Zeng & Dhrati Patangia & Alexandre Almeida & Zhemin Zhou & Dezhi Mu & R. Paul Ross & Catherine Stanton & Shaopu Wang, 2022. "A compendium of 32,277 metagenome-assembled genomes and over 80 million genes from the early-life human gut microbiome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    14. Mingyue Cheng & Shuai Luo & Peng Zhang & Guangzhou Xiong & Kai Chen & Chuanqi Jiang & Fangdian Yang & Hanhui Huang & Pengshuo Yang & Guanxi Liu & Yuhao Zhang & Sang Ba & Ping Yin & Jie Xiong & Wei Mia, 2024. "A genome and gene catalog of the aquatic microbiomes of the Tibetan Plateau," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    15. Sigal Leviatan & Saar Shoer & Daphna Rothschild & Maria Gorodetski & Eran Segal, 2022. "An expanded reference map of the human gut microbiome reveals hundreds of previously unknown species," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    16. Jeremy Armetta & Simone S. Li & Troels Holger Vaaben & Ruben Vazquez-Uribe & Morten O. A. Sommer, 2025. "Metagenome-guided culturomics for the targeted enrichment of gut microbes," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
