Printed from https://ideas.repec.org/a/nat/natcom/v15y2024i1d10.1038_s41467-024-45024-5.html

ContScout: sensitive detection and removal of contamination from annotated genomes

Author

Listed:
  • Balázs Bálint

    (HUN-REN Biological Research Centre, Szeged)

  • Zsolt Merényi

    (HUN-REN Biological Research Centre, Szeged)

  • Botond Hegedüs

    (HUN-REN Biological Research Centre, Szeged)

  • Igor V. Grigoriev

    (Lawrence Berkeley National Laboratory; University of California Berkeley)

  • Zhihao Hou

    (HUN-REN Biological Research Centre, Szeged; University of Szeged)

  • Csenge Földi

    (HUN-REN Biological Research Centre, Szeged; University of Szeged)

  • László G. Nagy

    (HUN-REN Biological Research Centre, Szeged)

Abstract

Contamination of genomes is an increasingly recognized problem affecting several downstream applications, from comparative evolutionary genomics to metagenomics. Here we introduce ContScout, a precise tool for eliminating foreign sequences from annotated genomes. It achieves high specificity and sensitivity on synthetic benchmark data even when the contaminant is a closely related species, outperforms competing tools, and can distinguish horizontal gene transfer from contamination. A screen of 844 eukaryotic genomes for contamination identified bacteria as the most common source, followed by fungi and plants. Furthermore, we show that contaminants in ancestral genome reconstructions lead to erroneous early origins of genes and inflate gene loss rates, leading to a false notion of complex ancestral genomes. Taken together, we offer here a tool for sensitive removal of foreign proteins, identify and remove contaminants from diverse eukaryotic genomes and evaluate their impact on phylogenomic analyses.

Suggested Citation

  • Balázs Bálint & Zsolt Merényi & Botond Hegedüs & Igor V. Grigoriev & Zhihao Hou & Csenge Földi & László G. Nagy, 2024. "ContScout: sensitive detection and removal of contamination from annotated genomes," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
  • Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-45024-5
    DOI: 10.1038/s41467-024-45024-5

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-024-45024-5
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Youjie Zhao & Chengyong Su & Bo He & Ruie Nie & Yunliang Wang & Junye Ma & Jingyu Song & Qun Yang & Jiasheng Hao, 2023. "Dispersal from the Qinghai-Tibet plateau by a high-altitude butterfly is associated with rapid expansion and reorganization of its genome," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
