Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach

Author

Listed:
  • David Burstein
  • Tal Zusman
  • Elena Degtyar
  • Ram Viner
  • Gil Segal
  • Tal Pupko

Abstract

A large number of highly pathogenic bacteria utilize secretion systems to translocate effector proteins into host cells. Using these effectors, the bacteria subvert host cell processes during infection. Legionella pneumophila translocates effectors via the Icm/Dot type-IV secretion system, and to date approximately 100 effectors have been identified by various experimental and computational techniques. Effector identification is a critical first step towards the understanding of the pathogenesis system in L. pneumophila as well as in other bacterial pathogens. Here, we formulate the task of effector identification as a classification problem: each L. pneumophila open reading frame (ORF) was classified as either effector or not. We computationally defined a set of features that best distinguish effectors from non-effectors. These features cover a wide range of characteristics including taxonomical dispersion, regulatory data, genomic organization, similarity to eukaryotic proteomes and more. Machine learning algorithms utilizing these features were then applied to classify all the ORFs within the L. pneumophila genome. Using this approach, we were able to predict and experimentally validate 40 new effectors, reaching a success rate of above 90%. Increasing the number of validated effectors to around 140, we were able to gain novel insights into their characteristics. Effectors were found to have low G+C content, supporting the hypothesis that a large number of effectors originate via horizontal gene transfer, probably from their protozoan host. In addition, effectors were found to cluster in specific genomic regions. Finally, we were able to provide a novel description of the C-terminal translocation signal required for effector translocation by the Icm/Dot secretion system. To conclude, we have discovered 40 novel L. pneumophila effectors, predicted over a hundred additional highly probable effectors, and shown the applicability of machine learning algorithms for the identification and characterization of bacterial pathogenesis determinants.

Author Summary

Many pathogenic bacteria exert their function by translocating a set of proteins, termed effectors, into the cytoplasm of their host cell. These effectors subvert various host cell processes for the benefit of the bacteria. Our goal in this study was to identify novel effectors on a genomic scale, towards a better understanding of the molecular mechanisms of bacterial pathogenesis. We developed a computational approach for the detection of new effectors in the intracellular pathogen Legionella pneumophila, the causative agent of Legionnaires' disease, a severe pneumonia-like disease. The novelty of our approach for detecting effectors is the combination of state-of-the-art machine learning classification algorithms with broad biological knowledge on effector biology on a genomic scale. Applying this method, we detected and experimentally validated dozens of new effectors. Notably, our computational predictions had an exceedingly high accuracy of over 90%. In analyzing these effectors we were able to obtain new insights into the molecular mechanism of the pathogenesis system. Our results suggest, for the first time, that over 10% of the Legionella genome is dedicated to pathogenesis. Finally, our approach is general and can be utilized to study effectors in many other human pathogens.
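
The classification framing in the abstract (each ORF described by a feature vector and labeled effector or not) can be illustrated with a minimal Python sketch. The abstract does not name the algorithms used, so the nearest-centroid rule below is only a stand-in chosen for brevity, not the authors' method. The two features (G+C fraction and a similarity-to-eukaryotes score), the toy sequences, and all function names are hypothetical.

```python
from math import dist


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def centroid(rows):
    """Column-wise mean of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train(features, labels):
    """Return one centroid per class label (a stand-in classifier)."""
    by_label = {}
    for x, y in zip(features, labels):
        by_label.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in by_label.items()}


def predict(model, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(model, key=lambda y: dist(model[y], x))


# Hypothetical training data, invented for illustration only:
# each row is [G+C fraction, similarity-to-eukaryotes score].
train_x = [
    [gc_content("ATATTAGCAT"), 0.9],
    [gc_content("TTAAGATACA"), 0.8],
    [gc_content("GCGCGATCGC"), 0.1],
    [gc_content("GGCCATGCGC"), 0.2],
]
train_y = ["effector", "effector", "non-effector", "non-effector"]

model = train(train_x, train_y)

# Classify a hypothetical unlabeled ORF with low G+C and high similarity.
label = predict(model, [0.3, 0.7])
```

In practice, a real pipeline would use many more features (the abstract lists taxonomical dispersion, regulatory data, and genomic organization among them) and would estimate accuracy on held-out ORFs before any experimental validation.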

Suggested Citation

  • David Burstein & Tal Zusman & Elena Degtyar & Ram Viner & Gil Segal & Tal Pupko, 2009. "Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach," PLOS Pathogens, Public Library of Science, vol. 5(7), pages 1-12, July.
  • Handle: RePEc:plo:ppat00:1000508
    DOI: 10.1371/journal.ppat.1000508

    Download full text from publisher

    File URL: https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1000508
    Download Restriction: no

    File URL: https://journals.plos.org/plospathogens/article/file?id=10.1371/journal.ppat.1000508&type=printable
    Download Restriction: no

    Citations

    Cited by:

    1. Zhila Esna Ashari & Nairanjana Dasgupta & Kelly A Brayton & Shira L Broschat, 2018. "An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach," PLOS ONE, Public Library of Science, vol. 13(5), pages 1-16, May.
    2. Koray Açıcı & Tunç Aşuroğlu & Çağatay Berke Erdaş & Hasan Oğul, 2019. "T4SS Effector Protein Prediction with Deep Learning," Data, MDPI, vol. 4(1), pages 1-13, March.