Using the RDP Classifier to Predict Taxonomic Novelty and Reduce the Search Space for Finding Novel Organisms

Authors

  • Yemin Lan
  • Qiong Wang
  • James R Cole
  • Gail L Rosen

Abstract

Background: Currently, the naïve Bayesian classifier provided by the Ribosomal Database Project (RDP) is one of the most widely used tools for classifying 16S rRNA sequences, which are mainly collected from environmental samples. We show that RDP has 97+% assignment accuracy and is fast for reads of 250 bp and longer when the read originates from a taxon known to the database. Because most environmental samples will contain organisms from taxa whose 16S rRNA genes have not been previously sequenced, we aim to benchmark how well the RDP classifier and other competing methods can discriminate these novel taxa from known taxa.

Principal Findings: Because each fragment is assigned a score (containing likelihood or confidence information, such as the bootstrap score in the RDP classifier), we “train” a threshold to discriminate between novel and known organisms and observe its performance on a test set. The threshold that we determine tends to be conservative (low sensitivity but high specificity) for naïve Bayesian methods. Nonetheless, our method performs better with the RDP classifier than with the other methods tested, as measured by the F-measure and the area under the receiver operating characteristic curve on the test set. Constraining the database to well-represented genera improves sensitivity by 3–15%. Finally, we show that the detector is a good predictor for determining novel abundant taxa (especially at finer levels of taxonomy, where novelty is more likely to be present).

Conclusions: We conclude that selecting a read-length-appropriate RDP bootstrap score can significantly reduce the search space for identifying novel genera and higher levels in taxonomy. In addition, having a well-represented database significantly improves performance, while having genera that are “highly” similar does not yield a significant improvement. On a real dataset from an Amazon Terra Preta soil sample, we show that the detector can predict (or correlates to) whether novel sequences will be assigned to new taxa when the RDP database “doubles” in the future.
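
The threshold "training" described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which criterion is used to pick the threshold, so the sketch assumes the threshold is chosen to maximize the F-measure on the training set. It also assumes that scores are bootstrap percentages from 0 to 100 and that a read is called novel when its score falls below the threshold. The function names (`confusion`, `train_threshold`, `auc`) and the toy data are hypothetical and made up for illustration.

```python
from typing import List, Tuple

# A read is (bootstrap score in 0..100, True if it comes from a novel taxon).
Read = Tuple[float, bool]


def confusion(reads: List[Read], threshold: float) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn), calling a read novel when score < threshold."""
    tp = fp = tn = fn = 0
    for score, is_novel in reads:
        predicted_novel = score < threshold
        if predicted_novel and is_novel:
            tp += 1
        elif predicted_novel and not is_novel:
            fp += 1
        elif not predicted_novel and not is_novel:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def train_threshold(train: List[Read]) -> int:
    """Assumed criterion: the integer threshold with the best training F-measure.

    Ties are broken in favor of the lowest threshold.
    """
    def score_of(t: int) -> float:
        tp, fp, _tn, fn = confusion(train, t)
        return f_measure(tp, fp, fn)

    return max(range(0, 102), key=score_of)


def auc(reads: List[Read]) -> float:
    """Area under the ROC curve, computed over all (novel, known) pairs.

    This is the fraction of pairs in which the novel read has the lower
    score, counting ties as one half.
    """
    novel = [s for s, is_novel in reads if is_novel]
    known = [s for s, is_novel in reads if not is_novel]
    wins = sum(
        1.0 if n < k else 0.5 if n == k else 0.0
        for n in novel
        for k in known
    )
    return wins / (len(novel) * len(known))


# Made-up toy data, for illustration only.
TRAIN: List[Read] = [
    (95, False), (88, False), (99, False), (70, False),
    (40, True), (55, True), (82, True), (30, True),
]
TEST: List[Read] = [
    (97, False), (90, False), (60, False),
    (45, True), (85, True), (20, True),
]

threshold = train_threshold(TRAIN)
tp, fp, tn, fn = confusion(TEST, threshold)
print("threshold:", threshold)
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("F-measure:", f_measure(tp, fp, fn))
print("AUC:", auc(TEST))
```

Keeping the training and test reads separate mirrors the abstract's description of observing the trained threshold's performance on a test set. A different selection criterion, or a separate threshold per read length, would only change `train_threshold`.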

Suggested Citation

  • Yemin Lan & Qiong Wang & James R Cole & Gail L Rosen, 2012. "Using the RDP Classifier to Predict Taxonomic Novelty and Reduce the Search Space for Finding Novel Organisms," PLOS ONE, Public Library of Science, vol. 7(3), pages 1-15, March.
  • Handle: RePEc:plo:pone00:0032491
    DOI: 10.1371/journal.pone.0032491

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0032491
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0032491&type=printable
    Download Restriction: no

    Citations

    Cited by:

    1. Jianghua Tang & Lili Su & Yanfei Fang & Chen Wang & Linyi Meng & Jiayong Wang & Junyao Zhang & Wenxiu Xu, 2023. "Moderate Nitrogen Reduction Increases Nitrogen Use Efficiency and Positively Affects Microbial Communities in Agricultural Soils," Agriculture, MDPI, vol. 13(4), pages 1-24, March.
    2. Yun-Shin Sew & Shazwan Abdul Shukor & Sarah Sabidi & Soo Peng Koh, 2020. "Effects of Fermented Jackfruit Leaf and Pulp Beverages on Gut Microbiota and Faecal Short Chain Fatty Acids Content in Sprague-Dawley Rats," Biomedical Journal of Scientific & Technical Research, Biomedical Research Network+, LLC, vol. 29(3), pages 22528-22536, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wei Chen & Clarence K Zhang & Yongmei Cheng & Shaowu Zhang & Hongyu Zhao, 2013. "A Comparison of Methods for Clustering 16S rRNA Sequences into OTUs," PLOS ONE, Public Library of Science, vol. 8(8), pages 1-10, August.
