
SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data

Author

Listed:
  • Ivan Dotu
  • Scott I Adamson
  • Benjamin Coleman
  • Cyril Fournier
  • Emma Ricart-Altimiras
  • Eduardo Eyras
  • Jeffrey H Chuang

Abstract

RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. 
Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC.

Author summary

RNA-protein binding is critical to gene regulation, and aberrant RNA-protein interactions play a role in a wide variety of diseases. However, molecular understanding of these interactions remains limited because of the difficulty of ascertaining the motifs that bind each protein. To address this challenge, we have developed a novel algorithm, SARNAclust, to computationally identify combined structure/sequence motifs from immunoprecipitation data. SARNAclust can deconvolve multiple motifs simultaneously and determine the importance of specific features through a graph kernel and bulge graph formalism. We have verified SARNAclust to be effective on synthetic motif data and also tested it on ENCODE eCLIP datasets, identifying known motifs and novel predictions. We have experimentally validated SARNAclust for two proteins, SLBP and ILF3, using RNA Bind-n-Seq measurements. Applying SARNAclust to ENCODE data provides new evidence for previously unknown regulatory interactions, notably splicing co-regulation by ILF3 and the splicing factor hnRNPC.
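The abstract reports clustering quality as a V-measure (over 0.95 for SARNAclust versus 0.083 for GraphClust on synthetic data). For readers unfamiliar with the metric, the following is a minimal sketch of how a V-measure is commonly computed: the harmonic mean of homogeneity and completeness, both derived from conditional entropies of the true motif labels and the predicted cluster labels. This is not code from the paper; the function names and the toy labels in the usage note are illustrative assumptions.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), estimated from paired label sequences."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(
        (c / n) * log(c / given_counts[g]) for (_, g), c in joint.items()
    )


def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    if not true_labels or len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    h_true = entropy(true_labels)
    h_clust = entropy(cluster_labels)
    # Homogeneity: each cluster contains members of a single true class.
    homogeneity = (
        1.0 if h_true == 0
        else 1.0 - conditional_entropy(true_labels, cluster_labels) / h_true
    )
    # Completeness: all members of a true class fall in the same cluster.
    completeness = (
        1.0 if h_clust == 0
        else 1.0 - conditional_entropy(cluster_labels, true_labels) / h_clust
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

For example, v_measure([0, 0, 1, 1, 2, 2], ["b", "b", "a", "a", "c", "c"]) gives 1.0, because the score depends only on how items are grouped and not on the names of the clusters, whereas placing every item in a single cluster gives 0.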

Suggested Citation

  • Ivan Dotu & Scott I Adamson & Benjamin Coleman & Cyril Fournier & Emma Ricart-Altimiras & Eduardo Eyras & Jeffrey H Chuang, 2018. "SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data," PLOS Computational Biology, Public Library of Science, vol. 14(3), pages 1-25, March.
  • Handle: RePEc:plo:pcbi00:1006078
    DOI: 10.1371/journal.pcbi.1006078

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006078
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006078&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Rahul Siddharthan & Eric D Siggia & Erik van Nimwegen, 2005. "PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny," PLOS Computational Biology, Public Library of Science, vol. 1(7), pages 1-23, December.
    2. Hilal Kazan & Debashish Ray & Esther T Chan & Timothy R Hughes & Quaid Morris, 2010. "RNAcontext: A New Method for Learning the Sequence and Structure Binding Preferences of RNA-Binding Proteins," PLOS Computational Biology, Public Library of Science, vol. 6(7), pages 1-10, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Harri Lähdesmäki & Alistair G Rust & Ilya Shmulevich, 2008. "Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources," PLOS ONE, Public Library of Science, vol. 3(3), pages 1-24, March.
    2. Jia Lu & Xiaoyi Cao & Sheng Zhong, 2018. "A likelihood approach to testing hypotheses on the co-evolution of epigenome and genome," PLOS Computational Biology, Public Library of Science, vol. 14(12), pages 1-28, December.
    3. Saeed Omidi & Mihaela Zavolan & Mikhail Pachkov & Jeremie Breda & Severin Berger & Erik van Nimwegen, 2017. "Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors," PLOS Computational Biology, Public Library of Science, vol. 13(7), pages 1-22, July.
    4. Aqil M Azmi & Abdulrakeeb Al-Ssulami, 2014. "Encoded Expansion: An Efficient Algorithm to Discover Identical String Motifs," PLOS ONE, Public Library of Science, vol. 9(5), pages 1-9, May.
    5. Timothy E Reddy & Charles DeLisi & Boris E Shakhnovich, 2007. "Binding Site Graphs: A New Graph Theoretical Framework for Prediction of Transcription Factor Binding Sites," PLOS Computational Biology, Public Library of Science, vol. 3(5), pages 1-11, May.
    6. Ivan Dotu & Vinodh Mechery & Peter Clote, 2014. "Energy Parameters and Novel Algorithms for an Extended Nearest Neighbor Energy Model of RNA," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-14, February.
    7. Siewert Elizabeth A & Kechris Katerina J, 2009. "Prediction of Motifs Based on a Repeated-Measures Model for Integrating Cross-Species Sequence and Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-34, September.
    8. Kenzie D MacIsaac & Ernest Fraenkel, 2006. "Practical Strategies for Discovering Regulatory DNA Sequence Motifs," PLOS Computational Biology, Public Library of Science, vol. 2(4), pages 1-10, April.
