IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1000154.html

A Feature-Based Approach to Modeling Protein–DNA Interactions

Author

Listed:
  • Eilon Sharon
  • Shai Lubliner
  • Eran Segal

Abstract

Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF–DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also developed a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/.

Author Summary: Transcription factor (TF) protein binding to its DNA target sequences is a fundamental physical interaction underlying gene regulation. Characterizing the binding specificities of TFs is essential for deducing which genes are regulated by which TFs. Recently, several high-throughput methods that measure sequences enriched for TF targets genomewide were developed. Since TFs recognize relatively short sequences, much effort has been directed at developing computational methods that identify enriched subsequences (motifs) from these sequences. However, little effort has been directed towards improving the representation of motifs. Practically, available motif finding software use the position specific scoring matrix (PSSM) model, which assumes independence between different motif positions. We present an alternative, richer model, called the feature motif model (FMM), that enables the representation of a variety of sequence features and captures dependencies that exist between binding site positions. We show how FMMs explain TF binding data better than PSSMs on both synthetic and real data. We also present a motif finder algorithm that learns FMM motifs from unaligned promoter sequences and show how de novo FMMs, learned from binding data of the human TFs c-Myc and CTCF, reveal intriguing insights about their binding specificities.
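The contrast drawn in the abstract, between a PSSM that treats positions independently and a log-linear model whose features may span several positions, can be illustrated with a small sketch. This is not the authors' FMM software. The feature encoding as `(positions, bases, weight)` tuples, the function names, and the brute-force normalization over all sequences of the motif length are assumptions chosen for illustration, and the normalization is only feasible for very short motifs.

```python
import itertools
import math

ALPHABET = "ACGT"


def pssm_log_prob(seq, pssm):
    """Log-probability under a PSSM: each position contributes independently.

    `pssm` is a list with one dict per position, mapping base -> probability.
    """
    return sum(math.log(pssm[i][base]) for i, base in enumerate(seq))


def feature_score(seq, features):
    """Unnormalized log-linear score: sum of weights of the features that match.

    Each feature is a hypothetical (positions, bases, weight) tuple; it matches
    when seq carries the given base at every listed position, so a feature
    with more than one position encodes a dependency between positions.
    """
    total = 0.0
    for positions, bases, weight in features:
        if all(seq[p] == b for p, b in zip(positions, bases)):
            total += weight
    return total


def loglinear_log_prob(seq, features):
    """Log-probability under the log-linear model.

    The partition function is computed by enumerating all 4**len(seq)
    sequences, which is an illustrative shortcut rather than a practical method.
    """
    log_z = math.log(
        sum(
            math.exp(feature_score("".join(s), features))
            for s in itertools.product(ALPHABET, repeat=len(seq))
        )
    )
    return feature_score(seq, features) - log_z


if __name__ == "__main__":
    # A two-position feature set favouring "AT" and "TA" jointly, a pattern
    # that no choice of independent per-position probabilities can express.
    pair_features = [((0, 1), "AT", 2.0), ((0, 1), "TA", 2.0)]
    for site in ("AT", "TA", "AA", "TT"):
        print(site, round(math.exp(loglinear_log_prob(site, pair_features)), 4))
```

When every feature covers a single position and its weight is the log of the corresponding PSSM probability, the log-linear model reduces to the PSSM, so the PSSM can be viewed as a special case of this kind of model.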

Suggested Citation

  • Eilon Sharon & Shai Lubliner & Eran Segal, 2008. "A Feature-Based Approach to Modeling Protein–DNA Interactions," PLOS Computational Biology, Public Library of Science, vol. 4(8), pages 1-17, August.
  • Handle: RePEc:plo:pcbi00:1000154
    DOI: 10.1371/journal.pcbi.1000154

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000154
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000154&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Leelavati Narlikar & Raluca Gordân & Alexander J Hartemink, 2007. "A Nucleosome-Guided Map of Transcription Factor Binding Sites in Yeast," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-10, November.
    2. Christopher T. Harbison & D. Benjamin Gordon & Tong Ihn Lee & Nicola J. Rinaldi & Kenzie D. Macisaac & Timothy W. Danford & Nancy M. Hannett & Jean-Bosco Tagne & David B. Reynolds & Jane Yoo & Ezra G., 2004. "Transcriptional regulatory code of a eukaryotic genome," Nature, Nature, vol. 431(7004), pages 99-104, September.
    3. Manolis Kellis & Nick Patterson & Matthew Endrizzi & Bruce Birren & Eric S. Lander, 2003. "Sequencing and comparison of yeast species to identify genes and regulatory elements," Nature, Nature, vol. 423(6937), pages 241-254, May.
    4. Vishwanath R. Iyer & Christine E. Horak & Charles S. Scafe & David Botstein & Michael Snyder & Patrick O. Brown, 2001. "Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF," Nature, Nature, vol. 409(6819), pages 533-538, January.

    Citations

    Cited by:

    1. Yue Zhao & David Granas & Gary D Stormo, 2009. "Inferring Binding Energies from Selected Binding Sites," PLOS Computational Biology, Public Library of Science, vol. 5(12), pages 1-8, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zing Tsung-Yeh Tsai & Shin-Han Shiu & Huai-Kuang Tsai, 2015. "Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast," PLOS Computational Biology, Public Library of Science, vol. 11(8), pages 1-22, August.
    2. Rahul Siddharthan & Eric D Siggia & Erik van Nimwegen, 2005. "PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny," PLOS Computational Biology, Public Library of Science, vol. 1(7), pages 1-23, December.
    3. Harri Lähdesmäki & Alistair G Rust & Ilya Shmulevich, 2008. "Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources," PLOS ONE, Public Library of Science, vol. 3(3), pages 1-24, March.
    4. Leelavati Narlikar & Raluca Gordân & Alexander J Hartemink, 2007. "A Nucleosome-Guided Map of Transcription Factor Binding Sites in Yeast," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-10, November.
    5. Kyoung-Jae Won & Saurabh Agarwal & Li Shen & Robert Shoemaker & Bing Ren & Wei Wang, 2009. "An Integrated Approach to Identifying Cis-Regulatory Modules in the Human Genome," PLOS ONE, Public Library of Science, vol. 4(5), pages 1-8, May.
    6. Kenzie D MacIsaac & Ernest Fraenkel, 2006. "Practical Strategies for Discovering Regulatory DNA Sequence Motifs," PLOS Computational Biology, Public Library of Science, vol. 2(4), pages 1-10, April.
    7. Tao Song & Hong Gu, 2014. "Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-10, February.
    8. John E Reid & Lorenz Wernisch, 2014. "STEME: A Robust, Accurate Motif Finder for Large Data Sets," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-11, March.
    9. Gross, Eitan, 2015. "Effect of environmental stress on regulation of gene expression in the yeast," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 430(C), pages 224-235.
    10. Xinyi Liu & Bin Liu & Zhimin Huang & Ting Shi & Yingyi Chen & Jian Zhang, 2012. "SPPS: A Sequence-Based Method for Predicting Probability of Protein-Protein Interaction Partners," PLOS ONE, Public Library of Science, vol. 7(1), pages 1-6, January.
    11. Alexander Kawrykow & Gary Roumanis & Alfred Kam & Daniel Kwak & Clarence Leung & Chu Wu & Eleyine Zarour & Phylo players & Luis Sarmenta & Mathieu Blanchette & Jérôme Waldispühl, 2012. "Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment," PLOS ONE, Public Library of Science, vol. 7(3), pages 1-9, March.
    12. Eivind Valen & Albin Sandelin & Ole Winther & Anders Krogh, 2009. "Discovery of Regulatory Elements is Improved by a Discriminatory Approach," PLOS Computational Biology, Public Library of Science, vol. 5(11), pages 1-8, November.
    13. G. Saharidis & I. Androulakis & M. Ierapetritou, 2011. "Model building using bi-level optimization," Journal of Global Optimization, Springer, vol. 49(1), pages 49-67, January.
    14. Armita Nourmohammad & Michael Lässig, 2011. "Formation of Regulatory Modules by Local Sequence Duplication," PLOS Computational Biology, Public Library of Science, vol. 7(10), pages 1-12, October.
    15. Alessandro L. V. Coradini & Christopher Ne Ville & Zachary A. Krieger & Joshua Roemer & Cara Hull & Shawn Yang & Daniel T. Lusk & Ian M. Ehrenreich, 2023. "Building synthetic chromosomes from natural DNA," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    16. Emily N Manderson & Mohan Malleshaiah & Stephen W Michnick, 2008. "A Novel Genetic Screen Implicates Elm1 in the Inactivation of the Yeast Transcription Factor SBF," PLOS ONE, Public Library of Science, vol. 3(1), pages 1-9, January.
    17. Wei-Sheng Wu & Fu-Jou Lai, 2016. "Detecting Cooperativity between Transcription Factors Based on Functional Coherence and Similarity of Their Target Gene Sets," PLOS ONE, Public Library of Science, vol. 11(9), pages 1-12, September.
    18. Valerie Storms & Marleen Claeys & Aminael Sanchez & Bart De Moor & Annemieke Verstuyf & Kathleen Marchal, 2010. "The Effect of Orthology and Coregulation on Detecting Regulatory Motifs," PLOS ONE, Public Library of Science, vol. 5(2), pages 1-11, February.
    19. Robert K Bradley & Adam Roberts & Michael Smoot & Sudeep Juvekar & Jaeyoung Do & Colin Dewey & Ian Holmes & Lior Pachter, 2009. "Fast Statistical Alignment," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-15, May.
    20. Cheemeng Tan & Robert Phillip Smith & Ming-Chi Tsai & Russell Schwartz & Lingchong You, 2014. "Phenotypic Signatures Arising from Unbalanced Bacterial Growth," PLOS Computational Biology, Public Library of Science, vol. 10(8), pages 1-10, August.
