Printed from https://ideas.repec.org/a/plo/pcbi00/1000158.html

Modeling ChIP Sequencing In Silico with Applications

Author

Listed:
  • Zhengdong D Zhang
  • Joel Rozowsky
  • Michael Snyder
  • Joseph Chang
  • Mark Gerstein

Abstract

ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.

Author Summary: ChIP-seq is an apt combination of chromatin immunoprecipitation and next-generation sequencing to identify transcription factor binding sites in vivo on the whole-genome scale. Since its advent, this new method has generated much excitement in the field of functional genomics. Proper computational modeling of the ChIP-seq process is needed for both data scoring and determination of adequate sequencing depth, as it provides the computational foundation for analyzing ChIP-seq data. In our study, we show the characteristics of ChIP-seq data and present in silico ChIP sequencing, a computational method to simulate the experimental outcome. On the basis of our data characterization, we observed transcription factor binding sites with excessive enrichment of sequence tags. Our simulation results reveal that both the genomic background and the binding sites are not uniform. On the basis of our simulation results, we propose a statistical procedure using the more realistic genomic background model to identify binding sites in ChIP-seq data.
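
The contrast the abstract draws between a uniform and a nonuniform background can be illustrated with a small simulation. The sketch below is not the authors' in silico ChIP-seq method; it is a minimal toy under stated assumptions: the genome is divided into equal-sized bins, each bin's relative sampling weight is either constant or drawn from a gamma distribution, and tags are placed independently. The function names, bin count, tag count, and gamma shape are all hypothetical choices made for illustration.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def simulate_tag_counts(n_bins, n_tags, shape=None, seed=0):
    """Place n_tags into n_bins and return the per-bin tag counts.

    shape=None gives a uniform background. Otherwise each bin's relative
    sampling weight is drawn from Gamma(shape, scale=1), which is an
    illustrative assumption rather than a fitted model.
    """
    rng = random.Random(seed)
    if shape is None:
        weights = [1.0] * n_bins
    else:
        weights = [rng.gammavariate(shape, 1.0) for _ in range(n_bins)]
    # Each tag lands in a bin with probability proportional to its weight.
    hits = Counter(rng.choices(range(n_bins), weights=weights, k=n_tags))
    return [hits.get(i, 0) for i in range(n_bins)]


def dispersion(counts):
    """Variance-to-mean ratio; close to 1 for Poisson-like counts."""
    return pvariance(counts) / mean(counts)


uniform_counts = simulate_tag_counts(n_bins=2000, n_tags=20000)
gamma_counts = simulate_tag_counts(n_bins=2000, n_tags=20000, shape=0.5)
print(f"uniform background dispersion: {dispersion(uniform_counts):.2f}")
print(f"gamma background dispersion:   {dispersion(gamma_counts):.2f}")
```

Under these assumptions the uniform background yields counts whose variance is close to their mean, while the gamma-weighted background produces a much larger variance-to-mean ratio and a longer right tail of high-count bins. A scoring scheme that assumes a uniform background would therefore tend to flag such background bins as enriched.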

Suggested Citation

  • Zhengdong D Zhang & Joel Rozowsky & Michael Snyder & Joseph Chang & Mark Gerstein, 2008. "Modeling ChIP Sequencing In Silico with Applications," PLOS Computational Biology, Public Library of Science, vol. 4(8), pages 1-10, August.
  • Handle: RePEc:plo:pcbi00:1000158
    DOI: 10.1371/journal.pcbi.1000158

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000158
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000158&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Vishwanath R. Iyer & Christine E. Horak & Charles S. Scafe & David Botstein & Michael Snyder & Patrick O. Brown, 2001. "Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF," Nature, Nature, vol. 409(6819), pages 533-538, January.

    Citations


    Cited by:

    1. Youngsook Lucy Jung & Wenping Zhao & Ian Li & Dhawal Jain & Charles B. Epstein & Bradley E. Bernstein & Sareh Parangi & Richard Sherwood & Cassianne Robinson-Cohen & Yi-Hsiang Hsu & Peter J. Park & Mi, 2024. "Epigenetic profiling reveals key genes and cis-regulatory networks specific to human parathyroids," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    2. Guannan Sun & Rajini Srinivasan & Camila Lopez-Anido & Holly A Hung & John Svaren & Sündüz Keleş, 2014. "In Silico Pooling of ChIP-seq Control Experiments," PLOS ONE, Public Library of Science, vol. 9(11), pages 1-9, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. John E Reid & Lorenz Wernisch, 2014. "STEME: A Robust, Accurate Motif Finder for Large Data Sets," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-11, March.
    2. Xinyi Liu & Bin Liu & Zhimin Huang & Ting Shi & Yingyi Chen & Jian Zhang, 2012. "SPPS: A Sequence-Based Method for Predicting Probability of Protein-Protein Interaction Partners," PLOS ONE, Public Library of Science, vol. 7(1), pages 1-6, January.
    3. G. Saharidis & I. Androulakis & M. Ierapetritou, 2011. "Model building using bi-level optimization," Journal of Global Optimization, Springer, vol. 49(1), pages 49-67, January.
    4. Emily N Manderson & Mohan Malleshaiah & Stephen W Michnick, 2008. "A Novel Genetic Screen Implicates Elm1 in the Inactivation of the Yeast Transcription Factor SBF," PLOS ONE, Public Library of Science, vol. 3(1), pages 1-9, January.
    5. Cheemeng Tan & Robert Phillip Smith & Ming-Chi Tsai & Russell Schwartz & Lingchong You, 2014. "Phenotypic Signatures Arising from Unbalanced Bacterial Growth," PLOS Computational Biology, Public Library of Science, vol. 10(8), pages 1-10, August.
    6. Kyoung-Jae Won & Saurabh Agarwal & Li Shen & Robert Shoemaker & Bing Ren & Wei Wang, 2009. "An Integrated Approach to Identifying Cis-Regulatory Modules in the Human Genome," PLOS ONE, Public Library of Science, vol. 4(5), pages 1-8, May.
    7. Eilon Sharon & Shai Lubliner & Eran Segal, 2008. "A Feature-Based Approach to Modeling Protein–DNA Interactions," PLOS Computational Biology, Public Library of Science, vol. 4(8), pages 1-17, August.
    8. Xun Lan & Christopher Adams & Mark Landers & Miroslav Dudas & Daniel Krissinger & George Marnellos & Russell Bonneville & Maoxiong Xu & Junbai Wang & Tim H-M Huang & Gavin Meredith & Victor X Jin, 2011. "High Resolution Detection and Analysis of CpG Dinucleotides Methylation Using MBD-Seq Technology," PLOS ONE, Public Library of Science, vol. 6(7), pages 1-11, July.
