Regulatory Motif Finding by Logic Regression

Author

Listed:
  • Sunduz Keles

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Mark van der Laan

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Chris Vulpe

    (Nutritional Science & Toxicology, University of California, Berkeley)

Abstract

Multiple transcription factors coordinately control transcriptional regulation of genes in eukaryotes. Although many computational methods address the identification of individual transcription factor binding sites (TFBSs), very few focus on the interactions between these sites. We consider finding transcription factor binding sites and their context-specific interactions using microarray gene expression data. We devise a hybrid approach called LogicMotif, composed of a TFBS identification method combined with the new regression methodology, logic regression, of Ruczinski et al. (2003). LogicMotif has two steps. First, potential binding sites are identified from the transcription control regions of the genes of interest. Various available methods can be used in this first step when the genes of interest can be divided into groups, such as up- and down-regulated genes. For this step, we also develop MFURE, a simple univariate regression and extension method, to extract candidate TFBSs from a large number of genes when microarray gene expression data are available. MFURE provides an alternative for this step when partitioning the genes into disjoint groups is not preferred. This first step aims to identify individual sites within gene groups of interest, or sites that are correlated with the gene expression outcome. In the second step, logic regression is used to build a predictive model of the outcome of interest (either gene expression or up- and down-regulation) using these potential sites. This two-fold approach creates a rich, diverse set of potential binding sites in the first step and builds regression or classification models in the second step using logic regression, which is particularly good at identifying complex interactions.

LogicMotif is applied to two publicly available data sets. A genome-wide gene expression data set of Saccharomyces cerevisiae is used for validation. The regression models obtained are interpretable, and their biological implications agree with known results. This analysis suggests that LogicMotif provides biologically more reasonable regression models than previous analyses of this data set with standard linear regression methods. Another data set of Saccharomyces cerevisiae illustrates the use of LogicMotif in classification questions by building a model that discriminates between up- and down-regulated genes in iron copper deficiency. LogicMotif identified one inductive and two repressor motifs in this data set. The inductive motif matches the binding site of the transcription factor Aft1p, which has a key role in regulation of the uptake process. One of the novel repressor sites is highly present in the transcription control regions of FeS genes. This site could represent a TFBS for an unknown transcription factor involved in repression of genes encoding FeS proteins in iron deficiency. We established the stability of the method with respect to the type of outcome variable by using both continuous and binary outcome variables for this data set. Our results indicate that logic regression, used in combination with binding site identification methods that operate on clusters or groups of genes, or with our proposed method MFURE, is a powerful and flexible alternative to linear regression based motif finding methods.
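The two-step idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the first function only ranks candidate motifs by a univariate association with expression (it omits MFURE's extension step), and the second replaces the search procedure of logic regression with an exhaustive scan over single motifs and pairwise AND/OR rules. All sequences, expression values, motif strings, and function names below are hypothetical and chosen purely for illustration.

```python
from itertools import combinations

# Hypothetical toy data: promoter sequences and an expression outcome per gene.
# Expression is high only when both "ACGT" and "TTGA" are present.
SEQS = [
    "ACGTAAAATTGAAAAAGGCC",  # ACGT, TTGA, GGCC
    "AAAAACGTAAAA",          # ACGT
    "AAAATTGAAAAA",          # TTGA
    "AAAAGGCCAAAA",          # GGCC
    "ACGTAAAATTGA",          # ACGT, TTGA
    "ACGTAAAAGGCC",          # ACGT, GGCC
    "TTGAAAAAGGCC",          # TTGA, GGCC
    "AAAAAAAAAAAA",          # none
]
Y = [2.0, 0.1, 0.0, -0.1, 2.1, 0.0, 0.1, 0.0]


def indicator(seqs, motif):
    """0/1 vector: does each sequence contain the motif as a substring?"""
    return [1 if motif in s else 0 for s in seqs]


def univariate_score(x, y):
    """Absolute Pearson correlation between indicator x and outcome y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(sxy) / (sxx * syy) ** 0.5


def screen_motifs(seqs, y, candidates, keep):
    """Step 1 (simplified): keep the motifs most associated with expression."""
    ranked = sorted(
        candidates,
        key=lambda m: univariate_score(indicator(seqs, m), y),
        reverse=True,
    )
    return ranked[:keep]


def rss_of_rule(z, y):
    """Residual sum of squares when y is predicted by its mean within z=0 and z=1."""
    rss = 0.0
    for level in (0, 1):
        vals = [yi for zi, yi in zip(z, y) if zi == level]
        if vals:
            mean = sum(vals) / len(vals)
            rss += sum((v - mean) ** 2 for v in vals)
    return rss


def best_boolean_rule(seqs, y, motifs):
    """Step 2 (simplified): pick the single motif or pairwise AND/OR rule
    with the smallest residual sum of squares."""
    motifs = sorted(motifs)  # fixed order so rule labels are deterministic
    x = {m: indicator(seqs, m) for m in motifs}
    rules = [(m, x[m]) for m in motifs]
    for a, b in combinations(motifs, 2):
        rules.append((f"{a} AND {b}", [p & q for p, q in zip(x[a], x[b])]))
        rules.append((f"{a} OR {b}", [p | q for p, q in zip(x[a], x[b])]))
    return min(rules, key=lambda r: rss_of_rule(r[1], y))[0]


selected = screen_motifs(SEQS, Y, ["ACGT", "TTGA", "GGCC"], keep=2)
print(best_boolean_rule(SEQS, Y, selected))
```

On this toy data the screening step drops the decoy motif `GGCC`, and the second step prefers the conjunction of the two remaining motifs over either motif alone, which is the kind of interaction a purely additive linear model in the individual motif indicators would not represent directly.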

Suggested Citation

  • Sunduz Keles & Mark van der Laan & Chris Vulpe, 2004. "Regulatory Motif Finding by Logic Regression," U.C. Berkeley Division of Biostatistics Working Paper Series 1145, Berkeley Electronic Press.
  • Handle: RePEc:bep:ucbbio:1145
    Note: oai:bepress.com:ucbbiostat-1145

    Download full text from publisher

    File URL: http://www.bepress.com/cgi/viewcontent.cgi?article=1145&context=ucbbiostat
    Download Restriction: no

    Citations


    Cited by:

    1. Sohn, Insuk & Shim, Jooyong & Hwang, Changha & Kim, Sujong & Lee, Jae Won, 2009. "Informative transcription factor selection using support vector machine-based generalized approximate cross validation criteria," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1727-1735, March.
    2. Yuan Yuan & Lei Guo & Lei Shen & Jun S Liu, 2007. "Predicting Gene Expression from Sequence: A Reexamination," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-7, November.
    3. Tuglus, Catherine & van der Laan, Mark J., 2011. "Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-31, January.
    4. Baierl, Andreas & Futschik, Andreas & Bogdan, Malgorzata & Biecek, Przemyslaw, 2007. "Locating multiple interacting quantitative trait loci using robust model selection," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6423-6434, August.
    5. Insuk Sohn & Jooyong Shim & Changha Hwang & Sujong Kim & Jae Won Lee, 2014. "Transcription factor-binding site identification and gene classification via fusion of the supervised-weighted discrete kernel clustering and support vector machine," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(3), pages 573-581, March.
    6. Siewert, Elizabeth A. & Kechris, Katerina J., 2009. "Prediction of Motifs Based on a Repeated-Measures Model for Integrating Cross-Species Sequence and Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-34, September.
