
Transcription factor-binding site identification and gene classification via fusion of the supervised-weighted discrete kernel clustering and support vector machine

Authors
  • Insuk Sohn
  • Jooyong Shim
  • Changha Hwang
  • Sujong Kim
  • Jae Won Lee

Abstract

The genetic regulatory mechanism heavily influences a substantial portion of the biological functions and processes needed to sustain life. For a comprehensive mechanistic understanding of biological processes, it is important to identify the common transcription factor (TF) binding sites (TFBSs) from a set of promoter sequences of co-regulated genes and to classify genes that are co-regulated by certain TFs, thereby providing insight into the mechanism that underlies the interaction among the co-regulated genes and complicated genetic regulation. We propose a new supervised-weighted discrete kernel clustering (SWDKC) classification method for the identification of TFBSs and the classification of genes. Our SWDKC method gave a smaller misclassification error rate than the other methods on both the simulated data and the real NF-κB data. We verify that the selected over-represented TFBSs serve as informative TFBSs from a biological point of view.

Suggested Citation

  • Insuk Sohn & Jooyong Shim & Changha Hwang & Sujong Kim & Jae Won Lee, 2014. "Transcription factor-binding site identification and gene classification via fusion of the supervised-weighted discrete kernel clustering and support vector machine," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(3), pages 573-581, March.
  • Handle: RePEc:taf:japsta:v:41:y:2014:i:3:p:573-581
    DOI: 10.1080/02664763.2013.845143

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/02664763.2013.845143
    Download Restriction: Access to full text is restricted to subscribers.



    References listed on IDEAS

    1. Shim, Jooyong & Sohn, Insuk & Kim, Sujong & Lee, Jae Won & Green, Paul E. & Hwang, Changha, 2009. "Selecting marker genes for cancer classification using supervised weighted kernel clustering and the support vector machine," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1736-1742, March.
    2. Sohn, Insuk & Shim, Jooyong & Hwang, Changha & Kim, Sujong & Lee, Jae Won, 2009. "Informative transcription factor selection using support vector machine-based generalized approximate cross validation criteria," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1727-1735, March.
    3. Sunduz Keles & Mark van der Laan & Chris Vulpe, 2004. "Regulatory Motif Finding by Logic Regression," U.C. Berkeley Division of Biostatistics Working Paper Series 1145, Berkeley Electronic Press.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Drechsler, Jörg & Reiter, Jerome P., 2011. "An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3232-3243, December.
    2. Tuglus Catherine & van der Laan Mark J., 2011. "Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-31, January.
    3. Sohn, Insuk & Shim, Jooyong & Hwang, Changha & Kim, Sujong & Lee, Jae Won, 2009. "Informative transcription factor selection using support vector machine-based generalized approximate cross validation criteria," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1727-1735, March.
    4. Baierl, Andreas & Futschik, Andreas & Bogdan, Malgorzata & Biecek, Przemyslaw, 2007. "Locating multiple interacting quantitative trait loci using robust model selection," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6423-6434, August.
    5. Yuan Yuan & Lei Guo & Lei Shen & Jun S Liu, 2007. "Predicting Gene Expression from Sequence: A Reexamination," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-7, November.
    6. Siewert Elizabeth A & Kechris Katerina J, 2009. "Prediction of Motifs Based on a Repeated-Measures Model for Integrating Cross-Species Sequence and Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-34, September.
    7. Ramos, Sandra & Amaral Turkman, Antónia & Antunes, Marília, 2010. "Bayesian classification for bivariate normal gene expression," Computational Statistics & Data Analysis, Elsevier, vol. 54(8), pages 2012-2020, August.
    8. Wenyan Zhong & Jingjing Wu, 2017. "Feature Selection for Cancer Classification Using Microarray Gene Expression Data," Biostatistics and Biometrics Open Access Journal, Juniper Publishers Inc., vol. 1(2), pages 33-39, April.
