
Informative transcription factor selection using support vector machine-based generalized approximate cross validation criteria

Author

Listed:
  • Sohn, Insuk
  • Shim, Jooyong
  • Hwang, Changha
  • Kim, Sujong
  • Lee, Jae Won

Abstract

Genetic regulatory mechanisms play a pivotal role in many biological processes, ranging from development to survival. Identifying the common transcription factor binding sites (TFBSs) in a set of known co-regulated gene promoters, and identifying the genes regulated by transcription factors (TFs) that have important roles in a particular biological function, will advance our understanding of the interactions among co-regulated genes and of the intricate genetic regulatory mechanism underlying that function. To identify the common TFBSs in a set of known co-regulated gene promoters and to classify genes that are regulated by TFs, new approaches using Support Vector Machine (SVM)-based Generalized Approximate Cross Validation (GACV) criteria are proposed. Two variable selection methods are considered for Recursive Feature Elimination (RFE) and Recursive Feature Addition (RFA). The performance of the proposed methods is compared with that of existing SVM-based criteria, Logistic Regression Analysis (LRA), Logic Regression (LR), and Decision Tree (DT) methods, using two real TF target gene data sets and simulated data. In terms of test error rates, the proposed methods perform better than the existing methods.
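
For readers unfamiliar with Recursive Feature Elimination, the following is a minimal sketch of the general backward-elimination loop around a linear SVM. It is not the authors' method: the abstract does not define the GACV criteria, so the sketch ranks features by squared weight, a common stand-in criterion. The function names, the toy data generator, and all hyperparameters (`lam`, `lr`, `epochs`) are assumptions made for illustration, and Recursive Feature Addition is not shown.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by full-batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the penalty term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep):
    """Backward elimination: refit, then drop the feature with the smallest squared weight."""
    active = list(range(len(X[0])))  # original column indices still in the model
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        del active[worst]
    return active


def make_toy_data(n=200, d=6, seed=0):
    """Hypothetical data: columns 0 and 1 carry the class signal, the rest are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        row = [rng.gauss(0, 1) for _ in range(d)]
        row[0] += 2 * label
        row[1] += 2 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected = svm_rfe(X, y, n_keep=2)
print(selected)  # indices of the two features that survive elimination
```

In a TF-selection setting, each column would presumably correspond to a candidate TFBS feature and each row to a gene promoter; swapping the squared-weight score for a model selection criterion evaluated after each candidate removal would bring the loop closer to the criterion-driven selection the abstract describes.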

Suggested Citation

  • Sohn, Insuk & Shim, Jooyong & Hwang, Changha & Kim, Sujong & Lee, Jae Won, 2009. "Informative transcription factor selection using support vector machine-based generalized approximate cross validation criteria," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1727-1735, March.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:5:p:1727-1735

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00252-1
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Sunduz Keles & Mark van der Laan & Chris Vulpe, 2004. "Regulatory Motif Finding by Logic Regression," U.C. Berkeley Division of Biostatistics Working Paper Series 1145, Berkeley Electronic Press.
    2. Yoonkyung Lee & Yuwon Kim & Sangjun Lee & Ja-Yong Koo, 2006. "Structured multicategory support vector machines with analysis of variance decomposition," Biometrika, Biometrika Trust, vol. 93(3), pages 555-571, September.

    Citations



    Cited by:

    1. Insuk Sohn & Jooyong Shim & Changha Hwang & Sujong Kim & Jae Won Lee, 2014. "Transcription factor-binding site identification and gene classification via fusion of the supervised-weighted discrete kernel clustering and support vector machine," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(3), pages 573-581, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tuglus Catherine & van der Laan Mark J., 2011. "Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-31, January.
    2. Hoai An Le Thi & Manh Cuong Nguyen, 2017. "DCA based algorithms for feature selection in multi-class support vector machine," Annals of Operations Research, Springer, vol. 249(1), pages 273-300, February.
    3. Park, Beomjin & Park, Changyi, 2021. "Kernel variable selection for multicategory support vector machines," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    4. Lee, Sangjun & Park, Changyi & Koo, Ja-Yong, 2011. "Feature selection in the Laplacian support vector machine," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 567-577, January.
    5. Chen, Zhen-Yu & Fan, Zhi-Ping & Sun, Minghe, 2012. "A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data," European Journal of Operational Research, Elsevier, vol. 223(2), pages 461-472.
    6. Park, Beomjin & Park, Changyi, 2023. "Multiclass Laplacian support vector machine with functional analysis of variance decomposition," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    7. Zhilan Lou & Jun Shao & Menggang Yu, 2018. "Optimal treatment assignment to maximize expected outcome with multiple treatments," Biometrics, The International Biometric Society, vol. 74(2), pages 506-516, June.
    8. Park, Changyi & Koo, Ja-Yong & Kim, Peter T. & Lee, Jae Won, 2008. "Stepwise feature selection using generalized logistic loss," Computational Statistics & Data Analysis, Elsevier, vol. 52(7), pages 3709-3718, March.
    9. Baierl, Andreas & Futschik, Andreas & Bogdan, Malgorzata & Biecek, Przemyslaw, 2007. "Locating multiple interacting quantitative trait loci using robust model selection," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6423-6434, August.
    10. Insuk Sohn & Jooyong Shim & Changha Hwang & Sujong Kim & Jae Won Lee, 2014. "Transcription factor-binding site identification and gene classification via fusion of the supervised-weighted discrete kernel clustering and support vector machine," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(3), pages 573-581, March.
    11. Yuan Yuan & Lei Guo & Lei Shen & Jun S Liu, 2007. "Predicting Gene Expression from Sequence: A Reexamination," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-7, November.
    12. Siewert Elizabeth A & Kechris Katerina J, 2009. "Prediction of Motifs Based on a Repeated-Measures Model for Integrating Cross-Species Sequence and Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-34, September.
