
MicroRNA Transcription Start Site Prediction with Multi-objective Feature Selection

Author

Listed:
  • Bhattacharyya Malay

    (Indian Statistical Institute, Kolkata)

  • Feuerbach Lars

    (Max-Planck Institute for Informatics)

  • Bhadra Tapas

    (Indian Statistical Institute, Kolkata)

  • Lengauer Thomas

    (Max-Planck Institute for Informatics)

  • Bandyopadhyay Sanghamitra

    (Indian Statistical Institute, Kolkata)

Abstract

MicroRNAs (miRNAs) are short (21-23 nt), non-coding regulators of protein-coding genes. They are generally transcribed first into primary miRNA (pri-miR), followed by the generation of precursor miRNA (pre-miR), which finally leads to the production of the mature miRNA. A large amount of information is available on pre- and mature miRNAs. However, very little is known about pri-miRs, owing to a lack of knowledge about their transcription start sites (TSSs). Based on their genomic loci, miRNAs can be categorized into two types: intragenic (intra-miR) and intergenic (inter-miR). While it is well established that intra-miRs are commonly transcribed in conjunction with their host genes, the transcription machinery of inter-miRs is poorly understood. Although miRNA promoters are assumed to be similar in structure to gene promoters, since both are transcribed by RNA polymerase II (Pol II), computational validations show that gene promoter prediction methods perform poorly on miRNAs. In this paper, we concentrate on the problem of TSS prediction for miRNAs. The study begins with the identification of positive and negative promoter samples from recently published data stemming from RNA-sequencing studies. From these samples of experimentally validated miRNA TSSs, a number of standard sequence features are extracted. Furthermore, to account for potential footprints of promoter regulation by CpG dinucleotide targeted DNA methylation, a number of novel features are defined. We develop a support vector machine (SVM) with an RBF kernel for the prediction of miRNA TSSs, trained on human miRNA promoters. A novel feature reduction technique based on archived multi-objective simulated annealing (AMOSA) identifies the final set of features. The resulting model trained on miRNA promoters shows improved performance over the one trained on protein-coding gene promoters in terms of classification accuracy, sensitivity and specificity. Results are also reported for a completely independent, biologically validated test set. As part of the investigation, the proposed approach is also used to predict protein-coding gene TSSs, where it shows significantly improved performance compared to previously published gene TSS prediction methods.
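The abstract mentions standard sequence features and CpG dinucleotide related features extracted from promoter samples. As a rough illustration only, the sketch below computes three widely used CpG descriptors for a single sequence window: GC content, CpG count, and the observed/expected CpG ratio. These are generic textbook features, not the novel features defined in the paper; the function name `cpg_features` and the returned keys are hypothetical.

```python
from collections import Counter


def cpg_features(seq: str) -> dict:
    """Return simple CpG-related features for one DNA sequence window.

    Illustrative only: these are generic descriptors, not the
    paper's own feature definitions.
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("sequence must not be empty")

    counts = Counter(s)
    c, g = counts["C"], counts["G"]

    # "CG" cannot overlap with itself, so str.count finds every occurrence.
    cpg = s.count("CG")

    # Fraction of bases that are C or G.
    gc_content = (c + g) / n

    # Observed/expected CpG ratio: (#CpG * N) / (#C * #G).
    # Defined as 0.0 here when the window has no C or no G.
    cpg_obs_exp = (cpg * n) / (c * g) if c and g else 0.0

    return {
        "gc_content": gc_content,
        "cpg_count": cpg,
        "cpg_obs_exp": cpg_obs_exp,
    }


if __name__ == "__main__":
    print(cpg_features("ACGCGT"))
```

In a pipeline like the one the abstract describes, vectors of such features would be computed for each positive and negative promoter sample before feature selection and classifier training; the window length and the remaining features are left open here because the abstract does not specify them.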

Suggested Citation

  • Bhattacharyya Malay & Feuerbach Lars & Bhadra Tapas & Lengauer Thomas & Bandyopadhyay Sanghamitra, 2012. "MicroRNA Transcription Start Site Prediction with Multi-objective Feature Selection," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(1), pages 1-25, January.
  • Handle: RePEc:bpj:sagmbi:v:11:y:2012:i:1:n:6
    DOI: 10.2202/1544-6115.1743


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mariana R Mendoza & Guilherme C da Fonseca & Guilherme Loss-Morais & Ronnie Alves & Rogerio Margis & Ana L C Bazzan, 2013. "RFMirTarget: Predicting Human MicroRNA Target Genes with a Random Forest Classifier," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-18, July.
