
Benchmarking the negatives: Effect of negative data generation on the classification of miRNA-mRNA interactions

Author

Listed:
  • Efrat Cohen-Davidi
  • Isana Veksler-Lublinsky

Abstract

MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally. In animals, this regulation is achieved via base-pairing with partially complementary sequences, mainly in the 3’ UTR region of messenger RNAs (mRNAs). Computational approaches that predict miRNA-target interactions (MTIs) facilitate the process of narrowing down potential targets for experimental validation. The availability of new datasets of high-throughput, direct MTIs has led to the development of machine learning (ML) based methods for MTI prediction. To train an ML algorithm, it is beneficial to provide entries from all class labels (i.e., positive and negative). Currently, no high-throughput assays exist for capturing negative examples. Therefore, current ML approaches must rely on negative examples that are either artificially generated or inferred from experimentally identified positive miRNA-target datasets. Moreover, the lack of uniform standards for generating such data leads to biased results and hampers comparisons between studies. In this comprehensive study, we collected methods for generating negative data for animal miRNA-target interactions and investigated their impact on the classification of true human MTIs. Our study relies on training ML models on a fixed positive dataset in combination with different negative datasets and evaluating their intra- and cross-dataset performance. As a result, we were able to examine each method independently and evaluate ML models’ sensitivity to the methodologies utilized in negative data generation. To achieve a deep understanding of the performance results, we analyzed unique features that distinguish between datasets. In addition, we examined whether one-class classification models that utilize solely positive interactions for training are suitable for the task of MTI classification. We demonstrate the importance of negative data in MTI classification, analyze specific methodological characteristics that differentiate negative datasets, and highlight the challenge of ML models generalizing interaction rules from training to testing sets derived from different approaches. This study provides valuable insights into the computational prediction of MTIs that can be further used to establish standards in the field.

Author summary: Gene expression regulation is fundamental for all organisms’ development, homeostasis, and environmental adaptation. MicroRNAs (miRNAs) play a central role in post-transcriptional gene regulation by binding to target mRNAs and repressing their translation or mediating their degradation. Technical challenges in experimental miRNA target identification have led to growing interest in computational target prediction. While machine learning (ML) models have shown success in this area, they rely heavily on artificially generated negative examples due to limited experimental data. The diversity of methods for generating negative interactions and the lack of a standardized approach introduce bias and hinder the comparison of results across different studies.
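
The evaluation design described in the abstract (one fixed positive set, several alternative negative sets, and testing both within and across negative-generation methods) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two-feature synthetic data, the nearest-centroid classifier, and the names `method_A` and `method_B` are hypothetical stand-ins, not the features, models, or negative-generation methods used in the paper.

```python
import random
from statistics import mean

def make_dataset(center, n, rng):
    """Draw n two-feature points around `center` (synthetic stand-in for MTI features)."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]

def split(rows, frac=0.7):
    """Split rows into a training part and a held-out test part."""
    cut = int(len(rows) * frac)
    return rows[:cut], rows[cut:]

def centroid(rows):
    """Per-feature mean of a list of feature tuples."""
    return tuple(mean(col) for col in zip(*rows))

def fit(pos, neg):
    """Nearest-centroid classifier (illustrative only): one centroid per class."""
    return centroid(pos), centroid(neg)

def predict(model, x):
    """Return 1 (positive) if x is at least as close to the positive centroid, else 0."""
    pos_c, neg_c = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
    return 1 if d_pos <= d_neg else 0

def accuracy(model, pos, neg):
    """Fraction of held-out positives and negatives that are labelled correctly."""
    hits = sum(predict(model, x) == 1 for x in pos)
    hits += sum(predict(model, x) == 0 for x in neg)
    return hits / (len(pos) + len(neg))

def cross_dataset_matrix(pos_train, pos_test, negatives):
    """negatives maps name -> (neg_train, neg_test).

    Trains one model per negative set (positives stay fixed) and scores it on
    every negative set's test split. Returns {(train_name, test_name): accuracy}.
    """
    models = {name: fit(pos_train, neg_tr) for name, (neg_tr, _) in negatives.items()}
    return {
        (tr, te): accuracy(models[tr], pos_test, neg_te)
        for tr in negatives
        for te, (_, neg_te) in negatives.items()
    }

rng = random.Random(0)
# Fixed positive set, shared by every experiment.
pos_train, pos_test = split(make_dataset((0.0, 0.0), 400, rng))
# Two hypothetical negative-generation methods that differ from the positives
# along different features.
negatives = {
    "method_A": split(make_dataset((4.0, 0.0), 400, rng)),
    "method_B": split(make_dataset((0.0, 4.0), 400, rng)),
}
results = cross_dataset_matrix(pos_train, pos_test, negatives)
for (tr, te), acc in sorted(results.items()):
    print(f"train={tr} test={te} accuracy={acc:.2f}")
```

In this toy setup the diagonal entries (train and test negatives from the same method) score higher than the off-diagonal ones, because each model learns only the feature that separates its own negatives from the positives. That is the kind of generalization gap between negative datasets that the abstract highlights, though the synthetic data says nothing about its size in real MTI data.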

Suggested Citation

  • Efrat Cohen-Davidi & Isana Veksler-Lublinsky, 2024. "Benchmarking the negatives: Effect of negative data generation on the classification of miRNA-mRNA interactions," PLOS Computational Biology, Public Library of Science, vol. 20(8), pages 1-25, August.
  • Handle: RePEc:plo:pcbi00:1012385
    DOI: 10.1371/journal.pcbi.1012385

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012385
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012385&type=printable
    Download Restriction: no

