Authors
- Yan Shao
- Yazhou Li
- Hexin Zhai
- Shimin Dong
Abstract
Predicting microRNA (miRNA) target genes is essential for understanding their biological functions. This study developed a miRNA target gene prediction model based on input-feature dependency. Features were treated as multiple random variables, with marginal densities estimated using Gaussian mixture models (GMM) and dependencies captured by a regular vine (R-vine) copula to derive joint probability density functions. We constructed class-conditional joint densities for positive and negative samples separately using GMM and R-vine copula, then combined these with prior probabilities using Bayes’ rule to obtain the posterior probability of a positive interaction. A standard probability threshold of 0.5 converts this posterior into a deterministic prediction. To address insufficient data and class imbalance, hybrid distribution mega-trend diffusion was used to generate virtual samples for data augmentation. Computational validation showed high predictive performance even when only 30% of the training data were used. As proof of concept, we experimentally validated one predicted interaction (miR-8485 targeting JAK2) using dual-luciferase, cellular, and animal experiments, confirming the biological relevance of this specific model-generated prediction. These findings provide a valuable tool for understanding miRNA functions and disease mechanisms.
Author summary: In this study, we developed a new computational model to more accurately predict which genes are regulated by microRNAs, small RNA molecules that play key roles in health and disease. Predicting these targets is difficult because biological data are often limited, imbalanced, and contain complex relationships between features. Our model addresses these challenges by combining two innovations: a probabilistic prediction framework that accounts for dependencies between input features, and a data expansion method that generates realistic synthetic samples to balance the dataset.
Computational experiments show that our model performs well even when trained on only 30% of the training data and outperforms existing methods in predictive accuracy. Through laboratory experiments, we validated one prediction (that miR-8485 targets the JAK2 gene), serving as a proof-of-concept demonstration that the model can generate biologically plausible hypotheses. Our findings provide researchers with a promising tool for uncovering microRNA functions, which can help advance our understanding of diseases and support the development of new therapies.
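The decision step described in the abstract (class-conditional joint densities combined with prior probabilities through Bayes’ rule, then a 0.5 threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `posterior_positive` and `predict_interaction` are hypothetical, and the class-conditional log-densities are assumed to have been evaluated elsewhere (in the paper's setting, from the GMM marginals and the R-vine copula). Working in log space is an assumption made here for numerical stability.

```python
import math


def posterior_positive(log_f_pos, log_f_neg, prior_pos):
    """Posterior probability of a positive interaction via Bayes' rule.

    log_f_pos, log_f_neg: log class-conditional joint densities at one
    feature vector (assumed inputs, computed by some density model).
    prior_pos: prior probability of the positive class.
    """
    if not 0.0 < prior_pos < 1.0:
        raise ValueError("prior_pos must lie strictly between 0 and 1")
    # Unnormalised log-posteriors: log f(x | class) + log P(class).
    a = log_f_pos + math.log(prior_pos)
    b = log_f_neg + math.log(1.0 - prior_pos)
    # Subtract the larger term before exponentiating to avoid underflow.
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)


def predict_interaction(log_f_pos, log_f_neg, prior_pos, threshold=0.5):
    """Deterministic prediction: True if the posterior reaches the threshold."""
    return posterior_positive(log_f_pos, log_f_neg, prior_pos) >= threshold
```

With illustrative densities of 0.3 (positive class) and 0.1 (negative class) and equal priors, the posterior is 0.15 / (0.15 + 0.05) = 0.75, so the pair is predicted as an interaction. Lowering the positive prior to 0.1 gives 0.03 / (0.03 + 0.09) = 0.25, and the prediction flips, which shows how strongly class imbalance affects the decision.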
Suggested Citation
Yan Shao & Yazhou Li & Hexin Zhai & Shimin Dong, 2026.
"MicroRNA target gene prediction model based on input-feature dependency and sample data expansion technique,"
PLOS Computational Biology, Public Library of Science, vol. 22(6), pages 1-24, June.
Handle: RePEc:plo:pcbi00:1014402
DOI: 10.1371/journal.pcbi.1014402