Author
Listed:
- Akash Bahai
- Chee Keong Kwoh
- Yuguang Mu
- Yinghui Li
Abstract
The 3D structure of RNA critically influences its functionality, and understanding this structure is vital for deciphering RNA biology. Experimental methods for determining RNA structures are labour-intensive, expensive, and time-consuming. Computational approaches have emerged as valuable tools, leveraging physics-based principles and machine learning to predict RNA structures rapidly. Despite advancements, the accuracy of computational methods remains modest, especially when compared to protein structure prediction. Deep learning methods, while successful in protein structure prediction, have shown some promise for RNA structure prediction as well, but face unique challenges. This study systematically benchmarks state-of-the-art deep learning methods for RNA structure prediction across diverse datasets. Our aim is to identify factors influencing performance variation, such as RNA family diversity, sequence length, RNA type, multiple sequence alignment (MSA) quality, and deep learning model architecture. We show that ML-based methods generally perform much better than non-ML methods on most RNA targets, although the performance difference isn’t substantial when working with unseen novel or synthetic RNAs. The quality of the MSA and of the secondary structure prediction both play an important role, and most methods aren’t able to predict non-Watson-Crick pairs in the RNAs. Overall, among the automated 3D RNA structure prediction methods, DeepFoldRNA has the best prediction results, followed by DRFold as the second-best method.
Finally, we also suggest possible mitigations to improve the quality of the prediction for future method development.
Author summary:
- Systematic benchmarking of the five latest deep-learning methods and two fragment-assembly (FA) based methods on diverse datasets.
- Compiled a new balanced dataset with the latest RNA structures for benchmarking.
- Generally, the ML-based methods outperform the traditional fragment-assembly based methods, with DeepFoldRNA having the best predicted models overall.
- On orphan RNAs, the ML-based methods are only slightly better than FA-based methods, and generally all methods perform poorly.
- The performance of the methods depends on the MSA depth, RNA type, and secondary structure.
Suggested Citation
Akash Bahai & Chee Keong Kwoh & Yuguang Mu & Yinghui Li, 2024.
"Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction,"
PLOS Computational Biology, Public Library of Science, vol. 20(12), pages 1-44, December.
Handle:
RePEc:plo:pcbi00:1012715
DOI: 10.1371/journal.pcbi.1012715