
Overcoming CRISPR-Cas9 off-target prediction hurdles: A novel approach with ESB rebalancing strategy and CRISPR-MCA model

Author

Listed:
  • Yanpeng Yang
  • Yanyi Zheng
  • Quan Zou
  • Jian Li
  • Hailin Feng

Abstract

The off-target activities within the CRISPR-Cas9 system remain a formidable barrier to its broader application and development. Recent advancements have highlighted the potential of deep learning models in predicting these off-target effects, yet they encounter significant hurdles, including imbalances within datasets and the intricacies associated with encoding schemes and model architectures. To surmount these challenges, our study introduces an Efficiency and Specificity-Based (ESB) class rebalancing strategy, specifically devised for datasets featuring mismatches-only off-target instances, marking a pioneering approach in this realm. Furthermore, through a meticulous evaluation of various one-hot encoding schemes alongside numerous hybrid neural network models, we discern that encodings and models of moderate complexity best balance performance and efficiency. On this foundation, we advance a novel hybrid model, CRISPR-MCA, which capitalizes on multi-feature extraction to enhance predictive accuracy. The empirical results affirm that the ESB class rebalancing strategy surpasses five conventional methods in addressing extreme dataset imbalances, demonstrating superior efficacy and broader applicability across diverse models. Notably, the CRISPR-MCA model excels in off-target effect prediction across four distinct mismatches-only datasets and significantly outperforms contemporary state-of-the-art models on datasets comprising both mismatches and indels. In summation, the CRISPR-MCA model, coupled with the ESB rebalancing strategy, offers profound insights and a robust framework for future explorations in this field.

Author summary

In the field of gene editing, the application of deep learning technologies holds significant promise for predicting off-target effects in the CRISPR-Cas9 system. Nevertheless, one of the primary challenges encountered is the extreme imbalance among classes within the off-target datasets, which severely hampers the predictive accuracy for certain classes. Furthermore, as sequence encoding methods continue to evolve, there has been a corresponding increase in model complexity. Addressing these issues, we introduce a novel Efficiency and Specificity-Based (ESB) class rebalancing strategy designed to mitigate the impact of class imbalance. Additionally, we assess the influence of six encoding schemes and four distinct architectural approaches on prediction performance, employing four benchmark datasets for validation. Building upon these insights, we have developed a new hybrid model, termed CRISPR-MCA. Our experimental results demonstrate that the ESB strategy significantly surpasses the performance of existing baseline methods across multiple models. Moreover, the CRISPR-MCA model exhibits robust performance on two distinct types of datasets, affirming its effectiveness in enhancing the accuracy of deep learning predictions for off-target activities.
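
For readers unfamiliar with how an sgRNA and a candidate off-target site are turned into model input, the following is a minimal sketch of one generic pairwise one-hot encoding for a mismatches-only pair (equal-length sequences, no indels). It is an illustration only: the abstract does not specify the six encoding schemes that were evaluated, so the 8-channel layout, the function names `one_hot`, `encode_pair`, and `mismatch_positions`, and the example sequences are all assumptions rather than the authors' method.

```python
# Illustrative sketch: a generic pairwise one-hot encoding.
# Assumption: this is NOT necessarily one of the schemes evaluated in the paper.
BASES = "ACGT"


def one_hot(base):
    """Return a 4-element one-hot list for A, C, G or T (all zeros otherwise)."""
    return [1 if base == b else 0 for b in BASES]


def encode_pair(sgrna, target):
    """Encode an aligned sgRNA/target pair as one 8-element row per position.

    The first four channels describe the sgRNA base, the last four the
    target DNA base at the same position.
    """
    if len(sgrna) != len(target):
        raise ValueError("mismatches-only pairs must have equal length")
    return [one_hot(s) + one_hot(t) for s, t in zip(sgrna.upper(), target.upper())]


def mismatch_positions(sgrna, target):
    """Return the 0-based positions where the two sequences differ."""
    return [i for i, (s, t) in enumerate(zip(sgrna.upper(), target.upper())) if s != t]


# Hypothetical 23-nt sequences (20-nt protospacer plus 3-nt PAM).
sgrna = "GAGTCCGAGCAGAAGAAGAAGGG"
target = "GAGTCTGAGCAGAAGAAGAATGG"

matrix = encode_pair(sgrna, target)
print(len(matrix), len(matrix[0]))
print(mismatch_positions(sgrna, target))
```

A matrix of this shape (sequence length by channels) is the kind of input a convolutional or recurrent layer would consume; richer schemes typically add channels, for example to mark mismatch direction or gaps.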

Suggested Citation

  • Yanpeng Yang & Yanyi Zheng & Quan Zou & Jian Li & Hailin Feng, 2024. "Overcoming CRISPR-Cas9 off-target prediction hurdles: A novel approach with ESB rebalancing strategy and CRISPR-MCA model," PLOS Computational Biology, Public Library of Science, vol. 20(9), pages 1-24, September.
  • Handle: RePEc:plo:pcbi00:1012340
    DOI: 10.1371/journal.pcbi.1012340

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012340
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012340&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1012340?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

