Printed from https://ideas.repec.org/a/nat/natcom/v14y2023i1d10.1038_s41467-023-41143-7.html

A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets

Author

Listed:
  • Dalton T. Ham

    (Schulich School of Medicine and Dentistry)

  • Tyler S. Browne

    (Schulich School of Medicine and Dentistry)

  • Pooja N. Banglorewala

    (Schulich School of Medicine and Dentistry)

  • Tyler L. Wilson

    (Tesseraqt Optimization Inc)

  • Richard K. Michael

    (Tesseraqt Optimization Inc)

  • Gregory B. Gloor

    (Schulich School of Medicine and Dentistry)

  • David R. Edgell

    (Schulich School of Medicine and Dentistry)

Abstract

The CRISPR/Cas9 nuclease from Streptococcus pyogenes (SpCas9) can be used with single guide RNAs (sgRNAs) as a sequence-specific antimicrobial agent and as a genome-engineering tool. However, current bacterial sgRNA activity models struggle with accurate predictions and do not generalize well, possibly because the underlying datasets used to train the models do not accurately measure SpCas9/sgRNA activity and cannot distinguish on-target cleavage from toxicity. Here, we solve this problem by using a two-plasmid positive selection system to generate high-quality data that more accurately reports on SpCas9/sgRNA cleavage and that separates activity from toxicity. We develop a machine learning architecture (crisprHAL) that can be trained on existing datasets, that shows marked improvements in sgRNA activity prediction accuracy when transfer learning is used with small amounts of high-quality data, and that can generalize predictions to different bacteria. The crisprHAL model recapitulates known SpCas9/sgRNA-target DNA interactions and provides a pathway to a generalizable sgRNA bacterial activity prediction tool that will enable accurate antimicrobial and genome engineering applications.

Suggested Citation

  • Dalton T. Ham & Tyler S. Browne & Pooja N. Banglorewala & Tyler L. Wilson & Richard K. Michael & Gregory B. Gloor & David R. Edgell, 2023. "A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-41143-7
    DOI: 10.1038/s41467-023-41143-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-41143-7
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-023-41143-7?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jian Wang & Yuxi Teng & Ruihua Zhang & Yifei Wu & Lei Lou & Yusong Zou & Michelle Li & Zhong-Ru Xie & Yajun Yan, 2021. "Engineering a PAM-flexible SpdCas9 variant as a universal gene repressor," Nature Communications, Nature, vol. 12(1), pages 1-10, December.
    2. Giulia I. Corsi & Kunli Qu & Ferhat Alkan & Xiaoguang Pan & Yonglun Luo & Jan Gorodkin, 2022. "CRISPR/Cas9 gRNA activity depends on free energy changes and on the target PAM context," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    3. Yang Liu & Filipe Pinto & Xinyi Wan & Zhugen Yang & Shuguang Peng & Mengxi Li & Jonathan M. Cooper & Zhen Xie & Christopher E. French & Baojun Wang, 2022. "Reprogrammed tracrRNAs enable repurposing of RNAs as crRNAs and sequence-specific RNA biosensors," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    4. Shiran Abadi & Winston X Yan & David Amar & Itay Mayrose, 2017. "A machine learning approach for predicting CRISPR-Cas9 cleavage efficiencies and patterns underlying its mechanism of action," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-24, October.
    5. Daniel C. Volke & Román A. Martino & Ekaterina Kozaeva & Andrea M. Smania & Pablo I. Nikel, 2022. "Modular (de)construction of complex bacterial phenotypes by CRISPR/nCas9-assisted, multiplex cytidine base-editing," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    6. Fang Liang & Yu Zhang & Lin Li & Yexin Yang & Ji-Feng Fei & Yanmei Liu & Wei Qin, 2022. "SpG and SpRY variants expand the CRISPR toolbox for genome editing in zebrafish," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    7. Raed Ibraheim & Phillip W. L. Tai & Aamir Mir & Nida Javeed & Jiaming Wang & Tomás C. Rodríguez & Suk Namkung & Samantha Nelson & Eraj Shafiq Khokhar & Esther Mintzer & Stacy Maitland & Zexiang Chen &, 2021. "Self-inactivating, all-in-one AAV vectors for precision Cas9 genome editing via homology-directed repair in vivo," Nature Communications, Nature, vol. 12(1), pages 1-17, December.
    8. Ulaganathan, Kandasamy & Goud, Sravanthi & Reddy, Madhavi & Kayalvili, Ulaganathan, 2017. "Genome engineering for breaking barriers in lignocellulosic bioethanol production," Renewable and Sustainable Energy Reviews, Elsevier, vol. 74(C), pages 1080-1107.
    9. Zhaohui Zhong & Guanqing Liu & Zhongjie Tang & Shuyue Xiang & Liang Yang & Lan Huang & Yao He & Tingting Fan & Shishi Liu & Xuelian Zheng & Tao Zhang & Yiping Qi & Jian Huang & Yong Zhang, 2023. "Efficient plant genome engineering using a probiotic sourced CRISPR-Cas9 system," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    10. Margot Karlikow & Evan Amalfitano & Xiaolong Yang & Jennifer Doucet & Abigail Chapman & Peivand Sadat Mousavi & Paige Homme & Polina Sutyrina & Winston Chan & Sofia Lemak & Alexander F. Yakunin & Adam, 2023. "CRISPR-induced DNA reorganization for multiplexed nucleic acid detection," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    11. Péter István Kulcsár & András Tálas & Zoltán Ligeti & Eszter Tóth & Zsófia Rakvács & Zsuzsa Bartos & Sarah Laura Krausz & Ágnes Welker & Vanessza Laura Végi & Krisztina Huszár & Ervin Welker, 2023. "A cleavage rule for selection of increased-fidelity SpCas9 variants with high efficiency and no detectable off-targets," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    12. Jiongyu Zhang & Chengyu Hou & Changchun Liu, 2024. "CRISPR-powered quantitative keyword search engine in DNA data storage," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    13. Dawn G. L. Thean & Hoi Yee Chu & John H. C. Fong & Becky K. C. Chan & Peng Zhou & Cynthia C. S. Kwok & Yee Man Chan & Silvia Y. L. Mak & Gigi C. G. Choi & Joshua W. K. Ho & Zongli Zheng & Alan S. L. W, 2022. "Machine learning-coupled combinatorial mutagenesis enables resource-efficient engineering of CRISPR-Cas9 genome editor activities," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    14. Jeremy Vicencio & Carlos Sánchez-Bolaños & Ismael Moreno-Sánchez & David Brena & Charles E. Vejnar & Dmytro Kukhtar & Miguel Ruiz-López & Mariona Cots-Ponjoan & Alejandro Rubio & Natalia Rodrigo Meler, 2022. "Genome editing in animals with minimal PAM CRISPR-Cas9 enzymes," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    15. Adeeb Rahman & Neeti Sanan-Mishra, 2024. "When an Intruder Comes Home: GM and GE Strategies to Combat Virus Infection in Plants," Agriculture, MDPI, vol. 14(2), pages 1-26, February.
    16. Alicia Broto & Erika Gaspari & Samuel Miravet-Verde & Vitor A. P. Martins Santos & Mark Isalan, 2022. "A genetic toolkit and gene switches to limit Mycoplasma growth for biosafety applications," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    17. Annabel K. Sangree & Audrey L. Griffith & Zsofia M. Szegletes & Priyanka Roy & Peter C. DeWeirdt & Mudra Hegde & Abby V. McGee & Ruth E. Hanna & John G. Doench, 2022. "Benchmarking of SpCas9 variants enables deeper base editor screens of BRCA1 and BCL2," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    18. Maarten H. Geurts & Shashank Gandhi & Matteo G. Boretto & Ninouk Akkerman & Lucca L. M. Derks & Gijs Son & Martina Celotti & Sarina Harshuk-Shabso & Flavia Peci & Harry Begthel & Delilah Hendriks & Pa, 2023. "One-step generation of tumor models by base editor multiplexing in adult stem cell-derived organoids," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    19. Hu, Xiaojun & Rousseau, Ronald, 2016. "Scientific influence is not always visible: The phenomenon of under-cited influential publications," Journal of Informetrics, Elsevier, vol. 10(4), pages 1079-1091.
    20. Sarah J Vancuren & Scott J Dos Santos & Janet E Hill & the Maternal Microbiome Legacy Project Team, 2020. "Evaluation of variant calling for cpn60 barcode sequence-based microbiome profiling," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-14, July.
