
Data-driven protease engineering by DNA-recording and epistasis-aware machine learning

Author

Listed:
  • Lukas Huber

    (ETH Zurich)

  • Tim Kucera

    (ETH Zurich
    Swiss Institute of Bioinformatics
    Max Planck Institute of Biochemistry)

  • Simon Höllerer

    (ETH Zurich)

  • Karsten Borgwardt

    (ETH Zurich
    Swiss Institute of Bioinformatics
    Max Planck Institute of Biochemistry)

  • Sven Panke

    (ETH Zurich)

  • Markus Jeschek

    (ETH Zurich
    University of Regensburg
    École Polytechnique Fédérale de Lausanne (EPFL))

Abstract

Protein engineering has recently been transformed by machine learning (ML) tools that predict structure from sequence with unprecedented precision. Predicting catalytic activity, however, remains challenging, which restricts our ability to design protein sequences with a desired catalytic function in silico. This predicament is mainly rooted in a lack of experimental methods capable of recording sequence-activity data in quantities sufficient for data-intensive ML techniques, and in the inefficiency of searches in the enormous sequence spaces inherent to proteins. Herein, we address both limitations in the context of engineering proteases with tailored substrate specificity. We introduce a DNA recorder for deep specificity profiling of proteases in Escherichia coli, which we demonstrate by testing 29,716 candidate proteases against up to 134 substrates in parallel. The resulting sequence-activity data on approximately 600,000 protease-substrate pairs not only reveal key sequence determinants governing protease specificity but also allow us to build a data-efficient deep learning model that accurately predicts protease sequences with desired on- and off-target activities. Moreover, we present epistasis-aware training set design as a generalizable strategy to streamline searches within enormous sequence spaces; it strongly increases model accuracy for a given experimental effort and is thus likely to have implications for protein engineering far beyond proteases.

Suggested Citation

  • Lukas Huber & Tim Kucera & Simon Höllerer & Karsten Borgwardt & Sven Panke & Markus Jeschek, 2025. "Data-driven protease engineering by DNA-recording and epistasis-aware machine learning," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-60622-7
    DOI: 10.1038/s41467-025-60622-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-60622-7
    File Function: Abstract
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mary S. Morrison & Tina Wang & Aditya Raguram & Colin Hemez & David R. Liu, 2021. "Disulfide-compatible phage-assisted continuous evolution in the periplasmic space," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    2. Paul Vincelli, 2016. "Genetic Engineering and Sustainable Crop Disease Management: Opportunities for Case-by-Case Decision-Making," Sustainability, MDPI, vol. 8(5), pages 1-22, May.
    3. Grace N. Hibshman & Jack P. K. Bravo & Matthew M. Hooper & Tyler L. Dangerfield & Hongshan Zhang & Ilya J. Finkelstein & Kenneth A. Johnson & David W. Taylor, 2024. "Unraveling the mechanisms of PAMless DNA interrogation by SpRY-Cas9," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    4. Fang Liang & Yu Zhang & Lin Li & Yexin Yang & Ji-Feng Fei & Yanmei Liu & Wei Qin, 2022. "SpG and SpRY variants expand the CRISPR toolbox for genome editing in zebrafish," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    5. Zhaohui Zhong & Guanqing Liu & Zhongjie Tang & Shuyue Xiang & Liang Yang & Lan Huang & Yao He & Tingting Fan & Shishi Liu & Xuelian Zheng & Tao Zhang & Yiping Qi & Jian Huang & Yong Zhang, 2023. "Efficient plant genome engineering using a probiotic sourced CRISPR-Cas9 system," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    6. Anna Zimmermann & Julian E. Prieto-Vivas & Charlotte Cautereels & Anton Gorkovskiy & Jan Steensels & Yves Peer & Kevin J. Verstrepen, 2023. "A Cas3-base editing tool for targetable in vivo mutagenesis," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    7. Jian Wang & Yuxi Teng & Ruihua Zhang & Yifei Wu & Lei Lou & Yusong Zou & Michelle Li & Zhong-Ru Xie & Yajun Yan, 2021. "Engineering a PAM-flexible SpdCas9 variant as a universal gene repressor," Nature Communications, Nature, vol. 12(1), pages 1-10, December.
    8. Emily Zhang & Monica E. Neugebauer & Nicholas A. Krasnow & David R. Liu, 2024. "Phage-assisted evolution of highly active cytosine base editors with enhanced selectivity and minimal sequence context preference," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    9. Simeon D. Castle & Michiel Stock & Thomas E. Gorochowski, 2024. "Engineering is evolution: a perspective on design processes to engineer biology," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    10. Koki Mise & Jianyin Long & Daniel L. Galvan & Zengchun Ye & Guizhen Fan & Rajesh Sharma & Irina I. Serysheva & Travis I. Moore & Collene R. Jeter & M. Anna Zal & Motoo Araki & Jun Wada & Paul T. Schum, 2024. "NDUFS4 regulates cristae remodeling in diabetic kidney disease," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    11. Lara Sellés Vidal & James W. Murray & John T. Heap, 2021. "Versatile selective evolutionary pressure using synthetic defect in universal metabolism," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    12. Marion Rosello & Malo Serafini & Luca Mignani & Dario Finazzi & Carine Giovannangeli & Marina C. Mione & Jean-Paul Concordet & Filippo Del Bene, 2022. "Disease modeling by efficient genome editing using a near PAM-less base editor in vivo," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    13. Péter István Kulcsár & András Tálas & Zoltán Ligeti & Eszter Tóth & Zsófia Rakvács & Zsuzsa Bartos & Sarah Laura Krausz & Ágnes Welker & Vanessza Laura Végi & Krisztina Huszár & Ervin Welker, 2023. "A cleavage rule for selection of increased-fidelity SpCas9 variants with high efficiency and no detectable off-targets," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    14. Yanik Weber & Desirée Böck & Anastasia Ivașcu & Nicolas Mathis & Tanja Rothgangl & Eleonora I. Ioannidi & Alex C. Blaudt & Lisa Tidecks & Máté Vadovics & Hiromi Muramatsu & Andreas Reichmuth & Kim F. , 2024. "Enhancing prime editor activity by directed protein evolution in yeast," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    15. Giulia I. Corsi & Kunli Qu & Ferhat Alkan & Xiaoguang Pan & Yonglun Luo & Jan Gorodkin, 2022. "CRISPR/Cas9 gRNA activity depends on free energy changes and on the target PAM context," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    16. Angelo Miskalis & Shraddha Shirguppe & Jackson Winter & Gianna Elias & Devyani Swami & Ananthan Nambiar & Michelle Stilger & Wendy S. Woods & Nicholas Gosstola & Michael Gapinske & Alejandra Zeballos , 2024. "SPLICER: a highly efficient base editing toolbox that enables in vivo therapeutic exon skipping," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    17. Zeyu Lu & Lingtian Zhang & Qing Mu & Junyang Liu & Yu Chen & Haoyuan Wang & Yanjun Zhang & Rui Su & Ruijun Wang & Zhiying Wang & Qi Lv & Zhihong Liu & Jiasen Liu & Yunhua Li & Yanhong Zhao, 2024. "Progress in Research and Prospects for Application of Precision Gene-Editing Technology Based on CRISPR–Cas9 in the Genetic Improvement of Sheep and Goats," Agriculture, MDPI, vol. 14(3), pages 1-17, March.
    18. Jie Yang & Tongyao Wang & Ying Huang & Zhaoyi Long & Xuzichao Li & Shuqin Zhang & Lingling Zhang & Zhikun Liu & Qian Zhang & Huabing Sun & Minjie Zhang & Hang Yin & Zhongmin Liu & Heng Zhang, 2025. "Insights into the compact CRISPR–Cas9d system," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
    19. Shun-Qing Liang & Pengpeng Liu & Jordan L. Smith & Esther Mintzer & Stacy Maitland & Xiaolong Dong & Qiyuan Yang & Jonathan Lee & Cole M. Haynes & Lihua Julie Zhu & Jonathan K. Watts & Erik J. Sonthei, 2022. "Genome-wide detection of CRISPR editing in vivo using GUIDE-tag," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    20. Jeonghye Yu & Jongpil Shin & Jihwan Yu & Jihye Kim & Daseuli Yu & Won Do Heo, 2024. "Programmable RNA base editing with photoactivatable CRISPR-Cas13," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
