Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-28536-w.html

Biocatalysed synthesis planning using data-driven learning

Author

Listed:
  • Daniel Probst

    (IBM Research Europe
    National Center for Competence in Research-Catalysis (NCCR-Catalysis))

  • Matteo Manica

    (IBM Research Europe)

  • Yves Gaetan Nana Teukam

    (IBM Research Europe)

  • Alessandro Castrogiovanni

    (IBM Research Europe
    National Center for Competence in Research-Catalysis (NCCR-Catalysis))

  • Federico Paratore

    (IBM Research Europe)

  • Teodoro Laino

    (IBM Research Europe
    National Center for Competence in Research-Catalysis (NCCR-Catalysis))

Abstract

Enzyme catalysts are an integral part of green chemistry strategies towards a more sustainable and resource-efficient chemical synthesis. However, the use of biocatalysed reactions in retrosynthetic planning clashes with the difficulties in predicting the enzymatic activity on unreported substrates and enzyme-specific stereo- and regioselectivity. As of now, only rule-based systems support retrosynthetic planning using biocatalysis, while initial data-driven approaches are limited to forward predictions. Here, we extend the data-driven forward reaction as well as retrosynthetic pathway prediction models based on the Molecular Transformer architecture to biocatalysis. The enzymatic knowledge is learned from an extensive data set of publicly available biochemical reactions with the aid of a new class token scheme based on the enzyme commission classification number, which captures catalysis patterns among different enzymes belonging to the same hierarchy. The forward reaction prediction model (top-1 accuracy of 49.6%), the retrosynthetic pathway (top-1 single-step round-trip accuracy of 39.6%) and the curated data set are made publicly available to facilitate the adoption of enzymatic catalysis in the design of greener chemistry processes.
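The class token scheme mentioned in the abstract can be illustrated with a minimal sketch. It assumes that an enzyme commission (EC) number is expanded into one token per hierarchy level and that these tokens are prepended to a reaction SMILES string, so that enzymes sharing a class prefix also share tokens. The token spelling (`[EC1:1]`), the function names `ec_class_tokens` and `tag_reaction`, and the default depth of three levels are hypothetical choices made for illustration; they are not taken from the paper and may differ from the published implementation.

```python
from typing import List


def ec_class_tokens(ec_number: str, depth: int = 3) -> List[str]:
    """Expand an EC number such as '1.1.1.1' into cumulative hierarchy tokens.

    The token format is an illustrative assumption, not the paper's format.
    """
    if not 1 <= depth <= 4:
        raise ValueError("depth must be between 1 and 4")
    parts = ec_number.strip().split(".")
    # Only the first `depth` levels are used, so they must be plain integers.
    if len(parts) < depth or not all(p.isdigit() for p in parts[:depth]):
        raise ValueError(f"not a usable EC number: {ec_number!r}")
    return [
        f"[EC{level}:{'.'.join(parts[:level])}]"
        for level in range(1, depth + 1)
    ]


def tag_reaction(substrates: str, product: str, ec_number: str, depth: int = 3) -> str:
    """Prefix a 'substrates>>product' reaction SMILES with EC class tokens."""
    tokens = " ".join(ec_class_tokens(ec_number, depth))
    return f"{tokens} {substrates}>>{product}"


if __name__ == "__main__":
    # Ethanol to acetaldehyde, the reaction class of alcohol dehydrogenase
    # (EC 1.1.1.1); the cofactor is omitted to keep the example short.
    print(tag_reaction("CCO", "CC=O", "1.1.1.1"))
```

Because the tokens are cumulative, two enzymes such as EC 1.1.1.1 and EC 1.1.1.2 would share all three tokens at depth three, while EC 1.1.1.1 and EC 1.2.1.3 would share only the first. This is one plausible way a sequence model could pick up catalysis patterns common to enzymes in the same branch of the hierarchy.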

Suggested Citation

  • Daniel Probst & Matteo Manica & Yves Gaetan Nana Teukam & Alessandro Castrogiovanni & Federico Paratore & Teodoro Laino, 2022. "Biocatalysed synthesis planning using data-driven learning," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-28536-w
    DOI: 10.1038/s41467-022-28536-w

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-28536-w
    File Function: Abstract
    Download Restriction: no



    Citations



    Cited by:

    1. Itai Levin & Mengjie Liu & Christopher A. Voigt & Connor W. Coley, 2022. "Merging enzymatic and synthetic chemistry with computational synthesis planning," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    2. Shuangjia Zheng & Tao Zeng & Chengtao Li & Binghong Chen & Connor W. Coley & Yuedong Yang & Ruibo Wu, 2022. "Deep learning driven biosynthetic pathways navigation for natural products with BioNavi-NP," Nature Communications, Nature, vol. 13(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shu-Wen Li & Li-Cheng Xu & Cheng Zhang & Shuo-Qing Zhang & Xin Hong, 2023. "Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    2. Umit V. Ucak & Islambek Ashyrmamatov & Junsu Ko & Juyong Lee, 2022. "Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    3. Shuangjia Zheng & Tao Zeng & Chengtao Li & Binghong Chen & Connor W. Coley & Yuedong Yang & Ruibo Wu, 2022. "Deep learning driven biosynthetic pathways navigation for natural products with BioNavi-NP," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
