Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-025-63036-7.html

Chemical knowledge-informed framework for privacy-aware retrosynthesis learning

Author

Listed:
  • Guikun Chen

    (Zhejiang University)

  • Xu Zhang

    (Zhejiang University)

  • Xiaolin Hu

    (Renmin University of China)

  • Yong Liu

    (Renmin University of China)

  • Yi Yang

    (Zhejiang University)

  • Wenguan Wang

    (Zhejiang University)

Abstract

Chemical reaction data is a pivotal asset, driving advances in competitive fields such as pharmaceuticals, materials science, and industrial chemistry. Its proprietary nature renders it sensitive, as it often includes confidential insights and competitive advantages organizations strive to protect. However, in contrast to this need for confidentiality, the current standard training paradigm for machine learning-based retrosynthesis gathers reaction data from multiple sources into one single edge to train prediction models. This paradigm poses considerable privacy risks as it necessitates broad data availability across organizational boundaries and frequent data transmission between entities, potentially exposing proprietary information to unauthorized access or interception during storage and transfer. In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models. CKIF enables distributed training across multiple chemical organizations without compromising the confidentiality of proprietary reaction data. Instead of gathering raw reaction data, CKIF learns retrosynthesis models through iterative, chemical knowledge-informed aggregation of model parameters. In particular, the chemical properties of predicted reactants are leveraged to quantitatively assess the observable behaviors of individual models, which in turn determines the adaptive weights used for model aggregation. On a variety of reaction datasets, CKIF outperforms several strong baselines by a clear margin.
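
The aggregation step described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the abstract only states that chemical properties of predicted reactants determine adaptive aggregation weights. The per-model scores, the softmax weighting, the parameter name "layer.w", and the helper names `softmax` and `aggregate` are all assumptions made for illustration.

```python
import math
from typing import Dict, List

# A model is represented here as named parameter vectors (an assumption
# made for illustration; real models hold tensors).
Params = Dict[str, List[float]]


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(client_params: List[Params], weights: List[float]) -> Params:
    """Weighted average of each named parameter vector across clients."""
    if len(client_params) != len(weights):
        raise ValueError("need exactly one weight per client model")
    merged: Params = {}
    for name in client_params[0]:
        size = len(client_params[0][name])
        merged[name] = [
            sum(w * p[name][i] for w, p in zip(weights, client_params))
            for i in range(size)
        ]
    return merged


# Hypothetical data: three organizations share parameters, never raw reactions.
clients = [
    {"layer.w": [1.0, 2.0]},
    {"layer.w": [3.0, 4.0]},
    {"layer.w": [5.0, 6.0]},
]

# Hypothetical scores: how chemically plausible each model's predicted
# reactants look on a local reference set (higher is better).
scores = [0.9, 0.5, 0.1]

weights = softmax(scores)
global_params = aggregate(clients, weights)
print(weights)
print(global_params)
```

Only model parameters and scalar scores cross organizational boundaries in this sketch, and models with higher scores contribute more to the aggregate. How the scores are actually computed from the predicted reactants is specified in the paper, not here.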

Suggested Citation

  • Guikun Chen & Xu Zhang & Xiaolin Hu & Yong Liu & Yi Yang & Wenguan Wang, 2025. "Chemical knowledge-informed framework for privacy-aware retrosynthesis learning," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-63036-7
    DOI: 10.1038/s41467-025-63036-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-63036-7
    File Function: Abstract
    Download Restriction: no

    File URL: https://libkey.io/10.1038/s41467-025-63036-7?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Lei Fang & Junren Li & Ming Zhao & Li Tan & Jian-Guang Lou, 2023. "Single-step retrosynthesis prediction by leveraging commonly preserved substructures," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    2. Weihe Zhong & Ziduo Yang & Calvin Yu-Chian Chen, 2023. "Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    3. Bao Feng & Jiangfeng Shi & Liebin Huang & Zhiqi Yang & Shi-Ting Feng & Jianpeng Li & Qinxian Chen & Huimin Xue & Xiangguang Chen & Cuixia Wan & Qinghui Hu & Enming Cui & Yehang Chen & Wansheng Long, 2024. "Robustly federated learning model for identifying high-risk patients with postoperative gastric cancer recurrence," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    4. Jonathan Schooler, 2011. "Unpublished results hide the decline effect," Nature, Nature, vol. 470(7335), pages 437-437, February.
    5. Igor V. Tetko & Pavel Karpov & Ruud Deursen & Guillaume Godin, 2020. "State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis," Nature Communications, Nature, vol. 11(1), pages 1-11, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yuqiang Han & Xiaoyang Xu & Chang-Yu Hsieh & Keyan Ding & Hongxia Xu & Renjun Xu & Tingjun Hou & Qiang Zhang & Huajun Chen, 2024. "Retrosynthesis prediction with an iterative string editing model," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    2. Yafeng Deng & Xinda Zhao & Hanyu Sun & Yu Chen & Xiaorui Wang & Xi Xue & Liangning Li & Jianfei Song & Chang-Yu Hsieh & Tingjun Hou & Xiandao Pan & Taghrid Saad Alomar & Xiangyang Ji & Xiaojian Wang, 2025. "RSGPT: a generative transformer model for retrosynthesis planning pre-trained on ten billion datapoints," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    3. Yu Shee & Haote Li & Pengpeng Zhang & Andrea M. Nikolic & Wenxin Lu & H. Ray Kelly & Vidhyadhar Manee & Sanil Sreekumar & Frederic G. Buono & Jinhua J. Song & Timothy R. Newhouse & Victor S. Batista, 2024. "Site-specific template generative approach for retrosynthetic planning," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    4. Dimos, Christos & Pugh, Geoff & Hisarciklilar, Mehtap & Talam, Ema & Jackson, Ian, 2022. "The relative effectiveness of R&D tax credits and R&D subsidies: A comparative meta-regression analysis," Technovation, Elsevier, vol. 115(C).
    5. Vieira, Valter Afonso & Rafael, Diego Nogueira & Agnihotri, Raj, 2022. "Augmented reality generalizations: A meta-analytical review on consumer-related outcomes and the mediating role of hedonic and utilitarian values," Journal of Business Research, Elsevier, vol. 151(C), pages 170-184.
    6. Fanelli, Daniele, 2020. "Metascientific reproducibility patterns revealed by informatic measure of knowledge," MetaArXiv 5vnhj, Center for Open Science.
    7. Nicolas Vallois & Dorian Jullien, 2017. "Replication in experimental economics: A historical and quantitative approach focused on public good game experiments," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-01651080, HAL.
    8. Martin E Héroux & Janet L Taylor & Simon C Gandevia, 2015. "The Use and Abuse of Transcranial Magnetic Stimulation to Modulate Corticospinal Excitability in Humans," PLOS ONE, Public Library of Science, vol. 10(12), pages 1-10, December.
    9. Edward H. Lee & Michelle Han & Jason Wright & Michael Kuwabara & Jacob Mevorach & Gang Fu & Olivia Choudhury & Ujjwal Ratan & Michael Zhang & Matthias W. Wagner & Robert Goetti & Sebastian Toescu & Se, 2024. "An international study presenting a federated learning AI platform for pediatric brain tumors," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    10. Johannes Auer & Dominik Papies, 2020. "Cross-price elasticities and their determinants: a meta-analysis and new empirical generalizations," Journal of the Academy of Marketing Science, Springer, vol. 48(3), pages 584-605, May.
    11. Lynch, John G. & Bradlow, Eric T. & Huber, Joel C. & Lehmann, Donald R., 2015. "Reflections on the replication corner: In praise of conceptual replications," International Journal of Research in Marketing, Elsevier, vol. 32(4), pages 333-342.
    12. Senliang Lu & Yehang Chen & Yuan Chen & Peijun Li & Junqi Sun & Changye Zheng & Yujian Zou & Bo Liang & Mingwei Li & Qinggeng Jin & Enming Cui & Wansheng Long & Bao Feng, 2025. "General lightweight framework for vision foundation model supporting multi-task and multi-center medical image analysis," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    13. Mangirdas Morkunas & Elzė Rudienė & Lukas Giriūnas & Laura Daučiūnienė, 2020. "Assessment of Factors Causing Bias in Marketing-Related Publications," Publications, MDPI, vol. 8(4), pages 1-16, October.
    14. Frank Renkewitz & Heather M. Fuchs & Susann Fiedler, 2011. "Is there evidence of publication biases in JDM research?," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 6(8), pages 870-881, December.
    15. Peng-Cheng Zhao & Xue-Xin Wei & Qiong Wang & Qi-Hao Wang & Jia-Ning Li & Jie Shang & Cheng Lu & Jian-Yu Shi, 2025. "Single-step retrosynthesis prediction via multitask graph representation learning," Nature Communications, Nature, vol. 16(1), pages 1-19, December.
    16. Yasuhiro Yoshikai & Tadahaya Mizuno & Shumpei Nemoto & Hiroyuki Kusuhara, 2024. "Difficulty in chirality recognition for Transformer architectures learning chemical structures from string representations," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    17. Jinho Chang & Jong Chul Ye, 2024. "Bidirectional generation of structure and properties through a single molecular foundation model," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    18. Christopher Schlaegel & Michael Koenig, 2014. "Determinants of Entrepreneurial Intent: A Meta–Analytic Test and Integration of Competing Models," Entrepreneurship Theory and Practice, vol. 38(2), pages 291-332, March.
    19. Laura Smarandescu & Terence Shimp, 2015. "Drink coca-cola, eat popcorn, and choose powerade: testing the limits of subliminal persuasion," Marketing Letters, Springer, vol. 26(4), pages 715-726, December.
    20. Nicolas Vallois & Dorian Jullien, 2017. "Replication in experimental economics: A historical and quantitative approach focused on public good game experiments," Working Papers halshs-01651080, HAL.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-63036-7. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. Registering lets you link your profile to this item and accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.nature.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.