Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-025-61778-y.html

Accurate prediction of synthesizability and precursors of 3D crystal structures via large language models

Author

Listed:
  • Zhilong Song

    (Southeast University)

  • Shuaihua Lu

    (Southeast University)

  • Minggang Ju

    (Southeast University)

  • Qionghua Zhou

    (Southeast University; Suzhou Laboratory)

  • Jinlan Wang

    (Southeast University; Suzhou Laboratory)

Abstract

Assessing the synthesizability of crystal structures is crucial for transforming theoretical materials into real-world applications. Nevertheless, there is a significant gap between actual synthesizability and the thermodynamic or kinetic stability commonly used to screen synthesizable structures. Herein, we develop the Crystal Synthesis Large Language Models (CSLLM) framework, which utilizes three specialized LLMs to predict the synthesizability of arbitrary 3D crystal structures, possible synthetic methods, and suitable precursors, respectively. We construct a comprehensive dataset including synthesizable/non-synthesizable crystal structures and develop an efficient text representation for crystal structures to fine-tune LLMs. Our Synthesizability LLM achieves state-of-the-art accuracy (98.6%), significantly outperforming traditional synthesizability screening based on thermodynamic and kinetic stability. Its outstanding generalization ability is further demonstrated in experimental structures with complexity considerably exceeding that of the training data. Furthermore, both the Method and Precursor LLMs exceed 90% accuracy in classifying possible synthetic methods and identifying solid-state synthetic precursors for common binary and ternary compounds, respectively. Leveraging CSLLM, tens of thousands of synthesizable theoretical structures are successfully identified, with their 23 key properties predicted using accurate graph neural network models.

Suggested Citation

  • Zhilong Song & Shuaihua Lu & Minggang Ju & Qionghua Zhou & Jinlan Wang, 2025. "Accurate prediction of synthesizability and precursors of 3D crystal structures via large language models," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-61778-y
    DOI: 10.1038/s41467-025-61778-y

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-61778-y
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Daniil A. Boiko & Robert MacKnight & Ben Kline & Gabe Gomes, 2023. "Autonomous chemical research with large language models," Nature, Nature, vol. 624(7992), pages 570-578, December.
    2. Keith T. Butler & Daniel W. Davies & Hugh Cartwright & Olexandr Isayev & Aron Walsh, 2018. "Machine learning for molecular and materials science," Nature, Nature, vol. 559(7715), pages 547-555, July.
    3. Yuanfeng Xu & Luis Elcoro & Zhi-Da Song & Benjamin J. Wieder & M. G. Vergniory & Nicolas Regnault & Yulin Chen & Claudia Felser & B. Andrei Bernevig, 2020. "High-throughput calculations of magnetic topological materials," Nature, Nature, vol. 586(7831), pages 702-707, October.
    4. Shuaihua Lu & Qionghua Zhou & Yixin Ouyang & Yilv Guo & Qiang Li & Jinlan Wang, 2018. "Accelerated discovery of stable lead-free hybrid organic-inorganic perovskites via machine learning," Nature Communications, Nature, vol. 9(1), pages 1-8, December.
    5. Xinyu Chen & Shuaihua Lu & Qian Chen & Qionghua Zhou & Jinlan Wang, 2024. "Author Correction: From bulk effective mass to 2D carrier mobility accurate prediction via adversarial transfer learning," Nature Communications, Nature, vol. 15(1), pages 1-1, December.
    6. Claudio Zeni & Robert Pinsler & Daniel Zügner & Andrew Fowler & Matthew Horton & Xiang Fu & Zilong Wang & Aliaksandra Shysheya & Jonathan Crabbé & Shoko Ueda & Roberto Sordillo & Lixin Sun & Jake Smit, 2025. "A generative model for inorganic materials design," Nature, Nature, vol. 639(8055), pages 624-632, March.
    7. Yilei Wu & Chang-Feng Wang & Ming-Gang Ju & Qiangqiang Jia & Qionghua Zhou & Shuaihua Lu & Xinying Gao & Yi Zhang & Jinlan Wang, 2024. "Universal machine learning aided synthesis approach of two-dimensional perovskites in a typical laboratory," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    8. Arunima K. Singh & Joseph H. Montoya & John M. Gregoire & Kristin A. Persson, 2019. "Robust and synthesizable photocatalysts for CO2 reduction: a data-driven materials discovery," Nature Communications, Nature, vol. 10(1), pages 1-9, December.
    9. Baicheng Weng & Zhilong Song & Rilong Zhu & Qingyu Yan & Qingde Sun & Corey G. Grice & Yanfa Yan & Wan-Jian Yin, 2020. "Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts," Nature Communications, Nature, vol. 11(1), pages 1-8, December.
    10. Xinyu Chen & Shuaihua Lu & Qian Chen & Qionghua Zhou & Jinlan Wang, 2024. "From bulk effective mass to 2D carrier mobility accurate prediction via adversarial transfer learning," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    11. Luis M. Antunes & Keith T. Butler & Ricardo Grau-Crespo, 2024. "Crystal structure generation with autoregressive large language modeling," Nature Communications, Nature, vol. 15(1), pages 1-16, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhilong Song & Linfeng Fan & Shuaihua Lu & Chongyi Ling & Qionghua Zhou & Jinlan Wang, 2025. "Inverse design of promising electrocatalysts for CO2 reduction via generative models and bird swarm algorithm," Nature Communications, Nature, vol. 16(1), pages 1-10, December.
    2. Xinyu Chen & Shuaihua Lu & Qian Chen & Qionghua Zhou & Jinlan Wang, 2024. "From bulk effective mass to 2D carrier mobility accurate prediction via adversarial transfer learning," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    3. Luis M. Antunes & Keith T. Butler & Ricardo Grau-Crespo, 2024. "Crystal structure generation with autoregressive large language modeling," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    4. Yilei Wu & Chang-Feng Wang & Ming-Gang Ju & Qiangqiang Jia & Qionghua Zhou & Shuaihua Lu & Xinying Gao & Yi Zhang & Jinlan Wang, 2024. "Universal machine learning aided synthesis approach of two-dimensional perovskites in a typical laboratory," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    5. Luozhijie Jin & Zijian Du & Le Shu & Yan Cen & Yuanfeng Xu & Yongfeng Mei & Hao Zhang, 2025. "Transformer-generated atomic embeddings to enhance prediction accuracy of crystal properties with machine learning," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
    6. Xiaoxin Zhang & Hongyuan He & Yu Chen & Guangming Yang & Xiao Xiao & Haiping Lv & Yongkang Xiang & Shuxiong Wang & Chang Jiang & Jianhui Li & Zhou Chen & Subiao Liu & Ning Yan & Xue Yong & Abdullah N., 2025. "Co-expression of multi-genes for polynary perovskite electrocatalysts for reversible solid oxide cells," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    7. Han Li & Ruotian Zhang & Yaosen Min & Dacheng Ma & Dan Zhao & Jianyang Zeng, 2023. "A knowledge-guided pre-training framework for improving molecular representation learning," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    8. Li, Yi & Liu, Kailong & Foley, Aoife M. & Zülke, Alana & Berecibar, Maitane & Nanini-Maury, Elise & Van Mierlo, Joeri & Hoster, Harry E., 2019. "Data-driven health estimation and lifetime prediction of lithium-ion batteries: A review," Renewable and Sustainable Energy Reviews, Elsevier, vol. 113(C), pages 1-1.
    9. Youssef El Arfaoui & Mohammed Khenfouch & Nabil Habiballah & Simone Giusepponi, 2025. "Engineering optoelectronic properties of the Pb-free perovskite FASiBr3−xIx (x = 0, 1, 2 or 3) for photovoltaic applications: first principle analysis," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 98(4), pages 1-15, April.
    10. Sarmad Dashti Latif & Ali Najah Ahmed, 2023. "A review of deep learning and machine learning techniques for hydrological inflow forecasting," Environment, Development and Sustainability: A Multidisciplinary Approach to the Theory and Practice of Sustainable Development, Springer, vol. 25(11), pages 12189-12216, November.
    11. Wang, Zixuan & Chen, Zijian & Wang, Boyuan & Wu, Chuang & Zhou, Chao & Peng, Yang & Zhang, Xinyu & Ni, Zongming & Chung, Chi-yung & Chan, Ching-chuen & Yang, Jian & Zhao, Haitao, 2025. "Digital manufacturing of perovskite materials and solar cells," Applied Energy, Elsevier, vol. 377(PB).
    12. Fozer, Daniel & Owsianiak, Mikołaj & Hauschild, Michael Zwicky, 2025. "Quantifying environmental learning and scaling rates for prospective life cycle assessment of e-ammonia production," Renewable and Sustainable Energy Reviews, Elsevier, vol. 213(C).
    13. Nina Miolane, 2025. "The fifth era of science: Artificial scientific intelligence," PLOS Biology, Public Library of Science, vol. 23(6), pages 1-4, June.
    14. Niklas W. A. Gebauer & Michael Gastegger & Stefaan S. P. Hessmann & Klaus-Robert Müller & Kristof T. Schütt, 2022. "Inverse design of 3d molecular structures with conditional generative neural networks," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    15. Yao, Qiuxiang & Wang, Linyang & Ma, Mingming & Ma, Li & He, Lei & Ma, Duo & Sun, Ming, 2024. "A quantitative investigation on pyrolysis behaviors of metal ion-exchanged coal macerals by interpretable machine learning algorithms," Energy, Elsevier, vol. 300(C).
    16. Gang Wang & Shinya Mine & Duotian Chen & Yuan Jing & Kah Wei Ting & Taichi Yamaguchi & Motoshi Takao & Zen Maeno & Ichigaku Takigawa & Koichi Matsushita & Ken-ichi Shimizu & Takashi Toyao, 2023. "Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning approach," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    17. Erjian Cheng & Limin Yan & Xianbiao Shi & Rui Lou & Alexander Fedorov & Mahdi Behnami & Jian Yuan & Pengtao Yang & Bosen Wang & Jin-Guang Cheng & Yuanji Xu & Yang Xu & Wei Xia & Nikolai Pavlovskii & D, 2024. "Tunable positions of Weyl nodes via magnetism and pressure in the ferromagnetic Weyl semimetal CeAlSi," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    18. Huziel E. Sauceda & Luis E. Gálvez-González & Stefan Chmiela & Lauro Oliver Paz-Borbón & Klaus-Robert Müller & Alexandre Tkatchenko, 2022. "BIGDML—Towards accurate quantum machine learning force fields for materials," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    19. Sukriti Manna & Troy D. Loeffler & Rohit Batra & Suvo Banik & Henry Chan & Bilvin Varughese & Kiran Sasikumar & Michael Sternberg & Tom Peterka & Mathew J. Cherukara & Stephen K. Gray & Bobby G. Sumpt, 2022. "Learning in continuous action space for developing high dimensional potential energy models," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    20. Yuxi Ke & Eesha Sharma & Hannah K. Wayment-Steele & Winston R. Becker & Anthony Ho & Emil Marklund & William J. Greenleaf, 2025. "High-throughput DNA melt measurements enable improved models of DNA folding thermodynamics," Nature Communications, Nature, vol. 16(1), pages 1-19, December.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.