
Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory

Author

Listed:
  • Daniel Schwalbe-Koda

    (Lawrence Livermore National Laboratory; University of California)

  • Sebastien Hamel

    (Lawrence Livermore National Laboratory)

  • Babak Sadigh

    (Lawrence Livermore National Laboratory)

  • Fei Zhou

    (Lawrence Livermore National Laboratory)

  • Vincenzo Lordi

    (Lawrence Livermore National Laboratory)

Abstract

An accurate description of information is relevant for a range of problems in atomistic machine learning (ML), such as crafting training sets, performing uncertainty quantification (UQ), or extracting physical insights from large datasets. However, atomistic ML often relies on unsupervised learning or model predictions to analyze information contents from simulation or training data. Here, we introduce a theoretical framework that provides a rigorous, model-free tool to quantify information contents in atomistic simulations. We demonstrate that the information entropy of a distribution of atom-centered environments explains known heuristics in ML potential developments, from training set sizes to dataset optimality. Using this tool, we propose a model-free UQ method that reliably predicts epistemic uncertainty and detects out-of-distribution samples, including rare events in systems such as nucleation. This method provides a general tool for data-driven atomistic modeling and combines efforts in ML, simulations, and physical explainability.
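
As a rough illustration of the idea in the abstract, the sketch below estimates the information entropy of a set of descriptor vectors with a kernel density estimate and scores a new sample by how little it overlaps with that set. The abstract does not specify the estimator, so the Gaussian kernel, the bandwidth value, and the function names gaussian_kernel, dataset_entropy, and novelty_score are all assumptions made for this example and should not be read as the authors' implementation.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel between two descriptor vectors (an assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def dataset_entropy(points, bandwidth=1.0):
    """Kernel-based entropy estimate in nats.

    H = -(1/n) * sum_i log( (1/n) * sum_j K(x_i, x_j) )
    Identical points give H = 0; n well-separated points give H close to log(n).
    """
    n = len(points)
    total = 0.0
    for x in points:
        density = sum(gaussian_kernel(x, y, bandwidth) for y in points) / n
        total += math.log(density)
    return -total / n


def novelty_score(sample, points, bandwidth=1.0):
    """Negative log of the summed kernel overlap between a sample and a reference set.

    Larger values mean the sample lies farther from the reference data.
    The floor of 1e-300 only guards against log(0) for very distant samples.
    """
    overlap = sum(gaussian_kernel(sample, y, bandwidth) for y in points)
    return -math.log(max(overlap, 1e-300))


if __name__ == "__main__":
    # Hypothetical 2D "environment descriptors", for illustration only.
    redundant = [(0.0, 0.0)] * 4
    diverse = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
    print(dataset_entropy(redundant))   # no new information in repeated points
    print(dataset_entropy(diverse))     # approaches log(3) for separated points
    print(novelty_score((0.0, 0.0), diverse))
    print(novelty_score((500.0, 500.0), diverse))
```

Under these assumptions, a redundant dataset yields an entropy near zero, a diverse one approaches the logarithm of the number of distinct environments, and a sample far from every reference point receives a large novelty score, which is the sense in which such a quantity could flag out-of-distribution environments without a trained model.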

Suggested Citation

  • Daniel Schwalbe-Koda & Sebastien Hamel & Babak Sadigh & Fei Zhou & Vincenzo Lordi, 2025. "Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory," Nature Communications, Nature, vol. 16(1), pages 1-13, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59232-0
    DOI: 10.1038/s41467-025-59232-0

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-59232-0
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Keke Song & Rui Zhao & Jiahui Liu & Yanzhou Wang & Eric Lindgren & Yong Wang & Shunda Chen & Ke Xu & Ting Liang & Penghua Ying & Nan Xu & Zhiqiang Zhao & Jiuyang Shi & Junjie Wang & Shuang Lyu & Zezhu, 2024. "General-purpose machine-learned potential for 16 elemental metals and their alloys," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    2. Ziduo Yang & Yi-Ming Zhao & Xian Wang & Xiaoqing Liu & Xiuying Zhang & Yifan Li & Qiujie Lv & Calvin Yu-Chian Chen & Lei Shen, 2024. "Scalable crystal structure relaxation using an iteration-free deep generative model with uncertainty quantification," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    3. Gaétan de Rassenfosse & Adam B. Jaffe & Joel Waldfogel, 2025. "Intellectual Property and Creative Machines," Entrepreneurship and Innovation Policy and the Economy, University of Chicago Press, vol. 4(1), pages 47-79.
    4. Wang, Zixuan & Chen, Zijian & Wang, Boyuan & Wu, Chuang & Zhou, Chao & Peng, Yang & Zhang, Xinyu & Ni, Zongming & Chung, Chi-yung & Chan, Ching-chuen & Yang, Jian & Zhao, Haitao, 2025. "Digital manufacturing of perovskite materials and solar cells," Applied Energy, Elsevier, vol. 377(PB).
    5. Chen, Xin & Zhang, Lin & Huang, JiangBo & Jin, Lei & Song, YongShi & Zheng, XianHua & Zou, ZhiXiong, 2025. "A thermodynamics-consistent machine learning approach for ammonia-water thermal cycles," Energy, Elsevier, vol. 315(C).
    6. Juno Nam & Jiayu Peng & Rafael Gómez-Bombarelli, 2025. "Interpolation and differentiation of alchemical degrees of freedom in machine learning interatomic potentials," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    7. Luis M. Antunes & Keith T. Butler & Ricardo Grau-Crespo, 2024. "Crystal structure generation with autoregressive large language modeling," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    8. Bin Han & Kuang Yu, 2025. "Refining potential energy surface through dynamical properties via differentiable molecular simulation," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
    9. Jingbo Liu & Fan Jiang & Shinichi Tashiro & Shujun Chen & Manabu Tanaka, 2025. "A physics-informed and data-driven framework for robotic welding in manufacturing," Nature Communications, Nature, vol. 16(1), pages 1-18, December.
    10. Alessio Fallani & Leonardo Medrano Sandonas & Alexandre Tkatchenko, 2024. "Inverse mapping of quantum properties to structures for chemical space of small organic molecules," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    11. David Buterez & Jon Paul Janet & Dino Oglic & Pietro Liò, 2025. "An end-to-end attention-based approach for learning on graphs," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    12. Yusong Wang & Tong Wang & Shaoning Li & Xinheng He & Mingyu Li & Zun Wang & Nanning Zheng & Bin Shao & Tie-Yan Liu, 2024. "Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    13. Rama Oktavian & Ruben Goeminne & Lawson T. Glasby & Ping Song & Racheal Huynh & Omid Taheri Qazvini & Omid Ghaffari-Nik & Nima Masoumifard & Joan L. Cordiner & Pierre Hovington & Veronique Speybroeck, 2024. "Gas adsorption and framework flexibility of CALF-20 explored via experiments and simulations," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    14. Taoyong Cui & Chenyu Tang & Dongzhan Zhou & Yuqiang Li & Xingao Gong & Wanli Ouyang & Mao Su & Shufei Zhang, 2025. "Online test-time adaptation for better generalization of interatomic potentials to out-of-distribution data," Nature Communications, Nature, vol. 16(1), pages 1-11, December.
    15. Lucien F. Krapp & Luciano A. Abriata & Fabio Cortés Rodriguez & Matteo Dal Peraro, 2023. "PeSTo: parameter-free geometric deep learning for accurate prediction of protein binding interfaces," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    16. Stefano Falletta & Andrea Cepellotti & Anders Johansson & Chuin Wei Tan & Marc L. Descoteaux & Albert Musaelian & Cameron J. Owen & Boris Kozinsky, 2025. "Unified differentiable learning of electric response," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
    17. Mingfeng Liu & Jiantao Wang & Junwei Hu & Peitao Liu & Haiyang Niu & Xuexi Yan & Jiangxu Li & Haile Yan & Bo Yang & Yan Sun & Chunlin Chen & Georg Kresse & Liang Zuo & Xing-Qiu Chen, 2024. "Layer-by-layer phase transformation in Ti3O5 revealed by machine-learning molecular dynamics simulations," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    18. Jonathan P. Mailoa & Xin Li & Shengyu Zhang, 2024. "3T-VASP: fast ab-initio electrochemical reactor via multi-scale gradient energy minimization," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    19. Charlotte Loh & Thomas Christensen & Rumen Dangovski & Samuel Kim & Marin Soljačić, 2022. "Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    20. Hanwen Zhang & Veronika Juraskova & Fernanda Duarte, 2024. "Modelling chemical processes in explicit solvents with machine learning potentials," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
