
Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory

Authors

  • Daniel Schwalbe-Koda (Lawrence Livermore National Laboratory; University of California)
  • Sebastien Hamel (Lawrence Livermore National Laboratory)
  • Babak Sadigh (Lawrence Livermore National Laboratory)
  • Fei Zhou (Lawrence Livermore National Laboratory)
  • Vincenzo Lordi (Lawrence Livermore National Laboratory)

Abstract

An accurate description of information is relevant for a range of problems in atomistic machine learning (ML), such as crafting training sets, performing uncertainty quantification (UQ), or extracting physical insights from large datasets. However, atomistic ML often relies on unsupervised learning or model predictions to analyze information contents from simulation or training data. Here, we introduce a theoretical framework that provides a rigorous, model-free tool to quantify information contents in atomistic simulations. We demonstrate that the information entropy of a distribution of atom-centered environments explains known heuristics in ML potential developments, from training set sizes to dataset optimality. Using this tool, we propose a model-free UQ method that reliably predicts epistemic uncertainty and detects out-of-distribution samples, including rare events in systems such as nucleation. This method provides a general tool for data-driven atomistic modeling and combines efforts in ML, simulations, and physical explainability.

Suggested Citation

  • Daniel Schwalbe-Koda & Sebastien Hamel & Babak Sadigh & Fei Zhou & Vincenzo Lordi, 2025. "Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory," Nature Communications, Nature, vol. 16(1), pages 1-13, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59232-0
    DOI: 10.1038/s41467-025-59232-0

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-59232-0
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Amil Merchant & Simon Batzner & Samuel S. Schoenholz & Muratahan Aykol & Gowoon Cheon & Ekin Dogus Cubuk, 2023. "Scaling deep learning for materials discovery," Nature, Nature, vol. 624(7990), pages 80-85, December.
    2. Justin S. Smith & Benjamin Nebgen & Nithin Mathew & Jie Chen & Nicholas Lubbers & Leonid Burakovsky & Sergei Tretiak & Hai Ah Nam & Timothy Germann & Saryu Fensin & Kipton Barros, 2021. "Automated discovery of a robust interatomic potential for aluminum," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    3. Simon Batzner & Albert Musaelian & Lixin Sun & Mario Geiger & Jonathan P. Mailoa & Mordechai Kornbluth & Nicola Molinari & Tess E. Smidt & Boris Kozinsky, 2022. "E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    4. Luis A. Zepeda-Ruiz & Alexander Stukowski & Tomas Oppelstrup & Vasily V. Bulatov, 2017. "Probing the limits of metal plasticity with molecular dynamics simulations," Nature, Nature, vol. 550(7677), pages 492-495, October.
    5. Kangming Li & Daniel Persaud & Kamal Choudhary & Brian DeCost & Michael Greenwood & Jason Hattrick-Simpers, 2023. "Exploiting redundancy in large materials datasets for efficient machine learning with less data," Nature Communications, Nature, vol. 14(1), pages 1-10, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Keke Song & Rui Zhao & Jiahui Liu & Yanzhou Wang & Eric Lindgren & Yong Wang & Shunda Chen & Ke Xu & Ting Liang & Penghua Ying & Nan Xu & Zhiqiang Zhao & Jiuyang Shi & Junjie Wang & Shuang Lyu & Zezhu, 2024. "General-purpose machine-learned potential for 16 elemental metals and their alloys," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    2. Ziduo Yang & Yi-Ming Zhao & Xian Wang & Xiaoqing Liu & Xiuying Zhang & Yifan Li & Qiujie Lv & Calvin Yu-Chian Chen & Lei Shen, 2024. "Scalable crystal structure relaxation using an iteration-free deep generative model with uncertainty quantification," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    3. Grigorii Skorupskii & Fabio Orlandi & Iñigo Robredo & Milena Jovanovic & Rinsuke Yamada & Fatmagül Katmer & Maia G. Vergniory & Pascal Manuel & Max Hirschberger & Leslie M. Schoop, 2024. "Designing giant Hall response in layered topological semimetals," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    4. Gaétan de Rassenfosse & Adam B. Jaffe & Joel Waldfogel, 2025. "Intellectual Property and Creative Machines," Entrepreneurship and Innovation Policy and the Economy, University of Chicago Press, vol. 4(1), pages 47-79.
    5. Andreas Erlebach & Martin Šípka & Indranil Saha & Petr Nachtigall & Christopher J. Heard & Lukáš Grajciar, 2024. "A reactive neural network framework for water-loaded acidic zeolites," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    6. Wang, Zixuan & Chen, Zijian & Wang, Boyuan & Wu, Chuang & Zhou, Chao & Peng, Yang & Zhang, Xinyu & Ni, Zongming & Chung, Chi-yung & Chan, Ching-chuen & Yang, Jian & Zhao, Haitao, 2025. "Digital manufacturing of perovskite materials and solar cells," Applied Energy, Elsevier, vol. 377(PB).
    7. Chen, Xin & Zhang, Lin & Huang, JiangBo & Jin, Lei & Song, YongShi & Zheng, XianHua & Zou, ZhiXiong, 2025. "A thermodynamics-consistent machine learning approach for ammonia-water thermal cycles," Energy, Elsevier, vol. 315(C).
    8. Juno Nam & Jiayu Peng & Rafael Gómez-Bombarelli, 2025. "Interpolation and differentiation of alchemical degrees of freedom in machine learning interatomic potentials," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    9. Li Zheng & Konstantinos Karapiperis & Siddhant Kumar & Dennis M. Kochmann, 2023. "Unifying the design space and optimizing linear and nonlinear truss metamaterials by generative modeling," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    10. Li Zhong & Yin Zhang & Xiang Wang & Ting Zhu & Scott X. Mao, 2024. "Atomic-scale observation of nucleation- and growth-controlled deformation twinning in body-centered cubic nanocrystals," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    11. J. Thorben Frank & Oliver T. Unke & Klaus-Robert Müller & Stefan Chmiela, 2024. "A Euclidean transformer for fast and stable machine learned force fields," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    12. Chenghan Li & Or Sharir & Shunyue Yuan & Garnet Kin-Lic Chan, 2025. "Image super-resolution inspired electron density prediction," Nature Communications, Nature, vol. 16(1), pages 1-9, December.
    13. David Buterez & Jon Paul Janet & Steven J. Kiddle & Dino Oglic & Pietro Lió, 2024. "Transfer learning with graph neural networks for improved molecular property prediction in the multi-fidelity setting," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    14. Luis M. Antunes & Keith T. Butler & Ricardo Grau-Crespo, 2024. "Crystal structure generation with autoregressive large language modeling," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    15. Yaolong Zhang & Bin Jiang, 2023. "Universal machine learning for the response of atomistic systems to external fields," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    16. Wei Lu & Jixian Zhang & Weifeng Huang & Ziqiao Zhang & Xiangyu Jia & Zhenyu Wang & Leilei Shi & Chengtao Li & Peter G. Wolynes & Shuangjia Zheng, 2024. "DynamicBind: predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    17. Adil Kabylda & Valentin Vassilev-Galindo & Stefan Chmiela & Igor Poltavsky & Alexandre Tkatchenko, 2023. "Efficient interatomic descriptors for accurate machine learning force fields of extended molecules," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    18. Bin Han & Kuang Yu, 2025. "Refining potential energy surface through dynamical properties via differentiable molecular simulation," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
    19. Junjie Wang & Yong Wang & Haoting Zhang & Ziyang Yang & Zhixin Liang & Jiuyang Shi & Hui-Tian Wang & Dingyu Xing & Jian Sun, 2024. "E(n)-Equivariant cartesian tensor message passing interatomic potential," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
    20. Jingbo Liu & Fan Jiang & Shinichi Tashiro & Shujun Chen & Manabu Tanaka, 2025. "A physics-informed and data-driven framework for robotic welding in manufacturing," Nature Communications, Nature, vol. 16(1), pages 1-18, December.
