
Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory

Author

Listed:
  • Daniel Schwalbe-Koda

    (Lawrence Livermore National Laboratory; University of California)

  • Sebastien Hamel

    (Lawrence Livermore National Laboratory)

  • Babak Sadigh

    (Lawrence Livermore National Laboratory)

  • Fei Zhou

    (Lawrence Livermore National Laboratory)

  • Vincenzo Lordi

    (Lawrence Livermore National Laboratory)

Abstract

An accurate description of information is relevant for a range of problems in atomistic machine learning (ML), such as crafting training sets, performing uncertainty quantification (UQ), or extracting physical insights from large datasets. However, atomistic ML often relies on unsupervised learning or model predictions to analyze information contents from simulation or training data. Here, we introduce a theoretical framework that provides a rigorous, model-free tool to quantify information contents in atomistic simulations. We demonstrate that the information entropy of a distribution of atom-centered environments explains known heuristics in ML potential developments, from training set sizes to dataset optimality. Using this tool, we propose a model-free UQ method that reliably predicts epistemic uncertainty and detects out-of-distribution samples, including rare events in systems such as nucleation. This method provides a general tool for data-driven atomistic modeling and combines efforts in ML, simulations, and physical explainability.
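As a rough illustration of what "the information entropy of a distribution of atom-centered environments" could look like in practice, the sketch below estimates an entropy from a set of descriptor vectors with a Gaussian kernel, and scores a new environment by how little kernel overlap it has with a reference set. This record contains only the abstract, so everything here is an assumption made for illustration: the kernel-density form of the estimator, the bandwidth `h`, and the function names `kernel_entropy` and `novelty` are hypothetical and should not be read as the authors' implementation.

```python
import numpy as np


def kernel_entropy(X, h=1.0):
    """Kernel-density entropy estimate (in nats) of the rows of X.

    X : (n, d) array, one descriptor vector per atomic environment.
    h : Gaussian kernel bandwidth (an assumed, user-chosen parameter).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between all environments.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq / (2.0 * h ** 2))
    # Mean kernel overlap per environment; each row includes K_ii = 1,
    # so the argument of the logarithm is never zero.
    return float(-np.mean(np.log(K.mean(axis=1))))


def novelty(y, X, h=1.0):
    """-log of the summed kernel overlap between y and the reference set X.

    Larger values mean y is less covered by X (a candidate outlier).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    a = -((X - y) ** 2).sum(axis=1) / (2.0 * h ** 2)
    # Stable log-sum-exp so far-away points do not underflow to log(0).
    m = a.max()
    return float(-(m + np.log(np.exp(a - m).sum())))
```

Under this assumed estimator, a set of n identical environments has entropy 0 and a set of n mutually distant environments approaches log(n), which matches the intuition that redundant data adds no information. The result depends strongly on the choice of descriptor and bandwidth, neither of which is specified in the abstract.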

Suggested Citation

  • Daniel Schwalbe-Koda & Sebastien Hamel & Babak Sadigh & Fei Zhou & Vincenzo Lordi, 2025. "Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory," Nature Communications, Nature, vol. 16(1), pages 1-13, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59232-0
    DOI: 10.1038/s41467-025-59232-0

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-59232-0
    File Function: Abstract
    Download Restriction: no

