Authors
- Vahid Jafari (Helmut Schmidt University, Chair for High Performance Computing)
- Piet Jarmatz (Helmut Schmidt University, Chair for High Performance Computing)
- Helene Wittenberg (Helmut Schmidt University, Chair for High Performance Computing)
- Amartya Das Sharma (Helmut Schmidt University, Chair for High Performance Computing)
- Louis Viot (Helmut Schmidt University, Chair for High Performance Computing)
- Felix Maurer (Helmut Schmidt University, Chair for High Performance Computing)
- Niklas Wittmer (Helmut Schmidt University, Chair for High Performance Computing)
- Philipp Neumann (Helmut Schmidt University, Chair for High Performance Computing)
Abstract
Molecular-continuum simulations couple molecular dynamics (MD) and computational fluid dynamics (CFD) simulations in a domain decomposition sense to assess fluid flow, e.g., in process engineering applications, at the nanoscale. Running these simulations on extreme-scale supercomputers, an issue consists in single compute cores or nodes failing due to hardware- or software-sided errors. This imposes a challenge to robustness of numerical simulations and, as such, also to molecular-continuum systems. We introduce a fault tolerance method in our macro-micro-coupling tool (MaMiCo) that has been developed in the past as molecular-continuum simulation software solution. With MaMiCo leveraging ensemble simulations to cope with statistical errors in the MD solutions, we extended the ensemble approach to recognize failing MPI processes and react to these failures. Once a failure is encountered, the affected MD simulations are removed from these MPI processes and relaunched on well-operating MPI process groups. We detail our approach and report scalability results for our approach, achieved on the supercomputer HAWK at HLRS.
Suggested Citation
Vahid Jafari & Piet Jarmatz & Helene Wittenberg & Amartya Das Sharma & Louis Viot & Felix Maurer & Niklas Wittmer & Philipp Neumann, 2024.
"Fault Tolerant Molecular-Continuum Flow Simulation,"
Springer Books, in: Wolfgang E. Nagel & Dietmar H. Kröner & Michael M. Resch (ed.), High Performance Computing in Science and Engineering '22, pages 463-475,
Springer.
Handle:
RePEc:spr:sprchp:978-3-031-46870-4_30
DOI: 10.1007/978-3-031-46870-4_30