Supporting Task-level Fault-Tolerance in HPC Workflows by
Launching MPI Jobs inside MPI Jobs
Author/Presenter
Event Type: Workshop
Time: Monday, November 13th, 10:55am - 11:20am
Location: 501
Description: While the use of workflows for HPC is growing, MPI
interoperability remains a challenge for workflow
management systems. The MPI standard and/or its
implementations provide a number of ways to build
multiple-programs-multiple-data (MPMD) applications.
These methods have limitations related to fault
tolerance and are not easy to use. In this paper, we
advocate for a novel MPI_Comm_launch function acting as
the parallel counterpart of a system(3) call.
MPI_Comm_launch allows a child MPI application to be
launched inside the resources originally held by
processes of a parent MPI application. Two important
aspects of MPI_Comm_launch are that it pauses the calling
process and that it runs the child processes on the
parent's CPU cores, but isolated from the parent with
respect to memory. This function makes it easier to build MPMD
applications with well-decoupled subtasks. We show how
this feature can provide better flexibility and better
fault tolerance in ensemble simulations and HPC
workflows. We report results showing 2x throughput
improvement for application workflows with faults, and
scaling results for challenging workloads up to 256
nodes.