Addressing Global Data Dependencies in Heterogeneous
Asynchronous Runtime Systems on GPUs
Author/Presenters
Event Type
Workshop
Accelerators
Deep Learning
Exascale
GPU
Parallel Application Frameworks
Parallel Programming Languages, Libraries, Models and Notations
SIGHPC Workshop
System Software
Time: Sunday, November 12th, 10:30am - 11am
Location: 505
Description: Large-scale parallel applications with complex global
data dependencies beyond those of reductions pose
significant scalability challenges in an asynchronous
runtime system. Internodal challenges include
identifying the all-to-all communication of data
dependencies among the nodes. Intranodal challenges
include gathering together these data dependencies into
usable data objects while avoiding data duplication.
This paper addresses these challenges within the context
of a large-scale, industrial coal boiler simulation
using the Uintah asynchronous many-task runtime system
on GPU architectures. We show a significant reduction in
time spent analyzing data dependencies through
refinements in our dependency search algorithm. Multiple
task graphs are used to eliminate subsequent analysis
when task graphs change in predictable and repeatable
ways. A combined data store and task scheduler
redesign reduces data dependency duplication, ensuring
that problems fit within host and GPU memory. These
modifications did not require any changes to application
code or sweeping changes to the Uintah runtime system.
We report results running on the DOE Titan system on
119K CPU cores and 7.5K GPUs simultaneously. Our
solutions can be generalized to other task dependency
problems with global dependencies among thousands of
nodes that must be processed efficiently at large
scale.
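
The idea of keeping multiple task graphs so that dependency analysis is not repeated can be illustrated with a minimal sketch. This is not Uintah code or its API. The class `TaskGraphCache`, the functions `analyze_dependencies` and `graph_kind`, the task names, and the assumption that the graph alternates between a "standard" and a "radiation" form on a fixed interval are all hypothetical, chosen only to show one way a predictable, repeating change could be handled by analyzing each graph once and reusing it.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical stand-in for a compiled task graph: (producer, consumer) edges.
TaskGraph = List[Tuple[str, str]]


class TaskGraphCache:
    """Analyze each distinct task graph once and reuse it on later timesteps."""

    def __init__(self, analyze: Callable[[Hashable], TaskGraph]) -> None:
        self._analyze = analyze  # the expensive dependency analysis
        self._graphs: Dict[Hashable, TaskGraph] = {}
        self.analysis_count = 0  # number of times analysis actually ran

    def get(self, kind: Hashable) -> TaskGraph:
        # Run the analysis only the first time a given graph kind is requested.
        if kind not in self._graphs:
            self.analysis_count += 1
            self._graphs[kind] = self._analyze(kind)
        return self._graphs[kind]


def analyze_dependencies(kind: Hashable) -> TaskGraph:
    """Placeholder for dependency analysis (hypothetical task names)."""
    edges = [("advect", "pressure_solve")]
    if kind == "radiation":
        # Assumption: some timesteps add a task with global dependencies.
        edges.append(("pressure_solve", "radiation_solve"))
    return edges


def graph_kind(timestep: int, radiation_interval: int = 5) -> str:
    """Assumed predictable pattern: every Nth timestep uses the larger graph."""
    return "radiation" if timestep % radiation_interval == 0 else "standard"


cache = TaskGraphCache(analyze_dependencies)
for step in range(20):
    graph = cache.get(graph_kind(step))  # reused after the first analysis

print(cache.analysis_count)
```

In this sketch the analysis runs once per graph kind regardless of how many timesteps are executed, which is the effect the description attributes to using multiple task graphs.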