Leveraging NVLINK and Asynchronous Data Transfer to Scale Beyond the Memory Capacity of GPUs
Author/Presenters
Event Type: Workshop
Algorithms
Exascale
Resiliency
SIGHPC Workshop
Time: Monday, November 13th, 11:50am to 12:10pm
Location: 607
Description: In this paper we demonstrate how fast GPU-to-CPU
interconnects make it possible to weak scale on hierarchical nodes
without restricting problem sizes to those that fit in GPU memory.
We show the speedup achievable for a new regime of algorithms that
traditionally have not benefited from being ported to GPUs because
they perform too little computational work relative to the bytes of
data that must be transferred (their offload intensity). We
demonstrate this capability with our hierarchical GPU port of UMT,
the 51K-line CORAL benchmark application for Lawrence Livermore
National Laboratory's radiation transport code. By overlapping data
transfers and using the NVLINK connection between IBM POWER8 CPUs
and NVIDIA P100 GPUs, we demonstrate a speedup that persists even
when the problem size is scaled well beyond the memory capacity of
the GPUs.
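The description does not spell out how overlapped transfers let a GPU work on more data than it can hold. The following Python sketch is a simplified timing model and not the authors' implementation. It streams a host-resident problem through the GPU in fixed-size chunks as a three-stage pipeline: host-to-device copy, kernel, and device-to-host copy. Offload intensity is modeled as FLOPs executed per byte transferred to the GPU. The class name `ChunkedOffloadModel`, its fields, the equal input/output chunk size, and all numbers in the usage example are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ChunkedOffloadModel:
    """Hypothetical, idealized timing model for streaming a host-resident
    problem through a GPU in chunks. No values here come from the paper."""
    chunk_bytes: float        # bytes copied to the GPU per chunk
    link_gb_per_s: float      # host<->device bandwidth per direction, GB/s
    gpu_gflop_per_s: float    # sustained GPU compute rate, GFLOP/s
    offload_intensity: float  # FLOPs executed per byte transferred to the GPU

    def stage_times(self) -> Tuple[float, float, float]:
        """Seconds per chunk for (host-to-device copy, kernel, device-to-host copy).
        Assumption: each chunk's result is the same size as its input."""
        copy = self.chunk_bytes / (self.link_gb_per_s * 1e9)
        compute = self.chunk_bytes * self.offload_intensity / (self.gpu_gflop_per_s * 1e9)
        return copy, compute, copy

    def serial_time(self, n_chunks: int) -> float:
        """No overlap: copy in, compute, copy out, then start the next chunk."""
        return n_chunks * sum(self.stage_times())

    def overlapped_time(self, n_chunks: int) -> float:
        """Three-stage pipeline: while chunk k computes, chunk k+1 is copied in
        and chunk k-1 is copied out. Once the pipeline fills, one chunk
        finishes every max(stage time) seconds."""
        stages = self.stage_times()
        return sum(stages) + (n_chunks - 1) * max(stages)

    def device_buffer_bytes(self) -> float:
        """GPU memory the pipeline needs: one buffer per stage in flight,
        independent of the total problem size n_chunks * chunk_bytes."""
        return 3 * self.chunk_bytes

    def balance_intensity(self) -> float:
        """Offload intensity (FLOP/byte) at which kernel time equals copy time.
        Below it, the overlapped pipeline is bound by the interconnect."""
        return self.gpu_gflop_per_s / self.link_gb_per_s


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    model = ChunkedOffloadModel(chunk_bytes=1e9, link_gb_per_s=40.0,
                                gpu_gflop_per_s=4000.0, offload_intensity=50.0)
    n = 100  # 100 chunks streamed through 3 chunk-sized device buffers
    serial = model.serial_time(n)
    overlapped = model.overlapped_time(n)
    # prints: serial 6.25 s, overlapped 2.54 s, speedup 2.46x
    print(f"serial {serial:.2f} s, overlapped {overlapped:.2f} s, "
          f"speedup {serial / overlapped:.2f}x")
```

In this model, device memory use depends only on the chunk size, so the total problem can grow past GPU capacity. When the offload intensity is below `balance_intensity()`, the overlapped run time is set by the copy stage, which is why a faster host link matters for low-intensity codes. The pipeline formula assumes that a host-to-device copy, a kernel, and a device-to-host copy can all run concurrently, for example via separate CUDA streams and pinned host memory. It also assumes uniform chunks and ignores latency and launch overheads.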