Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading
Event Type: Workshop
Tags: Compiler Analysis and Optimization; Compilers; Debugging; Parallel Programming Languages, Libraries, Models and Notations; Program Transformation; SIGHPC Workshop
Time: Monday, November 13th, 11am - 11:30am
Location: 710
Description: The latest OpenMP standard offers automatic device offloading capabilities that facilitate GPU programming. Despite this, many challenges remain. One of these is the unified memory feature introduced in recent GPUs. GPUs in current and future HPC systems have enhanced support for a unified memory space: the CPU and GPU can access each other's memory transparently, with data movement managed automatically by the underlying system software and hardware. Memory oversubscription is also possible in these systems. However, little is known about how this mechanism performs or how programmers should use it. In this paper, we aim to study and improve the performance of unified memory for automatic GPU offloading via the OpenMP API and runtime, leveraging the Rodinia benchmark suite. We also modify the LLVM compiler to allow OpenMP to use unified memory, and we then evaluate it on these benchmarks. The results reveal that while the performance of unified memory is comparable with that of normal GPU offloading for benchmarks with little data reuse, it suffers significant overhead when GPU memory is oversubscribed for benchmarks with a large amount of data reuse. Based on these results, we provide several guidelines for programmers to achieve better performance with unified memory.




