Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading
Event Type: Workshop
Tags: Compiler Analysis and Optimization; Compilers; Debugging; Parallel Programming Languages, Libraries, Models and Notations; Program Transformation; SIGHPC Workshop
Time: Monday, November 13th, 11am - 11:30am
Location: 710
Description: The latest OpenMP standard offers automatic device offloading capabilities that facilitate GPU programming. Despite this, many challenges remain. One of these is the unified memory feature introduced in recent GPUs. GPUs in current and future HPC systems have enhanced support for a unified memory space: the CPU and GPU can access each other's memory transparently, with data movement managed automatically by the underlying system software and hardware. Memory oversubscription is also possible in these systems. However, little is known about how this mechanism performs or how programmers should use it. In this paper, we aim to study and improve the performance of unified memory for automatic GPU offloading via the OpenMP API and runtime, leveraging the Rodinia benchmark suite. We also modify the LLVM compiler to allow OpenMP to use unified memory, and we then evaluate it on these benchmarks. The results reveal that while the performance of unified memory is comparable with that of normal GPU offloading for benchmarks with little data reuse, it suffers significant overhead when GPU memory is oversubscribed for benchmarks with a large amount of data reuse. Based on these results, we provide several guidelines for programmers to achieve better performance with unified memory.




