P82: Performance Evaluation of the NVIDIA Tesla P100: Our
Directive-Based Partitioning and Pipelining vs. NVIDIA’s
Unified Memory
Session: Poster Reception
Event Type: ACM Student Research Competition Poster, Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: We need simpler mechanisms to leverage the performance
of accelerators, such as GPUs, in supercomputers.
Programming models like OpenMP offer simple-to-use but
powerful directive-based offload mechanisms. By default,
these models naively copy data to or from the device
without overlapping computation. Achieving performance
can require extensive hand-tuning to apply optimizations
such as pipelining. Users must manually partition data
whenever it exceeds device memory. Our directive-based
partitioning and pipelining extension for accelerators
overlaps data transfers and kernel computation without
explicit user data-splitting. We compare a prototype
implementation of our extension to NVIDIA's Unified
Memory on the Pascal P100 GPU and find that our
extension outperforms Unified Memory on average by 68%
for data sets that fit into GPU memory and 550% for
those that do not.




