P85: GPU Mekong: Simplified Multi-GPU Programming Using
Automated Partitioning
Session: Poster Reception
Author
Event Type: ACM Student Research Competition, Poster Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: GPU accelerators are pervasively used in the HPC community because they provide excellent computational performance at reasonable power efficiency. While programming single-GPU applications is comparatively productive, programming multiple GPUs using data-parallel languages is tedious and error-prone, as the user has to manually orchestrate data movements and kernel launches.
The Mekong research project is driven by the motivation to improve the productivity of multi-GPU systems through compiler-based partitioning of single-device data-parallel programs. Key to scalable performance is the resolution of data dependencies between kernels and the orchestration of those kernels. Mekong relies on polyhedral compilation to identify memory access patterns, allowing it to compile a single-GPU application into a multi-GPU application.
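The kind of dependency resolution described above can be illustrated with a small sketch (hypothetical, not Mekong's actual implementation or API): for a 1D 3-point stencil kernel, splitting the iteration space across GPUs and comparing each partition's read footprint against the regions owned by its neighbors yields the halo elements that must be exchanged before each kernel launch.

```python
# Hypothetical sketch of the analysis an automated partitioner performs;
# all names and structures here are illustrative, not Mekong's actual API.

def partition_iteration_space(n, num_gpus):
    """Split indices [0, n) into contiguous per-GPU chunks."""
    chunk = (n + num_gpus - 1) // num_gpus
    return [(g * chunk, min((g + 1) * chunk, n)) for g in range(num_gpus)]

def stencil_read_set(lo, hi, radius, n):
    """Read footprint of a (2*radius+1)-point stencil over [lo, hi)."""
    return (max(0, lo - radius), min(n, hi + radius))

def halo_exchanges(n, num_gpus, radius=1):
    """(source GPU, dest GPU, index range) tuples that must be
    transferred before the partitioned kernel can run."""
    parts = partition_iteration_space(n, num_gpus)
    exchanges = []
    for g, (lo, hi) in enumerate(parts):
        rlo, rhi = stencil_read_set(lo, hi, radius, n)
        if rlo < lo:   # left halo, owned by GPU g-1
            exchanges.append((g - 1, g, (rlo, lo)))
        if rhi > hi:   # right halo, owned by GPU g+1
            exchanges.append((g + 1, g, (hi, rhi)))
    return exchanges

# For 1024 elements on 4 GPUs with a 3-point stencil, every interior
# partition boundary induces a one-element exchange in each direction.
print(halo_exchanges(1024, 4))
```

In a real system these exchanges would become peer-to-peer copies between devices; the point of the sketch is that the transfer sets fall out mechanically from the kernel's access pattern, which is what polyhedral analysis extracts at compile time.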
In this work, the Mekong project is introduced and its components are explained. While the tool is still under development, preliminary results are available and are briefly discussed, demonstrating the potential of this approach.