Concurrent Parallel Processing on Graphics and Multicore Processors with OpenACC and OpenMP
Event Type: Workshop
Tags: Accelerators; Compilers; Parallel Programming Languages, Libraries, Models and Notations; Runtime Systems
Time: Monday, November 13th, 2:30pm - 3pm
Location: 712
Description: Hierarchical parallel computing is rapidly becoming ubiquitous in high performance computing (HPC) systems. Programming models commonly used in turbomachinery and other engineering simulation codes have traditionally relied on distributed-memory parallelism with MPI and have ignored thread and data parallelism. This paper presents methods for programming multi-block codes for concurrent computation on host multicore CPUs and many-core accelerators such as graphics processing units. Portable, standardized language directives are used to expose data and thread parallelism within the hybrid shared- and distributed-memory simulation system. A single-source, multiple-object strategy is used to simplify code management and allow for heterogeneous computing. Automated load balancing determines which portions of the domain are computed by the multicore CPUs and which by the GPUs. Benchmark results show that significant parallel speed-up is attainable on multicore CPUs and many-core devices such as the Intel Xeon Phi Knights Landing using OpenMP SIMD and thread-parallel directives. Modest speed-up, relative to a single CPU core, was achieved with OpenACC offloading to NVIDIA GPUs. Combining GPU offloading with multicore host parallelism improved single-device performance by 30%, but further speed-up was not realized when additional heterogeneous CPU-GPU device pairs were included.
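To illustrate the directive-based approach the abstract describes, the following is a minimal sketch of a per-block smoothing kernel annotated for OpenMP thread and SIMD parallelism, with the equivalent OpenACC offload directive shown in a comment to suggest the single-source strategy. The function name and the simple three-point smoothing loop are hypothetical illustrations, not the paper's actual code.

```c
#include <stddef.h>

/* Hypothetical per-block kernel: applies a three-point smoothing stencil
   to one block of a multi-block domain. */
void smooth_block(size_t n, const double *in, double *out)
{
    /* OpenMP: distribute outer-loop iterations across host threads and
       vectorize each thread's chunk with SIMD lanes. */
    #pragma omp parallel for simd schedule(static)
    /* OpenACC alternative, compiled from the same source for GPU offload:
       #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n]) */
    for (size_t i = 1; i + 1 < n; ++i)
        out[i] = 0.25 * in[i - 1] + 0.5 * in[i] + 0.25 * in[i + 1];

    /* Block boundary values pass through unchanged in this sketch. */
    out[0] = in[0];
    out[n - 1] = in[n - 1];
}
```

In a single-source, multiple-object build, a kernel like this would be compiled once with OpenMP flags for the host object and once with OpenACC flags for the device object, with the runtime load balancer deciding which blocks each object processes.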




