Beyond 16GB: Out-of-Core Stencil Computations
Author/Presenters
Event Type: Workshop
Compiler Analysis and Optimization
NVRAM
Parallel Programming Languages, Libraries, Models and Notations
Performance
SIGHPC Workshop
Time: Sunday, November 12th, 11:35am - 12pm
Location: 702
Description: Stencil computations are a key class of applications,
widely used in the scientific computing community, and one
that has particularly benefited from performance
improvements on architectures with high memory
bandwidth. Unfortunately, such architectures come with a
limited amount of fast memory, which limits the size of
the problems that can be solved efficiently. In this
paper, we address this challenge by applying the
well-known cache-blocking tiling technique to large-scale
stencil codes implemented using the OPS domain-specific
language, such as CloverLeaf 2D, CloverLeaf 3D, and
OpenSBLI. We introduce a number of techniques and
optimisations to help manage data resident in fast
memory and minimise data movement. Evaluating our work
on Intel's Knights Landing platform as well as NVIDIA
P100 GPUs, we demonstrate that it is possible to solve
problems three times larger than the on-chip memory size
with at most 15% loss in efficiency.
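
As a rough illustration of the tiling idea in the description, the sketch below applies several time steps of a 1D three-point stencil one tile at a time, so that only a tile plus its halo would need to sit in fast memory at once. It uses overlapped tiling with redundant halo computation, a simpler scheme chosen for brevity and not necessarily the one used in the paper or in OPS. The names `step`, `reference`, and `tiled`, and the averaging stencil itself, are hypothetical and exist only for this example.

```python
def step(u):
    """One 3-point averaging sweep; the two end values are held fixed."""
    v = list(u)
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def reference(u, steps):
    """Untiled baseline: sweep the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def tiled(u, steps, tile):
    """Apply `steps` sweeps tile by tile (overlapped tiling, illustrative only).

    Each tile is copied together with a halo of `steps` cells per side,
    standing in for a transfer into fast memory. Values near an artificial
    chunk edge become stale by one more cell per sweep, so a halo of
    `steps` cells keeps the tile interior exact.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(start - steps, 0)   # halo clipped at the domain boundary
        hi = min(end + steps, n)
        chunk = u[lo:hi]             # assumed "copy into fast memory"
        for _ in range(steps):
            chunk = step(chunk)
        out[start:end] = chunk[start - lo:end - lo]  # write back interior only
    return out


if __name__ == "__main__":
    data = [float((i * 7) % 11) for i in range(50)]
    a = tiled(data, steps=4, tile=8)
    b = reference(data, steps=4)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

The trade-off in this sketch is that halo cells are recomputed by neighbouring tiles; in exchange, each tile is moved in and out once per block of time steps instead of once per sweep.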




