P01: Cache-Blocking Tiling of Large Stencil Codes at
Runtime
Session: Poster Reception
Event Type: ACM Student Research Competition, Poster, Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: Stencil codes on structured meshes are well known to be
bound by memory bandwidth. Previous research has shown
that compiler techniques that reorder loop schedules to
improve temporal locality across loop nests, such as
tiling, work particularly well. However, in large codes
the scope of such analysis is limited by the large
number of code paths, compilation units, and run-time
parameters. We present how run-time analysis of
data dependencies across stencil loops enables the OPS
domain-specific language to tile across a large number
of different loops. This lets us tackle much larger
applications than previously studied: we demonstrate
1.7-3.5x performance improvement on CloverLeaf 2D,
CloverLeaf 3D, TeaLeaf and OpenSBLI, tiling across up to
650 subsequent loop nests accessing up to 30 different
state variables per grid point with up to 46 different
stencils. We also demonstrate excellent strong and weak
scalability of our approach on up to 4608 Broadwell
cores.
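
The idea of tiling across loop nests from run-time dependency information can be illustrated with a minimal 1D sketch. This is not the OPS implementation or its API: the `Loop` record, `plan_tiles`, `run_tiled`, and `make_smooth` are hypothetical names introduced here, and the sketch assumes that every loop writes a distinct array that no earlier loop reads, so only read-after-write dependencies need to be tracked. Each loop is recorded with its range and stencil offsets instead of being executed immediately; the planner then skews the tile edges of earlier loops so that later loops always find their inputs already computed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Loop:
    """One queued stencil loop (hypothetical record, not an OPS type)."""
    kernel: Callable[[Dict[str, List[float]], int], None]
    lo: int                             # first index executed
    hi: int                             # one past the last index executed
    writes: str                         # name of the array written at index i
    reads: Dict[str, Tuple[int, ...]]   # array name -> stencil offsets read


def plan_tiles(loops, tile_size):
    """Return, for each tile, the [start, end) range every loop executes."""
    n = len(loops)
    # skew[k]: how far past the tile edge loop k must run so that all
    # later loops reading its output find their stencil inputs ready.
    skew = [0] * n
    for k in range(n - 2, -1, -1):
        for j in range(k + 1, n):
            offs = loops[j].reads.get(loops[k].writes, ())
            if offs:
                skew[k] = max(skew[k], skew[j] + max(max(offs), 0))
    hi = max(l.hi for l in loops)
    starts = [l.lo for l in loops]
    edge = min(l.lo for l in loops)
    tiles = []
    while any(s < l.hi for s, l in zip(starts, loops)):
        edge += tile_size
        tile = []
        for k, l in enumerate(loops):
            if edge >= hi:              # last tile: finish every loop
                end = l.hi
            else:
                end = min(l.hi, max(starts[k], edge + skew[k]))
            tile.append((starts[k], end))
            starts[k] = end             # next tile resumes where this one ended
        tiles.append(tile)
    return tiles


def run_untiled(loops, arrays):
    """Reference schedule: each loop sweeps its whole range in turn."""
    for l in loops:
        for i in range(l.lo, l.hi):
            l.kernel(arrays, i)


def run_tiled(loops, arrays, tile_size):
    """Tiled schedule: all loops run on one tile before moving to the next."""
    for tile in plan_tiles(loops, tile_size):
        for l, (s, e) in zip(loops, tile):
            for i in range(s, e):
                l.kernel(arrays, i)


def make_smooth(src, dst, n):
    """Three-point averaging stencil from array `src` into array `dst`."""
    def kernel(arrays, i):
        a = arrays[src]
        arrays[dst][i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return Loop(kernel, lo=1, hi=n - 1, writes=dst, reads={src: (-1, 0, 1)})


def make_arrays(n):
    """Input array 'a' plus zero-initialised outputs 'b', 'c', 'd'."""
    arrays = {name: [0.0] * n for name in "bcd"}
    arrays["a"] = [float(i * i % 7) for i in range(n)]
    return arrays


if __name__ == "__main__":
    n = 64
    chain = [make_smooth("a", "b", n), make_smooth("b", "c", n),
             make_smooth("c", "d", n)]
    ref, tiled = make_arrays(n), make_arrays(n)
    run_untiled(chain, ref)
    run_tiled(chain, tiled, tile_size=8)
    print("schedules agree:", ref == tiled)
```

In pure Python this only demonstrates that the reordered schedule reproduces the untiled result; any performance benefit depends on a compiled kernel and on choosing a tile size whose working set fits in cache.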




