Snowpack: Efficient Parameter Choice for GPU Kernels via
Static Analysis and Statistical Prediction
Event Type: Workshop
Algorithms
Exascale
Resiliency
SIGHPC Workshop
Time: Monday, November 13th, 11:30am - 11:50am
Location: 607
Description: The running time of GPU kernels depends on an
invocation parameter, the number of threads in each
thread block. Sometimes the dependence is quite strong,
leading to a 50-100% change in execution time for
long-running kernels. Until now, choosing the optimal
setting for this parameter has been an art form.
Nvidia provides a tool for CUDA kernels, called OCC,
that guides developers toward this goal. In this paper,
we show that OCC maximizes the occupancy of GPU cores but
does not meet the performance goal in a wide class of
applications. We develop a solution called Snowpack that
uses static features in a statistical learning framework
to choose the optimal block size. It does this without
executing the kernel multiple times, as an alternative
solution, an autotuner, must. We evaluate Snowpack on
89 kernels from 10 applications.
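To make the distinction between "maximizing occupancy" and "minimizing running time" concrete, here is a minimal sketch of an occupancy-driven block-size choice in Python. It is a simplified model written for illustration, not Nvidia's occupancy calculator and not Snowpack: the per-SM limits in SMLimits are hypothetical values, and real hardware adds register and shared-memory allocation granularities that this sketch ignores. The functions occupancy and occ_block_size are illustrative names, not part of any library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    # Hypothetical per-SM resource limits (assumed for illustration only).
    warp_size: int = 32
    max_threads: int = 2048
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 65536


def occupancy(block_size, regs_per_thread, smem_per_block, sm=SMLimits()):
    """Resident warps divided by the SM's maximum warps (simplified model)."""
    warps_per_block = math.ceil(block_size / sm.warp_size)
    max_warps = sm.max_threads // sm.warp_size
    # Each resource independently caps how many blocks can be resident at once.
    by_threads = max_warps // warps_per_block
    if regs_per_thread > 0:
        by_regs = sm.registers // (regs_per_thread * warps_per_block * sm.warp_size)
    else:
        by_regs = sm.max_blocks
    if smem_per_block > 0:
        by_smem = sm.shared_mem_bytes // smem_per_block
    else:
        by_smem = sm.max_blocks
    blocks = min(sm.max_blocks, by_threads, by_regs, by_smem)
    return blocks * warps_per_block / max_warps


def occ_block_size(regs_per_thread, smem_per_block, sm=SMLimits()):
    """Pick the warp-multiple block size with the highest modeled occupancy.

    Ties are broken toward the larger block size (an arbitrary choice here).
    """
    candidates = range(sm.warp_size, 1025, sm.warp_size)
    return max(
        candidates,
        key=lambda b: (occupancy(b, regs_per_thread, smem_per_block, sm), b),
    )


if __name__ == "__main__":
    for b in (64, 128, 192, 256, 512, 1024):
        print(b, occupancy(b, regs_per_thread=32, smem_per_block=0))
    print("occupancy-maximizing choice:", occ_block_size(32, 0))
```

In this model, a kernel using 32 registers per thread and no shared memory reaches full occupancy at block sizes 64, 128, 256, 512, and 1024 alike, so occupancy by itself cannot rank those choices even though their measured running times may differ. That gap is the kind the abstract says a performance-oriented predictor such as Snowpack is meant to close.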




