P31: Understanding the Performance of Small Convolution
Operations for CNN on Intel Architecture
Session: Poster Reception
Authors:
Event Type: ACM Student Research Competition Poster Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: Convolution layers are prevalent in many classes of
deep neural networks, including Convolutional Neural Networks
(CNNs), which provide state-of-the-art results for tasks like
image recognition, natural language processing, and speech
recognition. The computationally expensive nature of the
convolution operation has led to a proliferation of
implementations, including the matrix-matrix multiplication
formulation, the FFT formulation, the Winograd transformation,
and direct convolution, primarily targeting GPUs. In this paper,
we optimize direct convolution and Winograd implementations for
x86 architectures, in particular Xeon Phi systems, via a dynamic
compilation approach. We then show how these JIT optimizations
can be integrated into a high-level domain-specific language
setting. We shed light on what is and is not possible with
different data formats and blocking techniques. Our JIT-based
Ninja implementation achieves close to theoretical peak on modern
x86 architectures, depending on the setting and the CPU
architecture at hand.
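As a minimal sketch of the trade-off the abstract describes (not the authors' code), the 1-D Winograd kernel F(2,3) computes two outputs of a 3-tap filter from a 4-element input tile using 4 multiplications instead of the 6 a direct convolution needs; the 2-D variants used in CNN layers tile this same idea. The function names below are illustrative only.

```python
# Hypothetical illustration: 1-D Winograd F(2,3) versus direct convolution.
# F(2,3) produces 2 outputs of a 3-tap filter from 4 inputs with
# 4 multiplications (the direct method uses 6).

def direct_conv(d, g):
    """Direct 'valid' convolution (correlation) of input d with 3-tap filter g."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(len(d) - 2)]

def winograd_f23(d, g):
    """Winograd F(2,3): 2 outputs from a 4-element input tile d, 3-tap filter g."""
    # Filter transform (can be precomputed once per filter)
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Input transform
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Elementwise products: only 4 multiplies
    m = [u[i] * v[i] for i in range(4)]
    # Output transform
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 0.25]
print(direct_conv(d, g))    # → [-0.75, -1.0]
print(winograd_f23(d, g))   # → [-0.75, -1.0] (identical result)
```

The multiplication savings come at the cost of the input, filter, and output transforms, which is why (as the abstract notes) the winning formulation depends on the data format, blocking, and target architecture.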