P31: Understanding the Performance of Small Convolution
Operations for CNN on Intel Architecture
Session: Poster Reception
Authors:
Event Type: ACM Student Research Competition Poster Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: Convolution layers are prevalent in many classes of
deep neural networks, including Convolutional Neural Networks
(CNNs), which provide state-of-the-art results for tasks like
image recognition, natural language processing, and speech
recognition. The computationally expensive nature of the
convolution operation has led to a proliferation of
implementations, including the matrix-matrix multiplication
formulation, the FFT formulation, the Winograd transformation,
and direct convolution, primarily targeting GPUs. In this paper,
we optimize direct convolution and Winograd implementations for
x86 architectures, in particular Xeon Phi systems, via a dynamic
compilation approach. We then show how these JIT optimizations
can be integrated into a high-level domain-specific language
setting. We shed light on what is and is not possible with
different data formats and blocking techniques. Our JIT-based
Ninja implementation achieves close to theoretical peak on modern
x86 architectures, depending on the setting and the CPU
architecture at hand.
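As a minimal sketch of the trade-off the abstract describes (not the authors' code), the 1-D Winograd kernel F(2,3) computes two outputs of a 3-tap filter from a 4-element input tile using 4 multiplications instead of the 6 a direct convolution needs; the 2-D variants used in CNN layers tile this same idea. The function names below are illustrative only.

```python
# Hypothetical illustration: 1-D Winograd F(2,3) versus direct convolution.
# F(2,3) produces 2 outputs of a 3-tap filter from 4 inputs with
# 4 multiplications (the direct method uses 6).

def direct_conv(d, g):
    """Direct 'valid' convolution (correlation) of input d with 3-tap filter g."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(len(d) - 2)]

def winograd_f23(d, g):
    """Winograd F(2,3): 2 outputs from a 4-element input tile d, 3-tap filter g."""
    # Filter transform (can be precomputed once per filter)
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Input transform
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Elementwise products: only 4 multiplies
    m = [u[i] * v[i] for i in range(4)]
    # Output transform
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 0.25]
print(direct_conv(d, g))    # → [-0.75, -1.0]
print(winograd_f23(d, g))   # → [-0.75, -1.0] (identical result)
```

The multiplication savings come at the cost of the input, filter, and output transforms, which is why (as the abstract notes) the winning formulation depends on the data format, blocking, and target architecture.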