P08: Performance Optimization of Matrix-free
Finite-Element Algorithms within deal.II
Session: Poster Reception
Authors
Event Type: ACM Student Research Competition, Poster, Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: We present a performance comparison of highly tuned
matrix-free finite element kernels from the deal.II
finite element library on three contemporary computer
architectures: an NVIDIA P100 GPU, an Intel Knights
Landing (KNL) Xeon Phi, and two multi-core Intel CPUs
(Haswell and Broadwell). The algorithms are based on fast
integration on hexahedra using sum factorization
techniques. On Cartesian meshes, where the arithmetic
intensity is relatively high, all four processors deliver
a surprisingly similar computational throughput. On
curved meshes, the kernel is heavily memory-bandwidth
limited, which reveals distinct differences between the
architectures: the P100 is twice as fast as KNL and
almost four times as fast as the Haswell and Broadwell
CPUs, effectively leveraging the higher memory bandwidth
and the favorable shared-memory programming model on the
GPU.
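
To illustrate what sum factorization on a hexahedron means, here is a minimal C++ sketch. It is not deal.II code and not the tuned kernels from the poster. All names (`evaluate_naive`, `apply_1d`, `evaluate_sum_factorized`, `max_abs_diff`) are hypothetical, and the sketch assumes a square n x n matrix `S` of 1D shape function values at 1D quadrature points, stored row-major, with coefficients stored with the first tensor index running fastest. The naive version evaluates the full tensor-product sum at every point, at roughly n^6 operations per cell. The sum-factorized version applies `S` one direction at a time, at roughly 3 n^4 operations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Naive evaluation on one hexahedral cell:
// out(q0,q1,q2) = sum_{i0,i1,i2} S(q0,i0) S(q1,i1) S(q2,i2) u(i0,i1,i2).
Vec evaluate_naive(const Vec &S, const Vec &u, std::size_t n)
{
  assert(S.size() == n * n && u.size() == n * n * n);
  Vec out(n * n * n, 0.0);
  for (std::size_t q2 = 0; q2 < n; ++q2)
    for (std::size_t q1 = 0; q1 < n; ++q1)
      for (std::size_t q0 = 0; q0 < n; ++q0)
        {
          double sum = 0.0;
          for (std::size_t i2 = 0; i2 < n; ++i2)
            for (std::size_t i1 = 0; i1 < n; ++i1)
              for (std::size_t i0 = 0; i0 < n; ++i0)
                sum += S[q0 * n + i0] * S[q1 * n + i1] * S[q2 * n + i2] *
                       u[(i2 * n + i1) * n + i0];
          out[(q2 * n + q1) * n + q0] = sum;
        }
  return out;
}

// Apply the 1D matrix S along a single direction of an n x n x n tensor.
// 'stride' is the distance between neighbors in that direction: 1, n, or n*n.
Vec apply_1d(const Vec &S, const Vec &in, std::size_t n, std::size_t stride)
{
  Vec out(in.size(), 0.0);
  for (std::size_t idx = 0; idx < in.size(); ++idx)
    {
      const std::size_t q    = (idx / stride) % n; // index along this direction
      const std::size_t base = idx - q * stride;   // start of the 1D line through idx
      double sum = 0.0;
      for (std::size_t i = 0; i < n; ++i)
        sum += S[q * n + i] * in[base + i * stride];
      out[idx] = sum;
    }
  return out;
}

// Sum factorization: three successive 1D sweeps replace the 3D sum.
Vec evaluate_sum_factorized(const Vec &S, const Vec &u, std::size_t n)
{
  assert(S.size() == n * n && u.size() == n * n * n);
  Vec t = apply_1d(S, u, n, 1);   // direction 0
  t     = apply_1d(S, t, n, n);   // direction 1
  return apply_1d(S, t, n, n * n); // direction 2
}

// Largest entrywise difference, used to compare the two variants.
double max_abs_diff(const Vec &a, const Vec &b)
{
  assert(a.size() == b.size());
  double d = 0.0;
  for (std::size_t k = 0; k < a.size(); ++k)
    d = std::fmax(d, std::fabs(a[k] - b[k]));
  return d;
}
```

Calling `evaluate_naive(S, u, n)` and `evaluate_sum_factorized(S, u, n)` on the same inputs should agree up to rounding. The sketch allocates temporaries for clarity. A tuned kernel would presumably keep these small intermediates in registers, cache, or GPU shared memory.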




