P17: Fully Non-Blocking Communication-Computation Overlap
Using Assistant Cores toward Exascale Computing
Session: Poster Reception
Authors
Event Type: ACM Student Research Competition, Poster, Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: A fully non-blocking, optimized
communication-computation overlap technique using
assistant cores (ACs), which are independent of the
calculation cores, is proposed for a five-dimensional
plasma turbulence simulation code with spectral (FFT)
and finite-difference schemes, toward exascale
supercomputing. The effects of the optimization are
examined on the Fujitsu FX100 (2.62 PFlop/s), which has
32 ordinary cores and 2 assistant cores per node. There,
the ACs make it possible to overlap fully non-blocking
MPI communications with thread-parallelized calculations
that use OpenMP static scheduling, at much lower
overhead. The combination of non-blocking communications
handled by the ACs and static scheduling is found not
only to reduce OpenMP overhead but also to improve
load/store and cache performance: numerical performance
improves by about 22.5% compared with the conventional
overlap, in which the master thread performs the
communications and dynamic scheduling is used.
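
The structure of the overlap described above can be sketched as a toy
analogue in Python. This is not the poster's code and uses neither MPI nor
OpenMP: a helper thread stands in for the assistant core, `exchange_halo` is
a hypothetical placeholder for a non-blocking halo exchange, and
`static_chunks` imitates the contiguous, fixed partitioning of OpenMP static
scheduling. Python threads do not give true parallel speedup here, so the
sketch only illustrates the control flow, assuming a one-dimensional
backward difference whose first point needs a neighbor value.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def static_chunks(n, workers):
    """Split range(n) into contiguous, near-equal blocks, one per worker."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def exchange_halo(send_value, inbox):
    """Hypothetical stand-in for a halo exchange: just delivers a value."""
    inbox["halo"] = send_value


def overlapped_step(field, left_halo, workers=4):
    """Backward difference with the halo exchange overlapped by interior work."""
    n = len(field)
    out = [0.0] * n
    inbox = {}

    # The helper thread plays the role of the assistant core (assumption).
    comm = threading.Thread(target=exchange_halo, args=(left_halo, inbox))
    comm.start()

    def interior(bounds):
        start, stop = bounds
        # Points 1..n-1 need no halo data, so they can run during the exchange.
        for i in range(max(start, 1), stop):
            out[i] = field[i] - field[i - 1]

    # Every worker gets one fixed block, fixed before the work starts.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(interior, static_chunks(n, workers)))

    comm.join()  # wait for the "communication" before touching the boundary
    out[0] = field[0] - inbox["halo"]
    return out


def blocking_step(field, left_halo):
    """Reference version: receive the halo first, then compute everything."""
    padded = [left_halo] + list(field)
    return [padded[i + 1] - padded[i] for i in range(len(field))]
```

Calling `overlapped_step(field, left_halo)` should give the same values as
`blocking_step(field, left_halo)`; the difference is only in when the
boundary point is computed relative to the exchange.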