A27: High-Performance and Scalable Broadcast Schemes for
Deep Learning on GPU Clusters
Session: Poster Reception
Event Type: ACM Student Research Competition Poster, Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: Broadcast is a widely used operation in many streaming and deep learning applications for disseminating large amounts of data on emerging heterogeneous High-Performance Computing (HPC) systems. However, traditional broadcast schemes are not well optimized for upcoming large-scale Graphics Processing Unit (GPU)-based systems, and exploiting cutting-edge features of modern HPC technologies such as InfiniBand (IB) and NVIDIA GPUs to enable scalable heterogeneous broadcast operations remains an open challenge.
Toward delivering the best performance for streaming and deep learning workloads, we propose high-performance and scalable broadcast schemes that exploit IB hardware multicast (IB-MCAST) and NVIDIA GPUDirect technology. In benchmark-level evaluations, our experimental results indicate improved scalability and up to 68% lower latency than state-of-the-art solutions. Furthermore, the proposed design yields up to 24% performance improvement for the popular deep learning framework Microsoft Cognitive Toolkit (CNTK) without any application changes.