Scalable Reduction Collectives with Data Partitioning-Based Multi-Leader Design

Event Type: Paper
Session: Optimizing MPI
Programming Systems
Time: Thursday, November 16th, 4:30pm - 5pm
Location: 405-406-407
Description: Existing designs for MPI Allreduce do not take advantage of the vast parallelism available in modern multi-/many-core processors such as Intel Xeon/Xeon Phi, or of the increased communication throughput and advanced high-end features offered by modern interconnects such as InfiniBand and Omni-Path. In this paper, we propose a high-performance and scalable Data Partitioning-based Multi-Leader (DPML) solution for MPI Allreduce that exploits the parallelism offered by multi-/many-core architectures in conjunction with the high throughput and high-end features offered by InfiniBand and Omni-Path to significantly enhance the performance of MPI Allreduce on modern HPC systems. We also model the DPML-based designs to analyze their communication costs theoretically. Microbenchmark-level evaluations show that the proposed DPML-based designs deliver up to 3.5x performance improvement for MPI Allreduce on multiple HPC systems at scale. At the application level, improvements of up to 35% and 60% are seen for HPCG and miniAMR, respectively.
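The listing does not include the paper's pseudocode, but the multi-leader idea described in the abstract can be sketched in plain MPI. The following C sketch is an illustrative assumption, not the authors' implementation: the buffer is split into NUM_LEADERS chunks, chunk i is reduced intra-node to leader i, the i-th leaders across all nodes allreduce their chunk concurrently, and each leader then broadcasts its finished chunk within the node. The names dpml_allreduce and NUM_LEADERS, and all structural choices, are hypothetical.

```c
/* dpml_allreduce.c -- a minimal, hypothetical sketch of a data
 * partitioning-based multi-leader allreduce in the spirit of DPML.
 * Assumes every node hosts at least NUM_LEADERS ranks. */
#include <mpi.h>
#include <stdlib.h>

#define NUM_LEADERS 4  /* assumed number of leaders per node */

static void dpml_allreduce(const double *sendbuf, double *recvbuf,
                           int count, MPI_Comm comm)
{
    int world_rank, local_rank;
    MPI_Comm node_comm, leader_comm;

    MPI_Comm_rank(comm, &world_rank);

    /* Group the ranks that share a node. */
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, world_rank,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &local_rank);

    /* Leader i on every node joins the same inter-node communicator;
       non-leaders get MPI_COMM_NULL. */
    int color = (local_rank < NUM_LEADERS) ? local_rank : MPI_UNDEFINED;
    MPI_Comm_split(comm, color, world_rank, &leader_comm);

    int chunk = count / NUM_LEADERS;  /* last chunk absorbs the remainder */
    for (int i = 0; i < NUM_LEADERS; i++) {
        int off = i * chunk;
        int n = (i == NUM_LEADERS - 1) ? count - off : chunk;
        /* Step 1: reduce chunk i onto leader i within the node. */
        MPI_Reduce(sendbuf + off, recvbuf + off, n, MPI_DOUBLE,
                   MPI_SUM, i, node_comm);
    }

    /* Step 2: each leader allreduces its own chunk across nodes; the
       NUM_LEADERS inter-node exchanges can proceed concurrently. */
    if (leader_comm != MPI_COMM_NULL) {
        int off = local_rank * chunk;
        int n = (local_rank == NUM_LEADERS - 1) ? count - off : chunk;
        MPI_Allreduce(MPI_IN_PLACE, recvbuf + off, n, MPI_DOUBLE,
                      MPI_SUM, leader_comm);
        MPI_Comm_free(&leader_comm);
    }

    /* Step 3: leaders broadcast their finished chunks within the node. */
    for (int i = 0; i < NUM_LEADERS; i++) {
        int off = i * chunk;
        int n = (i == NUM_LEADERS - 1) ? count - off : chunk;
        MPI_Bcast(recvbuf + off, n, MPI_DOUBLE, i, node_comm);
    }
    MPI_Comm_free(&node_comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    enum { N = 1 << 20 };
    double *in  = malloc(N * sizeof *in);
    double *out = malloc(N * sizeof *out);
    for (int i = 0; i < N; i++) in[i] = 1.0;
    dpml_allreduce(in, out, N, MPI_COMM_WORLD);
    free(in);
    free(out);
    MPI_Finalize();
    return 0;
}
```

The intuition behind using several leaders per node, as the abstract describes, is that a single-leader scheme funnels all inter-node traffic through one process; partitioning the data across leaders lets multiple network endpoints drive the interconnect at once.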