Graph500 on OpenSHMEM: Using a Practical Survey of Past
Work to Motivate Novel Algorithmic Developments
Author/Presenters
Event Type
Workshop
Applications
Effective Application of HPC
Parallel Programming Languages, Libraries, Models
and Notations
Performance
Programming Systems
SIGHPC Workshop
Scientific Computing
Time
Monday, November 13th, 11am - 11:30am
Location
702
Description
Graph500 is an open specification of a graph-based
benchmark for high-performance computing (HPC). The core
computational kernel of Graph500 is a breadth-first
search of an undirected graph. Unlike many other HPC
benchmarks, Graph500 is characterized by heavily
irregular and fine-grain computation, memory accesses,
and network communication. It can therefore serve as a
more realistic stress test of modern HPC hardware,
software, and algorithmic techniques than other
benchmarking efforts.
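As an illustration of the kernel described above, the following is a minimal single-process sketch of a level-synchronous breadth-first search that records a parent for each reached vertex. It is not the implementation presented in this talk and uses no OpenSHMEM calls; the function name `bfs_parents`, the edge-list input format, and the conventions that the root is its own parent and unreached vertices are marked with -1 are assumptions made for this example.

```python
def bfs_parents(num_vertices, edges, root):
    """Level-synchronous BFS over an undirected graph (illustrative sketch).

    Returns a list where entry v is the BFS parent of vertex v,
    the root is its own parent, and unreached vertices hold -1.
    These conventions are assumptions chosen for this example.
    """
    # Build adjacency lists; each undirected edge is stored in both directions.
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = [-1] * num_vertices
    parent[root] = root
    frontier = [root]

    # Expand one BFS level per iteration of the outer loop.
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if parent[v] == -1:  # first visit: record parent, enqueue
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent


if __name__ == "__main__":
    # Small hypothetical graph: a 4-cycle plus a disconnected edge.
    example_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (4, 5)]
    print(bfs_parents(6, example_edges, 0))
```

The inner loop touches neighbor lists and the parent array in a data-dependent order. In a distributed setting, where vertices are partitioned across processing elements, many of those accesses become small remote operations, which is the irregular, fine-grain behavior the description refers to.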
OpenSHMEM, in turn, is an open specification of a partitioned global address space (PGAS), single-program multiple-data (SPMD) communication model for communicating across large numbers of processing elements. OpenSHMEM explicitly focuses on applications characterized by fine-grain communication, of which Graph500 is one example.
There is thus a natural synergy between the communication patterns of Graph500 and the capabilities of OpenSHMEM. In this work we explore that synergy by developing several novel implementations of Graph500 on various OpenSHMEM implementations. We contribute a review of the state of the art in distributed Graph500 implementations, as well as a performance and programmability comparison between the state of the art and our own OpenSHMEM-based implementations. Our results demonstrate improved scaling of Graph500's BFS kernel out to 1,024 nodes of the Edison supercomputer, achieving a 2.5x performance improvement relative to the highest-performing reference implementation at that scale.