Evaluating On-Node GPU Interconnects for Deep Learning Workloads
Author/Presenters
Event Type
Workshop
Accelerators
Benchmarks
Compiler Analysis and Optimization
Deep Learning
Effective Application of HPC
Energy
Exascale
GPU
I/O
Parallel Application Frameworks
Parallel Programming Languages, Libraries, Models and Notations
Performance
Simulation
Storage
Time
Monday, November 13th, 9:01am - 9:30am
Location
704-706
Description
Scaling deep learning workloads across multiple GPUs on
a single node has become increasingly important in data
analytics. A key question is how well a PCIe-based GPU
interconnect can perform relative to a custom
high-performance interconnect such as NVIDIA's NVLink.
This paper evaluates two such on-node interconnects for
eight NVIDIA Pascal P100 GPUs: (a) the NVIDIA DGX-1's
NVLink 1.0 'hybrid cube mesh'; and (b) the Cirrascale
GX8's two-level PCIe tree using dual SR3615 switch
risers. To show the effects of a range of neural network
workloads, we define a parameterized version of the
popular ResNet architecture. We define a workload
intensity metric that characterizes the expected
computation/communication ratio; we also locate AlexNet
and GoogLeNet within that space. As expected, the DGX-1
typically has superior performance. However, the GX8 is
very competitive on all ResNet workloads. With eight GPUs,
the GX8 can outperform the DGX-1 on all-to-all
reductions by 10% for medium-sized payloads, and in rare
cases the GX8 slightly outperforms the DGX-1 on ResNet.




