Topology-Aware GPU Scheduling for Learning Workloads in Cloud Environments
Event Type: Paper
Time: Tuesday, November 14th, 4pm - 4:30pm
Location: 405-406-407
Description: Recent advances in hardware, such as multi-GPU systems and their availability in the cloud, are enabling deep learning in domains including health care, autonomous vehicles, and the Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must therefore consider both hardware topology and workload communication requirements when allocating CPU and GPU resources, in order to achieve optimal execution time and improved utilization in shared cloud environments.
This paper presents a new topology-aware workload placement strategy for scheduling deep learning jobs on multi-GPU systems. The strategy is evaluated with a prototype on a POWER8 machine with Tesla P100 cards, showing speedups of up to 1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.
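As a rough illustration of what topology-aware placement involves (a minimal sketch, not the paper's algorithm: the link bandwidths, the 4-GPU layout, and the function name place_job below are invented for illustration), a scheduler can model the machine as a link-bandwidth matrix and pick the free GPU set whose slowest intra-job link is fastest:

    from itertools import combinations

    # Illustrative per-link bandwidths in GB/s (assumed values, not measurements).
    NVLINK, SMP = 80.0, 38.0

    # Hypothetical 4-GPU topology: GPUs (0,1) and (2,3) are NVLink-connected
    # pairs; traffic between the pairs crosses a slower inter-socket (SMP) bus.
    BANDWIDTH = [
        [0.0,    NVLINK, SMP,    SMP],
        [NVLINK, 0.0,    SMP,    SMP],
        [SMP,    SMP,    0.0,    NVLINK],
        [SMP,    SMP,    NVLINK, 0.0],
    ]

    def place_job(num_gpus, busy):
        """Pick the free GPU set whose slowest intra-job link is fastest.

        Skipping GPUs in `busy` is a crude stand-in for interference
        avoidance: the job never shares a device with a running job.
        """
        free = [g for g in range(len(BANDWIDTH)) if g not in busy]
        best, best_score = None, -1.0
        for combo in combinations(free, num_gpus):
            if len(combo) < 2:
                score = float("inf")  # a single GPU has no intra-job traffic
            else:
                score = min(BANDWIDTH[a][b] for a, b in combinations(combo, 2))
            if score > best_score:
                best, best_score = combo, score
        return best  # None if fewer than num_gpus GPUs are free

    # With GPU 0 busy, a 2-GPU job lands on the NVLink pair (2, 3) rather
    # than splitting its traffic across the slower SMP bus.
    print(place_job(2, busy={0}))  # -> (2, 3)

Scoring by the worst-case link reflects that collective operations in multi-GPU training are bottlenecked by the slowest path between any two participating GPUs.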