A Slurm Simulator: Implementation and Parametric Analysis
Author/Presenters
Event Type
Workshop
Accelerators
Benchmarks
Compiler Analysis and Optimization
Deep Learning
Effective Application of HPC
Energy
Exascale
GPU
I/O
Parallel Application Frameworks
Parallel Programming Languages, Libraries, Models and Notations
Performance
Simulation
Storage
Time
Monday, November 13th, 4:30pm - 5pm
Location
704-706
Description
Slurm is an open-source resource manager for HPC that
provides high configurability for inhomogeneous
resources and job scheduling. Slurm parameter
settings can significantly influence HPC resource
utilization and job wait time, but in many cases it
is hard to judge how these options will affect
overall HPC resource performance. A Slurm simulator
can be a very helpful tool to aid parameter selection
for a particular HPC resource. Here, we report our
implementation of a Slurm simulator and the impact of
parameter choice on HPC resource performance. The
simulator is based on a real Slurm instance with
modifications to allow simulation of historical jobs and
to improve the simulation speed. The simulation speed
depends heavily on job composition, HPC resource size,
and Slurm configuration. For an 8,000-core heterogeneous
cluster, we achieve about 100 times acceleration; for
example, 20 days can be simulated in 5 hours.
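
As a quick check of the acceleration figure quoted above, the ratio of simulated time to wall-clock time can be computed directly. This is only an illustrative sketch: the function name `speedup` is hypothetical and is not part of the simulator, and the inputs are the two numbers given in the description.

```python
def speedup(simulated_days: float, wall_clock_hours: float) -> float:
    """Return the ratio of simulated time to wall-clock time.

    Hypothetical helper for illustration only; not part of the simulator.
    """
    if wall_clock_hours <= 0:
        raise ValueError("wall_clock_hours must be positive")
    simulated_hours = simulated_days * 24.0  # convert days to hours
    return simulated_hours / wall_clock_hours


# Figures from the description: 20 simulated days in 5 wall-clock hours.
factor = speedup(20, 5)
print(f"{factor:.0f}x")  # prints "96x", consistent with "about 100 times"
```

The exact ratio for these two figures is 96, which the description rounds to about 100.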