Path-Synchronous Performance Monitoring in HPC
Interconnection Networks with Source-Code Attribution
Author/Presenters
Event Type
Workshop
Accelerators
Benchmarks
Compiler Analysis and Optimization
Deep Learning
Effective Application of HPC
Energy
Exascale
GPU
I/O
Parallel Application Frameworks
Parallel Programming Languages, Libraries, Models and Notations
Performance
Simulation
Storage
Time
Monday, November 13th, 11:30am - 11:50am
Location
704-706
Description
Performance anomalies involving interconnection
networks have largely remained a “black box” for
developers relying on traditional CPU profilers.
Network-side profilers collect aggregate statistics and
lack source-code attribution. We have incorporated a
protocol extension into the Gen-Z communication protocol
for tagging network packets in an interconnection
network. We have also backed the extension with hardware
and software enhancements that track the flow of a
network transaction through every hop in the
interconnection network and associate it back to the
application source code. The result is first-of-its-kind
hardware-assisted telemetry across disparate, autonomous
interconnection networking components, with application
source-code association that gives developers better
insight. Our scheme works on a sampling basis, which
keeps runtime overhead low and generates modest volumes
of data. Simulation of our methods in the open-source
Structural Simulation Toolkit (SST/Macro) shows their
effectiveness: developers get deep insight into the
underlying network details at minimal overhead.
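
The idea of sampled packet tagging with per-hop records and source-code attribution can be illustrated with a small toy model. This is only a sketch under stated assumptions: the names Packet, Sampler, traverse, attribute and run_demo, the fixed sampling period, the call-site strings, and the per-hop latencies are all hypothetical and are not taken from the Gen-Z specification, from SST/Macro, or from the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    payload: bytes
    tag: Optional[int] = None  # hypothetical sample tag; None means untagged


class Sampler:
    """Tags every `period`-th packet and remembers which call site sent it."""

    def __init__(self, period: int):
        self.period = period
        self.sent = 0
        self.next_tag = 0
        self.tag_to_site = {}  # tag -> source location (assumed "file:line")

    def send(self, payload: bytes, call_site: str) -> Packet:
        self.sent += 1
        pkt = Packet(payload)
        if self.sent % self.period == 0:  # simple periodic sampling
            pkt.tag = self.next_tag
            self.tag_to_site[pkt.tag] = call_site
            self.next_tag += 1
        return pkt


def traverse(pkt, path, hop_latency_ns, telemetry):
    """Each hop appends a record, but only for tagged packets."""
    for hop in path:
        if pkt.tag is not None:
            telemetry.append((pkt.tag, hop, hop_latency_ns[hop]))


def attribute(telemetry, tag_to_site):
    """Join per-hop records back to the call site that issued the packet."""
    per_site = defaultdict(lambda: defaultdict(int))
    for tag, hop, latency in telemetry:
        per_site[tag_to_site[tag]][hop] += latency
    return {site: dict(hops) for site, hops in per_site.items()}


def run_demo():
    sampler = Sampler(period=3)
    telemetry = []
    path = ["switch0", "switch1"]
    hop_latency_ns = {"switch0": 120, "switch1": 300}  # made-up values
    for i in range(8):
        site = "solver.c:42" if i % 2 == 0 else "halo.c:17"
        traverse(sampler.send(b"x", site), path, hop_latency_ns, telemetry)
    return attribute(telemetry, sampler.tag_to_site), telemetry
```

In this toy run only two of the eight packets carry a tag, so only four hop records are produced, which mirrors the point that sampling keeps data volume modest while still linking each record to a source location.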