A Scalable Analytical Memory Model for CPU Performance Prediction
Author/Presenters
Event Type: Workshop
Accelerators
Benchmarks
Compiler Analysis and Optimization
Deep Learning
Effective Application of HPC
Energy
Exascale
GPU
I/O
Parallel Application Frameworks
Parallel Programming Languages, Libraries, Models and Notations
Performance
Simulation
Storage
Time: Monday, November 13th, 4pm - 4:30pm
Location: 704-706
Description: As the US Department of Energy (DOE) invests in exascale computing, performance modeling of physics codes on CPUs remains a challenge in computational co-design because of the complex design of modern processors, including memory hierarchies, instruction pipelining, and speculative execution. We present the Analytical Memory Model (AMM), a model of the cache memory hierarchy embedded in the Performance Prediction Toolkit (PPT), a suite of discrete-event-simulation-based co-design hardware and software models. AMM enables PPT to significantly improve the quality of its runtime predictions for scientific codes.
AMM uses a computationally efficient, stochastic method to predict the reuse distance profiles of codes, where reuse distance is a hardware-architecture-independent measure of the patterns of virtual memory accesses. AMM relies on a stochastic, static, basic block-level analysis of reuse profiles measured from the memory traces of applications run on small problem instances. The analytical reuse distribution is used to estimate the effective latency and throughput of memory accesses, which in turn are used to predict the overall runtime of a scientific application.
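To make the reuse distance idea concrete, here is a minimal Python sketch that computes an exact reuse distance profile from a small memory trace and turns it into cache hit rates and an average memory access latency. It is not AMM's stochastic, basic block-level method. It assumes each cache level behaves like a fully associative LRU cache, in which an access hits a cache of C lines exactly when its reuse distance is less than C. All function names, cache sizes, and latencies here are hypothetical illustrations.

```python
import math
from collections import Counter


def reuse_distances(trace):
    """Reuse distance of each access: the number of distinct addresses
    referenced since the previous access to the same address
    (math.inf for a first access). Naive LRU-stack version."""
    stack = []  # LRU stack: most recently used address at index 0
    dists = []
    for addr in trace:
        if addr in stack:
            d = stack.index(addr)  # depth in the LRU stack = reuse distance
            stack.pop(d)
        else:
            d = math.inf  # cold (first) access
        stack.insert(0, addr)
        dists.append(d)
    return dists


def reuse_profile(trace):
    """Reuse distance histogram normalized to a probability distribution."""
    counts = Counter(reuse_distances(trace))
    n = len(trace)
    return {d: c / n for d, c in counts.items()}


def hit_rate(profile, capacity):
    """Hit fraction for a fully associative LRU cache of `capacity` lines
    (assumption): an access hits iff its reuse distance < capacity."""
    return sum(p for d, p in profile.items() if d < capacity)


def effective_latency(profile, levels, mem_latency):
    """Average access latency in cycles for a cache hierarchy.

    levels: list of (capacity_in_lines, latency_in_cycles), smallest first.
    Assumes every level sees the whole access stream as an LRU cache, so
    hit sets are nested; each access is served by the first level that hits.
    """
    total = 0.0
    prev_hit = 0.0
    for capacity, latency in levels:
        h = hit_rate(profile, capacity)
        total += (h - prev_hit) * latency  # accesses first served here
        prev_hit = h
    total += (1.0 - prev_hit) * mem_latency  # misses in every level
    return total


if __name__ == "__main__":
    # Hypothetical trace of cache-line addresses
    trace = ["a", "b", "c", "a", "b", "c", "d", "a"]
    print(reuse_distances(trace))  # [inf, inf, inf, 2, 2, 2, inf, 3]
    prof = reuse_profile(trace)
    print(hit_rate(prof, 4))  # 0.5
    # Hypothetical hierarchy: 2-line L1 (4 cycles), 4-line L2 (12 cycles),
    # 200-cycle main memory
    print(effective_latency(prof, [(2, 4), (4, 12)], 200))  # 106.0
```

The list-based LRU stack costs time proportional to the number of distinct addresses per access, which is acceptable only for small traces. This cost is one reason why exact profiling of full-scale runs is expensive and why predicting profiles from small instances, as described above, is attractive.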
Our experimental results demonstrate the scalability of AMM: for three scientific mini-applications, the predicted runtimes are similar to the actual runtimes.