REFINE: Realistic Fault Injection via Compiler-Based
Instrumentation for Accuracy, Portability and Speed
Event Type
Paper
Reliability
System Software
TimeWednesday, November 15th11am -
11:30am
Location405-406-407
DescriptionCompiler-based fault injection (FI) has become a
popular technique for resilience studies to understand
the impact of soft errors in supercomputing systems.
Compiler-based FI frameworks inject faults at a high
intermediate-representation level. However, they are
less accurate than machine code, binary-level FI because
they lack access to all dynamic instructions, thus they
fail to mimic certain fault manifestations. In this
paper, we study the limitations of current practices in
compiler-based FI and how they impact the interpretation
of results in resilience studies.
We propose REFINE, a novel framework that addresses these limitations, performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI while keeping accuracy comparable to binary-level FI methods. We demonstrate our approach in 14 HPC benchmarks and show that, due to our unique design, its runtime overhead is significantly smaller than state-of-the-art compiler-based FI frameworks, reducing the time for large FI experiments.
We propose REFINE, a novel framework that addresses these limitations, performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI while keeping accuracy comparable to binary-level FI methods. We demonstrate our approach in 14 HPC benchmarks and show that, due to our unique design, its runtime overhead is significantly smaller than state-of-the-art compiler-based FI frameworks, reducing the time for large FI experiments.
Download PDF:
here




