Investigating Half-Precision Arithmetic to Accelerate
Dense Linear System Solvers
Author/Presenters
Event Type: Workshop
Tags: Algorithms, Exascale, Resiliency, SIGHPC Workshop
Time: Monday, November 13th, 2:40pm - 3pm
Location: 607
Description: The use of low-precision arithmetic in mixed-precision computing methods has been a powerful tool for accelerating numerous scientific computing applications. Artificial intelligence (AI), in particular, has pushed this to its current extreme, making use of half-precision floating-point arithmetic (FP16) in neural-network-based approaches. The appeal of FP16 lies in the high performance it achieves on today's powerful manycore GPU accelerators, e.g., the NVIDIA V100, which alone can deliver 120 teraFLOP/s in FP16. We present an investigation showing that other HPC applications can harness this power too, in particular the general HPC problem of solving Ax = b, where A is a large dense matrix and the solution is needed in FP32 or FP64 accuracy. Our approach is based on the mixed-precision iterative refinement technique: we generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations that resolve the main computational challenges of efficiently parallelizing, scaling, and using FP16 arithmetic in the approach on high-end GPUs. Subsequently, we show for the first time how the use of FP16 arithmetic can significantly accelerate, and make more energy efficient, FP32- or FP64-precision Ax = b solvers. Our results are reproducible, and the developments will be made available through the MAGMA library. We quantify in practice the performance and limitations of the approach.
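
For readers unfamiliar with the technique, the following is a minimal Python/NumPy sketch of mixed-precision iterative refinement, not the talk's MAGMA implementation: the matrix is factorized once in low precision, and a refinement loop with FP64 residuals recovers working-precision accuracy. Since standard LAPACK has no native half precision, FP16 is emulated here by rounding A to np.float16 before an FP32 factorization; the function name and tolerance are illustrative choices, and on a GPU such as the V100 the factorization step would instead run in FP16 on the accelerator.

    # Sketch of mixed-precision iterative refinement for Ax = b (illustrative only).
    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def mixed_precision_solve(A, b, tol=1e-12, max_iter=50):
        A = np.asarray(A, dtype=np.float64)
        b = np.asarray(b, dtype=np.float64)
        # Low-precision factorization: round A to FP16, then LU-factor in FP32
        # (emulates an FP16 factorization, which LAPACK does not provide natively).
        A_lo = A.astype(np.float16).astype(np.float32)
        lu, piv = lu_factor(A_lo)
        # Initial low-precision solve, promoted to FP64.
        x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
        for _ in range(max_iter):
            r = b - A @ x  # residual computed in FP64 (the target precision)
            if np.linalg.norm(r, np.inf) <= tol * np.linalg.norm(b, np.inf):
                break
            # Correction solve reuses the cheap low-precision factors.
            d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
            x += d  # update the solution in FP64
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n = 200
        A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
        b = rng.standard_normal(n)
        x = mixed_precision_solve(A, b)
        print(np.linalg.norm(A @ x - b, np.inf))  # residual near FP64 roundoff

The design point this illustrates is that the O(n^3) factorization cost is paid in fast low precision, while each refinement step costs only an O(n^2) residual and triangular solves; convergence to FP64 accuracy is expected when A is not too ill-conditioned relative to the FP16 unit roundoff.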