ScrubJay: Deriving Knowledge from the Disarray of HPC
Performance Data
SessionPerformance Evaluation Tools
Authors
Event Type
Paper
Applications
Performance
TimeWednesday, November 15th2:30pm -
3pm
Location402-403-404
DescriptionModern HPC centers comprise clusters, storage,
networks, power and cooling infrastructure, and more.
Analyzing the efficiency of these complex facilities is
a daunting task. Increasingly, facilities deploy sensors
and monitoring tools, but with millions of instrumented
components, analyzing collected data manually is
intractable. Data from an HPC center comprises different
formats, granularities, and semantics, and handwritten
scripts no longer suffice to transform the data into a
digestible form.
We present ScrubJay, an intuitive, scalable framework for automatic analysis of disparate HPC data. ScrubJay decouples the task of specifying data relationships from the task of analyzing data. Domain experts can store reusable transformations that describe the projection of one domain onto another. ScrubJay also automates performance analysis. Analysts provide a query over logical domains of interest, and ScrubJay automatically derives needed steps to transform raw measurements. ScrubJay makes large-scale analysis tractable, reproducible, and provides insights into HPC facilities.
We present ScrubJay, an intuitive, scalable framework for automatic analysis of disparate HPC data. ScrubJay decouples the task of specifying data relationships from the task of analyzing data. Domain experts can store reusable transformations that describe the projection of one domain onto another. ScrubJay also automates performance analysis. Analysts provide a query over logical domains of interest, and ScrubJay automatically derives needed steps to transform raw measurements. ScrubJay makes large-scale analysis tractable, reproducible, and provides insights into HPC facilities.
Download PDF:
here
Authors




