A Machine Learning Approach for Modular Workflow
Performance Prediction
Author/Presenters
Event Type
Workshop
Time
Monday, November 13th, 3:50pm - 4:15pm
Location
501
Description
Scientific workflows provide an opportunity for
declarative computational experiment design in an
intuitive and efficient way. A distributed workflow is
typically executed on a variety of resources and it uses
a variety of computational algorithms or tools to
achieve the desired outcomes. Such a variety imposes
additional complexity in scheduling these workflows on
large-scale computers. As computation becomes more
distributed, insight into the workload that a workflow
is expected to present becomes critical for effective
resource allocation. In this paper, we present a modular
framework that leverages Machine Learning to create
precise performance predictions for a workflow. The
central idea is to partition a workflow so that
forecasting each atomic unit becomes manageable and the
individual predictions can be combined efficiently. We
recognize a combination of
an executable and a specific physical resource as a
single module. This gives us a handle to characterize
workload and machine power as a single unit of
prediction. The modular approach of the presented
framework allows it to adapt to highly complex nested
workflows and scale to new scenarios. We present
performance estimation results of independent workflow
modules executed on the XSEDE SDSC Comet cluster using
various Machine Learning algorithms. The results provide
insights into the behavior and effectiveness of
different algorithms in the context of scientific
workflow performance prediction.
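
The modular idea in the description can be illustrated with a minimal Python sketch. It is not the framework presented in the paper: the per-module model here is a plain least-squares line over input size, standing in for whichever Machine Learning algorithm is actually used, and the rule for combining predictions (sum for sequential steps, maximum for parallel branches) is an assumption, since the description does not state how predictions are combined. The names ModulePredictor, predict_workflow, "align", "sort", and "comet-compute" are hypothetical, and the runtimes are made-up toy numbers.

```python
from collections import defaultdict


class ModulePredictor:
    """Keeps one runtime model per (executable, resource) module."""

    def __init__(self):
        self._samples = defaultdict(list)  # module -> [(input_size, runtime)]
        self._models = {}                  # module -> (slope, intercept)

    def record(self, executable, resource, input_size, runtime):
        # Each observed run is filed under the module it belongs to.
        self._samples[(executable, resource)].append((input_size, runtime))

    def fit(self):
        # Stand-in learner (assumption): ordinary least squares on one feature.
        for module, samples in self._samples.items():
            n = len(samples)
            mean_x = sum(x for x, _ in samples) / n
            mean_y = sum(y for _, y in samples) / n
            var_x = sum((x - mean_x) ** 2 for x, _ in samples)
            cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
            slope = cov / var_x if var_x else 0.0
            self._models[module] = (slope, mean_y - slope * mean_x)

    def predict(self, executable, resource, input_size):
        slope, intercept = self._models[(executable, resource)]
        return slope * input_size + intercept


def predict_workflow(node, predictor):
    """Combine module predictions over a nested workflow description.

    A node is ("task", executable, resource, input_size),
    ("seq", [children]) or ("par", [children]).
    Combination rule (assumption): sequential steps add up,
    parallel branches take as long as the slowest branch.
    """
    kind = node[0]
    if kind == "task":
        _, executable, resource, input_size = node
        return predictor.predict(executable, resource, input_size)
    child_times = [predict_workflow(child, predictor) for child in node[1]]
    if kind == "seq":
        return sum(child_times)
    if kind == "par":
        return max(child_times)
    raise ValueError(f"unknown node kind: {kind!r}")


# Toy training data (fabricated for illustration only).
predictor = ModulePredictor()
predictor.record("align", "comet-compute", 10, 25.0)
predictor.record("align", "comet-compute", 20, 45.0)
predictor.record("sort", "comet-compute", 10, 12.0)
predictor.record("sort", "comet-compute", 30, 32.0)
predictor.fit()

# A nested workflow: one alignment step followed by two parallel sorts.
workflow = ("seq", [
    ("task", "align", "comet-compute", 40),
    ("par", [
        ("task", "sort", "comet-compute", 8),
        ("task", "sort", "comet-compute", 18),
    ]),
])
estimated_runtime = predict_workflow(workflow, predictor)
print(estimated_runtime)
```

Because every module carries its own model, supporting a new executable or a new resource in this sketch only requires recording runs for that pair and refitting; the nested workflow traversal does not change.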




