Electronic International Standard Serial Number (EISSN)
1872-7115
abstract
Workflow engines are commonly used to orchestrate large-scale scientific computations such as, but not limited to weather, climate, natural disasters, food safety, and territorial management. However, to implement, manage, and execute real-world scientific applications in the form of workflows on multiple infrastructures (servers, clusters, cloud) remains a challenge. In this paper, we present DagOnStar (Directed Acyclic Graph OnAnything), a lightweight Python library implementing a workflow paradigm based on parallel patterns that can be executed on any combination of local machines, on-premise high performance computing clusters, containers, and cloud-based virtual infrastructures. DagOnStar is designed to minimize data movement to reduce the application storage footprint. A case study based on a real-world application is explored to illustrate the use of this novel workflow engine: a containerized weather data collection application deployed on multiple infrastructures. An experimental comparison with other state-of-the-art workflow engines shows that DagOnStar can run workflows on multiple types of infrastructure with an improvement of 50.19% in run time when using a parallel pattern with eight task-level workers.
Classification
subjects
Computer Science
keywords
cloud computing; data intensive; directed acyclic graph; parallel processing; workflow