An efficient pattern-based approach for workflow supporting large-scale science: The DagOnStar experience

Articles

authors

  • Sánchez Gallegos, Dante Domizzi
  • Di Luccio, Diana
  • Kosta, Sokol
  • Gonzalez-Compean, J. L.
  • Montella, Raffaele

publication date

  • September 2021

start page

  • 187

end page

  • 203

volume

  • 122

International Standard Serial Number (ISSN)

  • 0167-739X

Electronic International Standard Serial Number (EISSN)

  • 1872-7115

abstract

  • Workflow engines are commonly used to orchestrate large-scale scientific computations in domains such as, but not limited to, weather, climate, natural disasters, food safety, and territorial management. However, implementing, managing, and executing real-world scientific applications as workflows on multiple infrastructures (servers, clusters, cloud) remains a challenge. In this paper, we present DagOnStar (Directed Acyclic Graph OnAnything), a lightweight Python library implementing a workflow paradigm based on parallel patterns that can be executed on any combination of local machines, on-premise high-performance computing clusters, containers, and cloud-based virtual infrastructures. DagOnStar is designed to minimize data movement in order to reduce the application storage footprint. A case study based on a real-world application is explored to illustrate the use of this novel workflow engine: a containerized weather data collection application deployed on multiple infrastructures. An experimental comparison with other state-of-the-art workflow engines shows that DagOnStar can run workflows on multiple types of infrastructure with an improvement of 50.19% in run time when using a parallel pattern with eight task-level workers.

subjects

  • Computer Science

keywords

  • cloud computing; data intensive; directed acyclic graph; parallel processing; workflow