Orchestration and Workflows in eScience: Problems, Standards, and Solutions
The Netherlands eScience Center works in partnership with scientists from many different fields, from the humanities to high-energy physics. This gives us a unique overview of the problems in these fields. One common problem we see is the need for compute power, often for relatively independent tasks. In this paper, we give an overview of the requirements for running these tasks. The list is relatively short, as we often encounter the same problems across projects. We argue that software to solve these problems is too often built from scratch, leading to a lot of duplicated effort.
Our approach is to re-use and contribute to existing solutions as much as possible and, above all, to use standards whenever possible. Software changes quickly; standards hopefully last longer. We will discuss some of the (emerging) standards we use, including the Common Workflow Language (CWL) and the Basic Model Interface (BMI), from the bioinformatics and geosciences communities respectively. Using examples from projects, we will also discuss the software we use. We hope that the scientific community can come together to exchange knowledge on this topic, leading to a better overview of standards related to workflows and orchestration, and to wider use of some of the great software out there.