
Enabling End-to-end Experiment Sharing and Reuse with Workflows via Jupyter Notebooks

poster
posted on 2017-10-09, 17:37 authored by Rafael Ferreira da Silva, Karan Vahi, Mats Rynge, Rajiv Mayani, Ewa Deelman
Scientific workflows are a mainstream solution for processing large-scale modeling, simulation, and data analytics computations in distributed systems, and have supported both traditional and breakthrough research across several domains. Workflows have enabled large-scale scientific computations and data analysis, and have lowered the barriers to experiment sharing, preservation (including provenance), and reuse across heterogeneous platforms (HTC and HPC). However, the reproducibility of an end-to-end scientific experiment is still hindered by the lack of methodologies to capture the pre- and post-analysis steps performed outside the scope of the workflow execution. Online notebook technologies (e.g., Jupyter Notebook) have emerged as open-source web applications that allow scientists to create and share documents containing live code, equations, visualizations, and explanatory text. Jupyter Notebooks have strong potential to narrow the gap between researchers and the complex knowledge required to run large-scale scientific workflows, by offering a high-level programmatic interface to access and manage workflow capabilities. This poster describes our approach for integrating the Pegasus workflow management system with Jupyter to foster ease of use, reproducibility (all the information needed to run an experiment is in a single place), and reuse (notebooks are portable when run in equivalent environments). Since Pegasus 4.8, a Python API to declare and manage Pegasus workflows via Jupyter has been provided. The user can create a notebook and declare a workflow application using the Pegasus DAX API, which allows scientists to specify data or control dependencies between computational jobs. This API encapsulates most Pegasus commands (e.g., plan, run, and statistics) and supports workflow creation, execution, and monitoring. It also provides mechanisms to define the Pegasus catalogs (site, replica, and transformation) and to generate tutorial example workflows.
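
To make the idea of declaring data and control dependencies between jobs concrete, the following is a minimal, self-contained sketch in plain Python. It is not the Pegasus DAX API: the `Workflow` class, its method names, and the job and file names are all hypothetical, chosen only to illustrate how a notebook cell might describe jobs, derive dependencies from the files they read and write, and obtain a valid execution order.

```python
from collections import defaultdict, deque


class Workflow:
    """Toy stand-in for a workflow description (hypothetical, NOT the Pegasus API)."""

    def __init__(self, name):
        self.name = name
        self.jobs = {}                   # job id -> (input files, output files)
        self.parents = defaultdict(set)  # job id -> ids of jobs it depends on

    def add_job(self, job_id, inputs=(), outputs=()):
        self.jobs[job_id] = (set(inputs), set(outputs))

    def depends(self, parent, child):
        """Declare an explicit control dependency between two known jobs."""
        for job_id in (parent, child):
            if job_id not in self.jobs:
                raise KeyError(f"unknown job: {job_id}")
        self.parents[child].add(parent)

    def infer_data_dependencies(self):
        """Add an edge wherever one job reads a file that another job writes."""
        producer = {}
        for job_id, (_, outputs) in self.jobs.items():
            for name in outputs:
                producer[name] = job_id
        for job_id, (inputs, _) in self.jobs.items():
            for name in inputs:
                if name in producer and producer[name] != job_id:
                    self.parents[job_id].add(producer[name])

    def execution_order(self):
        """Return one valid order using Kahn's algorithm; raise on cycles."""
        remaining = {j: len(self.parents[j]) for j in self.jobs}
        children = defaultdict(list)
        for child, parents in self.parents.items():
            for parent in parents:
                children[parent].append(child)
        ready = deque(sorted(j for j, n in remaining.items() if n == 0))
        order = []
        while ready:
            job = ready.popleft()
            order.append(job)
            for child in sorted(children[job]):
                remaining[child] -= 1
                if remaining[child] == 0:
                    ready.append(child)
        if len(order) != len(self.jobs):
            raise ValueError("workflow contains a cycle")
        return order


# Hypothetical three-job pipeline: two jobs linked by a file, one by control only.
wf = Workflow("pipeline")
wf.add_job("preprocess", inputs=["raw.txt"], outputs=["clean.txt"])
wf.add_job("analyze", inputs=["clean.txt"], outputs=["result.txt"])
wf.add_job("notify")                          # reads and writes no files
wf.depends(parent="analyze", child="notify")  # explicit control dependency
wf.infer_data_dependencies()                  # preprocess -> analyze via clean.txt
order = wf.execution_order()
print(order)
```

In this sketch the data dependency is inferred from the shared file `clean.txt`, while the dependency on `notify` has to be stated explicitly because no file connects the two jobs. A real workflow system would additionally plan and submit the jobs to execution sites, which this toy model does not attempt.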

Funding

National Science Foundation under the OAC SI2-SSI program, grant #1664162
