Kale: A System for Enabling Human-in-the-loop Interactivity in HPC Workflows

<p>Scientific problem-solving frequently requires interactive, iterative exploration and analysis. Web-based interactive electronic notebook interfaces such as Jupyter offer an important mechanism for scientists to capture analyses in a reproducible narrative context, and a growing number of science gateway environments support Jupyter Notebooks as a means of enabling custom, ad hoc analyses of scientific data. However, Jupyter Notebooks alone do not meet the needs of today's scientific researchers. Scientists produce and consume large amounts of data and require significant computational resources to process and analyze it, so scientific workflows are becoming increasingly asynchronous as processing is offloaded to remote resources. Many researchers turn to HPC systems for this processing, but the traditional asynchronous batch-queue environment that HPC uses for computationally intensive tasks is largely separate from interactive Notebook-based workflows. The result is a fragmented workflow that does not facilitate rapid scientific inquiry. We introduce Kale, a system that enables Jupyter Notebooks to interface seamlessly with HPC workflows, leveraging distributed computational resources for iterative human-in-the-loop scientific exploration.</p>