Open Curation and Repeatability for Scientific Artifact Evaluation

2017-10-10T06:28:11Z (GMT) by Wil Kie
Occam is a computation-focused digital archival system that preserves software designed for the sciences, humanities, and the arts. This software tool is a free, open-source backend storage and deployment system that makes use of existing technologies such as Git and Docker to provide a reproducible computational environment that can be distributed among machines or institutions. It is also a front-facing web portal that anybody can install to build and share graphical workflows to run configurable experiments in an accessible manner, lowering the barrier to entry to the increasingly sophisticated domain of digital science.

Occam generates containers on the fly based on what you want to run. It handles dependencies and software maintenance. Developers get their software running within Occam, which typically means just describing its dependencies and the commands to run; that software is then preserved and available to anyone who wishes to use it. From that description and the native capabilities of the machine, Occam determines how to run the software. The way it does so may change as the underlying technology or hardware changes, which better preserves software over time as technologies such as Docker eventually decay.

Because containers are generated on demand, software can be distributed as it normally would be: as just the source code or binaries, rather than as large Docker containers or virtual machine images. The software, whittled down to only what it needs to run, can also be more easily composed with other software, so research tools are reused more effectively. This technique also eases the development of derived work by allowing research software to exist as code rather than being trapped within an obscure tar file or lost within a massive virtual machine image.

One interesting use of building containers in this way is that people can add extra functionality whenever it becomes useful.
For example, we can tell the system to run an object in a container built with VNC capability, which adds a VNC server alongside the running object. Our front-end website can then present an interactive stream of native graphical applications, such as visualization tools, using a JavaScript VNC client.

Our poster, along with an on-hand demonstration, will highlight the above at a high level, reinforced by several real use cases of Occam. There will be a demonstration of an interactive visualization tool, such as Cytoscape or VisIt, showing how files can be generated, saved, shared, and then loaded again using graphical tools in the browser without requiring their installation. We will show a conventional automated simulation, demonstrating how the system can automatically generate multiple jobs deployed in parallel through ranged configuration parameters, and a graph-based workflow connecting multiple software objects, such as a simulator and a graphing tool. Finally, there will be a look at interactive graphing and how graphs generated from such experimentation can be embedded while preserving provenance and reproducibility by containing a link to their computation.
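
As a rough illustration of the ranged-configuration idea mentioned above, the following Python sketch expands a configuration whose values may be lists into one job per combination and runs the jobs in parallel. It is not Occam's actual interface: the function names (`expand_ranges`, `run_job`, `run_all`), the parameter names (`cache_size_kb`, `associativity`, `trace`), and the placeholder "simulation" are all hypothetical, chosen only to show the general technique.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def expand_ranges(config):
    """Expand a configuration whose values may be lists into one dict per job.

    Scalar values are treated as single-element ranges.
    """
    keys = list(config)
    value_lists = [v if isinstance(v, list) else [v] for v in config.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


def run_job(job):
    """Hypothetical stand-in for running a simulator with one configuration."""
    # Placeholder computation; a real job would launch the packaged software.
    return {"config": job, "result": job["cache_size_kb"] * job["associativity"]}


def run_all(config, max_workers=4):
    """Run every expanded job in parallel and return results in job order."""
    jobs = expand_ranges(config)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, jobs))


if __name__ == "__main__":
    # Three cache sizes times two associativities yields six jobs.
    ranged = {"cache_size_kb": [16, 32, 64], "associativity": [1, 2], "trace": "demo"}
    for outcome in run_all(ranged):
        print(outcome)
```

Each result keeps the configuration that produced it, which hints at how an output such as a graph could carry a link back to its computation.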