SeedMe: Data sharing building blocks
Dataset posted on 27.11.2017 by Amit Chourasia, David R. Nadeau, Michael L. Norman
The need for data sharing and rapid data access has become central with the rise of collaborative research in many disciplines. Several file-sharing products let the general public post and share files through a web browser, but these products are not well suited to scientific data and research use. Consumer products get by with manual user interfaces for adding and removing a few shared files; this is impractical for sharing the large numbers of data files generated during and after large-scale computation. Instead, automated and scriptable mechanisms are required that can integrate into computational workflows and post files during and after compute jobs. Scientific data sharing also requires support for collaborative discussion of research results, quick rough-draft visualizations for analyzing the data, and metadata and descriptive information that record job and compute platform characteristics, input data, job parameters, job completion status, and other provenance information.
Here we describe work in progress under the umbrella of the SeedMe (Stream, Encode, Explore and Disseminate My Experiments) project, which is developing scientific data-sharing and data-management tools that cater to the unique needs of computational scientists. These tools support automated and scriptable access to shared data, browser-based data access, secure data storage, sharing with a project workgroup, data descriptions and metadata, threaded collaborative discussion, and lightweight visualization.
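To make the idea of scripted provenance capture concrete, the following is a minimal sketch of how a compute job might record the kinds of metadata listed above (job parameters, input data, platform characteristics, completion status) in a JSON sidecar file next to an output file. It is not the SeedMe API: the function names, the record layout, and the `.meta.json` suffix are all hypothetical choices made for illustration, and the upload step is deliberately left out.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance_record(job_name, parameters, input_files, status):
    """Assemble a provenance record (hypothetical layout, not a SeedMe format)."""
    return {
        "job_name": job_name,
        "status": status,  # e.g. "running", "completed", "failed"
        "parameters": dict(parameters),
        "inputs": [
            {"path": str(p), "sha256": file_checksum(p)} for p in input_files
        ],
        "platform": {
            "hostname": platform.node(),
            "system": platform.system(),
            "machine": platform.machine(),
            "python": platform.python_version(),
        },
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(output_file, record):
    """Write the record beside the output file as '<name>.meta.json' (assumed suffix)."""
    sidecar = Path(str(output_file) + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

A job script could call `build_provenance_record` once at start-up with status "running" and again on exit with the final status, so that whatever tool later posts the output files can attach the sidecar as their description.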