Digital Environment for Enabling Data-Driven Science (DEEDS): NSF CSSI PI Meeting 2020 Poster

With DEEDS, data, computing, and scientific workflows come together in datasets that research teams build, use and share across their entire investigation.

Researchers build and share DEEDS datasets using an interactive dashboard. The dashboard provides full-featured services for the organization and structuring of research activities; the upload and classification of data; the extraction and assignment of metadata; research computing and statistical modeling (including HPC computing); the automatic capture of scientific workflows for data provenance and reproducibility; and the analysis and visualization of results.

User-friendly interfaces help research teams create, annotate, and link file repositories, multi-dimensional, hierarchical data tables, computational software, statistical models, scientific workflows and analytics – offering interactive search, exploration, and visualization across all elements of the dataset. Datasets are FAIR-compliant and can be published for discovery and interactive exploration of data, computing tools, and workflows for reuse and reinterpretation.

The DEEDS dashboard provides services for:

Data: Upload, preserve, manage and explore your data. Assign metadata, follow rules for metadata standards (FAIR compliance), and integrate scripts to automatically transform, validate, curate and check the completeness of your uploaded data. Datasets provide data services for:

-- Files classified by type, format and use, including standard categories and user-defined categories (e.g., sensor data, mass spectrometry data, geospatial data, protocols). Files can be imported from external repositories (e.g., DuraMat through the CKAN API). DEEDS offers applications to search, explore and visualize data by type (e.g., GeoTIFF tile generation for map overlays). Some DEEDS dataset repositories have more than 3M files that are indexed, linked and classified by DEEDS for fast, user-friendly navigation, search and visualization.

-- Data Tables that represent hierarchical, multi-dimensional data models for measurements, properties, observations and other data. These can be customized, organized, re-organized, cloned, and annotated across the investigation lifecycle. Users can upload spreadsheets or update tables interactively (including bulk updates). Data tables can be viewed, browsed, searched, filtered, and downloaded. Data table operations are robust and user-friendly. Some DEEDS datasets have more than 300 columns, and some have more than 11M rows. Data table cells can represent single data points, data arrays, and single or multiple linked tabular datasheets. DEEDS data table update, view, search, filter and exploration operations remain robust across all of these representations. Data tables are linked to map overlays for unified exploration of location data, and they are the fundamental analytics structures for visualization, computing and analysis.
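
The three kinds of cell content described above (single data points, data arrays, and links to tabular datasheets) can be illustrated with a small sketch. This is not DEEDS code: the source does not describe the internal representation, so the `SheetLink` class, the `describe_cell` helper, and the column names are all hypothetical, chosen only to show how one row might mix the three cell kinds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SheetLink:
    """Hypothetical stand-in for a cell that links to one or more datasheets."""
    sheet_ids: List[str]


def describe_cell(value):
    """Classify a cell value as a datasheet link, an array, or a single value."""
    if isinstance(value, SheetLink):
        return f"link to {len(value.sheet_ids)} datasheet(s)"
    if isinstance(value, list):
        return f"array of {len(value)} values"
    return "single value"


# One illustrative row mixing all three cell kinds (invented column names).
row = {
    "sample_id": "S-001",
    "temperature_c": 21.5,
    "spectrum": [0.1, 0.4, 0.2],
    "runs": SheetLink(["sheet-a", "sheet-b"]),
}

summary = {column: describe_cell(value) for column, value in row.items()}
for column, kind in summary.items():
    print(f"{column}: {kind}")
```

A table whose cells are tagged this way can apply the same view, search and filter logic to every column and switch on the cell kind only when rendering or comparing values.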

Computing tools: Define computing tools for your dataset, then launch and track execution workflows. Your dataset computing tools can be computational research codes, open source software packages, modeling scripts, licensed software, Jupyter notebooks, RShiny and other Hubzero tools. DEEDS tools have full access to your dataset repository. A tool launched from the DEEDS dashboard follows the owner’s tool definition, allowing users to specify tool arguments, choose input files, select execution resources (including HPC facilities), and determine how the computing workflow should be tracked and captured. DEEDS returns output data to the dataset, where they are annotated and linked to the input, tool, resource and user. Captured workflows can be viewed and searched for data provenance and results traceability. Tools defined for DEEDS datasets are added to the DEEDS tool repository, where they can be imported into other datasets with the permission of their owners.
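
To make the idea of a captured workflow concrete, the sketch below records each tool run with the items the paragraph above names (input, tool, resource, user, output) and walks those records backwards to recover how a result was produced. It is an illustration under stated assumptions, not the DEEDS implementation: the `RunRecord` class, the `trace` function, and all file, tool and user names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunRecord:
    """Hypothetical record of one tool execution and what it linked together."""
    tool: str
    arguments: Dict[str, str]
    inputs: List[str]
    resource: str
    user: str
    outputs: List[str]


def trace(output_file, records):
    """Return the runs that led to output_file, starting with the run that wrote it."""
    chain = []
    seen = set()
    frontier = [output_file]
    while frontier:
        name = frontier.pop()
        for record in records:
            if name in record.outputs and id(record) not in seen:
                seen.add(id(record))
                chain.append(record)
                # Follow this run's inputs to find the runs that produced them.
                frontier.extend(record.inputs)
    return chain


# Invented example: a cleaning step followed by a model fit on an HPC resource.
records = [
    RunRecord("clean", {"drop_na": "true"}, ["raw.csv"], "local", "alice", ["clean.csv"]),
    RunRecord("fit", {"model": "linear"}, ["clean.csv"], "hpc-cluster", "bob", ["model.json"]),
]

lineage = trace("model.json", records)
for run in lineage:
    print(f"{run.tool} by {run.user} on {run.resource}: {run.inputs} -> {run.outputs}")
```

Because every output carries a link back to its inputs, a provenance query reduces to a graph walk over the stored run records.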

Analytics: Define R data frames based on dataset files and data tables. DEEDS R-based analytics supports data filtering, merging, statistical computing, algorithm specification and computation, and visualization. Data across datasets, as well as external data, can be merged within analytics for comparison and analysis. New analytics features are introduced on an ongoing basis, according to the priorities of DEEDS research groups.
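
The filter-and-merge pattern mentioned above can be sketched as follows. DEEDS analytics is R-based; Python is used here only for illustration, and the `filter_rows` and `inner_join` helpers, the column names, and the values are all invented for the example rather than taken from DEEDS.

```python
def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def inner_join(left, right, key):
    """Merge two lists of row dictionaries on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


# Invented rows standing in for a dataset table and an external table.
measurements = [
    {"site": "A", "yield_t_ha": 3.2},
    {"site": "B", "yield_t_ha": 4.1},
    {"site": "C", "yield_t_ha": 2.7},
]
sites = [
    {"site": "A", "soil": "loam"},
    {"site": "B", "soil": "clay"},
]

high_yield = filter_rows(measurements, lambda row: row["yield_t_ha"] > 3.0)
merged = inner_join(high_yield, sites, "site")
for row in merged:
    print(row)
```

In R, the same steps would typically be expressed with data frame subsetting and `merge`.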

FAIR compliance: DEEDS guarantees adherence to the principles of Findable, Accessible, Interoperable, and Reusable data management and stewardship for your research. DEEDS offers fine-grained access control as needed for your data and computing tools.