figshare
SDAP_Poster.pdf (30.76 MB)

Apache Science Data Analytics Platform

Poster, posted on 2018-01-13, 04:28, authored by Thomas Huang

The Apache Science Data Analytics Platform (SDAP) (http://sdap.incubator.apache.org) is a suite of Big Data solutions created with the support of the NASA Advanced Information Systems Technology (AIST) and NASA Earth Science Data System (ESDS) programs. SDAP is led by the NASA Jet Propulsion Laboratory (JPL) in collaboration with George Mason University, the Center for Ocean-Atmospheric Prediction Studies (COAPS) at Florida State University, and the National Center for Atmospheric Research (NCAR). Its goal is to create a community-supported, integrated platform for big geospatial data analysis using cloud computing technology. SDAP currently includes NEXUS, OceanXtremes, DOMS, EDGE, and MUDROD, and it is being infused into the NASA Sea Level Change Portal (https://sealevel.nasa.gov) and the NASA Physical Oceanography Distributed Active Archive Center (PO.DAAC) (https://podaac.jpl.nasa.gov).

  • NEXUS provides a suite of on-the-fly data analysis services, such as time series generation, area-averaged maps, and climatological maps, that are essential to climate research. It can analyze data hundreds of times faster than traditional file-based analysis methods. All of NEXUS's analytic capabilities are exposed through a RESTful API that hides the complexity of horizontal scaling and map-reduce computing.
  • OceanXtremes is a data-intensive anomaly detection solution built on NEXUS. It provides cloud-based climatology generation and on-the-fly comparison of observational data against the climatology. OceanXtremes also lets researchers document, share, and re-create identified ocean anomalies.
  • The Distributed Oceanographic Matchup Service (DOMS) delivers a cloud-based solution for matching satellite observations with in situ measurements by integrating distributed in situ data hosted at JPL, NCAR, and COAPS. The project has standardized access to point-based in situ data using an open-source implementation of OpenSearch called the Extensible Data Gateway Environment (EDGE). DOMS translates each spatiotemporal query into in situ subset requests to the external data centers. Upon receiving the subsetted in situ data, DOMS executes its map-reduce matchup algorithm on the cloud. The matchup results are packaged as CSV or netCDF and visualized.
  • The Extensible Data Gateway Environment (EDGE) is an implementation of the OpenSearch specification (http://www.opensearch.org) that uses Apache Solr or Elasticsearch as its backend repository. Through this integration platform, data providers can expose their holdings via a standard OpenSearch RESTful API. The technology has already been infused into several production environments.
  • Mining and Utilizing Dataset Relevancy from Oceanographic Datasets (MUDROD) is a search analytics technology that continuously mines search logs from data portals. Using machine learning, MUDROD uncovers hidden relationships between ocean datasets and dynamically adjusts dataset ranking so that the most relevant datasets appear first.
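NEXUS delivers its analyses through a RESTful API, and this abstract does not describe how they are implemented internally. To illustrate what an "area-averaged" time series computes, here is a minimal pure-Python sketch. The function name `area_averaged_time_series`, the cosine-latitude area weighting, and the use of `None` for missing cells are illustrative assumptions, not NEXUS's actual code.

```python
import math
from typing import List, Optional, Tuple

# A 2-D grid indexed [lat][lon]; None marks a missing (e.g. cloud-masked) cell.
Grid = List[List[Optional[float]]]


def area_averaged_time_series(
    grids: List[Grid],
    lats: List[float],
    lons: List[float],
    bbox: Tuple[float, float, float, float],
) -> List[Optional[float]]:
    """Return one cosine-latitude-weighted spatial mean per time step.

    bbox is (min_lat, max_lat, min_lon, max_lon). Cells outside the box or
    with missing values are skipped; a time step with no valid cells yields None.
    Hypothetical helper for illustration only.
    """
    min_lat, max_lat, min_lon, max_lon = bbox
    series: List[Optional[float]] = []
    for grid in grids:
        total, weight_sum = 0.0, 0.0
        for i, lat in enumerate(lats):
            if not (min_lat <= lat <= max_lat):
                continue
            # On a regular lat/lon grid, cell area shrinks roughly with cos(latitude).
            weight = math.cos(math.radians(lat))
            for j, lon in enumerate(lons):
                value = grid[i][j]
                if value is None or not (min_lon <= lon <= max_lon):
                    continue
                total += weight * value
                weight_sum += weight
        series.append(total / weight_sum if weight_sum > 0 else None)
    return series
```

For example, with `lats = [0.0, 60.0]`, a row of 1.0 at the equator and a row of 3.0 at 60°N averages to 5/3 rather than the unweighted 2.0, because cells at 60°N carry half the weight of equatorial cells.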
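OceanXtremes is described as generating a climatology and comparing observations against it. The following single-location sketch illustrates that idea. The monthly-mean climatology, the population standard deviation, and the fixed `threshold` of 2 standard deviations are assumptions chosen for illustration, not documented OceanXtremes parameters. Both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Obs = Tuple[int, int, float]  # (year, month, value) at a single location
Climatology = Dict[int, Tuple[float, float]]  # month -> (mean, std. dev.)


def monthly_climatology(baseline: List[Obs]) -> Climatology:
    """Mean and population standard deviation of the baseline values per calendar month."""
    by_month: Dict[int, List[float]] = defaultdict(list)
    for _, month, value in baseline:
        by_month[month].append(value)
    return {m: (mean(vals), pstdev(vals)) for m, vals in by_month.items()}


def detect_anomalies(obs: List[Obs], clim: Climatology, threshold: float = 2.0) -> List[Obs]:
    """Return (year, month, anomaly) for values more than `threshold` std. devs. from the monthly mean."""
    flagged: List[Obs] = []
    for year, month, value in obs:
        if month not in clim:
            continue  # no climatology for this month, so nothing to compare against
        mu, sigma = clim[month]
        anomaly = value - mu  # departure from the climatological mean
        if sigma > 0 and abs(anomaly) > threshold * sigma:
            flagged.append((year, month, anomaly))
    return flagged
```

A gridded version would apply the same comparison cell by cell. Keeping the climatology separate from the detection step mirrors the abstract's split between climatology generation and on-the-fly comparison.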
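The abstract says DOMS runs a map-reduce matchup algorithm but does not specify its matching criteria. This brute-force sketch pairs satellite and in situ points using a spatial radius and a time window. The great-circle (haversine) distance on a spherical Earth, the tolerance parameters, and the `Point` record are all assumptions. A distributed implementation would partition this O(n·m) loop across workers rather than run it in one process.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Point:
    lat: float
    lon: float
    time_s: float  # seconds since an arbitrary common epoch
    value: float


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so floating-point error cannot push asin out of its domain.
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def matchup(
    satellite: List[Point], in_situ: List[Point], max_km: float, max_seconds: float
) -> List[Tuple[Point, Point]]:
    """Pair each satellite point with every in situ point inside both tolerances."""
    pairs: List[Tuple[Point, Point]] = []
    for s in satellite:
        for p in in_situ:
            # Cheap time test first; distance is only computed for time-compatible pairs.
            if abs(s.time_s - p.time_s) <= max_seconds and haversine_km(s, p) <= max_km:
                pairs.append((s, p))
    return pairs
```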
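EDGE exposes holdings through OpenSearch. OpenSearch description documents advertise URL templates with `{name}` parameters: a trailing `?` marks an optional parameter, and extensions such as Geo and Time add namespaced names like `geo:box` and `time:start`. The sketch below shows how a client might fill such a template. The function name and the `example.org` endpoint in the usage check are hypothetical and do not describe EDGE's actual interface.

```python
import re
from typing import Dict
from urllib.parse import quote

# Matches {name} or {name?}; names may carry a namespace prefix such as geo: or time:.
_PARAM = re.compile(r"\{([A-Za-z0-9_:]+)(\??)\}")


def fill_opensearch_template(template: str, values: Dict[str, str]) -> str:
    """Substitute OpenSearch-style template parameters (hypothetical helper).

    Supplied values are percent-encoded. Optional parameters that are absent
    become empty strings; a missing required parameter raises KeyError.
    """
    def substitute(match: "re.Match[str]") -> str:
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    return _PARAM.sub(substitute, template)
```

For instance, filling `https://example.org/opensearch?q={searchTerms}&bbox={geo:box?}&start={time:start?}` with a search term and a bounding box leaves the unused optional `time:start` slot empty.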
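The abstract does not describe MUDROD's machine-learning pipeline in enough detail to reproduce it. Its underlying intuition, that datasets accessed in the same portal session are likely related, can be illustrated with a plain co-occurrence count. This is a deliberately simple stand-in, not MUDROD's method. The session sets and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


def co_occurrence(sessions: Iterable[Set[str]]) -> Dict[str, Counter]:
    """Count how often each pair of datasets appears in the same user session."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        # Sorting gives each unordered pair a single, deterministic ordering.
        for a, b in combinations(sorted(session), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def related(dataset: str, counts: Dict[str, Counter], k: int = 3) -> List[str]:
    """Top-k datasets most often co-accessed with `dataset`, ties broken by name."""
    ranked = sorted(counts.get(dataset, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]
```

A production system would normalize these raw counts, for example by dataset popularity, and combine them with other signals. Raw counts alone tend to favor frequently accessed datasets.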
