Potential for Big Data Technologies to Radically Change the SWE of HPC Visualization and Analysis Tools

presentation
posted on 2017-03-02, 16:05 authored by Mark C. Miller
Forbes Magazine estimated the Big Data Analytics market at $125 billion in 2015. That is 25x the entire DOE 2015 budget for HPC. This market uses Hadoop Map-Reduce and Apache Spark for parallel processing of vast amounts of textual data. The parallel programming paradigm is highly simplified relative to its message-passing-based HPC counterpart.

Some HPC researchers have considered Big Data technologies for HPC workflows. For visualization tools, however, too much of the focus has been on rendering. While scalable, parallel surface and volume rendering is important, it does not represent the majority of recent software engineering investment in tools such as VisIt or ParaView. More and more, these tools represent large investments in scalable, parallel data processing algorithms involving end-to-end advancements, from parallel I/O to task management to computed results, which are often numerical metrics instead of rendered images. In addition, new algorithms can involve varying degrees of machine learning, a cornerstone of the Big Data toolbox.

We will give an overview of research into applying Big Data technologies to HPC visualization and analysis, with some emphasis on approaches studied at LLNL. We will look in depth at how best to represent HPC scientific mesh and field data for Big Data tools, and outline the potential challenges and rewards of a new parallel programming model for writing HPC data analysis algorithms using PySpark.
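
As a rough illustration of the programming style referred to above, the following plain-Python sketch computes a per-material, volume-weighted mean of a zone-centered field using only a map step and a reduce-by-key step. It is not taken from the presentation: the record layout (`zone_id`, `material`, `volume`, `temperature`), the sample values, and the helper names `map_zone`, `combine`, and `reduce_by_key` are all assumptions made for the example, and it runs serially rather than on Spark.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical flattened mesh: one record per zone (cell).
# Layout assumed for illustration: (zone_id, material, volume, temperature)
zones = [
    (0, "steel", 1.0, 300.0),
    (1, "steel", 3.0, 340.0),
    (2, "air", 2.0, 290.0),
    (3, "air", 2.0, 310.0),
]


def map_zone(zone):
    """Map step: emit (material, (volume * temperature, volume)) for one zone."""
    _, material, volume, temperature = zone
    return material, (volume * temperature, volume)


def combine(a, b):
    """Reduce step: associative, commutative sum of two partial results."""
    return (a[0] + b[0], a[1] + b[1])


def reduce_by_key(pairs, func):
    """Serial stand-in for a distributed reduce-by-key (hypothetical helper)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def volume_weighted_mean_temperature(zone_records):
    """Per-material mean temperature, weighted by zone volume."""
    sums = reduce_by_key(map(map_zone, zone_records), combine)
    return {mat: weighted / volume for mat, (weighted, volume) in sums.items()}


result = volume_weighted_mean_temperature(zones)
print(result)
```

The algorithm author writes only the per-record function and the pairwise combiner; there is no explicit communication or domain decomposition in the user code. In PySpark the same shape would presumably be expressed with the RDD `map` and `reduceByKey` operations, though how the presentation actually lays out mesh and field data is not stated in this abstract.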

Note: this presentation is offered as a PowerPoint file rather than a PDF because it uses animations, which can result in some content being obscured when converted to PDF format. It is best viewed as a PowerPoint slide show.

Presented at SIAM CSE17 Minisymposium: Software Productivity and Sustainability for CSE and Data Science
