figshare

SPADE: Scalable Performance and Accuracy analysis for Distributed and Extreme-scale systems

poster
posted on 2024-08-01, 22:09 authored by Heike Jagode, Shirley V. Moore, Vincent Weaver, Anthony Danalis, Christoph Lauter

The SPADE project focuses on advancing monitoring, optimization, evaluation, and decision-making capabilities for extreme-scale systems. In Year 1, the team is targeting several advanced monitoring capabilities. One is support for AMD's new ROCprofiler-SDK, which enables analysis of hardware performance counters on AMD APUs such as the MI300A, the processor that will power El Capitan. The SPADE team is also extending the PAPI library to support heterogeneous CPUs, that is, chips that combine high-performance and energy-efficient cores. This will let users monitor the performance of both core types simultaneously, so that the system can be tuned to switch between the different cores more effectively. Another initiative is cyPAPI, a Python interface to PAPI intended for the frameworks and tools being developed for Python in HPC environments. The team is exploring beta versions of cyPAPI with PyTorch to advance decision-making capabilities for mixed-precision tuning of machine learning applications.
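The poster does not say how mixed-precision decisions are to be made. As a purely illustrative toy, and not the SPADE or cyPAPI method, the Python sketch below assumes a simple accuracy rule: run a computation (here a dot product) at each candidate precision and pick the cheapest precision whose result stays within a relative tolerance of a float64 reference. Reduced precision is emulated with the standard `struct` module (`'e'` = IEEE binary16, `'f'` = binary32); the function names and the tolerance rule are hypothetical.

```python
import math
import struct
from typing import Sequence

# Candidate precisions, cheapest first, as struct format codes:
# 'e' = IEEE binary16, 'f' = binary32, 'd' = binary64.
PRECISIONS = [("float16", "e"), ("float32", "f"), ("float64", "d")]


def round_to(x: float, fmt: str) -> float:
    """Round x to the IEEE format `fmt`; values too large for it become +/-inf."""
    try:
        return struct.unpack(fmt, struct.pack(fmt, x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def dot(xs: Sequence[float], ys: Sequence[float], fmt: str) -> float:
    """Dot product with inputs, products, and partial sums rounded to `fmt`."""
    acc = 0.0
    for x, y in zip(xs, ys):
        product = round_to(round_to(x, fmt) * round_to(y, fmt), fmt)
        acc = round_to(acc + product, fmt)
    return acc


def choose_precision(xs: Sequence[float], ys: Sequence[float], rel_tol: float) -> str:
    """Hypothetical rule: cheapest precision within rel_tol of a float64 reference."""
    reference = dot(xs, ys, "d")
    for name, fmt in PRECISIONS:
        approx = dot(xs, ys, fmt)
        if math.isfinite(approx) and abs(approx - reference) <= rel_tol * abs(reference):
            return name
    return "float64"  # fallback; the float64 pass above normally matches already
```

For example, `choose_precision([300.0], [300.0], 0.0)` returns `"float32"`: the product 90000 exceeds the binary16 maximum of 65504, so the float16 pass overflows and is rejected. A real tuner would also weigh measured cost (for instance, hardware-counter data) against accuracy; this sketch checks accuracy only.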

Funding

Collaborative Research: Frameworks: Scalable Performance and Accuracy analysis for Distributed and Extreme-scale systems (SPADE)

Directorate for Computer & Information Science & Engineering
