si2-cooperman.pdf (249.55 kB)

NSCI: SI2-SSE: An Extensible Model to Support Scalable Checkpoint-Restart for DMTCP across Multiple Disciplines

journal contribution
posted on 2018-04-24, 01:28, authored by Gene Cooperman
DMTCP (Distributed MultiThreaded CheckPointing) is a widely used package for transparent checkpoint-restart. Checkpoint-restart saves the state of a running process to disk and later restarts the process where it left off, possibly on a different computer. DMTCP has grown from a monolithic package into a highly adaptable one that supports HPC (e.g., MPI), GPUs, and high-performance networks, as well as applications in cyber-security, EDA, science, and engineering.
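
The save-then-resume cycle described above can be illustrated with a minimal Python sketch. This is not DMTCP and does not use any DMTCP interface: DMTCP checkpoints unmodified programs transparently, whereas this toy example saves its own state explicitly at the application level. The function name `run`, the `stop_after` parameter, and the checkpoint file layout are all hypothetical, chosen only for illustration.

```python
import os
import pickle


def run(ckpt_path, total_steps, stop_after=None):
    """Sum the integers 0..total_steps-1, saving state to disk after each step.

    Hypothetical illustration only. If stop_after is given, the call returns
    None after that many steps, simulating an interruption.
    """
    # Restart: resume from the saved state if a checkpoint file exists.
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "total": 0}

    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # simulated interruption; the checkpoint stays on disk
        state["total"] += state["step"]
        state["step"] += 1
        done_this_run += 1

        # Checkpoint: write to a temporary file, then rename it into place so
        # an interruption never leaves a half-written checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, ckpt_path)

    return state["total"]
```

Calling `run(path, 10, stop_after=4)` and then `run(path, 10)` with the same path completes the computation in two stages; the second call picks up at step 4 rather than starting over. The checkpoint file could equally be copied to another machine before the second call, which mirrors the "possibly on a different computer" case.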

Funding

National Science Foundation award OAC-1740218
