TJ
Publications
- Linux OS Jitter Measurements at Large Node Counts using a BlueGene/L
- Purple L1 Milestone Review Panel - MPI
- Design and Implementation of a Scalable Membership Service for Supercomputer Resiliency-Aware Runtime
- Analyzing the Interplay of Failures and Workload on a Leadership-Class Supercomputer
- A uGNI-Based Asynchronous Message-driven Runtime System for Cray Supercomputers with Gemini Interconnect
- Time Distribution Alternatives for the Smart Grid Workshop Report
- A Clock Synchronization Strategy for Minimizing Clock Variance at Runtime in High-End Computing Environments
- MVAPICH-Aptus: Scalable High-Performance Multi-Transport MPI over InfiniBand
- System-Level Support for Composition of Applications
- UNITY: Unified Memory and File Space
- Advanced Electrical Power System Sensors Workshop Report
- Quantifying Scheduling Challenges for Exascale System Software
- Digital Object Identifiers For OLCF
- Reducing Connection Memory Requirements of MPI for InfiniBand Clusters: A Message Coalescing Approach
- Mapping Dense LU Factorization on Multicore Supercomputer Nodes
- Filtering log data: Finding the Needles in the Haystack
- Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications
- Time Synchronization in the Electric Power System
- Optimizing Fine-grained Communication in a Biomolecular Simulation Application on Cray XK6
- Accurate Fault Prediction of BlueGene/P RAS Logs Via Geometric Reduction
- HPC-Colony: Services and Interfaces for Very Large Systems
- HPC System Call Usage Trends
- Providing Runtime Clock Synchronization With Minimal Node-to-Node Time Deviation on XT4s and XT5s
- TALC: A Simple C Language Extension For Improved Performance and Code Maintainability
- MPI PERUSE: An MPI Extension for Revealing Unexposed Implementation Information
- Scalable infrastructure to support supercomputer resiliency-aware applications and load balancing
- An Alternative Timing and Synchronization Approach for Situational Awareness and Predictive Analytics
- Evaluating the effectiveness of program data features for guiding memory management
- IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems
- Understanding failures through the lifetime of a top-level supercomputer
- An evaluation of the state of time synchronization on leadership class supercomputers
- Flexible and Effective Object Tiering for Heterogeneous Memory Systems
- High Performance Computing
- Large-scale distributed deep learning: A study of mechanisms and trade-offs with PyTorch
- Optimizing I/O forwarding techniques for extreme-scale event tracing
- Recent Advances in Precision Clock Synchronization Protocols for Power Grid Control Systems
- Online Application Guidance for Heterogeneous Memory Systems
- Large-Scale Distributed Deep Learning: A Study of Mechanisms and Trade-Offs with PyTorch
- Clock synchronization in high-end computing environments: a strategy for minimizing clock variance at runtime
- Linux kernel co-scheduling and bulk synchronous parallelism
- Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration
- Towards a Model to Estimate the Reliability of Large-Scale Hybrid Supercomputers
- 3-Dimensional root cause diagnosis via co-analysis
- Analyzing a Five-Year Failure Record of a Leadership-Class Supercomputer
- Portable application guidance for complex memory systems
- Autonomy Loops for Monitoring, Operational Data Analytics, Feedback, and Response in HPC Operations
- Enabling event tracing at leadership-class scale through I/O forwarding middleware
- Performance Potential of Mixed Data Management Modes for Heterogeneous Memory Systems
- The ECP SICM project: Managing complex memory hierarchies for exascale applications
- Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System
- Impacts of Operating Systems on the Scalability of Applications
- Performance of an MPI-IO implementation using third-party transfer
- Sizing and Tuning GPFS
- Performance of the IBM General Parallel File System
- An MPI-IO Interface to HPSS
- Parallelizing Monte Carlo with PMC
Co-workers & collaborators
- Sameer Shende
- John Mellor-Crummey
- Michael A. Lang
- Greg Eisenhauer
- Michael Brim
- Geoffroy Vallee