InfiniBand Monitoring Methods at Los Alamos National Laboratory
Los Alamos National Laboratory uses InfiniBand monitoring methods to track fabric errors, optimize links, and analyze performance issues in clusters ranging from 8 to 1600 nodes. Developed by Susan Coulter, the IBMon2 suite of scripts identifies hardware errors and performance metrics, sending a…
Dynamic Time-Variant Connection Management for PGAS Models on InfiniBand
Scalable communication data structures are crucial for runtime systems. This work focuses on efficient connection management for PGAS models on InfiniBand systems, exploring on-demand process creation, persistent connections, use cases, problem statements, and design choices for disconnection…
Efficient Collective Operations using Remote Memory Operations on VIA-Based Clusters
This article discusses using Remote Direct Memory Access (RDMA) to optimize collective communication in parallel applications. It covers communication characteristics, RDMA models, the benefits of RDMA, and design issues related to buffer registration, data validity, and buffer reuse. The potential…
Asynchronous Zero-copy Communication in Sockets Direct Protocol over InfiniBand
This study explores the implementation of Asynchronous Zero-copy Communication for Synchronous Sockets in the Sockets Direct Protocol over InfiniBand. It discusses InfiniBand's high performance, low latency, and advanced features, as well as the Sockets Direct Protocol (SDP) as a high-performance alternative…