Remote Direct Memory Access (RDMA) is a high-performance networking technology that lets one host read or write the memory of another host directly, without involving either host's operating system kernel or CPU on the data path. The result is very low latency, high throughput, and low CPU overhead, which makes RDMA important for AI/ML training, HPC, and fast storage. It is most often deployed over InfiniBand or RoCE.
RoCE (RDMA over Converged Ethernet) and InfiniBand (IB) are both low-latency, high-throughput networking technologies that use RDMA to bypass the operating system kernel on the data path, and both are used primarily in AI and HPC clusters. InfiniBand is natively lossless, has a simpler purpose-built architecture, and generally offers the better performance of the two. RoCE runs on existing Ethernet infrastructure, which makes it the more cost-effective way to scale.