Exploring Modern Datacenter Networking Technologies
Research in datacenter networking on the Edge-Queued Datagram Service (EQDS): moving queuing out of the network and into the end hosts to improve TCP and RDMA performance, congestion control, load balancing, and traffic isolation. Joint work between University Politehnica of Bucharest, Correct Networks, Technion, Nvidia, Intel, and UCL.
- Datacenter Networking
- Edge Technologies
- TCP Congestion Control
- Network Abstractions
- Research Collaboration
Presentation Transcript
An edge-queued datagram service for all datacenter traffic
Costin Raiciu, Correct Networks and University Politehnica of Bucharest. Joint work with Vladimir Olteanu (UPB & CNW), Haggai Eran (Technion & Nvidia), Dragos Dumitrescu (UPB & CNW), Adrian Popa, Cristi Baciu (CNW), Mark Silberstein (Technion), Georgios Nikolaidis (Intel), Mark Handley (UCL & CNW)
Datacenter Networks: Internet abstractions
- Reachability: IP, packet switching
- Sharing: TCP congestion control (bad sharing, incast, high latency)
- Load balancing: per-flow ECMP (flow collisions)
[diagram: ToR switches carrying TCP and RDMA traffic; a flow collision limits TCP to 24Gbps]
Datacenter networking today
- APIs: Message API, Socket API, Verbs API
- Host/NIC stacks: MP-RDMA, Swift, RoCEv2, IRN, Homa, NDP, DCTCP, MPTCP, TCP/IP, DCQCN, Timely, HPCC, 1RMA
- Network mechanisms: flow-level ECMP, packet-level ECMP, packet trimming, priority queues, ECN, PFC, shared buffers, INT
EQDS: an edge-queued datagram service
- APIs: Message API, Socket API, Verbs API
- Host stacks above: RoCEv2, Snap, eRPC, TCP/IP, DCQCN, 1RMA, NDP, Aeolus, Swift, Homa
- The Edge-Queued Datagram Service sits between the host stacks and the network
- Network mechanisms below: ECN, shared buffers, PFC, INT, priority queues, differential drops, configurable ECMP hashing, packet trimming, QCN
Today's networks
[diagram: leaf-spine fabric of ToR switches]
Today's networks: sender congestion control depends on queue build-up inside the network (loss / delay).
[diagram: two TCP senders share a queue at the receiver's ToR]
Edge Queued Datagram Service: an EQDS layer is inserted at every host interface, below TCP.
[diagram: the same topology, with EQDS under each TCP endpoint]
EQDS concept: a receiver-driven control loop. Each packet the receiver accepts grants the sender one credit (+1) to send the next. EQDS moves queuing to the edge. Importantly: TCP observes the same network behavior.
[diagram: two TCP senders, each behind an EQ IF, paced by credits from the receiver's EQ IF]
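The receiver-driven loop can be pictured as a credit scheme. Below is a minimal C sketch of the pattern; all names (`send_packet`, `pull_credit`, `INIT_CREDITS`) are hypothetical and not from the EQDS code. The sender spends one credit per packet, and each delivery returns one credit (the "+1" in the diagram), so the sending rate is clocked by the receiver.

```c
/* Minimal sketch of a receiver-driven credit loop. Illustrative only:
 * names and the initial window are assumptions, not the EQDS protocol. */
#include <stdio.h>

#define INIT_CREDITS 8            /* roughly one bandwidth-delay product */

struct sender { int credits; int next_seq; };

/* Sender: may transmit only while it holds credit. */
static int send_packet(struct sender *s)
{
    if (s->credits == 0)
        return -1;                /* stall until the next PULL arrives */
    s->credits--;
    return s->next_seq++;
}

/* Receiver: deliver one packet and grant one credit back ("+1"). */
static int pull_credit(int seq)
{
    printf("delivered seq=%d, granting +1 credit\n", seq);
    return 1;
}

int main(void)
{
    struct sender s = { .credits = INIT_CREDITS, .next_seq = 0 };

    for (int i = 0; i < 16; i++) {
        int seq = send_packet(&s);
        if (seq < 0)
            break;                /* no credit: the queue stays at the edge */
        s.credits += pull_credit(seq);
    }
    return 0;
}
```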
EQDS concept: reordering and retransmissions over a lossy substrate. Packets are sprayed with per-packet ECMP; the receiver ACKs/NACKs every packet and holds out-of-order arrivals in a reorder buffer, and the sender retransmits lost packets.
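A sketch of the receiver-side logic, assuming a hypothetical `on_packet` entry point: in-order packets are delivered, out-of-order ones are parked in the reorder buffer, and a trimmed header (payload cut by a switch) triggers an immediate NACK so the sender retransmits just that packet.

```c
/* Sketch of reorder/retransmit handling under per-packet ECMP.
 * Names and the buffer size are illustrative assumptions. */
#include <stdbool.h>
#include <stdio.h>

#define BUF 64

static bool present[BUF];     /* reorder buffer occupancy, by seq */
static int  next_expected;    /* next sequence to deliver in order */

static void deliver(int seq)   { printf("deliver %d\n", seq); }
static void send_ack(int seq)  { printf("ACK  %d\n", seq); }
static void send_nack(int seq) { printf("NACK %d (retransmit)\n", seq); }

void on_packet(int seq, bool trimmed)
{
    if (trimmed) {            /* payload was cut by the switch */
        send_nack(seq);
        return;
    }
    send_ack(seq);
    present[seq % BUF] = true;
    /* drain everything that is now contiguous */
    while (present[next_expected % BUF]) {
        present[next_expected % BUF] = false;
        deliver(next_expected++);
    }
}

int main(void)
{
    on_packet(0, false);
    on_packet(2, false);      /* reordered: parked in the buffer   */
    on_packet(1, true);       /* trimmed: NACK asks for a resend   */
    on_packet(1, false);      /* retransmission fills the gap      */
    return 0;
}
```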
EQDS concept: queuing disciplines. Each traffic class gets its own discipline at the edge queue: droptail for TCP, probabilistic ECN marking for RDMA. Fair share: the receiver decides how to split the bandwidth.
[diagram: a TCP flow at 12Gbps and an RDMA flow compete through per-class edge queues in the receiver-driven control loop]
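A minimal sketch of the per-class enqueue decision, assuming illustrative thresholds and names (nothing here is taken from the EQDS implementation): both classes are droptail at the limit, and the RDMA class additionally ECN-marks with a probability that grows with queue depth.

```c
/* Sketch of per-class queuing disciplines at an edge queue:
 * droptail for TCP, probabilistic ECN marking for RDMA.
 * Thresholds and names are assumptions for illustration. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

enum qdisc { QDISC_DROPTAIL, QDISC_PROB_ECN };

struct edge_queue {
    enum qdisc disc;
    int        len, limit;     /* current and max depth, in packets */
    int        ecn_threshold;  /* start marking above this depth    */
};

/* Returns true if enqueued; *mark_ecn is set when the packet should
 * carry a CE mark back toward the RDMA sender. */
bool enqueue(struct edge_queue *q, bool *mark_ecn)
{
    *mark_ecn = false;
    if (q->len >= q->limit)
        return false;                      /* droptail for both classes */

    if (q->disc == QDISC_PROB_ECN && q->len > q->ecn_threshold) {
        /* marking probability grows linearly with queue depth */
        int span = q->limit - q->ecn_threshold;
        if (rand() % span < q->len - q->ecn_threshold)
            *mark_ecn = true;
    }
    q->len++;
    return true;
}

int main(void)
{
    struct edge_queue q = { .disc = QDISC_PROB_ECN,
                            .limit = 100, .ecn_threshold = 20 };
    bool mark;
    int marked = 0;
    while (enqueue(&q, &mark))
        marked += mark;
    printf("enqueued %d packets, %d ECN-marked\n", q.len, marked);
    return 0;
}
```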
EQDS building blocks
1. Receiver-driven network backend
2. An efficient tunnel protocol
3. A host API
NDP backend: small buffers + packet trimming
- Trimming support available: Intel Tofino (full support, using meters, deflect-on-drop and local feedback); Nvidia Spectrum-2 and Broadcom Trident 4 (deflect-on-drop).
- Buffers for trimming networks: 12 packet buffers per port are sufficient for maximum throughput.
[diagram: TCP and RDMA edge queues feeding the receiver's pull queue]
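To make the mechanism concrete, here is a hedged sketch of the switch-side trimming decision: when the port buffer is full, the payload is cut and only the header is forwarded (on a priority queue), so the receiver learns of the loss immediately and can NACK. Only the 12-packet figure comes from the slide; the rest is an illustrative assumption.

```c
/* Sketch of packet trimming at a switch output port: trim instead of
 * drop when the queue is full. Illustrative, not switch firmware. */
#include <stdbool.h>
#include <stdio.h>

#define PORT_BUF_PKTS 12   /* per-port buffer sufficient for max throughput */

struct pkt { int seq; int payload_len; bool trimmed; };

/* Returns the packet to forward; trims instead of dropping when full. */
struct pkt forward(struct pkt p, int *queue_len)
{
    if (*queue_len >= PORT_BUF_PKTS) {
        p.payload_len = 0;          /* cut the payload, keep the header  */
        p.trimmed = true;           /* receiver will NACK this sequence  */
        return p;                   /* forwarded on the priority queue   */
    }
    (*queue_len)++;
    return p;
}

int main(void)
{
    int qlen = PORT_BUF_PKTS;       /* queue already full */
    struct pkt p = forward((struct pkt){ .seq = 7, .payload_len = 1500 },
                           &qlen);
    printf("seq=%d trimmed=%d len=%d\n", p.seq, p.trimmed, p.payload_len);
    return 0;
}
```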
RTS backend (based on 1RMA): requires no switch support.
[diagram: senders issue RTS control packets; the receiver's pull queue schedules the transfers]
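Without trimming, the backend needs another way for the receiver to learn about and schedule transfers. Below is a hedged sketch, loosely in the spirit of 1RMA-style solicitation: each sender announces a message with a small request-to-send (RTS) control packet, and the receiver's pull queue grants pulls one packet at a time. The message flow and names are illustrative assumptions, not the exact EQDS wire protocol.

```c
/* Hedged sketch of an RTS exchange: senders announce message sizes,
 * the receiver serves its pull queue round-robin so bandwidth is
 * shared at the receiver. Illustrative only. */
#include <stdio.h>

#define MTU_PKTS(bytes) (((bytes) + 1499) / 1500)

struct rts { int sender_id; int pkts_left; };

int main(void)
{
    struct rts pullq[2] = {
        { .sender_id = 1, .pkts_left = MTU_PKTS(6000) },
        { .sender_id = 4, .pkts_left = MTU_PKTS(3000) },
    };
    int active = 2;

    while (active > 0) {
        for (int i = 0; i < 2; i++) {
            if (pullq[i].pkts_left == 0)
                continue;
            /* PULL to sender i; the sender answers with one data packet */
            printf("pull -> sender %d (remaining %d)\n",
                   pullq[i].sender_id, --pullq[i].pkts_left);
            if (pullq[i].pkts_left == 0)
                active--;
        }
    }
    return 0;
}
```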
EQDS host API and tunnel protocol
[diagram: Hosts A, B and C run legacy, storage and VM apps over TCP/IP, DPDK and RDMA; each host attaches to EQDS through an EQIF, and EQDS tunnels connect the hosts]
EQDS end-host implementations
- DPDK implementation: polls host and NIC rings, run to completion; highest performance.
- Kernel implementation (with TSO, GRO): 27Gbps with 1.5KB packets, 45Gbps with 9KB packets.
- Added latency: 4-10us depending on setup.
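As a rough illustration of the run-to-completion structure, here is a skeletal DPDK polling loop. It sketches the pattern only, not the EQDS source: port and mempool setup are omitted, and `process_eqds` is a hypothetical stand-in for the tunnel work. Packets are pulled from the NIC ring in bursts, processed inline on the same core, and transmitted without interrupts or context switches.

```c
/* Skeleton of a DPDK run-to-completion loop: one core polls the RX
 * ring, processes each burst inline, and pushes to the TX ring.
 * Device setup is omitted; this is a sketch of the pattern. */
#include <stdio.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

static void process_eqds(struct rte_mbuf *m) { (void)m; /* encap/decap */ }

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0) {
        fprintf(stderr, "EAL init failed\n");
        return 1;
    }
    /* ... mempool creation and rte_eth_dev configuration omitted ... */

    struct rte_mbuf *pkts[BURST];
    for (;;) {                       /* run to completion: never sleep */
        uint16_t n = rte_eth_rx_burst(0 /*port*/, 0 /*queue*/, pkts, BURST);
        for (uint16_t i = 0; i < n; i++)
            process_eqds(pkts[i]);   /* same core, no context switch */
        uint16_t sent = rte_eth_tx_burst(0, 0, pkts, n);
        for (uint16_t i = sent; i < n; i++)
            rte_pktmbuf_free(pkts[i]);   /* free what didn't fit */
    }
}
```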
Deploying EQDS: unmodified software
- Unmodified apps, containers and VMs, over IP or RDMA.
- EQDS runs as a kernel module or as a DPDK process alongside ovs-dpdk, on the host NIC or offloaded to a SmartNIC.
- Tested on Mellanox BlueField-2 and Broadcom Stingray SmartNICs.
Evaluation
Disaggregated storage: reducing flow collisions
- SPDK in a leaf-spine testbed; clients use SPDK perf to stress the targets.
- Each client connects to a random target and issues random 64KB READ/WRITE ops.
- Queue depth: the number of outstanding ops.
[diagram: clients C1-C3 and storage targets T1-T3 across the fabric]
Disaggregated storage: reducing flow collisions
[results figure: throughput at queue depths 128 and 512]
Incast: reducing latency at scale. 850-to-1 incast on Amazon m5.8xlarge VMs (10Gbps throughput). No trimming support is available in EC2, so we use the RTS backend of EQDS.
Sharing in clouds: what happens if different VMs use conflicting congestion controllers?
[results figure: throughput of competing flows, including Cubic]
Edge-Queued Datagram Service (EQDS): move buffering out of the network, into the end hosts.
- The edge queue abstraction enables host stack and network innovation.
- Fair share for incompatible host stacks.
- Improves TCP and RDMA performance transparently.
- EQDS implementation available as open source soon.
[diagram: receiver-driven control loop between TCP and RDMA hosts behind EQ IFs]
Related work
- VXLAN, GRE: widely used for network isolation, not performance isolation.
- Virtualized congestion control (AC/DC, VCC [SIGCOMM 2016]): enforces DCTCP-style CC in the hypervisor and keeps per-flow state; does not handle multipath / reordering.
- OnRamp [NSDI 2021]: implements a delay-based CC algorithm in the tunnel, buffering packets at the source when latency exceeds a target; no load balancing.
- Resource isolation (Seawall, FairCloud, ElasticSwitch): rate limiting at end hosts based on dynamically computed weights; does not tackle incast or load balancing.
See the paper for how EQDS handles:
- Asymmetric networks
- EQDS alongside legacy traffic
- An oversubscribed network core
- Outcast issues (RTS congestion)
EQDS: congestion tunnelling in the host.
Why would it work? The EQDS control loop is much faster than the control loops of the upper-layer protocols (200ms TCP timeouts, loss-based detection for RDMA).
Is it feasible to implement? Tunneling is already widespread: VXLAN already carries much of the traffic, and EQDS adds only a little more work at the same level.
EQDS tunnel protocol: basic approach
- Unidirectional tunnels: one side sends data, the other sends control messages. The two tunnels can be connected to piggyback the reverse tunnel's control information if needed.
- EQDS adds a 12B header to each data packet.
- Packet formats:
  ETH | IP | UDP | EQDS | [ IP | TCP ... ]
  ETH | IP | UDP | VXLAN-GPE | EQDS | [ ETH ... ]
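The 12B figure can be made concrete with a header layout sketch. Only the 12-byte size comes from the slide; the field names and widths below are assumptions for illustration, not the real wire format.

```c
/* Sketch of a 12-byte EQDS data-packet header matching the size quoted
 * on the slide. Field layout is an illustrative assumption. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct eqds_hdr {
    uint8_t  type;        /* DATA, PULL, ACK, NACK, RTS, ...       */
    uint8_t  flags;
    uint16_t tunnel_id;   /* identifies the unidirectional tunnel  */
    uint32_t seq;         /* per-tunnel packet sequence number     */
    uint32_t pull_target; /* piggybacked credit/pull information   */
} __attribute__((packed));

int main(void)
{
    /* encapsulation: ETH | IP | UDP | EQDS | [ inner IP | TCP ... ] */
    assert(sizeof(struct eqds_hdr) == 12);
    printf("EQDS header: %zu bytes\n", sizeof(struct eqds_hdr));
    return 0;
}
```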
Simulations at scale: 1024-node FatTree, 10Gbps links.
- Permutation traffic matrix, long flows.
- 1023-to-1 incast, 50 packets per flow.