Presto: Edge-based Load Balancing for Fast Datacenter Networks

Datacenter networks face congestion that hurts both throughput-sensitive elephant flows and latency-sensitive mice flows. This presentation reviews the problem and existing traffic load balancing schemes, then proposes Presto, an edge-based scheme that proactively balances traffic at fine granularity without requiring hardware or transport changes.



Presentation Transcript


  1. Presto: Edge-based Load Balancing for Fast Datacenter Networks
     Keqiang He, Eric Rozner, Kanak Agarwal, Wes Felter, John Carter, Aditya Akella

  2. Background
     Datacenter networks support a wide variety of traffic:
     - Elephants (throughput sensitive): data ingestion, VM migration, backups
     - Mice (latency sensitive): search, gaming, web, RPCs

  3. The Problem
     Network congestion: flows of both types suffer.
     Example:
     - Elephant throughput is cut in half
     - TCP RTT is increased by 100x per hop [Rasley, SIGCOMM '14]
     - SLAs are violated; revenue is impacted

  4. Traffic Load Balancing Schemes
     Scheme            | Hardware changes | Transport changes | Granularity    | Pro-/reactive

  5. Traffic Load Balancing Schemes
     Scheme            | Hardware changes | Transport changes | Granularity    | Pro-/reactive
     ECMP              | No               | No                | Coarse-grained | Proactive
     Proactive: try to avoid network congestion in the first place

  6. Traffic Load Balancing Schemes
     Scheme            | Hardware changes | Transport changes | Granularity    | Pro-/reactive
     ECMP              | No               | No                | Coarse-grained | Proactive
     Centralized       | No               | No                | Coarse-grained | Reactive (control loop)
     Reactive: mitigate congestion after it already happens

  7. Traffic Load Balancing Schemes
     Scheme            | Hardware changes | Transport changes | Granularity    | Pro-/reactive
     ECMP              | No               | No                | Coarse-grained | Proactive
     Centralized       | No               | No                | Coarse-grained | Reactive (control loop)
     MPTCP             | No               | Yes               | Fine-grained   | Reactive

  8. Traffic Load Balancing Schemes
     Scheme            | Hardware changes | Transport changes | Granularity    | Pro-/reactive
     ECMP              | No               | No                | Coarse-grained | Proactive
     Centralized       | No               | No                | Coarse-grained | Reactive (control loop)
     MPTCP             | No               | Yes               | Fine-grained   | Reactive
     CONGA/Juniper VCF | Yes              | No                | Fine-grained   | Proactive

  9. Traffic Load Balancing Schemes
     Scheme            | Hardware changes | Transport changes | Granularity    | Pro-/reactive
     ECMP              | No               | No                | Coarse-grained | Proactive
     Centralized       | No               | No                | Coarse-grained | Reactive (control loop)
     MPTCP             | No               | Yes               | Fine-grained   | Reactive
     CONGA/Juniper VCF | Yes              | No                | Fine-grained   | Proactive
     Presto            | No               | No                | Fine-grained   | Proactive
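
To make "coarse-grained" concrete: ECMP hashes a flow's five-tuple once, and every packet of the flow then follows the same path, so two elephants that hash to the same path collide for their entire lifetime. A minimal sketch of the idea (the hash function and path count here are illustrative, not what any particular switch implements):

```c
#include <stdint.h>
#include <stdio.h>

struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* Toy ECMP hash: real switches use e.g. CRC variants. Because the hash
 * covers only the five-tuple, the whole flow pins to one path. */
uint32_t ecmp_path(const struct five_tuple *t, uint32_t num_paths)
{
    uint32_t h = t->src_ip ^ t->dst_ip ^ t->proto;
    h ^= ((uint32_t)t->src_port << 16) | t->dst_port;
    h *= 2654435761u;            /* multiplicative mixing */
    return h % num_paths;
}

int main(void)
{
    struct five_tuple flow = { 0x0a000001, 0x0a000002, 12345, 80, 6 };
    /* Every packet of this flow maps to the same one of 4 paths. */
    printf("flow -> path %u of 4\n", ecmp_path(&flow, 4));
    return 0;
}
```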

  10. Presto
      Near-perfect load balancing without changing hardware or transport:
      - Utilize the software edge (vSwitch)
      - Leverage TCP offloading features below the transport layer
      - Work at 10 Gbps and beyond
      Goal: near-optimally load balance the network at high speed.

  11. Presto at a High Level
      [Diagram: leaf-spine topology; each host runs TCP/IP over a vSwitch over a NIC]
      Near uniform-sized data units

  12-13. Presto at a High Level
      [Diagram, continued]
      Near uniform-sized data units, proactively distributed evenly over the
      symmetric network by the vSwitch sender

  14. Presto at a High Level
      [Diagram, continued]
      Near uniform-sized data units, proactively distributed evenly over the
      symmetric network by the vSwitch sender; the receiver masks packet
      reordering due to multipathing below the transport layer

  15. Outline
      - Sender
      - Receiver
      - Evaluation

  16. What Granularity to Load-balance on?
      - Per-flow: elephant collisions
      - Per-packet: high computational overhead; heavy reordering, including
        for mice flows
      - Flowlets: bursts of packets separated by an inactivity timer;
        effectiveness depends on the workload. A small inactivity timer causes
        a lot of reordering and fragments mice flows; a large one produces
        large flowlets (hash collisions).
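
To make the flowlet timer trade-off concrete, here is a minimal sketch of flowlet detection (our own illustration, not the paper's code; the 500µs timer is an assumed value): a packet starts a new flowlet whenever the gap since the flow's previous packet exceeds the inactivity timer, and each flowlet can then be mapped to a different path.

```c
#include <stdint.h>
#include <stdio.h>

#define FLOWLET_GAP_US 500  /* illustrative inactivity timer */

struct flow_state {
    uint64_t last_pkt_us;   /* arrival time of the flow's previous packet */
    uint32_t flowlet_id;    /* bumped on each inactivity gap */
};

/* A packet belongs to a new flowlet when the gap since the previous packet
 * exceeds the timer. Too small a timer splits mice and causes reordering;
 * too large a timer yields big flowlets that still collide. */
uint32_t flowlet_for_packet(struct flow_state *f, uint64_t now_us)
{
    if (now_us - f->last_pkt_us > FLOWLET_GAP_US)
        f->flowlet_id++;
    f->last_pkt_us = now_us;
    return f->flowlet_id;
}

int main(void)
{
    struct flow_state f = { 0, 0 };
    uint64_t arrivals[] = { 10, 60, 120, 900, 950 };  /* microseconds */
    for (int i = 0; i < 5; i++)
        printf("pkt at %4lluus -> flowlet %u\n",
               (unsigned long long)arrivals[i],
               flowlet_for_packet(&f, arrivals[i]));
    return 0;
}
```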

  17. Presto LB Granularity
      Presto load-balances on flowcells.
      What is a flowcell? A set of TCP segments with a bounded byte count.
      The bound is the maximal TCP Segmentation Offload (TSO) size (64KB in
      the implementation), to maximize the benefit of TSO at high speed.
      What is TSO?
      [Diagram: TCP/IP hands a large segment to the NIC, which performs
      segmentation & checksum offload into MTU-sized Ethernet frames]

  18. Presto LB Granularity
      Example: TCP segments of 25KB, 30KB, and 30KB arrive. The first two fit
      under the 64KB bound, so the flowcell is 55KB; the third segment starts
      the next flowcell.

  19. Presto LB Granularity
      Example: TCP segments of 1KB, 5KB, and 1KB arrive. Flowcell: 7KB (the
      whole flow is one flowcell).
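
Both examples follow from one simple rule: accumulate segment bytes, and start a new flowcell once the next segment would push the count past the 64KB TSO bound. A minimal sketch of that rule (our illustration; the paper implements this inside the vSwitch):

```c
#include <stdint.h>
#include <stdio.h>

#define FLOWCELL_BOUND (64 * 1024)  /* maximal TSO size: 64KB */

struct flowcell_state {
    uint32_t id;     /* current flowcell ID */
    uint32_t bytes;  /* bytes accumulated in the current flowcell */
};

/* Assign a TCP segment to a flowcell: if it no longer fits under the
 * 64KB bound, close the current flowcell and start a new one. */
uint32_t assign_flowcell(struct flowcell_state *fc, uint32_t seg_bytes)
{
    if (fc->bytes + seg_bytes > FLOWCELL_BOUND) {
        fc->id++;          /* start a new flowcell */
        fc->bytes = 0;
    }
    fc->bytes += seg_bytes;
    return fc->id;
}

int main(void)
{
    struct flowcell_state fc = { .id = 1, .bytes = 0 };
    uint32_t segs[] = { 25, 30, 30 };  /* KB, as in the slide's example */
    for (int i = 0; i < 3; i++)
        printf("segment %uKB -> flowcell #%u\n", segs[i],
               assign_flowcell(&fc, segs[i] * 1024));
    /* 25KB and 30KB land in flowcell #1 (55KB total); the next 30KB would
     * exceed 64KB, so it starts flowcell #2. */
    return 0;
}
```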

  20-21. Presto Sender
      [Diagram: leaf-spine network between Host A and Host B; the controller
      installs label-switched paths]

  22. Presto Sender
      The vSwitch receives TCP segment #1 (50KB) and forms flowcell #1: it
      encodes the flowcell ID and rewrites the path label; the NIC then uses
      TSO to chunk segment #1 into MTU-sized packets.

  23. Presto Sender
      The vSwitch receives TCP segment #2 (60KB) and forms flowcell #2: it
      encodes the flowcell ID and rewrites the path label; the NIC then uses
      TSO to chunk segment #2 into MTU-sized packets.
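
Slides 20-23, combined into a sketch: for each TCP segment the vSwitch applies the flowcell rule, picks a label-switched path for the flowcell, and hands the still-large segment to the NIC for TSO. The struct layout and the simple round-robin label choice are illustrative assumptions, not the paper's exact mechanism:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_PATHS 4  /* label-switched paths installed by the controller */

struct presto_sender {
    uint32_t flowcell_id;
    uint32_t flowcell_bytes;
};

/* For each TCP segment handed down by TCP/IP, pick the flowcell (same rule
 * as before) and a path label, then pass the large segment to the NIC. */
void send_segment(struct presto_sender *s, uint32_t seg_bytes)
{
    if (s->flowcell_bytes + seg_bytes > 64 * 1024) {
        s->flowcell_id++;
        s->flowcell_bytes = 0;
    }
    s->flowcell_bytes += seg_bytes;

    /* Spread flowcells evenly over the symmetric paths, e.g. round-robin. */
    uint32_t label = s->flowcell_id % NUM_PATHS;

    printf("segment %2uKB -> flowcell #%u, path label %u (NIC does TSO)\n",
           seg_bytes / 1024, s->flowcell_id, label);
}

int main(void)
{
    struct presto_sender s = { .flowcell_id = 1, .flowcell_bytes = 0 };
    send_segment(&s, 50 * 1024);  /* segment #1 -> flowcell #1 */
    send_segment(&s, 60 * 1024);  /* segment #2 -> starts flowcell #2 */
    return 0;
}
```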

  24. Benefits
      - Most flows are smaller than 64KB [Benson, IMC '11], so the majority of
        mice are not exposed to reordering
      - Most bytes come from elephants [Alizadeh, SIGCOMM '10], so traffic is
        routed in uniform sizes
      - Fine-grained, deterministic scheduling over disjoint paths yields
        near-optimal load balancing

  25. Presto Receiver
      Major challenges:
      - Packet reordering for large flows due to multipath
      - Distinguishing loss from reordering
      - Must be fast (10G and beyond) and lightweight

  26. Intro to GRO
      Generic Receive Offload (GRO): the reverse process of TSO.

  27-33. Intro to GRO
      [Animation: GRO sits in the OS between the NIC hardware and TCP/IP.
      MTU-sized packets P1..P5 wait in the NIC queue; GRO merges them, one at
      a time from the queue head, into a single growing segment]

  34. Intro to GRO
      [Animation: the merged segment P1-P5 is pushed up to TCP/IP]
      Large TCP segments are pushed up at the end of a batched I/O event
      (i.e., a polling event).

  35. Intro to GRO
      Merging packets in GRO creates fewer segments and avoids spending
      substantially more cycles at TCP/IP and above [Menon, ATC '08].
      If GRO is disabled: ~6 Gbps with 100% CPU usage of one core.
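
Summarizing slides 27-35 in code: a much-simplified model of the GRO merge loop (an illustration of the behavior, not the Linux implementation). Packets whose sequence number extends the in-progress segment are merged; anything else pushes the segment up, as does the end of the polling batch. Running it on the out-of-order arrival pattern of the next slides also reproduces the pathological push-ups they walk through:

```c
#include <stdint.h>
#include <stdio.h>

struct pkt { uint32_t seq; uint32_t len; };             /* one MTU-sized packet */
struct seg { uint32_t seq; uint32_t len; int valid; };  /* merged segment */

static void push_up(struct seg *s)
{
    if (s->valid)
        printf("push up segment: seq %u, %u bytes\n", s->seq, s->len);
    s->valid = 0;
}

/* Simplified GRO: merge in-order packets, push up on a sequence gap
 * (and, not modeled here: when the maximum segment size is reached or
 * a timeout fires). */
void gro_receive(struct pkt *batch, int n)
{
    struct seg cur = { 0, 0, 0 };
    for (int i = 0; i < n; i++) {
        if (cur.valid && batch[i].seq == cur.seq + cur.len) {
            cur.len += batch[i].len;   /* contiguous: merge */
        } else {
            push_up(&cur);             /* gap: push up, start fresh */
            cur = (struct seg){ batch[i].seq, batch[i].len, 1 };
        }
    }
    push_up(&cur);  /* end of the batched I/O (polling) event */
}

int main(void)
{
    /* In-order arrival P1..P5 merges into a single 7500-byte segment. */
    struct pkt in_order[] = { {0,1500}, {1500,1500}, {3000,1500},
                              {4500,1500}, {6000,1500} };
    gro_receive(in_order, 5);

    /* Out-of-order arrival (slides 36-47): every gap forces a push-up,
     * yielding six small pieces instead of one large segment. */
    struct pkt ooo[] = { {0,1500}, {1500,1500}, {3000,1500}, {7500,1500},
                         {4500,1500}, {9000,1500}, {6000,1500},
                         {10500,1500}, {12000,1500} };
    gro_receive(ooo, 9);
    return 0;
}
```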

  36-39. Reordering Challenges
      [Animation: packets arrive out of order at the NIC queue: P1, P2, P3,
      P6, P4, P7, P5, P8, P9. GRO merges P1-P3, then encounters P6]

  40. Reordering Challenges
      [Animation: the gap at P6 forces GRO to push up the P1-P3 segment]
      GRO is designed to be fast and simple; it pushes up the existing segment
      immediately when (1) there is a gap in sequence number, (2) the maximum
      segment size is reached, or (3) a timeout fires.

  41-47. Reordering Challenges
      [Animation: every subsequent gap forces another push-up, so P1-P3, P6,
      P4, P7, P5, and P8-P9 reach TCP/IP as six separate pieces instead of
      one large segment]

  48. Reordering Challenges
      - GRO is effectively disabled
      - Lots of small packets are pushed up to TCP/IP
      - Huge CPU processing overhead
      - Poor TCP performance due to massive reordering

  49-50. Improved GRO to Mask Reordering for TCP
      [Animation: the same out-of-order arrival, but each packet now carries a
      flowcell ID (flowcell #1 and flowcell #2); the improved GRO merges
      packets per flowcell]
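
The transcript ends as the fix is being introduced; the key idea, per the paper, is to keep separate GRO merge state per flowcell, so interleaving across flowcells (expected under multipathing) is no longer mistaken for loss, while a gap within a flowcell still triggers an immediate push-up. A minimal sketch under those assumptions (the fixed-size per-flowcell table is our simplification):

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_CELLS 8  /* flowcells tracked concurrently (illustrative) */

struct pkt { uint32_t cell; uint32_t seq; uint32_t len; };
struct seg { uint32_t cell, seq, len; int valid; };

static void push_up(struct seg *s)
{
    if (s->valid)
        printf("push up flowcell #%u segment: seq %u, %u bytes\n",
               s->cell, s->seq, s->len);
    s->valid = 0;
}

/* Improved GRO sketch: merge per flowcell, so interleaving across flowcells
 * is masked, while a gap within a flowcell (likely a real loss) still
 * pushes the segment up immediately. */
void gro_receive_flowcell(struct pkt *batch, int n)
{
    struct seg cells[MAX_CELLS] = { 0 };
    for (int i = 0; i < n; i++) {
        struct seg *s = &cells[batch[i].cell % MAX_CELLS];
        if (s->valid && batch[i].seq == s->seq + s->len) {
            s->len += batch[i].len;   /* contiguous within its flowcell */
        } else {
            push_up(s);               /* real gap within the flowcell */
            *s = (struct seg){ batch[i].cell, batch[i].seq,
                               batch[i].len, 1 };
        }
    }
    for (int c = 0; c < MAX_CELLS; c++)
        push_up(&cells[c]);           /* end of the polling event */
}

int main(void)
{
    /* Interleaved arrival of two flowcells, as in the slide's animation:
     * per-flowcell merging yields just two large segments. */
    struct pkt batch[] = {
        {1,0,1500}, {1,1500,1500}, {1,3000,1500}, {2,0,1500},
        {1,4500,1500}, {2,1500,1500}, {1,6000,1500}, {2,3000,1500},
        {2,4500,1500},
    };
    gro_receive_flowcell(batch, 9);
    return 0;
}
```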
