Enhancing Data Center Network Performance through Packet Scheduling and ECN


Explore the advancements in data center network performance improvement through techniques like packet scheduling, Explicit Congestion Notification (ECN), and strict priority for different types of flows. The research discusses the requirements of low latency for short messages and high throughput for large flows within data center networks, along with the transition from fixed-function to programmable switching chips for efficient packet scheduling.


Uploaded on Oct 10, 2024 | 0 Views



Presentation Transcript


  1. Enabling ECN over Generic Packet Scheduling. Wei Bai, Kai Chen, Li Chen, Changhoon Kim, Haitao Wu. ACM CoNEXT, Irvine, CA, December 2016.

  2. Data Centers Around the World. Google's worldwide DC map; Facebook DC interior; Microsoft's DC in Dublin, Ireland; global Microsoft Azure DC footprint.

  3. Inside the Data Center (DC). Network requirements of applications: low latency for short messages and high throughput for large flows.

  4. Inside the Data Center (DC). Network requirements of applications: low latency for short messages and high throughput for large flows. Network performance improvement: combine packet scheduling with ECN-based transport protocols (ECN = Explicit Congestion Notification).

  5. Packet Scheduling in Data Centers. Inter-service traffic isolation using round robin with weights: real-time services (weight 4), best-effort services (weight 2), background services (weight 1). Bai et al. (NSDI '16).

  6. Packet Scheduling in Data Centers. Flow scheduling (strict priority, round robin): high priority for (0, 100KB] flows, medium priority for (100KB, 10MB] flows, low priority for (10MB, ∞) flows. Bai et al. (NSDI '15).
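
The size-based priority classes on this slide can be sketched as a simple lookup. This is only an illustration: the function name, the decimal definition of KB and MB, and the choice to place a flow of exactly 10MB in the medium class are assumptions, not details from the talk.

```python
KB = 1000          # assumption: decimal kilobytes
MB = 1000 * KB     # assumption: decimal megabytes

def priority_for_flow_size(size_bytes: int) -> str:
    """Map a flow size to the priority class listed on the slide."""
    if size_bytes <= 0:
        raise ValueError("flow size must be positive")
    if size_bytes <= 100 * KB:
        return "high"      # (0, 100KB]
    if size_bytes <= 10 * MB:
        return "medium"    # (100KB, 10MB], boundary handling assumed
    return "low"           # larger than 10MB

classes = [priority_for_flow_size(s) for s in (50 * KB, 1 * MB, 50 * MB)]
```

A strict-priority scheduler would then always serve the "high" queue before "medium", and "medium" before "low".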

  7. Packet Scheduling in Data Centers. Strict priority; round robin. Existing fixed-function switching chips.

  8. Packet Scheduling in Data Centers. Strict priority; round robin. Future programmable switching chips.

  9. Packet Scheduling in Data Centers. Programmable schedulers: strict priority, round robin, Push-In-First-Out (PIFO) queue. A. Sivaraman et al. (SIGCOMM '16).
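
To make the PIFO idea concrete, here is a minimal software sketch: packets are pushed in at a position given by a rank and always removed from the head. The class name, the heap-based implementation, and the example ranks are assumptions for illustration; a hardware PIFO is built very differently.

```python
import heapq
import itertools

class PIFO:
    """Toy push-in-first-out queue: insert by rank, dequeue from the head."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a rank

    def push(self, rank, packet):
        heapq.heappush(self._heap, (rank, next(self._seq), packet))

    def pop(self):
        _rank, _seq, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self):
        return len(self._heap)

# Using the priority level as the rank emulates strict priority (lower rank first).
pifo = PIFO()
for rank, pkt in [(2, "bulk-1"), (0, "rpc-1"), (1, "web-1"), (0, "rpc-2")]:
    pifo.push(rank, pkt)
order = [pifo.pop() for _ in range(len(pifo))]
```

Other policies follow by changing how the rank is computed at enqueue time, which is what makes the abstraction attractive for programmable schedulers.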

  10. Can we enable ECN for arbitrary packet schedulers in data centers?

  11. ECN/RED without Packet Scheduling. Packets get marked when queue length $q > K$. Figure labels: $K$, mark, don't mark.

  12. ECN/RED without Packet Scheduling. Packets get marked when queue length $q > K$. To achieve 100% throughput. Figure: buffer occupancy over time, with threshold $K$.

  13. ECN/RED without Packet Scheduling. Packets get marked when queue length $q > K$. To achieve 100% throughput: $K \geq C \times RTT \times \lambda$. Small number of concurrent large flows in DC. M. Alizadeh et al. (SIGCOMM '10).

  14. ECN/RED without Packet Scheduling. Packets get marked when queue length $q > K$. To achieve 100% throughput: $K \geq C \times RTT \times \lambda$. $C$: fixed link capacity.

  15. ECN/RED without Packet Scheduling. Packets get marked when queue length $q > K$. To achieve 100% throughput: $K \geq C \times RTT \times \lambda$. $RTT$: base round-trip time, relatively stable in DC. Wu et al. (CoNEXT '12).

  16. ECN/RED without Packet Scheduling. Packets get marked when queue length $q > K$. To achieve 100% throughput: $K \geq C \times RTT \times \lambda$. $\lambda$: determined by congestion control algorithms.

  17. ECN/RED without Packet Scheduling. Packets get marked when queue length $q > K$. To achieve 100% throughput: $K \geq C \times RTT \times \lambda$. $K$: standard queue length threshold, a static value in the data center environment.

  18. ECN/RED without Packet Scheduling. Packets get marked when queue length $q > K$. To achieve 100% throughput, use the static threshold $K$. Easy to configure at the switch.
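
As a rough illustration of how such a static threshold could be computed, the sketch below evaluates K = C × RTT × λ in bytes. The function name and the example values (a 10 Gbps link, a 100 µs base RTT, and λ = 1) are assumptions chosen for easy arithmetic, not numbers from the talk.

```python
def standard_threshold_bytes(capacity_bps: float, rtt_s: float, lam: float) -> float:
    """Static ECN marking threshold K = C * RTT * lambda, converted from bits to bytes."""
    return capacity_bps * rtt_s * lam / 8

# Hypothetical example: 10 Gbps link, 100 microsecond base RTT, lambda = 1.
k_bytes = standard_threshold_bytes(10e9, 100e-6, 1.0)
```

With these assumed inputs the threshold comes out at about 125 KB. Because C, RTT, and λ are all roughly constant for a single FIFO queue in a data center, the value can be configured once at the switch.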

  19. ECN/RED with Packet Scheduling. Each queue is a link with varying capacity. Ideal ECN/RED solution: packets should get marked if the length of queue $i$ satisfies $q_i > K_i = C_i \times RTT \times \lambda$, where $K_i$ is the dynamic per-queue threshold and $C_i$ is the varying capacity.

  20. ECN/RED with Packet Scheduling. Each queue is a link with varying capacity. Ideal ECN/RED solution: packets should get marked if the length of queue $i$ satisfies $q_i > K_i = C_i \times RTT \times \lambda$. Not supported by current switching chips. Current practice: configure static thresholds $K_i = C \times RTT \times \lambda$. High throughput but poor latency.
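
The gap between the ideal per-queue threshold and the static one can be seen with a small calculation. The sketch assumes two equally weighted queues that are both backlogged, so each gets half the link; the helper names and the numbers (10 Gbps, 100 µs, λ = 1) are illustrative assumptions.

```python
def per_queue_thresholds_bytes(link_bps, weights, rtt_s, lam):
    """Ideal K_i = C_i * RTT * lambda, assuming every queue is backlogged so a
    weighted fair scheduler gives queue i the share C_i = C * w_i / sum(w)."""
    total = sum(weights)
    return [link_bps * w / total * rtt_s * lam / 8 for w in weights]

def delay_at_threshold_s(threshold_bytes, capacity_bps):
    """Queueing delay seen when the queue holds exactly threshold_bytes."""
    return threshold_bytes * 8 / capacity_bps

LINK_BPS, RTT_S, LAM = 10e9, 100e-6, 1.0   # hypothetical values
WEIGHTS = [1, 1]

static_k = LINK_BPS * RTT_S * LAM / 8       # same threshold for every queue
ideal_k = per_queue_thresholds_bytes(LINK_BPS, WEIGHTS, RTT_S, LAM)
queue_capacity_bps = LINK_BPS / 2           # each queue drains at C/2

static_delay_s = delay_at_threshold_s(static_k, queue_capacity_bps)
ideal_delay_s = delay_at_threshold_s(ideal_k[0], queue_capacity_bps)
```

Under these assumptions the static threshold lets each queue build about twice the queueing delay of the ideal threshold before marking starts, which is one way to read "high throughput but poor latency".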

  21. To Implement Ideal ECN/RED Solution. A general way to estimate the queue capacity: queue capacity = queue departure rate while the queue stays non-empty. Leverage the solution from PIE (HPSR '13): start measurement when the number of bytes in the switch buffer > dq_thresh; get the rate to drain dq_thresh bytes.
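
The measurement procedure described here can be sketched as a small estimator driven by packet departures. This is a loose illustration of the idea, not the PIE implementation: the class and method names are hypothetical, `backlog_bytes` is assumed to be the queue occupancy left after the departure, and departure times are assumed to be strictly increasing.

```python
class DepartureRateEstimator:
    """Sample a queue's drain rate over windows of dq_thresh bytes."""

    def __init__(self, dq_thresh_bytes):
        self.dq_thresh = dq_thresh_bytes
        self.measuring = False
        self.start_s = 0.0
        self.drained = 0
        self.samples_bps = []

    def on_departure(self, now_s, pkt_bytes, backlog_bytes):
        # Count this packet toward an ongoing measurement window.
        if self.measuring:
            self.drained += pkt_bytes
            if self.drained >= self.dq_thresh:
                elapsed = now_s - self.start_s
                self.samples_bps.append(self.drained * 8 / elapsed)
                self.measuring = False
        # Start a new window only while enough bytes remain queued,
        # so the queue is likely to stay non-empty during the window.
        if not self.measuring and backlog_bytes > self.dq_thresh:
            self.measuring = True
            self.start_s = now_s
            self.drained = 0

# Hypothetical trace: 1000-byte packets leave every 1 ms (8 Mbps drain rate).
est = DepartureRateEstimator(dq_thresh_bytes=3000)
for i in range(7):
    est.on_departure(now_s=i * 0.001, pkt_bytes=1000, backlog_bytes=10000)
```

Each sample is the rate needed to drain at least dq_thresh bytes, so the choice of dq_thresh directly sets the measurement window.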

  22. Trade-off of Measurement Window. Figure: sequence of transmitted packets from queue 1 and queue 2. Queues 1 and 2 stay non-empty during the transmission. Link capacity: C.

  23. Trade-off of Measurement Window. Figure: sequence of transmitted packets from queue 1 and queue 2. Queue capacity 1 = queue capacity 2 = 0.5C. Link capacity: C.

  24. Trade-off of Measurement Window. A too-small measurement window, e.g., dq_thresh = 3 MTU. Sample rates of queue 1: C, 3/7 C, C, 3/7 C. Link capacity: C.

  25. Trade-off of Measurement Window. A too-small measurement window degrades measurement accuracy. Sample rates of queue 1: C, 3/7 C, C, 3/7 C. Link capacity: C.

  26. Trade-off of Measurement Window. A too-small measurement window degrades measurement accuracy. A too-large measurement window, e.g., dq_thresh = 20 MTU. Link capacity: C.

  27. Trade-off of Measurement Window. A too-small measurement window degrades measurement accuracy. A too-large measurement window cannot efficiently capture dynamic changes. Link capacity: C.

  28. Trade-off of Measurement Window. A too-small measurement window degrades measurement accuracy. A too-large measurement window cannot efficiently capture dynamic changes. Rate measurement is non-trivial.
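
The trade-off can be reproduced with a toy packet sequence. The sketch assumes equal-size packets, one per time slot, and a made-up pattern in which the two queues alternate in bursts of four packets (so each queue's long-run share is 0.5C); the pattern and function name are assumptions, not the sequence shown in the talk's figure.

```python
from fractions import Fraction

def sample_rates(sequence, queue, window_pkts):
    """Per-window rate samples for `queue`, as fractions of link capacity C.

    sequence[i] is the id of the queue whose packet is sent in time slot i.
    A window starts at the first counted packet and ends with the last one.
    """
    samples = []
    start = None
    count = 0
    for slot, q in enumerate(sequence):
        if q != queue:
            continue
        if start is None:
            start = slot
        count += 1
        if count == window_pkts:
            samples.append(Fraction(window_pkts, slot - start + 1))
            start, count = None, 0
    return samples

# Hypothetical pattern: bursts of four packets from queue 1, then four from queue 2.
sequence = ([1] * 4 + [2] * 4) * 10

small_window = sample_rates(sequence, queue=1, window_pkts=3)
large_window = sample_rates(sequence, queue=1, window_pkts=20)
```

With the 3-packet window the samples swing between C and 3/7 C depending on where the window falls relative to a burst. The 20-packet window gives steadier samples, but each one spans 36 slots, so a change in the scheduler's allocation would be noticed much later.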

  29. Another View. Ideal ECN/RED solution: packets should get marked if $q_i > C_i \times RTT \times \lambda$, where $q_i$ is the queue length and $C_i$ is the varying capacity.

  30. Another View. Ideal ECN/RED solution: packets should get marked if $q_i / C_i > RTT \times \lambda$, where $q_i / C_i$ is the sojourn time and $C_i$ is the varying capacity.
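
The step from the queue-length condition to the time-based one is a division by the queue's capacity. Written out, with $q_i$ denoting the length of queue $i$ and $C_i$ its current (positive) capacity, both symbol names being the reading assumed here:

```latex
\begin{align}
q_i &> C_i \times RTT \times \lambda \\
\frac{q_i}{C_i} &> RTT \times \lambda
\end{align}
```

The left-hand side is the time needed to drain $q_i$ bytes at rate $C_i$, which is roughly the time a newly arrived packet will wait in the queue. The right-hand side no longer depends on $C_i$, so the threshold becomes a constant even though the capacity varies.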

  31. TCN (Time-based Congestion Notification). TCN mechanism: packets should get marked if their sojourn times > $RTT \times \lambda$.

  32. TCN in Detail. Sojourn time measurement. Enqueue: attach metadata to each packet to store the enqueue time Teq.

  33. TCN in Detail. Sojourn time measurement. Enqueue: attach metadata to each packet to store the enqueue time Teq. Dequeue: calculate sojourn time, sojourn time = now - Teq.

  34. TCN in Detail. Sojourn time measurement. Enqueue: attach metadata to each packet to store the enqueue time. Dequeue: calculate sojourn time. 2B-long metadata is enough for DC.
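
One plausible reading of the 2-byte claim is that a 16-bit timestamp with wraparound arithmetic suffices when sojourn times stay well below the counter's range. The sketch below illustrates that reading only; the tick unit (1 µs here, giving a range of about 65.5 ms) and the helper names are assumptions, not details from the talk.

```python
MASK16 = 0xFFFF  # a 2-byte timestamp holds values 0..65535

def stamp(now_ticks: int) -> int:
    """Truncate the switch clock to the 16 bits stored with the packet."""
    return now_ticks & MASK16

def sojourn_ticks(now_ticks: int, enqueue_stamp: int) -> int:
    """Elapsed ticks since enqueue, correct as long as the true sojourn
    time is shorter than 65536 ticks (modular subtraction handles wraparound)."""
    return (stamp(now_ticks) - enqueue_stamp) & MASK16

# Hypothetical example with 1-microsecond ticks: the 16-bit clock wraps
# between enqueue (tick 65530) and dequeue (tick 65540).
teq = stamp(65530)
elapsed = sojourn_ticks(65540, teq)
```

Even though the stored clock value wraps from 65535 back to 0 between the two events, the modular subtraction still yields the 10 ticks that actually elapsed.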

  35. TCN in Detail. Sojourn time measurement. Enqueue: attach metadata to each packet to store the enqueue time. Dequeue: calculate sojourn time. Instantaneous ECN marking: compare the per-packet instantaneous sojourn time with a static threshold ($RTT \times \lambda$). Stateless data plane algorithm.

  36. TCN in Detail. Sojourn time measurement. Enqueue: attach metadata to each packet to store the enqueue time. Dequeue: calculate sojourn time. Instantaneous ECN marking: compare the per-packet instantaneous sojourn time with a static threshold ($RTT \times \lambda$). Marking does not cause any bubble on the link.
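
The enqueue and dequeue steps listed above can be sketched in a few lines. This is an illustrative model rather than the authors' qdisc code: the class names, the single FIFO standing in for an arbitrary scheduler, and the 100 µs threshold are all assumptions.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Packet:
    payload: str
    enqueue_time_s: float = 0.0   # metadata attached at enqueue (Teq)
    ce_marked: bool = False       # set when the packet gets an ECN mark

class TCNQueue:
    """Toy queue that marks packets by sojourn time at dequeue."""

    def __init__(self, threshold_s: float):
        self.threshold_s = threshold_s   # static threshold, RTT * lambda
        self._packets = deque()

    def enqueue(self, pkt: Packet, now_s: float) -> None:
        pkt.enqueue_time_s = now_s
        self._packets.append(pkt)

    def dequeue(self, now_s: float) -> Packet:
        pkt = self._packets.popleft()
        sojourn_s = now_s - pkt.enqueue_time_s
        if sojourn_s > self.threshold_s:
            pkt.ce_marked = True         # mark instantly, no averaging or state
        return pkt

# Hypothetical trace with a 100-microsecond threshold.
q = TCNQueue(threshold_s=100e-6)
q.enqueue(Packet("a"), now_s=0.0)
q.enqueue(Packet("b"), now_s=50e-6)
first = q.dequeue(now_s=50e-6)    # waited 50 us
second = q.dequeue(now_s=300e-6)  # waited 250 us
```

The marking decision uses only the packet's own timestamp and a constant, so no per-queue rate estimate or other state is needed, and the same check applies regardless of which scheduler chose the packet for transmission.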

  37. TCN vs CoDel. Advantages of TCN. Stateless: cheaper to implement in hardware. Instantaneous: faster reaction to bursty traffic.

  38. TCN vs CoDel. Advantages of TCN. Stateless: cheaper to implement in hardware. Instantaneous: faster reaction to bursty traffic. The simplicity of TCN comes from the unique characteristics of data centers.

  39. TCN vs CoDel. Advantages of TCN. Stateless: cheaper to implement in hardware. Instantaneous: faster reaction to bursty traffic. Simplicity of TCN: small number of concurrent large flows; relatively stable RTTs; prior knowledge of transport at the end host.

  40. Testbed Evaluation. TCN software prototype: Linux qdisc kernel module on a multi-NIC server. Testbed setup: 9 servers connected to a software switch; end hosts use DCTCP as the transport protocol. ECN schemes compared: per-queue RED with the standard threshold; CoDel.

  41. Static Flow Experiment. SP/WFQ with three queues: Q1 (high priority), Q2 (w=1, low priority), Q3 (w=1, low priority). Figure labels for the traffic: 1 flow (500Mbps), 1 flow, 4 flows.

  42. Static Flow Experiment. TCN preserves the scheduling policy.

  43. Dynamic Flow Experiment. 8 senders to 1 receiver (web search workload). SP/WFQ scheduling policy at the switch: (0, 100KB] flows of all services go to the high-priority queue; (100KB, ∞) flows of services 1, 2, 3, and 4 each go to their own low-priority queue with w=1.

  44. 99th-Percentile FCT of Small Flows (<100KB). TCN maintains low buffer occupancy.

  45. Realistic Traffic: Large Flows (>10MB). TCN achieves high throughput.

  46. Conclusion. TCN: a simple ECN solution for data centers. Use sojourn time as the congestion signal (CoDel). Perform instantaneous ECN marking (DCTCP). Next step: TCN in programmable hardware. Code: http://sing.cse.ust.hk/projects/TCN

  47. Thanks!

  48. Average FCT of Small Flows (<100KB).
