Enhancing Data Center Performance with ECN-Based Transports


This talk addresses how to enable Explicit Congestion Notification (ECN) in multi-service, multi-queue data centers, where services have diverse network requirements. ECN-based transports such as DCTCP and DCQCN achieve high throughput and low latency: end-hosts react to ECN marks by adjusting their sending rates, and switches mark packets according to Active Queue Management (AQM) policies. ECN-aware switches adopt Random Early Detection (RED) for marking and track buffer occupancy per queue, per port, or per service pool. The talk shows where existing marking schemes fall short when a port has multiple queues and introduces MQ-ECN as an alternative.


Uploaded on Sep 20, 2024



Presentation Transcript


  1. Enabling ECN in Multi-Service Multi-Queue Data Centers. Wei Bai, Li Chen, Kai Chen, Haitao Wu (Microsoft). SING Group @ Hong Kong University of Science and Technology.

  2. Background. Data Centers: many services with diverse network requirements.

  3. Background. Data Centers: many services with diverse network requirements. ECN-based Transports: ECN = Explicit Congestion Notification.

  4. Background. Data Centers: many services with diverse network requirements. ECN-based Transports: achieve high throughput & low latency; widely deployed (DCTCP, DCQCN, etc.).

  5. ECN-based Transports.

  6. ECN-based Transports. ECN-enabled end-hosts: react to ECN by adjusting sending rates.

  7. ECN-based Transports. ECN-enabled end-hosts: react to ECN by adjusting sending rates. ECN-aware switches: perform ECN marking based on Active Queue Management (AQM) policies.

  8. ECN-based Transports. ECN-enabled end-hosts: react to ECN by adjusting sending rates. ECN-aware switches (our focus): perform ECN marking based on Active Queue Management (AQM) policies.

  9. ECN-aware Switches. Adopt RED to perform ECN marking (RED = Random Early Detection).

  10. ECN-aware Switches. Adopt RED to perform ECN marking. Per-queue/port/service-pool ECN/RED: track buffer occupancy of different egress entities.

  11. ECN-aware Switches. Adopt RED to perform ECN marking. Per-queue/port/service-pool ECN/RED. [Figure: queue 1 and queue 2 feeding one port.]

  12. ECN-aware Switches. Adopt RED to perform ECN marking. Per-queue/port/service-pool ECN/RED. [Figure: queue 1 and queue 2 feeding one port.]

  13. ECN-aware Switches. Adopt RED to perform ECN marking. Per-queue/port/service-pool ECN/RED. [Figure: a shared buffer spanning two ports, each with two queues (queues 1 to 4).]

  14. ECN-aware Switches. Adopt RED to perform ECN marking. Per-queue/port/service-pool ECN/RED. Leverage multiple queues to classify traffic: isolate traffic from different services/applications.

  15. ECN-aware Switches. Adopt RED to perform ECN marking. Per-queue/port/service-pool ECN/RED. Leverage multiple queues to classify traffic: isolate traffic from different services/applications. Example queues: services running DCTCP, services running TCP, services running UDP.

  16. ECN-aware Switches. Adopt RED to perform ECN marking. Per-queue/port/service-pool ECN/RED. Leverage multiple queues to classify traffic: isolate traffic from different services/applications. Example queues: real-time services, best-effort services, background services.

  17. ECN-aware Switches. Adopt RED to perform ECN marking. Per-queue/port/service-pool ECN/RED. Leverage multiple queues to classify traffic: isolate traffic from different services/applications; weighted max-min fair sharing among queues. Example: weight = 4 for real-time services, weight = 2 for best-effort services, weight = 1 for background services.

  18. ECN-aware Switches. Adopt RED to perform ECN marking. Per-queue/port/service-pool ECN/RED. Leverage multiple queues to classify traffic: isolate traffic from different services/applications; weighted max-min fair sharing among queues. Perform ECN marking in multi-queue context.

  19. ECN marking with Single Queue.

  20. ECN marking with Single Queue. RED Algorithm.

  21. ECN marking with Single Queue. RED Algorithm. Practical Configuration (e.g., DCTCP).
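The RED marking rule named on these slides can be sketched as follows. This is a minimal illustration of textbook RED, not the deck's own code: the function name and the parameters `k_min`, `k_max`, and `p_max` are assumptions, and the sketch takes the queue length as given rather than computing a moving average. Setting `k_min == k_max` gives the single-threshold behaviour commonly associated with DCTCP-style configurations.

```python
def red_mark_probability(queue_len, k_min, k_max, p_max):
    """Probability of ECN-marking an arriving packet under RED.

    queue_len, k_min, k_max share one unit (packets or bytes).
    All names here are illustrative, not taken from the slides.
    """
    if queue_len <= k_min:
        return 0.0          # short queue: never mark
    if queue_len >= k_max:
        return 1.0          # long queue: always mark
    # In between: probability grows linearly from 0 to p_max.
    return p_max * (queue_len - k_min) / (k_max - k_min)


# Single-threshold configuration (k_min == k_max == K): a step function.
K = 65  # hypothetical threshold in packets
step_below = red_mark_probability(K, K, K, 1.0)
step_above = red_mark_probability(K + 1, K, K, 1.0)
```

With a single threshold the linear region disappears, so the switch only has to compare the queue length against one number.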

  22. ECN marking with Single Queue. To achieve 100% throughput: $K \ge C \times RTT \times \lambda$.

  23. ECN marking with Single Queue. To achieve 100% throughput: $K \ge C \times RTT \times \lambda$, where $\lambda$ is determined by congestion control algorithms.

  24. ECN marking with Single Queue. To achieve 100% throughput: $K \ge C \times RTT \times \lambda$. Standard ECN marking threshold: $C \times RTT \times \lambda$.

  25. ECN marking with Single Queue. To achieve 100% throughput: $K \ge C \times RTT \times \lambda$. The standard threshold is relatively stable in DCN, e.g., 65 packets for a 10G network (DCTCP paper).
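As a worked illustration of the standard threshold $C \times RTT \times \lambda$, the sketch below converts a link capacity, a round-trip time, and a value of $\lambda$ into a threshold in bytes. The function name and the example inputs (10 Gbps, 100 microseconds, $\lambda = 1.0$) are assumptions chosen for arithmetic convenience; they are not taken from the slides and are not meant to reproduce the 65-packet figure quoted from the DCTCP paper.

```python
def standard_threshold_bytes(capacity_bps, rtt_s, lam):
    """Standard ECN marking threshold C * RTT * lambda, in bytes.

    capacity_bps: link capacity in bits per second
    rtt_s:        round-trip time in seconds
    lam:          constant that depends on the congestion control algorithm
    """
    return capacity_bps * rtt_s * lam / 8  # bits -> bytes


# Hypothetical inputs: 10 Gbps link, 100 microsecond RTT, lambda = 1.0.
example_threshold = standard_threshold_bytes(10e9, 100e-6, 1.0)
```

Because the threshold scales linearly with both capacity and RTT, a queue that only receives a fraction of the link needs a proportionally smaller threshold, which is the issue the multi-queue slides explore.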

  26. ECN marking with Multi-Queue (1).

  27. ECN marking with Multi-Queue (1). Per-queue with the standard threshold: $K_{queue}(i) = C \times RTT \times \lambda$. [Figure: queues 1, 2, and 3 on one port, each compared against the standard threshold; labels: Don't mark / Mark.]

  28. ECN marking with Multi-Queue (1). Per-queue with the standard threshold: $K_{queue}(i) = C \times RTT \times \lambda$. Problem: increases packet latency. [Figure: queues 1, 2, and 3 on one port, each compared against the standard threshold; labels: Don't mark / Mark.]

  29. ECN marking with Multi-Queue (1). Per-queue with the standard threshold: $K_{queue}(i) = C \times RTT \times \lambda$. Problem: increases packet latency. Experiment: evenly classify 8 long-lived flows into a varying number of queues.
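A rough way to see why per-queue marking with the standard threshold increases latency: if every active queue is allowed to build up to the full threshold before marking starts, the backlog on the port grows with the number of active queues. The sketch below is a back-of-the-envelope estimate under that assumption; the function names and the numbers are hypothetical and are not the experiment from the slides.

```python
def approx_port_backlog_bytes(k_bytes, active_queues):
    """Approximate port backlog if each active queue sits at threshold K."""
    return k_bytes * active_queues


def drain_time_s(backlog_bytes, capacity_bps):
    """Time to transmit a backlog at the link rate, in seconds."""
    return backlog_bytes * 8 / capacity_bps


# Hypothetical numbers: K = 100 KB per queue, 10 Gbps link.
one_queue_delay = drain_time_s(approx_port_backlog_bytes(100_000, 1), 10e9)
eight_queue_delay = drain_time_s(approx_port_backlog_bytes(100_000, 8), 10e9)
```

Under these assumptions the drain time scales linearly with the number of active queues, so eight queues hold roughly eight times the backlog of one.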

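  30. ECN marking with Multi-Queue (2). Per-queue with the minimum threshold: $K_{queue}(i) = C \times RTT \times \lambda \times \frac{w_i}{\sum_j w_j}$, where $\frac{w_i}{\sum_j w_j}$ is the normalized weight. [Figure: queues 1, 2, and 3 on one port, each compared against the minimum threshold; labels: Mark / Don't mark.]

  31. ECN marking with Multi-Queue (2). Per-queue with the minimum threshold: $K_{queue}(i) = C \times RTT \times \lambda \times \frac{w_i}{\sum_j w_j}$. Problem: degrades throughput. [Figure: queues 1, 2, and 3 on one port, each compared against the minimum threshold; labels: Mark / Don't mark.]

  32. ECN marking with Multi-Queue (2). Per-queue with the minimum threshold: $K_{queue}(i) = C \times RTT \times \lambda \times \frac{w_i}{\sum_j w_j}$. Problem: degrades throughput. [Figure: overall average FCT; average FCT (>10MB).]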
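The minimum-threshold scheme scales the standard threshold by each queue's normalized weight. A minimal sketch, assuming the 4/2/1 weights used earlier in the deck and a hypothetical standard threshold of 70 packets (chosen only because it divides evenly):

```python
def min_thresholds(weights, k_standard):
    """Per-queue minimum thresholds: K_standard * w_i / sum(w)."""
    total = sum(weights)
    return [k_standard * w / total for w in weights]


# Weights 4, 2, 1 as on the weighted-sharing slide; K_standard is hypothetical.
thresholds = min_thresholds([4, 2, 1], 70)
```

The throughput problem follows from the arithmetic: if only the weight-1 queue is busy, it can use the whole link, yet it is still marked at one seventh of the threshold that the single-queue analysis requires for 100% throughput.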

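  33. ECN marking with Multi-Queue (3). Per-port: $K_{port} = C \times RTT \times \lambda$. [Figure: queues 1, 2, and 3 on one port, with the port's total occupancy compared against the standard threshold; labels: Don't mark / Mark.]

  34. ECN marking with Multi-Queue (3). Per-port: $K_{port} = C \times RTT \times \lambda$. Problem: violates weighted fair sharing. [Figure: queues 1, 2, and 3 on one port, with the port's total occupancy compared against the standard threshold; labels: Don't mark / Mark.]

  35. ECN marking with Multi-Queue (3). Per-port: $K_{port} = C \times RTT \times \lambda$. Problem: violates weighted fair sharing. Experiment: both services have an equal-weight dedicated queue on the switch.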
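One way to see how per-port marking can disturb fair sharing is to compare which queues get marked under the two policies for the same buffer state. In the sketch below (function names and numbers are assumptions, in packets), a per-port decision marks packets of a nearly empty queue because another queue on the same port is heavily backlogged.

```python
def per_queue_marks(queue_lens, k):
    """Per-queue policy: mark a queue's packets only if that queue exceeds k."""
    return [q > k for q in queue_lens]


def per_port_marks(queue_lens, k):
    """Per-port policy: mark every queue's packets if the port total exceeds k."""
    congested = sum(queue_lens) > k
    return [congested for _ in queue_lens]


# Hypothetical state: queue 0 holds 80 packets, queue 1 holds 5; threshold 65.
queue_view = per_queue_marks([80, 5], 65)
port_view = per_port_marks([80, 5], 65)
```

Under the per-port policy the lightly loaded queue is told to slow down even though it is not the one causing the backlog.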

  36. Question. Can we design an ECN marking scheme with the following properties: delivers low latency; achieves high throughput; preserves weighted fair sharing; is compatible with legacy ECN/RED implementations?

  37. Question. Can we design an ECN marking scheme with the following properties: delivers low latency; achieves high throughput; preserves weighted fair sharing; is compatible with legacy ECN/RED implementations? Our answer: MQ-ECN.

  38. MQ-ECN's Design.

  39. Start from GPS Model. $N$ queues share the link with capacity $C$. Generalized Processor Sharing (GPS). [Figure: queues 1, 2, ..., N feeding a link of capacity C.]

  40. Start from GPS Model. $N$ queues share the link with capacity $C$. Input rates: $r_1, r_2, \ldots, r_N$.

  41. Start from GPS Model. $N$ queues share the link with capacity $C$. Input rates: $r_1, r_2, \ldots, r_N$. Weights: $w_1, w_2, \ldots, w_N$.

  42. Start from GPS Model. $N$ queues share the link with capacity $C$. $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$, where $w_i \alpha$ is the weighted fair share rate. Input rates: $r_1, r_2, \ldots, r_N$. Weights: $w_1, w_2, \ldots, w_N$. Output rates: $\min(r_1, w_1 \alpha), \min(r_2, w_2 \alpha), \ldots, \min(r_N, w_N \alpha)$.

  43. Start from GPS Model. $N$ queues share the link with capacity $C$. $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$. Use ECN to throttle queue $i$ if $r_i > w_i \alpha$. Input rates: $r_1, r_2, \ldots, r_N$. Weights: $w_1, w_2, \ldots, w_N$. Output rates: $\min(r_1, w_1 \alpha), \min(r_2, w_2 \alpha), \ldots, \min(r_N, w_N \alpha)$.
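The GPS condition $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$ can be solved for $\alpha$ with a simple water-filling pass. The sketch below is one possible way to do that, not the authors' implementation; the symbols $r_i$, $w_i$, and $\alpha$ follow the usual GPS notation, and the function names and example numbers are assumptions.

```python
def fair_share_alpha(capacity, rates, weights):
    """Solve C = sum(min(r_i, w_i * alpha)) for alpha.

    Returns None when the total input rate fits within the capacity,
    since no queue needs to be throttled in that case.
    """
    if sum(rates) <= capacity:
        return None
    # Visit queues from least to most demanding relative to their weight.
    order = sorted(range(len(rates)), key=lambda i: rates[i] / weights[i])
    remaining = float(capacity)
    weight_left = float(sum(weights))
    for i in order:
        alpha = remaining / weight_left
        if rates[i] <= weights[i] * alpha:
            # Queue i is satisfied; give its leftover share to the others.
            remaining -= rates[i]
            weight_left -= weights[i]
        else:
            # Queue i and all later queues exceed their share.
            return alpha
    return None


def throttled_queues(rates, weights, alpha):
    """Indices of queues with r_i > w_i * alpha (candidates for ECN marking)."""
    return [i for i, (r, w) in enumerate(zip(rates, weights)) if r > w * alpha]


# Hypothetical example: capacity 7, input rates 1, 10, 10, weights 4, 2, 1.
alpha = fair_share_alpha(7, [1, 10, 10], [4, 2, 1])
marked = throttled_queues([1, 10, 10], [4, 2, 1], alpha)
```

In this example the first queue sends less than its share, so its unused capacity is redistributed and only the other two queues exceed $w_i \alpha$.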

  44. Start from GPS Model. $N$ queues share the link with capacity $C$. $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$. $K_{queue}(i) = w_i \alpha \times RTT \times \lambda$. Input rates: $r_1, r_2, \ldots, r_N$. Weights: $w_1, w_2, \ldots, w_N$. Output rates: $\min(r_1, w_1 \alpha), \min(r_2, w_2 \alpha), \ldots, \min(r_N, w_N \alpha)$.

  45. Start from GPS Model. $N$ queues share the link with capacity $C$. $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$. $K_{queue}(i) = w_i \alpha \times RTT \times \lambda$. Bit-by-bit round robin. Input rates: $r_1, r_2, \ldots, r_N$. Weights: $w_1, w_2, \ldots, w_N$. Output rates: $\min(r_1, w_1 \alpha), \min(r_2, w_2 \alpha), \ldots, \min(r_N, w_N \alpha)$.

  46. Start from GPS Model. $N$ queues share the link with capacity $C$. $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$. $K_{queue}(i) = w_i \alpha \times RTT \times \lambda$. Bit-by-bit round robin. Input rates: $r_1, r_2, \ldots, r_N$. Quanta: $w_1$ bits, $w_2$ bits, ..., $w_N$ bits. Output rates: $\min(r_1, w_1 \alpha), \min(r_2, w_2 \alpha), \ldots, \min(r_N, w_N \alpha)$.

  47. Start from GPS Model. $N$ queues share the link with capacity $C$. $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$. $K_{queue}(i) = w_i \alpha \times RTT \times \lambda$. Time of a round: $T_{round}$. Input rates: $r_1, r_2, \ldots, r_N$. Quanta: $quantum_1, quantum_2, \ldots, quantum_N$. Output rates: $\min(r_1, w_1 \alpha), \min(r_2, w_2 \alpha), \ldots, \min(r_N, w_N \alpha)$.

  48. Start from GPS Model. $N$ queues share the link with capacity $C$. $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$. $K_{queue}(i) = \frac{quantum_i}{T_{round}} \times RTT \times \lambda$. Time of a round: $T_{round}$. Input rates: $r_1, r_2, \ldots, r_N$. Quanta: $quantum_1, quantum_2, \ldots, quantum_N$. Output rates: $\min(r_1, w_1 \alpha), \min(r_2, w_2 \alpha), \ldots, \min(r_N, w_N \alpha)$.
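The threshold $K_{queue}(i) = \frac{quantum_i}{T_{round}} \times RTT \times \lambda$ can be explored numerically. The sketch below is an illustration under stated assumptions, not the switch implementation: it assumes every active queue sends its full quantum each round, so that $T_{round}$ is the time to transmit the sum of the active quanta at link rate. The function names, the quanta (6000, 3000, 1500 bytes), the 10 Gbps link, the 100 microsecond RTT, and $\lambda = 1.0$ are all hypothetical.

```python
def round_time_s(active_quanta_bytes, capacity_bps):
    """Assumed round time: each active queue sends its full quantum per round."""
    return sum(active_quanta_bytes) * 8 / capacity_bps


def mq_ecn_threshold_bytes(quantum_bytes, t_round_s, rtt_s, lam):
    """K_queue(i) = quantum_i / T_round * RTT * lambda, in bytes."""
    return quantum_bytes / t_round_s * rtt_s * lam


C_BPS, RTT_S, LAM = 10e9, 100e-6, 1.0   # hypothetical link parameters
QUANTA = [6000, 3000, 1500]             # hypothetical quanta in bytes

# Only the smallest queue is active: its rounds are short, threshold is large.
t_alone = round_time_s([QUANTA[2]], C_BPS)
k_alone = mq_ecn_threshold_bytes(QUANTA[2], t_alone, RTT_S, LAM)

# All three queues are active: rounds are longer, thresholds shrink.
t_all = round_time_s(QUANTA, C_BPS)
k_all = [mq_ecn_threshold_bytes(q, t_all, RTT_S, LAM) for q in QUANTA]
```

Under these assumptions, a queue that is alone on the port gets the full standard threshold $C \times RTT \times \lambda$, while with all queues busy each threshold falls to its weighted share of that value, so the threshold follows the rate the queue is actually being served at.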

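  49. MQ-ECN. $N$ queues share the link with capacity $C$. $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$. $K_{queue}(i) = \frac{quantum_i}{T_{round}} \times RTT \times \lambda$.

  50. MQ-ECN. $K_{queue}(i) = \frac{quantum_i}{T_{round}} \times RTT \times \lambda$. Why does it work?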
