Enhancing Data Center Performance with ECN-Based Transports
This study focuses on enabling Explicit Congestion Notification (ECN) in multi-service, multi-queue data centers, where many services with diverse network requirements share the same network. ECN-based transports such as DCTCP and DCQCN achieve high throughput and low latency: end-hosts react to ECN signals by adjusting their sending rates, and switches mark packets according to Active Queue Management (AQM) policies. ECN-aware switches adopt Random Early Detection (RED) for marking and can track buffer occupancy per queue, per port, or per service pool.
Presentation Transcript
Enabling ECN in Multi-Service Multi-Queue Data Centers Wei Bai, Li Chen, Kai Chen, Haitao Wu (Microsoft) SING Group @ Hong Kong University of Science and Technology 1
Background Data Centers Many services with diverse network requirements 2
Background Data Centers Many services with diverse network requirements ECN-based Transports ECN = Explicit Congestion Notification 3
Background Data Centers Many services with diverse network requirements ECN-based Transports Achieve high throughput & low latency Widely deployed: DCTCP, DCQCN, etc. 4
ECN-based Transports ECN-enabled end-hosts React to ECN by adjusting sending rates 6
ECN-based Transports ECN-enabled end-hosts React to ECN by adjusting sending rates ECN-aware switches Perform ECN marking based on Active Queue Management (AQM) policies 7
ECN-based Transports ECN-enabled end-hosts React to ECN by adjusting sending rates ECN-aware switches Perform ECN marking based on Active Queue Management (AQM) policies Our focus 8
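As a rough illustration of how an ECN-enabled end-host reacts to marks, the sketch below follows the window update described in the DCTCP paper: the sender keeps a running estimate `alpha` of the fraction of marked packets and shrinks its window in proportion to it. The function name, the once-per-window calling convention, and the additive increase of one packet are simplifying assumptions made here, not something the slides specify.

```python
def dctcp_update(cwnd, alpha, acked, marked, g=1 / 16):
    """Return (cwnd, alpha) after one window of ACKs.

    cwnd   -- congestion window in packets
    alpha  -- running estimate of the fraction of ECN-marked packets
    acked  -- packets acknowledged in the last window
    marked -- how many of those carried an ECN echo
    g      -- EWMA gain (1/16 is the value suggested in the DCTCP paper)
    """
    frac = marked / acked if acked else 0.0
    alpha = (1 - g) * alpha + g * frac
    if marked:
        # Cut the window in proportion to the extent of congestion.
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    else:
        # Assumption: plain additive increase of one packet per window.
        cwnd += 1.0
    return cwnd, alpha


cwnd, alpha = dctcp_update(100.0, 0.0, acked=100, marked=0)
cwnd, alpha = dctcp_update(cwnd, alpha, acked=100, marked=100)
print(cwnd, alpha)
```

Because the cut scales with `alpha`, a few marks cause only a mild reduction, while persistent marking approaches the halving of standard TCP.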
ECN-aware Switches Adopt RED to perform ECN marking RED = Random Early Detection 9
ECN-aware Switches Adopt RED to perform ECN marking Per-queue/port/service-pool ECN/RED Track buffer occupancy of different egress entities 10
ECN-aware Switches Adopt RED to perform ECN marking Per-queue/port/service-pool ECN/RED [Figure: queue 1 and queue 2 on one port] 11
ECN-aware Switches Adopt RED to perform ECN marking Per-queue/port/service-pool ECN/RED [Figure: shared buffer; queue 1 and queue 2 on one port; queue 3 and queue 4 on another port] 13
ECN-aware Switches Adopt RED to perform ECN marking Per-queue/port/service-pool ECN/RED Leverage multiple queues to classify traffic Isolate traffic from different services/applications 14
ECN-aware Switches Adopt RED to perform ECN marking Per-queue/port/service-pool ECN/RED Leverage multiple queues to classify traffic Isolate traffic from different services/applications (e.g., services running DCTCP, services running TCP, services running UDP) 15
ECN-aware Switches Adopt RED to perform ECN marking Per-queue/port/service-pool ECN/RED Leverage multiple queues to classify traffic Isolate traffic from different services/applications (e.g., real-time services, best-effort services, background services) 16
ECN-aware Switches Adopt RED to perform ECN marking Per-queue/port/service-pool ECN/RED Leverage multiple queues to classify traffic Isolate traffic from different services/applications Weighted max-min fair sharing among queues (weight = 4: real-time services; weight = 2: best-effort services; weight = 1: background services) 17
ECN-aware Switches Adopt RED to perform ECN marking Per-queue/port/service-pool ECN/RED Leverage multiple queues to classify traffic Isolate traffic from different services/applications Weighted max-min fair sharing among queues Perform ECN marking in multi-queue context 18
ECN marking with Single Queue RED Algorithm 20
ECN marking with Single Queue RED Algorithm Practical Configuration (e.g., DCTCP) 21
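The RED marking rule and the DCTCP-style configuration can be sketched as follows. This is a minimal sketch: the parameter names `k_min`, `k_max`, and `p_max` are chosen here for illustration, and it uses the instantaneous queue length, whereas classic RED operates on an averaged queue length. Setting both thresholds to a single value `K` with a maximum probability of 1 gives the step-function marking that DCTCP recommends.

```python
import random


def red_mark_probability(qlen, k_min, k_max, p_max):
    """Marking probability for a queue holding `qlen` packets."""
    if qlen <= k_min:
        return 0.0
    if qlen >= k_max:
        return 1.0
    # Linear ramp between the two thresholds.
    return p_max * (qlen - k_min) / (k_max - k_min)


def should_mark(qlen, k, rng=random.random):
    """DCTCP-style configuration: k_min = k_max = k, so marking is a step at k."""
    return rng() < red_mark_probability(qlen, k, k, 1.0)


print(red_mark_probability(40, 20, 60, 0.1))
print(should_mark(66, 65), should_mark(65, 65))
```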
ECN marking with Single Queue To achieve 100% throughput: $K \ge C \times RTT \times \lambda$ 22
ECN marking with Single Queue To achieve 100% throughput: $K \ge C \times RTT \times \lambda$, where $\lambda$ is determined by congestion control algorithms 23
ECN marking with Single Queue To achieve 100% throughput: $K \ge C \times RTT \times \lambda$, where $C \times RTT \times \lambda$ is the standard ECN marking threshold 24
ECN marking with Single Queue To achieve 100% throughput: $K \ge C \times RTT \times \lambda$ The standard threshold is relatively stable in DCN, e.g., 65 packets for a 10G network (DCTCP paper) 25
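To make the standard threshold concrete, here is a small calculation of capacity times round-trip time times λ, expressed in packets. The 100 µs round-trip time and the 1500-byte packet size are assumptions picked for illustration, and λ = 1/7 echoes the analytical bound K > C × RTT / 7 from the DCTCP paper; none of these values come from the slides.

```python
def standard_threshold_packets(capacity_bps, rtt_s, lam, packet_bytes=1500):
    """Standard ECN marking threshold, C x RTT x lambda, in packets."""
    bdp_bytes = capacity_bps * rtt_s / 8  # bandwidth-delay product in bytes
    return lam * bdp_bytes / packet_bytes


# Assumed values: 10 Gbps link, 100 us RTT, lambda = 1/7.
k = standard_threshold_packets(10e9, 100e-6, 1 / 7)
print(round(k, 1))
```

Under these assumptions the bound comes out at roughly 12 packets. The 65-packet figure quoted on the slide is the DCTCP paper's practical recommendation for 10G, which sits well above this analytical minimum.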
ECN marking with Multi-Queue (1) Per-queue with the standard threshold: $K_{queue(i)} = C \times RTT \times \lambda$ [Figure: queue 1, queue 2, and queue 3 on one port, each compared against the standard threshold; labels: Don't mark, Mark] 27
ECN marking with Multi-Queue (1) Per-queue with the standard threshold: $K_{queue(i)} = C \times RTT \times \lambda$ Increase packet latency [Figure: queue 1, queue 2, and queue 3 on one port, each compared against the standard threshold; labels: Don't mark, Mark] 28
ECN marking with Multi-Queue (1) Per-queue with the standard threshold: $K_{queue(i)} = C \times RTT \times \lambda$ Increase packet latency [Experiment: evenly classify 8 long-lived flows into a varying number of queues] 29
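A back-of-the-envelope estimate shows why giving every queue the full standard threshold increases latency: if each active queue can build up to about K packets before marking starts, the port backlog grows with the number of active queues. The sketch below assumes every active queue fills to exactly K, uses the 65-packet threshold from the slides, and takes 1500-byte packets on a 10 Gbps link as illustrative values.

```python
def backlog_delay_us(active_queues, k_packets=65, packet_bytes=1500,
                     capacity_bps=10e9):
    """Time to drain the port when every active queue holds k_packets."""
    backlog_bits = active_queues * k_packets * packet_bytes * 8
    return backlog_bits / capacity_bps * 1e6  # microseconds


for n in (1, 2, 4, 8):
    print(n, "queues:", round(backlog_delay_us(n)), "us")
```

With these numbers, one active queue implies about 78 µs of queueing delay at the threshold and eight active queues about 624 µs, which is the trend the experiment on the slide varies the number of queues to expose.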
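ECN marking with Multi-Queue (2) Per-queue with the minimum threshold: $K_{queue(i)} = \frac{w_i}{\sum_j w_j} \times C \times RTT \times \lambda$, where $\frac{w_i}{\sum_j w_j}$ is the normalized weight [Figure: queue 1, queue 2, and queue 3 on one port, each compared against the minimum threshold; labels: Mark, Don't mark] 30
ECN marking with Multi-Queue (2) Per-queue with the minimum threshold: $K_{queue(i)} = \frac{w_i}{\sum_j w_j} \times C \times RTT \times \lambda$ Degrade throughput [Figure: queue 1, queue 2, and queue 3 on one port, each compared against the minimum threshold; labels: Mark, Don't mark] 31
ECN marking with Multi-Queue (2) Per-queue with the minimum threshold: $K_{queue(i)} = \frac{w_i}{\sum_j w_j} \times C \times RTT \times \lambda$ Degrade throughput [Figure panels: Overall Average FCT; Average FCT (>10MB)] 32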
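ECN marking with Multi-Queue (3) Per-port: $K_{port} = C \times RTT \times \lambda$ [Figure: queue 1, queue 2, and queue 3 on one port, with the total port occupancy compared against the standard threshold; labels: Don't mark, Mark] 33
ECN marking with Multi-Queue (3) Per-port: $K_{port} = C \times RTT \times \lambda$ Violate weighted fair sharing [Figure: queue 1, queue 2, and queue 3 on one port, with the total port occupancy compared against the standard threshold; labels: Don't mark, Mark] 34
ECN marking with Multi-Queue (3) Per-port: $K_{port} = C \times RTT \times \lambda$ Violate weighted fair sharing [Experiment: each of the two services has a dedicated queue of equal weight on the switch] 35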
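The three candidate schemes can be compared side by side with a toy marking decision. In this sketch each function returns, per queue, whether a packet arriving at that queue would be marked. The occupancies and weights are made-up inputs, `k` is the standard threshold in packets, and the function names are chosen here for illustration.

```python
def per_queue_standard(qlens, weights, k):
    """Scheme (1): every queue uses the full standard threshold."""
    return [q > k for q in qlens]


def per_queue_minimum(qlens, weights, k):
    """Scheme (2): every queue uses its normalized-weight share of k."""
    total = sum(weights)
    return [q > k * w / total for q, w in zip(qlens, weights)]


def per_port(qlens, weights, k):
    """Scheme (3): mark in all queues once the port total exceeds k."""
    congested = sum(qlens) > k
    return [congested] * len(qlens)


weights, k = [1, 1, 1], 65
busy = [60, 10, 0]   # the port holds 70 packets in total
alone = [30, 0, 0]   # only queue 1 is active

print(per_queue_standard(busy, weights, k))  # nobody marked despite 70 > 65
print(per_port(busy, weights, k))            # lightly loaded queue 2 marked too
print(per_queue_minimum(alone, weights, k))  # lone queue marked at 30 packets
```

The three printed cases line up with the three drawbacks on the slides: scheme (1) lets the port backlog exceed the standard threshold, scheme (3) penalizes a queue that is below its share, and scheme (2) throttles a queue that has the whole link to itself.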
Question Can we design an ECN marking scheme with the following properties: deliver low latency; achieve high throughput; preserve weighted fair sharing; remain compatible with legacy ECN/RED implementations? 36
Question Can we design an ECN marking scheme with the following properties: deliver low latency; achieve high throughput; preserve weighted fair sharing; remain compatible with legacy ECN/RED implementations? Our answer: MQ-ECN 37
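Start from GPS Model $N$ queues share the link with capacity $C$ Generalized Processor Sharing (GPS) [Figure: queues 1, 2, ..., N feeding a link of capacity $C$] 39
Start from GPS Model $N$ queues share the link with capacity $C$ Input rate of queue $i$: $r_i$ [Figure: queues 1, 2, ..., N feeding a link of capacity $C$] 40
Start from GPS Model $N$ queues share the link with capacity $C$ Input rate of queue $i$: $r_i$ Weight of queue $i$: $w_i$ [Figure: queues 1, 2, ..., N feeding a link of capacity $C$] 41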
Start from GPS Model $N$ queues share the link with capacity $C$ $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$, where $\alpha$ is the weighted fair share rate Input rate of queue $i$: $r_i$ Weight: $w_i$ Output rate: $\min(r_i, w_i \alpha)$ 42
Start from GPS Model $N$ queues share the link with capacity $C$ $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$ Use ECN to throttle queue $i$ if $r_i > w_i \alpha$ Input rate of queue $i$: $r_i$ Weight: $w_i$ Output rate: $\min(r_i, w_i \alpha)$ 43
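The weighted fair share rate in the GPS model is the value for which the per-queue output rates, each the smaller of the queue's input rate and its weight times the fair share rate, add up to the link capacity. One common way to find it is water-filling, sketched below. The function name and the example rates are hypothetical, and returning infinity when the link is not saturated is a convention chosen here.

```python
def fair_share_rate(rates, weights, capacity):
    """Find alpha such that sum(min(r_i, w_i * alpha)) == capacity."""
    if sum(rates) <= capacity:
        return float("inf")  # link not saturated: no queue needs throttling
    # Visit queues from the least demanding (per unit weight) upwards.
    order = sorted(range(len(rates)), key=lambda i: rates[i] / weights[i])
    remaining_cap, remaining_w = capacity, sum(weights)
    for i in order:
        alpha = remaining_cap / remaining_w
        if rates[i] / weights[i] <= alpha:
            # Queue i is satisfied; hand its unused share to the others.
            remaining_cap -= rates[i]
            remaining_w -= weights[i]
        else:
            return alpha
    return float("inf")


rates, weights = [1.0, 7.0, 8.0], [4, 2, 1]  # Gbps, hypothetical demands
alpha = fair_share_rate(rates, weights, capacity=10.0)
outputs = [min(r, w * alpha) for r, w in zip(rates, weights)]
throttle = [r > w * alpha for r, w in zip(rates, weights)]
print(alpha, outputs, throttle)
```

In this example the first queue asks for less than its share, so only the second and third queues exceed their weighted share and are the ones ECN should throttle.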
Start from GPS Model $N$ queues share the link with capacity $C$ $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$ $K_{queue(i)} = w_i \alpha \times RTT \times \lambda$ Input rate of queue $i$: $r_i$ Weight: $w_i$ Output rate: $\min(r_i, w_i \alpha)$ 44
Start from GPS Model $N$ queues share the link with capacity $C$ $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$ $K_{queue(i)} = w_i \alpha \times RTT \times \lambda$ GPS served by bit-by-bit round robin Input rate of queue $i$: $r_i$ Weight: $w_i$ Output rate: $\min(r_i, w_i \alpha)$ 45
Start from GPS Model $N$ queues share the link with capacity $C$ $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$ $K_{queue(i)} = w_i \alpha \times RTT \times \lambda$ GPS served by bit-by-bit round robin Input rate of queue $i$: $r_i$ Quantum: $w_i$ bits Output rate: $\min(r_i, w_i \alpha)$ 46
Start from GPS Model $N$ queues share the link with capacity $C$ $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$ $K_{queue(i)} = w_i \alpha \times RTT \times \lambda$ Time of a round: $T_{round}$ Input rate of queue $i$: $r_i$ Quantum: $quantum_i$ Output rate: $\min(r_i, w_i \alpha)$ 47
Start from GPS Model $N$ queues share the link with capacity $C$ $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$ $K_{queue(i)} = \frac{quantum_i}{T_{round}} \times RTT \times \lambda$ Time of a round: $T_{round}$ Input rate of queue $i$: $r_i$ Quantum: $quantum_i$ Output rate: $\min(r_i, w_i \alpha)$ 48
MQ-ECN $N$ queues share the link with capacity $C$ $C = \sum_{i=1}^{N} \min(r_i, w_i \alpha)$ $K_{queue(i)} = \frac{quantum_i}{T_{round}} \times RTT \times \lambda$ 49
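The MQ-ECN threshold, read here as the queue's quantum divided by the round time, multiplied by RTT and λ, can be sketched in a few lines. The model below is idealized and rests on assumptions not stated on the slides: the round time is taken as the time to serve one quantum from every active queue, the per-queue rate is capped at the link capacity so that no threshold exceeds the standard one, and the standard threshold is passed in directly as `k_standard` (standing for capacity times RTT times λ) rather than computed from RTT and λ.

```python
def mq_ecn_thresholds(quanta_bytes, active, capacity_bps, k_standard):
    """Per-queue marking thresholds, in the same unit as k_standard."""
    # Idealized round time: one quantum from every active queue.
    round_bits = 8 * sum(q for q, a in zip(quanta_bytes, active) if a)
    t_round = round_bits / capacity_bps
    thresholds = []
    for q in quanta_bytes:
        if t_round == 0:
            rate_bps = capacity_bps  # idle port: fall back to the full link
        else:
            # Assumed cap: a queue is never credited with more than the link.
            rate_bps = min(8 * q / t_round, capacity_bps)
        thresholds.append(rate_bps / capacity_bps * k_standard)
    return thresholds


quanta = [6000, 3000, 1500]  # bytes, i.e. weights 4:2:1 (illustrative)
all_busy = mq_ecn_thresholds(quanta, [True, True, True], 10e9, 65)
one_busy = mq_ecn_thresholds(quanta, [False, False, True], 10e9, 65)
print([round(k, 1) for k in all_busy], round(sum(all_busy), 1))
print([round(k, 1) for k in one_busy])
```

When all three queues are busy the thresholds split the standard 65 packets in a 4:2:1 ratio, so the total backlog stays near the single-queue value. When only one queue is busy it gets the full standard threshold, which avoids the throughput loss of a fixed minimum threshold.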
MQ-ECN $K_{queue(i)} = \frac{quantum_i}{T_{round}} \times RTT \times \lambda$ Why does it work? 50