Enhancing Data Center Network Performance through Packet Scheduling and ECN
This presentation covers techniques for improving data center network performance: packet scheduling, Explicit Congestion Notification (ECN), and strict priority for different types of flows. It discusses applications' need for low latency for short messages and high throughput for large flows, and the transition from fixed-function to programmable switching chips for packet scheduling.
Presentation Transcript
Enabling ECN over Generic Packet Scheduling. Wei Bai, Kai Chen, Li Chen, Changhoon Kim, Haitao Wu. ACM CoNEXT, Irvine, CA, December 2016.
Data Centers Around the World: Google's worldwide DC map; Facebook DC interior; Microsoft's DC in Dublin, Ireland; global Microsoft Azure DC footprint.
Inside the Data Center (DC): Network requirements of applications: low latency for short messages; high throughput for large flows.
Inside the Data Center (DC): Network requirements of applications: low latency for short messages; high throughput for large flows. Network performance improvement: packet scheduling and ECN-based transport protocols, combined. ECN = Explicit Congestion Notification.
Packet Scheduling in Data Centers: inter-service traffic isolation. Round Robin with weights: real-time services 4, best-effort services 2, background services 1. Bai et al. (NSDI '16).
Packet Scheduling in Data Centers: flow scheduling. Priorities: high for (0, 100KB] flows, medium for (100KB, 10MB) flows, low for (10MB, ∞) flows. Diagram labels: Strict Priority, Round Robin. Bai et al. (NSDI '15).
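As a rough illustration of the size-based priority classes on this slide, the sketch below maps a flow size to a priority label. The function name, the constant names, the use of decimal kilobytes, and the handling of a flow of exactly 10MB are assumptions made for the example, not details taken from the talk.

```python
KB = 1000          # assumption: decimal units
MB = 1000 * KB

# Upper bounds of the classes shown on the slide; names are hypothetical.
PRIORITY_THRESHOLDS = [
    (100 * KB, "high"),    # (0, 100KB] flows
    (10 * MB, "medium"),   # (100KB, 10MB) flows
]

def priority_for(flow_bytes: int) -> str:
    """Return the priority class for a flow of the given size in bytes."""
    for upper_bound, label in PRIORITY_THRESHOLDS:
        if flow_bytes <= upper_bound:
            return label
    return "low"           # (10MB, inf) flows

if __name__ == "__main__":
    for size in (50 * KB, 1 * MB, 20 * MB):
        print(size, priority_for(size))
```

A strict-priority scheduler would then always serve the highest non-empty class first.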
Packet Scheduling in Data Centers: Strict Priority and Round Robin on existing fixed-function switching chips.
Packet Scheduling in Data Centers: Strict Priority and Round Robin on future programmable switching chips.
Packet Scheduling in Data Centers: programmable schedulers. Strict Priority, Round Robin, Push-In-First-Out (PIFO) queue. A. Sivaraman et al. (SIGCOMM '16).
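To make the PIFO idea concrete: a packet is pushed into the queue at a position given by its rank, and packets are only ever dequeued from the head. The sketch below models that behaviour in software with a binary heap. The class and method names are hypothetical, and a hardware PIFO would be built very differently.

```python
import heapq
import itertools

class PIFO:
    """Software model of a push-in-first-out queue (illustrative only)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # preserves FIFO order among equal ranks

    def push(self, rank, packet):
        # Lower rank means closer to the head of the queue.
        heapq.heappush(self._heap, (rank, next(self._seq), packet))

    def pop(self):
        # Dequeue always happens at the head.
        _rank, _seq, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self):
        return len(self._heap)

def drain(pifo):
    """Pop every packet and return them in departure order."""
    out = []
    while len(pifo):
        out.append(pifo.pop())
    return out
```

Using the priority class as the rank gives strict priority: pushing `b1` and `b2` with rank 1 and `a1` and `a2` with rank 0, in any interleaving, drains the rank-0 packets first.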
Can we enable ECN for arbitrary packet schedulers in data centers?
ECN/RED without Packet Scheduling: packets get marked when queue length $q > K$. (Diagram: mark above the threshold $K$, don't mark below it.)
ECN/RED without Packet Scheduling: packets get marked when queue length $q > K$. To achieve 100% throughput. (Plot: buffer occupancy over time against the threshold $K$.)
ECN/RED without Packet Scheduling: packets get marked when queue length $q > K$. To achieve 100% throughput: $K \ge C \times RTT \times \lambda$. Small number of concurrent large flows in DC. M. Alizadeh et al. (SIGCOMM '10).
ECN/RED without Packet Scheduling: packets get marked when queue length $q > K$. To achieve 100% throughput: $K \ge C \times RTT \times \lambda$. $C$: fixed link capacity.
ECN/RED without Packet Scheduling: packets get marked when queue length $q > K$. To achieve 100% throughput: $K \ge C \times RTT \times \lambda$. $RTT$: base round-trip time, relatively stable in DC. Wu et al. (CoNEXT '12).
ECN/RED without Packet Scheduling: packets get marked when queue length $q > K$. To achieve 100% throughput: $K \ge C \times RTT \times \lambda$. $\lambda$: determined by congestion control algorithms.
ECN/RED without Packet Scheduling: packets get marked when queue length $q > K$. To achieve 100% throughput: $K \ge C \times RTT \times \lambda$. $K$: standard queue length threshold, a static value in the data center environment.
ECN/RED without Packet Scheduling: packets get marked when queue length $q > K$. To achieve 100% throughput, use the static threshold $K$, which is easy to configure at the switch.
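As a worked example of the static threshold, reading it as K = C × RTT × λ, the sketch below converts link capacity, base RTT, and λ into a queue length in bytes. The function name and the numbers used (10 Gbps, 100 µs, λ = 1) are illustrative assumptions. The talk only says that λ is determined by the congestion control algorithm.

```python
def standard_threshold_bytes(capacity_bps: float, base_rtt_s: float, lam: float) -> float:
    """Static marking threshold K = C * RTT * lambda, expressed in bytes."""
    capacity_bytes_per_s = capacity_bps / 8.0
    return capacity_bytes_per_s * base_rtt_s * lam

if __name__ == "__main__":
    # Hypothetical values: 10 Gbps link, 100 microsecond base RTT, lambda = 1.
    k = standard_threshold_bytes(10e9, 100e-6, 1.0)
    print(f"K is about {k:.0f} bytes")
```

With these assumed inputs the threshold comes out at roughly 125 KB. A smaller λ scales it down proportionally.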
ECN/RED with Packet Scheduling: each queue is a link with varying capacity. Ideal ECN/RED solution: packets should get marked if the length of queue $i$ satisfies $q_i > K_i = C_i \times RTT \times \lambda$, where $K_i$ is the dynamic per-queue threshold and $C_i$ is the varying capacity.
ECN/RED with Packet Scheduling: each queue is a link with varying capacity. Ideal ECN/RED solution: packets should get marked if the length of queue $i$ satisfies $q_i > K_i = C_i \times RTT \times \lambda$. Not supported by current switching chips. Current practice: configure static thresholds $K_i = C \times RTT \times \lambda$. High throughput but poor latency.
To Implement Ideal ECN/RED Solution: a general way to estimate the queue capacity. Queue capacity = queue departure rate while the queue stays non-empty. Leverage the solution from PIE (HPSR '13): start measurement when the number of bytes in the switch buffer > dq_thresh, then get the rate to drain dq_thresh bytes.
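The measurement procedure described on this slide can be sketched roughly as follows: once the backlog exceeds dq_thresh, start a timer, count departing bytes, and report the rate when dq_thresh bytes have drained. This is a simplified illustration, not PIE's actual code. The class and method names are hypothetical, rate smoothing is omitted, and the sketch does not abort a measurement if the queue runs empty partway through.

```python
class DepartureRateEstimator:
    """Simplified PIE-style departure-rate estimator for one queue."""

    def __init__(self, dq_thresh_bytes):
        self.dq_thresh = dq_thresh_bytes
        self.measuring = False
        self.start_time = 0.0
        self.drained = 0
        self.rate = None  # bytes per second; None until the first sample

    def on_departure(self, now, pkt_bytes, backlog_bytes):
        """Call when a packet finishes departing.

        now: current time in seconds.
        backlog_bytes: bytes still queued after this departure.
        Returns the latest rate sample, or None if there is none yet.
        """
        if self.measuring:
            self.drained += pkt_bytes
            elapsed = now - self.start_time
            if self.drained >= self.dq_thresh and elapsed > 0:
                self.rate = self.drained / elapsed
                self.measuring = False
        if not self.measuring and backlog_bytes > self.dq_thresh:
            # Enough backlog to stay busy: start a new measurement cycle.
            self.measuring = True
            self.start_time = now
            self.drained = 0
        return self.rate
```

For instance, with 1500-byte packets leaving every 1.2 µs and dq_thresh set to 3000 bytes, the first sample appears after two further departures and equals 1.25e9 bytes per second, which is 10 Gbps.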
Trade-off of Measurement Window. (Diagram: sequence of packets transmitted from queue 1 and queue 2 on a link of capacity $C$.) Queues 1 and 2 stay non-empty during the transmission.
Trade-off of Measurement Window. (Diagram: sequence of packets transmitted from queue 1 and queue 2 on a link of capacity $C$.) Queue 1 capacity = queue 2 capacity = $0.5C$.
Trade-off of Measurement Window: a measurement window that is too small, e.g., dq_thresh = 3 MTU. Sample rates of queue 1: $C$, $\frac{3}{7}C$, $C$, $\frac{3}{7}C$. (Diagram: sequence of packets on a link of capacity $C$.)
Trade-off of Measurement Window: a measurement window that is too small degrades measurement accuracy. Sample rates of queue 1: $C$, $\frac{3}{7}C$, $C$, $\frac{3}{7}C$. (Diagram: sequence of packets on a link of capacity $C$.)
Trade-off of Measurement Window: a measurement window that is too small degrades measurement accuracy. A measurement window that is too large, e.g., dq_thresh = 20 MTU. (Diagram: sequence of packets on a link of capacity $C$.)
Trade-off of Measurement Window: a measurement window that is too small degrades measurement accuracy. A measurement window that is too large cannot efficiently capture dynamic changes. (Diagram: sequence of packets on a link of capacity $C$.)
Trade-off of Measurement Window: a measurement window that is too small degrades measurement accuracy. A measurement window that is too large cannot efficiently capture dynamic changes. Rate measurement is non-trivial.
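The trade-off can be reproduced with a toy model. The sketch below measures how fast a queue drains a fixed number of packets on a slotted link, where each slot carries one MTU-sized packet. The schedule used here (the two queues alternating in bursts of four packets, so each gets 0.5C on average) is a hypothetical pattern chosen for illustration, not the one drawn on the slides, and the function name is made up.

```python
from fractions import Fraction

def sample_rates(schedule, queue_id, window_pkts):
    """Rate samples for one queue, as fractions of the link capacity.

    schedule[i] is the id of the queue that transmits in slot i.
    Each sample is window_pkts divided by the number of slots needed to
    drain that many packets; a new window starts when the previous ends.
    """
    samples = []
    window_start = 0
    count = 0
    for slot, qid in enumerate(schedule):
        if qid != queue_id:
            continue
        count += 1
        if count == window_pkts:
            elapsed_slots = slot + 1 - window_start
            samples.append(Fraction(window_pkts, elapsed_slots))
            window_start = slot + 1
            count = 0
    return samples

# Hypothetical schedule: queues 1 and 2 alternate in bursts of 4 packets.
SCHEDULE = ([1] * 4 + [2] * 4) * 10

if __name__ == "__main__":
    print("window = 3 MTU: ", sample_rates(SCHEDULE, 1, 3)[:4])
    print("window = 20 MTU:", sample_rates(SCHEDULE, 1, 20))
```

With a 3-packet window the samples jump between C and 3/7 C even though the true share is 0.5C. A 20-packet window gives samples of 5/9 C and 1/2 C, which are much closer, but each sample takes 36 to 40 slots to produce, so a change in the queue's share would be noticed late.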
Another View: ideal ECN/RED solution. Packets should get marked if $q_i > C_i \times RTT \times \lambda$, where $q_i$ is the queue length and $C_i$ is the varying capacity.
Another View: ideal ECN/RED solution. Packets should get marked if $\frac{q_i}{C_i} > RTT \times \lambda$, where $\frac{q_i}{C_i}$ is the sojourn time and $C_i$ is the varying capacity.
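The step between the two views is a single division. Writing $q_i$ for the length of queue $i$ and $C_i$ for its current capacity, and assuming $C_i > 0$, the queue-length condition and the sojourn-time condition are equivalent:

```latex
\begin{align*}
q_i &> C_i \times RTT \times \lambda \\
\iff \quad \frac{q_i}{C_i} &> RTT \times \lambda \qquad (C_i > 0)
\end{align*}
```

The left-hand side $q_i / C_i$ is the time needed to drain the current backlog at the current rate, which is the sojourn time of a packet arriving at the tail. The right-hand side no longer depends on $C_i$, so the switch can compare against one static value without measuring the queue's rate.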
TCN (Time-based Congestion Notification): packets should get marked if their sojourn times > $RTT \times \lambda$.
TCN in Detail: sojourn time measurement. Enqueue: attach metadata to each packet to store the enqueue time $T_{eq}$.
TCN in Detail: sojourn time measurement. Enqueue: attach metadata to each packet to store the enqueue time $T_{eq}$. Dequeue: calculate the sojourn time, sojourn time = now - $T_{eq}$.
TCN in Detail: sojourn time measurement. Enqueue: attach metadata to each packet to store the enqueue time. Dequeue: calculate the sojourn time. 2B-long metadata is enough for DC.
TCN in Detail: sojourn time measurement. Enqueue: attach metadata to each packet to store the enqueue time. Dequeue: calculate the sojourn time. Instantaneous ECN marking: compare the per-packet instantaneous sojourn time with a static threshold ($RTT \times \lambda$). Stateless data plane algorithm.
TCN in Detail: sojourn time measurement. Enqueue: attach metadata to each packet to store the enqueue time. Dequeue: calculate the sojourn time. Instantaneous ECN marking: compare the per-packet instantaneous sojourn time with a static threshold ($RTT \times \lambda$). Marking does not cause any bubble on the link.
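The enqueue and dequeue steps can be sketched in a few lines. This is a minimal illustration assuming a 16-bit timestamp (the 2B metadata mentioned in the slides) and a hypothetical tick unit. The class name, the dict-based packet, and the `ce` field are invented for the example and are not taken from the authors' prototype. A real switch would also set the mark only on ECN-capable packets.

```python
from collections import deque

TS_BITS = 16                      # 2-byte enqueue timestamp
TS_MASK = (1 << TS_BITS) - 1

class TCNQueue:
    """FIFO queue that marks packets by per-packet sojourn time."""

    def __init__(self, threshold_ticks):
        # Static threshold (RTT x lambda) in clock ticks; the tick length
        # is an assumption of this sketch.
        self.threshold = threshold_ticks
        self._q = deque()

    def enqueue(self, packet, now_ticks):
        # Store the truncated enqueue time alongside the packet.
        self._q.append((packet, now_ticks & TS_MASK))

    def dequeue(self, now_ticks):
        packet, t_eq = self._q.popleft()
        # Modular subtraction stays correct across one timer wrap,
        # provided the sojourn time is below 2**16 ticks.
        sojourn = (now_ticks - t_eq) & TS_MASK
        if sojourn > self.threshold:
            packet["ce"] = True   # congestion experienced
        return packet
```

No state is kept across packets: the marking decision depends only on the dequeued packet's own timestamp and the static threshold.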
TCN vs CoDel: advantages of TCN. Stateless: cheaper to implement in hardware. Instantaneous: faster reaction to bursty traffic.
TCN vs CoDel: advantages of TCN. Stateless: cheaper to implement in hardware. Instantaneous: faster reaction to bursty traffic. The simplicity of TCN rests on the unique characteristics of data centers.
TCN vs CoDel: advantages of TCN. Stateless: cheaper to implement in hardware. Instantaneous: faster reaction to bursty traffic. The simplicity of TCN rests on: a small number of concurrent large flows; relatively stable RTTs; prior knowledge of the transport at the end host.
Testbed Evaluation: TCN software prototype: Linux qdisc kernel module on a multi-NIC server. Testbed setup: 9 servers connected to a software switch; end hosts use DCTCP as the transport protocol. ECN schemes compared: per-queue RED with the standard threshold; CoDel.
Static Flow Experiment (diagram): SP/WFQ scheduling with three queues: Q1 (high priority), Q2 (w=1, low), Q3 (w=1, low). Flows shown: 1 flow (500 Mbps), 1 flow, and 4 flows.
Static Flow Experiment: TCN preserves the scheduling policy.
Dynamic Flow Experiment: 8 senders to 1 receiver (web search workload); SP/WFQ scheduling policy at the switch. Traffic: (0, 100KB] flows of all services at high priority; (100KB, ∞) flows of services 1, 2, 3, and 4, each with w=1 (low).
99th-percentile FCT of Small Flows (<100KB): TCN maintains low buffer occupancy.
Realistic Traffic, Large Flows (>10MB): TCN achieves high throughput.
Conclusion: TCN is a simple ECN solution for data centers. Use sojourn time as the congestion signal (CoDel). Perform instantaneous ECN marking (DCTCP). Code: http://sing.cse.ust.hk/projects/TCN Next step: TCN in programmable hardware.
Thanks!
Average FCT of Small Flows (<100KB).