Data Center TCP (DCTCP) Analysis and Solutions

Explore the research on Data Center TCP (DCTCP) presented by Mohammad Alizadeh, Albert Greenberg, and others. The content covers the problem statement, design goals, DCTCP algorithm, analysis, evaluation, conclusions, and more. Learn about the challenges faced by TCP in data centers, incast issues, case studies like Microsoft Bing, workload characteristics, deadlines, worker nodes, and impairments. Discover innovative solutions and insights into optimizing data center packet transport.

  • Data Center TCP
  • DCTCP
  • Mohammad Alizadeh
  • Albert Greenberg
  • Cloud Computing

Presentation Transcript


  1. Data Center TCP (DCTCP). Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye, Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, Murari Sridharan. Presented and adapted by Yikai Lin. EECS 582 W16

  2. Outline: 1. Problem Statement; 2. Design Goals; 3. DCTCP Algorithm; 4. Analysis; 5. Evaluation; 6. Conclusions; 7. Q&A

  3. Data Center Packet Transport. Cloud computing service providers: Amazon, Microsoft, Google. Transport inside the DC: TCP rules (99.9% of traffic). How's TCP doing?

  4. TCP in the Data Center. We'll see that TCP does not meet the demands of apps. Incast: suffers from bursty packet drops. Builds up large queues: adds significant latency and wastes precious buffers, which is especially bad with shallow-buffered switches. Operators work around TCP problems with ad-hoc, inefficient, often expensive solutions. Our solution: Data Center TCP.

  5. Case Study: Microsoft Bing. Measurements from a 6000-server production cluster. Instrumentation passively collects logs: application-level, socket-level, and selected packet-level. More than 150TB of compressed data over a month.

  6. Workloads. Partition/Aggregate (Query): delay-sensitive. Short messages [50KB-1MB] (Coordination, Control state): delay-sensitive. Large flows [1MB-50MB] (Data update): throughput-sensitive.

  7. Time is money: strict deadlines (SLAs). Missed deadline: lower quality result. TLA: deadline = 250ms. MLA: deadline = 50ms. Worker nodes: deadline = 10ms. [Figure: the example query "Picasso" passes from the TLA through the MLAs to the worker nodes, which return Picasso quotes as results.]

  8. Impairments: Incast, Queue Buildup, Buffer Pressure.

  9. Incast. Synchronized mice collide. Caused by Partition/Aggregate. [Figure: Workers 1-4 send to the Aggregator; one response hits a TCP timeout, RTOmin = 300 ms.]

  10. Queue Buildup. Measurements in the Bing cluster: for 90% of packets, RTT < 1ms; for 10% of packets, 1ms < RTT < 15ms. [Figure: Sender 1 and Sender 2 sending to the Receiver.]

  11. Data Center Transport Requirements. 1. High Burst Tolerance: incast due to Partition/Aggregate is common. 2. Low Latency: short flows, queries. 3. High Throughput: large file transfers.

  12. Balance Between Requirements: High Throughput, Low Latency, High Burst Tolerance. Deep buffers: queuing delays increase latency. Shallow buffers: bad for bursts and throughput. Reduced RTOmin (SIGCOMM '09): doesn't help latency. AQM (RED): average queue is not fast enough for incast. Objective (DCTCP): low queue occupancy and high throughput.
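The point about RED's averaged queue being too slow for incast can be illustrated with a small sketch. It assumes RED-style averaging is an exponentially weighted moving average of the queue length; the weight 0.002, the threshold of 20 packets, and the burst shape are illustrative assumptions, and the function name is hypothetical.

```python
def first_mark_step(queue_samples, threshold, weight=None):
    """Return the 1-based sample index at which marking starts, or None.

    weight=None marks on the instantaneous queue length; otherwise the
    decision uses an EWMA of the queue with the given weight (RED-style).
    """
    avg = 0.0
    for step, q in enumerate(queue_samples, start=1):
        if weight is None:
            signal = q
        else:
            avg = (1 - weight) * avg + weight * q
            signal = avg
        if signal > threshold:
            return step
    return None


# Assumed burst: the queue jumps to 100 packets and stays there.
burst = [100] * 500
K = 20  # assumed marking threshold, in packets

print("instantaneous marking starts at sample", first_mark_step(burst, K))
print("averaged marking starts at sample", first_mark_step(burst, K, weight=0.002))
```

Under these assumptions the instantaneous rule reacts on the first sample of the burst, while the averaged rule needs many more samples before it crosses the same threshold.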

  13. Review: The TCP/ECN Control Loop. ECN = Explicit Congestion Notification. [Figure: Sender 1 and Sender 2 sending to the Receiver; ECN mark (1 bit).]

  14. Two Key Ideas. 1. React in proportion to the extent of congestion, not its presence. Reduces variance in sending rates, lowering queuing requirements. ECN marks 1 0 1 1 1 1 0 1 1 1: TCP cuts window by 50%, DCTCP cuts window by 40%. ECN marks 0 0 0 0 0 0 0 0 0 1: TCP cuts window by 50%, DCTCP cuts window by 5%. 2. Mark based on instantaneous queue length. Fast feedback to better deal with bursts.
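The two mark sequences on the Two Key Ideas slide can be checked with a few lines of Python. This is a minimal sketch, assuming the DCTCP cut is half of the marked fraction and taking that fraction directly from the ten marks shown rather than from a running average; the function names are hypothetical.

```python
def tcp_cut(marks):
    """Standard TCP/ECN: halve the window if any packet was marked."""
    return 0.5 if any(marks) else 0.0


def dctcp_cut(marks):
    """DCTCP-style cut: half of the marked fraction (assumed, no averaging)."""
    fraction = sum(marks) / len(marks)
    return fraction / 2


heavy = [1, 0, 1, 1, 1, 1, 0, 1, 1, 1]  # 8 of 10 packets marked
light = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # 1 of 10 packets marked

for name, marks in (("heavy", heavy), ("light", light)):
    print(f"{name}: TCP cuts {tcp_cut(marks):.0%}, DCTCP cuts {dctcp_cut(marks):.0%}")
```

The heavy sequence gives a 40% cut and the light sequence a 5% cut for DCTCP, matching the slide, while TCP cuts by 50% in both cases.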

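  15. Data Center TCP Algorithm. Switch side: mark packets when Queue Length > K. [Figure: buffer of size B with marking threshold K; don't mark below K, mark above K.] Sender side: maintain a running average of the fraction of packets marked (α). In each RTT: F = (# of marked ACKs) / (total # of ACKs), α ← (1 − g)α + gF. Adaptive window decrease: cwnd ← (1 − α/2) × cwnd. Note: the window is divided by a decrease factor between 1 and 2.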
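The switch-side and sender-side rules on the algorithm slide can be sketched as follows. This is an illustrative sketch, not the authors' Windows implementation: the class and function names are hypothetical, the window is counted in packets, the weight g = 1/16 is an assumed value, and the one-packet-per-RTT increase is ordinary TCP additive increase.

```python
class DctcpSender:
    """Sender-side DCTCP window control (illustrative sketch)."""

    def __init__(self, cwnd=10.0, g=1 / 16):
        self.cwnd = cwnd    # congestion window, in packets
        self.g = g          # EWMA weight; 1/16 is an assumed value
        self.alpha = 0.0    # running estimate of the marked fraction

    def on_rtt_end(self, acked, marked):
        """Update alpha once per RTT, then cut or grow the window."""
        frac = marked / acked if acked else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked:
            # Scale by (1 - alpha/2): a cut between 0% and 50%.
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
        else:
            self.cwnd += 1.0  # additive increase, as in standard TCP
        return self.cwnd


def switch_should_mark(queue_len, k):
    """Switch side: mark when the instantaneous queue length exceeds K."""
    return queue_len > k


sender = DctcpSender(cwnd=100.0)
print(sender.on_rtt_end(acked=100, marked=100))  # every ACK marked
print(sender.on_rtt_end(acked=100, marked=0))    # no ACK marked
```

Because alpha is averaged over several RTTs, a single fully marked RTT produces only a small cut here; the cut approaches 50% only if marking persists.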

  16. DCTCP in Action. Setup: Win 7, Broadcom 1Gbps switch. Scenario: 2 long-lived flows, K = 30KB. [Figure: plot with an axis in KBytes.]

  17. Why it Works. 1. High Burst Tolerance. Large buffer headroom: bursts fit. Aggressive marking: sources react before packets are dropped. 2. Low Latency. Small buffer occupancies: low queuing delay. 3. High Throughput. ECN averaging: smooth rate adjustments, low variance in cwnd.

  18. Analysis. [Figure: window size vs. time, with levels W*+1, W*, and (W*+1)(1-α/2).]

  19. Analysis. Packets sent in this RTT are marked. [Figure: window size vs. time, with levels W*+1, W*, and (W*+1)(1-α/2).]

  20. Analysis. How low can DCTCP maintain queues without loss of throughput? How do we set the DCTCP parameters? Need to quantify queue size oscillations (stability). [Formula in terms of the bandwidth-delay product not recovered.] 85% less buffer than TCP.
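The "85% less buffer than TCP" figure can be sanity-checked with a short calculation. This sketch assumes the marking-threshold guideline from the DCTCP paper, K > (C × RTT) / 7, and compares it against one full bandwidth-delay product as the TCP reference; the 10 Gbps link and 100 µs RTT are illustrative values, and the function names are hypothetical.

```python
def bandwidth_delay_product_bytes(link_bps, rtt_s):
    """Bandwidth-delay product C x RTT, converted from bits to bytes."""
    return link_bps * rtt_s / 8


def min_marking_threshold_bytes(link_bps, rtt_s):
    """Assumed guideline: K should exceed one seventh of C x RTT."""
    return bandwidth_delay_product_bytes(link_bps, rtt_s) / 7


link_bps = 10e9   # assumed 10 Gbps link
rtt_s = 100e-6    # assumed 100 microsecond round-trip time

bdp = bandwidth_delay_product_bytes(link_bps, rtt_s)
k_min = min_marking_threshold_bytes(link_bps, rtt_s)
saving = 1 - k_min / bdp

print(f"BDP: {bdp:.0f} bytes, minimum K: {k_min:.0f} bytes, saving: {saving:.1%}")
```

Since 1/7 is roughly 14% of the bandwidth-delay product, the saving works out to about 85% regardless of the link speed and RTT chosen.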

  21. Evaluation. Implemented in the Windows stack. Real hardware, 1Gbps and 10Gbps experiments. 90-server testbed. Broadcom Triumph: 48 1G ports, 4MB shared memory. Cisco Cat4948: 48 1G ports, 16MB shared memory. Broadcom Scorpion: 24 10G ports, 4MB shared memory. Numerous benchmarks: Throughput and Queue Length, Multi-hop, Queue Buildup, Buffer Pressure, Fairness and Convergence, Incast, Static vs Dynamic Buffer Mgmt.

  22. Evaluation. [Figure panels: Background Flows, Query Flows.]

  23. Evaluation. [Figure panels: Background Flows, Query Flows.] Low latency for short flows.

  24. Evaluation. [Figure panels: Background Flows, Query Flows.] Low latency for short flows. High throughput for long flows.

  25. Evaluation. [Figure panels: Background Flows, Query Flows.] Low latency for short flows. High throughput for long flows. High burst tolerance for query flows.

  26. Conclusions. DCTCP satisfies all our requirements for data center packet transport: handles bursts well, keeps queuing delays low, achieves high throughput. Features: a very simple change to TCP and a single switch parameter K; based on ECN mechanisms already available in commodity switches.

  27. Discussions. 1. Will DCTCP perform worse in the Internet? 2. How about SDN using fine-grained TE? OpenTCP? 3. Not compatible with SACK, which reduces the # of ACKs. 4. Convergence really doesn't matter? 5. RTT-fairness (favors small-RTT flows)? See "Analysis of DCTCP: Stability, Convergence, and Fairness".

  28. Q&A
