
VL2: A Scalable and Flexible Data Center Network Overview
The presentation discusses VL2, a solution to the agility challenges of conventional data center networks that merges layer 2 and layer 3 into a virtual layer 2. It uses flat addressing to provide layer-2 semantics, Valiant Load Balancing for uniform high capacity between servers, and TCP for performance isolation. The study findings indicate that VL2 can provide uniform high capacity, performance isolation, and agility through layer-2 semantics.
Presentation Transcript
VL2: A Scalable and Flexible Data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan
Credits Some of the slides were used in their original form or adapted from Assistant Professor Hakim Weatherspoon (Cornell). Those slides are annotated with a * in the top right-hand corner of the slide.
Paper Details Title: VL2: A Scalable and Flexible Data Center Network. Authors: Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, Sudipta Sengupta (Microsoft Research). Venue: Proceedings of the ACM SIGCOMM 2009 conference on Data communication. Citations: 918
Overview Problem: Conventional data center networks do not provide agility, i.e., assigning any service to any server efficiently is challenging. Approach: Merge layer 2 and layer 3 into a virtual layer 2 (VL2). How? Use flat addressing to provide layer-2 semantics, Valiant Load Balancing for uniform high capacity between servers, and TCP to ensure performance isolation. Findings: VL2 can provide uniform high capacity between any two servers, performance isolation between services, and agility through layer-2 semantics.
Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion
Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion
Clos Topology A multistage circuit-switching network. Used when the required network capacity exceeds that of the largest feasible crossbar switch (MxN).
Clos Topology [Figure: a three-stage Clos network with an ingress stage, a middle stage, and an egress stage, parameterized by m, n, and r.]
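For reference, the parameters m, n, r labeled in the figure follow the standard Clos(m, n, r) convention (not spelled out on the slide): r ingress switches with n inputs each, m middle-stage switches, and r egress switches with n outputs each. The classical non-blocking conditions from switching theory are:

```latex
% Clos(m, n, r): r ingress switches (n inputs each), m middle-stage switches, r egress switches.
\[
m \ge 2n - 1 \quad \text{(strict-sense non-blocking)}, \qquad
m \ge n \quad \text{(rearrangeably non-blocking)}
\]
```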
Traffic Matrix The traffic rate between a node and every other node. E.g., in a network with N nodes where each node connects to every other node, there are NxN flows, represented by an NxN matrix. Valid: a valid traffic matrix is one that ensures no node is oversubscribed.
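As a sketch of what "no node is oversubscribed" means, assuming each node i has interface (hose) capacity r_i, a traffic matrix T is valid when every row and column sum stays within that capacity:

```latex
% T_{ij}: traffic rate from node i to node j; r_i: interface (hose) capacity of node i.
\[
\sum_{j} T_{ij} \le r_i \quad \text{(node } i \text{ sends at most } r_i\text{)}, \qquad
\sum_{i} T_{ij} \le r_j \quad \text{(node } j \text{ receives at most } r_j\text{)}
\]
```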
Valiant Load Balancing Keslassy et al. proved that uniform Valiant load balancing is the unique architecture requiring the minimum node capacity when interconnecting a set of identical nodes; Zhang-Shen et al. used it to design a predictable network backbone. Intuition: it is much easier to estimate the aggregate traffic entering and leaving a node than to estimate a complete traffic matrix (the traffic rate from every node to every other node). A valid-traffic-matrix approach that routes directly from ingress to egress requires link capacity equal to node capacity (= r). VLB instead load-balances traffic across all two-hop paths, so the capacity needed on the link between any two nodes is r/N + r/N = 2r/N. Zhang-Shen, Rui, and Nick McKeown. "Designing a predictable Internet backbone network." HotNets, 2004.
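A worked version of the 2r/N figure, under the assumption of N identical nodes of capacity r in a full mesh, each splitting its traffic evenly over all N nodes acting as intermediates:

```latex
% First hop: node i spreads its (at most r) outgoing traffic evenly, sending r/N to each intermediate.
% Second hop: each intermediate forwards to destination j a 1/N share of all traffic bound for j,
% which is again at most r/N, because j can receive at most r in total.
\[
C_{\text{link}} \;=\; \frac{r}{N} + \frac{r}{N} \;=\; \frac{2r}{N}
\]
```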
Valiant Load Balancing [Figure: a backbone network of Points of Presence interconnecting access networks, used to illustrate VLB.] Zhang-Shen, Rui, and Nick McKeown. "Designing a predictable Internet backbone network." HotNets, 2004.
Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion
* Conventional Data Center Network Architecture
* DCN Problems [Figure: conventional hierarchy of core routers (CR), aggregation routers (AR), and switches (S), with oversubscription ratios of roughly 1:240 at the core, 1:80 at aggregation, and 1:5 at the rack; one pool complains "I have spare ones, but I want more."] Poor server-to-server connectivity. Traffic of different services affects each other. Poor reliability and utilization. Static network assignment. Fragmentation of resources.
* End Result [Figure: the same CR/AR/S hierarchy annotated with the three goals.] 1. L2 semantics 2. Uniform high capacity 3. Performance isolation
* Objectives Uniform high capacity: the maximum rate of server-to-server traffic flow should be limited only by the capacity of the network cards, and assigning servers to a service should be independent of network topology. Performance isolation: traffic of one service should not be affected by traffic of other services. Layer-2 semantics: easily assign any server to any service; configure a server with whatever IP address the service expects; a VM keeps the same IP address even after migration.
Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion
Methodology Setting design objectives: interview stakeholders to derive a set of objectives. Deriving typical workloads: a measurement study on traffic patterns, covering data-center traffic analysis, flow distribution analysis, traffic matrix analysis, and failure characteristics.
Measurement Study Two main questions: (1) Who sends how much data, to whom, and when? (2) How often does the state of the network change due to changes in demand or switch/link failures and recoveries? Studied production data centers of a large cloud provider.
Data-Center Traffic Analysis Setting Instrumentation of a highly utilized cluster in a data-center Cluster of 15,000 nodes Center supports data mining on PB of data Servers are distributed roughly evenly among 75 ToR (Top of Rack) switches which are connected hierarchically
Data-Center Traffic Analysis The ratio of traffic volume between servers inside the data center to traffic entering/leaving the data center is 4:1. Bandwidth demand between servers inside a data center grows faster than bandwidth demand to external hosts. The network is the bottleneck of computation.
Flow Distribution Analysis The majority of flows are small (a few KB), on par with Internet flows. Why? Mostly hellos and metadata requests to the distributed file system. Almost all bytes (>90%) are carried in flows of 100MB to 1GB; the mode is around 100MB. Why? The distributed file system breaks long files into 100-MB chunks. Flows over a few GB are rare.
Flow Distribution Analysis Two modes: >50% of the time, an average machine has ~10 concurrent flows, and at least 5% of the time it has >80 concurrent flows. This implies that randomizing path selection at flow granularity will not cause perpetual congestion in case of unlucky placement of flows.
Traffic Matrix Analysis Poor summarizability of traffic patterns: even when approximating with 50-60 clusters, the fitting error remains high (60%), so engineering for just a few traffic matrices is unlikely to work well for real data-center traffic. Instability of traffic patterns: the traffic shows no periodicity that can be exploited for prediction.
Failure Characteristics 1/2 Failure definition: the event that occurs when a system or component is unable to perform its required function for more than 30s. Most failures are small in size: 50% of network failures involve < 4 devices, 95% involve < 20 devices. Downtimes can be significant: 95% are resolved in 10 min, 98% in < 1 hour, 99.6% in < 1 day, and 0.09% last > 10 days.
Failure Characteristics 2/2 In 0.3% of failures all redundant components in a network device group became unavailable Main causes of downtimes Network misconfigurations Firmware bugs Faulty components No obvious way to eliminate all failures from the top of the hierarchy
Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion
* Objectives Methodology: interviews with architects, developers, and operators. Uniform high capacity: the maximum rate of server-to-server traffic flow should be limited only by the capacity of the network cards, and assigning servers to a service should be independent of network topology. Performance isolation: traffic of one service should not be affected by traffic of other services. Layer-2 semantics: easily assign any server to any service; configure a server with whatever IP address the service expects; a VM keeps the same IP address even after migration.
Design Overview (Objective | Approach | Solution)
Layer-2 semantics | Flat addressing with name-location separation | Directory System (resolution service)
Uniform high capacity | Guarantee bandwidth for hose-model traffic | Valiant Load Balancing over a scale-out Clos topology
Performance isolation | Enforce hose model using existing mechanisms | TCP
Design Overview Randomizing to cope with unpredictability and volatility. Valiant Load Balancing: destination-independent traffic spreading across multiple intermediate nodes. A Clos topology is used to support the randomization, and a flow-spreading mechanism is proposed.
Design Overview Building on proven technologies: VL2 is based on IP routing and forwarding technologies available in commodity switches. Link-state routing maintains the switch-level topology and does not disseminate end-host information. Equal-Cost Multi-Path forwarding with anycast addresses enables VLB with minimal control-plane messaging.
Design Overview Separating names from locators enables agility, e.g., rapid VM migration. Uses application addresses (AAs) and location addresses (LAs), with a Directory System for name resolution.
VL2 Components Scale-out CLOS topology Addressing and Routing VLB Directory System
* Scale-out topology [Figure: VL2's Clos network with intermediate (Int), aggregation (Aggr), and top-of-rack (ToR) switch layers, and 20 servers per ToR.] The aggregation and intermediate switches form a bipartite graph. Graceful degradation of bandwidth if an intermediate switch fails.
Scale-out topology Clos is very suitable for VLB: by indirectly forwarding traffic through an intermediate switch (IS) at the top, the network can provide bandwidth guarantees for any traffic matrix subject to the hose model. Routing is simple and resilient: take a random path up to an IS and a random path down to the destination ToR.
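As a rough sizing sketch for this scale-out topology (assuming 20 servers per ToR as in the figure, two uplinks per ToR, and aggregation switches that split their ports evenly between ToRs and intermediate switches; D_A and D_I below are illustrative parameter names for the aggregation and intermediate switch port counts):

```latex
% D_A: ports per aggregation switch (half face ToRs, half face intermediate switches);
% D_I: ports per intermediate switch; 20 servers and 2 uplinks per ToR.
\[
N_{\text{ToR}} = \frac{D_A \cdot D_I}{4},
\qquad
N_{\text{servers}} = 20 \cdot \frac{D_A \cdot D_I}{4}
\]
% Example: D_A = D_I = 144 gives 20 * 144 * 144 / 4 = 103,680 servers.
```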
* VL2 Addressing and Routing: name-location separation Allows usage of low-cost switches and protects the network and hosts from host-state churn. Servers use flat names (AAs); a Directory Service resolves them to locations. Switches run link-state routing and maintain only the switch-level topology. [Figure: servers x, y, z behind ToR2, ToR3, ToR3; the directory mapping for z is later updated to ToR4 after migration. A sender performs a lookup and response for y, learns ToR3, and tunnels the payload to that ToR.]
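A minimal sketch (Python, with hypothetical names; not the authors' code) of the per-server agent's role in name-location separation as described above: resolve an application address (AA) to a location address (LA) via the directory, cache the mapping, and encapsulate outgoing packets toward the destination's ToR.

```python
class VL2Agent:
    """Toy model of the shim on each server: AA -> LA resolution with a local cache."""

    def __init__(self, directory):
        self.directory = directory   # directory service client (lookup only)
        self.cache = {}              # AA -> LA (ToR address), eventually consistent

    def resolve(self, dest_aa):
        # Use the cached mapping if present; otherwise ask the directory service.
        if dest_aa not in self.cache:
            self.cache[dest_aa] = self.directory.lookup(dest_aa)
        return self.cache[dest_aa]

    def send(self, dest_aa, payload):
        dest_tor_la = self.resolve(dest_aa)
        # Encapsulate: the outer header targets the destination ToR's LA,
        # the inner header keeps the application-level AA untouched.
        return {"outer_dst": dest_tor_la, "inner_dst": dest_aa, "payload": payload}

    def invalidate(self, dest_aa):
        # Reactive cache update, e.g. after learning a mapping is stale (VM migrated).
        self.cache.pop(dest_aa, None)
```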
* VL2 Addressing and Routing: VLB indirection [ECMP + IP anycast] Goals: harness the huge bisection bandwidth, avoid esoteric traffic engineering or optimization, ensure robustness to failures, and work with switch mechanisms available today. Requirements: (1) must spread traffic, (2) must ensure destination independence. Senders bounce flows off intermediate switches by addressing packets to an anycast address (IANY) shared by all intermediate switches; Equal-Cost Multi-Path forwarding spreads flows across the equal-cost paths. [Figure: ToRs T1-T6 with links used for up paths and links used for down paths; a packet from x to y travels up toward IANY, through an intermediate switch, and down to the destination ToR.]
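A sketch (Python, hypothetical helper names) of the VLB indirection idea just described: pick an intermediate switch per flow by hashing the flow's 5-tuple, so all packets of a flow follow one path as with ECMP, then encapsulate with the intermediate's address outermost and the destination ToR's LA beneath it.

```python
import hashlib

INTERMEDIATE_SWITCHES = ["I1", "I2", "I3"]   # LAs of intermediate switches (illustrative)

def pick_intermediate(five_tuple):
    """Hash the flow's 5-tuple so every packet of a flow uses the same two-hop path."""
    digest = hashlib.sha256(repr(five_tuple).encode()).hexdigest()
    return INTERMEDIATE_SWITCHES[int(digest, 16) % len(INTERMEDIATE_SWITCHES)]

def vlb_encapsulate(five_tuple, dest_tor_la, payload):
    _src_aa, dst_aa, _sport, _dport, _proto = five_tuple
    intermediate_la = pick_intermediate(five_tuple)
    # Outermost header: bounce the packet off the chosen intermediate switch.
    # Next header: deliver it to the destination's ToR, which decapsulates toward the AA.
    return {
        "outer_dst": intermediate_la,
        "mid_dst": dest_tor_la,
        "inner_dst": dst_aa,
        "payload": payload,
    }

# Example: two different flows between the same hosts may hash to different intermediates.
pkt = vlb_encapsulate(("10.0.0.1", "10.0.0.9", 5555, 80, "tcp"), "ToR3", b"hello")
```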
VL2 Directory System Three key functions: lookups, updates of AA-to-LA mappings, and reactive cache updates for latency-sensitive cases (e.g., a VM during migration). Goals: scalability, reliability for updates, high lookup performance, eventual consistency (like ARP).
VL2 Directory System [Figure: a two-tier design with a small set of replicated state machine (RSM) servers behind a larger set of directory servers (DS). Lookup path: (1) the agent sends a lookup to directory servers, (2) a DS replies. Update path: (1) the agent sends an update to a DS, (2) the DS sets the new mapping on an RSM server, (3) the RSM replicates it, (4) the RSM acks, (5) the DS acks the agent, (6) the update is disseminated to the other directory servers.]
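A simplified sketch (Python, invented class names) of the two paths in the figure: lookups are served from the directory servers' soft state, while updates go through the replicated state machine before being disseminated back out.

```python
class RSM:
    """Strongly consistent store for AA -> LA mappings (stand-in for the replicated state machine)."""
    def __init__(self):
        self.mappings = {}
    def set(self, aa, la):
        self.mappings[aa] = la      # in the real system this write is replicated and acked
        return True

class DirectoryServer:
    """Read-optimized cache of the RSM state; serves lookups, forwards updates."""
    def __init__(self, rsm):
        self.rsm = rsm
        self.cache = {}
    def lookup(self, aa):           # fast path, eventually consistent
        return self.cache.get(aa)
    def update(self, aa, la):       # slow path, goes through the RSM
        if self.rsm.set(aa, la):
            self.cache[aa] = la
            return "ack"
    def disseminate(self, aa, la):  # receive a fresh mapping pushed after an update
        self.cache[aa] = la

# Usage: an update lands on one DS, then is disseminated to the others.
rsm = RSM()
ds1, ds2 = DirectoryServer(rsm), DirectoryServer(rsm)
ds1.update("AA:x", "ToR2")
ds2.disseminate("AA:x", "ToR2")
assert ds2.lookup("AA:x") == "ToR2"
```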
Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion
Evaluation Uniform high capacity: all-to-all data shuffle traffic matrix among 75 servers; each delivers 500MB to every other server, a 2.7TB shuffle from memory to memory. VL2 completes the shuffle in 395s with an aggregate goodput of 58.8Gbps, roughly 10x better than their current data center network. The maximal achievable goodput over all flows is 62.3Gbps, giving a VL2 network efficiency of 58.8/62.3 = 94%.
Evaluation VLB Fairness: 75-node testbed with traffic characteristics as per the measurement study. All flows pass through the aggregation switches, so it is sufficient to check the split ratio there, among the links to the intermediate switches. [Figure: Jain's fairness index over time (0-500s) for Aggr1, Aggr2, Aggr3, on a y-axis from 0.94 to 1.00.] The VLB split-ratio fairness index averages > 0.98 for all aggregation switches.
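For reference, Jain's fairness index over the per-link split ratios x_1, ..., x_n (a standard definition, not restated on the slide) is:

```latex
\[
J(x_1, \ldots, x_n) \;=\; \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{\,n \sum_{i=1}^{n} x_i^{2}\,},
\qquad \frac{1}{n} \le J \le 1
\]
```

J equals 1 when traffic is split perfectly evenly across the n links, which is why values above 0.98 indicate a nearly uniform split.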
Evaluation Performance isolation: added two services to the network. Service 1: 18 servers each do a single TCP transfer to another server, starting at time 0 and lasting throughout the experiment. Service 2: starts one server at 60s and adds a new server every 2s, for a total of 19 servers; each server starts an 8GB TCP transfer as soon as it comes up. Result: no perceptible change in Service 1 as servers start up in Service 2.
Evaluation Performance isolation (cont'd): to evaluate how mice flows (large numbers of short TCP connections), common in data centers, affect other services, Service 2's servers create successively more bursts of short TCP connections (1 to 20 KB). Again, no perceptible change in Service 1. TCP's natural enforcement of the hose model is sufficient to provide performance isolation when combined with VLB and no oversubscription.
Evaluation Convergence after link failures: 75 servers run an all-to-all data shuffle while links between intermediate and aggregation switches are disconnected. The maximum capacity of the network degrades gracefully. Detecting restoration is slow: VL2 fully uses a restored link only roughly 50s after it comes back. Restoration does not interfere with traffic, and the aggregate throughput eventually returns to its initial level.
Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion
That's a lot to take in! Takeaways, please! Problem: over-subscription in data centers and lack of agility
Overall Approach Measurement study (data-center workload measurements, stakeholder interviews) -> objectives. Design (application of known techniques when possible) -> architecture. Evaluation: a testbed that includes all design components, evaluated with respect to the objectives.
Design Overview (Objective | Approach | Solution)
Layer-2 semantics | Flat addressing with name-location separation | Directory System (resolution service)
Uniform high capacity | Guarantee bandwidth for hose-model traffic | Valiant Load Balancing over a scale-out Clos topology
Performance isolation | Enforce hose model using existing mechanisms | TCP