Understanding LSA Scaling and Performance in Network Service Orchestration


Explore the Layered Service Architecture (LSA) and how it affects scaling, performance, and reliability in Cisco Network Services Orchestrator (NSO). Learn about the different cluster models, the terminology, and strategies for scaling with LSA, along with a comparison of LSA clusters and device clusters.


Uploaded on Oct 08, 2024


Presentation Transcript

LSA, Scaling & Performance
Johan Bevemyr, Cisco NSO Developer Days Stockholm 2018

Agenda
- Introduction
- Scaling with LSA
- Designing for LSA
- Cloud Scaling
- Cloud Scaling with HA
- LSA in NSO 5.0
- Customer use cases

Terminology
- Throughput: the maximum rate at which requests are processed
- Response time: the time taken to respond to a request
- Scalability: the capability to manage a large network, in number of devices and services
- Reliability: the capability to function for a particular amount of time
These are different characteristics.

Device Cluster
- Throughput: --
- Response time: -
- Scalability: +
- Reliability: + / -
[Diagram: a Service NSO holding the mapping logic, above two Device NSOs that manage devices D1 to D6]
- Device Cluster was the first scalability cluster model for NSO
- It evolved into LSA
- NSO-to-NSO communication costs about a factor of 5 in performance
- Used to place NSO nodes close to the devices

LSA Cluster
- Throughput: +
- Response time: -
- Scalability: +
- Reliability: + / -
[Diagram: an Upper NSO (higher service model, logic and dispatching to n1 and n2) above Lower NSO (n1) and Lower NSO (n2), each with its own logic and lower service model, managing devices D1 to D6 through the device model]
- Service code is impacted
- The devices are split across NSO nodes
- Parallelism
- Always combined with commit queues to gain performance

LSA Cluster vs Device Cluster

Device Cluster
- Top node has device meta-data and sees all devices.
- Top node does not see the south NSO node. The dispatching of device operations is managed by NSO.
- Service code is not impacted by the deployment.
- High penalty in NSO-to-NSO communication.
- Tight coupling between all NSO nodes.
- No parallelism in service execution.

LSA Cluster
- Top node has no device knowledge.
- Top node sees the south NSO nodes as another NSO.
- Service code needs to be split between the NSO nodes. Extra code is required.
- Much less penalty, since the only communication is the service diff.
- Loose coupling between NSO nodes, which makes upgrades and operation easier.
- Service code can be executed in parallel on lower NSO nodes.

Scaling with LSA
Two types of scaling are achieved:
1. Number of managed devices, i.e. deployment size
2. Service execution in parallel, i.e. throughput
   a) Service create
   b) Minimal diff calculation
   c) Device communication
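The second kind of scaling can be illustrated with a small Python sketch. It is not NSO code: the dispatch map, `run_on_lower_node` and the simulated work are assumptions made for illustration, standing in for service create, diff calculation and device communication on each lower node.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dispatch map: device name -> lower NSO node.
DISPATCH_MAP = {"D1": "n1", "D2": "n1", "D3": "n1",
                "D4": "n2", "D5": "n2", "D6": "n2"}


def group_by_node(devices, dispatch_map):
    """Split a list of devices into per-node batches."""
    batches = {}
    for device in devices:
        batches.setdefault(dispatch_map[device], []).append(device)
    return batches


def run_on_lower_node(node, devices, work_seconds=0.01):
    """Stand-in for the work a lower node does for its own devices."""
    for _ in devices:
        time.sleep(work_seconds)  # simulated per-device work
    return node, list(devices)


def dispatch(devices, dispatch_map):
    """Run each node's batch in its own thread, one batch per lower node."""
    batches = group_by_node(devices, dispatch_map)
    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        futures = [pool.submit(run_on_lower_node, node, batch)
                   for node, batch in batches.items()]
        return dict(future.result() for future in futures)


result = dispatch(sorted(DISPATCH_MAP), DISPATCH_MAP)
```

In this toy model the batches for n1 and n2 run side by side, so adding a lower node shortens the total time for a fixed set of devices, while the time spent on any single device stays the same.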

Scaling
- New lower nodes are added when needed
- Multiple layers can be added
- The upper node becomes a load balancer and resource coordinator
[Diagram: same upper and lower NSO layout as on the LSA Cluster slide]

Maintenance
NSO nodes can be upgraded independently as long as the RFS interface is backwards compatible. This allows rolling upgrades, and even independent release cycles for different nodes.
[Diagram: Upper LSA with the CFS and an RFS towards N1 and N2, above two Lower LSA nodes, each with an RFS, managing devices D1 to D4]
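One way to reason about "backwards compatible" is to compare two versions of an interface leaf by leaf. The sketch below is an assumption about what such a check could look like, not something taken from the talk or from NSO: an interface is modelled as a plain dict of leaf names, and the rules (no leaf removed or retyped, no new mandatory input) are deliberately simplified.

```python
def is_backwards_compatible(old, new):
    """Check a new interface version against an old one.

    Both arguments map leaf name -> {"type": str, "mandatory": bool}.
    This hypothetical rule set treats a change as compatible when every
    old leaf survives with the same type and callers written against
    the old version are never forced to supply something new.
    """
    for name, leaf in old.items():
        if name not in new:
            return False                      # leaf was removed
        if new[name]["type"] != leaf["type"]:
            return False                      # leaf changed type
        if new[name]["mandatory"] and not leaf["mandatory"]:
            return False                      # optional became mandatory
    # Leaves that only exist in the new version must be optional.
    return all(not leaf["mandatory"]
               for name, leaf in new.items() if name not in old)


v1 = {"device": {"type": "string", "mandatory": True},
      "vlan-id": {"type": "uint16", "mandatory": True}}
v2 = dict(v1, description={"type": "string", "mandatory": False})
v3 = dict(v1, description={"type": "string", "mandatory": True})
```

Here `v2` only adds an optional leaf, so an upper node built against `v1` could keep talking to a lower node that has moved to `v2`; `v3` would break that.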

Designing for LSA
- Use the Stacked Services pattern
- Make sure no information is propagated across service barriers, i.e. no direct reading of data from a higher layer to a lower layer, and no jumping across service layers
- Consider where orchestration and resource allocation need to happen
- A simple and clean design helps
- Device knowledge is often needed on the top node, e.g. for the dispatch map and the placement algorithm
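As an illustration of the last point, a dispatch map with a very simple placement algorithm might look like the following Python sketch. The least-loaded rule and the function name are assumptions chosen for the example; a real deployment could place devices by geography, capacity or any other policy.

```python
def place_device(device, dispatch_map, nodes):
    """Return the lower node for a device, assigning one if needed.

    A device that is already in the dispatch map stays where it is.
    A new device goes to the node with the fewest devices, with ties
    broken by node name so the result is deterministic.
    """
    if device not in dispatch_map:
        load = {node: 0 for node in nodes}
        for node in dispatch_map.values():
            if node in load:
                load[node] += 1
        dispatch_map[device] = min(nodes, key=lambda node: (load[node], node))
    return dispatch_map[device]


dispatch_map = {}
placements = [place_device(d, dispatch_map, ["n1", "n2"])
              for d in ["D1", "D2", "D3", "D4"]]
```

The upper node only needs the device name and the map to dispatch a request; everything else about the device lives on the lower node.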

Single node NSO
[Diagram: one CFS above four RFSs, above four device RFSs, all on a single NSO node]

Two layer LSA
- Adds some complexity to the application
- Error recovery becomes more complex
[Diagram: Upper LSA with the CFS and two RFSs, above two Lower LSA nodes, each with an RFS and two device RFSs]

LSA: splitting RFM service across layers
- Some acrobatics are needed when splitting an RFM service
- On a single node a kicker can be used directly. Across an LSA boundary we need special packages both to create a kicker on the lower node and to generate custom notifications.
- The same namespace cannot be used on both nodes, which results in duplication.

CDM simplifies LSA (NSO 5.0)
- Manage the lower node through the top node
  - Devices
  - Auth groups
  - Actions, e.g. sync-from, fetch-ssh-keys etc.
- Easier to split applications between upper and lower node
  - Configure kickers directly on the lower LSA node
  - Notification kickers (see web-server-farm/web-site-service)

LSA model for cloud scaling
Limitations of regular LSA:
- Difficult to move a device between lower LSA nodes
- Shrinking the lower LSA layer is tricky
Goal:
- Easy to add a new lower LSA node when needed
- Easy to remove a node when no longer needed
- Easy to move a device between nodes

LSA model for cloud scaling

    devices {
      device ex0 {
        address 127.0.0.1;
        port 12022;
        ssh { ...
        /* Refcount: 1 */
        /* Backpointer: [ /drfs:d...1'] ] */
        interface eth3 { ... }
        ...
      }
    }
    dRFS {
      device ex0 {
        vlan v1 {
          private { ... }
        }
      }
    }

Solution
Design LSA services such that:
- Only RFSs related to a single device (dRFS) are on the lower LSA
- All RFSs for a given device are in the same subtree

Initial state
Dispatch-map on the Upper LSA:
- D1 -> N1
- D2 -> N1
- D3 -> N1
- D4 -> N2
[Diagram: Upper LSA with the sRFS above Lower LSA nodes N1 and N2, which hold the dRFS instances and devices D1 to D4; D3 is highlighted]

Moving a device from L1 to L2
Move D3 from N1 to N2:
1. Lock N1 (partial lock) on Upper
2. Extract device and dRFS config for D3 from N1
3. Install config on N2
4. Delete D3 from N1
5. Update device-map
6. Release partial lock on N1
7. Redeploy sRFS (by following backpointers) with no-networking, no-lsa
8. compare-config N1, N2
Device-map after the move:
- D1 -> N1
- D2 -> N1
- D3 -> N2
- D4 -> N2
[Diagram: Upper LSA with the sRFS above Lower LSA nodes N1 and N2, with D3 and its dRFS moving from N1 to N2]
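The procedure above can be walked through with a toy in-memory model. This is only a sketch under stated assumptions: `LowerNode`, its dictionaries and the use of a `threading.Lock` in place of NSO's partial lock are all hypothetical, and steps 7 and 8 are left as comments because they depend on the real service code.

```python
import copy
import threading


class LowerNode:
    """Toy stand-in for a lower LSA node (not an NSO API)."""

    def __init__(self, name):
        self.name = name
        self.devices = {}             # device name -> device config
        self.drfs = {}                # device name -> dRFS config
        self.lock = threading.Lock()  # stands in for the partial lock


def move_device(device, source, target, device_map):
    """Follow steps 1 to 6 of the move procedure on the toy model."""
    with source.lock:                                        # 1. lock source
        device_cfg = copy.deepcopy(source.devices[device])   # 2. extract device
        drfs_cfg = copy.deepcopy(source.drfs[device])        #    and dRFS config
        target.devices[device] = device_cfg                  # 3. install on target
        target.drfs[device] = drfs_cfg
        del source.devices[device]                           # 4. delete from source
        del source.drfs[device]
        device_map[device] = target.name                     # 5. update device-map
    # 6. the lock is released when the with-block exits
    # 7. redeploying the sRFS on the upper node is not modelled here
    # 8. compare-config against both nodes is not modelled here


n1, n2 = LowerNode("N1"), LowerNode("N2")
device_map = {"D1": "N1", "D2": "N1", "D3": "N1", "D4": "N2"}
for name, node in (("D1", n1), ("D2", n1), ("D3", n1), ("D4", n2)):
    node.devices[name] = {"address": "127.0.0.1"}
    node.drfs[name] = {"vlan": {"v1": {}}}

move_device("D3", n1, n2, device_map)
```

After the call, D3 and its dRFS subtree exist only on N2 and the device-map points there, which matches the device-map shown after the move.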

Example 24-layered-service-architecture-scaling
Structure LSA for easy re-balancing of devices between lower LSAs.
Advantages:
- Easier to grow or shrink the size of the lower LSA layer
- Balance resource utilization
- Easier to recover from failure of a node
- Simplifies maintenance: devices can be evacuated from a node
- Option to simplify HA

Example 24-LSA-scaling (cont.)
Disadvantages:
- Restrictions on what a service can do on the lower layer
  - Only single-device services
  - No RFM
Useful actions:
- move-device
- rebalance
- evacuate
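To make the rebalance and evacuate actions more concrete, here is a Python sketch that only plans the moves on a device-to-node map. The function names echo the action names, but the algorithms (even spreading by device count, deterministic tie-breaking) are assumptions for illustration and not the logic of the NSO example.

```python
def _load(placement, nodes):
    """Count devices per node, ignoring nodes outside `nodes`."""
    load = {node: 0 for node in nodes}
    for node in placement.values():
        if node in load:
            load[node] += 1
    return load


def evacuate(node, device_map, nodes):
    """Plan moves that empty `node`, spreading its devices over the rest."""
    remaining = [n for n in nodes if n != node]
    if not remaining:
        raise ValueError("no node left to receive devices")
    load = _load(device_map, remaining)
    moves = []
    for device in sorted(d for d, n in device_map.items() if n == node):
        target = min(remaining, key=lambda n: (load[n], n))
        load[target] += 1
        moves.append((device, node, target))
    return moves


def rebalance(device_map, nodes):
    """Plan moves until node sizes differ by at most one device."""
    placement = dict(device_map)  # work on a copy, the input is not modified
    moves = []
    while True:
        load = _load(placement, nodes)
        busiest = max(nodes, key=lambda n: (load[n], n))
        idlest = min(nodes, key=lambda n: (load[n], n))
        if load[busiest] - load[idlest] <= 1:
            return moves
        device = min(d for d, n in placement.items() if n == busiest)
        placement[device] = idlest
        moves.append((device, busiest, idlest))


device_map = {"D1": "N1", "D2": "N1", "D3": "N1", "D4": "N2"}
rebalance_plan = rebalance(device_map, ["N1", "N2"])
evacuate_plan = evacuate("N2", device_map, ["N1", "N2", "N3"])
```

Each planned move is a `(device, source, target)` tuple that a move-device step could then carry out one at a time.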

RFM on 24-LSA-scaling
Problems:
- State needs to be moved with the RFM service
- Event triggers need to be moved
- No lingering references may remain on the source node when moving
Difficult to solve in the general case, but possible for specific service designs.

Cloud scaling RFM
Key properties:
- Place all config and state in subtrees keyed by device name
- Triggers must be present on all LSA devices
- Extract config and stats using the new CONFIG_CDB_OPER option
Krunal Patel modified the nfvo package to support cloud scaling LSA.
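The first property is what makes a device movable: if everything belonging to a device hangs under a key with its name, extraction becomes a lookup. The Python sketch below shows the idea on nested dicts; the tree layout and the `extract_device` helper are hypothetical and do not use CONFIG_CDB_OPER or any NSO API.

```python
def extract_device(tree, device):
    """Collect every subtree keyed by `device`, for config and state alike.

    `tree` maps section -> service -> device name -> data. Services that
    hold nothing for the device are left out of the result.
    """
    return {
        section: {service: entries[device]
                  for service, entries in services.items()
                  if device in entries}
        for section, services in tree.items()
    }


# Hypothetical layout with config and operational state side by side.
tree = {
    "config": {"vlan": {"ex0": {"v1": {}}, "ex1": {"v7": {}}},
               "acl": {"ex1": {"a1": {}}}},
    "oper": {"vlan": {"ex0": {"status": "ready"}}},
}
ex0_data = extract_device(tree, "ex0")
```

Data that is not keyed by device name, such as a counter shared between devices, would not be picked up by this kind of extraction and would need a service-specific solution.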

Simplified HA
- Traditionally each node is an HA pair for redundancy
- Data is replicated between pairs
[Diagram: two Upper LSA nodes with the sRFS, above two pairs of Lower LSA nodes (L1 and L2) holding the dRFS instances and devices D1 to D4]

Simplified HA (cont)
- Lower nodes are single nodes
- Data is replicated to a central DB
Advantages:
- Easy to scale
- Easier to maintain
Disadvantages:
- Less secure
- Does not work well with commit queues
- Slower
[Diagram: two Upper LSA nodes with the sRFS, above single Lower LSA nodes L1 and L2, each with a kicker replicating to a central DB, holding the dRFS instances and devices D1 to D4]