Exploring Management Plane Analytics in Networking
In networking, the management plane defines a network's physical structure and configures the control plane. Management Plane Analytics (MPA) quantifies management practices, analyzes their impact on network health, and predicts network health from those practices. This article covers why the management plane is worth analyzing, the disagreement among experts about the impact of individual practices, the classes of management practices, how practices are inferred from available data, and how predictive models built on them can help improve network health.
Presentation Transcript
Management Plane Analytics. Aaron Gember-Jacobson, Wenfei Wu, Xiujun Li, Aditya Akella, Ratul Mahajan
Important network planes. Management plane: defines the network's physical structure and configures the control plane; analyze using ??? Control plane: computes routes; analyze using traceroute, Rocketfuel, pathchar, pathload, etc. Data plane: forwards packets.
Why analyze the management plane? Good management practices are important! (See http://popsci.com/network-outages-nyses-united-airlines-are-new-natural-disasters) Does a network management practice impact network health (i.e., problem frequency)?
Disagreement among experts. Survey question: to what extent does a management practice impact the frequency/severity of problems? [Figure: bar chart of responses (No, Low, Medium, High, Unsure) for each practice: No. of devices; No. of protocols; No. of change events; Avg. devices changed per event; Frac. events automated; Frac. events w/ ACL change.]
Management plane analytics (MPA). MPA framework: take inventory, configs, and tickets as input; quantify management practices and network health; analyze relationships. Outputs: practices that cause poor health, and a predictive model. Apply to 850+ networks from a large online service provider.
Outline: Motivation. How do we (1) quantify an organization's practices, (2) identify which practices impact network health, and (3) predict network health given a set of practices?
Classes of management practices. 1. Design practices: long-term decisions about network structure (# of devices, roles, models; routing protocols, size of routing domains, ...). 2. Operational practices: day-to-day activities that address emerging needs (frequency of config changes, fraction automated, types of stanzas changed, ...). Practices are not directly logged!
Inferring management practices. Configs + inventory yield practices (28), quantified on a monthly basis and discretized into equal-width bins. Tickets yield health (# of tickets). Data from 850+ networks for 17 months.
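The equal-width discretization mentioned on this slide can be illustrated with a minimal sketch. The function name, the number of bins, and the sample values below are assumptions chosen for illustration; the slides do not state how many bins MPA uses.

```python
def equal_width_bins(values, num_bins):
    """Map each value to a bin index in [0, num_bins - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    # The maximum value would compute to index num_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


# Made-up monthly counts of change events for one network (not from the study).
monthly_change_events = [0, 3, 7, 12, 40, 18, 25]
bins = equal_width_bins(monthly_change_events, num_bins=5)
print(bins)
```

Equal-width bins keep the bin boundaries easy to interpret, at the cost of leaving some bins sparsely populated when the underlying values are skewed.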
Outline: Motivation. How do we (1) quantify an organization's practices, (2) identify which practices impact network health, and (3) predict network health given a set of practices?
Statistical dependencies. Challenge: identify causal relationships.
Experimental design. Question: does the treatment (a practice) cause the outcome (health)? Other practices are confounding factors. Approaches: randomized experiment, or quasi-experimental design (QED) [Krishnan et al. IMC '12, IMC '13].
Propensity score matching. Compare treated and untreated cases drawn from population samples in which the confounding factors have similar distributions. Propensity score = predicted probability (Treatment = yes | ConfoundingPractices). [Table: example cases with columns # Models, # Roles, # Changes, # Tickets, labeled as treatment, confounding, and outcome, plus a propensity score per case; cases are split into treated and untreated groups. Slide annotations: randomized, pre-defined, want randomized.]
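The matching step can be sketched as follows. This is a minimal illustration that assumes propensity scores have already been estimated; the greedy nearest-neighbor strategy, the `caliper` threshold, the tuple layout, and all sample values are assumptions made for the example, not details taken from the slides.

```python
def match_by_propensity(treated, untreated, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on propensity score.

    Each case is a tuple (case_id, propensity_score, num_tickets).
    A treated case is paired with the closest unused untreated case,
    but only if their scores differ by at most `caliper`.
    """
    available = list(untreated)
    pairs = []
    for t in sorted(treated, key=lambda case: case[1]):
        if not available:
            break
        best = min(available, key=lambda u: abs(u[1] - t[1]))
        if abs(best[1] - t[1]) <= caliper:
            pairs.append((t, best))
            available.remove(best)  # each untreated case is used at most once
    return pairs


# Made-up cases for illustration only.
treated = [("t1", 0.5, 6), ("t2", 0.3, 2), ("t3", 0.9, 11)]
untreated = [("u1", 0.5, 1), ("u2", 0.32, 2), ("u3", 0.1, 0)]

pairs = match_by_propensity(treated, untreated)
# Outcome difference (treated minus untreated) for each matched pair.
differences = [t[2] - u[2] for t, u in pairs]
print(pairs, differences)
```

Treated cases without a sufficiently similar untreated case (here `t3`) are left unmatched, which trades sample size for closer agreement on the confounding factors.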
Test for causality. For each matched pair, compute the difference in outcome between the two cases (e.g., 1 - 2 = -1, 2 - 2 = 0), then count the number of pairs with a difference < 0, = 0, and > 0. Can we reject H0: median = 0? Criterion: sign-test p-value < 10^-3.
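A two-sided exact sign test over the per-pair outcome differences can be sketched as below. The function name and the sample differences are assumptions for illustration; dropping zero differences before the test is the common convention and is assumed here rather than confirmed by the slides.

```python
from math import comb


def sign_test_p_value(differences):
    """Two-sided exact sign test for H0: the median difference is 0."""
    n_pos = sum(1 for d in differences if d > 0)
    n_neg = sum(1 for d in differences if d < 0)
    n = n_pos + n_neg  # zero differences (ties) are dropped
    if n == 0:
        return 1.0
    k = min(n_pos, n_neg)
    # Probability of seeing k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up differences: eleven matched pairs, all with fewer tickets in one group.
p = sign_test_p_value([-1] * 11)
print(p, p < 1e-3)
```

With eleven pairs that all share the same sign, the p-value is 2 / 2^11, which falls just under the 10^-3 threshold; with ten such pairs it would not.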
Causal relationships. Practices (in slide order): No. of change events; No. of change types; No. of roles; Frac. events w/ ACL change; No. of devices; Avg. devices changed per event; No. of models; No. of VLANs; Frac. events w/ interface change; Intra-device complexity. p-values (in slide order): 1.05 x 10^-12; 5.75 x 10^-12; 2.99 x 10^-10; 9.10 x 10^-9; 1.92 x 10^-8; 3.56 x 10^-8; 1.31 x 10^-7; 6.46 x 10^-6; 5.27 x 10^-3; 1.53 x 10^-2. Significance threshold: < 10^-3. Slide annotations: agrees with operators; discredits belief that impact is low; operators had mixed beliefs.
Outline: Motivation. How do we (1) quantify an organization's practices, (2) identify which practices impact network health, and (3) predict network health given a set of practices?
Predicting network health. Build decision trees using machine learning. Advantages: model arbitrary boundaries; easy to understand. Challenge: heavy skew in practices and health (figure annotation: 73%).
Addressing skew. Oversampling: repeat minority class examples during training (x2). Boosting: in each iteration, increase the weight of examples that were misclassified using the prior model.
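Both ideas can be illustrated with a short sketch. The function names, the repetition factor of 2, and the multiplicative `boost` factor are assumptions for illustration; in particular, the weight update below is a simplified stand-in and not the exact update rule of AdaBoost or of whichever boosting algorithm the authors used.

```python
from collections import Counter


def oversample(examples, labels, factor=2):
    """Repeat every example outside the majority class `factor` times."""
    majority = Counter(labels).most_common(1)[0][0]
    out_examples, out_labels = [], []
    for x, y in zip(examples, labels):
        copies = 1 if y == majority else factor
        out_examples.extend([x] * copies)
        out_labels.extend([y] * copies)
    return out_examples, out_labels


def reweight(weights, misclassified, boost=2.0):
    """Raise the weight of misclassified examples, then renormalize to sum to 1."""
    raised = [w * boost if wrong else w for w, wrong in zip(weights, misclassified)]
    total = sum(raised)
    return [w / total for w in raised]


# Made-up training set: three "good" months and one "poor" month.
examples, labels = oversample(["a", "b", "c", "d"], ["good", "good", "good", "poor"])
# Suppose the prior model misclassified only the last of four equally weighted examples.
new_weights = reweight([0.25] * 4, [False, False, False, True])
print(labels, new_weights)
```

Oversampling changes the training data once, before any model is built, while the reweighting step is repeated after each boosting iteration so that later models focus on the examples earlier models got wrong.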
Model accuracy. Overall accuracy: 81%; 91% with 2 classes. [Figure: precision (0 to 1) per health class (Excellent, Good, Moderate, Poor, Very Poor) for three models: majority predictor, decision tree (DT), and DT with oversampling and boosting (MPA).]
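For readers unfamiliar with the two metrics on this slide, the sketch below computes overall accuracy and per-class precision from actual and predicted health classes. The function names and the sample labels are assumptions for illustration and do not reproduce the study's data.

```python
def accuracy(actual, predicted):
    """Fraction of cases whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def per_class_precision(actual, predicted):
    """For each class, the fraction of predictions of that class that were correct.

    A class that is never predicted has undefined precision, reported as None.
    """
    result = {}
    for cls in sorted(set(actual) | set(predicted)):
        actual_when_predicted = [a for a, p in zip(actual, predicted) if p == cls]
        if actual_when_predicted:
            correct = sum(1 for a in actual_when_predicted if a == cls)
            result[cls] = correct / len(actual_when_predicted)
        else:
            result[cls] = None
    return result


# Made-up labels for five network-months.
actual = ["good", "good", "poor", "good", "poor"]
model_predictions = ["good", "poor", "poor", "good", "good"]
majority_predictions = ["good"] * 5  # a majority predictor always outputs the most common class

print(accuracy(actual, model_predictions), per_class_precision(actual, model_predictions))
print(accuracy(actual, majority_predictions), per_class_precision(actual, majority_predictions))
```

In this toy data the majority predictor matches the other model on overall accuracy yet never predicts the minority class at all, which is why per-class precision is a more informative comparison than accuracy alone when classes are skewed.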
Conclusion. Management plane analysis is important. MPA framework: 1) determine which practices cause a decline in health; 2) construct a predictive model of health based on practices. Results from an OSP with 850+ networks. http://github.com/agember/mpa