Sketch Resource Allocation for Software-defined Measurement in Network Management

Slide Note
Embed
Share

Measurement plays a crucial role in network management, especially for tasks like heavy hitter detection and anomaly detection. The focus is on sketch-based measurement, using innovative techniques like Count-Min Sketch to approximate specific queries efficiently. Challenges include limited resources for multiple tasks and the need to balance accuracy and resource allocation effectively.


Uploaded on Sep 24, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. SCREAM: Sketch Resource Allocation for Software-defined Measurement (CoNEXT 15) Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat

  2. Measurement is Crucial for Network Management Network Management on multiple tenants: Traffic Engineering Engineering Anomaly Detection detection Traffic DDoS Accounting Anomaly Detection Need fine-grained visibility of network traffic Measurement tasks: Heavy Hitter detection Heavy hitter detection (HH) Heavy Hitter detection Hierarchical heavy hitter detection (HHH) Change detection Super source detection (SSD) 2

  3. Software Defined Measurement Task 1 Task 2 DREAM [SIGCOMM 14] / SCREAM [CoNEXT 15] Controller Configure Collect Switch A Task 1 counters Switch B Task 1 counters Task 2 counters Task 2 counters 3

  4. Our Focus: Sketch-based Measurement Summaries of streaming data to approximately answer specific queries E.g., Bitmap for counting unique items OpenFlow Counters DREAM [SIGCOMM 14] Sketches SCREAM [CoNEXT 15] Memory Expensive, power-hungry TCAM Cheaper SRAM Counters Volume counters Volume and Connection counters Flows Selected prefixes All traffic all-the-time Sketches use a cheaper memory and are more expressive 4

  5. Sketch Example: Count-Min Sketch At packet arrival: h1(IP) h2(IP) 2+1=3 4+1=5 (IP, 1 Kbytes) d h3(IP) 1+1=2 At query: What is the traffic size of IP? = row with min collision = Min(3,5,2) = 2 Resource accuracy trade-off: Provable error bound given traffic properties (e.g., skew) 5

  6. Challenges: Limited Counters for Many Tasks Limited shared resources: SRAM capacity (e.g., 128 MB) Shared with other functions (e.g., routing) Too many resources to guarantee accuracy: 1 MB-32 MB per task Less than 4-128 tasks in SRAM Many task instances: 3 types (Heavy hitter, Hierarchical heavy hitter, Super source) Different flow aggregates (Rack, App, Src/Dst/Port) 1000s of tenants 6

  7. Goal: Many Accurate Sketch-based Measurements Users dynamically instantiate a variety of measurement tasks SCREAM supports the largest number of measurement tasks while maintaining measurement accuracy 7

  8. Approach: Dynamic Resource Allocation Resource accuracy trade-off depends on traffic Count Min: Provable error bound given traffic properties Ex: Skew of traffic from each IP Worst-case uses >10x counters than average Required memory Skew Dynamic allocation for current traffic 8

  9. Opportunity: Temporal Multiplexing Memory requirement varies over time Task 1 Task 2 Required Memory Time Multiplex memory among tasks over time 9

  10. Opportunity: Spatial Multiplexing Memory requirement varies across switches Task 1 Task 2 Required Memory Switch A Switch B Multiplex memory among tasks across switches 10

  11. Key Insight Leverage spatial and temporal multiplexing and dynamically allocate switch memory per task to achieve sufficient accuracy for many tasks DREAM has the same insight SCREAM applies it for sketches 11

  12. SCREAM Contributions 2- Allocate memory among sketch-based task instances acrossswitches while maintainingsufficient accuracy SCREAM Dynamic resource allocator Allocation Heavy hitter (HH) tasks Hierarchical heavy hitter (HHH) tasks Super Source (SSD) tasks 1- Supports 3 sketch-based task types Anomaly detection Traffic engineering DDoS detection 12

  13. SCREAM Iterative Workflow Collect & report Counters from many switches Estimate accuracy Accuracy Allocate resources Memory size 13

  14. SCREAM Iterative Workflow 50 Allocated Memory (KB) 100 40 Collect & report 80 Precision Accuracy 60 30 40 20 Task 1 Task 1 Task 2 Task 2 20 10 0 0 20 20 40 40 60 60 0 Estimate accuracy Time (s) Time (s) 50 Allocated Memory (KB) 50 Task1 accuracy <80% Allocated Memory (KB) 40 40 30 30 20 Task 1 Task 2 20 Task 1 Task 2 Allocate resources 10 10 0 20 20 40 40 60 60 Give more memory to task1 0 Time (s) Time (s) 14

  15. SCREAM Iterative Workflow 100 Collect & report Merge counters from switches 80 Precision Accuracy 60 40 Task 1 Task 2 20 0 0 20 40 60 Estimate accuracy Skew of traffic for task2 changes Task2 accuracy <80% Time (s) 50 50 Allocated Memory (KB) Allocated Memory (KB) 40 40 30 30 20 20 Task 1 Task 2 Task 2 Task 1 Allocate resources Give more memory to task2 10 10 0 0 20 20 40 40 60 60 Time (s) Time (s) 15

  16. SCREAM Challenges Collect & report Network-wide task implementation using sketches Estimate accuracy Accuracy estimation without the ground-truth Allocate resources Fast & Stable allocation in DREAM [SIGCOMM 14]

  17. Challenge: Merge Sketches of Different Sizes Network-wide Task Heavy hitter (HH) Source IPs sending > 10Mbps 25 10 15 Switch A Switch B d d w1 w2 17

  18. SCREAM Solution to Merge Sketches for HH Detection Previous work: Min of sums SCREAM: Sum of mins + + + 10 40 50 70 20 30 30 70 40 50 20 10 Min Min 50 80 90 25 Min + 10 20 50 30 10 15 Switch A Switch B 10 40 30 50 Both over-approximate smaller is more accurate 70 20 18

  19. SCREAM Solutions Collect & report Network-wide task implementation using sketches Merge sketches of different sizes for HH, HHH, SSD SSD algorithm with higher and more stable accuracy Estimate accuracy Accuracy estimation without the ground-truth Allocate resources Fast & Stable allocation in DREAM [SIGCOMM 14]

  20. Precision Estimation for Heavy Hitter Detection True detected HH Detected HHs Insight: Relate probability to Error on counters of detected HHs = Sum(P[Detected HH is true]) Precision = Estimate-Threshold Error Estimate-Threshold Threshold Estimated Real True HH False HH P[Detected HH is true] = 1 - P[Error Estimate-Threshold] 20

  21. Precision Estimation Step 1: Find a Bound on The Error Idea 1: Use average Error in Markov s inequality to bound it Insight: Relate probability to Error on counters of detected HHs Idea 1 P[Detected HH is true] = 1 - P[Error Estimate-Threshold] 21

  22. Precision Estimation Step 2: Improve The Bound Idea 2 A row in Count-Min: Idea 1 Insight: Average Error = heavy items collision + small items collision Counter indices of detected HHs show heavy collisions Idea 2: Markov s inequality only for small items 22

  23. SCREAM Solutions Collect & report Network-wide task implementation using sketches Merge sketches of different sizes for HH, HHH, SSD SSD algorithm with higher and more stable accuracy Estimate accuracy Accuracy estimation without the ground-truth Precision estimators for HH, HHH and SSD tasks Allocate resources Fast & Stable allocation in DREAM [SIGCOMM 14]

  24. SCREAM Solutions Collect & report Network-wide task implementation using sketches Merge sketches of different sizes for HH, HHH, SSD SSD algorithm with higher and more stable accuracy Estimate accuracy Accuracy estimation without the ground-truth Precision estimators for HH, HHH and SSD tasks Allocate resources Fast & Stable allocation in DREAM [SIGCOMM 14]

  25. Evaluation Metrics: Satisfaction of a task: Fraction of task s lifetime with sufficient accuracy % of rejected tasks Alternatives: OpenSketch: Allocate for bounded error for worst-case traffic at task instantiation (test with different bounds) Oracle: Knows required resource for a task in each switch in advance 25

  26. Evaluation Setting Simulation for 8 switches: 256 task instances (HH, HHH, SSD, combination) Accuracy bound = 80% 5 min tasks arriving in 20 minutes 2 hours CAIDA trace 26

  27. SCREAM Provides High Accuracy for More Tasks SCREAM: High satisfaction and low reject 100 100 OS_10 OS_50 OS_90 SCREAM Average Satisfaction 80 Rejected tasks (%) 80 60 60 40 OS_10 OS_50 OS_90 SCREAM 40 20 20 0 0 128 256 384 512 128 256 384 512 Switch capacity (KB) Switch capacity (KB) OpenSketch: Loose bound Under provision low satisfaction Tight bound Over provision high reject 27

  28. SCREAMs Performance Is Close to An Oracle 100 100 Oracle SCREAM Average Satisfaction 80 80 Rejected tasks (%) 60 60 40 40 20 20 Oracle SCREAM 0 0 128 256 384 512 128 256 384 512 Switch capacity (KB) Switch capacity (KB) SCREAM performance is close to an oracle, its satisfaction is a bit lower because: Iterative allocation takes time Accuracy estimation has error 28

  29. Other Evaluations Changing traffic skew SCREAM supports more accurate tasks than OpenSketch Accuracy estimation error SCREAM accuracy estimation has 5% error in average Other accuracy metrics Tasks in SCREAM have high recall (low false negative) 29

  30. Conclusion Measurement is crucial for SDN management in a resource-constrained environment Practical sketch-based SDM by dynamic memory allocation Implementing network-wide tasks using sketches Estimating accuracy for 3 types of tasks SCREAM is available at github.com/USC-NSL/SCREAM 30

  31. Thanks! Questions? 31

Related