Sketch Resource Allocation for Software-defined Measurement in Network Management

undefined
SCREAM: 
S
ket
c
h 
Re
source 
A
llocation
for Software-defined 
M
easurement
Masoud Moshref, 
Minlan Yu
,
Ramesh Govindan, Amin Vahdat
(CoNEXT’15)
Measurement is Crucial for Network Management
2
Accounting
Anomaly
Detection
Traffic
Engineering
Heavy Hitter detection
Heavy hitter detection (HH)
Change detection
Super source detection (SSD)
DDoS
detection
Anomaly
Detection
Traffic
Engineering
Network Management on multiple tenants:
 
Measurement tasks:
Heavy Hitter detection
Hierarchical heavy hitter detection (HHH)
Need fine-grained visibility of network traffic
Controller
DREAM [SIGCOMM’14] / SCREAM [CoNEXT’15]
Software Defined Measurement
3
Switch A
Task 1 counters
Task 2 counters
Switch B
Task 1 counters
Task 2 counters
Collect
Configure
Task 2
Task 1
Our Focus: Sketch-based Measurement
4
Summaries of streaming data to approximately answer specific queries
E.g., Bitmap for counting unique items
 
 
OpenFlow 
Counters
DREAM
[SIGCOMM’14]
Sketches
 
Memory
 
Expensive,
 
power-hungry
 TCAM
 
Cheaper SRAM
 
Counters
 
Volume counters
 
Volume and
 
Connection counters
 
Flows
 
Selected prefixes
 
All traffic all-the-time
SCREAM [CoNEXT’15]
Sketches use a cheaper memory and are more expressive
Sketch Example: Count-Min Sketch
5
 
(IP, 
1
 Kbytes)
 
h1(IP)
 
h2(IP)
 
h3(IP)
 
What is the traffic size of IP? = row with min collision = Min(3,5,2) = 2
 
d
 
At packet arrival:
 
Provable error bound given traffic properties (e.g., skew)
 
Resource accuracy trade-off:
 
At query:
 
2+
1
=3
 
4+
1
=5
 
1+
1
=2
Challenges: Limited Counters 
for Many Tasks
6
 
Many task instances:
3 types (Heavy hitter, Hierarchical heavy hitter, Super source)
Different flow aggregates (Rack, App, Src/Dst/Port)
1000s of tenants
Limited shared resources:
SRAM capacity (e.g., 
128 MB
)
Shared with other functions (e.g., routing)
 
Too many resources to guarantee accuracy:
1 MB-32 MB 
per task
Less than 4-128 tasks in SRAM
Goal: Many Accurate Sketch-based Measurements
7
Users 
dynamically instantiate 
a variety of measurement tasks
SCREAM supports 
the 
largest
 number 
of measurement
tasks while 
maintaining measurement accuracy
Approach: Dynamic Resource Allocation
8
Resource accuracy trade-off depends on 
traffic
Dynamic allocation for 
current
 
traffic
 
Worst-case
 uses 
>10x
 counters than average
 
Count Min: 
Provable error bound given traffic properties
Ex: 
Skew
 of traffic from each IP
 
Skew
 
Required memory
Opportunity: Temporal Multiplexing
9
Required Memory
Time
Multiplex memory among tasks over time
Memory requirement varies over 
time
Opportunity: Spatial Multiplexing
10
Required Memory
Switch A
Switch B
Memory requirement varies across 
switches
Multiplex memory among tasks across switches
Key Insight
11
Leverage 
spatial and temporal multiplexing
and dynamically 
allocate 
switch memory per task
to achieve 
sufficient accuracy 
for many tasks
DREAM has the same insight
SCREAM applies it for sketches
SCREAM Contributions
12
Heavy hitter
(HH) tasks
Super Source
(SSD) tasks
Dynamic resource allocator
Hierarchical heavy hitter
(HHH) tasks
Allocation
1- Supports 
3 sketch-based task types
 
2- A
llocate
 memory 
among
 
sketch-based task instances
across
 
switches 
while maintaining
 
sufficient accuracy
SCREAM
Anomaly detection
Traffic engineering
DDoS detection
SCREAM Iterative Workflow
13
 
Estimate accuracy
 
Allocate resources
Collect & report
 
Counters from many switches
 
Accuracy
 
Memory size
SCREAM Iterative Workflow
14
 
Task1 accuracy <80%
 
Give more memory to task1
Estimate accuracy
Allocate resources
Collect & report
Accuracy
SCREAM Iterative Workflow
15
Estimate accuracy
Allocate resources
Collect & report
 
Skew of traffic for task2 changes
Task2 accuracy <80%
 
Give more memory to task2
Accuracy
Merge counters from switches
SCREAM Challenges
Estimate accuracy
Allocate resources
Collect & report
 
Network-wide task implementation using sketches
 
Accuracy estimation without the ground-truth
 
Fast & Stable allocation in DREAM [SIGCOMM’14]
Switch B
Switch A
Challenge: Merge Sketches of Different Sizes
17
Network-wide Task
Heavy hitter (HH)
d
d
w1
w2
Source IPs sending > 10Mbps
10
15
25
 
SCREAM Solution to Merge Sketches for HH Detection
18
50
30
 
+
Previous work: Min of sums
 
SCREAM: Sum of mins
Min
Switch B
Switch A
10
15
25
Both over-approximate 
 
smaller is more accurate
SCREAM Solutions
Estimate accuracy
Allocate resources
Collect & report
 
Accuracy estimation without the ground-truth
 
Merge sketches of different sizes for HH, HHH, SSD
SSD algorithm with higher and more stable accuracy
 
Network-wide task implementation using sketches
 
Fast & Stable allocation in DREAM [SIGCOMM’14]
Precision Estimation for Heavy Hitter Detection
20
 
Estimated
 
Real
 
Error
 
Estimate-Threshold
 
Estimate-Threshold
= Sum(P[Detected HH is true])
 
= 1 - P[
Error
 
 
Estimate-Threshold
]
True detected HH
Detected HHs
Precision =
 
Insight:
 Relate probability to 
Error
 on counters of detected HHs
 
P[Detected HH is true]
Precision Estimation Step 1: Find a Bound on The Error
21
Idea 1:
 Use average 
Error 
in Markov’s inequality to bound it
Idea 1
 
= 1 - P[
Error
 
 
Estimate-Threshold
]
 
Insight:
 Relate probability to 
Error
 on counters of detected HHs
 
P[Detected HH is true]
 
A row in 
Count-Min:
Precision Estimation Step 2: Improve The Bound
22
Insight:
Average 
Error
 = heavy items collision + small items collision
Counter indices of detected HHs show heavy collisions
 
Idea 2: 
Markov’s inequality only for small items
Idea 1
Idea 2
SCREAM Solutions
Estimate accuracy
Allocate resources
Collect & report
Accuracy estimation without the ground-truth
 
Merge sketches of different sizes for HH, HHH, SSD
SSD algorithm with higher and more stable accuracy
 
Network-wide task implementation using sketches
Precision estimators for HH, HHH and SSD tasks
 
Fast & Stable allocation in DREAM [SIGCOMM’14]
SCREAM Solutions
Estimate accuracy
Allocate resources
Collect & report
Accuracy estimation without the ground-truth
Merge sketches of different sizes for HH, HHH, SSD
SSD algorithm with higher and more stable accuracy
Network-wide task implementation using sketches
Precision estimators for HH, HHH and SSD tasks
Fast & Stable allocation in DREAM [SIGCOMM’14]
Evaluation
25
Metrics:
Satisfaction of a task: Fraction of task’s lifetime with
sufficient accuracy
% of rejected tasks
Alternatives:
OpenSketch
: Allocate for bounded error for worst-case
traffic at task instantiation (test with different bounds)
Oracle
: Knows required resource for a task in each
switch in advance
Evaluation Setting
26
Simulation for 8 switches:
256 task instances (HH, HHH, SSD, combination)
Accuracy bound = 80%
5 min tasks arriving in 20 minutes
2 hours CAIDA trace
SCREAM Provides High Accuracy for More Tasks
27
SCREAM: High satisfaction and low reject
OpenSketch:
Loose bound 
 
Under provision 
 low satisfaction
 
Tight bound 
 
Over provision 
 high reject
SCREAM’s Performance Is Close to An Oracle
28
SCREAM performance is close to an oracle,
its satisfaction is a bit lower because:
Iterative allocation takes time
Accuracy estimation has error
Other Evaluations
29
SCREAM accuracy estimation has 5% error in average
Accuracy estimation error
Changing traffic skew
SCREAM supports more accurate tasks than OpenSketch
Other accuracy metrics
Tasks in SCREAM have high recall (low false negative)
Conclusion
30
Practical sketch-based SDM by dynamic memory allocation
Implementing network-wide tasks using sketches
Estimating accuracy for 3 types of tasks
SCREAM is available at
 
github.com/USC-NSL/SCREAM
Measurement is crucial for SDN management
in a resource-constrained environment
Thanks!
Questions?
 
31
Slide Note

I will show you how to stuff many measurement tasks with high accuracy on switches with limited resources.

Masoud is on job market

Embed
Share

Measurement plays a crucial role in network management, especially for tasks like heavy hitter detection and anomaly detection. The focus is on sketch-based measurement, using innovative techniques like Count-Min Sketch to approximate specific queries efficiently. Challenges include limited resources for multiple tasks and the need to balance accuracy and resource allocation effectively.

  • Network Management
  • Sketch-based Measurement
  • Resource Allocation
  • Count-Min Sketch
  • Anomaly Detection

Uploaded on Sep 24, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. SCREAM: Sketch Resource Allocation for Software-defined Measurement (CoNEXT 15) Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat

  2. Measurement is Crucial for Network Management Network Management on multiple tenants: Traffic Engineering Engineering Anomaly Detection detection Traffic DDoS Accounting Anomaly Detection Need fine-grained visibility of network traffic Measurement tasks: Heavy Hitter detection Heavy hitter detection (HH) Heavy Hitter detection Hierarchical heavy hitter detection (HHH) Change detection Super source detection (SSD) 2

  3. Software Defined Measurement Task 1 Task 2 DREAM [SIGCOMM 14] / SCREAM [CoNEXT 15] Controller Configure Collect Switch A Task 1 counters Switch B Task 1 counters Task 2 counters Task 2 counters 3

  4. Our Focus: Sketch-based Measurement Summaries of streaming data to approximately answer specific queries E.g., Bitmap for counting unique items OpenFlow Counters DREAM [SIGCOMM 14] Sketches SCREAM [CoNEXT 15] Memory Expensive, power-hungry TCAM Cheaper SRAM Counters Volume counters Volume and Connection counters Flows Selected prefixes All traffic all-the-time Sketches use a cheaper memory and are more expressive 4

  5. Sketch Example: Count-Min Sketch At packet arrival: h1(IP) h2(IP) 2+1=3 4+1=5 (IP, 1 Kbytes) d h3(IP) 1+1=2 At query: What is the traffic size of IP? = row with min collision = Min(3,5,2) = 2 Resource accuracy trade-off: Provable error bound given traffic properties (e.g., skew) 5

  6. Challenges: Limited Counters for Many Tasks Limited shared resources: SRAM capacity (e.g., 128 MB) Shared with other functions (e.g., routing) Too many resources to guarantee accuracy: 1 MB-32 MB per task Less than 4-128 tasks in SRAM Many task instances: 3 types (Heavy hitter, Hierarchical heavy hitter, Super source) Different flow aggregates (Rack, App, Src/Dst/Port) 1000s of tenants 6

  7. Goal: Many Accurate Sketch-based Measurements Users dynamically instantiate a variety of measurement tasks SCREAM supports the largest number of measurement tasks while maintaining measurement accuracy 7

  8. Approach: Dynamic Resource Allocation Resource accuracy trade-off depends on traffic Count Min: Provable error bound given traffic properties Ex: Skew of traffic from each IP Worst-case uses >10x counters than average Required memory Skew Dynamic allocation for current traffic 8

  9. Opportunity: Temporal Multiplexing Memory requirement varies over time Task 1 Task 2 Required Memory Time Multiplex memory among tasks over time 9

  10. Opportunity: Spatial Multiplexing Memory requirement varies across switches Task 1 Task 2 Required Memory Switch A Switch B Multiplex memory among tasks across switches 10

  11. Key Insight Leverage spatial and temporal multiplexing and dynamically allocate switch memory per task to achieve sufficient accuracy for many tasks DREAM has the same insight SCREAM applies it for sketches 11

  12. SCREAM Contributions 2- Allocate memory among sketch-based task instances acrossswitches while maintainingsufficient accuracy SCREAM Dynamic resource allocator Allocation Heavy hitter (HH) tasks Hierarchical heavy hitter (HHH) tasks Super Source (SSD) tasks 1- Supports 3 sketch-based task types Anomaly detection Traffic engineering DDoS detection 12

  13. SCREAM Iterative Workflow Collect & report Counters from many switches Estimate accuracy Accuracy Allocate resources Memory size 13

  14. SCREAM Iterative Workflow 50 Allocated Memory (KB) 100 40 Collect & report 80 Precision Accuracy 60 30 40 20 Task 1 Task 1 Task 2 Task 2 20 10 0 0 20 20 40 40 60 60 0 Estimate accuracy Time (s) Time (s) 50 Allocated Memory (KB) 50 Task1 accuracy <80% Allocated Memory (KB) 40 40 30 30 20 Task 1 Task 2 20 Task 1 Task 2 Allocate resources 10 10 0 20 20 40 40 60 60 Give more memory to task1 0 Time (s) Time (s) 14

  15. SCREAM Iterative Workflow 100 Collect & report Merge counters from switches 80 Precision Accuracy 60 40 Task 1 Task 2 20 0 0 20 40 60 Estimate accuracy Skew of traffic for task2 changes Task2 accuracy <80% Time (s) 50 50 Allocated Memory (KB) Allocated Memory (KB) 40 40 30 30 20 20 Task 1 Task 2 Task 2 Task 1 Allocate resources Give more memory to task2 10 10 0 0 20 20 40 40 60 60 Time (s) Time (s) 15

  16. SCREAM Challenges Collect & report Network-wide task implementation using sketches Estimate accuracy Accuracy estimation without the ground-truth Allocate resources Fast & Stable allocation in DREAM [SIGCOMM 14]

  17. Challenge: Merge Sketches of Different Sizes Network-wide Task Heavy hitter (HH) Source IPs sending > 10Mbps 25 10 15 Switch A Switch B d d w1 w2 17

  18. SCREAM Solution to Merge Sketches for HH Detection Previous work: Min of sums SCREAM: Sum of mins + + + 10 40 50 70 20 30 30 70 40 50 20 10 Min Min 50 80 90 25 Min + 10 20 50 30 10 15 Switch A Switch B 10 40 30 50 Both over-approximate smaller is more accurate 70 20 18

  19. SCREAM Solutions Collect & report Network-wide task implementation using sketches Merge sketches of different sizes for HH, HHH, SSD SSD algorithm with higher and more stable accuracy Estimate accuracy Accuracy estimation without the ground-truth Allocate resources Fast & Stable allocation in DREAM [SIGCOMM 14]

  20. Precision Estimation for Heavy Hitter Detection True detected HH Detected HHs Insight: Relate probability to Error on counters of detected HHs = Sum(P[Detected HH is true]) Precision = Estimate-Threshold Error Estimate-Threshold Threshold Estimated Real True HH False HH P[Detected HH is true] = 1 - P[Error Estimate-Threshold] 20

  21. Precision Estimation Step 1: Find a Bound on The Error Idea 1: Use average Error in Markov s inequality to bound it Insight: Relate probability to Error on counters of detected HHs Idea 1 P[Detected HH is true] = 1 - P[Error Estimate-Threshold] 21

  22. Precision Estimation Step 2: Improve The Bound Idea 2 A row in Count-Min: Idea 1 Insight: Average Error = heavy items collision + small items collision Counter indices of detected HHs show heavy collisions Idea 2: Markov s inequality only for small items 22

  23. SCREAM Solutions Collect & report Network-wide task implementation using sketches Merge sketches of different sizes for HH, HHH, SSD SSD algorithm with higher and more stable accuracy Estimate accuracy Accuracy estimation without the ground-truth Precision estimators for HH, HHH and SSD tasks Allocate resources Fast & Stable allocation in DREAM [SIGCOMM 14]

  24. SCREAM Solutions Collect & report Network-wide task implementation using sketches Merge sketches of different sizes for HH, HHH, SSD SSD algorithm with higher and more stable accuracy Estimate accuracy Accuracy estimation without the ground-truth Precision estimators for HH, HHH and SSD tasks Allocate resources Fast & Stable allocation in DREAM [SIGCOMM 14]

  25. Evaluation Metrics: Satisfaction of a task: Fraction of task s lifetime with sufficient accuracy % of rejected tasks Alternatives: OpenSketch: Allocate for bounded error for worst-case traffic at task instantiation (test with different bounds) Oracle: Knows required resource for a task in each switch in advance 25

  26. Evaluation Setting Simulation for 8 switches: 256 task instances (HH, HHH, SSD, combination) Accuracy bound = 80% 5 min tasks arriving in 20 minutes 2 hours CAIDA trace 26

  27. SCREAM Provides High Accuracy for More Tasks SCREAM: High satisfaction and low reject 100 100 OS_10 OS_50 OS_90 SCREAM Average Satisfaction 80 Rejected tasks (%) 80 60 60 40 OS_10 OS_50 OS_90 SCREAM 40 20 20 0 0 128 256 384 512 128 256 384 512 Switch capacity (KB) Switch capacity (KB) OpenSketch: Loose bound Under provision low satisfaction Tight bound Over provision high reject 27

  28. SCREAMs Performance Is Close to An Oracle 100 100 Oracle SCREAM Average Satisfaction 80 80 Rejected tasks (%) 60 60 40 40 20 20 Oracle SCREAM 0 0 128 256 384 512 128 256 384 512 Switch capacity (KB) Switch capacity (KB) SCREAM performance is close to an oracle, its satisfaction is a bit lower because: Iterative allocation takes time Accuracy estimation has error 28

  29. Other Evaluations Changing traffic skew SCREAM supports more accurate tasks than OpenSketch Accuracy estimation error SCREAM accuracy estimation has 5% error in average Other accuracy metrics Tasks in SCREAM have high recall (low false negative) 29

  30. Conclusion Measurement is crucial for SDN management in a resource-constrained environment Practical sketch-based SDM by dynamic memory allocation Implementing network-wide tasks using sketches Estimating accuracy for 3 types of tasks SCREAM is available at github.com/USC-NSL/SCREAM 30

  31. Thanks! Questions? 31

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#