DREAM: Dynamic Resource Allocation for Software-defined Measurement

Slide Note
Embed
Share

DREAM, presented at SIGCOMM’14 by Masoud Moshref, Minlan Yu, Ramesh Govindan, and Amin Vahdat, focuses on the importance of measurement for network management, catering to tenants like Netflix, Expedia, and Reddit. It highlights the significance of anomaly detection and traffic management in the realm of software-defined measurement.


Uploaded on Oct 11, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. DREAM: Dynamic Resource Allocation for Software-defined Measurement (SIGCOMM 14) Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat 1

  2. Measurement is Crucial for Network Management Tenant: Netflix Expedia Reddit Management: Anomaly Detection Detection Traffic Engineering Engineering Failure Traffic Accounting Accounting Measurement: Heavy Hitter detection Heavy Hitter detection Heavy Hitter detection Change detection Change detection Change detection Network: Motivation Motivation System Algorithm Evaluation 2

  3. High Level Contribution: Flexible Measurement Management: Users dynamically instantiate complex measurements on network state Measurement: DREAM supports the largest number of measurement tasks while maintaining measurement accuracy, by dynamically leveraging tradeoffs between switch resource consumption and measurement accuracy Network: We leverage unmodified hardware and existing switch interfaces Motivation Motivation System Algorithm Evaluation 3

  4. Prior Work: Software Defined Measurement (SDM) Controller Heavy Hitter detection Change detection 3 Install rules Update rules 1 2 Fetch counters #Bytes=1M #Bytes=5M Source IP: 10.0.1.128/30 Source IP: 10.0.1.130/31 Source IP: 55.3.4.34/31 Source IP: 55.3.4.32/30 Motivation Motivation System Algorithm Evaluation 4

  5. Our Focus: Measurement Using TCAMs Existing OpenFlow switches use TCAMs which permit counting traffic for a prefix Focus on TCAMs enables immediate deployability Prior work has explored other primitives such as hash-based counters Motivation Motivation System Algorithm Evaluation 5

  6. Challenge: Limited TCAM Memory 31 Find source IPs sending > 10Mbps 26 5 Controller 13 13 2 3 11 Heavy Hitter detection 00 01 10 Install rules Fetch counters 1 2 00 13MB 01 13MB 10 3MB 11 2MB Problem: Requires too many TCAMs 64K IPs to monitor a /16 prefix >> ~4K TCAMs at switches Motivation Motivation System Algorithm Evaluation 6

  7. Reducing TCAM Usage Monitor internal nodes to reduce TCAM usage 31 Monitoring 1* is enough because a node with size 5 cannot have leaves >10 26 5 13 00 13 01 2 10 3 11 31 26 5 13 00 13 01 2 10 3 11 Motivation Motivation System Algorithm Evaluation 7

  8. Challenge: Loss of Accuracy Fixed configuration misses heavy hitters as traffic changes 31 26 5 13 13 2 3 Missed heavy hitters 39 9 30 4 5 15 15 Motivation Motivation System Algorithm Evaluation 8

  9. Dynamic Configuration to Avoid Loss of Accuracy 39 Find leaves >10Mbps using 3 TCAMs 9 30 4 5 15 15 Merge Divide Monitor children to detect HHs but using 2 TCAMs 39 9 30 Monitor parent to save a TCAM 4 5 15 15 Motivation Motivation System Algorithm Evaluation 9

  10. Reducing TCAM Usage: Temporal Multiplexing Required TCAM changes over time Task 1 Task 2 # TCAMs Required Time Motivation Motivation System Algorithm Evaluation 10

  11. Reducing TCAM Usage: Spatial Multiplexing Required TCAMs varies across switches Switch A Switch B # TCAMs Required Time Only needs more TCAMs at switch A Motivation Motivation System Algorithm Evaluation 11

  12. Reducing TCAM Usage: Diminishing Returns 1 7% 0.8 Accuracy Bound 12% Accuracy 0.6 0.4 0.2 0 256 512 1024 TCAMs 2048 Can accept an accuracy bound <100% to save TCAMs Motivation Motivation System Algorithm Evaluation 12

  13. Key Insight Leverage spatial and temporal multiplexing and diminishing returns to dynamically adapt the configuration and allocation of TCAM entries per task to achieve sufficient accuracy Motivation Motivation System Algorithm Evaluation 13

  14. DREAM Contributions System Supports concurrent instances of three task types: Heavy Hitter, Hierarchical HH and Change Detection Algorithm Dynamicallyadapts tasks TCAM allocations and configuration over time and across switches, while maintaining sufficient accuracy Evaluation Significantly outperforms fixed allocation and scales well to larger networks Motivation Motivation System Algorithm Evaluation 14

  15. DREAM Tasks Anomaly detection Traffic engineering Management Network provisioning Accounting Network visualization DDoS detection Heavy Hitter detection Measurement Hierarchical HH detection Change detection DREAM Network Architecture System Motivation Algorithm Evaluation 15

  16. DREAM Workflow Task type Task parameters Task filter Accuracy bound Instantiate task Report Task Instance 1 TCAM Allocation and Configuration Task Instance n DREAM SDN Controller Configure counters Fetch counters Architecture System Motivation Algorithm Evaluation 16

  17. Algorithmic Challenges Dynamically adapts tasks TCAM allocations and configuration over time and across switches, configuration over time and across switches, configuration over time and across switches, configuration Dynamically adapts tasks TCAM allocations and Dynamically adapts tasks TCAM allocations and Dynamically adapts tasks TCAM allocations and configuration over time and across switches, while maintaining sufficient accuracy while maintaining sufficient accuracy while maintaining sufficient accuracy while maintaining sufficient accuracy sufficient accuracy allocations allocations switches switches How to allocate TCAMs for sufficient accuracy? Diminishing Return Which switches to allocate? Temporal Multiplexing How to adapt TCAM configuration on multiple switches? Spatial Multiplexing Algorithm Motivation System Evaluation 17

  18. Dynamic TCAM Allocation Allocate TCAM Estimate accuracy Measure Why iterative approach? Enough TCAMs High accuracy Satisfied We cannot know the curve for every traffic and task instance Thus we cannot formulate a one-shot optimization Not enough TCAMs Low accuracy Unsatisfied 1 Why estimating accuracy? 0.8 Accuracy 0.6 We don t have ground-truth Thus we must estimate accuracy 0.4 0.2 0 256 512 1024 TCAMs 2048 Algorithm Motivation System Evaluation 18

  19. Estimate Accuracy: Heavy Hitter Detection True detected HH Precision = Detected HHs Is 1 because any detected HH is a true HH True detected HH Recall = True detected + Missed HHs Estimate missed HHs Algorithm Motivation System Evaluation 19

  20. Estimate Recall for Heavy Hitter Detection True detected HH Recall = True detected + Missed HHs Find an upper bound of missed HHs using size and level of internal nodes Threshold=10Mbps With size 26: missed <=2 HHs At level 2: missed <=2 HH 76 26 50 12 14 15 35 5 7 12 2 0 15 20 15 Algorithm Motivation System Evaluation 20

  21. Allocate TCAM Goal: maintain high task satisfaction Fraction of task s lifetime with sufficient accuracy How many TCAMs to exchange? Small Slow convergence Large Oscillations Accuracy Accuracy Time Time Algorithm Motivation System Evaluation 21

  22. Avoid Overloading Not enough TCAMs to satisfy all tasks Solutions Reject new tasks Drop existing tasks Algorithm Motivation System Evaluation 22

  23. Algorithmic Challenges Dynamically adapts tasks TCAM allocations and configuration over time and across switches, while maintaining sufficient accuracy How to allocate TCAMs for sufficient accuracy? Diminishing Returns Which switches to allocate? Temporal Multiplexing How to adapt TCAM configuration on multiple switches? Spatial Multiplexing Algorithm Motivation System Evaluation 23

  24. Allocate TCAM: Multiple Switches A task can have traffic from multiple switches Controller 30 HHs Heavy Hitter detection 20 HHs 10 HHs A B Local accuracy is important Global accuracy is important Use both local and global accuracy If a task is globally unsatisfied, increasing B s TCAMs is expensive (diminishing returns) If a task is globally satisfied, no need to increase A s TCAMs Algorithm Motivation System Evaluation 24

  25. DREAM Modularity Task Independent Task Dependent TCAM Configuration: Divide & Merge Accuracy Estimation TCAM Allocation DREAM Algorithm Motivation System Evaluation 25

  26. Evaluation: Accuracy and Overhead Accuracy Satisfaction of a task: Fraction of task s lifetime with sufficient accuracy % of rejected/dropped tasks Overhead How fast is the DREAM control loop? Evaluation Motivation System Algorithm 26

  27. Evaluation: Alternatives Equal: divide TCAMs equally at each switch, no reject Fixed: fixed fraction of TCAMs, reject extra tasks Evaluation Motivation System Algorithm 27

  28. Evaluation Setting Prototype on 8 Open vSwitches 256 tasks (HH, HHH, CD, combination) 5 min tasks arriving in 20 mins Accuracy bound=80% 5 hours CAIDA trace Validate simulator using prototype Large scale simulation (4096 tasks on 32 switches) accuracy bounds task loads (arrival rate, duration, switch size) tasks (task types, task parameters e.g., threshold) # switches per tasks Evaluation Motivation System Algorithm 28

  29. Prototype Results: Average Satisfaction DREAM: High satisfaction of tasks at the expense of more rejection for small switches 1 100 DREAM-reject Fixed-reject DREAM-drop Average Satisfaction 0.8 80 Satisfaction % of tasks 0.6 60 0.4 40 Dream Equal Fixed 0.2 20 0 0 512 1024 Switch capacity # TCAMs in Switch 2048 4096 512 1024 Switch capacity # TCAMs in Switch 2048 4096 Fixed: High rejection as over-provisions for small tasks Evaluation Motivation System Algorithm 29

  30. Prototype Results: 95th Percentile Satisfaction DREAM: High 95th percentile satisfaction 1 95th Percentile Satisfaction 0.8 Satisfaction 0.6 Dream Equal Fixed 0.4 0.2 0 512 1024 Switch capacity # TCAMs in Switch 2048 4096 Equal and Fixed only keep small tasks satisfied Evaluation Motivation System Algorithm 30

  31. Conclusion Measurement is crucial for SDN management in a resource-constrained environment Dynamic TCAM allocation across measurement tasks Diminishing returns in accuracy Spatial and temporal multiplexing Future work More TCAM-based measurement tasks (quintiles for load balancing, entropy detection) Hash-based measurements DREAM is available at github.com/USC-NSL/DREAM 31

Related


More Related Content