Temporal Coordination of Measurement in Data Centers

Measurement plays a crucial role in data centers for fault diagnosis, traffic engineering, and attack detection. This study focuses on the concept of temporal coordination of measurement to overcome issues like reporting overhead and resource wastage. Various examples illustrate the importance of coordinated measurement in scenarios such as loss detection and port scanning to enhance network performance and security.


Uploaded on Sep 16, 2024


Presentation Transcript


  1. MOZART: Temporal Coordination of Measurement (SOSR '16). Xuemei Liu, Meral Shirazipour, Minlan Yu, Ying Zhang.

  2. Measurement in data centers. Motivating examples of measurement: fault diagnosis (capture root causes of failures); traffic engineering (capture statistics for big flows); attack detection (capture signatures of attacks). Essence of measurement: capture data related to events.

  3. Different views/abilities of devices. Hosts: view is per source/destination traffic; abilities are end-to-end loss, latency, etc. Switches: view is per-link traffic; abilities are per-link volume, latency, etc.

  4. No coordination of measurement. Problems: too much reporting overhead (to the controller); limited resources may be consumed by flows not related to the event. We propose temporal coordination of measurement.

  5. Example 1: loss detection. Packet loss affects performance; operators want to locate the loss. No coordination: measure and report loss of all flows; measure and report flow volume of all flows. (Diagram: traffic flow across switches S0, S1, S2.)

  6. Example 1: loss detection. Packet loss affects performance; operators want to locate the loss. Coordination: detect high loss for some flows, then measure and report flow volume of only the lossy flows. The sender needs to coordinate the lossy flows with the switches. (Diagram: traffic flow and selected flows across switches S0, S1, S2.)
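The coordination in Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not the MOZART implementation: the function names, the 5% loss threshold, and the per-flow counter dictionaries are all assumptions made for the example.

```python
def select_lossy_flows(sent, delivered, threshold=0.05):
    """Host side: return the flows whose end-to-end loss rate exceeds threshold.

    `sent` and `delivered` map a flow ID to a packet count (hypothetical inputs).
    """
    lossy = set()
    for flow, n_sent in sent.items():
        if n_sent == 0:
            continue
        loss_rate = (n_sent - delivered.get(flow, 0)) / n_sent
        if loss_rate > threshold:
            lossy.add(flow)
    return lossy


def switch_report(volume_by_flow, selected):
    """Switch side: report volume only for the flows the host selected."""
    return {f: v for f, v in volume_by_flow.items() if f in selected}


sent = {"f1": 1000, "f2": 1000, "f3": 500}
delivered = {"f1": 1000, "f2": 800, "f3": 495}
lossy = select_lossy_flows(sent, delivered)  # only f2 loses more than 5%

# Per-switch counters; with coordination each switch reports f2 only.
s1_counters = {"f1": 1000, "f2": 1000, "f3": 500}
s2_counters = {"f1": 1000, "f2": 800, "f3": 495}
reports = [switch_report(c, lossy) for c in (s1_counters, s2_counters)]
```

Comparing the two reports for f2 (1000 at the first switch, 800 at the second) narrows the loss to the segment between them, while f1 and f3 generate no reports at all.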

  7. Example 2: port scan. Compromised servers scan for vulnerable servers. No coordination: count and report the number of destinations for all senders. (Diagram: compromised server, switches S0 and S1, traffic flow to ports 123, 456, 789.)

  8. Example 2: port scan. Compromised servers scan for vulnerable servers. Coordination: detect senders with unwanted traffic sent to secure ports, then count and report the number of destinations for the detected senders only. The egress switch coordinates candidate compromised senders with the ingress switch. (Diagram: compromised server, switches S0 and S1, HTTP server (port 80), ports 123, 456, 789; traffic flow and selected flows.)
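A rough Python sketch of the port-scan coordination follows. It assumes that "unwanted traffic sent to secure ports" means packets addressed to a port the destination does not expose; the packet tuples, host names, and `open_ports` map are hypothetical.

```python
def detect_candidates(packets, open_ports):
    """Egress side: senders that hit a port the destination does not expose."""
    return {
        src
        for src, dst, port in packets
        if port not in open_ports.get(dst, set())
    }


def count_destinations(packets, candidates):
    """Ingress side: count distinct destinations for candidate senders only."""
    dests = {}
    for src, dst, _port in packets:
        if src in candidates:
            dests.setdefault(src, set()).add(dst)
    return {src: len(d) for src, d in dests.items()}


# (source, destination, destination port); all values are made up.
packets = [
    ("h1", "web", 80),
    ("h9", "web", 123),
    ("h9", "db", 456),
    ("h9", "cache", 789),
    ("h2", "db", 3306),
]
open_ports = {"web": {80}, "db": {3306}, "cache": {11211}}

candidates = detect_candidates(packets, open_ports)
fanout = count_destinations(packets, candidates)
```

Only the candidate sender h9 consumes counter state at the ingress switch; h1 and h2 are never tracked.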

  9. Example 3: ECMP flow. Facebook reported congestion caused by unbalanced ECMP traffic distribution. No coordination: measure and report volume of all flows. (Diagram: traffic flow across switches S0, S1, S2.)

  10. Example 3: ECMP flow. Facebook reported congestion caused by unbalanced ECMP traffic distribution. Coordination: detect elephant flows, then measure and report volume of elephant flows only. Switches coordinate elephant flows with each other. (Diagram: traffic flow and selected flows across switches S0, S1, S2.)

  11. MOZART: MOnitor flowZ At the Right Time.

  12. MOZART framework. Goal: capture data related to events. The MOZART controller configures selectors and monitors. Selectors detect events and pass the selected flows to monitors; monitors report data of the selected flows.

  13. MOZART design challenges: coordination of measurement; placement of MOZART tasks.

  14. MOZART design challenges: coordination of measurement; placement of tasks.

  15. Strawman coordination. (Timeline diagram: f1 satisfies the event; in the selector, f1 is selected; f1 in the monitor. Legend: normal packet.)

  16. Strawman coordination. Traffic before the flow is selected is not captured. (Timeline diagram: f1 satisfies the event; in the selector, f1 is selected; f1 in the monitor. Legend: normal packet, captured packet.)

  17. Two-mode coordination: Normal Mode and Event Mode. Monitors sample in Normal Mode, so traffic before the flow is selected has a chance to be captured. (Timeline diagram: f1 satisfies the event; in the selector, f1 is selected; f1 in the monitor. Legend: normal packet, sampled packet, captured packet.)
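The two modes can be illustrated with a small monitor class. This is a sketch under stated assumptions: the slide only says that monitors sample in Normal Mode, so the deterministic "every k-th packet" sampling, the class name `TwoModeMonitor`, and its methods are hypothetical choices that keep the example reproducible.

```python
class TwoModeMonitor:
    """Per-flow monitor: sample in Normal Mode, capture all in Event Mode."""

    def __init__(self, sample_every=4):
        self.sample_every = sample_every  # Normal Mode keeps 1 in k packets
        self.selected = set()             # flows currently in Event Mode
        self.seen = {}                    # packets observed per flow
        self.captured = {}                # flow -> list of (seq, mode)

    def select(self, flow_id):
        """Called when a selector announces that flow_id satisfies the event."""
        self.selected.add(flow_id)

    def on_packet(self, flow_id, seq):
        count = self.seen.get(flow_id, 0) + 1
        self.seen[flow_id] = count
        if flow_id in self.selected:
            # Event Mode: every packet of a selected flow is captured.
            self.captured.setdefault(flow_id, []).append((seq, "event"))
        elif count % self.sample_every == 0:
            # Normal Mode: keep a sample so pre-selection traffic is not lost.
            self.captured.setdefault(flow_id, []).append((seq, "sampled"))


monitor = TwoModeMonitor(sample_every=4)
for seq in range(1, 9):       # packets 1..8 arrive before selection
    monitor.on_packet("f1", seq)
monitor.select("f1")          # the selector's notification arrives
for seq in range(9, 11):      # packets 9..10 arrive after selection
    monitor.on_packet("f1", seq)
```

With the strawman scheme only packets 9 and 10 would be recorded; here packets 4 and 8 are also kept as samples from before the selection.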

  18. Memory management in monitors. Selected and non-selected flows coexist in the hash table. Memory in devices is limited. Collisions may happen in the hash table. (Diagram: hash table rows of flow ID / selected flow? / flow statistics: f1 / 1 / 10240; f2 / 1 / 2048; f3 / 0 / 500. Selected flows shown: f7, f1, f2.)

  19. Memory management in monitors. Selected and non-selected flows coexist in the hash table. Memory in devices is limited. Collisions may happen in the hash table. (Diagram: hash table rows of flow ID / selected flow? / flow statistics: f1 / 1 / 10240; f2 / 1 / 2048; f7 / 1 / 1024. Selected flows shown: f7, f1, f2.)

  20. Memory management in monitors. Selected and non-selected flows coexist in the hash table. Memory in devices is limited. Collisions may happen in the hash table. More memory is allocated to selected flows. (Diagram: non-selected flows f5, f6; selected flows f7, f1, f2; hash table rows of flow ID / selected flow? / flow statistics: f1 / 1 / 10240; f2 / 1 / 2048; f7 / 1 / 1024.)
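One plausible reading of the three memory-management slides is that, on a hash collision, a selected flow may evict a non-selected flow but not the other way around. The Python sketch below illustrates that policy only; the slides do not spell out the rule, so the eviction logic, the `MonitorTable` class, and the explicit slot map are assumptions. Collisions between two flows of the same kind keep the incumbent here, which is also an assumption.

```python
class MonitorTable:
    """Fixed-size flow table in which selected flows win collisions."""

    def __init__(self, num_slots, slot_fn):
        self.slots = [None] * num_slots  # entry: [flow_id, selected, bytes]
        self.slot_fn = slot_fn           # maps a flow ID to an integer

    def update(self, flow_id, size, selected):
        """Account `size` bytes to flow_id; return False if it was dropped."""
        index = self.slot_fn(flow_id) % len(self.slots)
        entry = self.slots[index]
        if entry is None:
            self.slots[index] = [flow_id, selected, size]
            return True
        if entry[0] == flow_id:
            entry[1] = entry[1] or selected
            entry[2] += size
            return True
        if selected and not entry[1]:
            # Collision: a selected flow replaces a non-selected incumbent.
            self.slots[index] = [flow_id, True, size]
            return True
        return False  # collision lost; the incumbent keeps the slot


# Hypothetical slot assignment chosen so that f7 collides with f3.
slot_of = {"f1": 0, "f2": 1, "f3": 2, "f7": 2, "f5": 0}
table = MonitorTable(num_slots=3, slot_fn=slot_of.__getitem__)

table.update("f1", 10240, selected=True)
table.update("f2", 2048, selected=True)
table.update("f3", 500, selected=False)
evicted_f3 = table.update("f7", 1024, selected=True)   # replaces f3
kept_f5 = table.update("f5", 300, selected=False)      # loses to f1
```

The final table holds f1, f2, and f7, matching the flow IDs and statistics shown in the slide diagrams.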

  21. MOZART design challenges: coordination of measurement; placement of MOZART tasks.

  22. Placement of MOZART tasks. Many candidate MOZART tasks to run: operators want to detect many events. Device resource constraints: switches have limited memory and hosts have limited CPU; measurement can only use leftover resources. Latency constraint within one MOZART task: timely communication is critical, so latency between selectors and monitors should be small.

  23. Placement of MOZART tasks. Strawman algorithm: Maximize Allocated Modules (MAM). Challenges: within one task, selectors and monitors should all be placed; across multiple tasks, placement must be joint to maximize the number of running tasks. MOZART: Binary Integer Linear Programming. Objective: maximize the number of tasks to run, subject to resource and latency constraints.
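The placement objective can be made concrete with a toy search. The slide names a Binary Integer Linear Program; the sketch below swaps in exhaustive enumeration instead of an ILP solver, which is only practical for tiny inputs. It also assumes one selector and one monitor per task, a single resource unit per device, and a hypothetical `place_tasks` signature.

```python
from itertools import product


def place_tasks(tasks, capacity, latency, max_latency):
    """Return {task: (selector_device, monitor_device)} with the most tasks.

    tasks:    {name: {"selector": demand, "monitor": demand}}
    capacity: {device: available resource units}
    latency:  {(device_a, device_b): milliseconds}, looked up symmetrically
    """
    devices = sorted(capacity)
    names = sorted(tasks)

    def lat(a, b):
        if a == b:
            return 0
        return latency.get((a, b), latency.get((b, a), float("inf")))

    # A task is either not run (None) or run with both modules placed on a
    # device pair that respects the latency bound.
    pairs = [(s, m) for s in devices for m in devices if lat(s, m) <= max_latency]
    options = [None] + pairs

    best = {}
    for choice in product(options, repeat=len(names)):
        used = dict.fromkeys(devices, 0)
        for name, pair in zip(names, choice):
            if pair is not None:
                used[pair[0]] += tasks[name]["selector"]
                used[pair[1]] += tasks[name]["monitor"]
        if any(used[d] > capacity[d] for d in devices):
            continue  # violates a device resource constraint
        placed = {n: p for n, p in zip(names, choice) if p is not None}
        if len(placed) > len(best):
            best = placed
    return best


tasks = {
    "t1": {"selector": 1, "monitor": 1},
    "t2": {"selector": 1, "monitor": 1},
    "t3": {"selector": 2, "monitor": 2},
}
plan = place_tasks(tasks, {"A": 2, "B": 2}, {("A", "B"): 10}, max_latency=50)
```

In this instance t3 alone would use all four resource units, so the search prefers running t1 and t2 together. A task counts only when both of its modules are placed, which is the "one task" challenge named on the slide.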

  24. Evaluation setup. Topology and traffic: B4 topology (12 switches, 12 hosts), implemented in Mininet; switches run Open vSwitch; 2-hour CAIDA trace. Compared algorithms: no coordination (just Sample and Hold (SH) in monitors); coordination (selectors send selected flows; SH in monitors).
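For readers unfamiliar with Sample and Hold, the Python sketch below shows the basic idea: a flow gets a table entry with some probability per byte, and once it has an entry every later packet is counted. How the evaluated system combines SH with selected flows is not stated on the slide; giving selected flows an entry immediately is an assumption, as are the function name and parameters.

```python
import random


def sample_and_hold(packets, byte_prob, selected=frozenset(), seed=0):
    """Count bytes per flow with Sample and Hold.

    packets:   iterable of (flow_id, size_in_bytes)
    byte_prob: probability of sampling any single byte
    selected:  flows announced by selectors (held immediately, by assumption)
    """
    rng = random.Random(seed)
    table = {}
    for flow, size in packets:
        if flow in table:
            table[flow] += size  # hold: entry exists, count every packet
            continue
        # Probability that at least one of the packet's bytes is sampled.
        packet_prob = 1 - (1 - byte_prob) ** size
        if flow in selected or rng.random() < packet_prob:
            table[flow] = size
    return table


packets = [("f1", 100), ("f2", 1500), ("f1", 100), ("f3", 40)]

no_sampling = sample_and_hold(packets, byte_prob=0.0)
coordinated = sample_and_hold(packets, byte_prob=0.0, selected={"f1"})
everything = sample_and_hold(packets, byte_prob=1.0)
```

With sampling disabled, only the coordinated run records anything, and it records f1 in full; this is the effect that coordination is meant to have on the selected flows.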

  25. Example: loss detection. The selector detects high loss for some flows; monitors measure flow volume of the lossy flows. (Diagram: a selector and monitors at switches S0, S1, S2; traffic flow; selected flows sent from the selector.)

  26. MOZART achieves high accuracy. (Plot: ratio of selected flows not captured versus memory size in each monitor for measurement; annotated values 15% and 1.3%.)

  27. MOZART supports more tasks. Algorithm / tasks assigned (%) / avg. latency (ms): Maximize Allocated Modules / 77% / 94; MOZART (latency <= infinite) / 100% / 110.

  28. MOZART supports more tasks. Algorithm / tasks assigned (%) / avg. latency (ms): Maximize Allocated Modules / 77% / 94; MOZART (latency <= infinite) / 100% / 110; MOZART (latency <= 250 ms) / 98% / 64.

  29. Conclusion. Temporal coordination is important: collect data related to events; devices have different views/abilities. MOZART design highlights: coordination algorithms; a placement algorithm for maximizing the number of tasks to run. Benefits: high measurement accuracy; support for more tasks; memory constraints in devices are met.
