Energy Management Challenges in Datacenters for Online Data-Intensive Applications

Massive growth of big data calls for efficient energy management in the datacenters that host online data-intensive (OLDI) applications. Traditional energy management falls short because these applications are interactive and bound by strict service-level agreements. TimeThief takes a different approach: it exploits slack in the network, reshapes the response time distribution, and uses EDF scheduling to save datacenter energy, achieving about 12% savings even at peak load.



Presentation Transcript


  1. TimeThief: Leveraging Network Variability to Save Datacenter Energy in Online Data-Intensive Applications
     Balajee Vamanan (Purdue, UIC), Hamza Bin Sohail (Purdue), Jahangir Hasan (Google), T. N. Vijaykumar (Purdue)

  2. Big data and OLDI applications
     - Massive growth of big data: unstructured data doubles every two years, and it is mostly Internet data
     - Big data is consumed by online, data-intensive (OLDI) applications; datacenters are the critical computing platform for OLDIs
     - OLDIs are important: many popular Internet applications are OLDIs (e.g., Google Search, Facebook, Twitter)
     - OLDIs organize and provide access to Internet data and are vital to many big companies

  3. Challenges to energy management
     Traditional energy management does not work for OLDIs:
     1. OLDIs are interactive with strict service-level agreements (SLAs), so work cannot be batched
     2. Response times and inter-arrival times are short (e.g., 200 ms and 1 ms for Web Search), so low-power modes (e.g., p-states) are not applicable
     3. OLDIs distribute data over 1000s of servers and each user query searches all servers, so the workload cannot be consolidated onto fewer servers
     Energy management for OLDIs can only slow down servers without violating SLAs.

  4. Previous work
     Pegasus [ISCA 2014] slows down responses at low loads:
     - Response time is a more accurate signal than CPU utilization
     - Shifts the latency distribution at lower loads (uses datacenter-wide metrics)
     - Achieves load-proportional energy: saves energy at lower loads
     - Does not save much at higher loads and saves 0% at peak
     Datacenters operate at moderate-to-peak load during the day (diurnal pattern), so savings are desired at high loads.
     Pegasus saves substantial energy at low loads, but its savings are limited at high loads.

  5. TimeThief: contributions
     1. Exploits slack from the network to slow down compute (e.g., 80% of flows take 1/5th of the budget even at peak load)
     2. Reshapes the response time distribution at all loads by slowing down sub-critical leaf servers for each query
     3. Leverages network signals to determine slack, without needing fine-grained clock synchronization
     4. Employs Earliest Deadline First (EDF) scheduling to decouple critical queries from slowed-down, previously-queued sub-critical queries
     TimeThief identifies sub-critical queries and exploits their slack to save 12% energy even at the peak load.

  6. Talk organization
     - Introduction: motivation, previous work, our contributions
     - Background: OLDI architecture, network variability
     - TimeThief: key ideas, slack calculation and application, EDF
     - Methodology
     - Results

  7. Background: OLDI architecture
     OLDI = OnLine Data-Intensive applications, e.g., Web Search
     1. Online: deadline-bound
     2. Data-intensive: large datasets, highly distributed
     Partition-aggregate (tree) organization: a root fans each query out through aggregators (Agg 1 .. Agg m) to leaves (Leaf 1 .. Leaf n)
     Request-compute-reply: request (root -> leaf), compute (at the leaf node), reply (leaf data back up the tree)
     The root waits only until the deadline; dropped (late) replies affect answer quality, e.g., an SLA of 1% misses
     [Figure: partition-aggregate tree with root, aggregators, and leaves, annotated with request and response arrows]
     This gives rise to the tail latency problem.
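To make the request-compute-reply flow concrete, here is a minimal sketch (not the paper's code; all names and timings are illustrative) of a root fanning a query out to leaves and dropping replies that miss the deadline, which is what the miss-rate SLA counts:

    import concurrent.futures
    import random
    import time

    DEADLINE_S = 0.200  # illustrative end-to-end budget, e.g., 200 ms

    def leaf_compute(query, shard_id):
        # Stand-in for searching one shard; a real leaf would query its index.
        time.sleep(random.uniform(0.005, 0.050))
        return f"results for '{query}' from shard {shard_id}"

    def root_fanout(query, num_leaves=8):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=num_leaves)
        futures = [pool.submit(leaf_compute, query, i) for i in range(num_leaves)]
        # Wait only until the deadline; whatever has not arrived is dropped.
        done, late = concurrent.futures.wait(futures, timeout=DEADLINE_S)
        pool.shutdown(wait=False)
        return [f.result() for f in done], len(late)

    answers, missed = root_fanout("datacenter energy")
    print(f"{len(answers)} replies in time, {missed} dropped")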

  8. OLDI tail latencies and SLA budgets
     Tail latency problem: the overall response time is determined by the slower leaves.
     Deadline budgets are based on the 99th-99.9th percentiles of individual leaves' replies for one query.
     Component budgets: to optimize network and compute separately, the deadline budget is split among nodes (compute) and the network (e.g., total = 200 ms, flow deadlines of 10-20 ms).
     [Figure: example breakdown of per-leaf latencies against the end-to-end budget]

  9. OLDI traffic characteristics: incast
     Many children respond around the same time, causing incast, which causes packet drops and long tails.
     OLDI trees have large degree, which is hard to absorb in buffers:
     1. Datacenter switches use shallow buffers for cost
     2. Query multiplexing: many in-flight queries' incasts collide
     3. Inevitable background traffic worsens incast
     Requests (parent -> child) are also affected by reply incast (child -> parent) due to root randomization (details in the paper).

  10. Network variability and incast
     Network variability is significant even with root randomization.
     [Figure: histogram of flow completion times (percent of flows vs. flow completion time in ms), showing delays from queuing in the network and TCP timeouts]
     Incast leads to long tails; network budgets based on tail latency are about 6-7x the median network delay.

  11. Talk organization
     - Introduction: motivation, previous work, our contributions
     - Background: OLDI architecture, network variability
     - TimeThief: key ideas, slack calculation and application, EDF
     - Methodology
     - Results

  12. TimeThief: key ideas
     TimeThief exploits per-query, per-leaf slack (e.g., 80% of requests take 1/5th of the budget).
     Three kinds of slack: request (before compute), compute (future extension), and reply (after compute); TimeThief exploits request slack.
     - A mechanism to determine request slack
     - Machinery to effectively exploit the slack
     - EDF scheduling to shield tail (critical) requests
     - Intel's Running Average Power Limit (RAPL) to set the power state (see the sketch below)
     TimeThief reshapes the response time distribution using per-query slack; EDF pulls the tail in.
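As one concrete illustration of per-server power actuation, the sketch below caps a CPU package's power through the Linux powercap (RAPL) sysfs interface. This is only a minimal sketch under assumptions: the sysfs path varies across kernels and platforms, and the policy mapping slack to a cap is not the paper's; the slide only states that RAPL is used to set the power state.

    # Minimal sketch: cap package power via the Linux powercap (RAPL) sysfs
    # interface. The path below is an assumption and differs across systems;
    # writing it requires root privileges.
    RAPL_LIMIT = "/sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw"

    def set_package_power_cap(watts: float) -> None:
        # The limit is expressed in microwatts.
        with open(RAPL_LIMIT, "w") as f:
            f.write(str(int(watts * 1_000_000)))

    # e.g., lower the cap when most queued requests have ample slack:
    # set_package_power_cap(60.0)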

  13. Determining request slack
     Could we determine slack from timestamps at the parent and the leaf? Clock skew can be comparable to the slack itself, so TimeThief instead estimates slack from network signals:
     1. Explicit Congestion Notification (ECN): switches mark packets when buffer occupancy exceeds a threshold
     2. TCP timeouts: the sender marks retransmitted packets (which were lost earlier)
     If a leaf sees neither ECN marks nor timeouts:
         request_slack = network_budget - median_latency
     else:
         request_slack = 0
     How much of the slack can be used to slow down compute? That depends on queuing (load).
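The slack rule above can be written down directly; the following is a minimal sketch with hypothetical names, using the budgets only as illustrative numbers:

    def request_slack_ms(saw_ecn: bool, saw_timeout: bool,
                         network_budget_ms: float, median_latency_ms: float) -> float:
        # A request whose flow was ECN-marked or retransmitted may already be
        # close to its network budget, so it is treated as critical (no slack).
        if saw_ecn or saw_timeout:
            return 0.0
        # Otherwise the request most likely arrived near the median network
        # delay, so the rest of the network budget is slack for compute.
        return max(0.0, network_budget_ms - median_latency_ms)

    # e.g., with a 25 ms request budget and a ~4 ms median delay (illustrative):
    # request_slack_ms(False, False, 25.0, 4.0) -> 21.0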

  14. Slowing down based on request slack
     Two questions:
     1. How much should the current request be slowed down? We do not know the current request's needs ahead of time, but the compute budget already accounts for the tail.
     2. How do we account for queueing (load)? Attenuate the slack:
         slowdown = request_slack * scale / compute_budget
     scale depends on load; a feedback controller determines it dynamically.
     The controller monitors the response times of completed queries every 5 seconds (more details in the paper):
         scale += 0.05 if there is at least 5% room
         scale -= 0.05 otherwise
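A minimal sketch of that attenuation and feedback loop, with hypothetical names: the 5-second update period, the 0.05 step, and the 5% margin come from the slide, while the choice of tail metric and the clamping of scale are assumptions.

    class SlowdownController:
        """Scales how much of each request's slack is spent on slowing compute."""

        def __init__(self, compute_budget_ms: float, response_budget_ms: float):
            self.compute_budget_ms = compute_budget_ms
            self.response_budget_ms = response_budget_ms
            self.scale = 0.0  # start with no slowdown

        def slowdown(self, request_slack_ms: float) -> float:
            # Attenuated slowdown for one request: slack * scale / compute budget.
            return request_slack_ms * self.scale / self.compute_budget_ms

        def update(self, completed_response_times_ms: list) -> None:
            # Invoked every 5 seconds over recently completed queries.
            if not completed_response_times_ms:
                return
            tail = max(completed_response_times_ms)  # stand-in for the tail metric
            if tail < 0.95 * self.response_budget_ms:  # at least 5% room
                self.scale += 0.05
            else:
                self.scale -= 0.05
            self.scale = min(max(self.scale, 0.0), 1.0)  # keep scale in [0, 1]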

  15. Earliest-deadline-first scheduling
     A sub-critical request (request_slack > 0) could hurt a critical request (request_slack = 0) if the critical request is queued behind the sub-critical one.
     EDF decouples critical and sub-critical requests. We compute the deadline at the leaf node as:
         deadline = compute_budget + request_slack
     Note: determining slack is what enables EDF. More details in the paper.
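A minimal sketch of such an EDF run queue at a leaf (a hypothetical structure, not the paper's implementation): each request's deadline is compute_budget + request_slack, so a critical request with zero slack is served ahead of previously queued sub-critical requests.

    import heapq
    import itertools
    import time

    class EDFQueue:
        """Leaf-side run queue ordered by absolute deadline, earliest first."""

        def __init__(self):
            self._heap = []
            self._order = itertools.count()  # FIFO tie-break for equal deadlines

        def push(self, request, compute_budget_ms: float, request_slack_ms: float):
            # Zero-slack (critical) requests get the tightest deadlines.
            deadline = time.monotonic() + (compute_budget_ms + request_slack_ms) / 1000.0
            heapq.heappush(self._heap, (deadline, next(self._order), request))

        def pop(self):
            # Return the request with the earliest deadline, or None if empty.
            return heapq.heappop(self._heap)[2] if self._heap else None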

  16. Talk organization
     - Introduction: motivation, previous work, our contributions
     - Background: OLDI architecture, network variability
     - TimeThief: key ideas, slack calculation and application, EDF
     - Methodology
     - Results

  17. Methodology
     1. Compute (service time, power): real measurements, feasible at a small scale
     2. Network: tail effects show up only in large clusters, so simulated in ns-3
     Workload: Web Search (Search) from CloudSuite 2.0, with a search index built from Wikipedia; 3000 queries/s at peak load; 90% load with 100 threads per leaf (4 sockets x 12 cores x 2-way SMT, roughly 100 hardware threads)
     Deadline budget: 125 ms overall; 25 ms network (request/reply); 75 ms compute (budgets in line with other papers)
     SLA of 1% missed deadlines

  18. Methodology (cont.)
     Service time and power measurements:
     - Service time measured at low load (no queueing) at the leaf server (matches other papers)
     - Power measured with RAPL
     - Measurements taken at one leaf server; leaf servers are i.i.d.
     Network:
     - Fat-tree topology with 64 racks and 16 servers per rack
     - 10 Gbps links, 4 MB shared packet buffers
     - 200 µs round-trip time (unloaded)

  19. Service time and power
     [Figures: (left) cumulative percent of requests vs. service time in ms; (right) power saving normalized to baseline vs. service slowdown normalized to baseline]

  20. Response time distribution
     [Figure: cumulative percentile of requests vs. latency in ms for TimeThief, Baseline, and Pegasus at 30% and 90% load]
     TimeThief reshapes the distribution at all loads and saves 12% energy at the peak load.

  21. Power state distribution
     [Figure: breakdown of requests across power states (1.2, 1.5, 1.8, 2.2, and 2.5 GHz) for Pegasus (P) and TimeThief (T) at 90% and 30% load]
     Even at the peak load, TimeThief slows down about 80% of the time using per-query slack.

  22. Conclusion
     TimeThief exploits per-query slack:
     - Request slack
     - Compute slack (future work)
     - Reply slack (not easily predictable)
     It reshapes the response time distribution at all loads and saves 12% energy at peak load.
     It leverages network signals to estimate slack and does not require fine-grained clock synchronization.
     It employs EDF to decouple critical requests from sub-critical requests.
     TimeThief converts OLDIs' performance disadvantage, the latency tail, into an energy advantage!

  23. Thank you
