ICE: An Integrated Configuration Engine for Interference Mitigation in Cloud Services

 
 
Amiya K. Maji, Subrata Mitra, Saurabh Bagchi
 
School of Electrical and Computer Engineering
Purdue University
West Lafayette, Indiana
 
Introduction
 
Long latency and variable latency of cloud services degrade user
experience
Interference between VMs on the same physical machine is a key
factor for such latency perturbations
Performance Interference due to Shared Hardware Resources
Performance Interference: performance of one VM suffering due to the activity of another co-located VM
One common occurrence is due to a shared cache
Other shared resources:
Memory bandwidth
Network/IO
Translation Lookaside Buffer (TLB)
[Diagram: multi-core cache sharing; two cores (P1, P2) with private L1 caches sharing the last-level (L2) cache]
Remediation Techniques
Existing solutions
Better scheduling [Paragon ASPLOS’13, QCloud Eurosys’10]
Live migration [Deepdive ATC’13]
Resource containment [CPI2 Eurosys’13]
These require changes in the hypervisor; not feasible in a public cloud
Our prior work: IC2 [Middleware’14]
Interference-aware Cloud Application Configuration
Advantages:
User-level control, no hypervisor modification
30-40% response time (RT) improvement during interference
Disadvantages:
High overhead of web server reconfiguration
Cannot improve RT further without degrading throughput
Typical Load-balanced WS Setup
Latency of a WS VM increases during interference
The LB has no knowledge of interference and hence treats all VMs identically
[Diagram: a Load Balancer (LB) distributing requests across three WS VMs]
ICE: An Integrated Configuration Engine for Interference Mitigation
Animating Insights
Reducing server load limits the impact of interference
Most large-scale web servers are placed behind load balancers
Use available residual capacity in a WS cluster efficiently
 
Objectives
Make reconfiguration (interference mitigation) faster
Make existing load-balancers interference-aware
Get better response time during interference (than IC2)
 
ICE Workflow
 
Detect interference in predictive mode by mining patterns
of system usage values, e.g., Cycles per instruction
(CPI), Cache Miss Rate (CMR)
Two-level reconfiguration
1. Update load balancer weight
Less overhead. More agile.
2. Update Middleware parameters
Only for long interferences. Reduces overhead of idle threads.
ICE Design
 
Key components
1. Monitoring Engine (ME)
2. Interference Detector (DT)
3. LB Config Engine (LBE)
4. WS Config Engine (WSE)
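As a rough sketch of how these four components might fit together (the interfaces, method names, and the 30-second "long interference" threshold below are assumptions for illustration, not ICE's actual API):

```python
import time

def control_loop(me, dt, lbe, wse, interval_s=1.0, long_interference_s=30.0):
    """Poll metrics, detect interference, and apply ICE's two-level reconfiguration."""
    started = None                        # start of the current interference episode
    while True:
        sample = me.collect()             # Monitoring Engine: CPI, CMR, CPU util, RPS
        if dt.is_interference(sample):    # Interference Detector: decision-tree classifier
            started = started or time.time()
            lbe.update_weights(sample)    # level 1: cheap, agile LB weight update
            if time.time() - started >= long_interference_s:
                wse.reconfigure(sample)   # level 2: Apache/PHP parameters, only for
                                          # long-lasting interference
        else:
            started = None
            lbe.restore_defaults()        # return to the no-interference configuration
            wse.restore_defaults()
        time.sleep(interval_s)
```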
 
Interference Detection
 
We use hardware counters for interference detection
Faster detection
Hypervisor access not required if counters are virtualized
Use CPI and CMR from training runs to build a Decision Tree
Decision Tree is easy to interpret
Low detection (classification) overhead
 
[Figure: sample run with CloudSuite]
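A minimal sketch of the detection step, assuming counter samples are already available (e.g., from perf or virtualized PMU counters) and a decision tree trained offline with label 1 meaning interference; the function and field names are illustrative:

```python
from sklearn.tree import DecisionTreeClassifier

def derive_features(cycles, instructions, cache_refs, cache_misses):
    """Compute the two detection features used by ICE."""
    cpi = cycles / max(instructions, 1)       # Cycles Per Instruction
    cmr = cache_misses / max(cache_refs, 1)   # Cache Miss Rate
    return [cpi, cmr]

def detect_interference(clf: DecisionTreeClassifier, counters: dict) -> bool:
    """Classify one monitoring sample; label 1 is assumed to mean 'interference'."""
    features = derive_features(counters["cycles"], counters["instructions"],
                               counters["cache_references"], counters["cache_misses"])
    return clf.predict([features])[0] == 1
```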
ICE: Load Balancer Reconfiguration
 
Objective: keep each WS VM’s CPU utilization below a threshold U_thres
If the predicted CPU utilization is above the threshold, find a new request rate that brings it back below the threshold
The request rate (RPS) is determined by the server weight value in the load balancer configuration
Use the following empirical function for load estimation: predicted utilization is modeled from past utilization, RPS, and CPI, with the CPI term acting as the indicator of interference
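The function itself appears on the slide only as a figure. Based on the training slide's linear model over (OLDCPU, RPS, CPI), a plausible form, with coefficients taken from that regression, is:

```latex
\hat{U} = \beta_0 + \beta_1\,U_{old} + \beta_2\,RPS + \beta_3\,CPI
% If \hat{U} > U_{thres}, solve the same model for the reduced request rate:
RPS_{new} = \frac{U_{thres} - \beta_0 - \beta_1\,U_{old} - \beta_3\,CPI}{\beta_2}
```

The server weight in the LB configuration is then scaled roughly in proportion to RPS_new / RPS (see the backup slide on reconfiguration actions).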
 
 
 
 
 
ICE: Training Decision Tree and Estimator
 
Run CloudSuite with various interference intensities, for two different interference generators: DCopy and LLCProbe
Monitor: CPI, CMR, CPU, RPS, RT
Collected data is labeled based on when interference was running
Labeled data is used to build the Decision Tree
Observations during interference are used to build the estimator
Multivariate regression using R
A linear model on (OLDCPU, RPS, CPI) was chosen since higher-degree models add little benefit
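A rough sketch of this offline step; the authors used R for the regression, so the scikit-learn calls, file name, and column names below are assumptions for illustration only:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Hypothetical log of the training runs: CPI, CMR, OLDCPU, CPU, RPS, RT, interference label
df = pd.read_csv("training_runs.csv")

# Decision tree: classify interference vs. no interference from CPI and CMR
clf = DecisionTreeClassifier(max_depth=4)
clf.fit(df[["CPI", "CMR"]], df["interference"])

# Estimator: linear model predicting CPU utilization from (OLDCPU, RPS, CPI),
# fitted on the observations taken during interference
during = df[df["interference"] == 1]
est = LinearRegression().fit(during[["OLDCPU", "RPS", "CPI"]], during["CPU"])
print(est.intercept_, est.coef_)   # the beta coefficients used for load estimation
```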
 
ICE: Web Server Reconfiguration
 
WS reconfiguration is applied only if interference is long-lasting
Similar to the heuristic presented in IC2 [Middleware’14]
During periods of interference the optimal Apache/PHP parameters change:
MaxClients (MXC) reduces
KeepaliveTimeout (KAT) increases
pm.max_children (PHP) increases
Under interference, the following actions are needed to improve response time: decrease MXC, increase KAT, increase PHP
The value of each update is determined using empirical functions (illustrated in the sketch below)
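The empirical update functions themselves are not shown on the slide; the sketch below captures only the direction of each change, using the MXC/PHP values from the next slide (8 to 4, and 2 to 4) as an example and a placeholder step for KAT:

```python
def ws_reconfigure(params: dict, under_interference: bool) -> dict:
    """Adjust Apache/PHP parameters in the direction the slide prescribes."""
    new = dict(params)
    if under_interference:
        new["MaxClients"] = 4                                       # MXC decreases (e.g., 8 -> 4)
        new["KeepaliveTimeout"] = params["KeepaliveTimeout"] + 5    # KAT increases (placeholder step)
        new["pm.max_children"] = 4                                  # PHP worker pool increases (e.g., 2 -> 4)
    return new

# Example: ws_reconfigure({"MaxClients": 8, "KeepaliveTimeout": 5, "pm.max_children": 2}, True)
```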
Reconfiguring Apache and PHP
No interference: minimal latency with MXC=8, PHP=2
During interference: minimal latency with MXC=4, PHP=4
 
Evaluation
 
Experimental Setup
CloudSuite (Olio) benchmark with different interferences
Middleware: HAProxy + Apache + PHP
Interferences: LLCProbe, DCopy
We look at ICE with two load balancer scheduling policies:
Weighted Round Robin (WRR, or simply RR): ICE with WRR shows the comparison against a static configuration
Weighted Least Connection (WLC, or simply LC): ICE with WLC shows the comparison against an out-of-box dynamic load balancer
 
Result 1: Comparing ICE with Existing Solutions
 
Baseline is the static configuration, IC2 is the Middleware’14 solution (WS reconfiguration only), and ICE is the two-level reconfiguration (current paper)
ICE improves response time under both RR and LC
LC (out-of-box) reduces the effect of interference significantly, but occasional spikes remain
ICE reduces the frequency and magnitude of these spikes
 
[Figures: response-time timelines under the Round Robin (RR) and Least Connection (LC) load balancers, with annotated spike levels of 400ms and 200ms]
 
Result 2: Improvement in Response Time and Detection Latency
 
ICE improves median response time by up to 94% compared to a static configuration (RR)
ICE improves median response time by up to 39% compared to a dynamic load balancer (LC)
Median interference detection latency, i.e., the time from the onset of interference to the first-level reconfiguration at the LB, is low: 3 sec using ICE versus 15-20 sec for IC2
 
[Figures: results under the Round Robin (RR) and Least Connection (LC) load balancers]
 
Applying ICE to other Web Services
 
Can the basic principles of ICE (two-level
reconfiguration) be applied to other web services?
Yes, at least to media streaming services
 Consider Darwin media streaming server
Long lasting sessions
Longer responses (video streams) compared to web requests
Mostly static content vs. dynamic website (Olio)
Questions:
Can we find a Darwin configuration that can mitigate
interference?
Does changing load-balancer weights improve latency?
 
Experimental Setup
 
Application: Darwin streaming server, with the CloudSuite Media Streaming benchmark
Middleware: LVS load balancer, Darwin Streaming Server
Interference: LLCProbe
Server performance metric: frame delay, the time between when a video frame was expected to be sent by the server and when it was actually sent (see the sketch below)
Frame delay should be <= 0 for correct operating points
Parameters: run_num_threads in Darwin and server_weight in LVS
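A minimal illustration of the frame-delay metric and the <= 0 acceptance condition (the field names are assumed):

```python
def frame_delay(expected_send_ts: float, actual_send_ts: float) -> float:
    """Positive means the frame was sent late; <= 0 means it met its deadline."""
    return actual_send_ts - expected_send_ts

def operating_point_ok(frames) -> bool:
    """An operating point is correct if no frame in the run is sent late."""
    return all(frame_delay(f["expected"], f["actual"]) <= 0 for f in frames)
```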
 
Optimal num_threads with Interference
 
10,000 concurrent clients to a 2-node Darwin cluster
The optimal num_threads changes from 10 to 150 during interference
This is comparable to the pm.max_children (PhpMaxChildren) config in the PHP-FPM server
Frame delay improves significantly with a different num_threads during interference
 
[Figures: frame delay vs. num_threads, with no interference and with LLCProbe]
 
Improvement in Latency with Different LB Weights
 
Consider two num_threads settings
Optimal value for no interference (10)
Optimal value for interference (150)
Vary load balancer weight for both these settings to see
impact on latency
Reducing LB weights does improve latency in both cases
By reducing load sufficiently (e.g., 70% here), latency is as good as with no interference
Similar to our previous observation with Olio
 
[Figure: latency vs. LB weight, with LLCProbe]
 
Concluding Insights
 
Effect of interference can be mitigated by reducing load on the
affected VM, through a load balancer
We presented ICE for two-level configuration in WS clusters
First level: Reconfigure load balancer
Second level: Reconfigure the web service (only for longer lasting
interference)
ICE improves median Response Time of a representative web
service by 94% compared to static configuration and 39%
compared to a dynamic out-of-box load balancer
Median interference detection latency is low – 3 seconds
The basic principle of ICE is also applicable to streaming servers
Future work:
Handling other types of interferences: network, storage, etc.
Finding “useful” configuration parameters automatically
 
Questions
 
Thank You!
 
 
 
 
Backup Slides
 
 
Running Web Applications in the Cloud
 
WS Configuration Controller (IC2)
 
Choice of parameter driven by knowledge base
Created from empirical results shown earlier
Can be created by expert administrators
Our heuristic
Decrease MXC and increase KAT and PHP as shown below
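The concrete table is part of the slide figure and is not reproduced here; purely as a hypothetical illustration, such a knowledge base could be encoded as a simple lookup of parameter actions (the values below are not from the paper):

```python
# Hypothetical knowledge base: detected state -> direction of each parameter change.
# The actual magnitudes in IC2 come from the empirical results shown earlier.
KNOWLEDGE_BASE = {
    "no_interference": {"MaxClients": "default",  "KeepaliveTimeout": "default",  "pm.max_children": "default"},
    "interference":    {"MaxClients": "decrease", "KeepaliveTimeout": "increase", "pm.max_children": "increase"},
}
```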
 
ICE: Reconfiguration Actions
 
If CPU utilization crosses a threshold, estimate a reduced request rate (r) that would reduce the load
Update the load balancer weight proportionately
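A small sketch of the proportional update; the clamping range and the HAProxy runtime command shown in the comment are illustrative assumptions, not part of the slide:

```python
def new_lb_weight(curr_weight: int, curr_rps: float, target_rps: float,
                  min_weight: int = 1, max_weight: int = 256) -> int:
    """Scale the server's LB weight in proportion to the reduced request rate r."""
    scaled = int(round(curr_weight * target_rps / max(curr_rps, 1e-6)))
    return max(min_weight, min(max_weight, scaled))

# Applying the new weight could then look like (illustrative):
#   echo "set weight be_web/ws_vm1 12" | socat stdio /var/run/haproxy.sock
```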