
Queues don’t matter when you can JUMP them!
Matthew P. Grosvenor, Malte Schwarzkopf, Ionel Gog,
Robert N. M. Watson, Andrew W. Moore,
Steven Hand, Jon Crowcroft
University of Cambridge Computer Laboratory 
Presented by
Vishal Shrivastav, Cornell University
Introduction
 
Datacenters comprise a varying mixture of workloads:
some require very low latencies, some sustained high throughput, and others some combination of both
 
Statistical multiplexing leads to in-network interference
can lead to large latency variance and long latency tails
leads to poor user experience and impacts revenue
 
How can we achieve strong (bounded?) latency guarantees in present-day datacenters?
What causes latency variance?
 
Queue build-up
packets from throughput-intensive flows block a latency-sensitive packet
Need a way to separate throughput-intensive flows from latency-sensitive flows
 
Incast
packets from many different latency-sensitive flows hit the queue
at the same time
Need a way to proactively rate-limit latency-sensitive flows
Setup
1 server running ptpd v2.1.0, synchronizing with a timeserver
1 server generating a mixed GET/SET workload of 1 KB requests in TCP mode, sent to a memcached server
4 servers running a 4-way barrier-synchronization benchmark using Naiad v0.2.3
8 servers running Hadoop, performing a natural join between two 512 MB data sets (39M rows each)
How bad is it really?
[Figures: CDFs of application latencies with and without in-network interference]
In-network interference can lead to a significant increase in latencies and eventual performance degradation for latency-sensitive applications
Towards achieving bounded latency
 
Servicing delay
Time from when a packet is assigned to an output port until it is finally ready to be transmitted over the outgoing link

[Figure: packets fanning in to a 4-port, virtual output queued switch; output queues shown for port 3 only]

Servicing delay is a function of the queue length
Maximum servicing delay

Assumptions
Entire network abstracted as a single big switch
Initially idle network
Each host connected to the network via a single link
Link rates do not decrease from the edges to the network core

Maximum servicing delay = n × P/R + ε
n = number of hosts
P = maximum packet size
R = bandwidth of the slowest link
ε = switch processing delay
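To make the bound concrete, here is a small back-of-the-envelope sketch in Python (the parameter values are illustrative assumptions, not taken from the slides):

```python
# Worst-case servicing delay: n * P/R + eps.
# All parameter values below are illustrative assumptions.
n = 1000          # number of hosts
P = 9000 * 8      # maximum packet size in bits (9 KB)
R = 10e9          # bandwidth of the slowest link, bits/s (10 Gbps)
eps = 1e-6        # assumed switch processing delay, seconds

max_delay = n * P / R + eps
print(f"worst-case servicing delay: {max_delay * 1e6:.1f} us")  # 7201.0 us
```

The point is that the delay is bounded and computable a priori from n, P, R, and ε alone, with no dependence on traffic patterns.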
Rate-limiting to achieve bounded latency
 
Network epoch
maximum time that an idle network will take to service one packet from every sending host

Network epoch = 2n × P/R + ε

All hosts are rate-limited so that they can issue at most one packet per epoch
bounded queuing => bounded latency
[Figure: packets paced into successive network epochs (epoch 1, epoch 2)]
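A minimal host-side pacing sketch under these definitions (the real QJump implementation paces packets in the kernel; this user-space loop with sleep() only illustrates the idea of one packet per epoch):

```python
import time

def network_epoch(n: int, P: int, R: float, eps: float) -> float:
    """Network epoch in seconds: 2n * P/R + eps (P in bits, R in bits/s)."""
    return 2 * n * P / R + eps

def paced_send(packets, send_fn, epoch: float) -> None:
    """Issue at most one packet per network epoch."""
    next_slot = time.monotonic()
    for pkt in packets:
        delay = next_slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)   # wait for the next epoch boundary
        send_fn(pkt)            # at most one packet issued per epoch
        next_slot += epoch
```

Because every host obeys the same one-packet-per-epoch budget, no queue can grow beyond n packets, which is what turns bounded queuing into bounded latency.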
What about throughput?

Configure the value of n to create different QJump levels
n = number of hosts -- highest QJump level
bounded latency; very low throughput
n = 1 -- lowest QJump level
latency variance; line-rate throughput
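The tradeoff falls directly out of the epoch formula. A hedged sketch (assuming 9 KB packets and 10 Gbps links, ignoring ε): rate-limiting each host to one packet per epoch means a larger assumed n stretches the epoch and shrinks per-host throughput.

```python
P = 9000 * 8   # maximum packet size in bits (9 KB, assumed)
R = 10e9       # slowest link in bits/s (10 Gbps, assumed)

for n in (1, 10, 100, 1000):   # assumed number of contending hosts
    epoch = 2 * n * P / R      # network epoch, ignoring eps
    rate = P / epoch           # one packet per epoch per host
    print(f"n={n:5d}: epoch={epoch * 1e6:9.1f} us, "
          f"per-host rate={rate / 1e6:8.1f} Mbps")
# n=1    -> 14.4 us epoch, 5000.0 Mbps per host (high throughput, weak bound)
# n=1000 -> 14.4 ms epoch,    5.0 Mbps per host (strong bound, low throughput)
```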
QJump within switches
 
Datacenter switches support 8 hardware-enforced priority levels

Map each “logical” QJump level to a “physical” priority level on the switches
Highest QJump level mapped to the highest switch priority level, and so on
 
Packets from higher QJump levels can now “jump” the
queue in the switches
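The slides don't show how packets get tagged; as one hedged illustration, on Linux an application (or a shim below it) can set SO_PRIORITY on a socket, and the NIC/switch configuration can map that priority onto one of the eight 802.1p hardware classes. The level-to-priority table below is hypothetical, not QJump's actual mapping.

```python
import socket

# Hypothetical mapping from QJump level to a Linux socket priority;
# the switch must be configured to honor the corresponding 802.1p class.
QJUMP_LEVEL_TO_PRIORITY = {0: 0, 1: 2, 2: 4, 3: 7}

def open_qjump_socket(level: int) -> socket.socket:
    """TCP socket whose packets carry the priority for the given level."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_PRIORITY (Linux-only) sets skb->priority, which the qdisc /
    # VLAN layer can translate into a hardware priority (PCP) value.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY,
                 QJUMP_LEVEL_TO_PRIORITY[level])
    return s
```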
Evaluation
[Figures: CDFs of Naiad barrier-sync latency and memcached request latency]
QJump resolves in-network interference
and attains near-ideal performance for real applications
Simulation: Workload
In the web search workload, 95% of all bytes come from the 30% of flows that are 1-20 MB
In the data mining workload, 80% of flows are less than 10 KB, and 95% of all bytes come from the 4% of flows that are >35 MB
Simulation: Setup
QJump parameters
Maximum bytes that can be transmitted in an epoch (P) = 9 KB
Bandwidth of the slowest link (R) = 10 Gbps
QJump levels = {1, 1.44, 7.2, 14.4, 28.8, 48, 72, 144}
varying the value of n from the lowest to the highest level
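As a sanity check on these parameters (taking the slide's reading that each level corresponds to an assumed n, and ignoring ε): P/R works out to 7.2 µs per packet, so the per-level epochs follow directly.

```python
P = 9 * 1000 * 8   # 9 KB per epoch, in bits
R = 10e9           # 10 Gbps

print(f"P/R = {P / R * 1e6:.2f} us")     # 7.20 us per packet
for n in (1, 14.4, 144):                 # a few of the listed levels
    print(f"n = {n:>5}: epoch = {2 * n * P / R * 1e6:7.1f} us")
# n = 144 (every host contends) -> ~2073.6 us: a guaranteed-level host
# may issue one 9 KB packet roughly every 2 ms.
```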
Simulation: Results
For short flows, on both workloads, QJump achieves average
and 99th percentile FCTs close to or better than pFabric
For long flows, on web search workload, QJump beats pFabric by
up to 20% at high load, but loses by 15% at low load
For long flows, on data mining workload, QJump average FCTs
are between 30% and 63% worse than pFabric’s
Conclusion
 
QJump applies QoS-inspired concepts to datacenter
applications to mitigate network interference
 
Offers multiple service levels with different latency variance
vs. throughput tradeoffs
 
Attains near-ideal performance for real applications in the
testbed and good flow completion times in simulations
 
QJump is immediately deployable and requires no
modifications to the hardware
 
 
Final thoughts
 
The Good 
can provide bounded latencies for applications that require it
does a good job of resolving interference via priorities
immediately deployable
 
The Bad 
QJump levels are determined by applications (instead of by automatic classification)
 
and The Ugly 
no principled way to determine rate-limit values for the different QJump levels
Discussion
 
1. Are we fundamentally limited by statistical multiplexing when it comes to achieving strong guarantees (latency, throughput, queuing) about the network?
 
2. Is it reasonable to trade off throughput for strong latency guarantees?
[Diagram: rack-scale computing and resource disaggregation, e.g. the Boston Viridis (server = Calxeda SoC, 900 CPUs, integrated network)]
Thank you!
Where in the network does interference happen?
 
One instance of ping and two instances of iperf sharing the same network
The paper focuses only on interference at shared switch queues
 
[Figure: median and 99th percentile ping latencies]