Enhancing Internet Backbone Performance through Parallel Resolution of Packets and Rules

 
P
O
P
E
 
a
n
d
 
P
a
N
e
L
 
H B Acharya
 
M
o
t
i
v
a
t
i
o
n
:
 
T
h
e
 
B
o
t
t
l
e
n
e
c
k
 
S. Keshav : the rate limiting step in Internet backbones is deciding
which action to perform on an arriving packet
o
Policies in routers and middleboxes consist of rules; the
challenge is to quickly decide which rule applies
 
To increase throughput, we match different packets in parallel
o
But this still requires the latency of checking all the rules
 
P
a
r
a
l
l
e
l
 
=
 
F
a
s
t
?
 
Concurrency simply means there is not a serializing causal
dependency – things can happen in any order
Parallelism means things actually happen at the same time
Policies are sequential: there is an order of precedence for the
rules in a policy.
Can they be evaluated in parallel (without an O(n) wait?)
 
C
o
n
t
e
n
t
s
 
Packets and Rules
Matching and Resolution
XMT
Parallel Resolution
o
starring Prefix-Sum
POPE and PaNeL
 
P
a
c
k
e
t
s
 
a
n
d
 
R
u
l
e
s
 
Decisions on packets are made based on the header values
o
Source and Destination address and port, protocol, etc.
o
These values are represented by integers
We represent a packet by a tuple of 
d 
integers
o
Each representing a field of interest, e.g. source IP
o
An example packet might be <7, 918, 4, 5, 14>
A rule consists of two components
o
A 
guard: d
 integer intervals (one for each field)
o
A 
decision
 specifying an action to take, eg. accept, discard
 
T
h
e
 
M
a
t
c
h
i
n
g
 
o
f
 
P
a
c
k
e
t
s
 
A rule matches a packet if
o
for all 
d
 fields,
o
the  value specified by the packet lies in the corresponding
interval specified in the rule.
 
For example, the rule
<[10, 100], [0, 255]> -> 
discard
matches the packet <51, 0>
and not <101, 0>
 
P
o
l
i
c
i
e
s
 
a
n
d
 
R
u
l
e
 
P
r
e
c
e
d
e
n
c
e
 
Practical policies can be large and complex
o
A routing table of 10^5 rules is not unusual
These rules can overlap
o
When multiple rules match the same packet, which applies?
o
Rule precedence!
 
T
h
e
 
o
r
d
e
r
 
o
f
 
r
u
l
e
s
 
In practice, the order of rules is quite complex
The rule with the “best” match (the smallest specified interval)
among the matching rules wins
To break ties, rules are ordered as follows
o
Static routes
o
Dynamic routes, in order
the usual order is EIGRP, OSPF, ISIS, RIP
o
Default routes
 
I
n
t
r
o
d
u
c
i
n
g
 
X
M
T
 
Serial computation: von Neumann machine
o
all memory accesses (from the one processor) occur at once
o
an instruction, available for execution, executes immediately
Parallel computation: PRAM
o
all memory accesses, from any processor, occur at once
XMT is a PRAM-like machine
o
many instructions, available for execution, execute
immediately, in parallel
 
W
h
y
 
X
M
T
?
 
O(1) Random Memory Access from any processor
Many Core Architecture (1024 cores) leads to better speedup.
Simple and efficient architecture: each Thread Control Unit (core)
runs a thread independent of the others
The XMT-C programming language is simple
 
M
a
t
c
h
i
n
g
 
R
u
l
e
s
 
i
n
 
P
a
r
a
l
l
e
l
 
Spawn a thread for each rule in the policy (in parallel)
The thread corresponding to a rule checks whether that rule
matches the packet or not
In O(1) time [or more accurately O(d) if we check d fields] we find,
for each rule, whether it matches the incoming packet
 
T
h
e
 
P
r
e
c
e
d
e
n
c
e
 
P
r
o
b
l
e
m
!
 
For a rule to match a packet is NOT enough!
When multiple rules match a packet (and have different actions) the
conflict is resolved using match semantics
o
W
e use first-match semantics: the 
first
 rule to match a packet is
the one applied.
o
This rule is said to 
resolve
 the packet
Serial checking: stop after the first rule matches the packet
Parallel checking: how to tell first match?
 
R
e
s
t
a
t
i
n
g
 
t
h
e
 
F
i
r
s
t
 
M
a
t
c
h
 
p
r
o
b
l
e
m
 
Consider an array A[n]
Each value of the array corresponds to a rule in the policy
o
A[i] = 1 iff rule i matches the packet
o
A[i] = 0 otherwise
Finding the first rule to match the packet reduces to finding the
smallest i s.t. A[i] = 1
 
F
i
r
s
t
 
M
a
t
c
h
 
w
i
t
h
 
P
r
e
f
i
x
 
S
u
m
s
 
There exists a standard algorithm to compute prefix match of an
array of n elements in O(log n)
o
Prefix sum A
+
 of array A is an array of length n, such that
A
+
[i] = A[1]+A[2] + … A[i]
For the first i such that A[i] = 1, A
+
[i] = 1 also
This is true only for this element.
o
Therefore, we can resolve packets in O(1) + O(log n) + O(1)
time rather than O(n) time, thanks to the O(log n) prefix-sum
algorithm.
 
F
i
r
s
t
 
A
t
t
e
m
p
t
 
:
 
P
O
P
E
 
For every rule in parallel, try to match the packet and store the
result of the attempt in corresponding space in A.
Each value of the array corresponds to a rule in the policy
o
A[i] = 1 iff rule i matches the packet
o
A[i] = 0 otherwise
 
Compute A+, the prefix sum of A
Perform the action of the rule no. i where A[i] == A
+
[i] == 1
 
S
c
a
l
i
n
g
 
P
r
o
b
l
e
m
s
 
POPE works very well for small policies
However, the performance drops off for larger policies (say of the
order of 10
4
 rules and above)
 
The reason is simple: POPE always checks ALL the rules
o
Whereas serial resolution stops after the first match … which in
a large policy, is likely to happen “early,” i.e. in rules high up in
the policy
 
S
e
c
o
n
d
 
A
t
t
e
m
p
t
:
 
P
a
N
e
L
 
Divide large policies up into “batches” of rules
o
We find experimentally that batches of 1000 rules work best
o
Probably because XMT had 1024 cores?
 
Perform the POPE algorithm batch by batch, until the first match is
found. (One batch at a time)
 
This reduces the average running time, though not the worst case
 
R
e
s
u
l
t
s
 
T
o
 
D
o
 
Compare the performance of parallelizing rule matching of packets
versus the solution of matching multiple packets in parallel.
o
Perhaps combine both approaches
The performance seems to vary with the “width index” – the bit
width of the fields tested. Further study is indicated
 
Q
 
&
 
A
 
P
r
e
f
i
x
 
S
u
m
 
1  2  3  4
3  7
10
1  2  3  4
3  10
10
 
P
r
e
f
i
x
 
S
u
m
 
1  3  3  10
3  10
10
1  3  6  10
3  10
10
Slide Note
Embed
Share

The bottleneck in Internet backbones lies in the decision-making process for incoming packets. This article explores the challenges faced in efficiently processing policies in routers and middleboxes by introducing parallel resolution techniques to increase throughput and reduce latency. It discusses the distinction between concurrency and parallelism, the necessity of evaluating policies in parallel, and the complexities of rule precedence in large and overlapping rule sets.

  • Internet backbone
  • Parallel resolution
  • Packet processing
  • Rule precedence
  • Policy evaluation

Uploaded on Sep 11, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. POPE and PaNeL H B Acharya

  2. Motivation: The Bottleneck S. Keshav : the rate limiting step in Internet backbones is deciding which action to perform on an arriving packet o Policies in routers and middleboxes consist of rules; the challenge is to quickly decide which rule applies To increase throughput, we match different packets in parallel o But this still requires the latency of checking all the rules

  3. Parallel = Fast? Concurrency simply means there is not a serializing causal dependency things can happen in any order Parallelism means things actually happen at the same time Policies are sequential: there is an order of precedence for the rules in a policy. Can they be evaluated in parallel (without an O(n) wait?)

  4. Contents Packets and Rules Matching and Resolution XMT Parallel Resolution o starring Prefix-Sum POPE and PaNeL

  5. Packets and Rules Decisions on packets are made based on the header values o Source and Destination address and port, protocol, etc. o These values are represented by integers We represent a packet by a tuple of d integers o Each representing a field of interest, e.g. source IP o An example packet might be <7, 918, 4, 5, 14> A rule consists of two components o A guard: d integer intervals (one for each field) o A decision specifying an action to take, eg. accept, discard

  6. The Matching of Packets A rule matches a packet if o for all d fields, o the value specified by the packet lies in the corresponding interval specified in the rule. For example, the rule <[10, 100], [0, 255]> -> discard matches the packet <51, 0> and not <101, 0>

  7. Policies and Rule Precedence Practical policies can be large and complex o A routing table of 10^5 rules is not unusual These rules can overlap o When multiple rules match the same packet, which applies? o Rule precedence!

  8. The order of rules In practice, the order of rules is quite complex The rule with the best match (the smallest specified interval) among the matching rules wins To break ties, rules are ordered as follows o Static routes o Dynamic routes, in order the usual order is EIGRP, OSPF, ISIS, RIP o Default routes

  9. Introducing XMT Serial computation: von Neumann machine o all memory accesses (from the one processor) occur at once o an instruction, available for execution, executes immediately Parallel computation: PRAM o all memory accesses, from any processor, occur at once XMT is a PRAM-like machine o many instructions, available for execution, execute immediately, in parallel

  10. Why XMT? O(1) Random Memory Access from any processor Many Core Architecture (1024 cores) leads to better speedup. Simple and efficient architecture: each Thread Control Unit (core) runs a thread independent of the others The XMT-C programming language is simple

  11. Matching Rules in Parallel Spawn a thread for each rule in the policy (in parallel) The thread corresponding to a rule checks whether that rule matches the packet or not In O(1) time [or more accurately O(d) if we check d fields] we find, for each rule, whether it matches the incoming packet

  12. The Precedence Problem! For a rule to match a packet is NOT enough! When multiple rules match a packet (and have different actions) the conflict is resolved using match semantics o We use first-match semantics: the first rule to match a packet is the one applied. o This rule is said to resolve the packet Serial checking: stop after the first rule matches the packet Parallel checking: how to tell first match?

  13. Restating the First Match problem Consider an array A[n] Each value of the array corresponds to a rule in the policy o A[i] = 1 iff rule i matches the packet o A[i] = 0 otherwise Finding the first rule to match the packet reduces to finding the smallest i s.t. A[i] = 1

  14. First Match with Prefix Sums There exists a standard algorithm to compute prefix match of an array of n elements in O(log n) o Prefix sum A+of array A is an array of length n, such that A+[i] = A[1]+A[2] + A[i] For the first i such that A[i] = 1, A+[i] = 1 also This is true only for this element. o Therefore, we can resolve packets in O(1) + O(log n) + O(1) time rather than O(n) time, thanks to the O(log n) prefix-sum algorithm.

  15. First Attempt : POPE For every rule in parallel, try to match the packet and store the result of the attempt in corresponding space in A. Each value of the array corresponds to a rule in the policy o A[i] = 1 iff rule i matches the packet o A[i] = 0 otherwise Compute A+, the prefix sum of A Perform the action of the rule no. i where A[i] == A+[i] == 1

  16. Scaling Problems POPE works very well for small policies However, the performance drops off for larger policies (say of the order of 104rules and above) The reason is simple: POPE always checks ALL the rules o Whereas serial resolution stops after the first match which in a large policy, is likely to happen early, i.e. in rules high up in the policy

  17. Second Attempt: PaNeL Divide large policies up into batches of rules o We find experimentally that batches of 1000 rules work best o Probably because XMT had 1024 cores? Perform the POPE algorithm batch by batch, until the first match is found. (One batch at a time) This reduces the average running time, though not the worst case

  18. Results

  19. To Do Compare the performance of parallelizing rule matching of packets versus the solution of matching multiple packets in parallel. o Perhaps combine both approaches The performance seems to vary with the width index the bit width of the fields tested. Further study is indicated

  20. Q & A

  21. Prefix Sum 1 2 3 4 3 7 10 1 2 3 4 3 10 10

  22. Prefix Sum 1 3 3 10 3 10 10 1 3 6 10 3 10 10

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#