Understanding Programmable Traffic Management for Network Optimization

 
P
4
 
P
r
o
g
r
a
m
m
a
b
l
e
T
r
a
f
f
i
c
 
M
a
n
a
g
e
m
e
n
t
 
S
t
e
p
h
e
n
 
I
b
a
n
e
z
 
&
 
G
o
r
d
o
n
 
B
r
e
b
n
e
r
9
/
1
2
/
2
0
1
8
 
G
o
a
l
s
 
˃
M
o
ti
v
a
ti
n
g
 
D
i
s
c
u
s
s
i
o
n
s
˃
T
r
a
ffi
c
 
M
a
n
a
g
e
r
 
A
r
c
h
i
t
e
c
t
u
r
a
l
 
M
o
d
e
l
 
O
v
e
r
v
i
e
w
˃
P
u
t
a
ti
v
e
 
P
4
 
E
x
t
e
n
s
i
o
n
s
˃
M
o
v
i
n
g
 
F
o
r
w
a
r
d
 
D
i
s
c
u
s
s
i
o
n
s
 
>
>
 
2
W
h
a
t
 
i
s
 
T
r
a
f
f
i
c
 
M
a
n
a
g
e
m
e
n
t
?
 
˃
P
a
c
k
e
t
 
S
c
h
e
d
u
l
i
n
g
 
 
i
n
 
w
h
a
t
 
o
r
d
e
r
?
˃
T
r
a
ffi
c
 
S
h
a
p
i
n
g
 
 
a
t
 
w
h
a
t
 
r
a
t
e
/
p
a
c
i
n
g
?
˃
P
o
l
i
c
i
n
g
 
 
i
s
 
p
a
c
k
e
t
 
c
o
m
p
l
i
a
n
t
?
˃
D
r
o
p
 
P
o
l
i
c
y
 
 
h
o
w
 
t
o
 
a
v
o
i
d
 
/
 
d
e
a
l
 
w
i
t
h
 
c
o
n
g
e
s
ti
o
n
?
˃
P
a
c
k
e
t
 
B
u
ff
e
r
i
n
g
 
 
h
o
w
 
t
o
 
s
t
o
r
e
 
p
a
c
k
e
t
s
?
˃
P
a
c
k
e
t
 
R
e
p
l
i
c
a
ti
o
n
 
 
h
o
w
 
t
o
 
r
e
p
l
i
c
a
t
e
 
p
a
c
k
e
t
s
?
 
˃
C
l
a
s
s
i
fi
c
a
ti
o
n
 
 
h
o
w
 
t
o
 
m
a
p
 
p
a
c
k
e
t
s
 
i
n
t
o
 
q
u
e
u
e
s
?
>
>
 
3
W
h
e
r
e
 
a
r
e
 
t
r
a
f
f
i
c
 
m
a
n
a
g
e
r
s
 
u
s
e
d
?
 
˃
I
n
t
e
g
r
a
t
e
d
 
S
w
i
t
c
h
 
(
P
I
S
A
)
 
 
 
˃
N
I
C
 
o
r
 
L
i
n
e
 
C
a
r
d
>
>
 
4
W
h
y
 
s
h
o
u
l
d
 
w
e
 
c
a
r
e
 
a
b
o
u
t
 
T
r
a
f
f
i
c
 
M
a
n
a
g
e
m
e
n
t
?
 
˃
L
o
t
s
 
o
f
 
d
i
ff
e
r
e
n
t
 
t
y
p
e
s
 
o
f
 
t
r
a
ffi
c
 
w
/
 
d
i
ff
e
r
e
n
t
 
c
h
a
r
a
c
t
e
r
i
s
ti
c
s
 
a
n
d
 
r
e
q
u
i
r
e
m
e
n
t
s
:
Characteristics: burstiness, packet sizes, flow sizes, flow rates
Requirements: throughput, latency, loss, jitter, reordering, flow completion time, tail latency
˃
N
e
t
w
o
r
k
 
o
p
e
r
a
t
o
r
s
 
h
a
v
e
 
a
 
w
i
d
e
 
r
a
n
g
e
 
o
f
 
o
b
j
e
c
ti
v
e
s
:
Meet all SLAs
Maximize network utilization
Achieve fairness
˃
N
e
t
w
o
r
k
 
d
e
v
i
c
e
s
 
a
r
e
 
p
i
c
k
i
n
g
 
u
p
 
m
o
r
e
 
f
u
n
c
ti
o
n
a
l
i
t
y
 
 
 
˃
A
b
o
u
t
 
5
0
%
 
o
f
 
a
 
m
o
d
e
r
n
 
p
r
o
g
r
a
m
m
a
b
l
e
 
s
w
i
t
c
h
 
c
h
i
p
 
i
s
 
d
e
d
i
c
a
t
e
d
 
t
o
 
b
u
ff
e
r
i
n
g
 
a
n
d
t
r
a
f
f
i
c
 
m
a
n
a
g
e
m
e
n
t
 
l
o
g
i
c
 
 
b
u
t
 
i
s
 
n
o
t
 
p
r
o
g
r
a
m
m
a
b
l
e
!
>
>
 
5
 
W
h
y
 
s
h
o
u
l
d
 
w
e
 
c
a
r
e
 
a
b
o
u
t
 
T
r
a
f
f
i
c
 
M
a
n
a
g
e
m
e
n
t
?
 
˃
W
A
N
 
l
i
n
k
s
 
a
r
e
 
e
x
p
e
n
s
i
v
e
 
 
w
a
n
t
 
t
o
 
m
a
k
e
 
b
e
s
t
 
u
s
e
 
o
f
 
t
h
e
m
 
b
y
 
p
r
i
o
r
i
ti
z
i
n
g
 
t
r
a
ffi
c
˃
M
o
d
e
r
n
 
c
o
n
g
e
s
ti
o
n
 
c
o
n
t
r
o
l
 
a
l
g
o
r
i
t
h
m
s
 
r
e
l
y
 
o
n
 
a
c
c
u
r
a
t
e
 
p
a
c
k
e
t
 
p
a
c
i
n
g
 
(
e
.
g
.
 
B
B
R
[
1
]
,
T
i
m
e
l
y
[
2
]
)
˃
P
e
r
f
o
r
m
a
n
c
e
 
i
s
o
l
a
ti
o
n
 
f
o
r
 
t
h
o
u
s
a
n
d
s
 
o
f
 
V
M
s
 
(
m
i
l
l
i
o
n
s
 
o
f
 
fl
o
w
s
)
 
p
e
r
 
s
e
r
v
e
r
 
>
>
 
6
 
[1] Cardwell, Neal, et al. "BBR: Congestion-based congestion control." 
Queue
 14.5 (2016): 50.
[2] Mittal, Radhika, et al. "TIMELY: RTT-based Congestion Control for the Datacenter." 
ACM SIGCOMM Computer Communication Review
. Vol. 45. No. 4. ACM, 2015.
[3] Saeed, Ahmed, et al. "Carousel: Scalable traffic shaping at end hosts." Proceedings of the Conference of the ACM Special Interest Group on Data Communication. ACM, 2017.
B
e
n
e
f
i
t
s
 
o
f
 
P
r
o
g
r
a
m
m
a
b
l
e
 
T
r
a
f
f
i
c
 
M
a
n
a
g
e
m
e
n
t
 
˃
B
e
n
e
fi
t
s
 
o
f
 
a
n
 
u
n
a
m
b
i
g
u
o
u
s
 
w
a
y
 
t
o
 
d
e
fi
n
e
 
T
M
 
a
l
g
o
r
i
t
h
m
s
:
Portability
Formal verification and static analysis
Precise way to express customer needs
˃
B
e
n
e
fi
t
s
 
o
f
 
h
a
v
i
n
g
 
p
r
o
g
r
a
m
m
a
b
l
e
 
T
M
:
Innovation
Differentiation
Reliability
Network operators can fine tune for performance
Small menu of algorithms to choose from today
Many possible algorithms that can be expressed
>
>
 
7
 
E
x
i
s
t
i
n
g
 
T
M
 
a
l
g
o
r
i
t
h
m
s
 
a
r
e
 
t
o
o
 
l
i
m
i
t
i
n
g
 
˃
T
M
 
a
l
g
o
r
i
t
h
m
 
p
a
r
a
m
e
t
e
r
s
 
c
a
n
 
b
e
 
d
e
r
i
v
e
d
 
f
r
o
m
:
Network operator static configurations
Per packet information
Per flow or per class information (i.e. state stored across packets)
Current queue state
Aggregated queue state (e.g. avg sizes, avg service rates)
Other state that evolves over time
Randomness
Combinations of any of the above parameters
Combined arithmetically
Combined temporally (i.e. alternate between various policies)
Combined hierarchically
 
>
>
 
8
 
G
e
n
e
r
i
c
 
T
r
a
f
f
i
c
 
M
a
n
a
g
e
r
 
A
r
c
h
i
t
e
c
t
u
r
e
 
˃
O
u
t
p
u
t
 
q
u
e
u
e
d
˃
E
a
c
h
 
e
g
r
e
s
s
 
p
o
r
t
 
m
a
y
 
h
a
v
e
 
m
a
n
y
 
a
s
s
o
c
i
a
t
e
d
 
q
u
e
u
e
s
˃
A
 
N
I
C
 
m
a
y
 
n
o
t
 
n
e
e
d
 
a
 
c
r
o
s
s
b
a
r
 
o
r
 
P
R
E
 
>
>
 
9
Classification &
Policing & Drop
Policy
Non-P4-Prog
PRE
Non-P4-Prog
Buffer
Scheduling &
Shaping
 
C
r
o
s
s
b
a
r
 
P
r
o
p
o
s
e
d
 
T
M
 
A
r
c
h
i
t
e
c
t
u
r
e
 
M
o
d
e
l
 
>
>
 
10
10
 
˃
I
n
g
r
e
s
s
 
l
o
g
i
c
Classification
Policing
Drop Policy
˃
P
a
c
k
e
t
 
b
u
ff
e
r
˃
S
c
h
e
d
u
l
i
n
g
 
/
 
S
h
a
p
i
n
g
 
I
n
g
r
e
s
s
 
L
o
g
i
c
 
>
>
 
11
11
 
˃
M
a
t
c
h
 
/
 
A
c
ti
o
n
 
p
r
o
c
e
s
s
i
n
g
˃
C
l
a
s
s
i
fi
c
a
ti
o
n
:
Decide which queue
E.g. per flow queues
˃
P
o
l
i
c
i
n
g
:
Is packet compliant?
E.g. Use PSA meters to drop packets
˃
D
r
o
p
 
P
o
l
i
c
y
:
How to drop packets during periods of
perceived congestion
E.g. WRED
˃
E
x
t
r
a
c
t
 
s
c
h
e
d
u
l
i
n
g
 
m
e
t
a
d
a
t
a
˃
E
x
t
e
r
n
 
i
n
t
e
r
f
a
c
e
 
t
o
 
a
c
c
e
s
s
 
p
a
c
k
e
t
b
u
f
f
e
r
 
s
t
a
t
e
 
 
Queue 1
 
Queue i
 
Queue N
 
. . .
 
. . .
Classification &
Policing & Drop
Policy
 
Packet Buffer
 
Input
Packet
 
Output
Packet
Scheduling
/ Shaping
State
 
Extern
interface
 
P
a
c
k
e
t
 
B
u
f
f
e
r
 
˃
M
e
m
o
r
y
 
s
p
a
c
e
 
t
o
 
s
t
o
r
e
 
p
a
c
k
e
t
s
˃
D
i
v
i
d
e
d
 
i
n
t
o
 
q
u
e
u
e
s
 
t
o
 
i
s
o
l
a
t
e
 
t
r
a
ffi
c
˃
Q
u
e
u
e
 
b
o
u
n
d
a
r
i
e
s
 
m
a
y
 
b
e
 
fl
e
x
i
b
l
e
˃
P
a
c
k
e
t
 
o
r
d
e
r
 
i
n
 
q
u
e
u
e
 
 
s
c
h
e
d
u
l
i
n
g
o
r
d
e
r
˃
E
x
c
h
a
n
g
e
 
p
a
c
k
e
t
 
d
e
s
c
r
i
p
t
o
r
 
a
n
d
m
e
t
a
d
a
t
a
 
w
i
t
h
 
s
c
h
e
d
u
l
i
n
g
 
/
 
s
h
a
p
i
n
g
l
o
g
i
c
˃
S
t
a
t
e
:
Size of each queue
Accessible to ingress logic (read only)
 
>
>
 
12
12
 
Queue 1
 
Queue i
 
Queue N
 
. . .
 
. . .
Classification &
Policing & Drop
Policy
 
Packet Buffer
 
Input
Packet
 
Output
Packet
Scheduling
/ Shaping
State
 
Extern
interface
 
T
h
e
 
P
u
s
h
-
I
n
-
F
i
r
s
t
-
O
u
t
 
(
P
I
F
O
)
 
M
o
d
e
l
 
[
1
]
 
˃
W
h
a
t
 
i
s
 
a
 
P
I
F
O
?
 
 
˃
W
h
y
 
i
s
 
t
h
e
 
P
I
F
O
 
a
 
g
o
o
d
 
m
o
d
e
l
?
Scheduling decision made at time of enqueue 
 helps relax timing constraints 
 leads to
efficient implementation
Clear separation b/w fixed and programmable logic
˃
C
a
n
 
i
m
p
l
e
m
e
n
t
 
e
x
i
s
ti
n
g
 
a
l
g
o
r
i
t
h
m
s
:
Start Time Fair Queueing (STFQ)
, Least Slack-Time First (LSTF), Stop-and-Go Queueing,
Minimum rate guarantees
, 
fine grained priority scheduling
, Service-Curved Earliest Deadline
First (SC-EDF), Rate-Controlled Service Disciplines (RCSD)
Token bucket rate limiting
˃
C
a
n
 
i
m
p
l
e
m
e
n
t
 
n
e
w
 
a
l
g
o
r
i
t
h
m
s
 
>
>
 
13
13
S
c
h
e
d
u
l
i
n
g
 
/
 
S
h
a
p
i
n
g
 
T
r
e
e
˃
P
a
t
h
 
c
o
m
p
u
t
a
ti
o
n
Decide leaf node to enqueue into
˃
E
a
c
h
 
n
o
d
e
One PIFO
Programmable enq and deq logic
Potentially shared state b/w enq and
deq logic
˃
S
c
h
e
d
u
l
i
n
g
 
n
o
d
e
Implement scheduling algs
Leaf nodes store descriptors, parent
nodes store scheduling node refs
˃
S
h
a
p
i
n
g
 
n
o
d
e
Implement shaping algs
Shapes all nodes within its sub tree
>
>
 
14
14
 
S
c
h
e
d
u
l
i
n
g
 
&
 
S
h
a
p
i
n
g
D
e
m
o
n
s
t
r
a
t
i
o
n
 
S
c
h
e
d
u
l
i
n
g
 
a
n
d
 
S
h
a
p
i
n
g
 
>
>
 
16
16
 
S
c
h
e
d
u
l
i
n
g
 
S
c
h
e
d
u
l
i
n
g
Class
A
 
S
c
h
e
d
u
l
i
n
g
 
S
h
a
p
i
n
g
Class
B
 
˃
G
o
a
l
:
Enqueue 3 packets
Dequeue 3 packets
p1
E
n
q
u
e
u
e
 
P
a
c
k
e
t
 
1
>
>
 
17
17
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
&p1
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
L
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
Shaping Node
p1
p2
E
n
q
u
e
u
e
 
P
a
c
k
e
t
 
2
>
>
 
18
18
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
&p2
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
 
R
p1
p3
p2
E
n
q
u
e
u
e
 
P
a
c
k
e
t
 
3
>
>
 
19
19
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
&p3
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
 
R
p1
p3
p2
E
n
q
u
e
u
e
 
P
a
c
k
e
t
 
2
(
c
o
n
t
i
n
u
e
d
)
>
>
 
20
20
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
R
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
p1
p3
p2
D
e
q
u
e
u
e
 
P
a
c
k
e
t
 
3
>
>
 
21
21
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
R
 
&
p
3
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
p1
p2
D
e
q
u
e
u
e
 
P
a
c
k
e
t
 
1
>
>
 
22
22
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
L
 
&
p
1
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
p2
E
n
q
u
e
u
e
 
P
a
c
k
e
t
 
3
(
c
o
n
t
i
n
u
e
d
)
>
>
 
23
23
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
R
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
p2
D
e
q
u
e
u
e
 
P
a
c
k
e
t
 
2
>
>
 
24
24
S
Classification /
Policing
 /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
R
 
&
p
2
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
P
r
o
g
.
 
D
r
o
p
 
P
o
l
i
c
y
>
>
 
25
25
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
˃
E
x
t
e
r
n
 
i
n
t
e
r
f
a
c
e
 
t
o
 
r
e
a
d
 
b
u
ff
e
r
 
s
t
a
t
e
˃
C
a
n
 
i
m
p
l
e
m
e
n
t
 
W
R
E
D
-
l
i
k
e
 
p
o
l
i
c
i
e
s
N
e
w
 
E
x
t
e
r
n
s
 
/* PIFO extern */
extern 
pifo<
T
,
 M
> {
   pifo(
bit
<
32
> size);          
// constructor
   enq(
in
 
T
 rank, 
in M
 meta);   
// meta is optional
   deq(
out
 
T
 rank, 
out M
 meta); 
// meta is optional
}
// example instantiation
pifo<
rank_t, sched_meta_t
>(
2048
) p;
 
/* Packet buffer extern */
extern 
buffer<
T
> {
   buffer(
bit
<
32
>
 
num_queues);
   
// get the current size of a queue
   get_q_size(
in
 
T 
q_id, 
out
 
bit
<
32
> size);
}
// example instantiation
b
u
f
f
e
r
<
q
i
d
_
t
>
(
1
0
2
4
)
 
b
u
f
;
>
>
 
26
26
S
i
m
p
l
e
 
A
r
c
h
i
t
e
c
t
u
r
e
parser
 Parser<H, M>(
packet_in
 b,
                    
out
 H hdr,
                    
out
 M user_meta,
                    
inout
 
std_meta_t
 std_meta);
control
 Ingress<H, M, D>(
inout
 H hdr,
                         
out
 D sched_meta,
                         
inout
 M user_meta,
                         
inout
 
std_meta_t
 std_meta);
>
>
 
27
27
scheduler
 MyScheduler<D>(
in
 
D
 sched_meta);
control
 Egress<H, M>(
inout
 H hdr,
                     
inout
 M user_meta,
                     
inout
 
std_meta_t
 std_meta);
control
 Deparser<H, M>(
packet_out
 b,
                       
in
 H hdr,
                       
in
 M user_meta,
                       
inout
 std_meta_t std_meta);
Ingress M/A
Ingress
Parser
Egress
Deparser
Egress M/A
Scheduling /
Shaping
Non-P4-
Programmable
PRE / buffer
Classification &
Policing & Drop
Policy
User defined
scheduling
metadata
S
c
h
e
d
u
l
e
r
 
B
l
o
c
k
scheduler
 MyScheduler(
in
 
sched_meta_t
 sched_meta)
{
   
/* Define PIFO tree nodes */
   
/* root node */
   
node
 strict_priority {
       
type
 = scheduling;
       pifo<
rank_t
>(
2048
) p;
       
enqueue
 = { }
       
dequeue
 = { }
   }
   
/* shaping node */
   
node
 token_bucket {
       
type
 = shaping;
       pifo<
rank_t, sched_meta_t
>(
2048
) p;
       
enqueue
 = {
 
}
       
dequeue
 = { }
   }
   
/* Define the shape 
of the tree */
   
tree
 myTree { strict_priority(), {wfq(), {token_bucket(), {wfq()} } }
   
table
 find_path { }
   
apply
 {
       find_path.
apply
();
       
// apply the scheduling algorithm defined by the tree
       myTree.
apply
(leaf_node);
   }
}
>
>
 
28
28
 
E
x
a
m
p
l
e
 
 
S
t
r
i
c
t
 
P
r
i
o
r
i
t
y
 
   
/* root node */
   
node
 strict {
 
 
 
 
 
 
 
t
y
p
e
 
=
 
s
c
h
e
d
u
l
i
n
g
;
       
// PIFO extern instantiation format:
       // pifo<rank_type>(size_in_pkts) instance_name;
       pifo<
rank_t
>(
2048
) p;
 
       
enqueue
 = {
           
rank_t
 rank;
           rank = sched_meta.diffserv; 
// maybe use a table?
           
p.enq(rank);
       }
 
       
dequeue
 = {
           
rank_t
 rank;
           
p.deq(rank);
       }
   }
 
>
>
 
29
29
 
S
t
r
i
c
t
Class
A
Class
B
 
h
i
g
h
 
l
o
w
 
E
x
a
m
p
l
e
 
 
S
t
r
i
c
t
 
P
r
i
o
r
i
t
y
 
w
i
t
h
 
S
t
a
r
v
a
t
i
o
n
 
P
r
e
v
e
n
t
i
o
n
 
˃
F
C
F
S
 
n
o
d
e
s
 
p
r
e
v
e
n
t
 
r
e
o
r
d
e
r
i
n
g
 
o
f
 
p
a
c
k
e
t
s
w
i
t
h
i
n
 
a
 
f
l
o
w
˃
A
l
s
o
 
u
s
e
f
u
l
 
f
o
r
 
t
r
a
ffi
c
 
t
h
a
t
 
i
s
 
s
e
n
s
i
ti
v
e
 
t
o
j
i
t
t
e
r
 
(
e
.
g
.
 
v
o
i
c
e
)
 
>
>
 
30
30
 
E
x
a
m
p
l
e
 
 
S
t
r
i
c
t
 
P
r
i
o
r
i
t
y
 
w
i
t
h
 
S
t
a
r
v
a
t
i
o
n
 
P
r
e
v
e
n
t
i
o
n
 
    node
 token_bucket {
 
 
 
 
 
 
 
 
t
y
p
e
 
=
 
s
h
a
p
i
n
g
;
        pifo<
rank_t
,
 sched_meta_t
>(
2048
) p;
        
enqueue
 = {
           // declare registers: tb, last_time, rate, max_tokens
           
@atomic
 {
               
// atomically update last_time and tb registers
               tb.read(
0
, old_tokens);
               old_tokens = old_tokens + r * (now - lt);
               
if
 (old_tokens > B) {
                   old_tokens = B;
               }
               
// compute new tokens and send time
               
if
 (sm.pkt_len <= old_tokens) {
                   
new_tokens = old_tokens – sm.pkt_len;
                   send_time = now;
               } 
else
 {
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
n
e
w
_
t
o
k
e
n
s
 
=
 
0
;
                   send_time = now + (sm.pkt_len – old_tokens)/r;
               }
               tb.write(
0
, new_tokens);
           }
           
p.enq(send_time, sm);
       }
       
dequeue
 = {
           
bit
<
rank_t
> rank;
           
p.deq(rank, sm);
       }
    }
 
>
>
 
31
31
 
E
x
a
m
p
l
e
 
 
S
t
r
i
c
t
 
P
r
i
o
r
i
t
y
 
w
i
t
h
 
S
t
a
r
v
a
t
i
o
n
 
P
r
e
v
e
n
t
i
o
n
 
    node
 FCFS {
 
 
 
 
 
 
 
 
t
y
p
e
 
=
 
s
c
h
e
d
u
l
i
n
g
;
        pifo<
rank_t
>(
2048
) p;
        
enqueue
 = {
            bit<rank_t> rank = get_time(); 
// extern function
            p.enq(rank);
        }
        
dequeue
 = {
            
bit
<
rank_t
> rank;
            
p.deq(rank);
        }
    }
 
   
/* Define the shape 
of the tree */
   
tree
 myTree { strict_priority(), {FCFS(), {token_bucket(), {FCFS()}}}
}
   
apply
 {
       
// path computation logic
       
pifo_id_t 
leaf_node;
       
if
 (sched_meta.flow_id == 0) {
           leaf_node = 
RIGHT
;
       } 
else 
{
           leaf_node = 
LEFT
;
 
 
 
 
 
 
 
}
       
// apply the scheduling / shaping algorithm
       myTree.
apply
(leaf_node);
   }
 
>
>
 
32
32
 
E
x
a
m
p
l
e
 
 
L
o
w
 
L
a
t
e
n
c
y
 
T
r
a
f
f
i
c
 
˃
W
F
Q
 
a
c
r
o
s
s
 
a
l
l
 
q
u
e
u
e
s
 
u
n
ti
l
 
a
 
p
a
r
ti
c
u
l
a
r
 
q
u
e
u
e
e
x
c
e
e
d
s
 
a
 
c
o
n
f
i
g
u
r
e
d
 
t
h
r
e
s
h
o
l
d
 
a
t
 
w
h
i
c
h
 
p
o
i
n
t
t
h
a
t
 
q
u
e
u
e
 
i
s
 
g
i
v
e
n
 
s
t
r
i
c
t
 
p
r
i
o
r
i
t
y
 
u
n
t
i
l
 
i
t
 
h
a
s
b
e
e
n
 
s
u
f
f
i
c
i
e
n
t
l
y
 
d
r
a
i
n
e
d
˃
C
l
a
s
s
i
fi
c
a
ti
o
n
 
s
e
t
s
 
d
e
s
i
r
e
d
 
q
u
e
u
e
 
s
i
z
e
 
i
n
s
c
h
e
d
_
m
e
t
a
 
>
>
 
33
33
 
l
o
w
 
l
a
t
e
n
c
y
 
F
C
F
S
Flow
A
 
F
C
F
S
Flow
B
 
F
C
F
S
Flow
C
 
E
x
a
m
p
l
e
 
 
L
o
w
 
L
a
t
e
n
c
y
 
T
r
a
f
f
i
c
 
I
n
g
r
e
s
s
 
L
o
g
i
c
:
buffer<
qid_t
>(
NUM_QUEUES
) buf; 
// instantiate buffer extern
// get size of low latency queue
sched_meta.ll_q_size = buf.get_q_size(
LL_Q_ID
);
lookup_queue.
apply
(); 
// set egress queue
R
o
o
t
 
N
o
d
e
:
node
 low_latency {
 
 
 
 
t
y
p
e
 
=
 
s
c
h
e
d
u
l
i
n
g
;
    pifo<
rank_t
>(
2048
) p;
    
enqueue
 = {
        if (sched_meta.ll_q_size > 
THRESH
) {
            
// Strict priority logic
        } else {
            
// WFQ logic
        }
        
p.enq(rank);
    
}
    
dequeue
 = {
        
rank_t
 rank;
        
p.deq(rank);
    }
}
 
>
>
 
34
34
 
l
o
w
 
l
a
t
e
n
c
y
 
F
C
F
S
Flow
A
 
F
C
F
S
Flow
B
 
F
C
F
S
Flow
C
 
R
u
n
t
i
m
e
 
C
o
n
t
r
o
l
 
˃
T
h
e
 
t
r
e
e
 
c
a
n
 
b
e
 
u
p
d
a
t
e
d
 
b
y
 
t
h
e
 
c
o
n
t
r
o
l
 
p
l
a
n
e
,
 
b
u
t
 
n
o
t
 
f
r
o
m
 
w
i
t
h
i
n
 
t
h
e
 
P
4
 
p
r
o
g
r
a
m
Akin to adding table entries
˃
W
i
l
l
 
n
e
e
d
 
P
4
R
u
n
ti
m
e
 
s
u
p
p
o
r
t
 
f
o
r
 
t
r
e
e
 
c
o
n
fi
g
u
r
a
ti
o
n
˃
T
a
b
l
e
s
 
a
n
d
 
e
x
t
e
r
n
s
 
c
a
n
 
b
e
 
h
a
n
d
l
e
d
 
n
o
r
m
a
l
l
y
 
>
>
 
35
35
 
O
p
e
n
 
S
o
u
r
c
e
 
I
m
p
l
e
m
e
n
t
a
t
i
o
n
s
 
˃
N
e
t
F
P
G
A
 
P
r
o
t
o
t
y
p
e
 
(
P
4
 
W
o
r
k
s
h
o
p
 
D
e
m
o
)
 
˃
P
I
F
O
 
p
a
p
e
r
 
A
S
I
C
 
d
e
s
i
g
n
 
[
1
]
 
C
h
a
l
l
e
n
g
e
s
 
w
i
t
h
 
T
M
 
P
r
o
g
r
a
m
m
a
b
i
l
i
t
y
 
˃
S
c
a
l
i
n
g
 
p
r
o
g
r
a
m
m
a
b
l
e
 
d
e
s
i
g
n
s
 
t
o
 
h
i
g
h
 
r
a
t
e
s
 
a
n
d
 
l
a
r
g
e
 
s
i
z
e
s
Bottleneck #1 – timing closure for scheduling decision logic
Bottleneck #2 – memory bandwidth/interface to large packet buffer
˃
A
b
s
t
r
a
c
ti
o
n
s
 
t
o
 
p
r
o
g
r
a
m
 
t
h
e
 
P
I
F
O
 
t
r
e
e
˃
L
i
m
i
t
a
ti
o
n
s
 
o
f
 
t
h
e
 
p
r
o
p
o
s
e
d
 
m
o
d
e
l
 
>
>
 
37
37
 
M
o
v
i
n
g
 
F
o
r
w
a
r
d
 
˃
M
o
ti
v
a
ti
o
n
 
f
r
o
m
 
t
h
e
 
c
o
m
m
u
n
i
t
y
?
˃
S
e
tt
l
e
 
o
n
 
a
r
c
h
i
t
e
c
t
u
r
e
 
m
o
d
e
l
˃
D
e
fi
n
e
 
l
a
n
g
u
a
g
e
 
s
e
m
a
n
ti
c
s
Is a tree too low-level?
Are there better abstractions?
˃
I
m
p
r
o
v
e
 
o
p
e
n
 
s
o
u
r
c
e
 
H
W
 
i
m
p
l
e
m
e
n
t
a
ti
o
n
s
˃
O
p
e
n
 
s
o
u
r
c
e
 
s
i
m
u
l
a
ti
o
n
/
e
m
u
l
a
ti
o
n
 
e
n
v
i
r
o
n
m
e
n
t
Bmv2? NS3? Other?
˃
S
m
a
l
l
 
P
4
 
T
M
 
W
o
r
k
i
n
g
 
G
r
o
u
p
 
>
>
 
38
38
 
R
e
f
e
r
e
n
c
e
s
 
[1] Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang-
Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, Nick McKeown.
"Programmable packet scheduling at line rate." Proceedings of the 2016 ACM SIGCOMM
Conference  
https://cs.nyu.edu/~anirudh/pifo-sigcomm.pdf
 
>
>
 
39
39
 
Q
u
e
s
t
i
o
n
s
?
 
N
e
t
F
P
G
A
 
P
r
o
t
o
t
y
p
e
 
(
P
4
 
W
o
r
k
s
h
o
p
 
D
e
m
o
)
 
˃
P
a
r
a
l
l
e
l
 
d
e
t
e
r
m
i
n
i
s
ti
c
 
s
k
i
p
 
l
i
s
t
s
a
n
d
 
r
e
g
i
s
t
e
r
-
b
a
s
e
d
 
s
o
r
t
i
n
g
 
c
a
c
h
e
˃
B
R
A
M
 
b
a
s
e
d
 
p
a
c
k
e
t
 
b
u
ff
e
r
 
>
>
 
41
41
Load
Balancer
Selector
Register
Cache
 
. . .
 
Insertion
 
Removal
 
Top level PIFO block diagram
 
P
I
F
O
 
P
a
p
e
r
 
A
S
I
C
 
D
e
s
i
g
n
 
[
1
]
 
˃
F
l
o
w
 
s
c
h
e
d
u
l
e
r
Choose amongst head pkt of each flow
˃
R
a
n
k
 
S
t
o
r
e
Store computed ranks for each flow in
FIFO order
˃
P
I
F
O
 
b
l
o
c
k
s
 
c
o
n
n
e
c
t
e
d
 
i
n
 
a
 
f
u
l
l
 
m
e
s
h
˃
6
4
 
p
o
r
t
 
1
0
G
b
p
s
 
s
h
a
r
e
d
 
m
e
m
o
r
y
 
s
w
i
t
c
h
,
1
G
H
z
˃
1
0
0
0
 
fl
o
w
s
,
 
6
4
K
 
p
a
c
k
e
t
s
 
>
>
 
42
42
 
T
M
 
M
o
d
e
l
 
L
i
m
i
t
a
t
i
o
n
s
I
n
p
u
t
 
v
s
 
O
u
t
p
u
t
 
R
a
t
e
 
L
i
m
i
t
i
n
g
˃
G
o
a
l
 
o
f
 
r
a
t
e
 
l
i
m
i
ti
n
g
:
 
c
o
n
t
r
o
l
 
t
h
e
 
r
a
t
e
 
a
t
 
w
h
i
c
h
 
b
y
t
e
s
 
a
r
e
 
r
e
m
o
v
e
d
 
f
r
o
m
 
t
h
e
 
b
u
ff
e
r
a
n
d
 
s
c
h
e
d
u
l
e
d
.
Output rate limiting – direct approach, short and long term rate guarantees
Input rate limiting – indirect approach, long term rate guarantees
>
>
 
44
44
l
o
w
h
i
g
h
Sent out at line
rate
6
p3
A
r
b
i
t
r
a
r
y
 
r
e
o
r
d
e
r
i
n
g
 
o
f
 
b
u
f
f
e
r
e
d
 
p
a
c
k
e
t
s
˃
P
f
a
b
r
i
c
 
s
c
h
e
d
u
l
i
n
g
 
o
r
d
e
r
:
SRPT with FIFO order within flows
>
>
 
45
45
8
p2
9
p1
7
p0
A
p
p
r
o
x
i
m
a
t
e
 
P
f
a
b
r
i
c
>
>
 
46
46
S
R
P
T
F
C
F
S
F
C
F
S
7
p0
0
p0
7
p0
7
L
9
p1
9
p1
0
p1
9
R
8
p2
8
p2
1
p2
8
R
6
p3
6
p3
2
p3
6
R
 
F
i
n
a
l
 
S
c
h
e
d
u
l
i
n
g
 
O
r
d
e
r
:
 
P
f
a
b
r
i
c
 
S
c
h
e
d
u
l
i
n
g
 
O
r
d
e
r
:
 
S
o
f
t
 
R
a
t
e
 
L
i
m
i
t
i
n
g
 
˃
R
a
t
e
 
l
i
m
i
t
 
fl
o
w
 
A
 
t
o
 
2
 
G
b
p
s
 
o
n
l
y
 
i
f
 
t
h
e
 
e
g
r
e
s
s
 
l
i
n
k
 
i
s
 
c
o
n
g
e
s
t
e
d
˃
H
a
r
d
 
v
s
.
 
S
o
ft
 
R
a
t
e
 
L
i
m
i
ti
n
g
:
H
a
r
d
 
 
h
o
l
d
 
p
a
c
k
e
t
s
 
u
n
t
i
l
 
t
i
m
e
 
t
o
 
s
e
n
d
 
(
n
o
n
 
w
o
r
k
 
c
o
n
s
e
r
v
i
n
g
)
S
o
f
t
 
 
p
a
c
k
e
t
s
 
c
a
n
 
b
e
 
s
e
n
t
 
i
f
 
b
a
n
d
w
i
d
t
h
 
i
s
 
a
v
a
i
l
a
b
l
e
 
(
w
o
r
k
 
c
o
n
s
e
r
v
i
n
g
)
 
>
>
 
47
47
p2
S
o
f
t
 
R
a
t
e
 
L
i
m
i
t
i
n
g
>
>
 
48
48
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
&
p
2
d
e
f
a
u
l
t
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
What if there are multiple
shaping nodes?
 
R
 
I
n
g
r
e
s
s
 
L
o
g
i
c
 
L
i
m
i
t
a
t
i
o
n
s
 
˃
D
r
o
p
 
p
o
l
i
c
y
:
No drop head algorithms (unless using speedup)
˃
N
o
 
a
l
g
o
r
i
t
h
m
s
 
t
h
a
t
 
m
u
s
t
 
p
r
o
g
r
a
m
m
a
ti
c
a
l
l
y
 
u
p
d
a
t
e
 
s
t
a
t
e
 
i
n
 
t
h
e
 
p
a
c
k
e
t
 
b
u
ff
e
r
 
>
>
 
49
49
 
B
e
s
t
 
E
f
f
o
r
t
 
f
o
r
 
T
r
a
f
f
i
c
 
S
h
a
p
i
n
g
 
˃
S
h
a
p
i
n
g
 
o
p
e
r
a
ti
o
n
s
 
c
a
u
s
e
 
c
o
n
fl
i
c
t
s
Scheduling and shaping transactions commit at same time
Resolved in favor of scheduling transactions
Rationale: scheduling algorithms must run at line rate, shaping algorithms can afford to be
delayed a few cycles
 
>
>
 
50
50
p4
p1
p3
p2
E
n
q
u
e
u
e
 
C
o
n
f
l
i
c
t
>
>
 
51
51
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
R
&p4
 
L
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
 
C
o
n
s
t
r
a
i
n
t
s
 
˃
1
 
e
n
q
u
e
u
e
 
a
n
d
 
1
 
d
e
q
u
e
u
e
 
a
t
 
e
a
c
h
 
l
e
v
e
l
 
o
f
 
t
r
e
e
 
p
e
r
 
ti
m
e
 
s
l
o
t
˃
N
e
e
d
 
t
o
 
d
e
a
l
 
w
i
t
h
 
w
r
a
p
 
a
r
o
u
n
d
 
o
f
 
r
a
n
k
 
v
a
l
u
e
s
 
>
>
 
52
52
 
S
c
h
e
d
u
l
i
n
g
 
D
e
m
o
n
s
t
r
a
t
i
o
n
 
S
c
h
e
d
u
l
i
n
g
 
>
>
 
54
54
 
S
c
h
e
d
u
l
i
n
g
 
S
c
h
e
d
u
l
i
n
g
Class
A
 
S
c
h
e
d
u
l
i
n
g
Class
B
 
˃
G
o
a
l
:
Enqueue 3 packets
Dequeue 3 packets
p1
E
n
q
u
e
u
e
 
P
a
c
k
e
t
 
1
>
>
 
55
55
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
&p1
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
L
p1
p2
E
n
q
u
e
u
e
 
P
a
c
k
e
t
 
2
>
>
 
56
56
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
&p2
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
R
p1
p3
p2
E
n
q
u
e
u
e
 
P
a
c
k
e
t
 
3
>
>
 
57
57
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
&p3
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
R
p1
p3
p2
D
e
q
u
e
u
e
 
P
a
c
k
e
t
 
3
>
>
 
58
58
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
R
 
&
p
3
p1
p2
D
e
q
u
e
u
e
 
P
a
c
k
e
t
 
2
>
>
 
59
59
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
R
 
&
p
2
p1
D
e
q
u
e
u
e
 
P
a
c
k
e
t
 
1
>
>
 
60
60
S
Classification /
Policing /
Drop Policy
Queue 1
Queue i
Queue N
. . .
. . .
State
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
S
e
n
q
 
l
o
g
i
c
d
e
q
 
l
o
g
i
c
c
o
m
p
u
t
e
 
p
a
t
h
 
L
 
&
p
1
 
A
d
d
i
t
i
o
n
a
l
 
E
x
a
m
p
l
e
s
 
E
x
a
m
p
l
e
 
 
S
h
o
r
t
e
s
t
 
R
e
m
a
i
n
i
n
g
 
P
r
o
c
e
s
s
i
n
g
 
T
i
m
e
 
(
S
R
P
T
)
 
I
n
g
r
e
s
s
 
L
o
g
i
c
:
table
 lookup_queue {
    
key
 = { sched_meta.rank: 
ternary
; }
    
actions
 = { set_queue; }
    
size
 = 
1024
;
    default_action = set_default_queue;
}
// apply block
sched_meta.rank = hdr.bytes_remaining;
lookup_queue.
apply
();
 
S
c
h
e
d
u
l
i
n
g
 
N
o
d
e
:
node
 SRPT {
 
 
 
 
t
y
p
e
 
=
 
s
c
h
e
d
u
l
i
n
g
;
    pifo<
rank_t
>(
2048
) p;
    
enqueue
 = { 
 
p.enq(sched_meta.rank);
 
}
    
dequeue
 = {
        
rank_t
 rank;
        
p.deq(rank);
    }
}
 
>
>
 
62
62
 
S
R
P
T
Flow
A
Flow
X
 
. . .
 
E
x
a
m
p
l
e
 
 
W
e
i
g
h
t
e
d
 
F
a
i
r
 
Q
u
e
u
e
i
n
g
 
(
W
F
Q
)
 
   
node
 wfq {
       // PIFO extern instantiations
       pifo<
rank_t
>(
2048
) p;
       
// virtual_time state to track last rank dequeued
       register<
rank_t
>(
1
) virtual_time;
       
enqueue
 = {
           
// declare weight & last_finish reg arrays
           
// read virtual_time
           virtual_time.read(0, vt);
           
@atomic
 {
              last_finish.read(flow_id, lf);
              
if
 (lf != 0) {
                   
// seen this flow before
                   start = max(vt, lf);
              } 
else
 {
                   start = vt;
              }
              weight.read(flow_id, fw)
              lf = start + sched_meta.pkt_len/fw;
              last_finish.write(flow_id, lf);
           }
           rank = start;
           
p.enq(rank);
    
}
 
       
dequeue
 = {
           
bit
<
16
> rank;
           
p.deq(rank);
           virtual_time.write(
0
, rank);
       }
    
}
 
>
>
 
63
63
 
W
F
Q
Class
A
Class
C
Class
B
 
E
x
a
m
p
l
e
 
 
S
t
r
i
c
t
 
P
r
i
o
r
i
t
y
 
w
/
 
W
F
Q
 
   
/* Define the shape 
of the tree */
   
tree
 myTree { strict(), {WFQ(), WFQ()} }
 
   table
 find_path { }
   
apply
 {
       
// path computation logic
       find_path.
apply
();
       
// apply the scheduling / shaping algorithm
       myTree.
apply
(leaf_node);
   }
 
>
>
 
64
64
 
W
F
Q
Class
C
Class
D
 
W
F
Q
Class
A
Class
B
 
S
t
r
i
c
t
 
h
i
g
h
 
l
o
w
 
E
x
a
m
p
l
e
 
 
B
a
n
d
w
i
d
t
h
 
R
e
s
e
r
v
a
t
i
o
n
s
 
(
M
i
n
 
R
a
t
e
 
G
u
a
r
a
n
t
e
e
s
)
 
˃
R
o
o
t
 
c
o
m
p
u
t
e
s
 
r
a
n
k
 
a
s
 
e
i
t
h
e
r
 
0
 
(
b
e
l
o
w
m
i
n
 
r
a
t
e
)
 
o
r
 
1
 
(
a
b
o
v
e
 
m
i
n
 
r
a
t
e
)
˃
F
C
F
S
 
u
s
e
d
 
t
o
 
p
r
e
v
e
n
t
 
r
e
o
r
d
e
r
i
n
g
w
i
t
h
i
n
 
a
 
f
l
o
w
 
>
>
 
65
65
 
S
t
r
i
c
t
 
(
M
i
n
 
R
a
t
e
)
 
F
C
F
S
Flow
A
 
F
C
F
S
Flow
B
 
F
C
F
S
Flow
C
 
M
o
r
e
 
E
x
a
m
p
l
e
s
 
˃
H
e
a
v
y
 
h
i
tt
e
r
 
d
e
t
e
c
ti
o
n
 
t
o
 
m
a
p
 
H
H
 
fl
o
w
s
 
t
o
 
s
e
p
a
r
a
t
e
 
q
u
e
u
e
 
a
n
d
 
a
s
s
i
g
n
 
t
h
e
i
r
 
p
a
c
k
e
t
s
l
o
w
e
r
 
p
r
i
o
r
i
t
y
˃
S
c
h
e
d
u
l
i
n
g
 
t
r
a
n
s
a
c
ti
o
n
 
c
o
m
p
u
t
e
s
 
l
e
v
e
l
 
o
f
 
b
u
r
s
ti
n
e
s
s
 
f
o
r
 
e
a
c
h
 
fl
o
w
 
a
n
d
 
a
s
s
i
g
n
s
p
r
i
o
r
i
t
y
 
a
s
 
t
h
e
 
i
n
v
e
r
s
e
 
o
f
 
t
h
e
 
b
u
r
s
t
i
n
e
s
s
 
 
e
n
c
o
u
r
a
g
e
s
 
s
m
o
o
t
h
 
t
r
a
f
f
i
c
 
>
>
 
66
66
Slide Note

Hello my name is Stephen Ibanez and I’m a PhD student at Stanford University working with Nick McKeown. And I’ve spent the summer as an intern in Xilinx Labs working with Gordon Brebner.

I’ve very excited to be here today to talk to you guys about our work in exposing programmability of traffic management functions to P4 programs.

Embed
Share

Programmable Traffic Management involves packet scheduling, traffic shaping, policing, drop policies, packet buffering, replication, and classification to optimize network performance. It is used in integrated switch architectures and is crucial for addressing diverse traffic characteristics and requirements. Embracing programmability in traffic management is essential for enhancing network utilization, meeting SLAs, achieving fairness, and adapting to evolving traffic demands.


Uploaded on Aug 14, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. P4 Programmable Traffic Management Stephen Ibanez & Gordon Brebner 9/12/2018

  2. Goals Motivating Discussions Traffic Manager Architectural Model Overview Putative P4 Extensions Moving Forward Discussions >> 2

  3. What is Traffic Management? Packet Scheduling in what order? Traffic Shaping at what rate/pacing? Policing is packet compliant? Drop Policy how to avoid / deal with congestion? Packet Buffering how to store packets? Packet Replication how to replicate packets? Classification how to map packets into queues? >> 3

  4. Where are traffic managers used? Integrated Switch (PISA) Ingress Parser Traffic Manager Ingress Deparser Egress Parser Egress Deparser Ingress M/A Egress M/A NIC or Line Card Ingress Deparser Traffic Manager Ingress Parser Ingress M/A Ethernet Ports Host PCI Express (NIC) / Switch Fabric (Line Card) Egress Parser Traffic Manager Egress Deparser Egress M/A >> 4

  5. Why should we care about Traffic Management? Lots of different types of traffic w/ different characteristics and requirements: Characteristics: burstiness, packet sizes, flow sizes, flow rates Requirements: throughput, latency, loss, jitter, reordering, flow completion time, tail latency Network operators have a wide range of objectives: Meet all SLAs Maximize network utilization Achieve fairness More complicated ASICs Network devices are picking up more functionality More network programmability More types of traffic More TM requirements! Embrace programmability! About 50% of a modern programmable switch chip is dedicated to buffering and traffic management logic but is not programmable! >> 5

  6. Why should we care about Traffic Management? WAN links are expensive want to make best use of them by prioritizing traffic Modern congestion control algorithms rely on accurate packet pacing (e.g. BBR[1], Timely[2]) Performance isolation for thousands of VMs (millions of flows) per server [1] Cardwell, Neal, et al. "BBR: Congestion-based congestion control." Queue 14.5 (2016): 50. [2] Mittal, Radhika, et al. "TIMELY: RTT-based Congestion Control for the Datacenter." ACM SIGCOMM Computer Communication Review. Vol. 45. No. 4. ACM, 2015. [3] Saeed, Ahmed, et al. "Carousel: Scalable traffic shaping at end hosts." Proceedings of the Conference of the ACM Special Interest Group on Data Communication. ACM, 2017. >> 6

  7. Benefits of Programmable Traffic Management Benefits of an unambiguous way to define TM algorithms: Portability Formal verification and static analysis Precise way to express customer needs Benefits of having programmable TM: Innovation Differentiation Reliability Network operators can fine tune for performance Small menu of algorithms to choose from today Many possible algorithms that can be expressed >> 7

  8. Existing TM algorithms are too limiting TM algorithm parameters can be derived from: Network operator static configurations Per packet information Per flow or per class information (i.e. state stored across packets) Current queue state Aggregated queue state (e.g. avg sizes, avg service rates) Other state that evolves over time Randomness Combinations of any of the above parameters Combined arithmetically Combined temporally (i.e. alternate between various policies) Combined hierarchically >> 8

  9. Generic Traffic Manager Architecture Scheduling & Shaping Crossbar Classification & Policing & Drop Policy Non-P4-Prog PRE Non-P4-Prog Buffer Output queued Each egress port may have many associated queues A NIC may not need a crossbar or PRE >> 9

  10. Proposed TM Architecture Model Ingress logic Scheduling / Shaping Classification Policing Queue 1 Drop Policy . . . Classification & Policing & Drop Policy Input Packet Output Packet Queue i Packet buffer . . . Scheduling / Shaping Queue N State Extern interface Packet Buffer >> 10

  11. Ingress Logic Scheduling / Shaping Match / Action processing Classification: Decide which queue E.g. per flow queues Queue 1 Policing: . . . Classification & Policing & Drop Policy Input Packet Output Packet Is packet compliant? E.g. Use PSA meters to drop packets Queue i . . . Drop Policy: How to drop packets during periods of perceived congestion E.g. WRED Queue N State Extern interface Packet Buffer Extract scheduling metadata Extern interface to access packet buffer state >> 11

  12. Packet Buffer Scheduling / Shaping Memory space to store packets Divided into queues to isolate traffic Queue 1 Queue boundaries may be flexible . . . Classification & Policing & Drop Policy Packet order in queue scheduling order Input Packet Output Packet Queue i . . . Exchange packet descriptor and metadata with scheduling / shaping logic Queue N State Extern interface Packet Buffer State: Size of each queue Accessible to ingress logic (read only) >> 12

  13. The Push-In-First-Out (PIFO) Model [1] 2 What is a PIFO? Programmable rank computation 0 8 7 3 4 Fixed PIFO Why is the PIFO a good model? Scheduling decision made at time of enqueue helps relax timing constraints leads to efficient implementation Clear separation b/w fixed and programmable logic Can implement existing algorithms: Start Time Fair Queueing (STFQ), Least Slack-Time First (LSTF), Stop-and-Go Queueing, Minimum rate guarantees, fine grained priority scheduling, Service-Curved Earliest Deadline First (SC-EDF), Rate-Controlled Service Disciplines (RCSD) Token bucket rate limiting Can implement new algorithms >> 13

  14. Scheduling / Shaping Tree PIFO Scheduling / Shaping Tree 4 3 2 0 1 Path computation Decide leaf node to enqueue into deq logic enq logic Scheduling / Shaping S Each node One PIFO Programmable enq and deq logic Potentially shared state b/w enq and deq logic Shaping Node Queue 1 t1 t0 . . . Classification & Policing & Drop Policy deq logic enq logic Scheduling Node Input Packet Output Packet Queue i S Scheduling node Implement scheduling algs Leaf nodes store descriptors, parent nodes store scheduling node refs . . . Queue N 0 3 9 3 3 2 9 State Extern interface deq logic enq logic deq logic enq logic Packet Buffer Shaping node Implement shaping algs Shapes all nodes within its sub tree S S compute path >> 14 descriptor & metadata

  15. Scheduling & Shaping Demonstration

  16. Scheduling and Shaping Goal: Scheduling Enqueue 3 packets Dequeue 3 packets Shaping Scheduling Scheduling Class A Class B >> 16

  17. Enqueue Packet 1 t8 t1 t7 t2 t6 2 deq logic enq logic L t5 t3 t4 S 3 deq logic deq logic enq logic enq logic deq logic enq logic &p1 S S S compute path L &p1 Shaping Node p1 p1 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . Queue N >> 17 State

  18. Enqueue Packet 2 t8 2 t1 t7 L t2 t6 deq logic enq logic t5 t3 t4 S t4 3 &p1 7 deq logic deq logic enq logic enq logic deq logic enq logic &p2 S S S compute path R &p2 p2 p2 p2 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . Queue N >> 18 State

  19. Enqueue Packet 3 t8 2 t1 t7 L t2 t6 deq logic enq logic t5 t3 t4 S t7 t4 3 7 &p1 &p2 2 deq logic deq logic enq logic enq logic deq logic enq logic &p3 S S S compute path R &p3 p3 p3 p2 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . p3 Queue N >> 19 State

  20. Enqueue Packet 2 (continued) t8 2 t1 t7 L t2 t6 1 deq logic enq logic R t5 t3 t4 S t7 t4 7 2 3 &p2 &p3 &p1 deq logic deq logic enq logic enq logic deq logic enq logic R S S S compute path p2 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . p3 Queue N >> 20 State

  21. Dequeue Packet 3 t8 2 1 t1 t7 L R t2 t6 deq logic enq logic t5 t3 R t4 S t7 2 3 7 &p3 &p1 &p2 deq logic deq logic enq logic enq logic deq logic enq logic &p3 S S S compute path p2 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . p3 Queue N >> 21 State

  22. Dequeue Packet 1 t8 2 t1 t7 L t2 t6 deq logic enq logic t5 t3 L t4 S t7 3 7 &p1 &p2 deq logic deq logic enq logic enq logic deq logic enq logic &p1 S S S compute path p2 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . Queue N >> 22 State

  23. Enqueue Packet 3 (continued) t8 t1 t7 t2 t6 1 deq logic enq logic R t5 t3 t4 S t7 7 &p2 deq logic deq logic enq logic enq logic deq logic enq logic R S S S compute path p2 Queue 1 . . . Classification / Policing / Drop Policy Queue i . . . Queue N >> 23 State

  24. Dequeue Packet 2 t8 1 t1 t7 R t2 t6 deq logic enq logic t5 t3 R t4 S 7 &p2 deq logic deq logic enq logic enq logic deq logic enq logic &p2 S S S compute path p2 Queue 1 . . . Classification / Policing / Drop Policy Queue i . . . Queue N >> 24 State

  25. Prog. Drop Policy t8 1 1 2 t1 t7 R R L t2 t6 deq logic enq logic Extern interface to read buffer state t5 t3 Can implement WRED-like policies t4 S 7 2 3 &p2 &p3 &p1 deq logic deq logic enq logic enq logic S S compute path p4 drop p4 Queue 1 . . . Classification / Policing / Drop Policy Queue i . . . Queue N >> 25 State

  26. New Externs /* PIFO extern */ extern pifo<T, M> { pifo(bit<32> size); // constructor enq(in T rank, in M meta); deq(out T rank, out M meta); // meta is optional } // example instantiation pifo<rank_t, sched_meta_t>(2048) p; // meta is optional /* Packet buffer extern */ extern buffer<T> { buffer(bit<32> num_queues); // get the current size of a queue get_q_size(in T q_id, out bit<32> size); } // example instantiation buffer<qid_t>(1024) buf; >> 26

  27. Simple Architecture User defined scheduling metadata scheduler MyScheduler<D>(in D sched_meta); parser Parser<H, M>(packet_in b, out H hdr, out M user_meta, inout std_meta_t std_meta); control Egress<H, M>(inout H hdr, inout M user_meta, inout std_meta_t std_meta); control Ingress<H, M, D>(inout H hdr, out D sched_meta, inout M user_meta, inout std_meta_t std_meta); control Deparser<H, M>(packet_out b, in H hdr, in M user_meta, inout std_meta_t std_meta); Classification & Policing & Drop Policy Scheduling / Shaping Non-P4- Programmable PRE / buffer Ingress Parser Egress Deparser Ingress M/A Egress M/A >> 27

  28. Scheduler Block scheduler MyScheduler(in sched_meta_t sched_meta) { /* Define PIFO tree nodes */ /* root node */ node strict_priority { type = scheduling; pifo<rank_t>(2048) p; enqueue = { } dequeue = { } } /* shaping node */ node token_bucket { type = shaping; pifo<rank_t, sched_meta_t>(2048) p; enqueue = {} dequeue = { } } /* Define the shape of the tree */ tree myTree { strict_priority(), {wfq(), {token_bucket(), {wfq()} } } table find_path { } apply { find_path.apply(); // apply the scheduling algorithm defined by the tree myTree.apply(leaf_node); } } strict token bucket WFQ WFQ >> 28

  29. Example Strict Priority /* root node */ node strict { type = scheduling; // PIFO extern instantiation format: // pifo<rank_type>(size_in_pkts) instance_name; pifo<rank_t>(2048) p; Strict low high Class A enqueue = { rank_t rank; rank = sched_meta.diffserv; // maybe use a table? p.enq(rank); } Class B dequeue = { rank_t rank; p.deq(rank); } } >> 29

  30. Example Strict Priority with Starvation Prevention strict FCFS nodes prevent reordering of packets within a flow high Also useful for traffic that is sensitive to token bucket low jitter (e.g. voice) FCFS FCFS Flow A Flow B >> 30

  31. Example Strict Priority with Starvation Prevention node token_bucket { type = shaping; pifo<rank_t, sched_meta_t>(2048) p; enqueue = { // declare registers: tb, last_time, rate, max_tokens @atomic { // atomically update last_time and tb registers tb.read(0, old_tokens); old_tokens = old_tokens + r * (now - lt); if (old_tokens > B) { old_tokens = B; } // compute new tokens and send time if (sm.pkt_len <= old_tokens) { new_tokens = old_tokens sm.pkt_len; send_time = now; } else { new_tokens = 0; send_time = now + (sm.pkt_len old_tokens)/r; } tb.write(0, new_tokens); } p.enq(send_time, sm); } strict high token bucket low FCFS FCFS Flow A Flow B dequeue = { bit<rank_t> rank; p.deq(rank, sm); } } >> 31

  32. Example Strict Priority with Starvation Prevention node FCFS { type = scheduling; pifo<rank_t>(2048) p; enqueue = { bit<rank_t> rank = get_time(); // extern function p.enq(rank); } dequeue = { bit<rank_t> rank; p.deq(rank); } } strict high token bucket low FCFS FCFS /* Define the shape of the tree */ tree myTree { strict_priority(), {FCFS(), {token_bucket(), {FCFS()}}} } apply { // path computation logic pifo_id_t leaf_node; if (sched_meta.flow_id == 0) { leaf_node = RIGHT; } else { leaf_node = LEFT; } // apply the scheduling / shaping algorithm myTree.apply(leaf_node); } Flow A Flow B >> 32

  33. Example Low Latency Traffic low latency WFQ across all queues until a particular queue exceeds a configured threshold at which point that queue is given strict priority until it has been sufficiently drained FCFS FCFS FCFS Classification sets desired queue size in sched_meta Flow A Flow C Flow B >> 33

  34. Example Low Latency Traffic Ingress Logic: buffer<qid_t>(NUM_QUEUES) buf; // instantiate buffer extern // get size of low latency queue sched_meta.ll_q_size = buf.get_q_size(LL_Q_ID); lookup_queue.apply(); // set egress queue low latency Root Node: node low_latency { type = scheduling; pifo<rank_t>(2048) p; enqueue = { if (sched_meta.ll_q_size > THRESH) { // Strict priority logic } else { // WFQ logic } p.enq(rank); } dequeue = { rank_t rank; p.deq(rank); } } >> 34 FCFS FCFS FCFS Flow A Flow C Flow B

  35. Runtime Control The tree can be updated by the control plane, but not from within the P4 program Akin to adding table entries Will need P4Runtime support for tree configuration Tables and externs can be handled normally >> 35

  36. Open Source Implementations PIFO paper ASIC design [1] NetFPGA Prototype (P4 Workshop Demo) PIFO Scheduler 0 8 7 3 4 descriptor & rank PIFO block diagram rank Rank Store (SRAM) descriptor Flow Scheduler (flip-flops) computation A 5 6 3 4 2 descriptor & metadata C 3 B 1 A 0 B 2 4 5 C 8 Queue 1 Input Packet Increasing ranks Output Packet . . . Increasing ranks Classification Queue i . . . Queue N Packet Buffer

  37. Challenges with TM Programmability Scaling programmable designs to high rates and large sizes Bottleneck #1 timing closure for scheduling decision logic Bottleneck #2 memory bandwidth/interface to large packet buffer Abstractions to program the PIFO tree Limitations of the proposed model >> 37

  38. Moving Forward Motivation from the community? Settle on architecture model Define language semantics Is a tree too low-level? Are there better abstractions? Improve open source HW implementations Open source simulation/emulation environment Bmv2? NS3? Other? Small P4 TM Working Group >> 38

  39. References [1] Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang- Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, Nick McKeown. "Programmable packet scheduling at line rate." Proceedings of the 2016 ACM SIGCOMM Conference https://cs.nyu.edu/~anirudh/pifo-sigcomm.pdf >> 39

  40. Questions?

  41. NetFPGA Prototype (P4 Workshop Demo) Parallel deterministic skip lists and register-based sorting cache PIFO Scheduler 0 8 7 3 4 BRAM based packet buffer descriptor & rank rank descriptor computation Top level PIFO block diagram descriptor & metadata Load Balancer Insertion Queue 1 Input Packet Output Packet . . . Register Cache Register Cache Register Cache Classification Queue i . . . Register Cache . . . Skip List Skip List Skip List Queue N Packet Buffer Selector Removal >> 41

  42. PIFO Paper ASIC Design [1] Flow scheduler Choose amongst head pkt of each flow PIFO block diagram Rank Store Store computed ranks for each flow in FIFO order Rank Store (SRAM) Flow Scheduler (flip-flops) A 5 6 3 4 2 PIFO blocks connected in a full mesh C 3 B 1 A 0 B 2 4 5 C 8 64 port 10Gbps shared memory switch, 1GHz Increasing ranks Increasing ranks 1000 flows, 64K packets >> 42

  43. TM Model Limitations

  44. Input vs Output Rate Limiting Goal of rate limiting: control the rate at which bytes are removed from the buffer and scheduled. Output rate limiting direct approach, short and long term rate guarantees Input rate limiting indirect approach, long term rate guarantees Sent out at line rate Strict low high Rate Limit FCFS FCFS >> 44

  45. Arbitrary reordering of buffered packets Pfabric scheduling order: SRPT with FIFO order within flows 6 8 9 7 p3 p2 p1 p0 >> 45

  46. Approximate Pfabric SRPT 9 R R 8 7 L R 6 FCFS FCFS 0 2 1 0 p0 p3 p2 p1 p3 p2 p0 p1 Final Scheduling Order: 7 7 9 9 8 8 6 p3 p3 6 p0 p0 p1 p1 p2 p2 Pfabric Scheduling Order: p0 p3 p2 p1 >> 46

  47. Soft Rate Limiting Rate limit flow A to 2 Gbps only if the egress link is congested Hard vs. Soft Rate Limiting: Hard hold packets until time to send (non work conserving) Soft packets can be sent if bandwidth is available (work conserving) >> 47

  48. Soft Rate Limiting t8 t1 default t7 t2 t6 deq logic enq logic t5 t3 t4 S t7 7 &p2 deq logic deq logic enq logic enq logic deq logic enq logic R &p2 S S S compute path What if there are multiple shaping nodes? p2 Queue 1 . . . Classification / Policing / Drop Policy Queue i . . . Queue N >> 48 State

  49. Ingress Logic Limitations Drop policy: No drop head algorithms (unless using speedup) No algorithms that must programmatically update state in the packet buffer >> 49

  50. Best Effort for Traffic Shaping Shaping operations cause conflicts Scheduling and shaping transactions commit at same time Resolved in favor of scheduling transactions Rationale: scheduling algorithms must run at line rate, shaping algorithms can afford to be delayed a few cycles >> 50

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#