Understanding Programmable Traffic Management for Network Optimization

˃

ti

ti

˃

ffi

˃

ti

˃

˃

–

˃

ffi

–

˃

–

˃

–

ti

˃

ff

–

˃

ti

–

˃

fi

ti

–

˃

˃

˃

ff

ffi

ff

ti

•

Characteristics: burstiness, packet sizes, flow sizes, flow rates

•

Requirements: throughput, latency, loss, jitter, reordering, flow completion time, tail latency

˃

ti

•

Meet all SLAs

•

Maximize network utilization

•

Achieve fairness

˃

ti

˃

ff

–

˃



ti

ffi

˃

ti

˃

ti

fl

[1] Cardwell, Neal, et al. "BBR: Congestion-based congestion control."

Queue

 14.5 (2016): 50.

[2] Mittal, Radhika, et al. "TIMELY: RTT-based Congestion Control for the Datacenter."

ACM SIGCOMM Computer Communication Review

. Vol. 45. No. 4. ACM, 2015.

[3] Saeed, Ahmed, et al. "Carousel: Scalable traffic shaping at end hosts." Proceedings of the Conference of the ACM Special Interest Group on Data Communication. ACM, 2017.

˃

fi

fi

•

Portability

•

Formal verification and static analysis

•

Precise way to express customer needs

˃

fi

•

Innovation

•

Differentiation

•

Reliability

•

Network operators can fine tune for performance

•

Small menu of algorithms to choose from today

•

Many possible algorithms that can be expressed

˃

•

Network operator static configurations

•

Per packet information

•

Per flow or per class information (i.e. state stored across packets)

•

Current queue state

•

Aggregated queue state (e.g. avg sizes, avg service rates)

•

Other state that evolves over time

•

Randomness

•

Combinations of any of the above parameters

‒

Combined arithmetically

‒

Combined temporally (i.e. alternate between various policies)

‒

Combined hierarchically

˃

˃

˃

Classification &

Policing & Drop

Policy

Non-P4-Prog

PRE

Non-P4-Prog

Buffer

Scheduling &

Shaping

˃

•

Classification

•

Policing

•

Drop Policy

˃

ff

˃

˃

ti

˃

fi

ti

•

Decide which queue

•

E.g. per flow queues

˃

•

Is packet compliant?

•

E.g. Use PSA meters to drop packets

˃

•

How to drop packets during periods of

perceived congestion

•

E.g. WRED

˃

˃

Queue 1

Queue i

Queue N

. . .

. . .

Classification &

Policing & Drop

Policy

Packet Buffer

Input

Packet

Output

Packet

Scheduling

/ Shaping

State

Extern

interface

˃

˃

ffi

˃

fl

˃

≠

˃

˃

•

Size of each queue

•

Accessible to ingress logic (read only)

Queue 1

Queue i

Queue N

. . .

. . .

Classification &

Policing & Drop

Policy

Packet Buffer

Input

Packet

Output

Packet

Scheduling

/ Shaping

State

Extern

interface

˃

˃

•

Scheduling decision made at time of enqueue



 helps relax timing constraints



 leads to

efficient implementation

•

Clear separation b/w fixed and programmable logic

˃

ti

•

Start Time Fair Queueing (STFQ)

, Least Slack-Time First (LSTF), Stop-and-Go Queueing,

Minimum rate guarantees

fine grained priority scheduling

, Service-Curved Earliest Deadline

First (SC-EDF), Rate-Controlled Service Disciplines (RCSD)

•

Token bucket rate limiting

˃

˃

ti

•

Decide leaf node to enqueue into

˃

•

One PIFO

•

Programmable enq and deq logic

•

Potentially shared state b/w enq and

deq logic

˃

•

Implement scheduling algs

•

Leaf nodes store descriptors, parent

nodes store scheduling node refs

˃

•

Implement shaping algs

•

Shapes all nodes within its sub tree

Class

Class

˃

•

Enqueue 3 packets

•

Dequeue 3 packets

p1

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

&p1

Shaping Node

p1

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

&p2

p1

p3

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

&p3

p1

p3

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

p1

p3

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

p1

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

p2

Classification /

Policing

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

˃

ff

˃

/* PIFO extern */

extern

pifo<

> {

   pifo(

bit

> size);

// constructor

   enq(

in

 rank,

in M

 meta);

// meta is optional

   deq(

out

 rank,

out M

 meta);

// meta is optional

// example instantiation

pifo<

rank_t, sched_meta_t

>(

) p;

/* Packet buffer extern */

extern

buffer<

> {

   buffer(

bit

num_queues);

// get the current size of a queue

   get_q_size(

in

q_id,

out

bit

> size);

// example instantiation

parser

 Parser<H, M>(

packet_in

b,

out

 H hdr,

out

 M user_meta,

inout

std_meta_t

 std_meta);

control

 Ingress<H, M, D>(

inout

 H hdr,

out

 D sched_meta,

inout

 M user_meta,

inout

std_meta_t

 std_meta);

scheduler

 MyScheduler<D>(

in

 sched_meta);

control

 Egress<H, M>(

inout

 H hdr,

inout

 M user_meta,

inout

std_meta_t

 std_meta);

control

 Deparser<H, M>(

packet_out

b,

in

 H hdr,

in

 M user_meta,

inout

 std_meta_t std_meta);

Ingress M/A

Ingress

Parser

Egress

Deparser

Egress M/A

Scheduling /

Shaping

Non-P4-

Programmable

PRE / buffer

Classification &

Policing & Drop

Policy

User defined

scheduling

metadata

scheduler

 MyScheduler(

in

sched_meta_t

 sched_meta)

/* Define PIFO tree nodes */

/* root node */

node

 strict_priority {

type

 = scheduling;

       pifo<

rank_t

>(

) p;

enqueue

 = { }

dequeue

 = { }

/* shaping node */

node

 token_bucket {

type

 = shaping;

       pifo<

rank_t, sched_meta_t

>(

) p;

enqueue

= {

dequeue

 = { }

/* Define the shape

of the tree */

tree

 myTree { strict_priority(), {wfq(), {token_bucket(), {wfq()} } }

table

 find_path { }

apply

       find_path.

apply

();

// apply the scheduling algorithm defined by the tree

       myTree.

apply

(leaf_node);

–

/* root node */

node

 strict {

// PIFO extern instantiation format:

       // pifo<rank_type>(size_in_pkts) instance_name;

       pifo<

rank_t

>(

) p;

enqueue

= {

rank_t

 rank;

           rank = sched_meta.diffserv;

// maybe use a table?

p.enq(rank);

dequeue

= {

rank_t

 rank;

p.deq(rank);

Class

Class

–

˃

˃

ffi

ti

–

    node

 token_bucket {

        pifo<

rank_t

 sched_meta_t

>(

) p;

enqueue

= {

           // declare registers: tb, last_time, rate, max_tokens

@atomic

// atomically update last_time and tb registers

               tb.read(

, old_tokens);

               old_tokens = old_tokens + r * (now - lt);

if

 (old_tokens > B) {

                   old_tokens = B;

// compute new tokens and send time

if

 (sm.pkt_len <= old_tokens) {

new_tokens = old_tokens – sm.pkt_len;

                   send_time = now;

else

                   send_time = now + (sm.pkt_len – old_tokens)/r;

               tb.write(

, new_tokens);

p.enq(send_time, sm);

dequeue

= {

bit

rank_t

> rank;

p.deq(rank, sm);

–

    node

 FCFS {

        pifo<

rank_t

>(

) p;

enqueue

= {

            bit<rank_t> rank = get_time();

// extern function

            p.enq(rank);

dequeue

= {

bit

rank_t

> rank;

p.deq(rank);

/* Define the shape

of the tree */

tree

 myTree { strict_priority(), {FCFS(), {token_bucket(), {FCFS()}}}

apply

// path computation logic

pifo_id_t

leaf_node;

if

 (sched_meta.flow_id == 0) {

           leaf_node =

RIGHT

else

           leaf_node =

LEFT

// apply the scheduling / shaping algorithm

       myTree.

apply

(leaf_node);

–

˃

ti

ti

˃

fi

ti

Flow

Flow

Flow

–

buffer<

qid_t

>(

NUM_QUEUES

) buf;

// instantiate buffer extern

// get size of low latency queue

sched_meta.ll_q_size = buf.get_q_size(

LL_Q_ID

);

lookup_queue.

apply

();

// set egress queue

node

 low_latency {

    pifo<

rank_t

>(

) p;

enqueue

= {

        if (sched_meta.ll_q_size >

THRESH

) {

// Strict priority logic

        } else {

// WFQ logic

p.enq(rank);

dequeue

= {

rank_t

 rank;

p.deq(rank);

Flow

Flow

Flow

˃

•

Akin to adding table entries

˃

ti

fi

ti

˃

˃

˃

˃

•

Bottleneck #1 – timing closure for scheduling decision logic

•

Bottleneck #2 – memory bandwidth/interface to large packet buffer

˃

ti

˃

ti

˃

ti

ti

˃

tt

˃

fi

ti

•

Is a tree too low-level?

•

Are there better abstractions?

˃

ti

˃

ti

ti

•

Bmv2? NS3? Other?

˃

[1] Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang-

Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, Nick McKeown.

"Programmable packet scheduling at line rate." Proceedings of the 2016 ACM SIGCOMM

Conference

https://cs.nyu.edu/~anirudh/pifo-sigcomm.pdf

˃

ti

˃

ff

Load

Balancer

Selector

Register

Cache

. . .

Insertion

Removal

Top level PIFO block diagram

˃

•

Choose amongst head pkt of each flow

˃

•

Store computed ranks for each flow in

FIFO order

˃

˃

˃

fl

˃

ti

ff

•

Output rate limiting – direct approach, short and long term rate guarantees

•

Input rate limiting – indirect approach, long term rate guarantees

Sent out at line

rate

p3

˃

•

SRPT with FIFO order within flows

p2

p1

p0

p0

p0

p0

p1

p1

p1

p2

p2

p2

p3

p3

p3

˃

“

fl

”

˃

ft

ti

•

–

•

–

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

What if there are multiple

shaping nodes?

˃

•

No drop head algorithms (unless using speedup)

˃

ti

ff

˃

ti

fl

•

Scheduling and shaping transactions commit at same time

•

Resolved in favor of scheduling transactions

•

Rationale: scheduling algorithms must run at line rate, shaping algorithms can afford to be

delayed a few cycles

p4

p1

p3

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

&p4

˃

ti

˃

Class

Class

˃

•

Enqueue 3 packets

•

Dequeue 3 packets

p1

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

&p1

p1

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

&p2

p1

p3

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

&p3

p1

p3

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

p1

p2

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

p1

Classification /

Policing /

Drop Policy

Queue 1

Queue i

Queue N

. . .

. . .

State

–

table

 lookup_queue {

key

 = { sched_meta.rank:

ternary

; }

actions

 = { set_queue; }

size

    default_action = set_default_queue;

// apply block

sched_meta.rank = hdr.bytes_remaining;

lookup_queue.

apply

();

node

 SRPT {

    pifo<

rank_t

>(

) p;

enqueue

= {

p.enq(sched_meta.rank);

dequeue

= {

rank_t

 rank;

p.deq(rank);

Flow

Flow

. . .

–

node

 wfq {

       // PIFO extern instantiations

       pifo<

rank_t

>(

) p;

// virtual_time state to track last rank dequeued

       register<

rank_t

>(

) virtual_time;

enqueue

= {

// declare weight & last_finish reg arrays

// read virtual_time

           virtual_time.read(0, vt);

@atomic

              last_finish.read(flow_id, lf);

if

 (lf != 0) {

// seen this flow before

                   start = max(vt, lf);

else

                   start = vt;

              weight.read(flow_id, fw)

              lf = start + sched_meta.pkt_len/fw;

              last_finish.write(flow_id, lf);

           rank = start;

p.enq(rank);

dequeue

= {

bit

> rank;

p.deq(rank);

           virtual_time.write(

, rank);

Class

Class

Class

–

/* Define the shape

of the tree */

tree

 myTree { strict(), {WFQ(), WFQ()} }

   table

 find_path { }

apply

// path computation logic

       find_path.

apply

();

// apply the scheduling / shaping algorithm

       myTree.

apply

(leaf_node);

Class

Class

Class

Class

–

˃

˃

Flow

Flow

Flow

˃

tt

ti

fl

˃

ti

ti

fl

–

Slide Note

Hello my name is Stephen Ibanez and I’m a PhD student at Stanford University working with Nick McKeown. And I’ve spent the summer as an intern in Xilinx Labs working with Gordon Brebner.

I’ve very excited to be here today to talk to you guys about our work in exposing programmability of traffic management functions to P4 programs.

Embed Share

Download Presentation

Programmable Traffic Management involves packet scheduling, traffic shaping, policing, drop policies, packet buffering, replication, and classification to optimize network performance. It is used in integrated switch architectures and is crucial for addressing diverse traffic characteristics and requirements. Embracing programmability in traffic management is essential for enhancing network utilization, meeting SLAs, achieving fairness, and adapting to evolving traffic demands.

humphrey Follow

Uploaded on Aug 14, 2024 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

P4 Programmable Traffic Management Stephen Ibanez & Gordon Brebner 9/12/2018

Goals Motivating Discussions Traffic Manager Architectural Model Overview Putative P4 Extensions Moving Forward Discussions >> 2

What is Traffic Management? Packet Scheduling in what order? Traffic Shaping at what rate/pacing? Policing is packet compliant? Drop Policy how to avoid / deal with congestion? Packet Buffering how to store packets? Packet Replication how to replicate packets? Classification how to map packets into queues? >> 3

Where are traffic managers used? Integrated Switch (PISA) Ingress Parser Traffic Manager Ingress Deparser Egress Parser Egress Deparser Ingress M/A Egress M/A NIC or Line Card Ingress Deparser Traffic Manager Ingress Parser Ingress M/A Ethernet Ports Host PCI Express (NIC) / Switch Fabric (Line Card) Egress Parser Traffic Manager Egress Deparser Egress M/A >> 4

Why should we care about Traffic Management? Lots of different types of traffic w/ different characteristics and requirements: Characteristics: burstiness, packet sizes, flow sizes, flow rates Requirements: throughput, latency, loss, jitter, reordering, flow completion time, tail latency Network operators have a wide range of objectives: Meet all SLAs Maximize network utilization Achieve fairness More complicated ASICs Network devices are picking up more functionality More network programmability More types of traffic More TM requirements! Embrace programmability! About 50% of a modern programmable switch chip is dedicated to buffering and traffic management logic but is not programmable! >> 5

Why should we care about Traffic Management? WAN links are expensive want to make best use of them by prioritizing traffic Modern congestion control algorithms rely on accurate packet pacing (e.g. BBR[1], Timely[2]) Performance isolation for thousands of VMs (millions of flows) per server [1] Cardwell, Neal, et al. "BBR: Congestion-based congestion control." Queue 14.5 (2016): 50. [2] Mittal, Radhika, et al. "TIMELY: RTT-based Congestion Control for the Datacenter." ACM SIGCOMM Computer Communication Review. Vol. 45. No. 4. ACM, 2015. [3] Saeed, Ahmed, et al. "Carousel: Scalable traffic shaping at end hosts." Proceedings of the Conference of the ACM Special Interest Group on Data Communication. ACM, 2017. >> 6

Benefits of Programmable Traffic Management Benefits of an unambiguous way to define TM algorithms: Portability Formal verification and static analysis Precise way to express customer needs Benefits of having programmable TM: Innovation Differentiation Reliability Network operators can fine tune for performance Small menu of algorithms to choose from today Many possible algorithms that can be expressed >> 7

Existing TM algorithms are too limiting TM algorithm parameters can be derived from: Network operator static configurations Per packet information Per flow or per class information (i.e. state stored across packets) Current queue state Aggregated queue state (e.g. avg sizes, avg service rates) Other state that evolves over time Randomness Combinations of any of the above parameters Combined arithmetically Combined temporally (i.e. alternate between various policies) Combined hierarchically >> 8

Generic Traffic Manager Architecture Scheduling & Shaping Crossbar Classification & Policing & Drop Policy Non-P4-Prog PRE Non-P4-Prog Buffer Output queued Each egress port may have many associated queues A NIC may not need a crossbar or PRE >> 9

Proposed TM Architecture Model Ingress logic Scheduling / Shaping Classification Policing Queue 1 Drop Policy . . . Classification & Policing & Drop Policy Input Packet Output Packet Queue i Packet buffer . . . Scheduling / Shaping Queue N State Extern interface Packet Buffer >> 10

Ingress Logic Scheduling / Shaping Match / Action processing Classification: Decide which queue E.g. per flow queues Queue 1 Policing: . . . Classification & Policing & Drop Policy Input Packet Output Packet Is packet compliant? E.g. Use PSA meters to drop packets Queue i . . . Drop Policy: How to drop packets during periods of perceived congestion E.g. WRED Queue N State Extern interface Packet Buffer Extract scheduling metadata Extern interface to access packet buffer state >> 11

Packet Buffer Scheduling / Shaping Memory space to store packets Divided into queues to isolate traffic Queue 1 Queue boundaries may be flexible . . . Classification & Policing & Drop Policy Packet order in queue scheduling order Input Packet Output Packet Queue i . . . Exchange packet descriptor and metadata with scheduling / shaping logic Queue N State Extern interface Packet Buffer State: Size of each queue Accessible to ingress logic (read only) >> 12

The Push-In-First-Out (PIFO) Model [1] 2 What is a PIFO? Programmable rank computation 0 8 7 3 4 Fixed PIFO Why is the PIFO a good model? Scheduling decision made at time of enqueue helps relax timing constraints leads to efficient implementation Clear separation b/w fixed and programmable logic Can implement existing algorithms: Start Time Fair Queueing (STFQ), Least Slack-Time First (LSTF), Stop-and-Go Queueing, Minimum rate guarantees, fine grained priority scheduling, Service-Curved Earliest Deadline First (SC-EDF), Rate-Controlled Service Disciplines (RCSD) Token bucket rate limiting Can implement new algorithms >> 13

Scheduling / Shaping Tree PIFO Scheduling / Shaping Tree 4 3 2 0 1 Path computation Decide leaf node to enqueue into deq logic enq logic Scheduling / Shaping S Each node One PIFO Programmable enq and deq logic Potentially shared state b/w enq and deq logic Shaping Node Queue 1 t1 t0 . . . Classification & Policing & Drop Policy deq logic enq logic Scheduling Node Input Packet Output Packet Queue i S Scheduling node Implement scheduling algs Leaf nodes store descriptors, parent nodes store scheduling node refs . . . Queue N 0 3 9 3 3 2 9 State Extern interface deq logic enq logic deq logic enq logic Packet Buffer Shaping node Implement shaping algs Shapes all nodes within its sub tree S S compute path >> 14 descriptor & metadata

Scheduling & Shaping Demonstration

Scheduling and Shaping Goal: Scheduling Enqueue 3 packets Dequeue 3 packets Shaping Scheduling Scheduling Class A Class B >> 16

Enqueue Packet 1 t8 t1 t7 t2 t6 2 deq logic enq logic L t5 t3 t4 S 3 deq logic deq logic enq logic enq logic deq logic enq logic &p1 S S S compute path L &p1 Shaping Node p1 p1 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . Queue N >> 17 State

Enqueue Packet 2 t8 2 t1 t7 L t2 t6 deq logic enq logic t5 t3 t4 S t4 3 &p1 7 deq logic deq logic enq logic enq logic deq logic enq logic &p2 S S S compute path R &p2 p2 p2 p2 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . Queue N >> 18 State

Enqueue Packet 3 t8 2 t1 t7 L t2 t6 deq logic enq logic t5 t3 t4 S t7 t4 3 7 &p1 &p2 2 deq logic deq logic enq logic enq logic deq logic enq logic &p3 S S S compute path R &p3 p3 p3 p2 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . p3 Queue N >> 19 State

Enqueue Packet 2 (continued) t8 2 t1 t7 L t2 t6 1 deq logic enq logic R t5 t3 t4 S t7 t4 7 2 3 &p2 &p3 &p1 deq logic deq logic enq logic enq logic deq logic enq logic R S S S compute path p2 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . p3 Queue N >> 20 State

Dequeue Packet 3 t8 2 1 t1 t7 L R t2 t6 deq logic enq logic t5 t3 R t4 S t7 2 3 7 &p3 &p1 &p2 deq logic deq logic enq logic enq logic deq logic enq logic &p3 S S S compute path p2 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . p3 Queue N >> 21 State

Dequeue Packet 1 t8 2 t1 t7 L t2 t6 deq logic enq logic t5 t3 L t4 S t7 3 7 &p1 &p2 deq logic deq logic enq logic enq logic deq logic enq logic &p1 S S S compute path p2 Queue 1 . . . Classification / Policing / Drop Policy p1 Queue i . . . Queue N >> 22 State

Enqueue Packet 3 (continued) t8 t1 t7 t2 t6 1 deq logic enq logic R t5 t3 t4 S t7 7 &p2 deq logic deq logic enq logic enq logic deq logic enq logic R S S S compute path p2 Queue 1 . . . Classification / Policing / Drop Policy Queue i . . . Queue N >> 23 State

Dequeue Packet 2 t8 1 t1 t7 R t2 t6 deq logic enq logic t5 t3 R t4 S 7 &p2 deq logic deq logic enq logic enq logic deq logic enq logic &p2 S S S compute path p2 Queue 1 . . . Classification / Policing / Drop Policy Queue i . . . Queue N >> 24 State

Prog. Drop Policy t8 1 1 2 t1 t7 R R L t2 t6 deq logic enq logic Extern interface to read buffer state t5 t3 Can implement WRED-like policies t4 S 7 2 3 &p2 &p3 &p1 deq logic deq logic enq logic enq logic S S compute path p4 drop p4 Queue 1 . . . Classification / Policing / Drop Policy Queue i . . . Queue N >> 25 State

New Externs /* PIFO extern */ extern pifo<T, M> { pifo(bit<32> size); // constructor enq(in T rank, in M meta); deq(out T rank, out M meta); // meta is optional } // example instantiation pifo<rank_t, sched_meta_t>(2048) p; // meta is optional /* Packet buffer extern */ extern buffer<T> { buffer(bit<32> num_queues); // get the current size of a queue get_q_size(in T q_id, out bit<32> size); } // example instantiation buffer<qid_t>(1024) buf; >> 26

Simple Architecture User defined scheduling metadata scheduler MyScheduler<D>(in D sched_meta); parser Parser<H, M>(packet_in b, out H hdr, out M user_meta, inout std_meta_t std_meta); control Egress<H, M>(inout H hdr, inout M user_meta, inout std_meta_t std_meta); control Ingress<H, M, D>(inout H hdr, out D sched_meta, inout M user_meta, inout std_meta_t std_meta); control Deparser<H, M>(packet_out b, in H hdr, in M user_meta, inout std_meta_t std_meta); Classification & Policing & Drop Policy Scheduling / Shaping Non-P4- Programmable PRE / buffer Ingress Parser Egress Deparser Ingress M/A Egress M/A >> 27

Scheduler Block scheduler MyScheduler(in sched_meta_t sched_meta) { /* Define PIFO tree nodes */ /* root node */ node strict_priority { type = scheduling; pifo<rank_t>(2048) p; enqueue = { } dequeue = { } } /* shaping node */ node token_bucket { type = shaping; pifo<rank_t, sched_meta_t>(2048) p; enqueue = {} dequeue = { } } /* Define the shape of the tree */ tree myTree { strict_priority(), {wfq(), {token_bucket(), {wfq()} } } table find_path { } apply { find_path.apply(); // apply the scheduling algorithm defined by the tree myTree.apply(leaf_node); } } strict token bucket WFQ WFQ >> 28

Example Strict Priority /* root node */ node strict { type = scheduling; // PIFO extern instantiation format: // pifo<rank_type>(size_in_pkts) instance_name; pifo<rank_t>(2048) p; Strict low high Class A enqueue = { rank_t rank; rank = sched_meta.diffserv; // maybe use a table? p.enq(rank); } Class B dequeue = { rank_t rank; p.deq(rank); } } >> 29

Example Strict Priority with Starvation Prevention strict FCFS nodes prevent reordering of packets within a flow high Also useful for traffic that is sensitive to token bucket low jitter (e.g. voice) FCFS FCFS Flow A Flow B >> 30

Example Strict Priority with Starvation Prevention node token_bucket { type = shaping; pifo<rank_t, sched_meta_t>(2048) p; enqueue = { // declare registers: tb, last_time, rate, max_tokens @atomic { // atomically update last_time and tb registers tb.read(0, old_tokens); old_tokens = old_tokens + r * (now - lt); if (old_tokens > B) { old_tokens = B; } // compute new tokens and send time if (sm.pkt_len <= old_tokens) { new_tokens = old_tokens sm.pkt_len; send_time = now; } else { new_tokens = 0; send_time = now + (sm.pkt_len old_tokens)/r; } tb.write(0, new_tokens); } p.enq(send_time, sm); } strict high token bucket low FCFS FCFS Flow A Flow B dequeue = { bit<rank_t> rank; p.deq(rank, sm); } } >> 31

Example Strict Priority with Starvation Prevention node FCFS { type = scheduling; pifo<rank_t>(2048) p; enqueue = { bit<rank_t> rank = get_time(); // extern function p.enq(rank); } dequeue = { bit<rank_t> rank; p.deq(rank); } } strict high token bucket low FCFS FCFS /* Define the shape of the tree */ tree myTree { strict_priority(), {FCFS(), {token_bucket(), {FCFS()}}} } apply { // path computation logic pifo_id_t leaf_node; if (sched_meta.flow_id == 0) { leaf_node = RIGHT; } else { leaf_node = LEFT; } // apply the scheduling / shaping algorithm myTree.apply(leaf_node); } Flow A Flow B >> 32

Example Low Latency Traffic low latency WFQ across all queues until a particular queue exceeds a configured threshold at which point that queue is given strict priority until it has been sufficiently drained FCFS FCFS FCFS Classification sets desired queue size in sched_meta Flow A Flow C Flow B >> 33

Example Low Latency Traffic Ingress Logic: buffer<qid_t>(NUM_QUEUES) buf; // instantiate buffer extern // get size of low latency queue sched_meta.ll_q_size = buf.get_q_size(LL_Q_ID); lookup_queue.apply(); // set egress queue low latency Root Node: node low_latency { type = scheduling; pifo<rank_t>(2048) p; enqueue = { if (sched_meta.ll_q_size > THRESH) { // Strict priority logic } else { // WFQ logic } p.enq(rank); } dequeue = { rank_t rank; p.deq(rank); } } >> 34 FCFS FCFS FCFS Flow A Flow C Flow B

Runtime Control The tree can be updated by the control plane, but not from within the P4 program Akin to adding table entries Will need P4Runtime support for tree configuration Tables and externs can be handled normally >> 35

Open Source Implementations PIFO paper ASIC design [1] NetFPGA Prototype (P4 Workshop Demo) PIFO Scheduler 0 8 7 3 4 descriptor & rank PIFO block diagram rank Rank Store (SRAM) descriptor Flow Scheduler (flip-flops) computation A 5 6 3 4 2 descriptor & metadata C 3 B 1 A 0 B 2 4 5 C 8 Queue 1 Input Packet Increasing ranks Output Packet . . . Increasing ranks Classification Queue i . . . Queue N Packet Buffer

Challenges with TM Programmability Scaling programmable designs to high rates and large sizes Bottleneck #1 timing closure for scheduling decision logic Bottleneck #2 memory bandwidth/interface to large packet buffer Abstractions to program the PIFO tree Limitations of the proposed model >> 37

Moving Forward Motivation from the community? Settle on architecture model Define language semantics Is a tree too low-level? Are there better abstractions? Improve open source HW implementations Open source simulation/emulation environment Bmv2? NS3? Other? Small P4 TM Working Group >> 38

References [1] Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang- Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, Nick McKeown. "Programmable packet scheduling at line rate." Proceedings of the 2016 ACM SIGCOMM Conference https://cs.nyu.edu/~anirudh/pifo-sigcomm.pdf >> 39

Questions?

NetFPGA Prototype (P4 Workshop Demo) Parallel deterministic skip lists and register-based sorting cache PIFO Scheduler 0 8 7 3 4 BRAM based packet buffer descriptor & rank rank descriptor computation Top level PIFO block diagram descriptor & metadata Load Balancer Insertion Queue 1 Input Packet Output Packet . . . Register Cache Register Cache Register Cache Classification Queue i . . . Register Cache . . . Skip List Skip List Skip List Queue N Packet Buffer Selector Removal >> 41

PIFO Paper ASIC Design [1] Flow scheduler Choose amongst head pkt of each flow PIFO block diagram Rank Store Store computed ranks for each flow in FIFO order Rank Store (SRAM) Flow Scheduler (flip-flops) A 5 6 3 4 2 PIFO blocks connected in a full mesh C 3 B 1 A 0 B 2 4 5 C 8 64 port 10Gbps shared memory switch, 1GHz Increasing ranks Increasing ranks 1000 flows, 64K packets >> 42

TM Model Limitations

Input vs Output Rate Limiting Goal of rate limiting: control the rate at which bytes are removed from the buffer and scheduled. Output rate limiting direct approach, short and long term rate guarantees Input rate limiting indirect approach, long term rate guarantees Sent out at line rate Strict low high Rate Limit FCFS FCFS >> 44

Arbitrary reordering of buffered packets Pfabric scheduling order: SRPT with FIFO order within flows 6 8 9 7 p3 p2 p1 p0 >> 45

Approximate Pfabric SRPT 9 R R 8 7 L R 6 FCFS FCFS 0 2 1 0 p0 p3 p2 p1 p3 p2 p0 p1 Final Scheduling Order: 7 7 9 9 8 8 6 p3 p3 6 p0 p0 p1 p1 p2 p2 Pfabric Scheduling Order: p0 p3 p2 p1 >> 46

Soft Rate Limiting Rate limit flow A to 2 Gbps only if the egress link is congested Hard vs. Soft Rate Limiting: Hard hold packets until time to send (non work conserving) Soft packets can be sent if bandwidth is available (work conserving) >> 47

Soft Rate Limiting t8 t1 default t7 t2 t6 deq logic enq logic t5 t3 t4 S t7 7 &p2 deq logic deq logic enq logic enq logic deq logic enq logic R &p2 S S S compute path What if there are multiple shaping nodes? p2 Queue 1 . . . Classification / Policing / Drop Policy Queue i . . . Queue N >> 48 State

Ingress Logic Limitations Drop policy: No drop head algorithms (unless using speedup) No algorithms that must programmatically update state in the packet buffer >> 49

Best Effort for Traffic Shaping Shaping operations cause conflicts Scheduling and shaping transactions commit at same time Resolved in favor of scheduling transactions Rationale: scheduling algorithms must run at line rate, shaping algorithms can afford to be delayed a few cycles >> 50

Understanding Programmable Traffic Management for Network Optimization

Download Presentation

Presentation Transcript

Related

More Related Content