
Interconnection Networks: Topology
Prof. Natalie Enright Jerger
ECE 1749H, Fall 2014
Topology Overview
- Definition: the topology determines the arrangement of channels and nodes in a network
- Analogous to a road map
- Often the first step in network design
- Significant impact on network cost-performance
  - Determines number of hops
    - Latency
    - Network energy consumption
  - Implementation complexity
    - Node degree
    - Ease of layout

ABSTRACT METRICS

Abstract Metrics
- Use metrics to evaluate the performance and cost of a topology
- Also influenced by routing/flow control
- At this stage:
  - Assume ideal routing (perfect load balancing)
  - Assume ideal flow control (no idle cycles on any channel)

Abstract Metrics: Degree
- Switch degree: number of links at a node
- Proxy for estimating cost
  - Higher degree requires more links and larger port counts at each router
- (Figure: example networks; ring nodes have degree 2, torus nodes degree 4, and mesh nodes degree 2, 3, or 4)

Abstract Metrics: Hop Count
- Path: ordered set of channels between source and destination
- Hop count: number of hops a message takes from source to destination
  - Simple, useful proxy for network latency
  - Every node and link incurs some propagation delay, even with no contention
- Minimal hop count: smallest hop count connecting two nodes

Hop Count
- Network diameter: largest minimal hop count over all node pairs in the network
- Average minimum hop count: average across all src/dst pairs
- Implementation may incorporate non-minimal paths
  - Increases average hop count

Hop Count
- Uniform random traffic: Ring > Mesh > Torus
  - Derivations later
- (Figure: Ring: Max = 4, Avg = 2.2; Mesh: Max = 4, Avg = 1.77; Torus: Max = 2, Avg = 1.33)

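These figures can be checked mechanically. The sketch below (not from the slides; names are illustrative) computes diameter and average minimal hop count by breadth-first search, averaging over all N^2 ordered pairs with self-pairs counted as 0 hops, which matches the uniform random convention used later in the deck:

```python
from collections import deque

def hop_count_stats(adj):
    """Return (average minimal hop count, diameter) for a network given as
    a dict mapping each node to a list of its neighbors. The average is
    taken over all N^2 ordered pairs, counting self-pairs as 0 hops."""
    nodes = list(adj)
    total, diameter = 0, 0
    for src in nodes:
        dist = {src: 0}
        q = deque([src])
        while q:                       # plain BFS from src
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    return total / len(nodes) ** 2, diameter

ring9 = {i: [(i - 1) % 9, (i + 1) % 9] for i in range(9)}
print(hop_count_stats(ring9))  # (~2.22, 4): the ring values quoted above
```
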
Latency
- Time for a packet to traverse the network
  - Start: head arrives at input port
  - End: tail departs output port
- Latency = head latency + serialization latency
  - Serialization latency: time for a packet of length L to cross a channel with bandwidth b, i.e., L/b
- Approximate with hop count
  - Other design choices (routing, flow control) also impact latency
  - Unknown at this stage

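Written out (approximating the head latency by hop count times a per-hop delay; the numbers below are illustrative, not from the slides):

$$T \approx H \cdot t_{hop} + \frac{L}{b}$$

For example, a 64-byte packet crossing a channel of bandwidth b = 16 bytes/cycle adds L/b = 4 cycles of serialization latency on top of the head latency, regardless of hop count.
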
Abstract Metrics: Maximum Channel Load
- Estimate the max bandwidth the network can support
  - Max bits per second (bps) that can be injected by every node before the network saturates
  - Saturation: network cannot accept any more traffic
- Determine the most congested link
  - For a given traffic pattern
  - Will limit overall network bandwidth
  - Estimate the load on this channel

Maximum Channel Load
- Preliminary: don't know the specifics of the links yet
  - Define load relative to the injection load
- Example: a channel load of 2
  - The channel is loaded with twice the injection bandwidth
  - If each node injects a flit every cycle, 2 flits will want to traverse the bottleneck channel every cycle
  - If the bottleneck channel can only handle 1 flit per cycle, max network bandwidth is ½ the link bandwidth
    - A flit can be injected every other cycle

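Stated generally (a standard relation, consistent with this example): if gamma_max is the maximum channel load relative to the injection load and b is the channel bandwidth, the per-node saturation throughput is

$$\Theta_{sat} = \frac{b}{\gamma_{max}}$$

so a maximum channel load of 2 saturates at b/2, i.e., a flit can be injected every other cycle.
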
Maximum Channel Load Example
- (Figure: 8 nodes, A through H, with a bottleneck channel joining the two halves)
- Uniform random: every node has equal probability of sending to every node
- Identify the bottleneck channel
- Half of the traffic from every node will cross the bottleneck channel
  - 8 x ½ = 4
- Network saturates at ¼ injection bandwidth

Bisection Bandwidth
- Common off-chip metric
  - Proxy for cost: the amount of global wiring that will be necessary
- Less useful on-chip: global on-chip wiring is considered abundant
- Cut: a partition of all the nodes into two disjoint sets
  - Each cut has a bandwidth
- Bisection: a cut which divides all nodes into two (nearly equal) halves
  - Channel bisection: minimum channel count over all bisections
  - Bisection bandwidth: minimum bandwidth over all bisections
- With uniform traffic, ½ of the traffic crosses the bisection

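Because ½ of uniform traffic crosses the bisection, bisection bandwidth also bounds throughput (a standard bound, added here for context): with N nodes and bisection bandwidth B_B, the sustainable per-node injection bandwidth satisfies

$$\theta \le \frac{2 B_B}{N}$$
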
Throughput Example
- (Figure: 8-node ring, nodes 0 through 7)
- Bisection = 4 channels (2 in each direction)
- With uniform random traffic
  - 3 sends 1/8 of its traffic to 4, 5, and 6
  - 3 sends 1/16 of its traffic to 7 along each of its 2 possible shortest paths
  - 2 sends 1/8 of its traffic to 4 and 5
  - Etc.
- Channel load = 1

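The slide's bookkeeping can be mechanized. Below is a sketch (helper names are assumptions, not from the slides) that models ideal routing by splitting each flow evenly over all shortest paths, with uniform random traffic sending 1/N of each node's injection to every node (self included, contributing no load); for the 8-node ring it reproduces channel load = 1:

```python
from collections import deque

def all_shortest_paths(adj, src, dst):
    """Enumerate every minimal path from src to dst (BFS distances, then DFS)."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    paths, stack = [], [[src]]
    while stack:
        path = stack.pop()
        u = path[-1]
        if u == dst:
            paths.append(path)
            continue
        for v in adj[u]:
            if dist[v] == dist[u] + 1 and dist[v] <= dist[dst]:
                stack.append(path + [v])
    return paths

def max_channel_load(adj):
    """Maximum directed-channel load, normalized to one unit of injection
    per node, under uniform random traffic (1/N of each node's traffic to
    every node, self included) and ideal routing that splits each flow
    evenly across all of its shortest paths."""
    nodes = list(adj)
    n = len(nodes)
    load = {}
    for s in nodes:
        for d in nodes:
            if s == d:
                continue  # self-traffic contributes no channel load
            paths = all_shortest_paths(adj, s, d)
            w = (1.0 / n) / len(paths)
            for p in paths:
                for u, v in zip(p, p[1:]):
                    load[(u, v)] = load.get((u, v), 0.0) + w
    return max(load.values())

ring8 = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(max_channel_load(ring8))  # -> 1.0, matching the slide
```
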
Path Diversity
- Multiple shortest paths (R) between a source/destination pair
  - Fault tolerance
  - Better load balancing in the network
- Routing algorithm should be able to exploit path diversity
- (Figure: three example networks with R_A-B = 1, R_A-B = 2, and R_A-B = 6)

NETWORK EVALUATION

Evaluating Networks
- Analytical and theoretical analysis
  - E.g., mathematical derivations of max channel load
  - Analytical models for power (DSENT)
- Simulation-based analysis
  - Network-only simulation with synthetic traffic patterns
  - Full-system simulation with real application benchmarks
- Hardware implementation
  - HDL implementation to measure power, area, frequency, etc.
- Measurement on real hardware
  - Profiling and analyzing communication

Evaluating Topologies
- Important to consider the traffic pattern
  - Talked about system architecture impact on traffic
- If the actual traffic pattern is unknown, use synthetic traffic patterns
  - Evaluate common scenarios
  - Stress-test the network
  - Derive various properties of the network

Traffic Patterns
- Historically derived from particular applications of interest
- Spatial distribution
- Matrix transpose -> transpose traffic pattern
  - d_i = s_{(i + b/2) mod b}
  - For b-bit addresses; d_i is the ith bit of the destination

Traffic Patterns Examples
- Fast Fourier Transform (FFT) or sorting application -> shuffle permutation
  - Shuffle: d_i = s_{(i - 1) mod b}
- Fluid dynamics -> neighbor patterns
  - Neighbor: d_x = (s_x + 1) mod k

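A small sketch of these bit-permutation patterns (plain Python; helper names are illustrative, not from the slides). Addresses are b-bit node IDs and s_i denotes bit i of the source:

```python
def bit(x, i):
    """The i-th bit of x."""
    return (x >> i) & 1

def transpose(s, b):
    """Transpose: d_i = s_{(i + b/2) mod b}, i.e., swap the address halves."""
    return sum(bit(s, (i + b // 2) % b) << i for i in range(b))

def shuffle(s, b):
    """Shuffle: d_i = s_{(i - 1) mod b}, a one-bit left rotation."""
    return sum(bit(s, (i - 1) % b) << i for i in range(b))

def neighbor(coords, k):
    """Neighbor: d_x = (s_x + 1) mod k in every dimension."""
    return tuple((c + 1) % k for c in coords)

# On a 16-node network (b = 4 address bits):
print(bin(transpose(0b0111, 4)))  # 0b1101: address halves swapped
print(bin(shuffle(0b0111, 4)))    # 0b1110: one-bit left rotation
print(neighbor((1, 2), 3))        # (2, 0): +1 mod k in each dimension
```
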
Traffic Patterns (3)
- Uniform random
  - Each source equally likely to communicate with each destination
- Most commonly used traffic pattern
- Very benign
  - Traffic is uniformly distributed
  - Balances load even if the topology/routing algorithm has very poor load balancing
  - Need to be careful
- But can be good for debugging/verifying an implementation
  - Well-understood pattern

Stress-testing the Network
- Uniform random can make bad topologies look good
- Permutation traffic will stress-test the network
  - Many types of permutation (ex: shuffle, transpose, neighbor)
  - Each source sends all of its traffic to a single destination
  - Concentrates load on individual pairs
  - Stresses load balancing

Traffic Patterns
- For the topology/routing discussion, focus on spatial distribution
- Traffic patterns also have temporal aspects
  - Bursty behavior
  - Important to capture temporal behavior as well
  - Motivates the need for new traffic patterns

Full System Simulation
- (Diagram: a full-system simulator, modeling processors, caches, disk, and other components running the application, sends packets into a NoC simulator; packet arrivals feed back into full-system timing)
- Accurate but slow

Trace Simulation
- (Diagram: a trace simulator replays a communication trace, recorded on some NoC A, into the NoC simulator modeling NoC B; packets are sent but there is no feedback)
- Faster but less accurate

Traffic Patterns
- (Diagram: a synthetic traffic driver, e.g., uniform random, bit complement, bit reverse, bit rotation, shuffle, transpose, tornado, neighbour, injects packets into the NoC simulator; no application is modeled)
- Very fast but inaccurate

COMMON TOPOLOGIES

Types of Topologies
- Focus on switched topologies
  - Alternatives: bus and crossbar
- Bus
  - Connects a set of components to a single shared channel
  - Effective broadcast medium
- Crossbar
  - Directly connects n inputs to m outputs without intermediate stages
  - Fully connected, single-hop network
  - Component of routers

Types of Topologies
- Direct
  - Each router is associated with a terminal node
  - All routers are sources and destinations of traffic
- Indirect
  - Routers are distinct from terminal nodes
  - Terminal nodes can source/sink traffic
  - Intermediate nodes switch traffic between terminal nodes
- To date: most on-chip networks use direct topologies

Torus (1)
- K-ary n-cube: k^n network nodes
- n-dimensional grid with k nodes in each dimension
- (Figures: 3-ary 2-cube, 3-ary 2-mesh, and 2,3,4-ary 3-mesh examples)

Torus (2)
- Maps well to a planar substrate for on-chip use
- Topologies in the torus family
  - Ex: ring = k-ary 1-cube
- Edge symmetric
  - Good for load balancing
  - Removing the wrap-around links (mesh) loses edge symmetry
    - More traffic concentrated on center channels
- Good path diversity
- Exploits locality for near-neighbor traffic

Hop Count
- Average shortest distance over all pairs of nodes
- Torus:

  $$H_{min} = \begin{cases} \dfrac{nk}{4} & k \text{ even} \\[4pt] \dfrac{n(k^2-1)}{4k} & k \text{ odd} \end{cases}$$

  - For uniform random traffic, a packet travels k/4 hops on average in each of the n dimensions
- Mesh:

  $$H_{min} = \begin{cases} \dfrac{nk}{3} & k \text{ even} \\[4pt] \dfrac{n(k^2-1)}{3k} & k \text{ odd} \end{cases}$$

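A quick numeric check of these formulas (an illustrative sketch, not from the slides) reproduces the averages quoted on the earlier hop-count slide:

```python
def torus_avg_hops(k, n):
    """Average minimal hop count of a k-ary n-cube under uniform traffic."""
    return n * k / 4 if k % 2 == 0 else n * (k * k - 1) / (4 * k)

def mesh_avg_hops(k, n):
    """Average minimal hop count of a k-ary n-mesh under uniform traffic."""
    return n * k / 3 if k % 2 == 0 else n * (k * k - 1) / (3 * k)

print(torus_avg_hops(9, 1))  # 9-node ring: ~2.22 (slide quotes 2.2)
print(mesh_avg_hops(3, 2))   # 3-ary 2-mesh: ~1.78 (slide quotes 1.77)
print(torus_avg_hops(3, 2))  # 3-ary 2-cube: ~1.33
```
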
Torus (4)
- Degree = 2n; 2 channels per dimension
- All nodes have the same degree
- Total channels = 2nN

Channel Load for Torus
- For even k, the outer dimension contains an even number of k-ary (n-1)-cubes, so the network bisects cleanly between them
- Cutting there severs 2 sets of k^(n-1) bidirectional links, i.e., 4k^(n-1) channels
- ½ of the traffic from each node crosses the bisection:

  $$\text{channel load} = \frac{N/2}{4k^{n-1}} = \frac{k^n/2}{4k^{n-1}} = \frac{k}{8}$$

- Mesh has ½ the bisection bandwidth of the torus

Deriving Channel Load: 4-ary 2-cube
- Divide the network in half
- Number of bisection channels: 4N/k = 16 (8 bidirectional links = 16 channels)
- ½ of traffic crosses the bisection: N/2 = 8
- N/2 traffic distributed over the 16 channels
- Channel load = 8/16 = ½
  - Each node can inject at up to twice the channel bandwidth before the network saturates

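As a sanity check, the max_channel_load() sketch from the throughput example can be applied to a 4-ary 2-cube (the torus adjacency helper below is an assumption of this write-up):

```python
def torus_2d(k):
    """Adjacency of a k-ary 2-cube (2-D torus) with wrap-around links."""
    return {(x, y): [((x + 1) % k, y), ((x - 1) % k, y),
                     (x, (y + 1) % k), (x, (y - 1) % k)]
            for x in range(k) for y in range(k)}

# Reuses max_channel_load() from the earlier throughput-example sketch.
print(max_channel_load(torus_2d(4)))  # -> 0.5 (= k/8 for k = 4)
```
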
Torus Path Diversity
- 2 edge- and node-disjoint minimum paths between a pair
- In 2 dimensions*: NW, NE, SW, SE quadrant combinations
- *assuming a single direction for x and y

Mesh
- A torus with the end-around connections removed
- Same node degree
- Bisection channels halved
- Max channel load = k/4
- Higher demand for central channels
  - Load imbalance

Butterfly
- Indirect network
- K-ary n-fly: k^n network nodes
- Routing from 000 to 010
  - Destination address used to directly route the packet
  - Bit n used to select the output port at stage n
- (Figure: 2-ary 3-fly; 8 terminals, 0 through 7, on each side; 3 stages of four 2x2 switches labeled 00-03, 10-13, 20-23)
- 2-ary 3-fly: 2-input switches, 3 stages

Butterfly (2)
- No path diversity: R_x-y = 1 for every source/destination pair
- Can add extra stages for diversity
  - Increases network diameter
- (Figure: 2-ary 3-fly with an extra stage of switches, x0-x3, prepended to provide path diversity)

Butterfly (3)
- Hop count: log_k(N) + 1
- Does not exploit locality
  - Hop count is the same regardless of location
- Switch degree = 2k
- Requires long wires to implement

Butterfly: Channel Load
- H_min x N: channel demand
  - Number of channel traversals required to deliver one round of packets
- Channel load under uniform traffic, with C = k^n(n+1) total channels:

  $$\text{load} = \frac{N \cdot H_{min}}{C} = \frac{k^n(n+1)}{k^n(n+1)} = 1$$

  - Uniform traffic equally loads the channels
- Increases for adversarial traffic

Butterfly: Deriving Channel Load
- Divide the network in half
- Number of bisection channels: 4
- The 4 nodes in the top half send ½ of their traffic to the lower half: 4 x ½ = 2
- Distributed across the 2 channels (C) in each direction
- Channel load = 2/2 = 1
- (Figure: 2-ary 3-fly bisected into top and bottom halves)

Butterfly: Channel Load
- Adversarial traffic: all traffic from the top half is sent to the bottom half
  - E.g., 0 sends to 4, 1 sends to 5
- Channel load: 2
- Network saturates at ½ injection bandwidth
- (Figure: 2-ary 3-fly; the adversarial flows concentrate on the channels crossing the middle)

Clos Network
- 3-stage indirect network
  - Larger numbers of stages: built recursively by replacing the middle stage with a 3-stage Clos
- Characterized by the triple (m, n, r)
  - m: number of middle-stage switches
  - n: number of input/output ports on each input/output switch
  - r: number of input/output switches
- Hop count = 4

Clos Network
- (Figure: a 3-stage Clos with four n x m input switches, five r x r middle switches, and four m x n output switches; every input switch connects to every middle switch, and every middle switch to every output switch)

Clos Network
- Strictly non-blocking when m ≥ 2n - 1
  - Any idle input can always be connected to any idle output port
  - E.g., (m, n, r) = (5, 3, 4) qualifies, since 5 ≥ 2(3) - 1 = 5
- r x n terminal nodes
- Degree
  - First and last stages: n + m; middle stage: 2r
- Path diversity: m
- Can be folded along the middle switches
  - Input and output switches are shared

Folded Clos (Fat Tree)
- Bandwidth remains constant at each level
- Regular tree: bandwidth decreases closer to the root

Fat Tree (2)
- Provides path diversity

Application of Topologies to On-Chip Networks
- FBFly (flattened butterfly): converts the butterfly to a direct network
- Swizzle switch: circuit-optimized crossbar
- Rings: simple, low hardware cost
- Mesh networks: several products/prototypes
- MECS and bus-based networks: broadcast and multicast capabilities

Implementation
- Folding
  - Equalizes path lengths
  - Reduces max link length
  - Increases the length of other links
- (Figure: 4-node ring; a linear 0-1-2-3 layout has one long wrap-around link, while the folded layout interleaves the nodes so all links are short)

Concentration
- Don't need a 1:1 ratio of routers to cores
  - Ex: 4 cores concentrated to 1 router
- Can save area and power
- Increases network complexity
  - Concentrator must implement a policy for sharing injection bandwidth
- Can bottleneck during bursty communication

Implication of Abstract Metrics on Implementation
- Degree: useful proxy for router complexity
  - Increasing ports requires additional buffer queues, requestors to allocators, and ports on the crossbar
  - All contribute to critical path delay, area, and power
- Link complexity does not correlate with degree
  - Link complexity depends on link width
  - For a fixed number of wires, link complexity for a 2-port vs. 3-port router is the same

Implications (2)
- Hop count: useful proxy for overall latency and power
- Does not always correlate with latency
  - Depends heavily on router pipeline depth and link propagation
- Example (see the sketch below):
  - Network A: 2 hops, 5-stage router pipeline, 4-cycle link traversal
  - Network B: 3 hops, 1-stage router pipeline, 1-cycle link traversal
  - Hop count says A is better than B
  - But A has an 18-cycle latency vs. 6 cycles for B

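A minimal sketch of the per-hop cost model implied by this example (each hop pays the router pipeline plus the link traversal; the function name is illustrative):

```python
def zero_load_latency(hops, pipeline_stages, link_cycles):
    """Zero-load latency when every hop pays the full router pipeline
    plus the link traversal time."""
    return hops * (pipeline_stages + link_cycles)

print(zero_load_latency(2, 5, 4))  # Network A: 2 * (5 + 4) = 18 cycles
print(zero_load_latency(3, 1, 1))  # Network B: 3 * (1 + 1) = 6 cycles
```
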
Implications (3)
- Topologies typically trade off hop count and node degree
- Max channel load: useful proxy for network saturation and max power
  - Higher max channel load -> greater network congestion
  - Traffic pattern impacts max channel load
    - Representative traffic patterns are important
  - Max power: dynamic power is highest at peak switching activity and utilization in the network

Topology Summary
- First network design decision
- Critical impact on network latency and throughput
  - Hop count provides a first-order approximation of message latency
  - Bottleneck channels determine saturation throughput