Laundry Pipelining and Digital Circuits

 
 
 
ECE 352
Digital System Fundamentals
 
Pipelining
 
Laundry Pipelining Example
 
You could wait for a load to completely finish
before you start the next…
Time Units: 0 1 2 3 4 5 6 7 8
Green: done after 2
Purple: done after 4
Orange: done after 6
Blue: done after 8
 
Four loads in 8 time units
 
Laundry Pipelining Example
2-stage pipeline: start new washer load at same
time you put first load in dryer
Work on two loads at once, at different stages of completion
Time Units: 0 1 2 3 4 5
Green: done after 2
Purple: done after 3
Orange: done after 4
Blue: done after 5
 
Four loads in 5 time units
 
Laundry Pipelining Example
What does this mean?
If you wash your favorite shirt first, it is not done sooner
But you get approximately twice the amount of laundry done
in the same time (if doing many loads)
Time Units: 0 1 2 3 4 5
Green: done after 2
Purple: done after 3
Orange: done after 4
Blue: done after 5
Four loads in 5 time units
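
The two laundry timelines can be reproduced with a small simulation. This is only a sketch: it assumes, as the slides do, that washing and drying each take exactly 1 time unit, and the function name `finish_times` is made up for illustration.

```python
def finish_times(num_loads, pipelined):
    """Return the time unit at which each load comes out of the dryer.

    Assumes washing and drying each take 1 time unit (as in the slides).
    """
    WASH, DRY = 1, 1
    times = []
    washer_free = 0  # earliest time the washer can accept the next load
    dryer_free = 0   # earliest time the dryer can accept the next load
    for _ in range(num_loads):
        wash_start = washer_free
        wash_end = wash_start + WASH
        dry_start = max(wash_end, dryer_free)
        dry_end = dry_start + DRY
        dryer_free = dry_end
        # Pipelined: the washer is reused as soon as the load moves to the dryer.
        # Unpipelined: the next load waits until this one is completely done.
        washer_free = dry_start if pipelined else dry_end
        times.append(dry_end)
    return times

print(finish_times(4, pipelined=False))  # [2, 4, 6, 8]
print(finish_times(4, pipelined=True))   # [2, 3, 4, 5]
```

In both schedules the first load finishes at time 2, which is the "favorite shirt" observation: pipelining does not make any single load finish sooner.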
 
Terminology
 
Latency
The length of time it takes for a value ready at the input to propagate to a “result” at the output
Throughput
The rate at which “results” are produced
Laundry Example:
Unpipelined
Latency = 2 time units
Throughput = 1 result per 2 time units
Pipelined
Latency = 2 time units
Throughput = 1 result per time unit
 
Note: throughput ignores startup latency
(2 time units until first result produced)
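
To see why the steady-state throughput figure ignores startup latency, one can count the startup time explicitly and watch it wash out as the number of loads grows. A minimal sketch, assuming the first result appears after `latency` time units and each later result follows `period` time units after the previous one (both parameter names are illustrative):

```python
def effective_throughput(num_loads, latency=2, period=1):
    """Loads finished per time unit, counting the startup latency."""
    total_time = latency + (num_loads - 1) * period
    return num_loads / total_time

# Pipelined laundry: latency 2, one new result every 1 time unit.
for n in (1, 4, 100, 10_000):
    print(n, round(effective_throughput(n), 3))
```

For 4 loads this gives 4 / 5 = 0.8 results per time unit; for many loads it approaches the quoted 1 result per time unit. Calling it with `period=2` models the unpipelined case, which stays at 0.5.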
 
Pipelining Digital Circuits
 
Original circuit has lower throughput than desired
What can we do to increase it?
Increase throughput by still producing one result per clock cycle, but with a shorter $t_{min}$
How can we decrease $t_{min}$ and still accomplish the same amount of work?
 
Pipelining Digital Circuits
 
Insert registers to subdivide long-latency
combinational blocks into two (or more) stages
New circuit has a shorter critical path
Clock rate of modified circuit is limited by longest
combinational path between registers, so we want to
subdivide as evenly as possible (balance the stages)
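
One way to picture "balance the stages" is to treat the combinational logic as a chain of blocks and try each possible position for a single pipeline register. The sketch below is illustrative only: the block delays are hypothetical (not from the slides), and real logic is rarely a simple chain.

```python
def best_two_stage_split(block_delays):
    """Try every place to insert one pipeline register in a chain of
    combinational blocks. Return (split_index, longest_stage_delay) for
    the split whose slower stage is as fast as possible.
    """
    best = None
    for i in range(1, len(block_delays)):
        stage1 = sum(block_delays[:i])
        stage2 = sum(block_delays[i:])
        longest = max(stage1, stage2)  # the slower stage limits the clock
        if best is None or longest < best[1]:
            best = (i, longest)
    return best

# Hypothetical block delays in ns.
delays = [3, 5, 2, 6, 4]
print(best_two_stage_split(delays))  # (3, 10): stages of 10 ns and 10 ns
```

An unbalanced split such as index 1 (3 ns and 17 ns) would leave the clock limited by the 17 ns stage, so most of the benefit would be lost.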
 
 
 
Pipelining Effects
 
Original circuit produces 1 result per cycle

Pipelined with 2 evenly-balanced stages
Still produces 1 result / cycle

$t_{comb,pipe} = t_{comb,orig} / 2$

$t_{min,pipe} = t_{pd} + t_{comb,pipe} + t_s = t_{pd} + (t_{comb,orig} / 2) + t_s$

$t_{min,pipe} > t_{min,orig} / 2$

Pipelined circuit can be clocked faster, but not 2× faster!
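
The reason the speedup falls short of 2× can be written out in one step. This assumes the original circuit follows the same register-to-register timing model, $t_{min,orig} = t_{pd} + t_{comb,orig} + t_s$:

```latex
\begin{align*}
t_{min,orig} &= t_{pd} + t_{comb,orig} + t_s \\
t_{min,pipe} &= t_{pd} + \frac{t_{comb,orig}}{2} + t_s \\
t_{min,pipe} - \frac{t_{min,orig}}{2} &= \frac{t_{pd} + t_s}{2} > 0
\end{align*}
```

Only the combinational delay is halved; the register overhead $t_{pd} + t_s$ is paid in full by every stage.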
 
Pipelining Effects
 
Throughput is increased!
Produce a result once per cycle
$f_{max,pipe}$ is higher than $f_{max,orig}$
Latency is increased!
N stages, so latency is N cycles
$t_{min,pipe}$ is more than $t_{min,orig} / N$

Pipelining is only useful if we can take advantage of throughput increase and can tolerate latency increase
Need to be processing a sequence of data…
Diminishing returns as pipeline depth (N) increases
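
The diminishing returns can be tabulated directly. This sketch assumes the logic can be split into N perfectly balanced stages (rarely true in practice) and uses illustrative delay values of 22 ns of combinational logic, $t_{pd}$ = 4 ns, and $t_s$ = 1 ns; the function name is made up for the example.

```python
def t_min_pipe(n_stages, t_comb_orig, t_pd, t_s):
    """Minimum clock period in ns for n perfectly balanced stages."""
    return t_pd + t_comb_orig / n_stages + t_s

T_COMB, T_PD, T_S = 22.0, 4.0, 1.0  # ns; illustrative values
t_orig = t_min_pipe(1, T_COMB, T_PD, T_S)
for n in (1, 2, 4, 8, 16):
    t = t_min_pipe(n, T_COMB, T_PD, T_S)
    print(f"N={n:2d}  t_min={t:6.2f} ns  "
          f"speedup={t_orig / t:4.2f}x  latency={n * t:6.2f} ns")
```

With these values the clock period can never drop below $t_{pd} + t_s$ = 5 ns, so the speedup stays under 27 / 5 = 5.4× no matter how deep the pipeline, while the latency in nanoseconds keeps growing.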
 
Add Four Values: Non-Pipelined
 
For these delay values: $t_s$ = 1 ns, $t_{pd}$ = 4 ns, $t_{ADD1}$ = 10 ns, $t_{ADD2}$ = 12 ns

Calculate $t_{min}$
All paths in this circuit have the same delay
$t_{min} = t_{pd} + t_{ADD1} + t_{ADD2} + t_s = 4 + 10 + 12 + 1 = 27$ ns

Calculate latency
The time it takes for input values that are ready in their registers to propagate to output Y
min latency = 1 cycle × $t_p$ = 1 × $t_{min}$ = 27 ns

Throughput = 1 result per cycle
Max = 1 result / 27 ns = 37 M results / s
 
Add Four Values: Pipelined
 
Calculate $t_{min}$
Based on the longest path
$t_{min} = t_{pd} + \max(t_{ADD1}, t_{ADD2}) + t_s = 4 + \max(10, 12) + 1 = 17$ ns

Calculate latency
Same idea, but remember that the length of each pipeline stage is dictated by the same clock!
min latency = 2 cycles × $t_p$ = 2 × $t_{min}$ = 34 ns

Throughput = 1 result per cycle
Max = 1 result / 17 ns = 59 M results / s
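
The arithmetic for both versions of the adder circuit can be checked with a few lines of Python. The delay values are the ones given on the slides; the helper name `summarize` and its list-of-stage-delays interface are assumptions made for this sketch.

```python
T_S, T_PD, T_ADD1, T_ADD2 = 1, 4, 10, 12  # ns, delay values from the slides

def summarize(stage_delays):
    """stage_delays: combinational delay in ns of each pipeline stage.

    Returns (t_min in ns, minimum latency in ns, max throughput in
    millions of results per second).
    """
    t_min = T_PD + max(stage_delays) + T_S   # clock is set by the slowest stage
    latency = len(stage_delays) * t_min      # every stage takes one full cycle
    throughput_m = 1e3 / t_min               # 1 result per t_min ns, in M results/s
    return t_min, latency, throughput_m

print(summarize([T_ADD1 + T_ADD2]))  # non-pipelined: a single 22 ns stage
print(summarize([T_ADD1, T_ADD2]))   # pipelined: 10 ns and 12 ns stages
```

The pipelined latency is 2 × 17 = 34 ns rather than 4 + 10 + 12 + 1 = 27 ns because the 10 ns stage still occupies a full 17 ns clock cycle.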
 
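Comparison

Delay values: $t_s$ = 1 ns, $t_{pd}$ = 4 ns, $t_{ADD1}$ = 10 ns, $t_{ADD2}$ = 12 ns

                   Non-Pipelined           Pipelined
$t_{min}$          27 ns                   17 ns
Max Throughput     1 result / cycle        1 result / cycle
                   = 37 M results / s      = 59 M results / s
Minimum Latency    1 cycle = 27 ns         2 cycles = 34 ns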
 
Pipelining Summary
 
Technique that can increase frequency and
throughput at the expense of latency and area
If adding pipeline stages, we need to evaluate:
Is latency, throughput, or area most important for how
that particular circuit will be used?
Where should pipeline registers be added?
Clock speed depends on the longest path…
Limited by flip-flop $t_s$ and $t_{pd}$ (diminishing returns)
There are tricks we can use to mitigate this, but they are beyond the scope of the class…
 
 
 
Slide Note

In this presentation, we will introduce the concept of pipelining. Pipelining is a technique we often use to improve the performance of data processing machines. But before we look at what pipelining means in digital circuits, let’s first look at a more familiar example where you are probably already using this idea.


Explore the concept of pipelining through a laundry example and digital circuits. Learn how pipelining improves throughput at the cost of increased latency, and why it pays off only when a sequence of data is being processed.


Uploaded on Oct 11, 2024 | 0 Views



