Performance Analysis of Different MIPS Processors

 
SOLUTIONS CHAPTER 1
 
EXERCISE 1.5
 
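Consider two different implementations, P1 and P2, of the same
instruction set. There are five classes of instructions (A, B, C, D,
and E) in the instruction set. The clock rate and CPI of each
class are given below.

        Processor  Clock rate  CPI A  CPI B  CPI C  CPI D  CPI E
   a.   P1         2.0 GHz     1      2      3      4      3
        P2         4.0 GHz     2      2      2      4      4
   b.   P1         2.0 GHz     1      1      2      3      2
        P2         3.0 GHz     1      2      3      4      3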
 
1.5.1
  Assume that peak performance is defined as the
fastest rate that a computer can execute any instruction
sequence. What are the peak performances of P1 and
P2 expressed in instructions per second?
 
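Solution:
   Peak rate: $\text{peak} = \text{clock rate}/\text{CPI}_{min}$, where $\text{CPI}_{min}$ is the lowest CPI of any class.
   a. P1: $2.0\times10^9/1 = 2\times10^9$ inst/s, P2: $4.0\times10^9/2 = 2\times10^9$ inst/s
   b. P1: $2.0\times10^9/1 = 2\times10^9$ inst/s, P2: $3.0\times10^9/1 = 3\times10^9$ inst/s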
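The peak-rate rule above can be sketched in Python. This is a minimal illustration, assuming (as the solution does) that the peak is reached by a sequence made only of the class with the lowest CPI; the helper name `peak_ips` is hypothetical and the CPI lists are copied from the Exercise 1.5 table.

```python
def peak_ips(clock_hz, class_cpis):
    """Peak instructions/second: issue only the class with the lowest CPI."""
    return clock_hz / min(class_cpis)

# (clock rate in Hz, CPI of classes A..E) from the Exercise 1.5 table
machines = {
    "a. P1": (2.0e9, [1, 2, 3, 4, 3]),
    "a. P2": (4.0e9, [2, 2, 2, 4, 4]),
    "b. P1": (2.0e9, [1, 1, 2, 3, 2]),
    "b. P2": (3.0e9, [1, 2, 3, 4, 3]),
}

for name, (clock, cpis) in machines.items():
    print(f"{name}: {peak_ips(clock, cpis):.1e} inst/s")
```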
 
1.5.2
  If the number of instructions executed in a
certain program is divided equally among the
classes of instructions except for class A, which
occurs twice as often as each of the others, which
computer is faster? How much faster is it?
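Solution:
   With the clock rate in GHz, each time below is in ns for one copy of the six-instruction mix (class A counted twice).
   a. $T(P1)/T(P2) = \frac{(1\cdot2+2+3+4+3)/2}{(2\cdot2+2+2+4+4)/4} = \frac{14\cdot4}{16\cdot2} = 7/4$
       P2 is 1.75 times faster than P1
   b. $T(P2)/T(P1) = \frac{(1\cdot2+2+3+4+3)/3}{(1\cdot2+1+2+3+2)/2} = 4.67/5$
       P2 is 1.07 times faster than P1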
 
1.5.3
  If the number of instructions executed in a
certain program is divided equally among the
classes of instructions except for class E, which
occurs twice as often as each of the others, which
computer is faster? How much faster is it?
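Solution:
   a. $T(P2)/T(P1) = \frac{(2+2+2+4+4\cdot2)/4}{(1+2+3+4+3\cdot2)/2} = 4.5/8$
       P2 is 1.78 times faster than P1
   b. $T(P2)/T(P1) = \frac{(1+2+3+4+3\cdot2)/3}{(1+1+2+3+2\cdot2)/2} = 5.33/5.5$
       P2 is 1.03 times faster than P1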
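The weighted-mix ratios in 1.5.2 and 1.5.3 can be checked with a short sketch. It assumes the mix is expressed as relative weights per class (class A or E counted twice), and `time_per_mix` is a hypothetical helper; with the clock in GHz, the result is in nanoseconds per copy of the mix.

```python
def time_per_mix(clock_ghz, class_cpis, weights):
    """Nanoseconds to run one copy of the weighted instruction mix."""
    cycles = sum(cpi * w for cpi, w in zip(class_cpis, weights))
    return cycles / clock_ghz

# (clock in GHz, CPI of classes A..E) from the Exercise 1.5 table
cases = {
    "a": ((2.0, [1, 2, 3, 4, 3]), (4.0, [2, 2, 2, 4, 4])),
    "b": ((2.0, [1, 1, 2, 3, 2]), (3.0, [1, 2, 3, 4, 3])),
}
mixes = {
    "1.5.2 (A twice)": [2, 1, 1, 1, 1],
    "1.5.3 (E twice)": [1, 1, 1, 1, 2],
}

for mix_name, weights in mixes.items():
    for case, (p1, p2) in cases.items():
        t1 = time_per_mix(p1[0], p1[1], weights)
        t2 = time_per_mix(p2[0], p2[1], weights)
        print(f"{mix_name} {case}: T(P1)/T(P2) = {t1 / t2:.2f}")
```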
 
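The table below shows the instruction-type
breakdown for two programs. Using this
data, you will explore the performance
trade-offs of different changes made to a MIPS
processor.

   No. instructions   Compute  Load  Store  Branch  Total
   a. program 1       600      600   200    50      1450
   b. program 2       900      500   100    200     1700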
 
 
1.5.4
  Assuming that ALU instructions take 1 cycle, load and
store instructions take 10 cycles, and branches
take 3 cycles, find the execution time on a 3 GHz
MIPS processor.
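Solution:
   a. $(600 + 600\cdot10 + 200\cdot10 + 50\cdot3)/(3\times10^9) = 8750/(3\times10^9) = 2.92\ \mu s$
   b. $(900 + 500\cdot10 + 100\cdot10 + 200\cdot3)/(3\times10^9) = 7500/(3\times10^9) = 2.50\ \mu s$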
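The execution-time calculation in 1.5.4 is a cycle count divided by the clock rate. A minimal sketch, assuming the instruction counts from the program table above; `exec_time_us` is a hypothetical helper name.

```python
def exec_time_us(counts, cycles_per_class, clock_hz):
    """Execution time in microseconds = total cycles / clock rate."""
    total_cycles = sum(n * cycles_per_class[cls] for cls, n in counts.items())
    return total_cycles / clock_hz * 1e6

# Instruction counts from the program table (a = program 1, b = program 2)
program1 = {"compute": 600, "load": 600, "store": 200, "branch": 50}
program2 = {"compute": 900, "load": 500, "store": 100, "branch": 200}
# 1.5.4 costs: ALU 1 cycle, loads/stores 10 cycles, branches 3 cycles
costs_154 = {"compute": 1, "load": 10, "store": 10, "branch": 3}

for name, prog in (("a", program1), ("b", program2)):
    print(f"{name}. {exec_time_us(prog, costs_154, 3e9):.2f} us")
```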
 
1.5.5
  Assuming that compute instructions take 1 cycle,
load and store instructions take 2 cycles, and
branches take 3 cycles, find the execution time on
a 3 GHz MIPS processor.
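Solution:
   a. $(600 + 800\cdot2 + 50\cdot3)/(3\times10^9) = 2350/(3\times10^9) = 0.78\ \mu s$
   b. $(900 + 600\cdot2 + 200\cdot3)/(3\times10^9) = 2700/(3\times10^9) = 0.90\ \mu s$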
 
1.5.6
  Assuming that compute instructions take 1 cycle,
load and store instructions take 2 cycles, and
branches take 3 cycles, what is the speedup if the
number of compute instructions can be reduced by
one-half?
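Solution:
   a. New cycle count $= 300 + 800\cdot2 + 50\cdot3 = 2050$, so $T = 2050/(3\times10^9) = 0.68\ \mu s$
       Speedup $= 2350/2050 = 1.15$
   b. New cycle count $= 450 + 600\cdot2 + 200\cdot3 = 2250$, so $T = 2250/(3\times10^9) = 0.75\ \mu s$
       Speedup $= 2700/2250 = 1.20$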
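The same cycle-count model covers 1.5.5 and 1.5.6; halving the compute count changes only one term, which is why the speedup stays modest. A sketch under the problem's stated costs (compute 1 cycle, loads/stores 2, branches 3); the helper names are hypothetical.

```python
def total_cycles(counts, cycles_per_class):
    return sum(n * cycles_per_class[cls] for cls, n in counts.items())

def halve_compute(counts):
    """Copy of the instruction counts with half as many compute instructions."""
    return {**counts, "compute": counts["compute"] / 2}

program1 = {"compute": 600, "load": 600, "store": 200, "branch": 50}
program2 = {"compute": 900, "load": 500, "store": 100, "branch": 200}
costs = {"compute": 1, "load": 2, "store": 2, "branch": 3}

for name, prog in (("a", program1), ("b", program2)):
    before = total_cycles(prog, costs)
    after = total_cycles(halve_compute(prog), costs)
    print(f"{name}. T = {before / 3e9 * 1e6:.2f} us, "
          f"speedup = {before / after:.2f}")
```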
 
EXERCISE 1.6
 
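Compilers can have a profound impact on the
performance of an application on a given processor. This
problem will explore the impact compilers have on
execution time.

        Compiler A                   Compiler B
        No. instr.   Exec. time      No. instr.   Exec. time
   a.   1.00E+09     1.8 s           1.20E+09     1.8 s
   b.   1.00E+09     1.1 s           1.20E+09     1.5 s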
 
1.6.1
  For the same program, two different
compilers are used. The table above shows the
execution time of the two different compiled
programs. Find the average CPI for each
program given that the processor has a clock
cycle time of 1 ns.
Solution:
   $\text{CPI} = T_{exec} \times f/\text{No. Instr.}$, e.g. $1.8\,\text{s}/(1\,\text{ns} \times 1.00\times10^9) = 1.8$
   a. CPI(Compiler A)=1.8; CPI(Compiler B)=1.5.
   b. CPI(Compiler A)=1.1; CPI(Compiler B)=1.25.
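A small sketch of the CPI formula from 1.6.1, using the instruction counts and execution times from the Exercise 1.6 table and the stated 1 ns cycle time; `avg_cpi` is a hypothetical helper.

```python
def avg_cpi(exec_time_s, cycle_time_s, n_instr):
    """CPI = execution time / (cycle time * instruction count)."""
    return exec_time_s / (cycle_time_s * n_instr)

CYCLE_TIME = 1e-9  # 1 ns clock cycle
# (instruction count, execution time in s) per compiler
table = {
    "a": {"A": (1.00e9, 1.8), "B": (1.20e9, 1.8)},
    "b": {"A": (1.00e9, 1.1), "B": (1.20e9, 1.5)},
}

for case, compilers in table.items():
    for comp, (n, t) in compilers.items():
        print(f"{case}. CPI(Compiler {comp}) = {avg_cpi(t, CYCLE_TIME, n):.2f}")
```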
 
 
 
 
1.6.2
  Assume the average CPIs found in 1.6.1, but that the
compiled programs run on two different processors. If the
execution times on the two processors are the same, how
much faster is the clock of the processor running compiler
A’s code versus the clock of the processor running
compiler B’s code?
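Solution:
   Both programs take the same time and $T = \text{No. Instr.} \times \text{CPI}/f$, so
   $f_A/f_B = (\text{No. Instr.}(A) \times \text{CPI}(A))/(\text{No. Instr.}(B) \times \text{CPI}(B))$
   a. $f_A/f_B = (1 \times 1.8)/(1.2 \times 1.5) = 1$
   b. $f_A/f_B = (1 \times 1.1)/(1.2 \times 1.25) = 0.73$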
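Because the two runs take the same time, the clock ratio is just the ratio of total cycles. A sketch with the CPIs from 1.6.1; the function name is hypothetical.

```python
def clock_ratio(n_a, cpi_a, n_b, cpi_b):
    """f_A / f_B that makes N_A*CPI_A/f_A equal N_B*CPI_B/f_B."""
    return (n_a * cpi_a) / (n_b * cpi_b)

print(f"a. f_A/f_B = {clock_ratio(1.0e9, 1.8, 1.2e9, 1.5):.2f}")   # 1.00
print(f"b. f_A/f_B = {clock_ratio(1.0e9, 1.1, 1.2e9, 1.25):.2f}")  # 0.73
```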
 
1.6.3
  A new compiler is developed that uses only
600 million instructions and has an average CPI
of 1.1. What is the speedup of using this new
compiler versus using Compiler A or B on the
original processor of 1.6.1?
Solution:
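   $T_{new} = 0.6\times10^9 \times 1.1 \times 1\,\text{ns} = 0.66$ s
   a. $T_{new}/T_A = (0.6 \times 1.1)/(1 \times 1.8) = 0.37$, $T_{new}/T_B = (0.6 \times 1.1)/(1.2 \times 1.5) = 0.37$
       Speedup versus either compiler: $1.8/0.66 = 2.73$
   b. $T_{new}/T_A = 0.66/1.1 = 0.6$, $T_{new}/T_B = 0.66/1.5 = 0.44$
       Speedup versus Compiler A: 1.67; versus Compiler B: 2.27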
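The new-compiler comparison can be sketched as below; speedup is the old time divided by the new time. It assumes the 1 ns cycle time from 1.6.1 and the CPIs computed there; `cpu_time` is a hypothetical helper.

```python
def cpu_time(n_instr, cpi, cycle_time_s=1e-9):
    return n_instr * cpi * cycle_time_s

t_new = cpu_time(0.6e9, 1.1)  # new compiler: 600 million instructions, CPI 1.1

# (instruction count, CPI) per compiler, CPIs from 1.6.1
old = {
    "a": {"A": (1.0e9, 1.8), "B": (1.2e9, 1.5)},
    "b": {"A": (1.0e9, 1.1), "B": (1.2e9, 1.25)},
}

for case, compilers in old.items():
    for comp, (n, cpi) in compilers.items():
        t_old = cpu_time(n, cpi)
        print(f"{case}. T_new/T_{comp} = {t_new / t_old:.2f}, "
              f"speedup = {t_old / t_new:.2f}")
```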
 
Consider two different implementations, P1 and
P2, of the same instruction set. There are five
classes of instructions (A, B, C, D, and E) in the
instruction set. P1 has a clock rate of 4 GHz, and
P2 has a clock rate of 6 GHz. The average number
of cycles for each instruction class for P1 and P2
is listed in the following table.

        Processor  CPI A  CPI B  CPI C  CPI D  CPI E
   a.   P1         1      2      3      4      5
        P2         3      3      3      5      5
   b.   P1         1      2      3      4      5
        P2         2      2      2      2      6
 
1.6.4
  Assume that peak performance is defined
as the fastest rate that a computer can execute
any instruction sequence. What are the peak
performances of P1 and P2 expressed in
instructions per second?
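Solution:
   a. P1: $4\times10^9/1 = 4\times10^9$ inst/s, P2: $6\times10^9/3 = 2\times10^9$ inst/s
   b. P1: $4\times10^9/1 = 4\times10^9$ inst/s, P2: $6\times10^9/2 = 3\times10^9$ inst/s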
 
1.6.5
  If the number of instructions executed in a
certain program is divided equally among the five
classes of instructions except for class A, which
occurs twice as often as each of the others, how
much faster is P2 than P1?
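Solution:
   Times below are in ns per six-instruction mix (class A counted twice).
   a. $T_1 = (1\cdot2+2+3+4+5)/4 = 4$, $T_2 = (3\cdot2+3+3+5+5)/6 = 3.67$, so $T_1/T_2 = 1.09$
   b. $T_1 = (1\cdot2+2+3+4+5)/4 = 4$, $T_2 = (2\cdot2+2+2+2+6)/6 = 2.67$, so $T_1/T_2 = 1.5$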
 
1.6.6
  At what frequency does P1 have the same
performance as P2 for the instruction mix given
in 1.6.5?
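Solution:
   P1 needs 16 cycles per mix, so it matches P2 when $f = 16/T_2$.
   a. $f = 16/(22/6\ \text{ns}) = 4.36$ GHz
   b. $f = 16/(16/6\ \text{ns}) = 6$ GHz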
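Parts 1.6.4 through 1.6.6 all reduce to cycle counts over the P1/P2 CPI table. The sketch below assumes the class-A-doubled mix of 1.6.5 and the stated 4 GHz and 6 GHz clocks; the helper name is hypothetical. The equal-performance clock is P1's cycles per mix divided by P2's time per mix.

```python
def cycles_per_mix(class_cpis, weights):
    return sum(c * w for c, w in zip(class_cpis, weights))

F1_GHZ, F2_GHZ = 4.0, 6.0
MIX = [2, 1, 1, 1, 1]  # class A twice as often as each other class
cases = {
    "a": ([1, 2, 3, 4, 5], [3, 3, 3, 5, 5]),
    "b": ([1, 2, 3, 4, 5], [2, 2, 2, 2, 6]),
}

for case, (cpi1, cpi2) in cases.items():
    peak1, peak2 = F1_GHZ / min(cpi1), F2_GHZ / min(cpi2)  # 1.6.4, 10^9 inst/s
    c1, c2 = cycles_per_mix(cpi1, MIX), cycles_per_mix(cpi2, MIX)
    t1, t2 = c1 / F1_GHZ, c2 / F2_GHZ                      # ns per mix
    f_equal = c1 / t2                                      # 1.6.6, in GHz
    print(f"{case}. peak {peak1:.0f}e9 / {peak2:.0f}e9 inst/s, "
          f"T1/T2 = {t1 / t2:.2f}, P1 needs {f_equal:.2f} GHz")
```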
 
EXERCISE 1.14
 
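Section 1.8 cites as a pitfall the use of a
subset of the performance equation as a
performance metric. To illustrate this, consider
the following data for the execution of a program
on different processors.

        Processor  Clock rate  CPI   No. instr.
   a.   P1         4 GHz       0.9   5.00E+06
        P2         3 GHz       0.75  1.00E+06
   b.   P1         3 GHz       1.1   3.00E+06
        P2         2.5 GHz     1.0   0.50E+06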
 
1.14.1
  One usual fallacy is to consider the
computer with the largest clock rate as having
the largest performance. Check if this is true for
P1 and P2.
Solution:
     $T = \text{No. Instr.} \times \text{CPI}/\text{clock rate}$
     a. $T(P1) = 5\times10^6 \times 0.9/(4\times10^9) = 1.125\times10^{-3}$ s
        $T(P2) = 10^6 \times 0.75/(3\times10^9) = 0.25\times10^{-3}$ s
        clock rate (P1) > clock rate (P2), but performance (P1) < performance (P2)
 
     b. $T(P1) = 3\times10^6 \times 1.1/(3\times10^9) = 1.1\times10^{-3}$ s
        $T(P2) = 0.5\times10^6 \times 1.0/(2.5\times10^9) = 0.2\times10^{-3}$ s
        clock rate (P1) > clock rate (P2), but performance (P1) < performance (P2)
     In both cases the processor with the higher clock rate is the slower one, so the claim does not hold.
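The check in 1.14.1 compares CPU time, not clock rate. A minimal sketch using the Exercise 1.14 data (the CPI of 1.0 for P2 in case b is the value the solution uses); `cpu_time` is a hypothetical helper.

```python
def cpu_time(n_instr, cpi, clock_hz):
    """CPU time = instruction count * CPI / clock rate."""
    return n_instr * cpi / clock_hz

# (instruction count, CPI, clock rate in Hz) from the Exercise 1.14 table
data = {
    "a": {"P1": (5.0e6, 0.9, 4.0e9), "P2": (1.0e6, 0.75, 3.0e9)},
    "b": {"P1": (3.0e6, 1.1, 3.0e9), "P2": (0.5e6, 1.0, 2.5e9)},
}

for case, procs in data.items():
    times = {p: cpu_time(*v) for p, v in procs.items()}
    top_clock = max(procs, key=lambda p: procs[p][2])
    fastest = min(times, key=times.get)
    print(f"{case}. highest clock: {top_clock}, fastest run: {fastest}")
```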
 
1.14.2
  Another fallacy is to consider that the
processor executing the largest number of instructions
will need a larger CPU time. Considering that
processor P1 is executing a sequence of $10^6$
instructions and that the CPIs of processors P1 and P2
do not change, determine the number of instructions
that P2 can execute in the same time that P1 needs to
execute $10^6$ instructions.
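Solution:
    a. $10^6$ instructions: $T(P1) = \text{No. Instr.} \times \text{CPI}/\text{clock rate} = 10^6 \times 0.9/(4\times10^9)$
        $T(P1) = 2.25\times10^{-4}$ s
        $T(P2) = N \times 0.75/(3\times10^9) = 2.25\times10^{-4}$ s, so $N = 9\times10^5$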
 
    b. $10^6$ instructions: $T(P1) = 10^6 \times 1.1/(3\times10^9)$
        $T(P1) = 3.67\times10^{-4}$ s
        $T(P2) = N \times 1.0/(2.5\times10^9) = 3.67\times10^{-4}$ s, so $N = 9.17\times10^5$
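Inverting the CPU-time equation gives the instruction count P2 can finish in P1's time, $N = T \times f/\text{CPI}$. A sketch with the Exercise 1.14 CPIs and clock rates; `instructions_in` is a hypothetical name.

```python
def instructions_in(time_s, cpi, clock_hz):
    """Instructions completed in time_s at the given CPI and clock rate."""
    return time_s * clock_hz / cpi

# (CPI, clock rate in Hz) for P1 and P2, from the Exercise 1.14 table
cases = {
    "a": ((0.9, 4.0e9), (0.75, 3.0e9)),
    "b": ((1.1, 3.0e9), (1.0, 2.5e9)),
}

for case, ((cpi1, f1), (cpi2, f2)) in cases.items():
    t_p1 = 1e6 * cpi1 / f1  # time P1 needs for 10^6 instructions
    n_p2 = instructions_in(t_p1, cpi2, f2)
    print(f"{case}. T(P1) = {t_p1:.3e} s, N(P2) = {n_p2:.3e}")
```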
 
1.14.3
  A common fallacy is to use MIPS (millions
of instructions per second) to compare the
performance of two different processors, and
consider that the processor with the largest MIPS
has the largest performance. Check if this is true
for P1 and P2.
Solution:
      $\text{MIPS} = \text{clock rate} \times 10^{-6}/\text{CPI}$
     a. $\text{MIPS}(P1) = 4\times10^9 \times 10^{-6}/0.9 = 4.44\times10^3$
         $\text{MIPS}(P2) = 3\times10^9 \times 10^{-6}/0.75 = 4.0\times10^3$
         MIPS(P1) > MIPS(P2), but performance(P1) < performance(P2) (from 1.14.1)
 
     b. $\text{MIPS}(P1) = 3\times10^9 \times 10^{-6}/1.1 = 2.73\times10^3$
         $\text{MIPS}(P2) = 2.5\times10^9 \times 10^{-6}/1.0 = 2.5\times10^3$
         MIPS(P1) > MIPS(P2), but performance(P1) < performance(P2) (from 1.14.1)
     In both cases the processor with the larger MIPS rating is the slower one.
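The MIPS fallacy shows up when the MIPS ranking and the CPU-time ranking disagree. A sketch with the Exercise 1.14 data; the function names are hypothetical.

```python
def mips(clock_hz, cpi):
    return clock_hz * 1e-6 / cpi

def cpu_time(n_instr, cpi, clock_hz):
    return n_instr * cpi / clock_hz

# (instruction count, CPI, clock rate in Hz) from the Exercise 1.14 table
data = {
    "a": {"P1": (5.0e6, 0.9, 4.0e9), "P2": (1.0e6, 0.75, 3.0e9)},
    "b": {"P1": (3.0e6, 1.1, 3.0e9), "P2": (0.5e6, 1.0, 2.5e9)},
}

for case, procs in data.items():
    by_mips = max(procs, key=lambda p: mips(procs[p][2], procs[p][1]))
    by_time = min(procs, key=lambda p: cpu_time(*procs[p]))
    print(f"{case}. MIPS favours {by_mips}, CPU time favours {by_time}")
```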
 
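Another common performance figure is
MFLOPS (millions of floating-point operations per
second), defined as
   $\text{MFLOPS} = \text{No. FP operations}/(\text{execution time} \times 10^6)$
but this figure has the same problems as MIPS.
Consider the program in the following table, running
on the two processors below. Percentages are fractions of the instruction count.

        Processor  Instr. count  L/S  FP   Branch  CPI L/S  CPI FP  CPI Branch  Clock rate
   a.   P1         1.00E+06      50%  40%  10%     0.75     1.0     1.5         4 GHz
        P2         5.00E+06      40%  40%  20%     1.25     0.8     1.25        3 GHz
   b.   P1         5.00E+06      30%  30%  40%     1.5      1.0     2.0         4 GHz
        P2         2.00E+06      40%  30%  30%     1.25     1.0     2.5         3 GHz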
 
1.14.4
  Find the MFLOPS figures for the programs.
$\text{MFLOPS} = \text{No. FP operations} \times 10^{-6}/T$
   a:
$T(P1) = (5\times10^5 \times 0.75 + 4\times10^5 \times 1.0 + 1\times10^5 \times 1.5)/(4\times10^9) = 2.3125\times10^{-4}$ s
$\text{MFLOPS}(P1) = 4\times10^5 \times 10^{-6}/(2.3125\times10^{-4}) = 1.73\times10^3$
$T(P2) = (2\times10^6 \times 1.25 + 2\times10^6 \times 0.8 + 1\times10^6 \times 1.25)/(3\times10^9) = 1.78\times10^{-3}$ s
$\text{MFLOPS}(P2) = 2\times10^6 \times 10^{-6}/(1.78\times10^{-3}) = 1.12\times10^3$
 
   b:
$T(P1) = (1.5\times10^6 \times 1.5 + 1.5\times10^6 \times 1.0 + 2\times10^6 \times 2.0)/(4\times10^9) = 1.94\times10^{-3}$ s
$\text{MFLOPS}(P1) = 1.5\times10^6 \times 10^{-6}/(1.94\times10^{-3}) = 7.7\times10^2$
$T(P2) = (0.8\times10^6 \times 1.25 + 0.6\times10^6 \times 1.0 + 0.6\times10^6 \times 2.5)/(3\times10^9) = 1.03\times10^{-3}$ s
$\text{MFLOPS}(P2) = 0.6\times10^6 \times 10^{-6}/(1.03\times10^{-3}) = 5.8\times10^2$
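The MFLOPS calculation needs the total cycle count and the FP operation count. A sketch assuming, as the solutions do, that each FP instruction is one FP operation and that the table percentages are fractions of the instruction count; the helper names are hypothetical.

```python
def run_stats(count, mix, cpi, clock_hz):
    """mix and cpi are (L/S, FP, branch) tuples. Returns (time in s, FP ops)."""
    per_class = [count * frac for frac in mix]
    cycles = sum(n * c for n, c in zip(per_class, cpi))
    return cycles / clock_hz, per_class[1]

def mflops(fp_ops, time_s):
    return fp_ops * 1e-6 / time_s

# (instruction count, mix, CPIs, clock rate in Hz) from the 1.14.4 table
table = {
    "a. P1": (1.0e6, (0.5, 0.4, 0.1), (0.75, 1.0, 1.5), 4e9),
    "a. P2": (5.0e6, (0.4, 0.4, 0.2), (1.25, 0.8, 1.25), 3e9),
    "b. P1": (5.0e6, (0.3, 0.3, 0.4), (1.5, 1.0, 2.0), 4e9),
    "b. P2": (2.0e6, (0.4, 0.3, 0.3), (1.25, 1.0, 2.5), 3e9),
}

for name, row in table.items():
    t, fp_ops = run_stats(*row)
    print(f"{name}: T = {t:.4e} s, MFLOPS = {mflops(fp_ops, t):.3g}")
```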
 
 
1.14.5
  Find the MIPS figures for the programs.
   a:
$T(P1) = (5\times10^5 \times 0.75 + 4\times10^5 \times 1.0 + 1\times10^5 \times 1.5)/(4\times10^9) = 2.3125\times10^{-4}$ s
$\text{CPI}(P1) = 2.3125\times10^{-4} \times 4\times10^9/10^6 = 0.925$
$\text{MIPS}(P1) = 4\times10^9/(0.925\times10^6) = 4.32\times10^3$
$T(P2) = (2\times10^6 \times 1.25 + 2\times10^6 \times 0.8 + 1\times10^6 \times 1.25)/(3\times10^9) = 1.783\times10^{-3}$ s
$\text{CPI}(P2) = 1.783\times10^{-3} \times 3\times10^9/(5\times10^6) = 1.07$
$\text{MIPS}(P2) = 3\times10^9/(1.07\times10^6) = 2.80\times10^3$
 
 
   b:
$T(P1) = (1.5\times10^6 \times 1.5 + 1.5\times10^6 \times 1.0 + 2\times10^6 \times 2.0)/(4\times10^9) = 1.9375\times10^{-3}$ s
$\text{CPI}(P1) = 1.9375\times10^{-3} \times 4\times10^9/(5\times10^6) = 1.55$
$\text{MIPS}(P1) = 4\times10^9/(1.55\times10^6) = 2.58\times10^3$
$T(P2) = (0.8\times10^6 \times 1.25 + 0.6\times10^6 \times 1.0 + 0.6\times10^6 \times 2.5)/(3\times10^9) = 1.033\times10^{-3}$ s
$\text{CPI}(P2) = 1.033\times10^{-3} \times 3\times10^9/(2\times10^6) = 1.55$
$\text{MIPS}(P2) = 3\times10^9/(1.55\times10^6) = 1.94\times10^3$
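MIPS for a run follows from the cycle count: CPI is cycles per instruction, and $\text{MIPS} = \text{clock rate}/(\text{CPI} \times 10^6)$. A sketch reusing the 1.14.4 table values; the helper name is hypothetical.

```python
def cpi_and_mips(count, mix, cpi, clock_hz):
    """mix and cpi are (L/S, FP, branch) tuples. Returns (average CPI, MIPS)."""
    cycles = sum(count * frac * c for frac, c in zip(mix, cpi))
    avg_cpi = cycles / count
    return avg_cpi, clock_hz / (avg_cpi * 1e6)

# (instruction count, mix, CPIs, clock rate in Hz) from the 1.14.4 table
table = {
    "a. P1": (1.0e6, (0.5, 0.4, 0.1), (0.75, 1.0, 1.5), 4e9),
    "a. P2": (5.0e6, (0.4, 0.4, 0.2), (1.25, 0.8, 1.25), 3e9),
    "b. P1": (5.0e6, (0.3, 0.3, 0.4), (1.5, 1.0, 2.0), 4e9),
    "b. P2": (2.0e6, (0.4, 0.3, 0.3), (1.25, 1.0, 2.5), 3e9),
}

for name, row in table.items():
    avg_cpi, m = cpi_and_mips(*row)
    print(f"{name}: CPI = {avg_cpi:.3f}, MIPS = {m:.3g}")
```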
 
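1.14.6
  Find the performance for the programs
and compare it with MIPS and MFLOPS.
   a:
$T(P1) = 9.25\times10^5\ \text{cycles}/(4\times10^9\ \text{Hz}) = 2.3125\times10^{-4}$ s
$\text{performance}(P1) = 1/T(P1) = 4.32\times10^3$
$T(P2) = 5.35\times10^6\ \text{cycles}/(3\times10^9\ \text{Hz}) = 1.78\times10^{-3}$ s
$\text{performance}(P2) = 1/T(P2) = 5.6\times10^2$
perf(P1) > perf(P2),
   MIPS(P1) > MIPS(P2) ($4.32\times10^3$ vs. $2.80\times10^3$), MFLOPS(P1) > MFLOPS(P2) ($1.73\times10^3$ vs. $1.12\times10^3$)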
 
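   b:
$T(P1) = 7.75\times10^6\ \text{cycles}/(4\times10^9\ \text{Hz}) = 1.94\times10^{-3}$ s
$\text{performance}(P1) = 1/T(P1) = 5.2\times10^2$
$T(P2) = 3.1\times10^6\ \text{cycles}/(3\times10^9\ \text{Hz}) = 1.03\times10^{-3}$ s
$\text{performance}(P2) = 1/T(P2) = 9.7\times10^2$
perf(P1) < perf(P2),
   MIPS(P1) > MIPS(P2) ($2.58\times10^3$ vs. $1.94\times10^3$), MFLOPS(P1) > MFLOPS(P2) ($7.7\times10^2$ vs. $5.8\times10^2$)
In b, both MIPS and MFLOPS rank P1 first even though P2 runs the program faster.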
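Putting the three metrics side by side makes the disagreement in case b explicit. A sketch assuming the 1.14.4 table values and one FP operation per FP instruction; `metrics` is a hypothetical helper returning performance (1/T), MIPS, and MFLOPS.

```python
def metrics(count, mix, cpi, clock_hz):
    """Return (performance = 1/T, MIPS, MFLOPS) for one run."""
    cycles = sum(count * frac * c for frac, c in zip(mix, cpi))
    t = cycles / clock_hz
    fp_ops = count * mix[1]  # mix order: (L/S, FP, branch)
    return 1 / t, count / (t * 1e6), fp_ops / (t * 1e6)

# (instruction count, mix, CPIs, clock rate in Hz) from the 1.14.4 table
table = {
    "a. P1": (1.0e6, (0.5, 0.4, 0.1), (0.75, 1.0, 1.5), 4e9),
    "a. P2": (5.0e6, (0.4, 0.4, 0.2), (1.25, 0.8, 1.25), 3e9),
    "b. P1": (5.0e6, (0.3, 0.3, 0.4), (1.5, 1.0, 2.0), 4e9),
    "b. P2": (2.0e6, (0.4, 0.3, 0.3), (1.25, 1.0, 2.5), 3e9),
}

for case in ("a", "b"):
    m1, m2 = metrics(*table[f"{case}. P1"]), metrics(*table[f"{case}. P2"])
    winners = ["P1" if x1 > x2 else "P2" for x1, x2 in zip(m1, m2)]
    print(f"{case}. performance: {winners[0]}, MIPS: {winners[1]}, "
          f"MFLOPS: {winners[2]}")
```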
 
EXERCISE 1.15
 
Another pitfall cited in Section 1.8 is expecting to
improve the overall performance of a computer by
improving only one aspect of the computer. This
might be true, but not always. Consider a
computer running programs with the CPU times
shown in the following table.

        FP instr.  INT instr.  L/S instr.  Branch instr.  Total time
   a.   70 s       85 s        55 s        40 s           250 s
   b.   40 s       90 s        60 s        20 s           210 s
 
1.15.1
  How much is the total time reduced if the
time for FP operations is reduced by 20%?
Solution:
   a. $T_{fp} = 70 \times 0.8 = 56$ s.
       $T_{new} = 56 + 85 + 55 + 40 = 236$ s.
       Reduction: $(250 - 236)/250 = 5.6\%$
   b. $T_{fp} = 40 \times 0.8 = 32$ s.
       $T_{new} = 32 + 90 + 60 + 20 = 202$ s.
       Reduction: $(210 - 202)/210 = 3.8\%$
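Scaling one component and re-summing is all 1.15.1 needs. A sketch using the CPU times from the Exercise 1.15 table; `total_after_scaling` is a hypothetical helper.

```python
def total_after_scaling(times, component, factor):
    """Total time after multiplying one component's time by factor."""
    return sum(t * factor if name == component else t
               for name, t in times.items())

# CPU times in seconds from the Exercise 1.15 table
times_a = {"fp": 70, "int": 85, "ls": 55, "branch": 40}
times_b = {"fp": 40, "int": 90, "ls": 60, "branch": 20}

for case, times in (("a", times_a), ("b", times_b)):
    before = sum(times.values())
    after = total_after_scaling(times, "fp", 0.8)
    print(f"{case}. {before} s -> {after:.0f} s, "
          f"reduction {100 * (before - after) / before:.1f}%")
```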
 
1.15.2
  How much is the time for INT operations
reduced if the total time is reduced by 20%?
Solution:
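   a. $T_{new} = 250 \times 0.8 = 200$ s
       $T_{fp} + T_{l/s} + T_{branch} = 165$ s, so $T_{int} = 200 - 165 = 35$ s
       Reduction of INT time: $(85 - 35)/85 = 58.8\%$
   b. $T_{new} = 210 \times 0.8 = 168$ s
       $T_{fp} + T_{l/s} + T_{branch} = 120$ s, so $T_{int} = 168 - 120 = 48$ s
       Reduction of INT time: $(90 - 48)/90 = 46.7\%$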
 
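1.15.3
  Can the total time be reduced by 20% by
reducing only the time for branch instructions?
Solution:
   a. $T_{new} = 250 \times 0.8 = 200$ s
       $T_{fp} + T_{int} + T_{l/s} = 210$ s > 200 s
       NO: even with zero branch time, the total would still exceed the target.
   b. $T_{new} = 210 \times 0.8 = 168$ s
       $T_{fp} + T_{int} + T_{l/s} = 190$ s > 168 s
       NO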
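Parts 1.15.2 and 1.15.3 both ask what one component's time must become for the total to hit a target; a negative answer means the target is unreachable. A sketch with the Exercise 1.15 CPU times; `required_time` is a hypothetical helper.

```python
def required_time(times, component, target_total):
    """Time left for `component` if everything else is unchanged (may be < 0)."""
    others = sum(t for name, t in times.items() if name != component)
    return target_total - others

# CPU times in seconds from the Exercise 1.15 table
times_a = {"fp": 70, "int": 85, "ls": 55, "branch": 40}
times_b = {"fp": 40, "int": 90, "ls": 60, "branch": 20}

for case, times in (("a", times_a), ("b", times_b)):
    target = 0.8 * sum(times.values())
    t_int = required_time(times, "int", target)
    t_branch = required_time(times, "branch", target)
    cut = 100 * (times["int"] - t_int) / times["int"]
    print(f"{case}. INT must drop to {t_int:.0f} s ({cut:.1f}% cut); "
          f"branch-only: {'possible' if t_branch >= 0 else 'impossible'}")
```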
 
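The following table shows the per-processor instruction-type
breakdown of an application
executed on different numbers of processors.

        Processors  FP instr.  INT instr.  L/S instr.  Branch instr.  CPI FP  CPI INT  CPI L/S  CPI Branch
   a.   2           280        1000        640         128            1       1        4        2
   b.   16          50         110         80          16             1       1        4        2
   (instruction counts per processor, in millions)

Assume that each processor has a 2 GHz clock
rate.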
 
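1.15.4
  How much must we improve the CPI of FP
instructions if we want the program to run two
times faster?
Solution:
   $\text{clock cycles} = \text{CPI}_{fp} \times \text{No. FP instr.} + \text{CPI}_{int} \times \text{No. INT instr.} + \text{CPI}_{l/s} \times \text{No. L/S instr.} + \text{CPI}_{branch} \times \text{No. branch instr.}$
   $T_{cpu} = \text{clock cycles}/\text{clock rate} = \text{clock cycles}/(2\times10^9)$
   a. 2 processors: clock cycles $= (280 + 1000 + 4\times640 + 2\times128)\times10^6 = 4{,}096\times10^6$
       $T_{cpu} = 2.048$ s
   b. 16 processors: clock cycles $= (50 + 110 + 4\times80 + 2\times16)\times10^6 = 512\times10^6$
       $T_{cpu} = 0.256$ s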
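The baseline cycle count in 1.15.4 is a weighted sum over instruction classes. A sketch with the per-processor counts and CPIs from the table and the 2 GHz clock; the dictionary keys and helper name are hypothetical.

```python
CLOCK_HZ = 2e9
CPI = {"fp": 1, "int": 1, "ls": 4, "branch": 2}
# Instruction counts per processor, keyed by number of processors
COUNTS = {
    2: {"fp": 280e6, "int": 1000e6, "ls": 640e6, "branch": 128e6},
    16: {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6},
}

def clock_cycles(counts, cpi):
    return sum(n * cpi[cls] for cls, n in counts.items())

for procs, counts in COUNTS.items():
    cycles = clock_cycles(counts, CPI)
    print(f"{procs} processors: {cycles / 1e6:.0f} x 10^6 cycles, "
          f"T_cpu = {cycles / CLOCK_HZ:.3f} s")
```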
 
    To halve the number of clock cycles by improving only the
CPI of FP instructions:
      $\text{CPI}_{improved\,fp} \times \text{No. FP instr.} + \text{CPI}_{int} \times \text{No. INT instr.} + \text{CPI}_{l/s} \times \text{No. L/S instr.} + \text{CPI}_{branch} \times \text{No. branch instr.} = \text{clock cycles}/2$
$\text{CPI}_{improved\,fp} = (\text{clock cycles}/2 - (\text{CPI}_{int} \times \text{No. INT instr.} + \text{CPI}_{l/s} \times \text{No. L/S instr.} + \text{CPI}_{branch} \times \text{No. branch instr.}))/\text{No. FP instr.}$
 
    a. 2 processors (values in millions):
      $\text{CPI}_{improved\,fp} = (2{,}048 - 3{,}816)/280 < 0 \Rightarrow$ not possible
    b. 16 processors:
      $\text{CPI}_{improved\,fp} = (256 - 462)/50 < 0 \Rightarrow$ not possible
    The non-FP instructions alone already need more than half of the original cycles.
 
1.15.5
  How much must we improve the CPI of L/S
instructions if we want the program to run two times
faster?
Solution:
    Using the clock cycle data from 1.15.4, halve the number of clock cycles by improving only the CPI of
L/S instructions:
    $\text{CPI}_{fp} \times \text{No. FP instr.} + \text{CPI}_{int} \times \text{No. INT instr.} + \text{CPI}_{improved\,l/s} \times \text{No. L/S instr.} + \text{CPI}_{branch} \times \text{No. branch instr.} = \text{clock cycles}/2$

 $\text{CPI}_{improved\,l/s} = (\text{clock cycles}/2 - (\text{CPI}_{fp} \times \text{No. FP instr.} + \text{CPI}_{int} \times \text{No. INT instr.} + \text{CPI}_{branch} \times \text{No. branch instr.}))/\text{No. L/S instr.}$
 
   a.
     2 processors:
      $\text{CPI}_{improved\,l/s} = (2{,}048 - 1{,}536)/640 = 0.8$
   b.
     16 processors:
      $\text{CPI}_{improved\,l/s} = (256 - 192)/80 = 0.8$
   In both cases the L/S CPI must fall from 4 to 0.8, a 5× improvement.
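The CPI needed for a 2× speedup when only one class improves follows from solving the cycle equation for that class's CPI. A sketch covering both 1.15.4 (FP) and 1.15.5 (L/S); a non-positive result means no CPI can reach the target. The helper name and dictionary keys are hypothetical.

```python
CPI = {"fp": 1, "int": 1, "ls": 4, "branch": 2}
# Instruction counts per processor, keyed by number of processors
COUNTS = {
    2: {"fp": 280e6, "int": 1000e6, "ls": 640e6, "branch": 128e6},
    16: {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6},
}

def required_cpi(counts, cpi, cls, speedup=2.0):
    """CPI that class `cls` needs so total cycles shrink by `speedup`."""
    total = sum(n * cpi[k] for k, n in counts.items())
    others = sum(n * cpi[k] for k, n in counts.items() if k != cls)
    return (total / speedup - others) / counts[cls]

for procs, counts in COUNTS.items():
    for cls in ("fp", "ls"):
        needed = required_cpi(counts, CPI, cls)
        verdict = f"{needed:.2f}" if needed > 0 else "not possible"
        print(f"{procs} processors, {cls}: required CPI = {verdict}")
```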
 
1.15.6
  How much is the execution time of the program
improved if the CPI of INT and FP instructions is reduced
by 40% and the CPI of L/S and Branch is reduced by 30%?
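Solution:
   $\text{clock cycles} = \text{CPI}_{fp} \times \text{No. FP instr.} + \text{CPI}_{int} \times \text{No. INT instr.} + \text{CPI}_{l/s} \times \text{No. L/S instr.} + \text{CPI}_{branch} \times \text{No. branch instr.}$
$T_{cpu} = \text{clock cycles}/\text{clock rate} = \text{clock cycles}/(2\times10^9)$
$\text{CPI}_{int} = 0.6 \times 1 = 0.6$; $\text{CPI}_{fp} = 0.6 \times 1 = 0.6$;
   $\text{CPI}_{l/s} = 0.7 \times 4 = 2.8$; $\text{CPI}_{branch} = 0.7 \times 2 = 1.4$
2 processors: $T_{cpu}$ (before improv.) = 2.048 s
                          $T_{cpu}$ (after improv.) = 1.370 s
   16 processors: $T_{cpu}$ (before improv.) = 0.256 s
                          $T_{cpu}$ (after improv.) = 0.171 s
In both cases the execution time drops by about 33%, a speedup of about 1.5.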
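Part 1.15.6 scales every class's CPI and recomputes the time. A sketch assuming the table's CPIs, the 40% and 30% reductions stated in the problem, and a 2 GHz clock; the names are hypothetical.

```python
CLOCK_HZ = 2e9
CPI_OLD = {"fp": 1, "int": 1, "ls": 4, "branch": 2}
# FP and INT CPIs reduced by 40%, L/S and branch CPIs by 30%
SCALE = {"fp": 0.6, "int": 0.6, "ls": 0.7, "branch": 0.7}
CPI_NEW = {cls: CPI_OLD[cls] * SCALE[cls] for cls in CPI_OLD}

# Instruction counts per processor, keyed by number of processors
COUNTS = {
    2: {"fp": 280e6, "int": 1000e6, "ls": 640e6, "branch": 128e6},
    16: {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6},
}

def cpu_time(counts, cpi):
    return sum(n * cpi[cls] for cls, n in counts.items()) / CLOCK_HZ

for procs, counts in COUNTS.items():
    before, after = cpu_time(counts, CPI_OLD), cpu_time(counts, CPI_NEW)
    print(f"{procs} processors: {before:.3f} s -> {after:.3f} s, "
          f"speedup {before / after:.2f}")
```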
Uploaded on Jul 30, 2024
