Accelerator Design Space Exploration Tutorial

Slide note: Amortize optimization phase.

Please do not distribute

GYW


This tutorial combines hands-on sessions and presentations on virtual machine setup, an overview of accelerator research, pre-RTL accelerator modeling with Aladdin, accelerator design space exploration using Aladdin, accelerator system integration with gem5-Aladdin, and SoC design space exploration. It introduces Aladdin, a pre-RTL power-performance accelerator simulator, motivated by the trend toward accelerator-centric architectures, and covers the algorithmic-HW design space and the role of flexibility and programmability in accelerator design.


Uploaded on Sep 29, 2024



Presentation Transcript


  1. Tutorial Outline
      Time                  Topic
      8:45 am - 9:00 am     Hands-on: Virtual Machine Setup
      9:00 am - 9:20 am     Presentation: Accelerator Research Overview
      9:20 am - 9:35 am     Presentation: Aladdin: Accelerator Pre-RTL Modeling
      9:35 am - 10:15 am    Hands-on: Accelerator Design Space Exploration using Aladdin
      10:15 am - 10:30 am   Break
      10:30 am - 11:00 am   Presentation: gem5-Aladdin: Accelerator System Integration
      11:00 am - 12:00 pm   Hands-on: SoC Design Space Exploration using gem5-Aladdin

  2. Aladdin: A pre-RTL, Power-Performance Accelerator Simulator
      [Figure: Aladdin takes unmodified C code and accelerator design parameters (e.g., # FUs, memory bandwidth) and produces power/area and performance estimates for an accelerator-specific datapath with a private L1/scratchpad. Other labels: shared memory/interconnect models; accelerator simulator; design accelerator-rich SoC fabrics and memory systems.]

  3. Aladdin: A pre-RTL, Power-Performance Accelerator Simulator
      [Figure: the diagram from slide 2, annotated with "Flexibility" and "Programmability".]

  4. Aladdin: A pre-RTL, Power-Performance Accelerator Simulator
      [Figure: the diagram from slide 3, presenting Aladdin as a design assistant to understand the algorithmic-HW design space before RTL; annotations: Flexibility, Programmability, Design Cost.]

  5. Future Accelerator-Centric Architecture
      [Figure: an SoC with small cores, big cores, shared resources, a GPU/DSP, a memory interface, and a sea of fine-grained accelerators.]

  6. Future Accelerator-Centric Architecture
      [Figure: the SoC from slide 5.]
      Aladdin can rapidly evaluate the large design space of accelerator-centric architectures.

  7. Aladdin Overview
      [Figure: the Aladdin flow.
       Optimization Phase: C Code -> Optimistic IR -> Initial DDDG -> Idealistic DDDG
       Realization Phase: Idealistic DDDG -> Program Constrained DDDG -> Resource Constrained DDDG (using the Acc Design Parameters) -> Performance and Activity; Activity + Power/Area Models -> Power/Area
       DDDG: Dynamic Data Dependence Graph]

  8. Aladdin Overview
      [Figure: the flow from slide 7.]

  9. From C to Design Space
      C Code:
        for (i = 0; i < N; ++i)
          c[i] = a[i] + b[i];

  10. Aladdin Overview
      [Figure: the flow from slide 7.]

  11. From C to Design Space: IR Dynamic Trace
      C Code:
        for (i = 0; i < N; ++i)
          c[i] = a[i] + b[i];
      IR Dynamic Trace (r1, r2 and r3 hold the base addresses of a, b and c):
         0. r0 = 0                 // i = 0
         1. r4 = load(r0 + r1)     // load a[i]
         2. r5 = load(r0 + r2)     // load b[i]
         3. r6 = r4 + r5
         4. store(r0 + r3, r6)     // store c[i]
         5. r0 = r0 + 1            // ++i
         6. r4 = load(r0 + r1)     // load a[i]
         7. r5 = load(r0 + r2)     // load b[i]
         8. r6 = r4 + r5
         9. store(r0 + r3, r6)     // store c[i]
        10. r0 = r0 + 1            // ++i

  12. Aladdin Overview
      [Figure: the flow from slide 7.]

  13. From C to Design Space: Initial DDDG
      IR trace and C code as on slide 11.
      [Figure: the initial DDDG built from the trace. The loop index forms a serial chain (0. i=0 -> 5. i++ -> 10. i++), and each iteration's nodes hang off its index value: 1. ld a, 2. ld b -> 3. + -> 4. st c; 6. ld a, 7. ld b -> 8. + -> 9. st c; 11. ld a, 12. ld b -> 13. + -> 14. st c.]

  14. Aladdin Overview
      [Figure: the flow from slide 7.]

  15. From C to Design Space: Idealistic DDDG
      IR trace and C code as on slide 11.
      [Figure: the initial DDDG (nodes 0-14) shown beside the idealistic DDDG produced by the optimization phase.]

  16. From C to Design Space: Idealistic DDDG
      Include application-specific customization strategies.
      • Node-Level: Bit-width Analysis, Strength Reduction, Tree-height Reduction
      • Loop-Level: Remove dependences between loop index variables
      • Memory Optimization: Memory-to-Register Conversion, Store-Load Forwarding, Store Buffer
      • Extensible: e.g., model a CAM accelerator by matching nodes in the DDDG

  17. Aladdin Overview
      [Figure: the flow from slide 7.]

  18. From C to Design Space: One Design
      Acc Design Parameters: Memory BW <= 2, 1 Adder
      [Figure: the idealistic DDDG (nodes 0-19, four loop iterations) scheduled cycle by cycle under these constraints, with a resource-activity chart marking the MEM and adder (+) operations active in each cycle.]

  19. From C to Design Space: Another Design
      Acc Design Parameters: Memory BW <= 4, 2 Adders
      [Figure: the same idealistic DDDG scheduled under these constraints; more MEM and adder (+) operations are active per cycle, so the schedule takes fewer cycles.]

  20. From C to Design Space: Realization Phase (DDDG -> Power-Perf)
      Constrain the DDDG with program and user-defined resource constraints.
      • Program Constraints: Control Dependence, Memory Ambiguation
      • Resource Constraints: Loop-level Parallelism, Loop Pipelining, Memory Ports

  21. From C to Design Space: Power-Performance per Design
      [Plot: power vs. cycle count, with one point for each design: Memory BW <= 2 with 1 adder, and Memory BW <= 4 with 2 adders.]

  22. From C to Design Space: Design Space of an Algorithm
      [Plot: power vs. cycle count across the design space of one algorithm.]

  23. Power Model
      • Functional Units Power Model: microbenchmarks characterize various FUs, using Design Compiler with a 40nm standard cell library.
        $$\mathrm{Power} = \sum_{i=1}^{N} \left( \mathrm{activity}_i \cdot P_i^{\mathrm{dynamic}} + P_i^{\mathrm{leakage}} \right)$$
      • SRAM Power Model: commercial register file and SRAM memory compilers with the same 40nm standard cell library.

  24. Aladdin Overview
      [Figure: the flow from slide 7.]

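  25. Aladdin Validation
      [Figure: C code goes to Aladdin, which reports power/area and performance; for comparison, the reference flow takes Verilog through ModelSim (activity) and Design Compiler (power/area, performance).]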

  26. Aladdin Validation
      [Figure: the flow from slide 25, showing that the Verilog comes either from an RTL designer or from HLS C tuning with Vivado HLS.]

  27. Aladdin Validation

  28. Aladdin Validation

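  29. Aladdin enables rapid design space exploration for accelerators.
      [Figure: the validation flow from slide 26, annotated with 7 mins for the Aladdin path (C code -> Aladdin -> power/area, performance) versus 52 hours for the RTL path (RTL designer, or HLS C tuning with Vivado HLS -> Verilog -> ModelSim -> Design Compiler).]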

  30. Limitations
      • Algorithm Choices: Aladdin generates a design space per algorithm; it can be used to quickly compare the design spaces of different algorithms.
      • Input Dependent: results depend on the traced input, so use inputs that exercise all paths of the code.
      • Input C Code: Aladdin can create a DDDG for any C code, but C constructs that require resources outside the accelerator, such as system calls and dynamic memory allocation, are not modeled.

  31. Aladdin enables pre-RTL simulation of accelerators with the rest of the SoC.
      [Figure: an SoC in which big and small cores are modeled with gem5, shared resources with Cacti/Orion2, the GPU with GPGPU-Sim, the memory interface with DRAMSim2, and the sea of fine-grained accelerators with Aladdin.]

  32. Aladdin: A pre-RTL, Power-Performance Accelerator Simulator
      Architectures with 1000s of accelerators will be radically different; new design tools are needed.
      Aladdin enables rapid design space exploration of future accelerator-centric platforms.
      Download Aladdin at http://vlsiarch.eecs.harvard.edu/aladdin

  33. Tutorial Outline (same schedule as slide 1)

  34. Aladdin Hands-on Exercise
      Goal: run a power-performance design space exploration for stencil2d in MachSuite.
      Tasks:
        1. Build LLVM-Tracer and Aladdin, and verify them with the Aladdin unit tests.
        2. Walk through the design space exploration steps using triad as an example:
           a) Generate the LLVM IR trace.
           b) Prepare a hardware configuration file.
           c) Run Aladdin.
           d) Explore the parameter space: unrolling, memory bandwidth, clock frequency.
        3. Repeat the above steps for MachSuite/stencil2d.

  35. Task 1: Build LLVM-Tracer and Aladdin
      Make sure LLVM-Tracer and Aladdin build successfully in your virtual machine.

  36. Task 2: Design Space Exploration for triad
        void triad(int *a, int *b, int *c, int s) {
          int i;
          triad_loop: for (i = 0; i < NUM; i++) {
            c[i] = a[i] + s * b[i];
          }
        }

  37. Task 2: Design Space Exploration for triad
      [The triad code from slide 36, highlighting the arrays a, b and c.]

  38. Task 2: Design Space Exploration for triad
      [The triad code from slide 36, highlighting the arrays and the loop triad_loop.]

  39. Array Parameters
      [Figure labels: read port, write port, partition/bank]
      partition,cyclic,a,8192,4,1
        // partition type   : cyclic
        // array name       : a
        // array size       : 8192 bytes
        // element size     : 4 bytes (int)
        // partition factor : 1 (1 partition)

  40. Array Parameters
      [Figure labels: read port, write port, partition/bank]
      Changing the partition factor of a from 1 (partition,cyclic,a,8192,4,1) to 2:
      partition,cyclic,a,8192,4,2
        // partition type   : cyclic
        // array name       : a
        // array size       : 8192 bytes
        // element size     : 4 bytes (int)
        // partition factor : 2 (2 partitions)

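  41. Array Parameters
      [Figure labels: read port, write port, partition/bank]
      partition,cyclic,a,8192,4,2
        // partition type   : cyclic
        // array name       : a
        // array size       : 8192 bytes
        // element size     : 4 bytes (int)
        // partition factor : 2 (2 partitions)
      [Figure: cyclic partitioning interleaves elements across the two banks: a[0], a[2] in one bank and a[1], a[3] in the other.]
      partition,block,a,8192,4,2
        // partition type   : block
        // array name       : a
        // array size       : 8192 bytes
        // element size     : 4 bytes (int)
        // partition factor : 2 (2 partitions)
      [Figure: block partitioning places contiguous elements in each bank: a[0], a[1] in one bank and a[2], a[3] in the other.]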

  42. Loop Parameters
      unrolling,triad,triad_loop,1
        // unrolling a loop
        // function name    : triad
        // loop label       : triad_loop
        // unrolling factor : 1
      [Figure: one multiplier (X) computes s * b and one adder (+) adds a, producing c.]

  43. Loop Parameters
      unrolling,triad,triad_loop,2
        // unrolling a loop
        // function name    : triad
        // loop label       : triad_loop
        // unrolling factor : 2
      [Figure: two X/+ pairs operate in parallel on two sets of a, b, s inputs, producing two c outputs.]

  44. Task 2.1: Generate the Triad Trace
      vagrant@genie:~$ cd gem5-aladdin/src/aladdin/SHOC/triad/
      vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad$ vi triad.c
      vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad$ make run-trace
      vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad$ vi dynamic_trace.gz

  45. Task 2.2: Set Up a Design Config
      vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad$ mkdir example
      vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad$ cd example
      vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad/example$ vi triad.cfg

  46. Task 2.2: Set Up a Design Config
      triad.cfg:
        cycle_time,6
        pipelining,1
        partition,cyclic,a,8192,4,1
        partition,cyclic,b,8192,4,1
        partition,cyclic,c,8192,4,1
        unrolling,triad,triad_loop,1

  47. Task 2.2: Set Up a Design Config
      vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad/example$ cp ../run.sh .
      vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad/example$ mkdir outputs
      vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad/example$ bash run.sh

  48. Task 2.3: Design Space Exploration
      Unrolling   Partition   Clock Period (ns)   Cycles   Power (mW)
      1           1           6

  49. Task 2.3: Design Space Exploration
      Unrolling   Partition   Clock Period (ns)   Cycles   Power (mW)
      1           1           6                   2052     4.47
      4           1           6                   2052     4.43
      4           4           6                   516      10.29
      4           4           1                   517      68.91

  50. Task 2.3: Design Space Exploration (same table as slide 49)
