Efficient Techniques for Writing Parallel Programs

Writing Parallel Program-2

Recap
Process
Thread
Parallel program
Writing a Parallel Program
Sequential Program
    int t = 0;
    for (i = 0; i < N; i++) {
        t = a[i] + b[i];
        c[i] = t;
    }
Parallel program
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        t = a[i] + b[i];
        c[i] = t;
    }
Private and Shared
Shared variable: a single instance is shared among all threads.
Private variable: each thread has its own local copy.
Example
    int x = 5;
    #pragma omp parallel
    {
        int a = x + 1;   // x is shared; a is private to each thread
    }
Example
    int x = 5;
    #pragma omp parallel private(x)
    {
        int x = x + 1;   // bad programming: the private x is uninitialized here
    }
Rules to Specify Private and Shared Variables
Loop iteration variable is private
int a[N]; int i=0;
:
#pragma omp parallel for
for(i=0;i<N;i++)  a[i] = i;
Better programming practice
int a[N];
:
#pragma omp parallel for
for(int i=0;i<N;i++)  a[i] = i;
Rules to Specify Private and Shared Variables
Explicit specification of shared and private variables
    int n; int a; int b;
    :
    #pragma omp parallel for shared(n, a) private(b)
    for (int i = 0; i < n; i++) {
        int t = b;    // b is private: its value here is undefined
       // b = a + i;
    }
The value of a private variable is undefined at entry to and exit from a parallel region.
Static Schedule
    int nthreads = 10;
    #pragma omp parallel for shared(a,b,c) private(i) schedule(static)
    for (i = 0; i < N; i++)  c[i] = a[i] + b[i];
Distribute chunks of iterations to threads:
thread 1: iterations 0 … (N/10)-1
thread 2: iterations (N/10) … 2*(N/10)-1
:
thread 10: iterations 9*(N/10) … N-1
Another Static Schedule
    #pragma omp parallel for shared(a,b,c) private(i) schedule(static, 4)  // chunk size 4
    for (i = 0; i < 64; i++)  c[i] = a[i] + b[i];
Distribute chunks of iterations to threads (4 threads):
thread 1: iterations {0,1,2,3}, {16,17,18,19}, …
thread 2: iterations {4,5,6,7}, {20,21,22,23}, …
thread 3: iterations {8,9,10,11}, {24,25,26,27}, …
thread 4: iterations {12,13,14,15}, {28,29,30,31}, …
Dynamic Schedule
schedule(dynamic, n):
Default value of n is 1.
Each thread
1. executes a chunk of n iterations
2. requests another chunk
No particular order of chunk assignments to threads.
Static vs Dynamic Schedule
1. Dynamic scheduling is preferred when iterations have different computational costs.
2. Dynamic scheduling incurs runtime overhead, unlike static scheduling, because the distribution of iterations is performed during execution.
Waiting in `parallel for`
No synchronization at the beginning of a parallel for loop.
Threads synchronize at the end of a parallel for loop.
Ref: https://ppc.cs.aalto.fi/ch3/nowait/
Waiting in `parallel for`
`nowait` removes the synchronization at the end of the loop.
Ref: https://ppc.cs.aalto.fi/ch3/nowait/
Uploaded on Sep 26, 2024