Parallel Programming Directives and Concepts
Learn about parallel programming directives such as #pragma omp parallel, which allow code to be executed by multiple threads simultaneously. Explore concepts such as defining parallel regions, setting the number of threads, and using OpenMP directives for parallel for loops. Understand how hyperthreading simulates logical cores, and discover functions like omp_get_thread_num to manage thread execution within parallel regions.
Presentation Transcript
parallel directive

#pragma omp parallel [clauses]
{
    code_block
}

Defines a parallel region, that is, the code that will be executed by multiple threads in parallel.
Example: parallel directive

// omp_parallel.cpp
// compile with: /openmp
#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel num_threads(4)
    {
        int i = omp_get_thread_num();
        printf_s("Hello from thread %d\n", i);
    }
}
parallel directive

By default, the number of threads equals the number of logical processors on the computer. For example, on a machine with one physical processor with hyperthreading enabled, there are two logical processors and therefore two threads.

Hyperthreading simulates two logical cores on a single physical core: each logical core gets its own programmable interrupt controller and its own set of registers. The remaining resources of the physical core, such as the memory cache, the arithmetic logic unit, and the buses, are shared between the logical cores, so the system appears to have two physical cores.
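Not part of the original slides: a minimal sketch showing how to query the number of logical processors OpenMP sees and the default team size it would use, via the standard functions omp_get_num_procs and omp_get_max_threads.

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Number of logical processors available to the program. */
    printf("Logical processors: %d\n", omp_get_num_procs());
    /* Number of threads OpenMP would use for a parallel region by default. */
    printf("Default max threads: %d\n", omp_get_max_threads());
    return 0;
}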
omp_get_thread_num() function

omp_get_thread_num() returns the number of the calling thread within its team of threads executing in parallel.

Hello from thread 0
Hello from thread 1
Hello from thread 2
Hello from thread 3

Note that the output order may vary from machine to machine. Do not confuse it with omp_get_num_threads(), which returns the number of threads currently in the team executing the parallel region from which it is called.
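A small sketch (not from the slides) contrasting the two functions inside one parallel region; the team size of 4 is an arbitrary choice.

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(4)
    {
        /* omp_get_thread_num(): id of this thread (0 .. team size - 1). */
        /* omp_get_num_threads(): size of the current team (4 here).     */
        printf("Thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}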
OpenMP for directive

#pragma omp [parallel] for [clauses]
    for_statement

Causes the work done in a for loop inside a parallel region to be divided among the threads.
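Before the atomic example below, here is a minimal sketch (not from the slides; the array size is chosen arbitrarily) of the for directive by itself, splitting a loop's iterations across the team.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int i, a[8];

    /* Iterations 0..7 are divided among the threads of the team. */
    #pragma omp parallel for
    for (i = 0; i < 8; i++)
        a[i] = i * i;

    for (i = 0; i < 8; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}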
atomic directive

#pragma omp for
for (i = nStart; i <= nEnd; ++i) {
    #pragma omp atomic
    nSum += i;
}

The atomic directive specifies a memory location that will be updated in a single step of processing relative to other threads. An operation acting on shared memory is atomic if it completes in a single step relative to other threads. A runnable version of this snippet is sketched below.
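A runnable sketch of the snippet above; the bounds nStart and nEnd are assumed values chosen here for illustration.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int i, nSum = 0;
    int nStart = 1, nEnd = 10;   /* assumed bounds */

    #pragma omp parallel for
    for (i = nStart; i <= nEnd; ++i) {
        #pragma omp atomic
        nSum += i;   /* each update completes as a single step */
    }

    printf("Sum of %d..%d = %d\n", nStart, nEnd, nSum);
    return 0;
}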
master directive

The master directive specifies that a section of code is executed only by the master thread of the team (the related single directive lets the section run on any one thread, not necessarily the master).
Example: master and barrier directives

int main() {
    int a[5], i;

    #pragma omp parallel
    {
        // Perform some computation.
        #pragma omp for
        for (i = 0; i < 5; i++)
            a[i] = i * i;
barrier directive

Synchronizes all the threads in a team: every thread pauses at the barrier until all threads have reached it.
Example: master and barrier directives (continued)

        // Print intermediate results in a single thread.
        #pragma omp master
        for (i = 0; i < 5; i++)
            printf_s("a[%d] = %d\n", i, a[i]);

        // Wait.
        #pragma omp barrier

        // Continue with the computation.
        #pragma omp for
        for (i = 0; i < 5; i++)
            a[i] += i;
    }
}
By default, OpenMP statically assigns loop iterations to threads.

#include <stdio.h>
#include <omp.h>

#define THREADS 8
#define N 100

int main() {
    int i;
    #pragma omp parallel for num_threads(THREADS)
    for (i = 0; i < N; i++) {
        printf("Thread %d is doing iteration %d.\n", omp_get_thread_num(), i);
    }
    /* all threads done */
    printf("All done!\n");
    return 0;
}
A static schedule can be non-optimal, however. This is the case when the different iterations take different amounts of time. The program below specifies static scheduling explicitly in the parallel for directive; it can be greatly improved with a dynamic schedule.

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

#define THREADS 4
#define N 16

int main() {
    int i;
    #pragma omp parallel for schedule(static) num_threads(THREADS)
    for (i = 0; i < N; i++) {
        /* wait for i seconds */
        sleep(i);
        printf("Thread %d has completed iteration %d.\n", omp_get_thread_num(), i);
    }
    /* all threads done */
    printf("All done!\n");
    return 0;
}
The same program with a dynamic schedule. How much faster does this program run?

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

#define THREADS 4
#define N 16

int main() {
    int i;
    #pragma omp parallel for schedule(dynamic) num_threads(THREADS)
    for (i = 0; i < N; i++) {
        /* wait for i seconds */
        sleep(i);
        printf("Thread %d has completed iteration %d.\n", omp_get_thread_num(), i);
    }
    /* all threads done */
    printf("All done!\n");
    return 0;
}
Dynamic Schedule Overhead

Dynamic scheduling is better when the iterations may take very different amounts of time. However, there is some overhead to dynamic scheduling: after each iteration, a thread must stop and receive a new value of the loop variable to use for its next iteration.
The following program demonstrates this overhead. How long does it take to execute?

#include <stdio.h>
#include <omp.h>

#define THREADS 16
#define N 100000000

int main() {
    int i;
    printf("Running %d iterations on %d threads dynamically.\n", N, THREADS);
    #pragma omp parallel for schedule(dynamic) num_threads(THREADS)
    for (i = 0; i < N; i++) {
        /* a loop that doesn't take very long */
    }
    /* all threads done */
    printf("All done!\n");
    return 0;
}
If we specify static scheduling, the program runs faster:

#include <stdio.h>
#include <omp.h>

#define THREADS 16
#define N 100000000

int main() {
    int i;
    printf("Running %d iterations on %d threads statically.\n", N, THREADS);
    #pragma omp parallel for schedule(static) num_threads(THREADS)
    for (i = 0; i < N; i++) {
        /* a loop that doesn't take very long */
    }
    /* all threads done */
    printf("All done!\n");
    return 0;
}
Chunk Sizes

We can split the difference between static and dynamic scheduling by using chunks in a dynamic schedule. Here, each thread takes a set number of iterations, called a chunk, executes them, and is assigned a new chunk when it is done.
By specifying a chunk size of 100 in the program below, we markedly improve the performance:

#include <stdio.h>
#include <omp.h>

#define THREADS 16
#define N 100000000
#define CHUNK 100

int main() {
    int i;
    printf("Running %d iterations on %d threads dynamically.\n", N, THREADS);
    #pragma omp parallel for schedule(dynamic, CHUNK) num_threads(THREADS)
    for (i = 0; i < N; i++) {
        /* a loop that doesn't take very long */
    }
    /* all threads done */
    printf("All done!\n");
    return 0;
}
Increasing or decreasing the chunk size

Increasing the chunk size makes the scheduling more static, and decreasing it makes it more dynamic.
Guided Schedules

Instead of static or dynamic, we can specify guided as the schedule. This scheduling policy is similar to a dynamic schedule, except that the chunk size changes as the program runs: it begins with big chunks, but then adjusts to smaller chunk sizes if the workload is imbalanced.
Guided Schedules

How does the program above perform with a guided schedule?
Guided Schedules

#include <stdio.h>
#include <omp.h>

#define THREADS 16
#define N 100000000

int main() {
    int i;
    printf("Running %d iterations on %d threads guided.\n", N, THREADS);
    #pragma omp parallel for schedule(guided) num_threads(THREADS)
    for (i = 0; i < N; i++) {
        /* a loop that doesn't take very long */
    }
    /* all threads done */
    printf("All done!\n");
    return 0;
}
How does our program with iterations that take different amounts of time perform with guided scheduling?

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

#define THREADS 4
#define N 16

int main() {
    int i;
    #pragma omp parallel for schedule(guided) num_threads(THREADS)
    for (i = 0; i < N; i++) {
        /* wait for i seconds */
        sleep(i);
        printf("Thread %d has completed iteration %d.\n", omp_get_thread_num(), i);
    }
    /* all threads done */
    printf("All done!\n");
    return 0;
}
Conclusion

OpenMP for automatically splits for loop iterations for us. But, depending on our program, the default behavior may not be ideal. For loops where each iteration takes roughly equal time, static schedules work best, as they have little overhead.
For loops where each iteration can take very different amounts of time, dynamic schedules work best, as the work will be split more evenly across threads. Specifying chunks, or using a guided schedule, provides a trade-off between the two. Choosing the best schedule requires understanding your loop.
private

int soma = 0;
#pragma omp parallel for schedule(static) private(soma)
for (i = 0; i < 10000; i++)
    soma += a[i];
printf("Terminado soma = %d", soma);

Two problems: initialization and the final value! With private(soma), each thread's copy of soma starts uninitialized, and the original soma keeps no result after the loop.
int soma = 0;
#pragma omp parallel for schedule(static) firstprivate(soma) lastprivate(soma)
for (i = 0; i < 10000; i++)
    soma += a[i];
printf("Terminado");

This addresses the initialization and final-value problems; firstprivate and lastprivate are clauses placed on the parallel for directive itself.
firstprivate

Specifies that each thread should have its own instance of a variable, and that the variable should be initialized with the value it has before the parallel construct.
lastprivate

Specifies that the enclosing context's version of the variable is set equal to the private version of whichever thread executes the final iteration (for a loop construct). A small sketch combining the two clauses follows below.
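Not part of the original slides: a minimal sketch, with an assumed initial value of 10, showing firstprivate and lastprivate together on one directive.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int i, x = 10;   /* assumed initial value */

    #pragma omp parallel for firstprivate(x) lastprivate(x)
    for (i = 0; i < 4; i++) {
        /* Each thread's private x starts at 10 (firstprivate).      */
        /* After the loop, x holds the value from the logically last */
        /* iteration, i = 3 (lastprivate).                           */
        x = 10 + i;
    }

    printf("x after the loop = %d\n", x);   /* prints 13 */
    return 0;
}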
reduction

Specifies that one or more variables that are private to each thread are the subject of a reduction operation at the end of the parallel region.
reduction(op : list)

The reduction clause is used for all-to-one operations, for example op = +. Each thread gets its own copy of the variable(s) in list, with the proper initialization; it accumulates its local sum into its own copy; and when leaving the parallel section, the local sums are automatically added into the global variable.
#include <stdio.h>
#include <omp.h>

#define NUM_THREADS 4

int main() {
    int i, tmp, res = 0;
    #pragma omp parallel for reduction(+:res) private(tmp)
    for (i = 0; i < 10000; i++) {
        tmp = Calculo();   /* Calculo() is assumed to be defined elsewhere */
        res += tmp;
    }
    printf("O resultado vale %d", res);
}

Note: loop indices are always private.
nowait

Overrides the implicit barrier in a directive. If there are several independent loops inside a parallel region, you can use nowait to avoid the implied barrier at the end of the for, as follows:

#include <stdio.h>
#define SIZE 5

void test(int *a, int *b, int *c, int size) {
    int i;
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (i = 0; i < size; i++)
            b[i] = a[i] * a[i];

        #pragma omp for nowait
        for (i = 0; i < size; i++)
            c[i] = a[i] / 2;
    }
}
parallel sections

#pragma omp [parallel] sections [clauses]
{
    #pragma omp section
    {
        code_block
    }
}
OMP SECTIONS

You can use omp sections when no loops are involved:

#pragma omp parallel
#pragma omp sections
{
    Calculo1();
    #pragma omp section
    Calculo2();
    #pragma omp section
    Calculo3();
}

The sections are distributed among the different threads; each section carries out different logic (on different threads). A runnable sketch of this example follows below.
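A runnable sketch of the example above; Calculo1, Calculo2, and Calculo3 are stand-in functions defined here only for illustration.

#include <stdio.h>
#include <omp.h>

void Calculo1(void) { printf("Calculo1 on thread %d\n", omp_get_thread_num()); }
void Calculo2(void) { printf("Calculo2 on thread %d\n", omp_get_thread_num()); }
void Calculo3(void) { printf("Calculo3 on thread %d\n", omp_get_thread_num()); }

int main(void) {
    /* Each section is executed once, by some thread of the team. */
    #pragma omp parallel sections
    {
        #pragma omp section
        Calculo1();
        #pragma omp section
        Calculo2();
        #pragma omp section
        Calculo3();
    }
    return 0;
}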