Introduction to MPI Basics
The Message Passing Interface (MPI) is an industrial standard API for communication, essential for developing scalable and portable message passing programs for distributed memory systems. The MPI execution model revolves around coordinating processes with separate address spaces, and its data model involves partitioning data and exchanging information between processes to solve large problems. Although the MPI specification is both simple and complex, one can start using it effectively after learning just six basic MPI routines.
Presentation Transcript
Programming distributed memory systems: Message Passing Interface (MPI). Topics: introduction to MPI and basic MPI functions. Most of the MPI materials are adapted from William Gropp and Rusty Lusk's MPI tutorial at https://www.mcs.anl.gov/research/projects/mpi/tutorial/. MPI standard: http://www.mpi-forum.org
Message Passing Interface (MPI). For processes in a distributed memory system to coordinate (to solve a single problem), the processes need to exchange data as well as synchronize. MPI is an API for communication. MPI is an industrial standard that specifies the library routines needed for writing message passing programs. o Mainly communication routines. o Also includes other features such as topology. MPI allows the development of scalable, portable message passing programs. o It is a standard supported by everybody in the field. o If one wants to use more than one node to solve a problem, MPI will most likely be used.
MPI. MPI uses a library approach to support parallel programming. o MPI specifies the API for message passing (communication-related routines). o MPI program = C/Fortran program + MPI communication calls. o MPI programs are compiled with a regular compiler (e.g., gcc) and linked with an MPI library.
MPI execution model. The unit of MPI execution is the process: a process is a program counter and an address space. o A process may have multiple threads. o MPI is for communication among processes that have separate address spaces. Separate (collaborative) MPI processes execute and coordinate. o mpirun -hostfile hostfile1 -np 16 ./a.out runs the same ./a.out on 16 processes. o Single program multiple data (SPMD). o Different from OpenMP's fork-join model. What about the sequential portion of an application? Interprocess communication consists of synchronization and moving data from one process's address space to another's.
MPI data model. No shared memory; explicit communication is used whenever information needs to be exchanged between processes. How to solve large problems? o Logically partition the large array and distribute it into local (smaller) arrays across multiple processes. Each process holds a small array, and the arrays across all processes logically form the whole domain.
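As a hedged illustration of this partitioning (not from the lecture code), the sketch below block-distributes a hypothetical global array of size N across the processes; the names N, local_n, and global_i are made up for this example, and myid/nprocs come from MPI_Comm_rank/MPI_Comm_size, which are introduced a few slides later. For simplicity it assumes N is divisible by the number of processes.

#include <stdlib.h>
#include "mpi.h"

#define N 1000000                    /* hypothetical global problem size */

int main( int argc, char *argv[] )
{
    int myid, nprocs;
    MPI_Init( &argc, &argv );
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* Block distribution: each process owns one contiguous chunk. */
    int local_n = N / nprocs;
    double *local = malloc(local_n * sizeof(double));

    /* Local index i corresponds to logical global index myid*local_n + i. */
    for (int i = 0; i < local_n; i++) {
        int global_i = myid * local_n + i;
        local[i] = (double) global_i;
    }

    free(local);
    MPI_Finalize();
    return 0;
}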
MPI. The MPI specification is both simple and complex. o Almost all MPI programs can be realized with six MPI routines. o MPI has a total of more than 100 functions and many concepts. o We will mainly discuss the simple MPI, but we will also give a glimpse of the complex MPI. MPI is about just the right size. o One has the flexibility when it is required. o One can start using it after learning the six routines.
MPI hello world program (lect20/example1.c). mpi.h contains MPI routine prototypes and data types.

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );
    printf( "Hello world.\n" );
    MPI_Finalize();
    return 0;
}

An MPI program always starts with MPI_Init and always finishes with MPI_Finalize. MPI routines are library functions that can be used in regular C/C++ or Fortran code, compiled and linked with the right library.
Compiling, linking and running MPI programs. Open MPI has been installed on linprog. To compile example2.c: o mpicc example2.c. o mpicc calls gcc with the correct libraries. To run an MPI program, do the following: o mpirun -hostfile hostfile1 -np 16 ./a.out. o The content of hostfile1 is:

linprog1 slots=12    (linprog1 with at most 12 processes)
linprog2 slots=12    (linprog2 with at most 12 processes)

o See lect20/hostfile1 and lect20/hostfile2 for hostfile examples. Only linprog1, linprog2, and linprog3 work at this time; the system administrators are looking into the issues.
Login without typing a password between linprog nodes: key-based authentication. Password-based authentication is inconvenient at times, e.g., for remote system management or for starting remote programs (starting many MPI processes!). Key-based authentication allows login without typing the password. Key-based authentication with ssh in UNIX, for remote ssh from machine A to machine B: Step 1: at machine A, run ssh-keygen -t rsa (do not enter any passphrase, just keep pressing enter). Step 2: append A:.ssh/id_rsa.pub to B:.ssh/authorized_keys.
MPI uses the SPMD model: all processes run ./a.out. How can different processes do different things (MIMD functionality)? o Need to know the execution environment: one can usually decide what to do based on the number of processes (nprocs) on this job and the process id (myid). How many processes are working on this problem? MPI_Comm_size. What is myid? MPI_Comm_rank. Rank is with respect to a communicator (the group and context of the communication); MPI_COMM_WORLD is a predefined communicator that includes all processes (already mapped to processors). See lect20/example2.c. o nprocs and myid are often used to derive the mapping between local array indices and the logical global array indices.
A better MPI hello world program (lect20/example2.c)

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int myrank, size;
    MPI_Init( &argc, &argv );
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf( "Hello world. I am %d of %d.\n", myrank, size );
    MPI_Finalize();
    return 0;
}
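To answer the MIMD question from the previous slide, here is a minimal, hedged sketch (not from the lecture examples) in which rank 0 behaves differently from the other ranks; the master/worker split shown is only an illustration.

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int myrank, size;
    MPI_Init( &argc, &argv );
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0) {
        /* Rank 0 takes a "master" role, e.g., reporting. */
        printf( "Master: %d processes are working on this job.\n", size );
    } else {
        /* All other ranks take a "worker" role. */
        printf( "Worker %d: doing my share of the computation.\n", myrank );
    }

    MPI_Finalize();
    return 0;
}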
Cooperative operations for communication. MPI has two types of communications: cooperative and one-sided. In cooperative operations for communication, both sender and receiver are explicitly involved. o The sender explicitly sends the data and the receiver explicitly receives the data. o Changes to the receiver's memory are made with the receiver's explicit instruction. o Communication and synchronization are combined: there is an implicit ordering, the send happens before the receive in the following example.

Process 0                   Process 1
Send (data)    -------->    Receive (data)
One-sided operations for communication. One-sided operations between processes include remote memory reads and writes; only one process needs to explicitly participate. Communication and synchronization are decoupled: there is no ordering between the put and the get in the following example.

Process 0                   Process 1 (memory)
Put (data)     -------->    data
Get (data1)    <--------    data1
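As a hedged sketch of how one-sided communication can look in code (the particular calls MPI_Win_create, MPI_Put, and MPI_Win_fence are MPI-2 one-sided routines not shown in the original slides, and fence synchronization is just one of several ways to order the operations), rank 0 below writes directly into rank 1's memory; run with at least two processes.

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int myrank, buf = 0;
    MPI_Win win;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    /* Every process exposes one int (buf) as a window for remote access. */
    MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                 /* open an access epoch */
    if (myrank == 0) {
        int data = 42;
        /* Write 42 into rank 1's buf; rank 1 makes no matching receive call. */
        MPI_Put(&data, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);                 /* close the epoch: the put is complete */

    if (myrank == 1)
        printf( "Rank 1: buf = %d after the remote put.\n", buf );

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}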
MPI basic Send/Recv. We need to fill in the details in:

Process 0                   Process 1
Send (data)    -------->    Receive (data)

Things that need specifying: o How will the data be described? o How will the processes (sender and receiver) be identified? o How will the receiver recognize the message? o What will it mean for these operations to complete?
Identifying the sender and the receiver. MPI processes are collected into groups. Each message is sent in a context, and must be received in the same context. The group and the context together form a communicator. A process is identified by its rank in the group associated with a communicator. There is a default communicator, MPI_COMM_WORLD, whose group contains all initial processes. Identifying the sender and receiver: a rank within a communicator. o Example: "I am going to send to rank myid+1 in MPI_COMM_WORLD."
MPI datatypes. The data in a message to be sent or received is described by a triple (starting_address, count, datatype). o Example: (&a, 1000000, MPI_CHAR) describes 1000000 characters in the array a[]. An MPI datatype is recursively defined as: o predefined, corresponding to a data type of the language (e.g., MPI_INT, MPI_DOUBLE_PRECISION); o a contiguous array of MPI datatypes; o a strided block of datatypes; o an indexed array of blocks of datatypes; o an arbitrary structure of datatypes. There are MPI functions to construct custom datatypes, such as an array of (int, float) pairs or a row of a matrix stored columnwise.
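As a hedged sketch of one such custom datatype (the matrix sizes M and N and the name rowtype are made up for this example), MPI_Type_vector below describes a row of a matrix stored columnwise: N elements, one per column, spaced M doubles apart in memory.

#include "mpi.h"

#define M 4    /* number of rows (hypothetical size) */
#define N 5    /* number of columns (hypothetical size) */

int main( int argc, char *argv[] )
{
    double a[M * N];          /* columnwise storage: element (i,j) is a[j*M + i] */
    MPI_Datatype rowtype;

    MPI_Init( &argc, &argv );
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            a[j * M + i] = 10.0 * i + j;

    /* One row = N blocks of length 1, with a stride of M doubles. */
    MPI_Type_vector(N, 1, M, MPI_DOUBLE, &rowtype);
    MPI_Type_commit(&rowtype);

    /* (&a[i], 1, rowtype) now describes row i, e.g., in an MPI_Send. */

    MPI_Type_free(&rowtype);
    MPI_Finalize();
    return 0;
}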
MPI tags. Messages are sent with an accompanying user-defined integer tag to assist the receiving process in identifying the message. Messages can be screened at the receiving end by specifying a specific tag, or not screened by specifying MPI_ANY_TAG as the tag in a receive.
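As a hedged illustration of tag screening (not from the lecture examples; the tag values 100 and 200 are arbitrary, and it uses MPI_Send/MPI_Recv, which the next slides introduce), rank 0 below sends two differently tagged messages, and rank 1 receives them with MPI_ANY_TAG and inspects status.MPI_TAG. Run with at least two processes.

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int myrank, val;
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0) {
        int a = 1, b = 2;
        MPI_Send(&a, 1, MPI_INT, 1, 100, MPI_COMM_WORLD);   /* tag 100 */
        MPI_Send(&b, 1, MPI_INT, 1, 200, MPI_COMM_WORLD);   /* tag 200 */
    } else if (myrank == 1) {
        for (int k = 0; k < 2; k++) {
            /* Accept any tag from rank 0, then check which tag arrived. */
            MPI_Recv(&val, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            printf( "Received %d with tag %d\n", val, status.MPI_TAG );
        }
    }

    MPI_Finalize();
    return 0;
}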
Put it together: MPI_Send. MPI_Send(start, count, datatype, dest, tag, comm). o Example: send one integer to rank 1 in MPI_COMM_WORLD with tag 100: MPI_Send(&var, 1, MPI_INT, 1, 100, MPI_COMM_WORLD). The message to be sent is described by (start, count, datatype). The receiver is specified by (dest, comm). The message will be received by the receiver when it is looking to receive a message with tag 100. When the function returns, the data has been delivered to the system and the data buffer can be reused; this does not imply that the message has been received: the message may or may not have been received yet.
Put it together: MPI_Recv. MPI_Recv(start, count, datatype, source, tag, comm, status). o Example: receive one integer from rank 5 in MPI_COMM_WORLD with tag 100 into the variable var: MPI_Recv(&var, 1, MPI_INT, 5, 100, MPI_COMM_WORLD, &status). Wait until a matching (on source and tag) message is received from the system, then put the data into the buffer specified by (start, count, datatype). The sender is specified by (source, comm); the source can also be MPI_ANY_SOURCE. status contains further information about the communication (e.g., the actual data size received). The count may not match the sender's data count: it is OK to receive less data, but receiving more than count data is an error. Note that the actual data received is determined by the matching MPI_Send.
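Putting MPI_Send and MPI_Recv together, here is a minimal, hedged sketch (not from the lecture examples) in which rank 0 sends one integer to rank 1 with tag 100; run with at least two processes.

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int myrank, var;
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0) {
        var = 123;
        /* Send one int to rank 1 with tag 100. */
        MPI_Send(&var, 1, MPI_INT, 1, 100, MPI_COMM_WORLD);
    } else if (myrank == 1) {
        /* Block until a message with tag 100 arrives from rank 0. */
        MPI_Recv(&var, 1, MPI_INT, 0, 100, MPI_COMM_WORLD, &status);
        printf( "Rank 1 received %d from rank 0.\n", var );
    }

    MPI_Finalize();
    return 0;
}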
MPI is simple. Many parallel programs can be written using just six MPI functions: o MPI_Init o MPI_Finalize o MPI_Comm_size o MPI_Comm_rank o MPI_Send o MPI_Recv
MPI PI program (lect20/pi_mpi.c)

PI = ∫₀¹ 4/(1+x²) dx = lim_{n→∞} Σ_{i=1}^{n} (1/n) · 4/(1 + ((i-0.5)/n)²)

Sequential version:

h = 1.0 / (double) n;
sum = 0.0;
for (i = 1; i <= n; i++) {
    x = h * ((double)i - 0.5);
    sum += 4.0 / (1.0 + x*x);
}
mypi = h * sum;

MPI version (each process handles the iterations i = myid+1, myid+1+numprocs, ...):

MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
h = 1.0 / (double) n;
sum = 0.0;
for (i = myid + 1; i <= n; i += numprocs) {
    x = h * ((double)i - 0.5);
    sum += 4.0 / (1.0 + x*x);
}
mypi = h * sum;
if (myid == 0) {
    for (i = 1; i < numprocs; i++) {
        MPI_Recv(&tmp, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
        mypi += tmp;
    }
} else
    MPI_Send(&mypi, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
/* see pi_mpi.c */