A message-passing library specification
extended message-passing model
not a language or compiler specification
not a specific implementation or product
For parallel computers, clusters, and heterogeneous networks
Designed to provide access to advanced parallel hardware for
end users
library writers
tool developers
Not designed for fault tolerance
History of MPI
MPI Forum: government, industry and academia.
Formal process began November 1992
Draft presented at Supercomputing 1993
Final standard (1.0) published May 1994
Clarifications (1.1) published June1995
MPI-2 process began April, 1995
MPI-1.2 finalized July 1997
MPI-2 finalized July 1997
Current status (MPI-1)
Public domain versions available from ANL/MSU (MPICH), OSC (LAM)
Proprietary versions available from all vendors
Portability is the key reason why MPI is important.
7 CS267 2/2000 Bill Saphir
MPI Overview
MPI covers
Point-to-point communication (send/receive)
Collective communication
Support for library development
MPI design goals
Portable
Provides access to fast hardware (user space/zero copy)
Based on existing practice (MPI-1)
MPI does not cover
Fault tolerance
Parallel/distributed operating system
8 CS267 2/2000 Bill Saphir
An MPI Application
An MPI application
The elements of the application are:
4 processes, numbered zero through three
Communication paths between them
The set of processes plus the communication channels is called
“MPI_COMM_WORLD”. More on the name later.
0
3 2
1
MPI Sources
The Standard itself is at: http://www.mpi-forum.org
All MPI official releases, in both postscript and HTML
Books:
Using MPI: Portable Parallel Programming with the Message-Passing Interface, by Gropp, Lusk, and Skjellum, MIT Press, 1994.
MPI: The Complete Reference, by Snir, Otto, Huss-Lederman, Walker, and Dongarra, MIT Press, 1996.
Designing and Building Parallel Programs, by Ian Foster, Addison-Wesley, 1995.
Parallel Programming with MPI, by Peter Pacheco, Morgan-Kaufmann, 1997.
MPI: The Complete Reference Vol 1 and 2,MIT Press, 1998(Fall).
Other information on Web:
http://www.mcs.anl.gov/mpi
Parallel Programming Overview
Basic parallel programming problems (for MPI):
Creating parallelism
SPMD Model
Communication between processors
Basic
Collective
Non-blocking
Synchronization
Point-to-point synchronization is done by message passing
Global synchronization done by collective communication
SPMD Model
Single Program Multiple Data model of programming:
Each processor has a copy of the same program
All run them at their own rate
May take different paths through the code
Process-specific control through variables like:
My process number
Total number of processors
Processors may synchronize, but none is implicit
Many people equate SPMD programming with Message Passing, but they shouldn’t
Summary of Basic Point-to-Point MPI
Many parallel programs can be written using just these six functions, only two of which are non-trivial:
MPI_INIT
MPI_FINALIZE
MPI_COMM_SIZE
MPI_COMM_RANK
MPI_SEND
MPI_RECV
Point-to-point (send/recv) isn’t the only way...
Hello World (Trivial)
MPI_Init( &argc, &argv ); MPI_Finalize(); A simple, but not very interesting SPMD Program.
To make this legal MPI, we need to add 2 lines.
#include "mpi.h"
#include
int main( int argc, char *argv[] )
{
printf( "Hello, world!\n" );
return 0;
}
Hello World (Independent Processes)
We can use MPI calls to get basic values for controlling processes
#include "mpi.h"
#include
int main( int argc, char *argv[] )
{
int rank, size;
MPI_Init( &argc, &argv );
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
MPI_Comm_size( MPI_COMM_WORLD, &size );
printf( "I am %d of %d\n", rank, size );
MPI_Finalize();
return 0;
}
May print in any order.
MPI Basic Send/Receive
Process 0 Process 1 Send(data) Receive(data) We need to fill in the details in
Things that need specifying:
How will processes be identified?
How will “data” be described?
How will the receiver recognize/screen messages?
What will it mean for these operations to complete?
Identifying Processes: MPI Communicators
In general, processes can be subdivided into groups:
Group for each component of a model (chemistry, mechanics,…)
Group to work on a subdomain
Supported using a “communicator:” a message context and a group of processes
More on this later…
In a simple MPI program all processes do the same thing:
The set of all processes make up the “world”:
MPI_COMM_WORLD
Name processes by number (called “rank”)
Communicators
What is MPI_COMM_WORLD?
A communicator consists of:
A group of processes
Numbered 0 ... N-1
Never changes membership
A set of private communication channels between them
Message sent with one communicator cannot be received by another.
Implemented using hidden message tags
Why?
Enables development of safe libraries
Restricting communication to subgroups is useful
Point-to-Point Example
Process 0 sends array “A” to process 1 which receives it as “B”
1:
#define TAG 123
double A[10];
MPI_Send(A, 10, MPI_DOUBLE, 1,
TAG, MPI_COMM_WORLD)
2:
#define TAG 123
double B[10];
MPI_Recv(B, 10, MPI_DOUBLE, 0,
TAG, MPI_COMM_WORLD, &status)
or
MPI_Recv(B, 10, MPI_DOUBLE, MPI_ANY_SOURCE,
MPI_ANY_TAG, MPI_COMM_WORLD, &status)
Describing Data: MPI Datatypes
The data in a message to be sent or received is described by a triple (address, count, datatype), where
An MPI datatype is recursively defined as:
predefined, corresponding to a data type from the language (e.g., MPI_INT, MPI_DOUBLE_PRECISION)
a contiguous array of MPI datatypes
a strided block of datatypes
an indexed array of blocks of datatypes
an arbitrary structure of datatypes
There are MPI functions to construct custom datatypes, such an array of (int, float) pairs, or a row of a matrix stored columnwise.
Can’t the implementation just “send the bits?”
To support heterogeneous machines:
All data is labeled with a type
MPI implementation can support communication on heterogeneous machines without compiler support
I.e., between machines with very different memory representations (big/little endian, IEEE fp or others, etc.)
Simplifies programming for application-oriented layout:
Matrices in row/column
May improve performance:
reduces memory-to-memory copies in the implementation
allows the use of special hardware (scatter/gather) when available
Recognizing & Screening Messages: MPI Tags
Messages are sent with a user-defined integer tag:
Allows receiving process in identifying the message.
Receiver may also screen messages by specifying a tag.
Use MPI_ANY_TAG to avoid screening.
Tags are called “message types” in some non-MPI message passing systems.
Message Status
Status is a data structure allocated in the user’s program.
Especially useful with wild-cards to find out what matched:
int recvd_tag, recvd_from, recvd_count;
MPI_Status status;
MPI_Recv(..., MPI_ANY_SOURCE, MPI_ANY_TAG, ..., &status )
recvd_tag = status.MPI_TAG;
recvd_from = status.MPI_SOURCE;
MPI_Get_count( &status, datatype, &recvd_count );
MPI Basic (Blocking) Send
MPI_SEND (start, count, datatype, dest, tag, comm)
start: a pointer to the start of the data
count: the number of elements to be sent
datatype: the type of the data
dest: the rank of the destination process
tag: the tag on the message for matching
comm: the communicator to be used.
Completion: When this function returns, the data has been delivered to the “system” and the data structure (start…start+count) can be reused. The message may not have been received by the target process.
MPI Basic (Blocking) Receive
MPI_RECV(start, count, datatype, source, tag, comm, status)
start: a pointer to the start of the place to put data
count: the number of elements to be received
datatype: the type of the data
source: the rank of the sending process
tag: the tag on the message for matching
comm: the communicator to be used
status: place to put status information
Waits until a matching (on source and tag) message is received from the system, and the buffer can be used.
Receiving fewer than count occurrences of datatype is OK, but receiving more is an error.
Comments