Storing VectorsDivide vector elements among processes
Replicate vector elements
Vector replication acceptable because vectors have only n elements, versus n2 elements in matrices
Parallel Programmingin C with MPI and OpenMP Michael J. Quinn
Chapter 8 Matrix-vector Multiplication
Chapter Objectives
Review matrix-vector multiplicaiton
Propose replication of vectors
Develop three parallel programs, each based on a different data decomposition
Outline
Sequential algorithm and its complexity
Design, analysis, and implementation of three parallel programs
Rowwise block striped
Columnwise block striped
Checkerboard block
Divide vector elements among processes
Replicate vector elements
Vector replication acceptable because vectors have only n elements, versus n2 elements in matrices
Rowwise Block Striped Matrix
Partitioning through domain decomposition
Primitive task associated with
Row of matrix
Entire vector
Phases of Parallel Algorithm
Row i of A b Row i of A b ci Inner product computation Row i of A b c All-gather communication
Agglomeration and Mapping
Static number of tasks
Regular communication pattern (all-gather)
Computation time per task is constant
Strategy:
Agglomerate groups of rows
Create one task per MPI process
Complexity Analysis
Sequential algorithm complexity: (n2)
Parallel algorithm computational complexity: (n2/p)
Communication complexity of all-gather: (log p + n)
Overall complexity: (n2/p + log p)
Isoefficiency Analysis
Sequential time complexity: (n2)
Only parallel overhead is all-gather
When n is large, message transmission time dominates message latency
Parallel communication time: (n)
n2 Cpn n Cp and M(n) = n2
System is not highly scalable
Block-to-replicated Transformation
MPI_Allgatherv
MPI_Allgatherv
int MPI_Allgatherv (
void *send_buffer,
int send_cnt,
MPI_Datatype send_type,
void *receive_buffer,
int *receive_cnt,
int *receive_disp,
MPI_Datatype receive_type,
MPI_Comm communicator)
MPI_Allgatherv in Action
Function create_mixed_xfer_arrays
First array
How many elements contributed by each process
Uses utility macro BLOCK_SIZE
Second array
Starting position of each process’ block
Assume blocks in process rank order
Function replicate_block_vector
Create space for entire vector
Create “mixed transfer” arrays
Call MPI_Allgatherv
Function read_replicated_vector
Process p-1
Opens file
Reads vector length
Broadcast vector length (root process = p-1)
Allocate space for vector
Process p-1 reads vector, closes file
Broadcast vector
Function print_replicated_vector
Process 0 prints vector
Exact call to printf depends on value of parameter datatype
Run-time Expression
: inner product loop iteration time
Computational time: nn/p
All-gather requires log p messages with latency
Total vector elements transmitted:(2log p -1) / 2log p
Total execution time: nn/p + log p + (2log p -1) / (2log p )
Comments