Document Classification Problem
Search directories and subdirectories for documents (look for .html, .txt, .tex, etc.)
Using a dictionary of keywords, create a profile vector for each document
Store profile vectors
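Below is a minimal sketch of the profile-vector step. It assumes a profile vector is simply a count of how often each dictionary keyword occurs in a document, since the slides do not fix the exact measure. The names `make_profile` and `dict_index` are hypothetical. Words are taken to be runs of letters and are compared case-insensitively.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_WORD 64   /* longer words are truncated (assumption of this sketch) */

/* Return the index of word in dict, or -1 if it is not a keyword.
   Linear search keeps the sketch short. */
static int dict_index(const char *word, const char *dict[], int dict_size)
{
    for (int i = 0; i < dict_size; i++)
        if (strcmp(word, dict[i]) == 0)
            return i;
    return -1;
}

/* Fill profile[0..dict_size-1] with the number of times each dictionary
   word occurs in text. Words are maximal runs of letters, lower-cased. */
void make_profile(const char *text, const char *dict[], int dict_size,
                  int profile[])
{
    char word[MAX_WORD];
    int len = 0;

    assert(text != NULL && dict_size >= 0);
    memset(profile, 0, dict_size * sizeof profile[0]);
    for (const char *p = text; ; p++) {
        if (isalpha((unsigned char)*p)) {
            if (len < MAX_WORD - 1)
                word[len++] = (char)tolower((unsigned char)*p);
        } else {
            if (len > 0) {               /* a word just ended: count it */
                word[len] = '\0';
                int k = dict_index(word, dict, dict_size);
                if (k >= 0)
                    profile[k]++;
                len = 0;
            }
            if (*p == '\0')
                break;
        }
    }
}
```

For example, with the dictionary {"mpi", "process", "manager"}, the text "The manager process sends MPI messages; each process replies." yields the profile vector {1, 2, 1}.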
Data Dependence Graph (1)
Partitioning and Communication
Most time spent reading documents and generating profile vectors
Create two primitive tasks for each document
Data Dependence Graph (2)
Agglomeration and Mapping
Number of tasks not known at compile time
Tasks do not communicate with each other
Time needed to perform tasks varies widely
Strategy: map tasks to processes at run time
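The sketch below shows why run-time mapping helps when task times vary widely. It compares a fixed block assignment with handing each task to whichever worker becomes idle first. This is a hypothetical cost model, not an MPI program: the function names and task times are made up, and communication cost is ignored.

```c
#include <assert.h>

#define MAX_WORKERS 64

/* Static mapping: worker w gets a contiguous block of tasks fixed
   before execution (block decomposition, as in SPMD programs). */
double static_makespan(const double t[], int n, int p)
{
    double worst = 0.0;
    for (int w = 0; w < p; w++) {
        int lo = w * n / p, hi = (w + 1) * n / p;
        double sum = 0.0;
        for (int i = lo; i < hi; i++)
            sum += t[i];
        if (sum > worst)
            worst = sum;
    }
    return worst;
}

/* Run-time mapping: tasks are handed out in order, each to whichever
   worker becomes idle first, the way a manager answers work requests. */
double dynamic_makespan(const double t[], int n, int p)
{
    double busy[MAX_WORKERS] = {0};   /* time at which each worker is idle again */
    assert(p > 0 && p <= MAX_WORKERS);
    for (int i = 0; i < n; i++) {
        int w = 0;
        for (int k = 1; k < p; k++)   /* pick the earliest-idle worker */
            if (busy[k] < busy[w])
                w = k;
        busy[w] += t[i];
    }
    double worst = 0.0;
    for (int k = 0; k < p; k++)
        if (busy[k] > worst)
            worst = busy[k];
    return worst;
}
```

Consider task times {8, 8, 1, 1, 1, 1, 1, 1} on two workers. The block assignment gives one worker both long tasks, so it finishes after 18 time units while the other worker finishes after 4. The run-time assignment finishes in 11.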
Manager/worker-style Algorithm
Can also be viewed as domain partitioning with run-time allocation of data to tasks
Manager/Worker vs. SPMD
SPMD (single program, multiple data)
Every process executes the same functions
Our prior programs fit this mold
Manager/worker
Manager process has different responsibilities than worker processes
An MPI manager/worker program has an early control flow split (manager process one way, worker processes the other way)
Roles of Manager and Workers
Manager Pseudocode
Identify documents
Receive dictionary size from worker 0
Allocate matrix to store document vectors
repeat
    Receive message from worker
    if message contains document vector then
        Store document vector
    endif
    if documents remain then
        Send worker file name
    else
        Send worker termination message
    endif
until all workers terminated
Write document vectors to file
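The manager's loop can be checked without MPI by modeling each message as a flag that says whether it carries a document vector. Each reply is modeled as a document index, with -1 standing in for the termination message. This is a hypothetical single-process model, and `Manager`, `manager_reply`, and `simulate` are illustrative names. A real MPI manager would receive from whichever worker sends next (for example with `MPI_ANY_SOURCE`) rather than polling workers round-robin.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WORKERS 16

/* Manager bookkeeping (hypothetical model). */
typedef struct {
    int next_doc;   /* next document to hand out */
    int num_docs;   /* documents identified */
    int active;     /* workers not yet terminated */
    int stored;     /* document vectors received and stored */
} Manager;

/* Handle one message from a worker and return the reply:
   a document index, or -1 for the termination message. */
int manager_reply(Manager *m, bool has_vector)
{
    if (has_vector)
        m->stored++;                 /* store document vector */
    if (m->next_doc < m->num_docs)
        return m->next_doc++;        /* documents remain: send a file */
    m->active--;                     /* otherwise: terminate this worker */
    return -1;
}

/* Drive the manager with p workers until all have terminated.
   Workers are polled round-robin as a stand-in for message arrival order. */
int simulate(int num_docs, int p)
{
    Manager m = {0, num_docs, p, 0};
    bool has_vector[MAX_WORKERS] = {false};  /* first message is a bare request */
    bool done[MAX_WORKERS] = {false};
    assert(p > 0 && p <= MAX_WORKERS);

    while (m.active > 0) {                   /* until all workers terminated */
        for (int w = 0; w < p; w++) {
            if (done[w])
                continue;
            int doc = manager_reply(&m, has_vector[w]);
            if (doc < 0)
                done[w] = true;
            else
                has_vector[w] = true;        /* next message returns a vector */
        }
    }
    return m.stored;
}
```

The model makes two properties of the pseudocode easy to check. First, every document handed out comes back as a stored vector, because a worker's last message before termination still carries its final vector. Second, the loop ends only after each worker has received exactly one termination message.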
Worker Pseudocode
Send first request for work to manager
if worker 0 then
    Read dictionary from file
endif
Broadcast dictionary among workers
Build hash table from dictionary
if worker 0 then
    Send dictionary size to manager
endif
repeat
    Receive file name from manager
    if file name is NULL then
        terminate
    endif
    Read document, generate document vector
    Send document vector to manager
forever
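The worker's "Build hash table from dictionary" step maps each keyword to its position in the profile vector. This lets each word read from a document be looked up in roughly constant time. Below is a minimal sketch using open addressing with linear probing. The table layout, the djb2 hash, and the names `DictTable`, `dict_build`, `dict_lookup`, and `dict_free` are assumptions, not taken from the original program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hash table: keyword -> index in the profile vector.
   It stores pointers into dict, so dict must outlive the table. */
typedef struct {
    const char **keys;   /* NULL marks an empty slot */
    int *index;          /* dictionary position of keys[s] */
    int capacity;        /* power of two, at least 2 * number of words */
} DictTable;

static unsigned long hash_str(const char *s)   /* djb2 string hash */
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Build the table from dict[0..n-1]; returns 0 on success, -1 on failure. */
int dict_build(DictTable *t, const char *dict[], int n)
{
    assert(n >= 0);
    t->capacity = 1;
    while (t->capacity < 2 * n)            /* keep load factor <= 0.5 */
        t->capacity *= 2;
    t->keys = calloc(t->capacity, sizeof *t->keys);
    t->index = malloc(t->capacity * sizeof *t->index);
    if (t->keys == NULL || t->index == NULL) {
        free(t->keys);
        free(t->index);
        return -1;
    }
    for (int i = 0; i < n; i++) {
        unsigned long s = hash_str(dict[i]) & (t->capacity - 1);
        while (t->keys[s] != NULL && strcmp(t->keys[s], dict[i]) != 0)
            s = (s + 1) & (t->capacity - 1);   /* linear probing */
        if (t->keys[s] == NULL) {              /* duplicates keep first index */
            t->keys[s] = dict[i];
            t->index[s] = i;
        }
    }
    return 0;
}

/* Return the profile-vector index of word, or -1 if it is not a keyword. */
int dict_lookup(const DictTable *t, const char *word)
{
    unsigned long s = hash_str(word) & (t->capacity - 1);
    while (t->keys[s] != NULL) {
        if (strcmp(t->keys[s], word) == 0)
            return t->index[s];
        s = (s + 1) & (t->capacity - 1);
    }
    return -1;
}

void dict_free(DictTable *t)
{
    free(t->keys);
    free(t->index);
}
```

Because the table is never more than half full, every probe sequence reaches an empty slot. This guarantees that a lookup for a non-keyword terminates.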
Task/Channel Graph
MPI_Abort
A “quick and dirty” way for one process to terminate all processes in a specified communicator
Example use: if the manager cannot allocate the memory needed to store the document profile vectors
Header for MPI_Abort
int MPI_Abort (
    MPI_Comm comm,       /* Communicator */
    int      error_code) /* Value returned to calling environment */
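The manager's "Allocate matrix to store document vectors" step is where MPI_Abort is useful. Below is a minimal sketch that assumes integer counts as vector entries and a contiguous row-major layout. `alloc_matrix` and `free_matrix` are hypothetical helpers. The MPI_Abort call appears only in a comment so the sketch compiles without MPI.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a rows x cols matrix as one contiguous block plus an array of
   row pointers, so the whole matrix can later be written with a single
   fwrite(m[0], sizeof(int), rows * cols, fp). Returns NULL on failure. */
int **alloc_matrix(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    int *storage = malloc(rows * cols * sizeof *storage);
    int **row = malloc(rows * sizeof *row);
    if (storage == NULL || row == NULL) {
        free(storage);
        free(row);
        return NULL;
    }
    for (size_t i = 0; i < rows; i++)
        row[i] = storage + i * cols;    /* row i starts i*cols entries in */
    return row;
}

void free_matrix(int **m)
{
    if (m != NULL) {
        free(m[0]);   /* contiguous storage begins at row 0 */
        free(m);
    }
}

/* In an MPI manager, a failed allocation would end the whole job:
 *
 *     int **vectors = alloc_matrix(num_docs, dict_size);
 *     if (vectors == NULL)
 *         MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
 */
```

Aborting from the manager is "quick and dirty" because the workers are killed mid-task without any orderly shutdown. When the manager cannot store results at all, however, continuing would be pointless.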
Creating a Workers-only Communicator
Dictionary is broadcast among workers
To support workers-only broadcast, need workers-only communicator
Can use MPI_Comm_split
Manager passes MPI_UNDEFINED as the value of split_key (the color argument), so it is not part of any new communicator and its new communicator handle is set to MPI_COMM_NULL
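Here is a sketch of how the workers-only communicator could be created. It assumes the manager is rank 0 in MPI_COMM_WORLD and that every worker passes a split_key of 0, with its old rank as the ranking key. Running MPI_Comm_split itself requires an MPI environment, so the code instead models its grouping rule in plain C. `model_split` and `UNDEFINED_COLOR` are hypothetical stand-ins, and the real call appears in the comment.

```c
#include <assert.h>

#define UNDEFINED_COLOR (-1)   /* stand-in for MPI_UNDEFINED in this model */

/* Real MPI call this models (manager assumed to be rank 0):
 *
 *     MPI_Comm worker_comm;
 *     MPI_Comm_split(MPI_COMM_WORLD, id == 0 ? MPI_UNDEFINED : 0, id,
 *                    &worker_comm);
 *     // on the manager, worker_comm == MPI_COMM_NULL
 *
 * Model: process r passes color[r] and key[r]. Processes with the same
 * color form one new communicator, ranked by key with ties broken by old
 * rank. A process passing the undefined color gets no communicator,
 * recorded here as new_rank[r] = -1. */
void model_split(int n, const int color[], const int key[], int new_rank[])
{
    assert(n >= 0);
    for (int r = 0; r < n; r++) {
        if (color[r] == UNDEFINED_COLOR) {
            new_rank[r] = -1;
            continue;
        }
        int rank = 0;   /* count same-color processes ordered before r */
        for (int q = 0; q < n; q++)
            if (q != r && color[q] == color[r] &&
                (key[q] < key[r] || (key[q] == key[r] && q < r)))
                rank++;
        new_rank[r] = rank;
    }
}
```

Suppose there are four processes and the manager is at rank 0. The workers at world ranks 1, 2, and 3 become ranks 0, 1, and 2 in the new communicator. Under these assumptions, "worker 0" in the pseudocode is world rank 1.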
Comments