Newest Viewed Downloaded

Sources of Parallelism and Locality in Simulation

Sources of Parallelism and Locality in Simulation

Continuous Variables, Continuous Parameters

Examples of such systems include Parabolic (time-dependent) problems: Heat flow: Temperature(position, time) Diffusion: Concentration(position, time) Elliptic (steady state) problems: Electrostatic or Gravitational Potential: Potential(position) Hyperbolic problems (waves): Quantum mechanics: Wave-function(position,time) Many problems combine features of above Fluid flow: Velocity,Pressure,Density(position,time) Elasticity: Stress,Strain(position,time)

Terminology

Term hyperbolic, parabolic, elliptic, come from special cases of the general form of a second order linear PDE a*d2u/dx + b*d2u/dxdy + c*d2u/dy2 + d*du/dx + e*du/dy + f = 0 where y is time Analog to solutions of general quadratic equation a*x2 + b*xy + c*y2 + d*x + e*y + f

Example: Deriving the Heat Equation

0 1 x x+h As h  0, we get the heat equation: d u(x,t) (u(x-h,t)-u(x,t))/h - (u(x,t)- u(x+h,t))/h dt h = C * d u(x,t) d2 u(x,t) dt dx2 = C * x-h Consider a simple problem A bar of uniform material, insulated except at ends Let u(x,t) be the temperature at position x at time t Heat travels from x-h to x+h at rate proportional to: Equation comes from experimental observation (Fourier’s Law) C is the conductivity of the bar

Details of the Explicit Method for Heat

From experimentation (physical observation) we have: d u(x,t) /d t = d 2 u(x,t)/dx (assume C = 1 for simplicity) Discretize time and space and use explicit approach (as described for ODEs) to approximate derivative: (u(x,t+1) – u(x,t))/dt = (u(x-h,t) – 2*u(x,t) + u(x+h,t))/h2 u(x,t+1) – u(x,t)) = dt/h2 * (u(x-h,t) - 2*u(x,t) + u(x+h,t)) u(x,t+1) = u(x,t)+ dt/h2 *(u(x-h,t) – 2*u(x,t) + u(x+h,t)) Let z = dt/h2 u(x,t+1) = z* u(x-h,t) + (1-2z)*u(x,t) + z+u(x+h,t) By changing variables (x to j and y to i): u[j,i+1]= z*u[j-1,i]+ (1-2*z)*u[j,i]+ z*u[j+1,i]

Explicit Solution of the Heat Equation

t=5 t=4 t=3 t=2 t=1 t=0 u[0,0] u[1,0] u[2,0] u[3,0] u[4,0] u[5,0] For j=0 to N u[j,i+1]= z*u[j-1,i]+ (1-2*z)*u[j,i]+ z*u[j+1,i] where z = dt/h2 Use finite differences with u[j,i] as the heat at time t= i*dt (i = 0,1,2,…) and position x = j*h (j=0,1,…,N=1/h) initial conditions on u[j,0] boundary conditions on u[0,i] and u[N,i] At each timestep i = 0,1,2,... This corresponds to matrix vector multiply nearest neighbors on grid U[j,I+1] =

Matrix View of Explicit Method for Heat

1-2z z z 1-2z z z 1-2z z z 1-2z z z 1-2z T = 1-2z z z Graph and “3 point stencil” Multiplying by a tridiagonal matrix at each step For a 2D mesh (5 point stencil) the matrix is pentadiagonal More on the matrix/grid views later

Parallelism in Explicit Method for PDEs

Partitioning the space (x) into p largest chunks good load balance (assuming large number of points relative to p) minimized communication (only p chunks) Generalizes to multiple dimensions. arbitrary graphs (= arbitrary sparse matrices). Explicit approach often used for hyperbolic equations Problem with explicit approach for heat (parabolic): numerical instability. solution blows up eventually if z = dt/h2 > .5 need to make the time steps very small when h is small: dt < .5*h2

Instability in Solving the Heat Equation Explicitly

Note: stability of the algorithm has nothing to do with floating point round-off

Implicit Solution of the Heat Equation

2 -1 -1 2 -1 -1 2 -1 -1 2 -1 -1 2 L = From experimentation (physical observation) we have: d u(x,t) /d t = d 2 u(x,t)/dx (assume C = 1 for simplicity) Discretize time and space and use implicit approach (backward Euler) to approximate derivative: (u(x,t+1) – u(x,t))/dt = (u(x-h,t+1) – 2*u(x,t+1) + u(x+h,t+1))/h2 u(x,t) = u(x,t+1)+ dt/h2 *(u(x-h,t+1) – 2*u(x,t+1) + u(x+h,t+1)) Let z = dt/h2 and change variables (t to j and x to i) u(:,i) = (I - z *L)* u(:, i+1) Where I is identity and L is Laplacian

Implicit Solution of the Heat Equation

(I + (z/2)*L) * u[:,i+1]= (I - (z/2)*L) *u[:,i] 2 -1 -1 2 -1 -1 2 -1 -1 2 -1 -1 2 L = 2 -1 -1 Graph and “stencil” The previous slide used Backwards Euler, but using the trapezoidal rule gives better numerical properties. This turns into solving the following equation: Again I is the identity matrix and L is: This is essentially solving Poisson’s equation in 1D

2D Implicit Method

4 -1 -1 -1 4 -1 -1 -1 4 -1 -1 4 -1 -1 -1 -1 4 -1 -1 -1 -1 4 -1 -1 4 -1 -1 -1 4 -1 -1 -1 4 L = 4 -1 -1 -1 -1 Graph and “5 point stencil” 3D case is analogous (7 point stencil) Similar to the 1D case, but the matrix L is now Multiplying by this matrix (as in the explicit case) is simply nearest neighbor computation on 2D grid. To solve this system, there are several techniques.

Relation of Poisson to Gravity, Electrostatics

Poisson equation arises in many problems E.g., force on particle at (x,y,z) due to particle at 0 is -(x,y,z)/r3, where r = sqrt(x2 +y2 +z2 ) Force is also gradient of potential V = -1/r = -(d/dx V, d/dy V, d/dz V) = -grad V V satisfies Poisson’s equation (try working this out!)

Algorithms for 2D Poisson Equation (N vars)

2 2 2 Algorithm Serial PRAM Memory #Procs Dense LU N3 N N2 N2 Band LU N2 N N3/2 N Jacobi N2 N N N Explicit Inv. N log N N N Conj.Grad. N 3/2 N 1/2 *log N N N RB SOR N 3/2 N 1/2 N N Sparse LU N 3/2 N 1/2 N*log N N FFT N*log N log N N N Multigrid N log2 N N N Lower bound N log N N PRAM is an idealized parallel model with zero cost communication Reference: James Demmel, Applied Numerical Linear Algebra, SIAM, 1997.

Overview of Algorithms

Sorted in two orders (roughly): from slowest to fastest on sequential machines. from most general (works on any matrix) to most specialized (works on matrices “like” T). Dense LU: Gaussian elimination; works on any N-by-N matrix. Band LU: Exploits the fact that T is nonzero only on sqrt(N) diagonals nearest main diagonal. Jacobi: Essentially does matrix-vector multiply by T in inner loop of iterative algorithm. Explicit Inverse: Assume we want to solve many systems with T, so we can precompute and store inv(T) “for free”, and just multiply by it (but still expensive). Conjugate Gradient: Uses matrix-vector multiplication, like Jacobi, but exploits mathematical properties of T that Jacobi does not. Red-Black SOR (successive over-relaxation): Variation of Jacobi that exploits yet different mathematical properties of T. Used in multigrid schemes. LU: Gaussian elimination exploiting particular zero structure of T. FFT (fast Fourier transform): Works only on matrices very like T. Multigrid: Also works on matrices like T, that come from elliptic PDEs. Lower Bound: Serial (time to print answer); parallel (time to combine N inputs). Details in class notes and www.cs.berkeley.edu/~demmel/ma221.

Mflop/s Versus Run Time in Practice

Problem: Iterative solver for a convection-diffusion problem; run on a 1024-CPU NCUBE-2. Reference: Shadid and Tuminaro, SIAM Parallel Processing Conference, March 1991. Solver Flops CPU Time Mflop/s Jacobi 3.82x1012 2124 1800 Gauss-Seidel 1.21x1012 885 1365 Least Squares 2.59x1011 185 1400 Multigrid 2.13x109 7 318 Which solver would you select?

Summary of Approaches to Solving PDEs

As with ODEs, either explicit or implicit approaches are possible Explicit, sparse matrix-vector multiplication Implicit, sparse matrix solve at each step Direct solvers are hard (more on this later) Iterative solves turn into sparse matrix-vector multiplication Grid and sparse matrix correspondence: Sparse matrix-vector multiplication is nearest neighbor “averaging” on the underlying mesh Not all nearest neighbor computations have the same efficiency Factors are the mesh structure (nonzero structure) and the number of Flops per point.

Comments on practical meshes

Regular 1D, 2D, 3D meshes Important as building blocks for more complicated meshes Practical meshes are often irregular Composite meshes, consisting of multiple “bent” regular meshes joined at edges Unstructured meshes, with arbitrary mesh points and connectivities Adaptive meshes, which change resolution during solution process to put computational effort where needed

Parallelism in Regular meshes

Implemented using “ghost” regions. Adds memory overhead Computing a Stencil on a regular mesh need to communicate mesh points near boundary to neighboring processors. Often done with ghost regions Surface-to-volume ratio keeps communication down, but Still may be problematic in practice

Showing 1 - 20 of 34 items Details

Name: 
7pde
Author: 
N/A
Company: 
N/A
Description: 
Sources of Parallelism and Locality in Simulation
Tags: 
mesh | matrix | time | parallel | jacobi | vector | equat | explicit
Created: 
1/20/1997 7:06:50 AM
Slides: 
34
Views: 
3
Downloads: 
0
Rating: 
0


> Comment



Share this presentation
|

Comments

Share this presentation:

|
Sitemap