Newest Viewed Downloaded

David E. Keyes Center for Computational Science Old Dominion University Institute for Computer Applications in Science & Engineering NASA Langley Research Center Institute for Scientific Computing Research Lawrence Livermore National Laboratory Algorithms for Cluster Computing Based on Domain Decomposition

David E. Keyes Center for Computational Science Old Dominion University Institute for Computer Applications in Science & Engineering NASA Langley Research Center Institute for Scientific Computing Research Lawrence Livermore National Laboratory Algorithms for Cluster Computing Based on Domain Decomposition

Plan of Presentation

Introduction (5) Imperative of domain decomposition and multilevel methods for terascale computing (5) Basic and “advanced” domain decomposition and multilevel algorithmic concepts (14) Vignettes of domain decomposition and multilevel performance (13) (with a little help from my friends…) Agenda for future research (6) Terascale Optimal PDE Simulations (TOPS) software project (14) Conclusions (4)

Terascale simulation has been “sold”

Environment global climate groundwater flow Lasers & Energy combustion ICF modeling Engineering structural dynamics electromagnetics Experiments expensive Chemistry & Bio materials modeling drug design Experiments controversial Applied Physics radiation transport hydrodynamics Experiments prohibited Scientific Simulation Experiments dangerous However, it is far from proven! To meet expectations, we need to handle problems of multiple physical scales.

Large platforms have been provided

‘97 ‘98 ‘99 ‘00 ‘01 ‘02 ‘03 ‘04 ‘05 ‘06 100+ Tflop / 30 TB Time (CY) Capability 1+ Tflop / 0.5 TB Plan Develop Use 30+ Tflop / 10 TB Red 3+ Tflop / 1.5 TB Blue 10+ Tflop / 4 TB White 50+ Tflop / 25 TB ASCI program of the U.S. DOE has roadmap to go to 100 Tflop/s by 2006 www.llnl.gov/asci/platforms Sandia Los Alamos Livermore Livermore

NSF’s 13.6 TF TeraGrid coming on line

26 24 8 4 HPSS 5 HPSS HPSS UniTree External Networks External Networks External Networks External Networks Site Resources Site Resources Site Resources Site Resources SDSC 4.1 TF 225 TB Caltech NCSA/PACI 8 TF 240 TB Argonne TeraGrid: NCSA, SDSC, Caltech, Argonne www.teragrid.org

Algorithmic requirements from architecture

T3E “horizontal” aspects “vertical” aspects message passing, shared memory threads register blocking, cache blocking, prefetching Must run on physically distributed memory units connected by message-passing network, each serving one or more processors with multiple levels of cache

Building platforms is the “easy” part

Domain decomposition also “natural” for software engineering Fortunate that mathematicians built up its theory in advance of requirements! Algorithms must be highly concurrent and straightforward to load balance latency tolerant cache friendly (good temporal and spatial locality) highly scalable (in the sense of convergence) Domain decomposed multilevel methods “natural” for all of these

Decomposition strategies for Lu=f in 

Consider the implicitly discretized parabolic case Operator decomposition Function space decomposition Domain decomposition

Operator decomposition

Iteration matrix consists of four multiplicative substeps per timestep two sparse matrix-vector multiplies two sets of unidirectional bandsolves Parallelism within each substep But global data exchanges between bandsolve substeps Consider ADI

Function space decomposition

System of ordinary differential equations Perhaps are diagonal matrices Perfect parallelism across spectral index But global data exchanges to transform back to physical variables at each step Consider a spectral Galerkin method

Domain decomposition

Solve by a Krylov method Matrix-vector multiplies with parallelism on each subdomain nearest-neighbor exchanges, global reductions possible small global system (not needed for parabolic case) = Consider restriction and extension operators for subdomains, , and for possible coarse grid, Replace discretized with

Comparison

(Of course, domain decomposition can be interpreted as a special operator or function space decomposition) Operator decomposition (ADI) natural row-based assignment requires all-to-all, bulk data exchanges in each step (for transpose) Function space decomposition (Fourier) Natural mode-based assignment requires all-to-all, bulk data exchanges in each step (for transform) Domain decomposition (Schwarz) Natural domain-based assignment requires local (nearest neighbor) data exchanges, global reductions, and optional small global problem

Theoretical Scaling of Domain Decomposition for Three Common Network Topologies

With tree-based (logarithmic) global reductions and scalable nearest neighbor hardware: optimal number of processors scales linearly with problem size (scalable, assumes one subdomain per processor) With 3D torus-based global reductions and scalable nearest neighbor hardware: optimal number of processors scales as three-fourths power of problem size (almost scalable) With common bus network (heavy contention): optimal number of processors scales as one-fourth power of problem size (not scalable) bad news for conventional Beowulf clusters, but see 2000 Bell Prize “price-performance awards” using multiple NICs per Beowulf node!

Basic Concepts

“Advanced” Concepts Polynomial combinations of Schwarz projections Schwarz-Schur combinations Neumann-Neumann/FETI (Schwarz on Schur) LNKS (Schwarz inside Schur) Nonlinear Schwarz Iterative correction Schwarz preconditioning Schur preconditioning

Iterative correction

Optimal polynomials of lead to various preconditioned Krylov methods Scale recurrence, e.g., with , leads to multilevel methods The most basic idea in iterative methods Evaluate residual accurately, but solve approximately, where is an approximate inverse to A A sequence of complementary solves can be used, e.g., with first and then one has

Multilevel Preconditioning

smoother Finest Grid First Coarse Grid coarser grid has fewer cells (less work & storage) Restriction transfer from fine to coarse grid Recursively apply this idea until we have an easy problem to solve A Multigrid V-cycle Prolongation transfer from coarse to fine grid

Schwarz Preconditioning

Let Boolean rectangular matrix extract the subset of : Let The Boolean matrices are gather/scatter operators, mapping between a global vector and its subdomain support Given A x = b , partition x into subvectors, corresp. to subdomains of the domain of the PDE, nonempty, possibly overlapping, whose union is all of the elements of

Iteration Count Estimates From the Schwarz Theory

Ο(P1/3) Ο(P1/3) 1-level Additive Schwarz Ο(1) Ο(1) 2-level Additive Schwarz Ο((NP)1/6) Ο((NP)1/4) Domain Jacobi (=0) Ο(N1/3) Ο(N1/2) Point Jacobi in 3D in 2D Preconditioning Type Krylov-Schwarz iterative methods typically converge in a number of iterations that scales as the square-root of the condition number of the Schwarz-preconditioned system In terms of N and P, where for d-dimensional isotropic problems, N=h-d and P=H-d, for mesh parameter h and subdomain diameter H, iteration counts may be estimated as follows:

Schur Preconditioning

Given a partition Condense: Let M be a good preconditioner for S Then is a preconditioner for A Moreover, solves with may be done approximately if all degrees of freedom are retained

Schwarz polynomials

This leads to “Hybrid II” in Smith, Bjorstad & Gropp: Polynomials of Schwarz projections that are combinations of additive and multiplicative may be appropriate for certain implementations We may solve the fine subdomains concurrently and follow with a coarse grid (redundantly/cooperatively)

Showing 1 - 20 of 68 items Details

Name: 
9pde_DD
Author: 
William Gropp
Company: 
Argonne National Labor...
Description: 
David E. Keyes Center for Computational Science Old Dominion University Institute for Computer Applications in Science & Engineering NASA Langley Research Center Institute for Scientific Computing Research Lawrence Livermore National Laboratory Algorithms for Cluster Computing Based on Domain Decomposition
Tags: 
schwarz | solver | decomposit | optim | parallel | newton | petsc | domain
Created: 
9/1/1999 11:44:59 AM
Slides: 
68
Views: 
8
Downloads: 
1
Rating: 
0


> Comment



Share this presentation
|

Comments

Share this presentation:

|
Sitemap