CALCOLO SCIENTIFICO (PARALLELO) Prof. Luca F. Pavarino
Dipartimento di Matematica
Universita` di Milano
a.a. 2005-2006
pavarino@mat.unimi.it, http://www.mat.unimi.it/~pavarino Corso di Laurea Magistrale e Dottorati in Matematica Applicata
CALCOLO SCIENTIFICO (PARALLELO) Prof. Luca F. Pavarino
Dipartimento di Matematica
Universita` di Milano
a.a. 2005-2006
pavarino@mat.unimi.it, http://www.mat.unimi.it/~pavarino Corso di Laurea Magistrale e Dottorati in Matematica Applicata
Struttura del corso
hardware software Orario
Lunedi` 10.30-12.30 Aula 2
Martedi` 14.30-12.30 Aula 2
Giovedi` 13.30-14.30 Aula 8
13 settimane (66 ore = 40 lezione + 26 laboratorio)
Laboratorio in Aula 2: esercitazioni con
- Nostro Cluster Linux (ulisse.mat.unimi.it), 72 processori
- Cluster Linux del Cilea (avogadro.cilea.it), 256 processori
(IBM SP5 del Cineca (sp5.sp.cineca.it), 512 processori)
(Cluster Linux del Cineca (clx.cineca.it), 1024 processori)
Uso della libreria standard per “message passing” MPI
Uso della libreria parallela di calcolo scientifico PETSc dell’Argonne National Lab., basata su MPI
Materiale e Testi
Slides in inglese basate su corsi di calcolo parallelo tenuti a
UC Berkeley da Jim Demmel, MIT da Alan Edelmann,
Univ. Illinois da Michael Heath
Possibile testo: L. R. Scott, T. Clark, B. Bagheri,
Scientific Parallel Computing,
Princeton University Press, 2005
Tantissimo materiale on-line, e.g.:
- www-unix.mcs.anl.gov/dbpp/ (Ian Foster’s book)
www.cs.berkeley.edu/~demmel/ (Demmel’s course)
www-math.mit.edu/~edelman/ (Edelman’s course)
www.cse.uiuc.edu/~heath/ (Heath’s course)
www.cs.rit.edu/~ncs/parallel.html (Nan’s ref page)
Schedule of Topics
Introduction
Parallel architectures
Networks
Parallel Algorithm Design
Performance modeling: speedup, efficiency, scalability
Dependences
Parallel languages and programming
Collective operations, MPI (Message Passing Interface), PETSc
Parallel Linear Algebra: products, linear systems (direct and iterative methods), QR, eigenvalues, SVD
FFT, nonlinear equations, Ordinary Differential Equations
Particle methods
Partial Differential Equations, Domain Decomposition methods
Sorting
Grid and Distributed Computing, future trends (quantum computing, DNA computing,...)
1) Introduction
What is parallel computing
Large important problems require powerful computers
Why powerful computers must be parallel processors
Why writing (fast) parallel programs is hard
Principles of parallel computing performance
What is parallel computing
It is an example of parallel processing:
division of task (process) into smaller tasks (processes)
assign smaller tasks to multiple processing units that work simultaneously
coordinate, control and monitor the units
Many examples from nature:
human brain consists of ~10^11 neurons
complex living organisms consist of many cells (although monocellular organism are estimated to be ½ of the earth biomass)
leafs of trees ...
Many examples from daily life:
highways tollbooths
supermarket cashiers
building construction
written exams ...
Parallel computing is the use of multiple processors to execute different parts of the same program (task) simultaneously
Main goals of parallel computing are:
Increase the size of problems that can be solved
bigger problem would not be solvable on a serial computer in a reasonable amount of time decompose it into smaller problems
bigger problem might not fit in the memory of a serial computer distribute it over the memory of many computer nodes
Reduce the “wall-clock” time to solve a problem
Solve (much) bigger problems (much) faster
Subgoal: save money using cheapest available resources (clusters, beowulf, grid computing,...)
Why we need powerful computers
Simulation: The Third Pillar of Science
Traditional scientific and engineering paradigm:
Do theory or paper design.
Perform experiments or build system.
Limitations:
Too difficult -- build large wind tunnels.
Too expensive -- build a throw-away passenger jet.
Too slow -- wait for climate or galactic evolution.
Too dangerous -- weapons, drug design, climate experimentation.
Computational science paradigm:
Use high performance computer systems to simulate the phenomenon
Based on known physical laws and efficient numerical methods.
Some Particularly Challenging Computations
Science
Global climate modeling
Astrophysical modeling
Biology: Genome analysis; protein folding (drug design)
Medicine: cardiac modeling, physiology, neurosciences
Engineering
Crash simulation
Semiconductor design
Earthquake and structural modeling
Business
Financial and economic modeling
Transaction processing, web services and search engines
Defense
Nuclear weapons -- test by simulations
Cryptography
$5B World Market in Technical Computing
Source: IDC 2004, from NRC Future of Supercomputer Report
Units of Measure in HPC
High Performance Computing (HPC) units are:
Flops: floating point operations
Flops/s: floating point operations per second
Bytes: size of data (a double precision floating point number is 8)
Typical sizes are millions, billions, trillions…
Mega Mflop/s = 106 flop/sec Mbyte = 220 = 1048576 ~ 106 bytes
Giga Gflop/s = 109 flop/sec Gbyte = 230 ~ 109 bytes
Tera Tflop/s = 1012 flop/sec Tbyte = 240 ~ 1012 bytes
Peta Pflop/s = 1015 flop/sec Pbyte = 250 ~ 1015 bytes
Exa Eflop/s = 1018 flop/sec Ebyte = 260 ~ 1018 bytes
Zetta Zflop/s = 1021 flop/sec Zbyte = 270 ~ 1021 bytes
Yotta Yflop/s = 1024 flop/sec Ybyte = 280 ~ 1024 bytes
Add definition of flop, where are current machines (size speed)
Ex. 1: Global Climate Modeling Problem
Uses:
Predict major events, e.g., El Nino
Use in setting air emissions standards
Source: http://www.epm.ornl.gov/chammp/chammp.html Problem is to compute:
f(latitude, longitude, elevation, time)
temperature, pressure, humidity, wind velocity
Atmospheric model: equation of fluid dynamics
Navier-Stokes system of nonlinear partial differential equations
Approach:
Discretize the domain, e.g., a measurement point every 1km
Devise an algorithm to predict weather at time t+1 given t
Global Climate Modeling Computation
One piece is modeling the fluid flow in the atmosphere
Solve Navier-Stokes problem
Roughly 100 Flops per grid point with 1 minute timestep
Computational requirements:
To match real-time, need 5x 1011 flops in 60 seconds ~ 8 Gflop/s
Weather prediction (7 days in 24 hours) 56 Gflop/s
Climate prediction (50 years in 30 days) 4.8 Tflop/s
To use in policy negotiations (50 years in 12 hours) 288 Tflop/s
To double the grid resolution, computation is at least 8x
State of the art models require integration of atmosphere, ocean, sea-ice, land models, plus possibly carbon cycle, geochemistry and more
Current models are coarser than this
Climate Modeling on the Earth Simulator System Development of ES started in 1997 in order to make a comprehensive understanding of global environmental changes such as global warming. 26.58Tflops was obtained by a global atmospheric circulation code. 35.86Tflops (87.5% of the peak performance) is achieved in the Linpack benchmark. Its construction was completed at the end of February, 2002 and the practical operation started from March 1, 2002
Ex. 2: Cardiac simulation
Very difficult problem spanning many disciplines:
Electrophysiology (spreading of electrical excitation front)
Structural Mechanics (large deformation of incompressible biomaterial)
Fluid Dynamics (flow of blood inside the heart)
Large-scale simulations in computational electrophysiology (joint work with P. Colli-Franzone)
Bidomain model (system of 2 reaction-diffusion equations) coupled with Luo-Rudy 1 gating (system of 7 ODEs) in 3D
Q1 finite elements in space + adaptive semi-implicit method in time
Parallel solver based on PETSc library
Linear systems up to 36 M unknowns each time-step (128 procs of Cineca SP4) solved in seconds or minutes
Simulation of full heartbeat (4 M unknowns in space, thousands of time-steps) took more than 6 days on 25 procs of Cilea HP Superdome, now down to about 50 hours on 36 procs of our cluster
3D simulations: isochrones of acti, repo, APD
Activation and repolarization fronts
Hemodynamics in circulatory system (work in Quarteroni’s group)
Blood flow in the heart (work by Peskin’s group)
Modeled as an elastic structure in an incompressible fluid.
The “immersed boundary method” due to Peskin and McQueen.
20 years of development in model
Many applications other than the heart: blood clotting, inner ear, paper making, embryo growth, and others
Use a regularly spaced mesh (set of points) for evaluating the fluid
- Uses
Current model can be used to design artificial heart valves
Can help in understand effects of disease (leaky valves)
Related projects look at the behavior of the heart during a heart attack
Ultimately: real-time clinical work
Needs more features:
Electrical model of the heart, and details of muscles fibers,
Circulatory systems
Lungs This involves solving Navier-Stokes equations
64^3 was possible on Cray YMP, but 128^3 required for accurate model (would have taken 3 years).
Done on a Cray C90 -- 100x faster and 100x more memory
Until recently, limited to vector machines
Comments