Evacuate Now? Faster-than-real-time Shallow Water Simulations on GPUs NVIDIA GPU Technology Conference San Jose, California, 2010 André R. Brodtkorb


Evacuate Now?

Faster-than-real-time

Shallow Water Simulations on GPUs

NVIDIA GPU Technology Conference

San Jose, California, 2010

André R. Brodtkorb


Talk Outline

Introduction

Why Shallow Water Simulations?

The Shallow Water Equations

Numerical scheme

Our contribution

Simulator Implementation

Results including screen capture video

Live Demo on a standard Laptop

Summary

Learn how to simulate a half-hour dam break in 27 seconds

The Shallow Water Equations

First described by de Saint-Venant (1797-1886)

Gravity-induced fluid motion

2D free surface

Negligible vertical acceleration

Wavelength much larger than depth

Conservation of mass and momentum

Not only for water:

Atmospheric flow

Avalanches

...

Water image from http://freephoto.com / Ian Britton

Target application areas

Floods: 2010 Pakistan (2000+ casualties)

Tsunamis: 2004 Indian Ocean (230 000 casualties)

Storm surges: 2005 Hurricane Katrina (1836 casualties)

Dam breaks: 1959 Malpasset (423 casualties)

Images from wikipedia.org


Mathematical Formulation

Terms labelled in the equation:

Vector of conserved variables

Flux functions

Bed slope source term

Bed friction source term
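The equation these labels annotate was an image and did not survive in the transcript. For orientation, a commonly used form of the two-dimensional shallow water system with the same four parts is sketched below. Here h is water depth, hu and hv are the discharges, B is the bed elevation and g is gravity. The Manning-type friction term with coefficient n is an assumption; the slide does not say which friction law is used.

```latex
\underbrace{\begin{bmatrix} h \\ hu \\ hv \end{bmatrix}_t}_{\text{conserved variables}}
+ \underbrace{\begin{bmatrix} hu \\ hu^2 + \tfrac{1}{2} g h^2 \\ huv \end{bmatrix}_x
+ \begin{bmatrix} hv \\ huv \\ hv^2 + \tfrac{1}{2} g h^2 \end{bmatrix}_y}_{\text{flux functions}}
= \underbrace{\begin{bmatrix} 0 \\ -g h B_x \\ -g h B_y \end{bmatrix}}_{\text{bed slope}}
+ \underbrace{\begin{bmatrix} 0 \\ -g n^2 u \sqrt{u^2 + v^2} \, / \, h^{1/3} \\ -g n^2 v \sqrt{u^2 + v^2} \, / \, h^{1/3} \end{bmatrix}}_{\text{bed friction (assumed Manning form)}}
```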


The Shallow Water Equations

Water depth, discharge (u), and discharge (v)

Explicit Numerical Schemes

Hyperbolic partial differential equation

Enables explicit schemes

Accurate modeling of discontinuities / shocks

High accuracy in smooth parts without oscillations near discontinuities

Capable of representing dry states

Negative water depths ruin simulations

Images from wikipedia.org, James Kilfiger

Explicit Numerical Schemes

Additional desired properties:

Second-order accurate fluxes

Total variation diminishing

Well-balancedness

Kurganov-Petrova – Spatial discretization

Write in vector form

Impose finite-volume grid

Rewrite in terms of w=h+B


Kurganov-Petrova – Finite Volume Grid

Q defined as cell averages

B defined as piecewise bilinear

F and G calculated across cell interfaces

Source terms, H, calculated as cell averages


Kurganov-Petrova – Flux calculations

Figure panels:

Continuous variables

Discrete variables

Dry states fix

Slope reconstruction

Integration points

Flux calculation
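The final panel, the flux calculation, can be illustrated with a central-upwind flux of the Kurganov-Petrova type. The sketch below is a simplified one-dimensional version in double precision, without the bed term or the dry states fix; the names State, Flux, physicalFlux and centralUpwindFlux are hypothetical and not taken from the talk.

```cpp
#include <algorithm>
#include <cmath>

struct State { double h, hu; };            // water depth and discharge
struct Flux  { double mass, momentum; };

const double kGravity = 9.81;

// Physical flux F(Q) of the 1D shallow water equations.
// Assumption: h > 0, since no dry-state handling is included here.
Flux physicalFlux(const State& q) {
    double u = q.hu / q.h;
    return { q.hu, q.hu * u + 0.5 * kGravity * q.h * q.h };
}

// Central-upwind flux across one cell interface.
// qm: state reconstructed from the left cell, qp: from the right cell.
Flux centralUpwindFlux(const State& qm, const State& qp) {
    double um = qm.hu / qm.h, up = qp.hu / qp.h;
    double cm = std::sqrt(kGravity * qm.h), cp = std::sqrt(kGravity * qp.h);

    // One-sided local speeds of propagation (largest and smallest eigenvalues).
    double ap = std::max({um + cm, up + cp, 0.0});
    double am = std::min({um - cm, up - cp, 0.0});

    Flux fm = physicalFlux(qm), fp = physicalFlux(qp);
    double inv = 1.0 / (ap - am);          // ap > am whenever h > 0
    double diffusion = ap * am * inv;

    return {
        (ap * fm.mass - am * fp.mass) * inv + diffusion * (qp.h - qm.h),
        (ap * fm.momentum - am * fp.momentum) * inv + diffusion * (qp.hu - qm.hu)
    };
}
```

For equal states on both sides the expression reduces to the physical flux, and for supercritical flow it reduces to pure upwinding.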


Kurganov-Petrova – Temporal discretization


Gather all explicit terms

One ordinary differential equation in time per cell
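Written out, gathering the explicit terms gives a semi-discrete system of roughly the following shape. It uses the symbols Q, F, G and H from the finite-volume grid slide; the index notation and the shorthand R for the right-hand side are assumptions.

```latex
\frac{dQ_{ij}}{dt}
= H(Q_{ij})
- \frac{F_{i+1/2,\,j} - F_{i-1/2,\,j}}{\Delta x}
- \frac{G_{i,\,j+1/2} - G_{i,\,j-1/2}}{\Delta y}
\equiv R(Q)_{ij}
```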


Kurganov-Petrova – Temporal discretization

Discretize using second-order Runge-Kutta

Total variation diminishing

Semi-implicit friction source term

Discretize in time


Kurganov-Petrova – CFL condition

Explicit scheme, time step restriction:

Time step size restricted by a Courant-Friedrichs-Lewy condition

The numerical domain of dependence must include the domain of dependence of the equation

Each wave is allowed to travel at most one quarter grid cell per time step
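The quarter-cell rule can be turned into a time step as sketched below. The formula itself was an image on the slide; the form used here, with wave speeds |u| + sqrt(g h) and |v| + sqrt(g h), is an assumption, and cflTimeStep is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Largest time step for which no wave crosses more than a quarter cell.
// Assumed form:
//   dt = 1/4 * min( dx / max(|u| + sqrt(g h)), dy / max(|v| + sqrt(g h)) )
// h: water depth, u and v: velocities, one entry per cell.
double cflTimeStep(const std::vector<double>& h, const std::vector<double>& u,
                   const std::vector<double>& v, double dx, double dy,
                   double g = 9.81) {
    double maxX = 0.0, maxY = 0.0;
    for (std::size_t i = 0; i < h.size(); ++i) {
        double c = std::sqrt(g * std::max(h[i], 0.0));   // gravity wave speed
        maxX = std::max(maxX, std::fabs(u[i]) + c);
        maxY = std::max(maxY, std::fabs(v[i]) + c);
    }
    // A completely dry, still domain imposes no restriction.
    const double inf = std::numeric_limits<double>::infinity();
    double dtX = maxX > 0.0 ? dx / maxX : inf;
    double dtY = maxY > 0.0 ? dy / maxY : inf;
    return 0.25 * std::min(dtX, dtY);
}
```

With g = 1, h = 4, u = 3, v = 0 and 10 m cells, the fastest wave moves at 5 m/s in x, so the step is 0.25 * 10 / 5 = 0.5 s.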

Figure: space-time diagram of the mathematical propagation speed, showing stable and unstable time steps (axes: space, time)

Kurganov-Petrova – Simulation cycle

1. Calculate fluxes

2. Calculate Δt

3. Half step

4. Calculate fluxes

5. Evolve in time

6. Boundary conditions

Implementation – GPU code

Four CUDA kernels:

87% Flux

<1% Timestep size (CFL condition)

12% Forward Euler step

<1% Set boundary conditions

Flux kernel – Domain decomposition

A nine-point nonlinear stencil

Composed of simpler stencils

Heavy use of shared memory

Computationally demanding

Traditional block decomposition

Overlapping ghost cells (a.k.a. apron)

Global ghost cells for boundary conditions

Domain padding


Flux kernel – Block size

Block size is 16x14

Warp size: block size is a multiple of 32 (16 x 14 = 224 = 7 x 32 threads)

Shared memory use: 16 shared memory buffers use ~16 KB

Occupancy

Use 48 KB shared mem, 16 KB cache

Three resident blocks (48 KB / ~16 KB per block)

Trades cache for occupancy

Fermi cache

Global memory access

Flux kernel - computations

Calculations

Flux across north and east interface

Bed slope source term for the cell

Collective stencil operations

n threads and n+1 interfaces: one warp performs extra calculations!

Alternative is one thread per stencil operation

Many idle threads, and extra register pressure

Stages shown in the figure: input, slopes, integration points, flux

Flux kernel – flux limiter

Limits the fluxes to obtain a non-oscillatory solution

Generalized minmod limiter

Least steep slope, or

Zero if signs differ

Creates divergent code paths

Use branchless implementation (2007)

Requires special sign function

Much faster than naïve approach

(2007) T. Hagen, M. Henriksen, J. Hjelmervik, and K.-A. Lie. How to solve systems of conservation laws numerically using the graphics processor as a high-performance computational engine. In Geometrical Modeling, Numerical Simulation, and Optimization: Industrial Mathematics at SINTEF, pp. 211–264. Springer Verlag, 2007.

// Branchless minmod: the sign factors multiply to zero unless a, b and c share a sign
float minmod(float a, float b, float c) {
    return 0.25f
        *sign(a)
        *(sign(a) + sign(b))
        *(sign(b) + sign(c))
        *min( min(abs(a), abs(b)), abs(c) );
}
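The slide does not show the special sign function. The sketch below pairs the same expression with one common branchless formulation of sign, which is an assumption, and adds a branching reference version for comparison. The names signBranchless, minmod3 and minmod3Branching are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Branchless sign: returns +1, 0 or -1. Each comparison evaluates to 0 or 1.
// Assumption: the talk's own sign function may be implemented differently.
inline float signBranchless(float x) {
    return static_cast<float>((x > 0.0f) - (x < 0.0f));
}

// Same expression as the slide's minmod, written against the C++ standard library.
inline float minmod3(float a, float b, float c) {
    return 0.25f
        * signBranchless(a)
        * (signBranchless(a) + signBranchless(b))
        * (signBranchless(b) + signBranchless(c))
        * std::min(std::min(std::fabs(a), std::fabs(b)), std::fabs(c));
}

// Reference version with explicit branches: the least steep slope if all
// three arguments share a sign, zero otherwise.
inline float minmod3Branching(float a, float b, float c) {
    if (a > 0.0f && b > 0.0f && c > 0.0f) return std::min(std::min(a, b), c);
    if (a < 0.0f && b < 0.0f && c < 0.0f) return std::max(std::max(a, b), c);
    return 0.0f;
}
```

When all three signs agree, the sign factors multiply to ±4, which the leading 0.25 cancels; any sign change makes one of the sums zero.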


Timestep size kernel

Flux kernel calculates wave speed per cell

Find global maximum

Calculate timestep using the CFL condition

Parallel reduction:

Modeled on the CUDA SDK reduction sample

Template code

Fully coalesced reads

Without bank conflicts

Optimization

Perform partial reduction in flux kernel

Reduces memory and bandwidth by a factor of 192
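The two-stage idea, a partial maximum per block followed by a global reduction, can be sketched on the host as below. This is a serial stand-in for the GPU pattern, not the talk's kernel; treeMax and globalMaxSpeed are hypothetical names, and the block size is a free parameter.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Tree reduction to a maximum: each pass folds the upper half of the active
// range onto the lower half, as a shared-memory GPU reduction would.
// Returns 0 for an empty input (wave speeds are non-negative).
float treeMax(std::vector<float> data) {
    std::size_t n = data.size();
    while (n > 1) {
        std::size_t half = (n + 1) / 2;
        for (std::size_t i = 0; i + half < n; ++i)
            data[i] = std::max(data[i], data[i + half]);
        n = half;
    }
    return data.empty() ? 0.0f : data[0];
}

// Stage 1: one maximum per block of blockSize cells (the partial reduction).
// Stage 2: reduce the much shorter list of per-block maxima.
// Assumption: blockSize > 0.
float globalMaxSpeed(const std::vector<float>& speeds, std::size_t blockSize) {
    std::vector<float> perBlock;
    for (std::size_t start = 0; start < speeds.size(); start += blockSize) {
        std::size_t end = std::min(start + blockSize, speeds.size());
        perBlock.push_back(
            treeMax(std::vector<float>(speeds.begin() + start, speeds.begin() + end)));
    }
    return treeMax(perBlock);
}
```

The second stage only reads one value per block, which is where the saving in memory and bandwidth comes from.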

Image from "Optimizing Parallel Reduction in CUDA", Mark Harris

Figure labels: 16x14, 1

Time integration kernel

Computes Q* or Q^(n+1)

Solves the time-ODE per cell

”Trivial” to implement

Fully coalesced memory access

Memory bound


Boundary conditions kernel

Global boundary uses ghost cells

Fixed inlet / outlet discharge

Fixed depth

Reflecting

Outflow/Absorbing

Currently no mixed boundaries

Can also supply hydrograph

Tsunamis

Storm surges

Tidal waves

Figure labels: global boundary, local ghost cells

Example hydrographs: 3.5 m tsunami, 1 h; 10 m storm surge, 4 d

Boundary conditions kernel


Similar to CUDA SDK reduction sample, using templates:

One block sets all four boundaries

Boundary length (>64, >128, >256, >512)

Boundary type (”none”, reflecting, fixed depth, fixed discharge, absorbing outlet)

In total: 4*5*5*5*5 = 2500 realizations

switch(block.x) {
    case 512: BCKernelLauncher<512, N, S, E, W>(grid, block, stream); break;
    case 256: BCKernelLauncher<256, N, S, E, W>(grid, block, stream); break;
    case 128: BCKernelLauncher<128, N, S, E, W>(grid, block, stream); break;
    case  64: BCKernelLauncher< 64, N, S, E, W>(grid, block, stream); break;
}
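How a runtime choice becomes a template argument can be sketched in plain C++ as below. For brevity only one boundary side is dispatched, and the launcher is a stand-in that merely reports which specialization was picked; BoundaryType, launchNorth, dispatchType and dispatch are hypothetical names, not the talk's code.

```cpp
enum BoundaryType {
    BC_NONE, BC_REFLECTING, BC_FIXED_DEPTH, BC_FIXED_DISCHARGE, BC_ABSORBING
};

// Hypothetical stand-in for a templated kernel launcher. It returns an id
// (threads * 10 + boundary type) identifying the selected specialization.
template <int Threads, BoundaryType North>
int launchNorth() { return Threads * 10 + static_cast<int>(North); }

// Map the runtime boundary type to a compile-time template argument.
template <int Threads>
int dispatchType(BoundaryType north) {
    switch (north) {
        case BC_NONE:            return launchNorth<Threads, BC_NONE>();
        case BC_REFLECTING:      return launchNorth<Threads, BC_REFLECTING>();
        case BC_FIXED_DEPTH:     return launchNorth<Threads, BC_FIXED_DEPTH>();
        case BC_FIXED_DISCHARGE: return launchNorth<Threads, BC_FIXED_DISCHARGE>();
        case BC_ABSORBING:       return launchNorth<Threads, BC_ABSORBING>();
    }
    return -1;
}

// Map the runtime block size to a template argument, as in the slide's switch.
int dispatch(int threads, BoundaryType north) {
    switch (threads) {
        case 512: return dispatchType<512>(north);
        case 256: return dispatchType<256>(north);
        case 128: return dispatchType<128>(north);
        case 64:  return dispatchType<64>(north);
    }
    return -1;  // unsupported block size
}
```

Each extra template parameter multiplies the number of instantiations, which is how four block sizes and five types on each of four sides reach 4*5*5*5*5 = 2500.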


Optimization: Early exit

Observation: Many dry areas do not require computation

Use a small buffer to store wet blocks

Exit flux kernel if nearest neighbours are dry

Up to 6x speedup

Blocks still have to be scheduled

Blocks read the auxiliary buffer

One wet cell marks the whole block as wet
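A host-side sketch of the early-exit test. It assumes that "nearest neighbours" means the four edge-adjacent blocks and that blocks outside the domain count as dry; WetMap and canExitEarly are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Auxiliary buffer with one flag per thread block (1 = contains a wet cell).
struct WetMap {
    int blocksX, blocksY;
    std::vector<unsigned char> wet;

    WetMap(int bx, int by)
        : blocksX(bx), blocksY(by), wet(static_cast<std::size_t>(bx) * by, 0) {}

    // Blocks outside the domain are treated as dry (an assumption).
    bool isWet(int bx, int by) const {
        if (bx < 0 || by < 0 || bx >= blocksX || by >= blocksY) return false;
        return wet[static_cast<std::size_t>(by) * blocksX + bx] != 0;
    }

    // One wet cell is enough to mark the whole block as wet.
    void markWet(int bx, int by) {
        wet[static_cast<std::size_t>(by) * blocksX + bx] = 1;
    }
};

// A block can skip the flux computation when it and its four nearest
// neighbours are all dry.
bool canExitEarly(const WetMap& m, int bx, int by) {
    return !(m.isWet(bx, by) ||
             m.isWet(bx - 1, by) || m.isWet(bx + 1, by) ||
             m.isWet(bx, by - 1) || m.isWet(bx, by + 1));
}
```

Checking the neighbours as well as the block itself keeps water from being ignored when it is about to flow in across a block edge.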


Results - Performance

Circular Dam break

1st order Euler

30% wet cells: 1200 megacells / s

50% wet cells: 900 megacells / s

100% wet cells: 300 megacells / s

2nd order Runge-Kutta

30% wet cells: 600 megacells / s

50% wet cells: 450 megacells / s

100% wet cells: 150 megacells / s


Results – Multiple GPUs

Single-node multi-GPU

Four Tesla GPUs

Threading

Near-perfect weak scaling

Near-perfect strong scaling

Up to 380 million cells (16 GB)

19 000 x 19 000 cells


Verification

2D Parabolic basin

Planar water surface oscillates

100 x 100 cells

Horizontal scale: 8 km

Vertical scale: 3.3 m

Simulation and analytical solution match well

But, as with most schemes, errors grow along the wet-dry interface

Validation – Barrage de Malpasset

South-east France near Fréjus

Bursts at 21:13 December 2nd 1959

40-meter-high wall of water

70 km/h (43 mi/h)

Reaches the Mediterranean in 30 minutes

423 casualties, $68 million in damages

Images from Google Maps, TeraMetrics

Double curvature dam

66.5 m high

220 m crest length

55 million cubic metres of water


Validation

Experimental data from 1:400 model

482 000 cells

1100 x 440 bathymetry values

15 meter resolution

Accurately predicts maximum elevation and front arrival time

Largest discrepancy at gauges 14 (arrival time) and 9 (elevation)

Compares well with published results


Implementation – CPU framework

Simulation loop executed by CPU

Output to netCDF

Direct visualization via OpenGL


Live Demo

Dell XPS m1330, Flamingo Pink

Purchased 09-2008, price ~$1850

Intel Core 2 Duo T9300 @ 2.5 GHz

4.0 GB RAM

NVIDIA GeForce 8400M GS

128 MB graphics RAM

Only 16 CUDA cores (GTX 480 has 448)

Windows Vista Ultimate SP2 32-bit

CUDA toolkit/SDK 3.1 32-bit

CUDA Driver 257.21

Microsoft Visual Studio 2008

Images from dell.com


Summary

Faster than real-time performance

150-1200 megacells per second

Verified and validated results

Can accurately predict real-world events using single precision

Direct visualization

Interactive exploration of simulation results

Learn how to simulate a half-hour dam break in seconds

References

A. R. Brodtkorb, T. R. Hagen, K.-A. Lie, and J. R. Natvig. Simulation and Visualization of the Saint-Venant System using GPUs. Computing and Visualization in Science, special issue on Hot topics in Computational Engineering, 2010 [forthcoming].

A. R. Brodtkorb, M. L. Sætra, and M. Altinakar. Efficient Shallow Water Simulations on GPUs: Implementation, Visualization, Verification, and Validation. In review, 2010.

A. R. Brodtkorb. Scientific Computing on Heterogeneous Architectures. Ph.D. thesis, University of Oslo. Submitted, 2010.


Thank you for your attention.

Questions?

[email protected]

http://babrodtk.at.ifi.uio.no

http://hetcomp.com