Page 1: AA220/CS238 - Parallel Methods in Numerical Analysis ...adl.stanford.edu/cme342/Lecture_Notes_files/lecture27-03.pdf · Lecture 27 November 26, 2003 AA220/CS238 - Parallel Methods

Lecture 27

November 26, 2003

AA220/CS238 - Parallel Methods in Numerical Analysis

Parallel Visualization in the ASCI Program

Overview

• Visualization of large-scale datasets generated with massively parallel machines is a very compute-intensive task:

– Large datasets

– Usually time-dependent

– Complex solution features yield large I/O requirements

– Floating point operations needed to render the image

• Advancements are required in several areas

– Basic improvements in visualization algorithms

– Parallel implementation of visualization algorithms

– Parallel visualization hardware (scalable and cost-effective)

Overview - Cont’d

• Examples in this lecture drawn from:

– Stanford ASCI work in unsteady turbomachinery flow simulations

– University of Utah Scientific Computing and Imaging Institute

– Collaboration with MIT on parallel pV3

• A number of research groups are working on parallel visualization techniques (both hardware and software):

– Stanford University

– U. of Utah

– DoE National Laboratories

– Etc…

Large-Scale Scientific Visualization

Scientific Computing and Imaging Institute

University of Utah

Chris Johnson

Interactive Large-Scale Visualization

Medical

Scientific Computing

GeoScience

The Visualization Pipeline

[Diagram: visualization process: Generate → Render. Offline stage: Preprocess. Online stages, driven by the isovalue: Search → Construct → Render]

• Dynamic extraction of isosurfaces

• Rapid extractions

Isosurface Extraction

• Marching Cubes

• Octree

• Extrema Graphs

• Sweeping Simplices

• The Span Space

• Livnat, Shen, Johnson

– NOISE: O(√n + k)

[Figure: span space, with axes Minimum and Maximum, the diagonal line min = max, and the isovalue marked]
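The span-space test behind these methods can be sketched in a few lines. Each cell is mapped to the point (min, max) of its corner values, and the cell can contain part of the isosurface only when min ≤ isovalue ≤ max. The brute-force scan below visits all n cells, so it is O(n); it is not the NOISE search structure, which brings the search down to O(√n + k) for k reported cells. The function names and the nested-list grid layout are assumptions made for illustration.

```python
from itertools import product

def cell_span(field, i, j, k):
    """Return (min, max) of the 8 corner values of cell (i, j, k)."""
    corners = [field[i + di][j + dj][k + dk]
               for di, dj, dk in product((0, 1), repeat=3)]
    return min(corners), max(corners)

def active_cells(field, isovalue):
    """List the cells whose span-space point satisfies min <= isovalue <= max."""
    nx, ny, nz = len(field), len(field[0]), len(field[0][0])
    hits = []
    for i, j, k in product(range(nx - 1), range(ny - 1), range(nz - 1)):
        lo, hi = cell_span(field, i, j, k)
        if lo <= isovalue <= hi:
            hits.append((i, j, k))
    return hits

# Hypothetical 4x3x3 grid whose nodal value equals the x index.
field = [[[float(i) for _ in range(3)] for _ in range(3)] for i in range(4)]
print(active_cells(field, 1.5))
```

Only the four cells between x = 1 and x = 2 are reported for an isovalue of 1.5; every other cell lies entirely above or below it.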

The Visualization Pipeline

[Diagram: Offline stage: Preprocess. Online stages, driven by the isovalue and the view point: Search → Construct → Render. Cost annotations on the stages: O(k) and O(V(k))]

• Reduce the amount of data

– Reduce during the search...

A View-dependent Approach

• Attractive for:

– Large datasets

– High depth complexity

– Remote visualization

A View-dependent Approach

• Three-step method

[Diagram: Traverse (front to back) → Project → To graphics hardware]

1) Traverse front to back

2) Project onto a virtual screen

3) Render triangles on graphics hardware
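A minimal sketch of the three steps, under strong simplifying assumptions: each cell is reduced to a depth and an axis-aligned pixel rectangle, and a cell that survives is assumed to cover its whole rectangle on the virtual screen. The real method projects the extracted triangles, so this only illustrates the control flow. All names are hypothetical.

```python
def view_dependent_select(cells, width, height):
    """cells: list of (depth, (x0, y0, x1, y1)) screen-space footprints (half-open).

    Returns the cells that would be handed to the graphics hardware.
    """
    covered = [[False] * width for _ in range(height)]  # the virtual screen
    to_render = []
    # 1) Traverse front to back (smallest depth first).
    for depth, (x0, y0, x1, y1) in sorted(cells):
        pixels = [(x, y)
                  for y in range(max(y0, 0), min(y1, height))
                  for x in range(max(x0, 0), min(x1, width))]
        # 2) Project onto the virtual screen; drop cells that are
        #    off-screen or fully hidden by nearer cells.
        if not pixels or all(covered[y][x] for x, y in pixels):
            continue
        for x, y in pixels:
            covered[y][x] = True
        # 3) The surviving cell goes to the renderer.
        to_render.append((depth, (x0, y0, x1, y1)))
    return to_render

cells = [(5.0, (0, 0, 4, 4)), (1.0, (0, 0, 8, 8)), (3.0, (6, 6, 10, 10))]
print(view_dependent_select(cells, 10, 10))
```

The cell at depth 5.0 is pruned because the nearer cell at depth 1.0 already covers its footprint, while the cell at depth 3.0 is kept because part of it is still uncovered.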

A View-dependent Approach

• Flow chart

[Flow chart, software stages:

– Value space: prune non-intersecting cells

– Object space: front-to-back traversal

– Image space: prune non-visible cells (Visibility Part I)

Hardware stages (graphics engine):

– Z-buffer (Visibility Part II)

– Rendering

Result: final image]

Visible Woman

            Full isosurface    View-dependent

Polys       2,246,000          246,000

Create      177 sec            72 sec

Render      2.32 sec           0.25 sec
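The reduction factors implied by these numbers can be worked out directly. The short calculation below uses only the figures quoted on the slide; the dictionary layout and names are illustrative.

```python
# (full isosurface, view-dependent) pairs as quoted on the slide.
visible_woman = {
    "polygons": (2_246_000, 246_000),
    "create time (s)": (177.0, 72.0),
    "render time (s)": (2.32, 0.25),
}

def reduction_factor(full, view_dependent):
    """Ratio of the full-isosurface figure to the view-dependent figure."""
    return full / view_dependent

for name, (full, vd) in visible_woman.items():
    print(f"{name}: {reduction_factor(full, vd):.2f}x")
```

The polygon count and the render time both drop by roughly a factor of nine, while the creation time improves by about a factor of 2.5.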

Why Not Always Use Polygons?

• Marching cubes and similar algorithms can generate millions of polygons for large data sets

– Reduce by decimation (e.g. Shekhar et al. ’96)

– View-dependent (e.g. Livnat and Hansen ’98)

Real-Time Ray Tracer

Real-Time Ray Tracer (RTRT)

• Implemented on SGI Origin 3000 ccNUMA architecture - up to 512 processors (now working on a distributed version)

• Approximately linear speedup

• Load balancing and memory coherence are key to performance

Algorithm - 3 Phases

• Traversing a ray through cells that do not contain an isosurface

• Analytically computing the isosurface when the intersecting volume contains an isosurface

• Shading the resulting intersection point
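The analytic step in the second phase can be illustrated as follows, assuming the data inside a cell is trilinearly interpolated (an assumption; the slide does not state the interpolant). Along a ray o + t·d the trilinear function is exactly cubic in t, so the hit point is a root of a cubic. The sketch recovers the cubic coefficients from four samples and solves it in closed form. Function names, the unit-cell convention and the tolerances are hypothetical.

```python
import math

def trilinear(c, x, y, z):
    """Trilinear interpolation in the unit cell; c[i][j][k] is the value at corner (i, j, k)."""
    def lerp(a, b, t):
        return a + (b - a) * t
    c00 = lerp(c[0][0][0], c[1][0][0], x)
    c10 = lerp(c[0][1][0], c[1][1][0], x)
    c01 = lerp(c[0][0][1], c[1][0][1], x)
    c11 = lerp(c[0][1][1], c[1][1][1], x)
    return lerp(lerp(c00, c10, y), lerp(c01, c11, y), z)

def _cbrt(v):
    """Real cube root that also works for negative arguments."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def real_roots(a, b, c, d, eps=1e-12):
    """Real roots of a*t^3 + b*t^2 + c*t + d = 0, including degenerate cases."""
    if abs(a) < eps:                       # quadratic or linear
        if abs(b) < eps:
            return [] if abs(c) < eps else [-d / c]
        disc = c * c - 4.0 * b * d
        if disc < 0.0:
            return []
        s = math.sqrt(disc)
        return [(-c - s) / (2.0 * b), (-c + s) / (2.0 * b)]
    b, c, d = b / a, c / a, d / a
    p = c - b * b / 3.0                    # depressed cubic y^3 + p*y + q = 0
    q = 2.0 * b ** 3 / 27.0 - b * c / 3.0 + d
    shift = -b / 3.0                       # t = y + shift
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc >= 0.0:                        # Cardano: one real root (plus a double root if disc == 0)
        s = math.sqrt(disc)
        u, v = _cbrt(-q / 2.0 + s), _cbrt(-q / 2.0 - s)
        roots = [u + v + shift]
        if disc == 0.0:
            roots.append(-u + shift)
        return roots
    r = 2.0 * math.sqrt(-p / 3.0)          # three real roots, trigonometric form
    phi = math.acos(max(-1.0, min(1.0, 3.0 * q / (p * r))))
    return [r * math.cos((phi - 2.0 * math.pi * k) / 3.0) + shift for k in range(3)]

def first_hit(c, origin, direction, isovalue, t_max):
    """Smallest t in [0, t_max] where the ray meets the isosurface, or None."""
    def g(t):
        x, y, z = (o + t * d for o, d in zip(origin, direction))
        return trilinear(c, x, y, z) - isovalue
    g0, g1, g2, g3 = g(0.0), g(1.0), g(2.0), g(3.0)
    # g is exactly cubic in t, so finite differences recover its coefficients.
    a = (g3 - 3.0 * g2 + 3.0 * g1 - g0) / 6.0
    b = (g2 - 2.0 * g1 + g0) / 2.0 - 3.0 * a
    cc = (g1 - g0) - a - b
    hits = [t for t in real_roots(a, b, cc, g0) if -1e-9 <= t <= t_max + 1e-9]
    return min(hits) if hits else None

# Hypothetical cell whose only non-zero corner is (1, 1, 1), so the field is x*y*z.
corner_values = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
print(first_hit(corner_values, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.125, 1.0))
```

Shading would then use the gradient of the same interpolant at the returned point; that step is omitted here.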

[Plot: frames/second (32 processors) versus frame number (time)]

Real-Time Ray Tracer - Scalability

RTRT Time Varying Visualization

Real-time Volume Rendering

Volume Rendering

[Figure panels: enamel / background, dentin / background, dentin / enamel, dentin / pulp]

1D: not possible

2D: specificity not as good

Volume Rendering - 3D Transfer Function

Vector Fields

© ZIB

© UofU

LIC Flow (Banks and Interrante)

Illuminated Lines - C. Hege, ZIB

Tensor Visualization - Hesselink

Brush Strokes (Laidlaw ’98)

Large-Scale Visualization of Turbomachinery Flows Using pV3

Objectives

• Utilize existing software and hardware technologies to visualize large datasets with proper scalability in both

– Display size / resolution

– Rendering speed

• Interactive visualization of large-scale datasets for useful investigation of simulation results

• Understand what can be done with the kind of visualization systems that will be available on the desktop in 2-3 years

Motivation

• At Stanford, in the DoE ASCI (Accelerated Strategic Computing Initiative) program, we are trying to simulate very large-scale flows in turbomachinery. The visualization of these flows is rather difficult and time-consuming.

• Our CS group has a lot of expertise in software and hardware for parallel rendering.

• Can we leverage these tools in the context of an engineering-usable visualization package?

Objective - Demonstrate Potential of

Hi-Fi Gas Turbine Engine Simulation

• Integrated fan/compressor/combustor/turbine/secondaries unsteady flow and turbulent combustion simulation

– RANS Turbomachinery

– Combustor

• RANS (NASA-NCC)

• LES (CITS)

– Multi-Code Interface

• Complex code coupling

• Will require 100 TFLOPS

• Have industry and NASA participation and interest

P&W 6000 Engine

Flamelet-progress variable model for combustion LES

[Figure panels: mixture fraction; product mass fraction]

P&W combustor 2.5D grid 1

Stanford-ASCI TFLO Project Goals

• To develop a scalable code (TFLO) that is capable of:

– tackling large-scale unsteady flow simulations of multistage turbomachinery, as well as interactions between compressor, combustor, and turbine

– rapid and cost-effective steady and unsteady analyses required in a design environment (single blade passages, multiple stage simulation with low blade counts) comparable to existing industrial practice

– incorporating advanced turbulence models with corrections to account for effects typical in turbomachinery (streamline curvature, rotation, etc.)

• To contribute to the development of numerical simulation techniques that make this type of calculation computationally affordable

• To demonstrate integrated calculations simulating the interaction between the compressor, combustor, and HP/LP turbine

Gas-Turbine Components

TFLO performance on P&W 6000 turbine

Unsteady Simulation of Aachen Turbine Rig (TFLO)

• Simulation Completed, AIAA Paper Presented

• 13.5 M Points

• 374 Blocks

• 187 Processors

• 2,800 Time-Steps (w/ 30 inner iterations per time-step) Required

• 1,985 Hours (clock time), 371,000 Hours (cpu time) Required

[Figure: Entropy]

[Plot: Aachen Blade Unsteady Pressure Envelopes: pressure envelope (p/pref) versus x/C, for passage counts 1-1-1 and 6-7-6]

[Plot: Frequency Spectrum: pressure amplitude (Pa) versus frequency/BPF; legend: Predicted, Measured; estimated T.E. vortex shedding frequency: 10 BPF]

[Figure: Amplitude of Harmonics on the blade surface, labeled L.E., T.E., S.S., P.S.]

[Figure: Secondary Velocity Field, Plane No. 00, Time Index: 1]

Unsteady Flow Simulation of P&W Turbine Rig (TFLO)

• One Global (1/6 Circumference) (33% of Total) Completed

• 31.2 M Points

• 652 Blocks

• 196 Processors

• 4,200 Time-Steps (w/ 30 inner iterations per time-step) Required

• 4,125 Hours (clock time), 808,500 Hours (cpu time) Required

[Figure panels: Pressure; Entropy. Annotated features: blade trailing edge shocks; shock/blade interaction (reflected waves from vane); viscous wake/blade interaction; vane/blade potential interaction]

Pressure Loading Compares Well with Experiment and PW Prediction

Predicted Aerodynamic Losses Compare Favorably with PW Prediction

Unsteady Flow Simulation of PW6000 Turbine (TFLO)

• 63% of One Global Cycle (1/6 Circumference) (21% of Total) Completed

• 93.8 M Points

• 2192 Blocks

• 512 - 1024 Processors

• 5,700 Time-Steps (w/ 30 inner iterations per time-step) Required

• 5,970 Hours (clock time), 3,060,000 Hours (cpu time) Required

[Figure: Entropy; labels: HPT (1,2,3), LPT (5,6,7)]

Main/Secondary Flow Path Integration

Direct Coupling (SPMD)

[Figure panels: Temperature and Streamlines (projected in constant ); Temperature and 3D Blade-Relative Streamlines; Pressure and Streamlines (projected in constant )]

• Simulation Complete

• 9.4 M Points, 238 Blocks, 144 Processors

• 1-200,000 Time-Steps Required

• 3,700 Hours (clock time), 532,800 Hours (cpu time) Required
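The CPU-hour figures quoted for the TFLO runs are consistent with processors × wall-clock hours. The check below uses only the numbers from the slides; the variable names are illustrative, and for the PW6000 run, where the processor count varied between 512 and 1024, it recovers the implied average instead.

```python
def cpu_hours(processors, clock_hours):
    """CPU time consumed when all processors are busy for the whole run."""
    return processors * clock_hours

# (processors, clock hours, CPU hours quoted on the slide)
runs = {
    "Aachen turbine rig": (187, 1985, 371_000),
    "P&W turbine rig": (196, 4125, 808_500),
    "Main/secondary flow path": (144, 3700, 532_800),
}

for name, (procs, clock, quoted) in runs.items():
    print(f"{name}: computed {cpu_hours(procs, clock):,} vs quoted {quoted:,}")

# PW6000 turbine: 5,970 clock hours and 3,060,000 CPU hours.
implied_processors = 3_060_000 / 5970
print(f"PW6000 implied average processor count: {implied_processors:.1f}")
```

The Aachen figure is rounded (371,195 computed against 371,000 quoted); the other two match exactly, and the PW6000 numbers imply an average close to the lower end of the 512 - 1024 range.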

Key Technologies

• Hardware

– High resolution displays

• Powerwall

• Super-high resolution displays (5000x3000 kind)

– High speed network interconnects for commodity clusters

• Software

– Support for tiled / high resolution displays with wireGL

– Parallel software implementation for scalable rendering using pV3
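To give a feel for what driving a tiled display involves, the sketch below assigns a primitive's screen-space bounding box to the tiles it overlaps, so that each tile's renderer only receives the geometry it needs. This is a generic illustration of tile bucketing, not wireGL's API; the 4 x 3 tile grid, the function name and the requirement that the display size divides evenly are assumptions.

```python
def tiles_for_bbox(bbox, display=(5000, 3000), grid=(4, 3)):
    """Return the (col, row) tiles overlapped by bbox = (x0, y0, x1, y1), half-open, in pixels.

    Assumes the display width and height are exact multiples of the grid size.
    """
    width, height = display
    cols, rows = grid
    tile_w, tile_h = width // cols, height // rows
    x0, y0, x1, y1 = bbox
    # Clamp the box to the display.
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, width), min(y1, height)
    if x0 >= x1 or y0 >= y1:
        return []  # entirely off-screen
    return [(c, r)
            for r in range(y0 // tile_h, (y1 - 1) // tile_h + 1)
            for c in range(x0 // tile_w, (x1 - 1) // tile_w + 1)]

# A box straddling a tile corner must be sent to four tiles.
print(tiles_for_bbox((1200, 900, 1300, 1100)))
```

Primitives that straddle tile boundaries are sent to more than one tile, which is one reason the internal network can become a bottleneck as the tile count grows.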

Why pV3?

• pV3 is already set up for

– Parallel feature extraction

– Concurrent visualization

– Distributed visualization

– Computational steering

• Work to be done

– Use of wireGL for tiled displays (completed)

– Parallelization of renderer (almost completed)
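The idea of parallel feature extraction can be sketched as follows: each client reduces its own block of the solution to the features of interest, and only that reduced result travels to the server, which merges it for rendering. This is a toy model and not pV3's interface; the 1D blocks, the function names and the use of Python threads (which only illustrate the structure, not real parallel speedup) are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_crossings(block_id, values, isovalue):
    """Client-side step: keep only the intervals of this block that straddle the isovalue."""
    return [(block_id, i)
            for i in range(len(values) - 1)
            if min(values[i], values[i + 1]) <= isovalue <= max(values[i], values[i + 1])]

def gather_features(blocks, isovalue, workers=4):
    """Server-side step: run one extraction per block concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda item: extract_crossings(item[0], item[1], isovalue),
            enumerate(blocks),
        )
        return [hit for part in parts for hit in part]

# Three hypothetical blocks of a 1D solution.
blocks = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 3.0, 0.0]]
print(gather_features(blocks, 2.5))
```

The server receives two small records instead of the nine original samples, which is the point of extracting features where the data lives.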

Current Large-Scale Visualization Setup

[Diagram components: CPU 1 to CPU 16 (pV3 clients, feature extraction); WAN; pV3 server (wiregl, rendering); CPU 1 to CPU 4 with GR 1 to GR 4 (wiregl)]

Bottlenecks in WAN (avoidable), single renderer (in progress), internal network

Future Large-Scale Visualization Setup

[Diagram components: CPU 1 to CPU 16 (pV3 clients, feature extraction); WAN; pV3 server (wiregl, rendering); CPU 1 to CPU 4 with GR 1 to GR 4 (wiregl)]

Advantages / Expected Outcome

• Rendering speed 12x on current display (best case scenario)

• High resolution images for flow details

• Large degree of interactivity for turbomachinery flow visualizations

• Parallel I/O will be necessary for unsteady flow visualizations