
Page 1: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Hank’s Activities
Longhorn/XD AHM

Austin, TX
December 20, 2010

Volume rendering of a 4608^3 combustion data set. Image credit: Mark Howison.

Volume rendering of a flame data set using VisIt + IceT on Longhorn. Image credit: Tom Fogal.

Page 2: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Page 3: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

My perception of my role in Longhorn/XD

- Help users succeed via:
  - Direct support
  - Ensuring necessary algorithms/functionality are in place
  - Researching the most effective way to utilize Longhorn
- Also help test the machine through aggressive usage
- Collaborate with / facilitate for other project members
- Provide visibility for the center externally (outreach, etc.)

Page 4: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Outline

- Researching how to best use Longhorn
  - HW-accelerated volume rendering on Longhorn
  - SW ray-casting on Longhorn
- Collaborations
  - Manta/VisIt
  - VDF/VisIt
- User support
  - Analysis of 4K^3 turbulent data
  - Connected components algorithms
  - Other user support
- Outreach

Page 5: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

HW-accelerated volume rendering on Longhorn

“Large Data Visualization on Distributed Memory Multi-GPU Clusters”, HPG 2010
Authors: Fogal, Childs, Shankar, Krueger, Bergeron, and Hatcher

- Ran VisIt + IceT on Longhorn, varying data size and number of GPUs.
- Stage data on the CPU, transfer to the GPU (high transfer time, but can look at bigger data sets).

Volume rendering of a flame data set using VisIt + IceT on Longhorn. Image credit: Tom Fogal.
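For flavor, here is a minimal VisIt CLI (Python) sketch of this kind of volume-rendering run; it is not the actual experiment script, the file and variable names are hypothetical placeholders, and the GPU/IceT configuration is chosen at launch time rather than in the script:

    # Run with: visit -cli -s render_flame.py
    # (VisIt's Python functions are in scope in the CLI shell.)
    OpenDatabase("flame.bov")            # hypothetical data set
    AddPlot("Volume", "density")         # hypothetical variable
    v = VolumeAttributes()
    v.rendererType = v.RayCasting        # ray-casting renderer
    SetPlotOptions(v)
    DrawPlots()
    s = SaveWindowAttributes()
    s.fileName = "flame_vr"              # image file name prefix
    SetSaveWindowAttributes(s)
    SaveWindow()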

Page 6: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

HW-accelerated volume rendering on Longhorn

Observation about CPU volume rendering:

                     Number of cores
                     Large      Small
    Ray evaluation   Fast       Slow
    Compositing      Slow       Fast

Paper purpose: study the performance characteristics of GPU volume rendering at high concurrency on big data.

Idea: GPU volume rendering has the computational horsepower to do ray evaluation quickly, but will have many fewer MPI participants.
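To make the "fewer MPI participants" argument concrete, here is a hedged mpi4py sketch (not the paper's code): each rank "ray-evaluates" its own brick, then a single reduction composites the per-rank images. A maximum-intensity projection is order-independent, so MPI.MAX suffices; real alpha compositing needs ordered schemes (e.g., binary swap), whose cost grows with the number of compositing participants.

    # Hedged mpi4py sketch of parallel volume rendering's two phases.
    # Run with, e.g.: mpirun -np 4 python mip_composite.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Hypothetical per-rank brick (stands in for real volume data).
    rng = np.random.default_rng(seed=rank)
    brick = rng.random((64, 64, 64))

    # "Ray evaluation": max along the view axis for each pixel. This is
    # the cheap, massively parallel part (threads or CUDA in the papers).
    local_image = brick.max(axis=0)

    # "Compositing": MIP is order-independent, so one MAX reduction works;
    # ordered alpha compositing gets slower as participants increase.
    final = np.empty_like(local_image) if rank == 0 else None
    comm.Reduce(local_image, final, op=MPI.MAX, root=0)

    if rank == 0:
        print("composited image", final.shape, "max =", float(final.max()))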

Page 7: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

[Performance chart: big data, lots of GPUs, fast-ish on small data]

Page 8: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Software ray-casting

Previous work (not XD-related): “MPI-Hybrid Parallelism for Volume Rendering on Large Multi-Core Systems”, EGPGV 2010
Authors: Howison, Bethel, and Childs

- Strong scaling study up to 216,000 cores on the ORNL Jaguar machine, looking at 4608^3 data.
- Study outcome: hybrid parallelism benefits this algorithm, primarily during the compositing phase, since there are fewer participants in MPI communication.
- One of two EGPGV best paper winners; invited for a follow-on article in TVCG.

Volume rendering of combustion data set. Image credit: Mark Howison.

Page 9: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Software ray-casting

TVCG article (unpublished research):
- Add weak scaling study (up to 22K^3) on Jaguar
- GPU scaling study on Longhorn

GPU scaling study:
- Went up to 448 GPUs
- Purpose: similar to the Fogal work, but with a different spin … show that hybrid parallelism is beneficial. Instead of pthreads or OpenMP on the CPU, we are now using CUDA on the GPU.

Page 10: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Scaling results on GPU

[Scaling charts: 2308^3 data and 2308^3 to 4608^3 data]

Page 11: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Software ray-casting on Longhorn

Two caveats:
(1) We didn’t optimize for CUDA, so we could have had favorable numbers to an even higher concurrency level.
(2) But Jaguar @ 46K processors has more memory and can look at way bigger data sets.

Takeaway: for this algorithm and this data size, Longhorn is as powerful as 46K processors of Jaguar.

Page 12: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Manta/VisIt

- Carson Brownlee delivers integration of VisIt and Manta via vtkManta objects.
- Hank does some small work: updates the work from VisIt 2.0 to VisIt 2.2 and makes a branch for Hank and Carson to put fixes on.
- Testing: Carson and Hank create a list of issues and are in the process of tracking them down.

Rendering of an isosurface by VisIt using Manta.

Page 13: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Visualizing and Analyzing Large-Scale Turbulent Flow

Detect, track, classify, and visualize features in large-scale turbulent flow.

Analysis effort by Kelly Gaither (TACC), Hank Childs (LBNL), & Cyrus Harrison (LLNL).

Stresses two algorithms that are difficult in a distributed-memory parallel setting:

1. Can we identify connected components?

2. Can we characterize their shape?

VisIt calculated connected components on a 4K^3 turbulence data set in parallel using TACC's Longhorn machine. Two million components were initially identified, and then the map expression was used to select only the components that had a total volume greater than 15. Data courtesy of P.K. Yeung and Diego Donzis.

Page 14: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Identifying connected components in parallel is difficult.

- Hard to do efficiently
- Tremendous bookkeeping problem
- 4-stage algorithm that finds local connectivity and then merges globally (a serial sketch of the underlying concept follows below)

Participating in a 2011 EGPGV submission describing this algorithm and its performance. Authors: Harrison, Childs, Gaither.
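For intuition only, here is a small serial SciPy sketch of the two steps the slides describe: label connected components, then keep only components above a volume cutoff. The random field, isovalue, and cutoff are hypothetical, and this does none of the distributed-memory bookkeeping that makes the real 4-stage algorithm hard:

    # Serial stand-in for the distributed connected-components pass.
    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(0)
    field = ndimage.gaussian_filter(rng.random((64, 64, 64)), sigma=2)

    mask = field > 0.55                        # hypothetical isovalue
    labels, n = ndimage.label(mask)            # face-connected components
    print(n, "components identified")

    # Volume (cell count) per component; analogous to the map-expression
    # step of selecting components with total volume greater than a cutoff.
    volumes = np.bincount(labels.ravel())[1:]  # skip background (label 0)
    keep = np.flatnonzero(volumes > 15) + 1    # surviving component ids
    filtered = np.where(np.isin(labels, keep), labels, 0)
    print(keep.size, "components with volume > 15 cells")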

Page 15: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

We used shape characterization to assist our feature tracking.

Shape characterization metric: chord length distribution. Difficult to perform efficiently in a distributed-memory setting.

[Diagram: Line Scan Filter distributing work across processors P0 through P3, feeding a Line Scan Analysis sink]

1) Choose lines
2) Calculate intersections
3) Segment redistribution
4) Analyze lines
5) Collect results

It is our hope that chord length distributions, a characteristic function, can assist in tracking component behavior over time.
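As a concrete, serial illustration of the metric itself (not of the parallel Line Scan Filter), a chord length distribution can be approximated by collecting run lengths of "inside the feature" along sample lines. Here the lines are axis-aligned and the feature mask is a hypothetical random field:

    # Minimal serial sketch of a chord length distribution.
    import numpy as np

    def chord_lengths_along_axis(mask: np.ndarray) -> np.ndarray:
        """Collect lengths of contiguous True runs along the last axis."""
        lengths = []
        for line in mask.reshape(-1, mask.shape[-1]):
            # Pad with False so runs touching the ends are closed off.
            padded = np.concatenate(([False], line, [False]))
            edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
            starts, stops = edges[0::2], edges[1::2]
            lengths.extend(stops - starts)
        return np.asarray(lengths)

    rng = np.random.default_rng(1)
    mask = rng.random((32, 32, 32)) > 0.7      # hypothetical feature mask
    chords = chord_lengths_along_axis(mask)
    hist, bin_edges = np.histogram(chords, bins=10)
    print("chord length histogram:", hist)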

Page 16: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

My role in this effort

Easily summarized: “use VisIt to get results to Kelly”

Several iterations:
- Started with just statistics of components
- Looked at how variation in isovalue affected statistics
- Added in chord length distributions as a characteristic function
- Took still images of each component for visual inspection
- (recently) Extracted each component as its own surface for combined inspection

Page 17: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

VDF/VisIt

John Clyne and Dan Lagreca add a VDF reader to VisIt.

Hank performs some testing and debugging. Still lots to do:
- Formal commit to the VisIt repo. Also add in the new VisIt multi-res hooks.
- Study how well large features are preserved across refinement levels.
- Use the coarsest versions in conjunction with analysis code from Janine Bennett.

Page 18: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Other user support

Small amount of effort helping Saju Varghese and Kentaro Nagamine of UNLV:
- Fixed a VisIt bug with ray-casting + point meshes
- Helped them format their data into the BOV format (see the sketch below)
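BOV ("brick of values") is VisIt's simplest input format: a raw binary file of values plus a small ASCII header. A hedged sketch of writing one from Python follows; the grid size, variable name, and file names are hypothetical:

    # Write a BOV data set: raw binary values + a header VisIt can open.
    import numpy as np

    nx, ny, nz = 64, 64, 64
    data = np.linspace(0.0, 1.0, nx * ny * nz, dtype=np.float32)
    data.reshape(nz, ny, nx).tofile("density.values")  # native endianness

    # Header keywords per the BOV reader; LITTLE assumes an x86-like host.
    header = (
        "TIME: 0.0\n"
        "DATA_FILE: density.values\n"
        f"DATA_SIZE: {nx} {ny} {nz}\n"
        "DATA_FORMAT: FLOAT\n"
        "VARIABLE: density\n"
        "DATA_ENDIAN: LITTLE\n"
        "CENTERING: zonal\n"
        "BRICK_ORIGIN: 0.0 0.0 0.0\n"
        "BRICK_SIZE: 1.0 1.0 1.0\n"
    )
    with open("density.bov", "w") as f:
        f.write(header)
    # "density.bov" should now open directly in VisIt.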

Page 19: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Outreach & Service

VisIt tutorials:
- SC10 (beginning and advanced), Nov 2010, NOLA
- Users at US ARL, Sep 2010, Aberdeen, MD
- SciDAC 2010, July 2010, Chattanooga, TN

Speaker at NSF Extreme Scale I/O and Data Analysis Workshop, March 2010, Austin, TX

Participant in NSF Workshop on SW Development Environments, Sep 2010, Washington, DC

Given ~10 additional talks at various venues this year

Page 20: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Proposed Future Plans

- Continue collaboration with Kelly on analyzing turbulent flow
- Formally integrate VDF
  - Multi-res study with John & Kelly
  - Would like to do 1T cell runs on Longhorn
- Continued user support
  - Esp. CIG
- Connected components @ EGPGV
- VisIt + GPU

Two trillion cell data set, rendered in VisIt by David Pugmire on the ORNL Jaguar machine.

Page 21: Hank’s Activities Longhorn/XD AHM Austin, TX December 20, 2010

Summary

- Researching how to best use Longhorn
  - HW-accelerated volume rendering on Longhorn
  - SW ray-casting on Longhorn
- Collaborating with other Longhorn/XD members
  - Manta/VisIt
  - VDF/VisIt
- Doing user support
  - Helping Kelly analyze 4K^3 turbulent data
  - Working to make sure the connected components algorithm is up to snuff
  - Some user support and more to come…
- Performing outreach activities