
Visualisation of Large Datasets with Houdini

Ben Simons

Data Arena Lead Developer, University of Technology, Sydney

[email protected]@acm.org


New UTS Broadway Building

UTS Data Arena, April 2014

Today's Outline - Big Data

1. Some strategies used in Film Visual FX

2. Visualisation Techniques in Houdini

3. VFX Data Formats & Disk Systems

Happy Feet 2

● 2 Petabytes (2,000,000 GB)

● 3D Stereo HD images

● Render: 18,000 CPU cores

● Parallel access to data

● HDF5 data on BlueArc & Isilon NAS disk systems

● Linux software: Maya, Houdini, Naiad, Nuke, 3Delight

● Made entirely at Dr D Studios, Carriageworks, Sydney

Resident Evil 3: Extinction

● The desert undead: 18-layer images (RenderMan AOVs)

● Each image frame was split into 96 tiles

● Tiles rendered on 96 machines, then joined back into each frame
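The split-render-join approach for Resident Evil can be sketched in Python. This is a minimal sketch, assuming a frame stored as nested lists of pixel values and a 12 x 8 grid (96 tiles); `tile_bounds`, `render_tile` and `join_tiles` are hypothetical names, and `render_tile` merely copies pixels where a real render job would compute them.

```python
# Illustrative sketch; all function names here are hypothetical.
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel bounds

def tile_bounds(width: int, height: int, cols: int, rows: int) -> List[Tile]:
    """Divide a width x height frame into cols*rows tiles covering every pixel once."""
    tiles = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            tiles.append((x0, y0, x1, y1))
    return tiles

def render_tile(frame: List[List[int]], tile: Tile) -> List[List[int]]:
    """Stand-in for one machine's render job: return the pixels inside the tile."""
    x0, y0, x1, y1 = tile
    return [row[x0:x1] for row in frame[y0:y1]]

def join_tiles(width: int, height: int, tiles: List[Tile],
               pieces: List[List[List[int]]], fill: int = 0) -> List[List[int]]:
    """Reassemble independently rendered tiles into one full frame."""
    out = [[fill] * width for _ in range(height)]
    for (x0, y0, x1, y1), piece in zip(tiles, pieces):
        for dy, row in enumerate(piece):
            out[y0 + dy][x0:x1] = row
    return out

if __name__ == "__main__":
    W, H = 192, 108  # small frame for the demo
    frame = [[(x + y) % 256 for x in range(W)] for y in range(H)]
    tiles = tile_bounds(W, H, cols=12, rows=8)      # 96 tiles, one per machine
    pieces = [render_tile(frame, t) for t in tiles]  # would run in parallel
    assert join_tiles(W, H, tiles, pieces) == frame
    print(len(tiles), "tiles")
```

Because each tile is independent, the 96 jobs need no communication until the final join.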

Houdini: www.sidefx.com

Houdini across 2 screens

Houdini Object Nodes

Houdini Procedural Network

Houdini Parameters

Houdini CHOPs (Channel Operators)

● A channel is a column of data

● Plain text files are OK – separate columns with tabs

● Interactive Channel graph (zoom in)

● Visual programming

● Filtering, sampling, shading, instancing and rendering

● Tomorrow's hands-on session: CHOPs & VOPs
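The "one channel per column" idea maps onto a tab-separated text file as follows. This is a plain-Python sketch, not Houdini's own file reader; it assumes the first non-blank line holds channel names, and `read_channels` and `box_filter` are hypothetical helpers (the filter is a simple moving average standing in for channel filtering).

```python
# Illustrative sketch; read_channels and box_filter are hypothetical helpers.
import io
from typing import Dict, List, TextIO

def read_channels(f: TextIO) -> Dict[str, List[float]]:
    """Read tab-separated text into one list of floats per column ("channel").

    Assumes the first non-blank line is a header of channel names.
    """
    lines = (ln.rstrip("\r\n") for ln in f if ln.strip())
    names = next(lines).split("\t")
    channels: Dict[str, List[float]] = {name: [] for name in names}
    for ln in lines:
        for name, value in zip(names, ln.split("\t")):
            channels[name].append(float(value))
    return channels

def box_filter(samples: List[float], radius: int = 1) -> List[float]:
    """Moving-average smoothing of one channel; the window shrinks at the ends."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

if __name__ == "__main__":
    text = "tx\tty\n0\t1\n1\t3\n2\t5\n"
    ch = read_channels(io.StringIO(text))
    print(ch["ty"], box_filter(ch["ty"]))
```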

Spitzer GLIMPSE Dataset

http://data.spitzer.caltech.edu/popular/glimpse/20070416_enhanced_v2/source_lists/south/

Spitzer Space Telescope GLIMPSE Dataset

● South: ~300 files, 78 different Channels, 145K rows

● gzipped .tbl data loaded into Houdini

● Houdini CHOPs used to filter & calculate 'colours'

– Show the difference between infrared magnitude bands

● Point colours and scales calculated by SIMD VOP shaders

● Houdini movie rendered with Mantra (PBR)

– 36M points, filtered to <12M
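The GLIMPSE steps (filter the points, then derive a colour from the difference of two magnitude bands and a scale from brightness) can be sketched per point in Python. This is an illustration only: the column names `mag_a` and `mag_b`, the magnitude cut of 14 and the blue-to-red ramp are hypothetical, not the catalogue's actual schema or the talk's VOP network.

```python
# Illustrative sketch; column names, thresholds and the colour ramp are hypothetical.
from typing import Dict, List, Tuple

Row = Dict[str, float]
RGB = Tuple[float, float, float]

def keep(row: Row, max_mag: float) -> bool:
    """Drop faint sources (a larger magnitude means a fainter source)."""
    return row["mag_a"] <= max_mag and row["mag_b"] <= max_mag

def colour_from_bands(row: Row, lo: float = -1.0, hi: float = 1.0) -> RGB:
    """Map the difference between two magnitude bands onto a blue-to-red ramp."""
    diff = row["mag_a"] - row["mag_b"]
    t = min(max((diff - lo) / (hi - lo), 0.0), 1.0)  # clamp to [0, 1]
    return (t, 0.2, 1.0 - t)

def scale_from_brightness(row: Row, max_mag: float) -> float:
    """Brighter sources (smaller magnitude) get larger point scales."""
    return 1.0 + (max_mag - min(row["mag_a"], row["mag_b"])) * 0.1

def process(rows: List[Row], max_mag: float = 14.0) -> List[Tuple[RGB, float]]:
    """Filter rows, then compute a colour and scale for each surviving point."""
    return [(colour_from_bands(r), scale_from_brightness(r, max_mag))
            for r in rows if keep(r, max_mag)]
```

Each point is processed independently with the same instructions, which is why this kind of per-point work suits SIMD VOPs.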

Shading & VOPs

● A shader is a mini-program that generates data

● It can be better to generate data than to load it

● Shaders allow an additional level of management

● Geometry shaders on Happy Feet 2 generated 1 billion snow particles per image frame (impossible to load)

● Houdini VOPs are SIMD
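Why generating can beat loading: the sketch below regenerates identical particle positions from a seed, a chunk at a time, so nothing is stored on disk and only one chunk is in memory. It is a Python illustration under assumed domain bounds and chunk size; `snow_particles` is a hypothetical name, and a real geometry shader runs inside the renderer.

```python
# Illustrative sketch; snow_particles, bounds and chunk size are hypothetical.
import random
from typing import Iterator, List, Tuple

Point = Tuple[float, float, float]

def snow_particles(seed: int, count: int, chunk: int = 10_000) -> Iterator[List[Point]]:
    """Yield `count` pseudo-random particle positions in chunks.

    The same seed always reproduces the same particles, independent of chunk size.
    """
    rng = random.Random(seed)
    remaining = count
    while remaining > 0:
        n = min(chunk, remaining)
        yield [(rng.uniform(-50, 50), rng.uniform(0, 20), rng.uniform(-50, 50))
               for _ in range(n)]
        remaining -= n

if __name__ == "__main__":
    total = sum(len(c) for c in snow_particles(seed=7, count=25_000))
    # 1 billion points at 3 x 4-byte floats would be ~12 GB if loaded instead.
    print(total, "particles generated")
```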

Houdini VOP Network

Instancing

● Saves memory & I/O by re-using geometry

● Copies generated at render time

● Each instance can be varied based on point attributes

● Referencing one “instance object” provides a massive data reduction
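Instancing keeps one copy of the geometry plus a few attributes per point, and builds transformed copies only when needed. A minimal Python sketch, assuming a per-point position and uniform scale as the only attributes; `Mesh`, `Instance`, `expand` and `stored_floats` are hypothetical names.

```python
# Illustrative sketch; all class and function names are hypothetical.
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass(frozen=True)
class Mesh:
    """Shared geometry: stored once however many copies are drawn."""
    vertices: Tuple[Vec3, ...]

@dataclass
class Instance:
    """A lightweight reference to a mesh plus per-point attributes."""
    mesh: Mesh
    position: Vec3
    scale: float = 1.0

def expand(inst: Instance) -> List[Vec3]:
    """Generate the transformed copy only when needed (e.g. at render time)."""
    px, py, pz = inst.position
    s = inst.scale
    return [(px + s * x, py + s * y, pz + s * z) for x, y, z in inst.mesh.vertices]

def instance_points(mesh: Mesh, points: List[Vec3], scales: List[float]) -> List[Instance]:
    """One instance per point, with scale varied by a per-point attribute."""
    return [Instance(mesh, p, s) for p, s in zip(points, scales)]

def stored_floats(instances: List[Instance]) -> int:
    """Floats actually stored: each unique mesh once, plus 4 floats per instance."""
    meshes = {id(i.mesh): i.mesh for i in instances}
    mesh_floats = sum(3 * len(m.vertices) for m in meshes.values())
    return mesh_floats + 4 * len(instances)  # position (3) + scale (1)
```

For N instances of a V-vertex mesh this stores 3V + 4N floats instead of 3VN.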

Adaptive Meshes, LOD, Caching & Filtering

● Data reduction techniques

● Level of Detail (distance from camera)

● Adaptive meshes

● Cache common files locally

● Filter textures (images): mipmapping
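Two of these techniques, choosing a level of detail from camera distance and building a mipmap chain, can be sketched in a few lines of Python. This assumes hypothetical distance thresholds and a square, power-of-two greyscale image; `choose_lod` and `mip_chain` are illustrative names, and the 2x2 box filter is the simplest possible downsampling filter.

```python
# Illustrative sketch; choose_lod and mip_chain are hypothetical names.
from typing import List, Sequence

def choose_lod(distance: float, thresholds: Sequence[float]) -> int:
    """Return a level-of-detail index: 0 is full detail, higher is coarser.

    `thresholds` are increasing camera distances at which to switch level.
    """
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)

def mip_chain(image: List[List[float]]) -> List[List[List[float]]]:
    """Build a mipmap chain by 2x2 box-filtering down to 1x1.

    Assumes a square, power-of-two image stored as nested lists.
    """
    levels = [image]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([[(prev[2 * y][2 * x] + prev[2 * y][2 * x + 1] +
                         prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4.0
                        for x in range(n)] for y in range(n)])
    return levels
```

A distant object then draws a coarser mesh and samples a smaller mip level, cutting both memory and texture-filtering cost.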

Other tricks: Baked Lighting & Shadows

● Pre-calculate lighting & shadows

● “Bake” new textures & reapply them onto the geometry

● Sydney Harbour Multi-Beam Sonar Survey, 30cm data.

● Interactive 3D Fly-through
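Baking can be illustrated on a gridded height field such as a bathymetry survey: the lighting is computed once and stored per cell as a shade "texture", so an interactive fly-through only looks it up. A minimal Lambertian sketch with no cast shadows; `bake_shade`, the grid spacing and the light direction are assumptions, not the survey's actual processing.

```python
# Illustrative sketch; bake_shade and its parameters are hypothetical.
import math
from typing import List, Tuple

def bake_shade(height: List[List[float]], spacing: float,
               light: Tuple[float, float, float]) -> List[List[float]]:
    """Precompute Lambertian shading max(0, N . L) for every grid cell.

    Uses central differences, one-sided at the borders.
    Assumes a grid of at least 2 x 2 cells.
    """
    rows, cols = len(height), len(height[0])
    lx, ly, lz = light
    norm = math.sqrt(lx * lx + ly * ly + lz * lz)
    lx, ly, lz = lx / norm, ly / norm, lz / norm
    shade = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            xl, xr = max(x - 1, 0), min(x + 1, cols - 1)
            yd, yu = max(y - 1, 0), min(y + 1, rows - 1)
            dzdx = (height[y][xr] - height[y][xl]) / ((xr - xl) * spacing)
            dzdy = (height[yu][x] - height[yd][x]) / ((yu - yd) * spacing)
            # normal of the surface z = f(x, y) is (-dz/dx, -dz/dy, 1)
            nx, ny, nz = -dzdx, -dzdy, 1.0
            n = math.sqrt(nx * nx + ny * ny + nz * nz)
            shade[y][x] = max(0.0, (nx * lx + ny * ly + nz * lz) / n)
    return shade
```

The baked grid is then applied as a texture, so the costly lighting work is paid once rather than every frame.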

Know Your Limits: Memory & I/O

● I/O will become the bottleneck: partition the problem, then scale it up

– Split the job across many independent machines (e.g. rendering)

– Segment data access for each machine (e.g. HDF5)

● Alternative memory hardware

● Vector (array) processor: SIMD

– e.g. Cray; now Intel MMX/SSE and Nvidia GPUs

– IBM Cell processor has vector processing units

● Content-addressable memory

– Hardware “associative arrays”, used in network routers
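Segmenting data access so each machine reads only its own part can be sketched as assigning contiguous row ranges to workers. A minimal Python sketch; `partition` and `worker_sum` are hypothetical names. With h5py, each worker would then read only its slab, e.g. `dset[start:stop]`, instead of the whole dataset.

```python
# Illustrative sketch; partition and worker_sum are hypothetical names.
from typing import List, Tuple

def partition(n_rows: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split rows [0, n_rows) into n_workers contiguous, near-equal half-open ranges."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def worker_sum(data: List[float], rng: Tuple[int, int]) -> float:
    """Each worker touches only its own slice, so no two workers read the same rows."""
    start, stop = rng
    return sum(data[start:stop])

if __name__ == "__main__":
    data = [float(i) for i in range(1000)]
    parts = partition(len(data), 7)
    total = sum(worker_sum(data, r) for r in parts)  # combine independent results
    assert total == sum(data)
    print(parts)
```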

Types of System Memory

● Virtual Memory

● Swapping is good, thrashing is bad

● SMP vs MPI

● SMP (Symmetric Multiprocessing): multiple CPUs with common/shared memory; multi-threaded apps

– e.g. Intel Xeon and Core 2 Duo are SMP

– Cache coherency: snooping bus; ccNUMA on distributed shared memory

● Message passing (MPI, PVM): clusters, Beowulf, etc. (memory not shared)
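The two programming styles can be contrasted with Python threads on one machine. This only mimics the styles and is not MPI: the first function has threads update one shared total under a lock (SMP style); the second gives each worker its own copy of the data and collects partial results as messages through a queue. Both function names are hypothetical.

```python
# Illustrative sketch of the two styles; not MPI. Function names are hypothetical.
import queue
import threading
from typing import List

def shared_memory_sum(data: List[int], n_threads: int = 4) -> int:
    """SMP style: threads work on the same data and update one shared total."""
    total = [0]
    lock = threading.Lock()

    def work(i: int) -> None:
        part = sum(data[i::n_threads])
        with lock:  # protect the shared variable
            total[0] += part

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

def message_passing_sum(data: List[int], n_workers: int = 4) -> int:
    """Message-passing style: each worker gets its own copy and replies with a message."""
    results: "queue.Queue[int]" = queue.Queue()

    def worker(chunk: List[int]) -> None:
        results.put(sum(chunk))  # send the partial result; no shared state touched

    chunks = [data[i::n_workers] for i in range(n_workers)]  # slicing makes copies
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(n_workers))
```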

Data Formats

● HDF5 “Hierarchical Data Format”

● www.hdfgroup.org

● Browsable container of data (HDFView)

● Has “groups & datasets”, like “dirs & files”

● Data indexed with B-trees

● Can also store binary data

● HDF5 for Python: www.h5py.org

● Groups behave like Python dictionaries, datasets like NumPy arrays (www.numpy.org)


Disk Systems

● Network Attached Storage (NAS)

● BlueArc (now Hitachi): implemented via FPGA

● Isilon (now EMC): clustered filesystem, 100 GB/s

– Multiple SSD nodes; maintains global file coherency

● Lustre filesystem

● Experimental parallel distributed filesystem – can have multiple copies of a file, one master

● Venti (Bell Labs Plan 9 & Inferno)

– WORM archive; shares blocks by their secure SHA-1 hash
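Venti's central idea, addressing each block by the SHA-1 hash of its contents so identical blocks are stored once and never overwritten, can be sketched with `hashlib`. This is a toy in-memory store: `BlockStore`, `write_file`, `read_file` and the 8-byte block size are hypothetical, and real Venti adds persistent storage, indexing and a network protocol.

```python
# Illustrative sketch; BlockStore, write_file, read_file and BLOCK_SIZE are hypothetical.
import hashlib
from typing import Dict, List

BLOCK_SIZE = 8  # tiny for demonstration; real systems use kilobyte-sized blocks

class BlockStore:
    """Write-once store: blocks are keyed by their SHA-1 hash and never overwritten."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        key = hashlib.sha1(block).hexdigest()
        self._blocks.setdefault(key, block)  # duplicate content is stored only once
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]

    def __len__(self) -> int:
        return len(self._blocks)

def write_file(store: BlockStore, data: bytes) -> List[str]:
    """Split data into fixed-size blocks; the list of hashes describes the file."""
    return [store.put(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]

def read_file(store: BlockStore, keys: List[str]) -> bytes:
    """Rebuild a file from its list of block hashes."""
    return b"".join(store.get(k) for k in keys)
```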

Data Formats 2

● OpenVDB: www.openvdb.org

● Hierarchical structure for volumetric data (“clouds”)

● Good for sparse, time-varying volumetric data

● Fast (constant-time) access to voxels

● Large set of operators (level set tools, filters, transforms & morphological operators)
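A two-level sparse grid illustrates how VDB-style structures save memory on sparse volumes: only 8 x 8 x 8 leaf blocks containing non-background voxels are allocated, and a lookup is one dictionary hit plus an array index. This is a simplified sketch, not OpenVDB's actual tree, which has more levels; `SparseGrid` and its methods are hypothetical names.

```python
# Illustrative sketch; SparseGrid is hypothetical and much simpler than OpenVDB.
from typing import Dict, List, Tuple

LEAF = 8  # voxels per leaf edge (8 x 8 x 8 = 512 voxels per leaf)

class SparseGrid:
    """Sparse voxel grid: unallocated regions implicitly hold the background value."""

    def __init__(self, background: float = 0.0) -> None:
        self.background = background
        self.leaves: Dict[Tuple[int, int, int], List[float]] = {}

    @staticmethod
    def _split(i: int, j: int, k: int) -> Tuple[Tuple[int, int, int], int]:
        # floor division and modulo also handle negative coordinates correctly
        key = (i // LEAF, j // LEAF, k // LEAF)
        offset = ((i % LEAF) * LEAF + (j % LEAF)) * LEAF + (k % LEAF)
        return key, offset

    def get(self, i: int, j: int, k: int) -> float:
        key, offset = self._split(i, j, k)
        leaf = self.leaves.get(key)
        return self.background if leaf is None else leaf[offset]

    def set(self, i: int, j: int, k: int, value: float) -> None:
        key, offset = self._split(i, j, k)
        leaf = self.leaves.get(key)
        if leaf is None:
            if value == self.background:
                return  # nothing to store
            leaf = self.leaves[key] = [self.background] * LEAF ** 3
        leaf[offset] = value

    def active_leaves(self) -> int:
        return len(self.leaves)
```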


Data Formats 3

● Disney Ptex eliminates UV texture assignment

● http://ptex.us/

● No (u,v)s required! No visible seams

● Works on subdivision/polygon faces

● Stores face adjacency data & filters

● Efficiently stores 10^6 mipmapped textures in one file

● Multiple channels, each compressed separately

● Used in Disney's “Bolt”

“D3” Data-Driven Documents

● D3: an amazing data visualisation web framework (JavaScript)

● http://d3js.org

● See: https://github.com/mbostock/d3/wiki/Gallery

● Offers Parallel Coordinates

● Demo: Nutrient Contents - An interactive visualization of the USDA Nutrient Database.

http://exposedata.com/parallel/


Parallel Co-ordinates

(Figure axes: protein, calcium, sodium, fibre, vitamin c, potassium, carbohydrate, sugar, fat, water, calories, saturated, ...)
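Parallel coordinates draw each record as a polyline across vertical axes, one per attribute (protein, calcium, sodium and so on in the nutrient demo). The core layout step, normalising each attribute to its own axis and emitting polyline vertices, can be sketched in Python. D3 itself does this in JavaScript with its scales; `parallel_coords` and the default width and height are assumptions.

```python
# Illustrative sketch; parallel_coords and its default sizes are hypothetical.
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]

def parallel_coords(records: Sequence[Record], axes: Sequence[str],
                    width: float = 600.0,
                    height: float = 300.0) -> List[List[Tuple[float, float]]]:
    """Return one polyline (list of (x, y) points) per record.

    Axes are evenly spaced in x; each value is scaled linearly between that
    axis's minimum (bottom, y = height) and maximum (top, y = 0), screen-style.
    """
    lo = {a: min(r[a] for r in records) for a in axes}
    hi = {a: max(r[a] for r in records) for a in axes}
    step = width / (len(axes) - 1) if len(axes) > 1 else 0.0
    lines = []
    for r in records:
        pts = []
        for i, a in enumerate(axes):
            span = hi[a] - lo[a]
            t = (r[a] - lo[a]) / span if span else 0.5  # flat axis: centre the line
            pts.append((i * step, height * (1.0 - t)))
        lines.append(pts)
    return lines
```

Normalising per axis is what lets attributes with very different units, such as grams of fat and calories, share one plot.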