towards a digital human kathy yelick eecs department u.c. berkeley

Post on 19-Dec-2015

225 Views

Category:

Documents

3 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Towards a Digital Human

Kathy Yelick

EECS Department

U.C. Berkeley

The 20+ Year Vision

• Imagine a “digital body double” – 3D image-based medical record– Includes diagnostic, pathologic, and other

information

• Used for:– Diagnosis– Less invasive surgery-by-robot– Experimental treatments

• Digital Human Effort– Lead by the Federation of American Scientists

Digital Human Today: Imaging

• The Visible Human Project– 18,000 digitized sections of the body

• Male: 1mm sections, released in 1994• Female: .33mm sections, released in 1995

– Goals• study of human anatomy• testing medical imaging algorithms

– Current applications: • educational, diagnostic, treatment planning,

virtual reality, artistic, mathematical and industrial

• Used by > 1,400 licensees in 42 countries

Image Source: www.madsci.org

Digital Human Roadmap

1995 2000 2005 2010

1 organ 1 model

scalable implementations

1 organ multiple models

multiple organs

3D model construction

better algorithms

organ system

coupled models

faster computers

improved programmability

digital human

Building 3D Models from Images

Image source: John Sullivan et al, WPI

Image data from • visible human• MRI• Laboratory experiments

Automatic construction• Surface mesh• Volume mesh• John Sullivan et al, WPI

Simulation: From Genome to Function (James B.Bassingthwaighte, UW)

Genes

HealthOrganism

Organ

Tissue

Cell

Molecule

The Physiome Projecthttp://www.physiome.org

Structure to Function:• Experiments, Databases• Problem Formulation• Engineering the Solutions • Quantitative System Modeling• Archiving & Dissemination

Organ Simulation

Cardiac cells/muscles– SDSC, Auckland, UW, Utah,

Cardiac flow

– NYU,…

Lung transport– Vanderbilt

Lung flow– ORNL

Cochlea

– Caltech, UM, UCB

Kidney mesh generation

– Dartmouth

Electrocardiography– Johns Hopkins,…Skeletal mesh

generation

Brain

– ElIisman

Just a few of the efforts at understanding and simulating parts of the human body

Immersed Boundaries within the Body

• Fluid flow within the body is one of the major challenges, e.g., – Blood through the heart– Coagulation of platelets in clots– Movement of bacteria

• A key problem is modeling an elastic structure immersed in a fluid– Irregular moving boundaries– Wide range of scales– Vary by structure, connectivity, viscosity, external

forced, internally-generated forces, etc.

Heart Simulation

Developed by Peskin and McQueen at NYU– On a Cray C90: 1 heart-beat in 100 CPU hours– 1283 point fluid mesh; coarser mesh incorrect – Heart represented by muscle fibers

–Applications: • Understanding

structural abnormalities

• Evaluating artificial heart valves

• Eventually, artificial hearts

Source: www.psc.org

Heart Simulation

Source: www.psc.org

Animation of lower portion of the heart

Platelet Simulation

• Simulation of blood clotting– Developed by Aaron Fogelson and Charlie Peskin– Cells are represented by polygons

• Rules added to simulate adhesion

– Artery walls represented as fibers– 2D Simulation

• Main implementation in F77 for vectors• Our implemented in Split-C for distributed memory

Cochlea Simulation

– Simulates fluid-structure interactions due to incoming sound waves

– Potential applications: design of cochlear implants

• Simulation–OpenMP code

• E. Givelberg• J. Bunn• M. Rajan

–256x256x128 fluid grid

–18 hours on HP Superdome

Cochlea Simulation

• Embedded structures are– Elastic membranes– Shell model– Variable elasticity

• Capture shape of cochlea

• Shown here is a plate– Implemented in a

few weeks within Titanium code Source: Titanium simulation on a T3E

Bacteria Simulation

• Simulation of bacteria in a fluid– Requires smaller scale– Small animal swimming

• Lisa Fauchy, Tulane and

• James Sullivan, Tulane

• E coli simulation– Video © James A. Sullivan, Cells Alive!

• Brownian motion extension– Under development by Peter Kramer, RPI– microscopic processes where thermal fluctuation is

important

Insect Flight Simulation

• Wings are– Immersed 2D

structure

• Under development – UW and NYU

• Applications to– Insect robot design

Source: Dickenson, UCB

Other Applications

• The immersed boundary method has also been used, or is being applied to– Flags and parachutes– Flagella – Embryo growth– Valveless pumping– Paper making

Immersed Boundary Method Structure

Fiber activation & force calculation

InterpolateVelocity

Navier-Stokes Solver

SpreadForce

4 steps in each timestep

Fiber Points

Interaction

Fluid Lattice

Challenges to Parallelization

• Irregular fiber lists need to interact with regular fluid lattice. – Trade-off between load balancing of fibers

and minimizing communication• Efficient “scatter-gather” across processors

• Need a scalable elliptic solver– Plan to uses multigrid – Eventually add Adaptive Mesh Refinement

• New algorithms under development by Colella’s group at LBNL and Baden at UCSD

Software Architecture

Application Models

Generic Immersed Boundary Method (Titanium)

Heart(Titanium)

CochleaFlagellateSwimming

Spectral(Titanium)

Multigrid MLC(KeLP)

AMR

Extensible Simulation

SolversMultigrid(Titanium)

– Based on Generic Immersed Boundary Method– Can plug in new models by extending fiber points– Uses Java inheritance for simplicity

Immersed Boundaries in Titanium

–Heartbeat simulation • Experiment 800• 1/5th of heartbeat done• Using to validate code

• Recent improvements– FFTW in solver: 10x faster– Study of fluid/fiber

communication• Better load balancing

Source: Titanium simulation on Millennium

• Immersed Boundary Method in Titanium is complete– Download from www.cs.berkeley.edu/~smyau

Reduced Solver Bottleneck

• Performance improvements in past year– About 5x overall; ~10 seconds/ts on 64 T3E nodes– Bottleneck has changed to interaction phases

Fiber14%

Solve23%

Spread30%

Inter-polate27%

Inter-polate27%

Spread27%

Solve44%

Fiber2%

2000 Performance 2002 Performance

Load Balance and Locality

• Tried various assignments of fiber points

Optimized for Locality

Optimized for Load Balance

• Optimizing for load balance generally better– Application (fiber structure) dependent– Somewhat constrained by spectral solver– Multigrid will give more options

Scallop: New Multigrid Poisson Solver

• A latency tolerant elliptic solver library– Based on domain decomposition– Trades off extra computation for fewer messages– Will be used to build Navier-Stokes Solver

• Work by Scott Baden and Greg Balls, UCSD– Based on Balls/Colella algorithm– Implemented in C++/KeLP, with a simple interface

• Solver not yet integrated– Algorithm is complete– Performance tuning is ongoing– Interface between Titanium and C++/KeLP

developed

Tools for High Performance

Challenges to parallel simulation of a digital human are generic

• Parallel machines are too hard to program– Users “left behind” with each new major generation

• Efficiency is too low– Even after a large programming effort – Single digit efficiency numbers are common

• Approach– Titanium: A modern (Java-based) language that

provides performance transparency– Sparsity: Self-tuning scientific kernels

Titanium Overview

Object-oriented language based on Java with:• Scalable parallelism

– Single Program Multiple Data (SPMD) model of parallelism, 1 thread per processor

• Global address space– Processors can read/write memory on other

processor– Pointer dereference can cause communication

• Intermediate point between message passing and shared memory

Performance Challenges

• Two separate performance problems– Serial Performance

• Can Java be used for high performance?• What if we compile to machine code, rather

than using the JVM?

– Communication Performance• Global address space encourages the use of

small messages• Overlapping used to tolerate latencies• How well does this work on current networks?

SciMark Benchmark

• Numerical benchmark for Java, C/C++• Five kernels:

– FFT (complex, 1D)– Successive Over-Relaxation (SOR)– Monte Carlo integration (MC)– Sparse matrix multiply – dense LU factorization

• Results are reported in Mflops• Download and run on your machine from:

– http://math.nist.gov/scimark2– C and Java sources also provided

Roldan Pozo, NIST, http://math.nist.gov/~Rpozo

SciMark: Java vs. C(Sun UltraSPARC 60)

0

10

20

30

40

50

60

70

80

90

MFl

ops

FFT SOR MC Sparse LU

C

Java

* Sun JDK 1.3 (HotSpot) , javac -0; Sun cc -0; SunOS 5.7

Roldan Pozo, NIST, http://math.nist.gov/~Rpozo

SciMark: Java vs. C(Intel PIII 500MHz, Win98)

0

20

40

60

80

100

120

FFT SOR MC Sparse LU

C

Java

* Sun JDK 1.2, javac -0; Microsoft VC++ 5.0, cl -0; Win98

Roldan Pozo, NIST, http://math.nist.gov/~Rpozo

Can we do better without the JVM?

• Pure Java with a JVM (and JIT)– Within 2x of C and sometimes better

• OK for many users, even those using high end machines

– Depends on quality of both compilers

• We can try to do better using a traditional compilation model – E.g., Titanium compiler at Berkeley

• Compiles Java extension to C• Bounds checking of arrays is crucial – compiler

flag to turn this off

Java Compiled by Titanium Compiler

Performance on a Pentium IV (1.5GHz)

050

100150200250300350400450

Overall FFT SOR MC Sparse LU

MF

lop

s

java C (gcc -O6) Ti Ti -nobc

Java Compiled by Titanium Compiler

Performance on a Sun Ultra 4

0

10

20

30

40

50

60

70

Overall FFT SOR MC Sparse LU

MF

lop

s

Java C Ti Ti -nobc

Language Support for Performance

• Multidimensional arrays– Contiguous storage– Support for sub-array operations without copying

• Support for small objects– E.g., complex numbers– Called “immutables” in Titanium– Sometimes called “value” classes

• Semi-automatic memory management– Create named “regions” for new and delete– Avoids distributed garbage collection

Performance Challenges

• Two separate performance problems– Serial Performance

• Can Java be used for high performance?• What if we compile to machine code, rather

than using the JVM?

– Communication Performance• Global address space encourages the use of

small messages• Overlapping used to tolerate latencies• How well does this work on current networks?

The Network Parameters

• EEL – End to end latency or time spent sending a short message between two processes.

• Parameters of the LogP Model– L – “Latency”or time spent on the network

• During this time, processor can be doing other work

– O – “Overhead” or processor busy time on the sending or receiving side.

– G – “gap” the rate at which messages can be pushed onto the network.

– P – the number of processors

• Overhead is the most important parameter – cannot be hidden

LogP Parameters: Overhead & Latency• Non-overlapping

overhead• Send and recv

overhead can overlap

P0

P1

osend

L

orecv

P0

P1

osend

orecv

EEL = osend + L + orecv

EEL < osend + orecv

Performance of Remote Read/Write

0

5

10

15

20

25

T3E/M

PI

T3E/S

hmem

T3E/E

-Reg

IBM

/MPI

IBM

/LAPI

Quadr

ics/M

PI

Quadr

ics/P

ut

Quadr

ics/G

et

M2K

/MPI

M2K

/GM

Dolph

in/M

PI

Gigan

et/V

IPL

use

c

Send Overhead (alone) Send & Rec Overhead Rec Overhead (alone) Added Latency

8-byte messages

LogP Parameters: gap

• The Gap is the delay between sending messages

• Gap may be larger than send overhead, e.g., – NIC may be busy processing the last

message.– Flow control on the network may

prevent the NIC from accepting the next message.

• If the gap is higher than overhead– Need to overlap communication with

computation, not other communication

P0

P1

osendgap

gap

Results: Gap and Overhead

6.7

1.2 0.2

8.2 9.5

95.0

1.6

6.510.3

17.8

7.84.6

0.0

5.0

10.0

15.0

20.0

use

c

Gap Send Overhead Receive Overhead

Send Overhead Over Time

• Overhead has not improved significantly– Lack of integration; lack of attention in software

Myrinet2K

Dolphin

T3E

Cenju4

CM5

CM5

Meiko

MeikoParagon

T3D

Dolphin

Myrinet

SP3

SCI

Compaq

NCube/2

T3E0

2

4

6

8

10

12

14

1990 1992 1994 1996 1998 2000 2002Year (approximate)

usec

Summary

Several categories–Application development

• Heart and cochlea-component simulation

–Application-level package• Generic immersed boundary method• Parallel for shared and distributed memory• Enables new larger-scale simulations; finer grid

–Titanium language and compiler• High level, high performance programming

–Communication runtime layer• Lightweight communication and benchmarks

applicationdata

softwaresystems

Collaborators• Susan Graham

• Paul Hilfinger

• Jim Demmel

• Alex Aiken• Phillip Colella• Peter McQuorquodale• Mike Welcome• Sabrina Merchant• Kaushik Datta

• Dan Bonachea• Rich Vuduc• Jason Duell• Paul Hargrove• Wei Chen

• Eun-Jin Im• Szu-Huey Chang• Carrie Fei• Ben Liblit• Robert Lin• Geoff Pike• Jimmy Su• Ellen Tsai• Siu Man Yau• Shaoib Kamil• Benjamin Lee• Rajesh Nishtala• Costin Iancu

http://titanium.cs.berkeley.edu/

top related