Post on 15-Jan-2016

Page 1: Real-World GPGPU Mark Harris NVIDIA Developer Technology

Real-World GPGPU

Mark Harris
NVIDIA Developer Technology

Page 2: Real-World GPGPU Mark Harris NVIDIA Developer Technology

Copyright © NVIDIA Corporation 2004

GPGPU Research Promises Big Speedups

Researchers have tried many applications on GPUs

physically-based simulation
image processing
scientific computing
computer vision
computational finance
medical imaging
bioinformatics
databases and data mining
sorting
ray tracing

Research results promise big speedups

LU-GPU dense linear system solver: 10x CPU (UNC)
GPUTeraSort: 2006 Indy PennySort Champion (UNC)
ClawHMMer streaming sequence search: 5-20x CPU (Stanford)

Page 3: Real-World GPGPU Mark Harris NVIDIA Developer Technology

Raw Data Promises High Perf, Too

[Chart: NVIDIA GPU Pixel Shader GFLOPS, 2005 to 2006. Series: GPU Observed GFLOPS; CPU Theoretical peak GFLOPS.]

Page 4: Real-World GPGPU Mark Harris NVIDIA Developer Technology

Real-World Performance Gains

Do research results and high peak performance translate to real application speedups?

Real-World Applications

Medical Imaging (Mercury Computer Systems)

Electromagnetic Simulations (NVIDIA Partner)

Game Physics (Havok)

Page 5: Real-World GPGPU Mark Harris NVIDIA Developer Technology

© 2006 Mercury Computer Systems, Inc.

Digital Breast Tomosynthesis (DBT)

100X reconstruction speed-up with NVIDIA Quadro FX 4500 GPU
From hours to minutes
Facilitates clinical use

Improved diagnostic value
Clearer images
Fewer obstructions
Earlier detection

[Figure labels: X-Ray tube, Axis of rotation, Compression paddle, Compressed breast, Digital detector]

11 Low-dose X-ray Projections
Extremely Computationally Intense Reconstruction

Advanced Imaging Solution of the Year

“Mercury reduced reconstruction time from 5 hours to 5 minutes, making DBT clinically viable. … among 70 women diagnosed with breast cancer, DBT pinpointed 7 cases not seen with mammography”

Pioneering DBT work at Massachusetts General Hospital

Page 6: Real-World GPGPU Mark Harris NVIDIA Developer Technology

Electromagnetic Simulation

3D Finite-Difference and Finite-Element Modeling of:
Cell phone irradiation
MRI Design / Modeling
Printed Circuit Boards
Radar Cross Section (Military)

Computationally Intensive!
Large speedups with Quadro GPUs

Commercial, Optimized, Mature Software

[Figure: Pacemaker with Transmit Antenna]

[Chart: Performance (Mcells/s), axis 0 to 800, versus # Quadro FX 4500 GPUs (0, 1, 2, 4). Baseline: Single CPU, 3.x GHz. Bar labels: 1X, 5X, 10X, 18X.]
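The bar labels in the chart can be turned into a rough multi-GPU scaling estimate. The sketch below assumes that 5X, 10X and 18X belong to 1, 2 and 4 GPUs respectively; that pairing is inferred from the order of the chart labels and is not stated in the slide text. The function name `scaling_efficiency` is made up for this illustration.

```python
# ASSUMPTION: pairing of speedup labels with GPU counts is inferred from the
# chart residue (1 GPU -> 5X, 2 GPUs -> 10X, 4 GPUs -> 18X), not stated outright.
speedup_vs_cpu = {1: 5.0, 2: 10.0, 4: 18.0}


def scaling_efficiency(speedups):
    """Return {gpu_count: fraction of ideal linear scaling from the 1-GPU result}."""
    base = speedups[1]
    return {n: s / (base * n) for n, s in speedups.items()}


efficiency = scaling_efficiency(speedup_vs_cpu)
for n in sorted(efficiency):
    print(f"{n} GPU(s): {speedup_vs_cpu[n]:.0f}x CPU, "
          f"{efficiency[n]:.0%} of linear scaling")
```

Under that assumed pairing, two GPUs scale linearly from one, and four GPUs reach 90% of linear scaling.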

Page 7: Real-World GPGPU Mark Harris NVIDIA Developer Technology

Havok FX Physics on NVIDIA GPUs

Physics-based effects on a massive scale

10,000s of objects at high frame rates

Rigid bodies

Particles

Fluids

Cloth

and more

Page 8: Real-World GPGPU Mark Harris NVIDIA Developer Technology

Dedicated Performance For Physics

Performance Measurement: 15,000 Boulder Scene

Frame Rate

CPU Physics: 6.2 fps
Dual Core P4EE 955 - 3.46 GHz
GeForce 7900GTX SLI
CPU Multi-threading enabled

GPU Physics: 64.5 fps
Dual Core P4EE 955 - 3.46 GHz
GeForce 7900GTX SLI
CPU Multi-threading enabled
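The two frame rates can be converted into a speedup factor and per-frame times. This is a small worked calculation, taking 6.2 fps as the CPU-physics result and 64.5 fps as the GPU-physics result (the pairing is assumed from the order in which the numbers appear on the slide). The helper names are made up for this illustration.

```python
# ASSUMPTION: 6.2 fps is the CPU-physics run and 64.5 fps the GPU-physics run.
cpu_fps = 6.2
gpu_fps = 64.5


def speedup(baseline_fps, accelerated_fps):
    """Ratio of frame rates: how many times faster the accelerated run is."""
    return accelerated_fps / baseline_fps


def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


print(f"speedup: {speedup(cpu_fps, gpu_fps):.1f}x")
print(f"CPU frame time: {frame_time_ms(cpu_fps):.1f} ms")
print(f"GPU frame time: {frame_time_ms(gpu_fps):.1f} ms")
```

The ratio works out to roughly 10x, with the frame time dropping from about 161 ms to about 15.5 ms.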

Page 9: Real-World GPGPU Mark Harris NVIDIA Developer Technology

GPGPU Performance Strategies

Choose applications with high Arithmetic Intensity
Arithmetic Intensity = Arithmetic / Bandwidth
Game physics top kernels = very high A.I.
> 1500 cycles per collision, ~100 texture fetches

Leverage strengths of all processors in the system
GPUs: data-parallel computation
CPUs: sequential computation
Multi-core CPUs: task-parallel computation

Find the parallelism in the application
Data dependencies can make a problem appear sequential
Divide into batches of independent parallelism
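The arithmetic-intensity figure quoted for game physics can be worked through numerically. The sketch below applies the slide's ratio, using cycles per collision as a stand-in for "Arithmetic" and texture fetches as a stand-in for "Bandwidth"; treating those two counts as the numerator and denominator is an assumption made here, and `arithmetic_intensity` is an illustrative helper name.

```python
def arithmetic_intensity(arithmetic_ops, memory_ops):
    """Arithmetic Intensity = Arithmetic / Bandwidth (both as simple counts)."""
    return arithmetic_ops / memory_ops


# Figures from the slide: "> 1500 cycles per collision, ~100 texture fetches".
# 1500 is a lower bound and 100 is approximate, so the result is a rough floor.
cycles_per_collision = 1500
texture_fetches_per_collision = 100

ai = arithmetic_intensity(cycles_per_collision, texture_fetches_per_collision)
print(f"about {ai:.0f} cycles of arithmetic per texture fetch")
```

A kernel with a much lower ratio would spend most of its time waiting on memory rather than computing, which is why the slide recommends picking high-intensity workloads.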

Page 10: Real-World GPGPU Mark Harris NVIDIA Developer Technology

Rigid Body Dynamics Overview

3 phases to every simulation time step
Integrate positions and velocities
Detect collisions
Resolve collisions

Integration is very parallel
No dependencies between objects: use the GPU

Detecting collisions is basically scene traversal
CPU is good at this: use it

Resolving collisions is a tricky one
Is it parallel enough for the GPU?
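The three phases can be sketched as one time-step function. This is a minimal illustration and not Havok's implementation: the `Body` class, the semi-implicit Euler integrator, the brute-force sphere test (standing in for a real scene traversal) and the velocity-swap response are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass

GRAVITY = (0.0, -9.81, 0.0)


@dataclass
class Body:  # hypothetical minimal rigid body: a sphere with no rotation
    pos: list
    vel: list
    radius: float = 0.5


def integrate(bodies, dt):
    """Phase 1: each body updates independently (the data-parallel part)."""
    for b in bodies:
        b.vel = [v + g * dt for v, g in zip(b.vel, GRAVITY)]
        b.pos = [p + v * dt for p, v in zip(b.pos, b.vel)]


def detect_collisions(bodies):
    """Phase 2: brute-force overlap test; returns contact pairs (i, j)."""
    contacts = []
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            a, b = bodies[i], bodies[j]
            dist_sq = sum((p - q) ** 2 for p, q in zip(a.pos, b.pos))
            if dist_sq < (a.radius + b.radius) ** 2:
                contacts.append((i, j))
    return contacts


def resolve_collisions(bodies, contacts):
    """Phase 3: placeholder response that swaps the pair's velocities."""
    for i, j in contacts:
        bodies[i].vel, bodies[j].vel = bodies[j].vel, bodies[i].vel


def step(bodies, dt):
    """Run one simulation time step and return the contacts found."""
    integrate(bodies, dt)
    contacts = detect_collisions(bodies)
    resolve_collisions(bodies, contacts)
    return contacts


bodies = [
    Body([0.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    Body([0.5, 0.0, 0.0], [-1.0, 0.0, 0.0]),
    Body([10.0, 0.0, 0.0], [0.0, 0.0, 0.0]),
]
contacts = step(bodies, 0.01)
print(contacts)
```

Note that `integrate` touches one body per loop iteration with no shared state, while `resolve_collisions` writes to two bodies per contact; that shared write is the dependency that makes the third phase the tricky one.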

Page 11: Real-World GPGPU Mark Harris NVIDIA Developer Technology

Is Game Physics A Data Parallel Task?

[Diagram: eight Body nodes. Labels: Contacts & Velocities, Solve Collisions, New Velocities.]

Slide courtesy of Andrew Bond, Havok

Page 13: Real-World GPGPU Mark Harris NVIDIA Developer Technology

Is Game Physics A Data Parallel Task?

[Diagram: Solve Collisions takes Contacts and produces New Velocities. It is divided into Batch 1, Batch 2, ... Batch M, each containing Solve link 1, Solve link 2, ... Solve link N.]

Slide courtesy of Andrew Bond, Havok
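One way to form such batches is to group contacts so that no body appears twice in the same batch; the links inside a batch then share no data and could be solved in parallel, with the batches run one after another. The sketch below uses a simple greedy assignment. The slides do not say how Havok builds its batches, so the algorithm and the name `batch_contacts` are assumptions for illustration only.

```python
def batch_contacts(contacts):
    """Greedily group (body_a, body_b) contacts into independent batches.

    Within a batch, every body appears in at most one contact.
    """
    batches = []  # list of contact lists
    used = []     # used[k] = set of body ids already touched by batches[k]
    for a, b in contacts:
        for batch, bodies in zip(batches, used):
            if a not in bodies and b not in bodies:
                batch.append((a, b))
                bodies.update((a, b))
                break
        else:
            # No existing batch can take this contact: start a new one.
            batches.append([(a, b)])
            used.append({a, b})
    return batches


# A ring of four touching bodies plus one separate pair.
contacts = [(0, 1), (1, 2), (2, 3), (0, 3), (4, 5)]
batches = batch_contacts(contacts)
for k, batch in enumerate(batches, start=1):
    print(f"Batch {k}: {batch}")
```

Here five contacts collapse into two batches. Greedy assignment does not guarantee the minimum number of batches, but it is cheap and sequential, which fits the slide's suggestion of leaving sequential work to the CPU.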

Page 14: Real-World GPGPU Mark Harris NVIDIA Developer Technology

Conclusion

Real-World GPGPU is just beginning!

[email protected]

http://developer.nvidia.com