
Game Developer’s Perspective on OpenCL

Eric Schenk, EA

© Copyright Electronic Arts, 2009 - Page 1

August 2009


Motivation

Motivation

• Supports a variety of compute resources
  - CPUs, GPUs, SPUs, accelerators
  - Unifies programming for devices that have very different environments
  - Provides uniform interface to non-fixed platforms (e.g. PC/Mac)

Motivation

• Open standard
  - Many vendors will support directly
  - Potential for 3rd party implementations if no vendor support
  - Embedded platform support in spec

Motivation

• Concurrent programming model
  - Command queue(s) per device with support for dependencies across queues
  - Data parallel and SIMD support in a portable notation, superior to intrinsics
  - Low-level programming model, minimalist abstractions

How to apply OpenCL to games

How to apply OpenCL to games

• Don’t try to speed up everything
  - Create new content/features that use the excess compute capacity
• Some cases are easy
  - We already parallelize them
• Some cases are harder
  - But with OpenCL it will be easier to try them

Dodging Amdahl’s Law

• Amdahl’s Law
  - “The speedup of a program using multiple processors is limited by the sequential fraction of the program”
• So, massively parallel hardware has limited benefit?
• No
  - The game already runs on today’s hardware.
  - More compute power => add more content/features.
  - Games have 100’s or 1000’s of algorithms to target.
• OpenCL makes the coding easier
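
For reference, the law quoted on this slide is usually written as follows. The symbols are not from the slides: p is assumed to be the fraction of run time that can be parallelized and N the number of processors.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

As a worked example under those assumptions, p = 0.9 and N = 8 give S = 1 / (0.1 + 0.1125) ≈ 4.7, and no number of processors can push the speedup past 1 / 0.1 = 10. The slide's argument sidesteps this cap by adding new parallel work rather than only speeding up the existing program.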

The “Easy” cases

• Rendering
  - Rasterization / shading already runs on graphics hardware
  - Visibility culling
  - Procedural geometry
• Codec decompression
  - Animation, audio, video, etc.
• Animation blending
• Audio mixing
• Rigid body physics integration
• Some collision calculations
• Etc.

Then it gets harder…

• AI
  - Path finding
  - Search
  - Pattern matching
  - Fuzzy logic / neural nets for AI
  - Massive scale AI behavior on multiple actors
  - Speculative AI paths
• Animation
  - High level motion planning
  - Detailed crowd simulation with individual behaviors
• Physics
  - Broad phase collision systems
  - Character to character interaction
  - High quality liquid/smoke
• Asset creation
  - Landscape, vegetation, architecture, etc.

Early Experiences

Early Experiences

• Too early to tell you about shipped game code
• We have done some feasibility experiments
• Skate 2 Cloth Experiment
  - Pulls the character skinning and cloth physics out of EA’s Skate 2 and embeds it into a stand-alone demo
• Goal:
  - Exercise OpenCL on “real” code
  - Extract game subsystem and port key algorithms to OpenCL
  - Leave intact as much original code and API as possible

What does it do?

• Play back recorded skeleton poses
• Skin character model to pose
• Apply cloth physics to portion of the skinned model
  - Integrator for gravity
  - Particle-to-particle spring system
  - Constrained to underlying skinned model
• Render via OpenGL

Demo Task Graph

• Simple sequential execution graph
• In-order, data parallel tasks
• Render inputs are double buffered so OpenCL tasks could, in theory, begin prior to render completion

Enqueue Instead of Execute

• In original code each box represents a function call
• Demo abstracts OpenCL kernel invocations as functors
  - Functors take an event parameter to be dependent on, and return their own event
  - Functors enqueue a kernel; upon return the kernel may not have completed or even begun

Enqueue Instead of Execute

• A function like this

    this->TCVIntegrate(dt, deltaPos);

• Became an enqueue functor call like this

    if (mIntegratorFunc != NULL)
        event = (*mIntegratorFunc)(event, dt, deltaPos);

• The original code remains as a fallback, slightly modified

    else {
        this->LockBuffers();
        this->TCVIntegrate(dt, deltaPos);
        this->UnlockBuffers();
    }
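
The enqueue-instead-of-execute pattern can be sketched without any OpenCL dependency. The following is a minimal illustration, not the demo's code: `Event`, `CommandQueue`, and `IntegratorFunc` are hypothetical stand-ins for `cl_event`, an in-order command queue, and the demo's functor, and the "kernel" is an ordinary C++ lambda.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for cl_event: 0 means "no dependency", k > 0 names
// the k-th command placed on the queue.
using Event = std::size_t;

// Hypothetical in-order queue that records work and runs it only on flush().
class CommandQueue {
public:
    Event enqueue(Event dependsOn, std::function<void()> work) {
        // In an in-order queue any earlier command is a valid dependency.
        assert(dependsOn <= commands_.size());
        commands_.push_back(std::move(work));
        return commands_.size();
    }

    // Runs every command that has not run yet, in submission order.
    void flush() {
        for (; completed_ < commands_.size(); ++completed_) {
            commands_[completed_]();
        }
    }

    bool isComplete(Event e) const { return e <= completed_; }

private:
    std::vector<std::function<void()>> commands_;
    std::size_t completed_ = 0;
};

// Functor in the style described on the slides: it takes the event to wait
// on and returns its own event. The work has not run when operator() returns.
struct IntegratorFunc {
    CommandQueue& queue;
    std::vector<float>& positions;

    Event operator()(Event dependsOn, float dt, float velocity) const {
        std::vector<float>* p = &positions;
        return queue.enqueue(dependsOn, [p, dt, velocity] {
            for (float& x : *p) x += velocity * dt;
        });
    }
};
```

Chaining looks like `event = integrate(event, dt, v);` repeated per task, which mirrors the call shape on the slide. The caller must keep the data alive until the queue has been flushed, just as OpenCL buffers must outlive the commands that use them.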

Memory Objects

• Original data was organized in aligned arrays of structs
• cl_mem objects were created for each array
  - CL_MEM_USE_HOST_PTR to avoid additional allocations and copying (when using CPU device)
  - OpenGL vertex buffers allocated by GL and bound to OpenCL via GL/CL interop

Data Flow Through Kernels


Complete Command Queue

• Buffer write used to move data into pose buffer
• Acquire / release to give OpenCL access to GL buffers

Kernels

• Skinning
  - Generates vertex buffer and drivers from pose
• Integrator
  - Acceleration due to gravity
• Distance constraint
  - Springs between cloth particles
• Driver constraint
  - Keeps cloth near underlying ‘skin’
• Writeback
  - Output cloth positions to vertex buffer

Vectorization

• Wrote two variants of each kernel: Scalar and vector
  - Some hardware does much better with SIMD code
• “Structure of Arrays” (SoA) style math
  - Memory data layout unchanged, rearranged at load/store
  - AoS float4 contains “xyzw”; SoA float4 contains “xxxx”
• Two key techniques employed: Transpose and select

4x4 Transpose

• Used to convert between AoS and SoA
• Four float4 AoS values loaded into one float16
• Post transpose the four float4 parts are SoA

    float16 transpose (float16 m) {
        float16 t;
        t.even = m.lo; t.odd = m.hi;
        m.even = t.lo; m.odd = t.hi;
        return m;
    }
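
To see why two even/odd interleaves amount to a 4x4 transpose, the kernel above can be emulated in plain C++. This is a sketch for checking the shuffle on the host, assuming a `float16` is modelled as a 16-element array; `Float16` and `interleaveHalves` are names introduced here, not part of OpenCL.

```cpp
#include <array>
#include <cstddef>

using Float16 = std::array<float, 16>;

// Emulates `t.even = m.lo; t.odd = m.hi;`: the low half of `m` lands in the
// even lanes of the result and the high half in the odd lanes.
Float16 interleaveHalves(const Float16& m) {
    Float16 t{};
    for (std::size_t i = 0; i < 8; ++i) {
        t[2 * i]     = m[i];      // .even <- .lo
        t[2 * i + 1] = m[i + 8];  // .odd  <- .hi
    }
    return t;
}

// Applying the interleave twice moves lane 4*r + c to lane 4*c + r, which is
// the transpose of a row-major 4x4 matrix.
Float16 transpose(const Float16& m) {
    return interleaveHalves(interleaveHalves(m));
}
```

Feeding in lanes 0..15 yields 0, 4, 8, 12, 1, 5, 9, 13, and so on: four AoS "xyzw" values become "xxxx", "yyyy", "zzzz", "wwww". Transposing again restores the original order, which is why the same function serves for both AoS -> SoA and SoA -> AoS.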

Select vs. Branch

• Branching is bad, for many reasons
• select(a,b,c) efficiently chooses between “a” and “b” based on “c” in an element-wise fashion
• Comparisons like isgreater(a,b) are set up to output to “c”
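
The interplay of comparison and select can be illustrated on the host. The sketch below emulates the OpenCL vector semantics in plain C++: a vector comparison yields -1 (all bits set) in true lanes and 0 in false lanes, and `select(a, b, c)` takes `b` in lanes where the most significant bit of `c` is set. `Float4`, `Int4`, `isGreater`, and `select4` are illustrative names, not OpenCL API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Float4 = std::array<float, 4>;
using Int4 = std::array<std::int32_t, 4>;

// Lane-wise x > y. A true lane is -1 (all bits set) and a false lane is 0,
// matching how OpenCL vector comparisons report their result.
Int4 isGreater(const Float4& x, const Float4& y) {
    Int4 mask{};
    for (std::size_t i = 0; i < 4; ++i) mask[i] = (x[i] > y[i]) ? -1 : 0;
    return mask;
}

// Lane-wise select: lane i takes b[i] when the most significant bit of c[i]
// is set (the value is negative), otherwise a[i]. No lane ever branches on
// behalf of another lane.
Float4 select4(const Float4& a, const Float4& b, const Int4& c) {
    Float4 result{};
    for (std::size_t i = 0; i < 4; ++i) result[i] = (c[i] < 0) ? b[i] : a[i];
    return result;
}
```

Note the argument order: the mask chooses the second value, so `select4(a, b, isGreater(x, y))` reads as "b where x > y, else a". Both candidate values are always computed, which is the price paid for avoiding divergent control flow.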

Scalar Integrator

    // vsr is (1.0f - mVerticalSpeedDampening), computed by the caller.
    void perform_integrator (int pIdx, float vsr, float4 acc,
        float dt, __global Particle* particles,
        __global short* indices, __global IntegratorState *iState)
    {
        float4 curPos, curPrevPos, nextPos;
        curPos = particles[pIdx].mPos;
        curPrevPos = particles[pIdx].mPrevPos;
        // TIME CORRECTED VERLET
        // x[i+1] = x[i] + (x[i] - x[i-1]) * (dt[i] / dt[i-1]) + a * dt[i] * dt[i]
        if ((indices[pIdx]) >= 0 && (curPrevPos.w > 0.0f)) {
            nextPos = curPos;
            nextPos -= curPrevPos;
            nextPos *= dt / iState->mLastDT;
            nextPos.y *= vsr;
            // acc is added as-is, so it presumably already holds a * dt * dt
            nextPos += acc;
            particles[pIdx].mPrevPos = curPos;
            particles[pIdx].mPos = curPos + nextPos;
        }
    }
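
The time-corrected Verlet update named in the kernel comment can be isolated into a few lines. This is a one-dimensional sketch under stated assumptions: `Particle1D` and `verletStep` are hypothetical names, acceleration is passed unscaled (the slide code appears to pass it pre-multiplied by dt squared), and the vertical damping and lock test of the kernel are left out.

```cpp
#include <cassert>

// Hypothetical one-dimensional particle: current and previous position.
struct Particle1D {
    float pos;
    float prevPos;
};

// One time-corrected Verlet step:
//   x[i+1] = x[i] + (x[i] - x[i-1]) * (dt / lastDt) + a * dt * dt
// Scaling the previous displacement by dt / lastDt keeps the implied
// velocity consistent when the frame time changes between steps.
void verletStep(Particle1D& p, float accel, float dt, float lastDt) {
    assert(lastDt > 0.0f);
    const float next = p.pos + (p.pos - p.prevPos) * (dt / lastDt) + accel * dt * dt;
    p.prevPos = p.pos;
    p.pos = next;
}
```

With no acceleration and equal time steps the particle simply repeats its last displacement. Halving dt halves the displacement, which is the correction that plain Verlet lacks.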

Vector Integrator

    void perform_vector_integrator (
        float4 vdt_ratio,
        float4 acc,
        int numparticles,
        int pIdx,
        __global Particle *ptrParticle,
        __global uint *mapped)
    {
        // load 4 particle positions and previous positions
        // transpose particles from AoS -> SoA
        // extract locked flags for particles
        // TIME CORRECTED VERLET:
        // x[i+1] = x[i] + (x[i] - x[i-1]) * (dt[i] / dt[i-1])
        //          + a * dt[i] * dt[i]
        // select between unchanged (if locked)
        // and new location (if not locked)
        // transpose SoA -> AoS
        // store particles back
    }

Vector Integrator

    // load particle position and previous position
    float16 curPos, curPrevPos;
    curPos.s0123 = ptrParticle[0].mPos;
    curPrevPos.s0123 = ptrParticle[0].mPrevPos;
    curPos.s4567 = ptrParticle[1].mPos;
    curPrevPos.s4567 = ptrParticle[1].mPrevPos;
    curPos.s89ab = ptrParticle[2].mPos;
    curPrevPos.s89ab = ptrParticle[2].mPrevPos;
    curPos.scdef = ptrParticle[3].mPos;
    curPrevPos.scdef = ptrParticle[3].mPrevPos;

    // transpose particles from AoS -> SoA
    curPos = transpose(curPos);
    curPrevPos = transpose(curPrevPos);

    // extract locked flags for particles
    // (as_uint4 reinterprets the int4 comparison result; OpenCL C does not
    // allow a C-style cast between vector types)
    uint4 mask = as_uint4(isgreater(curPrevPos.scdef, (float4)0.0f))
        & ~ComputeMappedMask(mapped);

Vector Integrator

    // TIME CORRECTED VERLET
    float4 next_x = ((curPos.s0123 - curPrevPos.s0123) *
        vdt_ratio + (curPos.s0123 + acc.x));
    float4 next_y = ((curPos.s4567 - curPrevPos.s4567) *
        vdt_ratio + (curPos.s4567 + acc.y));
    float4 next_z = ((curPos.s89ab - curPrevPos.s89ab) *
        vdt_ratio + (curPos.s89ab + acc.z));

    // select between unchanged (if locked)
    // and new location (if not locked)
    curPrevPos.s0123 = select(
        curPrevPos.s0123, curPos.s0123, mask);
    curPrevPos.s4567 = select(
        curPrevPos.s4567, curPos.s4567, mask);
    curPrevPos.s89ab = select(
        curPrevPos.s89ab, curPos.s89ab, mask);
    curPos.s0123 = select(curPos.s0123, next_x, mask);
    curPos.s4567 = select(curPos.s4567, next_y, mask);
    curPos.s89ab = select(curPos.s89ab, next_z, mask);

Vector Integrator

    // transpose SoA -> AoS
    curPos = transpose(curPos);
    curPrevPos = transpose(curPrevPos);

    // store particles back
    ptrParticle[0].mPos = curPos.s0123;
    ptrParticle[0].mPrevPos = curPrevPos.s0123;
    ptrParticle[1].mPos = curPos.s4567;
    ptrParticle[1].mPrevPos = curPrevPos.s4567;
    ptrParticle[2].mPos = curPos.s89ab;
    ptrParticle[2].mPrevPos = curPrevPos.s89ab;
    ptrParticle[3].mPos = curPos.scdef;
    ptrParticle[3].mPrevPos = curPrevPos.scdef;

Work-Items and Workgroups

• Data parallel model in OpenCL is based on work-items
  - Each work-item is given its own index (in up to three dimensions)
  - Work-items are organized into uniformly sized “workgroups”
  - Workgroup size is limited by device and kernel
• Work-items in a workgroup execute in parallel and share
  - Local memory
  - Barriers
  - Fences
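
The relationship between a work-item's global index, its workgroup, and its position inside that workgroup is simple arithmetic in the one-dimensional case. The sketch below works it through on the host, assuming no global offset; `WorkItemIndex`, `decompose`, and `roundUpGlobalSize` are names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pair of indices for a one-dimensional index space.
struct WorkItemIndex {
    std::size_t group;  // which workgroup the work-item belongs to
    std::size_t local;  // position of the work-item inside that workgroup
};

// Splits a global index so that globalId == group * localSize + local.
WorkItemIndex decompose(std::size_t globalId, std::size_t localSize) {
    assert(localSize > 0);
    return WorkItemIndex{globalId / localSize, globalId % localSize};
}

// Because workgroups are uniformly sized, the global size is rounded up to a
// multiple of the workgroup size. The kernel then has to ignore indices at or
// beyond itemCount.
std::size_t roundUpGlobalSize(std::size_t itemCount, std::size_t localSize) {
    assert(localSize > 0);
    return ((itemCount + localSize - 1) / localSize) * localSize;
}
```

For example, 1000 particles with a workgroup size of 64 need a global size of 1024, and the kernel body would be guarded by a bounds check on the global index.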

Per Kernel Index Space

• This demo’s kernels are all in one-dimensional spaces
• Each kernel uses its own space
  - Skinning: Complete vertex array
  - Integrator: Particle array
  - Driver constraint: Driver array
  - Distance constraint: Constraints array
  - Writeback: Clothed portion of vertex array

Distance Constraint: Limited Parallelism

• Each distance constraint modifies two particles
  - Modifying a particle from multiple constraints concurrently gives incorrect results
• Original constraints organized into “octets”
  - Eight-at-once is far too few to keep GPU busy

CPU Device Performance

             Host   Scalar Task   Scalar DP (8 Cores)   Vector DP (8 Cores)
Overall      1.00   2.72          17.03                 17.27
Skinning     1.00   2.98          20.98                 20.98*
Integrator   1.00   1.53           1.48                  1.10
Distance     1.00   1.34           2.54                  2.69
Driver       1.00   1.14           5.58                  8.83
WriteBack    1.00   1.01           7.26                  7.26*

*No vector version available

GPU Device Performance

             Host   Best CPU (8 Cores)   Unoptimized GTX285
Overall      1.00   17.27                 4.11
Skinning     1.00   20.98                 4.39
Integrator   1.00    1.53                 1.10
Distance     1.00    2.69                 0.74
Driver       1.00    8.83                 3.24
WriteBack    1.00    7.26                17.69

Optimizations – Algorithmic

• Each distance constraint modifies two particles
  - Modifying a particle from multiple constraints concurrently gives incorrect results.
  - Original algorithm allows for at most eight particles in a work-group. This cripples GPU performance.
• Reordering constraints to maximize the workgroup size (7x) improved kernel performance dramatically
• Limited by algorithm
  - Alternatives should be investigated
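
The slides do not say how the constraints were reordered. One common way to build larger conflict-free groups is greedy batching, sketched below as an assumption rather than as the demo's method: each constraint goes into the first batch that does not yet touch either of its particles, so every constraint in a batch can be solved concurrently. `Constraint` and `batchConstraints` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A distance constraint links two particle indices.
using Constraint = std::pair<std::size_t, std::size_t>;

// Greedy batching: place each constraint in the first batch where neither of
// its particles is already used. Constraints inside one batch never share a
// particle, so they can run in parallel without write conflicts.
std::vector<std::vector<Constraint>> batchConstraints(
        const std::vector<Constraint>& constraints,
        std::size_t particleCount) {
    std::vector<std::vector<Constraint>> batches;
    // used[b][p] is true when batch b already touches particle p.
    std::vector<std::vector<bool>> used;

    for (const Constraint& c : constraints) {
        assert(c.first < particleCount && c.second < particleCount);
        std::size_t b = 0;
        while (b < batches.size() && (used[b][c.first] || used[b][c.second])) {
            ++b;
        }
        if (b == batches.size()) {
            batches.emplace_back();
            used.emplace_back(particleCount, false);
        }
        batches[b].push_back(c);
        used[b][c.first] = true;
        used[b][c.second] = true;
    }
    return batches;
}
```

Batches must still be processed one after another, which is the sense in which the approach remains limited by the algorithm. Greedy batching also does not guarantee the fewest or the most evenly sized batches.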

Optimizations – Work per Task

• Larger simultaneous data set sizes provide the GPU with substantially more work
  - CPU benefits as well, but drops off with larger data sets
• Real game: Process all characters as a single batch
• Demo: Simple geometry replication (25x)

Optimizations – Bandwidth

• Memory access is crucial in bandwidth limited kernels (and most will be bandwidth limited)
• GL/CL integration
• CL_MEM_COPY_HOST_PTR for GPU
• Current GPU memory controllers are optimized for a specific set of memory access patterns

GPU Memory and Execution Optimization

• Burst reads from memory are essential
• Bursts only happen on sequential accesses
• Bursts are created by coalescing smaller reads
• Can only coalesce 4-, 8-, or 16-byte reads
• Coalesces across work-items in a workgroup
• Solution: Optimized transfer from global to local memory, then operate on local memory
• This pattern must currently be explicitly coded for
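
The global-to-local staging pattern comes down to an index mapping, which can be checked on the host. The sketch below is one interpretation of the pattern, not the demo's kernel: it assumes 80-byte vertices made of five float4 values (as in the slides' figure) and emulates a single workgroup with ordinary loops. `stagedOffset`, `stageToLocal`, and `vertexOffset` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float4 { float x, y, z, w; };

// 80-byte vertex = 5 x 16-byte float4.
constexpr std::size_t kFloat4PerVertex = 5;

// Flat float4 offset, within the workgroup's block, that work-item `localId`
// copies during pass `pass`. Within one pass, neighbouring work-items touch
// neighbouring 16-byte elements, which is what allows reads to coalesce.
std::size_t stagedOffset(std::size_t pass, std::size_t localId, std::size_t groupSize) {
    return pass * groupSize + localId;
}

// Emulates one workgroup staging `groupSize` vertices, starting at vertex
// `firstVertex`, from `global` into a freshly allocated "local" buffer.
std::vector<Float4> stageToLocal(const std::vector<Float4>& global,
                                 std::size_t firstVertex,
                                 std::size_t groupSize) {
    const std::size_t base = firstVertex * kFloat4PerVertex;
    const std::size_t count = groupSize * kFloat4PerVertex;
    assert(base + count <= global.size());
    std::vector<Float4> local(count);
    for (std::size_t pass = 0; pass < kFloat4PerVertex; ++pass) {
        for (std::size_t localId = 0; localId < groupSize; ++localId) {
            const std::size_t offset = stagedOffset(pass, localId, groupSize);
            local[offset] = global[base + offset];
        }
    }
    // On a device, a workgroup barrier is needed at this point before any
    // work-item reads elements that another work-item wrote.
    return local;
}

// After staging, work-item `localId` finds its own vertex at this offset.
std::size_t vertexOffset(std::size_t localId) {
    return localId * kFloat4PerVertex;
}
```

The naive alternative, where each work-item reads its own 80-byte struct directly, makes neighbouring work-items read addresses 80 bytes apart. The staged version trades that strided access for sequential 16-byte reads plus a barrier.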

GPU Memory and Execution Optimization

[Figure: numbered work-items transferring a buffer of 80-byte vertex structs (5 x 16-byte float4). Labels: local memory, kernel barriers, kernel fences.]


GPU Device Performance

             Host   Best CPU (8 Cores)   Unoptimized GTX285   Optimized GTX285
Overall      1      17.27                 4.11                45.15
Skinning     1      20.98                 4.39                83.56
Integrator   1       1.53                 1.1                  4.6
Distance     1       2.69                 0.74                 1.73
Driver       1       8.83                 3.24                14.45
WriteBack    1       7.26                17.69                56.6

Performance Summary

• Performance tuning is essential
• Performance characteristics are not hidden by abstraction layer
• Expect to write multiple variations and choose empirically at runtime
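
Choosing a variant empirically at runtime can be as simple as timing each candidate once and keeping the fastest. The sketch below illustrates the idea with standard C++ only. The function names are hypothetical, and a real selection would run on representative data, repeat each measurement, and wait for queued device work to finish before stopping the clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Wall-clock seconds taken by one call of `variant`.
double timeOnce(const std::function<void()>& variant) {
    const auto start = std::chrono::steady_clock::now();
    variant();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Index of the smallest timing. Ties go to the earliest entry.
std::size_t indexOfFastest(const std::vector<double>& seconds) {
    assert(!seconds.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < seconds.size(); ++i) {
        if (seconds[i] < seconds[best]) best = i;
    }
    return best;
}

// Times every variant once and returns the index of the fastest one.
std::size_t pickFastest(const std::vector<std::function<void()>>& variants) {
    std::vector<double> seconds;
    seconds.reserve(variants.size());
    for (const auto& variant : variants) {
        seconds.push_back(timeOnce(variant));
    }
    return indexOfFastest(seconds);
}
```

In the context of the deck, the candidates would be the scalar and vector builds of a kernel, or CPU and GPU devices, each wrapped so that one call runs one representative frame.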