Go Parallel – Multicore Programming & Game Development
“The Importance of Multi-Core for Game Development at Crytek”
Martin Mittring
Lead Graphics Programmer
One for all, and all for one
1894, Maurice Leloir (1851-1940)
“The Three Musketeers”, Alexandre Dumas, published 1844
Crytek
• Main office located in Frankfurt
• 3 more studios (Kiev, Budapest, Sofia)
• English as company language
• 30+ nationalities
• CryEngine 1: PC
• CryEngine 2: PC, XBox360, PS3
Crytek Games
• Far Cry: shipped 2004, Ubisoft, platform: PC
• Crysis*: shipped 2007, Electronic Arts, platform: PC
* Crysis, Crysis Wars and Crysis Warhead
Computer Game Business
• Production:
– Game (game specific code)
– Engine (reusable code, big, license business)
– Tools (editor, resource compiler, plug-ins)
– Art, Design (3D models, textures, levels)
• Milestones (CXP, demo, competitors, technology window)
• Real-time, Multiple platforms, Multiplayer, Cheating
• Technology is in flux:
– CPU, Fixed Function, Shaders, … , CPU again?
• Marketing, Publisher, Most games don’t make money
• Quality vs Performance
• A few % performance win costs more and more work
• Multithreading does not make it easier [Sutter05]
Challenges of future Engines
• Range of HW (x CPUs, GPU, Memory)
• Large world
• Production problems (Cost, Time, Deadline)
• Massive multiplayer
• Quality expectations (HDTV, Anti-aliasing, Stereo Displays, stable FPS, “Uncanny Valley”)
Multiprocessing in Game Engines
I. CPU (e.g. Doom)
II. += Graphic cards (Fixed function) (e.g. Far Cry)
III. += GPU (Shaders) (e.g. Far Cry)
IV. += x CPU (MT performance critical parts) (e.g. Crysis)
V. Job based? Distributed? Special language?
How do graphic cards get their speed?
• OpenGL/DirectX = Most successful multiprocessing language?
• SIMD, VLIW (but not always)
• Optimized for that application (e.g. bilinear filter, swizzled memory layout)
• Latency hiding by processing multiple data packets in a cyclic way
And now for something completely different
Triangle Rasterization
• Fixed function HW in GPU
• Fast SW version still useful (used in CryEngine to avoid rendering occluded objects)
• SW rasterization comes back (Larrabee)
• Good example of how to parallelize
… let's take a closer look
Triangle
Given:
3 corners float2
2D array of pixels
Rasterizer
Where to Rasterize
A clear definition is crucial:
center of the pixel is inside,
pixels on the border depend
on edge direction (Integer)
=> no holes
=> no edge overdraw
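The rule above can be sketched in C++ as follows. This is a minimal illustration and not CryEngine code: it assumes integer sample coordinates, a y-down screen, and triangles wound so that the interior gives positive edge values. The names `Point`, `insideEdge` and `covers` are hypothetical, and the tie-break shown is the common "top-left" convention, which is one possible way to make border pixels depend on edge direction.

```cpp
#include <cassert>
#include <cstdint>

struct Point { int x, y; };  // integer sample position (assumed y-down screen space)

// True when sample (px, py) lies in the half-space of edge a->b.
// A sample exactly on the edge counts only for "top" or "left" edges,
// so two triangles sharing the edge never both claim it.
static bool insideEdge(Point a, Point b, int px, int py) {
    const std::int64_t dx = b.x - a.x;
    const std::int64_t dy = b.y - a.y;
    const std::int64_t e = dx * (py - a.y) - dy * (px - a.x);
    const bool topLeft = dy < 0 || (dy == 0 && dx > 0);
    return e > 0 || (e == 0 && topLeft);
}

// True when the sample belongs to triangle a, b, c.
bool covers(Point a, Point b, Point c, int px, int py) {
    // Precondition of this sketch: winding chosen so the interior is positive.
    assert(std::int64_t(b.x - a.x) * (c.y - a.y) -
           std::int64_t(b.y - a.y) * (c.x - a.x) > 0);
    return insideEdge(a, b, px, py) &&
           insideEdge(b, c, px, py) &&
           insideEdge(c, a, px, py);
}
```

With two triangles that split a square along its diagonal, a sample on the diagonal is claimed by exactly one of them, which is what gives "no holes" and "no edge overdraw".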
Classic Approach
Scan-line algorithm
Special cases
Difficult to parallelize
Half-Space
Convert the three edges to half-spaces;
inside the triangle becomes behind the 3 half-spaces
Bounding rectangle as input
Wastes performance outside
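A minimal sketch of this brute-force half-space loop, under stated assumptions: vertices are given in fixed point with 16 units per pixel (the constant `kSub`), the screen is y-down, the triangle is wound so the interior is positive, and the image is a one-byte-per-pixel buffer. `edgeValue` and `rasterizeHalfSpace` are illustrative names, not taken from the talk or from [Nicolas04].

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kSub = 16;     // assumed sub-pixel resolution: 16 units per pixel
struct Point { int x, y; };  // vertex in fixed-point units

// Edge function with the tie-break folded in: a result >= 0 means "inside".
static std::int64_t edgeValue(Point a, Point b, int px, int py) {
    const std::int64_t dx = b.x - a.x, dy = b.y - a.y;
    const std::int64_t e = dx * (py - a.y) - dy * (px - a.x);
    const bool topLeft = dy < 0 || (dy == 0 && dx > 0);
    return topLeft ? e : e - 1;
}

// Marks every pixel whose center is inside the triangle; returns the pixel count.
int rasterizeHalfSpace(Point a, Point b, Point c,
                       std::vector<std::uint8_t>& pixels, int width, int height) {
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    // Bounding rectangle of the triangle, clamped to the image.
    const int minX = std::max(std::min({a.x, b.x, c.x}) / kSub, 0);
    const int maxX = std::min(std::max({a.x, b.x, c.x}) / kSub, width - 1);
    const int minY = std::max(std::min({a.y, b.y, c.y}) / kSub, 0);
    const int maxY = std::min(std::max({a.y, b.y, c.y}) / kSub, height - 1);
    int covered = 0;
    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            const int px = x * kSub + kSub / 2;  // pixel center
            const int py = y * kSub + kSub / 2;
            if (edgeValue(a, b, px, py) >= 0 &&
                edgeValue(b, c, px, py) >= 0 &&
                edgeValue(c, a, px, py) >= 0) {
                pixels[y * width + x] = 1;
                ++covered;
            }
        }
    }
    return covered;
}
```

Every pixel of the bounding rectangle is tested, including those far outside the triangle. That is the wasted work noted above, and it is what the packet variant tries to cut down.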
Half-Space Packets
Packet of multiple pixels
needs processing if partly
inside the triangle
Bounding rectangle as input
Three Packet Types
• Completely outside => drop
• Completely inside => process all pixels
• Partly inside => per pixel test required
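One way to sort a packet into these three types is to test only its four corner samples against each half-space. Because a half-space is convex, four corners outside one edge mean the whole packet is outside, and four corners inside all three edges mean the whole packet is inside. The sketch below assumes integer sample coordinates, a y-down screen and positive-interior winding; `Coverage` and `classifyBlock` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct Point { int x, y; };
enum class Coverage { Outside, Inside, Partial };

// Half-space test for edge a->b with a direction-based tie-break.
static bool insideEdge(Point a, Point b, int px, int py) {
    const std::int64_t dx = b.x - a.x, dy = b.y - a.y;
    const std::int64_t e = dx * (py - a.y) - dy * (px - a.x);
    const bool topLeft = dy < 0 || (dy == 0 && dx > 0);
    return e > 0 || (e == 0 && topLeft);
}

// Classifies the square packet of size x size samples starting at (x0, y0).
Coverage classifyBlock(Point a, Point b, Point c, int x0, int y0, int size) {
    assert(size > 0);
    const int x1 = x0 + size - 1, y1 = y0 + size - 1;
    const Point edges[3][2] = { {a, b}, {b, c}, {c, a} };
    bool allInside = true;
    for (const auto& e : edges) {
        // Number of packet corners inside this half-space (0..4).
        const int n = insideEdge(e[0], e[1], x0, y0) + insideEdge(e[0], e[1], x1, y0)
                    + insideEdge(e[0], e[1], x0, y1) + insideEdge(e[0], e[1], x1, y1);
        if (n == 0) return Coverage::Outside;  // fully behind one edge: drop
        if (n != 4) allInside = false;         // this edge cuts through the packet
    }
    return allInside ? Coverage::Inside : Coverage::Partial;
}
```

The Partial result is conservative: a packet near a triangle corner can be reported as Partial although no sample is covered. The per pixel test then rejects its samples, so the output stays correct.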
Half-Space + Packets
• SIMD (Single Instruction Multiple Data) friendly
• Integer math (precise, fast)
• Fewer branches
• Cache friendly
[Nicolas04]
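To illustrate why this maps well to SIMD: each edge function is linear, so stepping one pixel to the right is a single integer add, and the three sign tests can be merged with a bitwise OR instead of branches. The sketch below uses plain arrays of four lanes in place of real SIMD intrinsics, so it only shows the data layout. `EdgeEq`, `makeEdge` and `coverageMask4` are illustrative names, and the same y-down, positive-interior conventions are assumed.

```cpp
#include <cassert>
#include <cstdint>

struct Point { int x, y; };

// Edge function rewritten as value(x, y) = c + stepX * x + stepY * y.
// The tie-break bias is folded into c, so value >= 0 means "inside".
struct EdgeEq { std::int32_t stepX, stepY, c; };

EdgeEq makeEdge(Point a, Point b) {
    const std::int32_t dx = b.x - a.x, dy = b.y - a.y;
    const bool topLeft = dy < 0 || (dy == 0 && dx > 0);
    return { -dy, dx, dy * a.x - dx * a.y - (topLeft ? 0 : 1) };
}

// Returns a 4-bit mask; bit i is set when sample (x + i, y) is inside all edges.
unsigned coverageMask4(const EdgeEq (&e)[3], int x, int y) {
    // Sketch-only limit that keeps the products within 32 bits.
    assert(x >= 0 && x < (1 << 14) && y >= 0 && y < (1 << 14));
    std::int32_t v[3][4];
    for (int k = 0; k < 3; ++k) {
        v[k][0] = e[k].c + e[k].stepX * x + e[k].stepY * y;
        for (int i = 1; i < 4; ++i) v[k][i] = v[k][i - 1] + e[k].stepX;  // one add per step
    }
    unsigned mask = 0;
    for (int i = 0; i < 4; ++i) {
        // The OR is negative exactly when at least one edge value is negative.
        const std::int32_t any = v[0][i] | v[1][i] | v[2][i];
        mask |= static_cast<unsigned>(any >= 0) << i;
    }
    return mask;
}
```

In a real SIMD version the four lanes would live in one vector register and the mask would come from a single compare and move-mask operation. That mapping is left out here because it depends on the target instruction set.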
And now for something completely different
Three rendering methods
• Rasterization (no slide)
• Ray-tracing (four slides)
• REYES (three slides)
• Comparison (one slide)
(not a complete or fair comparison)
Ray-tracing
• Simple on CPU (stack recursion)
• Spatial Hierarchy to speed up ray-casts
• Computationally intensive
• Elegant solution
• Inherently parallel
• Not cache friendly
• Hard to get real-time
[Hauser02a]
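The "simple on CPU (stack recursion)" point can be shown with a very small recursive tracer. This is a sketch under strong assumptions: the scene is a list of spheres with a scalar brightness, rays have unit-length directions and start outside all spheres, and there is no spatial hierarchy. `Vec3`, `Sphere`, `hitSphere` and `trace` are hypothetical names, not taken from [Hauser02a].

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Sphere { Vec3 center; double radius; double emitted; double reflectivity; };

// Distance along the ray to the nearest front hit, or -1 on a miss.
static double hitSphere(const Sphere& s, Vec3 origin, Vec3 dir) {
    const Vec3 oc = origin - s.center;
    const double b = dot(oc, dir);
    const double disc = b * b - (dot(oc, oc) - s.radius * s.radius);
    if (disc < 0.0) return -1.0;
    const double t = -b - std::sqrt(disc);
    return t > 1e-6 ? t : -1.0;
}

// Brightness seen along the ray; recursion depth limits the mirror bounces.
double trace(const std::vector<Sphere>& scene, Vec3 origin, Vec3 dir, int depth) {
    assert(depth >= 0);
    const Sphere* nearest = nullptr;
    double tNear = 1e30;
    for (const Sphere& s : scene) {  // brute force; a spatial hierarchy would go here
        const double t = hitSphere(s, origin, dir);
        if (t > 0.0 && t < tNear) { tNear = t; nearest = &s; }
    }
    if (!nearest) return 0.0;  // background is black
    double result = nearest->emitted;
    if (depth > 0 && nearest->reflectivity > 0.0) {
        const Vec3 p = origin + dir * tNear;
        const Vec3 n = (p - nearest->center) * (1.0 / nearest->radius);
        const Vec3 r = dir - n * (2.0 * dot(dir, n));  // mirror reflection
        result += nearest->reflectivity * trace(scene, p, r, depth - 1);
    }
    return result;
}
```

Each pixel's call to `trace` is independent of all the others, which is the sense in which the method is inherently parallel. The scattered memory accesses of the reflected rays are what makes it cache unfriendly.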
Ray-tracing generations 1/3
• Recursive Ray-casting
Hard shadows, Perfect Reflections, Refractions, Occlusion solved, Adaptive AA
Ray-tracing generations 2/3
• Distributed Ray-tracing [Carp84]
more samples per pixel to integrate:
Motion Blur, blurry reflections, soft shadows, DOF, Lightprobes
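"More samples per pixel to integrate" can be sketched as averaging a function over jittered sample positions. In the sketch below the hypothetical `integratePixel` averages an arbitrary callback over an n x n grid of jittered samples inside one pixel. In a real tracer the callback would shoot a ray through the sample position, possibly with a random time or lens position as well; that part is assumed and not shown.

```cpp
#include <cassert>
#include <functional>
#include <random>

// Averages f over the unit square of one pixel using an n x n grid of
// jittered (stratified) samples. f receives a position in [0,1) x [0,1).
double integratePixel(const std::function<double(double, double)>& f,
                      int n, std::mt19937& rng) {
    assert(n > 0);
    std::uniform_real_distribution<double> jitter(0.0, 1.0);
    double sum = 0.0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            // One random position inside grid cell (i, j).
            const double x = (i + jitter(rng)) / n;
            const double y = (j + jitter(rng)) / n;
            sum += f(x, y);
        }
    }
    return sum / (n * n);
}
```

For example, a callback that returns 1 on the left half of the pixel and 0 on the right half averages to roughly 0.5, which is the anti-aliased coverage of an edge running through the pixel center.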
Ray-tracing generations 3/3
• Global Illumination
more samples per pixel to integrate:
Indirect lighting
© GNU Free Documentation License
REYES, Renderman
Lucasfilm Computer Graphics (now Pixar)
used in Pixar and many other movies:
Wall-E, Ratatouille, Transformers, Cars, … [Pixar08]
1984
REYES
• Recursive split into dice-able primitives
• Visible ones become grids of micro-polygons (near pixel size)
• Shading is done at each grid point or quad (Color and position)
• Renderman shader language (C like syntax)
REYES: Sampling of the micro-polygons
• A pixel consists of multiple jittered samples
• DOF, Motion-blur, AA (cheap as all shading is done)
• A-Buffer for OIT
• Tiled rendering for memory locality, to use less memory, to process in parallel
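The tiled rendering point can be sketched as cutting the image into independent rectangles. The `Tile` struct and `makeTiles` function below are illustrative assumptions and do not describe how Renderman organizes its buckets. Each tile touches only its own pixels, so a renderer could hand the tiles to worker threads, which is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Tile { int x0, y0, x1, y1; };  // half-open rectangle: [x0, x1) x [y0, y1)

// Splits a width x height image into tiles of at most tileSize x tileSize pixels.
std::vector<Tile> makeTiles(int width, int height, int tileSize) {
    assert(width > 0 && height > 0 && tileSize > 0);
    std::vector<Tile> tiles;
    for (int y = 0; y < height; y += tileSize) {
        for (int x = 0; x < width; x += tileSize) {
            // Tiles on the right and bottom border are clipped to the image.
            tiles.push_back({x, y,
                             std::min(x + tileSize, width),
                             std::min(y + tileSize, height)});
        }
    }
    return tiles;
}
```

Only the samples of the tiles currently being processed need to be kept in memory, which is where the memory saving and the better locality come from.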
The three methods compared
• Rasterization: eye-rays, triangles, opaque, GPU
• Ray-tracing: perfect reflections/refractions, triangles are limiting, less real-time: distributed Ray-tracing, global illumination
• REYES: displacement maps, Level of Detail, Anti-aliasing
• Shading/Lighting is orthogonal to the method [Hauser02b]
• Movie realism instead of physically correct
• Ambient Occlusion, Hybrid solutions
Conclusion
• “The Importance of Multi-Core for Game Development at Crytek”
• PS3 (1*2 PPU + 7 SPU), XBox360 (3*2 PowerPC), Intel/AMD (1/2/4/?)
• Multi / Many Cores?
• My plan was to motivate and I used my favorite topic: Graphics
References
• [Nicolas04] Advanced Rasterization, DevMaster.net, www.devmaster.net/forums/showthread.php?t=1884
• [Carp84] Distributed Ray Tracing, Siggraph 1984, http://www.csie.ntu.edu.tw/~cyy/courses/rendering/05fall/assignments/pres/slides/DRT.ppt
• [Hauser02a] CGR4 - Raytracing (German), http://www.vrvis.at/vr/cgr4/slides/CGR4-2002-RayTracing.pdf
• [Hauser02b] CGR4 - Beleuchtung (German), http://www.vrvis.at/vr/cgr4/slides/CGR4-2002-Beleuchtung.pdf
• [Sutter05] Herb Sutter, The Free Lunch Is Over, Dr. Dobb's Journal, 30(3), March 2005, http://www.gotw.ca/publications/concurrency-ddj.htm
• [Pixar08] Renderman Movies, https://renderman.pixar.com/products/whatsrenderman/movies.html
www.crytek.com/inside/presentations