developing software for the ps3/cell platform
DESCRIPTION
Developing Software for the PS3/Cell Platform. Mike Acton [email protected]. The Cell Processor. A quick introduction to the hardware. The Cell is. ... not a magic bullet. ... not a radical change in high-performance design. ... fun to program for!. Programming for Games. - PowerPoint PPT PresentationTRANSCRIPT
![Page 2: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/2.jpg)
![Page 3: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/3.jpg)
The Cell Processor
A quick introduction to the hardware
![Page 4: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/4.jpg)
![Page 5: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/5.jpg)
![Page 6: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/6.jpg)
![Page 7: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/7.jpg)
![Page 8: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/8.jpg)
![Page 9: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/9.jpg)
The Cell is...
• ... not a magic bullet.• ... not a radical change in high-performance design.• ... fun to program for!
![Page 10: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/10.jpg)
Programming for Games
Quick background of game development
![Page 11: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/11.jpg)
Performance
Mostly "soft" real-time (60Hz or 30Hz)
![Page 12: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/12.jpg)
Languages and Compilers
![Page 13: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/13.jpg)
Other Processers and I/O
GPU, Blu-ray, HDD, Peripherals, Network
![Page 14: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/14.jpg)
Many Assets
Art, Animation, Audio
![Page 15: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/15.jpg)
"Game" vs. "Engine" code
Divisions of development
![Page 16: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/16.jpg)
Our Approach to the Cell for games
What does it change?
![Page 17: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/17.jpg)
“Manual Solution” “Very optimized codes but at
cost of programmability."
Marc Gonzales, IBM
WARNING!
![Page 18: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/18.jpg)
Good solutions for the Cellwill be good solutions on other
platforms.
![Page 19: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/19.jpg)
High-performance code is easy to
port to the Cell.
![Page 20: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/20.jpg)
Poorly performing code from any
platform is hard to port.
![Page 21: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/21.jpg)
Data and codeoptimization are merely important
on conventional architectures
... On the Cell, they're critical.
![Page 22: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/22.jpg)
Common Complaints
![Page 23: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/23.jpg)
#1 "But, it's too hard!"
![Page 24: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/24.jpg)
Multi-processing is not new
• Trouble with the SPUs usually is just trouble with multi-core.• You can't wish multi-core programming away. • It's part of the job.
![Page 25: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/25.jpg)
"Cache and DMA data design too complex"
Enforced simplicity
![Page 26: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/26.jpg)
Not that different from calling memcpy
![Page 27: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/27.jpg)
![Page 28: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/28.jpg)
But you get extra control
![Page 29: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/29.jpg)
![Page 30: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/30.jpg)
"My code can't be made parallel."
Yes. It can.
![Page 31: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/31.jpg)
"C/C++ is no good for parallel programming"
The solution is to understand the issues --not to hide them.
![Page 32: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/32.jpg)
"But I'm just doing this one little thing... it doesn't make any
difference."
Is it really just you? Is it really just this one thing?
![Page 33: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/33.jpg)
It's all about the SPUs
Designing code for concurrency
![Page 34: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/34.jpg)
“What's the easiest way to split programs into
SPU modules?”
![Page 35: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/35.jpg)
Let someone else do it.
![Page 36: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/36.jpg)
But if that someone is you...
Rules and guidelines
![Page 37: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/37.jpg)
Rule #1
DATA IS MORE IMPORTANTTHAN CODE
![Page 38: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/38.jpg)
• Good code follows good data.• Fast code follows good data.• Small code follows good data.
Guess what follows bad data.
![Page 39: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/39.jpg)
Rule #2
WHERE THERE IS ONE, THERE IS MORE THAN ONE
![Page 40: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/40.jpg)
The "domain-model design" Lie.
![Page 41: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/41.jpg)
Rule #3
SOFTWARE IS NOT A PLATFORM
![Page 42: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/42.jpg)
Unless you are a CS Professor.
![Page 43: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/43.jpg)
The real difficulty is in the unlearning.
![Page 44: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/44.jpg)
The ultimate goal:
Get everything on the SPUs.
![Page 45: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/45.jpg)
Complex systems can go on the SPUs
• Not just streaming systems• Used for any kind of task• But you do need to consider some things...
![Page 46: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/46.jpg)
• Data comes first. • Goal is minimum energy for transformation.• What is energy usage?
o CPU time. o Memory read/write time. o Stall time.
![Page 47: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/47.jpg)
Design the transformation pipeline back to front.
• Start with your destination data and work backward. Changes are inevitable -- Pay less for them.
![Page 48: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/48.jpg)
![Page 49: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/49.jpg)
SPUs use the canonical data.
Best format for the most expensive case.
![Page 50: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/50.jpg)
Minimize synchronization
Start with the smallest synchronization method possible.
![Page 51: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/51.jpg)
Often is lock-free single reader, single writer queue.
![Page 52: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/52.jpg)
![Page 53: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/53.jpg)
Load balancing
Data centric design will give coarser divisions.
![Page 54: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/54.jpg)
For constant time transforms:• Divide into multiple queues
For other transforms: • Use heuristic to decide times and a single entry queue to
distribute to multiple queues.
Start with simplest task queues
![Page 55: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/55.jpg)
Then work your way up.
• Is there a pre-existing sync point that will work? (e.g. vsync)• Can you split your data into need-to-sync and don't-care?
![Page 56: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/56.jpg)
![Page 57: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/57.jpg)
![Page 58: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/58.jpg)
![Page 59: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/59.jpg)
Write “optimizable” code.
Simple, self-contained loopsOver as many iterations as possible
No branches
![Page 60: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/60.jpg)
Transitioning from "legacy" systems...
An example from RCF
![Page 61: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/61.jpg)
FastPathFollowers C++ class
• And it's derived classes• Running on the PPU• Typical Update() method• Derived from a root class of all “updatable” types
![Page 62: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/62.jpg)
Where did this go wrong?
What rules where broken?• Used domain-model design• Code “design” over data design• No advatage of scale• No synchronization design• No cache consideration
![Page 63: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/63.jpg)
Result
• Typical performance issues• Cache misses• Unnecessary transformations• Didn't scale well• Problems after a few hundred updating
![Page 64: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/64.jpg)
Step 1: Group the data together
“Where there's one, there's more than one.” • Before the update() loop was called,• Intercepted all FastPathFollowers and derived classes • Removed them from the update list.• Then kept in a separate array.
![Page 65: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/65.jpg)
Step 1: Group the data together
• Created new function, UpdateFastPathFollowers()• Used the new list of same type of data• Generic Update() no longer used• (Ignored derived class behaviors here.)
![Page 66: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/66.jpg)
Step 2: Organize Inputs and Outputs
• Define what's read, what's write.• Inputs: Position, Time, State, Results of queries, Paths• Outputs: Position, State, Queries, Animation• Read inputs. Transform to Outputs.• Nothing more complex than that.
![Page 67: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/67.jpg)
Step 3: Reduce Synchronization Points
• Collected all outputs together• Collected any external function calls together into a
command buffer• Separate Query and Query-Result• Effectively a Queue between systems• Reduced from many sync points per “object” to one sync
point for the system
![Page 68: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/68.jpg)
Before Pattern
Loop Objects• Read Input 0• Update 0• Write Output• Read Input 1• Update 1• Call External Function• Block (Sync)
![Page 69: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/69.jpg)
After Pattern (Simplified)
Loop Objects• Read Input 0, 1• Update 0, 1• Write Output, Function to Queue
Block (Sync) Empty (Execute) Queue
![Page 70: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/70.jpg)
Next: Added derived-class functionality
• Similarly simplified derived-class Update() functions into functions with clear inputs and outputs.
• Added functions to deferred queue as any other function.• Advantage: Can limit derived functionality based on count,
LOD, etc.
![Page 71: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/71.jpg)
Step 4: Move to PPU thread
• Now system update has no external dependencies• Now system update has no conflicting data areas (with other
systems)• Now system update does not call non-re-entrant functions• Simply put in another thread
![Page 72: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/72.jpg)
Step 4: Move to PPU thread
• Add literal sync between system update and queue execution
• Sync can be removed because only single reader and single writer to data
• Queue can be emptied while being filled without collision See also: R&D page at insomniacgames.com on multi-threaded optimization
![Page 73: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/73.jpg)
Step 5: Move to SPU
• Now completely independent thread• Can be run anytime• Move to new SPU system • Using SPU Shaders
![Page 74: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/74.jpg)
SPU Shaders
• On SPU, Code is data• Double buffer / stream same as data• Very easy to do (No need for special libraries)
o Compile the codeo Dump the object as binaryo Load binary as datao Jump to binary location (e.g. Normal function pointer)o Pass everything as parameters, the ABI won't change.
![Page 75: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/75.jpg)
The 256K Barrier
The solution is simple:• Upload more code when you need it.• Upload more data when you need it.• Data is managed by traditional means • i.e. Double, triple fixed-buffers, etc.• Code is just data.
![Page 76: Developing Software for the PS3/Cell Platform](https://reader031.vdocuments.net/reader031/viewer/2022013004/56815884550346895dc5e4cd/html5/thumbnails/76.jpg)
The End
• Programming for the SPUs is not really differentThese issues are not going to go away.
• Teams need practice and experience.• Modern systems still benefit from heavy optimization.• Design around asynchronous processing.• Don't be afraid to learn and change.