power efficient processor design and the cell processor · 2005-03-02 · power efficient processor...

30
Systems and Technology Group © 2005 IBM Corporation Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. [email protected] Architect, Cell Synergistic Processor Element IBM Systems and Technology Group Austin, Texas

Upload: others

Post on 20-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation

Power Efficient Processor Design and the Cell Processor

H. Peter Hofstee, Ph. [email protected], Cell Synergistic Processor ElementIBM Systems and Technology GroupAustin, Texas

Page 2: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation2

Agenda

� Power Efficient Processor Architecture

� System Trends

� Cell Processor Overview

Page 3: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation3

Power Efficient Architecture

Page 4: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation4

Limiters to Processor Performance

� Power wall

� Memory wall

� Frequency wall

Page 5: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation5

Power Wall (Voltage Wall)

0.010.110.001

0.01

0.1

1

10

100

1000

Gate Length (microns)

Active Power

Passive Power

1994 2004

Po

wer

Den

sity

(W

/cm

2 )

� Power components:

– Active power

– Passive power

• Gate leakage• Sub-threshold

leakage (source-drain leakage)

10S Tox=11AGate Stack

Gate dielectric approaching a fundamental limit (a few atomic layers)

NET: INCREASING PERFORMANCE

REQUIRES INCREASING EFFICIENCY

Page 6: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation6

Memory wall

� Main memory now nearly 1000 cycles from the processor

– Situation worse with (on-chip) SMP

� Memory latency penalties drive inefficiency in the design

– Expensive and sophisticated hardware to try and deal with it

– Programmers that try to gain control of cache content, but are hindered by the hardware mechanisms

� Latency induced bandwidth limitations

– Much of the bandwidth to memory in systems can only be used speculatively

– Diminishing returns from added bandwidth on traditional systems

Page 7: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation7

Frequency wall

� Increasing frequencies and deeper pipelines have reached diminishing returns on performance

� Returns negative if power is taken into account

� Results of studies depend on issue width of processor– The wider the processor the slower it wants to be

– Simultaneous Multithreading helps to use issue slots efficiently

� Results depend on number of architected registers and workload– More registers tolerate deeper pipeline

– Fewer random branches in application tolerates deeper pipelines

Page 8: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation8

Microprocessor Efficiency

�Recent History:–Gelsinger’s law

• 1.4x more performance for 2x more transistors–Hofstee’s corollary

• 1/1.4x efficiency loss in every generation• Examples: Cache size, OoO, Superscalar, etc. etc.

�Re-examine microarchitecture with performance per transistor as metric–Pipelining is last clear win

Page 9: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation9

Attacking the Performance Walls

� Multi-Core Non-Homogeneous Architecture– Control Plane vs. Data Plane processors– Attacks Power Wall

� 3-level Model of Memory– Main Memory, Local Store, Registers– Attacks Memory Wall

� Large Shared Register File & SW Controlled Branching– Allows deeper pipelines (11FO4 … helps power!)– Attacks Frequency Wall

Page 10: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation10

System Trends

Page 11: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation11

System Trends toward Integration

� Increased integration is driving processors to take on many functions typically associated with systems

– Integration forces processor developers to address off-load and acceleration in the design of the processor

– Integration of bridge chip functionality

� Virtualization technology is used to support non-homogeneous environments

Processor

Southbridge

NorthbridgeMemory

Accel

Cell

Processor

Memory

IO

IO

Page 12: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation12

Next Generation Processors address Programming Complexity and Trend Towards Programmable Offload Engines with a Simpler System Alternative

CPU

GPU

NIC

Security

Media…

CPU

StreamingGraphicsProcessor

NetworkProcessor

SecurityProcessor

MediaProcessor…

64b PowerProcessor

SynergisticProcessor

SynergisticProcessor

...

Mem. Contr.

Config. IO

Hardwired Function

Programmable ASIC Cell

Page 13: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation13

“Outward Facing” Aspects of Cell

� Cell is designed to be responsive

� .. to human user

– Real-time response

– Supports rich visual interfaces

� .. to network

– Flexible, can support new standards

– High-bandwidth

– Content protection, privacy & security

� Contrast to traditional processors which evolved from “batch processing” mentality (inward focused).

Page 14: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation14

Cell Overview

Page 15: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation15

Key Attributes of Cell� Cell is Multi-Core

– Contains 64-bit Power Architecture TM

– Contains 8 Synergistic Processor Elements (SPE)

� Cell is a Flexible Architecture

– Multi-OS support (including Linux) with Virtualization technology

– Path for OS, legacy apps, and software development

� Cell is a Broadband Architecture

– SPE is RISC architecture with SIMD organization and Local Store

– 128+ concurrent transactions to memory per processor

� Cell is a Real-Time Architecture

– Resource allocation (for Bandwidth Measurement)

– Locking Caches (via Replacement Management Tables)

� Cell is a Security Enabled Architecture

– SPE dynamically reconfigurable as secure processors

Page 16: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation16

Cell Chip Block Diagram

PXU

EIB (up to 96 Bytes/cycle)

SXU

LS

SXU

LSLS

SXU

LSLS

SXU

LS

Dual XDRTM FlexIOTM

LS

SXU

LS

SXU SXU SXU

BICMICL2

L1

SMF SMFSMFSMF SMFSMF SMF SMF

PPE

SPESPU

Page 17: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation17

PPU

SPU

SPU

SPU

SPU

SPU

SPU

SPU

SPU

MIC

RRAC

BIC

MIB

Cell Prototype Die (Pham et al, ISSCC 2005)

Page 18: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation18

Cell Highlights

�Observed clock speed– > 4 GHz

�Peak performance (single precision)– > 256 GFlops

�Peak performance (double precision)– >26 GFlops

�Area 221 mm2

�Technology 90nm SOI

�Total # of transistors 234M

Page 19: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation19

Element Interconnect Bus� EIB data ring for internal communication

– Four 16 byte data rings, supporting multiple transfers

– 96B/cycle peak bandwidth

– Over 100 outstanding requests

Page 20: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation20

Power Processor Element

� PPE handles operating system and control tasks

– 64-bit Power ArchitectureTM with VMX

– In-order, 2-way hardware Multi-threading

– Coherent Load/Store with 32KB I & D L1 and 512KB L2

Page 21: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation21

Synergistic Processor Element

� SPE provides computational performance

– Dual issue, up to 16-way 128-bit SIMD

– Dedicated resources: 128 128-bit RF, 256KB Local Store

– Each can be dynamically configured to protect resources

– Dedicated DMA engine: Up to 16 outstanding request

Page 22: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation22

SPE Highlights � User-mode architecture– No translation/protection within SPU

– DMA is full Power Arch protect/x-late

� Direct programmer control– DMA/DMA-list

– Branch hint

� VMX-like SIMD dataflow– Broad set of operations

– Graphics SP-Float

– IEEE DP-Float (BlueGene-like)

� Unified register file– 128 entry x 128 bit

� 256kB Local Store– Combined I & D

– 16B/cycle L/S bandwidth

– 128B/cycle DMA bandwidth

LS

LS

LS

LS

GPR

FXU ODD

FXU EVN

SFPDP

CO

NT

RO

L

CHANNEL

DMA SMMATO

SBIRTB

BE

B

FWD

SPU

SMF

14.5mm2 (90nm SOI)

Page 23: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation23

SPE Organization (Flachs et al, ISSCC 2005)

Page 24: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation24

SPE PIPELINE (Flachs et al, ISSCC 2005)

Page 25: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation25

I/O and Memory Interfaces� I/O Provides wide bandwidth

– Dual XDRTM controller (25.6GB/s @ 3.2Gbps)– Two configurable interfaces (76.8GB/s @6.4Gbps)– Flexible Bandwidth between interfaces– Allows for multiple system configurations

Page 26: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation26

Cell Processor Can Support Many Systems� Game console systems

� Workstations (CPBW)

� HDTV

� Home media servers

� Supercomputers

CELLProcessor

XDRtm XDRtm

IOIF0 IOIF1

CELLProcessor

XDRtm XDRtm

IOIF BIF

CELLProcessor

XDRtm XDRtm

IOIF

CELLProcessor

XDRtm XDRtm

IOIFBIF

CELLProcessor

XDRtm XDRtm

IOIF

CELLProcessor

XDRtm XDRtm

IOIF

BIF

CELLProcessor

XDRtm XDRtm

IOIFSW

Page 27: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation27

Cell Processor Based Workstation (CPBW) (Sony Group and IBM)

� First Prototype “Powered On”

� 16 Tera-flops in a rack (est.)

– ( equals 1 Peta-flop in 64 racks )

� Optimized for Digital Content Creation, including

• Computer entertainment

• Movies

• Real-time rendering

• Physics simulation

Memory

High BWNetworks

CELL Processor (2-Way SMP)

StorageSys

Mgmt

I/OBridge

CELL Processor Board

16 TFlop rack

Page 28: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation28

Cell Processor Example Application Areas

� Cell is a processor that excels at processing of rich media content in the context of broad connectivity

– Digital content creation (games and movies)

– Game playing and game serving

– Distribution of (dynamic, media rich) content

– Imaging and image processing

– Image analysis (e.g. video surveillance)

– Next-generation physics-based visualization

– Video conferencing (3D?)

– Streaming applications (codecs etc.)

– Physical simulation & science

Page 29: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation29

Summary

� Cell ushers in a new era of leading edge processors optimized for digital media and entertainment

� Desire for realism is driving a convergence between supercomputing and entertainment

� New levels of performance and power efficiency beyond what is achieved by PC processors

� Responsiveness to the human user and the network are key drivers for Cell

� Cell will enable entirely new classes of applications, even beyond those we contemplate today

Page 30: Power Efficient Processor Design and the Cell Processor · 2005-03-02 · Power Efficient Processor Design and the Cell Processor H. Peter Hofstee, Ph. D. ... – Control Plane vs

Systems and Technology Group

© 2005 IBM Corporation30

Acknowledgements

� Cell is the result of a deep partnership between SCEI/Sony, Toshiba, and IBM

� Cell represents the work of more than 400 people starting in 2001

� More detailed papers on the Cell implementation and the SPE micro-architecture can be found in the ISSCC 2005 proceedings