Living with “Moore” & Designing the Ultimate SoC
Jack Browne, Senior VP Sales & Marketing, Sonics, Inc.
2 May 2012
Evolution of Consumer SoCs
Driving SoC Complexity
• Relentless push for a higher-quality user experience, at minimum system cost!
• Feature convergence: Video, Voice, Data, Audio (in every consumer device!)
• Critical demand for 1 GHz and beyond
Mobile is now! Unprecedented market impact:
• 9 years ago: 3G introduced in Europe
• First iPhone: 5 years old
  – Game changing, with Apple & Samsung capturing 90% of wireless device profits
• iPad 1: 2 years old, and #1 in a market that will ship 119M units in 2012
By 2014, the top 4 semiconductor market segments:
• Smartphones: #1 semiconductor market in 2011
  – 2011 unit volumes > mobile PCs
• Mobile PCs
• Office PCs
• Tablets
Source: Gartner, IHS iSuppli
Market Drivers
• Smartphone shipments > PCs
• Consumers transitioning from Personal Computer to Personal Computing (Intel)
• Sensing/Control = IoT = 7B devices in 2012, 15B in 2015 (Broadcom)
• Cloud = Bandwidth, Connectivity, Services, Commerce, Content, …
Mobile SoC Design Challenges
• Products shipping today (smartphones, tablets, netbooks):
  – Single- and dual-core processors at 800 MHz to 1.4 GHz
  – 40nm process node
  – LPDDR2
  – 1080p video encode/decode
  – Integrated and discrete baseband
  – 60-80 unique IP cores
• Products for 2013:
  – Next-generation dual- and quad-core processors at 1-3 GHz, e.g., big.LITTLE
  – 28nm process node
  – LPDDR2 and Wide I/O memory
  – Multi-channel memory
  – Integrated and discrete baseband
  – 80-120 unique IP cores
• 3D TSV packaging coming
[Figure: die photos; silicon area = 122 mm² and 163 mm²]
Source: http://www.anandtech.com
Our Market Challenges: Give SoC Performance, Bandwidth at Right Power
[Figure: Cortex-A9/A5, Cortex-A15/A7 big.LITTLE and 64-bit Cortex big.LITTLE; labels: "CPU/GPU/Media, Process, Connectivity", "Performance, All Day Use", "Complexity, Differentiation, TTM"]
Sources: ARM, 2011; Morgan Stanley, 2011
Are We Ready?
• TSMC is ready… 28nm HPM, with full ecosystem enablement
• ARM is ready…
  – Cortex™-A15 CPU: 1-2.5 GHz, 1-4 cores/cluster
  – Mali™-T658 GPU: 350 GFLOPs, 1-8 cores/cluster
• DRAM vendors are ready…
  – DDR3/4: 1600-3200 Mb/sec/pin, 6-50 GB/sec, 1-4 channels
  – LPDDR2/3: 800-1600 Mb/sec/pin, 3-25 GB/sec, 1-4 channels
  – Wide IO: 200-266 Mb/sec/pin, 13-17 GB/sec, 4 channels
• But what about the middle?
[Figure: tablet SoC with Cortex-A15, Mali-T658, Video, Audio, Camera, Display, USB… and several DDR channels; the interconnect in the middle is marked "?"]
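The DRAM bandwidth ranges above follow from peak rate = (Mb/s per pin × data pins × channels) / 8. The sketch below reproduces them. It assumes 32-bit channels for DDR3/4 and LPDDR2/3 and 128-bit channels for Wide IO. The slide does not state these widths. It only reports peak (theoretical) bandwidth, not sustained efficiency.

```python
def peak_bandwidth_gbps(mbps_per_pin: float, bits_per_channel: int, channels: int) -> float:
    """Peak (theoretical) DRAM bandwidth in GB/s, with 1 GB = 1e9 bytes."""
    bits_per_second = mbps_per_pin * 1e6 * bits_per_channel * channels
    return bits_per_second / 8 / 1e9


# (Mb/s per pin, bits per channel, channels) at the low and high end of each range.
# The channel widths are assumptions, not data from the slide.
RANGES = {
    "DDR3/4": [(1600, 32, 1), (3200, 32, 4)],
    "LPDDR2/3": [(800, 32, 1), (1600, 32, 4)],
    "Wide IO": [(200, 128, 4), (266, 128, 4)],
}

for name, (low, high) in RANGES.items():
    print(f"{name}: {peak_bandwidth_gbps(*low):.1f} to {peak_bandwidth_gbps(*high):.1f} GB/s")
```

Under these assumptions the three ranges come out as 6.4 to 51.2, 3.2 to 25.6 and 12.8 to 17.0 GB/s. These are close to the rounded figures on the slide.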
Why So Fast?
• We're fully converged!
  – Computing
  – Graphics
  – Video/Audio
• Everything runs user applications
• Apps need Giga's
  – 1-2 GHz multicore CPUs
  – 100+ GFLOP multicore GPUs
  – 15-50 GB/sec DRAM
• At consumer pricing
• … and something to integrate it all!
Consumer Electronics: "Wish List" 2011
Product                    Rank, Ages 6-12   Rank, Ages 13+
iPad                       1                 1
Computer                   4                 2
iPhone                     3                 7
Tablet (non-iPad)          5                 5
TV                         9                 4
iPod Touch                 2                 12
Kinect for Xbox 360        7                 9
E-Reader                   13                3
Smartphone (non-iPhone)    10                8
Blu-Ray Player             12                6
Nintendo 3DS               6                 16
PlayStation 3              11                11
Nintendo DS*               8                 15
Nintendo Wii               16                10
Xbox 360                   14                13
PlayStation Move           17                14
Other Mobile Phone         15                17
PlayStation Portable       18                18
Source: Nielsen, November 2011
(Annotations on two builds of this slide: "User Apps" and "Battery Powered")
But What About Power?
• Convergence drives massive SoC integration
  – Thin is in!
• All these Giga's cost power
  – But most devices run from batteries
• Result: cannot afford to power the entire SoC at once
  – "Dark silicon"
  – Power only those subsystems needed for current apps
  – And only as long as needed
[Figure: SoC subsystems (CPU, GFX, Video, other)]
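The "dark silicon" rule (power only the subsystems the current apps need) can be sketched as a simple power-budget check. All power numbers, app names and the budget below are hypothetical and chosen only for illustration. The sketch also assumes that a power-gated subsystem draws nothing, which ignores residual leakage.

```python
# Hypothetical active power per subsystem, in milliwatts (illustrative only).
SUBSYSTEM_POWER_MW = {"CPU": 1500, "GFX": 1200, "Video": 400, "Other": 300}

# Hypothetical mapping from a running app to the subsystems it needs.
APP_NEEDS = {
    "video_playback": {"CPU", "Video"},
    "3d_game": {"CPU", "GFX"},
    "standby": set(),
}


def powered_subsystems(running_apps):
    """Union of subsystems needed by the running apps; everything else stays dark."""
    needed = set()
    for app in running_apps:
        needed |= APP_NEEDS[app]
    return needed


def power_mw(subsystems):
    """Total active power of the given subsystems (power-gated blocks count as 0)."""
    return sum(SUBSYSTEM_POWER_MW[name] for name in subsystems)


def fits_budget(running_apps, budget_mw):
    """True if powering only the needed subsystems stays within the budget."""
    return power_mw(powered_subsystems(running_apps)) <= budget_mw


BUDGET_MW = 3000  # hypothetical battery/thermal budget
print("All on:", power_mw(SUBSYSTEM_POWER_MW), "mW")  # exceeds the budget
print("3D game:", power_mw(powered_subsystems(["3d_game"])), "mW")
```

With these numbers, powering everything (3400 mW) breaks the budget. Either app alone fits, but running both at once does not.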
Managing Dark Silicon
• General techniques
  – Stop/start subsystem clocks
  – Dynamic clock frequency
  – On/off voltage domains
  – Dynamic voltage/frequency scaling (DVFS)
• IP-specific techniques
  – ARM big.LITTLE™ (use the optimum IP for the load)
• Power managers implement the techniques
  – Software: flexible, but slow
  – Hardware: very responsive, but less flexible
• Moving towards subsystem blocks that are normally 'off'
[Figure: CPU, GFX, Video and BB blocks; label: "difficulty"]
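DVFS pays off because dynamic CMOS power scales roughly as P = C_eff · V² · f. Lowering voltage together with frequency therefore saves more than lowering frequency alone. The sketch below picks the slowest operating point that meets a demanded frequency and compares its dynamic power to the fastest point. The operating-point table is hypothetical, and leakage power is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: float
    volt: float


# Hypothetical operating-point table, slowest to fastest.
OPP_TABLE = [
    OperatingPoint(400, 0.80),
    OperatingPoint(800, 0.90),
    OperatingPoint(1200, 1.00),
    OperatingPoint(1600, 1.10),
]


def dynamic_power(opp: OperatingPoint, cap_eff: float = 1.0) -> float:
    """Relative dynamic power, P = C_eff * V^2 * f (activity folded into C_eff)."""
    return cap_eff * opp.volt ** 2 * opp.freq_mhz


def select_opp(demand_mhz: float) -> OperatingPoint:
    """Slowest operating point that still meets the demanded frequency."""
    for opp in OPP_TABLE:
        if opp.freq_mhz >= demand_mhz:
            return opp
    return OPP_TABLE[-1]  # saturate at the fastest point


opp = select_opp(700)
ratio = dynamic_power(opp) / dynamic_power(OPP_TABLE[-1])
print(f"{opp.freq_mhz} MHz @ {opp.volt} V -> {ratio:.0%} of peak dynamic power")
```

For a 700 MHz demand, the 800 MHz point delivers half the peak frequency but uses only about a third of the peak dynamic power. The extra saving comes from the lower voltage.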
System Design Challenges
How do semiconductor companies keep pace?
• System analysis
  – Evaluate performance, power, and area early in the design
• SoC architecture choices
  – Processor speed
  – Bus speed
  – Clock domains
  – Power and voltage domains
  – Critical data flow paths
  – Memory subsystem
  – Physical design
  – IP selection/development
[Figure: SoC block diagram: CPU, camera, PCIe, DMA, secure ROM, security SRAM, ROM, audio, LCD controller, HDMI and Ethernet on an on-chip network, with two memory schedulers and DRAM controllers]
On-Chip Network Speed
Processor and memory selection drive on-chip network speeds
• Memory with a 2:1 or 4:1 controller
  – Example: DDR3-2133 with a 2:1 controller requires a network speed of 1066 MHz
  – With a 4:1 controller, it requires a network speed of 533 MHz
• Processors with cache: 1-2 GHz
  – On-chip network typically runs at a 2:1 or 4:1 ratio
  – Option 1: run "wide and slow": eases timing closure
  – Option 2: run "fast and narrow": saves area
• Memory speed typically paces the system
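The arithmetic above can be sketched as follows. Following the slide, the network clock is the DRAM data rate divided by the controller ratio. Link bandwidth is assumed to be one full-width word per clock, so "wide and slow" and "fast and narrow" can carry the same raw bandwidth. The function names are illustrative.

```python
def network_clock_mhz(dram_rate_mts: int, controller_ratio: int) -> int:
    """Network-side clock (whole MHz, rounded down) for an N:1 controller,
    following the slide's arithmetic: DRAM data rate divided by the ratio."""
    return dram_rate_mts // controller_ratio


def link_bandwidth_gbps(width_bits: int, clock_mhz: float) -> float:
    """Raw bandwidth of a link that moves one full-width word per clock, in GB/s."""
    return width_bits / 8 * clock_mhz * 1e6 / 1e9


for ratio in (2, 4):
    print(f"DDR3-2133 with {ratio}:1 controller -> {network_clock_mhz(2133, ratio)} MHz")

# "Wide and slow" vs. "fast and narrow" carrying the same raw bandwidth.
wide_slow = link_bandwidth_gbps(128, network_clock_mhz(2133, 4))
fast_narrow = link_bandwidth_gbps(64, network_clock_mhz(2133, 2))
print(f"128-bit @ 533 MHz: {wide_slow:.2f} GB/s, 64-bit @ 1066 MHz: {fast_narrow:.2f} GB/s")
```

Both links carry about 8.5 GB/s. The wide link has half the clock to close timing on. The narrow link has half the wires and buffers.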
Introducing SonicsGN (SGN)
On-Chip Network IP for Complex SoC Design
■ High-speed network: > 1 GHz
■ Low system power
  – Clock gated
  – Power signals
■ Highly optimized area
  – Virtual channels
  – Fully configurable IP
■ Ideal for tablet/smartphone SoCs
  – Supports advanced processor speeds: 1-3 GHz
  – Scalable design: supports many heterogeneous IP cores
  – Supports multiple power domains
• Targeted for 28nm process node and below
Key Feature: Performance Efficiency
Maximize SoC Performance for Concurrent Applications
• Virtual channels
  – Share system resources
• Non-blocking
  – Always allow progress in the system
  – Advance "knowledge" of whether the resource is in use
• Quality of Service algorithm
• No over-provisioning of the network
[Figure: bandwidth (MB/s) over time, showing peak rate, sustainable rate, and a blocked flow]
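The slides do not describe Sonics' QoS algorithm. One common generic technique produces the behavior in the rate-over-time plot: a flow bursts at peak rate, is held to a sustainable rate, and is blocked when it overruns. That technique is the token-bucket regulator sketched below. The rate, burst size and request size are hypothetical.

```python
class TokenBucket:
    """Generic token-bucket rate regulator (not Sonics' QoS algorithm).

    A flow may burst while it has tokens, but over time it is held to `rate`
    bytes per cycle; when the tokens run out, the flow is blocked.
    """

    def __init__(self, rate: int, burst: int):
        self.rate = rate      # sustainable rate: tokens (bytes) added per cycle
        self.burst = burst    # bucket depth: largest burst allowed
        self.tokens = burst   # start full, so an idle flow can burst immediately

    def try_send(self, nbytes: int) -> bool:
        """Grant the transfer if enough tokens remain; otherwise block it."""
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

    def tick(self) -> None:
        """Advance one cycle, refilling tokens without exceeding the burst size."""
        self.tokens = min(self.burst, self.tokens + self.rate)


def simulate(cycles: int, request_bytes: int, rate: int, burst: int) -> list[bool]:
    """A greedy flow requests `request_bytes` every cycle; record each grant."""
    bucket = TokenBucket(rate, burst)
    grants = []
    for _ in range(cycles):
        grants.append(bucket.try_send(request_bytes))
        bucket.tick()
    return grants


grants = simulate(cycles=100, request_bytes=64, rate=16, burst=256)
print(sum(grants) * 64 / 100, "bytes/cycle on average")
```

With these numbers the flow first gets five back-to-back 64-byte grants (the peak-rate burst). It then settles to one grant every four cycles, which matches the 16 bytes/cycle sustainable rate.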
Virtual Channels
Concurrency – How it works
• Spatially concurrent (left)
  – More peak performance
  – Potential for "over-provisioning"
• Shared resources (right)
  – Uses virtual channels
  – Independent flow control on each channel
  – Saves wires and area
VCs allow system resources to be maximized for greater network efficiency
[Figure: Spatially Concurrent (left) vs. Shared Resources (right); labels: Shared Link, Same area, Fewer wires, Less area (shared input buffers)]
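Independent flow control per channel is commonly implemented with credits. The sender keeps one credit counter per virtual channel, equal to the free slots in that channel's receiver buffer. It only sends a flit on a VC that has credit. The sketch below is a generic, hypothetical model, not SonicsGN's implementation. It shows one VC stalling while the other keeps using the shared wires.

```python
from collections import deque


class VirtualChannelLink:
    """One physical link shared by several virtual channels (VCs).

    Each VC has its own receiver buffer and credit counter, so a VC whose
    receiver stops draining does not block traffic on the other VCs.
    """

    def __init__(self, num_vcs: int, buffer_depth: int):
        self.credits = [buffer_depth] * num_vcs            # free receiver slots per VC
        self.pending = [deque() for _ in range(num_vcs)]   # flits waiting at the sender
        self.rx = [deque() for _ in range(num_vcs)]        # receiver input buffers
        self.next_vc = 0                                   # round-robin pointer

    def send(self, vc: int, flit) -> None:
        self.pending[vc].append(flit)

    def cycle(self):
        """Move at most one flit over the shared wires; return its VC or None."""
        n = len(self.pending)
        for i in range(n):
            vc = (self.next_vc + i) % n
            if self.pending[vc] and self.credits[vc] > 0:
                self.credits[vc] -= 1
                self.rx[vc].append(self.pending[vc].popleft())
                self.next_vc = (vc + 1) % n
                return vc
        return None

    def consume(self, vc: int):
        """Receiver drains one flit from a VC buffer and returns the credit."""
        flit = self.rx[vc].popleft()
        self.credits[vc] += 1
        return flit


# VC 0's receiver is stalled (never drains); VC 1's receiver drains every flit.
link = VirtualChannelLink(num_vcs=2, buffer_depth=2)
for i in range(4):
    link.send(0, f"a{i}")
    link.send(1, f"b{i}")

order = []
for _ in range(6):
    vc = link.cycle()
    order.append(vc)
    if vc == 1:
        link.consume(1)
print(order)
```

Once VC 0 runs out of credits, every remaining cycle goes to VC 1. The shared link stays busy instead of blocking behind the stalled channel.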
Power Management
Network Power Consumption
■ Minimize ACTIVE power
  • Fine-grained clock gating
■ Minimize IDLE power
  • Coarse-grained auto-gating with combo wakeup
■ Minimum LEAKAGE power
  • Efficient network: minimum gate count
  • Advanced system partitioning
    – Identify power intent
    – CPF and UPF support
System Power Management
• Power management signals
• Fast wake-up and shutdown
• Reliably enter and exit low-power state
• Simplify System Power Manager design
[Figure: handshake between the On-Chip Network and the System Power Mgr: Power Down Req, Power Down Ack, Active, Auto Wake Enable, Auto Wake Req]
On-Chip Network can efficiently monitor activity
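The handshake signal names on this slide suggest a simple protocol. The power manager requests power-down. The network drains in-flight traffic before acknowledging. While off, new traffic raises a wake request. The sketch below models that sequence. The signal names come from the slide, but the state machine and its semantics are a hypothetical illustration, not the SonicsGN protocol.

```python
from enum import Enum, auto


class NetState(Enum):
    ACTIVE = auto()
    DRAINING = auto()
    OFF = auto()


class NetworkPowerPort:
    """Hypothetical model of the network side of a power handshake.

    Signal names follow the slide (Power Down Req/Ack, Active, Auto Wake
    Enable/Req); the protocol itself is an assumption for illustration.
    """

    def __init__(self, auto_wake_enable: bool = True):
        self.state = NetState.ACTIVE
        self.outstanding = 0                 # transactions in flight
        self.auto_wake_enable = auto_wake_enable
        self.auto_wake_req = False

    @property
    def active(self) -> bool:
        return self.outstanding > 0

    @property
    def power_down_ack(self) -> bool:
        return self.state is NetState.OFF

    def issue(self) -> bool:
        """A master issues a transaction; returns False if it must wait."""
        if self.state is NetState.ACTIVE:
            self.outstanding += 1
            return True
        if self.state is NetState.OFF and self.auto_wake_enable:
            self.auto_wake_req = True        # ask the power manager to wake us
        return False                         # held off while draining or off

    def complete(self) -> None:
        """A transaction in flight finishes."""
        if self.outstanding:
            self.outstanding -= 1

    def step(self, power_down_req: bool) -> NetState:
        """Evaluate the handshake once per cycle."""
        if self.state is NetState.ACTIVE and power_down_req:
            self.state = NetState.DRAINING   # stop accepting new traffic
        if self.state is NetState.DRAINING and not self.active:
            self.state = NetState.OFF        # idle: safe to gate, ack asserted
        if self.state is not NetState.ACTIVE and not power_down_req:
            self.state = NetState.ACTIVE     # manager released the request
            self.auto_wake_req = False
        return self.state


port = NetworkPowerPort()
port.issue()
port.issue()
s1 = port.step(power_down_req=True)    # DRAINING: two transactions still in flight
port.complete()
port.complete()
s2 = port.step(power_down_req=True)    # OFF, power_down_ack asserted
woke = port.issue()                    # refused; raises auto_wake_req
wake_requested = port.auto_wake_req
s3 = port.step(power_down_req=False)   # manager restores power
```

Because the network tracks its own outstanding traffic, the power manager only waits for the acknowledge. It does not have to poll every block to decide when shutdown is safe.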
Tablet Processor – Design Example
Tablet SoC Functional Blocks
[Figure: tablet processor block diagram. Blocks: two quad-core CPUs (Core1-Core4 with L2 cache), coherency fabric, graphics subsystem (four GPUs), MFV codec, Cam 1, Cam 2, SonicsGN on-chip network, Sonics3220 peripheral network, APB peripherals, DMA, secure ROM, security SRAM, ROM, PCIe, USB, audio, LCD controller, HDMI, SATA, Ethernet, and two Sonics MemMax memory schedulers with DRAM controllers. Clock labels: 2133 MHz, 1333 MHz, 533 MHz, 400 MHz, 267 MHz, 200 MHz, 133 MHz.]
• Multi-threaded, multiple queues
• Multiple QoS levels
• Power domains
On-Chip Network – Under the Hood
• Routed network
• Virtual channels
• Clock crossing
• Multiple power domains
• Bit conversion
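The slide does not define "bit conversion". Assuming it means data-width conversion between links of different widths (for example a 128-bit link feeding a 32-bit link), the sketch below splits a wide word into narrow beats and reassembles it. Putting the least-significant beat first is an assumption; the function names are hypothetical.

```python
def downsize(word: int, wide_bits: int, narrow_bits: int) -> list[int]:
    """Split one wide data word into narrow beats, least-significant beat first."""
    if wide_bits % narrow_bits:
        raise ValueError("wide width must be a multiple of narrow width")
    mask = (1 << narrow_bits) - 1
    return [(word >> (i * narrow_bits)) & mask for i in range(wide_bits // narrow_bits)]


def upsize(beats: list[int], narrow_bits: int) -> int:
    """Reassemble narrow beats (least-significant first) into one wide word."""
    word = 0
    for i, beat in enumerate(beats):
        word |= beat << (i * narrow_bits)
    return word


WORD = 0x0123456789ABCDEF_FEDCBA9876543210
beats = downsize(WORD, wide_bits=128, narrow_bits=32)
print([hex(b) for b in beats])
```

A 128-bit word becomes four 32-bit beats. To keep the same throughput, the narrow side must run four times faster, or accept a quarter of the bandwidth. This is why width conversion and clock crossing usually appear together.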
Managing Power with SonicsGN
• Flexible power domain support
  – Asynchronous/mesochronous
  – Isolation/level shifters
• HW-controlled safe shutdown
• Automatic wakeup
• Benefits:
  – More domains
  – Quicker shutdown
  – Faster wakeup
• Keep more dark, more of the time
50% SoC Power Reduction!
[Figure: SonicsGN request network for the tablet example, with power domain boundaries. Blocks: Cortex-A15 cluster, Cortex-A7 cluster, Mali-T658 cluster, CCI-400, video engine, video encode, Cam1, Cam2, DMA, display controller, USB1-USB3, USB OTG, Ethernet, audio, SATA, UFS, SD/CF/MMC, HSI, PCIe, HDMI, on-die SRAM, on-die ROM, peripherals, security engine, and two DDR3-2133 DRAM channels. Router labels: A2x2, B2x3, C2x3, D1x3, E4x1, F4x1, G4x1, H5x2, I4x1, J3x1. Links are 32, 64 or 128 bits wide; clocks range from 133 MHz to 1333 MHz (133, 200, 267, 400, 533, 1066, 1333 MHz).]
SGN Results
• SGN met the tablet performance requirement with a fabric frequency of 1066 MHz
• Efficient gate count: 508K gates
• Advanced system partitioning
• Support for system concurrency:
  – Virtual channels: non-blocking network
  – Quality of Service
• Advanced power management
  – Simplifies unit power manager
  – >1% free running flops
• Support for memory subsystem
  – QoS to increase DRAM efficiency
  – Load balancing for multi-channel DRAM
On-Chip Network selection critical to SoC Performance Design Goals
Results
Process: TSMC 28nm HPM
Base clocks: 1.2 GHz, 1 GHz
Area: 508K gates
Cost to add…
• Master core: 7K gates
• Slave core: 5K gates
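The reported area and per-core costs support a quick first-order estimate of network size when cores are added to this design. The sketch assumes the 7K/5K gate increments simply add up. The slide does not say whether that holds for large additions, or whether extra routers would be needed.

```python
BASE_GATES = 508_000        # reported SonicsGN area for the tablet example
GATES_PER_MASTER = 7_000    # reported cost to add one master core
GATES_PER_SLAVE = 5_000     # reported cost to add one slave core


def estimated_gates(extra_masters: int = 0, extra_slaves: int = 0) -> int:
    """Linear (assumed) estimate of network gate count after adding cores."""
    return BASE_GATES + extra_masters * GATES_PER_MASTER + extra_slaves * GATES_PER_SLAVE


print(estimated_gates(extra_masters=4, extra_slaves=2))
```

Adding four masters and two slaves would, on this linear assumption, grow the network from 508K to 546K gates.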
Summary
• GHz, GFLOPs and GB/sec are consumer design points
  – And your next SoC will need them!
• SoC integration must exploit that performance
  – GHz on-chip networks: SonicsGN
  – Multichannel DRAM optimization: Sonics IMT
  – High-efficiency DRAM scheduling: Sonics MemMax
• … while improving battery life
  – Automatic hardware power management, with software policies
• SonicsGN
  – Twice the frequency
  – One half the SoC power
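Multichannel DRAM optimization and load balancing are often built on address interleaving: consecutive address granules are spread across channels so that a streaming master uses all of them. The slides do not describe how Sonics IMT works. The sketch below is a generic fixed-granule round-robin interleave, with the channel count and 256-byte granule as hypothetical parameters.

```python
from collections import Counter


def channel_for(address: int, num_channels: int, granule_bytes: int) -> int:
    """Select a DRAM channel by interleaving fixed-size address granules round-robin."""
    return (address // granule_bytes) % num_channels


def address_in_channel(address: int, num_channels: int, granule_bytes: int) -> int:
    """Local address inside the selected channel (channel-select bits removed)."""
    granule = address // granule_bytes
    return (granule // num_channels) * granule_bytes + address % granule_bytes


# A linear 64 KB stream of 64-byte accesses, 2 channels, 256-byte granules.
counts = Counter(channel_for(a, 2, 256) for a in range(0, 64 * 1024, 64))
print(dict(counts))
```

A linear stream splits evenly across the two channels. The local address drops the channel-select bits, so each channel's address space stays dense.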
Questions?
Thank You!
For more information: www.sonicsinc.com
Contact: [email protected]