1
© 2006 IBM Corporation
IBM Research
© 2007 IBM Corporation
Multi-Core Design Automation Challenges
John Darringer
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
DAC 2007
3
Scaling no longer provides traditional performance boost
Power limits everything
Advances will come from entire performance stack
Technology
Chip Level
System Level
Application
Dynamic optimization
Assist Threads
Fast Computation
Power Optimization
Compiler Support
Packaging, Cooling
New Devices
Dense SRAM, eDRAM
Optics
Memory
Languages, Software Tuning
Efficient Programming
Middleware
System Performance Requires An Integrated Approach
Compiler Support
Multiple Cores
SMT
Accelerators
Power Management
Interconnect
Circuits
[Figure: Device Performance versus Production Date, 1998 to 2008; Historical trend versus Recent trend]
4
Innovation in System Design
[Figure: processor die photos; one floorplan is labeled L3 Directory/Control, L2, LSU, IFU, BXU, IDU, FPU, FXU, ISU]
Power 4: Multi-Core, 2001
Power 5: Multi-Thread, 2004
CELL: Accelerators, 2006
Power 6: 4.7 GHz, 2007
5
Trend to Modular Application Optimized Systems
Growing use of diverse modular components
Chip integration may evolve to component assembly
Challenge is in system-level design
– Optimizing architecture for specific applications
Core Accelerator
Cache
Blades
SMP
...
Memory
6
Multi-Core ASICs
Multi-core ASIC SoCs are common today
– Address broad range of markets
– Enables high functional integration
– Provides rapid time to market
One example from 2004
– Cisco Silicon Packet Processor
– 188 32-bit RISC processors
– 47 BIPS
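A quick back-of-envelope check on these figures. It assumes the 47 BIPS is aggregate throughput spread evenly over all 188 processors, which the slide does not state; the function name is illustrative only.

```python
# Back-of-envelope check; assumes aggregate throughput is divided evenly.
CORES = 188
AGGREGATE_BIPS = 47  # billions of instructions per second, from the slide


def per_core_mips(aggregate_bips: float, cores: int) -> float:
    """Average millions of instructions per second per core."""
    return aggregate_bips * 1000 / cores


print(per_core_mips(AGGREGATE_BIPS, CORES))
```

Under that assumption each core averages 250 MIPS, which illustrates the slide's point: many modest cores rather than one fast one.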
7
Multi-Core Processors
Power efficient, reusable cores
Application matched accelerators
Flexible, scalable interconnect
Optimized memory hierarchy
High speed I/O
Energy management
Deliver system performance
Rapid chip assembly to serve diverse markets
8
CHALLENGE
System Design
– Continued performance growth
– Increasing power efficiency
– Optimizing for new applications
Design Automation
– Custom design efficiency
– ASIC productivity
– Design and verification
Enablers
– Physical Architecture
– Integrated Early Analysis
– Multi-Core Verification
9
Physical Architecture
Complement logical architecture
Streamline chip integration
Plan for interconnect
Provide predictable results
Multiple strategies
– Fixed layout per block
– Parametric or generated
– Extended synthesis
Example Logical Architecture
Example Physical Architecture
10
Modular Components
Components need self-contained vertical stack
– with clean interfaces to enable automated integration
Future Component
– Component Fabric
– Interface
– Component Function
Current “Component”
– Mixed Fabric and Component Function; Custom Interface
Future Chips
– Automated connection with parametric fabric
Current Chips
– Custom crafting of clock, data, and power meshes
11
Custom Design
Careful interconnect design
– Communication
– Clock distribution
– Power and ground
Better power efficiency
– Clock gating, Power gating
– Detailed transistor sizing
High bandwidth memory and I/O
Higher frequency operation
12
Challenges of Modular Design
[Figure: multi-core floorplans contrasting custom layout and modular layout]
Custom Layout
– Flexible shape and orientation
– Optimum mesh for power and clock
– Distributed communication and test
– Manually optimized
Modular Layout
– Constrained shape and orientation
– Separate power and clock per core
– Parametric interconnect fabric
– Automatic connection to fabric
13
Custom Clock Design
Distribution network
– Latches and clocked gates
– Control skew and jitter
– Minimize power
– Survive variation and noise
Interconnect models
– Inductance critical
– Transmission line
– Buffer placement
Hand optimized
– Still an art
Phillip Restle
14
Custom Power Distribution
Distribute to all devices
Multiple voltage domains
Simulate detailed power demand
Model chip and package
Consider ground coupling
Balance mesh and trees
Allocate decoupling capacitors
Focus on resonant frequency
Explore clock/power gating scenarios
Howard Chen
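The resonant-frequency concern can be illustrated with the textbook LC relation f = 1 / (2π√(LC)). This is a minimal sketch, not the analysis used in the talk: it lumps the package into one inductance and the on-chip decoupling into one capacitance, and the component values are hypothetical.

```python
import math


def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal lumped LC circuit, in hertz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical values: 10 pH of package inductance, 100 nF of on-chip decap.
base = resonant_frequency_hz(10e-12, 100e-9)
# Quadrupling the decoupling capacitance halves the resonant frequency.
more_decap = resonant_frequency_hz(10e-12, 400e-9)
print(f"{base / 1e6:.1f} MHz -> {more_decap / 1e6:.1f} MHz")
```

The point of the sketch is the square-root dependence: allocating decoupling capacitance moves the resonance, so it has to be chosen with the chip's switching behavior (for example clock or power gating events) in mind.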
15
Challenges of Modular Design
Custom Wiring
– Optimized over chip
– Resources shared
– Variation minimized
– Complex analysis and integration
Modular Wiring
– Optimized at block level
– Fixed resource allocation
– Some variation in results
– Requires automated integration
16
Spectrum of Strategies
Modular Reuse
– Fixed physical architecture
– Careful block design
– Custom within block
– Automated block connect
– Predictable results
– Good for planned cases
– Stresses design
Extended Synthesis
– Generated physical architecture
– More abstract layout
– Heavy physical synthesis
– Unique block configuration
– Results will vary
– Flexible restructuring
– Stresses tools
Fixed Layout …. Parametric ….. Generated
17
Systems Demand Early Analysis
To explore many more options
– Cores, Accelerators, Interconnect, Memory Hierarchy, …
To consider many design criteria simultaneously
– Power, Performance, Latency, Hotspots, Reliability, …
To optimize system for specific market
Environment exists for early functional modeling
But today’s tools are not linked to physical design
18
Early System Analysis
[Figure labels: Performance Models, Power Analysis, Thermal Analysis, Interconnect Analysis; Design, Technology, Package, Floorplan, Assumptions; Implementation; Design Team]
Loosely coupled disciplines with multiple experts and distinct models
19
Performance Modeling Is Changing
New parallel workloads emerging
– Execution vs. trace driven
Shifting to multi-core designs
– Stresses balance of model performance and accuracy
Complex interconnect fabric and memory hierarchy
– Bus, switch, network, asynchronous,…
Increasing use of SystemC
– For early software development and component sharing
20
Early Physical Planning is Essential
Interconnect requires full chip layout
– Estimate component area before implementation
– Need more accurate methods
– Have to plan for all facilities to predict chip size
Placement coupled to many factors
– Interconnect performance
– Power
– Thermal and reliability concerns
– Yield
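A very rough illustration of estimating chip size before implementation: sum the estimated component areas and divide by an assumed utilization factor that reserves room for wiring, power, clock, and other facilities. The block names, areas, and the 0.8 utilization are hypothetical, and the sketch ignores block shapes entirely.

```python
import math


def estimate_die(block_areas_mm2: dict, utilization: float = 0.8):
    """Return (die area in mm^2, side of an equivalent square die in mm).

    utilization is the assumed fraction of the die occupied by blocks;
    the remainder stands in for interconnect and other chip facilities.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    die_area = sum(block_areas_mm2.values()) / utilization
    return die_area, math.sqrt(die_area)


# Hypothetical early estimates for a four-core chip.
blocks = {"cores": 4 * 12.0, "cache": 24.0, "memory_controller": 4.0, "io": 4.0}
area, side = estimate_die(blocks)
print(f"{area:.1f} mm^2, about {side:.1f} mm per side")
```

A real planner would go further and place the blocks, since the slide's point is that interconnect, power, thermal, and yield effects all depend on the layout rather than on total area alone.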
21
Modeling Interconnects in Multi-Core Designs
[Figure: Interconnect Fabric with four Core/Cache pairs and a Memory Controller; labels: Async/Sync Interface with Parametric delay, Interconnect Delays]
Interconnect delays
– Affect performance
– Depend on placement
– Require accurate modeling
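One way to picture placement-dependent interconnect delay in an early performance model is sketched below. It assumes buffered wires whose delay grows linearly with Manhattan distance, plus a fixed parametric penalty for any async/sync interface crossing; the function, coordinates, and numbers are hypothetical and not taken from the talk.

```python
import math


def link_latency_cycles(src_mm, dst_mm, ps_per_mm, clock_ps, interface_cycles=0):
    """Latency in whole clock cycles between two placed blocks.

    src_mm, dst_mm: (x, y) block positions in millimetres.
    ps_per_mm: assumed delay of a buffered wire per millimetre.
    interface_cycles: parametric delay for an async/sync interface.
    """
    distance = abs(src_mm[0] - dst_mm[0]) + abs(src_mm[1] - dst_mm[1])
    wire_ps = distance * ps_per_mm
    return math.ceil(wire_ps / clock_ps) + interface_cycles


# Same link, two candidate placements of the destination block.
near = link_latency_cycles((0, 0), (1, 1), ps_per_mm=100.0, clock_ps=250.0)
far = link_latency_cycles((0, 0), (3, 4), ps_per_mm=100.0, clock_ps=250.0,
                          interface_cycles=2)
print(near, far)
```

Feeding such latencies back into the performance model is what ties the floorplan to system performance.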
22
Power Is a Key Criterion, but Hard to Predict
Need estimate before implementation
– Voltage/Frequency scaling, Voltage islands, clock gating, leakage
Not just core, but many diverse chip components
– Core, cache, interconnect, controllers, I/O, pervasive
Model “interesting” states and transitions
Scale known implementations
– Complex measurement process for calibration
– Requires data from chip layout
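A minimal sketch of a pre-implementation power estimate, using the standard first-order model P = activity × C × V² × f plus leakage. It is a simplification under stated assumptions: leakage is treated as a constant (in practice it also varies with voltage and temperature), clock gating is modeled as zero switching activity, and all component values are hypothetical.

```python
def block_power_w(c_eff_f, vdd_v, freq_hz, activity, leakage_w):
    """Dynamic switching power plus a fixed leakage term, in watts."""
    return activity * c_eff_f * vdd_v ** 2 * freq_hz + leakage_w


# Hypothetical component: 1 nF effective capacitance, 0.5 W leakage.
nominal = block_power_w(1e-9, 1.0, 2.0e9, activity=0.5, leakage_w=0.5)
# Voltage/frequency scaling: 0.8 V at 1.6 GHz.
scaled = block_power_w(1e-9, 0.8, 1.6e9, activity=0.5, leakage_w=0.5)
# Clock gated: no switching, leakage remains.
gated = block_power_w(1e-9, 1.0, 2.0e9, activity=0.0, leakage_w=0.5)
print(f"nominal={nominal:.3f} W scaled={scaled:.3f} W gated={gated:.3f} W")
```

The effective capacitance and leakage are exactly the inputs that, per the slide, have to be calibrated from measurements and layout data of known implementations.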
23
Integrated Early System Analysis
[Figure labels: Design Team; Design, Floorplan, Package, Technology, Assumptions; Results: Performance, Power, Interconnect, Thermal; Optimize; Handoff; Implementation]
Couple all forms of early analysis
Share data in central repository
Industry standard data model
– OpenAccess
Hand-off to chip integration
– Assumptions, blocks, layout, …
Graphic interface for editing
Stage is set for optimization
24
Multi-Core Verification
Verification has always been the greatest challenge
Complexity grows with each generation
Challenge is to exploit reuse with multi-core designs
– Requires clear interface definition
[Figure: Traditional Approach versus Multi-Core Approach; labels: Core, Verification, System Verification]
25
Core Verification
Complexity growing
– Clock/Power gating, Voltage and frequency scaling
Formal methods are used
– Checking RTL = netlist
– Checking assertions
– Proving implementation equivalent to reference model
Simulation still dominates
Need higher level of specification
– Improve quality
– Stretch synthesis and verification tools
Reuse verification environment
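The idea behind checking that an implementation matches its reference can be shown on a toy scale. The sketch below compares a behavioral full adder with a hypothetical gate-level version by enumerating every input combination. Production equivalence checkers rely on techniques such as BDDs or SAT solving rather than enumeration, which only works for a handful of inputs.

```python
from itertools import product


def adder_reference(a, b, cin):
    """Behavioral model: arithmetic sum split into (sum bit, carry out)."""
    total = a + b + cin
    return total & 1, total >> 1


def adder_netlist(a, b, cin):
    """Hypothetical gate-level implementation built from XOR/AND/OR."""
    x = a ^ b
    return x ^ cin, (a & b) | (x & cin)


def equivalent(f, g, n_inputs):
    """True if f and g agree on every combination of n_inputs bits."""
    return all(f(*bits) == g(*bits)
               for bits in product((0, 1), repeat=n_inputs))


print(equivalent(adder_reference, adder_netlist, 3))
```

Because the check covers all 2^n input combinations, a single differing output on any input is enough to report the two descriptions as non-equivalent.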
26
System Verification
More complex systems
– Many cores, accelerators, networks, asynchronous links
Memory and network contention is critical area
Formal methods have made impact
– Verifying abstract memory protocols
Simulation is still the final check
Need system-level test case generation
– Use system knowledge to expose resource contention issues
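As a sketch of what contention-oriented test generation might look like, the generator below biases every core's memory operations toward a small pool of shared cache lines so that cores collide often. The address layout, operation names, and the 0.8 sharing probability are all hypothetical; this is not the generator used by the author's team.

```python
import random


def contention_test(num_cores, ops_per_core, shared_lines,
                    share_prob=0.8, seed=0):
    """Return {core: [(op, address), ...]} biased toward shared lines."""
    rng = random.Random(seed)  # seeded so a failing test can be replayed
    program = {}
    for core in range(num_cores):
        ops = []
        for _ in range(ops_per_core):
            if rng.random() < share_prob:
                addr = rng.choice(shared_lines)  # contended line
            else:
                # Private region per core (hypothetical address map).
                addr = 0x10000 * (core + 1) + 64 * rng.randrange(16)
            ops.append((rng.choice(("load", "store")), addr))
        program[core] = ops
    return program


test = contention_test(num_cores=4, ops_per_core=8, shared_lines=[0x0, 0x40])
for core, ops in test.items():
    print(core, ops[:3])
```

Raising `share_prob` or shrinking `shared_lines` increases the pressure on the memory system and interconnect, which is the kind of system knowledge the slide suggests using.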
27
Summary
Exciting and challenging times
– Designing application optimized multi-core systems
– Delivering custom efficiency with ASIC productivity
Focus areas
– Physical Architecture to streamline chip integration
– Integrated Early Analysis to explore design space
– Multi-core verification that exploits reuse
Long history of invention in today’s RTL flow
Innovation is needed now at the system level
28
Acknowledgements
Thanks to the following people
– Emrah Acar, Reinaldo Bergamaschi, Pradip Bose, Howard Chen, Nagu Dhanwada, Steven German, Steve Kosonocky, Indira Nair, Ruchir Puri, Phillip Restle, Albert Ruehli, Michael Vinov.