x86 Everywhere
Chris Herring, Director Strategy, PCSG, AMD
Session Outline
Instructions, Languages, and the Euro
Architectural Evolution
• Cost of Deployment
• Macro Level
Extending x86
• Architectures du Jour
• Evolution/Convergence: Why Now?
Primary Drivers: Performance, Power, Cost
• Knobs and Levers
• Absolute Costs and Tradeoffs
Future Innovations
Session Goals
An understanding of the benefits of a unified approach to Instruction Set Architecture
No limitations to extending x86 architecture
The time is now
This is not a crazy idea!
Imagine if instruction sets were languages…
Bonjour Monde
Hallo Welt
Γειάσου κόσμος
Ciao Mondo
Hello Mundo
Здравствуйте! Мир
Hola Mundo
Or, conversely, if languages were instruction set architectures…
Language Should Unify
What I really want to say is “Hello, World”
Each Instruction Set Architecture (ISA) requires translation
Translation, Virtual Machines, Transmorphing, Transcoding, Simulation, Re-compile: All “tech speak” for 2nd language communication
What we have in effect is, truly, an ISA “Tower of Babel”
UltraSPARC®
PowerPC™
XScale™
SHARC®
SuperH®
Cell
Towards Instruction Set Consolidation
Future innovation should come in micro-architecture enhancements and compatible extensions to dominant instruction sets, rather than the creation of new instruction sets.
1. With ever-growing software complexity and installed base, the value of remaining compatible with and extending existing, dominant instruction sets heavily outweighs any disadvantages. Is the trend clear?
2. Technology has passed the point where instruction set costs are relevant. Is the time now?
What The Euro Can Teach Us
The economic benefits of moving away from multiple currencies are enormous.
$36 billion per year in avoided transaction costs, or $90 per EU resident
What The Euro Can Teach Us
What were the catalysts that prompted standardization?
Growing markets
More sophisticated consumers
Desire for increased stability
Architectural Evolution Macro-Level
[Figure: timeline across the 70s, 80s, 90s, 00s and 10s]
Exploding Functionality per Square Inch
Growing Software Complexity
Majority of System Software and Applications are x86-Based
Time is Right for Full-Function x86 on All Form Factors
The True Cost of Deployment
Enthusiastic Initial Deployment
• Communication
• PIM function
• Portability
• Ease of use
Uncompromising End Users
• Wide range of ISV applications
• Mission critical custom applications
• “Out-of-the-box” integration
• Universal security policy
• Interoperability
High TCO
• Porting complexities
• High management costs
• “Crippled” apps
• Non-integration
• Backup, security
• Configuration management
STOP: Postponed or Canceled Projects, Limited Deployments
Possible Solutions
Port thousands of applications, operating systems, drivers, codecs, tool chains and virtual machines.
#include <stdio.h>

int main(void) {
    printf("Hello World\n");  /* the same source must be rebuilt for every ISA */
    return 0;
}
x86, ARM, MIPS®, EPIC, PowerPC, SPARC, Cell, Other
STOP
• Resource intensive
• Slow
• Costly to maintain
Possible Solutions
Develop a web-based interface for each application.
Web Interface
x86, ARM, MIPS®, EPIC, PowerPC, SPARC, Cell, Other
STOP
• Assumes always-connected client server
• Limited functionality
• Least Common Denominator
• Difficult security
Possible Solutions
Write once (or port once) and run anywhere.
Java or .Net
x86, ARM, MIPS®, EPIC, PowerPC, SPARC, Other
YIELD: Portability is good. Porting is not.
• Heavy testing
• Ongoing optimization
• Per-platform customization
Is it really “port once”?
Java: J2SE, J2EE, J2ME, JXTA, JSLEE
Executing and/or translating to multiple languages and platforms is a necessary cost, not something to be desired.
Architectural Evolution Macro-Level
Common Instruction Set Architecture
• No need to port
• No need for multiple validations
• Built in OS integration
• Robust security
• Investment protection
Today: Desktop, Server, Laptop
Happening Now: Storage, Handheld, Networking
The Future?: Ubiquitous
The Importance of Dusty Decks… Or Data Sets Never Die…
Essential legacy data and code live forever.
PL1/Basic
VMS
VAX Emulator
x86 Server
Legacy Data
Linux
Extending x86
Enterprise x86
Consumer x86
LFF Consumer Electronics
Networking
SFF Consumer Electronics
Desktop
Workstation
High End Server
Low/Mid Server
SAN/NAS
Handheld
Rugged Small Form Factor
Internet Appliances
ISA Architectural Evolution/Convergence
(architectures must cross platform boundaries)
[Figure: timeline across the 70s, 80s, 90s, 00s and 10s. Phase labels: Proprietary, Proliferation, RISC, Consolidation, Common uArch (?)]
ISAs shown: X86, Power, ARM, MIPS®, Precision, Sparc, Alpha, VAX, 432, 32xxx, Transputer, 68xxx
ISAs at the end of the timeline: Power, ARM, MIPS®, X86
Platforms: Auto, DTV, Digital Camera, Mobile Phone, Computer
Why is It Possible Now?
Moore’s Law
• Core processors can now be so small that any overhead of x86 is easily affordable (a few mm², a few % of total die, trivial power increment)
Sufficiency of performance
• CPU designs can range from small simple designs to huge server designs
Added functionality makes processor core small part of chip
• Large L1 and L2 cache
• SOC functions (memory control, graphics, etc.)
• Pads fundamentally force minimum die size much larger than core
Learning
• Lots of design tricks have accumulated
Interconnect Evolution
AMD Am386® Processor: 2 Alum. layer, 1.0µ design rules
AMD Athlon™ Processor: 6 Copper layer, 180 nm design rules
AMD Athlon™ 64 Processor: 9 Copper layer, 130 nm design rules
Planned Transistor Evolution
90 nm generation (2004)
65 nm generation (2005)
45 nm generation (2007)
32 nm generation (2009)
22 nm generation (2011)
Gate lengths shown: Lg = 50 nm, Lg = 35 nm, Lg = 15 nm, Lg = 13 nm
Huge Variation – Without Even Trying!
[Figure: scatter plot of Static Current against Frequency]
At this frequency, Iccstatic varies from 1.5A to 10A+
About 10% lower MHz
Lots of Really Nice Parts
Fast, High Power
Fast, Low Power
Primary Drivers of Performance, Power and Cost
Power
• Dynamic
  • Voltage
  • Switching capacitance
    • # of parallel units
    • # of flops (pipeline depth)
    • Clock skew goal
    • Long buses
    • Clock gating
    • I/O
• Static
  • Process: Gate leakage, Dcap leakage and Ioff
  • Voltage and temperature
  • Total transistor width
    • Leaky transistor width if substantial use of low leakage transistors
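To illustrate why voltage sits at the top of the dynamic power list, here is a minimal sketch using the standard first-order CMOS model, in which dynamic power is roughly activity × switched capacitance × V² × f. That formula is textbook background and is not stated on the slide. The function names and sample values are hypothetical.

```python
def dynamic_power_watts(capacitance_farads, voltage_volts, frequency_hz, activity=1.0):
    """First-order CMOS dynamic power: activity * C * V^2 * f (assumed model)."""
    return activity * capacitance_farads * voltage_volts ** 2 * frequency_hz


def voltage_scaling_ratio(v_high, v_low):
    """Dynamic power ratio between two supply voltages, all else held equal."""
    return (v_high / v_low) ** 2


if __name__ == "__main__":
    # Hypothetical operating point: 1 nF switched at 1 V and 1 GHz.
    print(dynamic_power_watts(1e-9, 1.0, 1e9))
    # Hypothetical comparison of a 1.8 V design against a 0.7 V design.
    print(voltage_scaling_ratio(1.8, 0.7))
```

Because voltage enters as a square, it has more leverage on dynamic power than any of the capacitance items beneath it, each of which contributes only linearly in this model.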
Cost
• Package, Assembly and Test
  • 25-50% of total cost
• Cache and IO Often Dominate Die Size
  • IO often in the range of 15-20% of die area
  • Cache can be as much as 50% of the die area or more
Performance
• Determined by IPC and MHz
• Peak Instruction Rate
• Memory Latency: We Have Hit the Memory Wall
  • At 1GHz and 1 IPC a 1% effective miss rate cuts performance by approx. 50%, 4% cuts it by approx. 80%
  • Required IPC, application footprint and DRAM speed dictate cache size
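The miss-rate figures above can be reproduced with a simple stall model. This is a sketch under one stated assumption: each effective miss costs about 100 cycles (roughly 100 ns of memory latency at 1 GHz). The slide does not give the penalty, so that number and the function name are hypothetical.

```python
def relative_performance(miss_rate, miss_penalty_cycles=100, base_cpi=1.0):
    """Fraction of peak performance left once miss stalls are added.

    miss_rate is misses per instruction; miss_penalty_cycles is an assumption.
    """
    effective_cpi = base_cpi + miss_rate * miss_penalty_cycles
    return base_cpi / effective_cpi


if __name__ == "__main__":
    for rate in (0.01, 0.04):
        loss = 1.0 - relative_performance(rate)
        print(f"{rate:.0%} miss rate: performance cut by about {loss:.0%}")
```

Under that assumption a 1% miss rate doubles the cycles per instruction and a 4% miss rate multiplies them by five, which matches the approximate 50% and 80% losses quoted.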
Dynamic Range of Design Choices
Lever              Range            Factor
Dynamic Power      0.2-100 W        500x
Die Size           20-300 mm²       15x
Leakage (Idoff)    1-1000 nA/um     1000x
Frequency          300-3000 MHz     10x
IO Pins            100-500          5x
Voltage            0.7-1.8 volts    2.5x
ILP                1-10 units       10x
Cache              16K-2M bytes     125x
Pipeline Depth     5-25 stages      5x
All can be varied independent of instruction set.
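Each factor in the dynamic range list is simply the top of the range divided by the bottom. The sketch below recomputes them from the quoted ranges. It assumes decimal K and M for the cache row (16,000 and 2,000,000 bytes), which is what yields 125x. The names are hypothetical.

```python
# (lever, low, high) as quoted on the slide; cache sizes use decimal K and M.
RANGES = [
    ("Dynamic Power (W)", 0.2, 100),
    ("Die Size (mm^2)", 20, 300),
    ("Leakage (nA/um)", 1, 1000),
    ("Frequency (MHz)", 300, 3000),
    ("IO Pins", 100, 500),
    ("Voltage (V)", 0.7, 1.8),
    ("ILP (units)", 1, 10),
    ("Cache (bytes)", 16_000, 2_000_000),
    ("Pipeline Depth (stages)", 5, 25),
]


def factors(ranges=RANGES):
    """Map each lever to the ratio between the top and bottom of its range."""
    return {name: high / low for name, low, high in ranges}


if __name__ == "__main__":
    for name, value in factors().items():
        print(f"{name}: {value:.2f}x")
```

The voltage row comes out near 2.57x, which the slide rounds to 2.5x. With binary units the cache row would be 128x instead of 125x.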
Three Orders of Magnitude Transistor Count
16 to 32 to 64 Bits
Integer to FPU, SIMD, Vector
1/3 Issue to 9 Issue
Trivial Cache to 1MB Cache
CPU to SOC
Processor                         Process   Transistors   Die Size
AMD K6®-III Processor             0.25μ     9M            78mm²
AMD Athlon™ Processor             0.18μ     37M           120mm²
AMD Opteron™ Processor            0.13μ     100M          193mm²
AMD Geode™ Processor              0.15μ     9.5M          58.3mm²
AMD AMD64™ Processor              90nm                    84mm²
Future AMD Geode™ SOC Processor   90nm      30M+          40-45mm²
Levers Affecting Performance, Power and Cost
[Figure: levers plotted by type of effect (Exponential, Polynomial, Linear) against magnitude (0.1 or Less, 1.0, 10 or More)]
Levers shown: SW Algorithm, ISA, Ckt Style, Perf (MP), MHz (V), Pipeline, ILP, Data-Set Size, Power (V), Process, Power (skew), $Size (MHz, App), Leakage (Tox)
Instruction set (ISA) is probably least valuable and certainly most disruptive.
Absolute Costs Demonstrate the Time is Now
I/O (25%)
L2 (42%)
FPU/RF/OO (5%)
D$ (6%)
I$ (4%)
INT/RF/OO (4%)
LS (4%)
Northbridge (5%)
BU (1%)
BP (2%)
DEC (2%)
Instruction set consumes very little real estate comparatively
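A quick tally of the die breakdown makes the point concrete. This sketch assumes that DEC is the instruction decode block and is the part most directly tied to the instruction set. That reading of the abbreviation is an assumption. The helper name is hypothetical.

```python
# Die area percentages as listed on the slide.
DIE_AREA_PERCENT = {
    "I/O": 25, "L2": 42, "FPU/RF/OO": 5, "D$": 6, "I$": 4, "INT/RF/OO": 4,
    "LS": 4, "Northbridge": 5, "BU": 1, "BP": 2, "DEC": 2,
}


def share(blocks, breakdown=DIE_AREA_PERCENT):
    """Total percentage of die area taken by the named blocks."""
    return sum(breakdown[block] for block in blocks)


if __name__ == "__main__":
    print("total:", sum(DIE_AREA_PERCENT.values()))
    print("caches and I/O:", share(["I/O", "L2", "D$", "I$"]))
    print("decode (assumed ISA-specific):", share(["DEC"]))
```

On these numbers caches and I/O take about three quarters of the die, while the block assumed to be ISA-specific takes about 2%.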
Absolute Costs Demonstrate the Time is Now
Item                    Size or Cost
Substantial CPU Core    40mm², $4-$8
Small CPU Core          5mm², $0.5-1
512KB Cache             $1-$2
500 pin package         $5-$10
200 pin package         $1-$4
Package pin             0.5-2 cents
1mm²                    10-20 cents
64KB Cache              1mm²
1M Transistors          1mm²
... and so, instruction set adds very little cost.
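The core costs above follow directly from the 10-20 cents per mm² figure. The sketch below applies it to the two core sizes listed and to a hypothetical 2 mm² of x86 overhead, a stand-in for the "few mm²" mentioned earlier in the deck and not a number the slides give. The function name is also hypothetical.

```python
# Silicon cost range quoted on the slide, in cents per square millimetre.
CENTS_PER_MM2 = (10, 20)


def silicon_cost_cents(area_mm2):
    """Return the (low, high) silicon cost in cents for a given die area."""
    low, high = CENTS_PER_MM2
    return area_mm2 * low, area_mm2 * high


if __name__ == "__main__":
    for label, area in [("substantial core", 40), ("small core", 5),
                        ("assumed x86 overhead", 2)]:
        low, high = silicon_cost_cents(area)
        print(f"{label}: {area} mm^2 costs {low} to {high} cents")
```

A 40 mm² core lands at $4 to $8 and a 5 mm² core at $0.50 to $1, as in the table. On the assumed 2 mm² the overhead would be tens of cents, less than the spread in package cost.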
We Can Optimize for Area or Power or Performance.
Future Micro-Architectural Innovations
All are essentially instruction set agnostic.
• Threaded architectures
• Multicore
• Chip level multiprocessing
• Huge scale MP machines
• Much higher performance superscalar, out of order CPU core
• Huge caches
• Media/vector processing extensions
• Static and dynamic Power management
• Branch and memory hints
• GHz performance IO
• Security and virtualization
Final Choices are driven by Optimization Priorities.
Towards Instruction Set Consolidation
Instruction Set Architecture Consolidation
Micro-Architecture Proliferation
The trend is clear. The time is now.
Extending x86
Enterprise x86
Consumer x86
LFF Consumer Electronics
Networking
SFF Consumer Electronics
Desktop
Workstation
High End Server
Low/Mid Server
SAN/NAS
Handheld
Rugged Small Form Factor
Internet Appliances
Call To Action
Hardware Developers:
Break through artificial barriers of power, price, form factor
Allow a common architecture across market boundaries
Software Developers:
Do not lock out support based on ISA
Allow x86 to actually be Everywhere
Additional Resources
Web Resources:
Specs: http://www.amd.com
http://www.amd.com/embeddedprocessors
Other Resources: http://www.50x15.com
Related Sessions
Low Power, small form factor x86
© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
Attribution
© 2003 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Athlon, AMD Opteron and combinations thereof, and Geode are trademarks, and Am386, Am486 and K6-III are registered trademarks of Advanced Micro Devices, Inc. in the U.S. and/or other jurisdictions. MIPS is a registered trademark of MIPS Technologies, Inc. in the U.S. and/or other jurisdictions. Other product and company names used in this presentation are for identification purposes only and may be trademarks of their respective companies.