hypertransport extending technology leadership
TRANSCRIPT
Copyright HyperTransport Consortium 2009
HyperTransport: Extending Technology Leadership
International HyperTransport Symposium 2009, February 11, 2009
Mario Cavalli, General Manager
HyperTransport Technology Consortium
HyperTransport and Consortium Snapshot
Industry Status and Trends
HyperTransport Leadership Role
HyperTransport Snapshot
Low Latency, High Bandwidth, High Efficiency
Point-to-Point Interconnect Leadership
CPU-to-CPU, CPU-to-I/O, CPU-to-Coprocessor
Adopted by Industry Leaders in a Wider Range of Applications than Any Other Interconnect Technology
Consortium Snapshot
Formed 2001
Controls, Licenses, Promotes HyperTransport as Royalty-Free Open Standard
World Technology Leaders among Commercial and Academic Members
Newly Elected President: Mike Uhler, VP Accelerated Computing, Advanced Micro Devices
Industry Status and Trends
Global Economic Downturn
Tough State of Affairs for All Industries
Consumer Markets Crippled, with a Long Time to Recovery
Commercial Markets Strongly Impacted
Consequent Business Focus
Cost Effectiveness
No Redundancy
Frugality
Downturn Breeds Opportunities
Reinforced Need for More Optimized, Cost-Effective Computing Infrastructure
Good for HPC Sector
Creating Demand for New Technology
Delivering:
• More Value for Same Power and Cost
• Same Value for Less Power and Cost
• Best Investment Preservation
• Minimized Total Cost of Ownership
Through Better:
• Performance and Power Efficiency
• Resource Flexibility and Adaptability
• System Virtualization
• Consolidation
Producing New Computing Trends
Cloud Computing
• Hosted Software, Software as a Service (SaaS)
• Replace Costly In-House Infrastructure and Management Resources
• Infrastructure Centralization Demands Efficient Data Centers, Server Farms
Producing New Computing Trends (cont.)
Netbook over Notebook / Desktop
New? No
Innovative? No
Same for Less? No
Less for Much Less? Yes!
Good Enough if Budget Tight? Yes!
Right-Time, Right-Place Products? Right!
HyperTransport Leadership Role
Answers Market Trend Expectations
With Core Values
Leading Performance
Full Scalability
Power Efficiency
Low Design Cost
Market-Proven Solidity
Vast Product Ecosystem
Continued Technology Progression
With Expanding Market Presence
[Timeline figure, 2001 to 2009. Specifications shown: HT 1.0, HT 1.1, HT 2.0, HTX, HT 3.0, HTX3, HT 3.1, HNC 1.0 (Note 3)]
17.7M HT-Based Systems Shipped (Note 1)
62.7M HT-Based Systems Shipped (Note 2)
Note 1: by end of 2003 – Source: InStat
Note 2: by end of 2008 – Source: InStat
Note 3: High Node Count HT Specification 1.0 - Accessible/Useable by HTC Promoter and Contributor Members Only
HT 3.1 Specification
Keeps HT Ahead of Industry Requirements

Clock: 2.6 GHz, 2.8 GHz, 3.0 GHz, 3.2 GHz
HT 3.1: 51.2 GB/s (32-Bit), 25.6 GB/s (16-Bit)
HT 3.0: 41.6 GB/s (32-Bit), 20.8 GB/s (16-Bit)

Feature      Current Use Max   HT 3.1 Max   Headroom
Clock Rate   2.0 GHz           3.2 GHz      60%
Bandwidth    16 GB/s           51.2 GB/s    220%
Link Width   16-bit            32-bit       100%

Solidifies HT Leadership
Reinforces HT ROI
The Only 32-Bit-Capable Processor Interconnect in the Industry
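The bandwidth and headroom figures on the HT 3.1 slide can be reproduced with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the quoted GB/s values are aggregate (both directions) and that the link transfers data twice per clock (DDR), which matches the figures shown.

```python
def ht_aggregate_bandwidth_gb_s(clock_ghz: float, link_width_bits: int) -> float:
    """Aggregate HyperTransport link bandwidth in GB/s (assumption: both directions counted)."""
    transfers_g_per_s = clock_ghz * 2           # DDR: two transfers per clock cycle
    bytes_per_transfer = link_width_bits / 8    # link width expressed in bytes
    per_direction_gb_s = transfers_g_per_s * bytes_per_transfer
    return per_direction_gb_s * 2               # one link in each direction


def headroom_percent(current: float, maximum: float) -> float:
    """Headroom of the specification maximum over current use, in percent."""
    return (maximum - current) / current * 100


if __name__ == "__main__":
    # (label, clock in GHz, link width in bits), all values taken from the slide
    cases = [
        ("Current use", 2.0, 16),
        ("HT 3.0, 16-bit", 2.6, 16),
        ("HT 3.0, 32-bit", 2.6, 32),
        ("HT 3.1, 16-bit", 3.2, 16),
        ("HT 3.1, 32-bit", 3.2, 32),
    ]
    for label, clock, width in cases:
        print(f"{label}: {ht_aggregate_bandwidth_gb_s(clock, width):.1f} GB/s")

    print(f"Clock headroom: {headroom_percent(2.0, 3.2):.0f}%")
    print(f"Bandwidth headroom: {headroom_percent(16.0, 51.2):.0f}%")
    print(f"Link width headroom: {headroom_percent(16, 32):.0f}%")
```

Under these assumptions, the 220% bandwidth headroom is the combined effect of the 60% clock increase and the doubling of link width.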
HTX3™ Specification
3x Bandwidth of HTX™ Connector Standard
• HT3.0 Performance
• HT3.0 Link Splitting Support
• More Power Mgmt. Features
• 100% Backward Compatibility
For Highest Performance Subsystems
Direct Network / Switched Network
High Node Count HT Specification 1.0
Enables Scalable HPC Systems and Clusters with Low Latency Non-Coherent Shared Memory Architecture
[Diagram: Server 1, Server 2, ... Server n; nodes labeled M1 through M8 and Mx through Mx+3]
High Node Count HT Specification 1.0 (cont.)
Answers Ever Compounding On-Chip + In-System Addressing Challenge
[Diagram labels: Exponential Number of Cores; Exponential Number of CPUs; Clusters/Subclusters; Network; marker "You are Here"]
High Node Count HT Specification 1.0 (cont.)
Supports Global Sharing of Localized Data Storage
Especially High-Density DRAM and Low Power Flash-Based Memory Subsystems
[Diagram labels: Server X, Server Y, Server Z, Flash Memory Subsystem, High-Density DRAM, Network]
High Node Count HT Specification 1.0 (cont.)
Best System and Performance Scalability
Minimized Power Consumption
Optimized Total Cost of Ownership
Mature Stability, Mission-Critical Reliability
Field-Proven Dependability for Demanding Markets
63 Million HT-Powered Products
by end of 2008
2007 Capture   Market                   2007 Yr/Yr Growth
8%             Defense Applications     17%
32%            Top500 Supercomputers    28%
11%            Core Routers             1.2%
22%            Edge Routers             34%
15%            SAN                      11%
23%            Servers                  38%
Source: InStat
Ever Expanding Product Ecosystem
• From HT IP to HT Software
• Fosters Technology Strength
• 12 HT-Based Processor Brands
• Widespread Market Utilization
X86 Computing
Graphics
Security
Packet
Media
Comm
Acceleration
System Virtualization
Expanding Product Ecosystem (cont.)
New Godson Multi-Core Server-Class CPU
• Petascale Performance Target by 2010
• Backed by China’s Government
• MIPS-Based with 200+ More Instructions for x86 Translation and Acceleration
• 16 GFLOPS at 1 GHz and 10 W of Power
• Earlier versions (non-HT), produced by STMicroelectronics and sold to 40 companies in set-top boxes, laptops, etc.
• About 200 developers working on Godson HW, about 100 on SW and Compilers
Institute of Computing Technology, Chinese Academy of Sciences
HyperTransport Book
Covers all HT Link and HTX Specifications
Available Online from MindShare (www.mindshare.com) in Paper and eBook Formats
700 Pages of Must-Have Tutorial
Co-Authored by HTC’s Brian Holden
Thank You!
Mario Cavalli, General Manager
HyperTransport Technology Consortium
Corollary Information
Not Part of Live Presentation
HyperTransport Everywhere!
Also in PowerPC-Based and Intel-Based Products
Godson Server-Class CPU
4-Core Reconfigurable Architecture
[Diagram callouts]
PCIe, PCIe
DMA Engine Supports Pre-Fetch and Matrix
Shared L2 Configurable As Internal RAM, DMA To Internal RAM Directly (Stream Processor)
8 Config. Address Windows of Each Master Port Allow Pages Migration Across L2 and Memory
8x8 AXI Switch
2 Links for Each Node’s 4 Connection Points
Nodes Organized in Mesh
ncHT1.0, ncHT1.0
Directory-Based Coherence Protocol Safeguards Cache Data
65-nm Technology
Institute of Computing Technology - Chinese Academy of Sciences
Godson Server-Class CPU (cont.)
Godson Versions
8-Core Multi-Chip 20W Version Possible in 2009
Institute of Computing Technology - Chinese Academy of Sciences
Godson Server-Class CPU (cont.)
Godson Cores Profile
Institute of Computing Technology - Chinese Academy of Sciences
How and Why HyperTransport HTX Proves Best Choice for Compute-Intensive Applications
HTX™ Spotlight
HTX™ Values Snapshot
Enables
• HPC Products Demanding Performance Beyond the Reach of PCI-Class Interconnects
• Integration of System Functionality Too New/Complex/Costly for MB Integration
Empowers
• HPC Solution Providers with a Competitive Edge
  – No Risks of Premature MB Integration
  – Shortest Time-to-Market
  – One MB Fits Multiple Markets/Applications
  – Up-Sell Factor
HTX™ Applications
Compute Intensive
• High Bandwidth + Low Latency
• Multi-Processing, Co-Processing
Target Markets
• Database Analytics
• High Traffic Web Services
• Stock Trading Acceleration
• Server Clustering and SMP
• Streaming Media Servers
• Financial Modeling
Expanding HTX™ Product Ecosystem
HTX™ Server / MB
• Data Analysis Coprocessor
• Content-Aware Routing Processor
• High-Perf Server Clustering Controller
• Content/Security Processor
• 10GE NIC Ref Design
• Universal HTX/HTX3 Board Ref Design
• FPGA Ref Design Board
• Content/Security Processor
More Innovative HTX™ Systems and Subsystems in the Pipeline
New HTX™ Systems
ProLiant DL165-G5
ProLiant DL785-G5
Interface: HTX, HTX, PCIe, PCIe, PCIe, PCIe, PCIe, PCIe, PCIe
Width: x16, x16, x4, x4, x16, x4, x4, x4, x8
Slot: Blank, 9, Blank, 8, 7, 6, 5, 4, 3, 2, 1
New HTX™ Subsystems
Cache-Coherent Shared Memory Processor for Scalable Server Clustering
NumaChip Technology
New HTX™ Subsystems
Vulcan: Content-Aware Routing Processor for Multi-Core Systems
Delivers Unprecedented Multi-Core Processing and Power Optimization
Applications
• High-Traffic Web
• Telecom
• Automated Trading
• High Throughput, Fast Network Access
New HTX™ Reference Designs
HTX3™ Universal Reference Design Board
HT3 Core IP
Why HTX3™?
Empowers Future HPC Innovation
• FPGAs Playing Key Role in Compute-Intensive Designs
• HTX3 Paves Way for New Generation FPGA Technology
  – FPGAs from Bandwidth Bottlenecks to Performance Drivers
• Power Optimization Ranks High in HPC Agenda
• HT 3.0 Has Reached Maturity and Stability
• HT 3.0 Capability Now Safely and Stably “Connectorized”
Reinforces HTX Performance Edge over PCI Express
HTX3™ Features Summary

Feature                         HTX        HTX3        Notes
Max Clock Rate                  800 MHz    2.6 GHz     12-inch trace length
Max Bandwidth per Lane          1.6 GT/s   5.2 GT/s    Bi-directional
Max Bandwidth Aggregate         6.4 GB/s   20.8 GB/s   Bi-directional 16-Bit HT link
HT3 Link Splitting Support      NO         YES         HT link can be 1x 16-Bit or 2x 8-Bit for multi-CPU support
HT3 Extended Power Management   NO         YES         LDTREQ# signal added to participate in x86 power states
Extended FPGA Guidelines        NO         YES         Incorporated field-proven recommendations
Full Backward Compatibility     --         YES         Level shifters and signal allocation

For more details, see HTX3 specifications on HTC’s web site
HTX™ a Substitute for PCI Express?
No – HTX Complements and Coexists with PCIe by Providing the Capability that PCIe Cannot Deliver
[Diagram labels: DDR Memory; Chipset; Peripheral Interconnects; Direct Connect to Compute-Intensive Subsystems; HTX™; HTX3™; 16-Bit; 2x 8-Bit]
Unique HTX™ Capabilities
Aggregate Latency Advantage
• 20% Better Physical Layer Latency and Bandwidth due to Absence of 8B/10B Clock Recovery Overhead
  – No SerDes
• 55% Lower Latency Per Transaction due to Absence of Intermediate Control Logic Overhead
  – 95 ns of PCIe Gen2’s Estimated Round Trip Penalty out of 170 ns Total on Short, Open Page DRAM Reads
• Vastly Leaner Protocol (Packet Payload)
  – 12 Fewer Bytes of Overhead per Packet Compared to PCIe
• 20 ns Better Per-Transaction Latency in Heavy Traffic Environments due to HT’s Priority Request Interleaving™
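The "55%" per-transaction figure follows from the two latency numbers quoted on the slide (95 ns of a 170 ns round trip). A minimal sketch of that arithmetic is given below. The constant and function names are hypothetical, and treating the 20 ns Priority Request Interleaving gain as simply additive is an assumption made for illustration, not something the slide states in those terms.

```python
# Figures quoted on the slide (PCIe Gen2, short open-page DRAM reads).
PCIE_GEN2_ROUND_TRIP_NS = 170.0        # estimated total round trip
CONTROL_LOGIC_PENALTY_NS = 95.0        # portion attributed to intermediate control logic
PRIORITY_INTERLEAVING_GAIN_NS = 20.0   # heavy-traffic gain quoted for HT


def penalty_share_percent(penalty_ns: float, total_ns: float) -> float:
    """Share of the round trip attributed to the penalty, in percent."""
    return penalty_ns / total_ns * 100.0


def latency_without_penalty_ns(total_ns: float, penalty_ns: float) -> float:
    """Round-trip latency that remains once the penalty is removed."""
    return total_ns - penalty_ns


def combined_advantage_ns(*advantages_ns: float) -> float:
    """Sum of independent advantages (assumption: they add linearly)."""
    return sum(advantages_ns)


if __name__ == "__main__":
    share = penalty_share_percent(CONTROL_LOGIC_PENALTY_NS, PCIE_GEN2_ROUND_TRIP_NS)
    remaining = latency_without_penalty_ns(PCIE_GEN2_ROUND_TRIP_NS, CONTROL_LOGIC_PENALTY_NS)
    heavy_traffic = combined_advantage_ns(CONTROL_LOGIC_PENALTY_NS, PRIORITY_INTERLEAVING_GAIN_NS)
    print(f"Control logic share of round trip: {share:.1f}%")
    print(f"Round trip without that penalty: {remaining:.0f} ns")
    print(f"Advantage under heavy traffic (additive assumption): {heavy_traffic:.0f} ns")
```

The computed share is slightly above the 55% the slide quotes, which suggests the slide rounds down.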
Unique HTX™ Capabilities (cont.)
Up to Twice Packet/Latency Efficiency in Intra-Processor Traffic
[Chart: HTX™ Packet Overhead Efficiency Margins over PCIe; Efficiency vs. Data Bytes per Packet, for Min Overhead and Max Overhead; region marked "Usual Intra-Processor Traffic"]
Unique HTX™ Capabilities (cont.)
Considerable Per-Packet Latency Advantage
[Chart: HTX3™ Per-Packet Latency Advantage over PCIe Gen2, in ns, vs. Data Bytes Per Packet, for Min Packet Overhead and Max Packet Overhead; HTX3: 2.6 GHz - x16 Links; PCIe: 5.0 GHz – x16 Links; region marked "Usual Intra-Processor Traffic"]
The results take into account PCIe’s 20% clock recovery, packet payload and 55% chipset overhead penalties. HTX’s Priority Request Interleaving, if applicable, will add to HTX’s total latency advantage.
Unique HTX™ Capabilities (cont.)
Superior Bandwidth

Feature                            PCIe Gen1     PCIe Gen2      HTX             HTX3
Max Clock Rate                     2.5 GHz       5.0 GHz        800 MHz         2.6 GHz
Double Data Rate                   NO            NO             YES             YES
Max Bandwidth per Lane             2.5 Gbps      5.0 Gbps       1.6 GT/s (*)    5.2 GT/s (*)
8B/10B Penalty                     -20%          -20%           No Penalty      No Penalty
Net Bandwidth per Lane             2.0 Gbps      4.0 Gbps       1.6 GT/s (*)    5.2 GT/s (*)
Net Bandwidth 16-Bit - Aggregate   8 GBytes/s    16 GBytes/s    6.4 GBytes/s    20.8 GBytes/s

(*) HyperTransport supports Double Data Rate (DDR), transferring data on both the leading and trailing edge of the clock. Therefore HyperTransport’s bandwidth is more appropriately represented by the term “Transfers/second” than the term “Bits/second.”
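The net bandwidth row of the comparison can be re-derived from the clock rates and the 8B/10B penalty. The sketch below is one way to do that; the function names are hypothetical, and it assumes 16 lanes (or a 16-bit link), aggregate figures that count both directions, and an 8B/10B encoding efficiency of 80% for PCIe Gen1 and Gen2.

```python
def pcie_net_aggregate_gbytes_s(line_rate_gbps: float, lanes: int = 16,
                                encoding_efficiency: float = 0.8) -> float:
    """Net aggregate PCIe bandwidth in GBytes/s after the 8B/10B penalty."""
    net_per_lane_gbps = line_rate_gbps * encoding_efficiency  # 8 payload bits per 10 line bits
    per_direction_gbytes = net_per_lane_gbps * lanes / 8      # bits to bytes
    return per_direction_gbytes * 2                           # both directions


def htx_net_aggregate_gbytes_s(clock_ghz: float, link_width_bits: int = 16) -> float:
    """Net aggregate HTX bandwidth in GBytes/s (DDR, no encoding penalty)."""
    transfer_rate_gt_s = clock_ghz * 2                        # two transfers per clock
    per_direction_gbytes = transfer_rate_gt_s * link_width_bits / 8
    return per_direction_gbytes * 2                           # both directions


if __name__ == "__main__":
    results = {
        "PCIe Gen1": pcie_net_aggregate_gbytes_s(2.5),
        "PCIe Gen2": pcie_net_aggregate_gbytes_s(5.0),
        "HTX": htx_net_aggregate_gbytes_s(0.8),
        "HTX3": htx_net_aggregate_gbytes_s(2.6),
    }
    for name, gbytes in results.items():
        print(f"{name}: {gbytes:.1f} GBytes/s")
    print(f"HTX3 vs. HTX: {results['HTX3'] / results['HTX']:.2f}x")
    print(f"HTX3 vs. PCIe Gen2: {results['HTX3'] / results['PCIe Gen2']:.2f}x")
```

Note that on raw aggregate bandwidth the original HTX connector sits below PCIe Gen1 in this table; the bandwidth edge claimed here applies to HTX3.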
Unique HTX™ Capabilities (cont.)
Tangible Time-to-Result Savings!
HTX3™ Time-to-Result Savings vs. PCIe Gen2

                               Number of Packets Transferred
Bytes per Packet Transferred   100,000 Per Task   1 Million Per Task   1 Billion Per Task
4                              0.78 ms            7.8 ms               7.8 s
16                             4 ms               40 ms                40 s
256                            0.32 s             3.20 s               53 min
512                            1.16 s             11.62 s              3.23 h

The results take into account PCIe’s 20% clock recovery, packet payload and 55% chipset overhead penalties. HTX’s Priority Request Interleaving™, if applicable, will add to HTX’s total time-to-result latency advantage.
Compute-Intensive Tasks Require 100Ks to Billions of Packet Transactions
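The time-to-result table scales linearly with packet count, so each row reduces to a single per-packet saving. The sketch below makes that explicit. The per-packet values are not stated on the slide: they are derived here by dividing the "1 Million Per Task" column by one million, and the names used are hypothetical.

```python
# Per-packet saving in nanoseconds, keyed by bytes per packet.
# Derived (assumption) from the slide's "1 Million Per Task" column.
PER_PACKET_SAVING_NS = {
    4: 7.8,        # 7.8 ms / 1e6 packets
    16: 40.0,      # 40 ms / 1e6 packets
    256: 3200.0,   # 3.20 s / 1e6 packets
    512: 11620.0,  # 11.62 s / 1e6 packets
}


def time_saved_seconds(bytes_per_packet: int, packets: int) -> float:
    """Total time saved for a task, assuming savings scale linearly with packet count."""
    return PER_PACKET_SAVING_NS[bytes_per_packet] * packets * 1e-9


def humanize(seconds: float) -> str:
    """Format a duration using the unit that fits its magnitude."""
    if seconds < 1:
        return f"{seconds * 1e3:.2f} ms"
    if seconds < 60:
        return f"{seconds:.2f} s"
    if seconds < 3600:
        return f"{seconds / 60:.1f} min"
    return f"{seconds / 3600:.2f} h"


if __name__ == "__main__":
    for size in PER_PACKET_SAVING_NS:
        row = [humanize(time_saved_seconds(size, n)) for n in (100_000, 1_000_000, 1_000_000_000)]
        print(f"{size:>4} bytes/packet: " + ", ".join(row))
```

Reading the table this way shows why the savings only become tangible at scale: a few nanoseconds to a few microseconds per packet turn into minutes or hours once a task moves a billion packets.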
Unique HTX™ Capabilities (cont.)
Example: Celoxica’s Accelerator
Company’s Benchmark Results

                                                           HTX™ Interface   Interface
Latency Access to Network Data Regardless of Packet Size   1.4 µs           <10 µs
HPC - Industry’s Bright Star
Strong Business Growth Opportunities