AMD Opteron™ Overview

Post on 18-Dec-2015


TRANSCRIPT

  • Slide 1
  • AMD Opteron™ Overview
  • Slide 2
  • June 1, 2015, Computation Products Group, slide 2. Top Level Agenda: The HPC Market & AMD; AMD64: A Programmer's View; AMD Opteron Processor: The HW; Core Improvements; Integrated Memory Controller; HyperTransport Technology; Clustering; Performance; System Solutions & Applications; Development Platforms; Recent Events; Summary.
  • Slide 3
  • Computing System Evolution: Mainframes to desktops to clusters. Mainframes (~1965): tightly coupled processor, computer, OS and software from a single company; proprietary software; >$1M. Departmental minicomputers (~1970): significant proliferation of servers as machines leave glass houses.
  • Slide 42
  • AMD Athlon 64 Processor. AMD64: Desktop Processor. [Block diagram: AMD64 processor core with L1 instruction cache, L1 data cache, L2 cache, DDR memory controller (72-bit) and HyperTransport (16-bit); HyperTransport replaces the address, data and control bus.] 8-byte memory controller supporting 200, 266 and 333 MHz DDR memory; Chipkill ECC with x4 DRAMs; drives up to 4 registered DIMMs (4 DIMMs at 333MHz); future memory technology supported as it is defined; up to 4GB x4 DRAMs (4GB DIMMs). HyperTransport technology I/O. On-chip L1 and L2 cache: 64KB L1 ICache, 64KB L1 DCache, up to 1M ECC-protected L2 cache. 740-pin PGA package.
  • Slide 43
  • 1P AMD Athlon 64 Desktop Processor System. System Strengths: memory latency, bandwidth and memory reach (2^40 physical, i.e. 1 terabyte; 2^48 virtual). I/O latency and bandwidth: ~1600 MT/s, 6.4 GB/s. 64-bit CPU. More reliable: lower chip count, improved machine check, improved error handling. [Block diagram: AMD Athlon 64 with 200-333MHz 72-bit registered DDR (4GB DRAM); 16x16 HyperTransport @ 1600 MT/s to AMD-8151 AGP 8X (32 bits @ 533MHz); AMD-8111™ I/O Hub with FLASH, SIO, LPC, PCI 33/32, NIC (MII 10/100), USB 1.1/2.0, AC97, ACR 1.0, EIDE.]
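The memory-reach figures on this slide follow directly from the address widths. A minimal sketch of that arithmetic, using only the 40-bit and 48-bit widths quoted on the slide (the helper name `reach_bytes` is made up for illustration):

```python
# Address-space reach implied by the address widths quoted on the slide.
PHYSICAL_BITS = 40  # from the slide: 2^40 physical
VIRTUAL_BITS = 48   # from the slide: 2^48 virtual

TIB = 1 << 40  # one binary terabyte, 2^40 bytes


def reach_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


physical_tib = reach_bytes(PHYSICAL_BITS) // TIB
virtual_tib = reach_bytes(VIRTUAL_BITS) // TIB

print(f"physical reach: {physical_tib} TiB")
print(f"virtual reach:  {virtual_tib} TiB")
```

The physical figure reproduces the slide's "1 Terabyte"; the virtual space is 2^8 = 256 times larger.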
  • Slide 44
  • 1P AMD Opteron 100 Series. AMD64: 1-way Value Server. [Block diagram: AMD64 processor core with L1 instruction cache, L1 data cache, L2 cache, DDR memory controller (72/144-bit; 18 CAS lines for 32GB of memory) and HyperTransport (16-bit); HyperTransport replaces the address, data and control bus.] 16-byte memory controller supporting 200, 266 and 333 MHz DDR memory; Chipkill ECC with x4 DRAMs; drives up to 8 registered DIMMs (8 DIMMs at 333MHz); future memory technology supported as it is defined; up to 4GB x4 DRAMs (4GB DIMMs). Three 16-bit non-coherent HyperTransport technology links. On-chip L1 and L2 cache: 64KB L1 ICache, 64KB L1 DCache, up to 1M ECC-protected L2 cache. 940-pin PGA package.
  • Slide 45
  • 1P AMD Opteron 100 Desktop Processor System. System Strengths: ideal for cost-sensitive system designs where I/O is the critical commodity, such as storage servers and low-end DCC workstations. [Block diagram: AMD Opteron with 8GB DRAM; 16x16 HyperTransport @ 1600 MT/s to AMD-8151 AGP 8X (32 bits @ 533MHz); AMD-8131 PCI-X Tunnels with PCI-X buses; AMD-8111™ I/O Hub with FLASH, SIO, LPC, PCI 33/32, NIC (MII 10/100), USB 1.1/2.0, AC97, ACR 1.0, EIDE.]
  • Slide 46
  • 2P AMD Opteron 200 Series. AMD64: 2-way Performance Server. [Block diagram: AMD64 processor core with L1 instruction cache, L1 data cache, L2 cache, DDR memory controller (72/144-bit; 18 CAS lines for 32GB of memory) and HyperTransport (16-bit); HyperTransport replaces the address, data and control bus.] 16-byte memory controller supporting 200, 266 and 333 MHz DDR memory; Chipkill ECC with x4 DRAMs; drives up to 8 registered DIMMs (8 DIMMs at 333MHz); future memory technology supported as it is defined; up to 4GB x4 DRAMs (4GB DIMMs). One coherent and two 16-bit non-coherent HyperTransport technology links. On-chip L1 and L2 cache: 64KB L1 ICache, 64KB L1 DCache, up to 1M ECC-protected L2 cache. 940-pin PGA package.
  • Slide 47
  • 2P AMD Opteron 200 Server. System Strengths: ideal for systems where large flat memory is important (16GB of SMP memory), such as data mining and relational database applications. [Block diagram: two AMD Opteron processors, each with 8GB DRAM; AMD-8131 PCI-X Tunnels with PCI-X buses; bridge or SSL/IPSec device; AMD-8111™ I/O Hub with FLASH, SIO, LPC, PCI 33/32, NIC (MII 10/100), USB 1.1/2.0, AC97, ACR 1.0, EIDE.]
  • Slide 48
  • 4P - 8P AMD Opteron 800. AMD64: 4 - 8 Way Performance Server. [Block diagram: AMD64 processor core with L1 instruction cache, L1 data cache, L2 cache, DDR memory controller (72/144-bit) and HyperTransport (16-bit).] 16-byte memory controller supporting 200, 266 and 333 MHz DDR memory; Chipkill ECC with x4 DRAMs; drives up to 8 registered DIMMs (8 DIMMs at 333MHz); future memory technology supported as it is defined; up to 4GB x4 DRAMs (4GB DIMMs). Three 16-bit coherent HyperTransport technology links. On-chip L1 and L2 cache: 64KB L1 ICache, 64KB L1 DCache, up to 1M ECC-protected L2 cache. 940-pin PGA package.
  • Slide 49
  • AMD Opteron 800 HPC Processing Node. HPC Strengths: flat, SMP-like memory model (all four processors reside within the same 2^48 memory map); expandable to 8P NUMA; glue-less coherent multi-processing with low latency and high bandwidth (~1600 MT/s, 6.4 GB/s); 32GB of high-bandwidth external memory (bus >5.3GB/sec.); native high-bandwidth memory-mapped I/O (>25Gbits/sec.).
  • Slide 50
  • Model Number Implementation. First digit = scalability of the AMD Opteron processor; second and third digits = relative performance among AMD Opteron processors. The model number conveys directional improvement. Example: AMD Opteron Processor Model 146, 2.0GHz. AMD Opteron 800 Series (up to 8-way), model/clock: 846 = 2.0GHz, 844 = 1.8GHz, 842 = 1.6GHz, 840 = 1.4GHz. AMD Opteron 200 Series (up to 2-way): 246 = 2.0GHz, 244 = 1.8GHz, 242 = 1.6GHz, 240 = 1.4GHz. AMD Opteron 100 Series (1-way): 146 = 2.0GHz, 144 = 1.8GHz.
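The numbering rule on this slide (first digit for scalability, remaining two digits for relative performance) can be sketched as a small decoder. The function name `decode_model` and the mapping strings are illustrative choices, not an AMD API; only the three series shown on the slide are accepted.

```python
# Scalability classes named on the slide for each leading digit.
SCALABILITY = {1: "1-way", 2: "up to 2-way", 8: "up to 8-way"}


def decode_model(model: int) -> tuple[str, int]:
    """Split a three-digit Opteron model number into
    (scalability class, relative-performance digits)."""
    if not 100 <= model <= 999:
        raise ValueError("expected a three-digit model number")
    series_digit, relative_perf = divmod(model, 100)
    if series_digit not in SCALABILITY:
        raise ValueError(f"unknown series digit: {series_digit}")
    return SCALABILITY[series_digit], relative_perf


for model in (146, 246, 846, 840):
    print(model, decode_model(model))
```

Models 146, 246 and 846 share the same trailing digits and, per the slide's table, the same 2.0GHz clock; the trailing digits rank parts only relative to each other.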
  • Slide 51
  • Price Performance Positioning. [Chart: performance vs. price for the 800, 200 and 100 series, with 256K and 1M labels.] A solution unto itself.
  • Slide 52
  • Opteron Processor Architecture
  • Slide 53
  • The Elements of the CPU. [Block diagram: Fetch; Branch Prediction; L1 Instruction Cache (64KB); Scan/Align; Fastpath; Microcode Engine; Instruction Control Unit (72 entries); Int Decode & Rename; FP Decode & Rename; Res; three AGU/ALU pairs plus MULT; 36-entry FP scheduler feeding FADD, FMUL and FMISC; 44-entry Load/Store Queue; L1 Data Cache (64KB); L2 Cache; Bus Unit; System Request Queue; Crossbar; Memory Controller; HyperTransport™.]
  • Slide 54
  • Processor Throughput. Supply 16 instruction bytes to the decoder per cycle. Convert x86 instructions to fixed-length OPs. 24-entry integer scheduler. Can dispatch 3 OPs per cycle to the integer/FP schedulers. Instructions use one of two decoding pipelines. Fastpath: instructions which decode into two or fewer µOPs are decoded by hardware and then packed into 3 dispatch positions. Microcode: for x86 instructions which decode into more than two µOPs, calculate the microcode ROM entry point and fetch the sequence from the microcode ROM. Compared to AMD Athlon XP, more instructions use the Fastpath; e.g., packed SSE is microcoded in AMD Athlon XP and Fastpath in AMD Opteron processors. AMD Opteron has 8% fewer microcoded instructions for SPECint2000 and 28% fewer microcoded instructions for SPECfp2000.
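The Fastpath/Microcode split described on this slide is a simple threshold on how many µOPs an instruction decodes into. A minimal sketch of that rule; the function name and the example instruction names with their µOP counts are hypothetical and serve only to exercise the threshold.

```python
FASTPATH_MAX_UOPS = 2  # from the slide: two or fewer µOPs use the Fastpath


def decode_path(uop_count: int) -> str:
    """Pick the decoding pipeline for an instruction that decodes
    into `uop_count` µOPs, following the slide's rule."""
    if uop_count < 1:
        raise ValueError("an instruction decodes into at least one µOP")
    return "fastpath" if uop_count <= FASTPATH_MAX_UOPS else "microcode"


# HYPOTHETICAL instruction names and µOP counts, for illustration only.
example_stream = {"simple_add": 1, "load_op": 2, "string_move": 5}

for name, uops in example_stream.items():
    print(f"{name}: {uops} µOP(s) -> {decode_path(uops)}")
```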
  • Slide 55
  • Floating Point & Integer Performance. FPU throughput: SSE2, x87: theoretical (1 Mul + 1 Add)/cycle, realized 1.9 FLOPs/cycle; SSE, 3DNow!: theoretical (2 Mul + 2 Add)/cycle, realized 3.4+ FLOPs/cycle. 32-bit integer throughput: 1 add per clock cycle; 1 multiply per clock cycle; multiply latency has shrunk from 5 cycles on AMD Athlon™ to 3 cycles on AMD Opteron. 64-bit integer throughput: 1 add per clock cycle; 1 multiply every other clock cycle; multiply latency is 4 cycles. Integer instruction scheduler: out-of-order (OOO) from a queue of 24* integer macro-ops. *The Athlon™ instruction scheduler is 18 macro-ops deep.
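The per-cycle figures on this slide turn into peak rates once a clock frequency is chosen. A rough sketch, assuming the 2 GHz clock that the deck uses elsewhere for its system parameters; the function name `peak_gflops` is made up for illustration.

```python
def peak_gflops(clock_ghz: float, flops_per_cycle: float, processors: int = 1) -> float:
    """Peak floating-point rate in GFLOPS: clock x FLOPs/cycle x processor count."""
    return clock_ghz * flops_per_cycle * processors


CLOCK_GHZ = 2.0  # ASSUMPTION: example clock; the slide gives only per-cycle figures

sse2_theoretical = peak_gflops(CLOCK_GHZ, 2)    # 1 Mul + 1 Add per cycle
sse2_realized = peak_gflops(CLOCK_GHZ, 1.9)     # slide: 1.9 FLOPs/cycle realized
sse_theoretical = peak_gflops(CLOCK_GHZ, 4)     # 2 Mul + 2 Add per cycle
four_way_node = peak_gflops(CLOCK_GHZ, 2, processors=4)

print(f"SSE2/x87 theoretical: {sse2_theoretical:.1f} GFLOPS")
print(f"SSE2/x87 realized:    {sse2_realized:.1f} GFLOPS")
print(f"SSE/3DNow! theoretical: {sse_theoretical:.1f} GFLOPS")
print(f"4P node, SSE2/x87 theoretical: {four_way_node:.1f} GFLOPS")
```

Under these assumptions a four-processor node comes to 16 GFLOPS, which agrees with the "16G-flops" figure the deck quotes for one 4P SMP node.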
  • Slide 56
  • Internal Caching. L1 caches: 64KB each for instruction and data; 2-way set associative; data cache is ECC protected; instruction cache is parity protected. L2 cache: caches instruction and data streams; 16-way set associative, ECC protected; >2X the Athlon XP L2-to-L1 bandwidth. Improved Translation Look-aside Buffers for large multiprocessor workloads: twice the size and lower latencies than AMD Athlon XP; L2 TLB: 512 entries, 4-way associative; L1 TLB: 32 entries each for instruction and data, fully associative. Machine check architecture for reporting failures. [Diagram: L1 Instruction Cache (64KB), L1 Data Cache (64KB), 44-entry Load/Store Queue, L2 Cache, Bus Unit.]
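The cache sizes and associativities above determine how many sets each cache has, and the TLB entry counts determine how much memory the TLBs can map. A small sketch of both calculations; the 64-byte line size and 4KB page size are assumptions that the slide does not state, and the helper names are illustrative.

```python
LINE_BYTES = 64    # ASSUMPTION: cache line size, not stated on the slide
PAGE_BYTES = 4096  # ASSUMPTION: 4KB pages, not stated on the slide


def num_sets(cache_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Sets in a set-associative cache: size / (ways x line size)."""
    return cache_bytes // (ways * line_bytes)


def tlb_reach_bytes(entries: int, page_bytes: int = PAGE_BYTES) -> int:
    """Memory covered when every TLB entry maps one page."""
    return entries * page_bytes


l1_sets = num_sets(64 * 1024, ways=2)       # 64KB, 2-way L1
l2_sets = num_sets(1024 * 1024, ways=16)    # up to 1MB, 16-way L2
l1_tlb_reach_kib = tlb_reach_bytes(32) // 1024
l2_tlb_reach_kib = tlb_reach_bytes(512) // 1024

print(f"L1 sets: {l1_sets}, L2 sets: {l2_sets}")
print(f"L1 TLB reach: {l1_tlb_reach_kib} KiB, L2 TLB reach: {l2_tlb_reach_kib} KiB")
```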
  • Slide 57
  • Reliability Features. L1 cache: data cache is ECC protected via background scrubber; instruction cache is parity protected upon R/W. L2 cache: cache tag arrays are ECC protected via background scrubber; instructions are parity protected, data is ECC protected; ECC bit reused for branch prediction and instruction decode (end bits). DRAM is ECC protected with Chipkill ECC support: each fetch is parity checked; ECC via scrubber, with a period that is user programmable from 40ns to 84usec. Remaining arrays are parity protected: instruction cache, tags and TLBs; data tags and TLBs; generally read-only data which can be recovered. Machine Check Architecture: reports failures and predictive failure results for ECC, branch predictor, ThermTrip and memory scrubbers.
  • Slide 58
  • Branch Prediction Improvements. Full L1 cache coverage: twice the selectors of AMD Athlon XP. 4K branch target addresses: backed up by the Branch Address Calculator; 4-cycle correction for unconditional relative branches. 16K bimodal counters: four times AMD Athlon XP. Full pre-decode and branch identification in L2 cache: new and unique to the AMD Opteron family of processors; reuses L2 ECC bits on clean/shared instruction lines and one extra bit. [Diagram: Branch Prediction, Fetch, Scan/Align, Fastpath, Microcode Engine, Instruction Control Unit (72 entries), OPs.]
  • Slide 59
  • Integrated Northbridge
  • Slide 60
  • Firmware View of Northbridge. Performs the same functions found in a Northbridge: memory controller fully integrated; Host-Bridge function as defined by the PCI spec; PCI-to-PCI bridge as defined by the PCI spec; Graphics Address Resolution Table (GART); multi-processor coherency. Controlled via PCI configuration registers: memory controller configuration; HyperTransport technology routing. Configured by firmware: HyperTransport initialization via hardware (auto-size, coherent or non-coherent, legacy path to the ROM in the Southbridge); HyperTransport technology speed and routing via firmware. Everything else in firmware follows existing paradigms: PCI enumeration; memory sizing and configuration; I/O controller setup. [Diagram: System Request Queue, Crossbar, Memory Controller, HyperTransport™.]
  • Slide 61
  • Systems View of Northbridge (assumes a 2GHz processor clock)
  • Slide 62
  • HyperTransport Technology. Screaming I/O for chip-to-chip communication: high bandwidth; point-to-point links; split transaction and full duplex; differential signaling; tunneling capability. HyperTransport links: three 16-bit links (3.2 GB/s per direction); reduced pin count compared to typical bus-based systems; compatible with high-volume PC board infrastructure. Each link can be cHT, a coherent (processor-to-processor) link, or ncHT, a non-coherent (processor-to-I/O) link. Enables scalable 2-8 processor cache-coherent MP systems (glueless MP). For more info see: http://www.HyperTransport.org/
  • Slide 63
  • Performance
  • Slide 64
  • Multi-Processor Performance Evaluation: Simulation Parameters. Microbenchmark simulations: RTL based; cycle accurate; DRAM page hit. System parameters: AMD Opteron 2 GHz CPU; memory clock = 333 MHz data rate; registered PC2700 DDR memory; DRAM width = 128 bits, interleaved; CAS latency = 2.5 memory clocks; HT frequency = 1600 MHz data rate (16 bits). DDR peak bandwidth = 5.4 GB/s. HT peak bandwidth = 3.2 GB/s (each direction).
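Both peak-bandwidth figures on this slide are data rate times interface width. A minimal sketch of that arithmetic from the stated parameters; the function name `peak_gb_per_s` is illustrative, and GB here means 10^9 bytes.

```python
def peak_gb_per_s(mega_transfers_per_s: float, width_bits: int) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s) = transfers/s x bytes per transfer."""
    bytes_per_transfer = width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Parameters taken from the slide.
ddr_peak = peak_gb_per_s(333, 128)   # 333 MHz data rate, 128-bit DRAM interface
ht_peak = peak_gb_per_s(1600, 16)    # 1600 MHz data rate, 16-bit link, one direction

print(f"DDR peak: {ddr_peak:.2f} GB/s")
print(f"HT peak:  {ht_peak:.2f} GB/s per direction, {2 * ht_peak:.2f} GB/s both directions")
```

With a nominal 333 MT/s the DDR figure comes to about 5.33 GB/s, which the slide quotes as 5.4 GB/s; the HyperTransport figure matches the slide's 3.2 GB/s per direction, or 6.4 GB/s counting both directions.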
  • Slide 65
  • SPECint Performance. [Chart: SPECint 2000 score (400 to 1300) vs. operating frequency (1000 to 3000 MHz); series: AMD Opteron processor estimates, Intel Xeon processor*.] AMD Opteron estimates are based on 2GHz lab hardware using 32-bit binaries. *Source: http://www.spec.org/osg/cpu2000/results/cpu2000.html
  • Slide 66
  • SPECfp Performance Comparison. [Chart: SPECfp 2000 score (400 to 1500) vs. operating frequency (1000 to 5000 MHz); series: AMD Opteron processor estimates, Intel Xeon processor*; annotations A and B with ~400 MHz and ~1100 MHz.] AMD Opteron estimates are based on 2GHz lab hardware using 32-bit binaries. *Source: http://www.spec.org/osg/cpu2000/results/cpu2000.html
  • Slide 67
  • SPECfp 2000 Base Competitive Summary (32-bit Windows, PC2700 CAS2.5). [Chart: SPECfp 2000 scores (0 to 1400) vs. CPU frequency (0 to 3 GHz); series: Base (IA32), Peak (IA32), AMD Opteron processor (estimated performance); labeled points: AMD Opteron, P4 400FSB, P4 533FSB, PIII 133FSB; annotation: AMD Opteron redesign effort.] Source: http://www.spec.org
  • Slide 68
  • AMD Opteron SPEC projections compared to Alpha EV7. AMD Opteron should be more cost-effective versus Alpha EV7: standards versus proprietary; millions per month versus 100s.
  • Slide 69
  • AMD Opteron SPEC projections compared to Itanium-2. AMD Opteron will be more cost-effective than Itanium-2: standards versus proprietary; millions per month versus 1,000s.
  • Slide 70
  • Integrated Memory Controller Latency (local memory access, registered memory, CAS 2.5). 1.6GHz, PC2700: 65ns (L1 cache miss, TLB hit); 85-95ns (L1 cache miss, TLB miss). [Chart: time (ns) vs. stride (bytes) for each block size (bytes); annotations: Stride >1M, 32k< Stride]
  • Slide 74
  • Sufficiently Uniform Memory Organization (SUMO). Advantages: the software view of memory is SMP; the latency difference between local and remote memory is a function of the number of processors in the node. 1P and 2P look like an SMP machine; 3P and 4P are NUMA-like but can still be viewed as a ccUMA or asymmetric SMP node; >4P can be viewed as ccUMA and, depending on the cache hit rate, may or may not require a NUMA-aware OS. The physical address space is flat and can be viewed as fully coherent or not (MOESI state); DRAM can be contiguous or interleaved; additional processor nodes bring true increased memory bandwidth; designed for lower overall system chip count (glue-less interface). Disadvantages: 3P and 4P nodes work better if the OS is aware of the memory map; >4P may require a NUMA-aware OS if the cache hit rate is low.
  • Slide 75
  • Future NUMA Systems: Scaling Beyond 8 Processors. Scaling beyond 8P is enabled: external coherent HyperTransport switch; coherent interconnect; snoop filter; data caching; up to 16 processors within the same 2^40 SMP memory space. [Diagram: 4P nodes attached to switches SW0 to SW3, joined by an interconnect fabric.]
  • Slide 76
  • AMD Opteron Support ICs
  • Slide 77
  • AMD Opteron Support ICs. AMD is committed to delivering the highest quality system solutions. Providing a family of x86-64 processors is just the start. AMD will promote and enable a broad range of HyperTransport support silicon from internal and external design efforts. AMD, with the HyperTransport consortium, will grow the HyperTransport ecosystem.
  • Slide 78
  • HyperTransport Technology Consortium
  • Slide 79
  • AMD-8131 HyperTransport PCI-X Tunnel. Dual PCI-X master. Each PCI-X bridge independently supports: 66, 100, 133MHz PCI-X protocol; 33 and 66MHz PCI 2.2 protocol; SHPC controller; 64-bit data path; IOAPIC; arbiter for up to 5 masters; hot-swap. HyperTransport™ support: 16/16 up, 8/8 down; independent support for up to 1600MT/s up and down; full link auto-sizing and speed selection. 829 OBGA, 37.5mm body, 1.27mm pitch, full array, 6-layer motherboard breakout. [Block diagram: AMD Opteron or AMD Athlon64; 16x16 HyperTransport @ 1600MT/s to AMD-8131 HyperTransport dual PCI-X; 8x8 HyperTransport @ 800MT/s to AMD-8111™ I/O Hub with FLASH, SIO, LPC, PCI 32 bits @ 33MHz, USB 1.0/2.0, AC97, UDMA100, 10/100 Ethernet (10/100 PHY, 100BaseT).]
  • Slide 80
  • AMD-8111 HyperTransport I/O Hub. Engineered from past successful AMD I/O hub development efforts. 8x8-wide 200 MHz DDR HyperTransport technology interface (800MB/s aggregate BW); enhanced 10/100 Ethernet MAC; USB 1.1, USB 2.0, EDMA, AC97; LPC for BIOS ROM and Super I/O; PCI version 2.2, 33/32 bridge (legacy), supporting arbitration of up to 8 external masters; SMBus 1.0 and 2.0 controllers. 492 PBGA, 35x35mm body, 1.27mm pitch. [Block diagram: AMD-8111™ I/O Hub on 8x8 HyperTransport™ @ 800MHz, with FLASH, SIO, LPC, PCI 32 bits @ 33MHz, NIC (MII, 10/100BaseT), USB 1.1/2.0, AC97, EIDE.]
  • Slide 81
  • AMD-8151 HyperTransport AGP Tunnel. 8x AGP: fully AGP 3.0 compliant; 66, 133, 266, 533MHz operation. HyperTransport™ support: 16/16 up, 8/8 down; independent support for up to 1600MT/s up and up to 800MT/s down; full link auto-sizing and speed selection. 564 OBGA, 31x31mm body, 1.27mm pitch, full array. [Block diagram: AMD Opteron or AMD Athlon64; AMD-8151 HyperTransport AGP tunnel with 8x AGP (Int Gfx); 8x8 HyperTransport @ 800MT/s to AMD-8111™ I/O Hub with FLASH, SIO, LPC, PCI 32 bits @ 33MHz, USB 1.0/2.0, AC97, UDMA100, 10/100 Ethernet (10/100 PHY, 100BaseT).]
  • Slide 82
  • Opteron & Athlon Server Chipset Roadmap. [Roadmap chart, 2H02 to 2005. 7th generation: AMD-760MP/MPX. 8th generation: AMD-8111 HyperTransport I/O Hub; AMD-8151 HyperTransport AGP Tunnel; AMD-8131 HyperTransport PCI-X Tunnel (2 PCI-X bridges); HyperTransport second-generation PCI device; second-generation HyperTransport I/O Hub.]
  • Slide 83
  • Desktop Infrastructure Roadmap: Athlon 64 Desktop Chipset Roadmap
  • Slide 84
  • A Growing Ecosystem of HyperTransport-Enabled ICs. Available today: dual MIPS processor (Broadcom BCM1250); PCI 66/64 bridge from Alliance Semi.; NITROX Security Macro Processor from Cavium Networks; FPGAs from Xilinx and Altera. Announced: RM9000 MIPS processor from PMC Sierra; 4-port 8/8 HyperTransport™ switch swap support from Alliance Semi.; SSL/TLS record processing systems (Broadcom BC5850); Luminance Modular Array Technology (Lightspeed Semiconductor). Planned: InfiniBand bridge; proprietary high-speed interconnect; 4-port 16/16 non-coherent switch; 4-port 16/16 coherent switch; PCI-X bridges.
  • Slide 85
  • HyperTransport™ Technology 4-way 16/16 Non-Coherent Switch. Extends the fabric by re-mapping Unit_IDs at each port. Tracks the path of packets that pass through it, guaranteeing the same return path. Records the incoming Unit_ID so it can be restored in the response packet. Follows the same rules as the processor host interface. Supports peer-to-peer traffic through the switch, freeing up the host. Facilitates multiple host fabrics.
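The remap-and-restore behaviour described on this slide can be pictured with a toy model. This is only a sketch under stated assumptions: the class name, the per-request `tag` field used to match responses to requests, and the method names are all hypothetical, and nothing here reflects the actual HyperTransport packet format or the switch's real implementation.

```python
class ToySwitchPort:
    """Toy model of one egress port that remaps Unit_IDs on requests
    and restores them on the matching responses."""

    def __init__(self, port_unit_id: int) -> None:
        self.port_unit_id = port_unit_id
        # HYPOTHETICAL bookkeeping: tag -> (ingress port, original Unit_ID)
        self._pending: dict[int, tuple[int, int]] = {}

    def forward_request(self, ingress_port: int, unit_id: int, tag: int) -> tuple[int, int]:
        """Record where the request came from, then emit it under this port's Unit_ID."""
        self._pending[tag] = (ingress_port, unit_id)
        return self.port_unit_id, tag

    def route_response(self, tag: int) -> tuple[int, int]:
        """Return (ingress port, original Unit_ID) so the response
        retraces the request's path with its Unit_ID restored."""
        return self._pending.pop(tag)


port = ToySwitchPort(port_unit_id=7)
outgoing = port.forward_request(ingress_port=2, unit_id=5, tag=0)
restored = port.route_response(tag=0)
print("request leaves as (Unit_ID, tag):", outgoing)
print("response returns to (port, Unit_ID):", restored)
```

Keeping the original Unit_ID in a table keyed by the outstanding request is one simple way to guarantee the same return path; a real device may track this differently.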
  • Slide 86
  • Nine-channel GigE Firewall. [Block diagram: AMD Opteron; 16x16 HyperTransport @ 1600MT/s to AMD-8131™ PCI-X Tunnels (PCI-X, 64 bits @ 133MHz); 8x8 HyperTransport at 1000M transfers/sec.; 8x8 HyperTransport @ 400MT/s to AMD-8111™ I/O Hub with FLASH, LPC, SIO, legacy PCI (PCI graphics, VGA), USB 1.0, AC97, UDMA133, MII 10/100 PHY (100BaseT management LAN); Zircon BMC.]
  • Slide 87
  • AMD Opteron DP - 2P Server with SSL/IPsec Encryption. [Block diagram: SP1011 PCI bridge (PCI 66/64); SP 8/8 switch; security macro processor; RM9000x2 with DDR SDRAM; SysAD bus.]
  • Slide 88
  • 1U/1P AMD Opteron Server
  • Slide 89
  • 1U/2P AMD Opteron Server
  • Slide 90
  • 4P Coherent System Based on Two 2P MP Nodes. [Block diagram: four AMD Opteron DP processors, each with 200-333MHz 9-byte registered DDR (8GB DRAM); Horis; probe directory SRAM; 16x16 HyperTransport @ 1600MT/s to AMD-8131™ PCI-X Tunnel (PCI-X); AMD-8111™ I/O Hub with FLASH, LPC, SIO, legacy PCI (PCI graphics, VGA), USB 1.0, AC97, UDMA133, MII 10/100 PHY (100BaseT management LAN), management.]
  • Slide 91
  • AMD Opteron Beowulf 4P SMP Processing Node. One 4P SMP node: 16 Gflops; 32GB DRAM; 10GB/sec. memory bandwidth. [Block diagram: four AMD Opteron processors, each with 200-333MHz 9-byte registered DDR (8GB DRAM); 16x16 HyperTransport @ 1600MT/s to AMD-8131™ PCI-X Tunnels (PCI-X); link to a further AMD-8131 tunnel; AMD-8111™ I/O Hub with FLASH, LPC, SIO, legacy PCI (PCI graphics, VGA), USB 1.0, AC97, UDMA133, MII 10/100 PHY (100BaseT management LAN), management.]
  • Slide 92
  • HyperTransport Technology on the Backplane. Non-coherent interconnect; switches and the 8111 on the backplane; hot-swap connection. [Diagram: 4P blade connected to SI4041 switches.]
  • Slide 93
  • Two - 8 Processor System Topologies (NUMA)
  • Slide 94
  • 2P Server with Add-on Accelerator Daughter Card. [Block diagram: two AMD Opteron processors with 200-333MHz 72-bit registered DDR (8GB DRAM); AMD-8131 PCI-X Tunnels with PCI-X buses; HyperTransport-enabled daughter card carrying a Luminance Modular Array ASIC interface device; 8x8 HyperTransport @ 1.6GB/sec. to AMD-8111™ I/O Hub with FLASH, SIO, LPC, PCI 33/32, USB 1.1/2.0, AC97, ACR 1.0, GMII, EIDE, NIC 10/100.]
  • Slide 95
  • AMD Athlon 64 1P Blade Design. Ultra-low-cost blade design: 4GB of 333MHz DRAM; 2GHz processor; ~35 watts. The Luminance device boots the processor and provides the HCA network interface. [Block diagram: AMD Athlon 64 with 4GB DRAM; 16x16 HyperTransport @ 1,000MT/s to a Luminance Modular Array ASIC interface device with HCA interface and boot ROM.]
  • Slide 96
  • AMD Opteron™ Processor DP 2P Graphics Workstation
  • Slide 97
  • 2P AMD Opteron Processor Graphics Workstation (Cave)
  • Slide 98
  • High-Density SprayCooled Blade Configuration. 4P, 16 Gflops blade design; 64GB of SMP DRAM; ASIC boots the 4P unit; PCI-X provides all I/O; vapor cooled in a sealed enclosure; external VRM.
  • Slide 99
  • How ISR SprayCool™ Technology Works. (a) As the electronics are sprayed, the fluid vaporizes, cooling the electronics to a low, stable temperature. (b) Vapor travels through the heat exchanger to be condensed. (c) Fluid collects in the reservoir. (d) Fluid is purified by the filtration system. (e) Fluid is pumped back into the electronics in a continuous cycle. (f) The sealed enclosure protects the electronics from dust, dirt and salt air.
  • Slide 100
  • High-Density HPC Cluster: SprayCool Technology from ISR. 16 cards; 16 Gflops/card; 256 Gflops peak throughput; 64GB of memory per card; 1 terabyte of system memory; 240 cubic inches; 114 Mflops/cubic inch; 4.27GB of memory storage per cubic inch; ~6K watts; ~3 watts/cubic inch. [Figure: enclosure with dimension labels 14, 10 and 16.]
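The cluster totals on this slide are per-card figures multiplied by the card count, and the density figures divide those totals by the enclosure volume. A rough sketch of that arithmetic; treating the labels 14, 10 and 16 as enclosure dimensions in inches is an assumption, as is reading "~6K watts" as 6000 W.

```python
CARDS = 16
GFLOPS_PER_CARD = 16
GB_PER_CARD = 64
POWER_WATTS = 6000             # ASSUMPTION: "~6K watts" read as 6000 W
DIMENSIONS_IN = (14, 10, 16)   # ASSUMPTION: figure labels read as inches

total_gflops = CARDS * GFLOPS_PER_CARD
total_memory_gb = CARDS * GB_PER_CARD

volume_cu_in = 1
for side in DIMENSIONS_IN:
    volume_cu_in *= side

mflops_per_cu_in = total_gflops * 1000 / volume_cu_in
watts_per_cu_in = POWER_WATTS / volume_cu_in

print(f"peak throughput: {total_gflops} Gflops")
print(f"system memory:   {total_memory_gb} GB")
print(f"volume:          {volume_cu_in} cubic inches")
print(f"compute density: {mflops_per_cu_in:.0f} Mflops per cubic inch")
print(f"power density:   {watts_per_cu_in:.1f} W per cubic inch")
```

Under these assumptions the totals reproduce the slide's 256 Gflops and 1 terabyte, and the compute and power densities land near the slide's 114 Mflops and ~3 watts per cubic inch.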
  • Slide 101
  • AMD Reference Design Kits
  • Slide 102
  • Four Hardware Platforms. Solo (AMD): 1P AMD Opteron motherboard for desktop applications. Serenade (AMD): 2P AMD Opteron system board for HPC and server applications. Quartet (AMD): 4U-4P AMD Opteron system board for HPC and server applications. Khepri (Newisys): 1U-2P AMD Opteron server board.
  • Slide 103
  • Solo Features. Athlon64 uni-processor; two unbuffered PC2700/PC2100 DDR DIMMs; AMD 8151 AGP8X HyperTransport Tunnel; AMD 8111 I/O Hub; four PCI 32b 33MHz slots; two ATA-100 EIDE connectors; six USB 2.0 ports (3 on back panel, 2 on front panel, 1 on ACR); AC 97 audio; SMBus 1.0 and 2.0 support; one ACR slot; 1 fan with sense and 1 fan without sense; floppy, serial, parallel, 2 PS/2 and 2 IEEE 1394a connectors; LPC Super I/O with 2 fans with sense; 4-layer ATX form factor with ATX power supply; PC2001, WHQL, Energy Star, WFM 2.0 compliant.
  • Slide 104
  • Hammer Performance Desktop (Solo-RDK)
  • Slide 105
  • 1U/2P Serenade. CPU/Memory Complex: Opteron processor 200 Series (supports up to 2 processors); four banks of 128-bit registered DDR memory per CPU (DDR 200-333). I/O: full-size PCI-X slots, two PCI-X 64/100 MHz or one PCI-X 64/133 (not hot-pluggable); one mini-PCI slot; dual Broadcom 10/100/1000 Ethernet onboard; dual LSI U320 SCSI (one channel to disk, one channel to rear expansion); single USB 1.1 to front; SIO (floppy, serial, keyboard, mouse). Management: single dedicated management LAN 10/100; optional BMC management controller, IPMI 1.5 compliant. Storage: dual drive bays for (standard) IDE or (standard or hot-swap) SCSI drives; slim-line IDE CD-ROM or slim-line floppy drive. Physicals: 1U rack-mount server form factor, tool-less access, full-extension slide rails; single 500W power supply, rear accessible to line cord; removable blowers, cooling performed front-to-rear (passive CPU heatsinks); front LED panel with activity and status (PWR, RESET, USB, PCI-Video); dimensions: (1U) x 19" W x 28" D.
  • Slide 106
  • 1U/2P Serenade Front View. [Photo callouts: 500W power supply; CD-ROM or floppy (slimline); drive carriers (x2) (SCSI hot-swappable); 10 redundant blowers (front-to-back cooling); AMD Opteron 200 Series (x2); 32/33MHz PCI (half-height/half-length) (video option); 8 DIMMs DDR 266-333 ECC (4 DIMMs/CPU); SCSI disk option (Mini-PCI); full-size PCI-X slots (x2) 64/100 MHz or single PCI-X 64/133 (riser w/sideband); dimension callout: 28.]
  • Slide 107
  • 1U/2P Serenade Rear View. [Photo callouts: full-size PCI-X slots (x2) 64/100 MHz or single PCI-X 64/133 module assembly (riser w/sideband); AMD Opteron 200 Series (x2) cooling ducts; dedicated 10/100 IPMI management port; dual 10/100/1000 ENET; 32/33MHz PCI (half-height/half-length) (std. half-height video option); PS2 ports; U320 SCSI option (Mini-PCI); USB port.]
  • Slide 108
  • Quartet: 4U/4P SledgeHammer MP 940-pin Processor
  • Slide 109
  • Quartet System Features. 4U rack-mount server form factor (25" deep), EIA standard. 4P Opteron (940-pin). Four banks of 128-bit registered DDR memory per CPU (designed for DDR-333), 16 total. Five full-size PCI-X slots (AMD 8131): two PCI-X 64/133 MHz (hot-pluggable), three PCI-X 64/66 MHz. Ethernet ports: dual Broadcom 10/100/1000 Ethernet onboard; single 10/100 (AMD-8111). Dual LSI U320 SCSI (one channel to disk, one channel to rear expansion). System management: Qlogic UL BMC, IPMI 1.5 via dedicated LAN/modem.
  • Slide 110
  • Quartet System Features (cont). Dual IDE: slim-line CD-ROM, slim floppy. Dual USB: one front, one rear. SIO (floppy, serial, keyboard, mouse). Storage: four 1" hot-swap Ultra320 SCSI drives. Video: ATI 4 Meg (via card option, PCI 32/33). Three 500W hot-swap power supplies (2+1 redundancy) for 4U, rear accessible to three line cords. Hot-swap redundant fans (10). Front LED panel with activity and status: PWR, RESET, USB, PCI-Video. Full-extension slide rails. Dimensions: 5.25" H x 19" W x 28" D (*5.25" is the main/processor section; an additional 1.75" is the power supply bay). Cooling front to rear (passive CPU heatsinks). Tool-less access.
  • Slide 111
  • Khepri: Dual Processor Opteron System. 1U; 2P Opteron; 16 gigabytes RAM, max; fully managed. Operating systems: Linux 32 & 64 bit; Windows 32 bit (2000 and .Net Server); Windows 64 bit (when available).
  • Slide 112
  • Khepri Block Diagram
  • Slide 113
  • Khepri Alpha Internal View
  • Slide 114
  • Availability. Solo (AMD Athlon 64): prototypes are available now; production planned for Sept. 2003. Serenade (AMD): development platform RDK available now; production planned for June 2003. Quartet (AMD): RDK available June 2003; production planned for Aug. 2003. Khepri (Newisys): development units are available now through the AMD Beachhead Program; production now.
  • Slide 115
  • Platform Enablement Program. Over the past 24 months, AMD has provided technical design support to ~50 companies. To date, Newisys has enabled over 17 vendors with its Khepri 2P platform reference design. By launch (April 2003) there will be 4+ announcements of 4P HPC servers based on AMD Opteron. By Nov. 2003 there will be many more vendors with 4P and up to four vendors with 8P SMP/NUMA AMD Opteron platforms. With the availability of a HyperTransport coherent switch, the NUMA server can grow to 32P and beyond.
  • Slide 116
  • 2002-2003 AMD Server Roadmap. [Roadmap chart of DP/MP systems by quarter (4Q02, 1Q03, 2Q03, 3Q03, 4Q03) and segment (Enterprise Scalable, Basic +, Basic, Value +, Value, Ultra-Value). Legend: SH DP = AMD Opteron processor SledgeHammer DP; SH MP = AMD Opteron processor SledgeHammer MP; BAR = AMD Athlon MP processor Barton (266MHz FSB); THR = AMD Athlon MP processor Thoroughbred (266MHz FSB). Entries: SH MP 1.4 to 2.6; SH DP 1.4 to 2.6, including SH DP 1.4/2600, 1.6/3000, 2.4/4200 and 2.6/4500; THR 1.67/2000+, 1.8/2200+, 2.0/2400+ and 2.13/2600+; BAR 2.2/2800+.]
  • Slide 117
  • Summary
  • Slide 118
  • AMD Opteron Processor. Optimized for high-performance operation. Chip infrastructure optimized for sub-micron process, impacting power distribution, clocking, and circuit design and layout. 20-25% better performance per clock than AMD Athlon XP: smart low-latency memory controller; branch prediction, cache and TLB improvements; advanced clock distribution methods; new operand/address sizes, rather than new instructions. Integrated DDR memory system controller: closes the gap between external memory access and CPU speed; lower latency than the current state of the art (AMD Athlon processor); greater bandwidth than the current state of the art (AMD Athlon system). Integrated coherent HyperTransport I/O supporting high-speed peripheral connections (>6.4GB/s throughput) and coherent HyperTransport technology to support a glueless MP interface.
  • Slide 119
  • Slide 120
  • Trademark Attribution. Copyright 2002 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow Logo, AMD Athlon, AMD Opteron, 3DNow! and combinations thereof are trademarks of Advanced Micro Devices, Inc. HyperTransport is a licensed trademark of the HyperTransport Consortium. MMX is a trademark of Intel Corporation. Other product names used in this presentation are for identification purposes only and may be trademarks of their respective companies.