AMD OPTERON ARCHITECTURE

Omar Aragon
Abdel Salam Sayyad

This presentation is missing the references used.

Upload: neona

Post on 25-Feb-2016

Page 2: AMD OPTERON ARCHITECTURE

Outline
• Features
• Block diagram
• Microarchitecture
• Pipeline
• Cache
• Memory controller
• HyperTransport
• InterCPU Connections

Page 3: AMD OPTERON ARCHITECTURE

Features
• 64-bit x86-based microprocessor
• On-chip double-data-rate (DDR) memory controller [low memory latency]
• Three HyperTransport links [connect to other devices without support chips]
• Out-of-order, superscalar processor
• Adds 64-bit (48-bit virtual and 40-bit physical) addressing and expands the number of registers
• Supports legacy 32-bit applications without modification or recompilation
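
As a quick check on the addressing figures above, the following Python sketch computes the size of the 48-bit virtual and 40-bit physical address spaces. The function and variable names are illustrative and do not come from the slides.

```python
def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


TIB = 1 << 40  # one tebibyte, in bytes

virtual_tib = address_space_bytes(48) // TIB   # 48-bit virtual addresses
physical_tib = address_space_bytes(40) // TIB  # 40-bit physical addresses

print(f"virtual address space:  {virtual_tib} TiB")
print(f"physical address space: {physical_tib} TiB")
```

In other words, 48 virtual address bits cover 256 TiB and 40 physical address bits cover 1 TiB.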

Page 4: AMD OPTERON ARCHITECTURE

Features
• Doubles the number of registers
  • Integer general-purpose registers (GPRs): 16
  • Streaming SIMD Extensions (SSE) registers: 16
  • Satisfies the register allocation needs of more than 80% of the functions appearing in a typical program
• Connected to memory through an integrated memory controller
• High-performance I/O subsystem via the HyperTransport bus

Page 5: AMD OPTERON ARCHITECTURE

Block diagram

Page 6: AMD OPTERON ARCHITECTURE

Microarchitecture
• Works with fixed-length micro-ops and dispatches them to two independent schedulers: one for integer operations, and one for floating-point and multimedia operations (MMX, 3DNow!, SSE, and SSE2)
• Load and store micro-ops go to the load/store unit
• Up to 11 micro-ops can be dispatched each cycle to the following execution resources:
  • Three integer execution units
  • Three address generation units
  • Three floating-point and multimedia units
  • Two loads/stores to the data cache

Page 7: AMD OPTERON ARCHITECTURE

Microarchitecture

Page 8: AMD OPTERON ARCHITECTURE

Pipeline
• Long enough for high frequency and short enough for good IPC (instructions per cycle)
• Fully integrated from instruction fetch through DRAM access
• The execute pipeline is typically
  • 12 stages for integer
  • 17 stages for floating-point
• Data cache access occurs in stage 11
  • On an L1 cache miss, the pipeline accesses the L2 cache and, in parallel, the request goes to the system request queue
• The pipeline in the DRAM controller runs at the same frequency as the core

Page 9: AMD OPTERON ARCHITECTURE

Pipeline

Page 10: AMD OPTERON ARCHITECTURE

Memory, Cache, and HyperTransport

Page 11: AMD OPTERON ARCHITECTURE

Cache
• Separate L1 instruction and data caches
  • Each is 64 Kbytes, 2-way set-associative, with a 64-byte cache line
• L2 cache (data and instructions)
  • Size: 1 Mbyte, 16-way set-associative
  • Uses a pseudo-least-recently-used (pseudo-LRU) replacement policy
• Independent L1 and L2 translation look-aside buffers (TLBs)
  • The L1 TLB is fully associative and stores thirty-two 4-Kbyte page translations and eight 2-Mbyte/4-Mbyte page translations
  • The L2 TLB is four-way set-associative with 512 4-Kbyte entries
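
The cache and TLB figures above imply a specific geometry. The Python sketch below derives the number of sets and the offset/index bit widths from size, associativity, and line size, along with the memory covered by the 4-Kbyte TLB entries. The helper name is illustrative, and the 64-byte L2 line size is an assumption, since the slide states the line size only for the L1 caches.

```python
KIB = 1024
MIB = 1024 * KIB


def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Derive set count and address bit widths for a set-associative cache.

    Assumes all parameters are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2(line size)
        "index_bits": sets.bit_length() - 1,         # log2(number of sets)
    }


# L1 instruction or data cache: 64 Kbytes, 2-way, 64-byte lines (from the slide).
l1 = cache_geometry(64 * KIB, ways=2, line_bytes=64)

# L2 cache: 1 Mbyte, 16-way (from the slide).
# Assumption: 64-byte lines, the same as L1; the slide does not say.
l2 = cache_geometry(1 * MIB, ways=16, line_bytes=64)

# Memory covered by the 4-Kbyte page entries of each TLB.
l1_tlb_reach = 32 * 4 * KIB    # thirty-two 4-Kbyte translations
l2_tlb_reach = 512 * 4 * KIB   # 512 4-Kbyte translations

print("L1:", l1)
print("L2:", l2)
print("L1 TLB reach (4-Kbyte pages):", l1_tlb_reach // KIB, "Kbytes")
print("L2 TLB reach (4-Kbyte pages):", l2_tlb_reach // MIB, "Mbytes")
```

Under these assumptions each L1 cache has 512 sets (9 index bits, 6 offset bits) and the L2 cache has 1024 sets.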

Page 12: AMD OPTERON ARCHITECTURE

Onboard Memory Control
• 128-bit memory bus
• Latency reduced and bandwidth doubled
• Multiprocessor systems: each processor has its own memory interface and its own memory
• Available memory scales with the number of processors
• DDR SDRAM only
• Up to 8 registered DDR DIMMs per processor
• Memory bandwidth of up to 5.3 Gbytes/s per processor
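
The 5.3 Gbytes/s figure can be reproduced from the bus width and the transfer rate. The sketch below is a rough calculation, assuming DDR333 (PC2700) memory, that is, a clock of about 166.67 MHz with two transfers per cycle; the slide gives the bus width and the resulting bandwidth but not the memory speed grade.

```python
def peak_bandwidth_gb_per_s(bus_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s) for a parallel memory bus."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * transfers_per_s / 1e9


# Assumption: DDR333 memory, about 166.67 MHz clock, two transfers per cycle.
DDR333_TRANSFERS_PER_S = 2 * 166.67e6

memory_bw = peak_bandwidth_gb_per_s(128, DDR333_TRANSFERS_PER_S)
print(f"peak memory bandwidth: {memory_bw:.1f} GB/s")
```

With 16 bytes moved per transfer, the peak rate rounds to the 5.3 Gbytes/s quoted on the slide.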

Page 13: AMD OPTERON ARCHITECTURE

HyperTransport
• Bidirectional, serial/parallel, scalable, high-bandwidth, low-latency bus
• Packet based
  • 32-bit words regardless of physical width
• Facilitates power management and low latencies

Page 14: AMD OPTERON ARCHITECTURE

HyperTransport in the Opteron
• 16 CAD HyperTransport (16 bits wide; CAD = Command, Address, Data)
  • Processor-to-processor and processor-to-chipset
  • Bandwidth of up to 6.4 GB/s (per HT port)
• 8-bit-wide HyperTransport for components such as normal I/O hubs
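
The per-port figure can be checked with a similar calculation. This sketch assumes a link rate of 1600 mega-transfers per second (an 800 MHz clock with double data rate) and counts both directions of the bidirectional link; neither the link clock nor the counting convention is stated on the slide, so treat both as assumptions.

```python
def ht_bandwidth_gb_per_s(width_bits: int, transfers_per_s: float,
                          directions: int = 2) -> float:
    """Aggregate HyperTransport link bandwidth in GB/s (10**9 bytes/s)."""
    return (width_bits / 8) * transfers_per_s * directions / 1e9


# Assumption: 800 MHz link clock, double data rate -> 1600 MT/s per direction.
HT_TRANSFERS_PER_S = 1.6e9

wide_link = ht_bandwidth_gb_per_s(16, HT_TRANSFERS_PER_S)   # 16-bit CAD link
narrow_link = ht_bandwidth_gb_per_s(8, HT_TRANSFERS_PER_S)  # 8-bit I/O link

print(f"16-bit link: {wide_link:.1f} GB/s")
print(f" 8-bit link: {narrow_link:.1f} GB/s")
```

Under these assumptions the 16-bit link matches the 6.4 GB/s per port quoted above, and an 8-bit link at the same rate would carry half of that.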

Page 15: AMD OPTERON ARCHITECTURE

InterCPU Connections
• Multiple CPUs are connected through a proprietary extension running on additional HyperTransport interfaces
• Supports a cache-coherent, non-uniform memory access (NUMA), multi-CPU memory access protocol
• Non-uniform memory access
  • Separate cache memory for each processor
  • Memory access time depends on memory location (i.e., local is faster than non-local)
• Cache coherence
  • Preserves the integrity of shared-resource data stored in local caches
  • Each CPU can access the main memory of another processor, transparently to the programmer
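
The NUMA idea above can be illustrated with a toy model. The Python sketch below assumes a hypothetical layout in which each processor owns one contiguous, equally sized slice of the physical address space; the class, its fields, and the layout are illustrative only and do not describe how an actual Opteron system maps addresses to nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaSystem:
    """Toy NUMA model: each node (processor) owns one contiguous memory slice."""
    nodes: int
    bytes_per_node: int

    def total_memory(self) -> int:
        # Available memory scales with the number of processors.
        return self.nodes * self.bytes_per_node

    def home_node(self, phys_addr: int) -> int:
        """Return the node whose memory controller owns this address."""
        if not 0 <= phys_addr < self.total_memory():
            raise ValueError("address outside installed memory")
        return phys_addr // self.bytes_per_node

    def is_local(self, cpu: int, phys_addr: int) -> bool:
        """True if `cpu` reaches the address through its own memory controller."""
        return self.home_node(phys_addr) == cpu


# Hypothetical system: 4 processors with 1 GiB of memory each.
system = NumaSystem(nodes=4, bytes_per_node=1 << 30)

addr = 3 * (1 << 30) + 4096            # an address in node 3's slice
print(system.home_node(addr))          # node that owns the address
print(system.is_local(3, addr))        # local access for CPU 3
print(system.is_local(0, addr))        # remote access for CPU 0
```

Every CPU can still read the address; the model only distinguishes the faster local case from a remote access that must cross an inter-CPU link.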