radeon™ hd 2900 - lunds tekniska högskola

14
Radeon™ HD 2900 Michael Doggett October 3, 2007

Upload: others

Post on 30-Dec-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon™ HD 2900

Michael Doggett

October 3, 2007

Page 2: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 2 October 2007

Overview

Starting PointRequirementsTop levelPipeline Blocks from ‘top to bottom’– Command Processor– Shader Setup Engine– Threaded Dispatcher– Shader Core– Texture Filtering– Render Backend

Conclusion

Page 3: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 3 October 2007

Starting point

Combine the best of existing technology– R5xx series

Heavily threaded shader cores

– Hides latency of memory fetch

Vec4+1 Vertex and Vec3+1 Pixel shaders

Ringbus memory subsystem

– XBOX 360 GPUUnified shader architecture

– Vertex and Pixel

– Vec4+1

– Stream Out

Unified L1 texture cache

Introduced Tessellator

Page 4: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 4 October 2007

Requirements

DirectX10 compatible

Support new driver model– Vista driver model

Scalable family– “Number” of shader cores, texture units, render back-ends.

– Shader scalable in number of pipes, SIMDs.

– Target specific cost, feature set and performance levels for each part

Page 5: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 5 October 2007

Page 6: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 6 October 2007

Radeon HD 2000 Series

0128256L2 texture cache (KB)

64128512Memory Pins

180390720Transistors (Millions)

82153420Area (mm2)

656580Technology (nm)

4416Render Backends

4816Texture Units

4816Shader Units

234SIMDs

40120320Stream Processors

240026002900Radeon

Page 7: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 7 October 2007

Where next ?

Move fixed functions blocks to shader– Improve programmability, reduce area, improve reuse,

maintain/target performance

Enhancements for GPGPU– Improved precision and compliance

– New APIs, new functions

New technologies such as 65, 55, 45, 32…

Graphics and gaming keeps on evolving– DX-next is already being discussed

– We are well into next generation and next-next generations …

Page 8: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 8 October 2007

Hardware-Software Interface

Page 9: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 9 October 2007

Radeon HD 2900

Unified shader

– Vertex, Geometry and Pixel

– Multiple SIMD

– VLIW 5-way scalar

Shader cached memory read/write

Geometry shader on/off chip storage

512 bit stacked I/O Memory Interface

Full DX10 functionality

Page 10: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 10 October 2007

Questions and Demo

Thanks to Eric Demers, Mike Mantor and Norm Rubin

Page 11: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 11 October 2007

Page 12: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 12 October 2007

300

5300

10300

15300

20300

25300

30300

35300

2000 2001 2002 2003 2004 2005 2006 2007

GPU Performance

Moore's Law (doubling every 24months)

Performance Improvements

7500 8500 9700 9800 X800 X1800 X1950 2900

Page 13: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 13 October 2007

GPU vs. Size/Perf./Features

Page 14: Radeon™ HD 2900 - Lunds tekniska högskola

Radeon HD 2900 14 October 2007

Abstract :

The ATI Radeon HD 2900 developed by AMD is a Graphics Processing Unit (GPU) capable of massively parallel computation for high performance 3D graphics and general purpose algorithms. The unified shader architecture consists of a combination of MIMD and SIMD architectures of 5 way scalar arithmetic units running in parallel. The shader uses multi-threading to hide latency of memory access so that compute units are kept busy. The threads consist of vertex, geometry and pixel threads that represent different programmable stages of a traditional 3D graphics pipeline mapped onto a single scheduled shader unit. Varied distributed and unified caches are used for data, instructions, read only texture reads and vertex data. A ring based memory subsystem allows multiple clients to access multiple memory channels.