eecs 150 components & design techniques for digital systems...

64
UC Regents Spring 2009 © UCB EECS 150 L28: Graphics Processors 2009-4-30 John Lazzaro (www.cs.berkeley.edu/~lazzaro) EECS 150 Components & Design Techniques For Digital Systems Lecture 28: Graphics Processors www-inst.eecs.berkeley.edu/~cs150/ TAs: CS 194-6 alums - Chris, Ilia, and Chen Play 1

Upload: others

Post on 03-Feb-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

2009-4-30John Lazzaro

(www.cs.berkeley.edu/~lazzaro)

EECS 150 Components & Design Techniques

For Digital Systems

Lecture 28: Graphics Processors

www-inst.eecs.berkeley.edu/~cs150/

TAs: CS 194-6 alums - Chris, Ilia, and Chen

Play

1

Page 2: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Today: Graphics Processors

Computer Graphics. A brief introduction to “the pipeline”.

Stream Processing. Casting the graphics pipeline into hardware.

Unified Pipelines. GeForce 8800,from Nvidia, introduced in 2006.

Larrabee. Intel multi-core graphics architecture, SIGGRAPH 2008.

2

Page 3: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Personal computer graphics architecture

Case Study: Mac Mini (PowerPC edition)

3

Page 4: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Personal computer graphics architecture

4

Page 5: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Figure 2-1 Block diagram

167 MHzMaxBus

12 MbpsUSB

PCI bus

BootROM

USB 2.0 port (480 Mbps)

USB 2.0 port (480 Mbps)

PCI USB 2.0controller

DDR SDRAMDIMM slot

32 MBDDR RAM

DVI/VGA/composite/S-videooutput port

Ethernet port10/100 Mbps

FireWire 400 port

AGP 4Xbus

167 MHzMemory

bus

PMUpower controller

Powerbutton Fan

Opticaldrive

UltraATA/100

bus

Device 0

Device 1

Headphone/audio line-out jack

Hard diskdrive

Radeon9200

graphics IC

Audiocodec

EthernetPHY

FireWirePHY

PowerPC G4microprocessor

(L2 cache: 512K 1:1)

AirPort Extreme

I2S

I2S

I2C

BluetoothModem port

Modem module

Data pumpand DAA

Built-inspeaker

Intrepidmemorycontrollerand I/Odevice

controller

Main ICs and Buses

The architecture of Mac mini is designed around the PowerPC G4 microprocessor and the Intrepidmemory and I/O device controller. The Intrepid occupies the center of the block diagram.

The MaxBus connects the PowerPC G4 microprocessor to the Intrepid ASIC. The MaxBus has 64 datalines, 32 address lines, and a bus clock speed of 167 MHz. The Intrepid ASIC has other buses thatconnect with the boot ROM, the hard disk drive, and the optical drive, the power controller IC, thesound IC, the internal modem module, and the optional wireless LAN module.

The Intrepid I/O controller has a 32-bit PCI bus with a bus clock speed of 33 MHz.

Each of the components listed here is described in one of the following sections.

16 Block Diagram and Buses2005-04-05 | © 2005 Apple Computer, Inc. All Rights Reserved.

C H A P T E R 2

Architecture

Processor bus.How the CPUtalks to everything else.

CPU: PowerPC G4 (Freescale)

Bus controller. Low-cost Mac Mini only has 1. Most PCs have two: fast North Bridge, slow South Bridge.

Mac Mini G4: System block diagram

5

Page 6: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Figure 2-1 Block diagram

167 MHzMaxBus

12 MbpsUSB

PCI bus

BootROM

USB 2.0 port (480 Mbps)

USB 2.0 port (480 Mbps)

PCI USB 2.0controller

DDR SDRAMDIMM slot

32 MBDDR RAM

DVI/VGA/composite/S-videooutput port

Ethernet port10/100 Mbps

FireWire 400 port

AGP 4Xbus

167 MHzMemory

bus

PMUpower controller

Powerbutton Fan

Opticaldrive

UltraATA/100

bus

Device 0

Device 1

Headphone/audio line-out jack

Hard diskdrive

Radeon9200

graphics IC

Audiocodec

EthernetPHY

FireWirePHY

PowerPC G4microprocessor

(L2 cache: 512K 1:1)

AirPort Extreme

I2S

I2S

I2C

BluetoothModem port

Modem module

Data pumpand DAA

Built-inspeaker

Intrepidmemorycontrollerand I/Odevice

controller

Main ICs and Buses

The architecture of Mac mini is designed around the PowerPC G4 microprocessor and the Intrepidmemory and I/O device controller. The Intrepid occupies the center of the block diagram.

The MaxBus connects the PowerPC G4 microprocessor to the Intrepid ASIC. The MaxBus has 64 datalines, 32 address lines, and a bus clock speed of 167 MHz. The Intrepid ASIC has other buses thatconnect with the boot ROM, the hard disk drive, and the optical drive, the power controller IC, thesound IC, the internal modem module, and the optional wireless LAN module.

The Intrepid I/O controller has a 32-bit PCI bus with a bus clock speed of 33 MHz.

Each of the components listed here is described in one of the following sections.

16 Block Diagram and Buses2005-04-05 | © 2005 Apple Computer, Inc. All Rights Reserved.

C H A P T E R 2

Architecture

The bus controller talks to everything elseAGP 4X bus.Graphics chip.

PCI bus: Boot ROM, USB 2.

ATA/100 bus.For hard disk,DVD/CD ROM.

PCI, ATA, AGP devices can be bus master, for Direct Memory Access (DMA). Disk can write RAM directly.

6

Page 7: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Figure 2-1 Block diagram

167 MHzMaxBus

12 MbpsUSB

PCI bus

BootROM

USB 2.0 port (480 Mbps)

USB 2.0 port (480 Mbps)

PCI USB 2.0controller

DDR SDRAMDIMM slot

32 MBDDR RAM

DVI/VGA/composite/S-videooutput port

Ethernet port10/100 Mbps

FireWire 400 port

AGP 4Xbus

167 MHzMemory

bus

PMUpower controller

Powerbutton Fan

Opticaldrive

UltraATA/100

bus

Device 0

Device 1

Headphone/audio line-out jack

Hard diskdrive

Radeon9200

graphics IC

Audiocodec

EthernetPHY

FireWirePHY

PowerPC G4microprocessor

(L2 cache: 512K 1:1)

AirPort Extreme

I2S

I2S

I2C

BluetoothModem port

Modem module

Data pumpand DAA

Built-inspeaker

Intrepidmemorycontrollerand I/Odevice

controller

Main ICs and Buses

The architecture of Mac mini is designed around the PowerPC G4 microprocessor and the Intrepidmemory and I/O device controller. The Intrepid occupies the center of the block diagram.

The MaxBus connects the PowerPC G4 microprocessor to the Intrepid ASIC. The MaxBus has 64 datalines, 32 address lines, and a bus clock speed of 167 MHz. The Intrepid ASIC has other buses thatconnect with the boot ROM, the hard disk drive, and the optical drive, the power controller IC, thesound IC, the internal modem module, and the optional wireless LAN module.

The Intrepid I/O controller has a 32-bit PCI bus with a bus clock speed of 33 MHz.

Each of the components listed here is described in one of the following sections.

16 Block Diagram and Buses2005-04-05 | © 2005 Apple Computer, Inc. All Rights Reserved.

C H A P T E R 2

Architecture

ATI Radeon 9200: Graphics Processing Unit (GPU).

Mac Mini: Graphics sub-system

To Display

AGP 4X: Hi-Speed Graphics Bus

Dedicated Graphics RAM

Average selling price (ASP) for GPUs: $307

Page 8: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

8

Page 9: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

9

Page 10: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

Parts + manufacturing cost: $283.37

Parts cost in volume: $274.69

GPU cost a significant part of total “Bill of Materials”

Source: iSuppli corporation

10

Page 11: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

About 12 MB/frame (24-bit pixels)24 frames/sec: 300 MB/second

1600

2560

11

Page 12: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Figure 2-1 Block diagram

167 MHzMaxBus

12 MbpsUSB

PCI bus

BootROM

USB 2.0 port (480 Mbps)

USB 2.0 port (480 Mbps)

PCI USB 2.0controller

DDR SDRAMDIMM slot

32 MBDDR RAM

DVI/VGA/composite/S-videooutput port

Ethernet port10/100 Mbps

FireWire 400 port

AGP 4Xbus

167 MHzMemory

bus

PMUpower controller

Powerbutton Fan

Opticaldrive

UltraATA/100

bus

Device 0

Device 1

Headphone/audio line-out jack

Hard diskdrive

Radeon9200

graphics IC

Audiocodec

EthernetPHY

FireWirePHY

PowerPC G4microprocessor

(L2 cache: 512K 1:1)

AirPort Extreme

I2S

I2S

I2C

BluetoothModem port

Modem module

Data pumpand DAA

Built-inspeaker

Intrepidmemorycontrollerand I/Odevice

controller

Main ICs and Buses

The architecture of Mac mini is designed around the PowerPC G4 microprocessor and the Intrepidmemory and I/O device controller. The Intrepid occupies the center of the block diagram.

The MaxBus connects the PowerPC G4 microprocessor to the Intrepid ASIC. The MaxBus has 64 datalines, 32 address lines, and a bus clock speed of 167 MHz. The Intrepid ASIC has other buses thatconnect with the boot ROM, the hard disk drive, and the optical drive, the power controller IC, thesound IC, the internal modem module, and the optional wireless LAN module.

The Intrepid I/O controller has a 32-bit PCI bus with a bus clock speed of 33 MHz.

Each of the components listed here is described in one of the following sections.

16 Block Diagram and Buses2005-04-05 | © 2005 Apple Computer, Inc. All Rights Reserved.

C H A P T E R 2

Architecture

Figure 2-1 Block diagram

167 MHzMaxBus

12 MbpsUSB

PCI bus

BootROM

USB 2.0 port (480 Mbps)

USB 2.0 port (480 Mbps)

PCI USB 2.0controller

DDR SDRAMDIMM slot

32 MBDDR RAM

DVI/VGA/composite/S-videooutput port

Ethernet port10/100 Mbps

FireWire 400 port

AGP 4Xbus

167 MHzMemory

bus

PMUpower controller

Powerbutton Fan

Opticaldrive

UltraATA/100

bus

Device 0

Device 1

Headphone/audio line-out jack

Hard diskdrive

Radeon9200

graphics IC

Audiocodec

EthernetPHY

FireWirePHY

PowerPC G4microprocessor

(L2 cache: 512K 1:1)

AirPort Extreme

I2S

I2S

I2C

BluetoothModem port

Modem module

Data pumpand DAA

Built-inspeaker

Intrepidmemorycontrollerand I/Odevice

controller

Main ICs and Buses

The architecture of Mac mini is designed around the PowerPC G4 microprocessor and the Intrepidmemory and I/O device controller. The Intrepid occupies the center of the block diagram.

The MaxBus connects the PowerPC G4 microprocessor to the Intrepid ASIC. The MaxBus has 64 datalines, 32 address lines, and a bus clock speed of 167 MHz. The Intrepid ASIC has other buses thatconnect with the boot ROM, the hard disk drive, and the optical drive, the power controller IC, thesound IC, the internal modem module, and the optional wireless LAN module.

The Intrepid I/O controller has a 32-bit PCI bus with a bus clock speed of 33 MHz.

Each of the components listed here is described in one of the following sections.

16 Block Diagram and Buses2005-04-05 | © 2005 Apple Computer, Inc. All Rights Reserved.

C H A P T E R 2

Architecture

Anatomy of a “dumb” graphics card ...AGP 4X: 1.1 GB/s. Can handle 24 f/s(300 MB/s) for a 2560x1600 display.

DVI Formatter D/A

Control Logic

12 MB Frame Buffer

12 MB Frame Buffer

Double Buffering:

CPU writes “next frame” in one buffer.

Control logic sends “this frame” out of other buffer to display.

Problem: CPU has to compute a new pixel every 10 ns. 10 clock cycles for a 1 GHz CPU clock.

12

Page 13: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Graphics Acceleration

Q. In a multi-core world, why should we use a special processor for graphics? A. Programmers generally use a certain coding style for graphics. We can design a processor to fit the style.

Q. What kind of graphics are we accelerating?A. In 2009, interactive entertainment (3-D games). In the 1990s, 2-D acceleration (fast windowing systems, games like Pac-Man).

Next: An intro to 3-D graphics.

13

Page 14: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

The Triangle ...

Simplest closed shape that may be defined by straight edges.

With enough triangles, you can make anything.

14

Page 15: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model includes faces we can’t see in this view.

A sphere whose faces are made up of triangles. With enough triangles, the curvature of the sphere can be made arbitrarily smooth.

15

Page 16: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

A teapot (famous object in computer graphics history). A “wire-frame” of triangles can capture the 3-D shape of complex, man-made objects.

16

Page 17: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Triangle defined by 3 verticesBy transforming (v’ = f(v)) all vertices in a 3-D object (like the teapot), you can move it in the 3-D world, change it’s size, rotate it, etc.

vertex vo = (xo, yo, zo)vertex v1 = (x1, y1, z1)

vertex v2 = (x2, y2, z2)

If a teapot has 10,000 triangles, need to transform 30,000 vertices to move it in a 3-D scene ... per frame!

17

Page 18: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Vertex can have color, lighting info ...If vertices colors are different, this means that a smooth gradient of color washes across triangle.

vertex vo = (ro, go, bo)vertex v1 = (r1, g1, b1)

vertex v2 = (r2, g2, b2)

More realistic graphics models include light sources in the scene. Per-vertex information can carry information about how light hits the vertex.

18

Page 19: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

We see a 2-D window into the 3-D world

Let’s follow

one 3-D

triangle.

19

Page 20: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

From 3-d triangles to screen pixelsFirst, project each 3-D triangle that might

“face” the “eye” onto the image plane.

Then, create “pixel fragments” on the boundary of the

image plane triangle

Then, create “pixel fragments” to fill in the triangle

(rasterization).

Why “pixel fragments”? A screen pixel color might depend on many triangles (example: a glass teapot).

20

Page 21: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Process pixel fragment to “shade” it.Algorithmic approach: Per-pixel computational model of metal and how light reflects off of it. Move teapot and what reflects off it changes.

21

Page 22: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Process each fragment to “shade” it.Artistic approach: Artist paints surface of teapot in Photoshop. We “map” this “texture” onto each pixel fragment during shading.Final step: Output Merge. Assemble pixel fragments to make final 2-d image pixels.

22

Page 23: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Real-world texture maps: Bike decals

23

Page 24: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Applying texture maps: Quality matters! !"#$%&'()*!

! ! ! !

!

+,-.*/0! ! )*!)*1231*4( !!

!

!

5"67#$()/(8$679:#(&$;&7#$(<"9&$#"=6(

!

"#$%!%&'('!)*+,!-(*'./!"+0*(.,'(1!2334!%#+5%!1#'!/$,$1%!+)!1'610*'!)$/1'*$(7!+(!1+8.9:%!;<-%=!"#'!*.,>!#'*'!$%!8$?$8'8!$(1+!1#*''!%'&1$+(%=!@'&1$+(!A!*'&'$?'%!7++8!)$/1'*$(7!80'!1+!$1%!%$,>/'!B3!8'7*''!>*+C'&1$+(=!D+5'?'*E!F'$(7!+(!.(!.(7/'E!%'&1$+(%!G!.(8!H!*'&'$?'!/$11/'!)$/1'*$(7E!*'%0/1$(7!$(!F/0**'8!1'610*'%!5$1#!/$11/'!8'1.$/=!

!

!

5"67#$(2/(>="?@&#@A"%(+$;&7#$(5"9&$#"=6(@=(&B$(C$<@#%$(DD**(C+'(

!

"#'!I0,'('6!J(7$('!8'/$?'*%!.!).*!,+*'!*+F0%1!.($%+1*+>$&!)$/1'*$(7!./7+*$1#,!1#.1!.&&+0(1%!)+*!.//!%0*).&'%E!*'7.*8/'%%!+)!1#'$*!+*$'(1.1$+(=!AF+?'!$%!1#'!%.,'!$,.7'!*'(8'*'8!+(!1#'!;')+*&'!KK33!;"L=!M+1'!#+5!%'&1$+(%!G!.(8!H!.*'!).*!F'11'*!8')$('8=!

!

!

! !"#$%&'()*!

! ! ! !

!

+,-.*/0! ! )*!)*1231*4( !!

!

!

5"67#$()/(8$679:#(&$;&7#$(<"9&$#"=6(

!

"#$%!%&'('!)*+,!-(*'./!"+0*(.,'(1!2334!%#+5%!1#'!/$,$1%!+)!1'610*'!)$/1'*$(7!+(!1+8.9:%!;<-%=!"#'!*.,>!#'*'!$%!8$?$8'8!$(1+!1#*''!%'&1$+(%=!@'&1$+(!A!*'&'$?'%!7++8!)$/1'*$(7!80'!1+!$1%!%$,>/'!B3!8'7*''!>*+C'&1$+(=!D+5'?'*E!F'$(7!+(!.(!.(7/'E!%'&1$+(%!G!.(8!H!*'&'$?'!/$11/'!)$/1'*$(7E!*'%0/1$(7!$(!F/0**'8!1'610*'%!5$1#!/$11/'!8'1.$/=!

!

!

5"67#$(2/(>="?@&#@A"%(+$;&7#$(5"9&$#"=6(@=(&B$(C$<@#%$(DD**(C+'(

!

"#'!I0,'('6!J(7$('!8'/$?'*%!.!).*!,+*'!*+F0%1!.($%+1*+>$&!)$/1'*$(7!./7+*$1#,!1#.1!.&&+0(1%!)+*!.//!%0*).&'%E!*'7.*8/'%%!+)!1#'$*!+*$'(1.1$+(=!AF+?'!$%!1#'!%.,'!$,.7'!*'(8'*'8!+(!1#'!;')+*&'!KK33!;"L=!M+1'!#+5!%'&1$+(%!G!.(8!H!.*'!).*!F'11'*!8')$('8=!

!

!

“Good” algorithm. B and C look blurry.

“Better” algorithm. B and C aredetailed.

24

Page 25: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Putting it All Together ...

Luxo, Jr: Short movie made by Pixar, shown at SIGGRAPH in 1986.

First Academy Award given to a computer graphics movie.

25

Page 26: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

26

Page 27: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Graphics Acceleration

Next: Back to architecture ...

27

Page 28: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

The graphics pipeline in hardware (2004)Figure 2-1 Block diagram

167 MHzMaxBus

12 MbpsUSB

PCI bus

BootROM

USB 2.0 port (480 Mbps)

USB 2.0 port (480 Mbps)

PCI USB 2.0controller

DDR SDRAMDIMM slot

32 MBDDR RAM

DVI/VGA/composite/S-videooutput port

Ethernet port10/100 Mbps

FireWire 400 port

AGP 4Xbus

167 MHzMemory

bus

PMUpower controller

Powerbutton Fan

Opticaldrive

UltraATA/100

bus

Device 0

Device 1

Headphone/audio line-out jack

Hard diskdrive

Radeon9200

graphics IC

Audiocodec

EthernetPHY

FireWirePHY

PowerPC G4microprocessor

(L2 cache: 512K 1:1)

AirPort Extreme

I2S

I2S

I2C

BluetoothModem port

Modem module

Data pumpand DAA

Built-inspeaker

Intrepidmemorycontrollerand I/Odevice

controller

Main ICs and Buses

The architecture of Mac mini is designed around the PowerPC G4 microprocessor and the Intrepidmemory and I/O device controller. The Intrepid occupies the center of the block diagram.

The MaxBus connects the PowerPC G4 microprocessor to the Intrepid ASIC. The MaxBus has 64 datalines, 32 address lines, and a bus clock speed of 167 MHz. The Intrepid ASIC has other buses thatconnect with the boot ROM, the hard disk drive, and the optical drive, the power controller IC, thesound IC, the internal modem module, and the optional wireless LAN module.

The Intrepid I/O controller has a 32-bit PCI bus with a bus clock speed of 33 MHz.

Each of the components listed here is described in one of the following sections.

16 Block Diagram and Buses2005-04-05 | © 2005 Apple Computer, Inc. All Rights Reserved.

C H A P T E R 2

Architecture

Output Merge

Figure 2-1 Block diagram

167 MHzMaxBus

12 MbpsUSB

PCI bus

BootROM

USB 2.0 port (480 Mbps)

USB 2.0 port (480 Mbps)

PCI USB 2.0controller

DDR SDRAMDIMM slot

32 MBDDR RAM

DVI/VGA/composite/S-videooutput port

Ethernet port10/100 Mbps

FireWire 400 port

AGP 4Xbus

167 MHzMemory

bus

PMUpower controller

Powerbutton Fan

Opticaldrive

UltraATA/100

bus

Device 0

Device 1

Headphone/audio line-out jack

Hard diskdrive

Radeon9200

graphics IC

Audiocodec

EthernetPHY

FireWirePHY

PowerPC G4microprocessor

(L2 cache: 512K 1:1)

AirPort Extreme

I2S

I2S

I2C

BluetoothModem port

Modem module

Data pumpand DAA

Built-inspeaker

Intrepidmemorycontrollerand I/Odevice

controller

Main ICs and Buses

The architecture of Mac mini is designed around the PowerPC G4 microprocessor and the Intrepidmemory and I/O device controller. The Intrepid occupies the center of the block diagram.

The MaxBus connects the PowerPC G4 microprocessor to the Intrepid ASIC. The MaxBus has 64 datalines, 32 address lines, and a bus clock speed of 167 MHz. The Intrepid ASIC has other buses thatconnect with the boot ROM, the hard disk drive, and the optical drive, the power controller IC, thesound IC, the internal modem module, and the optional wireless LAN module.

The Intrepid I/O controller has a 32-bit PCI bus with a bus clock speed of 33 MHz.

Each of the components listed here is described in one of the following sections.

16 Block Diagram and Buses2005-04-05 | © 2005 Apple Computer, Inc. All Rights Reserved.

C H A P T E R 2

Architecture

To display

Create pixels fragments

Algorithms are usually hardwired

Process each vertex

3-D vertex “stream” sent by CPU

Programmable CPU”Vertex Shader”

Process pixel fragments

Programmable CPU”Pixel Shader”

Programming Language/API? DirectX, OpenGL

28

Page 29: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Vertex Shader: A “stream processor”

Shader CPU

Input Registers (Read Only)

Vertex “stream” from CPUOnly one vertex at a time placed in input registers.

Constant Registers

(Read Only)

From CPU: changes slowly (per frame,

per object)

Output Registers (Write Only)

Vertex “stream” ready for 3-D to 2-D conversion

Shader creates one vertex out for each vertex in.Working

Registers (Read/Write)

Shader Program Memory

Short (ex: 128 instr)straight-line code. Same code runs on every vertex.

29

Page 30: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Optimized instructions and data formats

Input Registers

From CPU

Output Registers

Shader CPU Shader Program Memory

128-bit registers, holding four 32-bit floats.

Typical use: (x,y,z,w) representation of a point in 3-Dspace.

x y z w

x y z w

Typical instruction:

rsq dest src

dest.{x,y,z,w} = 1.0/sqrt(abs(src.w)).If src.w=0, dest ∞.

The 1/sqrt() function is often used in graphics.

To 3-D/2-D 30

Page 31: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Easy to parallelize: Vertices independent

Input Registers

From CPU

Output Registers

Shader CPU

x y z w

x y z w

Input Registers

Output Registers

Shader CPU

x y z w

x y z w

To 3-D/ 2-D

Caveat: Care might be needed when merging streams.

Why?3-D to 2-D may expect triangle vertices in order in the stream.

Shader CPUs easy to multithread.

31

Page 32: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Pixel shader specializations ...

Process each vertex

Create pixels fragments

Output Merge

Figure 2-1 Block diagram

167 MHzMaxBus

12 MbpsUSB

PCI bus

BootROM

USB 2.0 port (480 Mbps)

USB 2.0 port (480 Mbps)

PCI USB 2.0controller

DDR SDRAMDIMM slot

32 MBDDR RAM

DVI/VGA/composite/S-videooutput port

Ethernet port10/100 Mbps

FireWire 400 port

AGP 4Xbus

167 MHzMemory

bus

PMUpower controller

Powerbutton Fan

Opticaldrive

UltraATA/100

bus

Device 0

Device 1

Headphone/audio line-out jack

Hard diskdrive

Radeon9200

graphics IC

Audiocodec

EthernetPHY

FireWirePHY

PowerPC G4microprocessor

(L2 cache: 512K 1:1)

AirPort Extreme

I2S

I2S

I2C

BluetoothModem port

Modem module

Data pumpand DAA

Built-inspeaker

Intrepidmemorycontrollerand I/Odevice

controller

Main ICs and Buses

The architecture of Mac mini is designed around the PowerPC G4 microprocessor and the Intrepidmemory and I/O device controller. The Intrepid occupies the center of the block diagram.

The MaxBus connects the PowerPC G4 microprocessor to the Intrepid ASIC. The MaxBus has 64 datalines, 32 address lines, and a bus clock speed of 167 MHz. The Intrepid ASIC has other buses thatconnect with the boot ROM, the hard disk drive, and the optical drive, the power controller IC, thesound IC, the internal modem module, and the optional wireless LAN module.

The Intrepid I/O controller has a 32-bit PCI bus with a bus clock speed of 33 MHz.

Each of the components listed here is described in one of the following sections.

16 Block Diagram and Buses2005-04-05 | © 2005 Apple Computer, Inc. All Rights Reserved.

C H A P T E R 2

Architecture

Figure 2-1 Block diagram

167 MHzMaxBus

12 MbpsUSB

PCI bus

BootROM

USB 2.0 port (480 Mbps)

USB 2.0 port (480 Mbps)

PCI USB 2.0controller

DDR SDRAMDIMM slot

32 MBDDR RAM

DVI/VGA/composite/S-videooutput port

Ethernet port10/100 Mbps

FireWire 400 port

AGP 4Xbus

167 MHzMemory

bus

PMUpower controller

Powerbutton Fan

Opticaldrive

UltraATA/100

bus

Device 0

Device 1

Headphone/audio line-out jack

Hard diskdrive

Radeon9200

graphics IC

Audiocodec

EthernetPHY

FireWirePHY

PowerPC G4microprocessor

(L2 cache: 512K 1:1)

AirPort Extreme

I2S

I2S

I2C

BluetoothModem port

Modem module

Data pumpand DAA

Built-inspeaker

Intrepidmemorycontrollerand I/Odevice

controller

Main ICs and Buses

The architecture of Mac mini is designed around the PowerPC G4 microprocessor and the Intrepidmemory and I/O device controller. The Intrepid occupies the center of the block diagram.

The MaxBus connects the PowerPC G4 microprocessor to the Intrepid ASIC. The MaxBus has 64 datalines, 32 address lines, and a bus clock speed of 167 MHz. The Intrepid ASIC has other buses thatconnect with the boot ROM, the hard disk drive, and the optical drive, the power controller IC, thesound IC, the internal modem module, and the optional wireless LAN module.

The Intrepid I/O controller has a 32-bit PCI bus with a bus clock speed of 33 MHz.

Each of the components listed here is described in one of the following sections.

16 Block Diagram and Buses2005-04-05 | © 2005 Apple Computer, Inc. All Rights Reserved.

C H A P T E R 2

Architecture

Pixel shader needs fast access to the map of Europe on teapot (via graphics card RAM).

Texture maps (look-up tables) play a key role.

Process pixel fragments

”Pixel Shader” CPU

32

Page 33: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Pixel Shader: Stream processor + Memory

Shader CPU

Input Registers (Read Only)

Pixel fragment stream from rasterizerOnly one fragment at a time placed in input registers.

Constant Registers

(Read Only)

From CPU: changes slowly (per frame,

per object)

Registers (Read/Write)

Register R0 is pixel fragment,

ready for output merge

Shader creates one fragment out for each fragment in.

Indices into texture maps.

TextureRegisters

Texture Engine

Memory System

Engine does interpolation.

33

Page 34: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Example Design: Nvidia GeForce 7900

Vertex Shaders: 8

Pixel Shaders:24

3-D to 2-D

Output Merge Units

Texture Cache

278 Million Transistors, 650 MHz clock, 90 nm process

Figure 2-1 Block diagram

167 MHzMaxBus

12 MbpsUSB

PCI bus

BootROM

USB 2.0 port (480 Mbps)

USB 2.0 port (480 Mbps)

PCI USB 2.0controller

DDR SDRAMDIMM slot

32 MBDDR RAM

DVI/VGA/composite/S-videooutput port

Ethernet port10/100 Mbps

FireWire 400 port

AGP 4Xbus

167 MHzMemory

bus

PMUpower controller

Powerbutton Fan

Opticaldrive

UltraATA/100

bus

Device 0

Device 1

Headphone/audio line-out jack

Hard diskdrive

Radeon9200

graphics IC

Audiocodec

EthernetPHY

FireWirePHY

PowerPC G4microprocessor

(L2 cache: 512K 1:1)

AirPort Extreme

I2S

I2S

I2C

BluetoothModem port

Modem module

Data pumpand DAA

Built-inspeaker

Intrepidmemorycontrollerand I/Odevice

controller

Main ICs and Buses

The architecture of Mac mini is designed around the PowerPC G4 microprocessor and the Intrepidmemory and I/O device controller. The Intrepid occupies the center of the block diagram.

The MaxBus connects the PowerPC G4 microprocessor to the Intrepid ASIC. The MaxBus has 64 datalines, 32 address lines, and a bus clock speed of 167 MHz. The Intrepid ASIC has other buses thatconnect with the boot ROM, the hard disk drive, and the optical drive, the power controller IC, thesound IC, the internal modem module, and the optional wireless LAN module.

The Intrepid I/O controller has a 32-bit PCI bus with a bus clock speed of 33 MHz.

Each of the components listed here is described in one of the following sections.

16 Block Diagram and Buses2005-04-05 | © 2005 Apple Computer, Inc. All Rights Reserved.

C H A P T E R 2

Architecture

34

Page 35: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Break Time ...

Next: Unified architectures

Play

35

Page 36: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Unified Architectures

Basic idea: Replace specialized logic (vertex shader, pixel shader, hardwired algorithms) with many copies of one unified CPU design.

Consequence: You no longer “see” the graphics pipeline when you look at the architecture block diagram.

Designed for: DirectX 10 (Microsoft Vista), and new non-graphics markets for GPUs.

36

Page 37: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

!

!"#$%&"'%!()*(+,'-$.!./&(!"+(/0!%#0-(./1+'+/1+/2+()+!3++/(4+#!.52+6( 0/1( )+!3++/( '.,+-6( 7#0&8+/!69( (:%-!.'-+( ./6!0/2+6( $7( 4+#!+,(0/1( '.,+-( 6"01+#6( 0#+( %6+1( !$( '#$2+66( ./1+'+/1+/!( 4+#!.2+6( 0/1('.,+-( 7#0&8+/!6( ./('0#0--+-9( (;0#130#+( .8'-+8+/!0!.$/6( !*'.20--*(./2-%1+( 0( -0#&+#( /%8)+#($7( '.,+-( 6"01+#6( !"0/(4+#!+,( 6"01+#6( #+57-+2!./&(!"+(".&"+#(#0!.$($7('.,+-6(!$(4+#!.2+6(./(0(!*'.20-(#+/1+#./&(3$#<-$01(=:$/!#*8(0/1(:$#+!$/(>??@A9((B".6(2"0#02!+#.6!.2(0-6$(./7-%+/2+6(!"+(2$6!($7('.,+-(6"01+#6(#+-0!.4+(!$(4+#!+,(6"01+#6(6./2+('.,+-(6"01+#6(0#+(8$#+("+04.-*(#+'-.20!+19(

B"+('#$&#0880)-+('.'+-./+( .6(1.#+2!+1(%6./&(0( -$35-+4+-(0)56!#02!.$/( -0*+#( 6%2"( 06( C'+/DE( $#( F.#+2!GF9( ( B"+( 0)6!#02!.$/(-0*+#(6+#4+6(!$(".1+(!"+(1.77+#+/2+6()+!3++/(40#*./&(.8'-+8+/!05!.$/6($7(!"+('.'+-./+(0/1('#$4.1+(0(8$#+(2$/4+/.+/!('#$&#088./&(0)6!#02!.$/9((H.,+1('-0!7$#86I(6%2"(06(2$/6$-+6I(1.77+#(7#$8(JK6(./(!"0!(!"+#+(.6($/-*($/+("0#130#+(.8'-+8+/!0!.$/I(6$($7!+/(-$35-+4+-(1+!0.-6($7(!"+("0#130#+(0#+(+,'$6+1(!"#$%&"(!"+(0)6!#02!.$/(-0*+#9((

L+( #+7+#( !$( !"+( 0)6!#02!.$/( -0*+#( 06( 0( "#$%&'(( 0/1( .!( .6( 2$/5!#$--+1(!"#$%&"(.!6(MJN9(B"+(#%/!.8+('#$4.1+6(1+4.2+(./1+'+/1+/!(#+6$%#2+(80/0&+8+/!( O0--$20!.$/I( -.7+!.8+I( ./.!.0-.P0!.$/I( 4.#!%0-5.P0!.$/I(+!2Q(7$#(!+,!%#+(80'6I(4+#!+,()%77+#6I(0/1($!"+#(6!0!+(0/1(.!(2$88%/.20!+6( 3.!"( !"+( "0#130#+( 022+-+#0!$#( !"#$%&"( 1+4.2+51+'+/1+/!( 1#.4+#( 6$7!30#+9( ( B"+( !#0/6.!.$/( !$( 0( '#$&#0880)-+('.'+-./+( "06( 011+1( !"+( !06<( $7( 0)6!#02!./&( 0/1(80/0&./&( 6"01+#('#$&#086(!$(!"+(#%/!.8+9(

B"+( -.8.!+1( ./6!#%2!.$/( 6!$#+( $7( +0#-*( '#$&#0880)-+( '#$2+656$#6( 801+( !"+( 2"$.2+( $7( '#$&#088./&( ./( 0/( 066+8)-*5-.<+( -0/5&%0&+(=D#0*(>??GA()$!"('#02!.20-(0/1(./(80/*(206+6(/+2+660#*(!$(80,.8.P+( 2$/!#$-( $7( !"+( -.8.!+1( #+6$%#2+69( ( ;$3+4+#I( 8$1+6!(./2#+06+6( ./( 040.-0)-+( "0#130#+( #+6$%#2+6( 2#+0!+1( 0( /++1( 7$#( 0(".&"+#5-+4+-( '#$&#088./&( 0)6!#02!.$/( !$( 80,.8.P+( '#$&#088+#('#$1%2!.4.!*9(K5-.<+('#$&#088./&( -0/&%0&+6(3.!"( 6$8+(2%6!$8.5P0!.$/6( !$( 80!2"( !"+( %/1+#-*./&( #+/1+#./&( '.'+-./+( OR54+2!$#6I(./!#./6.26I(NSC(#+&.6!+#6Q(0/63+#+1(!".6(/++1(=J#$%17$$!(+!(0-9(>??TU(:.2#$6$7!( >??>U( :0#<( >??GU( V+66+/.2"( >??RU( :2K$$-( 0/1( F%(B$.!(>??RUA((M11.!.$/0--*I($!"+#(-0/&%0&+6("04+()++/(1+4+-$'+1(!$(+,'-$#+( !"+( %6+( $7( !"+( 6%)6!0/!.0-( 7-$0!./&5'$./!( '#$2+66./&( 0/1(8+8$#*( )0/13.1!"( $7(DJW6( 7$#( 0''-.20!.$/( 1$80./6( $!"+#( !"0/(

#+/1+#./&(=X%2<(+!(0-9(>??RU(:2K$#8.2<(+!(0-9(>??RAI()%!(3+(3.--(/$!(011#+66(!".6(-0!!+#(6%)Y+2!(7%#!"+#(./(!".6('0'+#9(

L".-+( !"+#+(0#+(6.8.-0#.!.+6( !$( .8'+#0!.4+(KJW('#$&#088./&(-0/&%0&+6(O/$!0)-*(KQI(!"+#+(0#+(6$8+(6.&/.7.20/!(1+'0#!%#+69((H$#(+,08'-+I(!"+(802"./+(0/1(2$8'.-0!.$/(8$1+-(.6(8$#+(4.#!%0-(8052"./+5-.<+I( 3.!"( !"+( 6"01+#( 066+8)-*( -0/&%0&+( 6+#4./&( 06( 0(8052"./+5./1+'+/1+/!( ./!+#8+1.0!+( -0/&%0&+( ONEQ( #0!"+#( !"0/( 0( 6'+52.7.2(802"./+( -0/&%0&+T9( B"$%&"( 0( ".&"5-+4+-( -0/&%0&+( -.<+(:.52#$6$7!Z6(;E[E(20/()+(2$8'.-+1( !$( NE($77-./+I( !"+( !#0/6-0!.$/( !$(!"+(!0#&+!("0#130#+($22%#6(Y%6!(./(!.8+(O\NBQ(0!(#%/5!.8+(3.!"(!"+(!#0/6-0!$#( .8'-+8+/!+1(06('0#!($7( !"+(1#.4+#( ./7#06!#%2!%#+(7$#( !"+(DJW9( (L+(/$!+( !"0!( !"+(C'+/DE(["01./&(E0/&%0&+( !0<+6(0(1.757+#+/!( 0''#$02"( 3.!"( !"+( +/!.#+( 2$8'.-0!.$/( '#$2+66( $22%#./&( 0!(#%/5!.8+9((

M/$!"+#( 6.&/.7.20/!( 1.77+#+/2+( .6( !"0!( 6"01./&( '#$&#086( 0#+(/$!( 6!0/10-$/+(0''-.20!.$/6I( 0/1(0#+( ./6!+01(+,+2%!./&( ./(2$/2+#!(3.!"(0('#$&#08(+,+2%!./&($/(!"+(KJW(!"0!($#2"+6!#0!+6(!"+(#+/1+#5./&('.'+-./+9( (B"+(KJW('#$&#08(0-6$( 6%''-.+6( '0#08+!+#6( !$( !"+(6"01./&('#$&#08(./(!"+(7$#8($7(!+,!%#+(80'6($#()*('$'%-0!./&($/52".'(#+&.6!+#6(20--+1(2$/6!0/!69(

L".-+( !".6( '0'+#( 1$+6( /$!( 1+62#.)+( 0( 6'+2.7.2( "0#130#+( +85)$1.8+/!($7( !"+(/+3('.'+-./+(0#2".!+2!%#+I( !"+('.'+-./+(1+6.&/( .6(6"0'+1(6.&/.7.20/!-*()*("0#130#+('#02!.20-.!.+6(0/1(306(1+6.&/+1(2$/2%##+/!-*( 3.!"( 8%-!.'-+( "0#130#+( .8'-+8+/!0!.$/69( :0/*( $7(!"+( 6!#%2!%#0-( %/1+#'.//./&6( 7#$8( 2%##+/!( "0#130#+( .8'-+8+/!05!.$/6( =MBN( >??@U( F$&&+!!( >??@U( :$/!#*8( 0/1( :$#+!$/( >??@A(2$/!./%+(!$()+()$!"(#+-+40/!(0/1(./7-%+/!.0-(./(!".6(1+6.&/9(

!"#$%&"' ''()(''*++(' '''*)+''*++*' ',)+''*++-. ''-)+''*++/'

T>]( >@^( !@T>(./6!#%2!.$/(6-$!6(

R_]` G>_^R` !@T>(

!^RV(

!a^( !>@^( !>@^(2$/6!0/!(#+&.65

!+#6( ]( G>( >>R(

T^,R?a^(

T>( T>( G>(!8'(#+&.6!+#6(

>( T>( G>(

R?a^(

T^( T^( T^( T^(./'%!(#+&.6!+#6(

R_>b ]_>b T?( G>(

#+/1+#(!0#&+!6( T( R( R( ](

608'-+#6( ]( T^( T^( T^(

( ( R(!+,!%#+6(

( ]( T^( T^(

T>](

>F(!+,(6.P+( ( ( >V,>V( ]V,]V(

./!+&+#($'6( ( ( ( !(

-$01($'( ( ( ( !(

608'-+($776+!6( ( ( ( !(

!( !( !(!#0/62+/1+/!0-(

$'6( ( !( !(

!(

(

1+#.40!.4+($'( ( ( !( !(

( 6!0!.2( 6!0!S1*/(7-$3(2$/!#$-(

( ( ( 6!0!S1*/(

1*/08.2(

!"#$%&'(&!"#$%&'()$%*'+%#,-&%'.)(/#&01)2'1-((#&34&&51/%.0+0.#,0)2'&%*%#1%$'02'67768'"#&$9#&%'02'677:;'

<,%=,-&%'*)#$'>'#&0,"(%,0.'

021,&-.,0)21;'?,%=,-&%'>'.)*)&'&%@01,%&1;'$#1"%$'*02%'1%/#&#,%1'A%&,%='1"#$%&'

B#C)A%D'+&)('/0=%*'1"#$%&'BC%*)9D'

)*&!+%&,-.%$-/%&

B"+(F.#+2!GF( T?( '.'+-./+( #+!0./6( !"+( 6!#%2!%#+( $7( !"+( !#01.!.$/0-("0#130#+5022+-+#0!+1( GF( '.'+-./+9( ( B3$( /+3( 6!0&+6( "04+( )++/(011+1(0/1($!"+#(6!0&+6("04+()++/(+.!"+#(6.8'-.7.+1($#(7%#!"+#(&+/5+#0-.P+19((B"+()06.2('.'+-./+(.6(.--%6!#0!+1(./(H.&%#+(T9(H$#(2$/6.65!+/2*(3+(1+62#.)+(+02"($7(!"+('.'+-./+(6!0&+6I(#0!"+#(!"0/(Y%6!(!"+(011.!.$/69(L+( %6+( !#01.!.$/0-( !+#86( 6%2"( 06( 4+#!+,I( !+,!%#+I( 0/1('.,+-( 7$#( 2$/!./%.!*( 3.!"( '#.$#( /$8+/2-0!%#+I( )%!( 02</$3-+1&+(!"0!( !".6( !+#8./$-$&*( #+7-+2!6( 0( 6'+2.7.2( %60&+( $7( 0(8$#+( &+/+#0-('#$2+66./&(20'0).-.!*9(

012%$' 344"567"&' 8039( &0!"+#6( TF( 4+#!+,( 10!0( 7#$8(%'( !$( ](./'%!(6!#+086(0!!02"+1(!$(4+#!+,()%77+#6(0/1(2$/4+#!6(10!0(.!+86(!$(0(20/$/.20-(7$#80!(O+9&9I(7-$0!G>Q9((c02"(6!#+08(6'+2.7.+6(0/(./1+5'+/1+/!( 4+#!+,( 6!#%2!%#+( 2$/!0././&( %'( !$( T^( 7.+-16( O20--+1( +-+58+/!6Q9( (M/(+-+8+/!( .6(0("$8$&+/$%6( !%'-+($7(T( !$(R(10!0( .!+86(O+9&9I( 7-$0!G>6Q9( (M(4+#!+,( .6( 066+8)-+1()*( #+01./&( 7#$8( !"+(2%#5#+/!-*(+/0)-+1(6!#+0869((d$#80--*(4+#!+,(10!0(.6(#+01(6+e%+/!.0--*(7#$8(+02"(4+#!+,()%77+#U("$3+4+#I( .7(0/(./1+,()%77+#( .6(6'+2.7.+1(!"+/( +02"( 6!#+08(%6+6( 0( 6"0#+1( ./1+,( !$( 2$8'%!+( !"+( $776+!( ./!$(+02"(4+#!+,()%77+#9( ( ( N/1+,./&(0--$36(011.!.$/0-('+#7$#80/2+($'5!.8.P0!.$/6( ./( !"0!( !"+(4+#!+,('#$2+66$#( 2$8'%!+6( 0( #+6%-!( !"0!( .6(2$8'-+!+-*( 1+!+#8./+1( )*( !"+( ./1+,( 40-%+I( !"+#+7$#+( #+2$8'%!05!.$/( $7( #+6%-!6( 7$#( !"+( 608+( ./1+,( 20/( )+( 04$.1+1( %6./&( 0( #+6%-!(202"+(./1+,+1()*(!"+(./1+,(40-%+9((

B"+(NM(0-6$(6%''$#!6(0(8+2"0/.68(!"0!(0--$36(!"+(NM(!$(+77+25!.4+-*(#+'-.20!+(0/($)Y+2!($( !.8+69(B".6(8+2"0/.68(.6(0/(011#+665./&(8$1+( #+7+##+1( !$( 06( &$)%*$+&$,( ./(3".2"( 0( #+'+0!( 2$%/!($( .6(066$2.0!+1(3.!"()-$2<($7(-( 4+#!.2+6( O2$##+6'$/1./&( !$( 0/($)Y+2!Q9((M!( !"+( 608+( !.8+I( !"+( '#.8.!.4+( 10!0( .6( f!0&&+1g( 3.!"( 0( 2%##+/!(./6!0/2+I('#.8.!.4+I(0/1(4+#!+,(.1(0/1(!"+6+(.16(20/()+(022+66+1(./(!"+( '#$&#0880)-+( 6!0&+6( !$( 2$8'%!+( 40-%+6( 6%2"( 06( !#0/67$#805!.$/6($#(80!+#.0-('0#08+!+#6()06+1($/(!"+6+(.169(

(((((((((((((((((((((((((((((((((((((((( (((((((((((((((((((((((((T(B".6(1$+6(2$/!#01.2!(!"+(/$!.$/(!"0!(!"+(066+8)-*5-+4+-(6"01+#('#$&#0858+#("06(0)6$-%!+(2$/!#$-9(

DirectX 10 (Vista): Towards Shader UnityEarlier APIs: Pixel and Vertex CPUs very different ...

DirectX 10: Many specs are identical for Pixel and Vertex CPUs

37

Page 38: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

DirectX 10 : New Pipeline Features ...!

!"#$"%& '()*"#& +!',! "#! $%#&! '%$$%()*! +#,-! &%! &./(#0%.$!1,.&"',#! 0.%$! %23,'&! #4/',! &%! ')"4! #4/',5! 67,!89! .,/-#! /! #"(:),!1,.&,;!/(-!4.%-+',#!/!#"(:),!1,.&,;!/#!%+&4+&5!!67,!89!/(-!%&7,.!4.%:./$$/2),!#&/:,#!#7/.,!/!'%$$%(!0,/&+.,!#,&!&7/&!"(')+-,#!/(!,;4/(-,-!#,&!%0!0)%/&"(:<4%"(&=!"(&,:,.=!'%(&.%)=!/(-!$,$%.*!.,/-!"(#&.+'&"%(#! /))%>"(:! /'',##! &%! +4! &%! ?@A! $,$%.*! 2+00,.#! B&,;<&+.,#C!/(-!?D!4/./$,&,.! B'%(#&/(&C!2+00,.#5!67"#!'%$$%(!'%.,! "#!-,#'."2,-!"(!$%.,!-,&/")!"(!9,'&"%(!E5!

-"./"$#0&'()*"#& +-',! &/F,#! &7,!1,.&"',#!%0!/!#"(:),!4."$"<&"1,! B4%"(&=! )"(,! #,:$,(&=! %.! &."/(:),C! /#! "(4+&! /(-! :,(,./&,#! &7,!1,.&"',#!%0!G,.%!%.!$%.,!4."$"&"1,#5! !67,!"(4+&!/(-!%+&4+&!4."$"<&"1,! &*4,#!(,,-!(%&!$/&'7=!2+&! &7,*!/.,! 0";,-!0%.! &7,!#7/-,.!4.%<:./$5!!!H!I9!4.%:./$!'/(!/$4)"0*!&7,!(+$2,.!%0!"(4+&!4."$"&"1,#!2*!,$"&&"(:!/--"&"%(/)!4."$"&"1,#!#+23,'&!&%!/!4,.<"(1%'/&"%(!)"$"&!%0!?J@E!K@<2"&!1/)+,#!%0!1,.&,;!-/&/5!!6."/(:),#!/(-!)"(,#!/.,!%+&<4+&! /#! '%((,'&,-! #&."4#! %0! 1,.&"',#5! H! I9! 4.%:./$! '/(! %+&4+&!$%.,! &7/(! %(,! #&."4! "(! /! #"(:),! "(1%'/&"%(! %.! "&! '/(! ,00,'&"1,)*!-,),&,!/(! "(4+&!4."$"&"1,!2*!(%&!4.%-+'"(:!/(!%+&4+&5!H!I9!4.%<:./$! '/(! /)#%! #"$4)*! /00";! /--"&"%(/)! /&&."2+&,#! &%! /! 4."$"&"1,!>"&7%+&! :,(,./&"(:! /--"&"%(/)! :,%$,&.*=! 0%.! ,;/$4),=! '%$4+&"(:!/--"&"%(/)!+("0%.$<1/)+,-!/&&."2+&,#!0%.!,/'7!4."$"&"1,5!!9"(',!/))!%0!&7,!4."$"&"1,!1,.&"',#!/.,!/1/")/2),=!:,%$,&."'!/&&."2+&,#!#+'7!/#!/!&."/(:),L#!4)/(,!,M+/&"%(!'/(!2,!.,/-")*!'%$4+&,-5!!

N(!/--"&"%(!&%!&7,!&./-"&"%(/)!"(4+&!4."$"&"1,#=!&."/(:),!/(-!)"(,!4."$"&"1,#!$/*!/)#%!2,!4.%',##,-!>"&7!&7,".!/-3/',(&!1,.&"',#5! !H!&."/(:),!'%$4."#,#!K!1,.&"',#!4)+#!K!/-3/',(&!1,.&"',#!>7"),!/!)"(,!7/#!@!1,.&"',#!>"&7!@!/-3/',(&!1,.&"',#!/#!#7%>(!"(!O":+.,!@5!!H-<3/',(&!1,.&"',#!/.,!"(')+-,-!/#!4/.&!%0!&7,!1,.&,;!2+00,.!0%.$/&#!0%.!&."/(:),! /(-! )"(,! 4."$"&"1,#! /(-! /.,! ,;&./'&,-! 2*! &7,! NH!>7,(! /!4."$"&"1,!&%4%)%:*!>"&7!/-3/',('*!"#!#4,'"0",-!B.,(-,.,-C5!

'$#")/&12$32$& +'1,! '%4",#!/!#+2#,&!%0! &7,!1,.&,;! "(0%.$/<&"%(!%+&4+&!2*! &7,!I9! &%!+4! &%!E!?P!%+&4+&!2+00,.#! "(!#,M+,(&"/)!

%.-,.5! ! N-,/))*! &7,!9Q!#7%+)-!7/1,!#*$$,&."'!%+&4+&!'/4/2")"&",#!>"&7!&7,!B(%(<"(-,;,-C!"(4+&!'/4/2")"&",#!%0!&7,!NH!BA!#&.,/$#!;!?D!,),$,(&#C=! 2+&! &7,! 7/.->/.,! '%#&#! >,.,! (%&! 3+#&"0",-5! 67,! 9Q! "#!)"$"&,-! &%! ,"&7,.! ?!$+)&"<,),$,(&! %+&4+&! #&.,/$!%0! +4! &%! ?D! ,),<$,(&#!%.!+4!&%!E!#"(:),<,),$,(&!%+&4+&!#&.,/$#5!R7"),!&7,!NH!'/(!#+44%.&! .,/-"(:! 0.%$!A<! /(-! ?D<2"&! -/&/! &*4,#! /(-! '%(1,.&"(:! &%!0)%/&K@=! &7,!9Q!'/(!%()*!>."&,! ./>!K@<2"&!-/&/! &*4,#5! !S%>,1,.=!-/&/!'%(1,.#"%(!/(-!4/'F"(:!'/(!2,!,/#")*! "$4),$,(&,-! "(!/!I9!4.%:./$!.,-+'"(:!&7,!(,,-!0%.!0";,-<0+('&"%(!#+44%.&5!

'"$423& )5*& 6)7$"#89)$8.5& '$):"& +6',! "#! /! 0";,-<0+('&"%(!#&/:,! 7/(-)"(:! ')"44"(:=! '+))"(:=! 4,.#4,'&"1,! -"1"-,=! 1",>4%.&!&./(#0%.$=!4."$"&"1,!#,&<+4=!#'"##%."(:=!-,4&7!%00#,&=!/(-!0./:$,(&!:,(,./&"%(5! !T%-,.(!IUV!-,#":(#! "(1/."/2)*! "(')+-,! #%$,! 0%.$!%0!,/.)*!-,4&7!4.%',##"(:!BG<'+))=!7",./.'7"'/)<GC!WH6N!@JJXY!T%(<&.*$!/(-!T%.,&%(!@JJXZ!/#!>,))5!!R,!,;4)"'"&)*!$,(&"%(!&7"#!%4<&"$"G/&"%(!/#!"&!"#!2,'%$"(:!),##!&./(#4/.,(&!&%!/44)"'/&"%(!-,1,)<%4,.#5!67,!"(4+&!%0!&7,![9!"#!&7,!1,.&"',#!/(-!/&&."2+&,#!%0!/!#"(:),!4."$"&"1,!/(-!&7,!%+&4+&!"#!/!#,.",#!%0!4";,)!0./:$,(&#5!!!

67,!4";,)!#7/-,.!4.%:./$!#4,'"0",#!&7,!$/((,.!"(!>7"'7!1,.<&,;! /&&."2+&,#! /.,! "(&,.4%)/&,-! &%! 4.%-+',! 0./:$,(&! /&&."2+&,#! B(%!"(&,.4%)/&"%(=!(%(<4,.#4,'&"1,<'%..,'&,-!"(&,.4%)/&"%(=!%.!4,.#4,'<&"1,<'%..,'&,-!"(&,.4%)/&"%(C5!!T%-,.(!IUV#!+#+/))*!#+44%.&!$+)<&"#/$4),! /(&"/)"/#"(:! WHF,),*! ?\\KZ5! ! T+)&"#/$4)"(:! .,M+".,#!/--"&"%(/)!'/.,!"(!#4,'"0*"(:!/&&."2+&,!,1/)+/&"%(!2,7/1"%.!>7,(!/!0./:$,(&!-%,#!(%&!"(')+-,!&7,!4";,)!',(&,.=!#"(',!',(&,.!,1/)+/&"%(!$/*! .,#+)&! "(! /(! %+&<%0<:/$+&! 1/)+,5! ! H(! /--"&"%(/)! ,1/)+/&"%(!M+/)"0",.! B',(&.%"-C! '/(!2,! #4,'"0",-! &%! .,M+,#&! ,1/)+/&"%(!>"&7"(!&7,!0./:$,(&!2%+(-/.",#5!

;8%"<&'()*"#& +;',!.,/-#!&7,!/&&."2+&,#!%0!/!#"(:),!4";,)!0./:<$,(&!/(-!4.%-+',#!/! #"(:),!%+&4+&! 0./:$,(&!'%(#"#&"(:!%0!?! &%!A!/&&."2+&,!B'%)%.C!1/)+,#!/(-!%4&"%(/))*!/!-,4&7!1/)+,5!!!67,!/&&."2<+&,! 1/)+,#! B,),$,(&#C! /.,! ,/'7!>."&&,(! &%! /! #,4/./&,! '%)%.! 2+00,.!B&,.$,-!/!"#$%#"!&'"(#&C!%.!&7,!,(&".,!.,#+)&!$/*!2,!-"#'/.-,-!B(%!0./:$,(&! "#!%+&4+&C5! ! !]%.$/))*!-,4&7!/(-!#&,('")!1/)+,#!/.,! 0%.<>/.-,-! 0.%$! &7,! [95! ! S%>,1,.=! &7,! U9! '/(! .,4)/',! &7,! -,4&7!1/)+,!>"&7!/!'%$4+&,-!1/)+,=!2+&!(%&!&7,!#&,('")!1/)+,5!!^%&7!-"#<'/.-"(:! 4";,)#! /(-! .,4)/'"(:! &7,! -,4&7! 1/)+,! $/*! -,0,/&! -,4&7<4.%',##"(:! %4&"$"G/&"%(#! "(! &7,! [9! #"(',! &7,*! '/(! '7/(:,! &7,!0./:$,(&L#!1"#"2")"&*5!

!!"#$%&'()'!"#$%&'!()*(+"+$,"-$.'

/012#(033"&"2-4(0#$(5"65,"65&$3.(

!"#$%&'(()*+,)-.!'/

012),&3456)-.03/

7$%#$%&8)-9)-.78/

:,1#&;&0-<=)>%&;3)%$#&;&?5-,@&A&;&

B5(%)-1C)&.B3/

D)-%)2&3456)-.D3/

E)<*)%-@&3456)-&.E3/

8)*<-@

!"6)2F$GG)-

H)#%4I3%)">1,

3%-)5*7$%#$%&.37/

J2KL+

J2KL+

KL+

MN2J2KL+

KL+;O+

J2KL+&<-

MN2J2KL+

D)-%)2F$GG)-

D)-%)2F$GG)-

P)2%$-)

O

MLO

P)2%$-)

3%-)5*F$GG)-

P)2%$-)

B)"6)-P5-9)%

J&<-&M

MLO

O

35*#,)-

MN

35*#,)-

35*#,)-

MN

MN2J2KL+

KL2J2KL+

&&&O2J2KL+&;

KL+&;&O+

!6(

&:,1#I:$,,&;&

BP&'--5@

:<"(%5"%

:<"(%5"%

:<"(%5"%

J2KL+

Q5>1"9

MLO

M&1"RM&<$%

M&1"R&M&<$%

M&1"R&ST*5"@&<$%

M&1"R&ST*5"@&<$%

M&1"R&STM&<$%

M&1"R&M&<$%

12$32$&="#:"#&+1=,>!&/F,#!/!0./:$,(&!0.%$!&7,!U9!/(-!4,.<0%.$#! &./-"&"%(/)! #&,('")! /(-! -,4&7! &,#&"(:! %4,./&"%(#! /#! >,))! /#!.,(-,.!&/.:,&!2),(-"(:5!67,!QT!#4,'"0",#!2"(-!4%"(&#!0%.!/!#"(:),!+("0",-! -,4&7_#&,('")! 2+00,.! /(-! +4! &%! A! %&7,.! .,(-,.! &/.:,&#! B/&<&."2+&,!2+00,.#C5!!67,!4";,)!#7/-,.!$+#&!%+&4+&!/!#,4/./&,!1/)+,!0%.!,/'7!.,(-,.!&/.:,&!B&7,.,!"#!(%!$+)&"'/#&C5!!R7"),!/!#"(:),!2),(-"(:!0+('&"%(!"#!#7/.,-!/'.%##!/))!%0!&7,!.,(-,.!&/.:,&#=!2),(-"(:!'/(!2,!,(/2),-!%.!-"#/2),-!"(-,4,(-,(&)*!0%.!,/'7!.,(-,.!&/.:,&5!

DS

DM

DL

'S

'L

'M

DS

DM

'S

'M

!"#$%&'*)'7#"0-6,$(0-3(,"-$(4$68$-&(9"&5(0310%$-&(:$#&"%$4.'

+,('-&./%0'12%$32$%&'456'7424'!8/9'

T%-,.(!IUV#!.,)*!7,/1")*!%(!4.%',##"(:!.,&/"(,-!-/&/!#&.+'&+.,#!"(!&7,!0%.$!%0!1,.&,;!/(-!"(-,;!2+00,.#=!&,;&+.,!$/4#=!.,(-,.!&/.<:,&#! /(-! -,4&7_#&,('")! 2+00,.#5! IUV#! &*4"'/))*! #&%.,! &7,#,! "(! /!7":7<4,.0%.$/(',!$,$%.*! #*#&,$! /&&/'7,-! -".,'&)*! &%! &7,!IUV5!!67,! ./(:,! %0! #&.+'&+.,#! "(')+-,#! 7%$%:,(,%+#! ?P! &7.%+:7! KP!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!@!9%$,!"$4),$,(&/&"%(#!&./-"&"%(/))*!.,0,.!&%!&7"#!0+('&"%(/)"&*!/#!`[QUa!0%.!./#&,.!%4,./&"%(#5!

Geometry Shader: Lets a shader program create new triangles.

Stream Output: Lets vertex streamrecirculate through shaders many times ...(and also, back to CPU)

Also: Shader CPUs are more

like RISC machines in many ways.

38

Page 39: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Why? “Particle Systems” ...Why? Particle systems ...

39

Page 40: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Why? Fractal images ...

40

Page 41: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

! GeForce 8800 Architecture in Detail

!

!

Figure 12. GeForce 8800 GTX Block Diagram !

!

!

!

TB-02787-001_v1.0 19

NVidia 8800: Unified GPU, announced Fall 2006

128 Shader CPUs Streams

loop around...

Thread processor sets shader type of each CPU

1.35 GHz Shader CPU Clock, 575 MHz core clock41

Page 42: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

! GeForce 8800 Architecture in Detail

!

!

Figure 12. GeForce 8800 GTX Block Diagram !

!

!

!

TB-02787-001_v1.0 19

Graphics-centric functionality ...3-D to 2-D (vertex to pixel)

Pixel fragment output merge

Texture engine and memory system

42

Page 43: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Can be reconfigured with graphics logic hidden ...

128 scalar 1.35 GHz processors: Integer ALU, dual-issue single-precision IEEE floats.

! GeForce 8800 Architecture Overview

!

!

Figure 9. CUDA Thread Computing Pipeline "#$%!&'()*&+!'&,!(--*./(0.1'+!,.02!(!+0('3(43!-*(05146!514!&704(/0.'8!9(*:()*&!.'5146(0.1'!5416!9(+0!;:('0.0.&+!15!4(,!3(0(<!('3!-419.3&!02&!51**1,.'8!=&>!)&'&5.0+!.'!02.+!(4&(?!

! @'()*&+!2.82!3&'+.0>!/16-:0.'8!01!)&!3&-*1>&3!1'!+0('3(43!&'0&4-4.+&!,14=+0(0.1'+!('3!+&49&4!&'9.41'6&'0+!514!3(0(!.'0&'+.9&!(--*./(0.1'+A!

! $.9.3&+!/16-*&7!/16-:0.'8!0(+=+!.'01!+6(**&4!&*&6&'0+!02(0!(4&!-41/&++&3!+.6:*0('&1:+*>!.'!02&!BC#!01!&'()*&!4&(*D0.6&!3&/.+.1'!6(=.'8A!

! C419.3&!(!+0('3(43!-*(05146!)(+&3!1'!.'3:+04>D*&(3.'8!EFG$G%!2(43,(4&!('3!+150,(4&!514!(!,.3&!4('8&!15!2.82!3(0(!)('3,.302<!/16-:0(0.1'(**>!.'0&'+.9&!(--*./(0.1'+A!

! "16).'&+!,.02!6:*0.D/14&!"C#!+>+0&6+!01!-419.3&!(!5*&7.)*&!/16-:0.'8!-*(05146A!

! "1'041*+!/16-*&7!-4184(6+!('3!/1143.'(0&+!.'2&4&'0*>!-(4(**&*!/16-:0(0.1'!1'!02&!BC#!-41/&++&3!)>!021:+('3+!15!/16-:0.'8!024&(3+A!

!

"#$%H+!2.82!-&45146('/&<!+/(*()*&!/16-:0.'8!(4/2.0&/0:4&!+1*9&+!/16-*&7!-(4(**&*!-41)*&6+!IJJ7!5(+0&4!02('!04(3.0.1'(*!"C#D)(+&3!(4/2.0&/0:4&+?!

! #-!01!15!IKL!-(4(**&*!IAMNBOP!/16-:0&!/14&+!.'!B&Q14/&!LLJJ!BRS!BC#+!2(4'&++!6(++.9&!5*1(0.'8!-1.'0!-41/&++.'8!-1,&4!&'()*.'8!6(7.6:6!(--*./(0.1'!-&45146('/&A!

! R24&(3!/16-:0.'8!+/(*&+!(/41++!EFG$G%H+!/16-*&0&!*.'&!15!'&70!8&'&4(0.1'!BC#+!D!5416!&6)&33&3!BC#+!01!2.82!-&45146('/&!BC#+!02(0!+:--140!2:'34&3+!15!-41/&++14+A!

! EFG$G%!TUGV!0&/2'1*18>!(**1,+!6:*0.-*&!BC#+!01!3.+04.):0&!/16-:0.'8!01!-419.3&!:'-(4(**&*&3!/16-:0&!3&'+.0>A!

!

TB-02787-001_v1.0 13

Texture system set up to look like a conventional memory system (768MB GDDR3, 86 GB/s)

1000s of active threads

3 TeraFlops Peak Performance Ships with a C compiler.

43

Page 44: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Chip Facts

90nm process681M Transistors80 die/wafer (pre-testing)

A big die. Many chips will not work (low yield). Low profits.

4 year design cycle

Design Facts

$400 Million design budget600 person-years: 10 people at start, 300 at peak

44

Page 45: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

GeForce 8800 GTX Card: $599 List Price

PCI-Express 16X Card - 2 Aux Power Plugs!

185 Watts Thermal Design Point (TDP) -- TDP is a “real-world” maximum power spec.

45

Page 46: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Some products are “loss-leaders”

Breakthrough product creates“free” publicity you can’t buy.

(1) Hope: when chip “shrinks” to 65nm fab process, die will be smaller, yields will improve, profits will rise.

(2) Simpler versions of the design will be made to create an entire product family, some very profitable.“We tape out a chip a month”, NVidia CEO quote.

46

Page 47: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

And it happened! 2008 nVidia products

9800 GTX

Specs similar to 8800, card sells for $199.

GTX 280

Price similar to 8800, stream CPU count > 2X ...

47

Page 48: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

48

Page 49: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Face was “scanned” to create a vertex model. 8800 GTX was used to do skin, eye, lips and hair rendering.

49

Page 50: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

History and Graphics Processors

Create standard model from common practice: Wire-frame geometry, triangle rasterization, pixel shading.

Put model in hardware: Block diagram of chip matches computer graphics math.

Evolve to be programmable: At some point, it becomes hard to see the math in the block diagram.

“Wheel of reincarnation” -- Hardwired graphics hardware evolves to look like general-purpose CPU. EECS visitor Ivan Sutherland co-wrote a paper on this topic in 1968!

50

Page 51: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Intel @ SIGGRAPH 2008: Larrabee

!"#$%&'&(&)*&$+,(-./!"#$"%&'()&'*+%,"+-&'.)&'!/%+-0$"&'1)&'23%4567&'8)&'9:%+47&';)&'.<:"5&'=)&'><-?#-4&'!)&'(+?"&'9)&'!<0"%,+-&'>)&'*+@#-&'A)&'14/+4+&'A)&'B%3C73D4?#&'1)&'><+-&'8)&'E+-%+7+-&'=)'FGGH)'(+%%+:""I'9';+-5J*3%"'KHL'9%C7#6"C6<%"'M3%'N#4<+$'*3,/<6#-0)'!"#$%&'()*$+&',-*$./&'O&'9%6#C$"'PH'Q9<0<46'FGGHR&'PS'/+0"4)'.TU'V'PG)PPWSXPOLGLPF)POLGLPY'766/IXXZ3#)+C,)3%0XPG)PPWSXPOLGLPF)POLGLPY)

",01(234/$5,/2*&="%,#44#3-'63',+?"'Z#0#6+$'3%'7+%Z'C3/#"4'3M'/+%6'3%'+$$'3M'67#4'D3%?'M3%'/"%43-+$'3%'C$+44%33,'<4"'#4'0%+-6"Z'D#673<6'M""'/%3@#Z"Z'67+6'C3/#"4'+%"'-36',+Z"'3%'Z#46%#:<6"Z'M3%'/%3!'6'3%'Z#%"C6'C3,,"%C#+$'+Z@+-6+0"'+-Z'67+6'C3/#"4'473D'67#4'-36#C"'3-'67"'!'%46'/+0"'3%'#-#6#+$'4C%""-'3M'+'Z#4/$+5'+$3-0'D#67'67"'M<$$'C#6+6#3-)'*3/5%#0764'M3%'C3,/3-"-64'3M'67#4'D3%?'3D-"Z':5'367"%4'67+-'9*;',<46':"'73-3%"Z)'9:46%+C6#-0'D#67'C%"Z#6'#4'/"%,#66"Z)'83'C3/5'367"%D#4"&'63'%"/<:$#47&'63'/346'3-'4"%@"%4&'63'%"Z#46%#:<6"'63'$#464&'3%'63'<4"'+-5'C3,/3-"-6'3M'67#4'D3%?'#-'367"%'D3%?4'%"[<#%"4'/%#3%'4/"C#!'C'/"%,#44#3-'+-ZX3%'+'M"")'="%,#44#3-4',+5':"'%"[<"46"Z'M%3,'=<:$#C+6#3-4'."/6)&'9*;&'U-C)&'F'="--'=$+\+&'!<#6"'YGP&']"D' 3̂%?&']^'PGPFP_GYGP&'M+K'`P'QFPFR'HLa_GWHP&'3%'/"%,#44#3-4b+C,)3%0)c'FGGH'9*;'GYOG_GOGPXFGGHXGO_9A8PH'dS)GG'.TU'PG)PPWSXPOLGLPF)POLGLPY'766/IXXZ3#)+C,)3%0XPG)PPWSXPOLGLPF)POLGLPY

!

!"##"$%%&'(')"*+,-.#%'/01'(#2345%256#%'7.#'8496":'-.;<654*='

!"##$%&'()'#*+%%%,-./%0"#1'"2

*+%%%3#(4%&5#"2/)'

*+%%%6-1%7-#8$9:

*+%%%;(4:"')%<=#"8:

>+%

?#"@''5%,.='$*+%%%&9'5:'2%A.2B(28

*+%%%<@"1%!"B'

*+%%%A'#'1$%&./'#1"2

C+%

D-='#9%0"E(2*+%%%D-/'#%385"8"

*+%%%3@%F#-4:-G8B(

*+%%%6-2(%A."2

*+%%%"2@%%?"9%H"2#":"2

C

($95#"25!"#'

6:(8% 5"5'#% 5#'8'298% "% 1"2$I4-#'% E(8.")% 4-15.9(2/% "#4:(9'49.#'%

4-@'%2"1'@%!"##"=''+%"%2'G%8-J9G"#'%#'2@'#(2/%5(5')(2'+%"%1"2$I

4-#'% 5#-/#"11(2/% 1-@')+% "2@% 5'#J-#1"24'% "2")$8(8% J-#% 8'E'#")%

"55)(4"9(-28K%!"##"=''%.8'8%1.)9(5)'%(2I-#@'#%LMN%0?O%4-#'8%9:"9%

"#'%"./1'29'@%=$%"%G(@'%E'49-#%5#-4'88-#%.2(9+% "8%G'))% "8% 8-1'%

J(L'@% J.249(-2% )-/(4% =)-4B8K% 6:(8% 5#-E(@'8% @#"1"9(4"))$% :(/:'#%

5'#J-#1"24'%5'#%G"99%"2@%5'#%.2(9%-J%"#'"%9:"2%-.9I-JI-#@'#%0?O8%

-2% :(/:)$% 5"#"))')% G-#B)-"@8K% P9% ")8-% /#'"9)$% (24#'"8'8% 9:'%

J)'L(=()(9$%"2@%5#-/#"11"=()(9$%-J%9:'%"#4:(9'49.#'%"8%4-15"#'@%9-%

89"2@"#@%F?O8K%<%4-:'#'29%-2I@('%>2@%)'E')%4"4:'%"))-G8%'JJ(4('29%

(29'#I5#-4'88-#% 4-11.2(4"9(-2% "2@% :(/:I="2@G(@9:% )-4")% @"9"%

"44'88%=$%0?O%4-#'8K%6"8B%84:'@.)(2/%(8%5'#J-#1'@%'29(#')$%G(9:%

8-J9G"#'% (2% !"##"=''+% #"9:'#% 9:"2% (2% J(L'@% J.249(-2% )-/(4K% 6:'%

4.89-1(Q"=)'% 8-J9G"#'% /#"5:(48% #'2@'#(2/% 5(5')(2'% J-#% 9:(8%

"#4:(9'49.#'% .8'8% =(22(2/% (2% -#@'#% 9-% #'@.4'% #'R.(#'@% 1'1-#$%

="2@G(@9:+%1(2(1(Q'% )-4B% 4-29'29(-2+% "2@% (24#'"8'%-55-#9.2(9('8%

J-#% 5"#"))')(81% #')"9(E'% 9-% 89"2@"#@% F?O8K% 6:'% !"##"=''% 2"9(E'%

5#-/#"11(2/% 1-@')% 8.55-#98% "% E"#('9$% -J% :(/:)$% 5"#"))')%

"55)(4"9(-28% 9:"9% .8'% (##'/.)"#% @"9"% 89#.49.#'8K% ?'#J-#1"24'%

"2")$8(8% -2% 9:-8'% "55)(4"9(-28% @'1-289#"9'8% !"##"=''S8% 5-9'29(")%

J-#%"%=#-"@%#"2/'%-J%5"#"))')%4-15.9"9(-2K%

""#$% PKCK*% T0-15.9'#% F#"5:(48UV% H"#@G"#'% <#4:(9'49.#'IIF#"5:(48% ?#-4'88-#8+% ?"#"))')% ?#-4'88(2/W% PKCKC% T0-15.9'#%F#"5:(48UV% ?(49.#'XP1"/'% F'2'#"9(-2II,(85)"$% <)/-#(9:18W% PKCKY%T0-15.9'#%F#"5:(48UV%6:#''I,(1'28(-2")%F#"5:(48%"2@%D'")(81II0-)-#+%8:"@(2/+%8:"@-G(2/+%"2@%9'L9.#'%

%&'()*+8$% /#"5:(48% "#4:(9'49.#'+% 1"2$I4-#'% 4-15.9(2/+% #'")I9(1'% /#"5:(48+% 8-J9G"#'% #'2@'#(2/+% 9:#-./:5.9% 4-15.9(2/+% E(8.")%4-15.9(2/+%5"#"))')%5#-4'88(2/+%&P;,+%F?F?OK!

>?' @*5#.A6254.*'

;-@'#2%F?O8%"#'%(24#'"8(2/)$%5#-/#"11"=)'%(2%-#@'#%9-%8.55-#9%"@E"24'@% /#"5:(48% ")/-#(9:18% "2@% -9:'#% 5"#"))')% "55)(4"9(-28K%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%*%P29')Z%0-#5-#"9(-2V%%)"##$K8'()'#+%@-./K4"#1'"2+%'#(4K85#"2/)'+%

9-1KJ-#8$9:+%5#"@''5K@.='$+%89'5:'2K[.2B(28+%"@"1K9K)"B'+%

#-='#9K@K4"E(2+%#-/'#K'85"8"+%'@G"#@K/#-4:-G8B(%\%9-2(K[."2%

](29')K4-1%

>%D<,%F"1'%6--)8V%%1(B'"]#"@/"1'9--)8K4-1%

C%&9"2J-#@%O2(E'#8(9$V%%$-')%\%:"2#":"2%]48K89"2J-#@K'@.%

H-G'E'#+% /'2'#")% 5.#5-8'% 5#-/#"11"=()(9$% -J% 9:'% /#"5:(48%5(5')(2'%(8%#'89#(49'@%=$%)(1(9"9(-28%-2%9:'%1'1-#$%1-@')%"2@%=$%J(L'@% J.249(-2% =)-4B8% 9:"9% 84:'@.)'% 9:'% 5"#"))')% 9:#'"@8% -J%'L'4.9(-2K% 7-#% 'L"15)'+% 5(L')% 5#-4'88(2/% -#@'#% (8% 4-29#-))'@% =$%9:'%#"89'#(Q"9(-2%)-/(4%"2@%-9:'#%@'@(4"9'@%84:'@.)(2/%)-/(4K%

6:(8%5"5'#%@'84#(='8%"%:(/:)$%5"#"))')%"#4:(9'49.#'%9:"9%1"B'8%9:'%#'2@'#(2/% 5(5')(2'% 4-15)'9')$% 5#-/#"11"=)'K% 6:'% !"##"=''%"#4:(9'49.#'%(8%="8'@%-2%(2I-#@'#%0?O%4-#'8%9:"9%#.2%"2%'L9'2@'@%E'#8(-2% -J% 9:'% LMN% (289#.49(-2% 8'9+% (24).@(2/% G(@'% E'49-#%5#-4'88(2/% -5'#"9(-28% "2@% 8-1'% 85'4(")(Q'@% 84")"#% (289#.49(-28K%7(/.#'% *% 8:-G8% "% 84:'1"9(4% ()).89#"9(-2% -J% 9:'% "#4:(9'49.#'K% 6:'%4-#'8% '"4:% "44'88% 9:'(#% -G2% 8.=8'9% -J% "% 4-:'#'29% !>% 4"4:'% 9-%5#-E(@'% :(/:I="2@G(@9:% !>% 4"4:'% "44'88% J#-1% '"4:% 4-#'% "2@% 9-%8(15)(J$%@"9"%8:"#(2/%"2@%8$24:#-2(Q"9(-2K%

!"##"=''% (8%1-#'% J)'L(=)'% 9:"2% 4.##'29%F?O8K% P98%0?OI)(B'% LMNI="8'@% "#4:(9'49.#'% 8.55-#98% 8.=#-.9(2'8% "2@% 5"/'% J".)9(2/K% &-1'%-5'#"9(-28% 9:"9% F?O8% 9#"@(9(-2"))$% 5'#J-#1% G(9:% J(L'@% J.249(-2%)-/(4+% 8.4:% "8% #"89'#(Q"9(-2% "2@% 5-89I8:"@'#% =)'2@(2/+% "#'%5'#J-#1'@%'29(#')$%(2%8-J9G"#'%(2%!"##"=''K%!(B'%F?O8+%!"##"=''%.8'8%J(L'@%J.249(-2%)-/(4%J-#%9'L9.#'%J()9'#(2/+%=.9%9:'%4-#'8%"88(89%9:'%J(L'@%J.249(-2%)-/(4+%'K/K%=$%8.55-#9(2/%5"/'%J".)98K%%

%

!"#$%&'(!"#$%&'()*$"+,")%&"-(..(/&&"'(012$+.&"(.$%*)&$)3.&!"4%&"03'/&."+,"567"$+.&8"(09")%&"03'/&."(09")1:&"+,"$+2:.+$&88+.8"(09" ;<=" />+$?8" (.&" *':>&'&0)()*+029&:&09&0)@" (8" (.&" )%&":+8*)*+08"+,")%&"567"(09"0+02567"/>+$?8"+0")%&"$%*:A"

6:(8%5"5'#%")8-%@'84#(='8%"%8-J9G"#'%#'2@'#(2/%5(5')(2'% 9:"9% #.28%'JJ(4('29)$% -2% 9:(8% "#4:(9'49.#'K% P9% .8'8% =(22(2/% 9-% (24#'"8'%5"#"))')(81% "2@% #'@.4'% 1'1-#$% ="2@G(@9:+% G:()'% "E-(@(2/% 9:'%5#-=)'18%-J%8-1'%5#'E(-.8%9()'I="8'@%"#4:(9'49.#'8K%P15)'1'29(2/%9:'% #'2@'#'#% (2% 8-J9G"#'%"))-G8%'L(89(2/% J'"9.#'8% 9-%='%-59(1(Q'@%="8'@% -2% G-#B)-"@% "2@% "))-G8% 2'G% J'"9.#'8% 9-% ='% "@@'@K% 7-#%'L"15)'+% 5#-/#"11"=)'% =)'2@(2/% "2@% -#@'#I(2@'5'2@'29%9#"285"#'24$%J(9%'"8()$%(29-%9:'%!"##"=''%8-J9G"#'%5(5')(2'K%%

7(2"))$+% 9:(8%5"5'#%@'84#(='8%"%5#-/#"11(2/%1-@')% 9:"9% 8.55-#98%1-#'% /'2'#")% 5"#"))')% "55)(4"9(-28+% 8.4:% "8% (1"/'% 5#-4'88(2/+%5:$8(4")%8(1.)"9(-2+%"2@%1'@(4")%\%J(2"24(")%"2")$9(48K%!"##"=''S8%8.55-#9% J-#% (##'/.)"#% @"9"% 89#.49.#'8% "2@% (98% 84"99'#I/"9:'#%4"5"=()(9$% 1"B'% (9% 8.(9"=)'% J-#% 9:'8'% 9:#-./:5.9% "55)(4"9(-28% "8%@'1-289#"9'@%=$%-.#%84")"=()(9$%"2@%5'#J-#1"24'%"2")$8(8K%

ACM Transactions on Graphics, Vol. 27, No. 3, Article 18, Publication date: August 2008.

!"#$%&'&(&)*&$+,(-./

!"#$"%&'()&'*+%,"+-&'.)&'!/%+-0$"&'1)&'23%4567&'8)&'9:%+47&';)&'.<:"5&'=)&'><-?#-4&'!)&'(+?"&'9)&'!<0"%,+-&'>)&'*+@#-&'A)&'14/+4+&'A)&'B%3C73D4?#&'1)&'><+-&'8)&'E+-%+7+-&'=)'FGGH)'(+%%+:""I'9';+-5J*3%"'KHL'9%C7#6"C6<%"'M3%'N#4<+$'*3,/<6#-0)'!"#$%&'()*$+&',-*$./&'O&'9%6#C$"'PH'Q9<0<46'FGGHR&'PS'/+0"4)'.TU'V'PG)PPWSXPOLGLPF)POLGLPY'766/IXXZ3#)+C,)3%0XPG)PPWSXPOLGLPF)POLGLPY)

",01(234/$5,/2*&

="%,#44#3-'63',+?"'Z#0#6+$'3%'7+%Z'C3/#"4'3M'/+%6'3%'+$$'3M'67#4'D3%?'M3%'/"%43-+$'3%'C$+44%33,'<4"'#4'0%+-6"Z'D#673<6'M""'/%3@#Z"Z'67+6'C3/#"4'+%"'-36',+Z"'3%'Z#46%#:<6"Z'M3%'/%3!'6'3%'Z#%"C6'C3,,"%C#+$'+Z@+-6+0"'+-Z'67+6'C3/#"4'473D'67#4'-36#C"'3-'67"'!'%46'/+0"'3%'#-#6#+$'4C%""-'3M'+'Z#4/$+5'+$3-0'D#67'67"'M<$$'C#6+6#3-)'*3/5%#0764'M3%'C3,/3-"-64'3M'67#4'D3%?'3D-"Z':5'367"%4'67+-'9*;',<46':"'73-3%"Z)'9:46%+C6#-0'D#67'C%"Z#6'#4'/"%,#66"Z)'83'C3/5'367"%D#4"&'63'%"/<:$#47&'63'/346'3-'4"%@"%4&'63'%"Z#46%#:<6"'63'$#464&'3%'63'<4"'+-5'C3,/3-"-6'3M'67#4'D3%?'#-'367"%'D3%?4'%"[<#%"4'/%#3%'4/"C#!'C'/"%,#44#3-'+-ZX3%'+'M"")'="%,#44#3-4',+5':"'%"[<"46"Z'M%3,'=<:$#C+6#3-4'."/6)&'9*;&'U-C)&'F'="--'=$+\+&'!<#6"'YGP&']"D' 3̂%?&']^'PGPFP_GYGP&'M+K'`P'QFPFR'HLa_GWHP&'3%'/"%,#44#3-4b+C,)3%0)c'FGGH'9*;'GYOG_GOGPXFGGHXGO_9A8PH'dS)GG'.TU'PG)PPWSXPOLGLPF)POLGLPY'766/IXXZ3#)+C,)3%0XPG)PPWSXPOLGLPF)POLGLPY

!

!"##"$%%&'(')"*+,-.#%'/01'(#2345%256#%'7.#'8496":'-.;<654*='

!"##$%&'()'#*+%%%,-./%0"#1'"2

*+%%%3#(4%&5#"2/)'

*+%%%6-1%7-#8$9:

*+%%%;(4:"')%<=#"8:

>+%

?#"@''5%,.='$*+%%%&9'5:'2%A.2B(28

*+%%%<@"1%!"B'

*+%%%A'#'1$%&./'#1"2

C+%

D-='#9%0"E(2*+%%%D-/'#%385"8"

*+%%%3@%F#-4:-G8B(

*+%%%6-2(%A."2

*+%%%"2@%%?"9%H"2#":"2

C

($95#"25!"#'

6:(8% 5"5'#% 5#'8'298% "% 1"2$I4-#'% E(8.")% 4-15.9(2/% "#4:(9'49.#'%

4-@'%2"1'@%!"##"=''+%"%2'G%8-J9G"#'%#'2@'#(2/%5(5')(2'+%"%1"2$I

4-#'% 5#-/#"11(2/% 1-@')+% "2@% 5'#J-#1"24'% "2")$8(8% J-#% 8'E'#")%

"55)(4"9(-28K%!"##"=''%.8'8%1.)9(5)'%(2I-#@'#%LMN%0?O%4-#'8%9:"9%

"#'%"./1'29'@%=$%"%G(@'%E'49-#%5#-4'88-#%.2(9+% "8%G'))% "8% 8-1'%

J(L'@% J.249(-2% )-/(4% =)-4B8K% 6:(8% 5#-E(@'8% @#"1"9(4"))$% :(/:'#%

5'#J-#1"24'%5'#%G"99%"2@%5'#%.2(9%-J%"#'"%9:"2%-.9I-JI-#@'#%0?O8%

-2% :(/:)$% 5"#"))')% G-#B)-"@8K% P9% ")8-% /#'"9)$% (24#'"8'8% 9:'%

J)'L(=()(9$%"2@%5#-/#"11"=()(9$%-J%9:'%"#4:(9'49.#'%"8%4-15"#'@%9-%

89"2@"#@%F?O8K%<%4-:'#'29%-2I@('%>2@%)'E')%4"4:'%"))-G8%'JJ(4('29%

(29'#I5#-4'88-#% 4-11.2(4"9(-2% "2@% :(/:I="2@G(@9:% )-4")% @"9"%

"44'88%=$%0?O%4-#'8K%6"8B%84:'@.)(2/%(8%5'#J-#1'@%'29(#')$%G(9:%

8-J9G"#'% (2% !"##"=''+% #"9:'#% 9:"2% (2% J(L'@% J.249(-2% )-/(4K% 6:'%

4.89-1(Q"=)'% 8-J9G"#'% /#"5:(48% #'2@'#(2/% 5(5')(2'% J-#% 9:(8%

"#4:(9'49.#'% .8'8% =(22(2/% (2% -#@'#% 9-% #'@.4'% #'R.(#'@% 1'1-#$%

="2@G(@9:+%1(2(1(Q'% )-4B% 4-29'29(-2+% "2@% (24#'"8'%-55-#9.2(9('8%

J-#% 5"#"))')(81% #')"9(E'% 9-% 89"2@"#@% F?O8K% 6:'% !"##"=''% 2"9(E'%

5#-/#"11(2/% 1-@')% 8.55-#98% "% E"#('9$% -J% :(/:)$% 5"#"))')%

"55)(4"9(-28% 9:"9% .8'% (##'/.)"#% @"9"% 89#.49.#'8K% ?'#J-#1"24'%

"2")$8(8% -2% 9:-8'% "55)(4"9(-28% @'1-289#"9'8% !"##"=''S8% 5-9'29(")%

J-#%"%=#-"@%#"2/'%-J%5"#"))')%4-15.9"9(-2K%

""#$% PKCK*% T0-15.9'#% F#"5:(48UV% H"#@G"#'% <#4:(9'49.#'IIF#"5:(48% ?#-4'88-#8+% ?"#"))')% ?#-4'88(2/W% PKCKC% T0-15.9'#%F#"5:(48UV% ?(49.#'XP1"/'% F'2'#"9(-2II,(85)"$% <)/-#(9:18W% PKCKY%T0-15.9'#%F#"5:(48UV%6:#''I,(1'28(-2")%F#"5:(48%"2@%D'")(81II0-)-#+%8:"@(2/+%8:"@-G(2/+%"2@%9'L9.#'%

%&'()*+8$% /#"5:(48% "#4:(9'49.#'+% 1"2$I4-#'% 4-15.9(2/+% #'")I9(1'% /#"5:(48+% 8-J9G"#'% #'2@'#(2/+% 9:#-./:5.9% 4-15.9(2/+% E(8.")%4-15.9(2/+%5"#"))')%5#-4'88(2/+%&P;,+%F?F?OK!

>?' @*5#.A6254.*'

;-@'#2%F?O8%"#'%(24#'"8(2/)$%5#-/#"11"=)'%(2%-#@'#%9-%8.55-#9%"@E"24'@% /#"5:(48% ")/-#(9:18% "2@% -9:'#% 5"#"))')% "55)(4"9(-28K%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%*%P29')Z%0-#5-#"9(-2V%%)"##$K8'()'#+%@-./K4"#1'"2+%'#(4K85#"2/)'+%

9-1KJ-#8$9:+%5#"@''5K@.='$+%89'5:'2K[.2B(28+%"@"1K9K)"B'+%

#-='#9K@K4"E(2+%#-/'#K'85"8"+%'@G"#@K/#-4:-G8B(%\%9-2(K[."2%

](29')K4-1%

>%D<,%F"1'%6--)8V%%1(B'"]#"@/"1'9--)8K4-1%

C%&9"2J-#@%O2(E'#8(9$V%%$-')%\%:"2#":"2%]48K89"2J-#@K'@.%

H-G'E'#+% /'2'#")% 5.#5-8'% 5#-/#"11"=()(9$% -J% 9:'% /#"5:(48%5(5')(2'%(8%#'89#(49'@%=$%)(1(9"9(-28%-2%9:'%1'1-#$%1-@')%"2@%=$%J(L'@% J.249(-2% =)-4B8% 9:"9% 84:'@.)'% 9:'% 5"#"))')% 9:#'"@8% -J%'L'4.9(-2K% 7-#% 'L"15)'+% 5(L')% 5#-4'88(2/% -#@'#% (8% 4-29#-))'@% =$%9:'%#"89'#(Q"9(-2%)-/(4%"2@%-9:'#%@'@(4"9'@%84:'@.)(2/%)-/(4K%

6:(8%5"5'#%@'84#(='8%"%:(/:)$%5"#"))')%"#4:(9'49.#'%9:"9%1"B'8%9:'%#'2@'#(2/% 5(5')(2'% 4-15)'9')$% 5#-/#"11"=)'K% 6:'% !"##"=''%"#4:(9'49.#'%(8%="8'@%-2%(2I-#@'#%0?O%4-#'8%9:"9%#.2%"2%'L9'2@'@%E'#8(-2% -J% 9:'% LMN% (289#.49(-2% 8'9+% (24).@(2/% G(@'% E'49-#%5#-4'88(2/% -5'#"9(-28% "2@% 8-1'% 85'4(")(Q'@% 84")"#% (289#.49(-28K%7(/.#'% *% 8:-G8% "% 84:'1"9(4% ()).89#"9(-2% -J% 9:'% "#4:(9'49.#'K% 6:'%4-#'8% '"4:% "44'88% 9:'(#% -G2% 8.=8'9% -J% "% 4-:'#'29% !>% 4"4:'% 9-%5#-E(@'% :(/:I="2@G(@9:% !>% 4"4:'% "44'88% J#-1% '"4:% 4-#'% "2@% 9-%8(15)(J$%@"9"%8:"#(2/%"2@%8$24:#-2(Q"9(-2K%

!"##"=''% (8%1-#'% J)'L(=)'% 9:"2% 4.##'29%F?O8K% P98%0?OI)(B'% LMNI="8'@% "#4:(9'49.#'% 8.55-#98% 8.=#-.9(2'8% "2@% 5"/'% J".)9(2/K% &-1'%-5'#"9(-28% 9:"9% F?O8% 9#"@(9(-2"))$% 5'#J-#1% G(9:% J(L'@% J.249(-2%)-/(4+% 8.4:% "8% #"89'#(Q"9(-2% "2@% 5-89I8:"@'#% =)'2@(2/+% "#'%5'#J-#1'@%'29(#')$%(2%8-J9G"#'%(2%!"##"=''K%!(B'%F?O8+%!"##"=''%.8'8%J(L'@%J.249(-2%)-/(4%J-#%9'L9.#'%J()9'#(2/+%=.9%9:'%4-#'8%"88(89%9:'%J(L'@%J.249(-2%)-/(4+%'K/K%=$%8.55-#9(2/%5"/'%J".)98K%%

%

!"#$%&'(!"#$%&'()*$"+,")%&"-(..(/&&"'(012$+.&"(.$%*)&$)3.&!"4%&"03'/&."+,"567"$+.&8"(09")%&"03'/&."(09")1:&"+,"$+2:.+$&88+.8"(09" ;<=" />+$?8" (.&" *':>&'&0)()*+029&:&09&0)@" (8" (.&" )%&":+8*)*+08"+,")%&"567"(09"0+02567"/>+$?8"+0")%&"$%*:A"

6:(8%5"5'#%")8-%@'84#(='8%"%8-J9G"#'%#'2@'#(2/%5(5')(2'% 9:"9% #.28%'JJ(4('29)$% -2% 9:(8% "#4:(9'49.#'K% P9% .8'8% =(22(2/% 9-% (24#'"8'%5"#"))')(81% "2@% #'@.4'% 1'1-#$% ="2@G(@9:+% G:()'% "E-(@(2/% 9:'%5#-=)'18%-J%8-1'%5#'E(-.8%9()'I="8'@%"#4:(9'49.#'8K%P15)'1'29(2/%9:'% #'2@'#'#% (2% 8-J9G"#'%"))-G8%'L(89(2/% J'"9.#'8% 9-%='%-59(1(Q'@%="8'@% -2% G-#B)-"@% "2@% "))-G8% 2'G% J'"9.#'8% 9-% ='% "@@'@K% 7-#%'L"15)'+% 5#-/#"11"=)'% =)'2@(2/% "2@% -#@'#I(2@'5'2@'29%9#"285"#'24$%J(9%'"8()$%(29-%9:'%!"##"=''%8-J9G"#'%5(5')(2'K%%

7(2"))$+% 9:(8%5"5'#%@'84#(='8%"%5#-/#"11(2/%1-@')% 9:"9% 8.55-#98%1-#'% /'2'#")% 5"#"))')% "55)(4"9(-28+% 8.4:% "8% (1"/'% 5#-4'88(2/+%5:$8(4")%8(1.)"9(-2+%"2@%1'@(4")%\%J(2"24(")%"2")$9(48K%!"##"=''S8%8.55-#9% J-#% (##'/.)"#% @"9"% 89#.49.#'8% "2@% (98% 84"99'#I/"9:'#%4"5"=()(9$% 1"B'% (9% 8.(9"=)'% J-#% 9:'8'% 9:#-./:5.9% "55)(4"9(-28% "8%@'1-289#"9'@%=$%-.#%84")"=()(9$%"2@%5'#J-#1"24'%"2")$8(8K%

ACM Transactions on Graphics, Vol. 27, No. 3, Article 18, Publication date: August 2008.

A GPU that looks a lot like a many-core CPU.However, cores are specialized for graphics ...

51

Page 52: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Larrabee CPU core microarchitecture !

!"#$%&'(

%')$*%+$,%-

+)./0.1

2,'3/(4%.*%4'/(%5'2')),)/$-

6%7/'4'2'%/$%

'%-+)./0&*

2,%4,(,2')%

5+25*$,%-/&2*

52*&,$$*

2%89*(4,./2'%

,.%')6%

:;;<=%>,'.+

2/(4%,/4

1.%/(0*23,2%&*

2,$?%,'&1%&'5

'@),%*

>%,A,&+./(4%

>*+2%$/-

+).'(,*+$%.12,'3

$?%'(3%'%$1

'2,3%&'&1

,6%B+.%4/C,(%/.$%>*

&+$%

*(%&*--,2&/')%

$,2C,2%D*2E)*'3$?%7/'4'2'%

)'&E$%'2&1

/.,&.+2')%

,),-,(.$%&2/./&')%

>*2%C/$+')%&*

-5+./(4?%$+

&1%'$%F

GHI%>)*

'./(40

5*/(.%,A,&+./*(?%$&'..,204

'.1,2?%*

2%>/A,3%>+(&./*(%.,A.+2,%$+

55*2.6%

!"#$%&&%'((#)%&*+%&(#,&-./0(-01&(#

J/4+2,%K%'@*C,%$1*D$%'%@)*&E%3/'42'-%*>%.1,%@'$/&%

L'22'@

,,%'2&1

/.,&.+2,6%L

'22'@,,%/$%3

,$/4(,3%'2*

+(3%-+)./5),%/(

$.'(./'./*

($%

*>%'(

%/(0*23,2%!

"#%&*2,%.1

'.%/$%'+4-,(.,3%D/.1%'%D

/3,%C,&.*

2%52*&,$$*

2%MN"#O6%!

*2,$%&*

--+(/&'.,%.1

2*+41%'%1

/410@'(3D/3.1%

/(.,2&*

((,&.%(

,.D*2E%D/.1%$*-,%>/A

,3%>+(&./*(%)*4/&?%-

,-*2P%GQR

%/(.,2>'&,$?%'(

3%*.1,2%(

,&,$$'2P%GQR

%)*4/&?%3

,5,(3/(4%*(%.1,%,A

'&.%'55)/&'./*

(6%J*2%,A'-5),?%'(%/-5),-,(.'./*

(%*>%L'22'@

,,%'$%'%

$.'(30')*(,%S"#%D*+)3%.P5/&'))P

%/(&)+3,%'%"

!G,%@+$6%

T1,%3'.'%/(

%T'@),%K

%-*./C'.,$%L

'22'@,,U$%+

$,%*>%/(0*23,2%&*

2,$%D/.1%D/3,%N"#$6%T1,%-/33),%&*

)+-(%$1*D$%.1,%5,'E%5,2>*

2-'(&,%

*>%'%-

*3,2(%*+.0*>0*23,2%!

"#?%.1,%G(.,) V%!*2,W

:%I+*%52*&,$$*

26%T1,%2/4

1.01'(3%&*)+-(%$1*D$%'%.,$.%!

"#%3,$/4

(%@'$,3

%*(%.1,%

",(./+-V%52*&,$$*

2?%D1/&1%D'$%/(

.2*3+&,3%/(%KXX:%'(3%+$,3%3+')0

/$$+,%/(

0*23,2%/(

$.2+&./*(%,A,&+./*(%8Y)5,2.%K

XXZ=6%T

1,%",(./+-%

52*&,$$*

2%&*2,%D

'$%-*3/>/,3

%.*%$+55*2.%>*

+2%.12,'3

$%'(3%'%K

[0

D/3,%N"#6%T1,%>/(

')%.D*%2*D$%$5

,&/>P%.1,%(+-@,2%*

>%(*(0C,&.*

2%/($.2+&./*($%.1'.%&'(

%@,%/$$+

,3%5,2%&)*

&E%@P%*(,%!"#%'(3%.1,%.*.')%

(+-@,2%*

>%C,&.*

2%*5,2'./*

($%.1'.%&'(

%@,%/$$+

,3%5,2%&)*

&E6%T1,%.D

*%

&*(>/4+2'./*

($%+$,%2*

+41)P%.1,%$'-

,%'2,'%'(3%5*D,26%%

\%!"#%&*2,$]%

:%*+.0*>0*23,2%

K;%/(0*23,2%

G($.2+&./*(%/$$+

,]%^%5,2%&)*

&E%

:%5,2%&)*

&E%

N"#%5,2%&*

2,]%^0D/3,%FF_%

K[0D/3,%

L:%&'&1

,%$/`,]%^%HB%

^%HB%

F/(4),0$.2,'-

]%"!#$%!&'(

&)!!

*!#$%!&'(

&)!

N,&.*

2%.12*+415+.]%

+!#$%!&'(

&)!

,-.!#$%!&'(

&)!

!"#$%!&

"!#$%&'(&')*+)!,-.!/0

&')*+)!1

23!4'567)/-'0"!*+-/8

0/08!

%9+!6)'4+--'

)!(')!/04)+7

-+*!%9)'$896$%!47

0!)+-$

:%!/0!;!%9+!6+7<!

-/08:+&-%)+7

5!6+)('

)5704+=!>

$%!?@A!%9

+!6+7<!,+4%'

)!%9)'$896$%!

B/%9!)'$89:C!%9

+!-75+!7)+7!70*!6'B+).!D

9/-!*/((+)+0

4+!/-!E@A!/0

!FG#2H=!-/0

4+!%9+!B/*+!I23!-$66')%-!($

-+*!5$:%/6:C&7

**!>$%!HHJ!

*'+-0K%.!D

9+-+!/0

&')*+)!4'

)+-!7)+!0

'%!G7))7>++=!>

$%!7)+!-/5

/:7).!

T1,%.,$.%3

,$/4(%/(%T'@),%K%/$%(

*.%/3,(./&')%.*

%L'22'@

,,6%T*%52*C/3,%

'%-*2,%3

/2,&.%&*-5'2/$*

(?%.1,%/(

0*23,2%&*

2,%.,$.%3,$/4

(%+$,$%.1

,%$'-

,%52*&,$$%'(

3%&)*&E%2'.,%'$%.1

,%*+.0*>0*23,2%&*

2,$%'(3%/(&)+3,$%

(*%>/A,3%>+(&./*(%42'51/&$%

)*4/&6%

T1/$%&*-5'2/$*

(%-*./C'.,$%

3,$/4

(%3,&/$/*

($%>*2%L'22'@

,,%$/(&,%/.%$1

*D$%.1'.%'%D

/3,%N"#%D/.1%

'%$/-5),%/(

0*23,2%&*

2,%'))*D$%!"#$%.*%2,'&1

%'%32'-

'./&'))P%1/41,2%

&*-5+.'./*

(')%3,($/.P%>*2%5'2')),)%'5

5)/&'./*

($6%

F,&./*

($%Z6K%.*%Z6<%@,)*D%3,$&2/@

,%.1,%E,P%>,'.+

2,$%*>%.1,%L'22'@

,,%'2&1

/.,&.+2,]%

.1,%!"#%&*2,?%

.1,%$&')'2%

+(/.%'(3%&'&1

,%&*(.2*)%

/($.2+&./*($?%.1

,%C,&.*

2%52*&,$$*

2?%.1,%/(.,25

2*&,$$*

2%2/(4%(,.D*2E?%

'(3%.1,%&1

*/&,$%>*

2%D1'.%/$%/-

5),-,(.,3%/(%>/A,3%>+(&./*(%)*4/&6%

!"2#$%&&%'((#34&(#%5*#3%-.(6#

J/4+2,%Z

%$1*D$%'%$&1

,-'./&%*

>%'%$/(4),%L

'22'@,,%!

"#%&*2,?%5

)+$%

/.$%&*((,&./*

(%.*%.1,%*(03/,%/(

.,2&*((,&.%(

,.D*2E%'(3%.1,%&*

2,U$%)*&')%$+

@$,.%*

>%.1,%L:%&'&1

,6%T1,%/($.2+&./*(%3,&*3,2%$+

55*2.$%.1

,%$.'(

3'23%",(./+-%52*&,$$*

2%Aa[%/($.2+&./*(%$,.?%D

/.1%.1,%'3

3/./*(%

*>%(,D%/($.2+&./*($%.1'.%'2,%3

,$&2/@,3%/(%F,&./*

($%Z6:%'(3%Z6Z6%T*%

$/-5)/>P%.1,%3,$/4

(%.1,%$&')'2%

'(3%C,&.*

2%+(/.$%+$,%$,5'2'.,%

2,4/$.,2%$,.$6%I

'.'%.2'($>,22,3

%@,.D,,(%.1,-%/$%D

2/..,(%.*%-,-*2P%

'(3%.1,(%2,'3

%@'&E%/(%>2*-%.1,%LK%&'&1

,6%

L'22'@

,,U$%LK%&'&1

,%'))*D$%)*D0)'.,(

&P%'&&,$$,$%

.*%&'&1

,%-,-*2P%/(.*%.1,%$&')'2%'(

3%C,&.*

2%+(/.$6%T

*4,.1,2%D

/.1%L'22'@

,,U$%)*'30*5%N"#%/($.2+&./*($?%.1

/$%-,'($%.1

'.%.1,%LK%&'&1

,%&'(%@,%

.2,'.,3%$*-,D1'.%)/E

,%'(%,A.,(3,3%2,4

/$.,2%>/),6%T1/$%$/4

(/>/&'(

.)P%

/-52*C,$%.1

,%5,2>*

2-'(&,%*

>%-'(P%')4*2/.1-$?%,$5

,&/'))P%D/.1%.1,%

&'&1,%&*(.2*)%/($.2+&./*($%3,$&2/@

,3%F,&./*

(%Z6:6%T1,%$/(4),0

.12,'3

,3%",(./+-%52*&,$$*

2%52*C/3,3%'(%a9B%G&'&1

,%'(3%a9B%

I&'&1

,6%b,%$5

,&/>P%'%Z

:9B%G&'&1

,%'(3%Z:9B%I&'&1

,%.*%$+55*2.%

>*+2%,A,&+./*(%.12,'3

$%5,2%!

"#%&*2,6%

%

'()*+%!L

"!G7))7>++!1

23!4')+!7

0*!7--'4/7%+*!-C-%+5

!>:'4<-"!%9

+!123!/-!*

+)/,+*!()'5!%9+!2+0%/$5!6)'4+--'

)!/0&')*+)!*

+-/80=!6:$-!

ME&>/%!/0

-%)$4%/'0-=!5

$:%/&%9

)+7*/08!70*!7!B/*+!I23.!J749!4')+!

97-!(7-%!744+--!%'

!/%-!?NMOP!:'47:!-$>-+%!'

(!7!4'9+)+0

%!?0*!:+,+:!

4749+.!GQ!4749+!-/R+-!7

)+!L?OP!(')!S47

49+!70*!L?OP!(')!T4749+.!

U/08!0+%B')<!7

44+--+-!67--!%9

)'$89!%9+!G?!4749+!(')!4'

9+)+0

4C.!

L'22'@

,,U$%4)*@')%:(3%),C,)%ML:O%&'&1

,%/$%3/C/3,3%/(.*%$,5'2'.,%

)*&')%

$+@$,.$?%

*(,%5,2%!"#%&*2,6%_'&1%!"#%1'$%'%>'$.%

3/2,&.%

'&&,$$%5'.1%.*%/.$%*

D(%)*&')%$+

@$,.%*

>%.1,%L:%&'&1

,6%I'.'%2,'3

%@P%'%

!"#%&*2,%/$%$.*

2,3%/(%/.$%L

:%&'&1

,%$+@$,.%'(

3%&'(

%@,%'&&,$$,3

%c+/&E)P?%/(%5'2')),)%D

/.1%*.1,2%!

"#$%'&&,$$/(

4%.1,/2%*

D(%)*&')%L

:%

&'&1,%$+

@$,.$6%I

'.'%D2/..,(

%@P%'%!

"#%&*2,%/$%$.*

2,3%/(%/.$%*

D(%L:%

&'&1,%$+

@$,.%'(

3%/$%>)+

$1,3%>2*-%*.1,2%$+

@$,.$?%/>%(

,&,$$'2P6%T1,%

2/(4%(,.D*2E%,($+2,$%&*

1,2,(

&P%>*2%$1

'2,3%3'.'?%'$%3

,$&2/@,3%/(%

F,&./*

(%Z6^6%b

,%$5,&/>P

%:<[9B%>*2%,'&1

%L:%&'&1

,%$+@$,.6%T

1/$%

$+55*2.$%)'24

,%./),%$/`,$%>*2%$*

>.D'2,%2,(

3,2/(

4?%'$%3

,$&2/@,3%/(%

F,&./*

(%^6K6%

!"7#8-%9%&#:5/0#%5*#3%-.(#3450&49#;560&1-0/456#

L'22'@

,,U$%$&')'2%5/5,)/(,%/$%3

,2/C,3%>2*-%.1,%3+')0/$$+

,%",(./+-%

52*&,$$*

2?%D1/&1%+$,$%

'%$1*2.?%/(,A5,($/C,%,A,&+./*(%5/5,)/(,6%

L'22'@

,,%52*C/3,$%-

*3,2(%'33/./*($%$+

&1%'$%-

+)./0.1

2,'3/(4?%[^0

@/.%,A

.,($/*($?%'(

3%$*51/$./&'.,3

%52,>,.&1

/(46%T1,%&*

2,$%$+55*2.%

.1,%>+))%"

,(./+-%52*&,$$*

2%Aa[%/($.2+&./*(%$,.%

$*%.1,P%&'(%2+(%

,A/$./(

4%&*3,%/(&)+3/(4%*5,2'./(

4%$P$.,-

%E,2(,)$%'(

3%'55)/&'./*

($6%

L'22'@

,,%'33$%(,D%$&')'2%/(

$.2+&./*($%$+

&1%'$%@

/.%&*+(.%'(

3%@/.%

$&'(?%D1/&1%>/(3$%.1,%(,A.%$,.%@

/.%D/.1/(%'%2,4

/$.,26%%

L'22'@

,,%')$*

%'33$%(,D%/($.2+&./*($%'(3%/($.2+&./*(%-*3,$%>*2%

,A5)/&/.%&'&1

,%&*(.2*)6%_A'-5),$%/(

&)+3,%/(

$.2+&./*($%.*%52,>,.&1

%3'.'%/(

.*%.1,%LK%*2%L:%&'&1

,$%'(3%/($.2+&./*(%-*3,$%.*

%2,3+&,%.1

,%52/*2/.P%*>%'%&'&1

,%)/(,6%J*2%,A'-5),?%$.2,'-

/(4%3'.'%

.P5/&'))P

%$D,,5$%,A

/$./(4%3'.'%*

+.%*>%'%&'&1

,6%L'22'@

,,%/$%'@),%.*

%-'2E%,'&1

%$.2,'-

/(4%&'&1

,%)/(,%>*

2%,'2)P%,C/&./*

(%'>.,2%/.%/$%'&&,$$,3

6%T1,$,%

&'&1,%&*(.2*)%/($.2+&./*($%')$*

%'))*D%.1,%L:%&'&1

,%.*%@,%+$,3%

$/-/)'2)P

%.*%'%$&2'.&1

5'3%-,-*2P?%D1/),%2,-

'/(/(4%>+))P%&*1,2,(

.6%%

b/.1/(%'%$/(

4),%&*

2,?%$P(&12*(/`/(4%'&&,$$%.*

%$1'2,3

%-,-*2P%@P%

-+)./5),%.1

2,'3$%/$%/(

,A5,($/C,6%T

1,%.12,'3

$%*(%'%$/(

4),%&*

2,%$1'2,%

.1,%$'-

,%)*&')%

LK%&'&1

,?%$*%'%$/(4),%'.*-/&%$,-

'51*2,%2,'3

%D/.1/(%.1,%LK%&'&1

,%/$%$+>>/&/,(

.6%FP(&12*(/`/(4%'&&,$$%@

,.D,,(%

Larra

bee: A

Many-C

ore

x86 A

rchite

ctu

re fo

r Vis

ual C

om

putin

g • 1

8:3

AC

M T

ransactio

ns o

n G

raphics, V

ol. 2

7, N

o. 3

, Article 1

8, P

ublicatio

n d

ate: August 2

008.

16-lane Vector Unit takes 2/3 of CPU core chip area.

Handles the graphics “heavy lifting”.

x86-ISA Scalar Unit

based on the 1992 original

Pentium.

In-order static

pipeline, dual issue.

52

Page 53: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Vector Unit specializations ...

!

!"#$%&#'()*

+',(%,(!*+'('-

&'.,%/'0(,%.

)'(%$(+'1"%+',(%.

$'+2&+*)',,*

+(#*)3,4(56%,(%,(7(8

'##(3.*8.(&+*9#'!(%.(!"#$%2&

+*)',,*

+(:',%;

.4((

<"#$%2%,,"

'(=>?()*+',(

*@$'.(#*,'(

&'+@*

+!7.)'(

:"'($*($6'(

:%@@%)"

#$A(*@(@%.:%.;(%.,$+")$%*.,($67$()7.('-')"$'($*;'$6'+4(

B7++79

''C,(:"7#2%,,"

'(:')*:'+(6

7,(7(6%;6(!"#$%2%,,"

'(+7$'(%.()*:'(

$67$(8

'C/'($',$':

4(56'(&7%+%.

;(+"#',(@*

+($6'(&+%!7+A(7.:(,')*

.:7+A(

%.,$+")$%*.(&%&',(7+'(

:'$'+!

%.%,$%)0(

86%)6(7##*8,()*!&%#'+,(

$*(

&'+@*

+!(*@@#%.

'(7.7#A,%,(8

%$6(7(8

%:'+(,)*

&'($67.(7(+"

.$%!'(*"$2*@2

*+:'+(%.,$+")$%*.(&%)3'+()7.4(D##(%.,$+")$%*.,()7.(%,,"

'(*.($6'(

&+%!7+A(&%&'#%.'0(86%)6(!%.%!%E',(

$6'()*!9%.7$*+%7#(

&+*9#'!,(

@*+(7()*

!&%#'+4(5

6'(,')*

.:7+A(&%&'#%.'()7.

('-')"$'(7(#7+;

'(,"9,'$(

*@($6'(,)7#7+(-

FG(%.,$+")$%*.(,'$0(%.

)#":%.;(#*7:,0(,$*

+',0(,%!&#'(

DB?(*&'+7$%*

.,0(9+7.)6',0()7)6

'(!7.%&"#7$%*

.(%.,$+")$%*.,0(7.

:(

/')$*

+(,$*+',4(H

')7",'($6

'(,')*.:7+A(&%&'#%.'(%,(+'#7$%/

'#A(,!

7##(7.:()6'7&0($6'(7+'7(7.

:(&*8'+(8

7,$':(9A(@7%#%.

;($*(:"7#2%,,"

'(*.(

'/'+A()A)#'(

%,(,!7##4(

I.(*"+(7.7#A,%,0(

%$(%,(+'#7$%/

'#A('7,A

(@*+(

)*!&%#'+,($*

(,)6':"#'(:"7#2%,,"

'(%.,$+")$%*.,4(

J%.7##A0(B7++79

''(,"&&*+$,(@*

"+($6+'7:

,(*@('-')"$%*.0(8%$6(,'&

7+7$'(+';%,$'+(,'$,(&

'+($6+'7:

4(K8%$)6%.;($6+'7:

,()*/'+,()7,',(8

6'+'($6

'()*!&%#'+(

%,(".79#'($*(,)6':"#'()*:'(8%$6*"$(,$7##,4(

K8%$)6%.;(

$6+'7:

,(7#,*()*/'+,(&

7+$(*@($6'(#7$'.

)A($*(#*7:(@+*!($6'(BL()7)6

'($*($6'(BM()7)6

'0(@*+($6*,'()7,',(8

6'.(:7$7()7.

.*$(9'(&+'@'$)6

':(

%.$*($6'(BM()7)6

'(%.(7:/7.)'4(=

7)6'(",'(%,(!

*+'('@@')$%/

'(86'.(

!"#$%&#'($6

+'7:,(+"

..%.;(*.($6'(,7!

'()*+'("

,'($6'(,7!

'(:7$7,'$0(

'4;4(+'.

:'+%.

;($+%7.

;#',($*

($6'(,7!

'($%#'4((

!"!#$%&'()#*)(&%++,-.#/-,'#

B7++79

''(;7%.,(%$,()*

!&"$7$%*

.7#(:'.,%$A(@+*!($6'(MG28%:'(/')$*

+(&+*)',,%.

;(".%$(NO

>?P0(8

6%)6('-')"$',(%.

$';'+0(,%.

;#'2&

+')%,%*.(

@#*7$0(7.:(:*"9#'2&+')%,%*

.(@#*7$(%.,$+")$%*.,4(56'(O>?(7.:(%$,(

+';%,$'+,(7+'(7&

&+*-%!7$'#A

(*.'($6%+:($6'(7+'7(*

@($6'(=>?()*+'(9"$(

&+*/%:'(!*,$(*@($6'(%.$';'+(7.

:(@#*7$%.;(&*%.$(&'+@*

+!7.)'4(J

%;"+'(

Q(,6*8,(7(9

#*)3(:%7;+7!(*@($6'(O>?(8%$6($6'(BM()7)6

'4((

(

!"#$%&!'"!#$%&'

(!)*+&!,-'%.!/+01(02"!&3$!#45!6)77'(&6!

89

'7$(0*/!+*6&()

%&+'*6:!;&!6)

77'(&6!6<

+==-+*1!&3$!($1

+6&$(!+*7)&6!0*/!

*)2$(+%!%'

*>$(6+'

*!0*/!($7

-+%0&+'*!'*!&3$!2

$2'(?!+*

7)&:!@

06.!

($1+6&$(6!0

--'<!7($/+%0&+*1!&3$!($6)

-&+*1!>$%&'

(!<(+&$6:!

R'()6*,'(7(MG28%:'(O>?(7,(7($+7:

'*@@(9'$8''.(%.)+'7,':

()*!&"$7$%*

.7#(:'.,%$A(7.:($6'(:%@@%)"

#$A(*@(*9$7%.%.;(6%;6(

"$%#%E7$%*

.(@*+(8%:'+(O>?,4(S7+#A(7.7#A,%,(

,";;',$':

(FFT(

"$%#%E7$%*

.(@*+($A&%)7#(&

%-'#(,6

7:'+(8

*+3#*7:,(%@(M

G(#7.

',(&+*)',,(

MG(,'&

7+7$'(&%-'#,(*

.'()*

!&*.'.$(7$(7($%!

'0($67$(%,0(8

%$6(,'&

7+7$'(%.,$+")$%*.,($*

(&+*)',,(+':

0(;+''.

0('$)40(@*+(MG(&%-'#,(7$(7($%!

'0(%.,$'7:

(*@(&+*)',,%.

;(!"#$%&#'()*

#*+()6

7..'#,(7$(*

.)'4(5

6'(U/%:%7(

V'J*+)'(

F(*&'+7$',(

%.(7(,%!%#7+(

@7,6%*.0(*+;7.%E%.;(%$,(,)7#7+(

KI<W(&+*)',,*

+,(%.(;+*"&,(*@(XL($67$('-')"$'($6'(,7!

'(%.,$+")$%*.(YU%)3*##,('$(7#4(L

ZZF[4(5

6'(!

7%.(:%@@'+'.

)'(%,($67$(%.

(B7++79

''($6'(#**&()*.$+*#0()7)6

'(!7.7;'!'.$0(7.:(*$6'+(,")6(

*&'+7$%*

.,(7+'()*

:'($67$(+"

.,(%.(&7+7##'#(8

%$6($6'(O>?0(%.,$'7:

(*@(

9'%.;(%!&#'!'.$':(7,(@%-

':(@".)$%*.(#*;%)4(

B7++79

''(O>?(%.,$+")$%*.,(7##*

8("&($*($6+''(,*

"+)'(*

&'+7.

:,0(*.'(

*@(86%)6()7.

()*!'(:%+')$#A

(@+*!($6'(BM()7)6

'4(I@($6'(:7$7(6

7,(

9''.(&+'@'$)6

':(%.$*($6'()7)6

'0(7,(:',)+%9

':(%.(K')$%*

.(X4L0($6'.(

$6'(BM()7)6

'(%,(%.('@@')$(7.

('-$'.:':(+';

%,$'+(@%#'4(F29%$(".*+!0(F2

9%$("%.$0(MG29%$(,%.

$(7.:(MG29%$(@#*

7$(:7$7()7.

(9'(+'7:

(@+*!($6'(

)7)6'(7.

:()*./'+$':

($*(XL29%$(@#*

7$,(*+(XL29%$(%.

$';'+,(8

%$6(.*(#*,,(

*@(&'+@*

+!7.)'4(5

6%,(,%;

.%@%)7.

$#A(%.)+'7,',($6

'(7!*".$(*@(:7$7(

$67$()7.

(9'(,$*

+':(%.($6'()7)6

',(7.:(7#,*

(+':")',($6

'(.'':(@*+(

,'&7+7$'(:

7$7()*./'+,%*

.(%.,$+")$%*.,4(

56'(.'-$(,$7;

'(%,($*(7#%;

.($6'(:7$7(@+*

!(+';%,$'+,(7.

:(!'!*+A(8%$6(

$6'(&+*)',,%.

;(#7.',(%.

($6'(O>?4(\';%,$'+(:

7$7()7.(9'(,8

%EE#':(%.(

7(/7+%'$A

(*@(87A,0('4;

4($*(,"&&*+$(!

7$+%-(!"#$%&#%)7$%*

.4(W7$7(@+*

!(

!'!*+A()7.(9'(+'&#%)7$':

(7)+*

,,($6'(O>?(#7.',4(

56%,(%,(7(

)*!!*.(*&'+7$%*

.(%.(9*$6(;+7&6%),(7.

:(.*.2;+7&6%),(&

7+7##'#(:7$7(

&+*)',,%.

;0(86%)6(,%;.%@%)7.

$#A(%.)+'7,',($6

'()7)6'('@@%)%'.

)A4(

56'(O>?(,"&&*+$,(7(8

%:'(/7+%'$A

(*@(%.,$+")$%*.,(*.(9*$6(%.$';'+(

7.:(@#*7$%.;(&*%.$(:7$7(

$A&',4(

56'(%.,$+")$%*.(,'$(

&+*/%:',($6'(

,$7.:7+:(7+%$6

!'$%)(*

&'+7$%*

.,0(%.

)#":%.;(@",':(!"#$%&#A27::0(7.:(

$6'(,$7.

:7+:(#*;%)7#(*

&'+7$%*

.,0(%.

)#":%.;(%.,$+")$%*.,($*

('-$+7)$(

.*.29A$'27#%;

.':(@%'#:

,(@+*!(&%-'#,4(

56','(

7+'(7##(

#*7:2*&(

%.,$+")$%*.,0(8

6%)6(+'7:

(@+*!(+';

%,$'+,(*+(!'!*+A(7.:(8+%$'($6

'(+',"

#$($*(7(/

')$*+(+';

%,$'+4(D::%$%*.7#(#*

7:(7.:(,$*+'(%.

,$+")$%*.,(

,"&&*+$(7(8%:'+(/7+%'$A

(*@()*./'+,%*

.,(9'$8''.(@#*7$%.;(&*%.$(

/7#"',(7.

:($6'(#',,()*

!!*.(*+(!*+'()*

!&#'-(:7$7(@*

+!7$,(@*

".:(

*.(!*,$(V

>?,4(?

,%.;(,'&

7+7$'(%.,$+")$%*.,(@*+($6','(@*

+!7$,(,7/

',(,%;.%@%)7.

$(7+'7(7.:(&*8'+(7$(7(,!

7##(&'+@*

+!7.)'()*

,$4(

56'(O>?(%.,$+")$%*.(,'$(7#,*

(%.)#":',(;

7$6'+(7.

:(,)7$$'+(,"

&&*+$0(

$67$(%,0(#*

7:,(7.

:(,$*+',(@+*

!(.*.2)*.$%;"*",(7:

:+',,',4(I.

,$'7:(*@(

#*7:%.;(7(M

G28%:'(/')$*

+(@+*!(7(,%.

;#'(7:

:+',,0(M

G('#'!

'.$,(7+'(

#*7:':(@+*!(*+(,$*+':($*("&($*(MG(:%@@'+'.

$(7::+',,',(

$67$(7+'(

,&')%@%':

(%.(7.*$6'+(/')$*

+(+';%,$'+4(

56%,(7##*8,(MG(,67:'+(

%.,$7.

)',($*(9'(+".(%.(&7+7##'#0(

'7)6(*@(86%)6(7&&'7+,(

$*(+".(

,'+%7##A0('/'.(86'.(&'+@*

+!%.;(7++7A

(7))',,',(

8%$6()*!&"$':(

%.:%)',4(5

6'(,&

'':(*@(;7$6'+],)7$$'+(%,(#%!

%$':(9A($6'()7)6

'0(86%)6(

$A&%)7##A

(*.#A(7))',,',(*

.'()7)6

'(#%.'(&'+()A

)#'4(^*8'/'+0(!

7.A(

8*+3#*7:,(67/'(6%;6#A()*6'+'.

$(7))',,(

&7$$'+.

,0(7.:($6'+'@*

+'($73'(!")6(#',,($6

7.(MG()A)#',($*

('-')"$'4((

J%.7##A0(B7++79

''(O>?(%.,$+")$%*.,()7.

(9'(&+':%)7$':

(9A(7(!

7,3(

+';%,$'+0(

86%)6(67,(*

.'(9%$(&'+(/

')$*+(#7.

'4(56'(!7,3()*.$+*#,(

86%)6(&7+$,(*

@(7(/')$*

+(+';%,$'+(*

+(!'!*+A(#*)7$%*

.(7+'(8

+%$$'.(

7.:(86%)6(7+'(#'@$("

.$*")6':4(J*+('-

7!&#'0(7(,)7#7+(%@2$6

'.2'#,'(

)*.$+*#(,$+")$"+'()7.(9'(!7&&':(*.$*($6'(O>?(9A(",%.;(7.(

%.,$+")$%*.($*(,'$(7(!

7,3(+';

%,$'+(97,':

(*.(7()*

!&7+%,*

.0(7.:($6'.(

'-')"$%.;(9*$6(%@(7.

:('#,'()#7"

,',(8%$6(*&&*,%$'(&

*#7+%$%',(*

@($6'(

!7,3(+';

%,$'+()*.$+*##%.;(86'$6'+($*

(8+%$'(+',"

#$,4(=#7",',()7.

(9'(

,3%&&':('.$%+'#A

(%@($6'(!

7,3(+';

%,$'+(%,(7##(E'+*,(*+(7##(*

.',4(5

6%,(

+':")',(9

+7.)6(!%,&+':%)$%*

.(&'.7#$%',(@*

+(,!7##()#7"

,',(7.:(;%/',(

$6'()*

!&%#'+C,(%.

,$+")$%*.(,)6

':"#'+(;

+'7$'+(@+'':*!4((

56'(O>?(7#,*

(",',(

$6','(

!7,3,(@*+(&7)3':(#*7:(7.:(,$*+'(

%.,$+")$%*.,0(86%)6(7))',,(

'.79#':('#'!

'.$,(@+*!(,'1"'.$%7#(

#*)7$%*

.,(%.(!'!*+A4(56%,('.79#',(

$6'(&+*;+7!!'+($*(9".:#'(

,&7+,'(,$+7.

:,(*@('-

')"$%*.(,7$%,@A

%.;()*!&#'-(9+7.)6()*.:%$%*.,(

%.$*(7(@*

+!7$(!

*+'('@@%)%'.

$(@*+(/')$*

+()*!&"$7$%*

.4((

!"0#1-'%)2*)(&%++()#3,-.#4%'5()6#

B7++79

''(",',(7(9

%2:%+')$%*

.7#(+%.

;(.'$8*+3($*(7##*

8(7;'.$,(,"

)6(

7 ,(=>?()*+',0(B

L()7)6

',(7.:(*$6'+(#*

;%)(9

#*)3,($*()*!!".%)7$'(

8%$6('7)6

(*$6'+(8

%$6%.($6'()6

%&4(R

6'.(,)7#%.

;($*(!*+'($6

7.(MG(

)*+',0(8

'(",'(!

"#$%&#'(,6

*+$(#%.

3':(+%.;,4((

S7)6(+%.;(:7$72&

7$6(%,(_

ML29%$,(8

%:'(&'+(:

%+')$%*.4(D##($6

'(+*"$%.;(

:')%,%*

.,(7+'(!

7:'(9'@*+'(%.

`')$%.;(!',,7;

',(%.$*($6'(.'$8*+34(

J*+('-

7!&#'0('7)6

(7;'.$()7.

(7))'&$(7(!

',,7;'(@+*

!(*.'(:%+')$%*

.(

18:4

• L. S

eile

r et a

l.

AC

M T

ransactio

ns o

n G

raph

ics, Vo

l. 27

, No

. 3, A

rticle 18

, Pu

blicatio

n d

ate: Au

gu

st 20

08

.

Load/store instruction variants may specify type conversion.

This supports a toll-free way store low resolution data in packed form, while letting the CPU compute on the data in a high-resolution format.

53

Page 54: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Can two EECS students prototype a GPU in one semester?

54

Page 55: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

-Xilinx Virtex-5 ML505-110T-Vertex Shaders : Red-Pixel Shaders: Yellow-Rasterizer: Cyan-DDR2 Interface: Green

60% slice usage

55

Page 56: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture

Fall 2008 CS 194-6: Pure project course.

Project teams: 1-to-3 person groups. This Friday: setting project topics. 10-noon, 125 Cory (not 119 Cory!).

No exams.

Mondays: Lectures teach topics useful for the project

No homework.

Fridays: Project

check-ups. 56

Page 57: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture

CS 194-6: Project red-letter days.

Grade is based on revised versions of presentation slides (email PDFs to Greg and John after presentation).

Specification Design ReviewMon, Sept 29

Implementation Design Review

Mon, Nov 3

Final Presentation

Fri, Dec 5

57

Page 58: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

YONG-JIN KWONCHEN SUN

PEGASUSPipeline Engineered Graphics Accelerator Supporting Unified

Shaders

58

Page 59: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

Top level Block Diagram

59

Page 60: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

3D API

<Constants>

Transform Matrix (4x4)

Inverse Perspective Matrix (4x4)

Transposed Rotation Matrix (4x4)

</Constant>

<Lights>

0, Vertex Index1

1, Vertex Index2

</Lights>

<Vertex>

0: x y z r g b nx ny nz

1: x y z r g b nx ny nz

2: x y z r g b nx ny nz

</Vertex>

<Triangle>

0: V1 V2 V3

1: V3 V4 V5

</Triangle>

Bit-widths:

Matrix – 256b

Element: 16 bits

Lights – 32b

Light Index: 16 bits

Vertex Index: 16 bits

Vertices – 115b

Vertex: 16 bits

Index: 16 bits

Color: 8 bits

Normal: 9 bits

Triangles – 64b

Triangle Index: 16 bits

Vertex Index: 16 bits

60

Page 61: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

Java Framework

61

Page 62: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

Java vs Hardware Comparison

� Java simulation on top

� Hardware Render on Bottom

� No lights yet� Ignore the

Black Vs. Gray background

62

Page 63: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

More Java vs Hardware

-Java on the left, Actual render on the right-No hardware results yet for pixel shaded teapot

-Hardware fully functional (buggy shader code)-One thing to note: 500+ vertices and 900+ triangles

63

Page 64: EECS 150 Components & Design Techniques For Digital Systems …cs150/sp09/Lecture/lec28-gpu.pdf · A cube whose faces are made up of triangles. This is a 3-D model of a cube -- model

UC Regents Spring 2009 © UCBEECS 150 L28: Graphics Processors

Next Tuesday’s Lecture:

Highly recommended !

64