chip multi-threading keeps the data center cool · •chip multithreading does keep the data center...

24
David Greenhill Distinguished Engineer Sun Microsystems Inc [email protected] Chip Multi-Threading Keeps the Data Center Cool

Upload: others

Post on 09-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

David GreenhillDistinguished Engineer Sun Microsystems [email protected]

Chip Multi-Threading Keepsthe Data Center Cool

Page 2: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 2

Introduction• Market forces are driving the following

> More performance> More threaded workloads> Power limited designs

• Technology is driving> Higher power each generation> Less performance from frequency gains> Good scaling of I/O bandwidth through SERDES

➔ This is exploited by Sun Chip Multi-threaded (CMT) Systems

Page 3: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 3

NoParallelism CC MM CC MM CC MM

ComputeInstruction

LevelParallelism

CC MM CC MM CC MM

Time

Time SavedThreadLevel

Parallelism

CC MMCC MM

CC MM

● ILP Offers Limited Headroom● TLP Provides Greater Performance Efficiency

Comparing Modern CPU Design Techniques

Memory

Page 4: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 4

Core 1

Memory Latency Compute

CMT – Multithreaded Cores

Thread 4Thread 3Thread 2Thread 1

Core 2Thread 4Thread 3Thread 2Thread 1

Core 3Thread 4Thread 3Thread 2Thread 1

Thread 4Thread 3Thread 2Thread 1

Core 4

Core 5Thread 4Thread 3Thread 2Thread 1

Core 6Thread 4Thread 3Thread 2Thread 1

Core 7Thread 4Thread 3Thread 2Thread 1

Core 8Thread 4Thread 3Thread 2Thread 1

Time

Page 5: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 5

Features:• 8 64-bit Multithreaded SPARC Cores• Shared 3MB L2 Cache• 16KB I-Cache per Core• 8KB D-Cache per Core• 4 144-bit DDR2 channels• 3.2 GB/sec JBUS I/O

Technology:• TI's 90nm CMOS Process• 9LM Cu Interconnect• 63 Watts @ 1.2GHz/1.2V• Die Size: 378mm2

• 279M Transistors• Flip-chip ceramic LGA

SPARC

Core 1

SPARC

Core 3

SPARC

Core 5

SPARC

Core 7

DD

R2_

0D

DR

2_1

DD

R2_

2D

DR

2_3

L2 Data

Bank 0

L2 Data

Bank 1

L2 Data

Bank 3

L2 Data

Bank 2L2Tag

Bank 0

L2Tag

Bank 2

L2 Buff

Bank 0L2 Buff

Bank 1

CLK /

Test

Unit

DRAM

Ctl 0,2

DRAM

Ctl 1,3

CROSSBAR

JBUS

IO

BridgeL2 Buff

Bank 2L2 Buff

Bank 3

FPU

SPARC

Core 0

SPARC

Core 2

SPARC

Core4

SPARC

Core 6

L2Tag

Bank 1

L2Tag

Bank 3

Niagara Micrograph and Overview

Page 6: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 6

CMOS Power

(Ignoring 2nd order terms)

Power = aCV2F + Leakage (P,V,T)

a = activity factorC = capacitance of nodesV = voltageF = chip frequency

Page 7: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 7

• Fully static design • Fine granularity clock gating

for datapaths (30% flops disabled)

• Lower 1.5 P/N width ratio for library cells

• Interconnect wire classes optimized for power x delay

• SRAM activation control

SPARC CoresLeakage

L2DataL2Tag UnitL2 Buffer Unit

Crossbar

Floating PointMisc Units

Wires & RptrsGlobal Clock

IOs

63W @ 1.2GHz / 1.2V< 2 Watts / Thread

Cores(26%)

Leakage(25%)

IOs(11%)

L2Cache(12%)

Wires &Rptrs(17%)Xbar

(6%)

“Niagara” T1 Chip Power

Page 8: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 8

84 86 87 88 90 91 93 94 95 97 98 99 01 02 04 051.001.251.501.752.002.252.502.753.003.253.503.754.004.254.504.755.00

Personal View of Processor Designs

Voltage

Year

Volta

ge

InmosTransputer

UltraSparc

Niagara

Page 9: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 9

84 86 87 88 90 91 93 94 95 97 98 99 01 02 04 051.001.251.501.752.002.252.502.753.003.253.503.754.004.254.504.755.00

Personal View of Processor Designs

Voltage

Year

Volta

ge

Trend of Mid 1990's not sustainable

Page 10: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 10

84 86 87 88 90 91 93 94 95 97 98 99 01 02 04 051.001.251.501.752.002.252.502.753.003.253.503.754.004.254.504.755.00

102030405060708090100110120130140150

Personal View of Processor Designs

Voltage Power

Year

Volta

ge

ChipMultithreadingEffect

InmosTransputer

Ultrasparc

Niagara

Page 11: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 11

Niagara low power • Niagara low power style

> No speculation> No out of order> No complex branch

prediction> No predication

> Short pipeline> Moderate clock frequency> Static CMOS design> Threading to cover

memory latency

• Typical competitor> Lots of speculation

Out of order etc> Wide issue> Deep pipelines> High frequency> Lots of dynamic circuits> Long stall when memory

is accessed

Page 12: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 12

CoolThreadsTM Advantages

59oC 107oC

66oC59oC

66oC59oC59oC

59oC

• Improved reliability with lower and more uniform junction temperatures– Increased lifetime – Total failure rate reduced by

~8X (vs 105oC)• Optimized performance/

reliability trade-off– Frequency guardbands due to

CHC, NBTI, etc. reduced by > 55%

– Reduced design margins (EM/NBTI)

– Less variation across die

Page 13: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 13

Data Center Constraints• Many data centers are maxed out

> Some constrained by cooling limits> Others by electrical substations

• Getting new buildings & equipment is expensive• In some locations e.g. financial centers its

impossible• Performance of the data center is constrained by

performance/watt of the servers

Page 14: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 14

Niagara-2

• Double throughput versus UltraSparc T1> Maintain Sparc binary compatibility

> http://opensparc.sunsource.net/nonav/index.html

• Improve throughput / watt• Improve single-thread performance• Integrate important SOC components

> Networking> Cryptography

Page 15: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 15

Niagara-2 Chip Overview • 8 Sparc cores, 8 threads each

• Shared 4MB L2, 8-banks, 16-way associative

• Four dual-channel FBDIMM memory controllers

• Two 10/1 Gb Enet ports w/onboard packet classification and filtering

• One PCI-E x8 1.0 port

• 711 signal I/O, 1831 total

Page 16: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 16

Sparc Core Block Diagram

EXU1

IFU

LSUSPU

TLU

MMU/HWTW

FGU

Gasket

xbar/L2

EXU0

• IFU – Instruction Fetch Unit> 16 KB I$, 32B lines, 8-way SA> 64-entry fully-associative ITLB

• EXU0/1 – Integer Execution Units> 4 threads share each unit> Executes one integer

instruction/cycle• LSU – Load/Store Unit

> 8KB D$, 16B lines, 4-way SA> 128-entry fully-associative

DTLB• FGU – Floating/Graphics Unit• SPU – Stream Processing Unit

> Cryptographic acceleration• TLU – Trap Logic Unit

> Updates machine state, handles exceptions and interrupts

• MMU – Memory Management Unit> Hardware tablewalk (HWTW)> 8KB, 64KB, 4MB, 256MB pages

Page 17: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 17

01/97 01/99 01/01 01/03 01/05 01/07100

1000

10000

SDram-100

Rambus-800

SDram-133

DDR-266

Rambus-1066

XDR-3200

DDR-400

DDR2-667

FBDIMM1

DDR3-1066

FBDIMM2

Dram Data Rates Versus Time

Date

Dat

a R

ate

Mb/

s

Page 18: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 18

Dram Issues• Niagara I uses DDRII – 400• Niagara II uses FBDIMM 4Gb/s

> Higher data rate is good > AMB power & cost is a problem

• Need to amortize the serialization cost across more memories> Stacking technologies> More Dram/DIMM> Other configurations of buffers to fanout to DDR DIMMs

Page 19: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 19

Niagara & Niagara II Packages

Page 20: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 20

Future packaging requirements• Challenges for future packages• Similar pin counts but:• Data rates keep increasing to match higher

processor performance• Costs getting squeezed – particularly in entry level

servers

Page 21: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 21

At the system level

E10000199732 x US277.4 ft3

2000 lbs13,456 W52,000 BTUs/hr

20051 x US T10.85 ft3

37 lbs~300 W1,364 BTUs/hr

T2000

Yesterdays Vertical System Todays Horizontal System

91x smaller54x lighter44x cooler

Page 22: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 22

T2000

Page 23: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

FDIP 2006 David Greenhil 23

Conclusion• Chip multithreading does keep the data center cool • 2 generations of the Niagara processor• Future challenges

> Very high data rate interfaces> Managing power is an on going challenge at both CPU,

Memory and System Levels

Page 24: Chip Multi-Threading Keeps the Data Center Cool · •Chip multithreading does keep the data center cool •2 generations of the Niagara processor •Future challenges >Very high

Thankyou !

David GreenhillDistinguished Engineer [email protected]