snatch: opportunistically reassigning power allocation...

Post on 20-Aug-2020

12 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Snatch: Opportunistically Reassigning Power Allocation between Processor

and Memory in 3D Stacks

Dimitrios Skarlatos, Renji Thomas, Aditya Agrawal, Shibin Qin, Robert Pilawa, Ulya Karpuzcu, Radu Teodorescu, Nam Sung Kim, and Josep Torrellas

UIUC, OSU, UMN, NVIDIA

1

Motivation: Cost of Power/Ground Pins in 3D stacks

Memory VR

PCB substrate

Processor die

C4 bumps

DRAM0 die

DRAM1 die

microbumps

Processor VR

BGA pins

TSVs

2

Motivation: Cost of Power/Ground Pins in 3D stacks

• Size & cost of packages is proportional to # of pins

Memory VR

PCB substrate

Processor die

C4 bumps

DRAM0 die

DRAM1 die

microbumps

Processor VR

BGA pins

TSVs

3

Motivation: Cost of Power/Ground Pins in 3D stacks

• Size & cost of packages is proportional to # of pins

• 3D Stacks: Disjoint Power/Ground pins for Processor and Memory

Memory VR

PCB substrate

Processor die

C4 bumps

DRAM0 die

DRAM1 die

microbumps

Processor VR

BGA pins

TSVs

4

Motivation: Cost of Power/Ground Pins in 3D stacks

• Size & cost of packages is proportional to # of pins

• 3D Stacks: Disjoint Power/Ground pins for Processor and Memory

• Each dimensioned for the worst case

Memory VR

PCB substrate

Processor die

C4 bumps

DRAM0 die

DRAM1 die

microbumps

Processor VR

BGA pins

TSVs

5

Motivation: Underutilization of Power Budget

• High Processor or Memory Power phases

6

Contribution: Snatch

• Dynamically and opportunistically divert power between processor and memory

Conventional

Processor die

Mem0 die

Mem1 die

Proc PDN

TSVs

Mem PDN

Processor VR

Memory VR

7

Contribution: Snatch

• Dynamically and opportunistically divert power between processor and memory

• On-chip voltage regulator connects the two Power Delivery Networks

• Processor or Memory can consume more power for the same # of pins

Snatch

Processor die

Mem0 die

Mem1 die

Proc PDN

TSVs

Mem PDN

Processor

VR

Memory

VR

On-chip VR

8

Impact Compared to Conventional 3D Stacks

• For same # of power/ground pins:

• Application can consume more power

• Up to 23% application speedup

• For the same maximum power in Processor and Memory

• Fewer pins, about 30% package cost reduction

9

Snatch Outline

• Implementation

• Operation

• Case 1:

• Same Max Power in Processor and Memory, reduced # of pins

• Case 2:

• Same # of pins, improved performance

• Evaluation

10

Snatch Outline

• Implementation

• Operation

• Case 1:

• Same Max Power in Processor and Memory, reduced # of pins

• Case 2:

• Same # of pins, improved performance

• Evaluation

11

Conventional Implementation

0.8-0.95V

5.5W 4.5W

1.1V

12V 12V

Memory VR

PCB substrate

Processor die

C4 bumps

DRAM0 die

DRAM1 die

microbumps

Processor VR

BGA pins

TSVs

Cross-section 12

Snatch implementation

12V 12V

Cross-section

5.5W 4.5W

0.8-0.95V

1.1V

Memory VR

PCB substrate

Processor die

C4 bumps

DRAM0 die

DRAM1 die

microbumps

Processor VR

BGA pins

TSVs

Single on-chip VRSingle On-Chip VR

13

Snatch implementation

• Small 2W on-chip bidirectional VR on Proc die

• Bulk of work from off-chip VRs

12V 12V

Cross-section

5.5W 4.5W

0.8-0.95V

1.1V

Memory VR

PCB substrate

Processor die

C4 bumps

DRAM0 die

DRAM1 die

microbumps

Processor VR

BGA pins

TSVs

Single on-chip VRSingle On-Chip VR

14

Memory VR

PCB substrate

Processor die

C4 bumps

DRAM0 die

DRAM1 die

microbumps

Processor VR

BGA pins

TSVs

Single on-chip VR

Snatch: Dynamic power reassignment

• Up/Down convert power Snatched

Single On-Chip VR

15

Memory VR

PCB substrate

Processor die

C4 bumps

DRAM0 die

DRAM1 die

microbumps

Processor VR

BGA pins

TSVs

Single on-chip VR

Snatch: Dynamic power reassignment

• Up/Down convert power Snatched

0.8-0.95V

1.1V

Single On-Chip VR

2W

16

Snatch: Cross-Section

• Small 2W on-chip bidirectional VR on Proc die

Cross-section

Memory VR

PCB substrate

Processor die

C4 bumps

DRAM0 die

DRAM1 die

microbumps

Processor VR

BGA pins

TSVs

Single on-chip VRSingle On-Chip VR

17

Snatch: Top Down

• Small 2W on-chip bidirectional VR on Proc die

0.8-0.95V

5.5W

4.5W 1.1V

Top Down

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

Memory VR

PCB substrate

Processor die

C4 bumps

DRAM0 die

DRAM1 die

microbumps

Processor VR

BGA pins

TSVs

Single on-chip VRSingle On-Chip VR

Cross-section18

Snatch Outline

• Implementation

• Operation

• Case 1:

• Same Max Power in Processor and Memory, reduced # of pins

• Case 2:

• Same # of pins, improved performance

• Evaluation

19

Snatching Memory Power

• On processor intensive phase

5.5W

4.5W

5.5W

4.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

20

Snatching Memory Power

• On processor intensive phase

• Snatch Memory Power TurboBoost Processor

7.5W

2.5W

5.5W

4.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

2W

21

Snatching Processor Power

• On memory intensive phase

• Snatch Processor Power TurboBoost Memory

3.5W

6.5W

5.5W

4.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

2W

22

Snatching Decisions

• Processor or Memory Intensive Phase?

5.5W

4.5W

5.5W

4.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

?W

23

Snatching Decisions

• Processor or Memory Intensive Phase?

• How much Power is available?

5.5W

4.5W

5.5W

4.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

?W

24

Snatching Decisions

• Processor or Memory Intensive Phase?

• How much Power is available?

• How much Power can we Snatch?5.5W

4.5W

5.5W

4.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

?W

25

Conservative Snatching Algorithm

• Keep track of past power values of 10µs epochs

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

?W

5.5W

4.5W

26

Conservative Snatching Algorithm

• Keep track of past power values of 10µs epochs

• Average for activity detection

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

?W

5.5W

4.5W

27

Conservative Snatching Algorithm

• Keep track of past power values of 10µs epochs

• Average for activity detection

• MAX for power availabilityProcessor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

?W

5.5W

4.5W

28

Conservative Snatching Algorithm

• Keep track of past power values of 10µs epochs

• Average for activity detection

• MAX for power availability

• Avoid hysteresis

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

?W

5.5W

4.5W

29

Snatch Outline

• Implementation

• Operation

• Case 1:

• Same Max Power in Processor and Memory, reduced # of pins

• Case 2:

• Same # of pins, improved performance

• Evaluation

30

Conventional Power Provisioning

• Processor provisioned for 7.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

31

Conventional Power Provisioning

• Processor provisioned for 7.5W

7.5W7.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

32

Conventional Power Provisioning

• Processor provisioned for 7.5W

• Memory provisioned for 6.5W

6.5W

6.5W

7.5W7.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

33

Conventional Power Provisioning

• Processor provisioned for 7.5W

• Memory provisioned for 6.5W

• Total = Processor + Memory = 14W

6.5W

6.5W

7.5W7.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

34

Snatch: Provisioning 3D Stacks Just Right

• Processor provisioned for 7.5W

• Memory provisioned for 6.5W

• Total = Processor + Memory = 14W

6.5W

6.5W

7.5W7.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

35

• Processor provisioned for 7.5W

• Memory provisioned for 6.5W

• Total = Processor + Memory = 14W

Snatch: Provisioning 3D Stacks Just Right

Snatch 2W

6.5W

6.5W

7.5W7.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

36

7.5W5.5W

Snatch: Provisioning 3D Stacks Just Right

5.5W +-2W

Snatch 2W

6.5W

6.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

• Processor provisioned for 7.5W - 2W = 5.5W

• Memory provisioned for 6.5W

• Total = Processor + Memory = 14W

37

6.5W

Snatch: Provisioning 3D Stacks Just Right

4.5W +-2W

7.5W5.5W 5.5W +-2W

Snatch 2W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

4.5W

• Processor provisioned for 7.5W - 2W = 5.5W

• Memory provisioned for 6.5W - 2W = 4.5W

• Total = Processor + Memory = 14W

38

Snatch: Provisioning 3D Stacks Just Right

Reduce Total Provisioning from 14W to 10W, approx same

performance

6.5W

4.5W +-2W

7.5W5.5W 5.5W +-2W

Snatch 2W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

4.5W

• Processor provisioned for 7.5W - 2W = 5.5W

• Memory provisioned for 6.5W - 2W = 4.5W

• Total = Processor + Memory = 14W - 4W = 10W

39

Snatch: Provisioning 3D Stacks Just Right

Reduce Total Provisioning from 14W to 10W, approx same

performance

30% Reduction in Package Power/Ground Pins

6.5W

4.5W +-2W

7.5W5.5W 5.5W +-2W

Snatch 2W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

4.5W

• Processor provisioned for 7.5W - 2W = 5.5W

• Memory provisioned for 6.5W - 2W = 4.5W

• Total = Processor + Memory = 14W - 4W = 10W

40

Snatch Outline

• Implementation

• Operation

• Case 1:

• Same Max Power in Processor and Memory, reduced # of pins

• Case 2:

• Same # of pins, improved performance

• Evaluation

41

Conventional Power Provisioning

• Processor & Memory provisioned for 5.5W & 4.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

42

Conventional Power Provisioning

• Processor & Memory provisioned for 5.5W & 4.5W

5.5W

4.5W

5.5W

4.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN

43

Snatch: Provide Additional Power

• Processor & Memory provisioned for 5.5W & 4.5W

• Snatch power

Snatch 2W

4.5W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN5.5W

4.5W

5.5W

44

Snatch: Opportunistically boost performance

• Processor & Memory provisioned for 5.5W & 4.5W

• Snatch power and boost performance

5.5W +-2W

4.5W +-2W

Snatch 2W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN5.5W

4.5W

45

Snatch: Boost Performance with Same # of Pins

• Processor & Memory provisioned for 5.5W & 4.5W

• Snatch power and boost performance

• Same # of pins as conventional5.5W +-2W

4.5W +-2W

Snatch 2W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN5.5W

4.5W

46

Snatch: Boost Performance with Same # of Pins

• Processor & Memory provisioned for 5.5W & 4.5W

• Snatch power and boost performance

• Same # of pins as conventional

Higher performance for the same package cost

5.5W +-2W

4.5W +-2W

Snatch 2W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN5.5W

4.5W

47

Snatch: Boost Performance with Same # of Pins

• Processor & Memory provisioned for 5.5W & 4.5W

• Snatch power and boost performance

• Same # of pins as conventional

Higher performance for the same package cost

IR-drop and EM characteristics remain the same

5.5W +-2W

4.5W +-2W

Snatch 2W

Processor

On-chip

VR

Memory

Processor

VR

Memory

VR

PDN

PDN5.5W

4.5W

48

Snatch Outline

• Implementation

• Operation

• Case 1:

• Same Max Power in Processor and Memory, reduced # of pins

• Case 2:

• Same # of pins, improved performance

• Evaluation

49

Evaluation Methodology

• Case 2: Same # of pins, improved performance

• Processor: 22nm LP 8-core w/ SESC + McPAT

• Memory: 4GB 2-layer WideIO2 w/ DRAMSim2

• Benchmarks: SPLASH-2, NAS, and SPEC

50

Performance

Sp

eed

up

0.4

0.6

0.7

0.9

1.0

1.2

1.3

Average(Splash + NAS) Average(SPEC)

Baseline Turbo Boost Snatch

51

Performance

Sp

eed

up

0.4

0.6

0.7

0.9

1.0

1.2

1.3

Average(Splash + NAS) Average(SPEC)

Baseline Turbo Boost SnatchP = {5.5W, 1.2GHz} M= {4.5W, 400MHz}

Baseline

52

Performance

Sp

eed

up

0.4

0.6

0.7

0.9

1.0

1.2

1.3

Average(Splash + NAS) Average(SPEC)

Baseline Turbo Boost Snatch

P = {5.5W, 1.2-1.5GHz} M= {4.5W, 400-900MHz} DVFS Within Power Budget

P = {5.5W, 1.2GHz} M= {4.5W, 400MHz}

Baseline

Turbo Boost

53

Performance

Sp

eed

up

0.4

0.6

0.7

0.9

1.0

1.2

1.3

Average(Splash + NAS) Average(SPEC)

Baseline Turbo Boost Snatch

P = {5.5W, 1.2-1.5GHz} M= {4.5W, 400-900MHz}DVFS and Snatch up to 2W

P = {5.5W, 1.2-1.5GHz} M= {4.5W, 400-900MHz} DVFS Within Power Budget

P = {5.5W, 1.2GHz} M= {4.5W, 400MHz}

Baseline

Turbo Boost

Snatch

54

P = {5.5W, 1.2-1.5GHz} M= {4.5W, 400-900MHz}DVFS and Snatch up to 2W

P = {5.5W, 1.2-1.5GHz} M= {4.5W, 400-900MHz} DVFS Within Power Budget

P = {5.5W, 1.2GHz} M= {4.5W, 400MHz}

Baseline

Turbo Boost

Snatch

Performance

Sp

eed

up

0.4

0.6

0.7

0.9

1.0

1.2

1.3

Average(Splash + NAS) Average(SPEC)

Baseline Turbo Boost Snatch

25%8%

Snatch boosts performance on average, by 25% against Baseline and 8% against Turbo Boost

for Splash and NAS benchmarks

10%

55

P = {5.5W, 1.2-1.5GHz} M= {4.5W, 400-900MHz}DVFS and Snatch up to 2W

P = {5.5W, 1.2-1.5GHz} M= {4.5W, 400-900MHz} DVFS Within Power Budget

P = {5.5W, 1.2GHz} M= {4.5W, 400MHz}

Baseline

Turbo Boost

Snatch

Performance

Sp

eed

up

0.4

0.6

0.7

0.9

1.0

1.2

1.3

Average(Splash + NAS) Average(SPEC)

Baseline Turbo Boost Snatch

25%8%

Snatch boosts performance on average, by 10% against Baseline for SPEC benchmarks

Negligible gains against Turbo Boost

10%

56

Snatching Activity Overview%

of

To

tal

Tim

e S

na

tch

ing

0

22.5

45

67.5

90

Bar

nes BT

CG

Ch

oles

kyF

FT

FM

M FT IS LU

LU

(NA

S)M

GR

adio

sity

Rad

ixR

aytr

ace

SPW

-Nsq

uar

edW

-Sp

atia

lA

vgM

->P

mcf

mil

clb

mb

zip

2A

vgP

->M

M->P P->M

57

Snatching Activity Overview%

of

To

tal

Tim

e S

na

tch

ing

0

22.5

45

67.5

90

Bar

nes BT

CG

Ch

oles

kyF

FT

FM

M FT IS LU

LU

(NA

S)M

GR

adio

sity

Rad

ixR

aytr

ace

SPW

-Nsq

uar

edW

-Sp

atia

lA

vgM

->P

mcf

mil

clb

mb

zip

2A

vgP

->M

M P P M

58

Snatching Activity Overview%

of

To

tal

Tim

e S

na

tch

ing

0

22.5

45

67.5

90

Bar

nes BT

CG

Ch

oles

kyF

FT

FM

M FT IS LU

LU

(NA

S)M

GR

adio

sity

Rad

ixR

aytr

ace

SPW

-Nsq

uar

edW

-Sp

atia

lA

vgM

->P

mcf

mil

clb

mb

zip

2A

vgP

->M

M P P M

59

Snatching Activity Overview%

of

To

tal

Tim

e S

na

tch

ing

0

22.5

45

67.5

90

Bar

nes BT

CG

Ch

oles

kyF

FT

FM

M FT IS LU

LU

(NA

S)M

GR

adio

sity

Rad

ixR

aytr

ace

SPW

-Nsq

uar

edW

-Sp

atia

lA

vgM

->P

mcf

mil

clb

mb

zip

2A

vgP

->M

M P P M

Application Snatch on average, 30% for Splash+NAS

9.4% for SPEC

60

More On the Paper

• Design and Implementation:

• On-chip Voltage Regulator

• Snatch Algorithm

• Additional Evaluation:

• Snatch Algorithm

• Power Delivery Network

• Pin Reliability

• 3D Stack Thermals

61

Summary

• Snatch: An opportunistic power reassignment design for 3D Stacked architectures

• Small on-chip bidirectional VR

• Processor - Memory phase detection and power availability estimation

• Up to 23% application speedup

• Alternatively, about 30% package cost reduction

62

63Image Source : http://www.hutui6.com/data/out/178/67609941-snatch-wallpapers.jpg

top related