Stack Value File: Custom Microarchitecture for the Stack


Upload: saber

Post on 01-Feb-2016

60 views

Category:

Documents


0 download

DESCRIPTION

Stack Value File : Custom Microarchitecture for the Stack. Hsien-Hsin Lee Mikhail Smelyanskiy Chris Newburn Gary Tyson. University of Michigan Intel Corporation. Agenda. Organization of Memory Regions Stack Reference Characteristics Stack Value File Performance Analysis - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Stack Value File: Custom Microarchitecture for the Stack

Hsien-Hsin Lee, Mikhail Smelyanskiy, Chris Newburn, Gary Tyson

University of Michigan, Intel Corporation

Page 2: Agenda

Hsien-Hsin Lee, HPCA-7

– Organization of Memory Regions
– Stack Reference Characteristics
– Stack Value File
– Performance Analysis
– Conclusions

Page 3: Memory Space Partitioning

– Based on the programming language
– Non-overlapping subdivisions
– Split code and data: I-cache and D-cache
– Split data into regions:
  – Stack
  – Heap
  – Global (static)
  – Read-only (static)

[Figure: address-space layout from min mem to max mem, showing reserved and protected areas, the code region, read-only data, the global static data region, the heap (grows upward), and the stack (grows downward).]
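As a rough illustration of the partitioning above, the Python sketch below classifies a data address into one of these regions. The `Layout` boundaries, the `classify` function, and the Unix-style ordering (code, read-only data, globals, heap, then the stack at the top) are hypothetical choices for illustration; actual placement depends on the OS, linker, and ABI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical, non-overlapping region boundaries (byte addresses)."""
    code_start: int
    rodata_start: int
    global_start: int
    heap_start: int
    heap_top: int    # current heap break; the heap grows upward from heap_start
    stack_ptr: int   # current $sp; the stack grows downward toward it
    stack_base: int  # highest stack address (exclusive)


def classify(addr: int, layout: Layout) -> str:
    """Return the memory region that contains a byte address."""
    if layout.stack_ptr <= addr < layout.stack_base:
        return "stack"
    if layout.heap_start <= addr < layout.heap_top:
        return "heap"
    if layout.global_start <= addr < layout.heap_start:
        return "global"
    if layout.rodata_start <= addr < layout.global_start:
        return "read-only"
    if layout.code_start <= addr < layout.rodata_start:
        return "code"
    return "unmapped"


layout = Layout(code_start=0x1000, rodata_start=0x5000, global_start=0x8000,
                heap_start=0x10000, heap_top=0x20000,
                stack_ptr=0x7FF0_0000, stack_base=0x8000_0000)
print(classify(0x7FF0_0018, layout))  # stack
print(classify(0x12000, layout))      # heap
```

Because the regions do not overlap, every data reference belongs to exactly one of them, which is what makes a per-region breakdown of references well defined.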

Page 4: Memory Access Distribution

[Figure: breakdown of memory references (0% to 100%) into Read-only, Heap ref, Static ref, and Stack ref.]

SPEC2000int benchmarks (Alpha binaries): 42% of instructions access memory.

Page 5: Access Method Breakdown

[Figure: breakdown of memory references (0% to 100%) into Read-only ref, Heap ref, Static thru $gpr, Static thru $gp, Stack thru $gpr, Stack thru $fp, and Stack thru $sp.]

86% of the stack references use ($sp+disp) addressing.
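The breakdown above groups stack references by the base register used to form their address. A minimal Python sketch of that bookkeeping follows, assuming a hypothetical trace of (instruction text, region) pairs in which the region has already been determined from the effective address. The operand regex covers only the `disp($reg)` form shown on the slides, not the full Alpha assembly syntax, and the function names are illustrative.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Matches a memory operand of the form disp($base), e.g. "24($sp)".
MEM_OPERAND = re.compile(r"(-?\d+)\(\$(\w+)\)")


def base_register(instr: str) -> Optional[str]:
    """Return the base register name (e.g. "sp") of a load/store, if any."""
    m = MEM_OPERAND.search(instr)
    return m.group(2) if m else None


def stack_access_breakdown(trace: Iterable[Tuple[str, str]]) -> dict:
    """Fraction of stack references formed through each base register."""
    counts = Counter(base_register(instr)
                     for instr, region in trace if region == "stack")
    total = sum(counts.values())
    return {reg: n / total for reg, n in counts.items()}


trace = [
    ("stq $r10, 24($sp)", "stack"),
    ("ldq $r1, 8($sp)", "stack"),
    ("ldq $r2, 16($fp)", "stack"),
    ("ldq $r3, 0($r9)", "heap"),   # not a stack reference, so ignored
]
print(stack_access_breakdown(trace))  # sp -> 2/3, fp -> 1/3
```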

Page 6: Morphing $sp-relative References

– Morph $sp-relative references into register accesses.
– Use a Stack Value File (SVF).
– Resolve addresses early, in the decode stage, for stack-pointer-indexed accesses.
– Resolve stack memory dependences early.
– Aliased references are re-routed to the SVF.
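The decode-stage resolution described above can be sketched as follows. Because the displacement is encoded in the instruction, an `$sp+disp` address is known as soon as the front end knows the current $sp, before any address-generation unit runs, so it can be turned directly into an SVF entry number. The 8-byte entry width, the `(address // 8) % entries` index function, and the names `morph` and `MorphedOp` are assumptions for illustration, not details taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional

WORD_BYTES = 8  # assumed SVF entry width: one Alpha quadword


@dataclass(frozen=True)
class MorphedOp:
    kind: str       # "load" or "store"
    svf_index: int  # SVF entry that stands in for the memory word


def morph(kind: str, base: str, disp: int, decode_sp: int,
          svf_entries: int) -> Optional[MorphedOp]:
    """Turn an $sp+disp load/store into an SVF register access at decode.

    Returns None for references that do not use $sp; those stay on the
    normal load/store path (and may later be re-routed if they alias).
    """
    if base != "sp":
        return None
    addr = decode_sp + disp  # address resolved at decode, no AGU needed
    return MorphedOp(kind, (addr // WORD_BYTES) % svf_entries)


# stq $r10, 24($sp) with a hypothetical $sp of 0x1000 and a 64-entry SVF
print(morph("store", "sp", 24, 0x1000, 64))
# MorphedOp(kind='store', svf_index=3)
```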

Page 7: Stack Reference Characteristics

Contiguity
– Good temporal and spatial locality
– Can be stored in a simple, fast structure:
  • smaller die area relative to a regular cache
  • less power dissipation
– No address tag needed for each datum
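One way to see why contiguity removes the need for per-entry tags: if the SVF holds a contiguous window of stack words starting at the current $sp, membership is a simple range check, and consecutive words map to distinct entries under a modulo index. The window placement, the 8-byte word, and the `svf_lookup` helper below are hypothetical.

```python
from typing import Optional

WORD_BYTES = 8  # assumed entry width


def svf_lookup(addr: int, sp: int, entries: int) -> Optional[int]:
    """Tagless lookup for a window of `entries` words starting at sp.

    Two comparisons decide hit or miss; no stored address tags are read.
    Any `entries` consecutive words map to distinct slots, so the modulo
    index never conflicts inside the window.
    """
    if sp <= addr < sp + entries * WORD_BYTES:
        return (addr // WORD_BYTES) % entries
    return None


print(svf_lookup(0x1018, 0x1000, 64))  # 3
print(svf_lookup(0x0FF8, 0x1000, 64))  # None (below the window)
```

A conventional cache would instead store and compare an address tag for every line; here the two bounds replace all of those comparisons.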

Page 8: Stack Reference Characteristics

First touch is almost always a store
– Avoids wasting bandwidth to bring in dead data
– The store becomes a register write to the SVF

Deallocated stack frame
– Dead data
– No need to write it back to memory
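The two properties above suggest a structure that allocates an entry on a store without fetching the old contents, and discards a dead frame without writing it back. The toy model below shows both behaviors. The class name, the 8-byte words, the `(addr // 8) % size` index, and the assumption that all live stack words fit in the SVF (so eviction is not modeled) are illustrative choices, not the paper's design.

```python
WORD_BYTES = 8  # assumed entry width


class StackValueFile:
    """Toy SVF: tagless circular buffer with per-word valid/dirty bits.

    Assumes every live stack word fits in the buffer, so no eviction
    (and therefore no writeback of live data) is modeled.
    """

    def __init__(self, size: int):
        self.size = size
        self.value = [0] * size
        self.valid = [False] * size
        self.dirty = [False] * size
        self.fills = 0  # words fetched from memory

    def _idx(self, addr: int) -> int:
        return (addr // WORD_BYTES) % self.size

    def store(self, addr: int, v: int) -> None:
        # First touch is almost always a store: just write the entry.
        # Nothing is read from memory, so no bandwidth goes to dead data.
        i = self._idx(addr)
        self.value[i], self.valid[i], self.dirty[i] = v, True, True

    def load(self, addr: int, memory: dict) -> int:
        i = self._idx(addr)
        if not self.valid[i]:
            # Uncommon read-before-write: fetch a single word, not a line.
            self.value[i] = memory.get(addr, 0)
            self.valid[i], self.dirty[i] = True, False
            self.fills += 1
        return self.value[i]

    def pop_frame(self, old_sp: int, new_sp: int) -> None:
        # $sp moving up from old_sp to new_sp deallocates [old_sp, new_sp).
        # Those words are dead: drop them even if dirty, with no writeback.
        for addr in range(old_sp, new_sp, WORD_BYTES):
            i = self._idx(addr)
            self.valid[i] = self.dirty[i] = False


svf = StackValueFile(64)
svf.store(0x1018, 42)                   # allocate without a fill
print(svf.load(0x1018, {}), svf.fills)  # 42 0
svf.pop_frame(0x1000, 0x1040)           # frame freed, dirty entry dropped
```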

Page 9: Baseline Microarchitecture

[Figure: baseline out-of-order pipeline with stages Fetch, Decode, Dispatch, Issue, Execute, and Commit; blocks include the instruction cache, decoder, decoder queue (DecoderQ), register renamer (RAT), reservation station / LSQ, memory order buffer (MOB), load/store unit, functional units, reorder buffer, and architectural register file (ArchRF).]

Page 10: Microarchitecture Extension

[Figure: the baseline pipeline extended with new blocks labeled Pre-Decode, SP, offset, Hash, MaxSP, Stack Value File, Reg Morphing (ahead of the renamer), and interlock.]

Page 11: Microarchitecture Extension

[Figure: same pipeline; the instruction stq $r10, 24($sp) enters, and the top of stack (TOS) is marked on the Stack Value File.]

Page 12: Microarchitecture Extension

[Figure: same pipeline; for stq $r10, 24($sp), an index of 3 is annotated next to the Stack Value File (TOS marked).]

Page 13: Microarchitecture Extension

[Figure: same pipeline with Reg Morphing and the RAT highlighted; stq $r10, 24($sp) is annotated with $r35 and ROB-18.]

Page 14: Microarchitecture Extension

[Figure: same frame as Page 13; stq $r10, 24($sp) annotated with $r35 and ROB-18.]

Page 15: Microarchitecture Extension

[Figure: same pipeline; stq $r10, 24($sp) is now annotated with $r35 and SVF3 (where Page 13 showed ROB-18).]
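Pages 11 to 15 step stq $r10, 24($sp) through the extended pipeline, ending with the store labeled $r35 and SVF3. A plausible software analogy, offered only as a sketch, is a rename table that holds SVF entries alongside architectural registers: a morphed store makes the SVF entry name the producer of the stored data, and a later morphed load from the same slot picks up that producer directly, as a register dependence rather than a memory one. The `Renamer` class, the `p32`-style physical-register names, and this move-like treatment are assumptions, not the mechanism described on the slides.

```python
class Renamer:
    """Hypothetical RAT in which SVF entries are renamed like registers."""

    def __init__(self, first_phys: int = 32):
        self.rat = {}              # name -> physical register holding its value
        self.next_phys = first_phys

    def lookup(self, name: str) -> str:
        # Unmapped names still hold their committed architectural value.
        return self.rat.get(name, name)

    def write(self, name: str) -> str:
        """Allocate a new physical register for an ordinary ALU result."""
        tag = f"p{self.next_phys}"
        self.next_phys += 1
        self.rat[name] = tag
        return tag

    def morphed_store(self, src: str, svf_index: int) -> str:
        # stq src -> SVF entry: the entry now names src's producer.
        entry = f"svf{svf_index}"
        self.rat[entry] = self.lookup(src)
        return self.rat[entry]

    def morphed_load(self, dst: str, svf_index: int) -> str:
        # ldq SVF entry -> dst: dst aliases the entry's producer, so
        # consumers of dst wait on a register, not on a memory access.
        self.rat[dst] = self.lookup(f"svf{svf_index}")
        return self.rat[dst]


rn = Renamer()
rn.write("r10")                  # some ALU op produces $r10 into p32
rn.morphed_store("r10", 3)       # stq $r10, 24($sp) -> svf3
print(rn.morphed_load("r1", 3))  # ldq $r1, 24($sp) -> p32
```

Treating the SVF slot as a renamable name is what lets the load bypass the memory order buffer in this model: the dependence is resolved entirely in the rename table.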

Page 16: Why Could SVF Be Faster?

– It reduces the latency of stack references.
– It effectively increases the number of memory ports by rerouting more than half of all memory references to the SVF.
– It reduces contention in the MOB.
– It allows more flexibility in renaming stack references.
– It reduces memory traffic.

Page 17: Simulation Framework

Parameters              4-wide         8-wide         16-wide
Decode width            4              8              16
Issue width             4              8              16
Commit width            4              8              16
IFQ size                16             32             64
LSQ size                32             64             128
RUU size                64             128            256
DL1$ size               4-way 64KB     4-way 64KB     4-way 64KB
DL1$ latency (cycles)   3              3              3
UL2$ size               4-way 512KB    4-way 512KB    4-way 512KB
UL2$ latency (cycles)   16             16             16
Mem latency (cycles)    60             60             60

SimpleScalar (Alpha binaries), out-of-order model

Page 18: Speedup Potential of SVF

[Figure: speedup (1.0 to 1.9) for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, perlbmk, vpr, and the average, on 4-wide, 8-wide, 16-wide, and 16-wide (gshare) machines.]

Assuming all references can be morphed: ~30% speedup for a 16-wide machine with a dual-ported L1.

Page 19: SVF Reference Type Breakdown

– 86% of stack references can be morphed.
– Re-routed references enter the normal memory pipeline.

[Figure: breakdown of SVF references (0% to 100%) into fast_svf_ld, fast_svf_st, rerouted_svf_ld, and rerouted_svf_st.]

Page 20: Comparison with Stack Cache

[Figure: speedup (0.8 to 1.6) for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, perlbmk, vpr, and the average, comparing Baseline (2+0), StackCache (2+2), SVF (2+2), and Baseline (4+0).]

(R+S): R regular ports and S stack-cache or SVF ports.

Page 21: Memory Traffic

SVF reduces memory traffic dramatically, by more than two orders of magnitude.
– For gcc, ~28M (Stk$ to L2) is reduced to ~86K (SVF to L1), roughly a 300x reduction.
– Incoming traffic is eliminated because the SVF does not allocate a cache line on a miss.
– Outgoing traffic consists only of the words that are dirty when evicted (instead of entire cache lines).
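The two sources of savings listed above (no fill on a miss, and word-granular writeback of dirty data only) can be compared with a back-of-the-envelope model. The sketch below assumes 64-byte cache lines, 8-byte words, a write-allocate cache whose touched lines are all eventually evicted dirty, and an SVF that drops words belonging to deallocated frames. These parameters and function names are hypothetical and do not reproduce the paper's measurements.

```python
LINE_BYTES = 64  # assumed cache line size
WORD_BYTES = 8   # assumed SVF word size


def cache_traffic(store_addrs):
    """Bytes moved by a write-allocate cache: one line fill plus one
    full-line writeback for every distinct line that is stored to."""
    lines = {a // LINE_BYTES for a in store_addrs}
    return 2 * LINE_BYTES * len(lines)


def svf_traffic(store_addrs, dead_addrs):
    """Bytes moved by the SVF: no fill on a store miss, and only dirty
    words that are still live when evicted are written back."""
    live = {a // WORD_BYTES for a in store_addrs}
    live -= {a // WORD_BYTES for a in dead_addrs}
    return WORD_BYTES * len(live)


stores = [0x1000, 0x1008, 0x1010]   # three spills into one frame
print(cache_traffic(stores))        # 128: one 64-byte fill + writeback
print(svf_traffic(stores, []))      # 24: three dirty words
print(svf_traffic(stores, stores))  # 0: frame was popped before eviction
```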

Page 22: SVF over Baseline Performance

[Figure: speedup (1.0 to 2.6) for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, perlbmk, vpr, and the average, for port configurations (1+1)/(1+0), (1+2)/(1+0), (2+2)/(2+0), (2+3)/(2+0), and (2+4)/(2+0).]

(R+S): R regular ports and S SVF ports.

Page 23: Conclusions

– Stack references have several unique characteristics: contiguity, $sp+disp addressing, a store as the first reference, and frame deallocation.
– Stack Value File:
  – a microarchitecture extension that exploits these characteristics
  – improves performance by 24% to 65%

Page 24: Questions & Answers

Page 25: That's all, folks!!!

http://www.eecs.umich.edu/~linear

Page 26: Backup Foils

Page 27: Stack Depth Variation

Page 28: Offset Locality of Stack

Cumulative offset within a function call
– Average offset: 3 to 380 bytes
– More than 80% of offsets are within 400 bytes
– More than 99% of offsets are within 8 KB

[Figure: cumulative % (10 to 100) versus offset in bytes (log scale, 10 to 10000).]
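The cumulative curve above can be computed from a list of $sp offsets observed within function calls. A small Python sketch follows; the sample offsets are made up, the `fraction_within` name is hypothetical, and the 400-byte and 8 KB thresholds are taken from the slide's annotations.

```python
from bisect import bisect_right


def fraction_within(offsets, thresholds):
    """For each threshold (bytes), the fraction of offsets <= threshold."""
    ordered = sorted(offsets)
    n = len(ordered)
    if n == 0:
        return {t: 0.0 for t in thresholds}
    # bisect_right counts how many sorted offsets are <= t.
    return {t: bisect_right(ordered, t) / n for t in thresholds}


sample = [8, 16, 24, 200, 390, 5000]  # hypothetical offsets in bytes
print(fraction_within(sample, [400, 8 * 1024]))
# 5 of 6 offsets fall within 400 bytes; all fall within 8 KB
```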

Page 29: Conclusions

Stack reference features
– Contiguity
– No dirty writeback when the stack is deallocated

Stack Value File
– Fast indexing
– Alleviates the need to multi-port the L1 cache
– Smaller, no tags, and less power
– Exploits ILP