exploiting value locality in physical register files

33
Exploiting Value Locality in Physical Register Files Saisanthosh Balakrishnan Guri Sohi University of Wisconsin-Madison 36 th Annual International Symposium on Microarchitecture

Upload: phelan-bonner

Post on 04-Jan-2016

44 views

Category:

Documents


1 download

DESCRIPTION

Exploiting Value Locality in Physical Register Files. Saisanthosh Balakrishnan Guri Sohi University of Wisconsin-Madison. 36 th Annual International Symposium on Microarchitecture. Motivation. More in-flight instructions (ILP, SMT). Need more physical registers - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality inPhysical Register Files

Saisanthosh Balakrishnan Guri SohiUniversity of Wisconsin-Madison

36th Annual International Symposium on Microarchitecture

Page 2: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 2

Motivation

Need more physical registers

Increase in area, access time, power

More in-flight instructions (ILP, SMT)

Access locality: Hierarchical and register caches

Communication locality: Banked and clustered design

Late allocation: Virtual-physical registers

Value locality: Physical register reuse

Optimized Design Optimized Storage

Reduce storage requirements:

1. Exploit register value locality

2. Simplify for common values

Page 3: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 3

Outline

The property: Value locality in register file

Optimized storage schemes

Results

Conclusion

Page 4: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 4

Value Locality

Locality of the results produced by all dynamic instructions

1. Identify the most common results

2. Locality in the results produced (register writes)

3. Duplication in register file

Page 5: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 5

Value Locality

bzip2 crafty gap gcc gzip mcf ammp art 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 4831843632 7 2 4 4 5368778784 5368710000 2 5368712376 3 81 4831839248 32 5 2560 4.59187e+18 62914560 5369648560 5 3 2 4831863760 1884672376 4.58549e+18 65536 8 5369767936 52 3 10 3584 3 5368712432 2 8 -1 5368758224 32 6656 5370448344 32 5369777344 3 59 16 2 5632 32 2 6 4 7 -1 49 48 7 3 5368862128 16 5 8 -1 14848 10

10 most common values in some SPEC CPU2000 benchmarks

0 and 1 are most common results on all benchmarks

Page 6: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 6

Value Locality

Consider valid registers only

Many registers could hold the same value

Duplication in register file

0%

10%

20%

30%

40%

50%

60%

80

128

160 80

128

160

SpecInt Average SpecFP Average

0 1 Other

Locality in all results produced (register writes)

Percentage of values written to registersalready present in the physical register file (80, 128, 160

regs.)

Page 7: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 7

Value Locality

0%

5%

10%

15%

20%

25%

30%

80

12

8

16

0

80

12

8

16

0

SpecInt Average SpecFP Average

0 1 Other

Duplication of values in physical register file

# of duplicate registers = # of valid physical registers – # of unique values in the

register file

Value being held in n registers (n-1) duplicate registers

Number of duplicate registers depends on register lifetime

60% to 85% of duplication because of 0 and 1

Page 8: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 8

Observations

1. Many physical registers hold same value

2. 0’s and 1’s are most common results

Reuse physical registersInstructions with same result mapped to one physical registerFirst proposed by [Jourdan et al. MICRO-31]Reduced storage requirements

Optimized storage for common valuesRegisterless StorageExtended Registers and Tags Simplified micro-architectural changes

Page 9: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 9

Outline

Value locality in register file

Exploiting it: Optimized storage schemesPhysical Register ReuseRegisterless StorageExtended Registers and Tags

Results

Conclusion

Page 10: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 10

Physical Register Reuse

1. Conventional renaming of destination register

2. On execution, detect if result present in register file

3. Free assigned physical register

4. Remap instruction’s destination register

5. Handle dependent instructions

Many instructions with same result map to one physical register

Register file with unique values More free registers

Page 11: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 11

Physical Register Reuse

Fetch Decode Rename Queue Schedule Reg. Read Writeback Commit

ValueCache

Value cache – to detect duplicate results

Maps physical register tags to values CAM structure, returns tag for a value Actions to invalidate / add entries

Execute

Page 12: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 12

Physical Register Reuse

Fetch Decode Rename Queue Schedule Writeback Commit

ValueCache

Reference counter for every register

Increment when mapped Decrement when freed Return to free list when 0

Reference counting in Register File

Refe

ren

ce c

ou

nte

rs

Physical Register

File

ExecuteReg. Read

Page 13: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 13

Physical Register Reuse

Fetch Decode Queue Schedule Reg. Read Execute Writeback Commit

ValueCache

Refe

ren

ce c

ou

nte

rs

Physical Register

File

Update rename map. Remap instruction’s destination register

Handling dependent instructions

Rename

Page 14: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 14

Physical Register Reuse

Fetch Decode Rename Schedule Execute Writeback Commit

ValueCache

Refe

ren

ce c

ou

nte

rs

Physical Register

File

AliasTable

Queue

Re-broadcast new destination register tag

Re-broadcast <invalid_src_tag>. Lookup alias table on register read

Handling dependent instructions

Reg. Read

Page 15: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 15

Physical Register Reuse

Reduced register requirements

Avoids register write of duplicate values

Non-trivial micro-architectural changes Value Cache lookup, Alias Table indirection, Reference countsRecovering from exceptions

Remapping of destination register requires re-broadcast

Page 16: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 16

Outline

Value locality in register file

Optimized storage schemesPhysical register reuseRegisterless Storage (0’s & 1’s)Extended Registers and Tags (0’s & 1’s)

Results

Conclusion

Page 17: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 17

Registerless Storage

Exploit common value locality – state bits for 0 and 1

Register file without 0s and 1s More free registers

1. Conventional renaming of destination register

2. On execution, detect if result is 0 or 1

3. Free assigned physical register

4. Remap instruction’s destination register to reserved tags

5. Communicate value directly to dependent instructions

Page 18: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 18

Registerless Storage

Simplified micro-architectural changesNo Value Cache, Alias Table, Reference counts

No registers for 0 and 1: Reduced register requirements

Eliminates register reads and writes of 0 and 1

Optimize storage for common values. But, avoid remapping destination register tag

Remapping of destination register requires re-broadcast

Page 19: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 19

Extended Registers and Tags

Associate physical register with 2-bit extensionV: Valid and D: data = {0, 1}

Most 0’s and 1’s in register extensions

Physical register V D

Rename: Assign physical register with its extension (if available) Execute: If result is 0 or 1

Use extension, if available. Free physical register.Physical register can be assigned to some other instruction

Page 20: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 20

Extended Register and Tags

Extended tagging scheme eliminates remapping

Increase tag (N-bits) namespace by 1-bit (MSB)

Unmodified free list

To assign a tag:1. Get tag from free list2. Get MSB from the corresponding extension’s valid bit

MSB = 0 {register, extension}MSB = 1 {register}

Tag management

Page 21: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 21

Extended Registers and Tags

Physical register V DMSB = 1

Physical register 1 DMSB = 0Result = {0, 1}

Physical register V DMSB = 1

Register read operation

Physical register 1 DMSB = 0

Physical register 0 DMSB = 0

Register write operation

Physical register 0 DMSB = 0Result != {0, 1}

Never used

Page 22: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 22

Extended Registers and Tags

Simplified micro-architectural changes

Extended registers hold 0’s and 1’s

Better design

Some common values use physical registers

Page 23: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 23

Outline

Value locality

Optimized storage schemes

ResultsPerformance – more in-flight instructionsBenefit – smaller register file

Reduced register traffic

Conclusion

Page 24: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 24

Performance

0%

5%

10%

15%

20%

bzi

p2

craf

ty

eon

gap gcc

gzi

p

mcf

par

ser

per

l

two

lf

vort

ex vpr

amm

p

app

lu

apsi ar

t

equ

ake

face

rec

fma3

d

gal

gel

luca

s

mes

a

mg

rid

sixt

rack

swim

wu

pwis

e

int-

avg

fp-a

vg

Register Reuse (0-c Alias Table)

Relative performance improvement with increased instruction window

Sp

eed

up

Page 25: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 25

Performance

-15%

-10%

-5%

0%

5%

10%

15%

20%

bzi

p2

cra

fty

eo

n

gap

gcc

gzi

p

mc

f

pars

er

perl

two

lf

vo

rtex

vp

r

am

mp

ap

plu

ap

si

art

eq

ua

ke

fac

ere

c

fma3

d

galg

el

luc

as

me

sa

mg

rid

six

trac

k

sw

im

wu

pw

ise

int-

avg

fp-a

vg

Register Reuse (1-c Alias Table) Register Reuse (0-c Alias Table)

Relative performance improvement with increased instruction window

Sp

eed

up

Page 26: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 26

Performance

-15%

-10%

-5%

0%

5%

10%

15%

20%

bzi

p2

cra

fty

eo

n

ga

p

gc

c

gzi

p

mc

f

pa

rse

r

pe

rl

two

lf

vo

rte

x

vp

r

am

mp

ap

plu

ap

si

art

eq

ua

ke

fac

ere

c

fma

3d

ga

lge

l

luc

as

me

sa

mg

rid

six

tra

ck

sw

im

wu

pw

ise

int-

av

g

fp-a

vg

Register Reuse (1-c Alias Table) Registerless StorageRegister Reuse (0-c Alias Table)

Relative performance improvement with increased instruction window

Sp

eed

up

Page 27: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 27

Performance

-15%

-10%

-5%

0%

5%

10%

15%

20%

bzi

p2

cra

fty

eo

n

ga

p

gc

c

gzi

p

mc

f

pa

rse

r

pe

rl

two

lf

vo

rte

x

vp

r

am

mp

ap

plu

ap

si

art

eq

ua

ke

fac

ere

c

fma

3d

ga

lge

l

luc

as

me

sa

mg

rid

six

tra

ck

sw

im

wu

pw

ise

int-

av

g

fp-a

vg

Register Reuse (1-c Alias Table) Extended Regs & TagsRegisterless Storage Register Reuse (0-c Alias Table)

Relative performance improvement with increased instruction window

Sp

eed

up

Page 28: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 28

Register Traffic

0%

4%

8%

12%

16%

bzip

2

cra

fty

eo

n

ga

p

gc

c

gzip

mc

f

pa

rse

r

pe

rl

two

lf

vo

rte

x

vp

r

am

mp

ap

plu

ap

si

art

eq

ua

ke

fac

ere

c

fma

3d

ga

lge

l

luc

as

me

sa

mg

rid

six

tra

ck

sw

im

wu

pw

ise

int-

av

g

fp-a

vg

0 1

Register read reduction with Registerless storage

0%

10%

20%

30%

40%

bzip

2

craf

ty

eon

gap

gcc

gzip

mcf

pars

er

perl

twol

f

vort

ex vpr

amm

p

appl

u

apsi art

equa

ke

face

rec

fma3

d

galg

el

luca

s

mes

a

mgr

id

sixt

rack

swim

wup

wis

e

int-

avg

fp-a

vg

Register write reduction with Registerless storage

Page 29: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 29

Conclusions

Value localityObserve duplication in physical register fileSignificant common value locality

Optimization schemesPhysical Register Reuse0’s and 1’s: Registerless Storage and Extended Registers and Tags

BenefitsReduced register requirementsReduced register trafficPower savings, better design

Page 30: Exploiting Value Locality in Physical Register Files

Questions

Page 31: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 31

Comparison

Value cache Identify {0, 1} Identify {0, 1}

Reuse register holding the same value. Free reg.

Free register Write 0 or 1 to extension, if available. Free register

Update rename map. Alias table or Re-broadcast

Update rename map. Re-broadcast {0, 1}

Recover ref. counts, aliastable, value cache

Reg. File with unique values

Reg. File without 0, 1 0, 1 in extensions

Exploit

Detect

Handle dep instns

Handle exception

Outcome

Physical Reg. Reuse Reg.less storage ERT

Page 32: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 32

Simulation Methodology

Alpha ISA. SimpleScalar based.Nops removed at the front-endRegister r31 not included in value locality measurements

Detailed out-of-order simulator4-wide machine, 14-stage pipeline, 1-cycle register file accessBase case: 128 physical registers, 256 entry instruction windowInstruction and data caches

64KB, 2-way set associative, 64-byte line size, 3-cycle access L2 is 2MB unified with 128-byte line size, 6-cycle access

BenchmarksSPEC CPU2000 suite (12 int and 14 fp benchmarks)Reference inputs run to completion

Page 33: Exploiting Value Locality in Physical Register Files

Exploiting Value Locality in Physical Register Files 33

Performance

-15%

-10%

-5%

0%

5%

10%

15%

20%

bzi

p2

craf

ty

eon

gap gcc

gzi

p

mcf

par

ser

per

l

two

lf

vort

ex vpr

amm

p

app

lu

apsi art

equ

ake

face

rec

fma3

d

gal

gel

luca

s

mes

a

mg

rid

sixt

rack

swim

wu

pw

ise

int-

avg

fp-a

vg

Register Reuse (1-c Alias Table)

Relative performance improvement with increased instruction window