Cost-Effective Memory Dependence Prediction Using Speculation Levels and Color Sets
Soner Önder
Michigan Technological University, Houghton MI
www.cs.mtu.edu/~soner
Outline
•Background: memory dependence prediction; pairing-based approach; store sets.
•Color sets: notion of color sets; color set implementation; color set predictor; instruction window modifications.
•Experimental evaluation: basic policy; aggressive policy.
Memory Dependence Prediction
Seq.   Instruction   Ready
1      ST-1          No
2      ST-2          Yes
3      ST-3          No
p      ST-p          Yes
p+1    ST-p+1        Yes
p+2    ST-p+2        No
p+3    LD-s          Yes
•Assume ST-2, ST-p and LD-s all access the same memory location.
•If we issue LD-s at this point in time, we’ll get a memory order violation.
•If we know that load LD-s depends on store ST-p, we can issue the load at the right time.
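The example above can be checked with a small oracle model. This is a minimal Python sketch, assuming hypothetical addresses (ST-2, ST-p and LD-s share 0x100) and a per-entry executed flag that is False for every store in this snapshot. Real hardware does not know the addresses of unresolved stores; that is exactly what a dependence predictor has to guess. The load's true producer is the youngest older store to the same address, so LD-s depends on ST-p rather than ST-2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    seq: str               # sequence label from the slide (1, 2, 3, p, ...)
    op: str                # "ST" or "LD"
    addr: int              # hypothetical effective address
    executed: bool = False


def producer_store(window: List[Entry], load_pos: int) -> Optional[Entry]:
    """Return the youngest store older than the load that writes the load's address."""
    load = window[load_pos]
    for entry in reversed(window[:load_pos]):
        if entry.op == "ST" and entry.addr == load.addr:
            return entry
    return None


def would_violate(window: List[Entry], load_pos: int) -> bool:
    """Issuing the load now violates memory order if its producer has not executed."""
    producer = producer_store(window, load_pos)
    return producer is not None and not producer.executed


# ST-2, ST-p and LD-s share address 0x100; the other stores use distinct addresses.
window = [
    Entry("1", "ST", 0x200), Entry("2", "ST", 0x100), Entry("3", "ST", 0x300),
    Entry("p", "ST", 0x100), Entry("p+1", "ST", 0x400), Entry("p+2", "ST", 0x500),
    Entry("p+3", "LD", 0x100),  # LD-s
]

print(producer_store(window, 6).seq)  # p
print(would_violate(window, 6))       # True: ST-p has not executed yet
```

Once `window[3].executed` is set (ST-p has executed), the model reports LD-s as safe to issue.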
Dynamic Memory Disambiguation
Problem: In the presence of unresolved stores in the instruction window, which load(s) must be held?
Ideal Solution: Wait only for the producer store.
Simple Solutions:
Wait for all: no speculation.
Issue blindly: blind speculation.
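The three policies differ only in what a load waits for. A minimal Python sketch with hypothetical policy names; the ideal policy is handed the producer's index, which is precisely the information hardware lacks and a dependence predictor tries to supply.

```python
from typing import List, Optional


def may_issue(policy: str, older_store_done: List[bool], producer: Optional[int]) -> bool:
    """Decide whether a load may issue now.

    older_store_done[i] is True once the i-th older store has executed.
    producer is the index of the store the load truly depends on (None if none);
    only the hypothetical "ideal" policy looks at it.
    """
    if policy == "no_speculation":   # wait for all older stores
        return all(older_store_done)
    if policy == "blind":            # issue immediately and risk a violation
        return True
    if policy == "ideal":            # wait only for the producer store
        return producer is None or older_store_done[producer]
    raise ValueError(f"unknown policy: {policy}")


# Six older stores (ST-1 .. ST-p+2); the producer ST-p is index 3 and has executed.
done = [False, True, False, True, True, False]
for name in ("no_speculation", "blind", "ideal"):
    print(name, may_issue(name, done, producer=3))
```

When ST-p (index 3) has not executed, "ideal" holds the load while "blind" still lets it go, which is the violation case.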
Memory Dependence Prediction (Moshovos et al., 1997-1998)
•Earlier work concentrated mainly on predicting precise dependencies between pairs of load/store instructions:
To enable early issuing of loads through memory dependence prediction.
To streamline communication so that values can be passed directly from producers to consumers instead of through memory.
•The emphasis was on identifying the precise store instruction a load may depend on.
Store-Set Memory Dependence Predictor (Chrysos & Emer, 1998)
A store set is the set of all stores a load has been observed to be dependent on.
• Initially employ blind speculation for loads.
• Upon memory order violation create a store set for the offending load and store.
• The next time the same load is encountered, make it wait until the store issues.
• A store set may contain multiple stores: chain the stores and make the load dependent upon the last store.
Store-set Implementation
[Figure: store-set tables; the instruction PC yields a store set ID (SSID), which indexes the LFST]
•Dependence information is digested to create SETS of colliding instructions.
•Each set tells exactly which stores a load should wait for.
•Sufficiently large tables yield the performance of an ORACLE.
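A minimal Python sketch of the two store-set tables, assuming the structure of Chrysos & Emer's predictor: a PC-indexed store set ID table (SSIT) that yields the SSID, and a Last Fetched Store Table (LFST) indexed by SSID. The table sizes, the PC hash, the use of instruction numbers, dictionaries in place of valid bits, and the merge rule (keep the smaller SSID) are assumptions for illustration, not details from these slides.

```python
from typing import Dict, Optional


class StoreSetPredictor:
    """Simplified store-set tables: SSIT (PC -> SSID) and LFST (SSID -> last fetched store)."""

    def __init__(self, ssit_entries: int = 4096, lfst_entries: int = 128) -> None:
        self.ssit_entries = ssit_entries
        self.lfst_entries = lfst_entries
        self.ssit: Dict[int, int] = {}  # SSIT index -> SSID; a missing key acts as an invalid entry
        self.lfst: Dict[int, int] = {}  # SSID -> instruction number of the last fetched store
        self.next_ssid = 0

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.ssit_entries  # hypothetical PC hash

    def _allocate(self) -> int:
        ssid = self.next_ssid % self.lfst_entries
        self.next_ssid += 1
        return ssid

    def on_violation(self, load_pc: int, store_pc: int) -> None:
        """Put the offending load and store into the same store set."""
        li, si = self._index(load_pc), self._index(store_pc)
        existing = [self.ssit[i] for i in (li, si) if i in self.ssit]
        ssid = min(existing) if existing else self._allocate()  # assumed merge rule
        self.ssit[li] = ssid
        self.ssit[si] = ssid

    def fetch_load(self, pc: int) -> Optional[int]:
        """Return the store the load must wait for, or None (blind speculation)."""
        ssid = self.ssit.get(self._index(pc))
        return None if ssid is None else self.lfst.get(ssid)

    def fetch_store(self, pc: int, inum: int) -> Optional[int]:
        """Chain the store behind the previous store of its set; record it as the last one."""
        ssid = self.ssit.get(self._index(pc))
        if ssid is None:
            return None
        previous = self.lfst.get(ssid)
        self.lfst[ssid] = inum
        return previous

    def issue_store(self, pc: int, inum: int) -> None:
        """Clear the LFST entry if this store is still the last fetched store of its set."""
        ssid = self.ssit.get(self._index(pc))
        if ssid is not None and self.lfst.get(ssid) == inum:
            del self.lfst[ssid]
```

In use, a load at 0x40 first speculates blindly; after a violation with the store at 0x80, fetching that store (instruction 10) makes the next fetch of the load wait for instruction 10. A second colliding store joins the same set and is chained behind the first.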
Color Set Predictor
•Instead of predicting precise dependencies among pairs of loads/stores, or constructing sets of store and load instructions that collided in the past, we assign the processor, load, and store instructions various speculation levels (colors) and predict the speculation level (i.e., the color) at which a load or store can be issued without a collision.
Color Set Predictor
Since we only try to predict the speculation level, we expect to have:
smaller storage for the predictor,
better performance at smaller hardware budgets,
faster implementations,
power savings and
more collisions.
So, it is something like this:
The rules governing color changes are called policies.
We investigate two policies, a basic policy and an aggressive policy.
[Figure: processor colors 00, 01, 10, 11 and load colors 00, 01, 10, 11]
Load instruction selection
[Figure, in four build steps: colors 00, 01, 10, 11; the set of eligible load instructions shown against the current processor color]
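Our reading of the selection slides, stated as an assumption: a load is eligible when its color does not exceed the current processor color, so raising the processor color widens the set of eligible loads. A minimal Python sketch of that rule, holding colors 00..11 as the integers 0..3; the function name is hypothetical.

```python
from typing import List


def eligible_loads(load_colors: List[int], processor_color: int) -> List[int]:
    """Indices of the loads whose color does not exceed the current processor color."""
    return [i for i, color in enumerate(load_colors) if color <= processor_color]


colors = [0b00, 0b01, 0b10, 0b11, 0b01]
for processor_color in range(4):
    print(format(processor_color, "02b"), eligible_loads(colors, processor_color))
```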
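Instruction window extensions
[Figure: window details. Instructions entering the window carry a colorInhibit bit (example bits 0 0 1 0 1 0 0); the bits pass through a chain of adders (+) and are compared (<=) against the global color to answer "Issue?"]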
Collisions
[Figure: a load and a store collide at current processor color 01; colors 00, 01, 10, 11]
Color Set Predictor: Basic Policy
1. The basic policy gradually becomes aggressive when port utilization is low.
2. Upon a collision, the load instruction is given a higher color and the store instruction is given a lower color.
3. The processor runs at the smaller of the current processor color and the colors of the store instructions.
4. Together, rules 2 and 3 run the processor at a lower speculation level than the level at which the prior collision occurred.
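The four rules can be exercised with a toy model. This is a minimal Python sketch under stated assumptions, not the implementation evaluated here: unseen loads start at color 00 and unseen stores at 11, "higher" for the load means one color up, "lower" for the store means one below the color at which the collision occurred (the reading that makes rule 4 hold), and a load is eligible when its color does not exceed the effective processor color. Class and method names are hypothetical.

```python
from typing import Dict, List


class BasicPolicy:
    """Toy model of the basic color-set policy (colors 0..3 stand for 00..11)."""

    MAX_COLOR = 3

    def __init__(self) -> None:
        self.processor_color = 0
        self.load_color: Dict[int, int] = {}   # load PC -> color, default 0 (assumption)
        self.store_color: Dict[int, int] = {}  # store PC -> color, default MAX_COLOR (assumption)

    def low_port_utilization(self) -> None:
        # Rule 1: become more aggressive one color at a time.
        self.processor_color = min(self.processor_color + 1, self.MAX_COLOR)

    def effective_color(self, stores_in_window: List[int]) -> int:
        # Rule 3: the smaller of the processor color and the colors of stores in the window.
        colors = [self.store_color.get(pc, self.MAX_COLOR) for pc in stores_in_window]
        return min([self.processor_color] + colors)

    def load_eligible(self, load_pc: int, stores_in_window: List[int]) -> bool:
        # Assumed eligibility rule: load color <= effective processor color.
        return self.load_color.get(load_pc, 0) <= self.effective_color(stores_in_window)

    def collision(self, load_pc: int, store_pc: int, color_at_collision: int) -> None:
        # Rule 2: the load moves up one color; the store drops below the collision level.
        self.load_color[load_pc] = min(self.load_color.get(load_pc, 0) + 1, self.MAX_COLOR)
        self.store_color[store_pc] = max(color_at_collision - 1, 0)
```

In a short run with one load (PC 0x40) and one store (PC 0x80), the first collision happens at color 11 and the processor then runs at 10 while the store is in the window. After a second collision at 10 it runs at 01 and the load waits, yet the load is still eligible when that store is absent.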
Color Set Predictor: Aggressive Policy
1. The aggressive policy switches to the maximum speculation level when port utilization is low.
2. Upon a collision, the load instruction is given a higher color and the store instruction is specifically marked.
3. The processor decrements the current processor color when a colliding store is detected.
4. As a result, the processor runs at the highest speculation level that will not result in a collision, and at a different color from the one it had during the collision.
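A matching toy model of the aggressive policy, again a Python sketch under assumptions. The colliding-store mark is kept as a set of store PCs (the aggressive encoding on the predictor slide lists 11 as "Level 3/Colliding store"). The load's color is raised by one per collision, and a load is eligible when its color does not exceed the processor color. Names are hypothetical.

```python
from typing import Dict, Set


class AggressivePolicy:
    """Toy model of the aggressive color-set policy (colors 0..3 stand for 00..11)."""

    MAX_COLOR = 3

    def __init__(self) -> None:
        self.processor_color = 0
        self.load_color: Dict[int, int] = {}     # load PC -> color, default 0 (assumption)
        self.colliding_stores: Set[int] = set()  # store PCs marked after a collision

    def low_port_utilization(self) -> None:
        # Rule 1: jump straight to the maximum speculation level.
        self.processor_color = self.MAX_COLOR

    def store_enters_window(self, store_pc: int) -> None:
        # Rule 3: a marked (colliding) store lowers the processor color by one.
        if store_pc in self.colliding_stores:
            self.processor_color = max(self.processor_color - 1, 0)

    def load_eligible(self, load_pc: int) -> bool:
        # Assumed eligibility rule: load color <= current processor color.
        return self.load_color.get(load_pc, 0) <= self.processor_color

    def collision(self, load_pc: int, store_pc: int) -> None:
        # Rule 2: raise the load's color and mark the store.
        self.load_color[load_pc] = min(self.load_color.get(load_pc, 0) + 1, self.MAX_COLOR)
        self.colliding_stores.add(store_pc)


LD, ST = 0x40, 0x80
policy = AggressivePolicy()
for _ in range(5):
    policy.low_port_utilization()   # processor jumps to 11
    policy.store_enters_window(ST)  # a marked store pulls it down to 10
    if policy.load_eligible(LD):
        policy.collision(LD, ST)    # toy scenario: an early issue always collides
```

Each time the marked store enters the window, the processor drops from 11 to 10. After three collisions the load's color (11) exceeds 10, and it is no longer issued early while that store is present.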
Color Set Predictor
•Accessed early in the pipeline using the L/S PC.
•Updated upon a collision or a successful speculation.
[Figure: the L/S PC indexes the predictor, which returns the L/S color, e.g., 10]
Basic Policy
00 No speculation
01 Level 1
10 Level 2
11 Level 3
Aggressive Policy
00 No speculation
01 Level 1
10 Level 2
11 Level 3/Colliding store
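Because each entry holds only a color, storage grows as entries × bits. A minimal Python sketch of a tagless, PC-indexed color table, assuming 2-bit entries, no tags, an initial color of 00, and a hypothetical PC hash. It also works out the raw storage for the predictor sizes on the performance charts, under the further assumption that those sizes count entries.

```python
from typing import List


class ColorTable:
    """PC-indexed table of small colors; no tags, so different PCs may alias."""

    def __init__(self, entries: int, bits: int = 2) -> None:
        self.entries = entries
        self.bits = bits
        self.table: List[int] = [0] * entries  # every entry starts at color 00 (assumption)

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries        # hypothetical hash: drop byte offset, keep low bits

    def lookup(self, pc: int) -> int:
        return self.table[self._index(pc)]

    def update(self, pc: int, color: int) -> None:
        # Clamp to the representable range 0 .. 2**bits - 1.
        self.table[self._index(pc)] = max(0, min(color, (1 << self.bits) - 1))

    def storage_bits(self) -> int:
        return self.entries * self.bits


for n in (4096, 2048, 1024, 512, 256, 128):
    print(n, "entries:", ColorTable(n).storage_bits() // 8, "bytes")
# 4096 entries: 1024 bytes ... 128 entries: 32 bytes
```

Without tags, PCs 0x40 and 0x240 share entry 16 of a 128-entry table, so an update through one is seen by the other; this aliasing is the price of the small budget.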
Processor’s colorful perspective: basic policy
[Figure: processor colors 00, 01, 10, 11, with transitions on low port utilization and on colliding stores]
•When port utilization is low, the processor moves on to the next color.
•The processor assumes the color of the lowest-ranking store.
Processor’s colorful perspective: aggressive policy
[Figure: processor colors 00, 01, 10, 11, with transitions on low port utilization and on colliding stores]
•When a colliding store enters the window, the processor decrements its color.
•When port utilization is low, the processor switches to red (the maximum speculation level).
Load instruction color states (both policies)
[Figure: state diagram over colors 00, 01, 10, 11 with transitions on collision and on successful speculation]
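The load color state diagram can be read as a 2-bit saturating up/down counter. A minimal Python sketch: a collision moving the load up one color follows the policy slides, while stepping down by one on each successful speculation is an assumption about the arrows in the diagram. The function name is hypothetical.

```python
def next_load_color(color: int, collided: bool, max_color: int = 3) -> int:
    """One step of the load color state machine (colors 0..3 stand for 00..11)."""
    if collided:
        return min(color + 1, max_color)  # collision: move to the next, higher color
    return max(color - 1, 0)              # successful speculation: step back one color (assumption)


color = 0
for collided in (True, True, False, True, True, True):
    color = next_load_color(color, collided)
    print(format(color, "02b"))
```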
Simulation Framework
•Aggressive out-of-order superscalar processor:
8 instructions/cycle fetch/dispatch
16 instructions/cycle retire width
64 entry centralized reservation station
8 symmetric functional units
Multi-block gshare fetch unit
2 memory ports r/w
Perfect D-cache
•Simulated using cycle-accurate simulators generated automatically from ADL descriptions using the FAST system.
Performance: SPEC FP, arithmetic mean
[Figure: IPC versus predictor size (4096, 2048, 1024, 512, 256, 128) for Color Set Basic, Color Set Aggressive, and Store Set]
Performance: SPEC FP, harmonic mean
[Figure: IPC versus predictor size (4096, 2048, 1024, 512, 256, 128) for Color Set Basic, Color Set Aggressive, and Store Set]
Performance: SPEC INT, arithmetic mean
[Figure: IPC versus predictor size (4096, 2048, 1024, 512, 256, 128) for Color Set Basic, Color Set Aggressive, and Store Set]
Performance: SPEC INT, harmonic mean
[Figure: IPC versus predictor size (4096, 2048, 1024, 512, 256, 128) for Color Set Basic, Color Set Aggressive, and Store Set]
Individual benchmarks: SPEC FP, predictor size 128
[Figure: per-benchmark IPC for Basic Policy, Aggressive Policy, and Store set]
Individual benchmarks: SPEC FP, predictor size 4096
[Figure: IPC for Basic Policy, Aggressive Policy, and Store set on 101.tomcatv, 102.swim, 103.su2cor, 104.hydro2d, 107.mgrid, 110.applu, 125.turb3d, 141.apsi, 145.fpppp, 146.wave5, and the arithmetic (A-Mean) and harmonic (H-Mean) means]
Individual benchmarks: SPEC INT, predictor size 128
[Figure: IPC for Basic Policy, Aggressive Policy, and Store set on 099.go, 124.m88ksim, 126.gcc, 130.li, 132.ijpeg, 134.perl, 147.vortex, and the arithmetic (A-Mean) and harmonic (H-Mean) means]
Individual benchmarks: SPEC INT, predictor size 4096
[Figure: IPC for Basic Policy, Aggressive Policy, and Store set on 099.go, 124.m88ksim, 126.gcc, 130.li, 132.ijpeg, 134.perl, 147.vortex, and the arithmetic (A-Mean) and harmonic (H-Mean) means]
So ...
•Cost-effective dependence prediction.
•Why does it work?
•Design space: number of colors/number of entries; confidence mechanisms; other policies.
•Power consumption: disable chunks of the predictor and use the basic policy; enable them and become aggressive.
Have a colorful evening
Soner Önder
Michigan Technological University
Antalya, Turkey