NC STATE UNIVERSITY Transparent Control Independence (TCI) Ahmed S. Al-Zawawi Vimal K. Reddy Eric Rotenberg Haitham H. Akkary* *Dept. of Electrical & Computer Engineering *North Carolina State University, Raleigh, NC *Digital Enterprise Group *Intel Corporation, Hillsboro, OR


Post on 29-Mar-2015


NC STATE UNIVERSITY

Transparent Control Independence (TCI)

Ahmed S. Al-Zawawi, Vimal K. Reddy, Eric Rotenberg
Dept. of Electrical & Computer Engineering
North Carolina State University, Raleigh, NC

Haitham H. Akkary*
*Digital Enterprise Group
Intel Corporation, Hillsboro, OR

Effect of branch mispredictions

Branch misprediction rate of 5%-10% is still a problem
Each misprediction squashes 100s of inst.
  Reduces performance: limits window size
  Increases power: useless speculative work

© 2007 Ahmed S. Al-Zawawi, ISCA 34

Control independence basics

[Figure: control-flow diagram of a branch and its reconvergent point (reconv.), with three instructions that reference R5. Regions are labeled control-dependent (CD), control-independent (CI), control-independent data-dependent (CIDD), and control-independent data-independent (CIDI).]
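The CD / CIDI / CIDD split can be sketched as a small register-taint pass. This is only an illustrative model, not the hardware mechanism from the talk: the `Inst` type, the `classify_ci` function, and the register names are hypothetical, and memory dependences are ignored. A register is treated as influenced if a CD path may write it or a CIDD instruction produces it; a CIDI instruction that overwrites the register clears the influence.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Inst(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    name: str
    dst: Optional[str]            # destination register, if any
    srcs: Tuple[str, ...] = ()    # source registers


def classify_ci(cd_paths: List[List[Inst]], ci_insts: List[Inst]) -> Dict[str, str]:
    """Label each instruction after the reconvergent point as CIDI or CIDD."""
    # Registers that some CD path (predicted or correct) may write.
    influenced = {i.dst for path in cd_paths for i in path if i.dst is not None}
    labels = {}
    for inst in ci_insts:  # walk the CI region in program order
        if any(src in influenced for src in inst.srcs):
            labels[inst.name] = "CIDD"
            if inst.dst is not None:
                influenced.add(inst.dst)      # the dependence propagates
        else:
            labels[inst.name] = "CIDI"
            if inst.dst is not None:
                influenced.discard(inst.dst)  # redefined independently of the branch
    return labels


# One CD path writes r5, the other path is empty.
taken_path = [Inst("cd1", "r5", ("r1",))]
not_taken_path: List[Inst] = []
ci_region = [
    Inst("i1", "r6", ("r2",)),  # reads nothing influenced
    Inst("i2", "r7", ("r5",)),  # reads r5
    Inst("i3", "r8", ("r7",)),  # reads the result of i2
    Inst("i4", "r5", ("r2",)),  # redefines r5 without reading it
    Inst("i5", "r9", ("r5",)),  # reads the r5 produced by i4
]
labels = classify_ci([taken_path, not_taken_path], ci_region)
print(labels)
```

In this example only `i2` and `i3` come out as CIDD; `i5` is CIDI because `i4` redefines r5 independently of the branch outcome.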



Four steps for exploiting CI

1. Identify reconv. point
2. Remove/Insert CD inst.
3. Identify CIDD inst.
4. Repair CIDD inst.
   a) Fix data dependencies
   b) Re-execute CIDD inst.

Conventional CI misprediction recovery

[Figure: instruction window containing the mispredicted branch (Br), the wrong CD instructions, the reconvergent point (R), and the CI instructions, including CIDD instructions and a CIDI-supplied source value.]

Squash wrong CD instructions:
  Identify wrong CD inst. and CIDD inst.
Insert correct CD instructions in middle of the window:
  Repair program order
Re-execute CIDD instructions:
  Re-reference values from CIDI instructions

Conventional CI limitations

1. Program order between CD & CI inst.:
   Fine-grain retirement using ROB requires reordering the correct CD inst. with the CI inst.
2. Dependence order between CIDD & CIDI inst.:
   Re-executing CIDD instructions requires preserving referenced CIDI instructions

Goal of selective misprediction recovery:
Fully decouple CIDI instructions from CD & CIDD instructions

TCI misprediction recovery

[Figure: instruction window containing the mispredicted branch (Br), CD inst., the reconvergent point (R), and CI inst., together with a recovery program made of correct CD inst. and duplicate CIDD inst.]

Repair program state using self-sufficient recovery program while relaxing program order
  Insert correct CD instructions like any new instructions
  Insert duplicate CIDD instructions like any new instructions
  No need to identify wrong CD and CIDD instructions

TCI misprediction recovery

[Figure: instruction window spanning Checkpoint 1, the branch checkpoint at Br, the reconvergent point (R), and Checkpoint 2, with the recovery program (correct CD inst. and duplicate CIDD inst.), the CIDD instructions, and their CIDI-supplied source value.]

In-order retirement is not possible when instructions are out of program order
Exploit coarse-grain checkpoint-based retirement to relax ordering constraints
Checkpoint-based retirement enables aggressive register reclamation (e.g., CPR):
  Completed instructions free their resources
Leverage checkpointed source values to mimic the effect of program order
  Leverage branch checkpoint for correct CD instructions
  Checkpoint CIDI-supplied source values

Transparent Control Independence

TCI repairs program state, not program order
TCI pipeline is recovery-free
  Transparent recovery by fetching additional instructions with checkpointed source values
TCI pipeline is free-flowing
  Leverage conventional speculation to execute correct and incorrect instructions quickly and efficiently
  Completed instructions free their resources
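The idea of a self-sufficient recovery program can be illustrated with a toy interpreter. This is a sketch under stated assumptions, not the TCI hardware: the tuple encoding, the `run_recovery` function, and the register values are hypothetical. A source given as a register name is resolved inside the recovery program (starting from the branch checkpoint), while a source given as a number stands for an immediate or a CIDI-supplied value that was captured when the CIDD instruction first executed, so the CIDI producer does not need to be preserved.

```python
import operator

# Hypothetical three-operation ISA for the sketch.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_recovery(checkpoint_regs, recovery_program):
    """Execute correct CD inst. followed by duplicate CIDD inst.

    checkpoint_regs: register values restored from the branch checkpoint.
    recovery_program: list of (op, dst, srcs); each src is either a register
    name (str) or a captured value (int).
    """
    regs = dict(checkpoint_regs)  # start from the state at the branch
    for op, dst, srcs in recovery_program:
        # Register names are looked up; captured values are used directly.
        vals = [regs[s] if isinstance(s, str) else s for s in srcs]
        regs[dst] = OPS[op](*vals)
    return regs


# State at the mispredicted branch.
branch_checkpoint = {"r1": 10, "r5": 0}

# Speculative execution took the wrong CD path (r5 = r1 + 1), then a CIDI
# instruction produced r2 = 4 and a CIDD instruction computed r7 = r5 * r2.
# The value 4 was captured alongside the duplicate of the CIDD instruction.
recovery_program = [
    ("sub", "r5", ("r1", 1)),  # correct CD instruction
    ("mul", "r7", ("r5", 4)),  # duplicate CIDD inst. with captured CIDI value
]

recovered = run_recovery(branch_checkpoint, recovery_program)
print(recovered["r5"], recovered["r7"])  # prints: 9 36
```

The recovery program never reads r2 from the register file, which is what lets the CIDI instruction that produced it drain and free its resources.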

TCI microarchitecture

Add repair rename map
Add selective re-execution buffer (RXB)

[Figure: pipeline with I$, Spec. Map, Repair Map, Checkpoints, RXB, IQ, RF, and FU. Numbered instruction flows: (1) predicted CD, (2) CI, (3) correct CD, (4) re-execute CIDD. CIDD instructions and CIDD source values are sent to the RXB; instructions drain from the pipeline.]

Predict the branch

Instructions execute and leave the pipeline when done

[Figure: TCI pipeline diagram next to the branch example (predicted vs. actual path, with CD, CI, and CIDD instructions referencing R5).]

Construct recovery program

Copy duplicates of CIDD inst., with their source values, into the RXB

[Figure: TCI pipeline diagram and branch example, as on the preceding slides.]

Insert correct CD instructions

Load branch checkpoint into repair rename map, then fetch correct CD inst.

[Figure: TCI pipeline diagram and branch example, as on the preceding slides.]

Repair & re-execute CIDD instructions

Inject duplicate CIDD inst. with their checkpointed source values

[Figure: TCI pipeline diagram and branch example, as on the preceding slides.]

Merge repair & spec. rename maps

Copy corrected register mappings from repair map to spec. map

[Figure: TCI pipeline diagram and branch example, as on the preceding slides, with an added flow (5) Merge map.]
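The map merge can be pictured with plain dictionaries. This is a simplified sketch and not the mechanism described in the talk: the `merge_maps` function, the `influenced` set, and the physical register names are assumptions. In particular, it assumes the hardware knows which logical registers still have a CD or CIDD instruction as their most recent producer, and copies only those mappings, so that a register later redefined by a CIDI instruction keeps its speculative mapping.

```python
def merge_maps(spec_map, repair_map, influenced):
    """Return the speculative map after copying corrected mappings.

    spec_map:   logical -> physical mapping at the head of the window.
    repair_map: mapping built from the branch checkpoint while renaming
                the recovery program.
    influenced: logical registers whose latest producer is a CD or CIDD
                instruction (an assumption of this sketch).
    """
    merged = dict(spec_map)
    for reg in influenced:
        merged[reg] = repair_map[reg]  # corrected mapping replaces the stale one
    return merged


spec_map = {"r2": "p9", "r5": "p12", "r7": "p20"}
# r2 in the repair map still holds its checkpoint-time mapping, which is stale
# because a CIDI instruction has renamed r2 since the branch.
repair_map = {"r2": "p3", "r5": "p31", "r7": "p32"}

merged = merge_maps(spec_map, repair_map, influenced={"r5", "r7"})
print(merged)
```

Here r5 and r7 pick up the mappings produced by the recovery program, while r2 keeps the mapping created by the CIDI instruction.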

TCI implementation details

1. Identifying CIDD instructions:
   Control-flow stack (CFS) detects nested reconv. points
   Influenced register set (IRS) and branch-sets
2. RXB reconstruction:
   CIDD inst. of multiple branches are co-mingled
   A misprediction may require repairing RXB
3. Renaming partial programs:
   Re-rename recovery program despite its CIDI gaps
4. Merging repair/speculative rename maps

Example: construct the RXB

B1 & B2 are branches
R1 & R2 are reconvergent points
Rectangular inst. are CIDD on B1
Oval inst. are CIDD on B2

[Figure: control-flow graphs for B1 and B2 (predicted vs. actual CD paths, followed by CI instructions), the I$ and ID stages feeding the IQ and the RXB, the Temporary Buffer (TB), and the Selective Re-execution Buffer (RXB). The RXB holds 9, x, 16, 18, 20, with pointers marking B1, R1, B2, R2, and the RXB tail.]

Example: reconstructing the RXB

Objective of this example:
  Inject recovery program for B2
  Reconstruct RXB for B1

Rollback RXB tail, like complete squash
Initiate RXB pre-read pointer
Start fetching correct CD

Fetch correct CD: 11 and 12
Meanwhile pre-read 16 to Temp Buffer

Dispatch 11
Don't insert 11 into the RXB: CIDI w.r.t. B1 & B2

Dispatch 12
Insert 12 into the RXB: CIDD w.r.t. B1

[Figure: B2 control-flow graph (predicted vs. actual), the I$ and ID stages, the Temporary Buffer (TB), and the RXB with its B1, R1, B2, R2, tail, and pre-read pointers.]

Example: reconstructing the RXB

Fetch correct CD: 13 and 14
Meanwhile pre-read 18 to Temp Buffer

Dispatch 13
Don't insert 13 into the RXB: CIDI w.r.t. B1 & B2

Dispatch 14
Insert 14 into the RXB: CIDD w.r.t. B1

Reconvergence point detected
Correct CD complete

[Figure: same diagram as on the preceding slide, with 12 and 14 now in the RXB.]

Example: reconstructing the RXB

Begin renaming CIDD instructions from Temp Buffer
Meanwhile pre-read 20 into Temp Buffer

Don't dispatch 16: Not CIDD w.r.t. B2
Insert 16 into the RXB: CIDD w.r.t. B1

Dispatch 18: CIDD w.r.t. B2
Don't insert 18 into the RXB: Not CIDD w.r.t. B1

Dispatch 20: CIDD w.r.t. B2
Insert 20 into the RXB: CIDD w.r.t. B1

B2 recovery program injection complete
B1 recovery program is maintained and compressed

[Figure: same diagram as on the preceding slides, with the RXB now holding 9, 12, 14, 16, and 20.]
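The dispatch and insert decisions in this example can be replayed with a short filter. This is a sketch of the decision rule as stated on the slides, not the hardware: the `reconstruct_rxb` function and its argument layout are hypothetical, and the per-instruction dependence sets are taken as given from the example rather than computed. The wrong-path instruction x does not appear because the tail rollback has already discarded it.

```python
def reconstruct_rxb(kept, correct_cd, pre_read, mispredicted, older_branches):
    """Replay the RXB reconstruction for one mispredicted branch.

    kept:           RXB entries older than the mispredicted branch.
    correct_cd:     (inst, cidd_on) pairs fetched from the I$.
    pre_read:       (inst, cidd_on) pairs read from the old RXB contents
                    after the reconvergent point, via the Temp Buffer.
    cidd_on:        set of branches the instruction is CIDD on.
    """
    rxb = list(kept)
    dispatched = []
    for inst, cidd_on in correct_cd:
        dispatched.append(inst)        # correct CD inst. always enter the pipeline
        if cidd_on & older_branches:   # still needed for an older branch
            rxb.append(inst)
    for inst, cidd_on in pre_read:
        if mispredicted in cidd_on:    # part of this branch's recovery program
            dispatched.append(inst)
        if cidd_on & older_branches:   # maintained for the older branch
            rxb.append(inst)
    return rxb, dispatched


rxb, dispatched = reconstruct_rxb(
    kept=["9"],
    correct_cd=[("11", set()), ("12", {"B1"}), ("13", set()), ("14", {"B1"})],
    pre_read=[("16", {"B1"}), ("18", {"B2"}), ("20", {"B1", "B2"})],
    mispredicted="B2",
    older_branches={"B1"},
)
print(rxb)         # prints: ['9', '12', '14', '16', '20']
print(dispatched)  # prints: ['11', '12', '13', '14', '18', '20']
```

Instruction 18 is dispatched but dropped from the RXB, which is the compression of the B1 recovery program mentioned on the slide.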

Simulation methodology

Baseline: Checkpoint-based superscalar processor
  Issue width: 4
  Perceptron branch predictor
  Register file: 256 registers
  Branch checkpoints: 16
  Load store queue: 512 entries
  L1 I & L1 D: 64KB 4-way (Hit: 1 cycle)
  L2: 2MB 8-way (Hit: 10 cycles, Miss: 200 cycles)

Benchmarks: 11 SPEC2000 INT + 4 SPEC95 INT
SimPoint: 10M inst. warm-up + 100M inst. simulated

CIDD inst. re-renaming models

Seq CIDD (TCI):
  Only CIDD inst. are re-renamed and re-executed
Seq CI: [Akkary et al.] [Chou et al.] [Rotenberg et al.]
  All CI inst. are re-renamed, but only CIDD inst. re-execute
Proxy: [Cher et al.] [Gandhi et al.]
  Uses proxy move instructions to insulate CIDD inst. from source name changes
  Only proxies are re-renamed
  Both proxies and CIDD inst. re-execute by holding issue queue entries

All models have relaxed order through checkpoint-based substrate

Results for 32 & 64 entries issue queue

[Figure: bar chart of % IPC improvement over base (y-axis from -15% to 65%) for Proxy, Seq CI, and TCI on bzip, compress, crafty, gap, gcc, go, gzip, ijpeg, li, mcf, parser, perl, twolf, vortex, and vpr.]

Proxy can degrade performance
Seq CI can degrade performance
Proxy average %IPC improvement is 6% (11%)
TCI average %IPC improvement is 16% (16%)
TCI maximum %IPC improvement is 61% (64%)

Varying the issue queue size

Proxy is bandwidth efficient, but resource inefficient
Seq CI is bandwidth inefficient, but resource efficient
TCI is both bandwidth and resource efficient

Varying the RXB size

[Figure: harmonic mean IPC (y-axis, 0.0 to 2.5) versus RXB size (32, 64, 128, 256, 512) for Seq CIDD (TCI), Seq CI, and Base.]

In Seq CI, the RXB limits the window size
TCI overcomes this problem by only buffering CIDD inst.

Conclusion

Recover program state, not program order
  Transparent branch misprediction recovery using fully decoupled recovery program
Resource efficient
  All instructions execute, drain, and free resources quickly based on conventional speculation
Bandwidth efficient
  TCI only re-sequences CIDD instructions


Questions