Compiler Speculative Optimizations
Wei Hsu, 7/05/2006


Speculative Execution

• Speculative execution means executing code early, even though its result may turn out not to be needed (the work may be wasted).
• In pipelined processors, speculative execution is often used to reduce the cost of branch mispredictions.
• Some processors automatically prefetch the next instruction and/or data cache lines into the on-chip caches. Prefetching has also been used for disk reads.
• More aggressive speculative execution has been used in “run-ahead” or “execute-ahead” processors to warm up the caches.
• Value prediction/speculation is another example.

Compiler Controlled Speculation

• Speculation is one of the most important methods for finding and exploiting ILP.
• It allows execution to exploit statistical ILP (e.g., a branch is taken 90% of the time, or *p and *q refer to different addresses most of the time).
• It overcomes the two most common constraints on instruction scheduling (and other optimizations):
  – control dependence
  – memory dependence

Compiler Controlled Speculation (cont.)

• Allows the compiler to issue an operation early, before a dependence
• Removes the latency of the operation from the critical path
• Helps hide long-latency memory operations
• Control speculation
  – the execution of an operation before the branch that guards it
• Data speculation
  – the execution of a memory load before a preceding store that may alias with it

Control Speculation Example

Source:
    if (cond) { A = p[i]->b; }

Block before the branch:
    lw   $r6, …
    sub  $r3, $r6, …
    bne  …

Guarded block (the then-part):
    lw   $r1, 0($r2)
    add  $r3, $r1, $r4
    lw   $r5, 4($r3)
    sw   $r5, 4($sp)

In the guarded block, there is no room to schedule the load!!
Why not move the load instruction into the previous block?

Control Speculation Example

Source:
    if (cond) { A = p[i]->b; }

Block before the branch, with the load hoisted:
    lw   $r6, …
    lw   $r1, 0($r2)
    sub  $r3, $r6, …
    bne  …

Guarded block:
    add  $r3, $r1, $r4
    lw   $r5, 4($r3)
    sw   $r5, 4($sp)

1) Is cond most likely to be true? Profile feedback may guide the optimization.
2) What if the address of p is bad and causes a memory fault? Can we have a special load instruction that ignores memory faults?

Control Speculation Example

Source:
    if (cond) { A = p[i]->b; }

Block before the branch:
    lw   $r6, …
    lw   $r1, 0($r2)      <-- Fault!! Core dump
    sub  $r3, $r6, …
    bne  …

Guarded block:
    add  $r3, $r1, $r4
    lw   $r5, 4($r3)
    sw   $r5, 4($sp)

What if the address of p is bad and causes a memory fault? Can we have a special load instruction that ignores memory faults?

Control Speculation Example

Source:
    if (cond) { A = p[i]->b; }

Block before the branch:
    lw   $r6, …
    lw.s $r1, 0($r2)      <-- make this a special instruction, so it never faults!!
    sub  $r3, $r6, …
    bne  …

Guarded block:
    add  $r3, $r1, $r4
    lw   $r5, 4($r3)
    sw   $r5, 4($sp)

What if the address of p is bad and causes a memory fault? Can we have a special load instruction that ignores memory faults?

For example, SPARC supports non-faulting load instructions that can ignore segmentation faults…

Architecture Support in SPARC V9

• SPARC V9 provides non-faulting loads (similar to the silent loads used in Multiflow’s Trace and Cydrome’s Cydra-5 computers).
• Non-faulting loads execute like any other load, except that segmentation fault conditions do not cause program termination.
• To minimize page faults when a speculative load references a null pointer (address zero), it is desirable to map low addresses (especially address zero) to a page with a special attribute.
• Non-faulting loads are often used in data prefetching, but not for general code motion.
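The non-faulting load idea can be sketched in C. C has no such instruction, so the sketch models memory in software. Everything in it is hypothetical:
• Memory is a 16-word array.
• An out-of-range address stands in for a segmentation fault, reported through assert.
• The non-faulting load returns 0 for a bad address. This is an arbitrary choice made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a tiny address space of MEM_WORDS words.
 * Any address outside [0, MEM_WORDS) stands in for an unmapped page. */
#define MEM_WORDS 16
static long memory[MEM_WORDS];

static bool addr_valid(long addr) {
    return addr >= 0 && addr < MEM_WORDS;
}

/* Ordinary load: a bad address is a fatal fault (modeled with assert). */
static long ld(long addr) {
    assert(addr_valid(addr) && "segmentation fault");
    return memory[addr];
}

/* Non-faulting load (modeled): a bad address silently yields 0. */
static long ld_nf(long addr) {
    return addr_valid(addr) ? memory[addr] : 0;
}

/* if (cond) A = *p;  with the load hoisted above the branch.
 * Safe when cond is false and p is bad, because ld_nf never traps. */
static long hoisted(bool cond, long p, long old_A) {
    long t = ld_nf(p);          /* speculative load, issued before the branch */
    return cond ? t : old_A;    /* the branch decides whether t is used */
}
```

hoisted(false, -100, 7) shows the benefit: the load executes early from a bad address, yet nothing traps, because the branch discards the result. The weakness is that hoisted(true, -100, 7) also returns 0 silently, even though the original program would have faulted.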

Using Non-faulting Loads for Prefetching

Source loop:
    while (j < k) {
        i = index[j][1];
        x = array[i];
        y = x + …;
        j += m;
    }

Straightforward code (may incur cache misses on each iteration):
    while (j < k) {
        load $r1, index[j][1]
        load $r2, array($r1)
        add  $r3, $r2, $r4
        ….
    }

With prefetching for a later iteration:
    while (j < k) {
        load     $r1, index[j][1]
        load     $r5, index[j+m][1]     <-- may fault!!
        load     $r2, array($r1)
        prefetch array($r5)
        add      $r3, $r2, $r4
        ….
    }

The lookahead load of index[j+m][1] may fault, because in the last iterations j+m can run past the end of index.

Using Non-faulting Loads for Prefetching (cont.)

The source loop and straightforward code are the same as above. The lookahead load now uses a non-faulting load:
    while (j < k) {
        load     $r1, index[j][1]
        nf-ld    $r5, index[j+m][1]     <-- cannot fault
        load     $r2, array($r1)
        prefetch array($r5)
        add      $r3, $r2, $r4
        ….
    }
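C has no non-faulting load. A C version of the lookahead therefore has to guard the read of index[j+m][1] with an explicit bounds test, which is exactly the cost nf-ld avoids. The sketch assumes GCC or Clang, whose __builtin_prefetch is a non-binding hint. The function name sum_gather, and accumulating y into a sum, are illustrative choices that are not part of the original loop.

```c
#include <assert.h>
#include <stddef.h>

/* Gather loop from the slide with a lookahead prefetch, m iterations ahead.
 * index[j][1] selects the element of array used in iteration j.
 * Requires GCC or Clang for __builtin_prefetch (a hint; it never changes results). */
long sum_gather(const long (*index)[2], size_t k, size_t m, const long *array) {
    assert(m > 0);                          /* otherwise the loop never ends */
    long sum = 0;
    for (size_t j = 0; j < k; j += m) {
        size_t ahead = j + m;
        if (ahead < k)                      /* the guard nf-ld makes unnecessary */
            __builtin_prefetch(&array[index[ahead][1]]);
        sum += array[index[j][1]];          /* x = array[i]; y = x + ... */
    }
    return sum;
}
```

With m = 2 and k = 6, the loop visits j = 0, 2, 4. On the first two visits it prefetches the element that the next visit will need.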

Non-faulting Loads Are Insufficient

Source:
    if (cond) { A = p[i]->b; }

Block before the branch:
    lw   $r6, …
    lw.s $r1, 0($r2)
    sub  $r3, $r6, …
    beq  …

Guarded block:
    add  $r3, $r1, $r4
    lw   $r5, 4($r3)
    sw   $r5, 4($sp)
    beq  …

What if the address of p is bad and causes a memory fault? Can we have a special load instruction that ignores memory faults?
But what if the real load of p causes a memory fault? We cannot just ignore it!!

Source:
    if (cond) { A = p[i]->b; }

Block before the branch:
    lw   $r6, …
    lw.s $r1, 0($r2)
    sub  $r3, $r6, …
    bne  …

Guarded block:
    check.s $r1
    add     $r3, $r1, $r4
    lw      $r5, 4($r3)
    sw      $r5, 4($sp)

What if the address of p is bad and causes a memory fault? Can we have a special load instruction that ignores memory faults?
But what if the real load of p causes a memory fault? We cannot just ignore it!!
Let’s remember the fault status, and check it when the loaded data is actually used.

Recovery Code

Source:
    if (cond) { A = p[i]->b; }

Block before the branch:
    lw   $r6, …
    lw.s $r1, 0($r2)
    sub  $r3, $r6, …
    bne  …

Guarded block:
    check.s $r1, recovery
    add     $r3, $r1, $r4
    lw      $r5, 4($r3)
    sw      $r5, 4($sp)
    …
recovery:
    lw      $r1, 0($r2)

Recovery Code

Source:
    if (cond) { A = p[i]->b; }

Block before the branch (the dependent add is hoisted as well):
    lw   $r6, …
    lw.s $r1, 0($r2)
    sub  $r3, $r6, …
    add  $r3, $r1, $r4
    bne  …

Guarded block:
    check.s $r3, recovery
    lw      $r5, 4($r3)
    sw      $r5, 4($sp)
    …
recovery:
    lw      $r1, 0($r2)
    add     $r3, $r1, $r4

All instructions that are data dependent on the speculative load and moved with it must go to the recovery block. The check now tests $r3, because the deferred fault status propagates from $r1 to $r3 through the add.
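The deferred-fault scheme can be simulated in C to show why the recovery block must repeat the hoisted add. The model is hypothetical:
• Each register carries a value and a deferred-fault bit; IA-64 calls this bit NaT.
• Memory is a 16-word array.
• An out-of-range address is a true fault, reported via assert.
• A paged_out flag marks valid words that the speculative load declines to fetch. This gives recovery a case in which it succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of control speculation with deferred faults. */
#define MEM_WORDS 16
static long memory[MEM_WORDS];
static bool paged_out[MEM_WORDS];   /* valid but not resident (model only) */

typedef struct { long val; bool nat; } spec_reg;  /* value + deferred-fault bit */

static bool addr_valid(long a) { return a >= 0 && a < MEM_WORDS; }

/* Non-speculative load: a bad address is a real fault and is reported. */
static long ld(long a) {
    assert(addr_valid(a) && "memory fault");
    paged_out[a] = false;           /* model: the OS brings the page in */
    return memory[a];
}

/* Speculative load (lw.s / ld8.s): defer anything that would need the OS. */
static spec_reg ld_s(long a) {
    spec_reg r = { 0, true };
    if (addr_valid(a) && !paged_out[a]) {
        r.val = memory[a];
        r.nat = false;
    }
    return r;
}

/* Dependent arithmetic propagates the deferred-fault bit. */
static spec_reg add_s(spec_reg x, long y) {
    spec_reg r = { x.val + y, x.nat };
    return r;
}

/* The schedule above: lw.s and the dependent add hoisted over the branch,
 * check.s $r3 in the guarded block, recovery re-executing both. */
static long guarded(bool cond, long r2, long r4, long fallback) {
    spec_reg r1 = ld_s(r2);         /* lw.s $r1, 0($r2)    (hoisted) */
    spec_reg r3 = add_s(r1, r4);    /* add  $r3, $r1, $r4  (hoisted) */
    if (!cond)
        return fallback;            /* not taken: a deferred fault is dropped */
    if (r3.nat) {                   /* check.s $r3, recovery */
        r1.val = ld(r2);            /* recovery: lw  $r1, 0($r2) */
        r3.val = r1.val + r4;       /*           add $r3, $r1, $r4 */
    }
    return r3.val;                  /* continues with lw $r5, 4($r3) ... */
}
```

For example, take memory[7] = 40 with paged_out[7] set. guarded(true, 7, 2, -1) defers in lw.s and sees the bit on $r3 at check.s, propagated through the add. Recovery then re-executes both instructions and returns 42. By contrast, guarded(false, 999, 1, -1) touches a bad address but returns -1 without faulting.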

Architecture Support in IA-64: Control Speculation

• original:
    (p1) br.cond
    ld8 r1 = [r2]

• transformed:
    ld8.s r1 = [r2]
    . . .
    (p1) br.cond
    . . .
    chk.s r1, recovery

Control dependence: ld8.s is hoisted above the branch that guarded the load, and chk.s stays at the load’s original position.

Data Speculation Example

Source:
    { *p = a; b = *q + 1; }

Code:
    lw   $r3, 4($sp)
    sw   $r3, 0($r1)
    lw   $r5, 0($r2)
    addi $r6, $r5, 1
    sw   $r6, 8($sp)

In this block, there is no room to schedule the load!!
How can we move the load instruction ahead of the store? $r2 and $r1 may differ most of the time, but they could be the same.

Data Speculation Example

Source:
    { *p = a; b = *q + 1; }

Code with the load of *q hoisted above the store:
    lw   $r3, 4($sp)
    lw   $r5, 0($r2)
    sw   $r3, 0($r1)
    addi $r6, $r5, 1
    sw   $r6, 8($sp)

Data Speculation Example

Source:
    { *p = a; b = *q + 1; }

Code with the hoisted load and an explicit address check:
    lw   $r3, 4($sp)
    lw   $r5, 0($r2)
    sw   $r3, 0($r1)
    if ($r1 == $r2) copy $r5, $r3
    addi $r6, $r5, 1
    sw   $r6, 8($sp)

What if there are m loads moving above n stores? m × n comparisons must be generated!! So some hardware/architectural support is needed.
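The software-only check generalizes directly, and writing it out makes the m × n cost concrete. In this C sketch, the function name and array-based interface are illustrative and do not come from the slides:
• m loads are issued before n stores.
• Afterwards, every load address is compared with every store address.
• A matching store supplies the value, and the latest matching store wins. This is what copy $r5, $r3 does for m = n = 1.

```c
#include <assert.h>

/* m loads hoisted above n stores, with software repair.
 * load_addr[0..m-1]: addresses of the hoisted loads.
 * st_addr[0..n-1], st_val[0..n-1]: the stores they moved above, in order.
 * out[i] receives the value load i would have read in the original order.
 * Returns the number of address comparisons executed (always m * n). */
static int hoist_loads_above_stores(int *const load_addr[], int m,
                                    int *const st_addr[], const int st_val[],
                                    int n, int out[]) {
    assert(m >= 0 && n >= 0);
    int compares = 0;
    for (int i = 0; i < m; i++)
        out[i] = *load_addr[i];         /* speculative loads, issued early */
    for (int j = 0; j < n; j++)
        *st_addr[j] = st_val[j];        /* the stores, in program order */
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {   /* every load against every store */
            compares++;
            if (st_addr[j] == load_addr[i])
                out[i] = st_val[j];     /* a later matching store overrides */
        }
    return compares;
}
```

With 2 loads hoisted above 3 stores, the function performs 6 comparisons even when nothing aliases. That overhead is what motivates hardware support.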

Architecture Support in IA-64: Data Speculation

• original:
    st4 [r3] = r7
    ld8 r1 = [r2]

• transformed:
    ld8.a r1 = [r2]
    . . .
    st4 [r3] = r7
    . . .
    chk.a r1, recovery

Memory dependence: ld8.a is hoisted above a store it may depend on, and chk.a stays at the load’s original position.

ALAT (Advanced Load Address Table)

Data speculation, using the transformed sequence above (ld8.a r1 = [r2] . . . st4 [r3] = r7 . . . chk.a r1, recovery):
• Assume (r2) = 0x00001ab0.
• ld8.a records an ALAT entry for r1 with address 0x1ab0.

ALAT (Advanced Load Address Table)

Same sequence; the ALAT holds the entry r1: 0x1ab0.
• Assume (r3) = 0x0000111a.
• The store finds no match in the ALAT, so the ALAT is unchanged.
• chk.a finds the entry for r1 in the ALAT, so it turns into a NOP.

ALAT (Advanced Load Address Table)

Same sequence; the ALAT holds the entry r1: 0x1ab0.
• Assume (r3) = 0x00001ab0.
• The store matches the ALAT entry, so the r1 entry is removed.
• chk.a finds no entry for r1 in the ALAT; the check fails and it branches to the recovery routine.
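The three ALAT cases can be replayed with a toy table in C. This is a simplified, hypothetical model, not the hardware’s organization. It keeps one entry per target register and invalidates an entry when a store’s byte range overlaps the bytes the advanced load read. A real ALAT is small and may also lose entries for other reasons. That only makes chk.a fail conservatively and run recovery code unnecessarily.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy ALAT: one entry per target register, recording the byte range
 * [addr, addr + size) read by an advanced load. */
#define NREGS 8

typedef struct { bool valid; uint64_t addr; unsigned size; } alat_entry;
static alat_entry alat[NREGS];

/* ld.a: record that register r was loaded early from [addr, addr + size). */
static void ld_a(int r, uint64_t addr, unsigned size) {
    assert(r >= 0 && r < NREGS);
    alat[r] = (alat_entry){ true, addr, size };
}

/* Store: invalidate every entry whose byte range overlaps the store. */
static void st(uint64_t addr, unsigned size) {
    for (int r = 0; r < NREGS; r++)
        if (alat[r].valid && addr < alat[r].addr + alat[r].size &&
            alat[r].addr < addr + size)
            alat[r].valid = false;
}

/* chk.a: true means the entry survived (the check acts as a NOP);
 * false means branch to the recovery code. */
static bool chk_a(int r) {
    assert(r >= 0 && r < NREGS);
    return alat[r].valid;
}
```

ld_a(1, 0x1ab0, 8) followed by st(0x111a, 4) leaves chk_a(1) true, which is the NOP case. Following it with st(0x1ab0, 4) instead makes chk_a(1) false, which is the recovery case.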

More Cases for Data Speculation

• Many high-performance architectural features are not effectively exploited by compilers because of imprecise analysis.
• Examples:
  – automatic vectorization/parallelization
  – local memory allocation/assignment
  – register allocation
  – …

Examples

Vectorization:
    loop (k = 1; k < n; k++)
        a[k] = a[j] * b[k];
    end
What if a and b are pointers? What if j == k?

Register allocation:
    … = a->b;
    *p = …
    … = a->b;
Can we allocate a->b to a register? Could *p modify a->b, or a?
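The j == k question can be made concrete in C. The loop reads a[j] while writing a[1..n-1], so a[j] is loop-invariant only when j falls outside that range. The sketch below uses loop versioning, a runtime test chosen here for illustration rather than the data-speculation support these slides discuss:
• When j is outside the written range, a[j] is loaded once into a register. The remaining iterations are then independent, so a compiler could vectorize that path. This assumes b does not overlap a, which the sketch does not check.
• Otherwise, the loop falls back to the original scalar order.

The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Original semantics: for (k = 1; k < n; k++) a[k] = a[j] * b[k]; */
static void loop_ref(double *a, const double *b, size_t n, size_t j) {
    for (size_t k = 1; k < n; k++)
        a[k] = a[j] * b[k];
}

/* Versioned loop. Assumes b does not overlap a (not checked here). */
static void loop_versioned(double *a, const double *b, size_t n, size_t j) {
    assert(a != NULL && b != NULL);
    if (j == 0 || j >= n) {
        /* a[j] is never written by the loop: keep it in a register.
         * The iterations are now independent, so this loop is a
         * vectorization candidate. */
        const double aj = a[j];
        for (size_t k = 1; k < n; k++)
            a[k] = aj * b[k];
    } else {
        /* Iteration k == j overwrites a[j]; later iterations must see
         * the new value, so keep the original sequential loop. */
        loop_ref(a, b, n, j);
    }
}
```

Take j = 2, n = 6, a = {1, 2, 3, 4, 5, 6} and b[k] = k. The fallback runs, and a[3] ends up as 18 because it uses the a[2] written one iteration earlier. The hoisted path would have used the stale value 3 and produced 9.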

• Complete alias and dependence analysis is costly and difficult:
  – it needs interprocedural analysis
  – dynamically allocated memory objects are hard to handle
  – runtime disambiguation is expensive
• But … true memory dependences rarely happen!!

Static and Dynamic Dependences

[Figure: for each SPEC2000 benchmark (ammp, art, bzip2, crafty, eon, equake, gap, gcc, gzip, mesa, mcf, parser, perlbmk, twolf, vortex, vpr) and the average, the share (0% to 100%) of compiler-identified ambiguous data dependences in the categories “occur > 5%”, “occur < 5%”, and “never occur at runtime”.]

Most ambiguous data dependences identified by the compiler do not occur at runtime.

• Speculation can compensate for imprecise alias information if speculation failures can be efficiently detected and recovered from.
• Can we effectively use hardware support to speculatively promote memory references to registers?
• Can we speculatively vectorize or parallelize loops?

Motivation Example

Original program:
    … = *p
    *q = …
    … = *p

Traditional compiler code:
    ld r32 = [r31]
    *q = …
    ld r32 = [r31]
    … = r32

Another Example

Original program:
    if (p->s1->s1->x1) {
        ….
        *ip = 0;
        p->s1->s1->x1++;
        ….
    }

Traditional compiled code:
    ld8  r14 = [r32]
    adds r14 = 8, r14
    ld8  r14 = [r14]
    ld4  r14 = [r14]
    cmp4 p6, p7 = 0, r14
    (p6) br ….
    st   [r16] = r0
    ld8  r14 = [r32]
    adds r14 = 8, r14
    ld8  r15 = [r14]
    ld4  r14 = [r15]
    adds r14 = 1, r14
    st4  [r15] = r14

The store *ip = 0 (st [r16] = r0) may alias any link of the chain, so the whole chain p->s1->s1->x1 is reloaded after it.

Our Approach at UM

• Use an alias profile or compiler heuristics to obtain approximate alias information.
• Use data speculation to verify that alias information at run time.
• Use the Advanced Load Address Table (ALAT) in IA-64 to provide the necessary support for data speculation.

Background of ALAT in IA-64

Speculative Register Promotion

• Use ld.a for the first load.
• Check the subsequent loads:
  – Scheme 1: use ld.c for subsequent reads of the same reference.
  – Scheme 2: use chk.a for subsequent reads. This allows promotion of multi-level pointer variables (e.g., if a->b->c is speculatively promoted to a register but a is aliased and modified, then recovery code that reloads a, a->b, and a->b->c must be executed).

Examples

a. Read after read
  Source:
    … = *p + 1;
    *q = …;
    … = *p + 3;
  Code:
    ld.a r1 = [p]
    add  r3 = r1, 1
    *q = ….
    ld.c r1 = [p]
    add  r4 = r1, 3

b. Read after write
  Source:
    *p = …;
    *q = …;
    … = *p + 3;
  Code:
    st   [p] = r1
    ld.a r1 = [p]
    *q = ….
    ld.c r1 = [p]
    add  r4 = r1, 3

c. Multiple redundant loads
  Source:
    … = *p;
    *q = …;
    … = *p;
    *q = …;
    … = *p;
  Code:
    ld.a     r1 = [p]
    *q = …
    ld.c.nc  r1 = [p]
    *q = …
    ld.c.clr r1 = [p]
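Case (c) depends on the difference between the two check-load completers. ld.c.nc leaves the ALAT entry in place so that a later check can reuse it, while ld.c.clr frees the entry at the last use. The C sketch below replays case (c) with a hypothetical one-entry-per-register table over a 16-word memory. One assumption is ours: on a miss, the .nc form re-arms the entry for the reloaded address. The sketch does not try to reproduce every detail of the IA-64 definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: memory words addressed by index, one ALAT entry per register. */
#define MEM 16
#define NREGS 4
static int mem[MEM];
static struct { bool valid; int addr; } alat[NREGS];
static int regs[NREGS];
static int reloads;                      /* loads re-executed by ld.c */

/* ld.a r = [addr]: load now and arm the ALAT entry for r. */
static void ld_a(int r, int addr) {
    regs[r] = mem[addr];
    alat[r].valid = true;
    alat[r].addr = addr;
}

/* *q = v: a store removes any entry for the same address. */
static void store(int addr, int v) {
    mem[addr] = v;
    for (int r = 0; r < NREGS; r++)
        if (alat[r].valid && alat[r].addr == addr)
            alat[r].valid = false;
}

/* ld.c r = [addr]: reload only if the ALAT entry is gone.
 * clr == false models .nc (entry kept, re-armed on a miss: our assumption);
 * clr == true models .clr (entry freed after the check). */
static void ld_c(int r, int addr, bool clr) {
    assert(r >= 0 && r < NREGS);
    bool hit = alat[r].valid && alat[r].addr == addr;
    if (!hit) {
        regs[r] = mem[addr];
        reloads++;
    }
    alat[r].valid = !clr;
    alat[r].addr = addr;
}
```

Replaying case (c): p is word 3, and the first *q writes word 7. The first check hits, so r1 keeps its value with no memory access. If the second *q writes word 3 itself, ld.c.clr reloads r1 and leaves no entry behind.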

Compiler Support for Speculative Register Promotion

• Enhanced SSA form with the notion of data speculation
• SSA form for indirect memory references
  – χ operator: MayMod
  – μ operator: MayUse
• Speculative SSA form
  – χs operator: the variable in χs is unlikely to be updated by the corresponding definition statement
  – μs operator: the variable in μs is unlikely to be referenced by the indirect reference

Speculative SSA Form According to Alias Profiling

    *p = …
    b2 = χ(b1)
    a2 = χs(a1)
    v2 = χ(v1)

    μ(b1)
    μs(a1)
    μ(v1)
    … = *p

The two examples assume that the points-to set of p computed by the compiler is {a, b}, while the points-to set of p obtained from alias profiling is {b}. v is the virtual variable for *p, and aj stands for version j of variable a.

Overview of Speculative Register Promotion*

• Phi insertion
• Rename
• Down_safety
• Will_be_available
• Finalize
• Code motion

* Based on SSAPRE [Kennedy et al., ACM TOPLAS ’99]

Enhanced Rename

(a) Traditional renaming:
    … = a1
    *p1 = …
    v2 = χ(v1)
    a2 = χ(a1)
    b2 = χ(b1)
    … = a2

(b) Speculative renaming:
    … = a1
    *p1 = …
    v2 = χ(v1)
    a2 = χs(a1)
    b2 = χ(b1)
    … = a1   <speculative>

The target set of *p computed by the compiler is {a, b}, and v is the virtual variable for *p. The target set of *p obtained from alias profiling is {b}.
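The difference between the two renamings can be mimicked for a single variable a in straight-line code. This C sketch is a toy under stated assumptions, not SSAPRE’s rename step, which walks the dominator tree and handles Φ nodes:
• Statements are either a use of a or an indirect store through p.
• Each store that the compiler thinks may modify a creates a new version through χ.
• When the alias profile says p never points to a, that χ becomes a χs. Later uses keep naming the previous version and are flagged <speculative>.

The names rename_a, stmt_kind, and use_info are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { USE_A, STORE_P } stmt_kind;    /* "... = a" or "*p = ..." */
typedef struct { int version; bool speculative; } use_info;

/* Renames the uses of a in s[0..n-1], starting from version a1.
 * may_alias: a is in the compiler's points-to set of p.
 * profiled:  a is in the points-to set obtained from alias profiling.
 * speculate: enable speculative renaming (chi_s) when the two disagree.
 * Writes one use_info per use into out and returns the number of uses. */
static int rename_a(const stmt_kind *s, int n, bool may_alias, bool profiled,
                    bool speculate, use_info *out) {
    assert(n >= 0);
    int latest = 1;          /* newest SSA version of a, including chi_s defs */
    int visible = 1;         /* version that a use refers to */
    bool spec = false;       /* was a chi_s skipped since `visible`? */
    int uses = 0;
    for (int i = 0; i < n; i++) {
        if (s[i] == STORE_P && may_alias) {
            latest++;                       /* a_latest = chi(a_previous) */
            if (speculate && !profiled) {
                spec = true;                /* chi_s: uses may skip it */
            } else {
                visible = latest;           /* ordinary chi */
                spec = false;
            }
        } else if (s[i] == USE_A) {
            out[uses].version = visible;
            out[uses].speculative = spec;
            uses++;
        }
    }
    return uses;
}
```

For the sequence … = a; *p1 = …; … = a, traditional renaming gives the second use version 2. Speculative renaming gives it version 1, flagged <speculative>. That flag is what later lets code motion turn the first load into ld.a and the second into ld.c.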

Example of Speculative Code Motion

(a) Before code motion:
    … = a1
    *p1 = …
    v4 = χs(v3)
    a2 = χs(a1)
    b4 = χ(b3)
    … = a1   <speculative>

(b) Final output:
    t1 = a1   (ld.a)
    *p1 = …
    v4 = χ(v3)
    a2 = χs(a1)
    b4 = χ(b3)
    t4 = a1   (ld.c)

Implementation

• Open Research Compiler (ORC) v1.1
• Benchmarks
  – SPEC2000 C programs
• Platform
  – HP i2000, 733 MHz Itanium processor, 1 GB SDRAM
  – Red Hat Linux v7.1
  – Pfmon v1.1

Example from Equake

Call site: smvp(…, disp[dispt], disp[disptplus]);

    void smvp(int nodes, double ***A, int *Acol, int *Aindex,
              double **v, double **w) {
        . . .
        for (i = 0; i < nodes; i++) {
            . . .
            while (Anext < Alast) {
                col = Acol[Anext];
                sum0 += A[Anext][0][0] * …;
                sum1 += A[Anext][1][1] * …;
                sum2 += A[Anext][2][2] * …;
                w[col][0] += A[Anext][0][0] * v[i][0] + …;
                w[col][1] += A[Anext][1][1] * v[i][1] + …;
                w[col][2] += A[Anext][2][2] * v[i][2] + …;
                Anext++;
            }
        }
    }

A[][][] and v[][] are not promoted to registers because of possible aliasing with w[][].

Example from Equake (cont.)

For the smvp loop above, promoting A[][][] and v[][] to registers using the ALAT improves this procedure by 10%.

Example from Equake (cont.)

Using heuristic rules, our compiler can promote both ***A and **v to registers. But using the alias profile, our compiler fails to promote **v, because at the call site v and w are passed with the same array name (disp).

Performance Improvement of Speculative Register Promotion

[Figure: improvement percentage (axis from -2% to 14%) in CPU cycles, data access cycles, and loads retired.]

Effectiveness of Speculative Register Promotion

[Figure: ratio of ld.c/chk.a instructions as a percentage of retired loads, and mis-speculation rate (axis from 0% to 25%).]

Performance Improvement of Speculative Register Promotion Based on Heuristic Rules

[Figure: improvement percentage (axis from 0% to 20%) in CPU cycles, data access cycles, and loads retired.]

Performance Improvement of Speculative Register Promotion on Itanium-2

[Figure: improvement percentage (axis from -5% to 30%) in CPU cycles and loads retired, each for profile-based and heuristic-based promotion.]

Advantages of Using Heuristic Rules

• Full coverage
• Input-insensitive
• Efficient
• Scalable

A Case for Using Profiles

          DO 140 L = L3, L4, 2
             Q(IJ(L))     = Q(IJ(L))     + W1(L)*QMLT(L)
             Q(IJ(L)+1)   = Q(IJ(L)+1)   + W2(L)*QMLT(L)
             …
             Q(IJ(L+1))   = Q(IJ(L+1))   + W1(L+1)*QMLT(L+1)
             Q(IJ(L+1)+1) = Q(IJ(L+1)+1) + W2(L+1)*QMLT(L+1)
             …
      140 CONTINUE

Heuristic rules assume that Q(IJ(L)) is different from Q(IJ(L+1)). In fact the two are frequently identical, because IJ() is often sorted and contains repeated values, e.g., 1,1,2,2,2,5,5,6,6,6,6,9,9,9.
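An alias profile for this loop can be as simple as a count. For each (L, L+1) pair the loop visits, it records whether Q(IJ(L)) and Q(IJ(L+1)) name the same element. The C sketch below does that on the IJ values from the slide. The names profile_ij and alias_profile are hypothetical. IJ is 0-based here while the Fortran is 1-based, and the loop assumes the pairs start at the first element (L3 = 1 in Fortran terms).

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t pairs, same; } alias_profile;

/* Visits L = 0, 2, 4, ... (DO 140 L = L3, L4, 2 with L3 = 1, 0-based here)
 * and records how often Q(IJ(L)) and Q(IJ(L+1)) are the same element. */
static alias_profile profile_ij(const int *ij, size_t n) {
    assert(ij != NULL || n == 0);
    alias_profile p = { 0, 0 };
    for (size_t l = 0; l + 1 < n; l += 2) {
        p.pairs++;
        if (ij[l] == ij[l + 1])   /* both statements update one Q element */
            p.same++;
    }
    return p;
}
```

On that sequence, 4 of the 7 visited pairs alias, so the profile flags the dependence as frequent. This is the information the heuristic rules miss, and speculating on this dependence would mis-speculate often.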