Automatic Generation of Peephole Superoptimizers
theory.stanford.edu/~sbansal/talks/msr-talk.pdf
TRANSCRIPT
Automatic Generation
of Peephole Superoptimizers
Sorav Bansal and Alex Aiken
Stanford University
{sbansal, aiken}@cs.stanford.edu
Produced with LaTeX seminar style & PSTricks
SUPEROPTIMIZATION
Classical Meaning: To find the optimal code sequence for a
single, loop-free assembly sequence of instructions
Target Sequence + Architecture Specification → Exhaustive Search → Optimal Sequence
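The classical formulation above can be illustrated with a toy exhaustive search. The sketch below is not the authors' tool: the two-register, 4-bit machine, the three opcodes, and the names `step`, `run`, `equivalent` and `superoptimize` are all assumptions made for illustration. Because the toy machine is so small, equivalence is checked by running both sequences on every possible state.

```python
from itertools import product

MASK = 0xF            # toy 4-bit registers, so every machine state can be checked
REGS = ("a", "b")     # hypothetical two-register machine

def step(state, ins):
    """Execute one (opcode, dst, src) instruction and return the new state."""
    op, dst, src = ins
    s = dict(state)
    if op == "mov":
        s[dst] = s[src]
    elif op == "add":
        s[dst] = (s[dst] + s[src]) & MASK
    elif op == "shl":                 # shift left by one; src is unused
        s[dst] = (s[dst] << 1) & MASK
    return s

def run(seq, state):
    for ins in seq:
        state = step(state, ins)
    return state

# The "architecture specification" of the toy machine: every legal instruction.
INSTRUCTIONS = [(op, d, s) for op in ("mov", "add") for d in REGS for s in REGS]
INSTRUCTIONS += [("shl", d, d) for d in REGS]

ALL_STATES = [dict(zip(REGS, vals))
              for vals in product(range(MASK + 1), repeat=len(REGS))]

def equivalent(seq1, seq2, live_out):
    """True if both sequences agree on the live-out registers for all states."""
    return all(run(seq1, st)[r] == run(seq2, st)[r]
               for st in ALL_STATES for r in live_out)

def superoptimize(target, live_out):
    """Return the first shortest sequence equivalent to target (exhaustive search)."""
    for length in range(len(target)):
        for cand in product(INSTRUCTIONS, repeat=length):
            if equivalent(target, cand, live_out):
                return list(cand)
    return list(target)               # nothing shorter exists

target = [("mov", "b", "a"), ("add", "a", "b")]   # doubles a, using b as scratch
best = superoptimize(target, live_out=("a",))
print(best)  # [('add', 'a', 'a')]
```

The search space grows exponentially with sequence length, which is the cost the "Common Point-of-View" refers to.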
Common Point-of-View
➜ Superoptimization is Sloowww..
➜ Superoptimization is useful only for occasional optimization of
the critical inner loop
a.out → Exhaustive Search → Optimization Table → b.out

a.out
movl %eax, %ecx
in %al; shll %eax; leal (%ecx), %eax
push (%ebx); movl %ecx, (%ebx)
shll (%ebx); addl $9, (%ebx)
call func; mov %eax, %ecx
shll %al; out %al; push (%ebx); mov %ecx, (%ebx)
push %eax; mov %eax, %ecx

Optimization Table
   target                 optimal
1. movl %ecx, (%ebx)      leal 9(%ecx), %ebx
   shll (%ebx)            movl %ebx, (%ebx)
   addl $9, (%ebx)
2. shll %eax              movl %ecx, %eax
   leal (%ecx), (%eax)
3. movl %ecx, %eax        shll %ecx
   shll %ecx              movl %ecx, %eax
   movl %ecx, %eax

b.out
mov %eax, %ecx
in %al; shll %eax; leal (%ecx), %eax
push (%ebx); movl %ecx, %eax
shll %ecx; movl %ecx, %eax
call func; mov %eax, %ecx
shll %al; out %al; push (%ebx); mov %ecx, (%ebx)
push %eax; mov %eax, %ecx

Optimization Database
• To what length of instruction sequences can we scale?
• Can the database be made large enough to capture
most “traditional” optimizations?
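Once a table of (target, optimal) pairs exists, applying it is a plain lookup. The following is a minimal sketch of such a table-driven peephole pass, assuming instructions are stored as text and the table is a Python dictionary; `OPT_TABLE` and `peephole` are hypothetical names, and the first entry is rule 3 from the slide's table.

```python
# Hypothetical in-memory form of an optimization table: target -> optimal.
OPT_TABLE = {
    # Rule 3 from the slide: the first move is overwritten by the last one.
    ("movl %ecx, %eax", "shll %ecx", "movl %ecx, %eax"):
        ("shll %ecx", "movl %ecx, %eax"),
    # A move followed by the reverse move needs only the first move.
    ("movl %esp, %ebp", "movl %ebp, %esp"):
        ("movl %esp, %ebp",),
}

def peephole(program, table):
    """Slide a window over the program and replace any sequence found in the table.

    Longer windows are tried first. A real pass would also have to respect
    jump targets and register liveness; neither is modeled here.
    """
    max_len = max(len(target) for target in table)
    out, i = [], 0
    while i < len(program):
        for n in range(min(max_len, len(program) - i), 0, -1):
            window = tuple(program[i:i + n])
            if window in table:
                out.extend(table[window])   # emit the optimal sequence
                i += n
                break
        else:
            out.append(program[i])          # no rule starts here; copy through
            i += 1
    return out

program = ["push %eax", "movl %ecx, %eax", "shll %ecx",
           "movl %ecx, %eax", "call func"]
optimized = peephole(program, OPT_TABLE)
print(optimized)
```

Because each lookup is a hash-table probe, the cost of exhaustive search is paid once, when the table is built, rather than on every compilation.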
Traversing a Linked List
struct node {
  int val;
  struct node *next;
};
void traverse(struct node *head) {
  while (head) {
    head->val *= 2;
    head = head->next;
  }
}
Traversing a Linked List
Naive assembly code generated by gcc:
movl head, %edx
movl head, %eax
movl (%eax), %eax
sall %eax
movl %eax, (%edx)
movl head, %eax
movl 4(%eax), %eax
movl %eax, head
cmpl $0, head
Step One: Automatic Removal of Redundant Load
movl head, %edx
movl %edx, %eax
movl (%eax), %eax
sall %eax
movl %eax, (%edx)
movl head, %eax
movl 4(%eax), %eax
movl %eax, head
cmpl $0, head
Step Two: Automatic Elimination of Memory Access
movl head, %edx
movl %edx, %eax
movl (%eax), %eax
sall %eax
movl %eax, (%edx)
movl head, %eax
movl 4(%eax), %eax
movl %eax, head
cmpl $0, %eax
Step Three: Automatic Instruction Selection
movl head, %edx
sall (%edx)
movl head, %eax
movl 4(%eax), %eax
movl %eax, head
cmpl $0, %eax
Step Four: Automatic Removal of Redundant Load
movl head, %edx
sall (%edx)
movl %edx, %eax
movl 4(%eax), %eax
movl %eax, head
cmpl $0, %eax
Step Five: Automatic Copy Propagation
movl head, %edx
sall (%edx)
movl 4(%edx), %eax
movl %eax, head
cmpl $0, %eax
Step Six: Automatic Register Usage Optimization
movl head, %edx
sall (%edx)
movl 4(%edx), %edx
movl %edx, head
cmpl $0, %edx
Our optimizer:
movl head, %edx
sall (%edx)
movl 4(%edx), %edx
movl %edx, head
cmpl $0, %edx
gcc-optimized:
sall (%edx)
movl 4(%edx), %edx
cmpl $0, %edx
Global optimizations involving loop-carried dependencies
cannot be handled by peephole optimizations
An automatically generated “peephole” optimizer can
capture many traditional optimizations
➜ Removal of Redundant Loads
➜ Instruction Selection
➜ Copy Propagation
➜ Reducing Register Usage
SYSTEM ARCHITECTURE
Training Programs → Harvester → Canonicalizer → Fingerprinter → Fingerprint Hashtable
Fingerprinter → match? → Boolean Equivalence Test
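The slide names the stages without defining them, so the sketch below rests on assumptions: a fingerprint is taken to be the machine state a sequence produces on a few fixed test inputs, and the Boolean equivalence test is replaced by an exhaustive check on a toy two-register, 4-bit machine. All names (`run`, `fingerprint`, `build_table`, `find_replacements`) are hypothetical. The idea it illustrates is that a cheap fingerprint lookup filters candidates before the expensive equivalence test runs.

```python
from collections import defaultdict
from itertools import product

MASK = 0xF                              # toy 4-bit registers (assumption)
TESTS = [(3, 5), (0, 15), (9, 9)]       # arbitrary fixed test states (r0, r1)

def run(seq, regs):
    """Execute (opcode, dst, src) instructions on a two-register state."""
    r = list(regs)
    for op, dst, src in seq:
        if op == "mov":
            r[dst] = r[src]
        elif op == "add":
            r[dst] = (r[dst] + r[src]) & MASK
        elif op == "sub":
            r[dst] = (r[dst] - r[src]) & MASK
    return tuple(r)

def fingerprint(seq):
    """Cheap signature: the results of running seq on the fixed test states."""
    return tuple(run(seq, t) for t in TESTS)

def equivalent(seq1, seq2):
    """Stand-in for the Boolean equivalence test: check every possible state."""
    return all(run(seq1, st) == run(seq2, st)
               for st in product(range(MASK + 1), repeat=2))

def build_table(targets):
    """Fingerprint hashtable: fingerprint -> harvested target sequences."""
    table = defaultdict(list)
    for target in targets:
        table[fingerprint(target)].append(target)
    return table

def find_replacements(table, candidates):
    """Map each target to the first shorter candidate proven equivalent."""
    found = {}
    for cand in candidates:
        for target in table.get(fingerprint(cand), []):     # match?
            if len(cand) < len(target) and equivalent(cand, target):
                found.setdefault(target, cand)
    return found

T1 = (("mov", 1, 0), ("mov", 0, 1))     # move, then the reverse move
T2 = (("add", 0, 1), ("sub", 0, 1))     # add, then undo it (flags not modeled)
table = build_table([T1, T2])
candidates = [(), (("mov", 1, 0),), (("mov", 0, 1),)]
found = find_replacements(table, candidates)
print(found)
```

A matching fingerprint is only a hint; two different sequences can agree on the test states, which is why the equivalence check still runs on every match.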
HARVESTER
Harvests only loop-free instruction sequences that have a
single entry point
- no middle instruction is a jump target
0: movl %esp, %ebp
1: movl 8(%ebp), %edx
2: sall %edx
3: movl %eax, (%edx)
:
: . . .
:
i: jne instruction 1
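The harvesting rule can be sketched as a window enumeration. This is an illustration under assumptions, not the authors' harvester: the program is a list of instruction strings, jump targets are given as a set of indices, and `harvest` and `max_len` are hypothetical names. Branch instructions inside a window are not treated specially here.

```python
def harvest(program, jump_targets, max_len):
    """Collect contiguous sequences whose only possible entry point is the first
    instruction, i.e. no instruction after the first is a jump target."""
    sequences = []
    for start in range(len(program)):
        last_end = min(start + max_len, len(program))
        for end in range(start + 1, last_end + 1):
            newest = end - 1                      # index of the instruction just added
            if newest != start and newest in jump_targets:
                break                             # longer windows would contain it too
            sequences.append(tuple(program[start:end]))
    return sequences

# The slide's example: "jne instruction 1" makes instruction 1 a jump target.
program = [
    "movl %esp, %ebp",
    "movl 8(%ebp), %edx",
    "sall %edx",
    "movl %eax, (%edx)",
]
harvested = harvest(program, jump_targets={1}, max_len=3)
for seq in harvested:
    print(seq)
```

In this example no harvested sequence spans instructions 0 and 1, while sequences that start at instruction 1 are kept.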
CANONICALIZER
movl %esp, %ebp; movl %ebp, %esp  →  movl %esp, %ebp
movl %eax, %ebp; movl %ebp, %eax  →  movl %eax, %ebp
movl %esp, %esi; movl %esi, %esp  →  movl %esp, %esi
movl %esp, %ebp; movl %ebp, %esp  →  movl %esp, %ebp
There are 56 variants of this rule
movl reg1, reg2; movl reg2, reg1  →  movl reg1, reg2
Reduce it to one rule
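The figure of 56 is consistent with choosing an ordered pair of distinct registers from the eight 32-bit general-purpose x86 registers (8 × 7 = 56). That reading is an assumption; the slide gives only the number. A short enumeration confirms the count:

```python
from itertools import permutations

# The eight 32-bit general-purpose registers of x86.
REGS = ["%eax", "%ebx", "%ecx", "%edx", "%esi", "%edi", "%ebp", "%esp"]

# One concrete (target, optimal) rule per ordered pair of distinct registers.
variants = [
    ((f"movl {a}, {b}", f"movl {b}, {a}"), (f"movl {a}, {b}",))
    for a, b in permutations(REGS, 2)
]
print(len(variants))  # 56
```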
CANONICALIZER
addl %1234, %ebp
addl %5678, %ebpaddl %7912, %ebp
CANONICALIZER 21
![Page 53: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/53.jpg)
CANONICALIZER
addl $1234, %ebp; addl $5678, %ebp  →  addl $6912, %ebp
Use symbolic constants to deal with immediates
addl cons0, %ebp; addl cons1, %ebp  →  addl cons0+cons1, %ebp
CANONICALIZER 21-A
![Page 54: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/54.jpg)
CANONICALIZER
Rename the target sequence to get it in its canonical form:
Registers
➜ First register used is renamed to r0
➜ Second distinct register used is renamed to r1
➜ and so on...
Constants
➜ First constant used is renamed to c0 (a pre-defined constant)
➜ Second distinct constant is renamed to c1
Memory Addresses
➜ Rename a memory access in scale-index-base (s-i-b) form to a dereference of the canonical name of its base (and index) register
CANONICALIZER 22
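The renaming rules above can be sketched as follows. This is a minimal illustration, assuming a toy representation in which each instruction is an `(opcode, operands)` tuple, registers start with `%` and immediates start with `$`; the function name `canonicalize` and this representation are hypothetical, and memory operands are left out.

```python
def canonicalize(seq):
    """Rename registers to r0, r1, ... and immediates to c0, c1, ...
    in order of first use. Returns the renamed sequence and the mapping."""
    regs, consts = {}, {}
    out = []
    for opcode, operands in seq:
        renamed = []
        for op in operands:
            if op.startswith("%"):      # register operand
                renamed.append(regs.setdefault(op, f"r{len(regs)}"))
            elif op.startswith("$"):    # immediate operand
                renamed.append(consts.setdefault(op, f"c{len(consts)}"))
            else:                       # anything else is passed through
                renamed.append(op)
        out.append((opcode, tuple(renamed)))
    return out, {**regs, **consts}


# Two concrete variants of the same rule collapse to one canonical form.
a, _ = canonicalize([("movl", ("%esp", "%ebp")), ("movl", ("%ebp", "%esp"))])
b, _ = canonicalize([("movl", ("%eax", "%ebp")), ("movl", ("%ebp", "%eax"))])
assert a == b == [("movl", ("r0", "r1")), ("movl", ("r1", "r0"))]
```

Because every variant maps to the same canonical sequence, only that one sequence needs to be stored or enumerated.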
![Page 59: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/59.jpg)
CANONICALIZER
Only instruction sequences in canonical form are enumerated
movl r3, r4; movl r5, r3   (not in canonical form, not enumerated)
movl r0, r1; movl r2, r0   (canonical form, enumerated)
Special Constants are enumerated for immediate operands
➜ 0 ; 1
➜ c0 ; c1
➜ c0 + c1 ; c0 - c1
LHS of an optimization rule is always in canonical form
CANONICALIZER 23-D
![Page 60: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/60.jpg)
CANONICALIZER
addl $47, %ebp; addl $53, %ebp
➜ Canonicalize: addl c0, %r0; addl c1, %r0
➜ Query Optimization Database: addl c0+c1, %r0
➜ Un-Canonicalize: addl $100, %ebp
CANONICALIZER 24
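The canonicalize, query, un-canonicalize round trip can be sketched as below. The rule table `RULES`, the `(opcode, src, dst)` tuple layout, and the helper names are assumptions made for illustration; the real database format is not shown in the talk.

```python
# Hypothetical rule database: canonical LHS -> canonical RHS.
RULES = {
    (("addl", "c0", "r0"), ("addl", "c1", "r0")): (("addl", "c0+c1", "r0"),),
}


def eval_const(expr, val_of):
    """Evaluate a symbolic constant such as 'c0', 'c0+c1' or 'c0-c1'."""
    for sym, fn in (("+", lambda a, b: a + b), ("-", lambda a, b: a - b)):
        if sym in expr:
            left, right = expr.split(sym)
            return fn(val_of[left], val_of[right])
    return val_of[expr]


def optimize(seq):
    """Canonicalize seq, look it up, and map the replacement back."""
    regs, consts, canon = {}, {}, []
    for opcode, src, dst in seq:
        ops = []
        for op in (src, dst):
            if op.startswith("$"):
                ops.append(consts.setdefault(int(op[1:]), f"c{len(consts)}"))
            else:
                ops.append(regs.setdefault(op, f"r{len(regs)}"))
        canon.append((opcode, *ops))
    rhs = RULES.get(tuple(canon))
    if rhs is None:
        return seq                      # no rule matched; keep the original
    reg_of = {name: reg for reg, name in regs.items()}
    val_of = {name: val for val, name in consts.items()}

    def concrete(op):
        if op in reg_of:
            return reg_of[op]
        return f"${eval_const(op, val_of)}"

    return [(opc, concrete(s), concrete(d)) for opc, s, d in rhs]


assert optimize([("addl", "$47", "%ebp"), ("addl", "$53", "%ebp")]) == [
    ("addl", "$100", "%ebp")
]
```

The sketch ignores 32-bit wrap-around of the folded constant, which a real implementation would have to respect.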
![Page 63: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/63.jpg)
FINGERPRINTER
Execute the instruction sequences on random-bit states
Use hash of result to compute a fingerprint
Memory is approximated by a 256-byte array
➜ Two equivalent accesses are guaranteed to access the same location in the array
Fingerprint computed by direct execution on hardware
➜ Fingerprint of a sequence computed in <3µs
FINGERPRINTER 25-B
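A minimal sketch of fingerprinting follows. The talk computes fingerprints by direct execution on hardware; here a small Python interpreter over a toy instruction set stands in for that, and the instruction format `(op, dst, src)`, the number of states, and the hash choice are all assumptions.

```python
import hashlib
import random

MASK = 0xFFFFFFFF


def run(seq, regs):
    """Interpret a toy sequence; each instruction is (op, dst, src)."""
    regs = list(regs)
    for op, dst, src in seq:
        if op == "mov":
            regs[dst] = regs[src]
        elif op == "add":
            regs[dst] = (regs[dst] + regs[src]) & MASK
        elif op == "sub":
            regs[dst] = (regs[dst] - regs[src]) & MASK
        elif op == "xor":
            regs[dst] ^= regs[src]
        else:
            raise ValueError(op)
    return regs


def fingerprint(seq, nregs=4, nstates=2, seed=0):
    """Hash the register contents after running seq on random-bit states."""
    rng = random.Random(seed)   # every sequence sees the same random states
    h = hashlib.sha256()
    for _ in range(nstates):
        state = [rng.getrandbits(32) for _ in range(nregs)]
        for value in run(seq, state):
            h.update(value.to_bytes(4, "little"))
    return h.hexdigest()[:16]


# mov r1, r0 ; mov r0, r1 behaves like mov r1, r0 alone.
assert fingerprint([("mov", 1, 0), ("mov", 0, 1)]) == fingerprint([("mov", 1, 0)])
```

Equal fingerprints only suggest equivalence; they select candidates that are then passed to a stricter test.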
![Page 65: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/65.jpg)
SYSTEM ARCHITECTURE
Training Programs ➜ Harvester ➜ Canonicalizer ➜ Fingerprinter ➜ Fingerprint Hashtable
Fingerprinter ➜ match? ➜ Boolean Equivalence Test
SYSTEM ARCHITECTURE 26-A
![Page 68: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/68.jpg)
EQUIVALENCE TEST
Execution Test (fast) ➜ Boolean Test (slower)
Execution Test
➜ Fingerprint on many different states
Boolean Test
➜ Use SAT Solver
EQUIVALENCE TEST 27-B
![Page 72: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/72.jpg)
BOOLEAN TEST
Machine state represented by a set of registers and a model of memory
Memory model
➜ Map from address expressions to data expressions
➜ Aliasing handled by comparing addresses of multiple accesses
Instructions encoded as boolean circuits (input state → output state)
Use a SAT Solver to compute equivalence
BOOLEAN TEST 28-C
![Page 73: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/73.jpg)
CONTEXT SENSITIVITY
Equivalence of two instruction sequences is defined under
the set of registers live beyond the sequence itself
CONTEXT SENSITIVITY 29
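The sketch below illustrates equivalence relative to a live-register set. It does not use a SAT solver: as a stand-in it exhaustively checks every input state of a toy machine with 4-bit registers, which answers the same question (is there any input on which the outputs differ?) only because the state space is tiny. The instruction format and all names are assumptions.

```python
from itertools import product

WIDTH = 4                       # toy register width keeps the check cheap
MASK = (1 << WIDTH) - 1


def run(seq, regs):
    """Interpret a toy sequence; each instruction is (op, dst, src)."""
    regs = list(regs)
    for op, dst, src in seq:
        if op == "mov":
            regs[dst] = regs[src]
        elif op == "add":
            regs[dst] = (regs[dst] + regs[src]) & MASK
        elif op == "xor":
            regs[dst] ^= regs[src]
        else:
            raise ValueError(op)
    return regs


def equivalent(seq_a, seq_b, live, nregs=3):
    """True if no input state makes the sequences differ on a live register."""
    for state in product(range(MASK + 1), repeat=nregs):
        out_a, out_b = run(seq_a, state), run(seq_b, state)
        if any(out_a[r] != out_b[r] for r in live):
            return False        # counterexample found
    return True


swap_tmp = [("mov", 2, 0), ("mov", 0, 1), ("mov", 1, 2)]   # uses r2 as scratch
swap_xor = [("xor", 0, 1), ("xor", 1, 0), ("xor", 0, 1)]
assert equivalent(swap_tmp, swap_xor, live={0, 1})
assert not equivalent(swap_tmp, swap_xor, live={0, 1, 2})
```

The two swaps are interchangeable only when the scratch register is dead after the sequence, which is the sense in which equivalence depends on context.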
![Page 74: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/74.jpg)
EXPERIMENTAL RESULTS
Search Space
Length    Original Search Space    After Canonicalize and Prune
1         5,606                    654
2         31 million               1.09 million
3         176 billion              2.8 billion
Exhaustively enumerate sequences up to length 3
EXPERIMENTAL RESULTS 30
![Page 76: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/76.jpg)
EXPERIMENTAL RESULTS
Integer addition
for (i = 0; i < 64; i++) sum += a[i]
psadbw: sum of absolute differences of 8 consecutive unsigned byte integers
psubb %mm0, %mm0
psadbw &a[i], %mm0
movd %mm0, sum
3x faster
EXPERIMENTAL RESULTS 31-A
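Why `psadbw` can replace an addition loop: the absolute difference of an unsigned byte and zero is the byte itself, so a sum of absolute differences against a cleared register is a plain sum. The sketch below models only that arithmetic in Python, assuming `a` holds unsigned bytes; the function names are illustrative.

```python
def psadbw(src, dst):
    """Model of the result: sum of absolute differences of 8 unsigned bytes."""
    assert len(src) == len(dst) == 8
    return sum(abs(s - d) for s, d in zip(src, dst))


def sum_loop(a):
    """The original scalar loop: sum += a[i]."""
    total = 0
    for x in a:
        total += x
    return total


def sum_psadbw(a):
    """Sum 8 bytes per step against an all-zero operand."""
    zero = bytes(8)             # psubb %mm0, %mm0 clears the register
    return sum(psadbw(a[i:i + 8], zero) for i in range(0, len(a), 8))


data = bytes(range(64))
assert sum_psadbw(data) == sum_loop(data) == 2016
```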
![Page 77: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/77.jpg)
EXPERIMENTAL RESULTS
Kernel            Computation                           Speedup
Comparison        c[i] = (a[i] < b[i]) ? c0 : c1        7x
Minimum           c[i] = (a[i] < b[i]) ? a[i] : b[i]    8x
Pixel-difference  sum += abs(a[i] − b[i])               10x
XOR               c[i] = a[i] ⊕ b[i]                    2x
Sprite Copy       c[i] = (a[i] == 0) ? b[i] : a[i]      2x
MMX and conditional-move instructions are still under-used
EXPERIMENTAL RESULTS 32
![Page 78: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/78.jpg)
EXPERIMENTAL RESULTS
We evaluate our optimizer on SPEC benchmarks compiled using gcc
Use two cost functions
➜ Codesize
➜ Runtime
• Number of memory accesses
• Number of jump instructions
• Instruction costs
EXPERIMENTAL RESULTS 33
![Page 79: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/79.jpg)
EXPERIMENTAL RESULTS
Program    Codesize Improvement
gzip       3.95%
mcf        5.86%
crafty     1.71%
bzip2      4.58%
gcc        1.12%
twolf      1.47%
parser     3.06%
Codesize improvement on executables already optimized for size (-Os)
Codesize Improvement: 1 – 6%
EXPERIMENTAL RESULTS 34
![Page 80: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/80.jpg)
EXPERIMENTAL RESULTS
Program    Instruction Count Improvement
gzip       4.16%
mcf        3.73%
crafty     2.19%
bzip2      4.11%
gcc        2.44%
twolf      2.17%
parser     3.84%
Instruction count improvement on optimized executables (-O2)
Instruction Count Improvement: 2 – 4%
Speedup: 1 – 5%
EXPERIMENTAL RESULTS 35
![Page 81: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/81.jpg)
EXPERIMENTAL RESULTS
Number of Distinct Optimization Rules Learnt
Codesize    3000
Runtime     2100
Re-use of Optimization Rules
Chart values (in the order extracted): 18,679; 4,098; 2,823; 1,737; 1,256
Frequency of Use (chart legend): >1000; 201-1000; 51-200; 11-50; 1-10
Total: 28,593 optimizations
EXPERIMENTAL RESULTS 36
![Page 84: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/84.jpg)
Initial Machine State (r0, r1, r2, r3, r4, memory, flags) ➜ Exhaustive Search over Length-3 sequences ➜ Target Machine State (r0, r1, r2, r3, r4, memory, flags)
EXPERIMENTAL RESULTS 37-B
![Page 85: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/85.jpg)
Analogy: Plain Text ➜ Cipher Text, with Exhaustive Search over 56-bit keys in place of Length-3 sequences
EXPERIMENTAL RESULTS 38
![Page 88: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/88.jpg)
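$CT = \mathrm{Encrypt}_{56}(PT)$
If $\mathrm{Key}_{56} = \mathrm{Key}_{28} \circ \mathrm{Key}_{28}$ s.t. $CT = \mathrm{Encrypt}^{1}_{28}(\mathrm{Encrypt}^{2}_{28}(PT))$
Then, $\mathrm{Decrypt}^{1}_{28}(CT) = \mathrm{Encrypt}^{2}_{28}(PT)$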
EXPERIMENTAL RESULTS 39-B
![Page 91: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/91.jpg)
MEET-IN-MIDDLE
Plain Text ➜ Encrypt using all possible 28-bit keys
Cipher Text ➜ Decrypt using all possible 28-bit keys
Match the two sets of intermediate values
$O(2^{28})$
MEET-IN-MIDDLE 40-B
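The meet-in-the-middle idea can be tried on a toy cipher. Everything below is an assumption made for illustration: the invertible 16-bit cipher `enc`/`dec`, the 8-bit half keys (standing in for the 28-bit halves), and the function names. The search does about 2 × 2^8 cipher operations instead of 2^16.

```python
def enc(k, x):
    """Toy invertible 16-bit cipher (illustration only, not secure)."""
    x = (x + 0x9E37 * (k + 1)) & 0xFFFF
    return ((x << 5) | (x >> 11)) & 0xFFFF      # rotate left by 5


def dec(k, y):
    """Inverse of enc for the same key."""
    y = ((y >> 5) | (y << 11)) & 0xFFFF         # rotate right by 5
    return (y - 0x9E37 * (k + 1)) & 0xFFFF


def meet_in_middle(pt, ct, key_bits=8):
    """Find (k1, k2) with ct == enc(k1, enc(k2, pt))."""
    table = {}
    for k2 in range(1 << key_bits):             # forward: Encrypt2(PT)
        table.setdefault(enc(k2, pt), []).append(k2)
    hits = []
    for k1 in range(1 << key_bits):             # backward: Decrypt1(CT)
        for k2 in table.get(dec(k1, ct), []):
            hits.append((k1, k2))
    return hits


k1, k2, pt = 0x3A, 0xC5, 0x1234
ct = enc(k1, enc(k2, pt))
assert (k1, k2) in meet_in_middle(pt, ct)
```

A single plaintext/ciphertext pair can yield several candidate key pairs; each candidate reproduces this pair but would need checking against further pairs.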
![Page 94: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/94.jpg)
MEET-IN-MIDDLE
Initial (r0, r1, r2, r3, r4, memory, flags) ➜ Enumerate all length n sequences
Target (r0, r1, r2, r3, r4, memory, flags) ➜ Enumerate all length m inverse-sequences
Match
Effective Enumerated Length: m + n
MEET-IN-MIDDLE 41-B
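A minimal sketch of the same idea applied to instruction sequences, under strong simplifying assumptions: a toy machine with two 8-bit registers, no flags or memory, and only exactly invertible instructions, so no don't-know bits arise. The instruction set and all names are hypothetical.

```python
from itertools import product

MASK = 0xFF
INVERSE = {"inc": "dec", "dec": "inc", "not": "not",
           "add": "sub", "sub": "add", "xor": "xor"}
INSTRS = ([(op, r) for op in ("inc", "dec", "not") for r in (0, 1)] +
          [(op, d, 1 - d) for op in ("add", "sub", "xor") for d in (0, 1)])


def step(state, ins):
    """Apply one instruction to a state (a tuple of register values)."""
    regs = list(state)
    op, d = ins[0], ins[1]
    if op == "inc":
        regs[d] = (regs[d] + 1) & MASK
    elif op == "dec":
        regs[d] = (regs[d] - 1) & MASK
    elif op == "not":
        regs[d] = ~regs[d] & MASK
    elif op == "add":
        regs[d] = (regs[d] + regs[ins[2]]) & MASK
    elif op == "sub":
        regs[d] = (regs[d] - regs[ins[2]]) & MASK
    elif op == "xor":
        regs[d] ^= regs[ins[2]]
    return tuple(regs)


def invert(ins):
    return (INVERSE[ins[0]],) + ins[1:]


def search(initial, target, n, m):
    """Find a length n + m sequence taking initial to target, or None."""
    forward = {}
    for seq in product(INSTRS, repeat=n):       # forward enumeration
        s = initial
        for ins in seq:
            s = step(s, ins)
        forward.setdefault(s, seq)
    for tail in product(INSTRS, repeat=m):      # backward enumeration
        s = target
        for ins in reversed(tail):              # undo the tail, last first
            s = step(s, invert(ins))
        if s in forward:                        # the two halves meet
            return list(forward[s]) + list(tail)
    return None
```

With 12 instructions and n = m = 2, this enumerates 144 + 144 half-sequences rather than 12^4 = 20,736 full ones. A match on one concrete state pair is only a candidate and would still go through the equivalence test.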
![Page 95: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/95.jpg)
MEET-IN-MIDDLE
What is the inverse of an instruction sequence?
How fast can we match?
MEET-IN-MIDDLE 42
![Page 96: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/96.jpg)
INVERSE OF AN INSTRUCTION SEQUENCE
Initial (r0, r1, r2, r3, r4, memory, flags) ➜ Execute Sequence ➜ Final (r0, r1, r2, r3, r4, memory, flags)
Final ➜ Execute Sequence⁻¹ ➜ Tightest Approximation to Initial (r0, r1, r2, r3, r4, memory, flags)
Approximate using don’t-know bits
INVERSE OF AN INSTRUCTION SEQUENCE 43
![Page 106: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/106.jpg)
INSTRUCTION INVERSE
Instruction         Inverse
add r0, r1          sub r0, r1; flags ← d
add r0, r0          shr r0; r0[31] ← d; flags ← d
mov r0, r1          r0 ← d
and r0, $010110     r0 ← r0 & d1d11d; flags ← d
or r0, $010110      r0 ← r0 | 0d0dd0; flags ← d
xor r0, r1          xor r0, r1; flags ← d
xchg r0, r1         xchg r0, r1
shr r0, $3          shl r0, $3; r0[0..2] ← d; flags ← d
ror r0, $3          rol r0, $3
inc r0              dec r0; flags ← d
neg r0              neg r0; flags ← d
not r0              not r0
INSTRUCTION INVERSE 44-I
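Don't-know bits can be modeled as a pair `(bits, known)`, where `known` has a 1 for every bit whose value is certain. The sketch below inverts three of the table's instructions on a 6-bit toy register, chosen to match the `$010110` mask; the representation and function names are assumptions, and flags are ignored.

```python
WIDTH = 6
FULL = (1 << WIDTH) - 1


def show(bits, known):
    """Render a value MSB first, with 'd' for don't-know bits."""
    return "".join(
        ("1" if (bits >> i) & 1 else "0") if (known >> i) & 1 else "d"
        for i in reversed(range(WIDTH))
    )


def inv_and(bits, known, mask):
    """Pre-state of `and r0, $mask`: bits cleared by the mask are lost."""
    keep = known & mask
    return bits & keep, keep


def inv_or(bits, known, mask):
    """Pre-state of `or r0, $mask`: bits set by the mask are lost."""
    keep = known & ~mask & FULL
    return bits & keep, keep


def inv_shr(bits, known, k):
    """Pre-state of `shr r0, $k`: shift back; the k low bits are lost."""
    return (bits << k) & FULL, (known << k) & FULL


assert show(*inv_and(0b010010, FULL, 0b010110)) == "d1d01d"
assert show(*inv_or(0b110111, FULL, 0b010110)) == "1d0dd1"
assert show(*inv_shr(0b000101, FULL, 3)) == "101ddd"
```

The positions of the `d` characters match the table's `d1d11d` and `0d0dd0` patterns: exactly the bits that the forward instruction overwrote.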
![Page 110: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/110.jpg)
Some instructions need not be enumerated during inverse enumeration
Initial State (r0, r1, r2, r3, r4, memory, flags) ➜ ... ➜ mov r0, r1 ➜ Final State (r0, r1, r2, r3, r4, memory, flags)
{mov r0, r1} can be the last instruction of the optimal sequence only if r0 = r1 in the final state.
If r0 ≠ r1, {mov r0, r1} need not be enumerated
INSTRUCTION INVERSE 45-C
![Page 112: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/112.jpg)
The final state must be constrained w.r.t. the current instruction being enumerated
Instruction         Constraint
mov r0, r1          r0 = r1
add r0, r1          flags must agree with r0, r1
and r0, $010110     bits 0, 3 and 5 must be 0
or r0, $010110      bits 1, 2 and 4 must be 1
shr r0, $3          Three MSBs must be zero
xor r0, r1          flags must agree with r0, r1
inc r0              flags must agree with r0
dec r0              flags must agree with r0
Intuitively: Goal-Directed Search can be pruned much better
INSTRUCTION INVERSE 46-A
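The constraints in the table act as a filter on which instructions are worth enumerating backward from a given state. A minimal sketch, assuming a 6-bit toy register (to match the `$010110` mask), a list of register values as the state, and ignoring the flag constraints; the function name is hypothetical.

```python
WIDTH = 6
FULL = (1 << WIDTH) - 1


def can_be_last(ins, regs):
    """Could `ins` have been the last instruction producing `regs`?"""
    op = ins[0]
    if op == "mov":             # mov rd, rs leaves rd equal to rs
        return regs[ins[1]] == regs[ins[2]]
    if op == "and":             # bits cleared by the mask must be 0
        return (regs[ins[1]] & ~ins[2] & FULL) == 0
    if op == "or":              # bits set by the mask must be 1
        return (regs[ins[1]] & ins[2]) == ins[2]
    if op == "shr":             # the shifted-in high bits must be 0
        return (regs[ins[1]] >> (WIDTH - ins[2])) == 0
    return True                 # no register constraint modeled here


final = [0b010010, 0b000111]
candidates = [("mov", 0, 1), ("and", 0, 0b010110),
              ("or", 0, 0b010110), ("shr", 0, 3)]
survivors = [ins for ins in candidates if can_be_last(ins, final)]
assert survivors == [("and", 0, 0b010110)]
```

Here three of the four candidates are pruned before any inverse state is computed.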
![Page 113: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/113.jpg)
Prune at every step
Final State (r0, r1, r2, r3, r4, memory, flags) ➜ mov r0, r1 (Prune) ➜ Final - 1 (r0, r1, r2, r3, r4, memory, flags) ➜ add r1, r2 (Prune)
INSTRUCTION INVERSE 47
![Page 116: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/116.jpg)
• Each backward-enumeration machine state is looked up in the
constructed trie
MATCH 48-B
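The two steps above (build a trie on the forward states, then look up each backward state) can be sketched as follows. This is a hedged illustration rather than the authors' data structure: machine states are assumed to be fixed-width bit strings, backward states may contain `x` for a bit that is not known, and the sequence labels `seq_a`, `seq_b`, `seq_c` are placeholders.

```python
def build_trie(forward_states):
    """Insert (bit_string, sequence_label) pairs into a nested-dict trie."""
    root = {}
    for bits, label in forward_states:
        node = root
        for b in bits:
            node = node.setdefault(b, {})
        # Several sequences may reach the same state, so store a list.
        node.setdefault("seqs", []).append(label)
    return root

def search(node, pattern):
    """Yield labels of forward states matching `pattern`.
    An 'x' in the pattern matches both a 0 and a 1."""
    if not pattern:
        yield from node.get("seqs", [])
        return
    head, rest = pattern[0], pattern[1:]
    branches = ("0", "1") if head == "x" else (head,)
    for b in branches:
        if b in node:
            yield from search(node[b], rest)

forward = [("0010", "seq_a"), ("0110", "seq_b"), ("0111", "seq_c")]
trie = build_trie(forward)
print(list(search(trie, "0x10")))  # ['seq_a', 'seq_b']
print(list(search(trie, "1xxx")))  # []
```

A fully known backward state follows a single path through the trie, while each unknown bit can fork the lookup into two branches.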
![Page 117: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/117.jpg)
COMPLEXITY
Assuming the size of the instruction set is $I$ (approx. 3000)
For an instruction sequence of length $m + n$:
• Original: $O(I^{m+n})$
• New Best Case: $O(I^n + I^m \cdot \log_2 I^n)$
• New Worst Case: $O(I^{m+n})$
• If the average fraction of don’t-care bits is $f\%$: $O(f \cdot I^{m+n})$.
In practice, much less due to pruning.
COMPLEXITY 49
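To get a feel for the gap between the original and best-case bounds, the expressions can be evaluated directly. The sketch below treats the O(...) terms as raw operation counts, which ignores constant factors, and the choice of which half of a split is $m$ and which is $n$ is an assumption made here for illustration.

```python
import math

def original_cost(I, m, n):
    """Original bound from the slide: I^(m+n)."""
    return I ** (m + n)

def best_case_cost(I, m, n):
    """Best-case bound from the slide: I^n + I^m * log2(I^n)."""
    return I ** n + I ** m * math.log2(I ** n)

I = 3000  # approximate instruction-set size quoted on the slide
for m, n in [(1, 2), (2, 2)]:
    ratio = original_cost(I, m, n) / best_case_cost(I, m, n)
    print(f"m={m}, n={n}: original / best case = {ratio:,.0f}")
```

The ratio grows with sequence length, in line with the point that splitting the search matters most for longer sequences.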
![Page 118: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/118.jpg)
PRELIMINARY RESULTS
477/5500 instructions are perfectly invertible (i.e., no
don’t-knows)
(invertibility depends on both opcode and operands)
PRELIMINARY RESULTS 50
![Page 119: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/119.jpg)
PRELIMINARY RESULTS
• 3 = 2 + 1
– Can enumerate length-3 sequences in a few minutes, as opposed to
2 days
PRELIMINARY RESULTS 51
![Page 120: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/120.jpg)
• 4 = 2 + 2
– For some sequences, completes in a few minutes
– For some, it takes a day
PRELIMINARY RESULTS 51-A
![Page 121: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/121.jpg)
• 4 = 3 + 1
– Currently being implemented. Likely to be much faster than
2 + 2.
PRELIMINARY RESULTS 51-B
![Page 122: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/122.jpg)
• 5 = 3 + 2
– Perhaps possible
PRELIMINARY RESULTS 51-C
![Page 123: Automatic Generation of Peephole Superoptimizerstheory.stanford.edu/~sbansal/talks/msr-talk.pdf · Automatic Generation of Peephole Superoptimizers Sorav Bansal and Alex Aiken Stanford](https://reader033.vdocuments.net/reader033/viewer/2022060315/5f0bbb137e708231d431efe1/html5/thumbnails/123.jpg)
CONCLUSIONS
• We have demonstrated the construction of an optimizer
using only exhaustive search
• Many (sometimes huge) performance opportunities are
still unexploited by compilers
• Scaling to longer sequence lengths is the key
• http://cs.stanford.edu/~sbansal/superoptimizer
CONCLUSIONS 52