tracing versus partial evaluation: which meta-compilation approach is better for self-optimizing...
TRANSCRIPT
![Page 1: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/1.jpg)
Tracing versus Partial Evaluation
Which Meta-Compilation Approach is Better for Self-Optimizing
Interpreters?
Stefan Marr, Stéphane Ducasse
OOPSLA, October 28, 2015
Work Done At
![Page 2: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/2.jpg)
Disclaimer
I am currently funded by Oracle Labs
* Würthinger, T.; Wimmer, C.; Wöß, A.; Stadler, L.; Duboscq, G.; Humer, C.; Richards, G.; Simon, D. & Wolczko, M., One VM to Rule Them All, in Proceedings of the 2013 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, ACM.
![Page 3: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/3.jpg)
![Page 4: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/4.jpg)
Compare Concrete Systems
Truffle + Graal with Partial Evaluation
RPython with Meta-Tracing
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
Oracle Labs
![Page 5: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/5.jpg)
Selecting A Case Study
On both Systems
Self-Optimizing AST Interpreter
![Page 6: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/6.jpg)
Represents Large Group of Dynamic Languages
Dynamically Typed (Smalltalk)
Classes (and everything is an Object)
Closures (lambdas)
Non-local Returns (almost exceptions)
Set of Benchmarks
http://som-st.github.io
![Page 7: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/7.jpg)
SOM[MT] versus SOM[PE]
SOM[MT]: Meta-Tracing; SOM[PE]: Partial Evaluation
[Figure: the same example AST, built from the nodes if, cnt :=, +, cnt, 1 and cnt :=, 0, drawn once for each approach]
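To make the contrast on this slide concrete, here is a minimal Python sketch of the two ideas on a toy AST. It is not how RPython or Truffle work internally. The tuple-based node format, the `flag` condition variable, and the function names `trace` and `specialize` are all assumptions made for illustration, and closure generation only stands in for real partial evaluation. The point it tries to show: tracing records the one path that was actually executed (with a guard at the branch), while specializing the interpreter to the AST yields code that covers both branches.

```python
# Toy AST (hypothetical format): tuples tagged with a node kind.
PROGRAM = (
    "if", ("var", "flag"),
    ("assign", "cnt", ("add", ("var", "cnt"), ("lit", 1))),
    ("assign", "cnt", ("lit", 0)),
)


def trace(node, env, ops):
    """Interpret `node` and record the operations on the executed path."""
    kind = node[0]
    if kind == "lit":
        ops.append(f"const {node[1]}")
        return node[1]
    if kind == "var":
        ops.append(f"load {node[1]}")
        return env[node[1]]
    if kind == "add":
        left = trace(node[1], env, ops)
        right = trace(node[2], env, ops)
        ops.append("add")
        return left + right
    if kind == "assign":
        value = trace(node[2], env, ops)
        env[node[1]] = value
        ops.append(f"store {node[1]}")
        return value
    if kind == "if":
        taken = bool(trace(node[1], env, ops))
        # Only the taken branch ends up in the trace; the guard protects it.
        ops.append(f"guard_{str(taken).lower()}")
        return trace(node[2] if taken else node[3], env, ops)
    raise ValueError(f"unknown node kind: {kind}")


def specialize(node):
    """Specialize the interpreter to `node`: the AST dispatch happens once,
    here, and the returned closure contains both branches of every `if`."""
    kind = node[0]
    if kind == "lit":
        value = node[1]
        return lambda env: value
    if kind == "var":
        name = node[1]
        return lambda env: env[name]
    if kind == "add":
        left, right = specialize(node[1]), specialize(node[2])
        return lambda env: left(env) + right(env)
    if kind == "assign":
        name, expr = node[1], specialize(node[2])

        def run_assign(env):
            env[name] = expr(env)
            return env[name]

        return run_assign
    if kind == "if":
        cond, then_fn, else_fn = (specialize(child) for child in node[1:])
        return lambda env: then_fn(env) if cond(env) else else_fn(env)
    raise ValueError(f"unknown node kind: {kind}")
```

Calling `trace(PROGRAM, {"flag": True, "cnt": 5}, ops)` fills `ops` with a linear sequence for the increment branch only, whereas the closure returned by `specialize(PROGRAM)` can run either branch without re-inspecting the AST.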
![Page 8: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/8.jpg)
WHICH APPROACH IS FASTER FAST?
minimal amount of engineering to get good performance
![Page 9: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/9.jpg)
Peak Performance of Basic Interpreters
[Figure: two panels, SOM[MT] on RPython and SOM[PE] on Truffle, each showing compiled SOM[MT] or SOM[PE]. Y axis: runtime normalized to Java 8 (compiled or interpreted), lower is better, ticks at 10 and 100. X axis: benchmarks Bounce, BubbleSort, DeltaBlue, Fannkuch, GraphSearch, Json, Mandelbrot, NBody, PageRank, Permute, Queens, QuickSort, Richards, Sieve, Storage, Towers.]
Minimal SOM[MT]: 5.5x slower (min. 1.6x, max. 14x)
Minimal SOM[PE]: 170x slower (min. 60x, max. 600x)
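The "x slower" figures are runtimes normalized to Java and then aggregated over the benchmarks. As a rough sketch of that arithmetic, the Python below divides each runtime by its baseline and aggregates with a geometric mean. The use of the geometric mean for these particular numbers is an assumption, and the function names and the benchmark names in the usage note are purely illustrative, not measured data.

```python
import math


def normalize(runtimes, baseline):
    """Divide each benchmark's runtime by the baseline runtime for the
    same benchmark. A factor of 2.0 means 'twice as slow as the baseline'."""
    return {name: runtimes[name] / baseline[name] for name in runtimes}


def geometric_mean(factors):
    """Geometric mean, computed via logarithms; suitable for ratios."""
    values = list(factors)
    return math.exp(sum(math.log(value) for value in values) / len(values))


def summarize(runtimes, baseline):
    """Return (overall factor, minimum factor, maximum factor)."""
    factors = normalize(runtimes, baseline)
    return (
        geometric_mean(factors.values()),
        min(factors.values()),
        max(factors.values()),
    )
```

For example, with made-up runtimes `{"A": 2.0, "B": 16.0}` and a baseline of `{"A": 1.0, "B": 2.0}`, the per-benchmark factors are 2 and 8, and `summarize` reports an overall factor of about 4 with a minimum of 2 and a maximum of 8.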
![Page 10: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/10.jpg)
WHICH APPROACH IS THE FASTEST?
best peak performance
![Page 11: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/11.jpg)
Which Self-Optimizations Should a Language Implementer Add?
• Type-specialize variables
• Type-specialize object fields
• Type-specialize collection storage
• Lower control structures from library
• Lower common library operations
• Inline caching
• Inline primitive operations
• Cache globals
• …
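As an illustration of what a self-optimization of this kind can look like, here is a small Python sketch of an addition node that type-specializes itself by rewriting its own position in the AST. It is a simplified model, not code from either SOM implementation. The class names (`Node`, `Root`, `Literal`, `UninitializedAdd`, `IntAdd`, `GenericAdd`) and the `replace_with` helper are hypothetical.

```python
class Node:
    def __init__(self, *children):
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def replace_with(self, new_node):
        """Put new_node into this node's slot in the parent."""
        new_node.parent = self.parent
        if self.parent is not None:
            index = self.parent.children.index(self)
            self.parent.children[index] = new_node
        return new_node


class Root(Node):
    def execute(self, env):
        return self.children[0].execute(env)


class Literal(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def execute(self, env):
        return self.value


class GenericAdd(Node):
    """Fallback: handles any operand types, never rewrites again."""

    def execute(self, env):
        return self.children[0].execute(env) + self.children[1].execute(env)


class IntAdd(Node):
    """Specialized for int operands; de-specializes on a type mismatch."""

    def execute(self, env):
        left = self.children[0].execute(env)
        right = self.children[1].execute(env)
        if type(left) is not int or type(right) is not int:
            self.replace_with(GenericAdd(*self.children))
        # Operands are already evaluated, so compute the result directly.
        return left + right


class UninitializedAdd(Node):
    """First execution observes the operand types and picks a variant."""

    def execute(self, env):
        left = self.children[0].execute(env)
        right = self.children[1].execute(env)
        if type(left) is int and type(right) is int:
            self.replace_with(IntAdd(*self.children))
        else:
            self.replace_with(GenericAdd(*self.children))
        return left + right
```

After the first execution of `Root(UninitializedAdd(Literal(1), Literal(2)))`, the root's child is an `IntAdd`. If an operand later stops being an `int`, the node falls back to `GenericAdd`. In a real meta-compiled interpreter, the specialized node is what gives the tracer or partial evaluator a simple, type-stable path to compile.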
![Page 12: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/12.jpg)
Peak Performance of Optimized Interpreter
[Figure: two panels, SOM[MT] on RPython and SOM[PE] on Truffle, each showing compiled SOM[MT] or SOM[PE]. Y axis: runtime normalized to Java 8 (compiled or interpreted), lower is better, ticks at 1, 4, 8, 12. X axis: benchmarks Bounce, BubbleSort, DeltaBlue, Fannkuch, GraphSearch, Json, Mandelbrot, NBody, PageRank, Permute, Queens, QuickSort, Richards, Sieve, Storage, Towers.]
Optimized SOM[MT]: 3x slower (min. 1.5x, max. 11x), 2.4x speedup
Optimized SOM[PE]: 2.3x slower (min. 4%, max. 4.9x), 80x speedup
![Page 13: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/13.jpg)
Optimization Impact on SOMPE
[Figure: speedup factor per optimization (higher is better, logarithmic scale, axis ticks from 0.85 to 12.00). Optimizations shown: lower control structures, inline caching, cache globals, typed fields, lower common ops, array strategies, inline basic ops., typed vars, opt. local vars, baseline, min. escaping closures, typed args, catch-return nodes.]
![Page 14: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/14.jpg)
Implementation Sizes
RPython
From Minimal to Optimized
+57% LOC
From 3,455 LOC to 5,414 LOC
Truffle
From Minimal to Optimized
+103% LOC
From 5,424 LOC to 11,037 LOC
The Way I write Python
The Way I write Java
![Page 15: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/15.jpg)
WHICH APPROACH GIVES BETTER STARTUP PERFORMANCE?
Considering the User-Perceived System Performance
![Page 16: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/16.jpg)
Measuring “Whole Program” Runtime
[Figure: Wall-Clock Behavior for Various Run Lengths: Aggregation over all Benchmarks. Y axis: factor over Java for x iterations, that is, the geometric mean of (wall-clock time for x iterations, divided by corresponding Java result), ticks at 4, 8, 12, 16. X axis: iterations of benchmark in same process, 0 to 600, annotated with 8 sec, 25 sec, 46 sec. Legend (VM): Java, RTruffleSOM-jit-experiments (SOM[MT]), TruffleSOM-graal-no-expgc (SOM[PE]).]
• Process Start to Finish
• Overall Wall-clock time
• Normalized to Java
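The three bullets describe a measurement that includes VM startup, warmup, and compilation time. A minimal Python sketch of such a measurement is shown below, assuming the VM is started as a child process. The function names are hypothetical, and the actual benchmark harness and command lines used for the talk are not given in the slides.

```python
import subprocess
import sys
import time


def wall_clock_seconds(command):
    """Time a command from process start to process exit, in seconds."""
    start = time.perf_counter()
    subprocess.run(
        command,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def factor_over_baseline(vm_seconds, baseline_seconds):
    """Normalize a wall-clock time to the baseline (Java in the slides)."""
    return vm_seconds / baseline_seconds


if __name__ == "__main__":
    # Stand-in command: time starting and exiting the Python interpreter.
    # A real run would pass the VM's own command line and iteration count.
    elapsed = wall_clock_seconds([sys.executable, "-c", "pass"])
    print(f"process start to finish: {elapsed:.3f} s")
```

Repeating this for increasing iteration counts and aggregating the per-benchmark factors would give one point on the curve per run length.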
![Page 17: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/17.jpg)
CONCLUSIONS
![Page 18: Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better for Self-Optimizing Interpreters?](https://reader033.vdocuments.net/reader033/viewer/2022052117/58ecd4d01a28ab8f398b4703/html5/thumbnails/18.jpg)
Tracing vs. Partial Evaluation
• Peak performance seems similar
– No indications of conceptual limitations
• Startup Performance
– Unclear, tiered compilation?
• But, tracing is faster fast!
– Requires fewer optimizations
– Better ‘prototype’ performance