Intraprocedural Optimizations
Jonathan Bachrach
MIT AI Lab
Outline
• Goal: eliminate abstraction overhead using static analysis and program transformation
• Topics:
– Intraprocedural type inference
– Static method selection
– Specialization and Inlining
– Static class prediction
– Splitting
– Box/unboxing
– Common Subexpression Elimination
– Overflow and range checks
– Partial evaluation revisited
• Partially based on: Chambers’ “Efficient Implementation of Object-oriented Programming Languages” OOPSLA Tutorial
Running Example
(dg + ((x <num>) (y <num>) => <num>))
(dm + ((x <int>) (y <int>) => <int>) (%ib (%i+ (%iu x) (%iu y))))
(dm + ((x <flo>) (y <flo>) => <flo>) (%fb (%f+ (%fu x) (%fu y))))
(dm x2 ((x <num>) => <num>) (+ x x))
(dm x2 ((x <int>) => <int>) (+ x x))
• Anatomy of Pure Proto Arithmetic
– Dispatch
– Boxing
– Overflow checks
– Actual instruction
• C Arithmetic
– Actual instruction
Biggest Inefficiencies
• Method dispatch
• Method calls
• Boxing
• Type checks
• Overflow and range checks
• Slot access
• Object creation
Intraprocedural Type Inference
• Goal: determine concrete class(es) of each variable and expression
• Standard data flow analysis through control graph
– Propagate bindings b -> { class … }
– Sources are literals, isa expressions, results of some primitives, and type declarations
– Form unions of bindings at merge points
– Narrow sets after typecases
– Assumes closed world (or at least final classes)
Type Inference Example
(set x (isa <tab> …)) ;; x in { <tab> }
(set y (table-growth-factor x)) ;; y in { <int> <flo> }
(set z (if t x y)) ;; z in { <tab> <int> <flo> }
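The merge step in this example can be sketched in Python. This is a minimal illustration, assuming class sets are modeled as frozensets of class names; the `merge` helper and the string class names are hypothetical and are not part of Proto.

```python
# Hypothetical model: the inferred type of a binding is a frozenset of class names.

def merge(*class_sets):
    """Union of the class sets that flow into a control-flow merge point."""
    result = frozenset()
    for class_set in class_sets:
        result |= class_set
    return result

x = frozenset({"<tab>"})           # source: (isa <tab> ...)
y = frozenset({"<int>", "<flo>"})  # assumed result classes of table-growth-factor
z = merge(x, y)                    # (if t x y): either branch may reach z
```

Here `z` holds all three classes, which is why a later call on `z` cannot be dispatched statically.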
Narrowing Type Precision
(if (isa? x <int>) (+ x 1) (+ x 37.0))
(if (isa? x <int>) (let (([x <int>] x)) (+ x 1)) (let (([x !<int>] x)) (+ x 37.0)))
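Narrowing on a type test can be sketched the same way. The `narrow` function below is a hypothetical helper; it assumes exact class matches with no subclassing, which corresponds to the closed-world (or final-class) assumption.

```python
def narrow(class_set, tested_class):
    """Split a class set on an (isa? x tested_class) test.

    Returns (then_set, else_set): the classes still possible in each branch.
    Assumes classes are final, so membership is an exact-name match.
    """
    class_set = frozenset(class_set)
    then_set = class_set & {tested_class}
    else_set = class_set - {tested_class}
    return then_set, else_set

then_x, else_x = narrow({"<int>", "<flo>"}, "<int>")
```

In the then branch `x` is known to be an `<int>`, and in the else branch it is known not to be, matching the `[x <int>]` and `[x !<int>]` bindings.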
Static Method Selection
(set x (isa <tab> …)) ;; x in { <tab> }
(set y (table-growth-factor x)) ;; y in { <int> <flo> }
(print out y)
• If only one class is statically possible, dispatch can be performed statically:
(set y (<tab>:table-growth-factor x))
• If only a few classes are statically possible, a typecase can be inserted:
(sel (class-of y) ((<int>) (<int>:print out y)) ((<flo>) (<flo>:print out y)))
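The two cases above can be sketched as a single decision. This is an illustrative Python model in which forms are nested tuples; `select_call`, the `max_cases` threshold, and dispatching on the first argument only are all assumptions made for the sketch.

```python
def select_call(generic, args, classes, max_cases=2):
    """Pick a call form for `generic`, given the possible classes of args[0].

    One class: direct call to that class's method.
    A few classes: a `sel` typecase over the candidates.
    Otherwise: leave the dynamically dispatched call in place.
    """
    ordered = sorted(classes)
    if len(ordered) == 1:
        return (f"{ordered[0]}:{generic}", *args)
    if 1 < len(ordered) <= max_cases:
        cases = tuple(((c,), (f"{c}:{generic}", *args)) for c in ordered)
        return ("sel", ("class-of", args[0]), *cases)
    return (generic, *args)

static_call = select_call("table-growth-factor", ("x",), {"<tab>"})
typecase = select_call("print", ("y",), {"<int>", "<flo>"})
```

The threshold trades code size against dispatch cost; a real compiler would likely tune it rather than fix it at two.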
Type Check Removal
• Type inference can clearly be used to remove type checks and casts
(set x (isa <tab> …)) ;; x in { <tab> }
(if (isa? x <tab>) (go) (stop))
==>
(set x (isa <tab> …)) ;; x in { <tab> }
(go)
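A sketch of how an inferred class set folds a type test, again in Python with tuple forms. `fold_type_check` is a hypothetical name, and the exact-match treatment of classes assumes final classes.

```python
def fold_type_check(var, class_set, tested_class, then_branch, else_branch):
    """Simplify (if (isa? var tested_class) then else) using var's class set."""
    class_set = frozenset(class_set)
    if class_set == {tested_class}:
        return then_branch            # test always succeeds
    if tested_class not in class_set:
        return else_branch            # test always fails
    # Outcome not statically known: keep the test.
    return ("if", ("isa?", var, tested_class), then_branch, else_branch)

folded = fold_type_check("x", {"<tab>"}, "<tab>", ("go",), ("stop",))
```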
Intraprocedural Type Inference Critique
• Pros:
– Simple
– Fast
– Fewer dependents
• Cons:
– Limited type precision
• No result types
• Incoming arg types
• No slot types
• Etc.
Specialization
• Q: How can we improve intraprocedural type inference precision?
• A: Specialization, which is the cloning of methods with narrowed argument types
• Improves type precision of callee by contextualizing body:
(dm sqr ((x <num>) (y <num>)) (* x y))
==>
(dm sqr ((x <int>) (y <int>)) (* x y))
(dm sqr ((x <flo>) (y <flo>)) (* x y))
• Must make sure super calls still mean the same thing
Specialization of Constructors
• Crucial to get object creation to be fast
• Specialization can be used to build custom constructors
(def <thingy> (isa <any>))
(slot <thingy> thingy-x 0)
(slot (t <thingy>) thingy-tracker (+ (thingy-x t) 1))
(slot <thingy> thingy-cache (fab <tab>))
(df thingy-isa (x tracker cache)
  (let ((thingy (clone <thingy>)))
    (unless (== x nul)
      (set (%slot-value thingy thingy-x) x))
    (set (%slot-value thingy thingy-tracker)
         (if (== tracker nul) (+ (thingy-x thingy) 1) tracker))
    (set (%slot-value thingy thingy-cache)
         (if (== cache nul) (fab <tab>) cache))))
Inlining
• Q: Can we do better?
• A: Inlining can improve specialization by inserting specialized body
• Improves type precision at call-site by contextualizing body (includes result types):
(dm f ((x <int>) (y <int>)) (+ (g x y) 1))
(dm g (x y) (+ x y))
==>
(dm f ((x <int>) (y <int>)) (+ (+ x y) 1))
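Inlining by substitution can be sketched as follows. This is a simplified Python model: forms are nested tuples, `substitute` and `inline_call` are hypothetical helpers, and the sketch ignores variable capture and the duplication of argument expressions that have side effects.

```python
def substitute(expr, env):
    """Replace variable names in expr according to env (simultaneous substitution)."""
    if isinstance(expr, str):
        return env.get(expr, expr)
    if isinstance(expr, tuple):
        # Keep the operator position as-is; substitute only in the arguments.
        return (expr[0], *(substitute(e, env) for e in expr[1:]))
    return expr

def inline_call(call, callee):
    """Replace a call form with the callee's body, parameters bound to arguments."""
    name, params, body = callee
    if call[0] != name or len(call) - 1 != len(params):
        raise ValueError("call does not match callee signature")
    return substitute(body, dict(zip(params, call[1:])))

g = ("g", ("x", "y"), ("+", "x", "y"))
inlined = inline_call(("g", "a", ("*", "b", 2)), g)
```

Once the body sits at the call site, the caller's argument types apply to it directly, which is where the precision gain comes from.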
Synergy: Method Selection + Inlining
(df f ((x <int>) (y <int>)) (+ x y))
;; method selection
(df f ((x <int>) (y <int>)) (<int>:+ x y))
;; inlining
(df f ((x <int>) (y <int>)) (%ib (%i+ (%iu x) (%iu y))))
Pitfalls of Inlining and Specialization
• Must control inlining and specialization carefully to avoid code bloat
• Inlining can be driven purely by syntactic size, trying never to grow the code beyond the size of the original call
• Class-centric specialization usually works by copying down inherited methods and tightening up self references (harder for multimethods)
• Can run inlining/specialization trials based on
– Final static size
– Performance feedback
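The syntactic-size rule can be sketched as a simple node count. The `size` metric and `should_inline` policy below are assumptions chosen for illustration; real compilers typically use richer cost models.

```python
def size(expr):
    """Count nodes in a nested-tuple form: one per form plus one per atom."""
    if isinstance(expr, tuple):
        return 1 + sum(size(e) for e in expr)
    return 1

def should_inline(call, body):
    """Inline only when the body is no larger than the call it replaces."""
    return size(body) <= size(call)

small = should_inline(("g", "x", "y"), ("+", "x", "y"))
large = should_inline(("g", "x", "y"), ("+", ("*", "x", "x"), ("*", "y", "y")))
```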
Class Centric Specialization
(def <point> (isa <any>))
(slot <point> (point-x <int>) 0)
(dm point-move ((p <point>) (offset <num>)) (set (point-x p) (+ (point-x p) offset)))
(def <color-point> (isa <point>))
==>
(dm point-move ((p <color-point>) (offset <num>)) (set (point-x p) (+ (point-x p) offset)))
Static Class Prediction
• Can improve type precision in cases where for a given generic a particular method is much more frequent
• Insert type check testing prediction
– Can narrow type precision along then and else branches
• Especially useful in combination with inlining
Static Class Prediction Example
(df f (x) (let ((y (+ x 1))) (+ y 2)))
(df f (x) (let ((y (if (isa? x <int>) (+ x 1) (+ x 1)))) (if (isa? y <int>) (+ y 2) (+ y 2))))
(df f (x) (let ((y (if (isa? x <int>) (<int>:+ x 1) (+ x 1)))) (if (isa? y <int>) (<int>:+ y 2) (+ y 2))))
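The rewrite applied to each call in this example can be sketched as one transformation. `predict` is a hypothetical helper over nested-tuple forms; it guards the call with a test for the predicted class and statically selects the method on the fast path.

```python
def predict(call, var, predicted_class):
    """Wrap a generic call in a class test on var, with a static call on the hit path."""
    op, *args = call
    fast_path = (f"{predicted_class}:{op}", *args)
    slow_path = call  # unchanged, still dynamically dispatched
    return ("if", ("isa?", var, predicted_class), fast_path, slow_path)

guarded = predict(("+", "x", 1), "x", "<int>")
```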
Synergy: Class Prediction + Method Selection + Inlining
(df f (x) (let ((y (if (isa? x <int>) (+ x 1) (+ x 1)))) (if (isa? y <int>) (+ y 2) (+ y 2))))
;; method selection
(df f (x) (let ((y (if (isa? x <int>) (<int>:+ x 1) (+ x 1)))) (if (isa? y <int>) (<int>:+ y 2) (+ y 2))))
;; inlining
(df f (x) (let ((y (if (isa? x <int>) (%ib (%i+ (%iu x) %1)) (+ x 1)))) (if (isa? y <int>) (%ib (%i+ (%iu y) %2)) (+ y 2))))
Splitting
• Problem: Class prediction often leads to a bunch of redundant type tests
• Solution: Split off whole sections of graph specialized to particular class on variable
– Can split off entire loops
– Can specialize on other dataflow information
Splitting Example
(df f (x) (let ((y (+ x 1))) (+ y 2)))
(df f (x) (if (isa? x <int>) (let ((y (+ x 1))) (+ y 2)) (let ((y (+ x 1))) (+ y 2))))
(df f (x) (if (isa? x <int>) (let ((y (<int>:+ x 1))) (<int>:+ y 2)) (let ((y (+ x 1))) (+ y 2))))
Splitting Downside
• Splitting can also lead to code bloat
• Must be intelligent about what to split
– A priori knowledge (e.g., integers most frequent)
– Actual performance
Box / Unboxing
(df + ((x <int>) (y <int>) => <int>) (%ib (%i+ (%iu x) (%iu y))))
(df f ((a <int>) (b <int>) => <int>) (+ (+ a b) a))
;; inlining +
(df f ((a <int>) (b <int>) => <int>) (%ib (%i+ (%iu (%ib (%i+ (%iu a) (%iu b)))) (%iu a))))
;; remove box/unbox pair
(df f ((a <int>) (b <int>) => <int>) (%ib (%i+ (%i+ (%iu a) (%iu b)) (%iu a))))
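The box/unbox cancellation can be sketched as a bottom-up rewrite over nested-tuple forms. `remove_box_unbox` is a hypothetical helper; it assumes that `(%iu (%ib e))` always equals `e`, that is, boxing has no side effects that must be preserved.

```python
def remove_box_unbox(expr):
    """Rewrite (%iu (%ib e)) to e everywhere in expr."""
    if not isinstance(expr, tuple):
        return expr
    # Simplify children first so nested pairs are cancelled too.
    expr = tuple(remove_box_unbox(e) for e in expr)
    if len(expr) == 2 and expr[0] == "%iu":
        inner = expr[1]
        if isinstance(inner, tuple) and len(inner) == 2 and inner[0] == "%ib":
            return inner[1]
    return expr

inlined_f = ("%ib", ("%i+", ("%iu", ("%ib", ("%i+", ("%iu", "a"), ("%iu", "b")))),
                     ("%iu", "a")))
unboxed_f = remove_box_unbox(inlined_f)
```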
Synergy: Splitting + Method Selection + Inlining + Box/Unboxing
(df f (x) (if (isa? x <int>) (let ((y (+ x 1))) (+ y 2)) (let ((y (+ x 1))) (+ y 2))))
;; method selection
(df f (x) (if (isa? x <int>) (let ((y (<int>:+ x 1))) (<int>:+ y 2)) (let ((y (+ x 1))) (+ y 2))))
(df f (x) (if (isa? x <int>) (<int>:+ (<int>:+ x 1) 2) (let ((y (+ x 1))) (+ y 2))))
;; inlining
(df f (x) (if (isa? x <int>) (%ib (%i+ (%iu (%ib (%i+ (%iu x) %1))) %2)) (let ((y (+ x 1))) (+ y 2))))
;; box/unbox
(df f (x) (if (isa? x <int>) (%ib (%i+ (%i+ (%iu x) %1) %2)) (let ((y (+ x 1))) (+ y 2))))
Common Subexpression Elimination (CSE)
• Removes redundant computations
– Constant slot or binding access
– Stateless/side-effect-free function calls
• Examples
(or (elt (cache x) 'a) (elt (cache x) 'b))
==>
(let ((t (cache x))) (or (elt t 'a) (elt t 'b)))
(if (< i 0) (if (< i 0) (go) (putz)) (dance))
==>
(if (< i 0) (go) (dance))
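The first example can be sketched as a small CSE pass over nested-tuple forms. All helper names are hypothetical, the set of pure operators is supplied by the caller, and the sketch ignores evaluation order and name capture for the temporary, both of which a real pass must respect.

```python
from collections import Counter

def size(expr):
    return 1 + sum(size(e) for e in expr) if isinstance(expr, tuple) else 1

def is_pure(expr, pure_ops):
    """A form is pure if its operator and all nested operators are in pure_ops."""
    if not isinstance(expr, tuple):
        return True
    return expr[0] in pure_ops and all(is_pure(e, pure_ops) for e in expr[1:])

def subexprs(expr):
    """Yield every compound subexpression, skipping operator positions."""
    if isinstance(expr, tuple):
        yield expr
        for e in expr[1:]:
            yield from subexprs(e)

def replace(expr, target, name):
    if expr == target:
        return name
    if isinstance(expr, tuple):
        return tuple(replace(e, target, name) for e in expr)
    return expr

def cse(expr, pure_ops, temp="t"):
    """Hoist the largest repeated pure subexpression into a let binding."""
    counts = Counter(s for s in subexprs(expr) if is_pure(s, pure_ops))
    repeated = [s for s, n in counts.items() if n > 1]
    if not repeated:
        return expr
    target = max(repeated, key=size)
    return ("let", ((temp, target),), replace(expr, target, temp))

before = ("or", ("elt", ("cache", "x"), "'a"), ("elt", ("cache", "x"), "'b"))
after = cse(before, pure_ops={"cache"})
```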
Overflow and Bounds Checks aka “Moon Challenge”
• Goal:
– Support mathematical integers and bounds checked collection access
– Eliminate bounds and overflow checks
• Strategy:
– Assume most integer arithmetic and collection accesses occur in restricted loop context where range can be readily inferred
– Perform range analysis to remove checks
• Bound from above variables by size of collection
• Bound from below variables by zero
• Induction step is 1+
Range Check Example
(rep (((sum <int>) 0) ((i <int>) 0)) (if (< i (len v)) (let ((e (elt v i))) (rep (+ sum e) (+ i 1))) sum))
;; inlining bounds checks
(rep (((sum <int>) 0) ((i <int>) 0)) (if (< i (len v)) (let ((e (if (or (< i 0) (>= i (len v))) (sig ...) (vref v i)))) (rep (+ sum e) (+ i 1))) sum))
;; CSE
(rep (((sum <int>) 0) ((i <int>) 0)) (if (< i (len v)) (let ((e (if (< i 0) (sig ...) (vref v i)))) (rep (+ sum e) (+ i 1))) sum))
;; range analysis
(rep (((sum <int>) 0) ((i <int>) 0)) (if (< i (len v)) (let ((e (vref v i))) (rep (+ sum e) (+ i 1))) sum))
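The last step relies on a lower bound for the induction variable. A minimal sketch of that reasoning, assuming mathematical (non-wrapping) integers and a loop variable that starts at a known constant and changes by a constant step; both function names are hypothetical.

```python
import math

def induction_lower_bound(init, step):
    """Lower bound of a variable that starts at init and adds step each iteration."""
    # A non-negative step never moves the variable below its initial value.
    return init if step >= 0 else -math.inf

def needs_negative_check(init, step):
    """True if the (< i 0) check cannot be removed by this analysis."""
    return induction_lower_bound(init, step) < 0

keep_check = needs_negative_check(init=0, step=1)
```

For the loop above, `i` starts at 0 and steps by 1, so the `(< i 0)` test is provably false and can be dropped; the upper bound was already covered by the loop guard.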
Overflow Check Removal aka “Moon Challenge” Critique
• Pros:
– simple analysis
• Cons:
– could miss a number of cases
• but then previous approaches (e.g., box/unbox) could be applied
Advanced Topic: Representation Selection
• Embed objects in others to remove indirections
• Change object representation over time
• Use minimum number of bits to represent enums
• Pack fields in objects
Advanced Topic: Algorithm Selection
• Goal: compiler determines that one algorithm is more appropriate for given data
– Sorted data
– Biased data
• Solution:
– Embed statistics gathering in runtime
– Add guards to code and split
Rule-based Compilation
• First millennium compilers were based on special rules for
– Method selection
– Pattern matching
– Oft-used system functions like format
• Problems
– Error prone
– Don’t generalize to user code
• Challenge
– Minimize number of rules
– Competitive compiler speed
– Produce competitive code
Partial Evaluation to the Rescue
• Holy grail idea:
– Optimizations are manifest in code
– Do previous optimizations with only p.e.
• Simplify compiler based on limited moves
– Static eval and folding
– Inlining
• Eliminate
– Custom method selection
– Custom constructor optimization
– Etc.
Partial Eval Example
(dm format (port msg (args …))
  (rep nxt ((i 0) (ai 0))
    (when (< i (len msg))
      (let ((c (elt msg i)))
        (if (= c #\%)
            (seq (print port (elt args ai)) (nxt (+ i 1) (+ ai 1)))
            (seq (write port c) (nxt (+ i 1) ai)))))))
(format out "%> " n)
• First millennium solution is to have a custom optimizer for format
(seq (print port n) (write port "> "))
• Second millennium solution with partial evaluation
(nxt 0 0)
(seq (print port n) (nxt 1 1))
(seq (print port n) (seq (write port #\>) (nxt 2 1)))
(seq (print port n) (seq (write port #\>) (seq (write port #\space))))
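The unrolling traced above can be sketched in Python. This is an illustration only: `specialize_format` is a hypothetical helper that treats the message string as static and the arguments as dynamic, and it emits a flat `seq` of residual operations rather than the nested form shown in the trace. The usage assumes a three-character message (`%`, `>`, space), matching the three steps of the trace.

```python
def specialize_format(msg, arg_names, port="port"):
    """Unroll format's loop for a statically known msg.

    Returns residual code as a nested-tuple form; arguments stay symbolic.
    """
    ops = []
    ai = 0  # index of the next argument, known at specialization time
    for c in msg:
        if c == "%":
            ops.append(("print", port, arg_names[ai]))
            ai += 1
        else:
            ops.append(("write", port, c))
    return ("seq", *ops)

residual = specialize_format("%> ", ["n"])
```

The loop counter, the character tests, and the argument index all disappear from the residual code because they depend only on the static message.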
Partial Eval Challenge
• Inlining and static eval are slow
– “Running” code through inlining
– Need to compile oft-used optimizations
• Residual code is not necessarily efficient
– Sometimes algorithmic change is necessary for optimal efficiency
• Example: method selection uses class numbering and decision tree whereas straightforward code does naïve method sorting
• Perhaps there is a middle ground
Open Problems
• Automatic inlining, splitting, and specialization
• Efficient mathematical integers
• Constant determination
• Representation selection
• Algorithmic selection
• Efficient partial evaluation
• Super compiler that runs for days
Reading List
• Chambers: “Efficient Implementation of Object-oriented Programming Languages” OOPSLA Tutorial
• Chambers and Ungar: SELF papers
• Chambers et al.: Vortex papers