semantics-aware performance optimization harry xu cs departmental seminar 01/13/2012
TRANSCRIPT
![Page 1: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/1.jpg)
Semantics-Aware Performance Optimization
Harry Xu CS Departmental Seminar
01/13/2012
![Page 2: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/2.jpg)
Who Am I
• Recently got my Ph.D. (in 08/11)
• Interested in (static and dynamic) program analysis– Theoretical
foundations – Applications
• Recent interest--- software bloat analysishttp://www.ics.uci.edu/~guoqingx
Looking for motivated Ph.D. students
![Page 3: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/3.jpg)
Is Today’s Software Fast Enough?
• Pervasive use of large-scale, enterprise-level applications– Layers of libraries and frameworks– Object-orientation encourages excess
• No free lunch anymore from hardware advances – The size of software grows faster than the
hardware capabilities (a.k.a. Myhrvold’s Law)
![Page 4: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/4.jpg)
As A ResultHeaps are getting bigger• Grown from 500M to 2-3G or more in the past few years• But not necessarily supporting more users or functions
Surprisingly common (all are from real apps):• Supporting thousands of users (millions are expected)• Saving 500K session state per user (2K is expected)• Requiring 2M for a text index per simple document• Creating 100K temporary objects per web hit
Big impact on applications in a lot of different domains
![Page 5: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/5.jpg)
Let’s Do Optimizations
• Dynamic optimizations really helped– JIT compilers can significantly lower the level of
inefficiencies– Combine traditional dataflow analyses with inlining– Optimizations as part of the run time
• Situations where traditional compilers may not work well– Dynamic languages (Dr. Michael Franz’s interest)– Semantic inefficiencies (my interest :)
![Page 6: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/6.jpg)
Semantic Inefficiencies
• Performance problems caused by developers’ inappropriate choices and mistakes
• In many cases, they are from positive software engineering practices– Make everything as general as possible– Use objects for all simple tasks– Negative effects multiply when we pile up abstractions
• Semantics-agnostic optimizations cannot optimize them away– Human insight is required
![Page 7: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/7.jpg)
Key Insight
Bringing semantic information into the optimizer is more important than developing sophisticated analyses that are still semantics-agnostic
Develop semantics-aware optimizations
![Page 8: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/8.jpg)
Outline
• Motivation and Introduction• LeakChaser: a semantics-aware memory leak
detector [PLDI 2011]• CoCo: a sound and adaptive system for online
replacement of data structures [Ongoing]
![Page 9: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/9.jpg)
Java Memory Leaks• Objects are reachable, but not used– E.g., cached in a big HashMap, but never removed
• Existing memory leak detection techniques– Unaware of program semantics: track arbitrary
objects– No focus: profiling the whole execution—real causes
buried in a sea of likely problems• Developer insight is necessary in leak detection—
a three-tier approach to exploit such insight Tier L Tier M Tier H Manual
LeakChaser
Manual
LeakChaser
Manual
LeakChaser
![Page 10: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/10.jpg)
Exploiting Developer Insight
10
• Let programmers write specifications• Lifetime invariants often exist in large-scale
apps
– Lifetimes for certain objects are strongly correlated
Would like to have a new assertion framework
Screen s = new Screen(…);Configuration c = new Configuration();/*s and c should always be created together and eventually die together */
![Page 11: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/11.jpg)
Tier L: Low-Level Liveness Assertions
• An assertion framework to specify lifetime relationships– assertDiesBefore(c, s) // s: Screen, c:
Configuration– assertDiesBeforeAlloc(s, s)
• Can be used to assert arbitrary objects that have high-level semantic relationships
11
![Page 12: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/12.jpg)
High-Level Events: Transactions• Frequently-executed code regions – Likely to contain memory leaks– Inspired by EJB transactions
• Allow programmers to specify transactions
12
ResultSet runQuery(String query){ Connection c = getConnection(…); Statement s = c.createStmt(); ResultSet r = s.executeQuery(query); return r;}
runQuery runQuery runQueryrunQuery
…… …Heap
![Page 13: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/13.jpg)
Transaction Specification
• Transaction– A user-specified spatial boundary– A transaction identifier object that is correlated with
the livenss of this region: temporal boundary13
ResultSet runQuery(String query){ transaction { Connection c = getConnection(…); Statement s = c.createStmt(); ResultSet r = s.executeQuery(query); } return r;}
(query)
![Page 14: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/14.jpg)
Tier M: Checking Transaction Properties
• Semanticsfor each object o created in this transaction do assertDiesBefore (o, query)
• Share regionfor each object o created in this transaction if o is not created in share do assertDiesBefore (o, query)
14
ResultSet runQuery(String query){ transaction (query) { Connection c = getConnection(…); share{ Statement s = c.createStmt(); globalMap.cache(s); } ResultSet r = s.executeQuery(query); } return r;}
Programmers do not need to understand implementation details to use low-level assertions
![Page 15: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/15.jpg)
Tier H: Inferring Transaction Properties• Minimum requirement
for user involvement– Specify a transaction– Tell LeakChaser to run in
the inference mode• Semantics
for each o created in this transactionif assertDiesBefore
(o, query) = false startTrackStaleness(o) if(o.staleness >= S) reportLeak(); 15
ResultSet runQuery(String query){ transaction (query ){ Connection c = getConnection(…); Statement s = c.createStmt(); globalMap.cache(s); ResultSet r = s.executeQuery(query); } return r;}
, INFER
Programmers let LeakChaser do most of the work
![Page 16: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/16.jpg)
Three-Tier Approach
• Tier H, Tier M, and Tier L– Decreasing levels of abstraction– More knowledge required for diagnosis– Increased precision
• LeakChaser: an iterative diagnosis process– Start Tier H with little knowledge– Gradually explore leaky behaviors to locate the
root cause
16
![Page 17: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/17.jpg)
Case Studies
• Six case studies on real-world applications– Eclipse diff (bug #115789)– SPECJbb 2000: LeakChaser found a memory problem never
reported before– Eclipse editor (bug #139465): quickly concluded that this was not
a bug– Eclipse WTP (bug #155898): found the cause for this bug that was
reported three years ago and is still open– MySQL leak– Mckoi leak: first time found that a leaking thread is the root
cause• The ability of diagnosing problems for a large system at its
client17
![Page 18: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/18.jpg)
Implementation
• Jikes RVM– Works for both baseline and optimizing compilers– Works for all non-generational tracing GCs
• Overhead on FastAdaptiveImmix (average)– Infrastructure: 10% on time, less than 10% on
space – Transactions: 2.3X slowdown for 1088377
transactions• LeakChaser is available for download– http://jikesrvm.org/Research+Archive
18
![Page 19: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/19.jpg)
Summary
• Developer insight is given in the form of specifications
• Any other ways to express developer insight?
![Page 20: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/20.jpg)
Outline
• Motivation and Introduction• LeakChaser: a semantics-aware memory leak
detector [PLDI 2011]• CoCo: a sound and adaptive system for online
replacement of data structures [Ongoing]
![Page 21: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/21.jpg)
Container Inefficiencies
• Inappropriate choice of container is an important source of bloat
• ExamplesUse HashSet to store very few elements
ArraySet or SingletonSet
Call many get(i) on a LinkedList
ArrayList
…
![Page 22: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/22.jpg)
Optimizing Containers
• Container semantics required– Different design and implementation rationales
• Chameleon – an offline approach [PLDI 2009] – Profile container usage – Report problematic container choices– Make recommendations
• Make it online? – remove burden from developers completely– Appears to be an impossible task – Soundness – how to provide consistency guarantee– Performance – how to reduce switch overhead
![Page 23: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/23.jpg)
Nothing Is Impossible
• The CoCo approach – Users specify replacement rules, e.g.,
LinkedList ArrayList if #get(i) > X– CoCo switches implementations at run time
• CoCo is an application-level approach that performs optimizations via pure Java code– Manually modified container code– Automatically generated glue code
![Page 24: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/24.jpg)
CoCo System Overview
1. Manually modify container classes to make them CoCo-optimizable
2. Use CoCo static compiler to generate glue code and compile it with the optimizable container classes
3. Run the program with our modified JikesRVM
![Page 25: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/25.jpg)
The CoCo Methodology
• CoCo works only for same-interface optimizations– LinkedList can be replaced only with another List
(that implement java.util.List)• For each allocation of container type c, create
a group of other containers {c1, c2, c3}LinkedList l = new LinkedList();
LinkedList
ArrayList HashArrayList
XYZListActive
Inactive
Container Combo
![Page 26: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/26.jpg)
Soundness• All operations are
performed only on the active container
• When an object is added into the active container, its abstraction is added into inactive containers
LinkedList
ArrayList
HashArrayList
XYZList
Active
Inactive
Comboadd(o)
get()
addAbstraction(α)
addAbstraction(α)
addAbstraction(α)
Inactive
Inactive
![Page 27: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/27.jpg)
Soundness• Once a switch rule
evaluates to true, an inactive container becomes active
• If an abstraction is located by a retrieval, it is concretized to provide safety guarantee
LinkedList
ArrayList
HashArrayList
XYZList
active
Inactive
Combo
Inactive
Inactive
o
α
α
α
if #get(i) > X LinkedList ArrayList
get()α
concretizeo
![Page 28: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/28.jpg)
Optimizable Container Classesclass LinkedList implements List { void add (Object o) { //actual stuff }
Object get (int index) {//actual stuff }
}
class ArrayList implements List { void add (Object o) { //actual stuff } Object get (int index) { //actual stuff }
}
class ListCombo implements List { void add(Object o) {
} Object get(int index) {
}}
}
$CoCo
$CoCo
$CoCo
$CoCo
void add (Object o) {combo.add(o);}
ListCombo combo = … ;
Object get (int index) { return combo.get(index);}
void add (Object o) {combo.add(o);}
ListCombo combo = … ;
Object get (int index) { return combo.get(index);}
List active = …;List[] inactiveList = …;
active.add$CoCo(o);
Object o = active.get$CoCo(index);
Manually modified Automatically generated
for each l in inactiveList { l.addAbstract(α); }
If (o instanceof Abstraction) { o = active.concretize(o);} return o;
![Page 29: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/29.jpg)
Optimizable Container ClassesLinkedList l = new LinkedList();
create combo
create inactive lists
l.add(o);
call
forward
dispatch
![Page 30: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/30.jpg)
Perform Online Container Switch
• Change field active to the appropriate container
• The client still interfaces with the original container
class ListCombo implements List { void add(Object o) {
} Object get(int index) { }
}
active.add$CoCo(o); …
Object o = active.get$CoCo(index); …
void profileAndReplace(int oprType) { switch(oprType) { //profiling case ADD: ADD_OPR++; break; case GET: GET_OPR++; break; } //rules if (GET_OPR > X && active instanceof LinkedList) swap(active, inactive[i]); // inactive[i] is ArrayList}
profileAndReplace(ADD);
profileAndReplace(GET);
![Page 31: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/31.jpg)
Abstraction
• An abstraction is a placeholder for a set of concrete elements– Container specific– Its granularity influences performance
• For List, an abstraction contains– Host container ID– Indices of a range of elements it represents
![Page 32: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/32.jpg)
Concretization
• Bring back all elements represented• Containers to be optimized– Better have significant algorithmic advantages in certain
execution scenarios– Modify 4 containers from Java collection framework and
implemented 3 from scratch– List – ArrayList, LinkedList, and HashArrayList
Map – HashMap and ArrayMap Set – HashSet and ArraySet
– The same set of switch rules as used in Chameleon
![Page 33: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/33.jpg)
Implementation
• Modify Jikes RVM to provide run-time support– Replace each new java.util.X (…) with new
coco.util.X (…) in both baseline and optimizing compilers
• Optimizations– Dropping combo– Sampling– Lazy creation of inactive containers– Aggressively inline CoCo-related methods
![Page 34: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/34.jpg)
Evaluation
• Micro-benchmarks– 75X faster for one benchmark after switching from
ArrayList to HashArrayList• DaCapo– 14 large-scale, real-world programs– 1/50 sampling rate : profileAndReplace is invoked
once per 50 calls to add/get– 8.4% speedup on average
![Page 35: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/35.jpg)
Conclusions
• The first semantics-aware bloat removal technique• Developer insight is encoded by – Replacement rules – Abstraction and concretization functions
• Impact on future dynamic optimization research– Develop more semantics-aware optimization techniques– Any other ways to provide compilers with developer
insight?
![Page 36: Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012](https://reader036.vdocuments.net/reader036/viewer/2022062518/56649ea95503460f94badc54/html5/thumbnails/36.jpg)
Thanks You
Q/A