PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Efficient Checkpointing of Java Software using Context-Sensitive Capture and Replay
Guoqing Xu, Atanas Rountev, Yan Tang, Feng Qin
Ohio State University
ESEC/FSE 07
Outline
Motivation
- Challenges for checkpointing/replaying Java software
- Summary of our approach
Contributions
- Static analyses
- Multiple execution regions
- Experimental evaluation
Conclusions
Motivation
Checkpointing/replaying has been used for a variety of purposes at system level
- Originally designed to support fault tolerance
- Debugging of OS and of parallel and distributed software
Checkpointing can benefit a number of software engineering tasks
- Reduce the cost of manual debugging and testing
- Support for automated techniques for debugging and testing: e.g., dynamic slicing and delta-debugging
- Inspired by both system-level checkpointing [Pan-PDD88, Dunlap-OSDI02, King-USENIX05] and “saving-and-restoring” software engineering techniques [Saff-ASE05, Orso-WODA05, Orso-WODA06, Elbaum-FSE06]
Challenges
Ease of use and deployment
- Application-level checkpointing: no JVM/runtime support, just code analysis and instrumentation
- Challenge: no direct access to the call stack; no control over thread scheduling or external resources (files, etc.)
Reduce the size of the recorded state
- Dumping the entire heap may be prohibitively expensive, especially for large programs
- Challenge: static analyses to prune redundant state
Static and dynamic overhead
- Static analysis cost is amortized over multiple runs
- Approach is intended for long-running applications
Summary of Our Approach
Tool input: program + checkpoint definition
Performs static analyses and code instrumentation
Tool output: two program versions
First, an augmented checkpointing version is executed once to record (parts of) the run-time program states
- At the checkpoint: heap objects, static fields, locals
- At certain points along the call chain leading to the checkpoint
Next, a pruned replaying version is executed multiple times
- Restore variables saved at the checkpoint
- Restore variables saved at points along the call chain
How do we resume execution from the checkpoint?
- Step 1: control flow quickly reaches the checkpoint
- Step 2: recover state at checkpoint
- Step 3: incrementally recover state after call sites along the call chain leading to the checkpoint
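The record-once, restore-many idea can be sketched with standard Java serialization. This is only an illustration under stated assumptions: the class StateSnapshot and its save/restore methods are hypothetical names, and the slides do not say which mechanism the tool actually uses to write and read objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the tool described in the slides.
final class StateSnapshot {
    // Checkpointing run: write the selected variables and everything reachable from them.
    static byte[] save(Map<String, Object> selected) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new HashMap<>(selected));
        } catch (IOException e) {
            throw new IllegalStateException("could not record state", e);
        }
        return bytes.toByteArray();
    }

    // Replaying run: rebuild the saved variables; this can be repeated any number of times.
    @SuppressWarnings("unchecked")
    static Map<String, Object> restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (Map<String, Object>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not restore state", e);
        }
    }

    public static void main(String[] args) {
        List<String> bodyPacks = new ArrayList<>(List.of("jtp", "jop"));
        Map<String, Object> selected = new HashMap<>();
        selected.put("body_packs", bodyPacks);

        byte[] snapshot = save(selected);          // executed once
        System.out.println(restore(snapshot));     // executed on every replay
    }
}
```

Each call to restore yields a fresh copy of the saved object graph, so one replay cannot corrupt the state seen by the next.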
Definitions
Crosscut call chain (CC-chain)
- A programmer-specified call chain that leads to the method that contains the checkpoint
- E.g. main(44) -> run(28)
Decision points
- A call site on the CC-chain (e.g. m.run), due to polymorphism
- A predicate on which a decision point or the checkpoint is control-dependent
At a decision point, the checkpointing version records the control-flow outcome
The replaying version uses this info to force the control flow to reach the checkpoint
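A minimal sketch of how a recorded control-flow outcome can force the replaying run down the same path. The names DecisionLog and DecisionPointDemo are hypothetical, and an in-memory queue stands in for whatever storage the tool uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical log of control-flow outcomes at predicate decision points.
final class DecisionLog {
    private final Deque<Boolean> outcomes = new ArrayDeque<>();

    // Checkpointing version: record the outcome and return it unchanged.
    boolean save(boolean outcome) {
        outcomes.addLast(outcome);
        return outcome;
    }

    // Replaying version: return the recorded outcome instead of recomputing it.
    boolean read() {
        return outcomes.removeFirst();
    }
}

final class DecisionPointDemo {
    // One predicate decision point guarding the path to the checkpoint.
    static String run(DecisionLog log, boolean replaying, String[] args) {
        // During replay the predicate is never evaluated, so args may be null.
        boolean b = replaying ? log.read() : log.save(args.length != 0);
        return b ? "reached checkpoint" : "skipped checkpoint";
    }

    public static void main(String[] args) {
        DecisionLog log = new DecisionLog();
        System.out.println(run(log, false, new String[] {"-w"})); // checkpointing run
        System.out.println(run(log, true, null));                 // replaying run
    }
}
```

Because the replaying run reads the outcome rather than computing it, the code that originally produced the predicate's inputs can be pruned.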
Replaying, Step 1: Recover the Call Stack
Predicate decision point: recover boolean value
Call site decision point o.m(a1, …, an)
- Recover the run-time type of the receiver object; instantiated during replaying using sun.misc.Unsafe
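The receiver can be instantiated from its recorded type name without running any constructor. The sketch below uses sun.misc.Unsafe.allocateInstance, obtained reflectively through the theUnsafe field (an OpenJDK/HotSpot detail). ReceiverAllocator and Receiver are hypothetical names, and the slides do not show how the tool itself obtains the Unsafe instance.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical receiver type whose constructor we want to skip during replay.
final class Receiver {
    boolean constructorRan;

    Receiver() {
        constructorRan = true;
    }

    String run() {
        return "dispatched to Receiver.run";
    }
}

// Hypothetical helper, assuming a JDK that still exposes sun.misc.Unsafe.
final class ReceiverAllocator {
    private static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    // Create an object of the recorded run-time type; no constructor or field initializer runs.
    static Object allocate(String recordedTypeName) {
        try {
            return unsafe().allocateInstance(Class.forName(recordedTypeName));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot allocate " + recordedTypeName, e);
        }
    }

    public static void main(String[] args) {
        Receiver m = (Receiver) allocate(Receiver.class.getName());
        System.out.println(m.constructorRan); // constructor was skipped
        System.out.println(m.run());          // virtual dispatch still works
    }
}
```

The allocated object has the right dynamic type for the call on the CC-chain to dispatch correctly, but its fields hold default values until state is restored at the checkpoint.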
Checkpointing Version
(lines marked => are inserted by the instrumentation; DP = decision point)
void run(String[] args) {
    processCmdLine(args);
    loadNecessaryClasses();
    Set wp_packs = getWpacks();
    Set body_packs = getBpacks();
    boolean b = Options.v().whole_jimple();
 => save(b);
    if (b) { // DP
        getPack("cg").apply();
        // --- checkpoint ---
     => save(…);
        getPack("wjtp").apply();
        getPack("wjop").apply();
        getPack("wjap").apply();
    }
    retrieveAllBodies();
    …
}
static void main(String[] args) {
    Main m = new Main();
    boolean b = args.length != 0;
 => save(b);
    if (b) { // DP
     => save(type_of(m));
        m.run(args); // DP
    }
}
Replaying Version
(lines marked => are inserted by the instrumentation; DP = decision point)
void run(String[] args) {
    processCmdLine(args);
    loadNecessaryClasses();
    Set wp_packs = getWpacks();
    Set body_packs = getBpacks();
    boolean b = Options.v().whole_jimple();
 => read(b);
    if (b) { // DP
        getPack("cg").apply();
        // --- checkpoint ---
     => read(…);
        getPack("wjtp").apply();
        getPack("wjop").apply();
        getPack("wjap").apply();
    }
    retrieveAllBodies();
    …
}
static void main(String[] args) {
    Main m = new Main();
    boolean b = args.length != 0;
 => read(b);
    if (b) { // DP
     => read(type_of(m));
     => unsafe.allocate(m);
     => args = null;
        m.run(args); // DP
    }
}
Step 2: Recover at the Checkpoint
Our static analysis selects locals for recording (for checkpointing) / recovering (for replaying) when
- They are written before the checkpoint
- They are read after the checkpoint
Record primitive-typed values or entire object graphs on the heap (all reachable objects)
Static fields are selected based on the same idea
void run(String[] args) {
    processCmdLine(args);
    loadNecessaryClasses();
    Set wp_packs = getWpacks();
    Set body_packs = getBpacks();
    if (Options.v().whole_jimple()) {
        getPack("cg").apply();
        // --- checkpoint ---
        getPack("wjtp").apply();
        getPack("wjop").apply();
        getPack("wjap").apply();
    }
    retrieveAllBodies();
    for (Iterator i = body_packs.iterator(); i.hasNext();) { … }
    …
}
Selected local in this example: body_packs
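The selection rule for locals amounts to a set intersection. In this sketch the two input sets are filled in by hand for the run method shown on the slide; in the tool they would come from the static analysis. LocalSelection is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of the rule "written before AND read after the checkpoint".
final class LocalSelection {
    static Set<String> select(Set<String> writtenBefore, Set<String> readAfter) {
        Set<String> selected = new LinkedHashSet<>(writtenBefore);
        selected.retainAll(readAfter); // keep only locals that are also read later
        return selected;
    }

    public static void main(String[] args) {
        // Hand-derived from the slide's run method: both sets are assigned before
        // the checkpoint, but only body_packs is used after it.
        Set<String> writtenBefore = Set.of("wp_packs", "body_packs");
        Set<String> readAfter = Set.of("body_packs");
        System.out.println(select(writtenBefore, readAfter));
    }
}
```

A local that is written before the checkpoint but never read afterwards, such as wp_packs here, does not need to be saved.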
Selection of Static Fields
A whole program Mod/Use analysis
- A static field is “written” if its value is changed, or any heap object reachable from it is mutated
- A static field is “read” if its value is directly read
Analysis algorithm
- Context-sensitive and flow-insensitive; uses the points-to solution and the call graph from Spark [Lhotak CC-03]
- Bottom-up traversal of the SCC-DAG of the call graph
- For each method m, a set Cm is maintained to contain all objects from which a mutated object can be reached
- Propagate backwards the objects in Cm that escape a callee method to its callers
- Select a static field fld if PointsToSet(fld) ∩ Cm ≠ ∅
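The propagation and the final selection test can be sketched as follows. This is a simplified illustration, not the analysis from the slides: abstract objects are plain strings, the call graph is assumed to be acyclic (SCCs already collapsed), every object in a callee's set is assumed to escape to its callers, and context sensitivity is omitted. All names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class StaticFieldSelection {
    // Bottom-up summary: C(m) = local(m) united with C(callee) for every callee of m.
    // Recursion visits callees first, which is a bottom-up order on an acyclic graph.
    static Set<String> summary(String m,
                               Map<String, List<String>> callees,
                               Map<String, Set<String>> local,
                               Map<String, Set<String>> memo) {
        Set<String> cached = memo.get(m);
        if (cached != null) {
            return cached;
        }
        Set<String> cm = new HashSet<>(local.getOrDefault(m, Set.of()));
        for (String callee : callees.getOrDefault(m, List.of())) {
            cm.addAll(summary(callee, callees, local, memo));
        }
        memo.put(m, cm);
        return cm;
    }

    // Select fld when PointsToSet(fld) and Cm share at least one object.
    static Set<String> select(Map<String, Set<String>> pointsTo, Set<String> cm) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : pointsTo.entrySet()) {
            if (!Collections.disjoint(e.getValue(), cm)) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }
}
```

For example, if run mutates an object reachable from abstract object o1 and main calls run, then o1 ends up in the set for main, and any static field whose points-to set contains o1 is selected.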
Step 3: Recover after the Checkpoint
Replaying only at decision points and the checkpoint is not enough to guarantee correct execution after the checkpoint
Additionally record/recover local variables that will be read after each call site in CC-chain
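(reco/rest = record in the checkpointing version / restore in the replaying version)
void main() {
    Set hs = new HashSet();
    B b = new B(hs);
    //-- reco/rest (type_of(b))
    b.m();
    //-- extra reco/rest (hs)
    if (hs == b.s) { … }
}
class B {
    Set s;
    void m() {
        B r0 = this;
        r0.s = new HashSet();
        //-- checkpoint
        //-- reco/rest (r0)
        r0.s.add("");
    }
}
Without the extra reco/rest after the call site, hs is uninitialized during replay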
Additional Issues
A checkpoint can have multiple run-time instances
If a method in CC-chain has callers that are not in the chain, it has to be replicated
Currently do not support multi-threaded programs
Our technique does not guarantee the correctness of the execution, when the post-checkpoint part of the program
- Depends on external resources, such as files, databases
- Depends on unique-per-execution values, such as clock
- Is modified with new cross-checkpoint dependencies
Multiple execution regions
- Designated by a starting point and an ending point
- Specified by two CC-chains
Study 1: Static Analysis
Program        #IP   #R
jb-6.1           5    3
jlex-1.2.6       8    3
db               5    2
jtar-1.21        4    2
jflex            8    2
violet           9    4
jess             8    3
sablecc         11    4
javacup          9    4
soot-2.2.3      35   10
raytrace        10    3
socksecho       14    3
socksproxy      11    3
compress         6    1
muffin          20    3
Static Analysis: Locals Reduction
[Figure: Total Locals vs. Selected Locals; y-axis from 0 to 1800]
Static Analysis: Static Fields Reduction
[Figure: Total SF vs. Selected SF; y-axis from 0 to 3500]
Static Analysis: Removed/Inserted Statements
[Figure: Stmts Left after Pruning (%) vs. Stmts Inserted (%); y-axis from 0 to 120]
Static Analysis Cost
Phase 1: Soot infrastructure cost
- Between 1.64ms and 30.6ms per thousand Jimple statements
- On average, 11.1ms/1000 statements
Phase 2: Our analysis cost
- Between 1.67ms and 26.6ms per thousand Jimple statements
- On average, 9.4ms/1000 statements
This should be amortized across multiple runs of the replaying version
Study 2: Run-Time Performance (compress)
Original program: compressing and decompressing 5 big tar files several times
Evaluated for five checkpoint definitions
- One checkpoint, close to the beginning of the program
- Two regions of compression and decompression
- A region containing the process of compression
- A region containing the process of decompression
- One checkpoint, close to the end of the program
compress Performance
[Figure: Normalized running time, checkpointing version vs. replaying version; x-axis 1 to 5]
[Figure: Normalized size of captured program state, Size of Heap vs. Size of Captured Program State; x-axis 1 to 5]
Study 2: Run-Time Performance (soot)
Input: soot-2.2.3 itself containing 2227333 methods
Phases
- Enabling cg.spark, wjtp, wjop.ji, wjap.uft, jtp, jop.cp
Evaluated for six checkpoint definitions
- Before whole-program packs
- After cg
- After wjtp
- After wjop
- After wjap
- After body packs
soot Performance
[Figure: Normalized running time, checkpointing version vs. replaying version; x-axis 1 to 6]
[Figure: Normalized size of captured program state, Size of Heap vs. Size of Captured Program State; x-axis 1 to 6]
Study 2: Run-Time Performance (jflex-1.4.1)
Input: a .flex grammar file corresponding to a DFA containing 21769 states
Evaluated for four checkpoint definitions
- After NFA is generated
- After DFA is generated
- After minimization
- After emission
jflex Performance
[Figure: Normalized running time, checkpointing version vs. replaying version; x-axis 1 to 4]
[Figure: Normalized size of captured program state, Size of Heap vs. Size of Captured Program State; x-axis 1 to 4]
Summary of Evaluation
Static analysis successfully reduces the size of program state recorded and recovered
It is more meaningful to checkpoint/replay long-running programs
Checkpoints are better taken after a phase of long computation with (relatively) small output state
- √ compress: small program state, short running time
- √ soot: large program state, but very long computation time
- X jflex: large program state, short running time
Conclusions
A static-analysis-based checkpointing/replaying technique
An implementation and an evaluation that shows our technique can be an interesting candidate for testing, debugging, and dynamic slicing of long-running programs
Future work
- Language-level checkpointing/replaying multi-threaded programs
- More precise static analyses could be employed to reduce the size of program state to be captured
- The run-time support for object reading and writing could be improved