Boosting Verification by Automatic Tuning of Decision Procedures

Domagoj Babić, joint work with Frank Hutter, Holger H. Hoos, and Alan J. Hu (University of British Columbia)

TRANSCRIPT

Page 1: Boosting Verification by Automatic Tuning of Decision Procedures

Domagoj Babić
joint work with Frank Hutter, Holger H. Hoos, and Alan J. Hu
University of British Columbia

Page 2: Decision procedures

• Core technology for formal reasoning

• Trend towards completely automated verification
  – Scalability is problematic
  – Better (more scalable) decision procedures are needed
  – Possible direction: application-specific tuning

[Diagram: formula → decision procedure → SAT (solution) / UNSAT]
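In other words, a decision procedure maps a formula to a satisfying assignment (SAT) or to UNSAT. Below is a minimal sketch of that interface with a toy DPLL backtracking search behind it; this is illustrative only, not Spear's implementation:

```python
from typing import Dict, List, Optional

Clause = List[int]   # DIMACS-style literals: 3 means x3, -3 means ¬x3
Formula = List[Clause]

def decide(formula: Formula, assignment=None) -> Optional[Dict[int, bool]]:
    """A tiny DPLL decision procedure: a satisfying assignment if SAT, None if UNSAT.
    Real solvers add learning, decision heuristics, restarts, etc."""
    if assignment is None:
        assignment = {}
    # Simplify: drop satisfied clauses, strip falsified literals from the rest.
    simplified = []
    for clause in formula:
        if any((lit > 0) == assignment.get(abs(lit))
               for lit in clause if abs(lit) in assignment):
            continue                      # clause already satisfied
        rest = [lit for lit in clause if abs(lit) not in assignment]
        if not rest:
            return None                   # empty clause: conflict
        simplified.append(rest)
    if not simplified:
        return assignment                 # all clauses satisfied: SAT
    var = abs(simplified[0][0])           # naive decision: first unassigned variable
    for value in (True, False):
        result = decide(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None

# Example: (x1 ∨ x2) ∧ (¬x1 ∨ x3) ∧ (¬x2 ∨ ¬x3)
print(decide([[1, 2], [-1, 3], [-2, -3]]))
```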

Page 3: Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 4: Performance of Decision Procedures

• Heuristics

• Learning (avoiding repeated redundant work)

• Algorithms

Page 5: Heuristics and search parameters

• The brain of every decision procedure
  – Determine performance

• Numerous heuristics:
  – Learning, clause database cleanup, variable/phase decision, ...

• Numerous parameters:
  – Restart period, variable decay, priority increment, ...

• Significantly influence the performance

• Parameters/heuristics perform differently on different benchmarks

Page 6: Spear bit-vector decision procedure parameter space

• Spear 1.9:
  – 4 heuristics × 22 optimization functions
  – 2 heuristics × 3 optimization functions
  – 12 double parameters
  – 4 unsigned parameters
  – 4 bool parameters
  (26 parameters in total)

• Large number of combinations:
  – After limiting the ranges of the double & unsigned parameters and discretizing the double parameters: 3.78 × 10^18 combinations
  – After exploiting dependencies: 8.34 × 10^17 combinations
  – Finding a good combination is hard! (see the back-of-the-envelope sketch below)
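The count itself is just a product of domain sizes, so the order of magnitude is easy to reproduce. A back-of-the-envelope sketch; the per-parameter domain sizes are assumptions, since the slide gives only the totals:

```python
from math import prod

# Hypothetical per-parameter domain sizes; Spear's actual discretization is
# not on the slide, so these are placeholders chosen only to land in the
# same order of magnitude as the quoted 3.78 × 10^18.
domain_sizes = (
    [22] * 4 +  # 4 heuristic slots with 22 optimization functions each
    [3] * 2 +   # 2 heuristic slots with 3 optimization functions each
    [5] * 12 +  # 12 double parameters, discretized to 5 values (assumed)
    [5] * 4 +   # 4 unsigned parameters, limited to 5 values (assumed)
    [2] * 4     # 4 boolean parameters
)
assert len(domain_sizes) == 26

total = prod(domain_sizes)
print(f"{total:.2e} raw combinations")  # ~5.1e18 with these assumed sizes
# Exploiting dependencies (parameters that only matter when a parent
# heuristic is selected) shrinks the space further, to 8.34e17 on the slide.
```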

Page 7: Goal

• Find a good combination of parameters (and heuristics):
  – Optimize for different problem sets (minimizing the average runtime)

• Avoid time-consuming manual optimization

• Learn from the found parameter sets
  – Apply that knowledge to the design of decision procedures

Page 8: Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 9: Manual optimization

• The standard way of finding parameter sets

• Developers pick a small set of easy benchmarks (hard benchmarks = slow development cycle)
  – Hard to achieve robustness
  – Easy to over-fit (to small and specific benchmarks)

• Spear manual tuning:
  – Approximately one week of tedious work

Page 10: When to give up manual optimization?

• Depends mainly on the sensitivity of the decision procedure to parameter modifications

• Decision procedures for NP-hard problems are extremely sensitive to parameter modifications:
  – Changes of 1–2 orders of magnitude in performance are usual
  – Sometimes up to 4 orders of magnitude

Page 11: Sensitivity Example

• Example: same instance, same parameters, same machine, same solver
  – Spear compiled with 80-bit floating-point precision: 0.34 s
  – Spear compiled with 64-bit floating-point precision: times out after 6000 s
  – First ~55,000 decisions equal, one mismatch, next ~100 equal, then complete divergence (illustrated below)

• Manual optimization for NP-hard problems is ineffective.
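The mechanism is generic: iterated floating-point updates accumulate different rounding errors at different precisions, so two runs agree for a long stretch and then take a different branch. A toy reproduction of the effect, not Spear's code; it assumes numpy's longdouble is the 80-bit x87 type, which holds on most x86 builds but not on all platforms:

```python
import numpy as np

def trajectory(dtype, steps=500):
    """Iterate a MiniSat-style activity update (decay, then bump) at a
    given floating-point precision."""
    x = dtype(0)
    decay, bump = dtype(0.999), dtype(1)
    vals = []
    for _ in range(steps):
        x = x * decay + bump        # each step rounds differently per precision
        vals.append(np.float64(x))  # round to a common width for comparison
    return vals

t64 = trajectory(np.float64)
t80 = trajectory(np.longdouble)  # 80-bit extended on most x86 builds (assumption)

# First step at which the precisions disagree even after rounding both back
# to 64 bits (typically within a few dozen steps). In a solver, one flipped
# comparison like this sends the search down a completely different branch.
first = next((i for i, (a, b) in enumerate(zip(t64, t80)) if a != b), None)
print("first differing step:", first)
```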

Page 12: Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 13: Automatic tuning

• Loop until happy (with the found parameters), as sketched below:
  – Perturb the existing set of parameters
  – Perform hill-climbing:
    • Modify one parameter at a time
    • Keep the modification if it improves performance
    • Stop when a local optimum is found
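This is iterated local search over the configuration space. A minimal sketch under assumptions: `domains` maps each parameter name to its list of allowed values, and `evaluate` is a hypothetical callback that runs the solver on the training set and returns the average runtime.

```python
import random

def one_exchange_neighbors(config, domains):
    """All configurations differing from `config` in exactly one parameter."""
    for name, values in domains.items():
        for v in values:
            if v != config[name]:
                yield {**config, name: v}

def hill_climb(config, domains, evaluate):
    """First-improvement descent: modify one parameter at a time, keep the
    change if it helps, stop at a local optimum."""
    cost = evaluate(config)
    improved = True
    while improved:
        improved = False
        for cand in one_exchange_neighbors(config, domains):
            c = evaluate(cand)
            if c < cost:
                config, cost, improved = cand, c, True
                break
    return config, cost

def tune(init, domains, evaluate, rounds=20, kick=3):
    """Loop until happy: perturb the incumbent, hill-climb, keep the better."""
    best, best_cost = hill_climb(init, domains, evaluate)
    for _ in range(rounds):
        pert = dict(best)
        for name in random.sample(sorted(domains), min(kick, len(domains))):
            pert[name] = random.choice(domains[name])  # the perturbation step
        cand, cost = hill_climb(pert, domains, evaluate)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```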

Page 14: Implementation: FocusedILS [Hutter, Hoos, Stützle, ’07]

• Used for Spear tuning

• Adaptively chooses training instances (see the sketch below):
  – Quickly discards poor parameter settings
  – Evaluates better ones more thoroughly

• Any scalar metric can be optimized:
  – Runtime, precision, number of false positives, ...

• Can optimize the median, average, ...
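The adaptive-evaluation idea can be sketched as a comparison that spends runs on a challenger only while it keeps up with the incumbent. This is a simplified paraphrase of the mechanism, not the published FocusedILS algorithm; `run_solver` is a hypothetical callback returning one runtime.

```python
def challenger_survives(incumbent_runs, challenger, instances, run_solver):
    """Spend runs on a challenger only while it keeps up on the shared
    prefix of training instances; poor settings are discarded after a
    handful of runs, good ones get evaluated more thoroughly."""
    challenger_runs = []
    for inst in instances[:len(incumbent_runs)]:
        challenger_runs.append(run_solver(challenger, inst))
        k = len(challenger_runs)
        if sum(challenger_runs) > sum(incumbent_runs[:k]):
            return False, challenger_runs   # rejected cheaply
    return True, challenger_runs            # matched the incumbent's evidence
```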

Page 15: Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 16: Experimental Setup - Benchmarks

• 2 experiments:
  – General-purpose tuning (Spear v0.9)
    • Industrial instances from previous SAT competitions
  – Application-specific tuning (Spear v1.8)
    • Bounded model checking (BMC) instances
    • Calysto software checking instances

• Machines:
  – Cluster of 55 dual 3.2 GHz Intel Xeon PCs with 2 GB RAM

• Benchmark sets divided:
  – Training & test, disjoint
  – Test timeout: 10 hrs

Page 17: Tuning 1: General-purpose optimization

• Training:
  – Timeout: 10 sec
  – Risky, but no experimental evidence of over-fitting
  – 3 days of computation on the cluster

• Very heterogeneous training set:
  – Industrial instances from previous competitions

• 21% geometric-mean speedup on the industrial test set over the manual settings (see below)

• ~3X on bounded model checking
• ~78X on Calysto software checking
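For reference, the geometric-mean speedup aggregates per-instance ratios multiplicatively, so no single slow instance dominates the way it would in an arithmetic mean. A small sketch with made-up runtimes; the actual per-instance data is not on the slide:

```python
from math import exp, log

def geometric_mean_speedup(baseline, tuned):
    """Geometric mean of the per-instance speedup ratios baseline/tuned."""
    ratios = [b / t for b, t in zip(baseline, tuned)]
    return exp(sum(log(r) for r in ratios) / len(ratios))

# Hypothetical runtimes in seconds, for illustration only.
manual = [1.2, 30.0, 4.5, 250.0]
auto   = [1.0, 22.0, 4.1, 200.0]
# Prints roughly "22%" for these made-up numbers.
print(f"{geometric_mean_speedup(manual, auto) - 1:.0%} geometric-mean speedup")
```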

Page 18: Tuning 1: Bounded model checking instances

Page 19: Tuning 1: Calysto instances

Page 20: Tuning 2: Application-specific optimization

• Training:
  – Timeout: 300 sec
  – Bounded model checking optimization: 2 days on the cluster
  – Calysto instances: 3 days on the cluster

• Homogeneous training set

• Speedups over the SAT-competition settings:
  – ~2X on BMC
  – ~20X on SWV

• Speedups over the manual settings:
  – ~4.5X on BMC
  – ~500X on SWV

Page 21: Tuning 2: Bounded model checking instances

[Runtime plot; annotation: ~4.5X speedup]

Page 22: Tuning 2: Calysto instances

[Runtime plot; annotation: ~500X speedup]

Page 23: Overall Results

Solver                          | BMC #solved | BMC avg. runtime (solved) [s] | SWV #solved | SWV avg. runtime (solved) [s]
--------------------------------|-------------|-------------------------------|-------------|------------------------------
Minisat                         | 289/377     | 360.9                         | 302/302     | 161.3
Spear manual                    | 287/377     | 340.8                         | 298/302     | 787.1
Spear SAT comp.                 | 287/377     | 223.4                         | 302/302     | 35.9
Spear auto-tuned, app-specific  | 291/377     | 113.7                         | 302/302     | 1.5


Page 28: Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 29: Software verification parameters

– Greedy activity-based heuristic
  • Probably helps focus on the most frequently used sub-expressions

– Aggressive restarts
  • Probably the standard heuristics and the initial ordering do not work well for SWV problems

– Phase selection: always false
  • Probably related to the checked property (NULL-pointer dereference)

– No randomness
  • Spear & Calysto are highly optimized

Page 30: Bounded model checking parameters

– Less aggressive activity heuristic

– Infrequent restarts
  • Probably the initial ordering (as encoded) works well

– Phase selection: fewer watched clauses
  • Minimizes the amount of work

– A small amount of randomness helps
  • 5% random variable and phase decisions

– Simulated annealing works well (schedule sketched below)
  • Decrease randomness by 30% after each restart
  • Focuses the solver on hard chunks of the design
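The annealing-style schedule above is simple to state: start with a 5% chance of a random decision and multiply that probability by 0.7 at every restart. A minimal sketch of such a decision loop; `decide` and the restart loop are hypothetical stand-ins for solver internals:

```python
import random

def decide(variables, activity, rand_freq):
    """Branching decision: random with probability rand_freq, otherwise the
    highest-activity variable (stand-in for the solver's decision heuristic)."""
    if random.random() < rand_freq:
        return random.choice(variables)
    return max(variables, key=lambda v: activity[v])

rand_freq = 0.05           # 5% random variable/phase decisions initially
for restart in range(10):
    # ... search until the restart limit, calling decide(...) at each branch ...
    rand_freq *= 0.70      # decrease randomness by 30% after each restart
```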

Page 31: Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 32: Future Work

• Per-instance tuning (machine-learning-based techniques)

• Analysis of the relative importance of parameters
  – Simplify the solver

• Tons of data, little analysis done... Correlations between parameters and runtime statistics could reveal important dependencies.

Page 33: Take-away messages

• Automatic tuning is effective
  – Especially application-specific tuning

• Avoids time-consuming manual tuning

• Sensitivity to parameter modifications
  – Few benchmarks = inconclusive results?