DDEC: Data-Driven Equivalence Checking
Rahul Sharma, Eric Schkufza, Berkeley Churchill, Alex Aiken
Equivalence checking
- Prove two programs are equivalent
  - Compiler optimizations
  - Validate refactorings
  - Cross-checking different implementations
- Old and well-studied problem
  - Undecidable in general
- Major challenge: prove equivalence of loops
  - Straight-line programs are relatively easy
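Motivating applications
- Prove equivalence of two binaries compiled from the same source (…while…)
  - Trustworthy compiler: CompCert, gcc -O0
  - Optimizing compiler: gcc -O3, icc -O3
- Goal: the confidence of the trustworthy compiler with the performance of the optimizing compiler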
Stochastic superoptimization
- Input: straight-line code from a trustworthy compiler (CompCert, gcc -O0)
- STOKE (ASPLOS 13) applies random mutations
- Here: code with loops (…while…)
Previous work
- Do not support “while” loops: [CHR00], [FH02], [FH05], [AEF+05], [SBC+05], [MSF06]
- Do not reason about termination: [SDE+08], [GS09], [RE11], [LHM+12], [PY13], [LMS+13]
- Translation validation: [Nec00], [GZB05], …
  - Needs information from the compiler
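Simulation relation
- Decompose the proof at matching cut points: (a, a’), (b, b’), (c, c’)

Target
    movq 8(rsp), rdi
    # rdi != 0
    movq 8(rsp), rdi
    decq rdi
    movq rdi, 8(rsp)
    retq

Rewrite
    movq 8(rsp), r9
    # r9 != 0
    decq r9
    retq

Relation at each pair of cut points
- a, a’: states equal
- b, b’: 8(rsp) = rdi = r9’
- c, c’: live outs equal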
Inference
- Given a simulation relation, proofs for loops reduce to proofs for loop-free fragments
  - Use decision procedures
- Main challenge: infer a simulation relation
  - Infer synchronization points
  - Infer invariants
- We use compilers as black boxes
- Mine relations from concrete executions
Runtime information
- Run some tests to get data
  - From executions, unit tests, random tests, etc.
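Runtime information
- Ensure the loops run for an equal number of iterations
  - Use data to align Target and Rewrite
- Example: the Target body B runs 2n times and the Rewrite body B’ runs n times; treating B;B as one Target iteration makes both loops run n times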
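The alignment step can be pictured as looking for a constant ratio between the iteration counts of the two loops on the same tests. The following is a minimal sketch under that assumption; the function name `unroll_factor` and the list-of-counts input format are hypothetical and not taken from DDEC.

```python
from fractions import Fraction

def unroll_factor(target_counts, rewrite_counts):
    """Return (p, q) if p Target iterations match q Rewrite iterations
    on every test, or None if the observed counts show no fixed ratio."""
    ratios = set()
    for t, r in zip(target_counts, rewrite_counts):
        if t == 0 and r == 0:
            continue              # neither loop ran on this test
        if t == 0 or r == 0:
            return None           # only one loop ran, so no alignment
        ratios.add(Fraction(t, r))
    if len(ratios) != 1:
        return None
    ratio = ratios.pop()
    return ratio.numerator, ratio.denominator

# Target body B runs 2n times, Rewrite body B' runs n times, for n = 1, 3, 5.
print(unroll_factor([2, 6, 10], [1, 3, 5]))  # (2, 1): compare B;B against B'
```

A result of `(2, 1)` corresponds to the B;B case on the slide; a result of `None` means the tests give no evidence for a fixed alignment.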
Runtime information
- Attempt to detect synchronization points
  - Number of times program points are executed
  - Values align

Target
    movq 8(rsp), rdi
    # rdi != 0
    movq 8(rsp), rdi
    decq rdi
    movq rdi, 8(rsp)
    retq

Rewrite
    movq 8(rsp), r9
    # r9 != 0
    decq r9
    retq

[Figure: program points of Target and Rewrite annotated with their execution counts on a test (1, n, or n+1).]
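Matching execution counts can be illustrated with a small sketch: program points whose counts agree on every test become candidate synchronization points. The labels, the dictionary format, and the function name `candidate_sync_points` below are assumptions for illustration only.

```python
def candidate_sync_points(target_counts, rewrite_counts):
    """Pair program points whose execution counts agree on every test.

    Each argument maps a program-point label to a list of counts,
    one entry per test.
    """
    pairs = []
    for t_label, t_vec in target_counts.items():
        for r_label, r_vec in rewrite_counts.items():
            if t_vec == r_vec:
                pairs.append((t_label, r_label))
    return pairs

# Hypothetical counts for the decrement loop run with inputs n = 2 and n = 5:
# the loop head executes n+1 times and the loop body n times.
target = {"entry": [1, 1], "loop_head": [3, 6], "loop_body": [2, 5]}
rewrite = {"entry'": [1, 1], "loop_head'": [3, 6], "loop_body'": [2, 5]}
print(candidate_sync_points(target, rewrite))
```

Counts alone only propose candidates; the slide also lists aligned values as a second signal.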
Invariants
- Invariants are restricted to equalities
- Infer invariants from observed data values

8(rsp)  rdi
2       2
1       1
0       0

Target
    movq 8(rsp), rdi
    # rdi != 0
    movq 8(rsp), rdi
    decq rdi
    movq rdi, 8(rsp)
    retq
Invariants
- Invariants are restricted to equalities
- Infer invariants from observed data values

8(rsp)  rdi  r9’
2       2    2
1       1    1
0       0    0

Rewrite
    movq 8(rsp), r9
    # r9 != 0
    decq r9
    retq
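To see where such a table can come from, the sketch below models the two loops in Python and records the values of 8(rsp), rdi, and r9’ each time the loop guard is reached. The Python models are hand-written assumptions that ignore 64-bit wraparound; they are not produced by any tool from the talk.

```python
def run_target(mem):
    """Model of Target: record (8(rsp), rdi) at every visit to the loop guard."""
    rows = []
    rdi = mem                  # movq 8(rsp), rdi
    while True:
        rows.append((mem, rdi))
        if rdi == 0:           # guard: rdi != 0
            break
        rdi = mem              # movq 8(rsp), rdi
        rdi -= 1               # decq rdi
        mem = rdi              # movq rdi, 8(rsp)
    return rows

def run_rewrite(mem):
    """Model of Rewrite: record r9 at every visit to the loop guard."""
    rows = []
    r9 = mem                   # movq 8(rsp), r9
    while True:
        rows.append(r9)
        if r9 == 0:            # guard: r9 != 0
            break
        r9 -= 1                # decq r9
    return rows

def collect(mem):
    """Join the two traces row by row: (8(rsp), rdi, r9')."""
    return [(m, rdi, r9)
            for (m, rdi), r9 in zip(run_target(mem), run_rewrite(mem))]

print(collect(2))  # [(2, 2, 2), (1, 1, 1), (0, 0, 0)]
```

With the input value 2 stored at 8(rsp), the recorded rows are exactly the three rows of the table.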
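Linear algebra
- Mine all equalities
  - Find all $w$ such that $Aw = 0$
  - Nullspace or kernel of $A$
- Examples of equality invariants:
  $I \equiv 8(\mathit{rsp}) = \mathit{rdi} \land \mathit{rdi} = \mathit{r9}'$
  $I' \equiv 4\,\mathit{eax} = \mathit{edx}' + 3 \land 10\,\mathit{eax} + \mathit{edx} = \mathit{ecx}'$
- Data matrix, one row per observation and one column per variable (8(rsp), rdi, r9’):
  $A \equiv \begin{pmatrix} 2 & 2 & 2 \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}$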
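The nullspace computation can be tried on the small data matrix with plain rational Gaussian elimination. This is a sketch using only the Python standard library, not the integer matrix library named under Implementation; the function name `nullspace` is an assumption. An equality with a constant term, such as the one in I’, would additionally need an all-ones column in the matrix, which is not shown here.

```python
from fractions import Fraction

def nullspace(rows):
    """Basis of {w : A w = 0} over the rationals via Gauss-Jordan elimination."""
    a = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(a[0])
    pivots = []                       # pivot column of each reduced row
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue                  # column c is free
        a[r], a[p] = a[p], a[r]
        piv = a[r][c]
        a[r] = [x / piv for x in a[r]]
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        w = [Fraction(0)] * n_cols
        w[free] = Fraction(1)
        for row, pc in zip(a, pivots):
            w[pc] = -row[free]        # solve each pivot variable from its row
        basis.append(w)
    return basis

# Columns: 8(rsp), rdi, r9'
A = [[2, 2, 2], [1, 1, 1], [0, 0, 0]]
for w in nullspace(A):
    print([int(x) for x in w])
# [-1, 1, 0]   i.e. rdi = 8(rsp)
# [-1, 0, 1]   i.e. r9' = 8(rsp)
```

The two basis vectors together state the same equalities as the invariant I.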
Check simulation relation
- The executions are synchronized
- The invariants are maintained
- Queries are in quantifier-free bitvector arithmetic
  - SMT solvers are complete for these queries
  - Incorporate counterexamples in relations
- Sound but not complete
  - If checking succeeds, the programs are equivalent
  - Can fail to infer a sound simulation relation

Target
    movq 8(rsp), rdi
    # rdi != 0
    movq 8(rsp), rdi
    decq rdi
    movq rdi, 8(rsp)
    retq

Rewrite
    movq 8(rsp), r9
    # r9 != 0
    decq r9
    retq

Relation checked at each pair of cut points
- a, a’: states equal
- b, b’: 8(rsp) = rdi ∧ rdi = r9’
- c, c’: live outs equal
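The loop-step check can be imitated on a toy scale. In the sketch below, exhaustive enumeration over 4-bit values stands in for the quantifier-free bitvector query that the talk discharges with an SMT solver; the bit width, the Python models of the loop bodies, and all function names are assumptions made for illustration.

```python
from itertools import product

BITS = 4                          # tiny width so enumeration is cheap
MASK = (1 << BITS) - 1

def invariant(mem, rdi, r9):
    return mem == rdi and rdi == r9

def target_body(mem, rdi):
    rdi = mem                     # movq 8(rsp), rdi
    rdi = (rdi - 1) & MASK        # decq rdi (wraps like a bitvector)
    mem = rdi                     # movq rdi, 8(rsp)
    return mem, rdi

def rewrite_body(r9):
    return (r9 - 1) & MASK        # decq r9

def check_loop_step(inv):
    """Return a counterexample state (mem, rdi, r9), or None if inv passes."""
    for mem, rdi, r9 in product(range(MASK + 1), repeat=3):
        if not inv(mem, rdi, r9):
            continue
        # Synchronized: both loops must take the same branch.
        if (rdi != 0) != (r9 != 0):
            return (mem, rdi, r9)
        if rdi != 0:
            mem2, rdi2 = target_body(mem, rdi)
            r92 = rewrite_body(r9)
            # Maintained: the invariant must hold again after one iteration.
            if not inv(mem2, rdi2, r92):
                return (mem, rdi, r9)
    return None

print(check_loop_step(invariant))                  # None
print(check_loop_step(lambda m, d, r: d == r))     # (0, 1, 1)
```

The second call drops the conjunct 8(rsp) = rdi, and the check returns a state from which one iteration breaks the weaker candidate. A counterexample of this kind is what the slide proposes to incorporate into the relation.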
Limitations
- Insufficient data to infer a sound relation
- Expressiveness of invariants: inequalities, quantifiers, etc.
- Expressiveness of SMT solver: floating point, multiply, divide, etc.
Implementation
- Run tests and generate data: https://github.com/eschkufz/x64asm
- Nullspace computation: libIML (integer matrix library)
- SMT solver: Z3
Case studies
Compute kernel inside OpenSSL
Validating CompCert against gcc
Stochastic optimization for loops
OpenSSL
- Multiplication kernel
- Extensive performance tests
  - Run the kernel ~15 million times
  - Choose 16 random tests for inference
- Compile with gcc -O0 and gcc -O3
  - Successfully prove equivalence
Cross compiler validation
STOKE
Optimization results
Program   STOKE vs gcc -O0   STOKE vs gcc -O3
Bansal    1.58X              1.04X
SAXPY     9.22X              1.48X
Conclusion
- Prove equivalence of loops in two stages
  - Infer a simulation relation
  - Check the inferred relation using SMT solvers
- Use runtime data for inference
- No change required to the compilers
- Better verifiers lead to better optimizers
Inference from concrete states
- M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S. Tschantz, and C. Xiao. The Daikon system for dynamic detection of likely invariants. Sci. Comput. Program., 69(1-3):35–45, 2007.
- T. Nguyen, D. Kapur, W. Weimer, and S. Forrest. Using dynamic analysis to discover polynomial and array invariants. ICSE 2012.
- P. Garg, C. Löding, P. Madhusudan, and D. Neider. Learning Universally Quantified Invariants of Linear Data Structures. CAV 2013.
- R. Sharma, S. Gupta, B. Hariharan, A. Aiken, P. Liang, and A. V. Nori. A Data Driven Approach for Algebraic Loop Invariants. ESOP 2013.
- R. Sharma, S. Gupta, B. Hariharan, A. Aiken, and A. V. Nori. Verification as Learning Geometric Concepts. SAS 2013.
- A. V. Nori and R. Sharma. Termination proofs from tests. ESEC/SIGSOFT FSE 2013.