Decision Procedures for String Constraints
Pieter Hooimeijer
http://en.wikipedia.org/wiki/Osborne_1
<img src='untrusted input'/>
What could possibly go wrong?
Attacker:
im.png' onload='javascript:...
<img src='untrusted input'/>
<img src='im.png' onload='javascript:...'/>
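To make the substitution concrete, here is a minimal Python sketch that splices the attacker's string into the template, once raw and once after `html.escape`. The `TEMPLATE` string and the `render` helper are hypothetical; escaping is shown only as one standard contrast to the unescaped case, not as the talk's proposal.

```python
import html

TEMPLATE = "<img src='{}'/>"  # hypothetical page template, taken from the slide
ATTACK = "im.png' onload='javascript:..."  # attacker-controlled input from the slide

def render(untrusted: str, escape: bool) -> str:
    """Splice untrusted input into the src attribute, optionally HTML-escaping it first."""
    value = html.escape(untrusted, quote=True) if escape else untrusted
    return TEMPLATE.format(value)

# Unescaped: the attacker's quote closes src early and injects an onload attribute.
print(render(ATTACK, escape=False))  # <img src='im.png' onload='javascript:...'/>
# Escaped: each ' becomes &#x27;, so everything stays inside the src value.
print(render(ATTACK, escape=True))   # <img src='im.png&#x27; onload=&#x27;javascript:...'/>
```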
www.cs.virginia.edu/~ph4u/
Talk Outline
Background Building Tuning Conclusion
[Figure: timeline of publications, 2007 to 2013: ASE: Bug Reports; SocialNets: Proxied Content; Sesena: MacroLab 3; Sensys: MacroLab 2; USENIX Sec: BEK; POPL: BEK2; Sensys: MacroLab; ISSTA: Hampi; TOSEM: Hampi 2; PLDI: DPRLE; ASE: StrSolve; VMCAI: Data structures; J. ASE: StrSolve 2]
[Figure: the same timeline, with the publications covered in this talk marked "This Talk": PLDI: DPRLE; ASE: StrSolve; VMCAI: Data structures; J. ASE: StrSolve 2]
Decision Procedures
• Program analysis work frequently relies on decision procedures (constraint solvers)
• They solve mathematical constraints
• There is a standard input format (SMT-LIB)
Example
(declare-fun x () Int)
(assert (= (* x x) 25))
(assert (> x 0))
(check-sat)
(get-model)
✔ [𝑥↦5]
Motivation
Reasoning about strings is difficult:
– for programmers
– for automated tools
String Constraint Solvers
Kaluza
Hampi
Rex
Kaluza, Hampi, Rex
String a; // ...
R = Regex("^ab$");
R.IsMatch(a) = true;
String a; // ...
R = Regex("^ab$");
assert(R.Match(a));
✔ [𝑎↦'ab']
[Diagram: constraints → solvers → solution(s)]
What should we model?
Example
How hard is regex matching in Perl?
A: Just as hard as 3-SAT…
$istr = '^' . ('(x?)' x $V) . ".*;\n";
$ireg = '^' . ('(x?)' x $V) . ".*;\n"
      . join('', map { '(?:' . join('|', map { $_ < 0 ? ('\\' . -$_ . 'x') : ('\\' . $_) } @$_) . "),\n" } @Clauses);
http://perl.plover.com/NPC/NPC-3SAT.html
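The reduction can be sketched in Python. This is our own variant of the idea (named groups instead of numbered backreferences, no newlines), not a transcription of the Perl at the URL above; `sat_to_regex` and `solve_sat_with_regex` are hypothetical names. Each variable gets an optional group that captures "x" (true) or "" (false), and each clause becomes an alternation of backreferences that can only consume its "x," segment when some literal is true, so Python's backtracking engine ends up searching for a satisfying assignment.

```python
import re

def sat_to_regex(num_vars, clauses):
    """Encode a CNF formula as (subject, pattern) such that the pattern matches the
    subject exactly when the formula is satisfiable. Clauses use DIMACS-style literals:
    i means variable i, -i means its negation."""
    # One "x" per variable, a separator, then one "x," per clause.
    subject = "x" * num_vars + ";" + "x," * len(clauses)
    # Variable i is true iff its group captures "x"; ".*" soaks up the unused x's.
    pattern = "^" + "".join(f"(?P<v{i}>x?)" for i in range(1, num_vars + 1)) + ".*;"
    for clause in clauses:
        alternatives = []
        for lit in clause:
            ref = f"(?P=v{abs(lit)})"
            # Positive literal: the backreference itself must supply the clause's "x".
            # Negative literal: the backreference must be empty, so a literal "x" follows it.
            alternatives.append(ref if lit > 0 else ref + "x")
        pattern += "(?:" + "|".join(alternatives) + "),"
    return subject, pattern + "$"

def solve_sat_with_regex(num_vars, clauses):
    """Let the backtracking regex engine search for an assignment; None if unsatisfiable."""
    subject, pattern = sat_to_regex(num_vars, clauses)
    m = re.match(pattern, subject)
    if m is None:
        return None
    return {i: m.group(f"v{i}") == "x" for i in range(1, num_vars + 1)}

# (x1 or x2) and (not x1): the only model is x1 = False, x2 = True.
print(solve_sat_with_regex(2, [(1, 2), (-1,)]))  # {1: False, 2: True}
```

Because the engine may try every combination of group captures, matching time can grow exponentially with the number of variables, which is the point of the example.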
Where do constraints come from?
String a; // ...
R = Regex("^ab$");
if (R.IsMatch(a)) {
  // ...
}
[Diagram: Code → Constraint Generation → Constraint Solving]
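A minimal Python sketch of the constraint-generation step for the snippet above. The `RegexConstraint` record and the `branch_constraints` helper are hypothetical, and `re.search` stands in for `Regex.IsMatch` only as an approximation: both look for a match anywhere in the input, and the `^`/`$` anchors then restrict it to the whole string (apart from a possible trailing newline).

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class RegexConstraint:
    """Hypothetical constraint form: string variable `var` must (or must not) match `pattern`."""
    var: str
    pattern: str
    must_match: bool

    def holds(self, env):
        # re.search approximates Regex.IsMatch (search anywhere in the input).
        matched = re.search(self.pattern, env[self.var]) is not None
        return matched == self.must_match

def branch_constraints(var, pattern):
    """Constraint generation for `if (R.IsMatch(var)) { ... }`: entering the body
    requires a match, skipping it requires the absence of one."""
    return {
        "taken": RegexConstraint(var, pattern, must_match=True),
        "not_taken": RegexConstraint(var, pattern, must_match=False),
    }

constraints = branch_constraints("a", "^ab$")
print(constraints["taken"].holds({"a": "ab"}))       # True: "ab" enters the if-body
print(constraints["not_taken"].holds({"a": "abc"}))  # True: "abc" skips it
```

A solver then receives one of these constraints (plus any others collected along the path) and searches for values of `a` that satisfy it.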
Talk Outline
Background Building Tuning Conclusion
Chapter 2: Defining String Constraints
Contributions:
1. The definition of the regular matching assignments problem
2. An algorithm, its implementation, and correctness proof
3. An evaluation, applying (2) to a static analysis problem
demo (internet permitting)
Evaluation
The Task: generate string inputs that exercise 17 known vulnerabilities in 30,000 lines of PHP
Metric: running time
Results
• Our constraint definition is sufficiently expressive to capture the constraints of interest
• Wall-clock running time is between 0.01 seconds and 10 minutes
Talk Outline
Background Building Tuning Conclusion
Chapter 3: Evaluating Data Structures
Contribution:
4. An apples-to-apples performance comparison of data structures and algorithms for automata-based string constraint solving
Motivation
• Existing work provided tool-to-tool performance comparisons
• Confounds: Performance gains may be due to external factors
The Framework
• Based on Rex
• Fixes external factors:
– front-end parser
– regex-to-automaton conversion
– implementation language
– search tree
Study Design
Tasks:
– automaton intersection
– automaton subtraction
Metric:
– running time
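One common reading of "eager" versus "lazy" for these two tasks: an eager algorithm builds the whole product automaton before inspecting it, while a lazy one creates product states only as a search reaches them and can stop at the first accepting state. The Python sketch below shows the lazy style for both intersection and subtraction (L(d1) minus L(d2) is L(d1) intersected with the complement of L(d2), with d2's missing transitions treated as a dead state). It is an illustration under assumptions, not the study's Rex-based code: the DFA encoding and the function name are hypothetical, and it ignores how character sets are represented.

```python
from collections import deque

# Hypothetical DFA encoding: (start_state, accepting_states, {(state, char): next_state}).
# A missing transition means the automaton rejects from that point on.

def lazy_product_witness(d1, d2, alphabet, subtract=False):
    """Explore the product of d1 and d2 breadth-first, creating only reachable product
    states. Returns a shortest string in L(d1) ∩ L(d2), or in L(d1) minus L(d2) when
    subtract=True, or None if that language is empty."""
    (s1, f1, t1), (s2, f2, t2) = d1, d2

    def accepting(p, q):
        in_d2 = q is not None and q in f2  # q is None: d2 has already rejected
        return p in f1 and (not in_d2 if subtract else in_d2)

    start = (s1, s2)
    parent = {start: None}  # product state -> (predecessor, char); doubles as visited set
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if accepting(*state):
            chars = []
            while parent[state] is not None:
                state, c = parent[state]
                chars.append(c)
            return "".join(reversed(chars))
        p, q = state
        for c in alphabet:
            p2 = t1.get((p, c))
            if p2 is None:
                continue  # d1 rejects every extension: prune
            q2 = None if q is None else t2.get((q, c))
            if q2 is None and not subtract:
                continue  # for intersection, a rejecting d2 also prunes
            nxt = (p2, q2)
            if nxt not in parent:
                parent[nxt] = (state, c)
                queue.append(nxt)
    return None

ENDS_IN_A = (0, {1}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0})
HAS_B = (0, {1}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 1})
print(lazy_product_witness(ENDS_IN_A, HAS_B, "ab"))                 # ba
print(lazy_product_witness(ENDS_IN_A, HAS_B, "ab", subtract=True))  # a
```

An eager variant would enumerate all |Q1| × |Q2| pairs up front; the lazy search above may touch far fewer when a witness is found early.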
Character Sets
BDD: binary decision diagrams
Pred: symbolic bitvector ranges in DNF
Range: concrete set of character ranges
Hash: concrete set of individual characters
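A small Python sketch contrasting the two concrete representations in the list above, Range and Hash (BDD and Pred are not modeled). The function names are hypothetical and this is not the study's implementation. It suggests one intuition for why alphabet size matters: a character class such as "any Unicode character" is a single range but over a million individual entries in a hash set.

```python
# "Range": a sorted list of disjoint, inclusive (lo, hi) code-point intervals.
# "Hash": a set holding every member code point individually.

def intersect_ranges(a, b):
    """Intersect two sorted lists of disjoint inclusive ranges in O(len(a) + len(b)) steps."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever range ends first; it cannot overlap anything later in the other list.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def ranges_to_hash(ranges):
    """Expand ranges into the explicit per-character set of the Hash representation."""
    return {cp for lo, hi in ranges for cp in range(lo, hi + 1)}

def size_of(ranges):
    """Number of characters a range list denotes, without expanding it."""
    return sum(hi - lo + 1 for lo, hi in ranges)

hexdigits = [(ord("0"), ord("9")), (ord("a"), ord("f"))]  # [0-9a-f]
middle = [(ord("2"), ord("d"))]                          # [2-d]
print(intersect_ranges(hexdigits, middle))  # [(50, 57), (97, 100)], i.e. [2-9a-d]
print(size_of([(0, 0x10FFFF)]))             # 1114112 characters in one range
```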
Configurations: Task 1 (55x) and Task 2 (100x), each run with the Eager and the Lazy algorithms, each on ASCII and on Unicode
Results
[Figure: running time (log scale, 0.1 to 1000) of the BDD, Pred, Range, and Hash character-set representations, Lazy vs. Eager, in separate ASCII and Unicode panels]
Chapter 4: Solving String Constraints Lazily
Contributions:
5. A novel (lazy) algorithm for solving multivariate string constraints
6. A comprehensive performance evaluation
Motivation
• More scalable algorithms are more likely to see real use
Approach
1. Eagerly construct a high-level representation of the search space
2. Explore the search space lazily, adding restrictions for one variable at a time
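A heavily simplified Python toy of step 2: variables are fixed one at a time, candidates are produced lazily by a generator, and any constraint whose variables are all fixed is checked immediately so bad prefixes are pruned early. It enumerates strings rather than building automata, so it is not StrSolve's algorithm; the constraint format (a tuple of variables whose concatenation must fully match a regex) and all names are assumptions for illustration.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Lazily yield all strings over `alphabet`, shortest first, up to `max_len`."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def solve_lazily(variables, constraints, alphabet, max_len):
    """Toy depth-first search over one variable at a time. `constraints` is a list of
    (vars_tuple, regex) pairs: the concatenation of those variables must fully match."""
    def search(i, env):
        # Prune: every constraint whose variables are all assigned must already hold.
        for vs, pattern in constraints:
            if all(v in env for v in vs) and not re.fullmatch(pattern, "".join(env[v] for v in vs)):
                return None
        if i == len(variables):
            return dict(env)
        for s in strings(alphabet, max_len):
            env[variables[i]] = s
            result = search(i + 1, env)
            if result is not None:
                return result
        del env[variables[i]]
        return None
    return search(0, {})

# v1 must match a+, and v1 followed by v2 must match ab+c.
print(solve_lazily(["v1", "v2"], [(("v1",), "a+"), (("v1", "v2"), "ab+c")], "abc", 2))
# {'v1': 'a', 'v2': 'bc'}
```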
Evaluation
Difference
Hampi
Long Strings
CFG Intersection
Hampi: Background
[Figure: the publication timeline, 2007 to 2013, restricted to SocialNets: Proxied Content; USENIX Sec: BEK; POPL: BEK2; ISSTA: Hampi; TOSEM: Hampi 2; PLDI: DPRLE; ASE: StrSolve; VMCAI: Data structures; J. ASE: StrSolve 2]
Hampi: Architecture
[Diagram: Hampi (encoding) → STP (bv) → MiniSAT (solving)]
Experiment
Task: regex difference (same dataset as before)
Metric: proportion of wall-clock time spent solving
Results
[Figure: proportion of running time spent on encoding vs. other work, for length bounds 1, 5, 10, and 15]
[Figure: proportion of running time spent on encoding vs. solving, plotted against absolute running time (0 to 10,000 seconds)]
Evaluation
Difference
Hampi
Long Strings
CFG Intersection
Experiment
Task: intersect two regexes parameterized on n:
[a-c]*a[a-c]{n+1} and
[a-c]*b[a-c]{n}
Metric: running time
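The benchmark pair is easy to satisfy but hard to represent: "ab" followed by any n characters from [a-c] matches both patterns, and no shorter string can match the first one. A deterministic automaton for the first pattern, however, must remember which of the last n+2 characters were "a", so its size grows exponentially in n (the classic worst case for subset construction). A small Python sketch with hypothetical helper names; the brute-force check is only practical for small n.

```python
import re
from itertools import product

def benchmark_pair(n):
    """The two regexes from the slide, instantiated for a given n."""
    return f"[a-c]*a[a-c]{{{n + 1}}}", f"[a-c]*b[a-c]{{{n}}}"

def witness(n):
    """A shortest common member: 'a', then 'b', then n more characters."""
    return "ab" + "a" * n

def shortest_common_length(n):
    """Brute force (small n only): length of the shortest string matching both regexes."""
    r1, r2 = (re.compile(p) for p in benchmark_pair(n))
    length = 0
    while True:
        for chars in product("abc", repeat=length):
            s = "".join(chars)
            if r1.fullmatch(s) and r2.fullmatch(s):
                return length
        length += 1

print(benchmark_pair(2))          # ('[a-c]*a[a-c]{3}', '[a-c]*b[a-c]{2}')
print(shortest_common_length(3))  # 5, i.e. n + 2
```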
Participating Tools
Hampi
Rex
Strsolve
Results
[Figure: running time in seconds (log scale, 0.001 to 1) as a function of n (0 to 1000) for Rex, Hampi, and Strsolve]
Talk Outline
Background Building Tuning Conclusion
Conclusion
• Introduced string constraint solving in the context of program analysis
• Two algorithms: one eager (DPRLE), one lazy (strsolve)
• Presented experiments:
– data structure selection
– solving multivariate constraints
• Our lazy prototype outperforms other approaches on indicative workloads
www.cs.virginia.edu/~ph4u/
Thanks for stopping by!