Symbolic Finite State Transducers: Algorithms and Applications
Margus Veanes, Pieter Hooimeijer, Benjamin Livshits, David Molnar, Nikolaj Bjørner
4
Formal languages are well-studied.
5
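[Figure: two-state automaton (states q0, q1) for the regular expression a*b+, with transitions labeled a, b, b. It accepts "abb" (✔) and rejects "aaaa" (✘).]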
6
[Figure: bar chart of the number of papers mentioning "automata" at POPL (2001–2011): 103.]
7
What about transformation?
8
[Image: Osborne 1. Source: http://en.wikipedia.org/wiki/Osborne_1]
9
10
Compute image: abb ↦ {baa} (✔); aaaa has no image (✘)
Check properties: Equivalence, Composition
[Figure: two-state transducer (states q0, q1) with transitions labeled a/b, b/a, b/a.]
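To make "compute image" concrete, here is a minimal Python sketch of a classical finite state transducer. The slide only shows the labels a/b, b/a, b/a; the wiring below (a/b loops on q0, b/a moves to q1, b/a loops on q1, q1 final) is an assumption that mirrors the a*b+ automaton shown earlier, and the names `DELTA` and `image` are hypothetical.

```python
# Transitions: (state, input symbol) -> (output string, next state).
# The topology is an assumption; only the labels appear on the slide.
DELTA = {
    ("q0", "a"): ("b", "q0"),
    ("q0", "b"): ("a", "q1"),
    ("q1", "b"): ("a", "q1"),
}
START, FINALS = "q0", {"q1"}


def image(word):
    """Return the output for `word`, or None if the transducer rejects it."""
    state, out = START, []
    for ch in word:
        step = DELTA.get((state, ch))
        if step is None:          # no transition: input is rejected
            return None
        emitted, state = step
        out.append(emitted)
    return "".join(out) if state in FINALS else None


print(image("abb"))   # baa
print(image("aaaa"))  # None
```

Under that assumed wiring the sketch reproduces the slide's two examples: "abb" has the image "baa", and "aaaa" has none because the run never reaches the final state.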
11
[Figure: bar chart of the number of papers at POPL (2001–2011) mentioning "automata" (103) versus "transducers" (8).]
12
Talk Outline
Background Approach Case Studies
13
Background
“Fast and Precise Sanitizer Analysis with BEK”
Idea: Develop a language for commonly used string transformations. Prove properties about those transformations.
14
Code:
  t := iter(c in s)[b := false;]{
    case (!b && c in "['\"\\]"): b := false; yield('\\', c);
    case (c == '\\'): b := !b; yield(c);
    case (true): b := false; yield(c);
  };
FSTs:
  [Figure: two-state transducer (states q0, q1) with transitions labeled a/b, b/a, b/a.]
Gap (between Code and FSTs)
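For readers unfamiliar with Bek, the program on this slide can be read as a loop over the input with one Boolean register. The following is a rough Python transliteration, not the authors' code. It assumes the character class is meant to match only the two quote characters (the backslash has its own case), and the function name `escape_quotes` is hypothetical.

```python
def escape_quotes(s: str) -> str:
    """Backslash-escape quotes that are not already escaped."""
    out = []
    b = False  # True right after an unpaired backslash
    for c in s:
        if not b and c in "'\"":
            # Unescaped quote: emit a backslash in front of it.
            b = False
            out.append("\\" + c)
        elif c == "\\":
            # Backslash: toggle the register and copy it through.
            b = not b
            out.append(c)
        else:
            # Anything else (including an already-escaped quote).
            b = False
            out.append(c)
    return "".join(out)


once = escape_quotes("a'b")
print(once)                         # a\'b
print(escape_quotes(once) == once)  # True
```

On this input, applying the function twice gives the same result as applying it once, which is the kind of property (idempotence) the talk goes on to check mechanically.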
17
Code and FSTs (same Bek program and transducer figure as on slide 14), with the gap bridged by:
1. domain-specific languages
2. more expressive transducers
18
Talk Outline
Background Approach Case Studies
19
20
Symbolic Finite State Transducers
Idea:
• Equip transitions with formulae
• Allow the use of any decidable theory
21
Definition
Symbolic Finite State Transducer (SFT):
- states
- start state
- final states
- transitions $R$ of the form $q \xrightarrow{\varphi / f} r$, with predicates $\varphi$ and output $f$
Background Theory:
- predicates
- label theory
26
Example
[Figure: SFT with states q0 and q1 and two transitions, each labeled guard / symbolic outputs:]
  $(\lambda x.\, x = 0) \,/\, [\lambda x.\, 1]$
  $(\lambda x.\, \mathbf{t}) \,/\, [\lambda x.\, 2x]$
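As an illustration of guards and symbolic outputs, the sketch below represents an SFT over integers with Python lambdas standing in for the label theory. The slide does not survive well enough to show which transition connects which states, so the wiring (q0 to q1 on the first rule, a loop on q1 for the second, q1 final) is an assumption, as are the names `Rule`, `SFT`, and `run`.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    src: str
    guard: Callable[[int], bool]                # predicate on the input symbol
    outputs: Tuple[Callable[[int], int], ...]   # symbolic outputs, applied to the input symbol
    dst: str


@dataclass
class SFT:
    start: str
    finals: Set[str]
    rules: List[Rule]

    def run(self, word):
        """Return the set of output sequences produced for `word`."""
        configs = {(self.start, ())}
        for x in word:
            configs = {
                (r.dst, out + tuple(f(x) for f in r.outputs))
                for (q, out) in configs
                for r in self.rules
                if r.src == q and r.guard(x)
            }
        return {out for (q, out) in configs if q in self.finals}


# Assumed topology for the two rules shown on the slide.
example = SFT(
    start="q0",
    finals={"q1"},
    rules=[
        Rule("q0", lambda x: x == 0, (lambda x: 1,), "q1"),
        Rule("q1", lambda x: True, (lambda x: 2 * x,), "q1"),
    ],
)

print(example.run([0, 3, 5]))  # {(1, 6, 10)}
print(example.run([4]))        # set()
```

Two rules cover an unbounded alphabet here, because the guard is a predicate rather than an enumerated set of symbols.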
28
29
Closure under composition
[Figure: the pipeline in → SFT A → SFT B → out is equivalent to in → a single SFT built from A and B → out.]
Requirement:
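To illustrate closure under composition, the following sketch builds a product of two SFTs whose guards and outputs are Python lambdas over integers. It is a simplified illustration, not the construction from the talk: a real implementation would presumably use a decision procedure to discard rules whose combined guard is unsatisfiable, which plain lambdas cannot do. All names (`Rule`, `run`, `b_paths`, `compose`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Rule:
    src: object
    guard: Callable[[int], bool]
    outputs: Tuple[Callable[[int], int], ...]
    dst: object


def run(rules, start, finals, word):
    """Return the set of output tuples the transducer produces for `word`."""
    configs = {(start, ())}
    for x in word:
        configs = {(r.dst, out + tuple(f(x) for f in r.outputs))
                   for (q, out) in configs for r in rules
                   if r.src == q and r.guard(x)}
    return {out for (q, out) in configs if q in finals}


def b_paths(b_rules, q, fs):
    """Yield (guard, outputs, end_state) for B reading A's symbolic outputs fs from q.

    Guards and outputs are expressed over A's *input* symbol x.
    """
    if not fs:
        yield (lambda x: True), (), q
        return
    f, rest = fs[0], fs[1:]
    for r in b_rules:
        if r.src != q:
            continue
        for g_rest, out_rest, end in b_paths(b_rules, r.dst, rest):
            guard = lambda x, r=r, f=f, g_rest=g_rest: r.guard(f(x)) and g_rest(x)
            outs = tuple((lambda x, g=g, f=f: g(f(x))) for g in r.outputs) + out_rest
            yield guard, outs, end


def compose(a_rules, b_rules, b_states):
    """Rules of the product; each state (p, q) pairs an A-state with a B-state."""
    product = []
    for ra in a_rules:
        for q in b_states:
            for g_b, outs, q_end in b_paths(b_rules, q, ra.outputs):
                guard = lambda x, ra=ra, g_b=g_b: ra.guard(x) and g_b(x)
                product.append(Rule((ra.src, q), guard, outs, (ra.dst, q_end)))
    return product


# A emits x and 2x for every input x; B increments non-negative symbols.
A = [Rule("s", lambda x: True, (lambda x: x, lambda x: 2 * x), "s")]
B = [Rule("t", lambda y: y >= 0, (lambda y: y + 1,), "t")]
AB = compose(A, B, b_states=["t"])

composed = run(AB, ("s", "t"), {("s", "t")}, [1, 2])
piped = {w for v in run(A, "s", {"s"}, [1, 2]) for w in run(B, "t", {"t"}, v)}
print(composed == piped)  # True
```

The composed guard evaluates B's guard on A's symbolic output, so running the single product transducer agrees with piping A's output into B on the inputs tried here.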
30
Single-valued equivalence
Definition: $A \overset{1}{=} B \iff \forall a : \sigma^{*}.\ \forall b, c : \gamma^{*}.\ (b \in A(a) \wedge c \in B(a)) \Rightarrow b = c$
31
Algorithm:
• Construct 2-output product transducer
• Find conflicts (dft):
  – output length
  – output value
Complexity: $O(n^{2} \cdot f(m))$, where $n$ is the number of rules and $f(m)$ is the complexity of the decision procedure
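The two kinds of conflict can be illustrated with a much weaker stand-in for the algorithm above: instead of building the product transducer and calling a decision procedure, the sketch below brute-forces all input words up to a bounded length over a small finite alphabet. It can find conflicts but cannot prove their absence in general. The names `run` and `find_conflict` and the example transducers are hypothetical.

```python
from itertools import product


def run(rules, start, finals, word):
    """rules: list of (src, guard, outputs, dst); returns the set of output tuples."""
    configs = {(start, ())}
    for x in word:
        configs = {(dst, out + tuple(f(x) for f in fs))
                   for (q, out) in configs
                   for (src, guard, fs, dst) in rules
                   if src == q and guard(x)}
    return {out for (q, out) in configs if q in finals}


def find_conflict(a, b, alphabet, max_len):
    """Search inputs up to max_len for outputs of a and b that differ."""
    for n in range(max_len + 1):
        for word in product(alphabet, repeat=n):
            for u in run(*a, word):
                for v in run(*b, word):
                    if len(u) != len(v):
                        return word, "output length", u, v
                    if u != v:
                        return word, "output value", u, v
    return None


always = lambda x: True
DOUBLE = ([("s", always, (lambda x: 2 * x,), "s")], "s", {"s"})
ADD_SELF = ([("s", always, (lambda x: x + x,), "s")], "s", {"s"})
SQUARE = ([("s", always, (lambda x: x * x,), "s")], "s", {"s"})
PADDED = ([("s", lambda x: x < 3, (lambda x: 2 * x,), "s"),
           ("s", lambda x: x >= 3, (lambda x: 2 * x, lambda x: 0), "s")], "s", {"s"})

print(find_conflict(DOUBLE, ADD_SELF, range(5), 3))  # None
print(find_conflict(DOUBLE, SQUARE, range(5), 3))
print(find_conflict(DOUBLE, PADDED, range(5), 3))
```

The second call reports an output-value conflict and the third an output-length conflict, matching the two categories listed on the slide.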
32
Key restriction: single-valuedness
Transducer A is single-valued if, for all inputs, A has at most one output.
$A \overset{1}{=} A$
Note: This definition permits non-determinism, e.g.:
[Figure: a state with two outgoing transitions, both labeled b/[], leading to different continuations.]
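A small sketch may help separate single-valuedness from determinism. The transducer below is hypothetical apart from the two b/[] transitions taken from the slide: it is non-deterministic on b, yet every input tried yields at most one output. The check is a bounded brute-force test over a finite alphabet, not a proof, and the names `RULES`, `outputs`, and `single_valued_up_to` are assumptions.

```python
from itertools import product

# Rules: (source, input symbol, output string, target). The first two rules
# share source and symbol, so the transducer is non-deterministic.
RULES = [
    ("q0", "b", "", "q1"),
    ("q0", "b", "", "q2"),
    ("q1", "a", "x", "q1"),
    ("q2", "c", "y", "q2"),
]
START, FINALS = "q0", {"q1", "q2"}


def outputs(word, rules=RULES):
    """Return the set of all outputs the transducer can produce for `word`."""
    configs = {(START, "")}
    for ch in word:
        configs = {(dst, out + o)
                   for (q, out) in configs
                   for (src, sym, o, dst) in rules
                   if src == q and sym == ch}
    return {out for (q, out) in configs if q in FINALS}


def single_valued_up_to(max_len, alphabet="abc", rules=RULES):
    """True if no input of length <= max_len has more than one output."""
    return all(len(outputs("".join(w), rules)) <= 1
               for n in range(max_len + 1)
               for w in product(alphabet, repeat=n))


# Variant in which the two b-rules disagree on their output.
AMBIGUOUS = RULES[:1] + [("q0", "b", "z", "q2")] + RULES[2:]

print(single_valued_up_to(4))                   # True
print(single_valued_up_to(2, rules=AMBIGUOUS))  # False
```

In the first transducer both b-branches emit the empty output, so the non-determinism is invisible in the result; changing one branch to emit "z" breaks single-valuedness on the input "b".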
34
35
algebra: interesting properties
• subsumption
• equivalence
• idempotence
• commutativity
• ...
37
Talk Outline
Background Approach Case Studies
38
39
Case Studies
• HTMLdecode
• Malware Fingerprinting
• Image Blurring
• Location Privacy
41
HTMLdecode
[Figure: Decode applied to encoded forms of "<".]
The Task: Prove that HTMLdecode is not idempotent
The Metric: Running time
The Problem: Unicode defines 1,114,112 code points.
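The non-idempotence claim is easy to observe on a concrete input. The sketch below uses Python's standard `html.unescape` as a stand-in decoder; it is not the HTMLdecode implementation analyzed in the talk, and a single counterexample like this is a test, whereas the talk's goal is to establish the property by analysis of the transducer. The second part checks the code point count quoted on the slide.

```python
import html
import sys

encoded = "&amp;lt;"
once = html.unescape(encoded)   # first pass decodes only the outer entity
twice = html.unescape(once)     # second pass decodes what the first exposed

print(repr(once))     # '&lt;'
print(repr(twice))    # '<'
print(once == twice)  # False: decoding twice differs from decoding once

# Unicode code points range from 0 to 0x10FFFF inclusive.
code_points = sys.maxunicode + 1
print(code_points)    # 1114112
```

An explicit transducer with one transition per character would need to account for every one of those code points, which is the motivation for symbolic guards in this case study.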
44
Three Participating Representations
• SFT (Eager)
• SFT + Registers (Eager)
• SFT + Registers (Lazy)
[Figure: diagram with C# and +REG labels attached to the representations.]
45
[Figure: transducer size (log scale, 1 to 10,000,000) versus maximum number of digits (2 to 6); data label: 6.6M.]
47
[Figure: transducer size (log scale, 1 to 10,000,000) versus maximum number of digits (2 to 6), for SFT and SFT + Symbolic State Space; data labels: 6.6M and 51.]
48
Idempotence Checking: Time
[Figure: time (seconds; log scale) versus maximum number of digits (2 to 6), for SFT, SFT + REG (lazy), and SFT + REG (eager).]
Talk Outline
Background Approach Case Studies
50
Conclusion
• Introduced Symbolic Finite State Transducers over any decidable background theory
• Presented decidability and complexity results
• Comes with a scalable and robust* implementation
51
Thank you!
Please try our…
implementation: http://research.microsoft.com/automata/
online tutorial: http://www.rise4fun.com/Bek/tutorial
52
53
http://www.rise4fun.com/Bek