Regular Expression Sub-Matching using Partial Derivatives · suma0002/talks/ppdp12-part... · 2013. 8....
Regular Expression Sub-Matching using Partial Derivatives
Martin Sulzmann, Hochschule Karlsruhe
Kenny Zhuo Ming Lu, Nanyang Polytechnic
Regular Expression Sub-Matching using Partial Derivatives – p. 1/18
Regular Expressions - The Basics
Words: w ∈ Σ*
Regular expressions
r ::= r + r    Choice
    | r r      Concatenation
    | r*       Kleene star
    | ε        Empty word
    | φ        Empty language
    | l ∈ Σ    Letters
(A + (BC))* denotes a regular language
L( (A + (BC))* ) = {ε, A, BC, ABC, ...}
Regular Expression Sub-Matching
Matching
w matches r iff w ∈ L(r)
ABAAC matches (A + AB)(BAA + A)(AC + C)
L( (A + AB)(BAA + A)(AC + C) ) = {ABAAAC, ABAAC, AAAC, AAC, ABBAAAC, ABBAAC, ABAC}
Which sub-parts are matched?
(x1 : (A + AB))(x2 : (BAA + A))(x3 : (AC + C))
Sub-matchings for ABAAC are
x1 = AB
x2 = A
x3 = AC
Now that the difference is clear: Matching = Sub-matching
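To make the question "which sub-parts are matched?" concrete, here is a small brute-force Python sketch. It is not the algorithm of the talk; the names `PATTERN`, `sub_matchings` and `language`, and the encoding of each variable as a list of alternative words, are assumptions made only for this illustration.

```python
from itertools import product

# Hypothetical encoding of (x1 : (A + AB))(x2 : (BAA + A))(x3 : (AC + C)):
# each variable is paired with the finite list of words it may bind.
PATTERN = [("x1", ["A", "AB"]), ("x2", ["BAA", "A"]), ("x3", ["AC", "C"])]

def language(pattern=PATTERN):
    """All words denoted by the pattern (finite, since there is no star)."""
    return {"".join(choice) for choice in product(*(alts for _, alts in pattern))}

def sub_matchings(word, pattern=PATTERN):
    """Every binding of the variables whose concatenation equals word."""
    names = [name for name, _ in pattern]
    results = []
    for choice in product(*(alts for _, alts in pattern)):
        if "".join(choice) == word:
            results.append(dict(zip(names, choice)))
    return results

if __name__ == "__main__":
    for binding in sub_matchings("ABAAC"):
        print(binding)
```

Under this encoding ABAAC has two sub-matchings, (A, BAA, C) and (AB, A, AC); the slide lists the second one, so a sub-matching algorithm also needs a policy for choosing among candidates.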
Slow Regular Expression Matching
Example: Aⁿ means “A n-times”; consider (x : (A?)ⁿ)(y : Aⁿ), where r? = ε + r
Consider n = 2
AA ⊢ (x : A?A?)(y : AA)
Any attempt in which x consumes an A leaves too little input for y
Fail ⇒ Backtrack
...
Only x = ε, y = AA matches
Success, but exponential complexity
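The blow-up can be observed with a deliberately naive backtracking matcher. The Python sketch below is an illustration under assumptions: the item encoding (`"opt"` for A?, `"lit"` for A), the helper names, and the choice to try "consume" before "skip" are all hypothetical and not taken from the talk.

```python
def backtrack_match(items, word):
    """Naive backtracking over a sequence of items.

    Each item is ("lit", c) for a mandatory letter or ("opt", c) for c?.
    Returns (matched, steps), where steps counts the recursive calls.
    """
    steps = 0

    def go(i, pos):
        nonlocal steps
        steps += 1
        if i == len(items):
            return pos == len(word)
        kind, c = items[i]
        has_c = pos < len(word) and word[pos] == c
        if kind == "lit":
            return has_c and go(i + 1, pos + 1)
        # Optional item: first try to consume the letter, then try skipping it.
        if has_c and go(i + 1, pos + 1):
            return True
        return go(i + 1, pos)

    return go(0, 0), steps

def a_opt_n_a_n(n):
    """Items for (A?)^n A^n."""
    return [("opt", "A")] * n + [("lit", "A")] * n

if __name__ == "__main__":
    for n in (2, 5, 10, 15):
        matched, steps = backtrack_match(a_opt_n_a_n(n), "A" * n)
        print(n, matched, steps)
```

Because the only successful split binds every optional A to the empty word, this matcher explores all other combinations first, so the step count more than doubles with each increment of n.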
Fast Regular Expression Matching
For brevity, we ignore sub-matching locations
Convert A?A?AA to NFA
[Figure: NFA for A?A?AA, a chain of states linked by A-transitions; the states that are active while reading AA are marked]
Simultaneous search for match AA
No backtracking, linear search
States to be tracked are linear in the size of the regular expression
Linear complexity!
So far: Thompson and Glushkov NFA construction
See Russ Cox, Alain Frisch et al., Ville Laurikari, ...
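The simultaneous search can be sketched as a set-of-states simulation. In the Python example below the NFA for (A?)ⁿAⁿ is written by hand rather than produced by a Thompson or Glushkov construction, and the names `build_nfa`, `eps_closure` and `nfa_match` are assumptions for illustration.

```python
def build_nfa(n):
    """Hand-built NFA for (A?)^n A^n: states 0..2n, state 2n is accepting.

    Every state i < 2n has an A-transition to i + 1; the first n states
    also have an epsilon-transition to i + 1 (the optional A is skipped).
    """
    letter = {i: {"A": {i + 1}} for i in range(2 * n)}
    eps = {i: {i + 1} for i in range(n)}
    return letter, eps, 2 * n

def eps_closure(states, eps):
    """All states reachable from `states` via epsilon-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfa_match(n, word):
    """Track the set of active states while scanning word once, left to right."""
    letter, eps, final = build_nfa(n)
    current = eps_closure({0}, eps)
    for c in word:
        moved = set()
        for s in current:
            moved |= letter.get(s, {}).get(c, set())
        current = eps_closure(moved, eps)
    return final in current

if __name__ == "__main__":
    print(nfa_match(2, "AA"), nfa_match(30, "A" * 30))
```

The active set never holds more than 2n + 1 states, so each input letter costs work bounded by the size of the automaton and no choice is ever revisited.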
Our Contributions
Matching automata construction based on
  Brzozowski’s Derivatives (DFA)
  Antimirov’s Partial Derivatives (NFA)
Fast and elegant algorithms for
  POSIX matching
  greedy left-most matching
Implementation in Haskell supporting real-world regular expressions
Brzozowski’s Derivatives
r\l: derivative of r w.r.t. l (“take away the leading l”)
L(r\l) = {w | lw ∈ L(r)}
Compute r\l by induction, e.g.
A\A = ε, B\A = φ, r*\l = (r\l) r*
r1 r2\l = (r1\l) r2 + r2\l if ε ∈ L(r1), and (r1\l) r2 otherwise
Matching derivation:
r1 -l→ r2 iff r2 = r1\l
For w = l1...ln, check if r -l1→ ... -ln→ r′ where ε ∈ L(r′)
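The inductive definition translates almost line by line into code. The following Python sketch covers plain matching only (no sub-match bindings); the tuple encoding of regular expressions and the function names `nullable`, `deriv` and `matches` are assumptions for this example, not the authors' Haskell implementation.

```python
# Regular expressions as tagged tuples (a hypothetical encoding).
PHI, EPS = ("phi",), ("eps",)

def ch(c): return ("chr", c)
def alt(r1, r2): return ("alt", r1, r2)
def seq(r1, r2): return ("seq", r1, r2)
def star(r): return ("star", r)

def nullable(r):
    """True iff the empty word is in L(r)."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("phi", "chr"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # seq

def deriv(r, l):
    """Brzozowski derivative of r with respect to the letter l."""
    tag = r[0]
    if tag in ("phi", "eps"):
        return PHI
    if tag == "chr":
        return EPS if r[1] == l else PHI
    if tag == "alt":
        return alt(deriv(r[1], l), deriv(r[2], l))
    if tag == "star":
        return seq(deriv(r[1], l), r)
    left = seq(deriv(r[1], l), r[2])  # seq case
    return alt(left, deriv(r[2], l)) if nullable(r[1]) else left

def matches(r, word):
    """Take the derivative letter by letter, then test for the empty word."""
    for l in word:
        r = deriv(r, l)
    return nullable(r)

if __name__ == "__main__":
    r = star(alt(ch("A"), seq(ch("B"), ch("C"))))  # (A + (BC))*
    print(matches(r, "ABC"), matches(r, "AB"))
```

Without simplification the intermediate expressions grow with every step, which is why derivative-based matchers normally simplify as they go.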
Matching with Derivatives
Example: Match AB against (x : A + y : AB + z : B)*
(A + AB + B)* {x : ε, y : ε, z : ε}
-A→ (A + AB + B)*\A
= (A + AB + B)\A (A + AB + B)*
= (A\A + AB\A + B\A) (A + AB + B)*
= (ε + B)₁ (A + AB + B)* {x₁ : A, y₁ : A, z₁ : ε}
Record matchings for each iteration (the index 1 marks the first iteration)
The paper records matchings within the pattern
Next step: derivative w.r.t. B of p1 p2, where p1 = (ε + B)₁ and p2 = (A + AB + B)*
(p1 p2)\l = (p1\l, p2) + (empty(p1) p2\l) if p1 empty
Choice of matchings. Don’t drop p1: keep p1 and make p1 “empty”, (ε + B)₁ ⇒ (ε + φ)₁, so p1 won’t contribute further matchings
-B→ ((ε + B)₁ (A + AB + B)*)\B
= ((ε + B)₁\B (A + AB + B)*) + (ε + φ)₁ ((A + AB + B)*\B)
... = ε₁ (A + AB + B)* + (ε + φ)₁ ε₂ (A + AB + B)*
{y₁ : AB} and {x₁ : A, z₂ : B}
POSIX and greedy left-most match, respectively
Derivatives Matching Summary
Computes all matchings ⇒ exponential complexity
Optimization:
Simplify, e.g. aggressively to the left
(1) r + r ⇒ r (keep left r)   (2) φ r ⇒ φ   (3) ε r ⇒ r
p1 + p2, where p1 = ε₁ (A + AB + B)* and p2 = (ε + φ)₁ ε₂ (A + AB + B)*
⇒* ε₁ (A + AB + B)* + ε₂ (A + AB + B)*
⇒ ε₁ (A + AB + B)*
⇒ (A + AB + B)*
Yields POSIX match
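The three rules can be sketched as a bottom-up rewrite on plain regular expressions. This Python example ignores the iteration indices and recorded bindings shown on the slide; the tuple encoding and the name `simp` are assumptions, and the extra rules φ + r ⇒ r and r + φ ⇒ r are added here (also an assumption) because the slide's first rewriting step needs (ε + φ) ⇒ ε.

```python
# Regular expressions as tagged tuples (a hypothetical encoding).
PHI, EPS = ("phi",), ("eps",)

def ch(c): return ("chr", c)
def alt(r1, r2): return ("alt", r1, r2)
def seq(r1, r2): return ("seq", r1, r2)
def star(r): return ("star", r)

def simp(r):
    """Simplify bottom-up, always preferring the left operand."""
    tag = r[0]
    if tag == "alt":
        a, b = simp(r[1]), simp(r[2])
        if a == PHI:       # assumed extra rule: phi + r => r
            return b
        if b == PHI:       # assumed extra rule: r + phi => r
            return a
        if a == b:         # rule (1): r + r => r, keep the left r
            return a
        return ("alt", a, b)
    if tag == "seq":
        a, b = simp(r[1]), simp(r[2])
        if a == PHI:       # rule (2): phi r => phi
            return PHI
        if a == EPS:       # rule (3): eps r => r
            return b
        return ("seq", a, b)
    if tag == "star":
        return ("star", simp(r[1]))
    return r

if __name__ == "__main__":
    R = star(alt(alt(ch("A"), seq(ch("A"), ch("B"))), ch("B")))  # (A + AB + B)*
    p1 = seq(EPS, R)
    p2 = seq(seq(alt(EPS, PHI), EPS), R)
    print(simp(alt(p1, p2)) == R)
```

In the index-free setting both summands collapse to (A + AB + B)*, and rule (1) then keeps the left copy, mirroring the slide's final step.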
Matching with Partial Derivatives
Derivatives represent states of a DFA
·\· : r ↦ L ↦ r
(A + AB + B)*\A = (ε + B) (A + AB + B)*
On-the-fly DFA construction.
Partial derivatives represent states of an NFA
·\ₚ· : r ↦ L ↦ 2^r
L(r\l) = L(r1 + ... + rn) where r\ₚl = {r1, ..., rn}
(A + AB + B)*\ₚA = {ε(A + AB + B)*, B(A + AB + B)*}
The set of partial derivatives is finite and linear in the size of the regular expression.
Build NFA match automata.
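A partial derivative returns a set of expressions instead of one sum, and each element of the set plays the role of an NFA state. The Python sketch below handles plain matching only; the tuple encoding, the names `pderiv` and `pd_matches`, and the smart constructor `seq` (which rewrites ε r to r, so ε(A + AB + B)* shows up as (A + AB + B)*) are assumptions for illustration.

```python
# Regular expressions as tagged tuples (a hypothetical encoding).
PHI, EPS = ("phi",), ("eps",)

def ch(c): return ("chr", c)
def alt(r1, r2): return ("alt", r1, r2)
def star(r): return ("star", r)

def seq(r1, r2):
    """Concatenation that drops a leading eps (assumed simplification)."""
    return r2 if r1 == EPS else ("seq", r1, r2)

def nullable(r):
    """True iff the empty word is in L(r)."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("phi", "chr"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # seq

def pderiv(r, l):
    """Antimirov partial derivatives of r w.r.t. l, as a set of regexes."""
    tag = r[0]
    if tag in ("phi", "eps"):
        return set()
    if tag == "chr":
        return {EPS} if r[1] == l else set()
    if tag == "alt":
        return pderiv(r[1], l) | pderiv(r[2], l)
    if tag == "star":
        return {seq(p, r) for p in pderiv(r[1], l)}
    first = {seq(p, r[2]) for p in pderiv(r[1], l)}  # seq case
    return (first | pderiv(r[2], l)) if nullable(r[1]) else first

def pd_matches(r, word):
    """NFA-style matching: the state set is a set of regular expressions."""
    states = {r}
    for l in word:
        states = {q for s in states for q in pderiv(s, l)}
    return any(nullable(s) for s in states)

if __name__ == "__main__":
    R = star(alt(alt(ch("A"), seq(ch("A"), ch("B"))), ch("B")))  # (A + AB + B)*
    print(pderiv(R, "A") == {R, seq(ch("B"), R)})
    print(pd_matches(R, "AB"))
```

For (A + AB + B)* the only states that ever appear are the expression itself and B(A + AB + B)*, which illustrates how small the resulting automaton can be.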
Matching with Partial Derivatives
(Sketch of) NFA match automata
[Figure: NFA match automaton with states p1, p2, p3, ...; p1 has A-transitions to p2 and to p3]
p1 \ₚ A = {p2, p3}, where p1 = (A + AB + B)*, p2 = ε(A + AB + B)*, p3 = B(A + AB + B)*
Depth-first left-most traversal ⇒ greedy left-most match
Not POSIX because structure is broken apart
See paper for details of construction
POSIX versus Greedy Left-Most Match
Derivatives for POSIX matching
  POSIX = maximal match w.r.t. structure
  AB matches (A + AB + B)*
  (A + AB + B)*\A = (ε + B) (A + AB + B)*
Partial derivatives for greedy left-most matching
  Greedy left-most = maximal match ignoring any structure
  AB matches (A + AB + B)*
  (A + AB + B)*\ₚA = {ε(A + AB + B)*, B(A + AB + B)*}
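The difference between the two policies can be reproduced by brute force for this particular pattern. The Python sketch below is a simplified reading, not the derivative-based algorithms of the talk: it assumes that "greedy left-most" means the first parse found when alternatives are tried in written order, and that "POSIX" means making each iteration as long as possible from left to right. The names `ALTS`, `parses`, `greedy` and `posix` are hypothetical.

```python
# Alternatives of (x : A + y : AB + z : B)* in their written order.
ALTS = [("x", "A"), ("y", "AB"), ("z", "B")]

def parses(word):
    """All ways to split word into iterations, in depth-first left-most order."""
    if word == "":
        return [[]]
    out = []
    for name, w in ALTS:
        if word.startswith(w):
            out += [[(name, w)] + rest for rest in parses(word[len(w):])]
    return out

def greedy(word):
    """First parse in depth-first left-most order (None if there is no match)."""
    ps = parses(word)
    return ps[0] if ps else None

def posix(word):
    """Parse whose iteration lengths are lexicographically largest (assumed reading)."""
    ps = parses(word)
    return max(ps, key=lambda p: [len(w) for _, w in p]) if ps else None

if __name__ == "__main__":
    print(greedy("AB"))
    print(posix("AB"))
```

For the input AB this yields x : A followed by z : B under the greedy policy and y : AB under the POSIX policy, the same two matchings that the derivative computation produces.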
Implementation
Fine-tuned Haskell implementation of greedy left-most matching using partial derivative NFAs.
Real-world extensions:
  Group matchings, anchored match, ...
Competitive performance (see paper for details):
  C-based: RE2, PCRE
  Haskell-based: Weighted, TDFA
Partial derivative NFA construction is “smaller” compared to Thompson and Glushkov NFA construction
  Reference implementation of Thompson, Glushkov and Partial Derivative NFA construction
Conclusion
First application of derivatives and partial derivatives for regular expression sub-matching
Future work:
  Implementation in other languages
  Efficient POSIX implementation
    Tricky for NFA, but see the “backwards scanning” trick by Russ Cox
    Exploiting laziness of the on-the-fly derivative construction
  Error explanation
    Why is there no match?
Errata - Simplifications
Page 7, left column: The derivation should be as follows, where we have underlined the corrected parts and expressions involving φ have already been removed.
(x|ε : A*, y|ε : A*)
-A→ (x|A : A*, y|ε : A*) + (x|ε : ε, y|A : A*)
-A→ ((x|AA : A*, y|ε : A*) + (x|A : ε, y|A : A*)) + (x|ε : ε, y|AA : A*)
-A→ ...
Simplifications at the pattern and regular expression level.
Errata - Derivative Matching
Figure 8 “Derivative Matching”:
env(·) :: p → {Γ}
env((x|w : r)) = {{(x, w)}}   if ε ∈ L(r)
               = {}           otherwise
env((x|w : p)) = {{(x, w)} ⊎ es | es ∈ env(p)}
env((p1, p2)) = {e1 ⊎ e2 | e1 ∈ env(p1), e2 ∈ env(p2)}
env((p1 + p2)) = env(p1) ⊎ env(p2)
env(p*) = env(p)
match(·, ·) :: p → w → {Γ}
match(p, w) = env(p\w)
There’s an issue:
In case env(p) yields {} but the pattern is empty, we should actually return {{}} instead.
Errata - Derivative Matching (2)
To explain the issue, consider the initial pattern (x|ε : (y|ε : A)*).
Building the derivative w.r.t. A yields
(x|A : (y|A : ε, (y|ε : A)*))
Applying env() on the subpattern (y|ε : A)* yields {} because the underlying pattern (y|ε : A) is not empty. Clearly, (y|ε : A)* matches the empty word (zero iterations). Hence, in this situation, we shouldn’t return {} (“no match”) but rather report {{}} (“empty match”).
Errata - Derivative Matching (3)
Here’s the fix:
env(·) :: p → {Γ}
env((x|w : r)) = {{(x, w)}}   if ε ∈ L(r)
               = {}           otherwise
env((x|w : p)) = envH(((x|w : p), {{(x, w)} ⊎ es | es ∈ env(p)}))
env((p1, p2)) = envH(((p1, p2), {e1 ⊎ e2 | e1 ∈ env(p1), e2 ∈ env(p2)}))
env((p1 + p2)) = envH((p1 + p2, env(p1) ⊎ env(p2)))
env(p*) = envH((p*, env(p)))
envH((p, e)) = {{}}   if ε ∈ L(p↓) and e = {}
             = e      otherwise
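The effect of envH can be checked on the pattern from the previous slide. The Python sketch below is an illustration under assumptions: patterns and regular expressions are encoded as tagged tuples, an environment is a list of (variable, word) bindings so that ⊎ becomes list concatenation, and the `fixed` flag, which switches between the original and the corrected definition, is not part of the paper.

```python
EPS = ("eps",)

def nullable(r):
    """True iff the empty word is in L(r), for regexes encoded as tuples."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("phi", "chr"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # seq

def strip(p):
    """The underlying regular expression of a pattern (written p↓ on the slide)."""
    tag = p[0]
    if tag == "var":                      # (x|w : r)
        return p[3]
    if tag == "varp":                     # (x|w : p)
        return strip(p[3])
    if tag == "pair":                     # (p1, p2)
        return ("seq", strip(p[1]), strip(p[2]))
    if tag == "choice":                   # (p1 + p2)
        return ("alt", strip(p[1]), strip(p[2]))
    return ("star", strip(p[1]))          # p*

def env_h(p, e):
    """Report the empty match when p accepts the empty word but e has no entry."""
    return [[]] if nullable(strip(p)) and e == [] else e

def env(p, fixed=True):
    """Set of environments of p; fixed=False gives the original Figure 8 version."""
    tag = p[0]
    if tag == "var":
        _, x, w, r = p
        return [[(x, w)]] if nullable(r) else []
    if tag == "varp":
        _, x, w, q = p
        e = [[(x, w)] + es for es in env(q, fixed)]
    elif tag == "pair":
        e = [e1 + e2 for e1 in env(p[1], fixed) for e2 in env(p[2], fixed)]
    elif tag == "choice":
        e = env(p[1], fixed) + env(p[2], fixed)
    else:  # pstar
        e = env(p[1], fixed)
    return env_h(p, e) if fixed else e

# (x|A : (y|A : eps, (y|eps : A)*)), the derivative from the previous slide.
EXAMPLE = ("varp", "x", "A",
           ("pair", ("var", "y", "A", EPS),
                    ("pstar", ("var", "y", "", ("chr", "A")))))

if __name__ == "__main__":
    print(env(EXAMPLE, fixed=False))
    print(env(EXAMPLE, fixed=True))
```

With the original definition the starred subpattern contributes no environment, so the whole result is empty; with envH it contributes the empty environment and the bindings x = A, y = A survive.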