AlphaRegex (poster: wprl.korea.ac.kr/~pronto/home/posters/regex-synthesis.pdf)
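Sunbeom So (Division of Computer and Communication Engineering, College of Information and Communication, Korea University)
Mina Lee (Department of Computer Science, College of Informatics, Korea University)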
1. Motivation
2. Problem and Goal
3. Regular Expression Synthesis Algorithm
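4. Experiments

| Problem | Examples (P) | Examples (N) | Synthesized regular expression | Baseline algorithm (s) | Our algorithm (s) | Speedup |
|---|---|---|---|---|---|---|
| The 5th symbol of w from the right is 1. | 3 | 3 | (0+1)*1(0+1)(0+1)(0+1)(0+1) | 148.0 | 8.2 | 18x |
| w has at most two 0s. | 8 | 7 | 1*0?1*0?1* | 425.0 | 1.2 | 354x |
| 0s and 1s alternate in w. | 10 | 11 | 0?(10)*1? | 4073.9 | 1.6 | 2546x |
| The number of 0s in w is divisible by 3. | 8 | 7 | (1+01*01*0)* | > 7200.0 | 5.9 | n/a |
| If w starts with 0 it has odd length; if it starts with 1 it has even length. | 5 | 3 | (0+1(0+1))((0+1)(0+1))* | > 7200.0 | 10.9 | n/a |
| w has at least one 0 and at most one 1. | 12 | 10 | 0*(01?+100*) | > 7200.0 | 7.5 | n/a |
| w has at most one pair of consecutive 1s. | 9 | 8 | (1+(01?)*)(0+10*) | 465.1 | 24.4 | 19x |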
Experimental environment: MacBook Pro / OS X El Capitan 10.11.1 / 2.2 GHz Intel Core i7 / 16 GB 1600 MHz DDR3
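Some of the synthesized expressions in the table can be checked against their problem statements by brute force. The sketch below is illustrative only: the Python predicates are this example's reading of the four problem statements it covers, the names `ROWS`, `words` and `mismatches` are hypothetical, and the union operator `+` of the table is written as `|` in Python syntax.

```python
import re
from itertools import product

# (description, regex in Python syntax, reference predicate).
# The predicates are assumptions: one reading of each problem statement.
ROWS = [
    ("5th symbol from the right is 1", r"(0|1)*1(0|1)(0|1)(0|1)(0|1)",
     lambda w: len(w) >= 5 and w[-5] == "1"),
    ("at most two 0s", r"1*0?1*0?1*",
     lambda w: w.count("0") <= 2),
    ("0s and 1s alternate", r"0?(10)*1?",
     lambda w: all(x != y for x, y in zip(w, w[1:]))),
    ("number of 0s divisible by 3", r"(1|01*01*0)*",
     lambda w: w.count("0") % 3 == 0),
]

def words(max_len):
    """Yield every string over {0, 1} of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)

def mismatches(pattern, predicate, max_len=10):
    """Return the strings on which the regex and the predicate disagree."""
    compiled = re.compile(pattern)
    return [w for w in words(max_len)
            if (compiled.fullmatch(w) is not None) != predicate(w)]

if __name__ == "__main__":
    for description, pattern, predicate in ROWS:
        print(description, "->", len(mismatches(pattern, predicate)), "mismatches")
```

An empty mismatch list only shows agreement up to the chosen length bound; it is not a proof of equivalence.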
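5. Conclusion
✓ Quickly synthesizes, from a small number of examples, regular expressions that even people find hard to write
✓ Presents several search techniques for exploring the state space efficiently
✓ Demonstrates performance on difficult problems taken from theory of computation textbooks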
Korea University Programming Research Laboratory, Prof. Hakjoo Oh
For Σ = {0, 1}, find a regular expression for the following language.
L = {w ∈ {0, 1}* | w has exactly one pair of consecutive 0s}
Positive examples: 00, 1001, 0101001010, 1111001111
Negative examples: 01, 11, 000, 00100
Efficient state-space pruning
1. Best-first enumerative search (simpler regular expressions first)
2. Pruning semantically-equivalent states
3. Pruning dead states (states that cannot lead to a solution)
Automatic regular expression synthesizer
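Basic algorithm: explore the entire state space generated by the regular expression grammar.
Challenge: the state space is very large. The number of states at depth d is $N(d) \in O(7^{2^d - 1})$ in the worst case.
Solution: devise an efficient state-space search algorithm.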
4. Pruning redundant states
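While taking a theory of computation course: could regular expression synthesis be done automatically? Automatically synthesize a regular expression that satisfies the given examples: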
AlphaRegex
(0?1)*00(10?)*
Goal: be smarter than the students and the professor of a theory of computation course!
Positive examples
Negative examples
Automatically synthesized regular expression
(in 0.5s)
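The consistency of the synthesized expression with the given examples can be checked mechanically. The following is a minimal sketch using Python's standard `re` module; the helper names `accepts` and `consistent` are hypothetical and not part of AlphaRegex.

```python
import re

# Regular expression shown on the poster; it only uses *, ? and grouping,
# so it is already valid Python regex syntax.
SYNTHESIZED = r"(0?1)*00(10?)*"

POSITIVE = ["00", "1001", "0101001010", "1111001111"]
NEGATIVE = ["01", "11", "000", "00100"]

def accepts(pattern: str, word: str) -> bool:
    """True when the whole word belongs to the language of the pattern."""
    return re.fullmatch(pattern, word) is not None

def consistent(pattern: str, positive, negative) -> bool:
    """A pattern is consistent if it accepts every positive example
    and rejects every negative example."""
    return (all(accepts(pattern, w) for w in positive)
            and not any(accepts(pattern, w) for w in negative))

if __name__ == "__main__":
    print(consistent(SYNTHESIZED, POSITIVE, NEGATIVE))
```

Consistency with the examples is the only property the synthesizer is asked to guarantee; whether the result also captures the intended language depends on how representative the examples are.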
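[Figure 1. Exhaustive Search. Search tree rooted at the hole $\square$, whose children are $a$, $\epsilon$, $\emptyset$, $\square + \square$, $\square \cdot \square$, $\square^*$, ...; every remaining hole is expanded in the same way, e.g. $a + \square$ yields $a + a$, $a + \epsilon$, $a + \emptyset$, $a + (\square + \square)$, $a + (\square \cdot \square)$, $a + (\square^*)$, ...]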
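\[
\frac{e_1 \to e_1'}{e_1 + e_2 \to e_1' + e_2} \qquad
\frac{e_2 \to e_2'}{e_1 + e_2 \to e_1 + e_2'} \qquad
\frac{e_1 \to e_1'}{e_1 \cdot e_2 \to e_1' \cdot e_2} \qquad
\frac{e_2 \to e_2'}{e_1 \cdot e_2 \to e_1 \cdot e_2'}
\]
\[
\frac{e \to e'}{e^* \to e'^*} \qquad
\frac{e \to e'}{e? \to e'?}
\]
\[
\frac{a \in \Sigma}{\square \to a} \qquad \square \to \epsilon \qquad \square \to \emptyset
\]
\[
\square \to \square + \square \qquad \square \to \square \cdot \square \qquad \square \to \square^* \qquad \square \to \square?
\]
Figure 2. Transition Relation between States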
regular expression (e.g. $c = 7$). The number of states at depth $d$ in the worst case is
\[
N(0) = 1, \qquad N(d+1) = N(d) \cdot c^{2^d}.
\]
When $c = 7$:
\[
N(d) = 7^{\sum_{k=0}^{d-1} 2^k} \in O(7^{2^d - 1}).
\]
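The recurrence can be unrolled numerically to see how quickly the bound grows. The sketch below is illustrative; the function names `worst_case_states` and `closed_form` are hypothetical. It compares the recurrence with the closed form $c^{2^d - 1}$, which follows from $\sum_{k=0}^{d-1} 2^k = 2^d - 1$.

```python
def worst_case_states(depth: int, c: int = 7) -> int:
    """Unroll N(0) = 1, N(d + 1) = N(d) * c ** (2 ** d)."""
    n = 1
    for d in range(depth):
        n *= c ** (2 ** d)
    return n

def closed_form(depth: int, c: int = 7) -> int:
    """Closed form c ** (2 ** depth - 1) of the same recurrence."""
    return c ** (2 ** depth - 1)

if __name__ == "__main__":
    for depth in range(5):
        print(depth, worst_case_states(depth))
```

The doubly exponential growth is the reason exhaustive enumeration stops being practical after only a few levels.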
Search Strategy. We pick a state that has a minimal cost, where the cost of states is defined as follows:
\[
C(a) = C(\epsilon) = C(\emptyset) = c_1, \qquad C(\square) = c_2 \quad (c_2 > c_1),
\]
\[
C(e_1 + e_2) > C(e_1) + C(e_2), \qquad C(e_1 \cdot e_2) > C(e_1) + C(e_2), \qquad C(e^*) > C(e).
\]
Intuitively, we prefer simpler expressions by following the principle of Ockham's razor, so that the solution found is the simplest regular expression that is consistent with the examples.
Algorithm 1 Search Algorithm
Input: Positive and negative examples $(P, N)$
Output: A regular expression $E$ consistent with $(P, N)$
  $W := \{\square\}$
  repeat
    pick and remove $s$ from $W$
    if solution($s$) then return $s$
    else
      $W := W \cup \mathrm{next}(s)$
    end if
  until $W = \emptyset$
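Algorithm 1 combined with the minimal-cost strategy can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: states are encoded as nested tuples, the helper names (`cost`, `closed`, `ends`, `accepts`, `successors`, `synthesize`) are hypothetical, the `?` operator is omitted, a `seen` set and a `max_steps` guard are added, and the constants 1, 5 and 10 are one instantiation that satisfies the cost constraints. No normalization or pruning is applied here.

```python
import heapq
from itertools import count

HOLE = ("hole",)

def cost(e):
    """Assumed cost instantiation: atoms 1, hole 10, each operator adds 5."""
    tag = e[0]
    if tag in ("sym", "eps", "empty"):
        return 1
    if tag == "hole":
        return 10
    return sum(cost(c) for c in e[1:]) + 5  # or, cat, star

def closed(e):
    """A state is closed when it contains no hole."""
    return e[0] != "hole" and all(closed(c) for c in e[1:] if isinstance(c, tuple))

def ends(e, w, starts):
    """End positions j with w[i:j] in [[e]] for some start position i."""
    tag = e[0]
    if tag == "sym":
        return {i + 1 for i in starts if i < len(w) and w[i] == e[1]}
    if tag == "eps":
        return set(starts)
    if tag == "empty":
        return set()
    if tag == "or":
        return ends(e[1], w, starts) | ends(e[2], w, starts)
    if tag == "cat":
        return ends(e[2], w, ends(e[1], w, starts))
    if tag == "star":
        reached, frontier = set(starts), set(starts)
        while frontier:  # iterate the body until no new position appears
            frontier = ends(e[1], w, frontier) - reached
            reached |= frontier
        return reached
    raise ValueError(f"not a closed state: {e!r}")

def accepts(e, w):
    return len(w) in ends(e, w, {0})

def successors(e, alphabet):
    """One-step transitions: instantiate one hole with one production."""
    tag = e[0]
    if tag == "hole":
        for ch in alphabet:
            yield ("sym", ch)
        yield ("eps",)
        yield ("empty",)
        yield ("or", HOLE, HOLE)
        yield ("cat", HOLE, HOLE)
        yield ("star", HOLE)
    elif tag in ("or", "cat"):
        for left in successors(e[1], alphabet):
            yield (tag, left, e[2])
        for right in successors(e[2], alphabet):
            yield (tag, e[1], right)
    elif tag == "star":
        for body in successors(e[1], alphabet):
            yield ("star", body)

def synthesize(positive, negative, alphabet="01", max_steps=100_000):
    """Best-first search over states; returns a consistent closed state or None."""
    def solution(s):
        return (closed(s) and all(accepts(s, p) for p in positive)
                and not any(accepts(s, n) for n in negative))
    tie = count()  # tie-breaker so that states themselves are never compared
    worklist = [(cost(HOLE), next(tie), HOLE)]
    seen = {HOLE}
    for _ in range(max_steps):
        if not worklist:
            return None
        _, _, s = heapq.heappop(worklist)
        if solution(s):
            return s
        for t in successors(s, alphabet):
            if t not in seen:
                seen.add(t)
                heapq.heappush(worklist, (cost(t), next(tie), t))
    return None
```

For instance, `synthesize(["", "0", "00"], ["1", "01"])` searches for an expression that accepts the three positive strings and rejects the two negative ones. A priority queue keyed on cost plays the role of "pick a state that has a minimal cost" in the worklist loop.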
3.2 Normalization
Examples:
\[
[\![ s^* s^* ]\!] = [\![ s^* ]\!], \qquad
[\![ (s + s) ]\!] = [\![ s ]\!], \qquad
[\![ (s \cdot s^*)^* ]\!] = [\![ s^* ]\!]
\]
...
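The three example identities can be read as bottom-up rewrite rules. The sketch below is a hypothetical illustration: the tuple encoding and the names `closed` and `normalize` are assumptions, only the three rules listed above are implemented, and the rules are applied to hole-free subterms only, since two holes may later be instantiated differently. Whether the original normalizer makes the same restriction is not stated in the source.

```python
HOLE = ("hole",)

def closed(e):
    """True when the expression contains no hole."""
    return e[0] != "hole" and all(closed(c) for c in e[1:] if isinstance(c, tuple))

def normalize(e):
    """Apply s + s -> s, s* s* -> s*, (s . s*)* -> s* bottom-up."""
    tag = e[0]
    if tag in ("or", "cat"):
        left, right = normalize(e[1]), normalize(e[2])
        if left == right and closed(left):
            if tag == "or":
                return left                      # s + s  ->  s
            if left[0] == "star":
                return left                      # s* s*  ->  s*
        return (tag, left, right)
    if tag == "star":
        body = normalize(e[1])
        if (body[0] == "cat" and body[2] == ("star", body[1])
                and closed(body[1])):
            return ("star", body[1])             # (s . s*)*  ->  s*
        return ("star", body)
    return e                                     # atoms and holes are unchanged

if __name__ == "__main__":
    a = ("sym", "a")
    print(normalize(("star", ("cat", a, ("star", a)))))
```

Mapping semantically equal states to one representative lets the search avoid visiting the same language more than once.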
3.3 Pruning Search Space
Definition 1 (Dead States). Let $(P, N)$ be a regular expression problem. We say a state $s \in S$ is dead, denoted $\mathrm{dead}(s)$, iff every closed state $s'$ reachable from $s$ is not a solution:
\[
\mathrm{dead}(s) \iff \big( (s \to^* s') \land s' \not\to \implies \neg\,\mathrm{solution}(s') \big).
\]
Intuitively, a state $s$ is dead if exploring further the reachable states of $s$ is guaranteed to fail to find a solution. Our search algorithm aims to identify as many dead states as possible and does not attempt to explore beyond them. Specifically, we identify two types of dead states: pdead and ndead.
Definition 2. A state $s$ is dead for positive examples, denoted $\mathrm{pdead}(s)$, iff every closed state $s'$ reachable from $s$ fails to accept a positive example:
\[
\mathrm{pdead}(s) \iff \big( s \to^* s' \land s' \not\to \implies \exists p \in P.\ p \notin [\![ s' ]\!] \big).
\]
Example 1. Suppose $b \in P$. Any closed state $s'$ reachable from state $s = a \cdot \square$ is doomed to reject the positive example; no matter how the hole gets instantiated, the string $b$ cannot be accepted.
3 2016/6/6
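Lemma 4. Let $s$ be any state. Then,
\[
\mathrm{pdead}(s) \iff \exists p \in P.\ p \notin [\![ \widehat{s} ]\!].
\]
Proof. Consider each direction.
• ($\Rightarrow$) Suppose $\mathrm{pdead}(s)$ holds:
\[
s \to^* s' \land s' \not\to \implies \exists p \in P.\ p \notin [\![ s' ]\!]. \tag{5}
\]
From (5) and Lemma 6, we obtain $\exists p \in P.\ p \notin [\![ \widehat{s} ]\!]$.
• ($\Leftarrow$) Suppose $p \notin [\![ \widehat{s} ]\!]$. By Lemma 2, we have
\[
p \notin \bigcup_{s \to^* s' \land s' \not\to} [\![ s' ]\!],
\]
which implies that $p \notin [\![ s' ]\!]$ for all closed $s'$ reachable from $s$.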
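Lemma 5. Let $s$ be any state. Then,
\[
\mathrm{ndead}(s) \iff \exists n \in N.\ n \in [\![ \widetilde{s} ]\!].
\]
Proof. Consider each direction.
• ($\Rightarrow$) Suppose $\mathrm{ndead}(s)$ holds:
\[
s \to^* s' \land s' \not\to \implies \exists n \in N.\ n \in [\![ s' ]\!]. \tag{6}
\]
From (6) and Lemma 7, we obtain $\exists n \in N.\ n \in [\![ \widetilde{s} ]\!]$.
• ($\Leftarrow$) Suppose $n \in [\![ \widetilde{s} ]\!]$. By Lemma 3, we have
\[
n \in \bigcap_{s \to^* s' \land s' \not\to} [\![ s' ]\!],
\]
which implies that $n \in [\![ s' ]\!]$ for all closed $s'$ reachable from $s$.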
Lemma 6. For any state $s$, we have $s \to^* \widehat{s}$ and $\widehat{s} \not\to$.
Proof. By structural induction on $s$.
Lemma 7. For any state $s$, we have $s \to^* \widetilde{s}$ and $\widetilde{s} \not\to$.
Proof. By structural induction on $s$.
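Final Algorithm. With normalization and pruning, the search algorithm uses the following next function:
\[
\mathrm{next}(s) =
\begin{cases}
\emptyset & \exists p \in P.\ p \notin [\![ \widehat{s} ]\!] \\
\emptyset & \exists n \in N.\ n \in [\![ \widetilde{s} ]\!] \\
\{ \mathrm{normalize}(s') \mid s \to s' \} & \text{otherwise}
\end{cases}
\]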
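The two pruning cases of the next function can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the tuple encoding and all helper names (`fill`, `sigma_star`, `ends`, `accepts`, `successors`, `next_states`) are assumptions, the `?` operator is omitted, and the normalization step of the third case is left out for brevity.

```python
HOLE = ("hole",)
EMPTY = ("empty",)

def fill(e, replacement):
    """Replace every hole in e by the given closed expression."""
    if e == HOLE:
        return replacement
    return (e[0],) + tuple(fill(c, replacement) if isinstance(c, tuple) else c
                           for c in e[1:])

def sigma_star(alphabet):
    """Build (a1 + a2 + ...)* for the given alphabet."""
    union = ("sym", alphabet[0])
    for ch in alphabet[1:]:
        union = ("or", union, ("sym", ch))
    return ("star", union)

def ends(e, w, starts):
    """End positions j with w[i:j] in [[e]] for some start position i."""
    tag = e[0]
    if tag == "sym":
        return {i + 1 for i in starts if i < len(w) and w[i] == e[1]}
    if tag == "eps":
        return set(starts)
    if tag == "empty":
        return set()
    if tag == "or":
        return ends(e[1], w, starts) | ends(e[2], w, starts)
    if tag == "cat":
        return ends(e[2], w, ends(e[1], w, starts))
    if tag == "star":
        reached, frontier = set(starts), set(starts)
        while frontier:
            frontier = ends(e[1], w, frontier) - reached
            reached |= frontier
        return reached
    raise ValueError(f"not a closed state: {e!r}")

def accepts(e, w):
    return len(w) in ends(e, w, {0})

def successors(e, alphabet):
    """One-step transitions: instantiate one hole with one production."""
    tag = e[0]
    if tag == "hole":
        yield from (("sym", ch) for ch in alphabet)
        yield from (("eps",), EMPTY, ("or", HOLE, HOLE),
                    ("cat", HOLE, HOLE), ("star", HOLE))
    elif tag in ("or", "cat"):
        yield from ((tag, left, e[2]) for left in successors(e[1], alphabet))
        yield from ((tag, e[1], right) for right in successors(e[2], alphabet))
    elif tag == "star":
        yield from (("star", body) for body in successors(e[1], alphabet))

def next_states(s, positive, negative, alphabet="ab"):
    """Return no successors for pdead or ndead states, all successors otherwise."""
    over = fill(s, sigma_star(alphabet))   # holes replaced by Sigma*
    under = fill(s, EMPTY)                 # holes replaced by the empty language
    if any(not accepts(over, p) for p in positive):
        return []                          # pdead: some positive example is lost
    if any(accepts(under, n) for n in negative):
        return []                          # ndead: some negative example is forced
    return list(successors(s, alphabet))   # normalization omitted in this sketch
```

With `positive=["b"]`, the state `("cat", ("sym", "a"), HOLE)` gets no successors, matching Example 1; with `negative=["a"]`, the state `("cat", ("sym", "a"), ("star", HOLE))` gets none either, matching Example 2.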
5 2016/6/4
Efficient state-space search techniques
• Pruning dead states: parts of the state space that cannot contain a solution, however far the search proceeds, are not explored.
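[Figure: two dead-state examples. With $b \in P$, the subtree below $a \cdot \square$ is not explored: no matter how the hole gets instantiated, the string $b$ cannot be accepted. With $a \in N$, the subtree below $a \cdot (\square)^*$ is not explored.]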
Definition 3. A state $s$ is dead for negative examples, denoted $\mathrm{ndead}(s)$, iff every closed state $s'$ reachable from $s$ fails to reject a negative example:
\[
\mathrm{ndead}(s) \iff \big( s \to^* s' \land s' \not\to \implies \exists n \in N.\ n \in [\![ s' ]\!] \big).
\]
Example 2. Suppose $a \in N$. Any closed state $s'$ reachable from state $s = a \cdot (\square)^*$ is doomed to accept the negative example; no matter how the hole gets instantiated, the language of any reachable state includes the string $a$.
It is clear that a state is guaranteed to be dead if one of $\mathrm{pdead}(s)$ and $\mathrm{ndead}(s)$ holds:
Lemma 1. Let $s$ be any state. Then,
\[
\big( \mathrm{pdead}(s) \lor \mathrm{ndead}(s) \big) \implies \mathrm{dead}(s).
\]
Note, however, that the converse of Lemma 1 is not true. Suppose $s$ is a dead state. This means that every reachable closed state $s'$ either rejects some positive example or accepts some negative example. However, $\mathrm{pdead}(s)$ requires that every reachable closed state $s'$ rejects some positive example. Similarly, $\mathrm{ndead}(s)$ requires the strong condition that every reachable closed state $s'$ accepts some negative example.
Example 3. When $P = N$, no solution can exist and the initial state $\square$ is dead. However, neither $\mathrm{pdead}(\square)$ nor $\mathrm{ndead}(\square)$ holds, because we can always find a regular expression (e.g., $(a + b)^*$) that accepts all positive examples and we can also always find a regular expression (e.g., $\emptyset$) that rejects all negative examples.
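We identify the pdead states and ndead states by computing over- and under-approximations of states.
Definition 4. The over-approximation $\widehat{s}$ and under-approximation $\widetilde{s}$ of state $s$ are defined inductively as follows:
\[
\widehat{a} = a, \quad \widehat{\epsilon} = \epsilon, \quad \widehat{\emptyset} = \emptyset, \quad
\widehat{e_1 + e_2} = \widehat{e_1} + \widehat{e_2}, \quad
\widehat{e_1 \cdot e_2} = \widehat{e_1} \cdot \widehat{e_2}, \quad
\widehat{e^*} = (\widehat{e})^*, \quad
\widehat{\square} = (a + b)^*
\]
\[
\widetilde{a} = a, \quad \widetilde{\epsilon} = \epsilon, \quad \widetilde{\emptyset} = \emptyset, \quad
\widetilde{e_1 + e_2} = \widetilde{e_1} + \widetilde{e_2}, \quad
\widetilde{e_1 \cdot e_2} = \widetilde{e_1} \cdot \widetilde{e_2}, \quad
\widetilde{e^*} = (\widetilde{e})^*, \quad
\widetilde{\square} = \emptyset
\]
Intuitively, the over-approximation $\widehat{s}$ is obtained by replacing all holes in $s$ by $(a + b)^*$, and the under-approximation $\widetilde{s}$ is obtained by replacing the holes by $\emptyset$.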
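Example 4. Consider a state $s = a + (\square \cdot \square)$. Then, $\widehat{s} = a + ((a + b)^* \cdot (a + b)^*)$ and $\widetilde{s} = a + (\emptyset \cdot \emptyset)$.
$\widehat{s}$ is an over-approximation in the sense that the language of $\widehat{s}$ contains the languages of all closed states reachable from $s$ (Lemma 2). Dually, $\widetilde{s}$ is an under-approximation because the language of every closed state $s'$ reachable from $s$ subsumes the language of $\widetilde{s}$ (Lemma 3).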
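Definition 4 amounts to a substitution on the syntax tree, which the following sketch makes explicit. The tuple encoding and the names `fill`, `over`, `under` and `show` are assumptions made for this illustration; the alphabet $\{a, b\}$ follows the definition above.

```python
HOLE = ("hole",)
ATOMS = {"eps": "ε", "empty": "∅", "hole": "□"}

def fill(e, replacement):
    """Replace every hole in e by the given expression."""
    if e == HOLE:
        return replacement
    return (e[0],) + tuple(fill(c, replacement) if isinstance(c, tuple) else c
                           for c in e[1:])

def over(e, alphabet=("a", "b")):
    """Over-approximation: every hole becomes (a + b)*."""
    union = ("sym", alphabet[0])
    for ch in alphabet[1:]:
        union = ("or", union, ("sym", ch))
    return fill(e, ("star", union))

def under(e):
    """Under-approximation: every hole becomes the empty language."""
    return fill(e, ("empty",))

def show(e):
    """Render an expression with explicit parentheses around binary operators."""
    tag = e[0]
    if tag == "sym":
        return e[1]
    if tag in ATOMS:
        return ATOMS[tag]
    if tag == "star":
        return show(e[1]) + "*"
    operator = " + " if tag == "or" else " · "
    return "(" + show(e[1]) + operator + show(e[2]) + ")"

if __name__ == "__main__":
    s = ("or", ("sym", "a"), ("cat", HOLE, HOLE))   # a + (□ · □)
    print(show(s))
    print(show(over(s)))
    print(show(under(s)))
```

Both approximations are closed states, so membership of an example string can be decided on them directly without enumerating the states reachable from $s$.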
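Lemma 2. For any state $s$, we have
\[
[\![ \widehat{s} ]\!] \supseteq \bigcup_{s \to^* s' \land s' \not\to} [\![ s' ]\!].
\]
Proof. Todo
Lemma 3. For any state $s$, we have
\[
[\![ \widetilde{s} ]\!] \subseteq \bigcap_{s \to^* s' \land s' \not\to} [\![ s' ]\!].
\]
Proof. Todo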
Given a state $s$, we conclude that $s$ is dead with positive example (i.e. $\mathrm{pdead}(s)$) if $\widehat{s}$ rejects some positive example:
\[
\exists p \in P.\ p \notin [\![ \widehat{s} ]\!] \tag{3}
\]
and we conclude that $s$ is dead with negative example (i.e. $\mathrm{ndead}(s)$) if $\widetilde{s}$ accepts some negative example:
\[
\exists n \in N.\ n \in [\![ \widetilde{s} ]\!]. \tag{4}
\]
Lemmas 4 and 5 show that our algorithm for identifying pdead and ndead states is both sound and complete.
• Pruning redundant states: parts of the state space that may contain a solution are still not explored when a simpler solution exists elsewhere.
[Figure: example with $aab \in P$ and the state $a \cdot (b + \epsilon) \cdot \square$.]
Search Strategy (concrete costs). We pick a state that has a minimal cost, where the cost of states is defined as follows:
\[
C(a) = C(\epsilon) = C(\emptyset) = 1, \qquad C(\square) = 10,
\]
\[
C(e_1 + e_2) = C(e_1) + C(e_2) + 5, \qquad
C(e_1 \cdot e_2) = C(e_1) + C(e_2) + 5, \qquad
C(e^*) = C(e) + 5.
\]
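The concrete cost function translates directly into a recursive definition. The sketch below assumes a nested-tuple encoding of states and a hypothetical function name `cost`; the source gives no cost for the `?` operator, so it is deliberately left unsupported.

```python
HOLE = ("hole",)

def cost(e):
    """Cost of a state: atoms 1, hole 10, each of +, · and * adds 5."""
    tag = e[0]
    if tag in ("sym", "eps", "empty"):
        return 1
    if tag == "hole":
        return 10
    if tag in ("or", "cat"):
        return cost(e[1]) + cost(e[2]) + 5
    if tag == "star":
        return cost(e[1]) + 5
    raise ValueError(f"no cost defined for {tag!r}")

if __name__ == "__main__":
    a = ("sym", "a")
    print(cost(("or", a, HOLE)))          # a + □
    print(cost(("star", ("cat", a, a))))  # (a · a)*
```

Because a hole costs more than any atom, filling a hole with an atom lowers the cost of a state, so nearly finished candidates are preferred over open ones of the same shape.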
Intuitively, we prefer simpler expressions by following theprinciple of Ockham’s razor, so that the solution found isthe simplest regular expression that is consistent with theexamples.
Algorithm 1 Search AlgorithmInput: Positive and negative examples (P,N )Output: A regular expression E consistent with (P,N )
1: W := {⇤}2: repeat3: pick s from W4: if solution(s) then return s5: else6: W := W [ next(s)7: end if8: until W 6= ;
3.2 NormalizationExamples:
[[s⇤s⇤]] = [[s⇤]]
[[(s+ s)]] = [[s]]
[[(s · s⇤)⇤]] = [[s⇤]]
...
3.3 Pruning Search Space

Definition 1 (Dead States). Let $(P, N)$ be a regular expression problem. We say a state $s \in S$ is dead, denoted $\mathrm{dead}(s)$, iff every closed state $s'$ reachable from $s$ is not a solution:

$$\mathrm{dead}(s) \iff \big( (s \to^* s') \land s' \not\to \implies \neg\,\mathrm{solution}(s') \big).$$
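Grounded in rigorous theory
The pruning techniques we devised are grounded in programming language theory and guarantee the soundness of the results.
Regular Expression Synthesis Algorithm
• Basic algorithm: explores the entire state space generated by the regular expression grammar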
Intuitively, a state $s$ is dead if exploring further the reachable states of $s$ is guaranteed to fail to find a solution. Our search algorithm aims to identify as many dead states as possible and does not attempt to explore beyond them. Specifically, we identify two types of dead states: pdead and ndead.

Definition 2. A state $s$ is dead for positive examples, denoted $\mathrm{pdead}(s)$, iff every closed state $s'$ reachable from $s$ fails to accept a positive example:

$$\mathrm{pdead}(s) \iff \big( s \to^* s' \land s' \not\to \implies \exists p \in P.\; p \notin [\![s']\!] \big).$$

Example 1. Suppose $b \in P$. Any closed state $s'$ reachable from state $s = a \cdot \Box$ is doomed to reject the positive example; no matter how the hole gets instantiated, the string $b$ cannot be accepted.

Definition 3. A state $s$ is dead for negative examples, denoted $\mathrm{ndead}(s)$, iff every closed state $s'$ reachable from $s$ fails to reject a negative example:

$$\mathrm{ndead}(s) \iff \big( s \to^* s' \land s' \not\to \implies \exists n \in N.\; n \in [\![s']\!] \big).$$

Example 2. Suppose $a \in N$. Any closed state $s'$ reachable from state $s = a(\Box)^*$ is doomed to accept the negative example; no matter how the hole gets instantiated, the language of any reachable state includes the string $a$.
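One way to detect such states without enumerating every reachable closed state is sketched below. The contributions mention over- and under-approximations of regular expressions; the concrete construction here is our assumption: every hole is replaced by $\Sigma^*$ to obtain the largest language any completion can have, and by $\emptyset$ to obtain the smallest. Both checks are sufficient conditions only, and the tuple encoding and helper names are hypothetical.

```python
from functools import reduce

SIGMA = "ab"                                   # assumed binary alphabet
HOLE = ("hole",)
ANY = ("star", reduce(lambda l, r: ("alt", l, r),
                      [("sym", c) for c in SIGMA]))   # Sigma*
NONE = ("empty",)                                     # empty language


def fill(s, e):
    """Replace every hole in s by the closed expression e."""
    if s == HOLE:
        return e
    if s[0] in ("alt", "cat"):
        return (s[0], fill(s[1], e), fill(s[2], e))
    if s[0] == "star":
        return ("star", fill(s[1], e))
    return s


def ends(e, w, i):
    """Positions j such that w[i:j] is in the language of the closed e."""
    tag = e[0]
    if tag == "sym":
        return {i + 1} if w[i:i + 1] == e[1] else set()
    if tag == "eps":
        return {i}
    if tag == "empty":
        return set()
    if tag == "alt":
        return ends(e[1], w, i) | ends(e[2], w, i)
    if tag == "cat":
        return {k for j in ends(e[1], w, i) for k in ends(e[2], w, j)}
    reached, frontier = {i}, {i}               # star: iterate to a fixpoint
    while frontier:
        frontier = {k for j in frontier for k in ends(e[1], w, j)} - reached
        reached |= frontier
    return reached


def accepts(e, w):
    return len(w) in ends(e, w, 0)


def pdead(s, pos):
    """Sufficient check: even the over-approximation rejects some p in P."""
    over = fill(s, ANY)
    return any(not accepts(over, p) for p in pos)


def ndead(s, neg):
    """Sufficient check: even the under-approximation accepts some n in N."""
    under = fill(s, NONE)
    return any(accepts(under, n) for n in neg)


if __name__ == "__main__":
    a = ("sym", "a")
    print(pdead(("cat", a, HOLE), ["b"]))            # Example 1
    print(ndead(("cat", a, ("star", HOLE)), ["a"]))  # Example 2
```

The checks are sound because the constructors used here (union, concatenation, star) are monotone: every completion of $s$ denotes a language between the two approximations.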
It is clear that a state is guaranteed to be dead if one of $\mathrm{pdead}(s)$ and $\mathrm{ndead}(s)$ holds:
Challenge: a very large state space
Number of states at depth d:
actively responds to each input by taking only a few seconds to derive new regular expressions that reflect the change.

Contributions This paper makes the following contributions:

• We present a new algorithm for synthesizing regular expressions in real time from examples. The main novelty is a set of techniques that effectively prune the large search space using over- and under-approximations of regular expressions.

• We evaluate the proposed technique on 30 benchmark problems. The results show that our method quickly derives regular expressions for all of the benchmarks within a few seconds.

• We implement the technique in a tool, ALPHAREGEX, and make it publicly available at http://prl.korea.ac.kr/AlphaRegex.
2. Regular Expression Problems

2.1 Regular Expressions

Introductory textbooks on automata theory [? ? ?] use the following syntax for regular expressions:

$$e \to a \in \Sigma \mid \epsilon \mid \emptyset \mid e_1 + e_2 \mid e_1 \cdot e_2 \mid e^* \qquad (1)$$

A symbol $a$ from an alphabet $\Sigma$, the empty string $\epsilon$, and the empty language $\emptyset$ constitute the primitive regular expressions. The remaining cases are inductively defined. Given regular expressions $e_1$ and $e_2$, we can construct regular expressions by taking the union $e_1 + e_2$ or the concatenation $e_1 \cdot e_2$. $e^*$ denotes the Kleene closure of $e$. In introductory courses, the alphabet is typically assumed to be binary; we assume $\Sigma = \{a, b\}$ in the rest of this paper.
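Formally, a regular expression $e$ denotes a language (i.e., a set of strings). We write $[\![e]\!] \subseteq \Sigma^*$ for the language that $e$ denotes, which is inductively defined as follows:

$$\begin{aligned}
[\![a]\!] &= \{a\} \\
[\![\epsilon]\!] &= \{\epsilon\} \\
[\![\emptyset]\!] &= \emptyset \\
[\![e_1 + e_2]\!] &= [\![e_1]\!] \cup [\![e_2]\!] \\
[\![e_1 \cdot e_2]\!] &= [\![e_1]\!]\,[\![e_2]\!] \\
[\![e^*]\!] &= [\![e]\!]^*
\end{aligned}$$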
2.2 Regular Expression Problems

In a regular expression problem, students are given a description of a regular language $L$. We assume that the description of a language is given by a pair $(P, N)$ of example strings, where $P \subseteq \Sigma^*$ is a set of positive examples that must be included in the language and $N \subseteq \Sigma^*$ is a set of negative examples that must be excluded from the language. Given $(P, N)$, the regular expression problem asks students to find a regular expression $e$ that is consistent with the given examples:

$$\forall p \in P.\; p \in [\![e]\!] \;\land\; \forall n \in N.\; n \notin [\![e]\!].$$
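The semantics and the consistency condition can be made executable with a small membership test. The sketch below is one possible encoding, not the paper's: expressions are nested tuples, and `ends`, `member`, and `consistent` are hypothetical helper names. `ends(e, w, i)` follows the inductive definition of $[\![e]\!]$ case by case.

```python
def ends(e, w, i):
    """Positions j such that the substring w[i:j] belongs to [[e]]."""
    tag = e[0]
    if tag == "sym":                       # [[a]] = {a}
        return {i + 1} if w[i:i + 1] == e[1] else set()
    if tag == "eps":                       # [[eps]] = {""}
        return {i}
    if tag == "empty":                     # [[empty]] = {}
        return set()
    if tag == "alt":                       # union
        return ends(e[1], w, i) | ends(e[2], w, i)
    if tag == "cat":                       # concatenation
        return {k for j in ends(e[1], w, i) for k in ends(e[2], w, j)}
    if tag == "star":                      # Kleene closure, as a fixpoint
        reached, frontier = {i}, {i}
        while frontier:
            frontier = {k for j in frontier
                        for k in ends(e[1], w, j)} - reached
            reached |= frontier
        return reached
    raise ValueError(f"unknown constructor: {tag!r}")


def member(w, e):
    """w in [[e]]"""
    return len(w) in ends(e, w, 0)


def consistent(e, pos, neg):
    """All positive examples are accepted and all negative ones rejected."""
    return all(member(p, e) for p in pos) and not any(member(n, e) for n in neg)


if __name__ == "__main__":
    a, b = ("sym", "a"), ("sym", "b")
    ends_in_a = ("cat", ("star", ("alt", a, b)), a)      # (a+b)* . a
    print(consistent(ends_in_a, ["a", "ba", "aba"], ["", "b", "ab"]))
```

Tracking sets of end positions avoids backtracking, and the star case stops as soon as an iteration yields no new position.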
3. Our Synthesis Algorithm

3.1 Basic Search Algorithm

Suppose a regular expression problem $(P, N)$ is given. We formulate this problem as a search problem and present an efficient algorithm to find a solution. The search problem is defined by a transition system $(S, \to, I, F)$, where $S$ is the set of states, $(\to) \subseteq S \times S$ is a transition relation, $I \in S$ is an initial state, and $F \subseteq S$ is a set of final, solution states.

• States: A state $s \in S$ is a partial regular expression that possibly has holes ($\Box$). A hole is a placeholder that can be replaced by another regular expression. The set $S$ of states is inductively defined as follows:

$$s \to a \in \Sigma \mid \epsilon \mid \emptyset \mid s_1 + s_2 \mid s_1 \cdot s_2 \mid s^* \mid \Box \qquad (2)$$

Note that a state may have multiple holes. For example, $(a + (\Box \cdot \Box))^*$ is a state which has two holes in it.

• Initial State: The initial state is a single hole, i.e., $I = \Box$.

• Transition Relation: The transition relation $(\to) \subseteq S \times S$ determines the next states of a given state. The transition relation $\to$ is inductively defined as a set of inference rules in Figure 2. For example, $(a + \Box)^* \to (a + (\Box \cdot \Box))^*$ because we can find a derivation according to the inference rules as follows:

$$\dfrac{\dfrac{\Box \to \Box \cdot \Box}{(a + \Box) \to (a + (\Box \cdot \Box))}}{(a + \Box)^* \to (a + (\Box \cdot \Box))^*}$$

We write $\mathrm{next}(s)$ for the set of all states that follow $s$:

$$\mathrm{next}(s) = \{ s' \mid s \to s' \}.$$

For example, when $\Sigma = \{a, b\}$, $\mathrm{next}((a + \Box)^*) = \{(a + a)^*, (a + b)^*, (a + \epsilon)^*, (a + \emptyset)^*, (a + (\Box + \Box))^*, (a + (\Box \cdot \Box))^*, (a + (\Box^*))^*, (a + (\Box?))^*\}$. We write $s \not\to$ to indicate that $s$ has no next states; that is, $s$ is a closed expression with no holes.

• Solution States: A state $s$ is a solution state iff $s$ is a closed expression (i.e., $s \not\to$) and $s$ is consistent with the given positive and negative examples:

$$\mathrm{solution}(s) \iff s \not\to \;\land\; \forall p \in P.\; p \in [\![s]\!] \;\land\; \forall n \in N.\; n \notin [\![s]\!].$$
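The state grammar and the transition relation can be sketched in a few lines of Python. The tuple encoding, the tag names (`"hole"`, `"opt"`, ...), and the helpers `next_states`, `is_closed`, and `show` are our own illustrative choices; the rules follow Figure 2, including the optional operator, and fill exactly one hole per step.

```python
SIGMA = "ab"            # assumed binary alphabet
HOLE = ("hole",)


def fills():
    """Right-hand sides of the rules that rewrite a single hole."""
    return ([("sym", a) for a in SIGMA]
            + [("eps",), ("empty",),
               ("alt", HOLE, HOLE), ("cat", HOLE, HOLE),
               ("star", HOLE), ("opt", HOLE)])


def next_states(s):
    """next(s): every state reachable from s in one transition."""
    tag = s[0]
    if tag == "hole":
        return fills()
    if tag in ("alt", "cat"):          # rewrite inside the left or right operand
        _, l, r = s
        return ([(tag, l2, r) for l2 in next_states(l)]
                + [(tag, l, r2) for r2 in next_states(r)])
    if tag in ("star", "opt"):         # rewrite inside the operand
        return [(tag, e) for e in next_states(s[1])]
    return []                          # sym, eps, empty


def is_closed(s):
    """s has no next states, i.e. it contains no holes."""
    return not next_states(s)


def show(s):
    tag = s[0]
    if tag == "hole":
        return "□"
    if tag == "sym":
        return s[1]
    if tag == "eps":
        return "ε"
    if tag == "empty":
        return "∅"
    if tag == "alt":
        return f"({show(s[1])}+{show(s[2])})"
    if tag == "cat":
        return f"({show(s[1])}·{show(s[2])})"
    return show(s[1]) + ("*" if tag == "star" else "?")


if __name__ == "__main__":
    s = ("star", ("alt", ("sym", "a"), HOLE))     # (a+□)*
    for t in next_states(s):
        print(show(s), "->", show(t))
```

For the state $(a + \Box)^*$ this produces eight successors, one per rule instance, matching the example in the text.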
Algorithm 1 presents a naive workset algorithm that solves the search problem. Initially, the workset consists of the initial state (line 1). We choose and remove a state $s$ from the workset (line 3). If $s$ is a solution, it is returned. Otherwise, we continue the search by adding the next states of $s$ to the workset.

Size of Search Space The maximum number of holes in a state at depth $d$ is $2^d$. The number of next states for a state with $n$ holes is $c^n$, where $c$ is the number of inductive rules for regular expressions (e.g., $c = 7$).
2 2016/6/4
✓ 840 lines in OCaml
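✓ Benchmarks focus on regular expression problems that students find difficult
✓ Baseline: the basic algorithm with none of the search techniques applied
✓ Measured the performance and speedup of the algorithm with all search techniques applied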