mba ebooks ! edhole

EDUCATION HOLE PRESENTS THEORY OF AUTOMATA & FORMAL LANGUAGES Unit-III

Upload: edholecom

Post on 29-Jun-2015


Arden Theorem

Pumping Lemma for regular languages

Use of lemma

Proof of the pumping lemma

General version of pumping lemma for regular languages

Converse of lemma not true

Myhill-Nerode theorem

Context free grammar: Ambiguity

Recognizing ambiguous grammars

Inherently ambiguous languages

Simplification of CFGs

Normal forms for CFGs

Pumping lemma for CFLs

Usage of the lemma

Ambiguous to Unambiguous CFG

Arden Theorem

In theoretical computer science, Arden's rule, also known as Arden's lemma, is a mathematical statement about a certain form of language equations. Let P and Q be two regular expressions over Σ. If P does not contain Λ, then the equation

R = Q + RP

has a unique (one and only one) solution, R = QP*.

Proof:

Now point out the statements in Arden's Theorem in General form.


(i) P and Q are two Regular Expressions.

(ii) P does not contain Λ symbol.

(iii) R = Q + RP has a solution, i.e. R = QP*

(iv) This solution is the one and only one solution of the equation.

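If R = QP* is a solution of the equation R = Q + RP, then substituting this value of R into the right-hand side must give back R.

Putting R = QP* in the right-hand side, we get

Q + RP = Q + QP*P = Q(Λ + P*P) = QP* = R,

since Λ + P*P = P*.

So it is proved that R = QP* is a solution of the equation R = Q + RP.

For uniqueness, substitute R = Q + RP into itself repeatedly: R = Q + QP + QP^2 + ... + QP^n + RP^(n+1). Because P does not contain Λ, every string in RP^(n+1) has length at least n + 1. Hence a string of length n belongs to R exactly when it belongs to Q(Λ + P + ... + P^n), that is, exactly when it belongs to QP*. So any solution R must equal QP*.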
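The identity can also be checked mechanically on finite approximations of the languages. The following is only a sketch under stated assumptions: the sets Q and P, the bound MAX_LEN, and the helper names concat, star and arden_solution are all chosen here for illustration, and languages are truncated to strings of length at most MAX_LEN.

```python
def concat(A, B, max_len):
    """Concatenation of two finite languages, keeping strings of length <= max_len."""
    return {a + b for a in A for b in B if len(a + b) <= max_len}


def star(P, max_len):
    """Kleene star of P, truncated to strings of length <= max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from P.
        frontier = concat(frontier, P, max_len) - result
        result |= frontier
    return result


def arden_solution(Q, P, max_len):
    """The candidate solution R = QP*, truncated to length <= max_len."""
    return concat(Q, star(P, max_len), max_len)


MAX_LEN = 6
Q = {"a"}          # illustrative choice
P = {"b", "cc"}    # illustrative choice; note that P does not contain the empty string

R = arden_solution(Q, P, MAX_LEN)
right_hand_side = Q | concat(R, P, MAX_LEN)
print(R == right_hand_side)
```

The comparison holds on the truncated sets because every string in RP is at least as long as the string of R it was built from, so truncation never discards a string that the other side keeps.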

Pumping Lemma for regular languages

Let L be a regular language. Then there exists an integer p ≥ 1 depending only on L such that every string w in L of length at least p (p is called the "pumping length") can be written as w = xyz (i.e., w can be divided into three substrings), satisfying the following conditions:

1. |y| ≥ 1;

2. |xy| ≤ p;

3. for all i ≥ 0, xy^i z ∈ L.

y is the substring that can be pumped (removed or repeated any number of times, and the resulting string is always in L). (1) means the loop y to be pumped must be of length at least one; (2) means the loop must occur within the first p characters. |x| must be smaller than p (conclusion of (1) and (2)), apart from that there is no restriction on x and z.

In simple words, for any regular language L, any sufficiently long word w (in L) can be split into 3 parts, i.e. w = xyz, such that all the strings xy^k z for k ≥ 0 are also in L.


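Below is a formal expression of the Pumping Lemma.

∀L ⊆ Σ* ( regular(L) ⇒ ∃p ≥ 1 ∀w ∈ L ( |w| ≥ p ⇒ ∃x, y, z ∈ Σ* ( w = xyz ∧ |y| ≥ 1 ∧ |xy| ≤ p ∧ ∀i ≥ 0 ( xy^i z ∈ L ) ) ) )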

Use of lemma

The pumping lemma is often used to prove that a particular language is non-regular: a proof by contradiction (of the language's regularity) may consist of exhibiting a word (of the required length) in the language which lacks the property outlined in the pumping lemma.

For example, the language L = {a^n b^n : n ≥ 0} over the alphabet Σ = {a, b} can be shown to be non-regular as follows. Let w, x, y, z, p, and i be as used in the formal statement for the pumping lemma above. Let w in L be given by w = a^p b^p. By the pumping lemma, there must be some decomposition w = xyz with |xy| ≤ p and |y| ≥ 1 such that xy^i z is in L for every i ≥ 0. Using |xy| ≤ p, we know y only consists of instances of a. Moreover, because |y| ≥ 1, it contains at least one instance of the letter a. We now pump y up: xy^2 z has more instances of the letter a than the letter b, since we have added some instances of a without adding instances of b. Therefore xy^2 z is not in L. We have reached a contradiction. Therefore, the assumption that L is regular must be incorrect. Hence L is not regular.

The proof that the language of balanced (i.e., properly nested) parentheses is not regular follows the same idea. Given p, there is a string of balanced parentheses that begins with more than p left parentheses, so that y will consist entirely of left parentheses. By repeating y, we can produce a string that does not contain the same number of left and right parentheses, and so they cannot be balanced.
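The argument for {a^n b^n} can be replayed by brute force for one fixed value of p. This is a minimal sketch, not a proof for all p: the function names in_L and pumpable_decompositions, the value p = 5, and the bound max_i on the pumping exponent are assumptions made for illustration. An empty result means that no admissible split of w survives pumping.

```python
def in_L(s):
    """Membership test for L = { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def pumpable_decompositions(w, p, member, max_i=3):
    """Return every split w = xyz with |xy| <= p and |y| >= 1
    such that x y^i z is in the language for all i in 0..max_i."""
    good = []
    for xy_len in range(1, min(p, len(w)) + 1):
        for x_len in range(0, xy_len):
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            if all(member(x + y * i + z) for i in range(max_i + 1)):
                good.append((x, y, z))
    return good


p = 5
w = "a" * p + "b" * p
print(pumpable_decompositions(w, p, in_L))
```

Because only exponents up to max_i are tried, a non-empty result would not show that a split really pumps for every i; an empty result, however, does show that every admissible split fails for this w and this p.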

Proof of the pumping lemma

For every regular language there is a finite state automaton (FSA) that accepts the language. The number of states in such an FSA is counted, and that count is used as the pumping length p. For a string of length at least p, let s0 be the start state and let s1, ..., sp be the sequence of the next p states visited as the first p symbols of the string are read. Because the FSA has only p states, within this sequence of p + 1 visited states there must be at least one state that is repeated. Write S for such a state. The transitions that take the machine from the first encounter of state S to the second encounter of state S match some string. This string is called y in the lemma. It is non-empty, and because both encounters of S occur within the first p + 1 visited states, |xy| ≤ p. Since the machine still accepts the string with the y portion removed, or with y repeated any number of times, the conditions of the lemma are satisfied.


For example, consider an FSA with four states. [Figure: state diagram of the FSA.]

The FSA accepts the string: abcd. Since this string has a length which is at least as large as the number of states, which is four, the pigeonhole principle indicates that there must be at least one repeated state among the start state and the next four visited states. In this example, only q1 is a repeated state. Since the substring bc takes the machine through transitions that start at state q1 and end at state q1, that portion could be repeated and the FSA would still accept, giving the string abcbcd. Alternatively, the bc portion could be removed and the FSA would still accept giving the string ad. In terms of the pumping lemma, the string abcd is broken into an x portion a, a y portion bc and a z portion d.
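The example can be simulated directly. In this sketch only q1 and the four-state count come from the text above; the names q0, q2 and q3, the transition table, and the helpers run and accepts are assumptions chosen to be consistent with the description (a leads into q1, bc is a loop on q1, d leads to an accepting state).

```python
# Assumed transition table: q0 -a-> q1 -b-> q2 -c-> q1 -d-> q3
TRANSITIONS = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q2",
    ("q2", "c"): "q1",
    ("q1", "d"): "q3",
}
START = "q0"
ACCEPTING = {"q3"}


def run(word):
    """Return the list of visited states, or None if no transition applies."""
    state = START
    visited = [state]
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return None
        visited.append(state)
    return visited


def accepts(word):
    visited = run(word)
    return visited is not None and visited[-1] in ACCEPTING


print(run("abcd"))  # q1 appears twice in the visited states
for word in ["ad", "abcd", "abcbcd", "abcbcbcd"]:
    print(word, accepts(word))
```

With x = a, y = bc and z = d, the words tried in the loop are x y^i z for i = 0, 1, 2, 3.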

General version of pumping lemma for regular languages

If a language L is regular, then there exists a number p ≥ 1 (the pumping length) such that every string uwv in L with |w| ≥ p can be written in the form

uwv = uxyzv

with strings x, y and z such that |xy| ≤ p, |y| ≥ 1 and

uxy^i zv is in L for every integer i ≥ 0.

This version can be used to prove many more languages are non-regular, since it imposes stricter requirements on the language.

Converse of lemma not true

Note that while the pumping lemma states that all regular languages satisfy the conditions described above, the converse of this statement is not true: a language that satisfies these conditions may still be non-regular. In other words, both the original and the general version of the pumping lemma give a necessary but not sufficient condition for a language to be regular.

For example, consider the following language L:

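L = {uvwxy : u, y ∈ {0,1,2,3}*; v, w, x ∈ {0,1,2,3} ∧ (v = w ∨ v = x ∨ x = w)} ∪ {w : w ∈ {0,1,2,3}* ∧ precisely 1/7 of the characters in w are 3's}.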

In other words, L contains all strings over the alphabet {0,1,2,3} with a substring of length 3 including a duplicate character, as well as all strings over this alphabet where precisely 1/7 of the string's characters are 3's. This language is not regular but can still be "pumped" with p = 5. Suppose some string s has length at least 5. Then, since the alphabet has only four characters, at least two of the five characters in the string must be duplicates. They are separated by at most three characters.

• If the duplicate characters are separated by 0 characters, or 1, pump one of the other two characters in the string, which will not affect the substring containing the duplicates.

• If the duplicate characters are separated by 2 or 3 characters, pump 2 of the characters separating them. Pumping either down or up results in the creation of a substring of size 3 that contains 2 duplicate characters.

• The second condition of L ensures that L is not regular: i.e., there are an infinite number of strings that are in L but cannot be obtained by pumping some smaller string in L.

For a practical test that exactly characterizes regular languages, see the Myhill-Nerode theorem. The typical method for proving that a language is regular is to construct either a finite state machine or a regular expression for the language.

Myhill-Nerode theorem

Familiar examples of finite state automata include the serial adder, the parity machine, an acceptor accepting {a^n b^m | n, m ≥ 1}, and many more. Consider the serial adder. After getting some input, the machine can be in the 'carry' state or the 'no carry' state. It does not matter what exactly the earlier input was. It is only necessary to know whether it has produced a carry or not. Hence, the FSA need not distinguish between each and every input. It distinguishes between classes of inputs. In the above case, the whole set of inputs can be partitioned into two classes: one that produces a carry and another that does not produce a carry. Similarly, in the case of the parity checker, the machine distinguishes between two classes of input strings: those containing an odd number of 1's and those containing an even number of 1's. Thus, the FSA distinguishes between classes of input strings. The number of these classes is also finite. Hence, we say that the FSA has a finite amount of memory.

The following three statements are equivalent.

1. L ⊆ Σ* is accepted by a DFSA.

2. L is the union of some of the equivalence classes of a right invariant equivalence relation of finite index on Σ*.

3. Let the equivalence relation RL be defined over Σ* as follows: x RL y if and only if, for all z ∊ Σ*, xz is in L exactly when yz is in L. Then RL is of finite index.

Proof We shall prove (1) ⇒ (2), (2) ⇒ (3), and (3) ⇒ (1).


(1) ⇒ (2)

Let L be accepted by a FSA M = (K, Σ, δ, q0, F). Define a relation RM on Σ* such that x RM y if δ(q0, x) = δ(q0, y). RM is an equivalence relation, as seen below.

∀x: x RM x, since δ(q0, x) = δ(q0, x),

∀x, y: x RM y ⇒ y RM x, since δ(q0, x) = δ(q0, y) means δ(q0, y) = δ(q0, x),

∀x, y, z: x RM y and y RM z ⇒ x RM z,

for if δ(q0, x) = δ(q0, y) and δ(q0, y) = δ(q0, z) then δ(q0, x) = δ(q0, z).

So RM divides Σ* into equivalence classes. The strings which take the machine from q0 to a particular state qi are in one equivalence class. The number of equivalence classes is therefore equal to the number of states of M, assuming every state is reachable from q0. (If a state is not reachable from q0, it can be removed without affecting the language accepted.) It can be easily seen that this equivalence relation RM is right invariant, i.e., if x RM y then xz RM yz for all z ∊ Σ*:

if x RM y, then δ(q0, x) = δ(q0, y), and so

δ(q0, xz) = δ(δ(q0, x), z) = δ(δ(q0, y), z) = δ(q0, yz). Therefore xz RM yz.

L is the union of those equivalence classes of RM which correspond to final states of M.
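The classes of RM can be listed for a small machine. The sketch below uses the parity checker mentioned earlier as the example automaton; the state names "even" and "odd", the transition table, the length bounds, and the helper names are all assumptions made for illustration. Strings are grouped by the state they reach from the start state, and right invariance is spot-checked on short suffixes.

```python
from itertools import product

# Assumed parity checker over {0, 1}: the state records the parity of the 1's read so far.
DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}
START = "even"
FINAL = {"odd"}


def delta_hat(state, word):
    """Extended transition function: the state reached from `state` on `word`."""
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state


def strings_up_to(n, alphabet="01"):
    """All strings over `alphabet` of length 0..n, shortest first."""
    for length in range(n + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def classes_of_RM(n):
    """Group strings of length <= n by the state they reach from START."""
    classes = {}
    for x in strings_up_to(n):
        classes.setdefault(delta_hat(START, x), []).append(x)
    return classes


def right_invariant(classes, n_suffix=2):
    """Check that appending the same suffix keeps class members together."""
    for members in classes.values():
        for z in strings_up_to(n_suffix):
            if len({delta_hat(START, x + z) for x in members}) != 1:
                return False
    return True


classes = classes_of_RM(3)
for state, members in classes.items():
    print(state, members)
print(right_invariant(classes))
```

The language accepted is the union of the classes whose state lies in FINAL, here the single class reached in state "odd".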

(2) ⇒ (3)

Assume statement (2) of the theorem and let E be the equivalence relation considered. Let RL be defined as in the statement of the theorem. We see that xEy ⇒ xRL y.

If xEy, then xzEyz for each z ∊ Σ*. xz and yz are in the same equivalence class of E. Hence, xz and yz are both in L or both not in L as L is the union of some of the equivalence classes of E. Hence xRL y.

Hence, any equivalence class of E is completely contained in an equivalence class of RL. Therefore, E is a refinement of RL and so the index of RL is less than or equal to the index of E and hence finite.

(3) ⇒ (1)

First, we show RL is right invariant. x RL y if for all z in Σ*, xz is in L exactly when yz is in L. Since every string of the form wz is itself a string in Σ*, we can also write this in the following way: x RL y if for all w, z in Σ*, xwz is in L exactly when ywz is in L.

If this holds, then xw RL yw for every w in Σ*.

Therefore, RL is right invariant.

Let [x] denote the equivalence class of RL to which x belongs.

Construct a DFSA M′ = (K′, Σ, δ′, q′0, F′) as follows: K′ contains one state corresponding to each equivalence class of RL. [ε] corresponds to q′0. F′ consists of those states [x] with x ∊ L. δ′ is defined as follows: δ′([x], a) = [xa]. This definition is consistent as RL is right invariant. Suppose x and y belong to the same equivalence class of RL, i.e., [x] = [y]. Then xa and ya belong to the same equivalence class of RL, so

δ′([x], a) = [xa] = [ya] = δ′([y], a).

By induction on the length of x, δ′(q′0, x) = [x]. Hence, if x ∊ L, [x] is a final state in M′, i.e., [x] ∊ F′, and if x is not in L, [x] is not in F′. This automaton M′ accepts L.
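The classes of RL, and hence the states of the constructed automaton, can be approximated by testing a bounded set of suffixes. This is only a sketch: the language {a^n b^m | n, m ≥ 1} is taken from the examples above, while the bounds max_prefix and max_suffix and the helper names are assumptions. Two prefixes are placed in the same class when no suffix of length at most max_suffix separates them, so a bound that is too small could merge classes that are really distinct.

```python
import re
from itertools import product


def in_L(s):
    """Membership test for L = { a^n b^m : n, m >= 1 }."""
    return re.fullmatch(r"a+b+", s) is not None


def strings_up_to(n, alphabet="ab"):
    """All strings over `alphabet` of length 0..n, shortest first."""
    for length in range(n + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def nerode_classes(member, max_prefix=3, max_suffix=2):
    """Group prefixes x by the pattern of answers to 'is xz in L?' over short suffixes z."""
    suffixes = list(strings_up_to(max_suffix))
    classes = {}
    for x in strings_up_to(max_prefix):
        signature = tuple(member(x + z) for z in suffixes)
        classes.setdefault(signature, []).append(x)
    return list(classes.values())


classes = nerode_classes(in_L)
for members in classes:
    print(members)
```

For this language the grouping yields four classes: the empty string, strings of a's only, strings in L, and strings that can no longer be extended into L. Each class plays the role of one state [x] in the construction above.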

Context free grammar: Ambiguity

An ambiguous grammar is a formal grammar for which there exists a string that can have more than one leftmost derivation, while an unambiguous grammar is a formal grammar for which every valid string has a unique leftmost derivation. Many languages admit both ambiguous and unambiguous grammars, while some languages admit only ambiguous grammars. Any non-empty language admits an ambiguous grammar by taking an unambiguous grammar and introducing a duplicate rule or synonym (the only language without ambiguous grammars is the empty language). A language that only admits ambiguous grammars is called an inherently ambiguous language, and there are inherently ambiguous context-free languages.

Deterministic context-free grammars are always unambiguous, and are an important subclass of unambiguous CFGs; there are non-deterministic unambiguous CFGs, however. For real-world programming languages, the reference CFG is often ambiguous, due to issues such as the dangling else problem. If present, these ambiguities are generally resolved by adding precedence rules or other context-sensitive parsing rules, so the overall phrase grammar is unambiguous.
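Ambiguity can be made concrete by counting parse trees, since each parse tree corresponds to exactly one leftmost derivation. The grammar S → S+S | a used below is a hypothetical example chosen for illustration and does not come from the text; the function name count_parse_trees is likewise an assumption. A count greater than one for some string shows that this grammar is ambiguous.

```python
from functools import lru_cache


def count_parse_trees(s):
    """Number of parse trees of s under the example grammar S -> S+S | a."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways S derives the substring s[i:j].
        total = 1 if s[i:j] == "a" else 0
        # Try every '+' that leaves a non-empty part on both sides as the top-level operator.
        for k in range(i + 1, j - 1):
            if s[k] == "+":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s))


for text in ["a", "a+a", "a+a+a", "a+a+a+a"]:
    print(text, count_parse_trees(text))
```

The string a+a+a has two parse trees, corresponding to the groupings (a+a)+a and a+(a+a). An equivalent unambiguous grammar such as S → S+a | a allows only the first grouping.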

Recognizing ambiguous grammars

The general decision problem of whether a grammar is ambiguous is undecidable because it can be shown that it is equivalent to the Post correspondence problem. At least, there are tools implementing some semi-decision procedure for detecting ambiguity of context-free grammars.

The efficiency of context-free grammar parsing is determined by the automaton that accepts it. Deterministic context-free grammars are accepted by deterministic pushdown automata and can be parsed in linear time, for example by the LR parser.[2] They form a subset of the context-free grammars, which are accepted by pushdown automata and can be parsed in polynomial time, for example by the CYK algorithm. Unambiguous context-free grammars can be nondeterministic. For example, the language of even-length palindromes on the alphabet of 0 and 1 has the unambiguous context-free grammar S → 0S0 | 1S1 | ε. An arbitrary string of this language cannot be parsed without reading all its letters first, which means that a pushdown automaton has to try alternative state transitions to accommodate the different possible lengths of a semi-parsed string.[3] Nevertheless, removing grammar ambiguity may produce a deterministic context-free grammar and thus allow for more efficient parsing. Compiler generators such as YACC include features for resolving some kinds of ambiguity, such as by using precedence and associativity constraints.

Inherently ambiguous languages

The existence of inherently ambiguous languages was proven with Parikh's theorem in 1961 by Rohit Parikh in an MIT research report.

While some context-free languages (the set of strings that can be generated by a grammar) have both ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous context-free grammar can exist. An example of an inherently ambiguous language is the union of {a^n b^m c^m d^n | n, m > 0} with {a^n b^n c^m d^m | n, m > 0}. This set is context-free, since the union of two context-free languages is always context-free. But Hopcroft & Ullman (1979) give a proof that there is no way to unambiguously parse strings in the (non-context-free) subset {a^n b^n c^n d^n | n > 0} which is the intersection of these two languages.

Simplification of CFGs

A context-free grammar (CFG) is a formal grammar in which every production rule is of the form

V → w

where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty). A formal grammar is considered "context free" when its production rules can be applied regardless of the context of a nonterminal. It does not matter which symbols the nonterminal is surrounded by; the single nonterminal on the left hand side can always be replaced by the right hand side. Languages generated by context-free grammars are known as context-free languages (CFL). Different context-free grammars can generate the same context-free language. It is important to distinguish properties of the language (intrinsic properties) from properties of a particular grammar (extrinsic properties). Given two context-free grammars, the language equality question (do they generate the same language?) is undecidable.

Context-free grammars are important in linguistics for describing the structure of sentences and words in natural language, and in computer science for describing the structure of programming languages and other formal languages. In linguistics, some authors use the term phrase structure grammar to refer to context-free grammars, whereby phrase structure grammars are distinct from dependency grammars. In computer science, a popular notation for context-free grammars is Backus–Naur Form, or BNF.

Normal forms for CFGs

If L(G) does not contain ε, then G can have a CNF (Chomsky normal form) with productions only of type

A → BC or A → a,

where A, B, C are nonterminals and a is a terminal.
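Grammars in Chomsky normal form, with binary productions A → BC and terminal productions A → a, are what the CYK algorithm mentioned earlier operates on. The following is a minimal sketch of CYK; the example grammar for {a^n b^n | n ≥ 1}, the rule encoding as tuples, and the function name cyk are assumptions made for illustration.

```python
def cyk(word, unit_rules, pair_rules, start="S"):
    """Decide membership of `word` for a grammar in Chomsky normal form.

    unit_rules: list of (A, a) for productions A -> a
    pair_rules: list of (A, B, C) for productions A -> BC
    """
    n = len(word)
    if n == 0:
        return False  # a CNF grammar of this shape cannot derive the empty string
    # table[i][length] = set of nonterminals deriving word[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {A for A, a in unit_rules if a == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for A, B, C in pair_rules:
                    if B in left and C in right:
                        table[i][length].add(A)
    return start in table[0][n]


# Assumed example grammar in CNF for { a^n b^n : n >= 1 }:
#   S -> AB | AT,  T -> SB,  A -> a,  B -> b
UNIT = [("A", "a"), ("B", "b")]
PAIR = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]

for text in ["ab", "aabb", "aab", "ba"]:
    print(text, cyk(text, UNIT, PAIR))
```

The three nested loops over length, start position and split point give the cubic running time in the length of the word that makes CYK a polynomial-time parser.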


Pumping lemma for CFLs

The pumping lemma for context-free languages, also known as the Bar-Hillel lemma, is a lemma that gives a property shared by all context-free languages. If a language L is context-free, then there exists some integer p ≥ 1 such that every string s in L with |s| ≥ p (where p is a "pumping length") can be written as

s = uvxyz

with substrings u, v, x, y and z, such that

1. |vxy| ≤ p,

2. |vy| ≥ 1, and

3. uv^n xy^n z is in L for all n ≥ 0.

Usage of the lemma

The pumping lemma for context-free languages can be used to show that certain languages are not context-free. For example, we can show that the language L = {a^m b^m c^m | m > 0} is not context-free by using the pumping lemma in a proof by contradiction. First, assume that L is context free. By the pumping lemma, there exists an integer p which is the pumping length of language L. Consider the string s = a^p b^p c^p in L. The pumping lemma tells us that s can be written in the form s = uvxyz, where u, v, x, y, and z are substrings, such that |vxy| ≤ p, |vy| ≥ 1, and uv^n xy^n z is in L for every integer n ≥ 0. By our choice of s and the fact that |vxy| ≤ p, it is easily seen that the substring vxy can contain no more than two distinct letters. That is, we have one of five possibilities for vxy:

1. vxy = a^j for some j ≤ p.

2. vxy = a^j b^k for some j and k with j + k ≤ p.

3. vxy = b^j for some j ≤ p.

4. vxy = b^j c^k for some j and k with j + k ≤ p.

5. vxy = c^j for some j ≤ p.

For each case, it is easily verified that uv^n xy^n z does not contain equal numbers of each letter for any n ≠ 1. Thus, uv^2 xy^2 z does not have the form a^m b^m c^m. This contradicts the definition of L. Therefore, our initial assumption that L is context free must be false.
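The case analysis can be cross-checked by brute force for one fixed pumping length. This is a sketch under stated assumptions, not a proof for every p: the function names in_L and pumpable_splits, the value p = 4, and the bound max_n on the pumping exponent are chosen for illustration. The search enumerates every split s = uvxyz with |vxy| ≤ p and |vy| ≥ 1 and keeps those that stay in the language for all tested exponents.

```python
def in_L(s):
    """Membership test for the language of strings a^m b^m c^m with m > 0."""
    m = len(s) // 3
    return m > 0 and s == "a" * m + "b" * m + "c" * m


def pumpable_splits(s, p, member, max_n=2):
    """Return every split s = uvxyz with |vxy| <= p and |vy| >= 1
    such that u v^n x y^n z is in the language for all n in 0..max_n."""
    good = []
    for start in range(len(s)):
        for end in range(start + 1, min(start + p, len(s)) + 1):
            # The window s[start:end] plays the role of vxy.
            for v_end in range(start, end + 1):
                for y_start in range(v_end, end + 1):
                    u, v, x = s[:start], s[start:v_end], s[v_end:y_start]
                    y, z = s[y_start:end], s[end:]
                    if len(v) + len(y) == 0:
                        continue
                    if all(member(u + v * n + x + y * n + z) for n in range(max_n + 1)):
                        good.append((u, v, x, y, z))
    return good


p = 4
s = "a" * p + "b" * p + "c" * p
print(pumpable_splits(s, p, in_L))
```

An empty list means that no admissible split of this particular string survives pumping, which is what the five cases establish in general.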

While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free. On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma. There are more powerful proof techniques available, such as Ogden's lemma, but these techniques also do not give a complete characterization of the context-free languages.

Ambiguous to Unambiguous CFG

While some context-free languages (the set of strings that can be generated by a grammar) have both ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous context-free grammar can exist. An example of an inherently ambiguous language is the union of {a^n b^m c^m d^n | n, m > 0} with {a^n b^n c^m d^m | n, m > 0}. This set is context-free, since the union of two context-free languages is always context-free. But Hopcroft & Ullman (1979) give a proof that there is no way to unambiguously parse strings in the (non-context-free) subset {a^n b^n c^n d^n | n > 0} which is the intersection of these two languages.