
Context Free Languages and Pushdown Automata

COMP2600 — Formal Methods for Software Engineering

Ranald Clouston

Australian National University

Semester 2, 2013


Parsing

The process of parsing a program is partly about confirming that a given

program is well-formed – syntax.

But it is also about representing the structure of the program so that it can be

executed – semantics.

For this purpose, the trail of sentential forms created en route to generating a

given sentence is just as important as the question of whether the sentence

can be generated or not.


The Semantics of Parses

Take the code

if e1 then if e2 then s1 else s2

where e1, e2 are boolean expressions and s1, s2 are subprograms.

Does this mean

if e1 then ( if e2 then s1 else s2 )

or

if e1 then ( if e2 then s1 ) else s2

We’d better have an unambiguous way to tell which is right, or we cannot

know what the program will do at runtime!


Ambiguity

Recall that we can present CFG derivations as parse trees.

Until now this was mere pretty presentation; now it will become important.

A context-free grammar G is unambiguous iff every string can be derived

by at most one parse tree.

G is ambiguous iff there exists any word w ∈ L(G) derivable by more than

one parse tree.


Example: If-Then and If-Then-Else

Consider the CFG

S → if bexp then S | if bexp then S else S | prog

where bexp and prog stand for boolean expressions and (if-statement free) programs respectively, defined elsewhere.

The string if e1 then if e2 then s1 else s2 then has two parse trees:

[Figure: two parse trees for this string. In the first, the root uses S → if bexp then S, and the inner S derives if e2 then s1 else s2. In the second, the root uses S → if bexp then S else S, where the inner S derives if e2 then s1 and the final S derives s2.]



That grammar was ambiguous. But here’s a grammar accepting the exact

same language that is unambiguous:

S → if bexp then S | T

T → if bexp then T else S | prog

There is now only one parse for if e1 then if e2 then s1 else s2.

This is given on the next slide:



[Figure: the unique parse tree. The root uses S → if e1 then S, and the inner S goes to T. That T uses T → if e2 then T else S, where the inner T derives s1 and the final S derives s2 via T.]

You cannot parse this string as if e1 then ( if e2 then s1 ) else s2.
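The difference between the two grammars can be checked mechanically by counting parse trees. The following is a minimal sketch, not part of the original slides: `count_parses` is a hypothetical helper, the tokens `b` and `p` stand in for bexp and prog, and the counter assumes a grammar with no ε-productions and no cycles of unit productions.

```python
from functools import lru_cache

def count_parses(grammar, start, tokens):
    """Count the parse trees deriving `tokens` from `start`.

    Assumes no ε-productions and no cycles of unit productions,
    so every symbol on a right-hand side covers at least one token.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def symbol(sym, i, j):
        if sym not in grammar:                      # terminal symbol
            return int(j == i + 1 and tokens[i] == sym)
        return sum(sequence(rhs, i, j) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def sequence(rhs, i, j):
        if not rhs:
            return int(i == j)
        head, rest = rhs[0], rhs[1:]
        total = 0
        # Leave at least one token for each remaining symbol.
        for k in range(i + 1, j - len(rest) + 1):
            ways = symbol(head, i, k)
            if ways:
                total += ways * sequence(rest, k, j)
        return total

    return symbol(start, 0, len(tokens))

AMBIGUOUS = {
    "S": [("if", "b", "then", "S"),
          ("if", "b", "then", "S", "else", "S"),
          ("p",)],
}
UNAMBIGUOUS = {
    "S": [("if", "b", "then", "S"), ("T",)],
    "T": [("if", "b", "then", "T", "else", "S"), ("p",)],
}

sentence = "if b then if b then p else p".split()
print(count_parses(AMBIGUOUS, "S", sentence))    # 2
print(count_parses(UNAMBIGUOUS, "S", sentence))  # 1
```

Counting trees for individual strings only gives evidence about a grammar; it does not prove that the grammar is unambiguous.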


Reflecting on This Example

We have seen that the same language can be presented with ambiguous

and unambiguous grammars.

Ambiguity is in general a property of grammars, not of languages.

From the point of view of semantics and parsing, it is often desirable to turn

an ambiguous grammar into an equivalent unambiguous grammar.

This generally involves choices that are not driven by the language itself.

E.g. there exists another grammar for the language of the previous slides that

is unambiguous and allows the parse if e1 then ( if e2 then s1 ) else s2 but

not if e1 then ( if e2 then s1 else s2 ).


What Ambiguity Isn’t

You might wonder if our grammar is still ambiguous, given the production

T → if bexp then T else S

After a derivation that uses this, we have a choice of whether we expand the

T or the S first. Is this an ambiguity?

No. A context-free grammar gets its name because non-terminals can be

expanded without regard for the context they appear in.

In other words, it doesn’t matter if you expand T or S ‘first’. From the perspective of the parse tree these expansions are happening in parallel.

This is a reason why a parse tree is a better formalism for presenting a CFG

derivation than listing each step.


Inherently Ambiguous Languages

In fact not all context-free languages can be given unambiguous grammars – some are inherently ambiguous. Consider the language

L = {a^i b^j c^k | i = j or j = k}

How do we know that this is context-free? First, notice that

L = {a^i b^i c^k} ∪ {a^i b^j c^j}

We then combine CFGs for each side of this union (a standard trick):

S → T | W

T → UV           W → XY

U → aUb | ε      X → aX | ε

V → cV | ε       Y → bYc | ε


Inherently Ambiguous Languages ctd

The problem with this CFG is that the union we used has a non-empty intersection, where the a’s, b’s, and c’s all occur in equal number.

The sentences in this intersection are a source of ambiguity:

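[Figure: two parse trees for the string abc. In the first, S → T → UV, with U → aUb and the inner U → ε, and V → cV with the inner V → ε. In the second, S → W → XY, with X → aX and the inner X → ε, and Y → bYc with the inner Y → ε.]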

In fact (not proved here!) there is no alternative choice of grammar for this language that avoids this ambiguity.
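The overlap between the two halves of the union can be checked directly. This is an illustrative sketch only: the function names are hypothetical, and it tests membership in the two sub-languages generated by T and W rather than building parse trees.

```python
import re

def block_lengths(word):
    """Return (i, j, k) if word has the shape a^i b^j c^k, else None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", word)
    return None if m is None else tuple(len(g) for g in m.groups())

def derivable_from_T(word):
    """Membership in {a^i b^i c^k}, the language generated by T."""
    n = block_lengths(word)
    return n is not None and n[0] == n[1]

def derivable_from_W(word):
    """Membership in {a^i b^j c^j}, the language generated by W."""
    n = block_lengths(word)
    return n is not None and n[1] == n[2]

for w in ["aabbc", "abbcc", "abc", "aabbcc"]:
    print(w, derivable_from_T(w), derivable_from_W(w))
```

Strings such as abc and aabbcc pass both tests, so S can reach them through T or through W, which is exactly where the two parse trees come from.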


The Bad News

We would like to have an algorithm that turns grammars into equivalent unambiguous grammars where possible.

However this is an uncomputable problem.

Worse – determining whether a grammar is ambiguous or not in the first place

is also uncomputable!

Uncomputable problems are everywhere in computer science. You will see many more next week in particular.

The best response is not despair; rather we should see what tricks and techniques might help us out at least some of the time.


Example: Subtraction

Consider the grammar

S → S−S | int

where int could be any integer and the symbol ‘−’ is intended to be executed

as subtraction.

This grammar is ambiguous, and this matters:

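[Figure: two parse trees for 5−3−1. In the left tree the left-hand S of the root is expanded to 5−3, giving (5−3)−1. In the right tree the right-hand S of the root is expanded to 3−1, giving 5−(3−1).]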

The left tree evaluates 5−3−1 to 1; the right evaluates the same string to 3!
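The two results can be reproduced by evaluating the trees bottom-up. In this small sketch the nested-tuple encoding of parse trees and the name `evaluate` are illustrative choices, not notation from the course.

```python
def evaluate(tree):
    """Evaluate a parse tree: an int leaf or a (left, '-', right) node."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    assert op == "-"
    return evaluate(left) - evaluate(right)

left_tree = ((5, "-", 3), "-", 1)    # root S−S with the left S expanded again
right_tree = (5, "-", (3, "-", 1))   # root S−S with the right S expanded again

print(evaluate(left_tree), evaluate(right_tree))  # 1 3
```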


Technique 1: Associativity

We can remove the ambiguity of a binary infix operator by making it associate

to the left or right.

S → S − int | int

Now 5−3−1 can only be read as (5−3)−1 - this is left associativity.

For right associativity we would use the production S → int−S instead.

Idea: Force one side of our operator to a ‘lower’ level, making sure we avoid

loops in our grammar that allow the original non-terminal to be recovered.

Here we force the right hand side of the minus sign to the lowest possible

level – a terminal symbol.
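The two choices of associativity can be sketched as two small parsers that build the only tree each grammar allows. The function names and the tuple encoding of trees are assumptions made for illustration, and the ASCII '-' stands in for the subtraction symbol.

```python
def parse_left(tokens):
    """Parse with S → S − int | int; the tree nests to the left."""
    if len(tokens) % 2 == 0:
        raise SyntaxError("expected int (- int)*")
    tree = int(tokens[0])
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        if op != "-":
            raise SyntaxError(f"expected '-', got {op!r}")
        tree = (tree, "-", int(operand))
    return tree

def parse_right(tokens):
    """Parse with S → int − S | int; the tree nests to the right."""
    if len(tokens) % 2 == 0:
        raise SyntaxError("expected int (- int)*")
    if len(tokens) == 1:
        return int(tokens[0])
    if tokens[1] != "-":
        raise SyntaxError(f"expected '-', got {tokens[1]!r}")
    return (int(tokens[0]), "-", parse_right(tokens[2:]))

tokens = "5 - 3 - 1".split()
print(parse_left(tokens))   # ((5, '-', 3), '-', 1)
print(parse_right(tokens))  # (5, '-', (3, '-', 1))
```

The left-recursive production is handled here with a loop, since a naive recursive function for S → S − int would call itself forever before consuming any input.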


Example: Multiplication and Addition

S → S∗S | S+S | int

where ∗ is to be executed as multiplication and + as addition.

Again this is obviously ambiguous – 1+2∗3 could evaluate to 7 or 9.

(Note that 1+2+3 is also a source of ambiguity as it can be produced by different

parse trees, even though it is not ambiguous in the interpretation we have in mind.)

If all we care about is resolving ambiguity we can use the same trick as the

last slide, making both ∗ and + left (or right) associative.

But this is not the behaviour we expect from these operations: we expect ∗ to

have higher precedence than +.


Technique 2: Precedence

S → S+T | T

T → T ∗ int | int

Given a string 1+2∗3, or 2∗3+1, we have no choice but to expand to S+T first, so that (thinking bottom-up) the + will be the last operation to be executed.

Suppose we tried to derive 1+2∗3 by first doing S ⇒ T ⇒ T∗3. We are then stuck because we cannot send T to 1+2!

As with associativity this trick works by forcing ourselves down to a lower level to generate parts of our sentences. Here we have three levels: S, then T, then the integers as terminals.


Example: Basic Arithmetic

S → S+T | S−T | T

T → T ∗U | T/U | U

U → (S) | int

Note that we have brackets available to give a clearly labelled way to loop

back to the top – if we want to break the usual rules of arithmetic to get the

execution (1+2)∗3 then we must indicate this with explicit brackets.

(Note also that the previous slides’ languages were actually regular and so

could have been generated by right-linear grammars. The language above is

truly context-free because of the need to keep track of bracket balancing to

an arbitrary depth.)
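One way to see the three levels S, T and U at work is a recursive-descent evaluator with one function per non-terminal. This is a sketch under stated assumptions: the left-recursive productions are read as loops (T followed by any number of +T or −T, and likewise for U), ASCII operators replace the typeset ones, `/` is Python's true division, and all names are illustrative.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into integer and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def parse_s():                      # S → S+T | S−T | T
        value = parse_t()
        while peek() in ("+", "-"):
            op, rhs = take(), parse_t()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse_t():                      # T → T∗U | T/U | U
        value = parse_u()
        while peek() in ("*", "/"):
            op, rhs = take(), parse_u()
            value = value * rhs if op == "*" else value / rhs
        return value

    def parse_u():                      # U → (S) | int
        tok = take()
        if tok == "(":
            value = parse_s()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = parse_s()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(evaluate("1+2*3"), evaluate("(1+2)*3"), evaluate("5-3-1"))  # 7 9 1
```

Only parse_u can call back up to parse_s, and only after reading an opening bracket, which mirrors the remark that brackets are the one labelled way to loop back to the top.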


Example: Balanced Brackets

The following grammar generates a language where each left bracket is exactly matched by a closing bracket:

S → ε | (S) | SS

This is ambiguous in a rather stupid way: to generate () we could use the derivation S ⇒ (S) ⇒ (), which seems sensible.

But we could also start with S ⇒ SS, then use the left S to derive () as above while sending the right to ε immediately. Or we could have done the opposite.

In fact there are infinitely many parse trees for any string in the language!

Edit: There is another source of ambiguity here – ()()() has two parses even

without such ‘abuse’ of ε. We fix this on the next slide with left associativity.


Technique 3: Controlling ε

Instead, we ensure there is only one way to access the empty string:

S → ε | T

T → TU | U

U → () | (T)

The string ε can be derived directly from S, but everything else must go through T.

This is another technique that relies on creating a hierarchy of levels within

our grammar.

ε as a source of ambiguity can be easy to miss, so keep your eyes out for it!


Hierarchy of Automata

We have seen that there is a hierarchy of grammars. A language may be

generated by grammars at a certain level, but not by less powerful grammars.

We have seen that regular (right-linear) grammars are exactly as expressive

as finite state automata.

Are there notions of automata exactly as expressive as context-free grammars? Context-sensitive grammars? Unrestricted grammars?

Yes - there is a hierarchy of automata.

Just as we needed a general definition of grammar to define their hierarchy,

we need a general view of automata...


General Structure of Automata

[Figure: a finite state control, with a read head positioned over an input tape a0 a1 a2 ... an, and connected to an auxiliary memory.]

The input tape is a sequence of tokens.

Each time a symbol is processed the read head advances.

The finite state control (FSC) can be in any one of a finite number of states.

The auxiliary memory is usually a linear organisation (e.g. a stack or array).


General Automata ctd

Each action of the machine may change the FSC state, change the auxiliary

memory, or advance to the next input symbol, ‘consuming’ the current one.

The action of the machine depends on the current input symbol, the current

FSC state, and the current memory.

The machine starts in some particular start state (q0), with the read head at

the first input symbol, and the memory containing only the start symbol (Z).

A machine accepts an input string as a sentence of the language if it reaches

a final state (in F) with the input exhausted

(we sometimes also require that the memory be emptied).


Automata and Grammars

The kind of auxiliary memory in a machine determines the class of languages that the machine can recognise:

Language Class       Memory
regular              none
context-free         stack
context-sensitive    tape (bounded by input length)
unrestricted         unbounded tape

We have already looked at Finite State Automata (automata without memory) and their relation to regular languages.

We now consider Push-Down Automata (i.e. automata with stack memory) and their relation to context-free grammars and languages.


Push-down Automata — PDA

[Figure: a PDA. The finite state control has a read head over the input tape a0 a1 a2 ... an, and its auxiliary memory is a stack holding symbols z1, z2, ..., zk.]


PDAs ctd

Each action of the machine may involve change to the FSC state, pushing or

popping the stack, and advancing to the next input symbol.

The action of the machine may depend on the current FSC state, the current

input symbol, and the current top-of-stack symbol.

The machine accepts an input string if it reaches a final state, with the input

exhausted, and the stack empty.

ASIDE: Other (equivalent) definitions of PDAs exist, where

• an input string is accepted if the PDA has an empty stack with the input

exhausted

• an input string is accepted if the PDA is in a final state with the input

exhausted


Example

{a^n b^n | n ≥ 1}

Recall that this language cannot be recognised by a FSA (because there can

only be a finite number of states), but can be generated by a context-free

grammar.

It can also be recognised by a PDA.

Ad hoc design:

• phase 1: (state q1) push a’s from the input onto the stack

• phase 2: (state q2) pop a’s from the stack, if there is a b on input

• finalise: if the stack is empty and the input is exhausted in the final state

(q3), accept the string.


Deterministic PDA – Definition

A deterministic PDA has the form (Q,q0,F, Σ, Γ,Z, δ), where

• Q is the set of states, q0 ∈ Q is the initial state and F ⊆ Q are the final

states;

• Σ is the alphabet, or set of input symbols;

• Γ is the set of stack symbols, and Z ∈ Γ is the initial stack symbol;

• δ is a (partial) transition function

δ : Q× (Σ∪{ε})×Γ → Q×Γ∗

δ : (state, input token or ε, top-of-stack) → (new state, stack string)

such that for all q ∈ Q and s ∈ Γ, if δ(q,ε,s) is defined then δ(q,a,s) is

undefined for all a ∈ Σ. This condition is vital to maintain determinism.
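The side-condition on ε-transitions is easy to check for a concrete transition table. In the sketch below the dictionary encoding of δ (with the empty string standing for ε) and the function name are assumptions made for illustration; the sample table is the one for the a^n b^n example.

```python
def epsilon_conflicts(delta):
    """Return the (state, top) pairs that have both an ε-move and an input move."""
    eps = {(q, s) for (q, a, s) in delta if a == ""}
    return sorted({(q, s) for (q, a, s) in delta if a != "" and (q, s) in eps})

# Keys are (state, input symbol or "" for ε, top of stack);
# values are (new state, string replacing the top of stack).
DELTA = {
    ("q0", "a", "Z"): ("q1", "aZ"),
    ("q1", "a", "a"): ("q1", "aa"),
    ("q1", "b", "a"): ("q2", ""),
    ("q2", "b", "a"): ("q2", ""),
    ("q2", "", "Z"): ("q3", ""),
}

print(epsilon_conflicts(DELTA))  # []
```

Because a dictionary maps each key to a single value, δ is automatically a partial function; only the ε condition needs a separate check.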


Notation

Given the transition function

δ : Q× (Σ∪{ε})×Γ → Q×Γ∗


we write δ(q,a,s) = q′/σ to show how this function operates.

Then the string σ in the result is the string with which we replace the top-of-

stack symbol s.

(Doing this makes it simple to specify pushes and pops in a uniform way.)

We will usually indicate final states by underlining them, e.g. q.


Two types of PDA transition

• when δ contains (q1,x,s) ↦ q2/σ,

                        state   input               stack
  the PDA can go from   q1      next symbol is x    s on top-of-stack
  to                    q2      x is consumed       σ replaces s on stack

• when δ contains (q1,ε,s) ↦ q2/σ,

                        state   input               stack
  the PDA can go from   q1      any input, or none  s on top-of-stack
  to                    q2      input is ignored    σ replaces s on stack

Such transitions do not look at or consume an input symbol.


Example ctd

Recall that the stack starts off containing only the initial symbol Z.

PDA to recognise a^n b^n (start state q0, final state q3):

δ(q0,a,Z) = q1/aZ · · · push first a

δ(q1,a,a) = q1/aa · · · push further a’s

δ(q1,b,a) = q2/ε · · · start popping a’s

δ(q2,b,a) = q2/ε · · · pop further a’s

δ(q2,ε,Z) = q3/ε · · · accept

Note that δ is a partial function, undefined for many arguments.
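The transition table above can be run directly. The following is a minimal simulator sketch, assuming single-character input and stack symbols so that the stack can be a string with its top at index 0; the names `DELTA`, `FINAL` and `run` are illustrative.

```python
DELTA = {
    ("q0", "a", "Z"): ("q1", "aZ"),   # push first a
    ("q1", "a", "a"): ("q1", "aa"),   # push further a's
    ("q1", "b", "a"): ("q2", ""),     # start popping a's
    ("q2", "b", "a"): ("q2", ""),     # pop further a's
    ("q2", "", "Z"): ("q3", ""),      # accept (ε-transition)
}
FINAL = {"q3"}

def run(word, delta=DELTA, start="q0", bottom="Z", final=FINAL):
    """Run a deterministic PDA; return (accepted, list of configurations)."""
    state, rest, stack = start, word, bottom
    trace = [(state, rest, stack)]
    while stack:
        top = stack[0]
        if (state, "", top) in delta:                 # ε-move: input untouched
            state, push = delta[(state, "", top)]
        elif rest and (state, rest[0], top) in delta:
            state, push = delta[(state, rest[0], top)]
            rest = rest[1:]                           # consume one symbol
        else:
            break                                     # δ undefined: stuck
        stack = push + stack[1:]                      # replace top of stack
        trace.append((state, rest, stack))
    accepted = state in final and rest == "" and stack == ""
    return accepted, trace

accepted, trace = run("aaabbb")
print(accepted)
for config in trace:
    print(config)
```

The determinism side-condition is what lets the simulator try the ε-move first without ever having to choose between it and an input move. A table with an ε-loop would make this sketch run forever; the table here has none.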


Example ctd — PDA Trace

PDA configurations can be written as a triple (state, remaining input, stack) with the top of stack to the left.

(q0,aaabbb,Z) ⇒ (q1,aabbb,aZ) (push first a)

⇒ (q1,abbb,aaZ) (push further a’s)

⇒ (q1,bbb,aaaZ) (push further a’s)

⇒ (q2,bb,aaZ) (start popping a’s)

⇒ (q2,b,aZ) (pop further a’s)

⇒ (q2,ε,Z) (pop further a’s)

⇒ (q3,ε,ε) (accept)

The machine halts in the final state with input exhausted and an empty stack, so the string is accepted.


Example ctd — Rejection

The string aaba should be rejected by this PDA:

(q0,aaba,Z) ⇒ (q1,aba,aZ)

⇒ (q1,ba,aaZ)

⇒ (q2,a,aZ)

⇒ ???

No transition applies, and the PDA is “stuck” without reaching a final state.

Rejection happens when the transition function is undefined for the current configuration (that is, state, input symbol and top-of-stack symbol).


Example: Palindromes with ‘Centre Mark’

Consider the language

{w c w^R | w ∈ {a,b}∗ ∧ w^R is w reversed}

This is context-free, and we can design a deterministic PDA to accept it:

• Push a’s and b’s onto the stack as we see them;

• When we see c, change state;

• Now try to match the tokens we are reading with the tokens on top of the

stack, popping as we go;

• If the top of the stack is the initial stack symbol Z, pop it and enter the final state via an ε-transition. Hopefully our input has been used up too!

Full formal details are left as an exercise.
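Before attempting the formal version, the four steps above can be tried out informally. This sketch follows them with an explicit Python list as the stack; it is not the formal δ table the exercise asks for, and the function name is an illustrative choice.

```python
def accepts_centre_marked(word):
    """Recognise w c w^R over {a, b} by following the push / match phases."""
    stack = ["Z"]            # initial stack symbol at the bottom
    pushing = True           # True before the centre mark c, False after it
    for symbol in word:
        if pushing:
            if symbol in ("a", "b"):
                stack.append(symbol)        # phase 1: push what we read
            elif symbol == "c":
                pushing = False             # centre mark: change state
            else:
                return False
        else:
            if symbol in ("a", "b") and stack[-1] == symbol:
                stack.pop()                 # phase 2: match and pop
            else:
                return False                # no transition applies: stuck
    # Final step: Z is on top, and the input has been used up.
    return not pushing and stack == ["Z"]

for w in ["abcba", "c", "abcab", "abba"]:
    print(w, accepts_centre_marked(w))
```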


Non-Deterministic PDAs

For deterministic PDAs transitions are a (partial) function:

δ : Q× (Σ∪{ε})×Γ → Q×Γ∗


with a side-condition about ε-transitions.

For non-deterministic PDAs transitions are a relation

δ ⊆ Q× (Σ∪{ε})×Γ × Q×Γ∗

with no side-condition.

There may be configurations where there are multiple choices of transition.


Non-Deterministic PDAs ctd.

Recall for finite state automata, non-determinism can be more convenient,

but does not give extra power (thanks to the subset construction).

This is not the case for PDAs - non-deterministic PDAs can recognise languages that deterministic PDAs cannot.

It turns out that non-deterministic PDAs are the more important from the perspective of Chomsky’s hierarchy.


Example: Even-Length Palindromes

The language of even length palindromes

{w w^R | w ∈ {a,b}∗ ∧ w^R is w reversed}

is context-free but cannot be recognised by a deterministic PDA, because

without a centre mark it cannot know whether it is in the first half of a sentence

(so should be pushing everything into memory) or the second half (so it should

be matching input and stack, and popping).

But a non-deterministic PDA can recognise this language via the transition

δ(q,ε,x) = r/x

where x ∈ {a,b,Z}, q is the ‘push’ state, and r the ‘match and pop’ state.

In other words, we continually ‘guess’ which job we should be doing!
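The 'guessing' can be simulated by exploring every configuration the machine could be in. In this sketch only the guess transition δ(q,ε,x) = r/x comes from the slide; the push, match-and-pop and accept transitions, the final state name f, and the breadth-first search are assumptions filled in to make a complete machine.

```python
from collections import deque

def build_delta():
    """Transition relation: (state, input or "", top) -> set of (state, push)."""
    delta = {}
    def add(key, target):
        delta.setdefault(key, set()).add(target)
    for x in "abZ":
        add(("q", "", x), ("r", x))            # guess that the middle is here
        for a in "ab":
            add(("q", a, x), ("q", a + x))     # push the symbol just read
    for a in "ab":
        add(("r", a, a), ("r", ""))            # match input against stack, pop
    add(("r", "", "Z"), ("f", ""))             # stack back to Z: accept
    return delta

def accepts(word, delta, start="q", bottom="Z", final=("f",)):
    """Breadth-first search over all reachable configurations."""
    seen = set()
    queue = deque([(start, word, bottom)])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, rest, stack = config
        if state in final and not rest and not stack:
            return True
        if not stack:
            continue
        top, below = stack[0], stack[1:]
        for nxt, push in delta.get((state, "", top), ()):
            queue.append((nxt, rest, push + below))
        if rest:
            for nxt, push in delta.get((state, rest[0], top), ()):
                queue.append((nxt, rest[1:], push + below))
    return False

DELTA = build_delta()
for w in ["abba", "aa", "", "aba", "abab"]:
    print(repr(w), accepts(w, DELTA))
```

The string is accepted if any one sequence of guesses leads to acceptance, which is why the search only needs to find a single accepting configuration.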


Grammars and PDAs

Theorem

The class of languages recognised by non-deterministic PDAs is exactly the

class of context-free languages.

We will only justify this result in one direction: for any CFG, there is a corresponding PDA.

This is the most interesting direction since it is the basis of automatically

deriving parsers from grammars.

(The other direction is a bit complicated).


From CFG to PDA

The PDA has three states: q0 (initial), q1 (processing), and q2 (final). Its

alphabet will be that of the CFG, and its stack symbols will be all the CFG’s

terminals and non-terminals.

1. Initialise the process by pushing the start symbol S onto the stack, and

enter state q1:

δ(q0,ε,Z) ↦ q1/SZ

2. If a non-terminal is on top of stack, replace it with the right hand side of a

production. For all productions A→ α:

δ(q1,ε,A) ↦ q1/α



From CFG to PDA, ctd

3. For all terminal symbols t, pop the stack if it matches the input:

δ(q1,t,t) ↦ q1/ε

4. For termination, add the transition, with final state q2:

δ(q1,ε,Z) ↦ q2/ε

In general we get a non-deterministic PDA since there may be several productions for each non-terminal.


Example — Derive a PDA for a CFG

S → S+T | T

T → T ∗U | U

U → (S) | int

1. Initialise:

δ(q0,ε,Z) ↦ q1/SZ

2. Expand non-terminals:

δ(q1,ε,S) ↦ q1/S+T      δ(q1,ε,T) ↦ q1/U

δ(q1,ε,S) ↦ q1/T        δ(q1,ε,U) ↦ q1/(S)

δ(q1,ε,T) ↦ q1/T∗U      δ(q1,ε,U) ↦ q1/int


CFG to PDA ctd

3. Match and pop terminals:

δ(q1,+,+) ↦ q1/ε

δ(q1,∗,∗) ↦ q1/ε

δ(q1,int,int) ↦ q1/ε

δ(q1,(,() ↦ q1/ε

δ(q1,),)) ↦ q1/ε

4. Terminate:

δ(q1,ε,Z) ↦ q2/ε


Example Trace

(q0, int∗int, Z) ⇒ (q1, int∗int, SZ)
                 ⇒ (q1, int∗int, TZ)
                 ⇒ (q1, int∗int, T∗UZ)
                 ⇒ (q1, int∗int, U∗UZ)
                 ⇒ (q1, int∗int, int∗UZ)
                 ⇒ (q1, ∗int, ∗UZ)
                 ⇒ (q1, int, UZ)
                 ⇒ (q1, int, intZ)
                 ⇒ (q1, ε, Z)
                 ⇒ (q2, ε, ε)
                 ⇒ accept
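The four-step construction and the search for an accepting trace can be sketched in code. Everything named here (`cfg_to_pda`, `accepts`, the use of None for ε, tuples for stacks) is an illustrative assumption. The search also prunes any configuration whose stack holds more symbols above Z than there are tokens left; that pruning is an addition not found in the slides, and it is only sound because this grammar has no ε-productions. Without it the left-recursive productions would let the stack grow forever.

```python
from collections import deque

def cfg_to_pda(productions, terminals, start="S", bottom="Z"):
    """Build the transition relation: (state, input or None, top) -> set of moves."""
    delta = {("q0", None, bottom): {("q1", (start, bottom))}}        # 1. initialise
    for lhs, alternatives in productions.items():                     # 2. expand
        delta[("q1", None, lhs)] = {("q1", tuple(rhs)) for rhs in alternatives}
    for t in terminals:                                               # 3. match, pop
        delta[("q1", t, t)] = {("q1", ())}
    delta[("q1", None, bottom)] = {("q2", ())}                        # 4. terminate
    return delta

def accepts(tokens, delta):
    """Search all runs of the non-deterministic PDA for an accepting one."""
    tokens = tuple(tokens)
    start = ("q0", 0, ("Z",))
    seen, queue = {start}, deque([start])
    while queue:
        state, pos, stack = queue.popleft()
        if state == "q2" and pos == len(tokens) and not stack:
            return True
        if not stack:
            continue
        # Pruning: each symbol above Z must still produce at least one token.
        if len(stack) - 1 > len(tokens) - pos:
            continue
        top, below = stack[0], stack[1:]
        moves = [(nxt, pos, push)
                 for nxt, push in delta.get((state, None, top), ())]
        if pos < len(tokens):
            moves += [(nxt, pos + 1, push)
                      for nxt, push in delta.get((state, tokens[pos], top), ())]
        for nxt, new_pos, push in moves:
            config = (nxt, new_pos, push + below)
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return False

GRAMMAR = {
    "S": [["S", "+", "T"], ["T"]],
    "T": [["T", "*", "U"], ["U"]],
    "U": [["(", "S", ")"], ["int"]],
}
TERMINALS = ["+", "*", "int", "(", ")"]
DELTA = cfg_to_pda(GRAMMAR, TERMINALS)

print(accepts(["int", "*", "int"], DELTA))
print(accepts(["int", "*"], DELTA))
```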


A Context-Sensitive Language

In case you’re curious, a brief look at context-sensitive languages.

The following language is not context-free:

{a^n b^n c^n | n ≥ 1}

Intuitively, we can imagine a CFG generating either the ab pairs or the bc pairs, but this language requires us to keep the generation process in step, in two different points in the sentential forms.

A context-sensitive grammar is on the next slide. Each production has a right-hand side of equal or longer length than its left-hand side.

(If we wanted to include the empty word also, we need a special exemption to this

requirement.)


A Context-Sensitive Language ctd.

On the left we have a context-sensitive grammar, and on the right an example derivation:

S  → aBC          S ⇒ aSBC
S  → aSBC           ⇒ aaBCBC
CB → BC             ⇒ aabCBC
aB → ab             ⇒ aabBCC
bB → bb             ⇒ aabbCC
bC → bc             ⇒ aabbcC
cC → cc             ⇒ aabbcc

The automata that recognise context-sensitive languages have a tape memory, of length bounded by a linear function of the length of the input.
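Because no production shrinks a sentential form, membership can be decided by searching only the forms that are no longer than the target string. The sketch below does this for the grammar above; the names `RULES` and `derivable` and the breadth-first strategy are illustrative assumptions, and this brute-force search is not the bounded-tape automaton itself.

```python
from collections import deque

RULES = [
    ("S", "aBC"), ("S", "aSBC"), ("CB", "BC"),
    ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc"),
]

def derivable(target, rules=RULES, start="S"):
    """Decide whether `target` can be derived, exploring forms breadth-first."""
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                # Right-hand sides are never shorter than left-hand sides,
                # so any form longer than the target is a dead end.
                if len(new) <= len(target) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False

for w in ["abc", "aabbcc", "aabbc", "abcabc"]:
    print(w, derivable(w))
```

The search terminates because there are only finitely many strings up to a given length, the same observation that lets a tape of bounded length suffice.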

