Formalising the Normal Forms of CFGs in HOL4 · users.cecs.anu.edu.au/~aditi/tum.pdf


Formalising the Normal Forms of CFGs in HOL4

Aditi Barthwal¹, Michael Norrish²

¹ Australian National University, ² NICTA

Technische Universität München

September 2010

Aditi Barthwal CFG Normal Forms 1/31


Context-free grammars

$G = (V, T, P, S)$, where

V = finite set of variables or nonterminals

T = finite set of terminals

P = finite set of productions, each of the form $A \to \alpha$, where $A \in V$ and $\alpha$ is a string of symbols, i.e. $\alpha \in (V \cup T)^*$

S = start symbol

A word is a string over terminals.

The language of G, L(G), is the set of all words derivable from the start symbol.
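As an illustration of L(G), the following minimal Python sketch (not part of the HOL development) enumerates the words of a grammar up to a length bound. It assumes a hypothetical encoding: rules are (lhs, rhs) pairs with rhs a tuple of symbol names, nonterminals are names starting with an uppercase letter, and the grammar has no ε-productions.

```python
from collections import deque


def is_nonterminal(sym):
    # Illustrative convention: nonterminals start with an uppercase letter.
    return sym[:1].isupper()


def words_up_to(rules, start, max_len):
    """Words of L(G) with at most max_len symbols, found by breadth-first
    search over leftmost derivations. Assumes no epsilon-productions, so a
    sentential form never shrinks and forms longer than max_len can be pruned."""
    words = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Index of the leftmost nonterminal, or None if form is already a word.
        idx = next((i for i, s in enumerate(form) if is_nonterminal(s)), None)
        if idx is None:
            words.add(form)
            continue
        for lhs, rhs in rules:
            if lhs == form[idx]:
                new = form[:idx] + rhs + form[idx + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return words
```

For example, with S → aSb | ab and the bound 4, the only words found are ab and aabb.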


CFGs — The HOL Version

Types:

('nts, 'ts) symbol = NTS of 'nts | TS of 'ts

('nts, 'ts) rule = rule of 'nts => ('nts, 'ts) symbol list

('nts, 'ts) grammar = G of ('nts, 'ts) rule list => 'nts

A grammar’s language:

L g = { tsl | (derives g)* [NTS (startSym g)] tsl ∧ isWord tsl }
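To make the HOL types concrete, here is a rough Python analogue. The class and function names mirror the slide, but the encoding is an assumption, not the HOL4 source. `derives` is the one-step relation whose reflexive-transitive closure appears in the definition of L g.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class NTS:  # nonterminal symbol, mirrors NTS of 'nts
    name: str


@dataclass(frozen=True)
class TS:  # terminal symbol, mirrors TS of 'ts
    name: str


Symbol = Union[NTS, TS]


@dataclass(frozen=True)
class Rule:  # mirrors rule of 'nts => ('nts, 'ts) symbol list
    lhs: str
    rhs: Tuple[Symbol, ...]


@dataclass(frozen=True)
class Grammar:  # mirrors G of ('nts, 'ts) rule list => 'nts
    rules: Tuple[Rule, ...]
    start: str


def derives(g, sf1, sf2):
    """One step: sf2 arises from sf1 by rewriting one nonterminal occurrence
    with the right-hand side of some rule of g (sf1, sf2 are tuples)."""
    for i, sym in enumerate(sf1):
        if isinstance(sym, NTS):
            for r in g.rules:
                if r.lhs == sym.name and sf1[:i] + r.rhs + sf1[i + 1:] == sf2:
                    return True
    return False


def is_word(sf):
    """A word contains only terminal symbols."""
    return all(isinstance(s, TS) for s in sf)
```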


Results I will not talk about

Simplification/normalisation of CFGs by

removing symbols that do not generate a terminal string or are not reachable from the start symbol of the grammar (useless symbols);

removing $\varepsilon$-productions (as long as $\varepsilon$ is not in the language generated by the grammar);

removing unit productions, i.e. those of the form $A \to B$ where $B$ is a nonterminal symbol.
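The first of these simplifications can be sketched in Python, following the textbook order: drop non-generating symbols first, then unreachable ones. The encoding (rules as (lhs, rhs) pairs, nonterminals named with an uppercase initial) is an illustrative assumption, not the HOL formalisation.

```python
def is_nonterminal(sym):
    # Illustrative convention: nonterminals start with an uppercase letter.
    return sym[:1].isupper()


def generating(rules):
    """Nonterminals that derive some terminal string (least fixpoint)."""
    gen, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in gen and all(not is_nonterminal(s) or s in gen for s in rhs):
                gen.add(lhs)
                changed = True
    return gen


def reachable(rules, start):
    """Nonterminals reachable from the start symbol."""
    reach, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for lhs, rhs in rules:
            if lhs == a:
                for s in rhs:
                    if is_nonterminal(s) and s not in reach:
                        reach.add(s)
                        stack.append(s)
    return reach


def remove_useless(rules, start):
    """Drop rules mentioning non-generating symbols, then unreachable rules."""
    gen = generating(rules)
    kept = [(l, r) for l, r in rules
            if l in gen and all(not is_nonterminal(s) or s in gen for s in r)]
    reach = reachable(kept, start)
    return [(l, r) for l, r in kept if l in reach]
```

With S → AB | a and A → a, the symbol B generates nothing, so S → AB goes; A then becomes unreachable, leaving only S → a.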


Chomsky Normal Form

A grammar G is in Chomsky Normal Form if every rule is of the form

$A \to A_1 A_2$, where each $A_i$ is a nonterminal,

or

$A \to a$, where $a$ is a terminal.
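A direct check of this definition, as a Python sketch under an illustrative encoding (rules as (lhs, rhs) pairs, nonterminals named with an uppercase initial):

```python
def is_nonterminal(sym):
    # Illustrative convention: nonterminals start with an uppercase letter.
    return sym[:1].isupper()


def is_cnf(rules):
    """True iff every rule is A -> A1 A2 (two nonterminals) or A -> a (one terminal)."""
    for _, rhs in rules:
        two_nts = len(rhs) == 2 and all(is_nonterminal(s) for s in rhs)
        one_ts = len(rhs) == 1 and not is_nonterminal(rhs[0])
        if not (two_nts or one_ts):
            return False
    return True
```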


The Chomsky Normal Form Theorem

Language Equivalence

INFINITE 𝕌(:'nts) ∧ [] ∉ L g ⇒ ∃g'. isCnf g' ∧ L g = L g'

Proof:

Hopcroft & Ullman’s (H&U) proof is 3.5 pages long, with examples

The HOL proof is 1444 loc

Translation from H&U to HOL is straightforward
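For orientation, the last two steps of the textbook CNF construction can be sketched in Python: lifting terminals out of long right-hand sides, then splitting long right-hand sides into binary ones. The sketch assumes ε-productions, unit productions and useless symbols have already been removed, uses a hypothetical naming scheme X1, X2, ... for fresh nonterminals, and is not the authors' HOL code.

```python
from itertools import count


def is_nonterminal(sym):
    # Illustrative convention: nonterminals start with an uppercase letter.
    return sym[:1].isupper()


def to_cnf(rules):
    """Assumes no epsilon-productions, unit productions or useless symbols."""
    used = {lhs for lhs, _ in rules} | {s for _, rhs in rules for s in rhs}
    counter = count(1)

    def fresh():
        # Hypothetical naming: first unused name among X1, X2, ...
        while True:
            name = f"X{next(counter)}"
            if name not in used:
                used.add(name)
                return name

    # Step 1: in right-hand sides of length >= 2, replace each terminal a
    # by a nonterminal C_a with the single new rule C_a -> a.
    term_nt, step1 = {}, []
    for lhs, rhs in rules:
        if len(rhs) >= 2:
            new_rhs = []
            for s in rhs:
                if not is_nonterminal(s):
                    if s not in term_nt:
                        term_nt[s] = fresh()
                        step1.append((term_nt[s], (s,)))
                    s = term_nt[s]
                new_rhs.append(s)
            rhs = tuple(new_rhs)
        step1.append((lhs, rhs))

    # Step 2: split A -> B1 B2 ... Bm (m > 2) into A -> B1 D1, D1 -> B2 D2, ...
    result = []
    for lhs, rhs in step1:
        while len(rhs) > 2:
            d = fresh()
            result.append((lhs, (rhs[0], d)))
            lhs, rhs = d, rhs[1:]
        result.append((lhs, rhs))
    return result
```

Every call to `fresh` needs a name outside the grammar, which is where the INFINITE 𝕌(:'nts) assumption of the theorem comes in.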


Point of Difference from Text Proof

Assumption: INFINITE 𝕌(:'nts)

Required because: we need to introduce a new nonterminal not in g

(S1) The universal set of nonterminals is infinite, and

(S2) the set of nonterminals in g is finite,

⇒ we can pick a nonterminal that is in S1 but not in S2.
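In a programming language the same argument becomes a loop over an unbounded supply of candidate names. A minimal Python sketch, where the prefix-plus-counter naming is a hypothetical choice:

```python
from itertools import count


def fresh_nonterminal(used, prefix="N"):
    """Return a name not in the finite set `used`.

    The candidates prefix0, prefix1, ... are infinitely many (S1) while
    `used` is finite (S2), so the loop always terminates."""
    for i in count():
        candidate = f"{prefix}{i}"
        if candidate not in used:
            return candidate
```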


The Relational Approach to Grammar Transformation

Both normalisations feature “non-determinism”:

choice of fresh non-terminals

order in which rules are transformed

Rather than define a function, use a “one-step” relation:

R : grammar → grammar → bool

(Additional parameters possible: e.g. fresh symbols)

Show:

Each application of R preserves the language

A step is always possible while the grammar has not yet reached its final form
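A loose Python analogue of this idiom, under illustrative assumptions: the relation R is a generator of all successor grammars, and a toy transformation (splitting one right-hand side longer than 2, with the fresh name as the extra parameter) stands in for the real normalisation steps. The driver relies only on the two obligations listed above, plus termination.

```python
from itertools import count


def is_final(rules):
    # Toy "final form": no right-hand side longer than 2.
    return all(len(rhs) <= 2 for _, rhs in rules)


def steps(rules, fresh):
    """The one-step relation R(rules, fresh), as the set of all successors.
    Each successor splits ONE rule A -> X1 X2 ... Xm (m > 2) into
    A -> X1 D and D -> X2 ... Xm with D = fresh; which rule is left open."""
    for i, (lhs, rhs) in enumerate(rules):
        if len(rhs) > 2:
            yield rules[:i] + [(lhs, (rhs[0], fresh)), (fresh, rhs[1:])] + rules[i + 1:]


def normalise(rules):
    """Follow R until the final form. Progress: while not final, steps()
    yields a successor. Termination: each step lowers the total excess
    sum(max(0, len(rhs) - 2)) by one."""
    used = {lhs for lhs, _ in rules} | {s for _, rhs in rules for s in rhs}
    names = (f"D{i}" for i in count() if f"D{i}" not in used)
    while not is_final(rules):
        rules = next(steps(rules, next(names)))  # any successor would do
    return rules
```

Taking the first successor is arbitrary; any choice reaches a final form, which is the point of stating the obligations for the relation rather than for one fixed function.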


Greibach Normal Form (GNF)

A grammar G is in Greibach Normal Form if every rule is of the form

$A \to a A_1 A_2 \ldots A_n$

where $a$ is a terminal, each $A_i$ is a nonterminal, and $n \geq 0$.
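The GNF shape condition is just as easy to check; a Python sketch under an illustrative encoding (rules as (lhs, rhs) pairs, nonterminals named with an uppercase initial):

```python
def is_nonterminal(sym):
    # Illustrative convention: nonterminals start with an uppercase letter.
    return sym[:1].isupper()


def is_gnf(rules):
    """True iff every rule is A -> a A1 ... An: one terminal, then n >= 0 nonterminals."""
    return all(
        len(rhs) >= 1
        and not is_nonterminal(rhs[0])
        and all(is_nonterminal(s) for s in rhs[1:])
        for _, rhs in rules
    )
```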


The GNF Destination

Language Equivalence

INFINITE 𝕌(:'nts) ∧ [] ∉ L g ⇒ ∃g'. isGnf g' ∧ L g = L g'

Proof (in H&U):

3 pages long

Includes a crucial picture


The Crux of GNF

The central issue in the proof is dealing with left recursion: rules of the form

$A \to A\alpha$

or loops such as

$A \to B\alpha$, $B \to C\beta$, $C \to A\gamma$
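Assuming no ε-productions, a nonterminal is left-recursive exactly when it can reach itself by repeatedly following the first symbol of a right-hand side. A Python sketch of that check, with rules as (lhs, rhs) pairs and uppercase-initial names as nonterminals (an illustrative encoding):

```python
def is_nonterminal(sym):
    # Illustrative convention: nonterminals start with an uppercase letter.
    return sym[:1].isupper()


def left_recursive(rules):
    """Nonterminals A with A =>+ A ..., assuming no epsilon-productions
    (so only the first symbol of each right-hand side matters)."""
    # Edge A -> B whenever some rule A -> B ... starts with nonterminal B.
    edges = {}
    for lhs, rhs in rules:
        if rhs and is_nonterminal(rhs[0]):
            edges.setdefault(lhs, set()).add(rhs[0])

    def reaches(src, target):
        seen, stack = set(), list(edges.get(src, ()))
        while stack:
            n = stack.pop()
            if n == target:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(edges.get(n, ()))
        return False

    return {a for a in edges if reaches(a, a)}
```

Both a direct rule A → Aα and a loop through B and C are caught; a nonterminal that merely leads into a loop without lying on it is not reported.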


GNF: Step 0

Convert grammar to Chomsky Normal Form.


GNF: Step 1

Order the non-terminals. (Another source of non-determinism!)

“Substitute out” variable references so that

$A_i \to A_j\gamma$ only occurs if $j > i$

(Hard in the presence of left recursion!)


GNF: Step 1 (The Easy Case)

Working on $A_i$:

Assume that all $A_j$ with $j < i$ have already been processed.

In order ($j = 1, \ldots, i-1$), if a rule has the form $A_i \to A_j\gamma$:

take all possible right-hand sides for $A_j$ ($\beta_1, \ldots, \beta_n$);

replace the rule above with $A_i \to \beta_k\gamma$ ($k \in \{1, \ldots, n\}$).

(Each replacement preserves the language (H&U Lemma 4.3).)

May result in a rule $A_i \to A_i \ldots$
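A Python sketch of this easy case, assuming rules are (lhs, rhs) pairs, nonterminals have uppercase initials, and `order` lists $A_1, A_2, \ldots$ (0-based in the code). The function name is hypothetical, and any rule $A_i \to A_i \ldots$ it produces is left for the next step.

```python
def is_nonterminal(sym):
    # Illustrative convention: nonterminals start with an uppercase letter.
    return sym[:1].isupper()


def substitute_lower(rules, order, i):
    """Rewrite the rules of A_i = order[i] until none starts with A_j, j < i.

    Assumes A_0 ... A_{i-1} are already done (each of their right-hand sides
    starts with a terminal or a higher-numbered nonterminal), so repeated
    substitution terminates. Each replacement is one use of H&U Lemma 4.3."""
    ai = order[i]
    rank = {a: k for k, a in enumerate(order)}
    others = [(l, r) for l, r in rules if l != ai]
    todo = [r for l, r in rules if l == ai]
    done = []
    while todo:
        rhs = todo.pop()
        head = rhs[0] if rhs else None
        if head is not None and is_nonterminal(head) and rank.get(head, len(order)) < i:
            # A_i -> A_j gamma becomes A_i -> beta_k gamma for every A_j -> beta_k.
            todo.extend(beta + rhs[1:] for l, beta in rules if l == head)
        else:
            done.append(rhs)  # may be left-recursive: A_i -> A_i ...
    return others + [(ai, r) for r in done]
```

For example, with A1 → A2 A2 | a and A2 → A1 b | c, processing A2 yields A2 → c | a b | A2 A2 b, where the last rule is left-recursive.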


GNF: Step 1 (The Hard Bit)

May now have a left-recursive rule $A \to A\alpha$.

(No left-recursive cycles are possible, though.)


Hopcroft & Ullman Lemma 4.4: the “left to right” lemma

Change the left-recursive rules into right-recursive rules.

Lemma (“left to right lemma”)

Let $g = (V, T, P, S)$ be a CFG. Let $A \to A\alpha_1 \mid A\alpha_2 \mid \ldots \mid A\alpha_r$ be the set of left-recursive A-productions, and let $A \to \beta_1 \mid \beta_2 \mid \ldots \mid \beta_s$ be the remaining A-productions. Then we can construct $g' = (V \cup \{B\}, T, P_1, S)$ such that $L(g) = L(g')$ by replacing all the left-recursive A-productions with the following productions:

Rule 1: $A \to \beta_i$ and $A \to \beta_i B$

Rule 2: $B \to \alpha_i$ and $B \to \alpha_i B$

Here, $B$ is a fresh nonterminal that does not occur in $g$.
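The construction in the lemma can be sketched in Python as follows. This is an illustrative rendering with rules as (lhs, rhs) pairs, not the HOL definition of left2Right, and the fresh-name scheme (A'0, A'1, ...) is hypothetical.

```python
from itertools import count


def left_to_right(rules, a, used):
    """H&U Lemma 4.4 for nonterminal `a`. `used` is the finite set of names in
    the grammar; the fresh B is the first unused name among a + "'0",
    a + "'1", .... Assumes there is no rule a -> a."""
    b = next(f"{a}'{i}" for i in count() if f"{a}'{i}" not in used)
    alphas = [r[1:] for l, r in rules if l == a and r[:1] == (a,)]  # A -> A alpha_i
    betas = [r for l, r in rules if l == a and r[:1] != (a,)]       # A -> beta_i
    others = [(l, r) for l, r in rules if l != a]
    new = [(a, beta) for beta in betas] + [(a, beta + (b,)) for beta in betas]  # Rule 1
    new += [(b, al) for al in alphas] + [(b, al + (b,)) for al in alphas]       # Rule 2
    return others + new, b
```

For A → Ax | Ay | b this gives A → b | b A'0 and A'0 → x | y | x A'0 | y A'0.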


Hopcroft & Ullman’s Picture

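Any derivation in the left-recursive grammar can be mimicked in the right-recursive grammar, and vice versa:

[Figure: derivation trees for $b a_n \ldots a_2 a_1$ in the left-recursive grammar (a left spine of A nodes ending in b) and in the right-recursive grammar (A → bB at the root, then a right spine of B nodes).]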

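Derivation $A \Rightarrow A a_1 \Rightarrow A a_2 a_1 \Rightarrow \cdots \Rightarrow A a_n \ldots a_2 a_1 \Rightarrow b a_n \ldots a_2 a_1$

can be transformed into the derivation

$A \Rightarrow bB \Rightarrow b a_n B \Rightarrow \cdots \Rightarrow b a_n \ldots a_2 B \Rightarrow b a_n \ldots a_2 a_1$.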
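The correspondence between the two derivations can be spelled out in a small Python sketch that builds both sequences of sentential forms for a given list $a_1, \ldots, a_n$ (with $n \geq 1$; symbols are plain strings, an illustrative choice). Both reach the same word in the same number of steps.

```python
def left_derivation(b, alphas):
    """A => A a1 => A a2 a1 => ... => A an ... a1 => b an ... a1,
    using A -> A a_k and A -> b; alphas = [a1, ..., an]."""
    forms, suffix = [("A",)], ()
    for a in alphas:                     # apply A -> A a_k, for k = 1, ..., n
        suffix = (a,) + suffix
        forms.append(("A",) + suffix)
    forms.append((b,) + suffix)          # finish with A -> b
    return forms


def right_derivation(b, alphas):
    """A => b B => b an B => ... => b an ... a2 B => b an ... a1,
    using A -> b B, B -> a_k B and B -> a_1. Assumes n >= 1."""
    forms, prefix = [("A",), (b, "B")], (b,)
    for a in reversed(alphas[1:]):       # apply B -> a_k B, for k = n, ..., 2
        prefix = prefix + (a,)
        forms.append(prefix + ("B",))
    forms.append(prefix + (alphas[0],))  # finish with B -> a_1
    return forms
```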


Realising the Picture Formally

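[Figure: the two derivation trees from Hopcroft & Ullman’s picture, with the chain of A-steps in the left-recursive tree marked as the A-block and the corresponding chain of B-steps in the right-recursive tree marked as the B-block.]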

Proof by induction on the blocks.


The “left to right” lemma

Result: Language Equivalence

∀g g'. left2Right A B g g' ⇒ L g = L g'


GNF: Step 2 (A-productions to a-productions)

a-productions: let a-productions be rules of the form $A \to a\gamma$, where $a$ is a terminal symbol.

Rules $A_i \to A_j\gamma$ in $g_1$ are replaced by $A_i \to a\beta\gamma$, where $A_j \to a\beta$.


GNF: Step 2 (A-productions to a-productions)

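Look at the nonterminals in decreasing order: $A_j, A_{j-1}, \ldots, A_1$.

Every rule for $A_j$ must be of the form $A_j \to a\gamma$, for some terminal $a$ (no higher-numbered nonterminal can start its right-hand side).

If $A_{j-1} \to b\delta$ for some terminal $b$ and some symbols $\delta$: done.

If $A_{j-1} \to A_j\delta$ for some symbols $\delta$: replace $A_j$ with each $a\gamma$.

Repeat for $A_{j-2}$ down to $A_1$.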
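A Python sketch of Step 2, assuming Step 1 has been done, rules are (lhs, rhs) pairs, and `order` lists $A_1, \ldots, A_m$; the function name is hypothetical.

```python
def a_to_terminal_heads(rules, order):
    """GNF Step 2 sketch, processing order = [A_1, ..., A_m] from A_m down to A_1.
    Assumes Step 1 is done: every A_i-rule starts with a terminal or with some
    A_j, j > i, so A_m's rules already start with a terminal."""
    for ai in reversed(order):
        new = []
        for lhs, rhs in rules:
            if lhs == ai and rhs and rhs[0] in order:
                # A_i -> A_j gamma, with A_j already processed: one substitution
                # per A_j -> a beta gives A_i -> a beta gamma.
                new.extend((ai, beta + rhs[1:]) for l, beta in rules if l == rhs[0])
            else:
                new.append((lhs, rhs))
        rules = new
    return rules
```

Because $A_j$ is handled before $A_i$ whenever $j > i$, a single substitution per rule suffices.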


GNF: Step 3 (B-productions to a-productions)

Rules $B_k \to A_i\gamma$ in $g_2$ are replaced with $B_k \to a\beta\gamma$, where $A_i \to a\beta$.
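A Python sketch of Step 3, assuming Step 2 is done, rules are (lhs, rhs) pairs, and `order` lists the $A_i$ (every other left-hand side is a fresh $B_k$); the function name is hypothetical. The test grammar is what the left-to-right construction produces from A1 → A1 A1 | a.

```python
def b_to_terminal_heads(rules, order):
    """GNF Step 3 sketch: each rule B_k -> A_i gamma becomes B_k -> a beta gamma
    for every A_i -> a beta. Assumes Step 2 is done, so each A_i-rule starts
    with a terminal. Duplicate rules are dropped, keeping first occurrences."""
    out = []
    for lhs, rhs in rules:
        if lhs not in order and rhs and rhs[0] in order:
            out.extend((lhs, beta + rhs[1:]) for l, beta in rules if l == rhs[0])
        else:
            out.append((lhs, rhs))
    return list(dict.fromkeys(out))
```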


The Proof Effort in Summary

≈ 1 year

≈ 14000 lines of code

≈ 700 lemmas and theorems

+ library of common definitions and theorems


Conclusion

Relational idiom for non-determinism

Mechanisation of Chomsky Normal Form

Mechanisation of Greibach Normal Form

Lemma 4.3 — substituting out non-terminal references

Lemma 4.4 — removal of left-recursion

Translation of H&U’s picture into an induction


Hopcroft & Ullman Lemma 4.3

Let A-productions be those productions whose LHS is the nonterminal A.

Lemma (“aProds lemma”)

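Let $G = (V, T, P, S)$ be a CFG. Let $A \to \alpha_1 B \alpha_2$ be a production in $P$ and $B \to \beta_1 \mid \beta_2 \mid \ldots \mid \beta_r$ be the set of all B-productions. Let $G_1 = (V, T, P_1, S)$ be obtained from $G$ by deleting the production $A \to \alpha_1 B \alpha_2$ from $P$ and adding the productions $A \to \alpha_1\beta_1\alpha_2 \mid \alpha_1\beta_2\alpha_2 \mid \ldots \mid \alpha_1\beta_r\alpha_2$. Then $L(G) = L(G_1)$.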
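The lemma's single rewriting step, as a Python sketch: rules are (lhs, rhs) pairs, and the hypothetical parameter `pos` says which occurrence of B in the right-hand side is substituted. This is an illustration of the statement, not the HOL definition.

```python
def a_prods(rules, rule, pos):
    """Given `rule` = (A, alpha1 + (B,) + alpha2) with B at index `pos` of its
    right-hand side, delete it and add A -> alpha1 beta_k alpha2 for every
    B-production B -> beta_k."""
    lhs, rhs = rule
    b = rhs[pos]
    alpha1, alpha2 = rhs[:pos], rhs[pos + 1:]
    rest = [r for r in rules if r != rule]
    return rest + [(lhs, alpha1 + beta + alpha2) for l, beta in rules if l == b]
```

For S → aBc with B → b | dB, the rule S → aBc is replaced by S → abc and S → adBc.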
