14-sep-15 implementing a bnf grammar example design of a data structure


Upload: david-mclaughlin

Post on 28-Dec-2015


TRANSCRIPT

Page 1: 14-Sep-15 Implementing a BNF Grammar Example design of a data structure

Apr 19, 2023

Implementing a BNF Grammar

Example design of a data structure


Form of BNF rules

  <BNF rule> ::= <nonterminal> "::=" <definitions>
  <nonterminal> ::= "<" <words> ">"
  <terminal> ::= <word> | <punctuation mark> | ' " ' <any chars> ' " '
  <words> ::= <word> | <words> <word>
  <word> ::= <letter> | <word> <letter> | <word> <digit>
  <definitions> ::= <definition> | <definitions> "|" <definition>
  <definition> ::= <empty> | <term> | <definition> <term>
  <empty> ::=
  <term> ::= <terminal> | <nonterminal>

Not defined here (but you know what they are): <letter>, <digit>, <punctuation mark>, <any chars> (any printable nonalphabetic character except double quote)


Notes on terminology

A grammar typically has a “top level” element
A string is a “sentential form” if it satisfies the syntax of the top level element
A string is a “sentence” if it is a sentential form and is composed entirely of terminals
A string “belongs to” a grammar if it satisfies the rules of that grammar

BNF was developed by the computer scientists John Backus and Peter Naur around 1959 to 1960, to describe the syntax of ALGOL

Context-free grammars were independently developed in 1956 by the linguist Noam Chomsky to describe human languages

BNF and context-free grammars use slightly different notation, but are completely equivalent


Uses of a grammar

A BNF grammar can be used in two ways:

To generate strings belonging to the grammar
  To do this: start with a string containing a nonterminal;
    while there are still nonterminals in the string {
        replace a nonterminal with one of its definitions
    }

To recognize strings belonging to the grammar
  This is the way programs are compiled: a program is a string belonging to the grammar that defines the language
  Recognition is much harder than generation
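The generation loop above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name GenerationSketch, the toy <greeting> grammar, and the choice to always expand the leftmost nonterminal are all hypothetical and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class GenerationSketch {
    // Hypothetical toy grammar: each nonterminal maps to its alternative definitions.
    static Map<String, List<List<String>>> toyGrammar() {
        Map<String, List<List<String>>> g = new HashMap<>();
        g.put("<greeting>", Arrays.asList(
                Arrays.asList("hello", "<name>"),
                Arrays.asList("goodbye", "<name>")));
        g.put("<name>", Arrays.asList(
                Arrays.asList("world"),
                Arrays.asList("everyone")));
        return g;
    }

    static boolean isNonterminal(String s) {
        return s.startsWith("<") && s.endsWith(">");
    }

    // Start with one nonterminal; while nonterminals remain, replace one
    // with a randomly chosen definition. Assumes every nonterminal that
    // appears is defined in the grammar.
    static List<String> generate(Map<String, List<List<String>>> grammar,
                                 String start, Random random) {
        List<String> form = new ArrayList<>();
        form.add(start);
        int i = 0;
        while (i < form.size()) {
            String symbol = form.get(i);
            if (isNonterminal(symbol)) {
                List<List<String>> alternatives = grammar.get(symbol);
                List<String> chosen = alternatives.get(random.nextInt(alternatives.size()));
                form.remove(i);
                form.addAll(i, chosen);
                // Do not advance: the replacement may itself start with a nonterminal.
            } else {
                i++; // terminals stay where they are
            }
        }
        return form;
    }

    public static void main(String[] args) {
        List<String> sentence = generate(toyGrammar(), "<greeting>", new Random());
        System.out.println(String.join(" ", sentence));
    }
}
```

With a recursive grammar this loop is not guaranteed to stop quickly, since a random choice can keep picking the recursive alternative.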


Sample generation

  <sentence> ::= <noun phrase> <verb phrase>
  <noun phrase> ::= <determiner> <noun>
  <verb phrase> ::= <verb> <noun phrase>
  <determiner> ::= a | the
  <noun> ::= boy | girl | dog
  <verb> ::= chased | heard | saw

Derivation (on the original slide, the nonterminal about to be expanded is underlined and the just-expanded part is shown in blue):

  <sentence>
  <noun phrase> <verb phrase>
  <determiner> <noun> <verb phrase>
  the <noun> <verb phrase>
  the <noun> <verb> <noun phrase>
  the boy <verb> <noun phrase>
  the boy <verb> <determiner> <noun>
  the boy chased <determiner> <noun>
  the boy chased <determiner> girl
  the boy chased the girl


Generating sentences

I want to write a program that reads in a grammar, stores it in some appropriate data structure, then generates random sentences belonging to that grammar

I need to decide:
  How to store the grammar
  What operations to provide on the grammar

These decisions are intertwined!
  How I store the grammar determines what operations are easy and efficient (and even possible!)


Philosophy

“Bad programmers worry about the code. Good programmers worry about data structures and their relationships.” --Linus Torvalds, creator of Linux

“Smart data structures and dumb code works a lot better than the other way around.” --Eric Raymond, in The Cathedral and the Bazaar


Data structures for sentence generation

BNF, and sentence generation using BNF, are defined in terms of string manipulation
  We could write a sentence generation program using only strings, but it would be unnecessarily difficult

Since we want to repeatedly look up nonterminals to find their definitions, a map from nonterminals (keys) to definitions (values) is the obvious thing to do
  For speed, we should use a HashMap; but to print the nonterminals in alphabetical order, we should use a TreeMap

Since a nonterminal may have an arbitrary number of definitions, those various definitions can be stored as a set or a list
  Lists are somewhat easier to work with; in particular, it’s easier to choose a random value from an ArrayList

Each individual definition is some number of terminals and/or nonterminals, in a specific order, so a list is most appropriate

So to store our BNF, we use a TreeMap<String, ArrayList<ArrayList<String>>>

Notice that this structure would not be useful for recognizing sentences
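As a rough sketch of this storage choice, the sample sentence grammar could be loaded into a TreeMap<String, ArrayList<ArrayList<String>>> as shown here. The class name GrammarStorageSketch and the helpers define and randomDefinition are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class GrammarStorageSketch {
    // Hypothetical helper: record one alternative definition for a nonterminal.
    static void define(TreeMap<String, ArrayList<ArrayList<String>>> grammar,
                       String nonterminal, String... symbols) {
        grammar.computeIfAbsent(nonterminal, k -> new ArrayList<>())
               .add(new ArrayList<>(Arrays.asList(symbols)));
    }

    static TreeMap<String, ArrayList<ArrayList<String>>> sampleGrammar() {
        TreeMap<String, ArrayList<ArrayList<String>>> g = new TreeMap<>();
        define(g, "<sentence>", "<noun phrase>", "<verb phrase>");
        define(g, "<noun phrase>", "<determiner>", "<noun>");
        define(g, "<verb phrase>", "<verb>", "<noun phrase>");
        for (String w : new String[] {"a", "the"}) define(g, "<determiner>", w);
        for (String w : new String[] {"boy", "girl", "dog"}) define(g, "<noun>", w);
        for (String w : new String[] {"chased", "heard", "saw"}) define(g, "<verb>", w);
        return g;
    }

    // Choosing a random definition is a single index lookup on an ArrayList.
    static ArrayList<String> randomDefinition(TreeMap<String, ArrayList<ArrayList<String>>> grammar,
                                              String nonterminal, Random random) {
        ArrayList<ArrayList<String>> definitions = grammar.get(nonterminal);
        return definitions.get(random.nextInt(definitions.size()));
    }

    public static void main(String[] args) {
        TreeMap<String, ArrayList<ArrayList<String>>> g = sampleGrammar();
        // A TreeMap iterates its keys in sorted order, so the listing is alphabetical.
        for (Map.Entry<String, ArrayList<ArrayList<String>>> e : g.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        System.out.println(randomDefinition(g, "<noun>", new Random()));
    }
}
```

Swapping TreeMap for HashMap would change only the iteration order of the printed listing, not the lookups.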


<sentence> ::= <noun phrase> <verb phrase>

<noun phrase> ::= <determiner> <noun>

<verb phrase> ::= <verb> <noun phrase>

<determiner> ::= a | the

<noun> ::= boy | girl | dog

<verb> ::= chased | heard | saw


Data structures for sentential forms

In addition to storing the BNF grammar, we need a data structure to hold sentential forms. Recall that:
  A grammar typically has a “top level” element
  A string is a “sentential form” if it satisfies the syntax of the top level element

So as we develop <sentence> → <noun phrase> <verb phrase> → <determiner> <noun> <verb phrase> → … → the boy chased the girl, we need a data structure that allows us to easily change each of these sentential forms into the next

Two possible structures seem especially appropriate:
  A tree, with <sentence> as the root
  A list of terminals and nonterminals

Considering the operations required (expanding nonterminals, printing the final result), both of these seem about equally easy
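To make the tree option concrete, here is a minimal sketch in which each node holds one symbol and expanding a nonterminal adds its chosen definition as children; the finished sentence is read off the leaves. The names DerivationTreeSketch, Node, expand and collectLeaves are assumptions made for this example, and the grammar is the sample sentence grammar typed in by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class DerivationTreeSketch {
    // One symbol of a sentential form, with its expansion (if any) as children.
    static class Node {
        final String symbol;
        final List<Node> children = new ArrayList<>();
        Node(String symbol) { this.symbol = symbol; }
    }

    static Map<String, List<List<String>>> sampleGrammar() {
        Map<String, List<List<String>>> g = new HashMap<>();
        g.put("<sentence>", Arrays.asList(Arrays.asList("<noun phrase>", "<verb phrase>")));
        g.put("<noun phrase>", Arrays.asList(Arrays.asList("<determiner>", "<noun>")));
        g.put("<verb phrase>", Arrays.asList(Arrays.asList("<verb>", "<noun phrase>")));
        g.put("<determiner>", Arrays.asList(Arrays.asList("a"), Arrays.asList("the")));
        g.put("<noun>", Arrays.asList(Arrays.asList("boy"), Arrays.asList("girl"), Arrays.asList("dog")));
        g.put("<verb>", Arrays.asList(Arrays.asList("chased"), Arrays.asList("heard"), Arrays.asList("saw")));
        return g;
    }

    // Expand a node in place: symbols with no entry in the grammar are terminals.
    static void expand(Node node, Map<String, List<List<String>>> grammar, Random random) {
        List<List<String>> alternatives = grammar.get(node.symbol);
        if (alternatives == null) {
            return; // terminal: stays a leaf
        }
        List<String> chosen = alternatives.get(random.nextInt(alternatives.size()));
        for (String symbol : chosen) {
            Node child = new Node(symbol);
            node.children.add(child);
            expand(child, grammar, random);
        }
    }

    // Left-to-right walk collecting the terminals, which form the sentence.
    static void collectLeaves(Node node, Map<String, List<List<String>>> grammar, List<String> out) {
        if (!grammar.containsKey(node.symbol)) {
            out.add(node.symbol);
        }
        for (Node child : node.children) {
            collectLeaves(child, grammar, out);
        }
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> grammar = sampleGrammar();
        Node root = new Node("<sentence>");
        expand(root, grammar, new Random());
        List<String> words = new ArrayList<>();
        collectLeaves(root, grammar, words);
        System.out.println(String.join(" ", words));
    }
}
```

The tree keeps the whole derivation history, while a flat list keeps only the current sentential form; for printing the final sentence either is enough.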


Development approaches

Bad approach: Design a general representation for grammars and a complete set of operations on them
  Actually, this is a good approach if you are writing a general-purpose package for general use, for example, for inclusion in the Java API
  Otherwise, it just makes your program much more complex

Good approach: Decide the operations you need for this program, and design a representation for grammars that supports these operations
  It’s a nice bonus if the design can later be extended for other purposes
  Remember the Extreme Programming slogan YAGNI: “You ain’t gonna need it.”

Our program will generate sentences, not recognize them!


Requirements and constraints

We need to read the grammar in
  But we don’t need to modify it later
  Any tools for building the grammar structure can be private

We need to look up the definitions of nonterminals
  We need this because we will need to replace each nonterminal with one of its definitions

We need to know the top level element of the grammar
  But we can just assume that we know what it is
  For example, we can insist that the top-level element be <sentence>


First cut

  public class Grammar implements Iterable
      List<String> rule;                        // a single alternative for a nonterminal
      List<List<String>> definition;            // all the rules for one nonterminal
      Map<String, List<List<String>>> grammar;  // rules for all the nonterminals

      public Grammar() { grammar = new TreeMap<String, List<List<String>>>(); }
      public void addRule(String rule) throws IllegalArgumentException
      public List<List<String>> getDefinition(String nonterminal)
      public List<String> getOneRule(String nonterminal)  // random choice
      public Iterator iterator()
      public void print()


First cut: Evaluation

Advantages
  Small, easily learned interface
  Just one class
  Can be made to work

Disadvantages
  As designed, <foo> ::= bar | baz is two rules, requiring two calls to addRule; hence requires caller to do some of the parsing, to separate out the left-hand side
  Requires some fairly complicated use of generics
    ArrayList implements List (hence is a List), but consider:
      List<List<String>> definition = makeList();
    This statement is legal if makeList() returns an ArrayList<List<String>>
    It is not legal if makeList() returns an ArrayList<ArrayList<String>>
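The generics point can be checked with a small sketch. Java generics are invariant, so ArrayList<ArrayList<String>> is not a subtype of List<List<String>>, although it can be assigned to the wildcard type List<? extends List<String>>. The class and method names used here are hypothetical stand-ins for makeList().

```java
import java.util.ArrayList;
import java.util.List;

class NestedGenericsSketch {
    // Outer list is an ArrayList, but its element type is declared as List<String>.
    static ArrayList<List<String>> makeOuterArrayList() {
        ArrayList<List<String>> outer = new ArrayList<>();
        List<String> inner = new ArrayList<>();
        inner.add("<digit>");
        outer.add(inner);
        return outer;
    }

    // Both levels are declared as ArrayList.
    static ArrayList<ArrayList<String>> makeNestedArrayList() {
        ArrayList<ArrayList<String>> outer = new ArrayList<>();
        ArrayList<String> inner = new ArrayList<>();
        inner.add("<digit>");
        outer.add(inner);
        return outer;
    }

    public static void main(String[] args) {
        // Legal: ArrayList<List<String>> is a List<List<String>>.
        List<List<String>> definition = makeOuterArrayList();

        // Not legal: uncommenting the next line gives an "incompatible types" compile error.
        // List<List<String>> rejected = makeNestedArrayList();

        // Legal with a wildcard, at the cost of not being able to add inner lists through it.
        List<? extends List<String>> view = makeNestedArrayList();

        System.out.println(definition + " " + view);
    }
}
```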


Second cut: Overview

We can eliminate the compound generics by using more than one class

  public class Grammar implements Iterable
      Map<String, Definition> grammar;  // all the rules

  public class Definition
      List<Rule> definition;

  public class Rule
      String lhs;        // the definiendum
      List<String> rhs;  // the definiens


Second cut: More detail

  public class Grammar implements Iterable
      Map<String, Definition> grammar;  // rules for all the nonterminals
      public Grammar() { grammar = new TreeMap<String, Definition>(); }  // constructor
      public void addRule(String rule) throws IllegalArgumentException
      public Definition getDefinition(String nonterminal)
      public Iterator iterator()
      public void print()

  public class Definition
      List<Rule> definition;  // all definitions for some unspecified nonterminal
      Definition()  // constructor
      void addRule(Rule rule)
      Rule getOneRule()
      public String toString()

  public class Rule
      String lhs;        // the definiendum
      List<String> rhs;  // the definiens
      Rule(String text)  // constructor
      public String getLeftHandSide()
      public List<String> getRightHandSide()
      public String toString()


Second cut: Evaluation

Advantages:
  Simplifies use of generics

Disadvantages:
  Many more methods
  Definitions are “unattached” from the nonterminal being defined
    This makes it easier to parse definitions
    Seems a bit unnatural
    Need to pass the tokenizer around as an additional argument
  Doesn’t help with the problem that the caller still has to separate out the definiendum from the definiens


Third (very brief) cut

Definition and Rule are basically both just lists of strings
Why not just have them implement List?
Methods to implement:

  public boolean add(Object o)
  public void add(int index, Object element)
  public boolean addAll(Collection c)
  public boolean addAll(int index, Collection c)
  public void clear()
  public boolean contains(Object o)
  public boolean containsAll(Collection c)
  public Object get(int index)
  public int indexOf(Object o)
  public boolean isEmpty()
  public Iterator iterator()
  public int lastIndexOf(Object o)
  public ListIterator listIterator()
  public ListIterator listIterator(int index)
  public boolean remove(Object o)
  public Object remove(int index)
  public boolean removeAll(Collection c)
  public boolean retainAll(Collection c)
  public Object set(int index, Object element)
  public int size()
  public List subList(int fromIndex, int toIndex)
  public Object[] toArray()
  public Object[] toArray(Object[] a)


Fourth cut, not quite as brief

The class AbstractList “provides a skeletal implementation of the List interface...the programmer needs only to extend this class and provide implementations for the get(int index) and size() methods.”

I tried this, but...
  If I don’t know how AbstractList is implemented, how can I write these methods?
  No book or API class that I looked at provided any clues
  I may be missing something, but it looks like the only thing to do is to look at the source code for some of Java’s classes (like ArrayList) to see how they do it

Doable, but too much work!
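For comparison, one possible way to extend AbstractList is to delegate to a private ArrayList, as in the sketch that follows. This is not what the slides ended up doing, and the class name SymbolList is hypothetical. The AbstractList documentation asks for get(int) and size() for a read-only list, plus set(int, E) for a modifiable one and add(int, E) and remove(int) for a variable-size one.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical list of grammar symbols that stores its elements in a private ArrayList.
class SymbolList extends AbstractList<String> {
    private final List<String> items = new ArrayList<>();

    // The two methods required for any AbstractList subclass.
    @Override public String get(int index) { return items.get(index); }
    @Override public int size() { return items.size(); }

    // Needed only because this list should be modifiable and variable-size.
    @Override public String set(int index, String element) { return items.set(index, element); }
    @Override public void add(int index, String element) { items.add(index, element); }
    @Override public String remove(int index) { return items.remove(index); }

    public static void main(String[] args) {
        SymbolList definition = new SymbolList();
        definition.add("<unsigned integer>"); // inherited add(E) calls add(size(), E)
        definition.add("<digit>");
        System.out.println(definition + " has " + definition.size() + " symbols");
    }
}
```

Everything else in the List interface (contains, indexOf, iterator, toString and so on) is inherited from AbstractList and its superclasses, built on top of these few methods.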


Letting go of a constraint

It is good practice to use a more general class or interface if you don’t need the services of a more specific class

In this problem, I want to use lists, but I don’t care whether they are ArrayLists, or LinkedLists, or something else

Hence, I generally prefer declarations like List<String> list = new ArrayList<String>();

In this case, however, trying to do this just seems to be the cause of many of the problems

What happens if I just make all lists ArrayLists?


Fifth (and final) cut

  public class Grammar
      Map<String, ListOfDefinitions> grammar;  // rules for all the nonterminals

      public Grammar() { grammar = new TreeMap<String, ListOfDefinitions>(); }

      public void addRule(String rule) throws IllegalArgumentException
      public ListOfDefinitions getDefinitions(String nonterminal)
      public void print()
      public static boolean isNonterminal(String s) {
          return s.startsWith("<") && s.endsWith(">") && s.length() > 2;
      }

      private void addToGrammar(String lhs, SingleDefinition definition)

  public class ListOfDefinitions extends ArrayList<SingleDefinition>
      @Override public String toString()

  public class SingleDefinition extends ArrayList<String>
      @Override public String toString()
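A minimal sketch of how this outline might be filled in is given here. The class and method names follow the outline, but every method body is an assumption: in particular, addRule splits the rule with a simple regular expression instead of the BnfTokenizer used in the slides, so it assumes that neither ::= nor | occurs inside a quoted terminal on the right-hand side.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleDefinition extends ArrayList<String> {
    @Override public String toString() { return String.join(" ", this); }
}

class ListOfDefinitions extends ArrayList<SingleDefinition> {
    @Override public String toString() {
        StringBuilder sb = new StringBuilder();
        for (SingleDefinition definition : this) {
            if (sb.length() > 0) sb.append(" | ");
            sb.append(definition);
        }
        return sb.toString();
    }
}

class Grammar {
    // Assumed symbol syntax: a (possibly multi-word) <nonterminal>, or a run of non-blanks.
    private static final Pattern SYMBOL = Pattern.compile("<[^>]*>|\\S+");

    private final Map<String, ListOfDefinitions> grammar = new TreeMap<>();

    // Parses "<lhs> ::= alt1 | alt2 ..." and records each alternative.
    public void addRule(String rule) throws IllegalArgumentException {
        String[] sides = rule.split("::=", 2);
        if (sides.length != 2 || !isNonterminal(sides[0].trim())) {
            throw new IllegalArgumentException("Not a BNF rule: " + rule);
        }
        String lhs = sides[0].trim();
        for (String alternative : sides[1].split("\\|", -1)) {
            SingleDefinition definition = new SingleDefinition();
            Matcher m = SYMBOL.matcher(alternative);
            while (m.find()) {
                definition.add(m.group());
            }
            addToGrammar(lhs, definition); // an empty alternative becomes an empty list
        }
    }

    public ListOfDefinitions getDefinitions(String nonterminal) {
        return grammar.get(nonterminal); // null if the nonterminal is undefined
    }

    public void print() {
        for (Map.Entry<String, ListOfDefinitions> entry : grammar.entrySet()) {
            System.out.println(entry.getKey() + " ::= " + entry.getValue());
        }
    }

    public static boolean isNonterminal(String s) {
        return s.startsWith("<") && s.endsWith(">") && s.length() > 2;
    }

    private void addToGrammar(String lhs, SingleDefinition definition) {
        grammar.computeIfAbsent(lhs, k -> new ListOfDefinitions()).add(definition);
    }

    public static void main(String[] args) {
        Grammar g = new Grammar();
        g.addRule("<unsigned integer> ::= <digit> | <unsigned integer> <digit>");
        g.print();
    }
}
```

Because addToGrammar appends to any existing entry, calling addRule twice with the same left-hand side accumulates definitions in that entry and never overwrites them.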


Explanation I of final BNF API

Example: <unsigned integer> ::= <digit> | <unsigned integer> <digit>

The above is a rule
  <unsigned integer> is the definiendum (the thing being defined)
  <digit> is a single definition of <unsigned integer>
  <unsigned integer> <digit> is another single definition of <unsigned integer>

So,
  There is a SingleDefinition consisting of the ArrayList [ "<digit>" ]
  Another SingleDefinition consists of the ArrayList [ "<unsigned integer>", "<digit>" ]
  A ListOfDefinitions object is a list of single definitions, in this case:
    [ [ "<digit>" ], [ "<unsigned integer>", "<digit>" ] ]
  A Grammar maps nonterminals onto their definitions; thus, a grammar containing the above rule would include the mapping:
    "<unsigned integer>" → [ [ "<digit>" ], [ "<unsigned integer>", "<digit>" ] ]


Explanation II of final BNF API

A Grammar is a set of mappings from definienda (nonterminals) to definitions, along with some operations on that set of definitions

You can addRule(String rule) to a Grammar
  The rule is parsed, and an entry made in the map
  Definitions for a nonterminal may be together, as in the above example, or separate:
    <unsigned integer> ::= <digit>
    <unsigned integer> ::= <unsigned integer> <digit>

You can get the ListOfDefinitions for a given nonterminal

You can print the complete Grammar


Final version: Evaluation

Advantages:
  Grammar has one constructor and three public methods
  ListOfDefinitions and SingleDefinition are just ArrayLists, so there are no new methods to learn
  All rule parsing is consolidated into a single public method, addRule(String rule)
  I was able to come up with more meaningful names for classes

Disadvantages:
  User has to do a bit more list manipulation; in particular, choosing a random element from a list
    This doesn’t seem like an appropriate thing to have in a grammar, anyway
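The list manipulation left to the user is small. Here is a sketch of choosing a random element, assuming a hypothetical helper named chooseRandom that lives outside the grammar classes:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class RandomChoiceSketch {
    // Returns one element of a non-empty list, chosen uniformly at random.
    static <T> T chooseRandom(List<T> items, Random random) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("Cannot choose from an empty list");
        }
        return items.get(random.nextInt(items.size()));
    }

    public static void main(String[] args) {
        List<String> nouns = Arrays.asList("boy", "girl", "dog");
        System.out.println(chooseRandom(nouns, new Random()));
    }
}
```

Since the helper is generic over List<T>, the same call would work on a list of definitions as well as on a list of strings.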


Morals

“Weeks of programming can save you hours of planning.”

The mistake most programmers make is to use the first design that comes to mind
  This usually can be made to work, but it’s seldom optimal

Much as we would like to pretend otherwise, programming is an iterative process: we design, then try to implement, then change the design, then try to implement....

TDD (Test-Driven Development) is a “lightweight” (low cost) way to try out a design
  For example, in my first design, I discovered how difficult it was to write tests that used the complex generics
  Consequently, I never even tried to implement this first design

Morals to take home:
  Be flexible; try out more than one design
  Do TDD


Aside: Tokenizing the input grammar

I wrote a BnfTokenizer class that returns every token as a String
  Nonterminals keep their angle brackets, and may be multi-word
  Double-quoted strings are returned as a single token (minus the double quotes)
  ::= and | are returned as single tokens

BnfTokenizer uses StreamTokenizer

It provides two constructors, BnfTokenizer() and BnfTokenizer(String text)

And two methods, void tokenize(text) and String nextToken()
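A tokenizer with the same two constructors and two methods could look something like the following sketch. It is not the author's class: it swaps StreamTokenizer for a regular expression, and the choice to return null from nextToken() once the tokens run out is an assumption, since the slides do not say what happens at the end of input.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BnfTokenizer {
    // Alternatives, tried in order:
    //   <nonterminal>, "quoted string" (group 1 holds the contents), ::=, |,
    //   or any other run of characters without blanks, brackets, quotes or bars.
    private static final Pattern TOKEN = Pattern.compile(
            "<[^<>]+>|\"([^\"]*)\"|::=|\\||[^\\s<>\"|]+");

    private final Deque<String> tokens = new ArrayDeque<>();

    BnfTokenizer() { }

    BnfTokenizer(String text) { tokenize(text); }

    // Replaces any pending tokens with the tokens of the given text.
    void tokenize(String text) {
        tokens.clear();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Quoted strings are stored without their double quotes.
            tokens.add(m.group(1) != null ? m.group(1) : m.group());
        }
    }

    // Returns the next token, or null when none are left (an assumption of this sketch).
    String nextToken() {
        return tokens.poll();
    }

    public static void main(String[] args) {
        BnfTokenizer tokenizer = new BnfTokenizer("<unsigned integer> ::= <digit> | \"a b\"");
        for (String t = tokenizer.nextToken(); t != null; t = tokenizer.nextToken()) {
            System.out.println("[" + t + "]");
        }
    }
}
```

Characters that match none of the alternatives, such as a stray unmatched angle bracket, are silently skipped by this version; a stricter tokenizer would report them.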


The End