Post on 22-Dec-2015
LANGUAGE AND GRAMMARS
© University of LiverpoolCOMP 319 slide 1
Contents
• Languages and Grammars
• Formal languages
• Formal grammars
• Generative grammars
• Analytic grammars
• Context-free grammars
• LL parsers
• LR parsers
• Rewrite systems
• L-systems
Software Engineering Foundation
Software engineering may be summarised by saying that it concerns the construction of programs to solve problems and that there are three parts:
- Construction/engineering, and methods
- Problems, and problem solving, and
- Programs
Languages and grammar
• Languages are spoken and written (linguistics)
• To be effective they must be based on a shared set of rules – a grammar
• Grammars are introspective: they are based on and couched in language
• Natural language grammars are constantly shifting and locally negotiated
• A grammar is a formal language in which the rules of discourse are discussed and are the aim
Formal language concepts
• The concept emerges because of the need to define rules (for language)
• Formally, formal languages are collections of words composed of smaller, atomic units
• Issues of concern are
- the number and nature of the atomic units,
- the precision level required,
- the completeness of the formalism
Examples of formal languages
• The set of all words over {a, b}
• The set {a^n : n is a prime number}
• The set of syntactically correct programs in a given computer programming language
• The set of inputs upon which a certain Turing machine halts
Formal language specification
There are many ways in which a formal language can be specified e.g.
• strings produced in a formal grammar
• strings produced by regular expressions
• the strings accepted by automata
• logic and other formalisms
Language Production Operations
• Concatenation of strings drawn from the two languages
• Intersection (strings common to both languages) or union (strings in either language)
• Complement of one language
• Right quotient of one by the other
• Kleene star operation on one language
• Reverse of a language
• Shuffle combination of languages
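As a rough illustration, several of these operations can be sketched on small finite languages represented as Python sets of strings. The function names and the `max_len` bound on the Kleene star are assumptions made for this sketch, not part of the slides. Complement and shuffle are left out because they need a fixed universe of strings and interleaving logic respectively.

```python
def concat(l1, l2):
    """Concatenation: every string of l1 followed by every string of l2."""
    return {u + v for u in l1 for v in l2}


def reverse(lang):
    """Reverse of a language: each word written backwards."""
    return {w[::-1] for w in lang}


def right_quotient(l1, l2):
    """Right quotient l1 / l2: all u such that u + v is in l1 for some v in l2."""
    return {w[:len(w) - len(v)] for w in l1 for v in l2 if w.endswith(v)}


def kleene_star(lang, max_len):
    """Kleene star, truncated to words of at most max_len symbols
    (the full star is infinite for any language with a non-empty word)."""
    result = {""}
    frontier = {""}
    while frontier:
        longer = {u + v for u in frontier for v in lang if len(u + v) <= max_len}
        frontier = longer - result   # keep only words not seen before
        result |= frontier
    return result


L1 = {"a", "ab"}
L2 = {"b", ""}
print(sorted(concat(L1, L2)))           # concatenation
print(sorted(L1 | L2), sorted(L1 & L2)) # union and intersection are plain set operations
print(sorted(kleene_star({"ab"}, 4)))   # star of {ab}, words up to length 4
```

Union and intersection need no helper because Python sets already provide them.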
Formal Grammars
• Noam Chomsky
- Linguist, philosopher at MIT
- 1956, papers on information and grammar
• Types of formal grammar
- Generative grammar
- Analytic grammar
Generative formal grammars
• Generative grammars: a set of rules by which all possible strings in the language being described can be generated by successively rewriting strings, starting from a designated start symbol.
In effect it formalises an algorithm that generates strings in the language.
Analytic formal grammars
• Analytic grammars: a set of rules that takes an arbitrary string as input and successively reduces or analyses that string to yield a final boolean “yes/no” that indicates whether the string is a member of the language described by the grammar.
In effect a parser or recogniser for a language
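A minimal sketch of the analytic view, assuming the language {a^n b^n : n ≥ 0} as the target. The function name `recognise` and the reduction rule it uses (strip a matching outer a ... b pair, i.e. S → a S b | ε read backwards) are choices made for this illustration only.

```python
def recognise(s):
    """Return True if s is n copies of 'a' followed by n copies of 'b'.

    The string is reduced step by step: each pass removes one leading 'a'
    together with one trailing 'b'. The input is accepted only if this
    reduction ends in the empty string.
    """
    while s.startswith("a") and s.endswith("b"):
        s = s[1:-1]
    return s == ""


for word in ["", "ab", "aabb", "abab", "aab", "ba"]:
    print(repr(word), recognise(word))
```

The result is the boolean "yes/no" answer described above; no derivation or parse tree is built.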
Generative grammar components
Chomsky’s definition – essentially for linguistics but perfect for formal computing grammars; consists of the following components:
- A finite set N of nonterminal symbols
- A finite set Σ of terminal symbols disjoint from N
- A finite set P of production rules where a rule is of the form: string in (Σ ∪ N)* → string in (Σ ∪ N)*
- A symbol S in N that is identified as the start symbol
Generative grammar definition
• The language of a formal grammar G = (N, Σ, P, S) is denoted by L(G)
• It is defined as all those strings over Σ that can be generated by starting from the symbol S and then applying the production rules in P until no more nonterminal symbols are present
A generative formal grammar
• Given the terminals {a, b} and nonterminals {S, A, B}, where S is the special start symbol
• Productions:
S → ABS
S → ε (the empty string)
BA → AB
BS → b
Bb → bb
Ab → ab
Aa → aa
Defines all the words of the form a^n b^n (i.e. n copies of a followed by n copies of b)
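The generative reading of these productions can be checked with a small breadth-first search over sentential forms. This is only a sketch: the names `RULES` and `generate` are hypothetical, and the length bound relies on the observation that, for this particular grammar, a sentential form is never more than one symbol longer than the word it eventually yields.

```python
from collections import deque

# The seven productions from the slide, as (left-hand side, right-hand side).
RULES = [("S", "ABS"), ("S", ""), ("BA", "AB"), ("BS", "b"),
         ("Bb", "bb"), ("Ab", "ab"), ("Aa", "aa")]


def generate(max_len):
    """Return all terminal words of length <= max_len derivable from S."""
    seen = {"S"}
    queue = deque(["S"])
    words = set()
    while queue:
        form = queue.popleft()
        if all(ch in "ab" for ch in form):   # no nonterminals left
            words.add(form)
            continue
        for lhs, rhs in RULES:
            pos = form.find(lhs)
            while pos != -1:                 # try every occurrence of lhs
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len + 1 and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return sorted(words, key=len)


print(generate(6))
```

Forms such as ABab, where a B is stranded in front of an a, are reached by the search but can never be completed, so they contribute no words.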
Context Free Grammars
• Theoretical basis of most programming languages.
• Easy to generate a parser using a compiler compiler.
• Two main approaches exist: top-down parsing e.g. LL parsers, and bottom-up parsing e.g. LR parsers.
LL parser
• Table-based, top-down parser for a subset of the context-free grammars (LL grammars).
• Parsing is Left to right, and constructs a Leftmost derivation of the sentence.
• LL(k) parsers use k tokens of look-ahead to parse the LL(k) grammar sentence.
• LL(1) grammars are popular and fast because only the next token is considered in parsing decisions.
Table based LL parsing
Architecture:
  Input buffer: <null>
                  |
                  |
  Stack    +-------------+
    S <----|   Parser    |----> Output
    $      +-------------+
                  ^
                  |
            +-----------+
            |  Parsing  |
            |   table   |
            +-----------+
• Consider the grammar
1. S → F
2. S → ( S + F )
3. F → 1
• This has the parsing table
       (   )   1   +   $
   S   2   -   1   -   -
   F   -   -   3   -   -
e.g. input 1 with S on top of the stack implies rule 1, i.e. S on the stack is replaced with F and rule number 1 is output
Stack top and input the same = delete both
Stack top and input different = error
• Example input
( 1 + 1 ) $
Table based LL parsing
input  stack        action          output
(      S$           parse ( S : 2   2
(      (S + F)$     ( ( delete      2
1      S + F)$      parse 1 S : 1   21
1      F + F)$      parse 1 F : 3   213
1      1 + F)$      1 1 delete      213
+      + F)$        + + delete      213
1      F)$          parse 1 F : 3   2133
1      1)$          1 1 delete      2133
)      )$           ) ) delete      2133
$      $            stop            2133
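The trace can be reproduced with a short table-driven parser. The sketch below hard-codes the three rules and the parsing table from the slide; the names `RULES`, `TABLE` and `ll1_parse` are assumptions for illustration, and tokens are taken to be single characters.

```python
# Grammar rules by number: right-hand sides as lists of symbols.
RULES = {1: ["F"], 2: ["(", "S", "+", "F", ")"], 3: ["1"]}
# Parsing table: (nonterminal on stack, lookahead token) -> rule number.
TABLE = {("S", "("): 2, ("S", "1"): 1, ("F", "1"): 3}
NONTERMINALS = {"S", "F"}


def ll1_parse(tokens):
    """Return the list of rule numbers applied, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]      # end-of-input marker
    stack = ["$", "S"]                 # top of the stack is the last element
    output = []
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rule = TABLE.get((top, look))
            if rule is None:
                raise SyntaxError(f"no rule for {top!r} on input {look!r}")
            output.append(rule)
            stack.extend(reversed(RULES[rule]))   # push RHS, leftmost on top
        elif top == look:
            pos += 1                   # stack and input the same: delete both
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return output


print(ll1_parse("(1+1)"))   # rule numbers in the order they are applied
```

Each step inspects only the next token, which is what makes the grammar LL(1).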
Parse Tree
Left Right Parser
• Bottom-up parser for context-free grammars, used by many programming language compilers
• Parsing is Left to right, and produces a Rightmost derivation (in reverse).
• LR(k) parsers use k tokens of look-ahead.
• LR(1) is the most common type of parser used by many programming languages. It is almost always generated using a parser generator which constructs the parsing table; e.g. Simple LR parser (SLR), Look Ahead LR (LALR) e.g. Yacc, Canonical LR.
Left Right parser example..
• Rules
(1) E → E * B
(2) E → E + B
(3) E → B
(4) B → 0
(5) B → 1
Left Right parser example
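The table shown on this slide did not survive in the transcript, so the sketch below uses an LR(0) action/goto table constructed by hand for the five rules on the previous slide. The state numbering and the names `ACTION`, `GOTO` and `lr_parse` are assumptions of this sketch and may differ from the original figure.

```python
TERMINALS = ["*", "+", "0", "1", "$"]
# Rule number -> (left-hand side, number of symbols on the right-hand side).
RULES = {1: ("E", 3), 2: ("E", 3), 3: ("E", 1), 4: ("B", 1), 5: ("B", 1)}

ACTION = {}
for state in (0, 5, 6):                       # states that expect a B next
    ACTION[(state, "0")] = ("shift", 1)
    ACTION[(state, "1")] = ("shift", 2)
ACTION[(3, "*")] = ("shift", 5)
ACTION[(3, "+")] = ("shift", 6)
ACTION[(3, "$")] = ("accept", 0)
for state, rule in ((1, 4), (2, 5), (4, 3), (7, 1), (8, 2)):
    for t in TERMINALS:                       # LR(0): reduce on any lookahead
        ACTION[(state, t)] = ("reduce", rule)

GOTO = {(0, "E"): 3, (0, "B"): 4, (5, "B"): 7, (6, "B"): 8}


def lr_parse(tokens):
    """Return the rule numbers in the order they are reduced."""
    tokens = list(tokens) + ["$"]
    stack = [0]                               # stack of parser states
    output = []
    pos = 0
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))
        if action is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            head, length = RULES[arg]
            del stack[-length:]               # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], head)])
            output.append(arg)
        else:                                 # accept
            return output


print(lr_parse("1+1"))
```

Read backwards, the reduction sequence is a rightmost derivation of the input, which is where the R in LR comes from.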
Re-writing
• Rewriting is a general process involving strings and alphabets. It is classified according to what is rewritten, e.g. strings, terms, graphs, etc.
• A rewrite system is a set of equations that characterises a system of computation. It provides one method of automating theorem proving and is based on the use of rewrite rules.
• Examples of practical systems that use this approach include the software Mathematica.
Re-writing logic example
• ! ! A = A // eliminate double negative
• !(A AND B) = !A OR !B // De Morgan's law
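A minimal sketch of how the two rules might be applied mechanically, assuming terms are encoded as nested tuples such as ("not", ("and", "A", "B")) with variables as plain strings. The encoding and the function name `rewrite` are hypothetical choices for this example.

```python
def rewrite(term):
    """Apply double-negation elimination and De Morgan's law bottom-up."""
    if isinstance(term, str):          # a variable: nothing to rewrite
        return term
    op, *args = term
    args = [rewrite(a) for a in args]  # rewrite sub-terms first
    if op == "not":
        inner = args[0]
        if isinstance(inner, tuple) and inner[0] == "not":
            return inner[1]            # !!A = A
        if isinstance(inner, tuple) and inner[0] == "and":
            # !(A AND B) = !A OR !B; the new negations may simplify further
            return ("or", rewrite(("not", inner[1])), rewrite(("not", inner[2])))
    return (op, *args)


print(rewrite(("not", ("not", "A"))))
print(rewrite(("not", ("and", ("not", "A"), "B"))))
```

Both rules are used in one direction only (left to right), which is what turns the equations into a terminating rewrite system.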
Re-writing in Mathematica (Wolfram)
L-systems
• Named after Aristid Lindenmayer (1925-1989), a Hungarian theoretical biologist and botanist who worked at the University of Utrecht (Netherlands)
• Are a formal grammar used to model the growth and morphology of plants and animals
• In plant and animal modelling a special form, the parametric L-system is used – based on rewriting.
• Because of their recursive, parallel, and unlimited nature they lead to concepts of self-similarity and fractional dimension and fractal-like forms.
L-system structure
• The basic system is identical to formal grammars: G = {V, S, Ω, P}
• where
G is the grammar defined
V (the alphabet) is a set of symbols that can be replaced (variables)
S is a set of symbols that remain fixed (constants)
Ω (start, axiom or initiator) is a string from V, the initial state
P is a set of rules or productions defining the ways variables can be replaced by constants and other variables. Each rule consists of a LHS (predecessor) and RHS (successor)
Example 1: Fibonacci numbers
• V: A B
• C: none
• Ω: A
• P: p1: A → B, p2: B → AB
N=0 A
N=1 → B
N=2 → AB
N=3 → BAB
N=4 → ABBAB
N=5 → BABABBAB
N=6 → ABBABBABABBAB
N=7 → BABABBAB...
Counting lengths we get: 1, 1, 2, 3, 5, 8, 13, 21, ...
The Fibonacci numbers
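The generations above can be reproduced with a few lines of code. The sketch assumes rules are stored in a dictionary and that every symbol of the current word is rewritten at once in each step, which is what distinguishes an L-system from a sequential grammar derivation. The names `rewrite_step` and `generations` are illustrative only.

```python
def rewrite_step(word, rules):
    """Rewrite every symbol in parallel; symbols without a rule stay fixed."""
    return "".join(rules.get(ch, ch) for ch in word)


def generations(axiom, rules, n):
    """Return the words for N = 0 .. n, starting from the axiom."""
    words = [axiom]
    for _ in range(n):
        words.append(rewrite_step(words[-1], rules))
    return words


FIBONACCI_RULES = {"A": "B", "B": "AB"}    # p1 and p2 from the slide
words = generations("A", FIBONACCI_RULES, 7)
for n, word in enumerate(words):
    print(n, word, len(word))
```

The same two functions work for any rule set over single-character symbols by passing a different dictionary.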
Example 2: Algal growth
• V: A B
• C: none
• Ω: A
• P: p1: A → AB, p2: B → A
N=0 A → AB
N=1 → ABA
N=2 → ABAAB
N=3 → ABAABABA
COMP319 Software Engineering II
Example 3: Koch snowflake
• V: F
• C: + -
• Ω: F
• P: p1: F → F+F-F-F+F
N=0 F
N=1 → F+F-F-F+F
N=2 → F+F-F-F+F+F...
N=3 etc
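To turn such a string into a curve, the symbols are usually read as drawing commands. The sketch below assumes the common turtle interpretation: F means draw forward one unit, + means turn left and - means turn right. The slide does not state the turning angle, so the 90 degree turns used here are an assumption, as are the names `expand` and `turtle_points`.

```python
def expand(axiom, rules, n):
    """Apply the production rules n times, rewriting all symbols in parallel."""
    word = axiom
    for _ in range(n):
        word = "".join(rules.get(ch, ch) for ch in word)
    return word


def turtle_points(word):
    """Trace the curve on an integer grid using 90 degree turns (assumed)."""
    x, y = 0, 0
    dx, dy = 1, 0                      # start heading along the x axis
    points = [(x, y)]
    for ch in word:
        if ch == "F":                  # move forward one unit
            x, y = x + dx, y + dy
            points.append((x, y))
        elif ch == "+":                # turn left
            dx, dy = -dy, dx
        elif ch == "-":                # turn right
            dx, dy = dy, -dx
    return points


KOCH_RULES = {"F": "F+F-F-F+F"}
first = expand("F", KOCH_RULES, 1)
print(first)
print(turtle_points(first))
print(expand("F", KOCH_RULES, 2).count("F"))   # number of line segments at N=2
```

Each generation replaces every segment with five shorter ones, so the segment count grows as 5^N.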
Example 4: 3D Hilbert curve
Example 5: Branching