context free

In this video, we're going to begin our discussion of parsing technology with context-free grammars. As we know, not all strings of tokens are actually valid programs, and the parser has to tell the difference. It has to know which strings of tokens are valid and which ones are invalid, and give error messages for the invalid ones. So we need some way of describing the valid strings of tokens, and then some kind of algorithm for distinguishing the valid strings of tokens from the invalid ones.
We've also discussed that programming languages have a natural recursive structure. For example, in Cool, an expression can be any one of a very large number of things. Two of the things it can be are an if expression and a while expression, but these expressions are themselves recursively composed of other expressions. For example, the predicate of an if is an expression, as are the then branch and the else branch, and in a while loop the termination test is an expression, and so is the loop body. Context-free grammars are a natural notation for describing such recursive structures.
Formally, a context-free grammar consists of a set of terminals T, a set of nonterminals N, a start symbol S, which is one of the nonterminals, and a set of productions P. What's a production? A production is a symbol followed by an arrow followed by a list of symbols, X → Y1 … Yn. There are certain rules about these symbols. The X on the left-hand side of the arrow has to be a nonterminal; in fact, the set of things on the left-hand sides of productions is exactly the set of nonterminals. Every Yi on the right-hand side can be either a nonterminal, a terminal, or the special symbol ε (epsilon), which stands for the empty string.
Let's do a simple example of a context-free grammar. Strings of balanced parentheses, which we discussed in an earlier video, can be expressed as follows. We have our start symbol, S.
One possibility for a string of balanced parentheses is that it consists of an open paren, another string of balanced parentheses, and a close paren: S → ( S ). The other possibility is that the string is empty, because the empty string is also a string of balanced parentheses: S → ε. (As in that earlier video, "balanced" here means some number of open parens followed by the same number of close parens, so this grammar generates strings like (()) but not ()().) So there are two productions for this grammar.
To relate this example to the formal definition: what is our set of nonterminals? It's just the single nonterminal S. What are the terminal symbols of this context-free grammar? Just open and close paren, no other symbols. What's the start symbol? Well, it's S. It's the only nonterminal, so it has to be the start symbol. In general, rather than naming the start symbol explicitly when we give a grammar, we'll adopt the convention that the symbol on the left-hand side of the first production is the start symbol. And finally, what are the productions? We said there could be a set of productions, and here are the two productions for this particular context-free grammar.
Now, productions can be read as rules. Let's write down one of the productions from the example grammar, S → ( S ). What does it mean? It means that wherever we see an S, we can replace it by the string of symbols on the right-hand side. So wherever I see an S, I can take the S out and substitute the right-hand side for it. The important thing is that I remove the S that appears on the left-hand side of the production and replace it by the string of symbols on the right-hand side. So productions can be read as replacement rules: the right-hand side replaces the left-hand side.
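The four parts of the formal definition map directly onto a small data structure. Here's a minimal Python sketch, assuming we represent symbols as strings, productions as (left-hand side, right-hand side) pairs, and ε as an empty right-hand side; the names `Grammar`, `PARENS`, and `rewrite_once` are hypothetical, made up for this illustration. It encodes the balanced-parentheses grammar and applies S → ( S ) once as a replacement rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """The four parts of a context-free grammar (T, N, S, P).
    Each production is a pair (lhs, rhs), where rhs is a tuple of symbols;
    the empty tuple () stands for epsilon (an assumption of this sketch)."""
    terminals: frozenset
    nonterminals: frozenset
    start: str
    productions: tuple

    def __post_init__(self):
        # The start symbol must be one of the nonterminals.
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a nonterminal")
        for lhs, rhs in self.productions:
            # Only nonterminals may appear on the left of the arrow.
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            # Every right-hand-side symbol is a terminal or a nonterminal.
            for y in rhs:
                if y not in self.terminals and y not in self.nonterminals:
                    raise ValueError(f"unknown symbol {y!r} in right-hand side")


# Balanced parentheses: S -> ( S ) | epsilon
PARENS = Grammar(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("(", "S", ")")), ("S", ())),
)


def rewrite_once(form, index, production):
    """Read a production as a replacement rule: replace the nonterminal at
    position `index` of `form` by the production's right-hand side."""
    lhs, rhs = production
    if form[index] != lhs:
        raise ValueError(f"symbol at {index} is {form[index]!r}, not {lhs!r}")
    return form[:index] + rhs + form[index + 1:]


# One replacement: S  becomes  ( S )
step1 = rewrite_once(("S",), 0, PARENS.productions[0])
print(step1)
```

Applying the ε-production to the S inside `step1` removes it and leaves `('(', ')')`, the string ().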
More formally, the process works like this. We begin with the string that has only the start symbol S; we always start with just the start symbol. Initially our string is just the start symbol, but it changes over time: we can replace any nonterminal in the string by the right-hand side of some production for that nonterminal. For example, I can replace a nonterminal X by the right-hand side of some production for X, in this case X → Y1 … Yn. Then we just repeat this step over and over again until there are no nonterminals left, that is, until the string consists only of terminals. At that point, we're done.
To write this out slightly more formally, a single step starts from a state, which is a string of symbols that can contain both terminals and nonterminals. Somewhere in that string is a nonterminal X, and there is a production X → Y1 … Yn for X in our grammar. We can use that production to take a step to a new state in which X has been replaced by the right-hand side Y1 … Yn of the production, with everything else unchanged. That is one step of a context-free derivation.
If we want to do multiple steps, we can have a sequence of steps α0 → α1 → α2 → … → αn, where the αi are all strings of symbols. Then we say that α0 rewrites in zero or more steps to αn, written α0 →* αn, where the number of steps n is greater than or equal to zero. This is just shorthand for saying there is some sequence of individual productions, individual rules applied one at a time, that gets us from the string α0 to the string αn. Remember that in general we start with just the start symbol, so we'll have a sequence of steps like this that gets us from the start symbol to some other string.
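To make "one step" and "zero or more steps" concrete, here's a minimal sketch that checks whether a given sequence of strings is a valid derivation in the balanced-parentheses grammar, i.e., whether each string follows from the previous one by replacing a single nonterminal with the right-hand side of one of its productions. The names `PRODUCTIONS`, `one_step`, and `is_derivation` are hypothetical, and ε is again represented as an empty tuple.

```python
# Balanced parentheses as a map from nonterminal to its right-hand sides:
#   S -> ( S ) | epsilon      (the empty tuple () stands for epsilon)
PRODUCTIONS = {"S": [("(", "S", ")"), ()]}


def one_step(form):
    """All strings reachable from `form` in exactly one derivation step:
    pick any nonterminal occurrence X and any production X -> Y1 ... Yn."""
    results = set()
    for i, symbol in enumerate(form):
        # Terminals have no productions, so .get returns an empty list.
        for rhs in PRODUCTIONS.get(symbol, []):
            results.add(form[:i] + rhs + form[i + 1:])
    return results


def is_derivation(forms):
    """True if forms[0] -> forms[1] -> ... -> forms[-1], one step at a time.
    Such a sequence witnesses forms[0] ->* forms[-1]; a single-element
    list is the zero-step case."""
    return all(b in one_step(a) for a, b in zip(forms, forms[1:]))


derivation = [
    ("S",),
    ("(", "S", ")"),
    ("(", "(", "S", ")", ")"),
    ("(", "(", ")", ")"),
]
print(is_derivation(derivation))
print("".join(derivation[-1]))
```

This only checks a derivation someone hands it; it doesn't search for one. The sequence above is a witness that S →* (()).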
Suppose a context-free grammar G has start symbol S. Then the language of G, written L(G), is the set of strings a1 … an such that every ai is an element of T, the set of terminals of G, and the start symbol S goes in zero or more steps to a1 … an:
L(G) = { a1 … an | ai ∈ T for all i, and S →* a1 … an }
In other words, the strings in the language are exactly the strings of terminals that I can derive beginning with just the start symbol.
The name terminal comes from the fact that once terminals are included in the string, there is no rule for replacing them. Once a terminal is generated, it's a permanent feature of the string. In applications of context-free grammars to programming languages, the terminals ought to be the tokens of the language we are modeling with our context-free grammar.
With that in mind, let's try a context-free grammar for a fragment of Cool. We talked about Cool expressions earlier, and one possibility for a Cool expression is that it's an if expression:
EXPR → if EXPR then EXPR else EXPR fi
Cool if expressions have three parts, and they end with the keyword fi, which is a little bit unusual. Looking at this particular rule, we can see some conventions that are pretty standard and that we'll use: nonterminals are written in all caps, as with EXPR here, and terminal symbols are written in lower case. Another possibility is that it could be a while expression:
EXPR → while EXPR loop EXPR pool
And finally, the last possibility is that it could be an identifier:
EXPR → id
There are actually many, many more cases for Cool expressions, but we'll stick with these three.
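The definition of L(G) can be made concrete for the Cool fragment by listing every string of terminals derivable from EXPR, up to some length bound. Here's a sketch that explores strings breadth-first from the start symbol; the names `GRAMMAR`, `START`, and `language_up_to` are hypothetical. It rewrites only the leftmost nonterminal at each step, which loses no sentences, since every string in a context-free language has a leftmost derivation. It also assumes the grammar has no ε-productions, so strings never get shorter and anything over the bound can be discarded.

```python
from collections import deque

# The Cool fragment from above (terminals are lower-case token names):
#   EXPR -> if EXPR then EXPR else EXPR fi
#         | while EXPR loop EXPR pool
#         | id
GRAMMAR = {
    "EXPR": [
        ("if", "EXPR", "then", "EXPR", "else", "EXPR", "fi"),
        ("while", "EXPR", "loop", "EXPR", "pool"),
        ("id",),
    ]
}
START = "EXPR"


def language_up_to(grammar, start, max_len):
    """Return the strings of L(G) with at most max_len tokens.

    Assumes no epsilon-productions: a string never shrinks, so any
    string longer than max_len can be pruned safely."""
    sentences = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal, if there is one.
        i = next((k for k, s in enumerate(form) if s in grammar), None)
        if i is None:
            sentences.add(" ".join(form))  # only terminals left: in L(G)
            continue
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    # Shortest strings first, ties broken alphabetically.
    return sorted(sentences, key=lambda s: (len(s.split()), s))


for sentence in language_up_to(GRAMMAR, START, 7):
    print(sentence)
```

With a bound of seven tokens, the only members are `id`, `while id loop id pool`, and `if id then id else id fi`; raising the bound brings in nested forms such as a while loop used as an if predicate.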
When a nonterminal has many productions, we usually group them together in the grammar: we write the nonterminal on the left-hand side only once and then write the right-hand sides as explicit alternatives separated by vertical bars:
EXPR → if EXPR then EXPR else EXPR fi
     | while EXPR loop EXPR pool
     | id
This is completely the same as writing out EXPR → two more times; it's just a way of grouping these three productions together and saying that EXPR is the nonterminal for all three right-hand sides.
Let's take a look at some of the strings in the language of this context-free grammar. A valid Cool expression is just a single identifier, and that's easy to see: EXPR is our start symbol, and there's a production that says it goes to id. So I can take the start symbol directly to a string of terminals; a single variable name is a valid Cool expression. Another example is an if expression where each of the subexpressions is just a variable name, if id then id else id fi. That's a perfectly fine structure for a Cool expression. Similarly, I can do the same thing with the while expression: I can take the structure of a while and replace each of the subexpressions with a single variable name, while id loop id pool, and that is a syntactically valid Cool while loop. There are more complicated expressions too. For example, we can have a while loop as the predicate of an if expression, if while id loop id pool then id else id fi. That's not something you might normally think of writing, but it's perfectly well formed syntactically. Similarly, I could have an if expression as the predicate of an if expression, so nested if expressions like if if id then id else id fi then id else id fi are also syntactically valid.
Let's do another grammar, this time for simple arithmetic expressions. The start symbol, and the only nonterminal, of this grammar will be called E. One possibility is that E is the sum of two expressions, E → E + E. Or, using the vertical bar, which remember is just an alternative notation for writing E → again for another production, we could have the product of two expressions, E * E.
And then we could have expressions that appear inside parentheses, so parenthesized expressions, ( E ). And just to keep things simple, our only base case is a simple identifier, id, so variable names:
E → E + E | E * E | ( E ) | id
So here's a small grammar over plus, times, parentheses, and variable names. Let's look at a few elements of this language. A single variable name, id, is a perfectly good element of the language. id + id is also in this language, and so is id + id * id. We could also use parens to group things, so we could say (id + id) * id; that's also something you can generate using these rules. And so on; there are many, many more strings in this language.
Context-free grammars are a big step towards being able to say what we want in a parser, but we still need some other things. First of all, a context-free grammar, at least as we've defined it so far, just gives us a yes or no answer: yes, a string is in the language of the context-free grammar, or no, it is not. We also need a method for building a parse tree of the input. In those cases where the string is in the language, we need to know how it's in the language; we need the actual parse tree, not just yes or no. In the cases where the string is not in the language, we have to be able to handle errors gracefully and give some kind of feedback to the programmer, so we need a method for doing that too. And finally, once we have these two things, we need an actual implementation of them in order to put context-free grammars to work in a parser.
One last comment before we wrap up this video. The form of the context-free grammar can be important. Tools are often sensitive to the particular way you write the grammar, and while there are many ways to write a grammar for the same language, only some of them may be accepted by the tools. As we'll see, there are cases where it's necessary to modify the grammar in order to get the tools to accept it. This actually happens sometimes with regular expressions as well, but it's much less common.
Normally, the tools can digest most of the regular expressions you would want to write just fine. That's not true of an arbitrary context-free grammar.
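As one illustration of why the form of a grammar matters, take the arithmetic grammar E → E + E | E * E | ( E ) | id. If you transcribe E → E + E directly as a procedure `parse_e` whose first action is to call `parse_e` again, it recurses forever without consuming a token. Below is a hedged sketch of a yes/no recognizer that instead uses a rewritten grammar generating the same set of strings, E → T { + T }, T → F { * F }, F → ( E ) | id, where braces mean "zero or more repetitions". The rewritten grammar and the function name `recognize` are this sketch's own choices, not something from the lecture, and tokens are assumed to be separated by spaces.

```python
def recognize(tokens):
    """Return True if the token list is in the language of
    E -> E + E | E * E | ( E ) | id, else False.

    Uses the rewritten (same-language) grammar
        E -> T { + T }    T -> F { * F }    F -> ( E ) | id
    Gives only a yes/no answer: no parse tree, no error messages."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def parse_e():  # E -> T { + T }
        parse_t()
        while peek() == "+":
            expect("+")
            parse_t()

    def parse_t():  # T -> F { * F }
        parse_f()
        while peek() == "*":
            expect("*")
            parse_f()

    def parse_f():  # F -> ( E ) | id
        if peek() == "(":
            expect("(")
            parse_e()
            expect(")")
        else:
            expect("id")

    try:
        parse_e()
        return pos == len(tokens)  # the whole input must be consumed
    except SyntaxError:
        return False


for text in ["id", "id + id", "id + id * id", "( id + id ) * id", "id +", "( id"]:
    print(f"{text!r}: {recognize(text.split())}")
```

Note that this still gives only the yes/no answer discussed above; building a parse tree and reporting useful errors are separate pieces of work.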
