Parsing. Goals of parsing: check the input for syntactic accuracy, return appropriate error messages...

Upload: calvin-campbell

Post on 28-Dec-2015

Slide 1

Parsing

Goals of Parsing
- Check the input for syntactic accuracy
- Return appropriate error messages
- Recover if possible

- Produce, or at least traverse, a complete parse tree
- Parse tree (or trace) is the basis for translation

Top-down Parsers
- Parse tree is built from the root down to the leaves
- Builds the parse tree in preorder
- Corresponds to a leftmost derivation
- Parsing decision problem: choosing the correct rule
- Two most common algorithms:
  - Recursive descent, implemented in code
  - Table-driven implementation
- Both are LL algorithms (left-to-right scan, leftmost derivation)

Bottom-up Parsers
- Parse tree is built from the leaves up to the root
- Builds the parse in reverse of a rightmost derivation
- Requires finding a handle, that is, a correct RHS
- Most common algorithms are LR (left-to-right scan, rightmost derivation in reverse)

Complexity
- The most general parsing algorithms work for any unambiguous grammar
  - Complicated, inefficient
  - O(n^3)

- Trade generality for efficiency
  - Parsers in commercial compilers have complexity O(n), where n is the length of the input

Recursive Descent
- Parser is made up of a collection of subprograms
  - One for each non-terminal
- Each subprogram is responsible for generating the parse tree rooted at its non-terminal
- It pulls tokens from the tokenizer and leaves the first token that is not part of its rule in nextToken
- If multiple rules are associated with the current non-terminal, the correct rule must be determined first

Function Factor
// <factor> -> id | ( <expr> )

void factor() {
    if (nextToken == ID_CODE)
        lex();
    else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE)
            lex();
        else
            error();
    }
    else
        error(); /* Neither RHS matches */
}
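The factor routine above depends on nextToken, lex(), expr() and error(), none of which the slides define. The following is a minimal sketch of one way to supply them so the routine can be run. The tokenizer, the token codes PLUS_CODE, END_CODE and BAD_CODE, the rule used for expr(), and the parse() wrapper are all assumptions made for illustration; only factor() follows the slide.

```c
#include <ctype.h>
#include <stdbool.h>

/* ID_CODE, LEFT_PAREN_CODE and RIGHT_PAREN_CODE are named on the slide;
   the other codes and all numeric values are assumptions of this sketch. */
enum { ID_CODE, LEFT_PAREN_CODE, RIGHT_PAREN_CODE, PLUS_CODE, END_CODE, BAD_CODE };

static const char *input;   /* remaining input text */
static int nextToken;       /* one-token lookahead */
static bool failed;         /* set once error() has been called */

/* lex(): read the next token from the input into nextToken. */
static void lex(void) {
    while (*input == ' ') input++;
    if (*input == '\0') { nextToken = END_CODE; return; }
    if (isalpha((unsigned char)*input)) {
        while (isalnum((unsigned char)*input)) input++;
        nextToken = ID_CODE;
        return;
    }
    switch (*input++) {
    case '(': nextToken = LEFT_PAREN_CODE;  break;
    case ')': nextToken = RIGHT_PAREN_CODE; break;
    case '+': nextToken = PLUS_CODE;        break;
    default:  nextToken = BAD_CODE;         break;
    }
}

/* error(): this sketch only records the failure; it does not recover. */
static void error(void) { failed = true; }

static void expr(void);

/* <factor> -> id | ( <expr> )   (as on the slide) */
static void factor(void) {
    if (nextToken == ID_CODE)
        lex();
    else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE)
            lex();
        else
            error();
    }
    else
        error(); /* Neither RHS matches */
}

/* <expr> -> <factor> { + <factor> }   (hypothetical rule, not from the slides) */
static void expr(void) {
    factor();
    while (nextToken == PLUS_CODE) {
        lex();
        factor();
    }
}

/* Returns true if text is exactly one <expr> under the rules above. */
bool parse(const char *text) {
    input = text;
    failed = false;
    lex();          /* prime the lookahead before the first subprogram runs */
    expr();
    return !failed && nextToken == END_CODE;
}
```

With these definitions, parse("a + (b + c)") succeeds, while parse("(a") fails inside factor() because the closing parenthesis is missing. Note that each subprogram is entered with nextToken already holding the first token of its rule, which is the convention the slide describes.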

<ifstmt> ::= if ( <boolexpr> ) <statement> [else <statement>]

void ifstmt() {
    if (nextToken != IF_CODE)
        error();
    else {
        lex();
        if (nextToken != LEFT_PAREN)
            error();
        else {
            lex();
            boolexpr();
            if (nextToken != RIGHT_PAREN)
                error();
            else {
                lex();
                statement();
                if (nextToken == ELSE_CODE) {
                    lex();
                    statement();
                }
            }
        }
    }
}

Grammar Restrictions
- Left-recursion is a problem
  - A ::= A + B
  - The subprogram for A would call itself before consuming any token, so parsing would never terminate!

- In some cases, left-recursion can be eliminated by refactoring the grammar
  - Left-recursive:  E ::= E + T | T
  - Refactored:      E ::= T E'
                     E' ::= + T E' | ε

Grammar Restrictions (continued)
- Ability to choose the correct production based on a single next token
- Pairwise disjointness test indicates whether or not this choice can be made
  - It passes if the first terminal that can be generated from each rule of a non-terminal is unique to that rule

  A ::= aB | bAb | Bb
  B ::= cB | d
  FIRST sets of the A rules: {a}, {b}, {c, d}
  Disjoint, recursive descent parsable

  A ::= aB | Bab
  B ::= aB | b
  FIRST sets of the A rules: {a}, {a, b}
  Not disjoint, not recursive descent parsable

Table-driven Parsers
- Encode the production choice in a table
- Rows indicate the current top of the stack
- Columns for each input token
- Entry in the matrix gives the production number

- Preferred for large grammars
  - Algorithm is fixed
  - Only the table size grows
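The table-driven scheme described above can be sketched for a very small grammar. The example below uses the refactored grammar E ::= T E', E' ::= + T E' | ε, with the added rule T ::= id so that it is self-contained. The production numbering, the symbol encoding, the single-character input format ('i' stands for id) and the function name ll1_parse are all assumptions of this sketch, not something taken from the slides.

```c
#include <stdbool.h>

/* Grammar symbols. Terminals come first so they can index table columns. */
enum { T_ID, T_PLUS, T_END, N_E, N_EP, N_T };
#define NUM_TERMS 3
#define STACK_MAX 64

/* Productions (numbering is an assumption of this sketch):
   1. E  ::= T E'
   2. E' ::= + T E'
   3. E' ::= (empty)
   4. T  ::= id                                                   */
static const int rhs[5][3] = {
    {0}, {N_T, N_EP}, {T_PLUS, N_T, N_EP}, {0}, {T_ID}
};
static const int rhs_len[5] = { 0, 2, 3, 0, 1 };

/* table[non-terminal][lookahead] = production number, 0 = syntax error. */
static const int table[3][NUM_TERMS] = {
    /*          id  +  $  */
    /* E  */  { 1,  0, 0 },
    /* E' */  { 0,  2, 3 },
    /* T  */  { 4,  0, 0 },
};

/* Map one input character to a terminal; the end of the string acts as $. */
static int terminal_of(char c) {
    switch (c) {
    case 'i':  return T_ID;
    case '+':  return T_PLUS;
    case '\0': return T_END;
    default:   return -1;
    }
}

/* Returns true if input (for example "i+i") can be derived from E. */
bool ll1_parse(const char *input) {
    int stack[STACK_MAX];
    int top = 0;
    stack[top++] = T_END;                /* bottom-of-stack marker */
    stack[top++] = N_E;                  /* start symbol */
    while (top > 0) {
        int sym = stack[--top];
        int look = terminal_of(*input);
        if (look < 0) return false;      /* unknown character */
        if (sym < NUM_TERMS) {           /* terminal on top: must match input */
            if (sym != look) return false;
            if (look != T_END) input++;
        } else {                         /* non-terminal on top: consult table */
            int p = table[sym - NUM_TERMS][look];
            if (p == 0) return false;
            if (top + rhs_len[p] > STACK_MAX) return false;
            for (int i = rhs_len[p] - 1; i >= 0; i--)
                stack[top++] = rhs[p][i];    /* push RHS in reverse order */
        }
    }
    return true;    /* stack emptied, so $ was matched against end of input */
}
```

The loop in ll1_parse never mentions a specific production, which is the sense in which the algorithm is fixed: a larger grammar would change only rhs, rhs_len and table.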

Expression Grammar Example

  S  ::= A $
  A  ::= i = E ;              (1)
  E  ::= T E'                 (2)
  E' ::= ε | AO T E'          (3, 4)
  AO ::= + | -                (5, 6)
  T  ::= F T'                 (7)
  T' ::= ε | MO F T'          (8, 9)
  MO ::= * | /                (10, 11)
  F  ::= F' P                 (12)
  F' ::= ε | UO               (13, 14)
  UO ::= - | !                (15, 16)
  P  ::= i | l | ( E )        (17, 18, 19)

N     id    lit   +     -     *     /     !     (     )     ;
A     1
E     2     2                                   2
E'                4     4                             3     3
AO                5     6
T     7     7                                   7
T'                8     8     9     9                 8     8
MO                            10    11
F     12    12                                  12
F'    13    13          14                14    13
UO                      15                16
P     17    18                                  19

Bottom-up Parsing
- Often called shift-reduce algorithms
- Integral piece of every bottom-up parser is a stack
- Shift moves the next input token onto the stack
- Reduce replaces a RHS on the top of the stack with the corresponding LHS

- Most bottom-up parsing algorithms are variations of the LR process
  - Originally designed by Donald Knuth
  - Relatively small program and a parsing table

Advantages of LR Parsers
- Will work for nearly all grammars that describe programming languages
- Work on a larger class of grammars than other bottom-up algorithms, but are as efficient as any other bottom-up parser
- Can detect syntax errors as soon as it is possible
- LR class of grammars is a superset of the class parsable by LL parsers

Disadvantage
- For anything but very small grammars, it is difficult to produce the parsing table by hand
- But this is exactly what tools like yacc and bison can do for us automatically!

- Original version was computationally intensive (in terms of both time and memory)
- Variations developed:
  - Fewer computer resources required
  - Not as general

Key Insight
- A bottom-up parser can use the entire history of the parse, up to the current point, to make parsing decisions
- There are only a finite and relatively small number of different parse situations that could have occurred, so the history can be stored in a parser state, on the parse stack

Parser Configuration
- Made up of both the stack and the input
- For each state on the stack, there is an associated grammar symbol
- E.g. (S_0 X_1 S_1 X_2 S_2 ... X_m S_m, a_i a_{i+1} ... a_n $), where S_i indicates a state and X_i indicates a grammar symbol
- Initial configuration: (S_0, a_0 ... a_n $)

Table-driven Bottom-up Parsing
- Table has two components:
  - ACTION table
    - Specifies the action of the parser, given the parser state and the next token
    - Rows are state names
    - Columns are terminals
  - GOTO table
    - Specifies the state to put on the stack after a reduce operation
    - Rows are state names
    - Columns are non-terminals

[Figure: Structure of an LR parser]

Parser actionsIf ACTION[Sm, ai] = Shift S, the next configuration is: (S0X1S1X2S2XmSmaiS, ai+1an$)

If ACTION[Sm, ai] = Reduce A and S = GOTO[Sm-r, A], where r = the length of , the next configuration is(S0X1S1X2S2Xm-rSm-rAS, aiai+1an$)

If ACTION[Sm, ai] = Accept, the parse is complete and no errors were found.

If ACTION[Sm, ai] = Error, the parser calls an error-handling routine.

Example Grammar
1. E ::= E + T
2. E ::= T
3. T ::= T * F
4. T ::= F
5. F ::= ( E )
6. F ::= id

[Table: Example LR Parsing Table]

Trace of parse of id + id * id

Stack                    Input             Action
0                        id + id * id $    Shift 5
0 id 5                   + id * id $       Reduce 6 (GOTO[0, F])
0 F 3                    + id * id $       Reduce 4 (GOTO[0, T])
0 T 2                    + id * id $       Reduce 2 (GOTO[0, E])
0 E 1                    + id * id $       Shift 6
0 E 1 + 6                id * id $         Shift 5
0 E 1 + 6 id 5           * id $            Reduce 6 (GOTO[6, F])
0 E 1 + 6 F 3            * id $            Reduce 4 (GOTO[6, T])
0 E 1 + 6 T 9            * id $            Shift 7
0 E 1 + 6 T 9 * 7        id $              Shift 5
0 E 1 + 6 T 9 * 7 id 5   $                 Reduce 6 (GOTO[7, F])
0 E 1 + 6 T 9 * 7 F 10   $                 Reduce 3 (GOTO[6, T])
0 E 1 + 6 T 9            $                 Reduce 1 (GOTO[0, E])
0 E 1                    $                 Accept
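The shift, reduce, accept and error actions can be sketched as a small driver loop for the six-rule example grammar. The parsing table itself is not reproduced in this transcript, so the ACTION and GOTO values below are an assumption: they are the SLR(1) table commonly given in textbooks for this grammar, chosen because it reproduces the state numbers in the trace above. The integer encoding of actions, the single-character input format ('i' stands for id) and the name lr_parse are also assumptions. Because each state implies its grammar symbol, this sketch keeps only states on the stack.

```c
enum { ID, PLUS, STAR, LPAREN, RPAREN, END, NUM_TERMS };   /* terminals */
enum { NT_E, NT_T, NT_F, NUM_NONTERMS };                   /* non-terminals */

#define NUM_STATES 12
#define STACK_MAX 128
/* ACTION encoding (assumed): n > 0 shifts to state n, n < 0 reduces by
   rule -n, ACC accepts, 0 is a syntax error. */
#define ACC 100

static const int action[NUM_STATES][NUM_TERMS] = {
    /*         id   +   *   (   )   $   */
    /*  0 */ {  5,  0,  0,  4,  0,  0  },
    /*  1 */ {  0,  6,  0,  0,  0, ACC },
    /*  2 */ {  0, -2,  7,  0, -2, -2  },
    /*  3 */ {  0, -4, -4,  0, -4, -4  },
    /*  4 */ {  5,  0,  0,  4,  0,  0  },
    /*  5 */ {  0, -6, -6,  0, -6, -6  },
    /*  6 */ {  5,  0,  0,  4,  0,  0  },
    /*  7 */ {  5,  0,  0,  4,  0,  0  },
    /*  8 */ {  0,  6,  0,  0, 11,  0  },
    /*  9 */ {  0, -1,  7,  0, -1, -1  },
    /* 10 */ {  0, -3, -3,  0, -3, -3  },
    /* 11 */ {  0, -5, -5,  0, -5, -5  },
};

/* goto_table[state][non-terminal]; 0 marks an unused entry. */
static const int goto_table[NUM_STATES][NUM_NONTERMS] = {
    /*          E  T  F  */
    /*  0 */ {  1, 2, 3  },
    /*  1 */ {  0, 0, 0  },
    /*  2 */ {  0, 0, 0  },
    /*  3 */ {  0, 0, 0  },
    /*  4 */ {  8, 2, 3  },
    /*  5 */ {  0, 0, 0  },
    /*  6 */ {  0, 9, 3  },
    /*  7 */ {  0, 0, 10 },
    /*  8 */ {  0, 0, 0  },
    /*  9 */ {  0, 0, 0  },
    /* 10 */ {  0, 0, 0  },
    /* 11 */ {  0, 0, 0  },
};

/* Rules 1..6 of the example grammar: LHS and RHS length (index 0 unused). */
static const int rule_lhs[7] = { 0, NT_E, NT_E, NT_T, NT_T, NT_F, NT_F };
static const int rule_len[7] = { 0, 3, 1, 3, 1, 3, 1 };

/* Map one input character to a terminal; the end of the string acts as $. */
static int terminal_of(char c) {
    switch (c) {
    case 'i':  return ID;
    case '+':  return PLUS;
    case '*':  return STAR;
    case '(':  return LPAREN;
    case ')':  return RPAREN;
    case '\0': return END;
    default:   return -1;
    }
}

/* Parses input. On success, returns the number of reductions performed and
   stores the first max_rules rule numbers in rules; returns -1 on error. */
int lr_parse(const char *input, int *rules, int max_rules) {
    int stack[STACK_MAX];
    int top = 0, count = 0;
    stack[0] = 0;                                  /* initial state S_0 */
    for (;;) {
        while (*input == ' ') input++;
        int a = terminal_of(*input);
        if (a < 0) return -1;
        int act = action[stack[top]][a];
        if (act == ACC) return count;
        if (act > 0) {                             /* shift */
            if (top + 1 >= STACK_MAX) return -1;
            stack[++top] = act;
            input++;
        } else if (act < 0) {                      /* reduce by rule -act */
            int rule = -act;
            if (count < max_rules) rules[count] = rule;
            count++;
            top -= rule_len[rule];                 /* pop r states */
            stack[top + 1] = goto_table[stack[top]][rule_lhs[rule]];
            top++;
        } else {
            return -1;                             /* error entry */
        }
    }
}
```

For the input "i + i * i", which corresponds to id + id * id, the reductions recorded by this sketch are 6, 4, 2, 6, 4, 6, 3, 1, the same sequence as in the trace. A real parser would call an error-handling routine instead of returning -1.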