Parsing XML Grammars, PDAs, Lexical Analysis, Recursive Descent

Upload: regina-lyons

Post on 14-Dec-2015



Parsing XML

Grammars, PDAs, Lexical Analysis, Recursive Descent


Recipe Book Markup Language

• Why markup languages?
– Give structure to contents: aid in interpreting semantics of content, storing in a database, etc.
• Why XML?
– Human readable (sort of)
– Widely accepted and used for data interchange
• Why RBML?
– Don’t reinvent the wheel; use existing stuff IAAP
– Simplest of the recipe XML formats I found


Formal Languages

• What is a Formal Language?
– Mathematically defined subset of strings over a finite alphabet
• Regular Languages
– Very simple, can be recognized by an FSM
– Still very powerful
• Context-Free Languages
– Pretty simple, can be recognized by a PDA
– Esp. useful for programming languages
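To make the FSM/PDA contrast concrete, here is a minimal Python sketch that is not from the slides. The first function is a two-state finite-state machine for unsigned integers (a regular language). The second uses a stack, in the spirit of a PDA, to check that tags are properly nested, which no FSM can do for unbounded depth. The function names, state names, and the list-of-tags input format are assumptions made for illustration.

```python
def fsm_accepts_integer(text):
    """Two states: 'start' (nothing read yet) and 'digits' (accepting)."""
    state = "start"
    for ch in text:
        if ch in "0123456789":
            state = "digits"
        else:
            return False          # no transition defined for this symbol
    return state == "digits"


def stack_accepts_nested(tags):
    """Check nesting of tags such as ["<a>", "<b>", "</b>", "</a>"].

    Opening tags push their name; closing tags must match the top of the stack.
    """
    stack = []
    for tag in tags:
        if tag.startswith("</"):
            if not stack or stack.pop() != tag[2:-1]:
                return False
        else:
            stack.append(tag[1:-1])
    return not stack              # accept only if every tag was closed
```

The stack is what lets the second recognizer remember an arbitrary number of open tags; the FSM has only its current state.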


Regular Expressions/Languages

• Alphabet, Σ = finite set of symbols
• String, σ = sequence of 0 or more symbols from Σ (an element of Σ*)
• Regular Expressions
– The empty set, Ø, is an RE and denotes the empty language
– The empty string, ε, is an RE and denotes {ε}
– For all a in Σ, a is an RE and denotes {a}
– If r and s are REs, denoting the languages R and S, resp., then (r+s), (rs), and (r*) are REs that denote R ∪ S, RS, and R*, resp.
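The three operators in this definition map directly onto Python's `re` module, where the union written (r+s) on the slide is spelled `r|s`. The following sketch is an illustration only; the example patterns and the helper name `in_language` are assumptions, not part of the original deck.

```python
import re

# Slide notation -> Python notation (all patterns are illustrative).
UNION = re.compile(r"a|b")            # (a+b) denotes {a} U {b}
CONCAT = re.compile(r"ab")            # (ab)  denotes {ab}
STAR = re.compile(r"a*")              # (a*)  denotes {ε, a, aa, ...}
COMBINED = re.compile(r"(a|b)*abb")   # ((a+b)*abb): strings over {a, b} ending in abb


def in_language(pattern, s):
    """True when the whole string s belongs to the language of pattern."""
    return pattern.fullmatch(s) is not None
```

`fullmatch` is used rather than `search` because membership in a language is a statement about the entire string, not a substring.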


Context-Free Languages

• Context-Free Grammar G = <V, T, P, S>
– V = variables
– T = terminals (alphabet characters)
– P = productions
– S = start symbol in V
• Productions
– Replace a variable with a string from (V ∪ T)*
– Example: E -> E + E | E * E | (E) | id
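As a rough illustration of how productions rewrite variables, the example grammar can be written down as plain Python data and used to carry out a leftmost derivation. The dictionary layout and the `derive` helper are assumptions for this sketch; only the grammar itself comes from the slide.

```python
# G = <V, T, P, S> for E -> E + E | E * E | (E) | id
V = {"E"}
T = {"+", "*", "(", ")", "id"}
P = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]]}
S = "E"


def derive(choices):
    """Rewrite the leftmost variable once per entry in `choices`.

    Each entry is the index of the alternative to apply. Returns the list of
    sentential forms, starting with the start symbol.
    """
    form = [S]
    steps = [" ".join(form)]
    for choice in choices:
        i = next(k for k, sym in enumerate(form) if sym in V)  # leftmost variable
        form = form[:i] + P[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# E => E + E => id + E => id + E * E => id + id * E => id + id * id
print(derive([0, 3, 1, 3, 3]))
```

Each step replaces one variable with a string from (V ∪ T)*, exactly as the Productions bullet describes.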


RBML Grammar

cookbook -> “<cookbook>”
    title
    (section | recipe)+
    “</cookbook>”

title -> “<title>”
    pcdata
    “</title>”

section -> “<section>”
    title
    recipe+
    “</section>”

recipe -> “<recipe>”
    title
    recipeinfo
    ingredientlist
    preparation
    serving
    notes
    “</recipe>”


RBML Grammar

recipeinfo -> <recipeinfo> (author | blurb | effort | genre | preptime | source | yield)* </recipeinfo>

ingredientlist -> <ingredientlist> (ingredient)* </ingredientlist>

preparation -> <preparation> (pcdata | equipment | step | hyperlink)* </preparation>

serving -> <serving> (pcdata | hyperlink)* </serving>

notes -> <notes> (pcdata | hyperlink)* </notes>


RBML Grammar

equipment -> <equipment> (pcdata | hyperlink)* </equipment>

step -> <step> (pcdata | equipment | hyperlink)* </step>

ingredient -> <ingredient> (pcdata | quantity | unit | fooditem)* </ingredient>

quantity -> <quantity> (number | number "or" number | number "and" number) </quantity>

number -> integer | fraction | integer " " fraction

fraction -> integer "/" integer
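The quantity, number, and fraction rules are small enough to check with regular expressions. The sketch below parses the text found between the quantity tags. It assumes that an integer is a run of decimal digits and that "or" and "and" are separated from the numbers by single spaces; the grammar does not say either way. All names are hypothetical.

```python
import re
from fractions import Fraction

INTEGER = r"[0-9]+"                      # assumption: unsigned decimal digits
FRACTION = rf"{INTEGER}/{INTEGER}"       # fraction -> integer "/" integer
# number -> integer | fraction | integer " " fraction (longest alternative first)
NUMBER = rf"(?:{INTEGER} {FRACTION}|{FRACTION}|{INTEGER})"
QUANTITY = re.compile(rf"({NUMBER})(?: (or|and) ({NUMBER}))?")


def parse_number(text):
    """Convert '2', '1/2', or '1 1/2' to an exact Fraction."""
    return sum(Fraction(part) for part in text.split(" "))


def parse_quantity(text):
    """Return (first, connective, second); the last two are None for one number."""
    m = QUANTITY.fullmatch(text)
    if m is None:
        raise ValueError(f"not a quantity: {text!r}")
    first, word, second = m.groups()
    return (parse_number(first), word, parse_number(second) if second else None)
```

For example, `parse_quantity("1 1/2 or 2")` yields the pair of values 3/2 and 2 joined by "or", leaving the interpretation of the connective to the caller.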


Recipe Book Markup Language

unit -> <unit> pcdata </unit>

fooditem -> <fooditem> pcdata </fooditem>

blurb -> <blurb> pcdata </blurb>

effort -> <effort> pcdata </effort>

genre -> <genre> pcdata </genre>


Recipe Book Markup Language

preptime -> <preptime> pcdata </preptime>

source -> <source> (pcdata | hyperlink)* </source>

yield -> <yield> pcdata </yield>

hyperlink -> pcdata url


Recursive Descent Parsing

• Match required (literal) symbols
• Call procedure to match variable
– May itself call similar procedures
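A minimal sketch of this idea in Python, covering only the cookbook, title, section, and recipe rules. It assumes the input is already a list of tokens (tags and text strings), and it deliberately simplifies recipe to contain just a title; the full rule also requires recipeinfo, ingredientlist, preparation, serving, and notes. The class and method names are illustrative, not taken from the original deck.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, literal):
        """Match a required (literal) symbol or fail."""
        if self.peek() != literal:
            raise SyntaxError(f"expected {literal!r}, found {self.peek()!r}")
        self.pos += 1

    def pcdata(self):
        tok = self.peek()
        if tok is None or tok.startswith("<"):
            raise SyntaxError(f"expected text, found {tok!r}")
        self.pos += 1
        return tok

    # One procedure per grammar variable; each may call the others.
    def title(self):
        self.match("<title>")
        text = self.pcdata()
        self.match("</title>")
        return text

    def recipe(self):
        self.match("<recipe>")          # simplified: title only (see lead-in)
        name = self.title()
        self.match("</recipe>")
        return {"recipe": name}

    def section(self):
        self.match("<section>")
        name = self.title()
        recipes = [self.recipe()]       # recipe+ means at least one
        while self.peek() == "<recipe>":
            recipes.append(self.recipe())
        self.match("</section>")
        return {"section": name, "recipes": recipes}

    def cookbook(self):
        self.match("<cookbook>")
        name = self.title()
        items = []
        while self.peek() in ("<section>", "<recipe>"):
            items.append(self.section() if self.peek() == "<section>" else self.recipe())
        if not items:                   # (section | recipe)+ means at least one
            raise SyntaxError("cookbook needs at least one section or recipe")
        self.match("</cookbook>")
        return {"cookbook": name, "items": items}
```

Calling `Parser(tokens).cookbook()` on a well-formed token list returns a nested dictionary; any mismatch raises `SyntaxError` at the first unexpected token. Looking one token ahead with `peek` is enough here because every alternative starts with a distinct opening tag.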


Lexical Analysis

• Helps prepare for parsing
• Uses regular expressions to
– Organize input into multi-symbol chunks
– Give each chunk a meaning for the parser
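One way to sketch such a lexer in Python is a single regular expression with one named alternative per kind of chunk. The token kinds (`open`, `close`, `pcdata`) and the rule that tags are attribute-free and made of letters only are assumptions for this illustration, chosen to fit the RBML grammar as shown on the earlier slides.

```python
import re

TOKEN = re.compile(r"""
    (?P<close></[A-Za-z]+>)    # closing tag such as </title>
  | (?P<open><[A-Za-z]+>)      # opening tag such as <title>
  | (?P<pcdata>[^<]+)          # run of character data
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, value) pairs for the parser."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        value = m.group()
        if m.lastgroup == "pcdata":
            value = value.strip()
            if not value:          # whitespace between tags carries no meaning here
                continue
        tokens.append((m.lastgroup, value))
    return tokens
```

With this in place the parser never inspects individual characters: it sees `("open", "<title>")`, `("pcdata", "Pho")`, `("close", "</title>")` and decides what to do from the kind and value of each chunk.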