Parsing XML
Grammars, PDAs, Lexical Analysis, Recursive Descent
Recipe Book Markup Language
• Why Markup languages?
– Give structure to contents
– Aid in interpreting semantics of content, storing in database, etc.
• Why XML?
– Human readable (sort of)
– Widely accepted and used for data interchange
• Why RBML?
– Don’t reinvent the wheel: use existing stuff IAAP
– Simplest of the recipe XML formats I found
Formal Languages
• What is a Formal Language?
– Mathematically defined subset of strings over a finite alphabet
• Regular Languages
– Very simple, can be recognized by FSM
– Still very powerful
• Context-Free Languages
– Pretty simple, can be recognized by PDA
– Especially useful for programming languages
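The difference between the two classes shows up with nested tags: matching each closing tag to its opening tag needs unbounded memory, which a finite state machine lacks and a PDA's stack supplies. The following is a minimal sketch of that stack discipline only, not a formal automaton with states and transitions. The function name `tags_balanced` and the pre-split list of tags are assumptions made for illustration.

```python
def tags_balanced(tags):
    """Return True if every closing tag matches the most recent open tag.

    `tags` is a list of strings such as "<recipe>" or "</recipe>".
    """
    stack = []  # plays the role of the PDA's stack
    for tag in tags:
        if tag.startswith("</"):
            # Closing tag: must match the name on top of the stack.
            if not stack or stack.pop() != tag[2:-1]:
                return False
        else:
            # Opening tag: push its name.
            stack.append(tag[1:-1])
    return not stack  # accept only if nothing is left open


print(tags_balanced(["<recipe>", "<title>", "</title>", "</recipe>"]))  # True
print(tags_balanced(["<recipe>", "<title>", "</recipe>", "</title>"]))  # False
```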
Regular Expressions/Languages
• Alphabet, Σ = finite set of symbols
• String, σ = sequence of 0 or more symbols from Σ (an element of Σ*)
• Regular Expressions
– Ø is an RE and denotes the empty set
– ε is an RE and denotes {ε}, the language containing only the empty string
– For all a in Σ, a is an RE and denotes {a}
– If r and s are REs, denoting the languages R and S, resp., then (r+s), (rs), and (r*) are REs that denote R ∪ S, RS, and R*, resp.
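The three RE-building operations map onto Python's `re` syntax, where union is written `|` rather than `+`. A small sketch, assuming the alphabet Σ = {a, b}; the example expression (a+b)*abb and the helper name `in_language` are chosen here for illustration and do not come from the slides.

```python
import re

# (a+b)*abb in the slide notation: any string over {a, b} that ends in "abb".
# Union r+s becomes r|s, concatenation rs is juxtaposition, closure r* is r*.
PATTERN = re.compile(r"(a|b)*abb")


def in_language(s: str) -> bool:
    """Return True if the whole string s is in the language of PATTERN."""
    return PATTERN.fullmatch(s) is not None


print(in_language("abb"))     # True
print(in_language("aababb"))  # True
print(in_language("abba"))    # False
print(in_language(""))        # False
```

`fullmatch` is used so that the entire string must belong to the language, which mirrors recognition by a finite state machine.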
Context-Free Languages
• Context-Free Grammar G = <V, T, P, S>
– V = variables
– T = terminals (alphabet characters)
– P = productions
– S = start symbol in V
• Productions
– Replace a variable with a string from (V ∪ T)*
– Example: E -> E + E | E * E | (E) | id
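To see productions at work, the example grammar can be used to derive the string id + id * id by repeatedly replacing the leftmost variable. This is an illustrative sketch: the dictionary encoding and the helper names `apply_leftmost` and `derive` are assumptions, and the list of choices is picked by hand rather than found by a parser.

```python
# Productions for E -> E + E | E * E | (E) | id, each body a list of symbols.
PRODUCTIONS = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]],
}


def apply_leftmost(form, choice):
    """Replace the leftmost variable in `form` with production number `choice`."""
    for i, symbol in enumerate(form):
        if symbol in PRODUCTIONS:
            return form[:i] + PRODUCTIONS[symbol][choice] + form[i + 1:]
    raise ValueError("no variable left to replace")


def derive(choices, start="E"):
    """Return every sentential form produced by applying `choices` in order."""
    forms = [[start]]
    for choice in choices:
        forms.append(apply_leftmost(forms[-1], choice))
    return forms


# Leftmost derivation of: id + id * id
for form in derive([0, 3, 1, 3, 3]):
    print(" ".join(form))
# E
# E + E
# id + E
# id + E * E
# id + id * E
# id + id * id
```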
RBML Grammar
cookbook -> “<cookbook>” title (section | recipe)+ “</cookbook>”
title -> “<title>” pcdata “</title>”
section -> “<section>” title recipe+ “</section>”
recipe -> “<recipe>” title recipeinfo ingredientlist preparation serving notes “</recipe>”
recipeinfo -> <recipeinfo> (author | blurb | effort | genre | preptime | source | yield)* </recipeinfo>
ingredientlist -> <ingredientlist> (ingredient)* </ingredientlist>
preparation -> <preparation> (pcdata | equipment | step | hyperlink)* </preparation>
serving -> <serving> (pcdata | hyperlink)* </serving>
notes -> <notes> (pcdata | hyperlink)* </notes>
equipment -> <equipment> (pcdata | hyperlink)* </equipment>
step -> <step> (pcdata | equipment | hyperlink)* </step>
ingredient -> <ingredient> (pcdata | quantity | unit | fooditem)* </ingredient>
quantity -> <quantity> (number | number "or" number | number "and" number) </quantity>
number -> integer | fraction | integer " " fraction
fraction -> integer "/" integer
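The number and fraction productions are simple enough to handle with a regular expression plus Python's `fractions` module. The sketch below assumes that "or" and "and" are separated from the numbers by single spaces and that integers are unsigned decimal digits; neither detail is stated in the grammar. The function names `parse_number` and `parse_quantity` are hypothetical.

```python
import re
from fractions import Fraction

# number   -> integer | fraction | integer " " fraction
# fraction -> integer "/" integer
NUMBER = re.compile(r"(\d+) (\d+)/(\d+)|(\d+)/(\d+)|(\d+)")


def parse_number(text: str) -> Fraction:
    """Convert text such as "2", "3/4" or "1 1/2" to an exact Fraction."""
    m = NUMBER.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a number: {text!r}")
    whole, num, den, fnum, fden, integer = m.groups()
    if whole is not None:                      # integer " " fraction
        return int(whole) + Fraction(int(num), int(den))
    if fnum is not None:                       # fraction
        return Fraction(int(fnum), int(fden))
    return Fraction(int(integer))              # integer


def parse_quantity(text: str):
    """Parse the content of <quantity>.

    Returns (connector, numbers), where connector is None, "or" or "and".
    """
    for word in ("or", "and"):
        parts = text.split(f" {word} ")
        if len(parts) == 2:
            return word, [parse_number(p) for p in parts]
    return None, [parse_number(text)]


print(parse_number("1 1/2"))          # 3/2
print(parse_quantity("2 or 3"))       # ('or', [Fraction(2, 1), Fraction(3, 1)])
```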
unit -> <unit> pcdata </unit>
fooditem -> <fooditem> pcdata </fooditem>
blurb -> <blurb> pcdata </blurb>
effort -> <effort> pcdata </effort>
genre -> <genre> pcdata </genre>
preptime -> <preptime> pcdata </preptime>
source -> <source> (pcdata | hyperlink)* </source>
yield -> <yield> pcdata </yield>
hyperlink -> pcdata url
Recursive Descent Parsing
• Match required (literal) symbols
• Call a procedure to match each variable
– That procedure may itself call similar procedures
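The two rules above can be sketched for a small slice of the RBML grammar: one method per variable, and a `match` helper for literal symbols. This is a simplified illustration, not a full RBML parser. It assumes the input is already split into tag and text tokens, and it treats quantity, unit, and fooditem alike as leaf elements wrapping pcdata. The class and method names are hypothetical.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive descent over a pre-tokenized list of tags and text chunks."""

    LEAVES = ("quantity", "unit", "fooditem")  # simplified: each wraps pcdata

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, literal):
        """Match a required (literal) symbol or fail."""
        if self.peek() != literal:
            raise ParseError(f"expected {literal!r}, got {self.peek()!r}")
        self.pos += 1

    def pcdata(self):
        tok = self.peek()
        if tok is None or tok.startswith("<"):
            raise ParseError(f"expected text, got {tok!r}")
        self.pos += 1
        return tok

    def leaf(self, name):
        # name -> <name> pcdata </name>
        self.match(f"<{name}>")
        text = self.pcdata()
        self.match(f"</{name}>")
        return (name, text)

    def ingredient(self):
        # ingredient -> <ingredient> (pcdata | quantity | unit | fooditem)* </ingredient>
        self.match("<ingredient>")
        children = []
        while self.peek() is not None and self.peek() != "</ingredient>":
            tok = self.peek()
            if not tok.startswith("<"):
                children.append(("pcdata", self.pcdata()))
            elif tok[1:-1] in self.LEAVES:
                children.append(self.leaf(tok[1:-1]))
            else:
                raise ParseError(f"unexpected {tok!r} in ingredient")
        self.match("</ingredient>")
        return ("ingredient", children)

    def ingredientlist(self):
        # ingredientlist -> <ingredientlist> (ingredient)* </ingredientlist>
        self.match("<ingredientlist>")
        items = []
        while self.peek() == "<ingredient>":
            items.append(self.ingredient())  # call the procedure for the variable
        self.match("</ingredientlist>")
        return ("ingredientlist", items)


tokens = ["<ingredientlist>", "<ingredient>",
          "<quantity>", "2", "</quantity>",
          "<unit>", "cups", "</unit>",
          "<fooditem>", "flour", "</fooditem>",
          "</ingredient>", "</ingredientlist>"]
tree = Parser(tokens).ingredientlist()
print(tree)
```

Each method mirrors one production, so the call stack during parsing follows the shape of the parse tree.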
Lexical Analysis
• Helps prepare for parsing
• Uses regular expressions to organize input into multi-symbol chunks (tokens)
– Each chunk has a meaning for the parser
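A lexer of this kind can be sketched with one regular expression that has a named group per kind of chunk. The version below is a minimal illustration that assumes attribute-free tags made of ASCII letters, as in the RBML grammar, and ignores XML comments, entities, and declarations. The token kind names and the function `tokenize` are assumptions.

```python
import re

# One regular expression with a named group per token kind.
TOKEN_RE = re.compile(r"""
    (?P<close></[A-Za-z]+>)     # closing tag, e.g. </title>
  | (?P<open><[A-Za-z]+>)       # opening tag, e.g. <title>
  | (?P<pcdata>[^<]+)           # character data up to the next tag
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, value) pairs for the parser."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos]!r}")
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind == "pcdata":
            value = value.strip()
            if not value:
                continue  # skip whitespace-only runs between tags
        tokens.append((kind, value))
    return tokens


print(tokenize("<title> Pancakes </title>"))
# [('open', '<title>'), ('pcdata', 'Pancakes'), ('close', '</title>')]
```

The parser then works on whole tags and text runs instead of individual characters, which keeps its procedures short.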