introduction to peg (parsing expression grammar) in python

Introduction to packrat parsing for PEGs (Parsing Expression Grammars). Gavin Bong, PyCon APAC 2011, Singapore. Wednesday, 8 June 2011, 21:29.

Upload: rwanda

Post on 12-Mar-2015


DESCRIPTION

Explainer for PEG

TRANSCRIPT

Page 1: Introduction to PEG (Parsing Expression Grammar) in python

Introduction to packrat parsing for PEGs (Parsing Expression Grammars)

Gavin Bong
PyCon APAC 2011, Singapore

Wednesday, 8 June 2011, 21:29

Page 2: Introduction to PEG (Parsing Expression Grammar) in python

roadmap (34 mins)

Motivation, PEG theory, pyparsing, PyMeta, PyPy rlib/parsing, Closing

Timings: 04 mins, 05 mins, 16 mins, 07 mins, 01 min, 01 min

Page 3: Introduction to PEG (Parsing Expression Grammar) in python

motivation

How to parse texts with PEGs

Natural languages (NLTK)
Mini languages (DSLs)
Structured / unstructured file formats

4 thoughts:
i. Aren't structured formats like JSON, XML, HTML well-served by existing parsers ?
ii. Parsing log files & configuration files is easy with python.
iii. Regular expressions are good enough.
iv. What is wrong with the classical way of writing parsers ?

Page 4: Introduction to PEG (Parsing Expression Grammar) in python

CFG (Context Free Grammars)

In formal language theory, CFGs are suitable for modeling both natural & computer languages.

BNF is the de facto notation for describing the syntax of CFGs.

S → aS
S → ε

Original BNF only supported recursion.

EBNF: sequence, decision (choice), repetition, recursion

if_stmt ::= "if" expression ":" suite ( "elif" expression ":" suite )* [ "else" ":" suite ]

Page 5: Introduction to PEG (Parsing Expression Grammar) in python

CFG & Ambiguity

CFG grammars are potentially ambiguous.

Dangling else problem

if( x > 5 )
if( y > 5 )
console.log("heaven");
else console.log("limbo");

[Figure: AST #1 for the snippet above; IfExp nodes with test, body and orelse branches]

Page 6: Introduction to PEG (Parsing Expression Grammar) in python

CFG & Ambiguity (2)

[Figure: AST #2 for the same snippet; IfExp nodes with test, body and orelse branches]

Page 7: Introduction to PEG (Parsing Expression Grammar) in python

Definitions

Parse trees vs AST

Parse tree = concrete: whitespace, braces, semicolons
Parse tree = nodes are nonterminals from the grammar
AST = abstract
AST = uses tree nodes specific to language constructs

Top-Down vs Bottom-up

Top-down = begin with the start nonterminal.
Top-down = work down the parse tree.
Bottom-up = identify terminals
Bottom-up = infer nonterminals
Bottom-up = climb the parse tree.

Page 8: Introduction to PEG (Parsing Expression Grammar) in python

Definitions (2)

Recursive descent parsing

* A top-down parser constructed from recursive functions.
* Each function represents a rule in the grammar.

version ::= <digit> '.' <digit>
digit ::= '0' | '1' ... | '9'

def version( source, position=0 ):
    digit( source, position )
    period( source, position + 1 )
    digit( source, position + 2 )

Run (pymeta) nose --nocapture -v test_rdp_list.py

Page 9: Introduction to PEG (Parsing Expression Grammar) in python

Recursive Descent Parsing

import string

def digit(source, position):
    fn = (lambda t: t in string.digits, this_rule())
    expect(source, position, fn)

def period(source, position):
    fn = (lambda t: t == '.', this_rule())
    expect(source, position, fn)

def expect(source, position, comparator):
    try:
        expecting, msg = comparator
        if not expecting(source[0]):
            raise ParseError(position, msg)
        source.popleft()  # consume !
    except IndexError:
        raise EOFError(position)

Page 10: Introduction to PEG (Parsing Expression Grammar) in python

Recursive Descent Parsing (2)

>>> import collections

>>> version(collections.deque('1.6'))

>>> version(collections.deque('1,6'))
ParseError: (1, 'expected <period>')

>>> version(collections.deque('1.'))
EOFError: (2, [('message', 'end of input')])

>>> version(collections.deque('A.6'))
ParseError: (0, 'expected <digit>')
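The slides leave ParseError and this_rule() undefined, so the session above cannot be reproduced as shown. The following is a minimal self-contained sketch of the same three rules, assuming a hypothetical ParseError class and passing the rule name explicitly instead of calling this_rule(). It is an illustration of the technique, not the speaker's original module.

```python
import string
from collections import deque


class ParseError(Exception):
    """Hypothetical error type: (position, message)."""


def expect(source, position, predicate, rule_name):
    # Consume one character if it satisfies the predicate, otherwise fail.
    if not source:
        raise EOFError(position, "end of input")
    if not predicate(source[0]):
        raise ParseError(position, "expected <%s>" % rule_name)
    return source.popleft()


def digit(source, position):
    # digit ::= '0' | '1' ... | '9'
    return expect(source, position, lambda c: c in string.digits, "digit")


def period(source, position):
    return expect(source, position, lambda c: c == ".", "period")


def version(source, position=0):
    # version ::= <digit> '.' <digit>
    major = digit(source, position)
    period(source, position + 1)
    minor = digit(source, position + 2)
    return major, minor


print(version(deque("1.6")))  # ('1', '6')
```

Each grammar rule maps to exactly one function, which is the defining property of a recursive descent parser.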

Page 11: Introduction to PEG (Parsing Expression Grammar) in python

Classical method of parsing

Specific to LALR(1) bottom-up parsers

1. Flesh out a grammar in BNF

2. Lexical analysis phase

lexer ( patterns, stream-of-characters ) => stream of tokens

3. Parsing phase

parser ( grammar, stream-of-tokens ) => parse tree / AST

4. Use your parser

Photo attribution: http://www.flickr.com/photos/j_aroche/2160902499/

Page 12: Introduction to PEG (Parsing Expression Grammar) in python

Spectrum of parsing solutions

Regex

Lex / Yacc parser generators (GNU flex/bison)

PEG parsers

Handwritten Recursive Descent Parsers

ANTLR


Page 13: Introduction to PEG (Parsing Expression Grammar) in python

Other python parsing toolkits

PLY

funcparserlib

Yapps

http://wiki.python.org/moin/LanguageParsing

Page 14: Introduction to PEG (Parsing Expression Grammar) in python

PEG

Scanner-less

Formalized by Bryan Ford in 2002-2004

Grammar mimics a recursive descent parser (+ backtracking).

A PEG grammar consists of a set of parsing expressions of the form: A → e
One expression is denoted the starting expression.

e1 / e2     Ordered choice
e1 e2       Sequence
e+ e? e*    Repetition
&e !e       Predicates

PEG != EBNF

Page 15: Introduction to PEG (Parsing Expression Grammar) in python

PEG's ordered choice

S → “Hitch” / “Hitchens”

Q. Given an input string of “Hitchens”, what is the result of the parse ?

Law #1: Given an input of A, the parsing expression matches a prefix A' of A or fails.
Law #2: A rule S -> M / N will try to parse for an M. If that fails, backtrack & look for N.

Answer: Hitch
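The two laws can be checked with a few lines of plain Python. The helper names literal and ordered_choice below are invented for this sketch and do not come from any of the libraries discussed in the talk.

```python
def literal(text):
    """Return a parser that matches `text` as a prefix, or yields None."""
    def parse(source, pos=0):
        return text if source.startswith(text, pos) else None
    return parse


def ordered_choice(*alternatives):
    """PEG-style choice: commit to the first alternative that succeeds."""
    def parse(source, pos=0):
        for alternative in alternatives:
            result = alternative(source, pos)
            if result is not None:
                return result
        return None
    return parse


s = ordered_choice(literal("Hitch"), literal("Hitchens"))
reordered = ordered_choice(literal("Hitchens"), literal("Hitch"))

print(s("Hitchens"))          # Hitch
print(reordered("Hitchens"))  # Hitchens
```

Because the first alternative already matches a prefix, the second is never tried; swapping the alternatives is the usual way to make the longer literal reachable.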

Page 16: Introduction to PEG (Parsing Expression Grammar) in python

PEG vs CFG

                                       PEG                 CFG
Handles ambiguous grammars             No                  Yes
Syntax definition philosophy           Analytical          Generative
Requires a lexical analysis phase ?    No                  Yes (lex/yacc)
Choice / alternation                   Ordered (e1/e2)     Commutative
Left recursion *                       No                  Yes

* Warth et al. Packrat parsers can support left recursion (2008)

Page 17: Introduction to PEG (Parsing Expression Grammar) in python

PEG & Packrat parsing

Context: recursive descent parsing with backtracking

Problem: an input substring might be re-parsed during backtracking.

grammar ::= AB | AC

Solution: memoization guarantees linear time performance.

[Photo: Neotoma cinerea]

Photo attribution: http://en.wikipedia.org/wiki/File:Neotoma_cinerea.jpg
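As a rough illustration of the problem and the fix, the sketch below parses a grammar of the shape A 'b' | A 'c' and counts how often rule A actually runs. The input string, the rule bodies and the use of functools.lru_cache as the memo table are all assumptions made for this example; a real packrat parser keys its cache on (rule, position) for every rule.

```python
from functools import lru_cache

TEXT = "aaaac"
calls = {"A": 0}


@lru_cache(maxsize=None)
def parse_a(pos):
    """A ::= 'a'+ ; returns the position after the match, or None."""
    calls["A"] += 1
    end = pos
    while end < len(TEXT) and TEXT[end] == "a":
        end += 1
    return end if end > pos else None


def parse_literal(char, pos):
    """Match a single character at pos; propagate an earlier failure."""
    if pos is not None and TEXT[pos:pos + 1] == char:
        return pos + 1
    return None


def grammar(pos=0):
    """grammar ::= A 'b' | A 'c' (ordered choice with backtracking)."""
    first = parse_literal("b", parse_a(pos))
    if first is not None:
        return first
    # Backtrack: A is requested again at the same position.
    return parse_literal("c", parse_a(pos))


print(grammar())   # 5
print(calls["A"])  # 1
```

Without the lru_cache decorator the counter would read 2, because the second alternative re-parses the same run of 'a' characters.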

Page 18: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 problem statement

Parse modern Japanese dates in various formats.

If the date parses successfully, convert it to its equivalent datetime.date instance.

Page 19: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : The four ERAs

Era               Period                       Emperor
HEISEI (平成)     1989 Jan 8 - present         Akihito
SHOWA (昭和)      1926 Dec 25 - 1989 Jan 7     Hirohito
TAISHOU (大正)    1912 Jul 30 - 1926 Dec 24    Yoshihito
MEIJI (明治)      1868 Sep 8 - 1912 Jul 29     Mutsuhito

Page 20: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : liberties taken

1. No support for days-of-the-week tagged onto the end.

2. Numbers use western digits, not kanji.

3. Some eras have overlapping days. Ignore.

4. For 1st year of an era, no support for gannen.
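Under these liberties the conversion itself is simple arithmetic: Heisei year 1 is 1989, so a Heisei year yy maps to 1988 + yy. The sketch below shows that step with only the standard library, using a regular expression in place of the pyparsing grammar developed on the following slides. The pattern, the constant name and the function name are assumptions made for illustration.

```python
import re
from datetime import date

# Heisei 1 corresponds to 1989, so the offset ("year zero") is 1988.
HEISEI_YEAR_ZERO = 1988

# Hypothetical pattern: optional Heisei era name, then year, month and day
# written with western digits and followed by their kanji markers.
PATTERN = re.compile(
    "(?P<era>\u5e73\u6210)?"
    "(?P<yy>\\d+)\u5e74(?P<mm>\\d+)\u6708(?P<dd>\\d+)\u65e5"
)


def to_date(text):
    """Convert a western or Heisei date string to datetime.date, else None."""
    match = PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    year = int(match.group("yy"))
    if match.group("era"):
        year += HEISEI_YEAR_ZERO
    try:
        return date(year, int(match.group("mm")), int(match.group("dd")))
    except ValueError:
        return None  # e.g. month 13 or day 32


print(to_date("\u5e73\u621023\u5e746\u67088\u65e5"))  # 2011-06-08
```

A western date such as 2011年6月8日 takes the same path without the offset.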


Page 21: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : initial attempt

from pyparsing import Literal, Word, nums

year = Literal( u'\u5e74' )
month = Literal( u'\u6708' )
day = Literal( u'\u65e5' )
heisei_era = Literal( u'\u5e73\u6210' )
integer = Word(nums)

Word(nums, exact=2)

Page 22: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : initial attempt (2)

western_year = integer('yyyy') + year
imperial_year = heisei_era + western_year

day_spec = integer.setResultsName('dd') + day
month_spec = integer('mm') + month

year_spec = (imperial_year('imperial') | western_year('western'))
grammar = year_spec + month_spec + day_spec

Page 23: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : initial attempt (3)

result = grammar.parseString(japanese_date)
print(result.dump())

Page 24: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing : introduction

Easy to use PEG-based text parser

Grammar definitions in python

Framework distributed as one file pyparsing.py

Runs on both python 2.x & 3.x. Future releases after 1.5.x will be focused on python 3.x only.

Not classified as recursive descent !

Page 25: Introduction to PEG (Parsing Expression Grammar) in python


pyparsing : framework overview

Page 26: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing & PEGs : correlation

PEG         pyparsing
e1 e2       e1 + e2 == And( [e1,e2] )
e1 / e2     e1 | e2 == MatchFirst( [e1,e2] )
e*          ZeroOrMore( e )
e+          OneOrMore( e )
e?          Optional( e )
&e          FollowedBy( e )
!e          ~e == NotAny( e )

Page 27: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing : framework overview


Page 28: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing : ordered choice

MatchFirst will short circuit as soon as a match is found. Not commutative.

Shadowing literals in which one is a substring of the other should be avoided.

Keywords are different

Page 29: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing : backtracking

Or forces the parser to make an exhaustive search of the alternatives. (match longest)

Or might introduce ambiguities. No better than non-PEG parsers.

Tweak the order of alternatives & put most probable (e.g. frequency of occurrence) first. Avoids wasteful backtracking.
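The difference between the two choice strategies can be sketched without pyparsing. The functions below are hypothetical stand-ins: match_first behaves like an ordered choice, match_longest like an exhaustive search that keeps the longest match. The keyword list is invented for the example.

```python
def match_first(alternatives, source):
    """Ordered choice: return the first literal that is a prefix of source."""
    for text in alternatives:
        if source.startswith(text):
            return text
    return None


def match_longest(alternatives, source):
    """Exhaustive choice: try every literal and keep the longest match."""
    matches = [text for text in alternatives if source.startswith(text)]
    return max(matches, key=len) if matches else None


keywords = ["in", "int", "integer"]

print(match_first(keywords, "integer x"))    # in
print(match_longest(keywords, "integer x"))  # integer
```

The exhaustive version always evaluates every alternative, which is the cost the slide warns about; reordering the list to ["integer", "int", "in"] gives the ordered version the same answer with a single comparison.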


Page 30: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing : backtracking

p1,p2,p3,p4,p5 = map(Literal,['ronaldo','messi', 'park-ji-sung', 'xavi','iniesta'])
first = p2 + p1 + p4
second = p2 + p1 + p5
third = p2 + p1 + p3

grammar = first | second | third

print(grammar.parseString( "messi ronaldo park-ji-sung" ))

Ballon d'Or 2011 example

Page 31: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing : backtracking


Page 32: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing : left factored

p1,p2,p3,p4,p5 = map(Literal,['ronaldo','messi', 'park-ji-sung', 'xavi','iniesta'])
absolute_certainty = p2 + p1
too_close_to_call = p4 | p5 | p3

grammar = absolute_certainty + too_close_to_call
print(grammar.parseString( "messi ronaldo park-ji-sung" ))
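To see what left factoring saves, the sketch below rebuilds both grammars from tiny hand-written combinators and counts literal comparisons. It works on a list of words rather than characters, and lit, seq, choice and attempts_for are names invented for this example, not pyparsing APIs.

```python
def lit(word, counter):
    def parse(words, pos):
        counter[0] += 1  # one literal comparison attempted
        return pos + 1 if pos < len(words) and words[pos] == word else None
    return parse


def seq(*parsers):
    def parse(words, pos):
        for parser in parsers:
            pos = parser(words, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    def parse(words, pos):
        for parser in parsers:
            end = parser(words, pos)
            if end is not None:
                return end
        return None
    return parse


def unfactored(L):
    return choice(seq(L("messi"), L("ronaldo"), L("xavi")),
                  seq(L("messi"), L("ronaldo"), L("iniesta")),
                  seq(L("messi"), L("ronaldo"), L("park-ji-sung")))


def factored(L):
    return seq(L("messi"), L("ronaldo"),
               choice(L("xavi"), L("iniesta"), L("park-ji-sung")))


def attempts_for(build, words):
    """Return (matched whole input?, number of literal comparisons)."""
    counter = [0]
    grammar = build(lambda word: lit(word, counter))
    matched = grammar(words, 0) == len(words)
    return matched, counter[0]


words = "messi ronaldo park-ji-sung".split()
print(attempts_for(unfactored, words))  # (True, 9)
print(attempts_for(factored, words))    # (True, 5)
```

The unfactored grammar re-matches the shared prefix once per alternative; the factored one matches it once and only backtracks over the final word.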

Page 33: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing : packrat

Memoization must be manually turned on.

ParserElement.enablePackrat()

Caches:
a. ParseResults
b. Exceptions thrown

run python select_parser.py

Caveat emptor: A grammar with parse actions that have side effects does not always play well with memoization turned on.

Page 34: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing : semantic actions

In pyparsing parlance, a ParserElement can have zero or more parse actions.

4 forms of parse actions:
fn(s,loc,toks)
fn(loc,toks)
fn(toks)
fn()

Usage:
ParserElement.setParseAction( *fn )
ParserElement.addParseAction( *fn )

Uses:
1. Perform validation (see ParseException)
2. Process the matched token(s) & modify them. Returning a value overwrites the matched token(s).
3. Annotate with custom types (corollary of #2)

Page 35: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : Semantic action

integer = Word(nums).setParseAction( lambda t: int(t[0]))

All users of the integer expression will inherit the parse action.

def range_check(toks):
    month = int(toks[0])
    if month <= 0 or month >= 13:
        raise ParseException('month must be in range 1..12')

month_spec = integer('m').addParseAction(range_check) + month

Selective assignment of parse actions to copies:

integer.copy().addParseAction( .. )
integer( 'result_name' ).addParseAction( .. )

Show: japan_simple.py

Page 36: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : test files

imperial.utf8
western.utf8

Page 37: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : complete solution

Show: japan_dates.py

Demo:

@traceParseAction
def convert_kanji_year(toks):
    if 'imperial' in toks.keys():
        year = toks.imperial.yearZero + toks.imperial.yy
        toks['era'] = toks.imperial.type_
        toks['yyyy'] = year
    elif 'western' in toks.keys():
        year = toks.yyyy
    try:
        toks['modernDate'] = date(year, toks.mm, toks.dd)
    except ValueError as error:
        raise ParseException(error.args[0])

Page 38: Introduction to PEG (Parsing Expression Grammar) in python

case study #2 problem statement

Parse Gmail search criteria.

Supports a tiny subset of the full grammar:

from : ( <sender> )

label : inbox
-label : sent

yyyyy
-yyyyy
“zzzzz”
-“zzzzz”

Page 39: Introduction to PEG (Parsing Expression Grammar) in python

case study #2: example strings

label : sarawak -label : not-urgent

from : ( bruno manser )

from : ( [email protected] )

from : ( @swiss.org )

“penan injustice”

-logging


Page 40: Introduction to PEG (Parsing Expression Grammar) in python

case study #2: email addresses

emailfull = Regex(r"(?P<user>[A-Za-z0-9._%+-]+)@(?P<hostname>[A-Za-z0-9.-]+)\.(?P<tld>[A-Za-z]{2,4})")

emailpartial = Regex(r"@(?P<hostname>[A-Za-z0-9.-]+)\.(?P<tld>[A-Za-z]{2,4})")


email = (emailpartial | emailfull)

squeeze = lambda t: ' '.join( t[0].split() )

name = ZeroOrMore(Word(alphanums + ' ')) .setParseAction( squeeze )
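The two patterns and the squeeze helper can be tried on their own with the standard re module before wiring them into pyparsing. In the sketch below the patterns are copied from the slide; the function parse_email and the sample address someone@example.org are invented for illustration.

```python
import re

# Patterns copied from the slide.
EMAIL_FULL = re.compile(
    r"(?P<user>[A-Za-z0-9._%+-]+)@(?P<hostname>[A-Za-z0-9.-]+)\.(?P<tld>[A-Za-z]{2,4})"
)
EMAIL_PARTIAL = re.compile(
    r"@(?P<hostname>[A-Za-z0-9.-]+)\.(?P<tld>[A-Za-z]{2,4})"
)


def parse_email(text):
    """Try the partial form first, then the full form (emailpartial | emailfull)."""
    for pattern in (EMAIL_PARTIAL, EMAIL_FULL):
        match = pattern.match(text)
        if match:
            return match.groupdict()
    return None


def squeeze(text):
    """Collapse runs of whitespace, as the slide's squeeze lambda does."""
    return " ".join(text.split())


print(parse_email("someone@example.org"))
print(parse_email("@swiss.org"))
print(squeeze("  bruno   manser "))
```

The named groups become dictionary keys here; in pyparsing a Regex element exposes them as named results in a similar way.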

Page 41: Introduction to PEG (Parsing Expression Grammar) in python

case study #2: email addresses

opener,closer,colon = map(Suppress,'():')

enclosed = email | name

nested = opener + enclosed + closer

grammar_email = Combine(Suppress('from') + colon + nested)


Page 42: Introduction to PEG (Parsing Expression Grammar) in python

case study #2: email addresses

result = grammar_email.parseString( 'from:([email protected])' )
print(result.dump())

result = grammar_email.parseString( 'from:( Marco de Gasperi )' )
print(result.dump())

Run: nosetests -v testFromTo.py

Page 43: Introduction to PEG (Parsing Expression Grammar) in python

case study #2: labels

GOAL: group the excluded and included labels into their own sub-lists. E.g. label : fukushima1 -label : aloo-gobi

hyphen = Suppress('-')

label_rhs = delimitedList(Word(alphanums), delim='-', combine=True )

Combine( expr + ZeroOrMore( delim + expr ) )

label_include = Combine( Suppress('label') + colon + label_rhs )
label_exclude = Combine( hyphen + label_include )

label_all = MatchFirst([
    label_exclude.setResultsName('labels.exclude', listAllMatches=True),
    label_include('labels.include*')])

pyparsing 1.5.6

grammar_label = ZeroOrMore( label_all )

Page 44: Introduction to PEG (Parsing Expression Grammar) in python

case study #2: labels

result = grammar_label.parseString('-label:fukushima1 label:onagawa -label:aloo-gobi label:cheese-naan' )
print(result.dump())

Question. Will this grammar work if the user entered LABEL instead of label ?

Answer. CaselessLiteral('label')

Page 45: Introduction to PEG (Parsing Expression Grammar) in python

case study #2: search strings

GOAL: group the excluded and included search strings into their own sub-lists.

rumi - “ jack kerouac ”

key_single = Word(alphanums)
key_quoted = quotedString.setParseAction(removeQuotes)

key_included = key_quoted | key_single
key_excluded = Combine(hyphen + key_included)

key_all = MatchFirst( [key_excluded("key.exclude*"), key_included("key.include*")] )

grammar_key = ZeroOrMore( key_all )

Page 46: Introduction to PEG (Parsing Expression Grammar) in python

case study #2: search strings

result = grammar_key.parseString( ' -osama obama -"bin laden" "white house" ' )
print(result.dump())

Question. If the user entered single instead of double quotes, will it conform to the grammar ?

Answer. Yes

Page 47: Introduction to PEG (Parsing Expression Grammar) in python

case study #2: Final solution

Let's compose all the individual pieces together.

nested = opener + Group(enclosed) + closer

email_all = grammar_email('from*')

gmail = (ZeroOrMore(email_all | label_all | key_all) + Suppress(restOfLine))

result = gmail.parseString('love label:writing-tips "bird by bird" from:(Anne Lamott) -"dalai lama" -label:macchu-pichu from:([email protected]) -label:french-guiana -"epictetus" label:yoga "bugle podcast" label from:(@microsoft.com)')
print(result.dump())

Page 48: Introduction to PEG (Parsing Expression Grammar) in python

case study #2: Final solution

['love', 'writing-tips', 'bird-by-bird', 'Anne Lamott', 'dalai lama', 'macchu-pichu', '@microsoft.com', '[email protected]', 'french-guiana', 'epictetus', 'yoga', 'bugle podcast', 'label']
- from: ['Anne Lamott','@microsoft.com', '[email protected]']
- key.exclude: ['dalai lama','epictetus']
- key.include: ['love', 'bird by bird', 'bugle podcast', 'label']
- labels.exclude: ['macchu-pichu', 'french-guiana']
- labels.include: ['writing-tips','yoga']

Page 49: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing: Recursion

A grammar is recursive when there exists a nonterminal which has itself in the right-hand-side of the production rule.

number ::= digit rest
rest ::= digit rest | empty

digit = Word(nums,exact=1).setName('1-digit')

rest = Forward()
rest << Optional(digit + rest)

number = Combine(digit + rest, adjacent=False)('digit-list')

grammar = number.setParseAction(lambda t:int(t[0])) + Suppress(restOfLine)

Run

Page 50: Introduction to PEG (Parsing Expression Grammar) in python

case study #3: binary tree

Parse parentheses notation for binary trees.

(nil,4,nil)
((nil,2,(nil,3,nil)),4,((nil,5,(nil,6,nil)),7,nil))

[Figure: binary tree diagram with nodes 2, 3, 4, 5, 6, 7]

Convert it to list notation in python

Page 51: Introduction to PEG (Parsing Expression Grammar) in python

case study #3: recursive solution

BNF

node ::= '(' node ',' number ',' node ')' | empty

Code

left, right, comma = map(Suppress, '(),')
empty = (CaselessLiteral('nil')
         .setParseAction(replaceWith(None)))
tree = Forward()
value = Word(nums).setParseAction(lambda t:int(t[0]))

tree << ((left + Group(tree) + bookend(value) + Group(tree) + right) | empty)

Run

Page 52: Introduction to PEG (Parsing Expression Grammar) in python

case study #3: recursive solution

Input :

“ ((nil,2,(nil,3,nil)),4,((nil,5,(nil,6,nil)),7,nil)) ”

Output :

[[[None],2,[[None],3,[None]]],4,[[[None],5, [[None],6,[None]]],7,[None]]]

How to fix it : Group(tree)
Re-implement Group in

class TreeGroup(TokenConverter):
    def postParse(self, instring, loc, tokenlist):
        if len(tokenlist) == 1 and tokenlist[0] is None:
            return tokenlist
        else:
            return [tokenlist]
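For comparison, the same notation can be parsed by a small hand-written recursive descent parser that produces the intended shape directly, with nil mapped to a bare None rather than [None]. This is a standard-library sketch; parse_tree, _node and _expect are names made up for the example, and unlike the pyparsing version it only accepts lowercase nil.

```python
def _expect(text, pos, char):
    """Require `char` at `pos` and return the next position."""
    if text[pos:pos + 1] != char:
        raise ValueError("expected %r at position %d" % (char, pos))
    return pos + 1


def _node(text, pos):
    # node ::= 'nil' | '(' node ',' number ',' node ')'
    if text.startswith("nil", pos):
        return None, pos + 3
    pos = _expect(text, pos, "(")
    left, pos = _node(text, pos)
    pos = _expect(text, pos, ",")
    start = pos
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    if start == pos:
        raise ValueError("expected a number at position %d" % start)
    value = int(text[start:pos])
    pos = _expect(text, pos, ",")
    right, pos = _node(text, pos)
    pos = _expect(text, pos, ")")
    return [left, value, right], pos


def parse_tree(text):
    """Parse '(left,value,right)' / 'nil' notation into nested lists."""
    compact = text.replace(" ", "")
    tree, pos = _node(compact, 0)
    if pos != len(compact):
        raise ValueError("unexpected trailing input at position %d" % pos)
    return tree


print(parse_tree("(nil,4,nil)"))  # [None, 4, None]
```

The recursion in _node mirrors the recursion in the BNF rule, which is what Forward() provides in the pyparsing version.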

Page 53: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing : left recursion

pyparsing does not support left recursion.

term ::= \d+
expr ::= expr + term | term

@raises(RecursiveGrammarException)
def test_left_recursion(self):
    expr.validate()

Run

pyparsing will raise a RuntimeError with message 'maximum recursion depth exceeded'

Eliminate left recursion if you want it to work in pyparsing
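One common way to eliminate the left recursion is to rewrite expr ::= expr + term | term as expr ::= term ('+' term)*, turning the recursion into a loop. The sketch below applies that rewrite in plain Python and evaluates the sum as it goes; the tokenizer and function names are assumptions for this example rather than code from the talk.

```python
import re

TOKEN = re.compile(r"\s*(\d+|\+)")


def tokenize(text):
    """Split the input into number and '+' tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def term(tokens, pos):
    # term ::= \d+
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise ValueError("expected a number at token %d" % pos)
    return int(tokens[pos]), pos + 1


def expr(tokens):
    # expr ::= term ('+' term)*   (left recursion replaced by repetition)
    total, pos = term(tokens, 0)
    while pos < len(tokens) and tokens[pos] == "+":
        value, pos = term(tokens, pos + 1)
        total += value  # accumulating left to right keeps left associativity
    if pos != len(tokens):
        raise ValueError("unexpected token %r" % tokens[pos])
    return total


print(expr(tokenize("1 + 20 + 300")))  # 321
```

In pyparsing terms the rewritten rule would be a term followed by zero or more '+' term pairs, which avoids the infinite descent entirely.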

Page 54: Introduction to PEG (Parsing Expression Grammar) in python
Page 55: Introduction to PEG (Parsing Expression Grammar) in python

PyMeta : introduction

OMeta is a language prototyping system (PEG).

Implemented in several programming languages.

* Packrat memoization
* Grammar: BNF dialect (with host language snippets)
* Object-Oriented: inheritance, overriding rules

lowercase ::= <char_range 'a' 'z'>

def rule_lowercase(): # ..body..

* <anything> consumes one object from the input stream. (c.f. regex)
* Built-in rules <letter> <digit> <letterOrDigit> <token '?'>

Page 56: Introduction to PEG (Parsing Expression Grammar) in python

PEGs & PyMeta

PEG         PyMeta
e1 e2       e1 e2
e1 / e2     e1 | e2
e*          e*
e+          e+
e?          e?
&e          ~~e
!e          ~e

Syntactic predicates (unlimited lookahead): &e, !e

Page 57: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : in PyMeta

Modest goals:
a) recognize western and Heisei imperial dates
b) read & parse both imperial.utf8 & western.utf8

Separate files:

common.py : Common rules & utilities
western_dates.py : Grammar to recognize western dates
era_heisei.py : Grammar to recognize heisei dates
japan_date_parser.py : Final grammar

Page 58: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : in PyMeta pt A

common.py

from pymeta.grammar import OMeta

baseGrammar = r"""
# common literals for all ERAs
year ::= <token u'\u5E74'>
month ::= <token u'\u6708'>
day ::= <token u'\u65E5'>

range_num :min :max ::= <digit>+:m ?(int(join(m)) >= min and int(join(m)) <= max) => m
rest_of_line ::= <anything>* <token '\n'>? => None
empty_line ::= <spaces> <rest_of_line> => None
python_comment ::= <token '#'> <rest_of_line> => None
"""

JapanCommonParser = OMeta.makeGrammar(baseGrammar, globals(), "JapanCommonParser")

def join(x):
    return ''.join(x)

Page 59: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : in PyMeta pt B

western_dates.py

westernGrammar = r"""
western ::= <spaces> <digit>+:y <year> <range_num 1 12>:m <month> <range_num 1 31>:d <day> <rest_of_line> => westernized( int(join(y)),int(join(m)), int(join(d)))

grammar ::= <python_comment> | <western>
"""

def westernized(yyyy, mm, dd):
    retval = JapanDate()
    retval['western'] = date(yyyy,mm,dd)
    return retval

WesternParser = JapanCommonParser.makeGrammar(westernGrammar, globals(), 'WesternParser')

Page 60: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : in PyMeta pt C

era_heisei.py

era_heisei = Era('Heisei','Akihito', (u'\u5E73\u6210',u'\u337B'), startDate=date(1989,1,8))

def heisei_year_ok(yy):
    return (yy >= 1 and yy <= era_heisei.maxYearUnit)

def collect( yy, mm, dd ):
    retval = JapanDate()
    retval['imperial'] = date( era_heisei.yearZero + yy, mm, dd )
    retval['era'] = [ era_heisei.name, yy ]
    return retval

Page 61: Introduction to PEG (Parsing Expression Grammar) in python

case study #1: in PyMeta pt C (2)

era_heisei.py (continued)

heiseiGrammar = r"""
hlong ::= <token u'\u5e73\u6210'>
hshort ::= <token u'\u337b'>

heisei ::= (<hlong> | <hshort>) <digit>+:y ?(heisei_year_ok(int(join(y)))) <year> <range_num 1 12>:m <month> <range_num 1 31>:d <day> <rest_of_line> => collect(int(join(y)),int(join(m)),int(join(d)))
"""

HeiseiParser = JapanCommonParser.makeGrammar(heiseiGrammar, globals(), 'HeiseiParser')

Page 62: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : in PyMeta pt D

japan_date_parser.py

finalGrammar = r"""
# override 'grammar' in WesternParser
grammar ::= <super> | <heisei> | <empty_line>
"""

class BaseParser(HeiseiParser, WesternParser):
    pass

BaseParser.globals.update(WesternParser.globals)
BaseParser.globals.update(HeiseiParser.globals)

JapanDateParser = BaseParser.makeGrammar(finalGrammar, globals(), "JapanDateParser")

Page 63: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : in PyMeta pt D (2)

japan_date_parser.py (continued)

def parse_file(filename):
    """ iterate through each line """
    .... snipped ...
    parser = JapanDateParser(line)
    result,error = parser.apply('grammar')
    .... snipped ...

results = parse_file('imperial.utf8')
results = parse_file('western.utf8')

Run

Page 64: Introduction to PEG (Parsing Expression Grammar) in python

case study #1 : PyMeta output


Page 65: Introduction to PEG (Parsing Expression Grammar) in python

PyMeta : Left Recursion

PyMeta can handle left recursion.

recursiveGrammar = r"""
num ::= <num>:n <digit>:d => n * 10 + d | <digit>
digit ::= :d ?((d>='0') & (d<='9')) => int(d)
"""

Run

Quiz. Is the following grammar equivalent ?

num ::= <digit> | <num>:n <digit>:d => n * 10 + d

Page 66: Introduction to PEG (Parsing Expression Grammar) in python

PyMeta : Matching objects

listGrammar = """
digit ::= :x ?(x.isdigit()) => int(x)
interp ::= [<digit>:x '+' <digit>:y] => x + y
"""

g = OMeta.makeGrammar(listGrammar, {})
parser = g( [['600','+','66']] )  # an iterable wrapping a python list
result,error = parser.apply('interp')

>>> result
666

>>> error
ParseError(2,[])

Page 67: Introduction to PEG (Parsing Expression Grammar) in python

PyMeta : Matching objects (2)

Object graph (e.g. tree)

The python rewriter project visits the AST created by the compiler module (python 2.x) & regenerates the python statement.

>>> import compiler
>>> print(compiler.parse('import ctypes'))
Module(None, Stmt([Import([('ctypes', None)])]))

import :i ::= <anything>:a ?(a.__class__ == Import) => 'import '+', '.join(import_match(a.names))

Page 68: Introduction to PEG (Parsing Expression Grammar) in python

pyparsing vs PyMeta

                                  pyparsing                                   PyMeta
Whitespace sensitive?             No. But turned on via leaveWhitespace()     Yes. Use <spaces> rule to eat whitespaces
Left recursion                    No                                          Yes
Packrat memoization               Yes. Off by default.                        Yes. Only no-arg rules
Operates on character streams     Yes                                         Yes
Operates on object streams        No                                          Yes
Syntactic predicates              Yes                                         Yes
Semantic predicates               No (@see parse actions)                     Yes
Semantic actions                  Yes                                         Yes
Regex support                     Yes                                         No

Page 69: Introduction to PEG (Parsing Expression Grammar) in python

PyPy rlib/parsing

Library for generating tokenizers & parsers in RPython.

Consists of: regex / packrat parser / tree structure / EBNF parser

Sample JSON ebnf

NUMBER: "\-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][\+\-]?[0-9]+)?";
value: <STRING> | <NUMBER> | <object> | <array> | <"null"> | <"true"> | <"false">;
array: ["["] (value [","])* value ["]"];
entry: STRING [":"] value;

Resulting parse tree can be transformed or traversed with custom visitors. (dot)
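The NUMBER token in the sample grammar is an ordinary regular expression, so it can be tried out with the standard re module. The sketch below copies the pattern from the slide; the helper is_json_number and the sample inputs are invented for illustration, and nothing here uses rlib/parsing itself.

```python
import re

# Pattern copied from the NUMBER rule above.
NUMBER = re.compile(r"\-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][\+\-]?[0-9]+)?")


def is_json_number(text):
    """True when the whole string is a number according to the NUMBER rule."""
    return NUMBER.fullmatch(text) is not None


for sample in ["0", "-12.5", "6.02e+23", "007", "1.", ".5"]:
    print(sample, is_json_number(sample))
```

The first three samples are accepted; "007", "1." and ".5" are rejected because the rule forbids leading zeros and requires digits on both sides of the decimal point.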

Page 70: Introduction to PEG (Parsing Expression Grammar) in python

Topics not covered

● Usage of syntactic predicates
● Parsing grammars of mathematical expressions in order to preserve operator precedence
● Handling indents/dedents in order to parse indentation-sensitive languages, e.g. coffeescript, python, haskell

Page 71: Introduction to PEG (Parsing Expression Grammar) in python

Resources

pyparsing: http://pyparsing.wikispaces.com/
PyMeta: http://www.tinlizzie.org/ometa/
PyPy RPython parsing library: http://doc.pypy.org/en/latest/rlib.html

http://gitorious.org/python-decompiler/python_rewriter
https://github.com/marcua/tweeql

[email protected]