![Page 1: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/1.jpg)
PEGs, Treetop, and Converting Regular Expressions to NFAs
Jason Dew and Gary Fredericks
![Page 2: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/2.jpg)
Parsing Expression Grammars and Treetop
Jason Dew
![Page 3: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/3.jpg)
Outline
1. Introduction to PEGs
2. Introduction to Treetop
3. References and Questions
![Page 4: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/4.jpg)
PEGs
• PEG := parsing expression grammar
• A generalization of regular expressions
• Similar to context-free grammars
• Unlike grammars written in BNF, a PEG cannot be ambiguous: a string that parses has exactly one parse tree
![Page 5: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/5.jpg)
Formal Definition
• N: a finite set of non-terminal symbols
• Σ: a finite set of terminal symbols
• P: a finite set of parsing rules
• e_S: the starting expression
![Page 6: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/6.jpg)
Formal Definition
Parsing rules take the form: A := e
where A is a non-terminal and e is a parsing expression (or some combination of parsing expressions)
![Page 7: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/7.jpg)
Parsing Expressions
Several ways to combine expressions:
• sequence: "foo" "bar"
• ordered choice: "foo" / "bar"
• zero or more: "foo"*
• one or more: "foo"+
• optional: "foo"?
![Page 8: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/8.jpg)
Parsing Expressions
Lookahead assertions (these do not consume any input):
• positive lookahead: "foo" &"bar"
• negative lookahead: "foo" !"baz"
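To make the operators on the last two slides concrete, here is a minimal sketch of PEG-style combinators in plain Ruby. The module name `TinyPEG` and all of its methods are hypothetical names made up for this illustration; this is not how Treetop is implemented. Each parser is a lambda that takes the input and a position and returns the new position, or `nil` on failure.

```ruby
# Hypothetical helper module, for illustration only.
module TinyPEG
  # Terminal: match a literal string at the current position.
  def self.lit(str)
    ->(input, pos) { input[pos, str.length] == str ? pos + str.length : nil }
  end

  # Sequence: every part must match, one after another.
  def self.seq(*parts)
    ->(input, pos) { parts.reduce(pos) { |p, part| p && part.call(input, p) } }
  end

  # Ordered choice: try alternatives left to right, commit to the first match.
  def self.choice(*alts)
    lambda do |input, pos|
      result = nil
      alts.each do |alt|
        result = alt.call(input, pos)
        break if result
      end
      result
    end
  end

  # Zero or more: greedy repetition that never backtracks.
  def self.star(parser)
    lambda do |input, pos|
      while (nxt = parser.call(input, pos)) && nxt > pos
        pos = nxt
      end
      pos
    end
  end

  # Positive lookahead (&e): succeed without consuming input.
  def self.and_ahead(parser)
    ->(input, pos) { parser.call(input, pos) ? pos : nil }
  end

  # Negative lookahead (!e): succeed only if e fails, consuming nothing.
  def self.not_ahead(parser)
    ->(input, pos) { parser.call(input, pos) ? nil : pos }
  end
end

foo = TinyPEG.lit("foo")
first_wins  = TinyPEG.choice(foo, TinyPEG.lit("foobar"))
foo_and_bar = TinyPEG.seq(foo, TinyPEG.and_ahead(TinyPEG.lit("bar")))
foo_not_baz = TinyPEG.seq(foo, TinyPEG.not_ahead(TinyPEG.lit("baz")))

p first_wins.call("foobar", 0)   # the first alternative matches, so "foobar" is never tried
p foo_and_bar.call("foobar", 0)  # stops after "foo": the lookahead consumed nothing
p foo_not_baz.call("foobaz", 0)  # fails because "baz" follows
```

The ordered-choice case is the one that separates PEGs from context-free grammars: once `"foo"` matches, the second alternative is not considered.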
![Page 9: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/9.jpg)
Implementations
• Java: parboiled, rats!
• C: peg, leg, pegc
• C++: boost
• Python: ppeg, pypeg, pijnu
• Javascript: kouprey
• Perl 6: (part of the language)
• Erlang: neotoma
• Clojure: clj-peg
• F#: fparsec
and finally... Ruby has Treetop
![Page 10: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/10.jpg)
Treetop
A DSL (domain-specific language) written in Ruby for implementing PEGs
![Page 11: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/11.jpg)
Syntax
Two main keywords in the DSL: grammar and rule
![Page 12: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/12.jpg)
Semantics
Consider the following PEG and the input string (((a))):
the resulting parse tree:
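As a rough illustration of how an input like (((a))) turns into a nested tree, the sketch below hand-codes a recognizer in plain Ruby. It assumes a single recursive rule, `parenthesized_letter := "(" parenthesized_letter ")" / [a-z]`; that rule and the method name are assumptions made for this example, and the nested arrays only stand in for the syntax nodes Treetop would build.

```ruby
# Assumed rule (hypothetical):
#   parenthesized_letter := "(" parenthesized_letter ")" / [a-z]
# Returns [tree, next_position] on success, nil on failure.
def parse_parenthesized_letter(input, pos = 0)
  # First alternative: "(" parenthesized_letter ")"
  if input[pos] == "("
    inner = parse_parenthesized_letter(input, pos + 1)
    if inner && input[inner[1]] == ")"
      return [["(", inner[0], ")"], inner[1] + 1]
    end
  end

  # Second alternative: a single lowercase letter
  ch = input[pos]
  return [ch, pos + 1] if ch && ch =~ /\A[a-z]\z/

  nil
end

tree, stop = parse_parenthesized_letter("(((a)))")
p tree  # nested arrays, one level per pair of parentheses
p stop  # position just past the last consumed character
```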
![Page 13: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/13.jpg)
And now for the cool part
• each node is an instance of Treetop::Runtime::SyntaxNode
• semantics get defined here
• all of Ruby is available to you
![Page 14: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/14.jpg)
Example
![Page 15: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/15.jpg)
Example (sans code duplication)
![Page 16: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/16.jpg)
Treetop::Runtime::SyntaxNode
Methods available:
• #terminal? : true if this node corresponds to a terminal symbol, false otherwise
• #nonterminal? : true if this node corresponds to a non-terminal symbol, false otherwise
• #text_value : returns the matched text
• #elements : returns the child nodes (only for non-terminal nodes)
![Page 17: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/17.jpg)
References and Questions
• http://en.wikipedia.org/wiki/Parsing_expression_grammar
• http://treetop.rubyforge.org/
![Page 18: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/18.jpg)
RE → εNFA
Gary Fredericks
![Page 19: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/19.jpg)
Plan
1. Demonstrate Application
2. Show Treetop Parse Tree
3. Class NFA
   1. Simple one-character NFA
   2. Combined NFAs
      1. Question Mark
      2. Kleene Star
      3. Concatenation
      4. Or
4. Optimizations
![Page 20: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/20.jpg)
Application
• http://gfredericks.com/main/sandbox/regex
Temporary shortcut:
• gfredericks.com/531
![Page 21: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/21.jpg)
Treetop Grammar

    grammar Regex
      rule reg
        expression
      end

      rule expression
        term ("|" term)* <RegexNFA::Expression>
      end

      rule term
        modified_factor+ <RegexNFA::Term>
      end
![Page 22: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/22.jpg)
Treetop Grammar (cont)

      rule modified_factor
        factor modifier? <RegexNFA::ModifiedFactor>
      end

      rule factor
        "(" expression ")" <RegexNFA::Factor> / literal / characterClass
      end

      rule modifier
        optional / one_or_more / zero_or_more / specified_number
      end
![Page 23: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/23.jpg)
Treetop Grammar (cont)

      rule optional
        "?" <RegexNFA::Optional>
      end

      rule one_or_more
        "+" <RegexNFA::OneOrMore>
      end

      rule zero_or_more
        "*" <RegexNFA::ZeroOrMore>
      end

      rule specified_number
        "{" [0-9]+ ("," [0-9]* )? "}" <RegexNFA::SpecifiedNumber>
      end
      …
![Page 24: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/24.jpg)
Treetop Example
• a?d|(bc*){2} matches: ad, d, bb, bcb, bbccccccc, bccccccccbccccccc
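The listed matches can be checked against Ruby's built-in regular expression engine. This is only a sanity check using `Regexp`, not the NFA construction from the talk; the constant and variable names are made up for the example, and the anchors `\A` and `\z` are added so that the whole string has to match.

```ruby
# The regex from the slide, anchored so the entire string must match.
PATTERN = /\A(?:a?d|(?:bc*){2})\z/

examples = %w[ad d bb bcb bbccccccc bccccccccbccccccc]
examples.each do |s|
  puts "#{s}: #{PATTERN.match?(s)}"
end

# A few strings that should not match: "b" has only one repetition of bc*,
# and "bcbcb" has three.
non_examples = %w[a b adb bcbcb]
non_examples.each do |s|
  puts "#{s}: #{PATTERN.match?(s)}"
end
```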
![Page 25: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/25.jpg)
Treetop Syntax Tree
a?d|(bc*){2}
![Page 26: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/26.jpg)
![Page 27: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/27.jpg)
Simplifying Assumptions
Every NFA has a start state with only outgoing transitions
Every NFA has a single accepting state with only incoming transitions
(This means no self-transitions in either case)
![Page 28: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/28.jpg)
NFA.simple(char)
my_simple = NFA.simple("c")
![Page 29: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/29.jpg)
NFA.simple(char)

    def NFA.simple(alphabet, symbol)
      NFA.new(
        [:init, :a],           # states
        alphabet,              # alphabet
        lambda do |st, sym|    # transition
          (st == :init and sym == symbol) ? [:a] : []
        end,
        :init,                 # start state
        [:a])                  # accepting states
    end
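The slides show how NFAs are built but not the class itself. The sketch below is one guess at a minimal `NFA` class that fits the constructor call above: the transition is a lambda from (state, symbol) to a list of states, and a `nil` symbol stands for an ε-move. The `accepts?` and `closure` methods are hypothetical additions for this example, included so that the result can be run against strings.

```ruby
require "set"

class NFA
  attr_reader :states, :alphabet, :transition, :start, :accepting

  def initialize(states, alphabet, transition, start, accepting)
    @states, @alphabet, @transition = states, alphabet, transition
    @start, @accepting = start, accepting
  end

  # One-character NFA, as on the slide.
  def self.simple(alphabet, symbol)
    new([:init, :a], alphabet,
        lambda { |st, sym| (st == :init && sym == symbol) ? [:a] : [] },
        :init, [:a])
  end

  # Hypothetical: simulate the NFA by tracking the set of reachable states.
  def accepts?(string)
    current = closure([@start])
    string.each_char do |ch|
      current = closure(current.flat_map { |st| @transition.call(st, ch) })
    end
    current.any? { |st| @accepting.include?(st) }
  end

  private

  # Hypothetical: all states reachable from sts through ε-moves (sym = nil).
  def closure(sts)
    seen = Set.new(sts)
    todo = sts.to_a.dup
    until todo.empty?
      @transition.call(todo.pop, nil).each { |nxt| todo << nxt if seen.add?(nxt) }
    end
    seen
  end
end

my_simple = NFA.simple(%w[c d], "c")
p my_simple.accepts?("c")
p my_simple.accepts?("cc")
```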
![Page 30: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/30.jpg)
NFA::question_mark
my_simple.question_mark
![Page 31: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/31.jpg)
NFA::question_mark

    def question_mark
      trans = lambda do |st, sym|
        original = @transition.call(st, sym)
        if st == @start and sym.nil?
          # extra ε-move from the start state straight to the accepting state
          original += @accepting
        end
        return original
      end
      NFA.new(@states, @alphabet, trans, @start, @accepting)
    end
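To see `question_mark` in action, the sketch below places it in a minimal stand-in for the `NFA` class. The `accepts?` and `closure` methods are hypothetical helpers added for this example; only `simple` and `question_mark` follow the slides. Note that the new transition lambda closes over the original NFA's `@transition`, so the new NFA delegates to the old one.

```ruby
require "set"

class NFA
  attr_reader :states, :alphabet, :transition, :start, :accepting

  def initialize(states, alphabet, transition, start, accepting)
    @states, @alphabet, @transition = states, alphabet, transition
    @start, @accepting = start, accepting
  end

  def self.simple(alphabet, symbol)
    new([:init, :a], alphabet,
        lambda { |st, sym| (st == :init && sym == symbol) ? [:a] : [] },
        :init, [:a])
  end

  # Same states as before; only the transition function changes.
  def question_mark
    trans = lambda do |st, sym|
      original = @transition.call(st, sym)
      original += @accepting if st == @start && sym.nil?
      original
    end
    NFA.new(@states, @alphabet, trans, @start, @accepting)
  end

  # Hypothetical simulation helper, not from the slides.
  def accepts?(string)
    current = closure([@start])
    string.each_char do |ch|
      current = closure(current.flat_map { |st| @transition.call(st, ch) })
    end
    current.any? { |st| @accepting.include?(st) }
  end

  private

  # Hypothetical: ε-closure of a collection of states.
  def closure(sts)
    seen = Set.new(sts)
    todo = sts.to_a.dup
    until todo.empty?
      @transition.call(todo.pop, nil).each { |nxt| todo << nxt if seen.add?(nxt) }
    end
    seen
  end
end

my_simple = NFA.simple(%w[c], "c")
new_nfa = my_simple.question_mark
p new_nfa.accepts?("")
p new_nfa.accepts?("c")
```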
![Page 32: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/32.jpg)
What’s going on here? (closures)
new_nfa = my_simple.question_mark
[Figure: diagram with labels new_nfa, transition, my_simple]
![Page 33: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/33.jpg)
NFA::star
my_simple.star
![Page 34: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/34.jpg)
NFA::star

    def star
      a = self.wrap("m")
      states = a.states + [:init]
      transition = lambda do |st, sym|
        if st == :init
          # the new start state has only ε-moves: into a, or straight past it
          return sym.nil? ? [a.start] + a.accepting : []
        else
          ret = a.transition.call(st, sym)
          # reaching an accepting state also loops back to a's start
          if a.accepting.any? { |s| ret.include?(s) }
            ret << a.start
          end
          return ret
        end
      end
![Page 35: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/35.jpg)
<cont> -- NFA::star

      NFA.new(states, @alphabet, transition, :init, a.accepting)
    end
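A runnable sketch of `star` needs `wrap`, which the slides call but never show. Here `wrap` is assumed to prefix every state name so that the states of two NFAs cannot collide; that behavior, along with `accepts?` and `closure`, is a guess made for this example rather than the original implementation.

```ruby
require "set"

class NFA
  attr_reader :states, :alphabet, :transition, :start, :accepting

  def initialize(states, alphabet, transition, start, accepting)
    @states, @alphabet, @transition = states, alphabet, transition
    @start, @accepting = start, accepting
  end

  def self.simple(alphabet, symbol)
    new([:init, :a], alphabet,
        lambda { |st, sym| (st == :init && sym == symbol) ? [:a] : [] },
        :init, [:a])
  end

  # Assumed behavior: rename every state to :"<prefix>_<state>".
  def wrap(prefix)
    rename = ->(st) { :"#{prefix}_#{st}" }
    renamed = @states.map { |st| [rename.call(st), st] }.to_h
    trans = lambda do |st, sym|
      renamed.key?(st) ? @transition.call(renamed[st], sym).map(&rename) : []
    end
    NFA.new(renamed.keys, @alphabet, trans, rename.call(@start), @accepting.map(&rename))
  end

  def star
    a = wrap("m")
    transition = lambda do |st, sym|
      if st == :init
        sym.nil? ? [a.start] + a.accepting : []
      else
        ret = a.transition.call(st, sym)
        # arriving at an accepting state also allows another round
        ret += [a.start] if a.accepting.any? { |s| ret.include?(s) }
        ret
      end
    end
    NFA.new(a.states + [:init], @alphabet, transition, :init, a.accepting)
  end

  # Hypothetical simulation helper, not from the slides.
  def accepts?(string)
    current = closure([@start])
    string.each_char do |ch|
      current = closure(current.flat_map { |st| @transition.call(st, ch) })
    end
    current.any? { |st| @accepting.include?(st) }
  end

  private

  def closure(sts)
    seen = Set.new(sts)
    todo = sts.to_a.dup
    until todo.empty?
      @transition.call(todo.pop, nil).each { |nxt| todo << nxt if seen.add?(nxt) }
    end
    seen
  end
end

c_star = NFA.simple(%w[c d], "c").star
p c_star.accepts?("")
p c_star.accepts?("ccc")
p c_star.accepts?("cd")
```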
![Page 36: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/36.jpg)
NFA::concatenate
my_simple.concatenate(my_simple)
![Page 37: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/37.jpg)
NFA::concatenate(other)

    def concatenate(other)
      a = self.wrap("m")
      b = other.wrap("n")
      # a's accepting state is merged into b's start state
      states = a.states - a.accepting + b.states
      transition = lambda do |st, sym|
        if a.states.include?(st)
          a.transition.call(st, sym).
            map { |s| a.accepting.include?(s) ? b.start : s }
        else
          b.transition.call(st, sym)
        end
      end
      # continuing…
![Page 38: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/38.jpg)
<cont> -- NFA::concatenate(other)

      NFA.new(states, @alphabet, transition, a.start, b.accepting)
    end
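The following sketch runs `concatenate` end to end. As before, `wrap` is an assumed implementation (it prefixes state names), and `accepts?` and `closure` are hypothetical helpers for trying the result on strings. The merge is safe because of the simplifying assumptions: the accepting state of the first NFA has no outgoing transitions, and the start state of the second has no incoming ones.

```ruby
require "set"

class NFA
  attr_reader :states, :alphabet, :transition, :start, :accepting

  def initialize(states, alphabet, transition, start, accepting)
    @states, @alphabet, @transition = states, alphabet, transition
    @start, @accepting = start, accepting
  end

  def self.simple(alphabet, symbol)
    new([:init, :a], alphabet,
        lambda { |st, sym| (st == :init && sym == symbol) ? [:a] : [] },
        :init, [:a])
  end

  # Assumed behavior: rename every state to :"<prefix>_<state>".
  def wrap(prefix)
    rename = ->(st) { :"#{prefix}_#{st}" }
    renamed = @states.map { |st| [rename.call(st), st] }.to_h
    trans = lambda do |st, sym|
      renamed.key?(st) ? @transition.call(renamed[st], sym).map(&rename) : []
    end
    NFA.new(renamed.keys, @alphabet, trans, rename.call(@start), @accepting.map(&rename))
  end

  def concatenate(other)
    a = wrap("m")
    b = other.wrap("n")
    transition = lambda do |st, sym|
      if a.states.include?(st)
        # redirect anything that would reach a's accepting state to b's start
        a.transition.call(st, sym).map { |s| a.accepting.include?(s) ? b.start : s }
      else
        b.transition.call(st, sym)
      end
    end
    NFA.new(a.states - a.accepting + b.states, @alphabet, transition,
            a.start, b.accepting)
  end

  # Hypothetical simulation helper, not from the slides.
  def accepts?(string)
    current = closure([@start])
    string.each_char do |ch|
      current = closure(current.flat_map { |st| @transition.call(st, ch) })
    end
    current.any? { |st| @accepting.include?(st) }
  end

  private

  def closure(sts)
    seen = Set.new(sts)
    todo = sts.to_a.dup
    until todo.empty?
      @transition.call(todo.pop, nil).each { |nxt| todo << nxt if seen.add?(nxt) }
    end
    seen
  end
end

alphabet = %w[c d]
cd = NFA.simple(alphabet, "c").concatenate(NFA.simple(alphabet, "d"))
p cd.accepts?("cd")
p cd.accepts?("c")
```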
![Page 39: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/39.jpg)
NFA::or
NFA.simple("c").or(NFA.simple("d"))
![Page 40: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/40.jpg)
NFA::or(other)

    def or(other)
      a = self.wrap("m")
      b = other.wrap("n")
      # both start states collapse into :init, both accepting states into :accept
      states = a.states + b.states + [:init, :accept] -
               [a.start, b.start] - a.accepting - b.accepting
![Page 41: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/41.jpg)
<cont> -- NFA::or(other)

      transition = lambda do |st, sym|
        ret =
          if st == :init
            a.transition.call(a.start, sym) +
              b.transition.call(b.start, sym)
          elsif a.states.include?(st)
            a.transition.call(st, sym)
          else
            b.transition.call(st, sym)
          end
        # any move into an old accepting state now lands on :accept
        return ret.map { |s|
          (a.accepting + b.accepting).include?(s) ? :accept : s
        }
      end
![Page 42: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/42.jpg)
<cont> -- NFA::or(other)

      NFA.new(states, @alphabet, transition, :init, [:accept])
    end
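Here is a self-contained sketch of `or`, again resting on an assumed `wrap` that prefixes state names and on hypothetical `accepts?` and `closure` helpers. One detail depends on that assumption: the `else` branch also receives the new `:accept` state, so the wrapped transition has to return an empty list for states it does not know.

```ruby
require "set"

class NFA
  attr_reader :states, :alphabet, :transition, :start, :accepting

  def initialize(states, alphabet, transition, start, accepting)
    @states, @alphabet, @transition = states, alphabet, transition
    @start, @accepting = start, accepting
  end

  def self.simple(alphabet, symbol)
    new([:init, :a], alphabet,
        lambda { |st, sym| (st == :init && sym == symbol) ? [:a] : [] },
        :init, [:a])
  end

  # Assumed behavior: rename states; unknown states have no transitions.
  def wrap(prefix)
    rename = ->(st) { :"#{prefix}_#{st}" }
    renamed = @states.map { |st| [rename.call(st), st] }.to_h
    trans = lambda do |st, sym|
      renamed.key?(st) ? @transition.call(renamed[st], sym).map(&rename) : []
    end
    NFA.new(renamed.keys, @alphabet, trans, rename.call(@start), @accepting.map(&rename))
  end

  def or(other)
    a = wrap("m")
    b = other.wrap("n")
    old_accepting = a.accepting + b.accepting
    states = a.states + b.states + [:init, :accept] -
             [a.start, b.start] - old_accepting
    transition = lambda do |st, sym|
      ret =
        if st == :init
          a.transition.call(a.start, sym) + b.transition.call(b.start, sym)
        elsif a.states.include?(st)
          a.transition.call(st, sym)
        else
          b.transition.call(st, sym)
        end
      ret.map { |s| old_accepting.include?(s) ? :accept : s }
    end
    NFA.new(states, @alphabet, transition, :init, [:accept])
  end

  # Hypothetical simulation helper, not from the slides.
  def accepts?(string)
    current = closure([@start])
    string.each_char do |ch|
      current = closure(current.flat_map { |st| @transition.call(st, ch) })
    end
    current.any? { |st| @accepting.include?(st) }
  end

  private

  def closure(sts)
    seen = Set.new(sts)
    todo = sts.to_a.dup
    until todo.empty?
      @transition.call(todo.pop, nil).each { |nxt| todo << nxt if seen.add?(nxt) }
    end
    seen
  end
end

alphabet = %w[c d]
c_or_d = NFA.simple(alphabet, "c").or(NFA.simple(alphabet, "d"))
p c_or_d.accepts?("c")
p c_or_d.accepts?("d")
p c_or_d.accepts?("cd")
```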
![Page 43: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/43.jpg)
Syntax Tree Translation
a?d|(bc*){2}
![Page 44: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/44.jpg)
Conclusion
I enjoyed making this.
Questions?
![Page 45: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/45.jpg)
Optimization: Repetition
What do we do for the regular expression b{200}?
Naïve:

    my_b = NFA.simple("b")
    res = my_b
    199.times do
      res = res.concatenate(my_b)
    end
![Page 46: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/46.jpg)
Naïve Approach Result
[Figure: diagram with labels res and my_b]
![Page 47: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/47.jpg)
Better Idea – Divide and Conquer

    def times(n)
      return self if n == 1
      a = self.times(n / 2)
      b = self.times(n - n / 2)
      return a.concatenate(b)
    end
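The sketch below runs `times` on a small case. It relies on the same assumed `wrap` and hypothetical `accepts?` and `closure` helpers as the other examples. Because each transition lambda delegates to the lambdas of the NFAs it was built from, splitting the work in half keeps the chain of nested calls to roughly log2(n) levels instead of n.

```ruby
require "set"

class NFA
  attr_reader :states, :alphabet, :transition, :start, :accepting

  def initialize(states, alphabet, transition, start, accepting)
    @states, @alphabet, @transition = states, alphabet, transition
    @start, @accepting = start, accepting
  end

  def self.simple(alphabet, symbol)
    new([:init, :a], alphabet,
        lambda { |st, sym| (st == :init && sym == symbol) ? [:a] : [] },
        :init, [:a])
  end

  # Assumed behavior: rename every state to :"<prefix>_<state>".
  def wrap(prefix)
    rename = ->(st) { :"#{prefix}_#{st}" }
    renamed = @states.map { |st| [rename.call(st), st] }.to_h
    trans = lambda do |st, sym|
      renamed.key?(st) ? @transition.call(renamed[st], sym).map(&rename) : []
    end
    NFA.new(renamed.keys, @alphabet, trans, rename.call(@start), @accepting.map(&rename))
  end

  def concatenate(other)
    a = wrap("m")
    b = other.wrap("n")
    transition = lambda do |st, sym|
      if a.states.include?(st)
        a.transition.call(st, sym).map { |s| a.accepting.include?(s) ? b.start : s }
      else
        b.transition.call(st, sym)
      end
    end
    NFA.new(a.states - a.accepting + b.states, @alphabet, transition,
            a.start, b.accepting)
  end

  # Divide and conquer: build each half recursively, then join them once.
  # Assumes n >= 1.
  def times(n)
    return self if n == 1
    times(n / 2).concatenate(times(n - n / 2))
  end

  # Hypothetical simulation helper, not from the slides.
  def accepts?(string)
    current = closure([@start])
    string.each_char do |ch|
      current = closure(current.flat_map { |st| @transition.call(st, ch) })
    end
    current.any? { |st| @accepting.include?(st) }
  end

  private

  def closure(sts)
    seen = Set.new(sts)
    todo = sts.to_a.dup
    until todo.empty?
      @transition.call(todo.pop, nil).each { |nxt| todo << nxt if seen.add?(nxt) }
    end
    seen
  end
end

b5 = NFA.simple(%w[b], "b").times(5)
p b5.accepts?("bbbbb")
p b5.accepts?("bbbb")
```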
![Page 48: PEGs, Treetop, and Converting Regular Expressions to NFAs Jason Dew and Gary Fredericks](https://reader036.vdocuments.net/reader036/viewer/2022062309/56649f305503460f94c4b6c4/html5/thumbnails/48.jpg)
Divide and Conquer Result