Parboiled2 explained
Covered
● Why Parboiled2
● Library basics
● Performance optimizations
● Best Practices
● Migration
Features
● PEG
● No lexer required
● Flexible, typesafe EDSL
● Compile-time optimizations
● Decent error reporting
● scala.js support
When regexes fail
Parsing arbitrary HTML with regexes is like asking Paris Hilton to write an operating system (c)
Performance (regex)
[Bar chart: parsing and warmup times for Parboiled2 and Regex; plotted values 620.38 and 621.95]
Data is taken from here: http://bit.ly/1XHAJaA
Lower is better
Performance (json)
Parboiled1: 85.64
Parboiled2: 13.17
Argonaut: 7.01
Json4SNative: 8.06
Json4SJackson: 4.09
Data is taken from here: http://myltsev.name/ScalaDays2014/#/
Lower is better
Performance (json)
Parser combinators: 2385.78
Parboiled1: 85.64
Parboiled2: 13.17
Argonaut: 7.01
Json4SNative: 8.06
Json4SJackson: 4.09
Data is taken from here: https://groups.google.com/forum/#!topic/parboiled-user/bGtdGvllGgU
Lower is better
Alternatives
● Grappa [java]
● ANTLR
● Regexps
● Parser-combinators
● Language Workbenches (xtext, MPS)
<dependency>
<groupId>org.parboiled</groupId>
<artifactId>parboiled_2.11</artifactId>
<version>2.1.0</version>
</dependency>
import org.parboiled2._
class MyParser(val input: ParserInput) extends Parser {
  // Your grammar
}
Rule DSL
Basic match
def CaseDoesntMatter = rule { ignoreCase("string")}
// Literals are converted to rules implicitly
def MyCharRule = rule { 'a' }
def MyStringRule = rule { "string" }
// Equivalent explicit forms
def MyCharRule = rule { ch('a') }
def MyStringRule = rule { str("string") }
Basic match
def CaseDoesntMatter: Rule0 = rule { ignoreCase("string") }
def MyCharRule: Rule0 = rule {'a'}
def MyStringRule: Rule0 = rule { "string" }
Syntactic predicates
● ANY – matches any character except EOI
● EOI – virtual character that represents the end of input
val EOI = '\uFFFF'
You must match EOI at the end of the main (root) rule; otherwise the parser can succeed after matching only a prefix of the input
Syntactic predicates
● anyOf – matches one character that is among the given chars
● noneOf – matches one character that is not among the given chars (and is not EOI)
def Digit = rule { anyOf("1234567890")}
def Visible = rule { noneOf(" \n\t")}
Character ranges
def Digit = rule { '0' - '9' }
def AlphaLower = rule { 'a' - 'z' }
Good, but not flexible (the main issue of parboiled1)
● Sometimes you don't need ANY character
● You have a range of characters
Character predicates
There is a set of predefined char predicates:
● CharPredicate.All
● CharPredicate.Digit
● CharPredicate.Digit19
● CharPredicate.HexDigit
Of course you can define your own
def AllButQuotes = rule {
CharPredicate.Visible -- "\"" -- "'"
}
def ValidIdentifier = rule {
CharPredicate.AlphaNum ++ "_"
}
CharPredicate from (_.isSpaceChar)
Character predicates
def ArithmeticOperation = rule {
anyOf("+-*/^")
}
// noneOf matches any single character that is NOT in the given string
def NonWhiteSpaceChar = rule { noneOf(" \t\n") }
anyOf/noneOf
def cows = rule { 1000 times "cow" }
def PRI = rule { 1 to 3 times Digit }
N times
def OptWs = rule {
  zeroOrMore(Whitespace) // Whitespace.*
}
def UInt = rule {
  oneOrMore(Digit) // Digit.+
}
def CommaSeparatedNumbers = rule { oneOrMore(UInt).separatedBy(",")}
0+/1+
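The repetition rules above can be combined into a small runnable parser. This is a sketch, not the deck's own code: the class name `NumberListParser` is made up, and the rules are extended with `capture` and `~>` so that they return values.

```scala
import org.parboiled2._

// Hypothetical parser name; rule names follow the slides.
class NumberListParser(val input: ParserInput) extends Parser {
  // Root rule: one or more unsigned integers separated by commas, then end of input.
  def CommaSeparatedNumbers: Rule1[Seq[Int]] = rule {
    oneOrMore(UInt).separatedBy(',') ~ EOI
  }

  // Captures the matched digits as a String and converts them to an Int.
  def UInt: Rule1[Int] = rule {
    capture(oneOrMore(CharPredicate.Digit)) ~> (_.toInt)
  }
}
```

Running `new NumberListParser("1,22,333").CommaSeparatedNumbers.run()` should give a `Success` holding the three numbers; an input such as `"1,,2"` should fail because `separatedBy` expects a number after every comma.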
import CharPredicate.Digit
// "yyyy-mm-dd"
def SimplifiedRuleForDate = rule { Year ~ "-" ~ Month ~ "-" ~ Day }
def Year = rule { Digit ~ Digit ~ Digit ~ Digit}
def Month = rule { Digit ~ Digit }
def Day = rule { Digit ~ Digit }
Sequence
// zeroOrOne
def Newline = rule { optional('\r') ~ '\n' }
def Newline = rule { '\r'.? ~ '\n'}
Optional
// ch(...) makes the left operand a rule; between two bare Chars, | is Scala's bitwise OR
def Signum = rule { ch('+') | '-' }
def bcd = rule { 'b' ~ 'c' | 'b' ~ 'd'}
Ordered choice
// why order matters
def Operator = rule { "+=" | "-=" | "*=" | "++" | "--" | "+" | "-" | "*" | "/" ...}
def Operators = rule { ("+" ~ ("=" | "+").?) | ("-" ~ ("=" | "-").?) | ...}
Order matters
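A minimal sketch of why the order of alternatives matters. The class and rule names below are hypothetical and not from the slides. Ordered choice commits to the first alternative that matches, so a short operator listed before a longer one with the same prefix hides the longer one.

```scala
import org.parboiled2._

// Hypothetical parser used only to illustrate ordered choice.
class OperatorParser(val input: ParserInput) extends Parser {
  // "+" is tried first and matches the prefix of "+=";
  // the choice is then committed, and EOI fails on the leftover "=".
  def ShortFirst: Rule1[String] = rule { capture(str("+") | str("+=")) ~ EOI }

  // The longer alternative is tried first, so both "+=" and "+" are accepted.
  def LongFirst: Rule1[String] = rule { capture(str("+=") | str("+")) ~ EOI }
}
```

With this sketch, `new OperatorParser("+=").ShortFirst.run()` should be a failure, while `LongFirst` on the same input should succeed with the captured operator.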
Running the parser
class MyParser(val input: ParserInput) extends Parser {
  def MyStringRule: Rule0 = rule {
    ignoreCase("match") ~ EOI
  }
}
Running the parser
val p1 = new MyParser("match")
val p2 = new MyParser("much")
p1.MyStringRule.run() // Success
p2.MyStringRule.run() // Failure
Different delivery schemes are also available
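With the default delivery scheme, `run()` returns a `scala.util.Try`. The sketch below shows one way to turn the result into a readable message. The `ParserRunner` object and its `describe` method are hypothetical helpers, and `MyParser` is repeated so that the block is self-contained.

```scala
import org.parboiled2._
import scala.util.{Failure, Success}

class MyParser(val input: ParserInput) extends Parser {
  def MyStringRule: Rule0 = rule { ignoreCase("match") ~ EOI }
}

// Hypothetical helper, not part of parboiled2.
object ParserRunner {
  def describe(text: String): String = {
    val parser = new MyParser(text)
    parser.MyStringRule.run() match {
      case Success(_)             => "ok"
      // formatError renders the error position and the expected input
      case Failure(e: ParseError) => parser.formatError(e)
      case Failure(e)             => "unexpected error: " + e.getMessage
    }
  }
}
```

`ParserRunner.describe("MATCH")` should return "ok" because the rule ignores case, while `ParserRunner.describe("much")` should return the formatted parse error.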
BKV
server.name = "webserver"
server {
  port = "8080"
  address = "192.168.88.88"
  settings {
    greeting_message = "Hello!\n It's me!"
  }
}
Performance
Unroll n.times for n <= 4
// Slower
rule { 4 times Digit }
// Faster
rule { Digit ~ Digit ~ Digit ~ Digit }
Faster stack operations
// Much faster
def Digit4 = rule {
  Digit ~ Digit ~ Digit ~ Digit ~ push(
    #(charAt(-4))*1000 + #(charAt(-3))*100 + #(charAt(-2))*10 + #(lastChar)
  )
}
Do not recreate CharPredicate
class MyParser(val input: ParserInput) extends Parser {
  val Uppercase = CharPredicate.from(_.isUpper)
…
}
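One way to go a step further, sketched under assumptions: keep custom predicates in an object, so they are built once and shared by every parser instance, not rebuilt per instance or per rule call. The names `Predicates`, `IdentStart`, `IdentPart` and `IdentParser` are hypothetical.

```scala
import org.parboiled2._

// Hypothetical holder object: the predicates are created once.
object Predicates {
  val IdentStart: CharPredicate = CharPredicate.Alpha ++ '_'
  val IdentPart: CharPredicate  = CharPredicate.AlphaNum ++ '_'
}

class IdentParser(val input: ParserInput) extends Parser {
  import Predicates._

  // An identifier: a letter or underscore, then letters, digits or underscores.
  def Identifier: Rule1[String] = rule {
    capture(IdentStart ~ zeroOrMore(IdentPart)) ~ EOI
  }
}
```

`new IdentParser("foo_1").Identifier.run()` should succeed with the captured text, and `new IdentParser("1foo").Identifier.run()` should fail on the leading digit.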
Use predicates
def foo = rule {
  capture(zeroOrMore(noneOf("\n")))
}
def foo = rule {
  capture(zeroOrMore(!'\n')) // infinite loop: the predicate consumes no input
}
def foo = rule { capture(zeroOrMore( !'\n' ~ ANY ))}
Best Practices
● Unit tests
● Small rules
● Decomposition
● Case objects instead of strings
Push case objects
def LogLevel = rule {
  capture("info" | "warning" | "error")
}
def LogLevel = rule {
  "info" ~ push(LogLevel.Info) |
  "warning" ~ push(LogLevel.Warning) |
  "error" ~ push(LogLevel.Error)
}
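A self-contained sketch of the same idea. The slides do not show how `LogLevel` is defined, so the sealed trait and its case objects below are an assumption, and the rule is named `Level` to avoid clashing with the object name.

```scala
import org.parboiled2._

// Assumed definition of the log levels (not shown in the slides).
sealed trait LogLevel
object LogLevel {
  case object Info extends LogLevel
  case object Warning extends LogLevel
  case object Error extends LogLevel
}

class LogLevelParser(val input: ParserInput) extends Parser {
  // Each alternative matches a keyword and pushes the matching case object.
  def Level: Rule1[LogLevel] = rule {
    ("info" ~ push(LogLevel.Info) |
      "warning" ~ push(LogLevel.Warning) |
      "error" ~ push(LogLevel.Error)) ~ EOI
  }
}
```

Callers can then pattern match on `LogLevel` values and get exhaustiveness checks from the compiler, which a captured `String` cannot offer.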
Simple syntax for object capture
sealed trait AST
case class Text(s: String) extends AST
def charsAST: Rule1[AST] = rule {
capture(Chars) ~> ((s: String) => Text(s))
}
def charsAST = rule {
capture(Chars) ~> Text
}
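The `~>` shorthand also works for case classes with several fields. The sketch below applies it to the "yyyy-mm-dd" rules from the Sequence slide; the `Date` case class and the `DateParser` name are hypothetical, and no range checks are made on month or day.

```scala
import org.parboiled2._

// Hypothetical target type for the parsed value.
case class Date(year: Int, month: Int, day: Int)

class DateParser(val input: ParserInput) extends Parser {
  import CharPredicate.Digit

  // Root rule: the three Ints on the value stack are passed to Date.apply.
  def InputLine: Rule1[Date] = rule {
    Year ~ '-' ~ Month ~ '-' ~ Day ~ EOI ~> Date
  }

  def Year: Rule1[Int]  = rule { capture(Digit ~ Digit ~ Digit ~ Digit) ~> (_.toInt) }
  def Month: Rule1[Int] = rule { capture(Digit ~ Digit) ~> (_.toInt) }
  def Day: Rule1[Int]   = rule { capture(Digit ~ Digit) ~> (_.toInt) }
}
```

`new DateParser("2015-10-31").InputLine.run()` should succeed with `Date(2015, 10, 31)`; an input with a one-digit day should fail.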
Named rules
def Header: Rule1[Header] =
rule("I am header") { ... }
def Header: Rule1[Header] = namedRule("header") {...}
def UserName = rule {
Prefix ~ oneOrMore(NameChar).named("username")
}
Migration
● Separate classpath: org.parboiled vs org.parboiled2
● Grammar is hard to break
● Composition: trait → abstract class
● Removing primitives library
Drawbacks
● PEG (absence of lexer)
● No support for left-recursive grammars
● No error recovery mechanism
● No IDE support
● No support for indentation-based grammars
● Awful, non-informative error messages
Q/A