implementing external dsls using scala parser combinators
DESCRIPTION
Slides from a talk I gave at the St. Louis Lambda Lounge (http://lambdalounge.org/) for the Dec. 2009 meeting.
TRANSCRIPT
Implementing External DSLs Using Scala Parser Combinators
St. Louis Lambda Lounge
Sept. 3, 2009
Tim Dalton
Senior Software Engineer
Object Computing Inc.
External vs Internal DSL
Internal DSLs are implemented using the syntax of a “host” programming language
  Examples: fluent APIs in Java, RSpec and ScalaSpec
  Constrained by the features of the host programming language
External DSLs: syntax is limited only by the capabilities of the parser
What is a Combinator?
Combinators are functions that can be combined to perform more complex operations
Concept originates in Lambda Calculus
Parser combinators mostly come from the Haskell community
  Haskell implementations use Monads
  Scala implementation is “almost Monadic”
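To make the idea of "functions that can be combined" concrete, here is a minimal hand-rolled sketch that does not use Scala's library at all. The names MiniParser, literal, or and andThen are hypothetical and exist only for this illustration; the real library uses richer types.

```scala
object MiniCombinators {
  // A parser is just a function: given input, it either fails (None)
  // or returns a result together with the remaining input.
  type MiniParser[A] = String => Option[(A, String)]

  // Primitive parser (hypothetical helper): matches a fixed prefix.
  def literal(s: String): MiniParser[String] =
    in => if (in.startsWith(s)) Some((s, in.substring(s.length))) else None

  // Alternative: try p, and fall back to q if p fails.
  def or[A](p: MiniParser[A], q: MiniParser[A]): MiniParser[A] =
    in => p(in).orElse(q(in))

  // Sequence: run p, then run q on whatever input p left over.
  def andThen[A, B](p: MiniParser[A], q: MiniParser[B]): MiniParser[(A, B)] =
    in => p(in).flatMap { case (a, rest) =>
      q(rest).map { case (b, rest2) => ((a, b), rest2) }
    }

  def main(args: Array[String]): Unit = {
    // Small parsers combined into a bigger one.
    val forward = andThen(or(literal("FORWARD"), literal("FD")), literal(" 100"))
    println(forward("FD 100"))   // succeeds, consuming all of the input
    println(forward("BACK 100")) // fails
  }
}
```

Each combinator takes parsers and returns a new parser, which is the same shape the library's “|” and “~” methods have.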
Scala’s Parser Implementation
Context-free LL grammar
  Left to right
  Leftmost derivation
Recursive descent
Backtracking
  There are ways to prevent backtracking
Advances planned for Scala 2.8
  Support for Packrat parsing
  Parsing Expression Grammar
  More predictive with less recursion and backtracking
Scala Combinator Parser Hierarchy
scala.util.parsing.combinator.Parsers
scala.util.parsing.combinator.syntactical.TokenParsers
scala.util.parsing.combinator.syntactical.StdTokenParsers
scala.util.parsing.combinator.RegexParsers
scala.util.parsing.combinator.JavaTokenParsers
A Simple Logo(-Like) Interpreter
Only a few commands:
Right Turn <angle-degrees>
Left Turn <angle-degrees>
Forward <number-of-pixels>
Repeat <nested sequence of other commands>
Grammar for Simple Logo
forward = (“FORWARD” | “FD”) positive-integer
right = (“RIGHT” | “RT”) positive-integer
left = (“LEFT” | “LT”) positive-integer
repeat = “REPEAT” positive-integer “[” { statement } “]”
statement = right | left | forward | repeat
program = { statement }
Scala Code to Implement Parser
object LogoParser extends RegexParsers {
  def positiveInteger = """\d+"""r
  def forward = ("FD"|"FORWARD")~positiveInteger
  def right = ("RT"|"RIGHT")~positiveInteger
  def left = ("LT"|"LEFT")~positiveInteger
  def repeat = "REPEAT" ~ positiveInteger ~ "[" ~ rep(statement) ~ "]"
  def statement:Parser[Any] = forward | right | left | repeat
  def program = rep(statement)
}
Scala Code to Implement Parser
An internal DSL is used to implement an external one
Methods on the preceding slide are referred to as parser generators
RegexParsers is a subtrait of the Parsers trait, which provides the generic parser combinators
A Closer Look
def positiveInteger = """\d+"""r
The trailing “r” is a method call that converts the string to a Regex object
More explicit syntax:
  """\d+""".r
String does not have an r method!
Class RichString does, so an implicit conversion is applied
Implicit Conversions
One of the more powerful / dangerous features of Scala is implicit conversions
RichString.r method signature:
  def r : Regex
scala.Predef implicit converter:
  implicit def stringWrapper(x : java.lang.String) : RichString
The Scala compiler looks for implicit converters in scope and inserts them automatically
“With great power, comes great responsibility”
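The mechanism can be seen in isolation with a small sketch. Shouter and stringToShouter are hypothetical names invented for this example; only the `.r` call at the end is the standard library's own string-to-Regex method.

```scala
import scala.language.implicitConversions

object ImplicitDemo {
  // Hypothetical wrapper class that adds a method String does not have.
  class Shouter(s: String) {
    def shout: String = s.toUpperCase + "!"
  }

  // With this conversion in scope, the compiler rewrites
  // "hello".shout into stringToShouter("hello").shout.
  implicit def stringToShouter(s: String): Shouter = new Shouter(s)

  def main(args: Array[String]): Unit = {
    println("hello".shout)

    // The same trick is what makes .r available on a plain String.
    val digits = """\d+""".r
    println(digits.findFirstIn("FD 100"))
  }
}
```

Nothing at the call site hints that a conversion happened, which is both the convenience and the danger.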
Back to the Parser
def forward = ("FD"|"FORWARD")~positiveInteger
The “|” and “~” are methods of class Parsers.Parser[T]!
RegexParsers has implicit conversions:
  implicit def literal(s : String) : Parser[String]
  implicit def regex(r : Regex) : Parser[String]
Parser generator methods should return something that can at least be converted to Parser[T]
Parser[T]’s and ParseResult[T]’s
Parsers.Parser[T]
  Extends Reader => ParseResult[T]
  This makes it a function object
ParseResult[T] hierarchy:
  Parsers.Success
  Parsers.NoSuccess
    Parsers.Failure
    Parsers.Error
Invoking a Parser[T] function object returns an instance of one of the above subclasses
Combining Parser[T]’s
Signature for Parser[T].| method:
  def |[U >: T](q : => Parser[U]) : Parser[U]
Parser combinator for alternative composition (OR)
  Succeeds (returns Parsers.Success) if either “this” Parser[T] succeeds or “q” Parser[U] succeeds
  Type U must be the same as, or a supertype of, type T
Combining Parser[T]’s
Signature of Parser[T].~ method:
  def ~[U](p : => Parser[U]) : Parser[~[T, U]]
Parser combinator for sequential composition
  Succeeds only if “this” Parser succeeds and “p” Parser succeeds
  Returns an instance of “~” that contains both results
    Yes, “~” is also a class!
    Like a Pair, but easier to pattern match on
Forward March
Back to the specification of forward:
  def forward = ("FD"|"FORWARD")~positiveInteger
For this combinator to succeed:
  Either the Parser for literal “FD” or the one for “FORWARD” must succeed
  And the Parser for the positiveInteger Regex must succeed
Both the literal strings and the positiveInteger Regex are implicitly converted to Parser[String]
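As a small sketch of how the “~” result can be taken apart, the forward rule can be run on its own. This assumes the scala-parser-combinators module is on the classpath (it ships separately from the standard library in current Scala versions); the object name ForwardOnly and the explicit type annotations are additions for this example.

```scala
import scala.util.parsing.combinator.RegexParsers

object ForwardOnly extends RegexParsers {
  // The Regex is implicitly converted to Parser[String].
  def positiveInteger: Parser[String] = """\d+""".r

  // "|" picks whichever literal matches; "~" pairs it with the number.
  def forward: Parser[String ~ String] = ("FD" | "FORWARD") ~ positiveInteger

  def main(args: Array[String]): Unit = {
    parseAll(forward, "FORWARD 100") match {
      // "~" is a case class, so it can be used as an infix pattern.
      case Success(keyword ~ amount, _) => println(s"$keyword moves $amount")
      case other                        => println(other)
    }
  }
}
```

Both halves of the sequence arrive as Strings at this stage; nothing has been converted to an Int yet.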
Repetition
Next lines of note:
  def repeat = "REPEAT" ~ positiveInteger ~ "[" ~ rep(statement) ~ "]"
  def statement:Parser[Any] = forward | right | left | repeat
The type of either repeat or statement needs to be explicitly specified due to recursion
The rep method specifies that a Parser can be repeated
Repetition
Signature for Parsers.rep method:
def rep[T](p : => Parser[T]) : Parser[List[T]]
Parser Combinator for repetitions
Parses input until Parser, p, fails. Returns consecutive successful results as List.
Other Forms of Repetition
def repsep[T](p: => Parser[T], q: => Parser[Any]) : Parser[List[T]]
  Specifies a Parser to be interleaved in the repetition
  Example: repsep(term, ",")
def rep1[T](p: => Parser[T]): Parser[List[T]]
  Parses non-empty repetitions
def repN[T](n : Int, p : => Parser[T]) : Parser[List[T]]
  Parses a specified number of repetitions
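The four repetition combinators can be compared side by side in a short sketch. It assumes the scala-parser-combinators module is available; RepetitionDemo, number and the four parser names are made up for this example.

```scala
import scala.util.parsing.combinator.RegexParsers

object RepetitionDemo extends RegexParsers {
  def number: Parser[Int] = """\d+""".r ^^ { _.toInt }

  def zeroOrMore: Parser[List[Int]]     = rep(number)         // may be empty
  def commaSeparated: Parser[List[Int]] = repsep(number, ",") // "," between items
  def atLeastOne: Parser[List[Int]]     = rep1(number)        // must be non-empty
  def exactlyThree: Parser[List[Int]]   = repN(3, number)     // fixed count

  def main(args: Array[String]): Unit = {
    println(parseAll(zeroOrMore, "").get)
    println(parseAll(commaSeparated, "1, 2, 3").get)
    println(parseAll(atLeastOne, "").successful)
    println(parseAll(exactlyThree, "1 2 3").get)
  }
}
```

RegexParsers skips whitespace before each literal and regex by default, which is why the separators in the inputs can be followed by spaces.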
Execution
Root Parser Generator:
  def program = rep(statement)
To execute the Parser:
  parseAll(program, "REPEAT 4 [FD 100 RT 90]")
Returns Parsers.Success[List[Parsers.~[…]]]
  Remember, Parsers.Success[T] is a subclass of ParseResult[T]
toString:
  [1.24] parsed: List(((((REPEAT~4)~[)~List((FD~100), (RT~90)))~]))
The “…” indicates many levels of nesting
Not-so-Happy Path
Example of failed parsing:
  parseAll(program, "REPEAT 4 [FD 100 RT 90)")
Returns Parsers.Failure
  Subclass of ParseResult[Nothing]
toString:
  [1.23] failure: `]' expected but `)' found
  REPEAT 4 [FD 100 RT 90)
                        ^
Failure message is not always so “precise”
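Rather than relying on toString, calling code can pattern match on the result subclasses. The following is a sketch under the assumption that the scala-parser-combinators module is on the classpath; ResultDemo and describe are hypothetical names, and only the forward rule is included to keep it short.

```scala
import scala.util.parsing.combinator.RegexParsers

object ResultDemo extends RegexParsers {
  def positiveInteger: Parser[String] = """\d+""".r
  def forward: Parser[Any] = ("FD" | "FORWARD") ~ positiveInteger

  // Turns each ParseResult subclass into a one-line description.
  def describe(input: String): String =
    parseAll(forward, input) match {
      case Success(result, _) => s"parsed: $result"
      case Failure(msg, next) => s"failure at column ${next.pos.column}: $msg"
      case Error(msg, next)   => s"error at column ${next.pos.column}: $msg"
    }

  def main(args: Array[String]): Unit = {
    println(describe("FD 100"))
    println(describe("FD abc"))
  }
}
```

Failure allows an enclosing alternative to backtrack and try another branch, while Error aborts the parse.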
Making Something Useful
Successful parse results need to be transformed into something that can be evaluated
Enter the “eye brows” method of Parser[T]:
  def ^^[U](f : (T) => U) : Parser[U]
Parser combinator for function application
Eye Brows Example
Example of “^^” method:
def positiveInteger = """\d+""".r ^^ { x:String => x.toInt }
Now positiveInteger generates Parser[Int] instead of Parser[String]
Transformer can be shortened to “{ _.toInt }”
Implementing Commands
For the statements we need a hierarchy of command classes:
sealed abstract class LogoCommand
case class Forward(x: Int) extends LogoCommand
case class Turn(x: Int) extends LogoCommand
case class Repeat(i: Int, e: List[LogoCommand]) extends LogoCommand
Transforming into Commands
The Forward command:
  def forward = ("FD"|"FORWARD")~positiveInteger ^^ { case _~value => Forward(value) }
A ~[String, Int] is passed to the transformer
Pattern matching extracts the Int, value, and constructs a Forward instance
  Forward is a case class, so “new” is not needed
A block of case clauses can itself serve as a (partial) function. Longer form:
  …^^ { tilde => tilde match { case _~value => Forward(value) }}
Derivatives of “~”
Two methods related to “~”:
  def <~ [U](p: => Parser[U]): Parser[T]
    Parser combinator for sequential composition which keeps only the left result
  def ~> [U](p: => Parser[U]): Parser[U]
    Parser combinator for sequential composition which keeps only the right result
Note: neither returns a “~” instance
The forward method can be simplified:
  def forward = ("FD"|"FORWARD")~>positiveInteger ^^ { Forward(_) }
Updated Parser
def positiveInteger = """\d+""".r ^^ { _.toInt }
def forward = ("FD"|"FORWARD")~>positiveInteger ^^ { Forward(_) }
def right = ("RT"|"RIGHT")~>positiveInteger ^^ { x => Turn(-x) }
def left = ("LT"|"LEFT")~>positiveInteger ^^ { Turn(_) }
def repeat = "REPEAT" ~> positiveInteger ~ "[" ~ rep(statement) <~ "]" ^^
  { case number~_~statements => Repeat(number, statements) }
Updated Parser Results
Executing the Parser now:
  parseAll(program, "REPEAT 4 [FD 100 RT 90]")
Results:
  [1.24] parsed: List(Repeat(4,List(Forward(100), Turn(-90))))
Returns Parsers.Success[List[LogoCommand]] (here a list holding one Repeat), provided statement is declared as Parser[LogoCommand]
This can be evaluated!
Evaluation
class LogoEvaluationState {
  var x = 0
  var y = 0
  var heading = 0
}

// Rounds a Double to the nearest Int so coordinates can be stored and passed to drawLine
implicit def dblToInt(d: Double): Int =
  if (d > 0) (d + 0.5).toInt else (d - 0.5).toInt

// Assumes program is typed as Parser[List[LogoCommand]]
def parse(s: String): List[LogoCommand] =
  LogoParser.parseAll(LogoParser.program, s).get

def evaluate(parseResult: LogoParser.ParseResult[List[LogoCommand]], g: Graphics2D): Unit = {
  val state = new LogoEvaluationState
  if (parseResult.successful) {
    evaluate(parseResult.get, g, state)
  }
  // draw turtle
  evaluate(parse("RT 90 FD 3 LT 110 FD 10 LT 140 FD 10 LT 110 FD 3"), g, state)
} // Continued...
Evaluation (More Functional)
// sin, cos and toRadians are assumed to be imported, e.g. import scala.math._
private def evaluate(list: List[LogoCommand], g: Graphics2D,
                     state: LogoEvaluationState): Unit = {
  if (!list.isEmpty) {
    val head :: tail = list
    head match {
      case Forward(distance) => {
        val (nextX, nextY) =
          (state.x + distance * sin(toRadians(state.heading)),
           state.y + distance * cos(toRadians(state.heading)))
        g.drawLine(state.x, state.y, nextX, nextY)
        state.x = nextX
        state.y = nextY
        evaluate(tail, g, state)
      }
      case Turn(degrees) => {
        state.heading += degrees
        evaluate(tail, g, state)
      }
      // A finished Repeat is dropped; otherwise unroll one iteration in front of the tail
      case Repeat(0, _) => evaluate(tail, g, state)
      case Repeat(count, statements) =>
        evaluate(statements ::: Repeat(count - 1, statements) :: tail, g, state)
    }
  }
}
Evaluation (More Imperative)
def evaluate(list: List[LogoCommand], g: Graphics2D,
             state: LogoEvaluationState): Unit = {
  list.foreach(evaluate(_, g, state))
}

def evaluate(command: LogoCommand, g: Graphics2D, state: LogoEvaluationState): Unit = {
  command match {
    case Forward(distance) => {
      val (nextX, nextY) =
        (state.x + distance * Math.sin(Math.toRadians(state.heading)),
         state.y + distance * Math.cos(Math.toRadians(state.heading)))
      g.drawLine(state.x, state.y, nextX, nextY)
      state.x = nextX
      state.y = nextY
    }
    case Turn(degrees) => state.heading += degrees
    case Repeat(count, statements) =>
      // Run the nested statements exactly count times
      (1 to count).foreach { _ => evaluate(statements, g, state) }
  }
}
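The evaluation logic can also be exercised without a Graphics2D by tracking only the turtle's position. The sketch below is an illustration, not the talk's code: the Turtle case class, step, run and the enclosing TurtleTrace object are hypothetical names, and the command classes are repeated so the block stands alone.

```scala
object TurtleTrace {
  sealed abstract class LogoCommand
  case class Forward(x: Int) extends LogoCommand
  case class Turn(x: Int) extends LogoCommand
  case class Repeat(i: Int, e: List[LogoCommand]) extends LogoCommand

  // Immutable turtle state (hypothetical): position plus heading in degrees.
  case class Turtle(x: Double, y: Double, heading: Int)

  // Applies a single command and returns the new state.
  def step(t: Turtle, command: LogoCommand): Turtle = command match {
    case Forward(distance) =>
      val radians = math.toRadians(t.heading.toDouble)
      t.copy(x = t.x + distance * math.sin(radians),
             y = t.y + distance * math.cos(radians))
    case Turn(degrees) =>
      t.copy(heading = t.heading + degrees)
    case Repeat(count, statements) =>
      // Run the nested statements exactly count times.
      (1 to count).foldLeft(t) { (acc, _) => run(acc, statements) }
  }

  // Threads the state through a list of commands.
  def run(t: Turtle, commands: List[LogoCommand]): Turtle =
    commands.foldLeft(t)(step)

  def main(args: Array[String]): Unit = {
    // The parsed form of "REPEAT 4 [FD 100 RT 90]" from the earlier slide.
    val square = List(Repeat(4, List(Forward(100), Turn(-90))))
    println(run(Turtle(0, 0, 0), square))
  }
}
```

Drawing a square should bring the turtle back to its starting point, up to floating-point rounding, which makes this a convenient check on the Repeat handling.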
Demonstration