scala parsers error recovery in production

36
Parsers Error Recovery for Practical Use Parsers Error Recovery for Practical Use Alexander Azarov [email protected] / Osinka.ru February 18, 2012

Upload: alexander-azarov

Post on 05-Jul-2015

2.217 views

Category:

Technology


3 download

DESCRIPTION

Using Scala parser combinators for BBCode markup parsing in production system at club.osinka.ru: - Treating invalid markup in parsers: recovery of errors - Scala parser combinators performance.

TRANSCRIPT

Page 1: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Parsers Error Recovery for Practical Use

Alexander Azarov

[email protected] / Osinka.ru

February 18, 2012

Page 2: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Context

Osinka

I Forum of “BB” kindI (2.5M+ pages/day, 8M+ posts total)

I User generated content: BBCode markup

I Migrating to Scala

I Backend

Page 3: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Context

Plan

I Why parser combinators?

I Why error recovery?

I Example of error recovery

I Results

Page 4: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Context

BBCode

Few BBCode tags

[b]bold[/b] <b>bold</b>[i]italic[/i] <i>italic</i>[url=href]text[/url] <a href="href">text</a>[img]href[/img] <img src="href"/>

Page 5: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Context

BBCode example

I example of BBCode

Example

[quote="Nick"]original [b]text[/b][/quote]Here it is the reply with[url=http://www.google.com]link[/url]

Page 6: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Task

Why parser combinators, 1

I Regexp maintenance is a headache

I Bugs extremely hard to find

I No markup errors

Page 7: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Task

Why parser combinators, 2

One post source, many views

I HTML render for WebI textual view for emailsI text-only short summaryI text-only for full-text search indexer

Page 8: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Task

Why parser combinators, 3

Post analysis algorithms

I links (e.g. spam automated analysis)I imagesI whatever structure analysis we’d want

Page 9: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Task

Universal AST

One AST

I different printers

I various traversal algorithms

Page 10: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Problem

Sounds great. But.

This all looks like a perfect world.But what’s the catch??

Page 11: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Problem

Sounds great. But.

Humans.They do mistakes.

Example

[quote][url=http://www.google.com][img]http://www.image.com[/url[/img][/b]

Page 12: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Problem

Sounds great. But.

Humans.They do mistakes.

Example

[quote][url=http://www.google.com][img]http://www.image.com[/url[/img][/b]

Page 13: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Problem

User-Generated Content: Problem

Erroneous markup

I People do mistakes,I But no one wants to see empty post,I We have to show something meaningful in any case

Page 14: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Problem

Black or White World

I Scala parser combinators assume valid input

I Parser result: Success | NoSuccess

I no error recovery out of the box

Page 15: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Solution

Error recovery: our approach

I Our Parser never breaksI It generates “error nodes” instead

Page 16: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Solution

Approach: Error nodes

I Part of AST, FailNode contains the possible causes of thefailure

I They are meaningful

I for highlighting in editorI to mark posts having failures in markup (for moderators/other

users to see this)

Page 17: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Solution

Approach: input & unpaired tags

I Assume all input except tags as text

I E.g. [tag]text[/tag] is a text node

I Unpaired tags as the last choice: markup errors

Page 18: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Example

Example

Page 19: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Trivial BBCode markup

Example (Trivial "one tag" BBCode)

Simplest [font=bold]BBCode [font=red]example[/font][/font]

I has only one tag, fontI though it may have an argument

Page 20: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Corresponding AST

AST

trait Node

case class Text(text: String) extends Nodecase class Font(arg: Option[String], subnodes: List[Node]) extends

Node

Page 21: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Parser

BBCode parser

lazy val nodes = rep(font | text)lazy val text =rep1(not(fontOpen|fontClose) ~> "(?s).".r) ^^ {texts => Text(texts.mkString)

}lazy val font: Parser[Node] = {fontOpen ~ nodes <~ fontClose ^^ {case fontOpen(_, arg) ~ subnodes => Font(Option(arg),

subnodes)}

}

Page 22: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Valid markup

Scalatest

describe("parser") {it("keeps spaces") {parse(" ") must equal(Right(Text(" ") :: Nil))parse(" \n ") must equal(Right(Text(" \n ") :: Nil))

}it("parses text") {parse("plain text") must equal(Right(Text("plain text") ::

Nil))}it("parses bbcode-like text") {parse("plain [tag] [fonttext") must equal(Right(Text("

plain [tag] [fonttext") :: Nil))}

Page 23: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Invalid markup

Scalatest

describe("error markup") {it("results in error") {parse("t[/font]") must be(’left)parse("[font]t") must be(’left)

}}

Page 24: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Recovery: Extra AST node

FailNode

case class FailNode(reason: String, markup: String) extends Node

Page 25: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Recovery: helper methodsExplicitly return FailNode

protected def failed(reason: String) = FailNode(reason, "")

Enrich FailNode with markup

protected def recover(p: => Parser[Node]): Parser[Node] =Parser { in =>val r = p(in)lazy val markup = in.source.subSequence(in.offset, r.next.offset

).toStringr match {case Success(node: FailNode, next) =>Success(node.copy(markup = markup), next)

case other =>other

Page 26: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Recovery: Parser rules

I never break (provide “alone tag” parsers)I return FailNode explicitly if needed

nodes

lazy val nodes = rep(node | missingOpen)lazy val node = font | text | missingClose

Page 27: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

“Missing open tag” parser

Catching alone [/font]

def missingOpen = recover {fontClose ^^^ { failed("missing open") }

}

Page 28: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Argument check

font may have limits on argument

lazy val font: Parser[Node] = recover {fontOpen ~ rep(node) <~ fontClose ^^ {case fontOpen(_, arg) ~ subnodes =>if (arg == null || allowedFontArgs.contains(arg)) Font(

Option(arg), subnodes)else failed("arg incorrect")

}}

Page 29: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Passes markup error tests

Scalatest

describe("recovery") {it("reports incorrect arg") {parse("[font=b]t[/font]") must equal(Right(FailNode("arg incorrect", "[font=b]t[/font]") :: Nil

))}it("recovers extra ending tag") {parse("t[/font]") must equal(Right(Text("t") :: FailNode("missing open", "[/font]") :: Nil

))}

Page 30: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Passes longer tests

Scalatest

it("recovers extra starting tag in a longer sequence") {parse("[font][font]t[/font]") must equal(Right(FailNode("missing close", "[font]") :: Font(None, Text("t

") :: Nil) :: Nil))

}it("recovers extra ending tag in a longer sequence") {parse("[font]t[/font][/font]") must equal(Right(Font(None, Text("t") :: Nil) :: FailNode("missing open", "

[/font]") :: Nil))

Page 31: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Example

Examples source code

I Source code, specs:https://github.com/alaz/slides-err-recovery

Page 32: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Results

Production use outlines

I It works reliably

I Lower maintenance costsI Performance (see next slides)

I Beware: Scala parser combinators are not thread-safe.

Page 33: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Results

Performance

I The biggest problem is performance.

Benchmarks (real codebase)

PHP ScalaTypical 8k 5.3ms 51msBig w/err 76k 136ms 1245ms

I Workaround: caching

Page 34: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Results

Surprise!

Never give up

I find a good motivator instead (e.g. presentation for Scala.by)

Page 35: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Results

Performance: success story

I Want performance? Do not use Lexical

I Forget those scary numbers!

Benchmarks (real codebase)

PHP ScalaTypical 8k 5.3ms 51ms 16msBig w/err 76k 136ms 1245ms 31ms

I Thank you, Scala.by!

Page 36: Scala parsers Error Recovery in Production

Parsers Error Recovery for Practical Use

Results

Thank you

I Email: [email protected] Twitter: http://twitter.com/aazarovI Source code, specs:

https://github.com/alaz/slides-err-recovery