introduction to courier

Courier Joe Betz @ Coursera

Upload: joe-betz

Post on 23-Jan-2018


TRANSCRIPT

Page 1: Introduction to Courier

Courier
Joe Betz @ Coursera

Page 2: Introduction to Courier

Courier is a code generator

Pegasus Schema
    {
      "name": "Fortune",
      "namespace": "org.example",
      "type": "record",
      "fields": [
        { "name": "message", "type": "string" }
      ]
    }

    -> generate ->

Scala
    case class Fortune(message: String)

    <- serialize / deserialize ->

JSON Data
    { "message": "Today is your lucky day!" }
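To make the round trip on this slide concrete, here is a minimal hand-written sketch. It is not Courier's generated code: the `FortuneSketch` object and its `serialize` / `deserialize` helpers are hypothetical stand-ins, and the JSON handling covers only this one-field record.

```scala
import scala.util.matching.Regex

object FortuneSketch {
  // Same shape as the case class shown on the slide.
  case class Fortune(message: String)

  // Escape backslashes first, then double quotes, so the output stays valid JSON.
  private def escape(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"")

  // Undo the escaping above by dropping the backslash in front of each escaped character.
  private def unescape(s: String): String =
    """\\(.)""".r.replaceAllIn(s, m => Regex.quoteReplacement(m.group(1)))

  // Matches a JSON object with exactly one string field named "message".
  private val FortuneJson: Regex =
    """\s*\{\s*"message"\s*:\s*"((?:[^"\\]|\\.)*)"\s*\}\s*""".r

  // Hypothetical stand-in for serialization: Scala value to JSON text.
  def serialize(f: Fortune): String =
    s"""{ "message": "${escape(f.message)}" }"""

  // Hypothetical stand-in for deserialization: JSON text to Scala value, None if the shape is wrong.
  def deserialize(json: String): Option[Fortune] = json match {
    case FortuneJson(raw) => Some(Fortune(unescape(raw)))
    case _                => None
  }
}
```

With real Courier the case class and its serialization come from the generator, so none of this is written by hand; the sketch only shows the direction of each arrow in the diagram.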

Page 3: Introduction to Courier
Page 4: Introduction to Courier

Pegasus Schema Language

● Extension of Apache Avro's schema language built at LinkedIn.

● Designed for natural looking JSON.

● Rich type system maps well between JSON and type-safe languages like Scala.

● Schema language is machine readable and easy to extend.

● Tooling and language support.

[Diagram: Courier; Pegasus Schemas = Avro Schemas +optional record fields, +typerefs; core schema language: records, maps, arrays, unions, enums, primitives]

Page 5: Introduction to Courier

Why Schemas?

Page 6: Introduction to Courier

Increase transparency into the structure of our data.

Page 7: Introduction to Courier
Page 8: Introduction to Courier

With a common understanding of structure, type-safety can span multiple languages and platforms.

Page 9: Introduction to Courier

Why Pegasus Schemas?

Page 10: Introduction to Courier

Pegasus schemas have a rich type system.

Page 11: Introduction to Courier

Pegasus schemas are machine readable.

Page 12: Introduction to Courier

Pegasus schemas are easy to extend.

Page 13: Introduction to Courier

Pegasus schemas work with multiple data formats.

Page 14: Introduction to Courier

Pegasus Schema Types

Pegasus Type | Scala Type | Example JSON
int, long, float, double, boolean, string | Int, Long, Float, Double, Boolean, String | 1, 10000000, 3.14, 2.718281, true, "Coursera"
record | case class R(f1: T1, f2: T2, ...) | { "f1": 1, "f2": "Coursera" }
array | A extends IndexedSeq[T] | [1, 2, 3]
map | M extends Map[String, T] | { "key1": 1, "key2": 2 }
union | sealed abstract class U; case class M1(value: T1) extends U; case class M2(value: T2) extends U | { "org.example.M1": <T1 Value> }
enum | object E extends Enumeration | "SYMBOL"

* Unions and typerefs will be covered in more detail later.

Page 15: Introduction to Courier

.pdsc (pegasus data schema)

Page 16: Introduction to Courier

Records

Page 17: Introduction to Courier

Records

org/example/Note.pdsc
    {
      "name": "Note",
      "namespace": "org.example",
      "doc": "A simple note.",
      "type": "record",
      "fields": [
        { "name": "title", "type": "string" },
        { "name": "body", "type": "Body", "optional": true }
      ]
    }

Scala
    case class Note(title: String, body: Option[Body])

Examples
    Note("reminder", Some(Body(…))) => { "title": "reminder", "body": { … } }
    Note("reminder", None) => { "title": "reminder" }

Page 18: Introduction to Courier

Optional Fields and Defaults

Schema
    {
      "name": "WithOptionals", "type": "record", …
      "fields": [
        { "name": "o1", "type": "string", "optional": true },
        { "name": "o2", "type": "string", "optional": true, "default": "b" },
        { "name": "o3", "type": "string", "optional": true, "defaultNone": true }
      ]
    }

Generated Scala
    case class WithOptionals(
      o1: Option[String],
      o2: Option[String] = Some("b"),
      o3: Option[String] = None)

Examples
    WithOptionals(o1 = None) => { "o2": "b" }
    WithOptionals(o1 = Some("a")) => { "o1": "a", "o2": "b" }
    WithOptionals(o1 = None, o2 = None) => {}
    WithOptionals(o1 = None, o3 = Some("c")) => { "o2": "b", "o3": "c" }
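The interaction between Option fields and defaults can be sketched in plain Scala. This is an illustration only, not Courier's generated code: the `OptionalsSketch` object and its `toJson` helper are hypothetical, and the helper assumes values that need no JSON escaping.

```scala
object OptionalsSketch {
  // Same signature as the generated case class on the slide.
  case class WithOptionals(
    o1: Option[String],
    o2: Option[String] = Some("b"),
    o3: Option[String] = None
  )

  private def quote(s: String): String = "\"" + s + "\""

  // Hypothetical serializer: a field appears in the JSON only when its value is Some(...).
  def toJson(w: WithOptionals): String = {
    val present = List("o1" -> w.o1, "o2" -> w.o2, "o3" -> w.o3).collect {
      case (name, Some(value)) => quote(name) + ": " + quote(value)
    }
    if (present.isEmpty) "{}" else present.mkString("{ ", ", ", " }")
  }
}
```

For example, `toJson(WithOptionals(o1 = None))` keeps `o2` because its default is `Some("b")`, while `toJson(WithOptionals(o1 = None, o2 = None))` yields `{}` because the caller explicitly cleared it.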

Page 19: Introduction to Courier

Collection Types

Page 20: Introduction to Courier

Arrays

org/example/NotePad.pdsc
    {
      "name": "NotePad", "type": "record", …
      "fields": [
        { "name": "notes", "type": { "type": "array", "items": "Note" } }
      ]
    }

Generated Scala
    case class NotePad(notes: NoteArray)
    class NoteArray extends IndexedSeq[Note]

Example
    NotePad(notes = NoteArray(Note(…), …)) =>
      { "notes": [ { "title": "…" }, … ] }
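A rough idea of what a typed array wrapper looks like, written by hand. This is only a sketch: the real `NoteArray` is generated by Courier, and the one-field `Note` and the `Vector` backing store here are assumptions made to keep the example self-contained.

```scala
object ArraySketch {
  // Simplified stand-in for the Note record (title only).
  case class Note(title: String)

  // A typed, immutable, indexed sequence of notes, in the spirit of the generated NoteArray.
  class NoteArray(underlying: Vector[Note]) extends IndexedSeq[Note] {
    override def apply(index: Int): Note = underlying(index)
    override def length: Int = underlying.length
  }

  object NoteArray {
    // Varargs factory so call sites read like the slide: NoteArray(Note(...), ...)
    def apply(notes: Note*): NoteArray = new NoteArray(notes.toVector)
  }

  case class NotePad(notes: NoteArray)
}
```

Because the wrapper extends `IndexedSeq[Note]`, the usual collection operations work directly, e.g. `NotePad(NoteArray(Note("a"), Note("b"))).notes.map(_.title)`.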

Page 21: Introduction to Courier

Maps

org/example/TermWeights.pdsc
    {
      "name": "TermWeights", "type": "record", "namespace": "org.example",
      "fields": [
        { "name": "byTerm", "type": { "type": "map", "keys": "string", "values": "float" } }
      ]
    }

Generated Scala
    case class TermWeights(byTerm: FloatMap)
    class FloatMap extends Map[String, Float]

Example
    TermWeights(byTerm = FloatMap("ninja" -> 0.25f, "pirate" -> 0.75f)) =>
      { "byTerm": { "ninja": 0.25, "pirate": 0.75 } }

Page 22: Introduction to Courier

Unions

Page 23: Introduction to Courier

Unions are the closest thing Pegasus has to a polymorphic type.

Page 24: Introduction to Courier

Unlike true subtype polymorphism, unions have a "sealed" set of member types.

Page 25: Introduction to Courier

Unions

org/example/Question.pdsc
    {
      "name": "Question", "type": "record", …
      "fields": [
        { "name": "answerFormat", "type": [ "MultipleChoice", "TextEntry" ] }
      ]
    }

Scala
    case class Question(answerFormat: Question.AnswerFormat)
    object Question {
      sealed abstract class AnswerFormat()
      case class MultipleChoiceMember(value: MultipleChoice) extends AnswerFormat
      case class TextEntryMember(value: TextEntry) extends AnswerFormat
    }

Example
    Question(answerFormat = MultipleChoiceMember(MultipleChoice(…))) =>
      { "answerFormat": { "org.example.MultipleChoice": { … } } }
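The practical payoff of a sealed member set is exhaustive pattern matching. The sketch below mirrors the class layout from the slide; the fields of `MultipleChoice` and `TextEntry` and the `memberKey` / `describe` helpers are hypothetical, added only so the example compiles on its own.

```scala
object UnionSketch {
  // Hypothetical member records; the slide does not show their fields.
  case class MultipleChoice(choices: List[String])
  case class TextEntry(maxLength: Int)

  case class Question(answerFormat: Question.AnswerFormat)

  object Question {
    sealed abstract class AnswerFormat
    case class MultipleChoiceMember(value: MultipleChoice) extends AnswerFormat
    case class TextEntryMember(value: TextEntry) extends AnswerFormat
  }

  import Question._

  // The JSON form wraps the value in an object keyed by the member's fully qualified type name.
  def memberKey(format: AnswerFormat): String = format match {
    case MultipleChoiceMember(_) => "org.example.MultipleChoice"
    case TextEntryMember(_)      => "org.example.TextEntry"
  }

  // Because AnswerFormat is sealed, the compiler can warn when a match misses a member.
  def describe(q: Question): String = q.answerFormat match {
    case MultipleChoiceMember(mc) => s"multiple choice with ${mc.choices.size} options"
    case TextEntryMember(te)      => s"text entry up to ${te.maxLength} characters"
  }
}
```

Adding a third member type to the union would make both matches non-exhaustive, which is the kind of change the compiler can flag for sealed hierarchies.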

Page 26: Introduction to Courier

Enums

Page 27: Introduction to Courier

Enums

org/example/Fruits.pdsc
    {
      "type" : "enum",
      "name" : "Fruits",
      "namespace" : "org.example",
      "symbols" : ["APPLE", "BANANA", "ORANGE"]
    }

Generated Scala
    object Fruits extends Enumeration {
      val APPLE = Value
      val BANANA = Value
      val ORANGE = Value
    }

Example
    Fruits.APPLE => "APPLE"
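A small sketch of how an enum symbol maps to and from its JSON string, using the standard library's `Enumeration`. The `toJson` and `fromSymbol` helpers are hypothetical and are not part of Courier's generated API; the symbol names are passed to `Value` explicitly so the mapping does not rely on reflection.

```scala
object EnumSketch {
  object Fruits extends Enumeration {
    // Explicit names keep the symbol-to-string mapping obvious.
    val APPLE  = Value("APPLE")
    val BANANA = Value("BANANA")
    val ORANGE = Value("ORANGE")
  }

  // An enum value serializes as a JSON string holding its symbol.
  def toJson(fruit: Fruits.Value): String = "\"" + fruit.toString + "\""

  // Looks up a symbol read from data; None for symbols this schema does not define.
  def fromSymbol(symbol: String): Option[Fruits.Value] =
    Fruits.values.find(_.toString == symbol)
}
```

Returning an `Option` from the lookup is a design choice for this sketch: data written against a newer schema may carry symbols the reader does not know yet.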

Page 28: Introduction to Courier

Typerefs

Page 29: Introduction to Courier

Basic Typeref

org/example/Timestamp.pdsc
    {
      "type" : "typeref",
      "name" : "Timestamp",
      "namespace" : "org.example",
      "ref" : "long"
    }

Generated Scala
    Use Long directly.

Example
    1434916256

Page 30: Introduction to Courier

Typerefs are Pegasus's multi-tool.

Page 31: Introduction to Courier

Naming a Union with a Typeref

org/example/AnswerFormats.pdsc
    {
      "name": "AnswerFormats", "namespace": "org.example", "type": "typeref",
      "ref": [ "MultipleChoice", "TextEntry" ]
    }

Scala
    sealed abstract class AnswerFormats()
    object AnswerFormats {
      case class MultipleChoiceMember(v: MultipleChoice) extends AnswerFormats
      case class TextEntryMember(v: TextEntry) extends AnswerFormats
    }

Example
    MultipleChoiceMember(MultipleChoice(…)) =>
      { "org.example.MultipleChoice": { … } }

Page 32: Introduction to Courier

Custom Bindings with a Typeref

org/example/DateTime.pdsc
    {
      "name": "DateTime", "namespace": "org.example", "type": "typeref", "ref": "string",
      "scala": {
        "class": "org.joda.time.DateTime",
        "coercerClass": "org.coursera.models.common.DateTimeCoercer"
      }
    }

Scala
    Use org.joda.time.DateTime directly.

Example
    Record(createdAt = new org.joda.time.DateTime(…)) =>
      { "createdAt": "2015-06-21T18:24:18Z" }
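The idea behind a coercer is a two-way conversion between the custom class and the primitive the typeref refers to. The sketch below shows that idea only: the `Coercer` trait is hypothetical and is not the Pegasus or Courier coercer interface, and it uses `java.time.Instant` instead of Joda-Time so that it runs without extra dependencies.

```scala
import java.time.Instant

object CoercerSketch {
  // Hypothetical interface: converts between a custom type and the primitive stored in the data.
  trait Coercer[Custom, Stored] {
    def toStored(value: Custom): Stored
    def fromStored(stored: Stored): Custom
  }

  // Binds a string-backed typeref to Instant, using ISO-8601 text as the stored form.
  object InstantCoercer extends Coercer[Instant, String] {
    def toStored(value: Instant): String = value.toString
    def fromStored(stored: String): Instant = Instant.parse(stored)
  }
}
```

With a binding like this in place, application code works with the date-time class while the serialized data keeps a plain string such as the one in the example above.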

Page 33: Introduction to Courier

Pegasus System

Schema system: Schema based validation + custom validators.

Data system: JSON Object and Array equivalent types, support for binding to native types.

Code generators: Java, Scala (via Courier), Swift (in progress), Android Java (planned)

Codecs:

● via Pegasus:

o JSON - Jackson streaming

o PSON - non-standard JSON equivalent binary protocol

o Avro binary - compact binary protocol

● via Courier:

o StringKeyCodec - compatible with our legacy StringKeyFormats

o InlineStringCodec - a new “URL Friendly” JSON compatible format

Hardened and performance optimized at LinkedIn. In large scale production use for over 3 years.

Page 34: Introduction to Courier

Development with Courier

Page 35: Introduction to Courier

Courier SBT Plugin

Code generation integrated into the SBT build.

Caches .pdsc file state and only runs the generator when the files change.
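One common way to implement "only regenerate when inputs change" is to fingerprint the inputs and compare against the last fingerprint. The sketch below illustrates that general technique and is not the plugin's actual implementation: `fingerprint` and `generateIfChanged` are hypothetical names, and schema files are modelled as an in-memory map from path to contents.

```scala
import java.security.MessageDigest

object RegenerationSketch {
  // Stable SHA-256 fingerprint over all schema paths and contents, independent of map ordering.
  def fingerprint(schemas: Map[String, String]): String = {
    val digest = MessageDigest.getInstance("SHA-256")
    schemas.toSeq.sortBy(_._1).foreach { case (path, contents) =>
      digest.update(path.getBytes("UTF-8"))
      digest.update(0.toByte) // separator so adjacent fields cannot run together
      digest.update(contents.getBytes("UTF-8"))
      digest.update(0.toByte)
    }
    digest.digest().map(b => "%02x".format(b)).mkString
  }

  // Runs `generate` only when the current fingerprint differs from the cached one.
  // Returns the fingerprint to cache for the next build.
  def generateIfChanged(cached: Option[String], schemas: Map[String, String])(
      generate: () => Unit): String = {
    val current = fingerprint(schemas)
    if (!cached.contains(current)) generate()
    current
  }
}
```

A build tool would persist the returned fingerprint between runs; on an unchanged tree the generator is skipped entirely.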

Page 36: Introduction to Courier

Courier SBT Plugin

Page 37: Introduction to Courier

http://coursera.github.io/courier/