Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty (x.ai)

Upload: spark-summit

Post on 22-Jan-2018


TRANSCRIPT

Page 1: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty

Spark as the Gateway Drug To Typed Functional Programming

Jeff Smith Rohan Aletty x.ai

Page 2

Real World AI
• Scale is increasing
• Complexity is increasing
• Human brain size is constant

Page 3

System Complexity

[Diagram; labels: Data Ingest Annotation Routing; Response Generation; Annotation Services (repeated); Models (repeated); Knowledge Base]

Page 4

Problem Complexity

Page 5

Complex Intelligence

Page 6

Datanauts

Page 7

Tools

Page 8

Scala
• Bleeding edge
• Real world

Page 9

Spark
• Incredibly powerful
• Easy to use

Page 10

Typed Functional Programming
• Powerful abstractions
• Tough learning curve

Page 11

Functions

Page 12

Methods
• Collection of statements
• Might have side effects
• On an object

Page 13

Methods

import java.util.List;

public class Dataset {

    private List<Double> observations;
    private Double average;

    public Dataset(List<Double> inputData) {
        observations = inputData;
    }
}

Page 14

Methods

public class Dataset {

    public double getAverage() {
        Double runningSum = 0.0;

        for (Double observation : observations) {
            runningSum += observation;
        }

        // Side effect: the result is also stored in a field
        average = runningSum / observations.size();

        return average;
    }
}

Page 15

Methods

public class Dataset {

    public void setObservations(List<Double> inputData) {
        observations = inputData;
    }
}

Page 16

Methods

import java.util.List;

public class Dataset {

    private List<Double> observations;
    private Double average;

    public Dataset(List<Double> inputData) {
        observations = inputData;
    }

    public double getAverage() {
        Double runningSum = 0.0;

        for (Double observation : observations) {
            runningSum += observation;
        }

        // Side effect: the result is also stored in a field
        average = runningSum / observations.size();

        return average;
    }

    // Mutates the object: later calls to getAverage() see different data
    public void setObservations(List<Double> inputData) {
        observations = inputData;
    }
}

Page 17

Functions
• Collection of expressions
• Returns a value
• Are objects (in Scala)
• Can be in-lined
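The bullets above can be sketched in plain Scala. This is only an illustration; the names `FunctionsSketch`, `add`, and `applyTwice` are assumptions made for the example and do not come from the talk.

```scala
object FunctionsSketch {
  // A function is a value: this one is an instance of Function2.
  val add: (Double, Double) => Double = (x, y) => x + y

  // A higher-order function: it takes another function as an argument.
  def applyTwice(f: Double => Double, x: Double): Double = f(f(x))

  def main(args: Array[String]): Unit = {
    // Because functions are objects, they can be stored and passed around.
    val sameFunction: Function2[Double, Double, Double] = add
    println(sameFunction(1.0, 2.0))      // 3.0

    // A function can also be written in-line at the call site.
    println(applyTwice(x => x * 2, 3.0)) // 12.0
  }
}
```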

Page 18

Functions in Scala

val inputData = List(1.0, 2.0, 3.0)

Page 19

Functions in Scala

// The explicit result type and "=" make the function return the average
def average(observations: List[Double]): Double = {
  observations.sum / observations.size
}

average(inputData)

Page 20

Functions in Scala

def add(x: Double, y: Double) = {
  x + y
}

val sum = inputData.foldLeft(0.0)(add)

val average = sum / inputData.size

Page 21

Functions in Scala

val sum = inputData.foldLeft(0.0)(add)

val average = sum / inputData.size

inputData.foldLeft(0.0)(_ + _) / inputData.size

Page 22

Functions in Spark

inputData.foldLeft(0.0)(_ + _) / inputData.size

val observations = sc.parallelize(inputData)

observations.fold(0.0)(_ + _) / observations.count()
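One difference from `foldLeft` is worth a sketch: a distributed fold combines values within each partition and then combines the per-partition results, using the zero value at both levels. The plain-Scala simulation below is an assumption-laden illustration, not Spark's implementation; `simulatedFold` and its chunking scheme are hypothetical.

```scala
object FoldSketch {
  // Hypothetical stand-in for a partitioned fold: fold each chunk with the
  // zero value, then fold the per-chunk results with the zero value again.
  def simulatedFold(data: List[Double], partitions: Int)
                   (zero: Double)(op: (Double, Double) => Double): Double = {
    val chunkSize = math.max(1, math.ceil(data.size.toDouble / partitions).toInt)
    val perPartition = data.grouped(chunkSize).map(_.foldLeft(zero)(op)).toList
    perPartition.foldLeft(zero)(op)
  }

  def main(args: Array[String]): Unit = {
    val inputData = List(1.0, 2.0, 3.0)

    // With a neutral zero and an associative operator, chunking is invisible.
    val sum = simulatedFold(inputData, partitions = 2)(0.0)(_ + _)
    println(sum / inputData.size) // 2.0

    // A non-neutral zero is applied once per chunk and once more at the end.
    println(simulatedFold(inputData, partitions = 2)(1.0)(_ + _)) // 9.0
  }
}
```

This is why the operator passed to a distributed fold should be associative and the zero value neutral for it.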

Page 23

Immutability

Page 24

Mutation
• Changing an object

Page 25

Mutation

visits = {"Church": 2, "Backus": 1, "McCarthy": 4}

old_value = visits["Backus"]

visits["Backus"] = old_value + 1

Page 26

Immutability
• Never changing objects

Page 27

Immutability in Scala

val visits = Map("Church" -> 2, "Backus" -> 1, "McCarthy" -> 4)

val updatedVisits = visits.updated("Backus", 2)
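A small check makes the point of the slide concrete: `updated` returns a new map and the original is untouched. The object name `ImmutableMapSketch` is an assumption for the example.

```scala
object ImmutableMapSketch {
  val visits = Map("Church" -> 2, "Backus" -> 1, "McCarthy" -> 4)

  // `updated` returns a new map and leaves the original alone.
  val updatedVisits = visits.updated("Backus", visits("Backus") + 1)

  def main(args: Array[String]): Unit = {
    println(visits("Backus"))        // 1
    println(updatedVisits("Backus")) // 2
  }
}
```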

Page 28

Immutability in Spark

val manyVisits = sc.parallelize(visits.toSeq)

val additionalVisit = sc.parallelize(Seq(("Backus", 1)))

val updatedVisits = manyVisits.union(additionalVisit)
  .aggregateByKey(0)(_ + _, _ + _)
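The same "append a record, then aggregate by key" idea can be tried without a cluster. The sketch below is a local analogy on plain Scala collections, not Spark code; `MergeVisitsSketch` is a hypothetical name.

```scala
object MergeVisitsSketch {
  val visits = Map("Church" -> 2, "Backus" -> 1, "McCarthy" -> 4)
  val additionalVisit = Seq(("Backus", 1))

  // Concatenate the records (like union), then sum the counts per key
  // (like aggregateByKey with a zero of 0 and addition).
  val updatedVisits: Map[String, Int] =
    (visits.toSeq ++ additionalVisit)
      .groupBy { case (name, _) => name }
      .map { case (name, records) => name -> records.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    println(updatedVisits("Backus")) // 2
    println(visits("Backus"))        // 1
  }
}
```

No existing value is modified at any step: the update is expressed as new data plus a pure aggregation.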

Page 29

Recap

Page 30

Concepts
• Higher-order functions
• Anonymous functions
• Purity of functions

Page 31

Concepts
• Currying
• Referential transparency
• Closures
• Resilient Distributed Datasets
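Several of the recap concepts fit in a few lines of plain Scala. The sketch below is illustrative only; `scale`, `twice`, `shift`, and `square` are hypothetical names, not code from the talk.

```scala
object ConceptsSketch {
  // Currying: one parameter list per argument, so the function can be
  // partially applied.
  def scale(factor: Double)(x: Double): Double = factor * x
  val twice: Double => Double = scale(2.0)

  // Closure: the anonymous function captures `offset` from its environment.
  val offset = 10.0
  val shift: Double => Double = x => x + offset

  // Pure function: the result depends only on the arguments, so a call can
  // be replaced by its value (referential transparency).
  def square(x: Double): Double = x * x

  def main(args: Array[String]): Unit = {
    val data = List(1.0, 2.0, 3.0)
    println(data.map(twice))                       // List(2.0, 4.0, 6.0)
    println(data.map(shift))                       // List(11.0, 12.0, 13.0)
    println(square(3.0) + square(3.0) == 2 * 9.0)  // true
  }
}
```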

Page 32

Lazy Evaluation

Page 33

Functional Programming — Lazy Evaluation
• Delaying evaluation of an expression until a value is needed
• Two major advantages of lazy evaluation
  • Deferring computation allows a program to evaluate only what is necessary
  • Changing the evaluation scheme to be more efficient
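In Scala, the simplest way to see deferred evaluation is a `lazy val`. The sketch below counts how often a computation runs; `expensive` and `evaluations` are hypothetical names introduced for the illustration.

```scala
object LazySketch {
  var evaluations = 0

  // Hypothetical costly computation that records each time it runs.
  def expensive(): Int = { evaluations += 1; 42 }

  def main(args: Array[String]): Unit = {
    lazy val answer = expensive() // nothing runs yet
    println(evaluations)          // 0

    println(answer + answer)      // 84, computed on first use only
    println(evaluations)          // 1
  }
}
```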

Page 34

Spark — Lazy Evaluation
• All transformations are lazy
• Their existence is added to the Spark computation DAG
• Example DAGs

Page 35

Spark — Lazy Evaluation

val rdd1 = sc.parallelize(...)

val rdd2 = rdd1.map(...)

val rdd3 = rdd1.map(...)

val rdd4 = rdd1.map(...)

// Only the rdd1 -> rdd3 path is evaluated; rdd2 and rdd4 never run
rdd3.take(5)

Page 36

Spark — Learning Laziness
• Advantage 1: (deferred computation)
  • comes directly from evaluating only the parts of the DAG that are necessary when executing an action
• Advantage 2: (optimized evaluation scheme)
  • comes directly from pipelining within Spark stages to make execution more efficient
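Both advantages can be observed locally with Scala collection views, which are lazy in a loosely similar way. This is an analogy rather than Spark code, and it assumes Scala 2.13 view semantics; the names `doubled`, `squared`, `shifted`, and `log` are hypothetical.

```scala
import scala.collection.mutable.ListBuffer

object LazyPipelineSketch {
  val source = (1 to 1000).toList
  val log = ListBuffer.empty[String]

  // Three derived views; defining them performs no work.
  val doubled = source.view.map { x => log += s"double($x)"; x * 2 }
  val squared = source.view.map { x => log += s"square($x)"; x * x }
  val shifted = squared.map { x => log += s"shift($x)"; x + 1 }

  def main(args: Array[String]): Unit = {
    println(log.isEmpty)            // true
    println(shifted.take(2).toList) // List(2, 5)
    println(log.toList)
    // List(square(1), shift(1), square(2), shift(4))
  }
}
```

The `doubled` branch never runs (deferred computation), and each element passes through both stages before the next one starts (pipelining), with no intermediate collection in between.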

Page 37

Types

Page 38

Functional Programming — Type Systems
• Mechanism for defining algebraic data types (ADTs), which are useful for program structure
  • i.e. “let’s group this data together and brand it a new type”
• Compile-time guarantees of program correctness
  • e.g. “no, you cannot add Foo to Bar”
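A minimal sketch of both bullets, using hypothetical `Meters` and `Seconds` types in place of the slide's Foo and Bar:

```scala
object TypesSketch {
  // Product types: each groups data and brands it with a new type.
  final case class Meters(value: Double) {
    def +(other: Meters): Meters = Meters(value + other.value)
  }
  final case class Seconds(value: Double)

  val total: Meters = Meters(100.0) + Meters(50.0)

  // The next line would be rejected at compile time (type mismatch):
  // val nonsense = Meters(100.0) + Seconds(9.58)

  def main(args: Array[String]): Unit =
    println(total) // Meters(150.0)
}
```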

Page 39

Spark — Types
• RDDs (typed), Datasets (typed), DataFrames (untyped)
• Types enforce a schema on a dataset, which helps prevent unexpected behavior

Page 40

Spark — Types

import spark.implicits._  // encoders needed by as[Person]

case class Person(name: String, age: Int)

val peopleDS = spark.read.json(path).as[Person]

// groupByKey is the typed grouping; groupBy expects column names or Columns
val ageGroupedDs = peopleDS.groupByKey(_.age)

Page 41

Spark — Learning Types
• Spark through Scala also allows learning of pattern matching
  • ADTs as both product types and union types
• Allows us to reason about code more easily
• Gives us compile-time safety

Page 42

Spark — Learning Types

import org.apache.spark.rdd.RDD

// sealed: the compiler knows every subtype and can check matches for exhaustiveness
sealed trait Person { def name: String }

case class Student(name: String, grade: String) extends Person

case class Professional(name: String, job: String) extends Person

val personRDD: RDD[Person] = sc.parallelize(…)

// working with both union and product types
val mappedRDD: RDD[String] = personRDD.map {
  case Student(name, grade) => grade
  case Professional(name, job) => job
}

Page 43

Spark — Learning Types

val rdd1: RDD[Person] = sc.parallelize(...)

// Compilation error: the function returns a Person, not a String
val rdd2: RDD[String] = rdd1.map(person => person)

// It works: the function returns a String
val rdd3: RDD[String] = rdd1.map("name: " + _.name)

Page 44

Monads

Page 45

Functional Programming — Monads
• In category theory:
  • “a monad in X is just a monoid in the category of endofunctors of X”
• In functional programming, refers to a container that can:
  • Inject a value into the container
  • Perform operations on values, returning a container with new values
  • Flatten nested containers into a single container

Page 46

Scala — Monads!

trait Monad[M[_]] {
  // constructs a Monad instance from the given value, e.g. List(1)
  def apply[T](v: T): M[T]

  // effectively lets you transform values within a Monad
  def bind[T, U](m: M[T])(fn: (T) => M[U]): M[U]
}
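To see the trait in use, here is a sketch with two instances built on the standard `flatMap`. The instances `optionMonad` and `listMonad` are assumptions for the example; the Scala standard library does not define a `Monad` trait of its own.

```scala
object MonadSketch {
  trait Monad[M[_]] {
    def apply[T](v: T): M[T]
    def bind[T, U](m: M[T])(fn: T => M[U]): M[U]
  }

  // Illustrative instances that delegate to the collections' own flatMap.
  val optionMonad: Monad[Option] = new Monad[Option] {
    def apply[T](v: T): Option[T] = Some(v)
    def bind[T, U](m: Option[T])(fn: T => Option[U]): Option[U] = m.flatMap(fn)
  }

  val listMonad: Monad[List] = new Monad[List] {
    def apply[T](v: T): List[T] = List(v)
    def bind[T, U](m: List[T])(fn: T => List[U]): List[U] = m.flatMap(fn)
  }

  def main(args: Array[String]): Unit = {
    println(optionMonad.bind(optionMonad(4))(x => Some(x / 2.0))) // Some(2.0)
    println(listMonad.bind(List(1, 2))(x => List(x, x * 10)))     // List(1, 10, 2, 20)
  }
}
```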

Page 47

Scala — Monads!
• Many monads in Scala
  • List, Set, Option, etc.
• Powerful line of thinking
  • Helps code comprehension
  • Reduces error checking logic (pattern matching!)
  • Can build further transformations: map(), filter(), foreach(), etc.
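The "reduces error checking logic" point can be sketched with `Option`: each lookup may fail, yet no explicit check appears in the code. `OptionSketch` and `combined` are hypothetical names for the illustration.

```scala
object OptionSketch {
  val visits = Map("Church" -> 2, "Backus" -> 1, "McCarthy" -> 4)

  // Each lookup may fail; the for-comprehension (flatMap and map under the
  // hood) short-circuits to None without explicit null or error checks.
  def combined(a: String, b: String): Option[Int] =
    for {
      x <- visits.get(a)
      y <- visits.get(b)
    } yield x + y

  def main(args: Array[String]): Unit = {
    println(combined("Church", "Backus")) // Some(3)
    println(combined("Church", "Turing")) // None
  }
}
```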

Page 48

Spark — Learning Monads?
• We have many “computation builders” (RDDs, Datasets, DataFrames)
  • Containers on which transformations can be applied
• Similar to monads though not identical
  • No unit function to wrap constituent values
  • Cannot lift all types into flatMap function unconstrained
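The two differences can be mimicked with a toy container. Everything below is hypothetical (`Context`, `Batch`, and their methods are invented for the sketch and are not Spark classes); it only mirrors the general shape of an RDD-style `flatMap`, which takes a function returning a plain collection and requires a `ClassTag` for the element type.

```scala
import scala.reflect.ClassTag

object ComputationBuilderSketch {
  // Hypothetical stand-in for a cluster handle: values can only be wrapped
  // through it, so there is no context-free unit function.
  final class Context {
    def parallelize[T: ClassTag](data: Seq[T]): Batch[T] = new Batch(data.toVector)
  }

  // Hypothetical container, loosely shaped like an RDD.
  final class Batch[T: ClassTag](val items: Vector[T]) {
    // Unlike a monadic bind (T => Batch[U]), the function returns a plain
    // collection, and U is constrained to types that have a ClassTag.
    def flatMap[U: ClassTag](f: T => Iterable[U]): Batch[U] =
      new Batch(items.flatMap(f))
  }

  def main(args: Array[String]): Unit = {
    val ctx = new Context
    val words = ctx.parallelize(Seq("typed functional", "programming"))
      .flatMap(_.split(" ").toList)
    println(words.items) // Vector(typed, functional, programming)
  }
}
```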

Page 49

For Later

Page 50

Conclusions
• Spark introduces all types of devs to Scala
• Scala helps people learn typed functional programming
• Typed functional programming improves Spark development

Page 51

x.ai @xdotai [email protected] New York, New York

Page 52

Use the code ctwsparks17 for 40% off!

Page 53

Thank You