"Spark Summit 2016: Trends & Insights" -- Zurich Spark Meetup, July 2016

Uploaded by: rene-pfitzner

Posted on 09-Jan-2017

TRANSCRIPT

Spark Summit 2016

Who am I?

Looking for a Machine Learning Summer Intern!

bit.ly/nzzml

Spark Summit 2016

US / Europe & open to the world

Trend #1: Spark 2.0

Trend #2: RDDs, DataFrames (DFs), Datasets (DSs)

RDDs, DFs, DSs ... Why?

Prefer DFs & DSs over RDDs!
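To illustrate the recommendation, a minimal sketch (assuming Spark 2.x and Scala) that computes the same per-region average with each API. The `Sale` case class, the sample data and the `ApiComparison` object are hypothetical and are not the demo from the talk. The point is visibility: the RDD version hides its logic in lambdas Spark cannot inspect, while the DataFrame version is a declarative query that the Catalyst optimizer can plan.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

// Hypothetical record type for this illustration.
case class Sale(region: String, amount: Double)

object ApiComparison {
  // Average sale amount per region, computed with each of the three APIs.
  def run(spark: SparkSession): (Map[String, Double], Map[String, Double], Map[String, Double]) = {
    import spark.implicits._
    val sales = Seq(Sale("EU", 10.0), Sale("EU", 20.0), Sale("US", 5.0))

    // RDD: the logic lives in lambdas that Spark treats as black boxes.
    val viaRdd = spark.sparkContext.parallelize(sales)
      .map(s => (s.region, (s.amount, 1L)))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
      .mapValues { case (sum, n) => sum / n }
      .collect().toMap

    // DataFrame: a declarative query that Catalyst can optimize.
    val viaDf = sales.toDF()
      .groupBy($"region")
      .agg(avg($"amount"))
      .collect().map(r => r.getString(0) -> r.getDouble(1)).toMap

    // Dataset: compile-time types; typed lambdas such as mapGroups
    // are, like RDD lambdas, opaque to the optimizer.
    val viaDs = sales.toDS()
      .groupByKey(_.region)
      .mapGroups { (region, rows) =>
        val amounts = rows.map(_.amount).toList
        (region, amounts.sum / amounts.size)
      }
      .collect().toMap

    (viaRdd, viaDf, viaDs)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("api-comparison").getOrCreate()
    try println(run(spark))
    finally spark.stop()
  }
}
```

All three return the same averages; the difference lies in how much of the computation Spark can see and therefore optimize.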

Demo ...

Trend #3: Streaming 2.0

“The simplest way to do streaming analytics is when you don’t have to worry about streaming.”

Streaming 2.0

Demo ...

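// Spark 2.0 preview API; the 2.0.0 release replaced these calls with readStream / writeStream ... start()
import sqlContext.implicits._ // enables the $"column" syntax
val structuredStream = sqlContext.read.format("json").stream(src_path)

structuredStream.select($"constant_Value").groupBy($"constant_Value").count()
  .write.format("parquet").startStream("/tmp/out/value.parquet")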
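For readers on a released Spark version, a sketch of the same continuous count with the Spark 2.0.0+ `readStream` / `writeStream` API in place of the `stream(...)` and `startStream` calls above. Assumptions: the input files hold JSON objects with a string field `constant_Value`, and `srcPath` stands in for the slide's `src_path`. File sinks such as parquet only accept append mode, which does not support an un-watermarked streaming aggregation like this one, so the sketch emits complete results to the console (or, for testing, to the in-memory sink).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object StructuredStreamingCount {
  // Assumed input schema: one string field per JSON record.
  val schema: StructType = StructType(Seq(StructField("constant_Value", StringType)))

  // Starts a continuous count of records per constant_Value.
  // sinkFormat: "console" prints results, "memory" exposes them as a table.
  def start(spark: SparkSession, srcPath: String, sinkFormat: String = "console"): StreamingQuery = {
    import spark.implicits._
    spark.readStream
      .schema(schema) // file sources need a schema up front by default
      .json(srcPath)
      .groupBy($"constant_Value")
      .count()
      .writeStream
      .outputMode("complete") // re-emit the full, updated counts on each trigger
      .format(sinkFormat)
      .queryName("value_counts") // required by the memory sink (its table name)
      .start()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("streaming-2.0").getOrCreate()
    // Hypothetical default directory standing in for the slide's src_path.
    val srcPath = if (args.nonEmpty) args(0) else "/tmp/in"
    start(spark, srcPath).awaitTermination()
  }
}
```

Running `main` and then dropping new JSON files into the input directory should print updated counts after each micro-batch, which is the "don't worry about streaming" idea: the query is written exactly like a batch `groupBy`.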

Trend #4: GraphFrames

http://graphframes.github.io/
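A minimal GraphFrames sketch, assuming the graphframes package from the URL above is on the classpath alongside Spark 2.x. The "follows" graph is made up. GraphFrames expects an `id` column on the vertex DataFrame and `src`/`dst` columns on the edge DataFrame; the example uses motif finding to list mutual follows and returns the in-degree of each vertex.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

object GraphFramesSketch {
  // Builds a tiny, made-up "follows" graph and returns its in-degrees.
  def inDegrees(spark: SparkSession): Map[String, Int] = {
    import spark.implicits._

    // Vertices need an "id" column; edges need "src" and "dst".
    val vertices = Seq(("a", "Alice"), ("b", "Bob"), ("c", "Carol")).toDF("id", "name")
    val edges = Seq(("a", "b", "follows"), ("b", "c", "follows"), ("c", "b", "follows"))
      .toDF("src", "dst", "relationship")

    val g = GraphFrame(vertices, edges)

    // Motif finding: users who follow each other (the b/c pair, in both directions).
    g.find("(x)-[e1]->(y); (y)-[e2]->(x)").select("x.id", "y.id").show()

    // Vertices without incoming edges (here "a") do not appear in inDegrees.
    g.inDegrees.collect()
      .map(r => r.getAs[String]("id") -> r.getAs[Number]("inDegree").intValue())
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("graphframes").getOrCreate()
    try println(inDegrees(spark))
    finally spark.stop()
  }
}
```

Because vertices and edges are ordinary DataFrames, graph queries can be mixed freely with regular DataFrame operations such as the `select` above.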

Demo ...

Trend #5: SparkR is catching up

Trend #6: Deep Learning

DNNs are coming: watch them closely!

Insight #1: Big players ...

… big community

Insight #2: Same issues everywhere ...

The Spark user mailing list (user@spark.apache.org) is your best friend!

Insight #3: Stream, Compute, Dump

Use Spark (Streaming) for what it’s meant to do: real-time computation, not serving!

Insight #4: 4 best practices

GroupByKey! GroupByKey?
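The slide presumably refers to the common advice to avoid `groupByKey` when an aggregation can be expressed with `reduceByKey` (or `aggregateByKey`). A sketch on RDDs with hypothetical sample words: both versions produce the same counts, but `groupByKey` shuffles every record, whereas `reduceByKey` combines values inside each partition before the shuffle.

```scala
import org.apache.spark.sql.SparkSession

object GroupVsReduce {
  // Word counts computed twice over the same (word, 1) pairs.
  def run(spark: SparkSession): (Map[String, Int], Map[String, Int]) = {
    val pairs = spark.sparkContext
      .parallelize(Seq("spark", "summit", "spark", "zurich", "spark"), numSlices = 2)
      .map(word => (word, 1))

    // groupByKey: every single (word, 1) pair is shuffled, then summed.
    val viaGroup = pairs.groupByKey().mapValues(_.sum).collect().toMap

    // reduceByKey: values are pre-summed per partition (map-side combine),
    // so at most one partial count per word and partition is shuffled.
    val viaReduce = pairs.reduceByKey(_ + _).collect().toMap

    (viaGroup, viaReduce)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("group-vs-reduce").getOrCreate()
    try println(run(spark))
    finally spark.stop()
  }
}
```

On five words the difference is invisible; on billions of records with few distinct keys, the shuffle volume (and the risk of one key's values overflowing an executor) is what makes `groupByKey` painful.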

Circumvent Skew by “Salting”

Key: Foo

Salted Key: Foo + random(1,saltDim)
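A two-stage aggregation sketch of salting with the DataFrame API, under stated assumptions: the aggregation is a sum (salting works this simply only for associative, commutative aggregations such as sum, count or max), and the salt is kept in its own column instead of being appended to the key string, which is equivalent to the slide's `Foo + random(1,saltDim)`. `saltedSum` and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, floor, rand, sum}

object Salting {
  // Hypothetical helper: sums valueCol per keyCol in two stages so that a
  // single hot key is spread over up to saltDim groups in the first stage.
  def saltedSum(df: DataFrame, keyCol: String, valueCol: String, saltDim: Int): DataFrame = {
    // Stage 1: random salt in 1..saltDim, i.e. "Foo + random(1,saltDim)".
    val partial = df
      .withColumn("salt", floor(rand() * saltDim) + 1)
      .groupBy(col(keyCol), col("salt"))
      .agg(sum(col(valueCol)).as("partial_sum"))

    // Stage 2: drop the salt and combine the partial sums per original key.
    partial
      .groupBy(col(keyCol))
      .agg(sum(col("partial_sum")).as("total"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("salting").getOrCreate()
    import spark.implicits._
    // Made-up skewed data: almost every row has the key "Foo".
    val df = (Seq.fill(1000)(("Foo", 1L)) ++ Seq(("Bar", 1L))).toDF("key", "value")
    saltedSum(df, "key", "value", saltDim = 8).show()
    spark.stop()
  }
}
```

For skewed joins the same idea applies, but the smaller side of the join then has to be replicated once per salt value so that every salted key still finds its match.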

Think about resource allocation!

--num-executors

--executor-cores

--executor-memory
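The slide only names the three `spark-submit` flags. As a worked example, here is a small calculator for one widely circulated rule of thumb on YARN clusters with identical nodes. Every default in it is an assumption to tune, not a recommendation from the talk: about 5 cores per executor, 1 core and 1 GB per node reserved for the OS and daemons, one executor slot given up for the ApplicationMaster, and a memory overhead of max(384 MB, 10% of the heap). `ExecutorSizing` is a hypothetical helper.

```scala
object ExecutorSizing {
  final case class Sizing(numExecutors: Int, executorCores: Int, executorMemoryMb: Long) {
    def asFlags: String =
      s"--num-executors $numExecutors --executor-cores $executorCores --executor-memory ${executorMemoryMb}m"
  }

  // Rule-of-thumb sizing for a YARN cluster of identical worker nodes.
  // Every default is an assumption to tune for your own cluster.
  def suggest(
      nodes: Int,
      coresPerNode: Int,
      memoryPerNodeMb: Long,
      coresPerExecutor: Int = 5,          // assumed sweet spot for HDFS I/O
      reservedCoresPerNode: Int = 1,      // left for the OS and node daemons
      reservedMemoryPerNodeMb: Long = 1024,
      overheadFraction: Double = 0.10,    // assumed off-heap overhead share
      minOverheadMb: Long = 384
  ): Sizing = {
    val executorsPerNode = (coresPerNode - reservedCoresPerNode) / coresPerExecutor
    require(executorsPerNode > 0, "too few cores per node for one executor")

    // One executor slot is given up for the YARN ApplicationMaster.
    val numExecutors = nodes * executorsPerNode - 1

    // Memory per executor container, split into heap + overhead where
    // overhead = max(minOverheadMb, overheadFraction * heap).
    val containerMb = (memoryPerNodeMb - reservedMemoryPerNodeMb) / executorsPerNode
    val heapMb = math.min((containerMb / (1 + overheadFraction)).toLong, containerMb - minOverheadMb)

    Sizing(numExecutors, coresPerExecutor, heapMb)
  }

  def main(args: Array[String]): Unit = {
    // Example cluster: 6 nodes, 16 cores and 64 GB RAM each.
    println(suggest(nodes = 6, coresPerNode = 16, memoryPerNodeMb = 64 * 1024).asFlags)
  }
}
```

For the example cluster this prints `--num-executors 17 --executor-cores 5 --executor-memory 19549m`: 15 usable cores per node give 3 executors per node (18 in total, minus 1 for the ApplicationMaster), and each 21504 MB container leaves 19549 MB of heap once the 10% overhead is subtracted.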

You know, window functions ...

first value, last value, rank,
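A sketch of those three window functions with the DataFrame API, assuming Spark 2.1+ (for `Window.unboundedPreceding` / `Window.unboundedFollowing`) and a made-up page-view table. One detail worth knowing: when a window has an `orderBy`, its default frame ends at the current row, so `last` needs an explicitly widened frame to return the partition's last value.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, first, last, rank}

object WindowSketch {
  // Adds first page, last page and rank-by-time to a made-up page-view table.
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val views = Seq(
      ("u1", 1L, "home"), ("u1", 2L, "news"), ("u1", 3L, "sport"),
      ("u2", 1L, "news"), ("u2", 5L, "home")
    ).toDF("user", "ts", "page")

    val byUserAndTime = Window.partitionBy("user").orderBy("ts")
    // With orderBy, the default frame stops at the current row, so last()
    // would return the current page; widen the frame to the whole partition.
    val wholePartition =
      byUserAndTime.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

    views
      .withColumn("first_page", first(col("page")).over(wholePartition))
      .withColumn("last_page", last(col("page")).over(wholePartition))
      .withColumn("rank", rank().over(byUserAndTime))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("window-functions").getOrCreate()
    try run(spark).orderBy("user", "ts").show()
    finally spark.stop()
  }
}
```

Unlike a `groupBy`, a window keeps every input row, which is why it is the natural tool for "first/last value per user" or rankings that must stay attached to the original records.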

Looking for a Machine Learning Summer Intern!

bit.ly/nzzml

Check out TechTuesday!

meetup.com/Tech-Tuesday-Zurich