"spark summit 2016: trends & insights" -- zurich spark meetup, july 2016

Post on 09-Jan-2017

Category: Data & Analytics

Spark Summit 2016

Who am I?

Looking for a Machine Learning Summer Intern!

bit.ly/nzzml

Spark Summit 2016

US / Europe & open to the world

Trend #1: Spark 2.0

Trend #2: RDDs, DataFrames (DFs), Datasets (DSs)

RDDs, DFs, DSs ... Why?

Prefer DataFrames & Datasets over RDDs!
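A minimal sketch of this recommendation, assuming Spark 2.x and a local SparkSession. The `Sale` case class and its data are hypothetical and exist only for illustration. The same per-key sum is written once against the RDD API and once against the Dataset/DataFrame API; only the second form is visible to the Catalyst optimizer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Hypothetical record type, used only for illustration.
case class Sale(shop: String, amount: Double)

object PreferDatasets {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("prefer-datasets").getOrCreate()
    import spark.implicits._

    val sales = Seq(Sale("a", 1.0), Sale("b", 2.0), Sale("a", 3.0))

    // RDD version: the lambdas are opaque to Spark, so no query optimization happens.
    val viaRdd = spark.sparkContext.parallelize(sales)
      .map(s => (s.shop, s.amount))
      .reduceByKey(_ + _)
      .collect().toMap

    // Dataset/DataFrame version: a declarative aggregation that Catalyst can plan.
    val ds = sales.toDS()
    val viaDs = ds.groupBy($"shop").agg(sum($"amount").as("total"))
      .as[(String, Double)]
      .collect().toMap

    // Both APIs compute the same totals.
    assert(viaRdd == viaDs)

    // Prints the physical plan chosen for the Dataset version.
    ds.groupBy($"shop").agg(sum($"amount")).explain()
    spark.stop()
  }
}
```

Calling `explain()` on the Dataset version is a quick way to see what the optimizer did; the RDD version has no equivalent.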

Demo ...

Trend #3: Streaming 2.0

“The simplest way to do streaming analytics is when you don’t have to worry about streaming.”

Streaming 2.0

Demo ...

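Streaming 2.0

// Spark 2.0 preview API; the 2.0 release renamed stream()/startStream() to readStream.load()/writeStream.start()
val structuredStream = sqlContext.read.format("json").stream(src_path)

// startStream() takes the output path itself; save() is for batch writes and cannot be chained before it
structuredStream.select($"constant_Value").groupBy($"constant_Value").count.write.format("parquet").startStream("/tmp/out/value.parquet")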
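The snippet above calls `stream()` and `startStream()`, which belong to the Spark 2.0 preview API. A rough equivalent against the released Spark 2.x API might look like the sketch below. The input directory and the one-column schema are assumptions. The console sink is also a swap: an aggregated stream without a watermark needs the "complete" output mode, which the file (Parquet) sink does not support.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object StructuredStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("structured-streaming-sketch").getOrCreate()
    import spark.implicits._

    // Assumptions: a placeholder input directory (it must already exist) and a
    // one-column schema; streaming file sources require an explicit schema.
    val srcPath = "/tmp/in"
    val schema = StructType(Seq(StructField("constant_Value", StringType)))

    // readStream/load replace the preview-era read.stream(path).
    val structuredStream = spark.readStream.schema(schema).format("json").load(srcPath)

    // Running count per distinct value, updated as new files arrive.
    val counts = structuredStream.groupBy($"constant_Value").count()

    // writeStream/start replace the preview-era write.startStream(path).
    val query = counts.writeStream
      .outputMode("complete") // re-emit the full result table on every trigger
      .format("console")
      .start()

    query.awaitTermination() // blocks until the query is stopped
  }
}
```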

Trend #4: GraphFrames

http://graphframes.github.io/

Demo ...

Trend #5: SparkR is catching up

Trend #6: Deep Learning

DNNs are coming: watch them closely!

Insight #1: Big players ...

… big community

Insight #2: Same issues everywhere ...

The Spark user mailing list (user@spark.apache.org) is your best friend!

Insight #3: Stream, Compute, Dump

Use Spark (Streaming) for what it’s meant for: real-time computation, not serving!

Insight #4: 4 best practices

GroupByKey! GroupByKey?
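The slide does not spell out the advice, but it most likely refers to the common guidance to avoid `groupByKey` for aggregations when `reduceByKey` (or `aggregateByKey`) will do. A minimal sketch under that assumption, with made-up input data:

```scala
import org.apache.spark.sql.SparkSession

object GroupByKeyVsReduceByKey {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("groupbykey-sketch").getOrCreate()

    val words = spark.sparkContext
      .parallelize(Seq("foo", "bar", "foo", "foo", "baz"))
      .map(w => (w, 1))

    // groupByKey shuffles every (word, 1) pair and only then sums the values.
    val viaGroup = words.groupByKey().mapValues(_.sum).collect().toMap

    // reduceByKey pre-aggregates within each partition, so only partial sums are shuffled.
    val viaReduce = words.reduceByKey(_ + _).collect().toMap

    assert(viaGroup == viaReduce)
    assert(viaReduce == Map("foo" -> 3, "bar" -> 1, "baz" -> 1))
    spark.stop()
  }
}
```

Both variants return the same result; the difference is the amount of data moved during the shuffle, which matters most when a few keys are very frequent.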

Circumvent Skew by “Salting”

Key: Foo

Salted Key: Foo + random(1,saltDim)
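The salting idea can be sketched as a two-stage aggregation. This is an illustration under assumptions, not the speaker's code: the data and the value of `saltDim` are made up. Stage 1 aggregates per (key, salt), so the hot key is spread over several tasks; stage 2 drops the salt and merges the few partial results per key.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Random

object SaltingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("salting-sketch").getOrCreate()

    val saltDim = 8 // assumed number of salt values per key

    // Skewed input: almost every record shares the key "Foo".
    val records = spark.sparkContext.parallelize(
      Seq.fill(1000)(("Foo", 1L)) ++ Seq.fill(10)(("Bar", 1L)))

    // Stage 1: salted key = (key, random(1, saltDim)); aggregate per salted key.
    val partial = records
      .map { case (key, value) => ((key, Random.nextInt(saltDim) + 1), value) }
      .reduceByKey(_ + _)

    // Stage 2: drop the salt and merge the (at most saltDim) partial sums per key.
    val totals = partial
      .map { case ((key, _), partialSum) => (key, partialSum) }
      .reduceByKey(_ + _)
      .collect().toMap

    assert(totals == Map("Foo" -> 1000L, "Bar" -> 10L))
    spark.stop()
  }
}
```

The random salt changes how records are distributed, not the final totals, so the result is the same on every run.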

Think about resource allocation!

--num-executors

--executor-cores

--executor-memory
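The three spark-submit flags above correspond to configuration properties that can also be set programmatically. A small sketch with placeholder values (the right numbers depend on the cluster and the job and are not taken from the talk):

```scala
import org.apache.spark.SparkConf

object ResourceAllocationSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder values, for illustration only.
    val conf = new SparkConf()
      .setAppName("resource-allocation-sketch")
      .set("spark.executor.instances", "10") // --num-executors
      .set("spark.executor.cores", "4")      // --executor-cores
      .set("spark.executor.memory", "8g")    // --executor-memory

    // Total task slots requested = executors * cores per executor.
    val totalCores =
      conf.getInt("spark.executor.instances", 0) * conf.getInt("spark.executor.cores", 0)

    assert(totalCores == 40)
    println(s"total executor cores requested: $totalCores")
  }
}
```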

You know, window functions ...

first value, last value, rank,
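A minimal sketch of the window functions named above, assuming Spark 2.1 or later (for `Window.unboundedPreceding` and `Window.unboundedFollowing`) and hypothetical (shop, day, revenue) data. Note that `last` only returns the last value of the whole partition when the frame is widened explicitly, while `rank` needs a window without a custom frame.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{first, last, rank}

object WindowFunctionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("window-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical data: (shop, day, revenue).
    val df = Seq(("a", 1, 10.0), ("a", 2, 30.0), ("a", 3, 20.0), ("b", 1, 5.0))
      .toDF("shop", "day", "revenue")

    // Whole partition, ordered by day: used for first/last value per shop.
    val wholeShop = Window.partitionBy($"shop").orderBy($"day")
      .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

    // Ordered by revenue, default frame: used for ranking within a shop.
    val byRevenue = Window.partitionBy($"shop").orderBy($"revenue".desc)

    val result = df
      .withColumn("first_revenue", first($"revenue").over(wholeShop))
      .withColumn("last_revenue", last($"revenue").over(wholeShop))
      .withColumn("revenue_rank", rank().over(byRevenue))

    result.orderBy($"shop", $"day").show()

    // Shop "a", day 2: first day earned 10.0, last day 20.0, and 30.0 is the top revenue.
    val row = result.filter($"shop" === "a" && $"day" === 2)
      .select("first_revenue", "last_revenue", "revenue_rank")
      .as[(Double, Double, Int)]
      .head()
    assert(row == (10.0, 20.0, 1))
    spark.stop()
  }
}
```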

Looking for a Machine Learning Summer Intern!

bit.ly/nzzml

Check out TechTuesday!

meetup.com/Tech-Tuesday-Zurich
