How to deploy Apache Spark to Mesos/DCOS

Posted on 07-Jan-2017



with Iulian Dragoș

Agenda

• Intro Apache Spark

• Apache Mesos

• Why Spark on Mesos

• A look under the hood


Spark - lightning-fast cluster computing

• next-generation Big Data solution

• analytics and data processing

• up to 100x faster than Hadoop MapReduce for in-memory workloads

• built with Scala and Akka

• Apache top-level project


Spark

• It’s a next-generation compute engine

• It does not replace the whole Hadoop ecosystem, just MapReduce

• Integrates/works with HDFS, Hive, HBase, etc.


Spark API

• Scala distributed collections

• also available from Python and Java

• interactive shell and job submission

• streaming and batch modes

• flourishing ecosystem (Spark SQL, MLlib, GraphX)
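Because the Spark API is modeled on Scala collections, a Spark job reads much like ordinary collection code. The sketch below is plain Scala with no Spark dependency: a word count over a local Seq (the object and method names are illustrative). In a Spark program the same chain would typically start from an RDD, for example sc.textFile(path), and combine the counts with reduceByKey(_ + _) instead of the local groupBy used here.

```scala
// Word count on a local Scala collection (no Spark needed). The chain of
// flatMap/filter/map mirrors what a Spark job would do on an RDD.
object LocalWordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))            // split each line into words
      .filter(_.nonEmpty)                  // drop empty tokens (e.g. from leading spaces)
      .map(word => (word.toLowerCase, 1))  // pair every word with a count of 1
      .groupBy(_._1)                       // local stand-in for Spark's reduceByKey
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("Spark on Mesos", "Spark on YARN")))
}
```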


Spark execution

Diagram: Spark cluster overview (source: http://spark.apache.org/docs/latest/cluster-overview.html)

Spark execution

• local (for experimentation)

• standalone (built-in cluster manager)

• YARN (Hadoop cluster manager)

• Mesos (general cluster manager)
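Each execution mode is selected through the master URL passed to spark-submit --master (or to SparkConf.setMaster). The sketch below only maps the four modes to example URLs. The host names are placeholders; 7077 and 5050 are the usual default ports of the standalone master and the Mesos master; the bare "yarn" value is the Spark 2.x spelling (Spark 1.x used yarn-client and yarn-cluster). A Mesos master coordinated by ZooKeeper is addressed as mesos://zk://host:2181/mesos instead.

```scala
// Example master URLs for the four execution modes. Host names are placeholders.
object MasterUrls {
  sealed trait Mode
  case object Local extends Mode
  case object Standalone extends Mode
  case object Yarn extends Mode
  case object Mesos extends Mode

  def masterUrl(mode: Mode): String = mode match {
    case Local      => "local[*]"                  // in-process, one worker thread per core
    case Standalone => "spark://master-host:7077"  // Spark's built-in cluster manager
    case Yarn       => "yarn"                      // cluster found via HADOOP_CONF_DIR / YARN_CONF_DIR
    case Mesos      => "mesos://mesos-master:5050" // a single Mesos master
  }

  def main(args: Array[String]): Unit =
    for (mode <- Seq[Mode](Local, Standalone, Yarn, Mesos))
      println(s"$mode -> ${masterUrl(mode)}")
}
```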


Apache Mesos

Why Apache Mesos?

• General (a “distributed kernel”)

• Efficient resource management

• Proven technology (in production at Apple and Twitter)

• Typesafe & Mesosphere maintain the Spark/Mesos framework


“Program against your datacenter as a single pool of resources”

Frameworks running on Mesos

• HDFS

• Cassandra

• Elasticsearch

• YARN (Myriad)

• Marathon, etc.

• and of course, Spark


Resource scheduling with Mesos

• 2-level scheduling

• Mesos offers resources to frameworks

• Frameworks accept or reject offers

• Offers include

• CPU cores, memory, ports, disk
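The offer/accept cycle can be modeled in a few lines. This is a toy model, not the Mesos API: the case classes and handleOffer are hypothetical names. The framework packs its pending tasks into an offer, tasks that do not fit wait for a later offer, and whatever it leaves unused goes back to the master, which can then offer it to another framework. For example, an offer of 8 CPUs and 32 GB covers two tasks of 2 CPUs and 8 GB each and leaves 4 CPUs and 16 GB for others.

```scala
// Toy model of Mesos two-level scheduling (not the real Mesos API).
// Level 1: the master decides which framework receives an offer.
// Level 2: the framework decides which of its tasks to launch on it.
object TwoLevelScheduling {
  final case class Offer(agent: String, cpus: Double, memGb: Double)
  final case class Task(name: String, cpus: Double, memGb: Double)

  /** Framework-side decision. Returns (launched tasks, still-pending tasks,
    * unused resources that go back to the master). Launching nothing
    * amounts to rejecting the offer. */
  def handleOffer(offer: Offer, pending: List[Task]): (List[Task], List[Task], Offer) = {
    val (launched, skipped, unused) =
      pending.foldLeft((List.empty[Task], List.empty[Task], offer)) {
        case ((acc, rest, free), task) =>
          if (task.cpus <= free.cpus && task.memGb <= free.memGb)
            (task :: acc, rest, free.copy(cpus = free.cpus - task.cpus, memGb = free.memGb - task.memGb))
          else
            (acc, task :: rest, free) // does not fit: keep it for a later offer
      }
    (launched.reverse, skipped.reverse, unused)
  }

  def main(args: Array[String]): Unit = {
    val offer = Offer("S1", cpus = 8, memGb = 32)
    val tasks = List(Task("task1", cpus = 2, memGb = 8), Task("task2", cpus = 2, memGb = 8))
    val (launched, pending, unused) = handleOffer(offer, tasks)
    println(s"launched=${launched.map(_.name)} pending=${pending.map(_.name)} returned=$unused")
  }
}
```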

Diagram: Mesos Cluster. A Mesos master, with the HDFS and Spark framework schedulers as its clients, manages Mesos slaves; one node runs the HDFS Name Node executor and the others run Data Node executors over their local disks. The build-up follows one resource offer: (1) slave S1 reports free resources (S1, 8CPU, 32GB, ...) to the master; (2) the master offers them to the Spark framework scheduler, whose Job 1 needs to run def foo(x: Int); (3) the scheduler answers with two tasks of (S1, 2CPU, 8GB, ...) each; (4) the tasks are launched on S1 inside a Spark Executor.

Spark Cluster Abstraction

Diagram: a Spark driver, i.e. the application's main program (object MyApp { def main() { val sc = new SparkContext(…) … }}), talks to a cluster manager, which runs Spark executors on the cluster's nodes; each executor runs several tasks.
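The driver splits each stage into one task per partition and hands the tasks to executor cores, so a stage runs in waves when there are more partitions than cores. The sketch below (plain Scala, hypothetical names) fills executor slots in order; Spark's real task scheduler also weighs data locality, so treat it as an illustration of the arithmetic only: 10 partitions on 2 executors with 2 cores each need ceil(10 / 4) = 3 waves.

```scala
// Illustration only: one task per partition, one running task per executor core.
// Spark's actual scheduler also considers data locality; this just fills slots in order.
object TaskWaves {
  final case class Executor(id: String, cores: Int)

  /** Returns the waves of (partition index -> executor id) assignments. */
  def schedule(numPartitions: Int, executors: Seq[Executor]): List[Seq[(Int, String)]] = {
    val slots = executors.flatMap(e => Seq.fill(e.cores)(e.id)) // one slot per core
    require(slots.nonEmpty, "need at least one executor core")
    (0 until numPartitions).grouped(slots.size).map(wave => wave.zip(slots)).toList
  }

  def main(args: Array[String]): Unit = {
    val waves = schedule(10, Seq(Executor("exec-1", 2), Executor("exec-2", 2)))
    waves.zipWithIndex.foreach { case (wave, i) => println(s"wave ${i + 1}: $wave") }
  }
}
```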

Mesos Coarse Grained Mode

Diagram: inside the Spark driver, the Spark framework's scheduler registers with the Mesos master. Each node runs one long-lived Mesos executor that hosts a Spark executor, and that Spark executor runs many tasks.

• Fast startup for tasks, which makes it better for interactive sessions.

• But resources stay locked up in a larger, long-running Mesos task. (Dynamic allocation, added in Spark 1.5, changes this.)
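Coarse-grained mode is selected with spark.mesos.coarse=true, and spark.cores.max caps the total number of cores the application holds; without that cap it takes every core Mesos offers. The sketch below is a simplified, hypothetical model of that acquisition in the Spark 1.x style (at most one executor per agent, memory ignored); later Spark versions can start several executors per agent.

```scala
// Simplified model of coarse-grained core acquisition (hypothetical helper, not
// Spark's code): claim cores from each offer until coresMax (standing in for
// spark.cores.max) is reached, with at most one executor per agent.
object CoarseGrainedAcquisition {
  final case class Offer(agent: String, cpus: Int)

  /** Returns agent -> cores held for the application's lifetime.
    * Agents missing from the result had their offers declined. */
  def acquire(offers: Seq[Offer], coresMax: Int): Map[String, Int] =
    offers.foldLeft(Map.empty[String, Int]) { (held, offer) =>
      val remaining = coresMax - held.values.sum
      val take = math.min(offer.cpus, remaining)
      if (take > 0 && !held.contains(offer.agent)) held + (offer.agent -> take)
      else held // decline: cap reached, or this agent already runs an executor
    }

  def main(args: Array[String]): Unit = {
    val offers = Seq(Offer("S1", 8), Offer("S2", 8), Offer("S3", 8))
    println(acquire(offers, coresMax = 12)) // S3's offer is declined
  }
}
```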


Mesos Fine Grained Mode

Diagram: the Spark framework's scheduler in the driver again talks to the Mesos master, but each node's Mesos executor now holds many small Spark Exec boxes, each running a single task.

• Better resource utilization.

• Slower startup for tasks, since each Spark task is launched as a separate Mesos task; this is fine for batch jobs and relatively static streaming.
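The trade-off between the two modes comes down to how long cores stay reserved. As a back-of-the-envelope sketch (hypothetical load and helper names), assume each task needs one core, a coarse-grained application holds a fixed number of cores for its whole run, and fine-grained mode reserves a core only while a task runs; the per-node executor overhead that fine-grained mode also pays (spark.mesos.mesosExecutor.cores) is ignored. For the sample load, which includes two idle steps as in an interactive session, coarse-grained mode reserves 48 core-steps to do 22 core-steps of work, about 46%.

```scala
// Back-of-the-envelope comparison of reserved cores (not Spark code).
// Assumes one core per task; fine-grained executor overhead is ignored.
object ModeUtilization {
  /** Coarse-grained: `heldCores` stay reserved during every time step. */
  def coarseReserved(runningTasks: Seq[Int], heldCores: Int): Int =
    heldCores * runningTasks.size

  /** Fine-grained: a core is reserved only while its task runs. */
  def fineReserved(runningTasks: Seq[Int]): Int = runningTasks.sum

  def main(args: Array[String]): Unit = {
    val runningTasks = Seq(8, 8, 2, 0, 0, 4) // tasks running in each time step
    val coarse = coarseReserved(runningTasks, heldCores = 8)
    val fine = fineReserved(runningTasks)
    val utilization = fine.toDouble / coarse * 100
    println(f"coarse: $coarse core-steps reserved, $utilization%.0f%% of them busy")
    println(s"fine: $fine core-steps reserved")
  }
}
```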


Dynamic allocation

• Mesos support was added in Spark 1.5

• adds and removes executors based on load

• kills executors that stay idle

• adds executors when tasks queue up in the scheduler

• needs the external shuffle service running on each node, so shuffle files remain available after the executor that wrote them is removed
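Dynamic allocation is switched on with spark.dynamicAllocation.enabled=true together with spark.shuffle.service.enabled=true; on Mesos the shuffle service is started on each agent with sbin/start-mesos-shuffle-service.sh. Spark bounds the executor count with spark.dynamicAllocation.minExecutors and maxExecutors, and while tasks stay backlogged it requests executors in exponentially growing rounds (1, 2, 4, ...). The sketch below is a simplified, hypothetical model of that policy, one call per scheduling round; it leaves out the real timeouts (such as spark.dynamicAllocation.executorIdleTimeout) and the cap derived from the number of pending tasks.

```scala
// Simplified dynamic-allocation policy (hypothetical model, not Spark's
// ExecutorAllocationManager). One call to `step` = one scheduling round.
object DynamicAllocationSketch {
  final case class State(executors: Int, nextRequest: Int)

  def step(s: State, backlog: Int, idleExecutors: Int, minExecutors: Int, maxExecutors: Int): State =
    if (backlog > 0) {
      // Tasks are queued: add executors, doubling the request each round.
      val added = math.max(0, math.min(s.nextRequest, maxExecutors - s.executors))
      State(s.executors + added, s.nextRequest * 2)
    } else {
      // No backlog: release idle executors, never dropping below the minimum.
      val removed = math.max(0, math.min(idleExecutors, s.executors - minExecutors))
      State(s.executors - removed, 1)
    }

  def main(args: Array[String]): Unit = {
    // Four busy rounds with a large backlog, then one quiet round with 6 idle executors.
    val rounds = Seq.fill(4)((100, 0)) :+ ((0, 6))
    rounds.foldLeft(State(executors = 0, nextRequest = 1)) { case (state, (backlog, idle)) =>
      val next = step(state, backlog, idle, minExecutors = 2, maxExecutors = 10)
      println(s"backlog=$backlog idle=$idle -> executors=${next.executors}")
      next
    }
  }
}
```

With these numbers the executor count goes 1, 3, 7, 10 (capped by maxExecutors) and then drops to 4 once six executors sit idle.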


Client vs Cluster mode

• Where does the driver process run?

• client-mode: on the machine that submits the job

• cluster-mode: on a machine in the cluster
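On Mesos the two modes differ in where spark-submit points. In client mode it talks to the Mesos master directly and the driver runs on the submitting machine. In cluster mode it talks to a MesosClusterDispatcher (started with sbin/start-mesos-dispatcher.sh, listening on port 7077 by default), which launches the driver inside the cluster; the application jar must then sit at a URI the cluster can fetch, such as http:// or hdfs://, because a local file is not uploaded. The helper below only assembles the command lines; the host names, class name and jar URL are placeholders.

```scala
// Assembles spark-submit arguments for client vs cluster mode on Mesos.
// Host names, the main class and the jar location are placeholders.
object SubmitCommands {
  def sparkSubmit(deployMode: String, mainClass: String, jarUri: String): Seq[String] = {
    val master = deployMode match {
      case "client"  => "mesos://mesos-master:5050"    // driver runs on this machine
      case "cluster" => "mesos://dispatcher-host:7077" // dispatcher starts the driver in the cluster
      case other     => throw new IllegalArgumentException(s"unknown deploy mode: $other")
    }
    Seq("./bin/spark-submit", "--master", master, "--deploy-mode", deployMode,
      "--class", mainClass, jarUri)
  }

  def main(args: Array[String]): Unit = {
    val jar = "http://repo.example.com/my-app.jar" // cluster mode needs a URI the agents can fetch
    for (mode <- Seq("client", "cluster"))
      println(sparkSubmit(mode, "com.example.MyApp", jar).mkString(" "))
  }
}
```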


Demo

What’s next on Mesos

• Oversubscription (0.23)

• Persistent Volumes

• Dynamic Reservations

• Optimistic Offers

• Isolation

• More…


Closing words on Spark Streaming

• Spark 1.5 improves resiliency by adding back-pressure inside Spark Streaming

• slows down receivers dynamically, based on load

• Spark 1.6 will add the ability to connect to Reactive Streams

• propagating back-pressure outside of Spark
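Back-pressure in Spark Streaming is turned on with spark.streaming.backpressure.enabled=true. Spark's built-in rate estimator is a PID controller; the sketch below keeps only its core idea, a proportional rule that sets the next receiver rate to the rate at which the last batch was actually processed, with a floor. The function name and the minRate parameter here are assumptions for illustration, not Spark's API.

```scala
// Simplified proportional rate control (not Spark's PIDRateEstimator).
object BackPressureSketch {
  /** Next ingestion rate in records/second, based on the last batch. */
  def nextRate(recordsInBatch: Long, processingMillis: Long, minRate: Double): Double = {
    require(processingMillis > 0, "processing time must be positive")
    val processingRate = recordsInBatch * 1000.0 / processingMillis // records/s actually handled
    math.max(processingRate, minRate) // never throttle a receiver to zero
  }

  def main(args: Array[String]): Unit = {
    // A 1-second batch held 10,000 records but took 2 seconds to process,
    // so receivers are slowed to what the cluster can keep up with.
    println(nextRate(recordsInBatch = 10000, processingMillis = 2000, minRate = 100))
  }
}
```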


Key points

• Spark is a next-generation compute engine for Big Data

• Mesos is a next-generation cluster manager

• better utilization of cluster resources across the organization

• Spark on Mesos is commercially supported by Typesafe

• Typesafe & Mesosphere are the maintainers of Spark/Mesos


EXPERT SUPPORT Why Contact Typesafe for Your Apache Spark Project?

Ignite your Spark project with 24/7 production SLA, unlimited expert support and on-site training:

• Full application lifecycle support for Spark Core, Spark SQL & Spark Streaming

• Deployment to Standalone, EC2, Mesos clusters

• Expert support from dedicated Spark team

• Optional 10-day “getting started” services package

Typesafe is a partner with Databricks, Mesosphere and IBM.

Learn more about on-site training. Contact us.
