Getting Started Running Apache Spark on Apache Mesos, 2014-01-24 Paco Nathan liber118.com/pxn @pacoid

Upload: paco-nathan

Post on 19-Aug-2015


Page 1: Getting Started Running Apache Spark on Apache Mesos

Getting Started Running Apache Spark on Apache Mesos, 2014-01-24

Paco Nathan liber118.com/pxn @pacoid

Page 2: Getting Started Running Apache Spark on Apache Mesos

Spark on Mesos, 2014-01-24

• what is Apache Mesos?

• launch a Mesos cluster in the cloud

• configure and run Spark on Mesos

• run jobs in Spark

• further resources…

Page 3: Getting Started Running Apache Spark on Apache Mesos

Datacenter Computing

Google has been doing datacenter computing for years, to address the complexities of large-scale data workflows:

• leveraging the modern kernel: isolation in lieu of VMs

• among the top 10 Linux kernel OSS contributors: cgroups

• “most (>80%) jobs are batch jobs, but the majority of resources (55–80%) are allocated to service jobs”

• mixed workloads, multi-tenancy

• relatively high utilization rates

• JVM? not so much…

take-aways: scheduling batch is not so difficult; scheduling services is hard+expensive

Page 4: Getting Started Running Apache Spark on Apache Mesos

Google describes the business case…

Taming Latency Variability
Jeff Dean
plus.google.com/u/0/+ResearchatGoogle/posts/C1dPhQhcDRv

Page 5: Getting Started Running Apache Spark on Apache Mesos

“Return of the Borg”

Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon
Cade Metz
wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Luiz André Barroso, Urs Hölzle
research.google.com/pubs/pub35290.html

2011 GAFS Omega
John Wilkes, et al.
youtu.be/0ZFMlO98Jkc

Page 6: Getting Started Running Apache Spark on Apache Mesos

Google describes the technology…

Omega: flexible, scalable schedulers for large compute clusters
Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes
eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf

Page 8: Getting Started Running Apache Spark on Apache Mesos

Mesos – open source datacenter computing

a common substrate for cluster computing

mesos.apache.org

heterogeneous assets in your datacenter or cloud made available as a homogeneous set of resources

• top-level Apache project

• scalability to 10,000s of nodes

• obviates the need for virtual machines

• isolation (pluggable) for CPU, RAM, I/O, FS, etc.

• fault-tolerant leader election based on Zookeeper

• APIs in C++, Java, Python, Go

• web UI for inspecting cluster state

• available for Linux, OpenSolaris, Mac OS X

Page 10: Getting Started Running Apache Spark on Apache Mesos

Mesos – architecture

[Diagram: Mesos architecture layers. Layer labels: Cluster (distributed resources: CPU, RAM, I/O, FS, rack locality, etc.); DFS (distributed file system); Kernel; Frameworks; Workloads (services, batch); Apps. Components shown: Chronos, Marathon, Spark, Hadoop, MPI, Storm, Kafka, JBoss, Django, Rails, Shark, Impala, Scalding, MySQL]

Page 11: Getting Started Running Apache Spark on Apache Mesos

Mesos – architecture

HDFS, distrib file system

Mesos, distrib kernel

meta-frameworks: Aurora, Marathon

frameworks: Spark, Storm, MPI, Jenkins, etc.

task schedulers: Chronos, etc.

APIs: C++, JVM, Py, Go

apps: HA services, web apps, batch jobs, scripts, etc.

Linux: libcgroup, libprocess, libev, etc.

Page 12: Getting Started Running Apache Spark on Apache Mesos

Mesos – dynamics

Mesos: distrib kernel

Marathon: distrib init.d

Chronos: distrib cron

distrib frameworks

HA services

scheduled apps

Page 13: Getting Started Running Apache Spark on Apache Mesos

Mesos – dynamics

[Diagram labels: resource offers; distributed framework (Scheduler, three Executors); distributed kernel (two Mesos masters, Mesos slaves); available resources]
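
The labels above outline the resource-offer cycle: slaves report available resources to the Mesos master, the master offers them to a framework's scheduler, and the scheduler accepts the offers it can use and launches executors there. A toy sketch of that accept/decline decision in shell follows. It is not the Mesos API; the `slave:cpus:mem_mb` offer format, the slave names, and the `consider_offer` function are all hypothetical.

```shell
# Toy model of Mesos-style resource offers; NOT the Mesos API.
# Offer format "slave:cpus:mem_mb" and all names here are hypothetical.

# Decide whether one offer ($1) can host a task that needs
# $2 CPUs and $3 MB of RAM. Prints "accept <slave>" or "decline <slave>".
# The body runs in a subshell so its variables do not leak.
consider_offer() (
  offer=$1
  need_cpus=$2
  need_mem=$3

  # Split "slave:cpus:mem_mb" using POSIX parameter expansion.
  slave=${offer%%:*}
  rest=${offer#*:}
  cpus=${rest%%:*}
  mem=${rest#*:}

  if [ "$cpus" -ge "$need_cpus" ] && [ "$mem" -ge "$need_mem" ]; then
    echo "accept $slave"
  else
    echo "decline $slave"
  fi
)

# The "master" offers each slave's available resources to the scheduler in turn.
for offer in "slave1:1:2048" "slave2:4:8192" "slave3:2:1024"; do
  consider_offer "$offer" 2 4096
done
```

With these sample offers, only slave2 has both enough CPUs and enough memory for the task, so it is the only offer accepted.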

Page 14: Getting Started Running Apache Spark on Apache Mesos

Production Deployments (public)

Page 15: Getting Started Running Apache Spark on Apache Mesos

Case Study: Twitter (bare metal / on premise)

“Mesos is the cornerstone of our elastic compute infrastructure – it’s how we build all our new services and is critical for Twitter’s continued success at scale. It’s one of the primary keys to our data center efficiency.”

Chris Fry, SVP Engineering
blog.twitter.com/2013/mesos-graduates-from-apache-incubation
wired.com/gadgetlab/2013/11/qa-with-chris-fry/

• key services run in production: analytics, typeahead, ads

• Twitter engineers rely on Mesos to build all new services

• instead of thinking about static machines, engineers think about resources like CPU, memory and disk

• allows services to scale and leverage a shared pool of servers across datacenters efficiently

• reduces the time between prototyping and launching

Page 16: Getting Started Running Apache Spark on Apache Mesos

Spark on Mesos, 2014-01-24

• what is Apache Mesos?

• launch a Mesos cluster in the cloud

• configure and run Spark on Mesos

• run jobs in Spark

• further resources…

Page 17: Getting Started Running Apache Spark on Apache Mesos

http://elastic.mesosphere.io

launch a Mesos cluster in the Amazon AWS cloud in three simple steps, given:

• AWS credentials

• SSH public key

• email address

Page 23: Getting Started Running Apache Spark on Apache Mesos

Spark on Mesos, 2014-01-24

• what is Apache Mesos?

• launch a Mesos cluster in the cloud

• configure and run Spark on Mesos

• run jobs in Spark

• further resources…

Page 24: Getting Started Running Apache Spark on Apache Mesos

http://mesosphere.io/learn/run-spark-on-mesos/

configure and run Spark on a Mesos cluster on AWS, in a seven-step tutorial…

Page 28: Getting Started Running Apache Spark on Apache Mesos

step 1: ssh to master

Page 29: Getting Started Running Apache Spark on Apache Mesos

ssh -l ubuntu <master>

Page 30: Getting Started Running Apache Spark on Apache Mesos

step 2: install git, jdk-7

Page 31: Getting Started Running Apache Spark on Apache Mesos

sudo aptitude -y install git
sudo aptitude -y install openjdk-7-jdk

Page 32: Getting Started Running Apache Spark on Apache Mesos

step 3: download spark

Page 33: Getting Started Running Apache Spark on Apache Mesos

wget http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz
tar xzf spark-0.8.0-incubating-bin-cdh4.tgz
cd spark-0.8.0-incubating-bin-cdh4/

Page 34: Getting Started Running Apache Spark on Apache Mesos

step 4: sbt clean assembly

Page 35: Getting Started Running Apache Spark on Apache Mesos

SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.4.0 sbt/sbt clean assembly

Page 36: Getting Started Running Apache Spark on Apache Mesos

step 5: make distro, cp to HDFS

Page 37: Getting Started Running Apache Spark on Apache Mesos

./make-distribution.sh --hadoop 2.0.0-mr1-cdh4.4.0
mv dist spark-0.8.0-2.0.0-mr1-cdh4.4.0
tar czf spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz spark-0.8.0-2.0.0-mr1-cdh4.4.0

hadoop fs -mkdir /tmp
hadoop fs -put spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz /tmp
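
The distribution name in step 5 is built from the Spark and Hadoop versions, and the same tarball name has to appear again in SPARK_EXECUTOR_URI in step 6. One way to keep the two in sync is to derive both from a pair of variables. This is only a sketch: the variable names are hypothetical, and the values are the ones used on these slides.

```shell
# Hypothetical helper variables; the values are the ones used in the steps above.
SPARK_VERSION=0.8.0
HADOOP_VERSION=2.0.0-mr1-cdh4.4.0

# Name of the distribution directory and tarball built in step 5.
DIST_NAME="spark-${SPARK_VERSION}-${HADOOP_VERSION}"
DIST_TGZ="${DIST_NAME}.tgz"

# HDFS path the tarball is uploaded to in step 5.
HDFS_PATH="/tmp/${DIST_TGZ}"

echo "$DIST_TGZ"
echo "$HDFS_PATH"
```

The printed path is the part of SPARK_EXECUTOR_URI that follows the `hdfs://<nn>` prefix.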

Page 38: Getting Started Running Apache Spark on Apache Mesos

step 6: config env

Page 39: Getting Started Running Apache Spark on Apache Mesos

cd conf/
cp spark-env.sh.template spark-env.sh
vim spark-env.sh

# add these three lines to spark-env.sh:
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs://<nn>/tmp/spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz
export MASTER=zk://<master>:2181/mesos

cat spark-env.sh
cd ..

./spark-shell
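
Step 6 depends on three environment variables being set before the shell starts. A small pre-flight check can catch a missing one early. The `check_spark_env` function below is a hypothetical helper, not part of Spark or Mesos, and it only tests that each variable is non-empty; it does not validate the values.

```shell
# Hypothetical pre-flight check; not part of Spark or Mesos.
# Returns 0 if the three variables from spark-env.sh are non-empty, else 1,
# and names each missing variable on stderr.
check_spark_env() {
  rc=0
  for name in MESOS_NATIVE_LIBRARY SPARK_EXECUTOR_URI MASTER; do
    # Indirect lookup: copy the value of the variable called $name into $value.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

One possible use, from the Spark directory: `. conf/spark-env.sh && check_spark_env && ./spark-shell`.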

Page 40: Getting Started Running Apache Spark on Apache Mesos

et voilà!

Page 41: Getting Started Running Apache Spark on Apache Mesos

Spark on Mesos, 2014-01-24

• what is Apache Mesos?

• launch a Mesos cluster in the cloud

• configure and run Spark on Mesos

• run jobs in Spark

• further resources…

Page 42: Getting Started Running Apache Spark on Apache Mesos

http://spark.incubator.apache.org/examples.html

run an example job in Spark, to filter an RDD of integers, in two steps at the REPL…

Page 43: Getting Started Running Apache Spark on Apache Mesos

step 1: create an RDD

Page 44: Getting Started Running Apache Spark on Apache Mesos

val data = 1 to 10000
val distData = sc.parallelize(data)

distData.filter(_ < 10).collect()

Page 45: Getting Started Running Apache Spark on Apache Mesos

step 2: run the filter

Page 51: Getting Started Running Apache Spark on Apache Mesos

Spark on Mesos, 2014-01-24

• what is Apache Mesos?

• launch a Mesos cluster in the cloud

• configure and run Spark on Mesos

• run jobs in Spark

• further resources…

Page 52: Getting Started Running Apache Spark on Apache Mesos

Join us!

O’Reilly Strata, Santa Clara, Feb 11-13
strataconf.com/strata2014

• Mesos tutorial, Tue 2/11 1:30pm

• BOF lunch, Wed 2/12 12:10pm

• Mesos session, Thu 2/13 2:20pm

• office hours, Thu 2/13 3:15pm

Page 53: Getting Started Running Apache Spark on Apache Mesos

More insights…

Monthly newsletter for events, conf summaries, workshops, etc.: liber118.com/pxn/

collected Mesos notes: goo.gl/jPtTP
