
Flintrock: A faster, better spark-ec2

Nicholas Chammas, Spark Summit East 2016


Motivation

Common developer problem:

- Give me a working cluster
- Don't bother me too much with the details
- Make it quick

spark-ec2

- Single-purpose command-line tool
  - Launch and manage Spark clusters on EC2
- Common use cases:
  - Prototyping
  - Spark performance testing (spark-perf)

spark-ec2

Problems:

- Slow launch times: ~9 minutes to launch a 2-node cluster
- Poor UX, e.g., having to type this out over and over again:

      ./spark-ec2 launch my-cluster \
          --identity-file ~/.ssh/identity.pem \
          --key-pair my-key \
          --instance-type m3.medium \
          --region us-east-1

- Internals difficult to refactor. Much time has already been spent trying to make spark-ec2 faster:
  - SPARK-4325
  - SPARK-5189
- spark-ec2 was created as a convenience side-tool
  - Not originally intended to stand as its own project

Why a new tool?

- Plenty of great options out there:
  - Databricks
  - Spark on EMR
  - Apache Bigtop, Ubuntu Juju, Terraform, Ansible, etc.
- It's fun (for me to build)
- Perhaps you don't want a framework
  - You want a single-purpose tool
- Perhaps you don't want to be tied to something proprietary

Flintrock

Features

- Obsessive focus on speed, e.g., launching a cluster with 100 slaves:
  - spark-ec2 takes more than 1 hour
  - Flintrock can do it in under 5 minutes
- Empathy for the user: persist your configuration to a file. Then, all you need to launch a cluster is:

      flintrock launch test-cluster

- Accessibility:
  - Install via pip (Python 3.5+ required):

        pip install flintrock

  - Standalone packages (Python not required!): https://github.com/nchammas/flintrock/releases
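The config-file feature comes down to a merge rule: saved settings act as defaults, and anything typed on the command line overrides them. Below is a minimal sketch of that rule, assuming the settings have already been parsed into a dictionary (the talk does not show Flintrock's file format or loading code). `SAVED_CONFIG` and `resolve_options` are hypothetical names, and the keys simply mirror the spark-ec2 options from the earlier example.

```python
from typing import Any, Dict, Optional

# Hypothetical settings as they might look after being read from a config file.
# Keys mirror the spark-ec2 options a user would otherwise retype every time.
SAVED_CONFIG: Dict[str, Any] = {
    "identity-file": "~/.ssh/identity.pem",
    "key-pair": "my-key",
    "instance-type": "m3.medium",
    "region": "us-east-1",
}


def resolve_options(saved: Dict[str, Any], cli: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Combine saved defaults with options passed on the command line.

    An option counts as "given" on the command line only if its value is not
    None; given options override the saved value, everything else falls back
    to the saved configuration.
    """
    merged = dict(saved)  # copy so the saved config is never mutated
    for key, value in cli.items():
        if value is not None:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Like launching with no extra flags: every saved value applies.
    print(resolve_options(SAVED_CONFIG, {"instance-type": None}))
    # A one-off override for a single launch.
    print(resolve_options(SAVED_CONFIG, {"instance-type": "r3.xlarge"})["instance-type"])
```

Treating None as "not given" fits Python's argparse, where optional arguments default to None unless a default is set, so parsed flags can be passed straight in as the `cli` dictionary.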


Flintrock

Commands: launch, login, describe, stop, start, destroy, run-command, copy-file

Examples:

- run-command:

      flintrock run-command cluster 'sudo yum install -y expect'

- copy-file:

      flintrock copy-file cluster small-file.json /tmp/
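The run-command example applies a single shell command to a cluster. Below is a minimal sketch of that fan-out pattern, assuming the command should run on every node and that nodes can be handled independently. It is not Flintrock's implementation: `run_on_node` is a hypothetical stub standing in for an SSH call, and the host names are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_on_node(host: str, command: str) -> str:
    """Hypothetical stand-in for executing `command` on `host` over SSH.

    A real tool would open an SSH session here; this stub just reports
    what it would have run so the sketch stays self-contained.
    """
    return f"{host}: ran {command!r}"


def run_command_on_cluster(hosts: List[str], command: str, max_workers: int = 10) -> Dict[str, str]:
    """Run the same command on every host concurrently.

    Returns a mapping from host to that host's result. As long as max_workers
    is at least the number of hosts, total time is roughly that of the slowest
    node rather than the sum over all nodes.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, so they line up with hosts.
        results = executor.map(lambda h: run_on_node(h, command), hosts)
        return dict(zip(hosts, results))


if __name__ == "__main__":
    # Hypothetical cluster: one master and two slaves.
    cluster = ["master", "slave-1", "slave-2"]
    for output in run_command_on_cluster(cluster, "sudo yum install -y expect").values():
        print(output)
```

Threads suit this case because the work is dominated by waiting on remote hosts rather than by local computation.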

Flintrock

- Warning: Flintrock is not a drop-in replacement for spark-ec2
  - Scope is more limited
  - Commands and options are clearly different
- Beta phase of development: currently at version 0.3
  - Already used to launch 200+ node clusters
- Support for OS X and Linux
  - Windows support possible in the future
- Architecture supports multiple providers
  - Perhaps support for Google Compute Engine will be added later this year
- 100% open source; Apache 2.0 licensed
  - Not company-backed
- Contribute!
  - We have unit and acceptance tests
  - Development done entirely on GitHub
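The slide says the architecture supports multiple providers but does not show how. One common way to get that property is a small provider interface with one implementation per cloud, so adding a backend such as Google Compute Engine means writing one new class. The sketch below is hypothetical: `Provider`, `FakeEC2Provider`, `PROVIDERS`, and `get_provider` are illustrative names, not Flintrock's API, and the EC2 backend is an in-memory stub rather than real AWS calls.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Provider(ABC):
    """Hypothetical interface that every cloud backend would implement."""

    @abstractmethod
    def launch(self, cluster_name: str, num_slaves: int) -> List[str]:
        """Start the instances for a cluster and return their host names."""

    @abstractmethod
    def destroy(self, cluster_name: str) -> None:
        """Terminate all instances belonging to a cluster."""


class FakeEC2Provider(Provider):
    """Stub backend that records clusters in memory instead of calling AWS."""

    def __init__(self) -> None:
        self.clusters: Dict[str, List[str]] = {}

    def launch(self, cluster_name: str, num_slaves: int) -> List[str]:
        hosts = [f"{cluster_name}-master"]
        hosts += [f"{cluster_name}-slave-{i}" for i in range(1, num_slaves + 1)]
        self.clusters[cluster_name] = hosts
        return hosts

    def destroy(self, cluster_name: str) -> None:
        self.clusters.pop(cluster_name, None)


# Registry mapping a provider name (e.g. from a config file) to its backend class.
# Supporting another cloud would mean adding one Provider subclass and one entry here.
PROVIDERS: Dict[str, Type[Provider]] = {"ec2": FakeEC2Provider}


def get_provider(name: str) -> Provider:
    """Instantiate the backend registered under `name`."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unsupported provider: {name}") from None


if __name__ == "__main__":
    provider = get_provider("ec2")
    print(provider.launch("test-cluster", 2))
    provider.destroy("test-cluster")
```

Commands like launch and destroy only talk to the `Provider` interface, which is what keeps them unchanged when a new backend is registered.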


Demo


Nicholas Chammas

https://github.com/nchammas/flintrock

Slideshow created using remark / gistdeck.
