BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA
William Benton
Red Hat, Inc.
@willb

COMMONS GATHERING
Seattle | November 7
#OCGathering2016

BACKGROUND


WHAT OUR CLUSTER LOOKED LIKE IN 2014

[Diagram: Mesos, six Spark executors holding numbered data partitions 1 through 4, and a networked POSIX FS]

Analytics is no longer a separate workload. Analytics is an essential component of modern data-driven applications.


OUR GOALS

git


FORECAST

Spark and microservices

Architectures for analytics and applications

Scheduling and storage

Future work (and how to get involved)

SPARK AND MICROSERVICES

Apache Spark is a fast and general framework for distributed data processing.

Resilient Distributed Datasets are partitioned, lazy, and immutable homogeneous collections.


RESILIENT DISTRIBUTED DATASETS

[Diagram: the collection 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16, shown whole and then split into partitions]

[Diagram, built up over three slides: the input 1 2 3 passes through FILTER (λ x: x % 2 != 0), MAP (λ x: x * 3), and FLATMAP (λ x: [x, x+1]). COLLECT returns 3 4 9 10. The same result can instead go to SAVE AS TEXT FILE, and an intermediate result can be marked with CACHE.]
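The FILTER, MAP, and FLATMAP pipeline from the slide can be approximated in plain Python. This is only a sketch of the lazy-evaluation idea using built-in iterators; it is not the Spark API, and the names `is_odd`, `pipeline`, and `evaluated` are made up for illustration. The `evaluated` list records which inputs have been processed, to show that nothing runs until the result is collected.

```python
from itertools import chain

evaluated = []  # hypothetical trace: inputs the filter has actually looked at

def is_odd(x):
    evaluated.append(x)
    return x % 2 != 0

def pipeline(source):
    """Describe filter -> map -> flatMap without running any of it yet."""
    odds = filter(is_odd, source)                  # FILTER: λ x: x % 2 != 0
    tripled = map(lambda x: x * 3, odds)           # MAP: λ x: x * 3
    pairs = map(lambda x: [x, x + 1], tripled)     # FLATMAP, step 1: one list per element
    return chain.from_iterable(pairs)              # FLATMAP, step 2: flatten the lists

plan = pipeline([1, 2, 3])
touched_before_collect = list(evaluated)  # snapshot taken before any evaluation
result = list(plan)                       # the "collect" step forces evaluation
print(touched_before_collect, result)     # prints: [] [3, 4, 9, 10]
```

Python's iterators are consumed once, so unlike an RDD this plan cannot be re-evaluated or recovered after a failure; the sketch only illustrates deferring work until an action asks for results.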


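[Diagram: a driver and a cluster manager, with executor 1 through executor n. Executor 1 holds 1 2 3 and executor n holds 10 11 12; each applies λ x: x * 2, producing 2 4 6 and 20 22 24, and each has a cache.]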


[Diagram, two builds: Spark core in the middle; Graph, SQL, ML, and Streaming above it; ad hoc, Mesos, and YARN below it. The second build adds k8s to the bottom row.]

A microservice architecture employs lightweight, modular, and typically stateless components with well-defined interfaces and contracts.


BENEFITS OF MICROSERVICE ARCHITECTURES


2 + 2 5


MICROSERVICES AND SPARK

[Diagram: a master and four executors holding 1 2 3, 4 5 6, 7 8 9, and 10 11 12. Each executor applies λ x: x * 2 to its own data, and together they produce 2 4 6 8 10 12 14 16 18 20 22 24.]
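The diagram's idea, identical stateless workers each applying the same function to their own partition, can be sketched with a thread pool. This is an illustration under assumptions, not how Spark schedules tasks: the four threads stand in for executors, and `run_task` and `partitions` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# One partition per "executor", matching the numbers on the slide.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

def run_task(partition):
    """Apply λ x: x * 2 to a single partition; no state is shared between tasks."""
    return [x * 2 for x in partition]

with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    # pool.map returns results in the same order as the input partitions.
    doubled_partitions = list(pool.map(run_task, partitions))

doubled = [x for part in doubled_partitions for x in part]
print(doubled)  # prints: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
```

Because each task depends only on its input partition, any worker can be replaced or rescheduled without coordination.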

ARCHITECTURES FOR ANALYTICS AND APPLICATIONS


APPLICATION RESPONSIBILITIES

[Diagram, two builds. Labels in both: archive, train models, transform (three times), aggregate, events, databases. The first build also shows file and object storage; the second adds management, web and mobile, reporting, and developer UI.]


LEGACY ARCHITECTURES


CONVENTIONAL DATA WAREHOUSE

[Diagram, two builds. First build: events, transform, and transaction processing (UI, business logic, RDBMS). The second build adds analytic processing with its own RDBMS, plus analysis, interactive query, and reporting.]


HADOOP-STYLE “DATA LAKE”

[Diagram, two builds. First build: events and five HDFS nodes. The second build shows compute alongside each HDFS node.]

MODERN ARCHITECTURES


THE LAMBDA ARCHITECTURE

[Diagram: events feed a batch layer (transform, precise analysis) and a speed layer (transform, imprecise analysis); a serving layer federates the two for the UI. A DFS also appears in the diagram.]
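A toy version of the batch, speed, and serving layers can make the division of labor concrete. This is a minimal sketch under assumptions: events are dicts with a hypothetical `page` key, the analysis is a simple count, and both layers count exactly, whereas a real speed layer would usually trade precision for latency. The names `batch_view`, `SpeedView`, and `serve` are invented for the example.

```python
from collections import Counter

def batch_view(archived_events):
    """Batch layer: recompute the view from the complete archive."""
    return Counter(event["page"] for event in archived_events)

class SpeedView:
    """Speed layer: update incrementally as events arrive after the last batch run."""
    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["page"]] += 1

def serve(batch, speed, page):
    """Serving layer: federate the batch view and the speed view for one query."""
    return batch[page] + speed.counts[page]

archive = [{"page": "a"}, {"page": "b"}, {"page": "a"}]
batch = batch_view(archive)

speed = SpeedView()
speed.update({"page": "a"})  # arrived after the batch view was computed

print(serve(batch, speed, "a"), serve(batch, speed, "b"), serve(batch, speed, "c"))
# prints: 3 1 0
```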


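THE KAPPA ARCHITECTURE

[Diagram: events enter a queue for the “raw data” topic; transform reads it and writes to a queue for the “preprocessed data” topic; analysis reads that and writes to a queue for the “analysis results” topic, which feeds reporting and the end-user UI.]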
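The chain of topics can be sketched with in-memory queues. This is a simplified illustration, not a real message broker: `topics`, `publish`, and `run_stage` are hypothetical names, the transform and analysis functions are placeholders, and the deques are drained as they are read, whereas a real log-based queue would retain messages so that stages can replay them.

```python
from collections import defaultdict, deque

topics = defaultdict(deque)  # topic name -> in-memory stand-in for a queue

def publish(topic, message):
    topics[topic].append(message)

def run_stage(source, sink, fn):
    """Read every pending message from `source`, apply `fn`, publish to `sink`."""
    while topics[source]:
        publish(sink, fn(topics[source].popleft()))

# Events arrive on the "raw data" topic.
for raw in ["  Alpha ", "beta", " GAMMA"]:
    publish("raw data", raw)

# transform: raw data -> preprocessed data (placeholder: normalize the text)
run_stage("raw data", "preprocessed data", lambda s: s.strip().lower())
# analysis: preprocessed data -> analysis results (placeholder: message length)
run_stage("preprocessed data", "analysis results", len)

results = list(topics["analysis results"])
print(results)  # prints: [5, 4, 5]
```

Every stage has the same shape (consume a topic, publish to a topic), which is what lets reporting and UI consumers attach to the final topic without knowing about earlier stages.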


DATA FEDERATION IN THE COMPUTE LAYER

[Diagram labels: aggregate, train models, archive, transform (three times), events, databases, management, web and mobile, reporting, developer UI]


PRACTICALITIES AND POTENTIAL PITFALLS


SIDEBAR: THE MONOLITHIC SPARK ANTIPATTERN

[Diagram: a cluster scheduler running a resource manager and six Spark executors over a shared FS; app 1 through app 4 all share those executors.]


ONE CLUSTER PER APPLICATION

[Diagram labels: OpenShift; app 1 through app 6, shown in two sets; object stores; databases]


[Diagram, three further builds of the same OpenShift layout with app 1 through app 6: one highlighting app 1; one adding a POSIX FS and six HDFS nodes; one adding an object store.]

✓ interoperability
✓ fine-grained access control
✓ many implementations

✗ consistency model
✗ performance

“For the workloads from Facebook and Bing, we see that 96% and 89% of the active jobs respectively can have their data entirely fit in memory, given an allowance of 32GB memory per server for caching”

—“PACMan: Coordinated Memory Caching for Parallel Jobs.” G. Ananthanarayanan et al., in Proceedings of NSDI ’12.

“Recent studies have shown that reading data from local disks is only about 8% faster than reading it from remote disks over the network … [and] this 8% number is decreasing.”

—Tom Phelan, “The Elephant in the Big Data Room: Data Locality is Irrelevant for Hadoop” (goo.gl/MnCKuM)

“Three out of ten hours of job runtime were spent moving files from the staging directory to the final directory in HDFS…We were essentially compressing, serializing, and replicating three copies for a single read.”

—“Apache Spark @Scale: a 60+ TB production use case,” Facebook Engineering blog post


[Diagram: a driver with executor 1 through executor n, each holding a cache]


COLOCATED COMPUTE AND STORAGE: YAGNI

Disk locality is just another kind of caching, but memory is much faster than disk and working set sizes typically fit in cluster memory after ETL.

The I/O-heavy behavior of frameworks designed for colocated compute and storage performs worse than iterative processing in memory.

Colocating compute and storage prevents independent scale-out of compute and turns “cattle” into “pets.”


…BUT IF YOU DO

[Diagram, two builds: OpenShift running app 1 through app 3 (shown in two sets) over a single Storage box; the second build shows three Storage boxes.]

PLAYING ALONG AT HOME


TRY IT OUT YOURSELF

Enabling Spark on OpenShift: https://github.com/radanalyticsio

Video demo: https://vimeo.com/189710503

Meet the teams at lunch!

@willb • [email protected] https://chapeau.freevariable.com

THANKS!
