Machine Learning for Big Data Analytics: Scaling In with Containers while Scaling Out on Clusters

Upload: ian-lumb

Post on 14-Jan-2017

TRANSCRIPT

Page 1: Machine Learning for Big Data Analytics:  Scaling In with Containers while Scaling Out on Clusters

www.univa.com

Presenter: Ian Lumb

Machine Learning for Big Data Analytics: Scaling In with Containers while Scaling Out on Clusters

Watch On Demand Anytime

Note: Includes demos

Page 2

Agenda

Introduction

Use case example

Scaling …

Out with Apache Spark via Univa Universal Resource Broker

Up with NVIDIA GPUs and Univa Grid Engine

In/Down with Univa container solutions

Summary

Page 3

Introduction

Page 4

Machine Learning Defined

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”

T. M. Mitchell, Machine Learning, WCB/McGraw-Hill, 1997

Page 5

Deep Learning Defined

“… a modern refinement of ‘machine learning’, in which computers teach themselves tasks by crunching large sets of data.”

http://www.economist.com/news/briefing/21650526-artificialintelligence-scares-peopleexcessively-so-rise-machines

Page 6

Use Case Example: Earthquakes and Tsunamis

Page 7

Use Case: Context

http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf

Page 8

Use Case: Motivation

Non-deterministic cause

Uncertainty inherent in any attempt to predict earthquakes

o In situ measurements may reduce uncertainty

Lead times

Availability of actionable observations

Communication of situation - advisories, warnings, etc.

Cause-effect relationship

Energy transfer - inputs ... coupling ... outputs

o ‘Geometry’ - bathymetry and topography

Other factors - e.g., tides

Established effect

Far-field estimates of tsunami propagation (pre-computed) and coastal inundation (real-time) have proven to be extremely accurate ...

o Requires a distributed array of deep-ocean tsunami detection buoys plus a forecasting model

http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf

Page 9

http://www.gitews.org/en/concept/

Use Case: Traditional Data Sources

http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf

Page 10

Use Case: Deep Learning from Twitter?

http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf

Page 11

Use Case: Machine Learning Pipeline

Karau et al., Learning Spark, O’Reilly, 2015

Page 12

Use Case: Deep Learning from Twitter?

Represent data

Twitter data manually curated into ‘ham’ and ‘spam’

In-memory representation via Spark RDDs

Extract features

Frequency-based usage via Spark MLlib HashingTF ⇒ feature vectors

Develop model object

Spark MLlib LogisticRegressionWithSGD used for classification

Evaluate model

http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
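The four pipeline stages on this slide can be sketched without a Spark installation. What follows is a minimal pure-Python stand-in, not the presenter's code: it only mimics, conceptually, what HashingTF (hashing each token into a fixed-size count vector) and LogisticRegressionWithSGD (logistic regression fitted by stochastic gradient descent) do. The sample tweets, the feature dimension, the learning rate, and every function name here are assumptions made for illustration.

```python
import math
import zlib

NUM_FEATURES = 64  # assumed size of the hashed feature space


def hashing_tf(text, num_features=NUM_FEATURES):
    """Map a text to a term-frequency vector using the hashing trick."""
    vec = [0.0] * num_features
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % num_features] += 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_sgd(samples, epochs=300, lr=0.1):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    weights = [0.0] * NUM_FEATURES
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            pred = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            err = pred - label  # gradient of the log-loss with respect to the logit
            bias -= lr * err
            weights = [w - lr * err * x for w, x in zip(weights, features)]
    return weights, bias


def predict(model, text):
    """Return the estimated probability that a text is 'ham'."""
    weights, bias = model
    features = hashing_tf(text)
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# Represent data: hypothetical, hand-labelled tweets (1 = 'ham', 0 = 'spam')
training_tweets = [
    ("strong earthquake felt near the coast", 1),
    ("tsunami warning issued after earthquake", 1),
    ("water receding fast at the beach tsunami", 1),
    ("win a free phone click here", 0),
    ("cheap followers click this link now", 0),
    ("free prize click to claim now", 0),
]

# Extract features, then develop the model object
samples = [(hashing_tf(text), label) for text, label in training_tweets]
model = train_logistic_sgd(samples)

# Evaluate model: training-set accuracy only; a real evaluation needs held-out data
accuracy = sum(
    (predict(model, text) >= 0.5) == bool(label) for text, label in training_tweets
) / len(training_tweets)
```

In Spark the same stages would operate on RDDs so that the work is distributed across the cluster; the sketch keeps everything in local lists purely to show the data flow.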

Page 13

Use Case: Laptop Prototype

http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf

Page 14

Use Case: Next Steps …

http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf

Page 15

Next Steps: Scaling …

[Figure: scaling directions labeled OUT, IN, DOWN, and UP]

Page 16

Apache Spark via Univa Universal Resource Broker

Page 17

Machine Learning via Apache Spark

http://img.deusm.com/informationweek/2015/03/1319660/Spark-2015-Vision.jpg

Page 18

URB: Product Overview

What is Universal Resource Broker (URB)?

URB extends Univa Grid Engine to handle Service and Custom distributed applications in a Univa Grid Engine cluster.

An API for developing distributed applications

o Compatible with the Apache Mesos API

o Bindings for Python, Java, and C++

A runtime environment for hosting distributed applications

o Supports frameworks developed against the Mesos API

o Supports frameworks developed against the URB API

o Uses Univa Grid Engine to place and run work

Page 19

URB: Architecture Overview

Spark Framework Running Thunder

Page 20

URB: Web User Interface

Copyright © Univa Corporation, 2015. All Rights Reserved

Page 21

HPC & Spark Workloads Together

Page 22

URB: Solution Summary

Universal Resource Broker

For the end user there is no change in application workflow

For the admins there is increased control and policy capability over compute resources

The solution provides the ability to share resources across big data and traditional batch workloads

Single resource allocation policy defined by business goals

Single accounting repository to track resource consumption

Full workload lifecycle management for heterogeneous workloads

Page 23

GPUs

Page 24

GPUs for Deep Learning

http://image.slidesharecdn.com/nvidiateslap100-160621104058/95/announcing-the-nvidia-tesla-p100-gpu-for-pcie-servers-9-638.jpg?cb=1466505803

Page 25

CUDA LOAD SENSOR

Post installation check:

qhost -F <hostname>

hl:cuda.verstr=270.41.06
hl:cuda.0.name=GeForce 8400 GS
hl:cuda.0.totalMem=511.312M
hl:cuda.0.freeMem=500.480M
hl:cuda.0.usedMem=10.832M
hl:cuda.0.eccEnabled=0
hl:cuda.0.temperature=44.000000
hl:cuda.1.name=GeForce 8400 GS
hl:cuda.1.totalMem=511.312M
hl:cuda.1.freeMem=406.066M
hl:cuda.1.usedMem=20.274M
hl:cuda.1.eccEnabled=0
hl:cuda.1.temperature=43.000000
hl:cuda.devices=2
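The load values reported by the post-installation check are plain `key=value` lines, so they are easy to post-process. The sketch below is a hypothetical helper, not part of Univa Grid Engine: it assumes the `hl:cuda.` prefix and the `M` memory suffix exactly as they appear on this slide, and the function names are invented for illustration.

```python
def parse_cuda_load_values(qhost_output):
    """Split 'hl:cuda.*=value' lines into host-wide values and per-device dicts."""
    global_values = {}
    devices = {}
    for raw_line in qhost_output.splitlines():
        line = raw_line.strip()
        if not line.startswith("hl:cuda."):
            continue  # ignore any other load values
        key, _, value = line[len("hl:cuda."):].partition("=")
        index, _, attribute = key.partition(".")
        if index.isdigit() and attribute:
            # e.g. '0.freeMem' -> devices[0]['freeMem']
            devices.setdefault(int(index), {})[attribute] = value
        else:
            # e.g. 'verstr' or 'devices'
            global_values[key] = value
    return global_values, devices


def free_memory_mb(device):
    """Convert a value such as '500.480M' to a number of megabytes (assumed unit)."""
    return float(device["freeMem"].rstrip("M"))


# Excerpt of the slide's output, used as sample input
sample = """\
hl:cuda.verstr=270.41.06
hl:cuda.0.name=GeForce 8400 GS
hl:cuda.0.freeMem=500.480M
hl:cuda.1.name=GeForce 8400 GS
hl:cuda.1.freeMem=406.066M
hl:cuda.devices=2
"""

host_values, gpu_devices = parse_cuda_load_values(sample)
# Pick the device index with the most free memory
roomiest = max(gpu_devices, key=lambda i: free_memory_mb(gpu_devices[i]))
```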

Copyright © 2016 Univa Corporation, All Rights Reserved.

Page 26

GPU JOB SUBMISSION

• CUDA complexes can be used for:

  • Setting alarm state of a host based on ECC errors (load_threshold in queue config)

  • Sorting hosts (load_formula)

  • Job submission

    • Requesting a host with GPUs

    • qsub -l cuda.devices=2 ...

• Complex can be made consumable (complex configuration) in order to limit the number of CUDA jobs per host
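A resource request such as the one on this slide can also be assembled programmatically before it is handed to the scheduler. The following is a small sketch under stated assumptions: `build_qsub_command` and the script name `train_model.sh` are hypothetical, and only the `-l name=value` request form shown on the slide is used.

```python
import shlex


def build_qsub_command(script, resources=None, extra_args=()):
    """Return a qsub argument list with a single '-l' request for all resources."""
    command = ["qsub"]
    if resources:
        # A value of None produces a bare resource name with no '=value' part
        request = ",".join(
            name if value is None else f"{name}={value}"
            for name, value in resources.items()
        )
        command += ["-l", request]
    command += list(extra_args)
    command.append(script)
    return command


# Request a host that reports two CUDA devices, as on the slide
gpu_job = build_qsub_command("train_model.sh", {"cuda.devices": 2})

# Shell-safe string form, e.g. for logging; pass the list itself to subprocess.run
printable = shlex.join(gpu_job)
```

Building the command as a list rather than a string avoids quoting problems when the script path or a resource value contains spaces.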

Page 27

RESOURCE MAPS

[Figure: UGE cluster with hosts A, B, ... N, each showing resource IDs 1 and 0, plus global resources A-E; Job 123 and Job 124 are mapped onto them]

Two host resources: 0, 1 (e.g., GPUs with IDs 0 & 1)

Five global resources: A, B, C, D, E (e.g., scratch storage A-E)

Job 123 was assigned ID 0 of the GPU resource on host N and resource C of the global scratch resource

Job 124 was assigned ID 1 of the GPU resource on host B and resource E of the global scratch resource
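The bookkeeping behind a resource map can be illustrated with a toy model. This is a sketch of the idea only, not how Univa Grid Engine implements it: the `ResourceMap` class, its methods, and the `preferred` argument are all assumptions introduced here so that the slide's assignments can be reproduced.

```python
class ResourceMap:
    """A named pool of distinct resource IDs that are handed out one at a time."""

    def __init__(self, name, ids):
        self.name = name
        self.free = list(ids)
        self.assigned = {}  # resource ID -> job ID

    def acquire(self, job_id, preferred=None):
        """Give job_id one free ID, using the preferred one if it is available."""
        if not self.free:
            raise RuntimeError(f"no free {self.name} IDs left")
        chosen = preferred if preferred in self.free else self.free[0]
        self.free.remove(chosen)
        self.assigned[chosen] = job_id
        return chosen

    def release(self, job_id):
        """Return every ID held by job_id to the free list."""
        for res_id in [r for r, j in self.assigned.items() if j == job_id]:
            del self.assigned[res_id]
            self.free.append(res_id)


# Host-level maps: every host owns its own GPU IDs 0 and 1
gpus = {host: ResourceMap(f"gpu@{host}", [0, 1]) for host in ("A", "B", "N")}
# One cluster-wide (global) map for the scratch storage areas A-E
scratch = ResourceMap("scratch", ["A", "B", "C", "D", "E"])

# Reproduce the slide's assignments by stating the IDs explicitly
job123 = (gpus["N"].acquire(123, preferred=0), scratch.acquire(123, preferred="C"))
job124 = (gpus["B"].acquire(124, preferred=1), scratch.acquire(124, preferred="E"))
```

The point of the model is that a job receives a specific ID rather than just a count, so two jobs on the same host never end up on the same GPU or in the same scratch area.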

Page 28

Containers

Page 29

Containerized PySpark Example

Page 30

Univa Grid Engine – Container Edition (1)

Launch Docker Container on best machine in cluster

Reduces time wasted (it can be minutes … or longer)

o Attempting to launch on an improperly serviced execution host.

o Waiting for the Docker image to download from the Docker registry.

Ensures the container runs faster, increasing throughput in the cluster.

Run Docker Containers in a Univa Grid Engine Cluster

Business-critical containers are prioritized over other containers.

Increases efficiency of the overall organization.

Containers can be orchestrated alongside other critical workloads such as batch jobs and frameworks.

$ qsub -o /home/jdoe -j y -xdv "/home:/home" -l docker,docker_images="*centos:latest*" my_job.sh
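The wildcard in the `docker_images` request suggests how a scheduler can avoid the image-download wait: prefer hosts that already hold a matching image. The sketch below illustrates that idea only. The host inventory, the field names, and the use of shell-style pattern matching are assumptions, not a description of how Univa Grid Engine evaluates the request.

```python
from fnmatch import fnmatchcase


def hosts_with_image(hosts, image_pattern):
    """Return names of Docker-enabled hosts that already hold a matching image."""
    return [
        name
        for name, info in hosts.items()
        if info["docker"]
        and any(fnmatchcase(image, image_pattern) for image in info["docker_images"])
    ]


# Hypothetical inventory of execution hosts and their locally cached images
cluster = {
    "host-a": {"docker": True, "docker_images": ["centos:latest", "ubuntu:16.04"]},
    "host-b": {"docker": True, "docker_images": ["ubuntu:16.04"]},
    "host-c": {"docker": False, "docker_images": []},
}

# Same pattern as in the qsub request on this slide
candidates = hosts_with_image(cluster, "*centos:latest*")
```

With this inventory only `host-a` qualifies, so a container started there would not have to wait for the image to be pulled from the registry.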

Page 31

Univa Grid Engine – Container Edition (2)

Job Control and Limits for Docker Containers

Provides user and administrator control over containers running on Grid Engine hosts.

Accounting for Docker Containers

Keeps track of containers. Share policies require accounting.

Data File Management for Docker Containers

Transparent access to input, output and error files. Simplifies the management of input and output files for Docker Containers and ensures any output or error files are moved to a location where the user can access them.

Interactive Docker Containers

Good for debugging when containers don’t work correctly!

Parallel jobs in Docker Containers

Message-passing parallel jobs can each run a set of tasks in a container on a machine.

Page 32

Containerized GPUs

https://github.com/NVIDIA/nvidia-docker

Page 33

Univa Confidential

Navops by Univa

Easy installation, preconfigured solution including pre-integration with cloud services.

Build a container cluster on premise or in the cloud.

The fastest way to build a container cluster!!

Respond Quickly: Easy to resize, adapt, dynamic provisioning

Orchestrate and Optimize: Best use of resources and keep track of containers

The most advanced container orchestration!!

http://navops.io/

Page 34

Navops orchestration solution

Page 35

Summary

Scaling Machine Learning from prototype to production …

Out with Apache Spark via Univa Universal Resource Broker

Up with NVIDIA GPUs via Univa Grid Engine

In/Down via Univa Container solutions

o Univa Grid Engine – Container Edition

o Navops Launch and Command

Page 36

THANK YOU

Ian Lumb

Solutions Architect

+1 630 303-9068