Machine Learning for Big Data Analytics: Scaling In with Containers while Scaling Out on Clusters

www.univa.com

Presenter: Ian Lumb


Watch on demand anytime (note: includes demos)

http://www.univa.com/resources/webinar-machine-learning.php


Agenda

Introduction

Use case example

Scaling …

Out with Apache Spark via Univa Universal Resource Broker

Up with NVIDIA GPUs and Univa Grid Engine

In/Down with Univa container solutions

Summary


Introduction

Machine Learning Defined


“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E”.

T. M. Mitchell, Machine Learning, WCB/McGraw-Hill, 1997

Deep Learning Defined


“… a modern refinement of ‘machine learning’, in which computers teach themselves tasks by crunching large sets of data”.

http://www.economist.com/news/briefing/21650526-artificial-intelligence-scares-people-excessively-so-rise-machines


Use Case Example: Earthquakes and Tsunamis

Use Case: Context


http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf

Use Case: Motivation

Non-deterministic cause

Uncertainty inherent in any attempt to predict earthquakes

o In situ measurements may reduce uncertainty

Lead times

Availability of actionable observations

Communication of situation - advisories, warnings, etc.

Cause-effect relationship

Energy transfer - inputs ... coupling ... outputs

o ‘Geometry’ - bathymetry and topography

Other factors - e.g., tides

Established effect

Far-field estimates of tsunami propagation (pre-computed) and coastal inundation (real-time) have proven to be extremely accurate ...

Requires a distributed array of deep-ocean tsunami detection buoys plus a forecasting model
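Lead times and pre-computed far-field propagation estimates both depend on how fast a tsunami crosses the open ocean. The following back-of-envelope sketch assumes the standard shallow-water (long-wave) approximation c = √(g·h), which applies because a tsunami's wavelength is much larger than the ocean depth. It also assumes a path of constant depth. Operational forecast models use real bathymetry and are far more detailed. The 600 km distance and 4000 m depth are hypothetical inputs.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def shallow_water_speed(depth_m: float) -> float:
    """Long-wave phase speed c = sqrt(g * h), in m/s."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return math.sqrt(G * depth_m)


def travel_time_minutes(distance_km: float, depth_m: float) -> float:
    """Travel time over a path of uniform depth (a deliberate simplification)."""
    return distance_km * 1000.0 / shallow_water_speed(depth_m) / 60.0


if __name__ == "__main__":
    # Hypothetical scenario: source 600 km offshore, mean depth 4000 m.
    c = shallow_water_speed(4000.0)
    print(f"speed ~ {c:.0f} m/s ({c * 3.6:.0f} km/h)")
    print(f"lead time ~ {travel_time_minutes(600.0, 4000.0):.0f} min")
```

At roughly 200 m/s (about 700 km/h), a source 600 km away arrives in about 50 minutes. That window must cover detection, forecasting, and the communication of advisories and warnings.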

http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf


http://www.gitews.org/en/concept/

Use Case: Traditional Data Sources


http://www.gitews.org/en/concept/


Use Case: Deep Learning from Twitter?



Karau et al., Learning Spark, O’Reilly, 2015

Use Case: Machine Learning Pipeline

Use Case: Deep Learning from Twitter?

Represent data

Twitter data manually curated into ‘ham’ and ‘spam’

In-memory representation via Spark RDDs

Extract features

Frequency-based usage via Spark MLlib HashingTF ⇒ feature vectors

Develop model object

Spark MLlib LogisticRegressionWithSGD used for classification

Evaluate model
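The four pipeline steps above can be sketched without a cluster. This is a minimal, stdlib-only Python stand-in, not Spark code. A CRC32-based hashing trick takes the role of MLlib's HashingTF, and plain per-example SGD on the logistic loss (no regularization, no mini-batching) stands in for LogisticRegressionWithSGD. The tweets, labels, hash-space size, step size, and epoch count are all hypothetical.

```python
import math
import zlib

NUM_FEATURES = 1 << 20  # hash-space size (an assumption for this sketch)


def tokenize(text):
    return text.lower().split()


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Hashing trick: map each token to an index and count occurrences (sparse dict)."""
    vec = {}
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def margin(w, b, x):
    return b + sum(w.get(i, 0.0) * v for i, v in x.items())


def train_logistic_sgd(examples, epochs=50, step=0.5):
    """Plain SGD on the log loss; examples are (label, sparse_vector) pairs."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for label, x in examples:
            err = sigmoid(margin(w, b, x)) - label  # d(log-loss)/d(margin)
            for i, v in x.items():
                w[i] = w.get(i, 0.0) - step * err * v
            b -= step * err
    return w, b


def predict(model, text):
    w, b = model
    return 1 if margin(w, b, hashing_tf(tokenize(text))) >= 0 else 0


def accuracy(model, labelled):
    hits = sum(predict(model, text) == label for label, text in labelled)
    return hits / len(labelled)


# Hypothetical hand-curated tweets: 1 = spam, 0 = ham.
TRAIN = [
    (1, "win free money now"),
    (1, "free prize claim now"),
    (1, "cheap pills free offer"),
    (0, "tsunami warning issued for coast"),
    (0, "earthquake felt near the coast"),
    (0, "buoy reports wave height change"),
]
TEST = [
    (1, "claim your free money"),
    (0, "tsunami wave reported near coast"),
]

if __name__ == "__main__":
    # Represent data -> extract features -> develop model -> evaluate.
    examples = [(label, hashing_tf(tokenize(text))) for label, text in TRAIN]
    model = train_logistic_sgd(examples)
    print("test accuracy:", accuracy(model, TEST))
```

In PySpark the corresponding classes are `pyspark.mllib.feature.HashingTF` and `pyspark.mllib.classification.LogisticRegressionWithSGD`. They perform the same steps over RDDs, so the work can be spread across a cluster instead of a laptop.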



Use Case: Laptop Prototype

http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf


Use Case: Next Steps …



Next Steps: Scaling …


[Figure: scaling directions OUT, IN, DOWN, UP]


Apache Spark via Univa Universal Resource Broker

Machine Learning via Apache Spark


http://img.deusm.com/informationweek/2015/03/1319660/Spark-2015-Vision.jpg


URB: Product Overview


What is Universal Resource Broker (URB)?

URB extends Univa Grid Engine to handle Service and Custom distributed applications in a Univa Grid Engine Cluster.

An API for developing distributed applications

o Compatible with Apache Mesos API

o Bindings for Python, Java, and C++

A runtime environment for hosting distributed applications

o Supports frameworks developed against the Mesos API

o Supports frameworks developed against the URB API

o Uses Univa Grid Engine to place and run work
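URB supports frameworks written against the Mesos API, so the Mesos scheduling model is useful context. In that model, the resource manager offers available resources to each framework, and the framework decides which offers to accept for its tasks. The toy sketch below models only that two-level pattern. The class and method names are hypothetical and are not the Mesos or URB API. In URB, Univa Grid Engine is what ultimately places and runs the work.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    host: str
    cpus: int
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: int
    mem_mb: int


class GreedyFramework:
    """Hypothetical framework-side scheduler: packs pending tasks into offers it receives."""

    def __init__(self, tasks):
        self.pending = list(tasks)
        self.launched = []  # (task name, host) pairs

    def resource_offers(self, offers):
        """Accept what fits; return the hosts whose offers were declined outright."""
        declined = []
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem_mb
            accepted = False
            for task in list(self.pending):
                if task.cpus <= cpus and task.mem_mb <= mem:
                    cpus -= task.cpus
                    mem -= task.mem_mb
                    self.pending.remove(task)
                    self.launched.append((task.name, offer.host))
                    accepted = True
            if not accepted:
                declined.append(offer.host)
        return declined


if __name__ == "__main__":
    fw = GreedyFramework([Task("t1", 2, 4096), Task("t2", 2, 4096), Task("t3", 2, 4096)])
    # Round 1: the broker offers whatever is currently free on two hosts.
    print("declined:", fw.resource_offers([Offer("hostA", 4, 8192), Offer("hostB", 2, 2048)]))
    # Round 2: later, more resources free up elsewhere.
    print("declined:", fw.resource_offers([Offer("hostC", 2, 4096)]))
    print("launched:", fw.launched)
```

The framework never chooses hosts directly. It only accepts or declines what it is offered. This is why a broker such as URB can keep the framework API unchanged while a different resource manager decides what to offer.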


URB: Architecture Overview


Spark Framework Running Thunder


Copyright © Univa Corporation, 2015. All Rights Reserved.

URB: Web User Interface

HPC & Spark Workloads Together


URB: Solution Summary


Universal Resource Broker

• For the end user, there is no change in application workflow

• For admins, there is increased control and policy capability over compute resources

• The solution provides the ability to share resources across big data and traditional batch workloads

• Single resource allocation policy defined by business goals

• Single accounting repository to track resource consumption

• Full workload lifecycle management for heterogeneous workloads


GPUs

GPUs for Deep Learning


http://image.slidesharecdn.com/nvidiateslap100-160621104058/95/announcing-the-nvidia-tesla-p100-gpu-for-pcie-servers-9-638.jpg?cb=1466505803


CUDA LOAD SENSOR

Post-installation check:

qhost -F <hostname>
hl:cuda.verstr=270.41.06
hl:cuda.0.name=GeForce 8400 GS
hl:cuda.0.totalMem=511.312M
hl:cuda.0.freeMem=500.480M
hl:cuda.0.usedMem=10.832M
hl:cuda.0.eccEnabled=0
hl:cuda.0.temperature=44.000000
hl:cuda.1.name=GeForce 8400 GS
hl:cuda.1.totalMem=511.312M
hl:cuda.1.freeMem=406.066M
hl:cuda.1.usedMem=20.274M
hl:cuda.1.eccEnabled=0
hl:cuda.1.temperature=43.000000
hl:cuda.devices=2
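The `hl:cuda.*` values reported by the load sensor are plain `name=value` strings, so a monitoring script can post-process them easily. The minimal sketch below assumes the `hl:cuda.<device index>.<attribute>` naming and the `M` memory suffix seen in the sample `qhost -F` output on this slide. Other sensor versions may report different names or units.

```python
SAMPLE = """\
hl:cuda.verstr=270.41.06
hl:cuda.0.name=GeForce 8400 GS
hl:cuda.0.totalMem=511.312M
hl:cuda.0.freeMem=500.480M
hl:cuda.0.usedMem=10.832M
hl:cuda.0.eccEnabled=0
hl:cuda.0.temperature=44.000000
hl:cuda.1.name=GeForce 8400 GS
hl:cuda.1.totalMem=511.312M
hl:cuda.1.freeMem=406.066M
hl:cuda.1.usedMem=20.274M
hl:cuda.1.eccEnabled=0
hl:cuda.1.temperature=43.000000
hl:cuda.devices=2
"""

PREFIX = "hl:cuda."


def parse_cuda_complexes(text):
    """Split `qhost -F` lines into host-wide CUDA values and per-device attribute dicts."""
    summary, devices = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue  # ignore non-CUDA resources
        key, _, value = line[len(PREFIX):].partition("=")
        head, _, attr = key.partition(".")
        if head.isdigit():  # e.g. "0.freeMem" -> device 0, attribute "freeMem"
            devices.setdefault(int(head), {})[attr] = value
        else:               # e.g. "devices", "verstr"
            summary[key] = value
    return summary, devices


def mem_mb(value):
    """Convert a value such as '511.312M' to megabytes (only the 'M' suffix is handled)."""
    if not value.endswith("M"):
        raise ValueError(f"unexpected memory unit: {value!r}")
    return float(value[:-1])


if __name__ == "__main__":
    summary, devices = parse_cuda_complexes(SAMPLE)
    print("cuda.verstr", summary["verstr"], "cuda.devices", summary["devices"])
    for idx, attrs in sorted(devices.items()):
        ecc = "on" if attrs["eccEnabled"] == "1" else "off"
        print(f"GPU {idx}: {attrs['name']}, {mem_mb(attrs['freeMem']):.1f} MB free, ECC {ecc}")
```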


GPU JOB SUBMISSION

• CUDA complexes can be used for:

  • Setting the alarm state of a host based on ECC errors (load_threshold in queue config)

  • Sorting hosts (load_formula)

  • Job submission

    • Requesting a host with GPUs

    • qsub -l cuda.devices=2 ...

• The complex can be made consumable (complex configuration) in order to limit the number of CUDA jobs per host
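When `cuda.devices` is consumable, each job's request is subtracted from a per-host budget, so a host never runs more GPU work than it has devices. Below is a toy first-fit model of that bookkeeping. The host names and job IDs are hypothetical. The real scheduler also sorts hosts (for example, by load_formula), applies queue limits and policies, and handles cases this sketch ignores.

```python
def place_jobs(capacity, requests):
    """First-fit placement against a consumable per-host GPU budget.

    capacity: {host: GPUs available}; requests: [(job_id, GPUs requested)], in submission order.
    Returns ({job_id: host}, [job_ids still pending]).
    """
    free = dict(capacity)  # remaining consumable amount per host
    placed, pending = {}, []
    for job_id, want in requests:
        host = next((h for h in sorted(free) if free[h] >= want), None)
        if host is None:
            pending.append(job_id)  # no host has enough GPUs left; the job waits
        else:
            free[host] -= want      # the request is "consumed" until the job ends
            placed[job_id] = host
    return placed, pending


if __name__ == "__main__":
    # Two hypothetical hosts with two GPUs each; jobs submitted as if with -l cuda.devices=N.
    placed, pending = place_jobs({"hostA": 2, "hostB": 2}, [(1, 2), (2, 1), (3, 2), (4, 1)])
    print("placed:", placed)
    print("pending:", pending)
```

In this run, job 3 (two GPUs) waits while job 4 (one GPU) starts, because first-fit skips requests that do not fit. A production scheduler may instead reserve resources so that larger jobs are not starved.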


RESOURCE MAPS

[Figure: a UGE cluster of hosts A, B, ..., N, each with a host resource (e.g., GPUs with IDs 0 and 1), plus a global resource (e.g., scratch storage A-E); Jobs 123 and 124 are mapped onto individual IDs]

Two host resources: 0, 1

Five global resources: A, B, C, D, E

Job 123 got assigned ID 0 of the GPU resource on host N and resource C of the global resource scratch.

Job 124 got assigned ID 1 of the GPU resource on host B and resource E of the global resource scratch.
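A resource map gives each job specific resource IDs, not just a count. That is what lets Job 123 and Job 124 land on distinct GPUs and distinct scratch areas. The toy model below assumes a lowest-free-ID policy, which may differ from UGE's actual selection policy. It uses per-host maps for GPU IDs 0 and 1 plus one cluster-wide map for scratch A-E. Host names and job IDs are hypothetical. The granted GPU IDs can be handed to a CUDA program through the standard `CUDA_VISIBLE_DEVICES` environment variable.

```python
class ResourceMap:
    """Toy resource map: a pool of named IDs handed out to jobs one grant at a time."""

    def __init__(self, ids):
        self.free = sorted(ids)   # IDs not currently granted (kept in order)
        self.owner = {}           # granted ID -> job ID

    def grant(self, job_id, count=1):
        """Give `job_id` the `count` lowest free IDs (a hypothetical policy)."""
        if count > len(self.free):
            raise RuntimeError(f"job {job_id}: requested {count}, only {len(self.free)} free")
        granted, self.free = self.free[:count], self.free[count:]
        for rid in granted:
            self.owner[rid] = job_id
        return granted

    def release(self, job_id):
        """Return every ID held by `job_id` to the free pool."""
        for rid in [r for r, owner in self.owner.items() if owner == job_id]:
            del self.owner[rid]
            self.free.append(rid)
        self.free.sort()


def cuda_visible_devices(gpu_ids):
    """Format granted GPU IDs for the CUDA_VISIBLE_DEVICES environment variable."""
    return ",".join(str(i) for i in gpu_ids)


if __name__ == "__main__":
    gpus = {host: ResourceMap([0, 1]) for host in ("A", "B", "N")}  # host-level: IDs 0, 1
    scratch = ResourceMap(["A", "B", "C", "D", "E"])                # global: scratch A-E
    for job, host in [("j1", "B"), ("j2", "B"), ("j3", "N")]:
        gpu_ids = gpus[host].grant(job)
        print(job, "host", host, "GPU", gpu_ids, "scratch", scratch.grant(job),
              "CUDA_VISIBLE_DEVICES=" + cuda_visible_devices(gpu_ids))
```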



Containers


Containerized PySpark Example



Univa Grid Engine – Container Edition (1)

Launch Docker containers on the best machine in the cluster

Reduces time wasted (it can be minutes … or longer) on:

o Attempting to launch on an improperly serviced execution host.

o Waiting for the Docker image to download from the Docker registry.

Ensures the container runs faster, increasing throughput in the cluster.

Run Docker containers in a Univa Grid Engine cluster

Business-critical containers are prioritized over other containers.

Increases efficiency of the overall organization.

Containers can be orchestrated alongside other critical workloads such as batch jobs and frameworks.

$ qsub -o /home/jdoe -j y -xdv "/home:/home" \
    -l docker,docker_images="*centos:latest*" my_job.sh
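The `docker_images="*centos:latest*"` request above is a wildcard pattern matched against the images a host can provide. This lets a job be steered toward hosts that already hold the image instead of waiting for a registry download. The toy host-selection model below assumes each host reports whether Docker is usable and which images it has cached. It is not Univa's actual algorithm. The host names, loads, and tie-breaking rule are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Host:
    name: str
    docker: bool          # is Docker installed and usable on this host?
    images: tuple = ()    # images already present in the local Docker cache
    load: float = 0.0     # normalized load; lower is better


def pick_host(hosts, image_pattern):
    """Choose a Docker-capable host, preferring one that already caches a matching image.

    Returns None when no host can run Docker at all.
    """
    candidates = [h for h in hosts if h.docker]
    if not candidates:
        return None

    def rank(h):
        cached = any(fnmatchcase(img, image_pattern) for img in h.images)
        return (not cached, h.load)  # cached image first, then lowest load

    return min(candidates, key=rank)


if __name__ == "__main__":
    hosts = [
        Host("n1", docker=False, images=("centos:latest",), load=0.1),
        Host("n2", docker=True, images=(), load=0.2),
        Host("n3", docker=True, images=("docker.io/library/centos:latest",), load=0.7),
    ]
    print(pick_host(hosts, "*centos:latest*").name)  # n3: image already cached
    print(pick_host(hosts, "*ubuntu*").name)         # n2: nobody caches it, lowest load wins
```

Host n1 is never chosen, even though it has the image and the lowest load. It cannot run Docker, so it corresponds to the "improperly serviced execution host" case.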


Univa Grid Engine – Container Edition (2)

Job Control and Limits for Docker Containers

Provides user and administrator control over containers running on Grid Engine hosts.

Accounting for Docker Containers

Keeps track of containers. Share policies require accounting.

Data File Management for Docker Containers

Transparent access to input, output and error files. Simplifies the management of input and output files for Docker containers and ensures any output or error files are moved to a location where the user can access them.

Interactive Docker Containers

Good for debugging when containers don’t work correctly!

Parallel Jobs in Docker Containers

Message-passing parallel jobs can run a set of tasks inside a container on each machine.

Containerized GPUs


https://github.com/NVIDIA/nvidia-docker


Univa Confidential

Navops by Univa

Easy installation, preconfigured solution including pre-integration with cloud services.

Build a container cluster on premise or in the cloud.

The fastest way to build a container cluster!

Respond Quickly: Easy to resize, adapt, dynamic provisioning

Orchestrate and Optimize: Best use of resources and keep track of containers

The most advanced container orchestration!

http://navops.io/


Navops orchestration solution


Summary

Scaling Machine Learning from prototype to production …

Out with Apache Spark via Univa Universal Resource Broker

Up with NVIDIA GPUs via Univa Grid Engine

In/Down via Univa Container solutions

o Univa Grid Engine – Container Edition

o Navops Launch and Command


THANK YOU

Ian Lumb

Solutions Architect

+1 630 303-9068
