Machine Learning for Big Data Analytics: Scaling In with Containers while Scaling Out on Clusters
TRANSCRIPT
www.univa.com
Presenter: Ian Lumb
Watch On Demand Anytime. Note: Includes demos
Agenda
Introduction
Use case example
Scaling …
Out with Apache Spark via Univa Universal Resource Broker
Up with NVIDIA GPUs and Univa Grid Engine
In/Down with Univa container solutions
Summary
Introduction
Machine Learning Defined
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its
performance at tasks in T, as measured by P, improves with experience E”.
T. M. Mitchell, Machine Learning, WCB/McGraw-Hill, 1997
Deep Learning Defined
“… a modern refinement of ‘machine learning’, in which computers teach
themselves tasks by crunching large sets of data”.
http://www.economist.com/news/briefing/21650526-artificialintelligence-scares-peopleexcessively-so-rise-machines
Use Case Example: Earthquakes and Tsunamis
Use Case: Context
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Use Case: Motivation
Non-deterministic cause
Uncertainty inherent in any attempt to predict earthquakes
o In situ measurements may reduce uncertainty
Lead times
Availability of actionable observations
Communication of situation - advisories, warnings, etc.
Cause-effect relationship
Energy transfer - inputs ... coupling ... outputs
o ‘Geometry’ - bathymetry and topography
Other factors - e.g., tides
Established effect
Far-field estimates of tsunami propagation (pre-computed) and coastal inundation (real-time) have proven to be extremely accurate ...
This requires a distributed array of deep-ocean tsunami detection buoys plus a forecasting model
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
http://www.gitews.org/en/concept/
Use Case: Traditional Data Sources
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Use Case: Deep Learning from Twitter?
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Karau et al., Learning Spark, O’Reilly, 2015
Use Case: Machine Learning Pipeline
Use Case: Deep Learning from Twitter?
Represent data
Twitter data manually curated into ‘ham’ and ‘spam’
In-memory representation via Spark RDDs
Extract features
Term frequencies computed via Spark MLlib HashingTF (hashing trick)
⇒ feature vectors
Develop model object
Spark MLlib LogisticRegressionWithSGD used for classification
Evaluate model
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
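The four pipeline steps above can be tried out without a Spark cluster. What follows is a minimal pure-Python stand-in, not the Spark code used in the talk. `hashing_tf` imitates the idea behind MLlib's `HashingTF` by hashing each token into one of `num_features` buckets and counting. `train_logistic_sgd` is a plain stochastic-gradient-descent logistic regression in the spirit of `LogisticRegressionWithSGD`. The toy tweets, the bucket count, the step size, and the epoch count are all hypothetical choices for illustration. The example also uses `zlib.crc32` because it is deterministic across runs, while Spark uses its own hash function.

```python
import math
import random
import zlib

def hashing_tf(text, num_features=1024):
    """Map a document to a sparse term-frequency vector {bucket: count} via the hashing trick."""
    vec = {}
    for token in text.lower().split():
        idx = zlib.crc32(token.encode("utf-8")) % num_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec

def dot(w, x):
    return sum(w[i] * v for i, v in x.items())

def sigmoid(z):
    # Numerically safe logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_sgd(examples, num_features=1024, step=0.5, epochs=50, seed=0):
    """examples: list of (label, sparse_vector), label 1 = spam, 0 = ham."""
    w, b = [0.0] * num_features, 0.0
    rng = random.Random(seed)
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for label, x in data:
            err = sigmoid(dot(w, x) + b) - label  # d(log-loss)/d(logit)
            for i, v in x.items():
                w[i] -= step * err * v
            b -= step * err
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sigmoid(dot(w, x) + b) >= 0.5 else 0

# 1. Represent data: hand-labelled toy tweets (hypothetical), 1 = spam, 0 = ham
spam = ["win a free phone now", "free money click now", "cheap pills buy now"]
ham = ["tsunami warning issued for the coast",
       "strong earthquake felt near the coast",
       "evacuate to higher ground after the quake"]
labelled = [(1, t) for t in spam] + [(0, t) for t in ham]

# 2. Extract features
examples = [(label, hashing_tf(text)) for label, text in labelled]

# 3. Develop the model object
model = train_logistic_sgd(examples)

# 4. Evaluate the model (on the training data only, for brevity)
accuracy = sum(predict(model, x) == label for label, x in examples) / len(examples)
print(f"training accuracy: {accuracy:.2f}")
```

A real evaluation would hold out a test split, as the Spark version of this example would. The in-memory list here plays the role that Spark RDDs play at scale.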
Use Case: Laptop Prototype
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Use Case: Next Steps …
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Next Steps: Scaling …
[Diagram: scaling directions OUT, IN, DOWN, UP]
Apache Spark via Univa Universal Resource Broker
Machine Learning via Apache Spark
http://img.deusm.com/informationweek/2015/03/1319660/Spark-2015-Vision.jpg
URB: Product Overview
What is Universal Resource Broker (URB)?
URB extends Univa Grid Engine to handle service and custom distributed applications in a Univa Grid Engine cluster.
• An API for developing distributed applications
  o Compatible with the Apache Mesos API
  o Bindings for Python, Java, and C++
• A runtime environment for hosting distributed applications
  o Supports frameworks developed against the Mesos API
  o Supports frameworks developed against the URB API
  o Uses Univa Grid Engine to place and run work
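Frameworks written against the Mesos API follow a two-level scheduling model: the resource manager offers resources, and each framework decides what to launch on them. The toy loop below illustrates only that offer/accept cycle. The `Offer` class, the `SparkLikeFramework` policy, the `resource_offers` method name, and the resource numbers are hypothetical and are not the URB or Mesos API. The slide notes that the real API is available through Python, Java, and C++ bindings.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    host: str
    cpus: float
    mem_mb: int

class SparkLikeFramework:
    """Hypothetical framework: accepts offers until it has enough executors."""

    def __init__(self, executors_wanted, cpus_per_executor, mem_per_executor_mb):
        self.wanted = executors_wanted
        self.cpus = cpus_per_executor
        self.mem = mem_per_executor_mb
        self.launched = []  # hosts where an executor was started

    def resource_offers(self, offers):
        """Return the offers this framework uses; the rest are implicitly declined."""
        accepted = []
        for offer in offers:
            if len(self.launched) >= self.wanted:
                break
            if offer.cpus >= self.cpus and offer.mem_mb >= self.mem:
                self.launched.append(offer.host)
                accepted.append(offer)
        return accepted

# Broker side, heavily simplified: per the slide, URB uses Grid Engine to place
# and run work; here the available capacity is just a static list of offers.
offers = [Offer("hostA", 4, 8192), Offer("hostB", 1, 2048), Offer("hostC", 8, 16384)]
framework = SparkLikeFramework(executors_wanted=2, cpus_per_executor=2, mem_per_executor_mb=4096)
accepted = framework.resource_offers(offers)
declined = [o for o in offers if o not in accepted]
print([o.host for o in accepted])   # ['hostA', 'hostC']
print([o.host for o in declined])   # ['hostB']
```

The design point being illustrated is that the framework, not the broker, owns the placement decision for its tasks, while the broker owns which resources are offered at all.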
URB: Architecture Overview
Spark Framework Running Thunder
Copyright © Univa Corporation, 2015. All Rights Reserved.
URB: Web User Interface
HPC & Spark Workloads Together
URB: Solution Summary
Universal Resource Broker
• For the end user, there is no change in application workflow
• For admins, there is increased control and policy capability over compute resources
• The solution provides the ability to share resources across big data and traditional batch workloads
• Single resource allocation policy defined by business goals
• Single accounting repository to track resource consumption
• Full workload lifecycle management for heterogeneous workloads
GPUs
GPUs for Deep Learning
http://image.slidesharecdn.com/nvidiateslap100-160621104058/95/announcing-the-nvidia-tesla-p100-gpu-for-pcie-servers-9-638.jpg?cb=1466505803
CUDA LOAD SENSOR
Post-installation check:
qhost -F <hostname>
hl:cuda.verstr=270.41.06
hl:cuda.0.name=GeForce 8400 GS
hl:cuda.0.totalMem=511.312M
hl:cuda.0.freeMem=500.480M
hl:cuda.0.usedMem=10.832M
hl:cuda.0.eccEnabled=0
hl:cuda.0.temperature=44.000000
hl:cuda.1.name=GeForce 8400 GS
hl:cuda.1.totalMem=511.312M
hl:cuda.1.freeMem=406.066M
hl:cuda.1.usedMem=20.274M
hl:cuda.1.eccEnabled=0
hl:cuda.1.temperature=43.000000
hl:cuda.devices=2
Copyright © 2016 Univa Corporation, All Rights Reserved.
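The `hl:cuda.*` values reported by `qhost -F` in the post-installation check are ordinary host-level complex values, so they can be post-processed like any other `qhost` output, for example to get a per-GPU view. The sketch below assumes only the `hl:name=value` line format shown on the slide (with optional leading whitespace). The function name `parse_cuda_complexes` is hypothetical.

```python
import re

SAMPLE = """\
hl:cuda.verstr=270.41.06
hl:cuda.0.name=GeForce 8400 GS
hl:cuda.0.totalMem=511.312M
hl:cuda.0.freeMem=500.480M
hl:cuda.0.usedMem=10.832M
hl:cuda.0.eccEnabled=0
hl:cuda.0.temperature=44.000000
hl:cuda.1.name=GeForce 8400 GS
hl:cuda.1.totalMem=511.312M
hl:cuda.1.freeMem=406.066M
hl:cuda.1.usedMem=20.274M
hl:cuda.1.eccEnabled=0
hl:cuda.1.temperature=43.000000
hl:cuda.devices=2
"""

# "hl:cuda.1.freeMem=406.066M" -> device 1, attribute "freeMem", value "406.066M"
DEVICE_LINE = re.compile(r"^\s*hl:cuda\.(\d+)\.(\w+)=(.*)$")
# "hl:cuda.devices=2" -> host-wide attribute "devices", value "2"
HOST_LINE = re.compile(r"^\s*hl:cuda\.(\w+)=(.*)$")

def parse_cuda_complexes(text):
    """Return (host-wide cuda.* values, {device index: {attribute: value}})."""
    host, devices = {}, {}
    for line in text.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            idx, attr, value = int(m.group(1)), m.group(2), m.group(3).strip()
            devices.setdefault(idx, {})[attr] = value
            continue
        m = HOST_LINE.match(line)
        if m:
            host[m.group(1)] = m.group(2).strip()
    return host, devices

host, devices = parse_cuda_complexes(SAMPLE)
for idx, attrs in sorted(devices.items()):
    print(idx, attrs["name"], attrs["freeMem"], attrs["temperature"])
```

Values are kept as strings because units such as the `M` suffix would need their own conversion rules.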
GPU JOB SUBMISSION
• CUDA complexes can be used for:
  • Setting the alarm state of a host based on ECC errors (load_threshold in the queue configuration)
  • Sorting hosts (load_formula)
  • Job submission
    • Requesting a host with GPUs
    • qsub -l cuda.devices=2 ...
• The complex can be made consumable (complex configuration) in order to limit the number of CUDA jobs per host
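Making `cuda.devices` consumable means that each running job's request is subtracted from the host's capacity until the host is full, and further jobs stay pending. The toy model below illustrates that bookkeeping. It also ranks hosts with a made-up load formula (coolest GPUs first) in place of a real `load_formula`. The host names, capacities, and the ranking rule are hypothetical, and this is not Grid Engine's scheduler.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    cuda_devices: int        # consumable capacity configured for the host
    gpu_temperature: float   # stand-in for a load value from the CUDA load sensor
    used: int = 0            # cuda.devices already granted to running jobs

    @property
    def free(self):
        return self.cuda_devices - self.used

def place_job(hosts, requested):
    """Place a job requesting `requested` cuda.devices (like `qsub -l cuda.devices=N`).
    Hosts are tried in order of a hypothetical load formula (coolest GPUs first).
    Returns the chosen host name, or None if the job must stay pending."""
    for host in sorted(hosts, key=lambda h: h.gpu_temperature):
        if host.free >= requested:
            host.used += requested   # consume the resource
            return host.name
    return None

hosts = [Host("hostA", 2, 44.0), Host("hostB", 2, 43.0)]
print(place_job(hosts, 2))   # hostB: coolest host with two free devices
print(place_job(hosts, 2))   # hostA: hostB is now fully consumed
print(place_job(hosts, 1))   # None: no host has a free device left
```

A real scheduler would also release the consumed amount when a job finishes; that step is omitted here.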
RESOURCE MAPS
[Diagram: UGE cluster with hosts A, B, ..., N, each holding GPU IDs 0 and 1, cluster-wide scratch storage A-E, and jobs 123 and 124]
Two host resources: 0, 1 (e.g., GPUs with IDs 0 and 1)
Five global resources: A, B, C, D, E (e.g., scratch storage A-E)
Job 123 got assigned ID 0 of the GPU resource on host N and resource C of the global scratch resource.
Job 124 got assigned ID 1 of the GPU resource on host B and resource E of the global scratch resource.
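A resource map differs from a plain counter in that the job receives specific IDs (GPU 0 or 1, scratch area C or E), not just a quantity. The toy class below illustrates that idea with one map per host for GPUs and one cluster-wide map for scratch storage. The class name, the first-free assignment policy, and the earlier job "other" are hypothetical. Because of the first-free policy, the scratch IDs it hands out (A, B) differ from the C and E shown on the slide.

```python
class ResourceMap:
    """Toy resource map: a pool of discrete IDs handed out to jobs."""

    def __init__(self, ids):
        self.free = list(ids)
        self.assigned = {}   # job id -> list of granted IDs

    def grant(self, job_id, count=1):
        """Give `count` specific IDs to a job (first free), or None if not enough are left."""
        if len(self.free) < count:
            return None
        ids, self.free = self.free[:count], self.free[count:]
        self.assigned[job_id] = ids
        return ids

    def release(self, job_id):
        """Return a finished job's IDs to the pool."""
        self.free.extend(self.assigned.pop(job_id, []))
        self.free.sort()

# Host-level map: GPU IDs 0 and 1 on every host. Global map: scratch areas A-E.
gpus = {host: ResourceMap([0, 1]) for host in ("A", "B", "N")}
scratch = ResourceMap("ABCDE")

gpus["B"].grant("other")                                # earlier job holds GPU 0 on host B
print(gpus["N"].grant("123"), scratch.grant("123"))     # [0] ['A']
print(gpus["B"].grant("124"), scratch.grant("124"))     # [1] ['B']
print(gpus["B"].grant("125"))                           # None: both GPUs on host B are taken
gpus["B"].release("124")
print(gpus["B"].free)                                   # [1]
```

Handing out IDs rather than counts is what lets two jobs on the same host use different GPUs without colliding.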
Containers
Containerized PySpark Example
Univa Grid Engine – Container Edition (1)
Launch Docker containers on the best machine in the cluster
Reduces time wasted (it can be minutes … or longer) on:
o Attempting to launch on an improperly serviced execution host
o Waiting for the Docker image to download from the Docker registry
Ensures containers run faster, increasing throughput in the cluster.
Run Docker containers in a Univa Grid Engine cluster
Business-critical containers are prioritized over other containers.
Increases efficiency of the overall organization.
Containers can be orchestrated alongside other critical workloads such as batch jobs and frameworks.
$ qsub -o /home/jdoe -j y -xdv "/home:/home" \
    -l docker,docker_images="*centos:latest*" my_job.sh
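The placement rule described above avoids hosts that cannot run the container and prefers hosts that already hold the image, so nothing has to be pulled from the registry. It can be sketched as a filter-then-rank step. The host records, their `docker_images` lists, and the load values are hypothetical. `fnmatch` is assumed here as a stand-in for the wildcard matching in the `*centos:latest*` request. Unlike the hard `-l docker_images=...` request, this sketch treats a cached image as a preference. It is not Grid Engine's actual scheduler logic.

```python
from fnmatch import fnmatch

hosts = [
    {"name": "exec1", "docker": True,  "docker_images": ["centos:latest", "ubuntu:16.04"], "load": 0.9},
    {"name": "exec2", "docker": True,  "docker_images": ["ubuntu:16.04"],                  "load": 0.1},
    {"name": "exec3", "docker": False, "docker_images": [],                                "load": 0.0},
]

def pick_host(hosts, image_pattern):
    """Return the name of the best host for a container job, or None.
    1. Drop hosts that cannot run Docker at all (the `-l docker` part of the request).
    2. Prefer hosts that already have a matching image cached (no registry pull).
    3. Break ties by lowest load."""
    capable = [h for h in hosts if h["docker"]]
    if not capable:
        return None

    def rank(h):
        cached = any(fnmatch(img, image_pattern) for img in h["docker_images"])
        return (not cached, h["load"])   # False sorts first, so cached hosts win

    return min(capable, key=rank)["name"]

print(pick_host(hosts, "*centos:latest*"))   # exec1: image already cached there
print(pick_host(hosts, "*debian*"))          # exec2: not cached anywhere, least-loaded Docker host
```

Preferring the busier exec1 over the idle exec2 for the CentOS image reflects the trade-off on the slide: a registry pull can cost minutes, which outweighs a modest load difference.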
Univa Grid Engine – Container Edition (2)
Job Control and Limits for Docker Containers
Provides user and administrator control over containers running on Grid Engine hosts.
Accounting for Docker Containers
Keeps track of containers. Share policies require accounting.
Data File Management for Docker Containers
Transparent access to input, output and error files. Simplifies the management of input and output files for Docker containers and ensures any output or error files are moved to a location where the user can access them.
Interactive Docker Containers
Good for debugging when containers don’t work correctly!
Parallel Jobs in Docker Containers
Message-passing parallel jobs can each run a set of tasks in a container on a machine.
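The point that share policies require accounting can be illustrated with a deliberately simplified fair-share ranking. Usage is summed from accounting records, and whichever user is furthest below their target share is dispatched first. The records, users, CPU-second figures, and equal-share policy are hypothetical. This is not Grid Engine's share-tree algorithm, which also decays historical usage over time.

```python
from collections import defaultdict

# Hypothetical accounting records: (user, container job id, CPU-seconds consumed)
accounting = [
    ("alice", "c1", 3600.0),
    ("alice", "c2", 1800.0),
    ("bob",   "c3",  600.0),
]
target_share = {"alice": 0.5, "bob": 0.5}   # hypothetical policy: equal entitlement

def usage_by_user(records):
    """Total CPU-seconds per user from the accounting records."""
    totals = defaultdict(float)
    for user, _job, cpu_seconds in records:
        totals[user] += cpu_seconds
    return totals

def dispatch_order(pending_users, records, shares):
    """Order pending users so the one furthest below their target share goes first."""
    totals = usage_by_user(records)
    grand_total = sum(totals.values()) or 1.0

    def deviation(user):
        actual = totals.get(user, 0.0) / grand_total
        return actual - shares.get(user, 0.0)   # negative means under-served

    return sorted(pending_users, key=deviation)

print(dispatch_order(["alice", "bob"], accounting, target_share))   # ['bob', 'alice']
```

Without the accounting records, `usage_by_user` would have nothing to sum, and every user would look equally entitled. That is the dependency the slide points out.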
Containerized GPUs
https://github.com/NVIDIA/nvidia-docker
Univa Confidential
Navops by Univa
Easy installation, preconfigured solution including pre-integration with cloud services.
Build a container cluster on premise or in the cloud.
The fastest way to build a container cluster!!
Respond Quickly: Easy to resize, adapt, dynamic provisioning
Orchestrate and Optimize: Best use of resources and keep track of containers
The most advanced container orchestration!!
http://navops.io/
Navops orchestration solution
Summary
Scaling Machine Learning from prototype to production …
Out with Apache Spark via Univa Universal Resource Broker
Up with NVIDIA GPUs via Univa Grid Engine
In/Down via Univa Container solutions
o Univa Grid Engine – Container Edition
o Navops Launch and Command
THANK YOU
Ian Lumb
Solutions Architect
+1 630 303-9068