
Page 1: DL4J at Workday Meetup

DL4J: Deep Learning for the JVM and Enterprise

David C. Kale, Ruben Fiszel Skymind

Workday Data Science Meetup August 10, 2016

Page 2: DL4J at Workday Meetup

Who are we?

• Deeplearning4j (DL4J): open source deep learning on the JVM
• Skymind: deep learning for the enterprise
  • fighting the good fight vs. the Python deep learning mafia
  • founded by Adam Gibson
  • CEO: Chris Nicholson
• Dave Kale: developer, Skymind (Scala API)
  • also a PhD student at USC
  • research: deep learning for healthcare
• Ruben Fiszel: intern, Skymind (reinforcement learning, RL4J)
  • also an MS student at EPFL

Page 3: DL4J at Workday Meetup

Outline

• Overview of deep learning
• Tour of DL4J
• Scaling up DL4J
• DL4J versus…
• Preview of DL4J Scala API
• Preview of RL4J

Page 4: DL4J at Workday Meetup

What is Deep Learning?

• Compositions of (deterministic) differentiable functions, some parameterized
  • compute transformations of the data
  • eventually emit an output
  • can have multiple paths
• The architecture is end-to-end differentiable w.r.t. its parameters (the w’s)
• Training:
  • define targets and a loss function
  • apply gradient methods: use the chain rule to get component-wise updates

[Figure: example computation graph. Path 1: x1 → f1(x1; w1) → z1 → f2(z1) → z2. Path 2: x2 → f4(x2; w4) → z4. The paths merge: f3([z2, z4]; w3) → y, which feeds Loss(y, t) for target t.]
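To spell out the chain-rule step for the graph above, here is one worked line, in the figure's notation, for the gradient of the loss w.r.t. w1 along the first path:

```latex
% Gradient w.r.t. w1 along the path x1 -> f1 -> z1 -> f2 -> z2 -> f3 -> y -> Loss
\frac{\partial \mathrm{Loss}}{\partial w_1}
  = \frac{\partial \mathrm{Loss}}{\partial y}\,
    \frac{\partial f_3}{\partial z_2}\,
    \frac{\partial f_2}{\partial z_1}\,
    \frac{\partial f_1}{\partial w_1}
```

Each factor is the local derivative of one component, which is what lets training compute component-wise updates.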

Page 5: DL4J at Workday Meetup

Example: multilayer perceptron

• Classic “neural net” architecture — a powerful nonlinear function approximator
• Zero or more fully connected (“dense”) layers of “neurons”
  • ex. neuron: h = f(Wx + b) for some nonlinearity f (e.g., ReLU(a) = max(a, 0))
• Predicts y from a fixed-size, not-too-large x with no special structure
  • classify digits in MNIST (digits are generally centered and upright)
  • model risk of mortality in patients with pneumonia
• Special case: logistic regression (zero hidden layers)

http://deeplearning4j.org/mnist-for-beginners
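Below is a minimal sketch of such an MLP in DL4J's builder API, written in Scala. This is a sketch, assuming the 0.x-era API where activations are strings (builder details shift across DL4J versions), and the commented-out data iterator is a placeholder:

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

// 784 inputs (28x28 pixels) -> 256 hidden ReLU units -> 10-way softmax
val conf = new NeuralNetConfiguration.Builder()
  .seed(123)
  .list()
  .layer(0, new DenseLayer.Builder()
    .nIn(784).nOut(256).activation("relu").build())
  .layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
    .nIn(256).nOut(10).activation("softmax").build())
  .build()

val model = new MultiLayerNetwork(conf)
model.init()
// model.fit(mnistTrainIterator)  // placeholder: any DataSetIterator over MNIST
```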

Page 6: DL4J at Workday Meetup

Variation on the MLP: autoencoder

• “Unsupervised” training: no separate target y
• Learns to accurately reconstruct x from a succinct latent code z
• Probabilistic generative variants (e.g., deep belief nets) can generate novel x’s by first sampling z from a prior distribution p(z)

http://deeplearning4j.org/deepautoencoder
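To make the reconstruction setup concrete, a minimal sketch (my simplification, not the stacked deep-autoencoder example at the link above): a plain dense autoencoder is just an MLP trained to map x back to x through a narrow code z, e.g. with MSE loss:

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.dataset.DataSet
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

// 784 -> 32 (latent z) -> 784; the target is the input itself
val conf = new NeuralNetConfiguration.Builder()
  .list()
  .layer(0, new DenseLayer.Builder()
    .nIn(784).nOut(32).activation("relu").build())     // encoder: x -> z
  .layer(1, new OutputLayer.Builder(LossFunction.MSE)
    .nIn(32).nOut(784).activation("sigmoid").build())  // decoder: z -> reconstruction
  .build()

val ae = new MultiLayerNetwork(conf)
ae.init()
// ae.fit(new DataSet(x, x))  // unsupervised: the features double as the labels
```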

Page 7: DL4J at Workday Meetup

Example: convolutional (neural) networks

• Convolution layers “filter” x to extract features
  ➡ filters exploit (spatially) local regularities while preserving spatial relationships
• Subsampling (pooling) layers combine local information and reduce resolution
  ➡ pooling gives translational invariance (i.e., a classifier robust to shifts in x)
• Predict y from x with local structure (e.g., images, short time series)
  • 2D: classify images of, e.g., cats; the cat may appear in different locations
  • 1D: diagnose patients from lab time series; symptoms occur at different times
• Special case: a fully convolutional network with no MLP at the “top” (filters handle variable-sized x’s)

http://deeplearning4j.org/convolutionalnets

[Figure from M. A. Ranzato, CVPR 2012 Tutorial, pt. 3 (slide 63): a convolutional net shares the same parameters across different locations, i.e., convolutions with learned kernels.]

http://deeplearning.net/tutorial/lenet.html
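Below, a minimal LeNet-style configuration sketch in the same DL4J builder API (layer hyperparameters are illustrative; setInputType, which lets DL4J infer nIn for the inner layers, appears in later 0.x versions):

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.inputs.InputType
import org.deeplearning4j.nn.conf.layers.{ConvolutionLayer, DenseLayer, OutputLayer, SubsamplingLayer}
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

val conf = new NeuralNetConfiguration.Builder()
  .list()
  // convolution: 20 learned 5x5 filters slide over the input image
  .layer(0, new ConvolutionLayer.Builder(5, 5)
    .nIn(1).nOut(20).stride(1, 1).activation("relu").build())
  // max pooling: combine local information, halve the resolution
  .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
    .kernelSize(2, 2).stride(2, 2).build())
  // small MLP "top" for classification
  .layer(2, new DenseLayer.Builder().nOut(500).activation("relu").build())
  .layer(3, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
    .nOut(10).activation("softmax").build())
  .setInputType(InputType.convolutionalFlat(28, 28, 1))  // 28x28 grayscale input
  .build()
```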

Page 8: DL4J at Workday Meetup

Example: recurrent neural networks

• Recurrent connections between hidden units: h_{t+1} = f(W x_{t+1} + V h_t)
• Give the network a form of memory for capturing long-term dependencies
• More elaborate RNNs (e.g., LSTMs) learn when/what to remember or forget
• Predict y from sequential x (natural language, video, time series)
• Among the most flexible and powerful learning algorithms available
  • also can be among the most challenging to train

http://deeplearning4j.org/recurrentnetwork
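A minimal sketch of an LSTM sequence model in the same builder style (GravesLSTM was DL4J's LSTM layer in this era; nFeatures and nClasses are placeholder values):

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{GravesLSTM, RnnOutputLayer}
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction

val nFeatures = 12  // placeholder: inputs per time step
val nClasses = 4    // placeholder: output classes

val conf = new NeuralNetConfiguration.Builder()
  .list()
  // LSTM layer: learns when/what to remember or forget across time steps
  .layer(0, new GravesLSTM.Builder()
    .nIn(nFeatures).nOut(200).activation("tanh").build())
  // emits a prediction at every time step of the sequence
  .layer(1, new RnnOutputLayer.Builder(LossFunction.MCXENT)
    .nIn(200).nOut(nClasses).activation("softmax").build())
  .build()
```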

Page 9: DL4J at Workday Meetup

RNNs: flexible input-to-output modeling

• Diagnose patients from temporal data (Lipton & Kale, ICLR 2016)
• Predict the next word or character (language modeling)
• Generate a beer review from category and score (Strata NY talk)
• Translate from English to French (machine translation)

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Page 10: DL4J at Workday Meetup

Let’s get crazy with architectures

• How about automatically captioning videos?
• Recall: we are just composing functions that transform inputs
• Compose ConvNets with RNNs
• You can do this with DL4J today!

(Venugopalan, et al., NAACL 2015)

Page 11: DL4J at Workday Meetup

Machine learning in the deep learning era

• Architecture design + hyperparameter tuning replace iterative feature engineering
• Easier to transfer “knowledge” across problems
  • direct: adapt a generic image classifier into, e.g., a tumor classifier
  • indirect: analogies across problems point to architectures
• Often better able to leverage Big Data:
  • start with a high-capacity neural net
  • add regularization and tuning
• None of the following is true:
  • your Big Data problems will all be solved magically
  • the machines are going to take over
  • the Singularity is right around the corner

Page 13: DL4J at Workday Meetup

DL4J ecosystem for scalable DL

Arbiter
• Platform-agnostic model evaluation
• Includes randomized grid search

Spark API
• Wraps core DL4J classes
• Designing and configuring the model architecture is identical
• Currently provides data parallelism
  • scales to massive datasets
  • accelerated, distributed training
• DataVec compatible with Spark RDDs

Core
• Efficient numpy-like numerical framework (ND4J)
• ND4J backends for CUDA, ATLAS, MKL, OpenBLAS
• Multi-GPU support
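For flavor, a tiny sketch of ND4J's numpy-like API (Nd4j.rand, mmul, transpose, and sumNumber are standard ND4J calls):

```scala
import org.nd4j.linalg.factory.Nd4j

val x = Nd4j.rand(3, 4)           // 3x4 matrix of uniform random values
val gram = x.mmul(x.transpose())  // (3x4) x (4x3) -> 3x3 Gram matrix
val total = x.sumNumber()         // reduction over all elements
println(s"gram shape: ${gram.rows()} x ${gram.columns()}; sum = $total")
```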

Page 14: DL4J at Workday Meetup

Scalable DL with Spark API

• Uses the Downpour SGD model from Dean, et al. (NIPS 2012)
• Data parallelism
  • training data is sharded across workers
  • each worker has a complete copy of the model and trains in parallel on disjoint minibatches
• Parameter averaging
  • the master stores the “canonical” model parameters
  • workers send parameter updates (gradients) to the master
  • workers periodically ask the master for updated parameters
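A hedged sketch of what this looks like in code, using DL4J's Spark module (ParameterAveragingTrainingMaster and SparkDl4jMultiLayer are the era's class names; exact builder options vary by version, and sc, conf, and trainData are assumed to exist):

```scala
import org.apache.spark.api.java.JavaRDD
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster
import org.nd4j.linalg.dataset.DataSet

// conf: the same MultiLayerConfiguration used on a single machine
val tm = new ParameterAveragingTrainingMaster.Builder(32) // minibatch size per worker
  .averagingFrequency(5)        // average parameters with the master every 5 minibatches
  .workerPrefetchNumBatches(2)  // async data prefetch on workers
  .build()

val sparkNet = new SparkDl4jMultiLayer(sc, conf, tm)
val trained = sparkNet.fit(trainData)  // trainData: JavaRDD[DataSet], sharded across workers
```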

Page 17: DL4J at Workday Meetup

Example: distributed training of LeNet on Spark

Spark LeNet example on GitHub


Page 19: DL4J at Workday Meetup

DL4J versus… my two cents

• Using the Java Big Data ecosystem (Hadoop, Spark, etc.): DL4J
• Want robust data preprocessing tools/pipelines: DL4J
  • esp. natural language, images, video
• Custom layers, loss functions, etc.: Theano/TF + keras/lasagne
  • grad student trying to publish NIPS papers
  • trying to win a Kaggle competition with an OpenAI model from NIPS (keras)
  • prototyping an idea before implementing the gradients by hand in DL4J
• Using published CV models from the Caffe model zoo: Caffe
• Python shop that doesn’t mind being hostage to Google Cloud: TF
• Good news: this is a false choice, like most things (see the Scala API)

Page 20: DL4J at Workday Meetup

DL4J Scala API Preview (coming soon)

• Scala API for DL4J that emulates the keras user experience
• Goal: reduce friction in moving between keras and DL4J
  • make it easy to mimic keras architectures
  • load keras-trained models using a common model format
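To give a sense of the intended user experience, a purely hypothetical sketch in a keras-like Scala style; Sequential, Dense, and the method names below are illustrative placeholders, not the shipped API:

```scala
// Hypothetical keras-style Scala front end; every name here is illustrative.
val model = Sequential()
model.add(Dense(nOut = 256, nIn = 784, activation = "relu"))
model.add(Dense(nOut = 10, activation = "softmax"))
model.compile(loss = "categorical_crossentropy", optimizer = "sgd")
model.fit(features, labels, nbEpochs = 10)
```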

Page 21: DL4J at Workday Meetup

DL4J Scala API Preview

[Side-by-side code comparison: DL4J Scala API vs. keras]

Page 22: DL4J at Workday Meetup

Thank you!

• DL4J: http://deeplearning4j.org/
• Skymind: https://skymind.io/
• Dave:
  • email: [email protected]
  • Twitter: @davekale
  • website: http://www-scf.usc.edu/~dkale
  • MLHC Conference: http://mucmd.org
• Ruben:
  • email: [email protected]
  • website: http://rubenfiszel.github.io/

Gibson & Patterson. Deep Learning: A Practitioner’s Approach. O’Reilly, Q2 2016.