Productionizing Deep Learning From the Ground Up

Open DataSciCon, May 2015

Upload: adam-gibson

TRANSCRIPT

Page 1

Open DataSciCon May 2015

Productionizing Deep Learning From the Ground Up

Page 2

Overview

● What is Deep Learning?
● Why is it hard?
● Problems to think about
● Conclusions

Page 3

What is Deep Learning?

Pattern recognition on unlabeled & unstructured data.

Page 4

What is Deep Learning?

● Deep Neural Networks >= 3 Layers
● For media/unstructured data
● Automatic Feature Engineering
● Benefits From Complex Architectures
● Computationally Intensive
● Accelerates With Special Hardware

Page 5

Get why it’s hard yet?

Page 6

Deep Networks >= 3 Layers

● Backpropagation-era "old school" ANNs = exactly 3 layers (input, one hidden, output); deep networks stack more hidden layers (sketched below)
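To make the layer-count distinction concrete, here is a minimal numpy sketch (illustrative only; the layer sizes are invented): an old-school ANN keeps exactly one hidden layer, while the loop below stacks as many as the configuration lists.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)

# An old-school ANN would be [784, 256, 10]: input, ONE hidden layer, output.
# A deep network simply lists more hidden layers.
layer_sizes = [784, 256, 128, 10]
weights = [rng.normal(0.0, 0.01, (m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    for w in weights[:-1]:
        x = relu(x @ w)        # hidden layers (biases omitted for brevity)
    return x @ weights[-1]     # linear output layer

logits = forward(rng.normal(size=(32, 784)))  # a mini batch of 32 examples
```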

Page 7

Deep Networks

● Neural networks themselves act as hidden layers

● Different types of layers can be interchanged/stacked

● Multiple layer types, each with its own hyperparameters and loss functions

Page 8

What Are Common Layer Types?

Page 9

Feedforward

1. MLPs
2. AutoEncoders
3. RBMs

Page 10

Recurrent

1. MultiModal
2. LSTMs
3. Stateful (see the sketch below)
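As a sketch of the "stateful" idea (plain numpy, invented sizes; a real LSTM adds gating on top of this update), one step of a vanilla recurrent layer looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 50, 100
W_xh = rng.normal(0.0, 0.01, (n_in, n_hidden))      # input-to-hidden weights
W_hh = rng.normal(0.0, 0.01, (n_hidden, n_hidden))  # hidden-to-hidden (the state)
b = np.zeros(n_hidden)

def step(x_t, h_prev):
    # The hidden state h carries information forward across time steps.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b)

h = np.zeros(n_hidden)
for x_t in rng.normal(size=(20, n_in)):  # a sequence of 20 time steps
    h = step(x_t, h)
```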

Page 11

Convolutional

LeNet: mixes convolutional & subsampling layers (sketched below)
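A rough numpy sketch of the two building blocks (not LeNet itself; the image, kernel, and shapes are made up): a valid convolution followed by LeNet-style average-pool subsampling.

```python
import numpy as np

def conv2d(img, kernel):
    # Valid 2-D convolution (really cross-correlation) over one channel.
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(fmap, k=2):
    # k x k average pooling: LeNet-style subsampling.
    h, w = fmap.shape
    return fmap[:h // k * k, :w // k * k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

rng = np.random.default_rng(0)
fmap = conv2d(rng.normal(size=(28, 28)), rng.normal(size=(5, 5)))  # 28x28 -> 24x24
pooled = subsample(np.maximum(0.0, fmap))                          # 24x24 -> 12x12
```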

Page 12

Recursive/Tree

Uses a parser to form a tree structure

Page 13

Other kinds

● Memory Networks
● Deep Reinforcement Learning
● Adversarial Architectures
● New recursive ConvNet variant to come in 2016?
● Over 9,000 layers? (22 is already pretty common)

Page 14

Automatic Feature Engineering

Page 15

Automatic Feature Engineering (t-SNE)

Visualizations are crucial. Use t-SNE to render different kinds of data (usage sketched below):

http://lvdmaaten.github.io/tsne/
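A usage sketch, assuming scikit-learn and matplotlib are installed (the 128-dimensional features below are random stand-ins for real network activations):

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

X = np.random.default_rng(0).normal(size=(500, 128))  # stand-in for learned features

# Project the high-dimensional points down to 2-D for plotting.
emb = TSNE(n_components=2, perplexity=30).fit_transform(X)

plt.scatter(emb[:, 0], emb[:, 1], s=5)
plt.show()
```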

Page 16

deeplearning4j.org

Presentation @ Google, Nov. 17, 2014:

“TWO PIZZAS SITTING ON A STOVETOP”
(a caption generated by Google’s image-captioning network)

Page 17

Benefits from Complex Architectures

Google’s result combined:
● LSTMs (learning captions)
● Word embeddings
● Convolutional features from images (aligned to be the same size as the embeddings)

Page 18

Computationally Intensive

● One iteration over ImageNet (a 1,000-label dataset with over 1MM examples) takes 7 hours on GPUs

● Project Adam
● Google Brain

Page 19

Special Hardware required

Unlike most production solutions, deep learning today uses multiple GPUs (not common in Java-based stacks!)

Page 20

Software Engineering Concerns

● Pipelines to deal with messy data, not canned problems... (Real life is not Kaggle, people.)

● Scale/Maintenance (Clusters of GPUs aren’t done well today.)

● Different kinds of parallelism (model and data)

Page 21

Model vs Data Parallelism

● Model parallelism shards the model across servers (HPC style)
● Data parallelism splits mini batches across workers

Page 22

Vectorizing unstructured data

● Data is stored in different databases
● Different kinds of raw files
● Deep learning works well on mixed signal (vectorization sketched below)
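A toy sketch of the vectorization step (the record, its fields, and the vocabulary below are all hypothetical): categorical fields become one-hot columns, numeric fields are scaled, and everything is concatenated into one input vector.

```python
import numpy as np

# A hypothetical record pulled from one of several databases or raw files.
record = {"age": 37, "income": 54000.0, "country": "JP"}
countries = ["US", "JP", "DE"]  # known vocabulary for the categorical field

def vectorize(rec):
    one_hot = [1.0 if rec["country"] == c else 0.0 for c in countries]
    numeric = [rec["age"] / 100.0, rec["income"] / 1e5]  # crude fixed scaling
    return np.array(numeric + one_hot)

x = vectorize(record)  # array([0.37, 0.54, 0.0, 1.0, 0.0])
```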

Page 23

Parallelism

● Model (HPC style)
● Data (mini batch parameter averaging; sketched below)
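A minimal sketch of the data-parallel scheme, using a linear model as a stand-in for the network (plain numpy; the shard contents are random): each worker takes an SGD step on its own mini batch, then the parameters are averaged back together.

```python
import numpy as np

def sgd_step(w, x, y, lr=0.1):
    grad = 2.0 * x.T @ (x @ w - y) / len(x)  # gradient of mean squared error
    return w - lr * grad

rng = np.random.default_rng(0)
w = rng.normal(size=5)
# Four workers, each holding its own shard of the data.
shards = [(rng.normal(size=(32, 5)), rng.normal(size=32)) for _ in range(4)]

for _ in range(100):  # one round = parallel worker steps + averaging
    replicas = [sgd_step(w.copy(), x, y) for x, y in shards]  # runs on workers
    w = np.mean(replicas, axis=0)  # parameter averaging
```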

Page 24

Production Stacks today

● Hadoop/Spark are not enough
● GPUs are not friendly to the average programmer
● Cluster management of GPUs as a resource is not typically done
● Many frameworks don’t work well in a distributed env (getting better, though)

Page 25

Problems With Neural Nets

● Loss functions
● Scaling data
● Mixing different neural nets
● Hyperparameter tuning

Page 26

Loss Functions

● Classification
● Regression
● Reconstruction (one sketch of each below)
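One illustrative numpy definition per family (simplified; production losses add numerical-stability tricks and weighting):

```python
import numpy as np

def cross_entropy(probs, labels):
    # Classification: negative log-likelihood of the true class.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def mse(pred, target):
    # Regression: mean squared error.
    return np.mean((pred - target) ** 2)

def reconstruction_error(x, x_hat):
    # Reconstruction (autoencoders, RBMs): how well the input is rebuilt.
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))
```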

Page 27

Scaling Data

● Zero mean and unit variance
● Zero to 1
● Other forms of preprocessing relative to the distribution of the data
● Processing can also be columnwise (categorical?); see the columnwise sketch below
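A columnwise sketch of the first two scalings (numpy only; the data is random):

```python
import numpy as np

X = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(1000, 4))

# Zero mean and unit variance, computed per column (feature-wise).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Zero-to-one scaling, also per column.
X_01 = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Note: reuse the *training-set* statistics when scaling validation/test data.
```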

Page 28

Mixing and Matching Neural Networks

● Video: ConvNet + Recurrent
● Convolutional RBMs?
● Convolutional -> Subsampling -> Fully Connected
● DBNs: different hidden and visible units for each layer

Page 29

Hyperparameter tuning

● Underfit
● Overfit
● Overdescribe (your hidden layers)
● Layerwise interactions
● What activation function? (Competing? ReLU? Good ol’ sigmoid? Compared below.)
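For reference, the candidate activations in plain numpy (the input range is arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # "good ol'" sigmoid: saturates for large |x|

def relu(x):
    return np.maximum(0.0, x)        # cheap, and non-saturating for x > 0

x = np.linspace(-5.0, 5.0, 11)
print(sigmoid(x))
print(relu(x))
print(np.tanh(x))                    # another classic choice
```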

Page 30

Hyperparameter Tuning (2)

● Grid search for neural nets (Don’t do it! A random-search baseline is sketched after this list.)

● Bayesian (Getting better. There are at least priors here.)

● Gradient-based approaches (Your hyperparameters are a neural net, so there are neural nets optimizing your neural nets...)
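The slide warns against grid search; a common baseline it leaves unnamed is random search (Bergstra & Bengio's well-known result). A sketch with a placeholder objective; a real version would train and evaluate a network inside validation_loss:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_config():
    return {
        "lr": 10 ** rng.uniform(-5, -1),       # sample learning rate on a log scale
        "hidden": int(rng.integers(32, 513)),  # hidden-layer width
        "dropout": rng.uniform(0.0, 0.5),
    }

def validation_loss(cfg):
    # Placeholder: train a network with cfg and return its validation loss.
    return (np.log10(cfg["lr"]) + 3.0) ** 2 + cfg["dropout"]

best = min((sample_config() for _ in range(50)), key=validation_loss)
```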

Page 31

Questions?

Twitter: @agibsonccc
Github: agibsonccc
LinkedIn: /in/agibsonccc
Email: [email protected] (combo breaker!)
Web: deeplearning4j.org