Smart Data Conference: DL4J and DataVec


Page 1: Smart Data Conference: DL4J and DataVec

skymind.io | deeplearning4j.org | gitter.im/deeplearning4j

DL4J and DataVec: Building Production-Class Deep Learning Workflows for the Enterprise

Josh Patterson / Director Field Org
Smart Data 2017 / San Francisco, CA

Page 2: Smart Data Conference: DL4J and DataVec

Josh Patterson

Director Field Engineering / Skymind
Co-Author: O’Reilly’s “Deep Learning: A Practitioner’s Approach”

Past:

Self-Organizing Mesh Networks / Meta-Heuristics Research

Smartgrid work / TVA + NERC

Principal Field Architect / Cloudera

Page 3: Smart Data Conference: DL4J and DataVec

Topics

• Deep Learning in Production for the Enterprise

• DL4J and DataVec

• Example Workflow: Modeling Sensor Data with RNNs

Page 4: Smart Data Conference: DL4J and DataVec

Deep Learning in Production

Page 5: Smart Data Conference: DL4J and DataVec

In Practice Deep Learning Is…

• Matching Input Data Type to Specific Architecture (Image -> Convolutional Network)

• Higher Parameter Counts and more Processing Power

• Moving from “Feature Engineering” to “Automated Feature Learning”

Page 6: Smart Data Conference: DL4J and DataVec

Perceptron

Page 7: Smart Data Conference: DL4J and DataVec

Classic Multi-Layer Perceptron Architecture

Page 8: Smart Data Conference: DL4J and DataVec

RNN Architectures

Standard supervised learning

Image captioning

Sentiment analysis

Video captioning, natural language translation

Part-of-speech tagging

Generative models for text

Page 9: Smart Data Conference: DL4J and DataVec

Visually Understanding RNN Architecture

Page 10: Smart Data Conference: DL4J and DataVec

Evolving the Artificial Neuron for RNNs

Page 11: Smart Data Conference: DL4J and DataVec

Convolutional Network Architecture

Page 12: Smart Data Conference: DL4J and DataVec

Automated Feature Learning

• Hand-coding features has long been standard practice in machine learning

• Deep Learning got smart about matching architectures to data types

• Going forward, hand-coded features will be considered the “technical debt of machine learning”

Page 13: Smart Data Conference: DL4J and DataVec

Quick Usage Guide

• If I have Timeseries or Audio Input: Use a Recurrent Neural Network

• If I have Image input: Use a Convolutional Neural Network

• If I have Video input: Use a hybrid Convolutional + Recurrent Architecture!

• Applications in NLP: Word2Vec + variants (see the sketch below)
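
As a concrete illustration of the Word2Vec bullet, here is a minimal DL4J Word2Vec sketch (not from the slides; the corpus path and hyperparameters are placeholder assumptions):

import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

// Stream sentences from a plain-text corpus, one sentence per line (path is hypothetical)
SentenceIterator sentences = new BasicLineIterator("raw_sentences.txt");
TokenizerFactory tokenizer = new DefaultTokenizerFactory();

Word2Vec vec = new Word2Vec.Builder()
        .minWordFrequency(5)      // ignore words seen fewer than 5 times
        .layerSize(100)           // embedding dimensionality
        .windowSize(5)            // context window size
        .iterate(sentences)
        .tokenizerFactory(tokenizer)
        .build();
vec.fit();

// Query the learned embedding space
System.out.println(vec.wordsNearest("day", 10));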

Page 14: Smart Data Conference: DL4J and DataVec

The Challenge of the Fortune 500

Take a business problem and translate it into a productizable solution

• Get data together

• Understand modeling, pull together expertise

Get the right data workflow / infrastructure architecture to productionize the application

• Security

• Integration

Page 15: Smart Data Conference: DL4J and DataVec

“Google is living a few years in the future and sending the rest of us messages”

-- Doug Cutting in 2013

However, most organizations are not built like Google

(and Jeff Dean does not work at your company…)

Anyone building Next-Gen infrastructure has to consider these things

Page 16: Smart Data Conference: DL4J and DataVec

Production Considerations

• Security – even though I can build a model, will IT let me run it?

• Data Warehouse Integration – can I easily run this in the existing IT footprint?

• Speedup – once I need to go faster, how hard is it to speed up modeling?

Page 17: Smart Data Conference: DL4J and DataVec

DL4J and DataVec

Page 18: Smart Data Conference: DL4J and DataVec

DL4J and DataVec

• DL4J – ASF 2.0 Licensed JVM Platform for Enterprise Deep Learning

• DataVec – a tool for machine learning ETL (Extract, Transform, Load) operations (a transform sketch follows below)

• Both run natively on Spark, with CPU or GPU backends

• DL4J Suite certified on CDH5, HDP2.4, and the upcoming IBM IOP platform
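
To make the DataVec bullet concrete, here is a minimal transform sketch (not from the slides; the schema and column names are invented for illustration):

import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;

// Hypothetical schema for a raw CSV of sensor readings
Schema schema = new Schema.Builder()
        .addColumnString("sensor_id")
        .addColumnCategorical("status", "OK", "FAULT")
        .addColumnDouble("reading")
        .build();

// Declarative ETL: drop the ID column, encode the categorical column as an integer
TransformProcess tp = new TransformProcess.Builder(schema)
        .removeColumns("sensor_id")
        .categoricalToInteger("status")
        .build();

The same TransformProcess definition can then be executed locally or on Spark (via the datavec-spark module), which is what lets the ETL step scale with the rest of the toolchain.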

Page 19: Smart Data Conference: DL4J and DataVec

ND4J: The Need for Speed

JavaCPP
• Auto-generates JNI bindings for C++
• Allows for easy maintenance and deployment of C++ binaries in Java

CPU Backends
• OpenMP (multithreading within native operations)
• OpenBLAS or MKL (BLAS operations)
• SIMD extensions

GPU Backends
• DL4J supports CUDA 7.5 (+cuBLAS) at the moment, and will support 8.0 as soon as it comes out
• Leverages cuDNN as well

https://github.com/deeplearning4j/dl4j-benchmark
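
For context, a minimal ND4J usage sketch (not from the slides). The same code runs on CPU or GPU depending only on which backend artifact is on the classpath:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray a = Nd4j.rand(2, 3);    // 2x3 matrix of uniform random values
INDArray b = Nd4j.rand(3, 2);
INDArray c = a.mmul(b);          // BLAS-backed matrix multiply on the active backend
System.out.println(c.add(1.0)); // elementwise add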

Page 20: Smart Data Conference: DL4J and DataVec

Prepping Data is Time Consuming

http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#633ea7f67f75

Page 21: Smart Data Conference: DL4J and DataVec

Preparing Data for Modeling is Hard

Page 22: Smart Data Conference: DL4J and DataVec

DL4J Workflow Toolchain

ETL (DataVec) → Vectorization (DataVec) → Modeling (DL4J) → Evaluation (Arbiter)

Execution Platforms: Spark/Hadoop, Single Machine

ND4J - Linear Algebra Runtime: CPU, GPU

Page 23: Smart Data Conference: DL4J and DataVec

Model Import

• Import models from: Keras

• Keras imports models from: TensorFlow, Caffe, etc.

• Example: Import VGGNet16 (see the sketch after this list)

• Allows integration engineers to work with pre-built models
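
A minimal import sketch along these lines (not from the slides; the file names are hypothetical and would come from Keras’s model.to_json() and model.save_weights()):

import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;

// Import a functional-API Keras model such as VGG16 as a DL4J ComputationGraph
ComputationGraph vgg16 = KerasModelImport.importKerasModelAndWeights(
        "vgg16_config.json", "vgg16_weights.h5");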

Page 24: Smart Data Conference: DL4J and DataVec

Coming Soon: DL4J as Keras Backend

• Allows data scientists to run Python Keras code and then execute it on DL4J

• Sets up ability to run Keras jobs on Spark + Hadoop, securely

• Gives Python data scientists a better path to a production-class environment in the enterprise

Page 25: Smart Data Conference: DL4J and DataVec

Modeling Sensor Data with RNNs and DL4J

Page 26: Smart Data Conference: DL4J and DataVec

NERC Sensor Data Collection
openPDC PMU Data Collection circa 2009

• 120 Sensors
• 30 samples/second
• 4.3B Samples/day
• Housed in Hadoop

Page 27: Smart Data Conference: DL4J and DataVec

Classifying UCI Sensor Data: Trends

A – Downward Trend
B – Cyclic
C – Normal
D – Upward Shift
E – Upward Trend
F – Downward Shift

Page 28: Smart Data Conference: DL4J and DataVec

Loading and Transforming Timeseries Data with DataVec

SequenceRecordReader trainFeatures = new CSVSequenceRecordReader();
trainFeatures.initialize(new NumberedFileInputSplit(featuresDirTrain.getAbsolutePath() + "/%d.csv", 0, 449));
SequenceRecordReader trainLabels = new CSVSequenceRecordReader();
trainLabels.initialize(new NumberedFileInputSplit(labelsDirTrain.getAbsolutePath() + "/%d.csv", 0, 449));

int minibatch = 10;
int numLabelClasses = 6;
DataSetIterator trainData = new SequenceRecordReaderDataSetIterator(trainFeatures, trainLabels,
        minibatch, numLabelClasses, false, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

//Normalize the training data
DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(trainData);              //Collect training data statistics
trainData.reset();
trainData.setPreProcessor(normalizer);  //Use previously collected statistics to normalize on-the-fly

Page 29: Smart Data Conference: DL4J and DataVec

Configuring a Recurrent Neural Network with DL4J

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
        .updater(Updater.NESTEROVS).momentum(0.9).learningRate(0.005)
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
        .gradientNormalizationThreshold(0.5)
        .list()
        .layer(0, new GravesLSTM.Builder().activation("tanh").nIn(1).nOut(10).build())
        .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation("softmax").nIn(10).nOut(numLabelClasses).build())
        .pretrain(false).backprop(true).build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();

Page 30: Smart Data Conference: DL4J and DataVec

Train the Network on Local Machine

int nEpochs = 40;
String str = "Test set evaluation at epoch %d: Accuracy = %.2f, F1 = %.2f";

for (int i = 0; i < nEpochs; i++) {
    net.fit(trainData);

    //Evaluate on the test set:
    Evaluation evaluation = net.evaluate(testData);
    System.out.println(String.format(str, i, evaluation.accuracy(), evaluation.f1()));

    testData.reset();
    trainData.reset();
}

Page 31: Smart Data Conference: DL4J and DataVec

Train the Network on Spark

TrainingMaster tm = new ParameterAveragingTrainingMaster(true, executors_count, 1, batchSizePerWorker, 1, 0);

//Create Spark multi-layer network from configuration
SparkDl4jMultiLayer sparkNetwork = new SparkDl4jMultiLayer(sc, net, tm);

int nEpochs = 40;
String str = "Test set evaluation at epoch %d: Accuracy = %.2f, F1 = %.2f";

for (int i = 0; i < nEpochs; i++) {
    sparkNetwork.fit(trainDataRDD);

    //Evaluate on the test set:
    Evaluation evaluation = net.evaluate(testData);
    System.out.println(String.format(str, i, evaluation.accuracy(), evaluation.f1()));

    testData.reset();
    trainData.reset();
}
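
The Spark snippet above assumes an existing JavaSparkContext sc and a JavaRDD<DataSet> trainDataRDD. One minimal way to build that RDD from the DataSetIterator on the earlier slide (an assumption, not shown in the talk):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.nd4j.linalg.dataset.DataSet;

// Drain the local iterator into a list, then distribute it across the cluster
List<DataSet> trainDataList = new ArrayList<>();
while (trainData.hasNext()) {
    trainDataList.add(trainData.next());
}
JavaRDD<DataSet> trainDataRDD = sc.parallelize(trainDataList);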

Page 32: Smart Data Conference: DL4J and DataVec

Modeling Character Data with RNNs (LSTMs) and DL4J
Generating Beer Reviews

Page 33: Smart Data Conference: DL4J and DataVec

Loading and Vectorizing Data with DataVec

Text: Pours a nice golden…

Category: Lager
Appearance: 4.0
Taste: 4.5
Palate: 3.0
Aroma: 3.5

• Characters: one-hot vector over vocabulary
• Categories: one-hot vector over beers
• Ratings: score (we actually rescale)

Replicate static inputs at every step:

t        1  2  3  4  5  6  7  8  9 10 11 12  …
a        0  0  0  0  0  0  1  0  0  0  0  0  …
c        0  0  0  0  0  0  0  0  0  0  1  0  …
o        0  1  0  0  0  0  0  0  0  0  0  0  …
r        0  0  0  1  0  0  0  0  0  0  0  0  …
(space)  0  0  0  0  0  1  0  1  0  0  0  0  …
…        …  …  …  …  …  …  …  …  …  …  …  …  …
Lager    1  1  1  1  1  1  1  1  1  1  1  1  …
Porter   0  0  0  0  0  0  0  0  0  0  0  0  …
…        …  …  …  …  …  …  …  …  …  …  …  …  …
Appear   4  4  4  4  4  4  4  4  4  4  4  4  …
Palate   3  3  3  3  3  3  3  3  3  3  3  3  …
…        …  …  …  …  …  …  …  …  …  …  …  …  …

Page 34: Smart Data Conference: DL4J and DataVec

Vectorizing JSON Beer Reviews

INDArray input = Nd4j.zeros(new int[]{ reviews.size(), inputColumnCount, maxLength });
INDArray targets = Nd4j.zeros(new int[]{ reviews.size(), outputColumnCount, maxLength });
INDArray mask = Nd4j.zeros(new int[]{ reviews.size(), maxLength });

/* iterate over samples in miniBatch, look up style index, etc. */

char currChar = STOPWORD;
int currCharIdx = convertCharacterToIndex(currChar);
for (int j = 0; j < reviewChars.length; j++) {
    char nextChar = reviewChars[j];
    int nextCharIdx = convertCharacterToIndex(nextChar);
    input.putScalar(new int[]{ mbIdx, currCharIdx, j }, 1);
    input.putScalar(new int[]{ mbIdx, ratingOffset, j }, review.overall);
    input.putScalar(new int[]{ mbIdx, ratingOffset + 1, j }, review.appearance);
    input.putScalar(new int[]{ mbIdx, ratingOffset + 2, j }, review.aroma);
    input.putScalar(new int[]{ mbIdx, ratingOffset + 3, j }, review.palate);
    input.putScalar(new int[]{ mbIdx, ratingOffset + 4, j }, review.taste);
    input.putScalar(new int[]{ mbIdx, styleIndexColumn, j }, 1);
    mask.putScalar(new int[]{ mbIdx, j }, 1);
    targets.putScalar(new int[]{ mbIdx, nextCharIdx, j }, 1);
    currChar = nextChar;
    currCharIdx = nextCharIdx;
}

/* ... */
return new DataSet(input, targets, mask, mask); //features, labels, feature mask, label mask

Page 35: Smart Data Conference: DL4J and DataVec

Setting Up LSTM Architecture

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(rngSeed)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).learningRate(0.1)
        .iterations(1)
        .updater(Updater.RMSPROP).rmsDecay(0.95)
        .regularization(true).l2(0.001)
        .weightInit(WeightInit.XAVIER)
        .list()
        .layer(0, new GravesLSTM.Builder().nIn(nIn).nOut(lstmLayerSize).activation("tanh").build())
        .layer(1, new GravesLSTM.Builder().nOut(lstmLayerSize).activation("tanh").build())
        .layer(2, new RnnOutputLayer.Builder(LossFunction.MCXENT)
                .activation("softmax").nOut(nOut).build())
        .backpropType(BackpropType.TruncatedBPTT)
        .tBPTTForwardLength(tbpttLength)
        .tBPTTBackwardLength(tbpttLength)
        .pretrain(false)
        .backprop(true)
        .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();

Optimization: SGD with RMSProp (NOTE: can be set on a per-layer basis)

Weight initialization and regularization: L2 weight decay (again, can be set per layer)

Hidden layers: 2 x Graves-style LSTM layers

Output layer: plain dense layer with softmax activation

Loss function: cross entropy (KL divergence between character distributions: neural net vs. empirical)

RNN-specific config for truncated backprop-through-time
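
Written out (standard definition, not from the slides): with one-hot next-character targets y, softmax outputs ŷ, vocabulary size K, and sequence length T, the MCXENT loss is

\mathcal{L} = -\sum_{t=1}^{T} \sum_{k=1}^{K} y_{t,k} \log \hat{y}_{t,k}

Minimizing this is equivalent to minimizing the KL divergence between the empirical character distribution and the network’s predicted distribution, as the annotation above notes.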

Page 36: Smart Data Conference: DL4J and DataVec

Training Our LSTM

for (int epoch = 0; epoch < numEpochs; epoch++) {
    net.fit(trainData);
    /* Save model, print logging messages, etc. */

    /* Compute held-out data performance. */
    double cost = 0;
    double count = 0;
    while (heldoutData.hasNext()) {
        DataSet minibatch = heldoutData.next();
        cost += net.scoreExamples(minibatch, false).sumNumber().doubleValue();
        count += minibatch.getLabelsMaskArray().sumNumber().doubleValue();
    }
    log.info(String.format("Epoch %4d test set average cost: %.4f", epoch, cost / count));

    /* Reset dataset iterators. */
    trainData.reset();
    heldoutData.reset();
}

Compute performance on held-out data.

Training. fit can be applied to DataSetIterator, DataSet, INDArray, etc.

Page 37: Smart Data Conference: DL4J and DataVec

Generating Beer Reviews from the LSTM Model

INDArray input = Nd4j.zeros(new int[]{ iter.inputColumns() });

/* Load static data into vector. */

StringBuilder sb = new StringBuilder();
int prevCharIdx = 0;
int currCharIdx = 0;
while (true) {
    input.putScalar(prevCharIdx, 0);
    input.putScalar(currCharIdx, 1);
    INDArray output = net.rnnTimeStep(input);
    double[] outputProbDistribution = new double[numCharacters];
    for (int j = 0; j < outputProbDistribution.length; j++)
        outputProbDistribution[j] = output.getDouble(j);
    prevCharIdx = currCharIdx;
    currCharIdx = sampleFromDistribution(outputProbDistribution, rng);
    sb.append(convertIndexToCharacter(currCharIdx));
    if (currCharIdx == convertCharacterToIndex(STOPWORD)) break;
}
String reviewSample = sb.toString();

Load input vector for single step.

Get probability distribution over next character by running RNN for one step.

Sample character from probability distribution.

Stop if we generate STOPWORD.
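
The helper sampleFromDistribution is referenced but not shown on the slides; a minimal sketch of one way to implement it (inverse-CDF sampling over a discrete distribution):

import java.util.Random;

// Hypothetical helper: draw an index from a discrete probability distribution
// by accumulating the CDF until it exceeds a uniform random draw.
static int sampleFromDistribution(double[] distribution, Random rng) {
    double d = rng.nextDouble();
    double cumulative = 0.0;
    for (int i = 0; i < distribution.length; i++) {
        cumulative += distribution[i];
        if (d <= cumulative) {
            return i;
        }
    }
    return distribution.length - 1; // guard against floating-point rounding
}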

Page 38: Smart Data Conference: DL4J and DataVec

A Generated Beer Review…

Page 39: Smart Data Conference: DL4J and DataVec

More Resources

• DL4J Github:

• https://github.com/deeplearning4j/deeplearning4j

• DataVec Github:

• https://github.com/deeplearning4j/DataVec

• Examples from this talk:

• https://github.com/deeplearning4j/dl4j-examples

Page 40: Smart Data Conference: DL4J and DataVec

Thank you!

Please visit skymind.io/learn for more information

OR

Visit us at booth P33