Smart Data Conference: DL4J and DataVec


skymind.io | deeplearning4j.org | gitter.im/deeplearning4j

DL4J and DataVec: Building Production-Class Deep Learning Workflows for the Enterprise

Josh Patterson / Director, Field Org

Smart Data 2017 / San Francisco, CA

Josh Patterson

Director, Field Engineering / Skymind

Co-Author: O’Reilly’s “Deep Learning: A Practitioner’s Approach”

Past:

Self-Organizing Mesh Networks / Meta-Heuristics Research

Smartgrid work / TVA + NERC

Principal Field Architect / Cloudera

Topics

• Deep Learning in Production for the Enterprise

• DL4J and DataVec

• Example Workflow: Modeling Sensor Data with RNNs

Deep Learning in Production

In Practice Deep Learning Is…

• Matching Input Data Type to Specific Architecture (Image -> Convolutional Network)

• Higher Parameter Counts and more Processing Power

• Moving from “Feature Engineering” to “Automated Feature Learning”

Perceptron

Classic Multi-Layer Perceptron Architecture

RNN Architectures

Standard supervised learning

Image captioning

Sentiment analysis

Video captioning, natural language translation

Part-of-speech tagging

Generative models for text

Visually Understanding RNN Architecture

Evolving the Artificial Neuron for RNNs

Convolutional Network Architecture

Automated Feature Learning

• Hand-coding features has long been standard operation in machine learning

• Deep Learning got smart about matching architectures to data types

• Going forward, hand-coded features will be considered the “technical debt of machine learning”

Quick Usage Guide

• If I have Timeseries or Audio Input: Use a Recurrent Neural Network

• If I have Image input: Use a Convolutional Neural Network

• If I have Video input: Use a hybrid Convolutional + Recurrent Architecture!

• Applications in NLP: Word2Vec + variants
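The first rules in this guide map directly onto DL4J layer choices. A minimal sketch, assuming hypothetical size variables (numFeatures, lstmUnits, numChannels, numFilters) and the string-based activation API of the 0.x releases shown later in this deck:

import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.GravesLSTM;
import org.deeplearning4j.nn.conf.layers.Layer;

// Timeseries or audio input: a recurrent (LSTM) layer
Layer recurrent = new GravesLSTM.Builder()
        .nIn(numFeatures).nOut(lstmUnits).activation("tanh").build();

// Image input: a convolutional layer (5x5 kernels here)
Layer convolutional = new ConvolutionLayer.Builder(5, 5)
        .nIn(numChannels).nOut(numFilters).build();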

The Challenge of the Fortune 500

Take a business problem and translate it into a productizable solution

• Get data together

• Understand modeling, pull together expertise

Get the right data workflow / infrastructure architecture to productionize the application

• Security

• Integration

“Google is living a few years in the future and sending the rest of us messages”

-- Doug Cutting in 2013

However, most organizations are not built like Google

(and Jeff Dean does not work at your company…)

Anyone building Next-Gen infrastructure has to consider these things

Production Considerations

• Security – even though I can build a model, will IT let me run it?

• Data Warehouse Integration – can I easily run this in the existing IT footprint?

• Speedup – once I need to go faster, how hard is it to speed up modeling?

DL4J and DataVec

DL4J and DataVec

• DL4J – ASF 2.0 Licensed JVM Platform for Enterprise Deep Learning

• DataVec - a tool for machine learning ETL (Extract, Transform, Load) operations.

• Both run natively on Spark with CPU or GPU backends

• DL4J suite certified on CDH5, HDP 2.4, and the upcoming IBM IOP platform
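To make the DataVec side concrete, here is a minimal ETL sketch; the schema and column names are hypothetical, not from the talk:

import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;

// Describe the raw input (a hypothetical two-column sensor CSV)
Schema inputSchema = new Schema.Builder()
        .addColumnString("sensorId")
        .addColumnDouble("reading")
        .build();

// Define a transform pipeline: drop the ID, keep the numeric reading
TransformProcess tp = new TransformProcess.Builder(inputSchema)
        .removeColumns("sensorId")
        .build();

The resulting TransformProcess can then be executed locally or on Spark.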

ND4J: The Need for Speed

JavaCPP

• Auto-generates JNI bindings for C++
• Allows for easy maintenance and deployment of C++ binaries in Java

CPU Backends

• OpenMP (multithreading within native operations)
• OpenBLAS or MKL (BLAS operations)
• SIMD extensions

GPU Backends

• DL4J supports CUDA 7.5 (+cuBLAS) at the moment, and will support CUDA 8.0 as soon as it comes out
• Leverages cuDNN as well

https://github.com/deeplearning4j/dl4j-benchmark
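The backend is swapped by changing the ND4J dependency on the classpath; user code is unchanged either way. A minimal sketch of an ND4J operation that gets dispatched to native BLAS:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray a = Nd4j.rand(1024, 1024);
INDArray b = Nd4j.rand(1024, 1024);
INDArray c = a.mmul(b); // gemm via OpenBLAS/MKL on CPU, cuBLAS on GPU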

Prepping Data is Time Consuming

http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#633ea7f67f75

Preparing Data for Modeling is Hard

DL4J Workflow Toolchain

ETL (DataVec) → Vectorization (DataVec) → Modeling (DL4J) → Evaluation (Arbiter)

Execution Platforms: Spark/Hadoop, Single Machine

ND4J - Linear Algebra Runtime: CPU, GPU

Model Import

• Import models from: Keras

• Keras imports models from: TensorFlow, Caffe, etc.

• Example: Import VGGNet16

• Allows integration engineers to work with pre-built models
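A minimal sketch of the import call, assuming a Keras-saved HDF5 file (the file name here is hypothetical):

import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

// Import a sequential Keras model saved with model.save(...)
MultiLayerNetwork model =
        KerasModelImport.importKerasSequentialModelAndWeights("vgg16.h5");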

Coming Soon: DL4J as Keras Backend

• Allows Data Scientists to run Python Keras commands and then execute them on DL4J

• Sets up the ability to run Keras jobs on Spark + Hadoop, securely

• Gives Python Data Scientists a better path to a production-class environment in the Enterprise

Modeling Sensor Data with RNNs and DL4J

NERC Sensor Data Collection

openPDC PMU Data Collection circa 2009

• 120 Sensors
• 30 samples/second
• 4.3B Samples/day
• Housed in Hadoop

Classifying UCI Sensor Data: Trends

A – Downward Trend
B – Cyclic
C – Normal
D – Upward Shift
E – Upward Trend
F – Downward Shift

Loading and Transforming Timeseries Data with DataVec

SequenceRecordReader trainFeatures = new CSVSequenceRecordReader();
trainFeatures.initialize(new NumberedFileInputSplit(featuresDirTrain.getAbsolutePath() + "/%d.csv", 0, 449));
SequenceRecordReader trainLabels = new CSVSequenceRecordReader();
trainLabels.initialize(new NumberedFileInputSplit(labelsDirTrain.getAbsolutePath() + "/%d.csv", 0, 449));

int minibatch = 10;
int numLabelClasses = 6;
DataSetIterator trainData = new SequenceRecordReaderDataSetIterator(trainFeatures, trainLabels,
        minibatch, numLabelClasses, false, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

// Normalize the training data
DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(trainData); // Collect training data statistics

trainData.reset();
trainData.setPreProcessor(normalizer); // Use previously collected statistics to normalize on-the-fly
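The training loops that follow evaluate against a testData iterator that the slides never show being built. A minimal sketch, mirroring the training setup above (the test directory variables and the 0–149 file range are assumptions):

SequenceRecordReader testFeatures = new CSVSequenceRecordReader();
testFeatures.initialize(new NumberedFileInputSplit(featuresDirTest.getAbsolutePath() + "/%d.csv", 0, 149));
SequenceRecordReader testLabels = new CSVSequenceRecordReader();
testLabels.initialize(new NumberedFileInputSplit(labelsDirTest.getAbsolutePath() + "/%d.csv", 0, 149));

DataSetIterator testData = new SequenceRecordReaderDataSetIterator(testFeatures, testLabels,
        minibatch, numLabelClasses, false, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
testData.setPreProcessor(normalizer); // reuse the statistics fit on the training set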

Configuring a Recurrent Neural Network with DL4J

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
    .updater(Updater.NESTEROVS).momentum(0.9).learningRate(0.005)
    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
    .gradientNormalizationThreshold(0.5)
    .list()
    .layer(0, new GravesLSTM.Builder().activation("tanh").nIn(1).nOut(10).build())
    .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .activation("softmax").nIn(10).nOut(numLabelClasses).build())
    .pretrain(false).backprop(true).build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();

Train the Network on a Local Machine

int nEpochs = 40;
String str = "Test set evaluation at epoch %d: Accuracy = %.2f, F1 = %.2f";

for (int i = 0; i < nEpochs; i++) {
    net.fit(trainData);

    // Evaluate on the test set:
    Evaluation evaluation = net.evaluate(testData);
    System.out.println(String.format(str, i, evaluation.accuracy(), evaluation.f1()));

    testData.reset();
    trainData.reset();
}

Train the Network on Spark

TrainingMaster tm = new ParameterAveragingTrainingMaster(true, executors_count, 1, batchSizePerWorker, 1, 0);

// Create Spark multi-layer network from configuration
SparkDl4jMultiLayer sparkNetwork = new SparkDl4jMultiLayer(sc, net, tm);

int nEpochs = 40;
String str = "Test set evaluation at epoch %d: Accuracy = %.2f, F1 = %.2f";

for (int i = 0; i < nEpochs; i++) {
    sparkNetwork.fit(trainDataRDD);

    // Evaluate on the test set (locally, via the underlying network):
    Evaluation evaluation = net.evaluate(testData);
    System.out.println(String.format(str, i, evaluation.accuracy(), evaluation.f1()));

    testData.reset();
    trainData.reset();
}
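The Spark loop assumes a JavaRDD<DataSet> named trainDataRDD. A minimal sketch of building it from the local iterator (fine for a demo-sized dataset; larger data would be loaded in a distributed fashion):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.nd4j.linalg.dataset.DataSet;

List<DataSet> trainDataList = new ArrayList<>();
while (trainData.hasNext()) {
    trainDataList.add(trainData.next());
}
JavaRDD<DataSet> trainDataRDD = sc.parallelize(trainDataList);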

Modeling Character Data with RNNs (LSTMs) and DL4J

Generating Beer Reviews

Loading and Vectorizing Data with DataVec

Text: Pours a nice golden…

Category: Lager
Appearance: 4.0
Taste: 4.5
Palate: 3.0
Aroma: 3.5

• Characters: one-hot vector over vocabulary
• Categories: one-hot vector over beers
• Ratings: score (we actually rescale)

Replicate static inputs at every step

t        1  2  3  4  5  6  7  8  9  10 11 12 …
a        0  0  0  0  0  0  1  0  0  0  0  0  …
c        0  0  0  0  0  0  0  0  0  0  1  0  …
o        0  1  0  0  0  0  0  0  0  0  0  0  …
r        0  0  0  1  0  0  0  0  0  0  0  0  …
(space)  0  0  0  0  0  1  0  1  0  0  0  0  …
…
Lager    1  1  1  1  1  1  1  1  1  1  1  1  …
Porter   0  0  0  0  0  0  0  0  0  0  0  0  …
…
Appear   4  4  4  4  4  4  4  4  4  4  4  4  …
Palate   3  3  3  3  3  3  3  3  3  3  3  3  …
…

INDArray input = Nd4j.zeros(new int[]{ reviews.size(), inputColumnCount, maxLength });
INDArray targets = Nd4j.zeros(new int[]{ reviews.size(), outputColumnCount, maxLength });
INDArray mask = Nd4j.zeros(new int[]{ reviews.size(), maxLength });

/* iterate over samples in miniBatch, look up style index, etc. */

char currChar = STOPWORD;
int currCharIdx = convertCharacterToIndex(currChar);
for (int j = 0; j < reviewChars.length; j++) {
    char nextChar = reviewChars[j];
    int nextCharIdx = convertCharacterToIndex(nextChar);
    input.putScalar(new int[]{ mbIdx, currCharIdx, j }, 1);
    input.putScalar(new int[]{ mbIdx, ratingOffset, j }, review.overall);
    input.putScalar(new int[]{ mbIdx, ratingOffset + 1, j }, review.appearance);
    input.putScalar(new int[]{ mbIdx, ratingOffset + 2, j }, review.aroma);
    input.putScalar(new int[]{ mbIdx, ratingOffset + 3, j }, review.palate);
    input.putScalar(new int[]{ mbIdx, ratingOffset + 4, j }, review.taste);
    input.putScalar(new int[]{ mbIdx, styleIndexColumn, j }, 1);
    mask.putScalar(new int[]{ mbIdx, j }, 1);
    targets.putScalar(new int[]{ mbIdx, nextCharIdx, j }, 1);
    currChar = nextChar;
    currCharIdx = nextCharIdx;
}

/* ... */
return new DataSet(input, targets, mask, mask);

Vectorizing JSON Beer Reviews

Setting Up the LSTM Architecture

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(rngSeed)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).learningRate(0.1)
    .iterations(1)
    .updater(Updater.RMSPROP).rmsDecay(0.95)
    .regularization(true).l2(0.001)
    .weightInit(WeightInit.XAVIER)
    .list()
    .layer(0, new GravesLSTM.Builder().nIn(nIn).nOut(lstmLayerSize).activation("tanh").build())
    .layer(1, new GravesLSTM.Builder().nOut(lstmLayerSize).activation("tanh").build())
    .layer(2, new RnnOutputLayer.Builder(LossFunction.MCXENT)
        .activation("softmax").nOut(nOut).build())
    .backpropType(BackpropType.TruncatedBPTT)
    .tBPTTForwardLength(tbpttLength)
    .tBPTTBackwardLength(tbpttLength)
    .pretrain(false)
    .backprop(true)
    .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();

Optimization: SGD with RMSProp (NOTE: can be set on a per-layer basis)

Weight initialization and regularization: L2 weight decay (again, can be set per layer)

Hidden layers: 2 x Graves-style LSTM layers

Output layer: plain dense layer with softmax activation

Loss function: cross entropy (KL divergence between character distributions: neural net vs. empirical)

RNN-specific config for truncated backprop-through-time

Training Our LSTM

for (int epoch = 0; epoch < numEpochs; epoch++) {
    net.fit(trainData);
    /* Save model, print logging messages, etc. */

    /* Compute held-out data performance. */
    double cost = 0;
    double count = 0;
    while (heldoutData.hasNext()) {
        DataSet minibatch = heldoutData.next();
        cost += net.scoreExamples(minibatch, false).sumNumber().doubleValue();
        count += minibatch.getLabelsMaskArray().sumNumber().doubleValue();
    }
    log.info(String.format("Epoch %4d test set average cost: %.4f", epoch, cost / count));

    /* Reset dataset iterators. */
    trainData.reset();
    heldoutData.reset();
}

Compute performance on held-out data.

Training. fit can be applied to DataSetIterator, DataSet, INDArray, etc.

Generating Beer Reviews from the LSTM Model

INDArray input = Nd4j.zeros(new int[]{ iter.inputColumns() });

/* Load static data into vector. */

StringBuilder sb = new StringBuilder();
int prevCharIdx = 0;
int currCharIdx = 0;
while (true) {
    input.putScalar(prevCharIdx, 0);
    input.putScalar(currCharIdx, 1);
    INDArray output = net.rnnTimeStep(input);
    double[] outputProbDistribution = new double[numCharacters];
    for (int j = 0; j < outputProbDistribution.length; j++)
        outputProbDistribution[j] = output.getDouble(j);
    prevCharIdx = currCharIdx;
    currCharIdx = sampleFromDistribution(outputProbDistribution, rng);
    sb.append(convertIndexToCharacter(currCharIdx));
    if (currCharIdx == convertCharacterToIndex(STOPWORD)) break;
}
String reviewSample = sb.toString();

Load input vector for single step.

Get probability distribution over next character by running RNN for one step.

Sample character from probability distribution.

Stop if we generate STOPWORD.
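sampleFromDistribution is a helper the slides leave out. A minimal sketch using inverse-CDF sampling over the softmax output:

import java.util.Random;

static int sampleFromDistribution(double[] dist, Random rng) {
    double d = rng.nextDouble();
    double sum = 0.0;
    for (int i = 0; i < dist.length; i++) {
        sum += dist[i];
        if (d <= sum) return i;
    }
    // Floating-point round-off can leave the total slightly below 1.0
    return dist.length - 1;
}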

A Generated Beer Review…

More Resources

• DL4J GitHub:

• https://github.com/deeplearning4j/deeplearning4j

• DataVec GitHub:

• https://github.com/deeplearning4j/DataVec

• Examples from this talk:

• https://github.com/deeplearning4j/dl4j-examples

Thank you!

Please visit skymind.io/learn for more information

OR

Visit us at booth P33
