Smart Data Conference: DL4J and DataVec
![Page 1: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/1.jpg)
skymind.io | deeplearning.org | gitter.im/deeplearning4j
DL4J and DataVec
Building Production Class Deep Learning Workflows for the Enterprise
Josh Patterson / Director Field Org
Smart Data 2017 / San Francisco, CA
![Page 2: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/2.jpg)
Josh Patterson
Director Field Engineering / Skymind
Co-Author: O’Reilly’s “Deep Learning: A Practitioner’s Approach”
Past:
Self-Organizing Mesh Networks / Meta-Heuristics Research
Smartgrid work / TVA + NERC
Principal Field Architect / Cloudera
![Page 3: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/3.jpg)
Topics
• Deep Learning in Production for the Enterprise
• DL4J and DataVec
• Example Workflow: Modeling Sensor Data with RNNs
![Page 4: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/4.jpg)
Deep Learning in Production
![Page 5: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/5.jpg)
In Practice Deep Learning Is…
• Matching Input Data Type to Specific Architecture (Image -> Convolutional Network)
• Higher Parameter Counts and more Processing Power
• Moving from “Feature Engineering” to “Automated Feature Learning”
![Page 6: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/6.jpg)
Perceptron
![Page 7: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/7.jpg)
Classic Multi-Layer Perceptron Architecture
![Page 8: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/8.jpg)
RNN Architectures
Standard supervised learning
Image captioning
Sentiment analysis
Video captioning, natural language translation
Part-of-speech tagging
Generative models for text
![Page 9: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/9.jpg)
Visually Understanding RNN Architecture
![Page 10: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/10.jpg)
Evolving the Artificial Neuron for RNNs
![Page 11: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/11.jpg)
Convolutional Network Architecture
![Page 12: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/12.jpg)
Automated Feature Learning
• Hand-coding features has long been standard operation in machine learning
• Deep Learning got smart about matching architectures to data types
• Going forward, hand-coded features will be considered the “technical debt of machine learning”
![Page 13: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/13.jpg)
Quick Usage Guide
• If I have Timeseries or Audio Input: Use a Recurrent Neural Network
• If I have Image input: Use a Convolutional Neural Network
• If I have Video input: Use a hybrid Convolutional + Recurrent Architecture!
• Applications in NLP: Word2Vec + variants
![Page 14: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/14.jpg)
The Challenge of the Fortune 500
Take business problem and translate it into a product-izable solution
• Get data together
• Understand modeling, pull together expertise
Get the right data workflow / infra architecture to production-ize application
• Security
• Integration
![Page 15: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/15.jpg)
“Google is living a few years in the future and sending the rest of us messages”
-- Doug Cutting in 2013
However: most organizations are not built like Google
(and Jeff Dean does not work at your company…)
Anyone building Next-Gen infrastructure has to consider these things
![Page 16: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/16.jpg)
Production Considerations
• Security – even though I can build a model, will IT let me run it?
• Data Warehouse Integration – can I easily run this in the existing IT footprint?
• Speedup – once I need to go faster, how hard is it to speed up modeling?
![Page 17: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/17.jpg)
DL4J and DataVec
![Page 18: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/18.jpg)
DL4J and DataVec
• DL4J – Apache 2.0 licensed JVM platform for enterprise deep learning
• DataVec – a tool for machine learning ETL (Extract, Transform, Load) operations
• Both run natively on Spark, with CPU or GPU backends
• DL4J Suite certified on CDH5, HDP2.4, and upcoming IBM IOP platform.
![Page 19: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/19.jpg)
ND4J: The Need for Speed
JavaCPP
• Auto-generates JNI bindings for C++
• Allows for easy maintenance and deployment of C++ binaries in Java
CPU Backends
• OpenMP (multithreading within native operations)
• OpenBLAS or MKL (BLAS operations)
• SIMD extensions
GPU Backends
• DL4J supports CUDA 7.5 (+cuBLAS) at the moment, and will support 8.0 as soon as it comes out
• Leverages cuDNN as well
https://github.com/deeplearning4j/dl4j-benchmark
![Page 20: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/20.jpg)
Prepping Data is Time Consuming
http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#633ea7f67f75
![Page 21: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/21.jpg)
Preparing Data for Modeling is Hard
![Page 22: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/22.jpg)
DL4J Workflow Toolchain
ETL (DataVec)
Vectorization (DataVec)
Modeling (DL4J)
Evaluation (Arbiter)
Execution Platforms: Spark/Hadoop, Single Machine
ND4J - Linear Algebra Runtime: CPU, GPU
![Page 23: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/23.jpg)
Model Import
• Import models from: Keras
• Keras imports data from: TensorFlow, Caffe, etc
• Example: Import VGGNet16
• Allows integration engineers to work with pre-built models
![Page 24: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/24.jpg)
Coming Soon: DL4J as Keras Backend
• Allows data scientists to run Python Keras commands and then execute them on DL4J
• Sets up ability to run Keras jobs on Spark + Hadoop, securely
• Gives Python Data Scientists a better path to production class environment in the Enterprise
![Page 25: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/25.jpg)
Modeling Sensor Data with RNNs and DL4J
![Page 26: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/26.jpg)
NERC Sensor Data Collection
openPDC PMU Data Collection circa 2009
• 120 Sensors
• 30 samples/second
• 4.3B Samples/day
• Housed in Hadoop
![Page 27: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/27.jpg)
Classifying UCI Sensor Data: Trends
A – Downward Trend
B – Cyclic
C – Normal
D – Upward Shift
E – Upward Trend
F – Downward Shift
![Page 28: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/28.jpg)
Loading and Transforming Timeseries Data with DataVec

    SequenceRecordReader trainFeatures = new CSVSequenceRecordReader();
    trainFeatures.initialize(new NumberedFileInputSplit(featuresDirTrain.getAbsolutePath() + "/%d.csv", 0, 449));
    SequenceRecordReader trainLabels = new CSVSequenceRecordReader();
    trainLabels.initialize(new NumberedFileInputSplit(labelsDirTrain.getAbsolutePath() + "/%d.csv", 0, 449));

    int minibatch = 10;
    int numLabelClasses = 6;
    DataSetIterator trainData = new SequenceRecordReaderDataSetIterator(trainFeatures, trainLabels, minibatch,
        numLabelClasses, false, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

    // Normalize the training data
    DataNormalization normalizer = new NormalizerStandardize();
    normalizer.fit(trainData);              // Collect training data statistics

    trainData.reset();
    trainData.setPreProcessor(normalizer);  // Use previously collected statistics to normalize on-the-fly
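The fit-then-preprocess pattern above collects statistics once from the training data and then reuses them for every minibatch. As a rough illustration of that idea only, here is a plain-Java sketch for a single feature. It is not DL4J's NormalizerStandardize; the class name `StandardizeSketch` is hypothetical, and the use of the population standard deviation is an assumption made for the example.

```java
import java.util.Arrays;

/** Illustrative fit-then-transform standardization for one feature (hypothetical, not the DL4J class). */
final class StandardizeSketch {
    private double mean;
    private double std;

    /** Collect mean and standard deviation from the training values only. */
    void fit(double[] train) {
        double sum = 0.0;
        for (double v : train) sum += v;
        mean = sum / train.length;

        double squared = 0.0;
        for (double v : train) squared += (v - mean) * (v - mean);
        std = Math.sqrt(squared / train.length); // population std: an assumption of this sketch
        if (std == 0.0) std = 1.0;               // avoid dividing by zero for constant input
    }

    /** Apply the previously collected statistics to any data (train or test). */
    double[] transform(double[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (values[i] - mean) / std;
        }
        return out;
    }

    public static void main(String[] args) {
        StandardizeSketch normalizer = new StandardizeSketch();
        normalizer.fit(new double[]{2.0, 4.0, 6.0});
        System.out.println(Arrays.toString(normalizer.transform(new double[]{2.0, 4.0, 6.0})));
    }
}
```

The point of the split is that test data is transformed with the training statistics rather than its own.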
![Page 29: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/29.jpg)
Configuring a Recurrent Neural Network with DL4J

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
        .updater(Updater.NESTEROVS).momentum(0.9).learningRate(0.005)
        .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
        .gradientNormalizationThreshold(0.5)
        .list()
        .layer(0, new GravesLSTM.Builder().activation("tanh").nIn(1).nOut(10).build())
        .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
            .activation("softmax").nIn(10).nOut(numLabelClasses).build())
        .pretrain(false).backprop(true).build();

    MultiLayerNetwork net = new MultiLayerNetwork(conf);
    net.init();
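The configuration above selects ClipElementWiseAbsoluteValue with a threshold of 0.5. A minimal plain-Java sketch of what element-wise clipping means is shown here; it is an illustration, not DL4J's implementation, and the names `ClipSketch` and `clipElementWise` are hypothetical.

```java
import java.util.Arrays;

/** Illustrative element-wise gradient clipping (hypothetical helper, not DL4J code). */
final class ClipSketch {
    /** Limit every gradient element to the range [-threshold, threshold]. */
    static double[] clipElementWise(double[] gradient, double threshold) {
        double[] clipped = new double[gradient.length];
        for (int i = 0; i < gradient.length; i++) {
            clipped[i] = Math.max(-threshold, Math.min(threshold, gradient[i]));
        }
        return clipped;
    }

    public static void main(String[] args) {
        double[] gradient = {-2.0, 0.3, 0.7};
        System.out.println(Arrays.toString(clipElementWise(gradient, 0.5)));
    }
}
```

Clipping of this kind is commonly used with recurrent networks to keep occasional very large gradients from destabilizing training.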
![Page 30: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/30.jpg)
Train the Network on Local Machine

    int nEpochs = 40;
    String str = "Test set evaluation at epoch %d: Accuracy = %.2f, F1 = %.2f";

    for (int i = 0; i < nEpochs; i++) {
        net.fit(trainData);

        // Evaluate on the test set
        Evaluation evaluation = net.evaluate(testData);
        System.out.println(String.format(str, i, evaluation.accuracy(), evaluation.f1()));

        testData.reset();
        trainData.reset();
    }
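The loop above prints accuracy and F1 from the Evaluation object. For readers less familiar with these metrics, the sketch below computes them from a confusion matrix in plain Java. It uses macro-averaged F1, which is one common convention and an assumption here; how `Evaluation.f1()` aggregates across classes may differ. The class `EvalSketch` and its methods are hypothetical.

```java
/** Illustrative accuracy and macro-F1 from a confusion matrix (hypothetical, not DL4J's Evaluation). */
final class EvalSketch {
    /** cm[actual][predicted] holds example counts. */
    static double accuracy(int[][] cm) {
        long correct = 0;
        long total = 0;
        for (int a = 0; a < cm.length; a++) {
            for (int p = 0; p < cm[a].length; p++) {
                total += cm[a][p];
                if (a == p) correct += cm[a][p];
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    /** Average of per-class F1 scores (macro averaging is an assumption of this sketch). */
    static double macroF1(int[][] cm) {
        int n = cm.length;
        double sum = 0.0;
        for (int k = 0; k < n; k++) {
            long tp = cm[k][k];
            long fp = 0;
            long fn = 0;
            for (int j = 0; j < n; j++) {
                if (j != k) {
                    fp += cm[j][k]; // predicted k, actually j
                    fn += cm[k][j]; // actually k, predicted j
                }
            }
            double precision = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
            double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
            double f1 = (precision + recall == 0.0) ? 0.0 : 2 * precision * recall / (precision + recall);
            sum += f1;
        }
        return n == 0 ? 0.0 : sum / n;
    }

    public static void main(String[] args) {
        int[][] cm = {{2, 0}, {1, 1}};
        System.out.println(String.format("Accuracy = %.2f, F1 = %.2f", accuracy(cm), macroF1(cm)));
    }
}
```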
![Page 31: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/31.jpg)
Train the Network on Spark

    TrainingMaster tm = new ParameterAveragingTrainingMaster(true, executors_count, 1, batchSizePerWorker, 1, 0);

    // Create Spark multi layer network from configuration
    SparkDl4jMultiLayer sparkNetwork = new SparkDl4jMultiLayer(sc, net, tm);

    int nEpochs = 40;
    String str = "Test set evaluation at epoch %d: Accuracy = %.2f, F1 = %.2f";

    for (int i = 0; i < nEpochs; i++) {
        sparkNetwork.fit(trainDataRDD);

        // Evaluate on the test set
        Evaluation evaluation = net.evaluate(testData);
        System.out.println(String.format(str, i, evaluation.accuracy(), evaluation.f1()));

        testData.reset();
        trainData.reset();
    }
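The Spark version relies on a ParameterAveragingTrainingMaster. The core idea of parameter averaging, as the name suggests, is that each worker trains its own copy of the model on its share of the data and the copies are then combined by averaging. The plain-Java sketch below shows only that averaging step; it is not Spark or DL4J code, and `ParamAveragingSketch` is a hypothetical name.

```java
import java.util.Arrays;

/** Illustrative averaging of per-worker parameter vectors (hypothetical, not the DL4J Spark code). */
final class ParamAveragingSketch {
    /** workerParams[w] is the flattened parameter vector produced by worker w. */
    static double[] average(double[][] workerParams) {
        int numWorkers = workerParams.length;
        int numParams = workerParams[0].length;
        double[] averaged = new double[numParams];
        for (double[] params : workerParams) {
            for (int i = 0; i < numParams; i++) {
                averaged[i] += params[i];
            }
        }
        for (int i = 0; i < numParams; i++) {
            averaged[i] /= numWorkers;
        }
        return averaged;
    }

    public static void main(String[] args) {
        double[][] workerParams = {{1.0, 2.0}, {3.0, 6.0}};
        System.out.println(Arrays.toString(average(workerParams)));
    }
}
```

In a real cluster the averaged vector would be sent back to the workers before the next round of training.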
![Page 32: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/32.jpg)
Modeling Character Data with RNNs (LSTMs) and DL4J
Generating Beer Reviews
![Page 33: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/33.jpg)
Loading and Vectorizing Data with DataVec
Text: Pours a nice golden…
Category: Lager
Appearance: 4.0
Taste: 4.5
Palate: 3.0
Aroma: 3.5
• Characters: one-hot vector over vocabulary
• Categories: one-hot vector over beers
• Ratings: score (we actually rescale)
Replicate static inputs at every step
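
| t | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | … |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| a | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | … |
| c | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | … |
| o | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … |
| r | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … |
| (space) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | … |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| Lager | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | … |
| Porter | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| Appear | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | … |
| Palate | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | … |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … |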
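
The layout above (one-hot characters changing per time step, with the category and ratings repeated at every step) can be sketched in plain Java as follows. This is an illustration under assumptions: the vocabulary string, the two-style list, and the class name `CharVectorizerSketch` are hypothetical, and the ratings are inserted unscaled although the slide notes they are actually rescaled.

```java
/** Illustrative vectorizer: one-hot characters plus static inputs replicated per step (hypothetical). */
final class CharVectorizerSketch {
    /**
     * Returns a [features][time] matrix. Rows: one per vocabulary character,
     * then one per style, then one per rating.
     */
    static double[][] vectorize(String text, String vocab, int styleIdx, int numStyles, double[] ratings) {
        int rows = vocab.length() + numStyles + ratings.length;
        double[][] matrix = new double[rows][text.length()];
        for (int t = 0; t < text.length(); t++) {
            int charIdx = vocab.indexOf(text.charAt(t));
            if (charIdx >= 0) {
                matrix[charIdx][t] = 1.0;               // one-hot character at step t
            }
            matrix[vocab.length() + styleIdx][t] = 1.0; // same style at every step
            for (int r = 0; r < ratings.length; r++) {
                matrix[vocab.length() + numStyles + r][t] = ratings[r]; // same ratings at every step
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        String vocab = " Paceinorsu";                   // assumed toy vocabulary
        double[][] m = vectorize("Pours a nice", vocab, 0, 2, new double[]{4.0, 3.0});
        System.out.println(m.length + " feature rows x " + m[0].length + " time steps");
    }
}
```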
![Page 34: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/34.jpg)
Vectorizing JSON Beer Reviews

    INDArray input = Nd4j.zeros(new int[]{ reviews.size(), inputColumnCount, maxLength });
    INDArray targets = Nd4j.zeros(new int[]{ reviews.size(), outputColumnCount, maxLength });
    INDArray mask = Nd4j.zeros(new int[]{ reviews.size(), maxLength });

    /* iterate over samples in miniBatch, look up style index, etc. */

    char currChar = STOPWORD;
    int currCharIdx = convertCharacterToIndex(currChar);
    for (int j = 0; j < reviewChars.length; j++) {
        char nextChar = reviewChars[j];
        int nextCharIdx = convertCharacterToIndex(nextChar);
        input.putScalar(new int[]{ mbIdx, currCharIdx, j }, 1);                    // one-hot current character
        input.putScalar(new int[]{ mbIdx, ratingOffset, j }, review.overall);      // static ratings, repeated per step
        input.putScalar(new int[]{ mbIdx, ratingOffset + 1, j }, review.appearance);
        input.putScalar(new int[]{ mbIdx, ratingOffset + 2, j }, review.aroma);
        input.putScalar(new int[]{ mbIdx, ratingOffset + 3, j }, review.palate);
        input.putScalar(new int[]{ mbIdx, ratingOffset + 4, j }, review.taste);
        input.putScalar(new int[]{ mbIdx, styleIndexColumn, j }, 1);               // one-hot beer style
        mask.putScalar(new int[]{ mbIdx, j }, 1);                                  // mark step j as real data
        targets.putScalar(new int[]{ mbIdx, nextCharIdx, j }, 1);                  // label is the next character
        currChar = nextChar;
        currCharIdx = nextCharIdx;
    }

    /* ... */
    // labels and mask2 are not defined in the excerpt shown; they come from the elided code
    return new DataSet(input, labels, mask, mask2);
![Page 35: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/35.jpg)
Setting Up LSTM Architecture

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(rngSeed)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).learningRate(0.1)
        .iterations(1)
        .updater(Updater.RMSPROP).rmsDecay(0.95)
        .regularization(true).l2(0.001)
        .weightInit(WeightInit.XAVIER)
        .list()
        .layer(0, new GravesLSTM.Builder().nIn(nIn).nOut(lstmLayerSize).activation("tanh").build())
        .layer(1, new GravesLSTM.Builder().nOut(lstmLayerSize).activation("tanh").build())
        .layer(2, new RnnOutputLayer.Builder(LossFunction.MCXENT)
            .activation("softmax").nOut(nOut).build())
        .backpropType(BackpropType.TruncatedBPTT)
        .tBPTTForwardLength(tbpttLength)
        .tBPTTBackwardLength(tbpttLength)
        .pretrain(false)
        .backprop(true)
        .build();
    MultiLayerNetwork net = new MultiLayerNetwork(conf);
    net.init();

Optimization: SGD with RMSProp(NOTE: can be set on per layer basis)
Weight initialization and regularization: L2 weight decay(again, can be set per layer)
Hidden layers: 2 x Graves-style LSTM layers
Output layer: plain dense layer with softmax activation
Loss function: cross entropy (KL divergence between character distributions: neural net vs. empirical)
RNN-specific config for truncated backprop-through-time
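The output layer and loss described above (softmax activation with cross entropy against the observed next character) can be illustrated for a single time step in plain Java. This is a sketch of the standard formulas, not DL4J's internals; the class name `SoftmaxXentSketch` and the toy values are hypothetical.

```java
import java.util.Arrays;

/** Illustrative softmax and cross-entropy for one time step (hypothetical, not DL4J code). */
final class SoftmaxXentSketch {
    /** Turn raw scores into a probability distribution over characters. */
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);

        double[] probs = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            probs[i] = Math.exp(scores[i] - max); // subtract max for numerical stability
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }

    /** Cross entropy against a one-hot target: negative log probability of the true character. */
    static double crossEntropy(double[] probs, int targetIdx) {
        return -Math.log(probs[targetIdx]);
    }

    public static void main(String[] args) {
        double[] probs = softmax(new double[]{1.0, 2.0, 3.0});
        System.out.println(Arrays.toString(probs));
        System.out.println(crossEntropy(probs, 2));
    }
}
```

With a one-hot target, the cross entropy reduces to the negative log probability the network assigns to the character that actually came next.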
![Page 36: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/36.jpg)
Training Our LSTM

    for (int i = 0; i < numEpochs; i++) {
        net.fit(trainData);
        /* Save model, print logging messages, etc. */

        /* Compute held-out data performance. */
        double cost = 0;
        double count = 0;
        while (heldoutData.hasNext()) {
            DataSet minibatch = heldoutData.next();
            cost += net.scoreExamples(minibatch, false).sumNumber().doubleValue();
            count += minibatch.getLabelsMaskArray().sumNumber().doubleValue();
        }
        log.info(String.format("Epoch %4d test set average cost: %.4f", i, cost / count));

        /* Reset dataset iterators. */
        trainData.reset();
        heldoutData.reset();
    }

Compute performance on held-out data.
Training. fit can be applied to DataSetIterator, DataSet, INDArray, etc.
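The held-out evaluation above divides the summed cost by the sum of the label mask, so padded time steps do not count toward the average. A plain-Java sketch of that masked average follows. It assumes per-step costs are available as a matrix, which is a simplification of what the DL4J calls return, and `MaskedCostSketch` is a hypothetical name.

```java
/** Illustrative masked average cost per real time step (hypothetical, not DL4J code). */
final class MaskedCostSketch {
    /**
     * perStepCost[example][step] is the loss at each step;
     * mask[example][step] is 1 for real characters and 0 for padding.
     */
    static double averageCost(double[][] perStepCost, double[][] mask) {
        double cost = 0.0;
        double count = 0.0;
        for (int e = 0; e < perStepCost.length; e++) {
            for (int t = 0; t < perStepCost[e].length; t++) {
                cost += perStepCost[e][t] * mask[e][t]; // padded steps contribute nothing
                count += mask[e][t];
            }
        }
        return count == 0.0 ? 0.0 : cost / count;
    }

    public static void main(String[] args) {
        double[][] cost = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
        double[][] mask = {{1.0, 1.0, 0.0}, {1.0, 0.0, 0.0}};
        System.out.println(averageCost(cost, mask));
    }
}
```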
![Page 37: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/37.jpg)
Generating Beer Reviews from the LSTM Model

    INDArray input = Nd4j.zeros(new int[]{iter.inputColumns()});

    /* Load static data into vector. */

    StringBuilder sb = new StringBuilder();
    int prevCharIdx = 0;
    int currCharIdx = 0;
    while (true) {
        input.putScalar(prevCharIdx, 0);
        input.putScalar(currCharIdx, 1);
        INDArray output = net.rnnTimeStep(input);
        double[] outputProbDistribution = new double[numCharacters];
        for (int j = 0; j < outputProbDistribution.length; j++)
            outputProbDistribution[j] = output.getDouble(s, j); // s is not defined in the excerpt shown
        prevCharIdx = currCharIdx;
        currCharIdx = sampleFromDistribution(outputProbDistribution, rng);
        sb.append(convertIndexToCharacter(currCharIdx));
        if (currCharIdx == STOPWORD) break;
    }
    String reviewSample = sb.toString();

Load input vector for single step.
Get probability distribution over next character by running RNN for one step.
Sample character from probability distribution.
Stop if we generate STOPWORD.
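The generation loop calls a `sampleFromDistribution` helper that the slides do not show. One plausible way to write it, using inverse-CDF sampling over the cumulative probabilities, is sketched below. This is an assumption about how such a helper might look, not the code from the talk, and the wrapper class `SamplingSketch` is hypothetical.

```java
import java.util.Random;

/** Illustrative sampling of an index from a discrete distribution (assumed helper, not from the talk). */
final class SamplingSketch {
    /** probs is expected to be non-negative and to sum to roughly 1. */
    static int sampleFromDistribution(double[] probs, Random rng) {
        double r = rng.nextDouble();       // uniform in [0, 1)
        double cumulative = 0.0;
        for (int i = 0; i < probs.length; i++) {
            cumulative += probs[i];
            if (r < cumulative) {
                return i;                  // first index whose cumulative mass exceeds r
            }
        }
        return probs.length - 1;           // guard against floating-point rounding
    }

    public static void main(String[] args) {
        Random rng = new Random(12345);
        double[] probs = {0.1, 0.7, 0.2};
        int[] counts = new int[probs.length];
        for (int n = 0; n < 10000; n++) {
            counts[sampleFromDistribution(probs, rng)]++;
        }
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

Sampling, rather than always taking the most probable character, is what lets repeated runs produce different reviews.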
![Page 38: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/38.jpg)
A Generated Beer Review…
![Page 39: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/39.jpg)
More Resources
• DL4J GitHub:
• https://github.com/deeplearning4j/deeplearning4j
• DataVec GitHub:
• https://github.com/deeplearning4j/DataVec
• Examples from this talk:
• https://github.com/deeplearning4j/dl4j-examples
![Page 40: Smart Data Conference: DL4J and DataVec](https://reader031.vdocuments.net/reader031/viewer/2022013113/58ef6fc31a28ab0e3b8b45e5/html5/thumbnails/40.jpg)
Thank you!
Please visit skymind.io/learn for more information
OR
Visit us at booth P33