advanced spark deep learning

Deeplearning4j: Data Parallel deep learning on Spark

Upload: adam-gibson

Post on 21-Apr-2017

TRANSCRIPT

Page 1: Advanced spark deep learning

Deeplearning4j: Data Parallel deep learning on Spark

Page 2: Advanced spark deep learning

The JVM is too slow for numerical compute

Great at network I/O and data access

Great streaming infrastructure

Hardware acceleration required

Spark - data access layer

CUDA - compute layer

Page 3: Advanced spark deep learning

Current Landscape

Spark assumes columnar data

Binary (audio/images) is becoming more important

HDFS is great for storing blobs

SQL doesn’t work for pixels and audio frames

The ingredients are here for something great

Page 4: Advanced spark deep learning

The solution

JavaCPP (Cython for Java)

64-bit pointers for efficient contiguous access to image and audio data

Leverage Java’s distributed systems ecosystem

Add a new numerical compute layer (libnd4j)

Allow for heterogeneous compute

Off heap memory

Easy deployment

Data pipelines as a first concern
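
To make the off-heap memory bullet concrete, here is a minimal sketch using only the JDK's direct buffers. It is a stand-in, not the JavaCPP or libnd4j mechanism: the class name OffHeapPixels and its methods are hypothetical. It shows pixel data living in one contiguous block outside the garbage-collected heap, which is the kind of memory native code can read without copying.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper, for illustration only.
final class OffHeapPixels {
    /** Copies pixel values into a direct (off-heap) buffer in native byte order. */
    static FloatBuffer toOffHeap(float[] pixels) {
        FloatBuffer buf = ByteBuffer
                .allocateDirect(pixels.length * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        buf.put(pixels);
        buf.flip(); // limit = number of floats written, position = 0
        return buf;
    }

    /** Sums the buffer contents with one sequential pass over contiguous memory. */
    static double sum(FloatBuffer buf) {
        double total = 0.0;
        for (int i = 0; i < buf.limit(); i++) {
            total += buf.get(i);
        }
        return total;
    }
}
```

Note that java.nio buffers are indexed with int, so a single buffer tops out at about 2 GB. That limit is one reason the slide lists 64-bit pointers separately.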

Page 5: Advanced spark deep learning

SKIL (Skymind Intelligence Layer)

Page 6: Advanced spark deep learning

JavaCPP

Auto-generates JNI bindings for C++ by parsing classes

Allows for easy maintenance and deployment of C++ binaries in Java

Write efficient ETL pipelines for images via OpenCV (JavaCV)

Integrate other C++ deep learning frameworks (TensorFlow, Caffe, ...)

Allows for productionization of fast (but academic) C++ code using Java (Kafka, Spark) for ETL

64-bit pointers (wasn’t possible before)
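
Why 64-bit pointers matter for image batches can be shown with plain arithmetic. The sketch below is not JavaCPP code. The class BatchOffsets is hypothetical, and the batch layout (float pixels, images stored back to back) is an assumption. It computes the byte offset of one image in a batch twice, once with 32-bit and once with 64-bit integers.

```java
// Hypothetical helper, for illustration only.
final class BatchOffsets {
    /** Byte offset of image `index` using 32-bit arithmetic (silently wraps on overflow). */
    static int offset32(int index, int channels, int height, int width) {
        return index * channels * height * width * Float.BYTES;
    }

    /** The same offset using 64-bit arithmetic. */
    static long offset64(long index, long channels, long height, long width) {
        return index * channels * height * width * Float.BYTES;
    }
}
```

For image 10,000 in a batch of 3 x 224 x 224 float images, the true offset is 6,021,120,000 bytes, which is larger than Integer.MAX_VALUE. The 32-bit version therefore returns a wrong value, while the 64-bit version addresses the data correctly.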

Page 7: Advanced spark deep learning

“Actual” Streaming frameworks

Kafka

Flink

Spark Streaming

Apex

Page 8: Advanced spark deep learning
Page 9: Advanced spark deep learning

Nd4j

Heterogeneous codebase

Supports CUDA and x86, with POWER coming soon

Shared indexing logic for writing ndarray routines

Memory management in Java (even CUDA memory!)

OpenMP on CPU, plus routines for common operations such as reduce

Pinned memory and async operations

JIT allocation

Spark friendly (runs on multiple threads and devices)
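
The "shared indexing logic" idea can be sketched in plain Java. This is not the ND4J or libnd4j implementation. The class StridedIndex is hypothetical and assumes row-major (C-order) layout. The point is that one stride and offset routine can serve every ndarray operation, including a reduce, whatever backend holds the flat buffer.

```java
// Hypothetical helper, for illustration only; assumes row-major layout.
final class StridedIndex {
    /** Row-major strides, in elements, for the given shape. */
    static long[] strides(int[] shape) {
        long[] strides = new long[shape.length];
        long step = 1;
        for (int d = shape.length - 1; d >= 0; d--) {
            strides[d] = step;
            step *= shape[d];
        }
        return strides;
    }

    /** Linear offset of a multi-dimensional index into the flat buffer. */
    static long offset(long[] strides, int[] index) {
        long off = 0;
        for (int d = 0; d < index.length; d++) {
            off += strides[d] * index[d];
        }
        return off;
    }

    /** Sum reduction along the last dimension of a 2-D array stored flat. */
    static double[] sumRows(double[] data, int rows, int cols) {
        long[] s = strides(new int[] {rows, cols});
        double[] out = new double[rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[r] += data[(int) offset(s, new int[] {r, c})];
            }
        }
        return out;
    }
}
```

For a shape of {2, 3, 4} the strides are {12, 4, 1}, so the element at index (1, 2, 3) sits at flat position 12 + 8 + 3 = 23.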

Page 10: Advanced spark deep learning

Deployment

Juju

Runs as a Spark job

Easy to embed in production

Page 11: Advanced spark deep learning

Canova

One interface for ETL

Integrates with Spark

Easy to extend to write your own custom data pipelines

One interface for generating NDArrays
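
The "one interface for ETL" idea might look roughly like the sketch below. It does not use Canova's actual API. The Vectorizer interface and both implementations are hypothetical, and a plain float[] stands in for an NDArray. It shows how text records and raw image bytes can both end up behind the same call.

```java
// Hypothetical interface, for illustration only; not part of Canova.
interface Vectorizer<T> {
    /** Turns one raw record into a flat numeric vector. */
    float[] vectorize(T record);
}

/** Parses one comma-separated line of numbers. */
final class CsvVectorizer implements Vectorizer<String> {
    @Override
    public float[] vectorize(String line) {
        String[] fields = line.split(",");
        float[] out = new float[fields.length];
        for (int i = 0; i < fields.length; i++) {
            out[i] = Float.parseFloat(fields[i].trim());
        }
        return out;
    }
}

/** Scales unsigned 8-bit grayscale pixels into the range [0, 1]. */
final class GrayscaleVectorizer implements Vectorizer<byte[]> {
    @Override
    public float[] vectorize(byte[] pixels) {
        float[] out = new float[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = (pixels[i] & 0xFF) / 255.0f; // mask to read the byte as unsigned
        }
        return out;
    }
}
```

A custom pipeline would then be one more implementation of the same interface, which is the extension point the slide describes.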

Page 12: Advanced spark deep learning

Conclusion

Built to be friendly to the JVM ecosystem

Allows Java to do what it’s good at

NumPy in Java makes it easy to port things like scikit-learn

Data parallelism means the commodity hardware the JVM assumes works
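
One common form of data parallelism is parameter averaging: each worker trains on its own shard of the data, and the resulting weights are averaged. The slides do not spell out the exact scheme, so the sketch below is an assumption-laden toy. The class ParameterAveraging is hypothetical, the model is a single weight fitting y = w * x, and a parallel stream stands in for Spark executors.

```java
import java.util.Arrays;

// Hypothetical toy example; a parallel stream stands in for Spark workers.
final class ParameterAveraging {
    /** One gradient-descent step on a single shard, minimizing mean squared error of y = w * x. */
    static double localStep(double w, double[][] shard, double lr) {
        double grad = 0.0;
        for (double[] xy : shard) {
            grad += 2.0 * xy[0] * (w * xy[0] - xy[1]);
        }
        return w - lr * grad / shard.length;
    }

    /** Each round, every shard steps from the shared weight, then the results are averaged. */
    static double train(double[][][] shards, double lr, int rounds) {
        double w = 0.0;
        for (int r = 0; r < rounds; r++) {
            final double shared = w;
            w = Arrays.stream(shards)
                    .parallel()
                    .mapToDouble(shard -> localStep(shared, shard, lr))
                    .average()
                    .orElse(shared);
        }
        return w;
    }
}
```

With two shards drawn from y = 2x, a learning rate of 0.01 and 50 rounds, the averaged weight converges towards 2. No worker ever needs to see the other worker's data, only the shared weight.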

Page 13: Advanced spark deep learning

Future

Model Parallelism

OpenCL

Sparse support

Reinforcement learning

Page 14: Advanced spark deep learning

Questions?

[email protected]