data science popup austin: making data science fast: survey of gpu accelerated tools

DATA SCIENCE POP-UP

AUSTIN

Making Data Science FAST: Survey of GPU Accelerated Tools

Mazhar Memon, Co-Founder and CEO, Bitfusion.io

#datapopupaustin

April 13, 2016, Galvanize, Austin Campus

http://www.dominodatalab.com

MAZHAR MEMON

CTO, BITFUSION.IO

Overview

• Overview of GPUs

• Drop-in Libraries

• Programming Frameworks

• Deep Learning

• Graph Databases

• Visualization

The problem in computing

Figure: hardware and software over time (axis: Time →), on a scale running between "abstract and slow" and "complex and fast".

The big gap: making your data science fast!

Integrated GPUs

• Architecture: SIMD, shared resource architecture

• Targeted workloads: Medium-sized offloads, latency-sensitive, cost-sensitive, media

• Programming models: OpenCL, DirectCompute, C++ AMP, SPIR, HSAIL

• Ecosystem maturity: High

• Links: https://software.intel.com/en-us/articles/intel-graphics-developers-guides
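To make the SIMD term in the architecture bullet concrete, here is a minimal pure-Python emulation. It assumes a hypothetical lane width of 4, chosen only for illustration; real hardware widths vary, and the function name `simd_add` is not taken from any library.

```python
LANES = 4  # hypothetical SIMD width, chosen only for illustration


def simd_add(a, b, lanes=LANES):
    """Add two equal-length lists, counting one 'vector instruction' per chunk."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = [0] * len(a)
    instructions = 0
    for base in range(0, len(a), lanes):
        # One instruction covers `lanes` elements at once.
        # Lanes that fall past the end of the data are masked off.
        active = [base + lane for lane in range(lanes) if base + lane < len(a)]
        for i in active:
            out[i] = a[i] + b[i]
        instructions += 1
    return out, instructions


result, issued = simd_add([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60])
print(result, issued)
```

Six elements need two emulated instructions here, and the second one runs with two of its four lanes masked off.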

Discrete GPUs

• Architecture: SIMD, discrete coprocessor configuration

• Targeted workloads: Large-sized offloads, throughput-sensitive, parallel structured

• Programming models: CUDA, OpenCL, DirectCompute, C++ AMP, SYCL, SPIR, HSA

• Ecosystem maturity: High

• Links: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux
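One way to read "large-sized offloads" is as a break-even question: a discrete coprocessor pays off only when the compute time saved exceeds the cost of moving data to it. The sketch below is a rough back-of-the-envelope model, not a measurement. Every number in it (link bandwidth, speedup, launch overhead) is a hypothetical placeholder, and `offload_pays_off` is an illustrative helper, not part of any toolkit.

```python
def offload_pays_off(bytes_moved, cpu_seconds, gpu_speedup,
                     link_bytes_per_s, launch_overhead_s=0.0):
    """Return (worth_it, estimated_gpu_seconds) under a simple additive model."""
    transfer_s = bytes_moved / link_bytes_per_s   # time to copy data over the bus
    compute_s = cpu_seconds / gpu_speedup         # assumed accelerated compute time
    gpu_total_s = launch_overhead_s + transfer_s + compute_s
    return gpu_total_s < cpu_seconds, gpu_total_s


# Hypothetical figures: 10 GB/s link, 20x kernel speedup, 10 microsecond launch cost.
LINK = 10e9
SPEEDUP = 20.0
OVERHEAD = 1e-5

small_ok, small_t = offload_pays_off(1e6, 50e-6, SPEEDUP, LINK, OVERHEAD)
large_ok, large_t = offload_pays_off(1e9, 10.0, SPEEDUP, LINK, OVERHEAD)
print(small_ok, large_ok)
```

Under these assumed numbers the small job loses to its own transfer time, while the large job wins comfortably, which matches the workload profile listed for discrete GPUs.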

Drop-in Libraries

• Designed to be easy to use, no programming required

• Others: AmgX, cuDNN, cuFFT, IndeX, nvGRAPH, GIE, NPP, FFMPEG, …

• Links: https://developer.nvidia.com/gpu-accelerated-libraries
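The drop-in idea can be sketched from the caller's side: application code is written once against an array interface, and an accelerated backend is swapped in when one is available. The example below is an assumption-laden illustration. CuPy and NumPy are not named in the slides and stand in here only as examples of a GPU-backed and a CPU-backed module with a similar interface; `load_array_backend` and `mean_of_squares` are hypothetical helper names.

```python
import importlib


def load_array_backend(candidates=("cupy", "numpy")):
    """Return the first importable array module from `candidates`, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except Exception:  # not installed, or no usable GPU driver
            continue
    return None


def mean_of_squares(values, xp=None):
    """Compute mean(v*v); use the array module `xp` if given, else plain Python."""
    if xp is None:
        return sum(v * v for v in values) / len(values)
    arr = xp.asarray(values, dtype=xp.float64)
    return float((arr * arr).mean())


# The calling code stays the same whichever backend is passed in.
fallback_result = mean_of_squares([1.0, 2.0, 3.0])
print(fallback_result)
```

Passing `xp=load_array_backend()` would route the same call through whichever module was found; whether that yields a speedup depends on data size and hardware.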

Programming for GPUs

C/C++
▪ CUDA
▪ OpenCL
▪ DirectCompute
▪ AMP
▪ SYCL

Python
◦ PyCUDA

Matlab
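Frameworks such as CUDA and OpenCL share a common shape: a kernel function is written for one index, and a launch runs it across a grid of indices. The sketch below emulates that shape in pure Python so it runs without a GPU. The sequential loops stand in for parallel hardware, and `launch` and `saxpy_kernel` are illustrative names, not real framework APIs.

```python
def saxpy_kernel(i, a, x, y, out):
    """Body executed by one logical thread: handles element i only."""
    if i < len(x):  # bounds guard, since the grid may be larger than the data
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_size, block_size, *args):
    """Emulate a 1-D launch: call the kernel once per (block, thread) pair."""
    for block in range(grid_size):
        for thread in range(block_size):
            kernel(block * block_size + thread, *args)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * 5
out = [0.0] * 5

block_size = 4
grid_size = (len(x) + block_size - 1) // block_size  # round up to cover all elements
launch(saxpy_kernel, grid_size, block_size, 2.0, x, y, out)
print(out)
```

The rounded-up grid launches eight logical threads for five elements, which is why the kernel carries its own bounds check.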

Machine Learning

Caffe: http://caffe.berkeleyvision.org

Torch7: http://torch.ch

Cxxnet: https://github.com/dmlc/cxxnet

MXNet: https://github.com/dmlc/mxnet

MATLAB: http://www.mathworks.com/products/matlab/

TensorFlow: https://www.tensorflow.org

Mocha: https://github.com/pluskid/Mocha.jl

Graph Databases: Pros and Cons

Column store
+ Fast statistics
+ Best compression
+ Easy to add new column
- Not good for fast inserts, streaming
- ETL required on import

Graph / NoSQL
+ No schema lock-in
+ Relationship queries fast
+ Rapid development
- Bad performance at scale historically, or limited query support
- No standard query language (with the exception of SPARQL)

Row store
+ Rapid transactions
+ Very robust, mature
- Not easy to add or remove columns after creation
- Every database has its own interpretation
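The row-store and column-store trade-offs listed above follow from data layout, which a small in-memory sketch can show. This is a toy illustration with made-up sample records, not a description of how any of the databases in this talk are implemented; `to_columns` and `insert_row` are hypothetical helpers.

```python
# Row layout: one record per entry (made-up sample data).
rows = [
    {"id": 1, "city": "Austin", "temp": 30.0},
    {"id": 2, "city": "Dallas", "temp": 34.0},
    {"id": 3, "city": "Austin", "temp": 32.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    cols = {}
    for record in records:
        for key, value in record.items():
            cols.setdefault(key, []).append(value)
    return cols


columns = to_columns(rows)

# Fast statistics: an aggregate reads a single column and nothing else.
mean_temp = sum(columns["temp"]) / len(columns["temp"])

# Easy to add a new column: one new list, existing columns untouched.
columns["hot"] = [t > 31.0 for t in columns["temp"]]


def insert_row(cols, record):
    """Inserting one record must touch every column, hence slower inserts."""
    for key in cols:
        cols[key].append(record[key])


insert_row(columns, {"id": 4, "city": "Houston", "temp": 35.0, "hot": True})
print(mean_temp, columns["hot"])
```

In the row layout the same insert is a single append, while the aggregate would have to walk every record, which mirrors the "rapid transactions" entry under row stores.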

GPU experience so far

Success in four different areas

Sqream: 100x faster on SQL queries

BlazeGraph: 1000x faster on graph queries

IBM DB2 BLU: 2-100x faster on business analytics

GPUDB: Real-time queries on streaming data, natural English queries

BlazingDB

BlazingDB is a high performance SQL database on video graphics cards (GPUs). We leverage processors from the video game industry to power Big Data Analytics. Faster, able to handle greater scale, all in simple SQL.

Visualization

Graphistry: Demo

Questions?

Backup

@datapopup #datapopupaustin
