Data Science Popup Austin: Making Data Science Fast: Survey of GPU Accelerated Tools
TRANSCRIPT
DATA SCIENCE POP UP
AUSTIN
Making Data Science FAST: Survey of GPU Accelerated Tools
Mazhar Memon
Co-Founder and CEO, Bitfusion.io
#datapopupaustin
April 13, 2016
Galvanize, Austin Campus
MAZHAR MEMON
CTO, BITFUSION.IO
Overview
• Overview of GPUs
• Drop-in Libraries
• Programming Frameworks
• Deep Learning
• Graph Databases
• Visualization
The problem in computing
[Figure: hardware and software plotted against time (Time →). Labels: "abstract and slow →" and "← complex and fast".]
The big gap: making your data science fast!
Integrated GPUs
• Architecture: SIMD, shared resource architecture
• Targeted workloads: Medium-sized offloads, latency-sensitive, cost-sensitive, media
• Programming models: OpenCL, DirectCompute, C++ AMP, SPIR, HSAIL
• Ecosystem maturity: High
• Links: https://software.intel.com/en-us/articles/intel-graphics-developers-guides
Discrete GPUs
• Architecture: SIMD, discrete coprocessor configuration
• Targeted workloads: Large-sized offloads, throughput-sensitive, parallel structured
• Programming models: CUDA, OpenCL, DirectCompute, C++ AMP, SYCL, SPIR, HSA
• Ecosystem maturity: High
• Links: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux
Drop-in Libraries
• Designed to be easy to use, no programming required
• Others: AmgX, cuDNN, cuFFT, IndeX, nvGRAPH, GIE, NPP, FFMPEG, …
• Links: https://developer.nvidia.com/gpu-accelerated-libraries
Programming for GPUs
• C/C++: CUDA, OpenCL, DirectCompute, C++ AMP, SYCL
• Python: PyCUDA
• MATLAB
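The frameworks listed here share a data-parallel model: a small kernel function runs once per element index, and the hardware executes many of those indices at the same time. The following is a minimal CPU-only sketch of that idea in plain Python. It needs no GPU and does not use PyCUDA; the names `saxpy_kernel` and `launch` are hypothetical and only emulate the execution model sequentially.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work done by one logical thread: out[i] = a * x[i] + y[i]."""
    # Bounds guard: launches often use more threads than there are elements.
    if i < len(out):
        out[i] = a * x[i] + y[i]


def launch(kernel, n_threads, *args):
    """Emulate a kernel launch by calling the kernel once per thread index.

    A GPU would run these calls in parallel; this sketch runs them in a loop.
    """
    for i in range(n_threads):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

# 8 logical threads for 4 elements: the extra threads hit the guard and do nothing.
launch(saxpy_kernel, 8, 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because each index is independent of the others, the loop in `launch` can be replaced by parallel hardware threads without changing the result, which is the property that CUDA- and OpenCL-style kernels rely on.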
Machine Learning
Caffe: http://caffe.berkeleyvision.org
Torch7: http://torch.ch
Cxxnet: https://github.com/dmlc/cxxnet
MXNet: https://github.com/dmlc/mxnet
MATLAB: http://www.mathworks.com/products/matlab/
TensorFlow: https://www.tensorflow.org
Mocha: https://github.com/pluskid/Mocha.jl
Graph Databases: Pros and Cons
Column store
+ Fast statistics
+ Best compression
+ Easy to add new column
- Not good for fast inserts, streaming
- ETL required on import
Graph / NoSQL
+ No schema lock-in
+ Relationship queries fast
+ Rapid development
- Bad performance at scale historically, or limited query support
- No standard query language (with the exception of SPARQL)
Row store
+ Rapid transactions
+ Very robust, mature
- Not easy to add or remove columns after creation
- Every database has its own interpretation
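The row store and column store trade-offs on this slide can be illustrated with a toy in-memory layout. This is a minimal sketch in plain Python, not how any particular database works; the sample records and the helper `insert_record` are made up for illustration.

```python
from statistics import mean

# Row layout: one complete record per entry, so an insert is a single append.
rows = [
    {"id": 1, "price": 10.0, "qty": 3},
    {"id": 2, "price": 12.5, "qty": 1},
    {"id": 3, "price": 7.5, "qty": 4},
]
rows.append({"id": 4, "price": 10.0, "qty": 2})  # rapid transaction

# Column layout: one list per column. Building it from rows is a small
# stand-in for the import/ETL step a column store needs.
columns = {key: [record[key] for record in rows] for key in rows[0]}

# Statistics scan a single column instead of every full record.
avg_price = mean(columns["price"])

# Adding a column is one new list; the existing columns are untouched.
columns["total"] = [p * q for p, q in zip(columns["price"], columns["qty"])]


def insert_record(cols, record):
    """Insert one record into the column layout: every column must be touched."""
    for key, values in cols.items():
        values.append(record[key])


insert_record(columns, {"id": 5, "price": 8.0, "qty": 1, "total": 8.0})
print(avg_price, columns["total"])
```

The single `append` on the row layout versus the per-column loop in `insert_record` mirrors the slide's point that row stores favor transactions while column stores favor statistics and schema changes.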
GPU experience so far
Success in four different areas
Sqream: 100x faster on SQL queries
BlazeGraph: 1000x faster on graph queries
IBM DB2 BLU: 2-100x faster on business analytics
GPUDB: Real-time queries on streaming data, natural English queries
BlazingDB - BlazingDB is a high performance SQL database on video graphics cards (GPUs). We leverage processors from the video game industry to power Big Data Analytics. Faster, able to handle greater scale, all in simple SQL.
Visualization
Graphistry: Demo
Questions?
Backup
@datapopup #datapopupaustin