operational analytics - on-demand.gputechconf.com · end-to-end accelerated gpu data science...
Post on 28-May-2020
100x Operational Analytics
The RAPIDS SQL Engine
SQL in Python on GPUs
gdf = bc.sql('select count(*) from table').get()
@blazingsql
conda install
launch a notebook
run queries
Faster
Cheaper
Easier
End-to-End Accelerated GPU Data Science
Introducing the Open-Source RAPIDS Library Suite
Data Preparation: cuDF, cuIO (DataFrame)
Model Training: cuML (Machine Learning), cuGraph (Graph Analytics), PyTorch / Chainer / MxNet (Deep Learning)
Visualization: cuXfilter <> pyViz (Visualization)
Dask
GPU Memory
BlazingSQL (SQL Engine)
Storage Plugins
Data Lake
Supported:
• AWS S3
• Google Cloud Storage
• HDFS
Coming Soon:
• Azure Blob
File Readers (cuIO):
• CSV
• JSON
• Apache Parquet
• Apache ORC
BlazingSQL >> cuDF > XGBoost
[Diagram: YOUR DATA (CSV, GDF, Pandas, Parquet, JSON) flows through ETL and Feature Engineering in GPU memory to MACHINE LEARNING]
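from blazingsql import BlazingContext
import cudf

# Create the BlazingSQL context
bc = BlazingContext()
# Register the S3 bucket 'bsql' under the prefix 'bsql'
bc.s3('bsql', bucket_name='bsql', access_key_id='<access_key>', secret_key='<secret_key>')
# Create the 'orders' table from the files under s3://bsql/orders/
bc.create_table('orders', 's3://bsql/orders/')
# Run the query and fetch the result
gdf = bc.sql('select * from orders').get()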
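The S3 example hard-codes placeholder credentials. Below is a minimal sketch of the same three calls wrapped in a function, assuming the credentials are exported as the standard AWS environment variables. `query_orders` is a hypothetical helper, not part of BlazingSQL, and `bc` is assumed to be a `BlazingContext` created as on the slide. Only the `s3`, `create_table`, and `sql(...).get()` calls shown on the slide are used.

```python
import os


def query_orders(bc, bucket="bsql", folder="orders"):
    """Register an S3 bucket, create a table over it, and run a query.

    `bc` is assumed to be a BlazingContext. This helper is hypothetical
    and only uses the calls that appear on the slide.
    """
    # Assumption: credentials are exported as the standard AWS variables.
    access_key = os.environ["AWS_ACCESS_KEY_ID"]
    secret_key = os.environ["AWS_SECRET_ACCESS_KEY"]

    # Register the bucket under a prefix with the same name, as on the slide.
    bc.s3(bucket, bucket_name=bucket,
          access_key_id=access_key, secret_key=secret_key)

    # Point a table at the folder inside the registered bucket.
    bc.create_table(folder, f"s3://{bucket}/{folder}/")

    # Run the query and fetch the result, as on the slide.
    return bc.sql(f"select * from {folder}").get()
```

With a real context this would be called as `gdf = query_orders(BlazingContext())`. Passing the context in keeps the helper importable on machines without a GPU.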
XGBoost Demo Timings
BlazingSQL >> cuDF > XGBoost
[Bar chart: TIME (Seconds), axis 0.00 to 100.00; labels: T4 GPU, 4 NODES; values shown: 84.40, 0.87]
Netflow Demo Timings
BlazingSQL >> cuDF > Graphistry
[Bar chart: TIME (Seconds), axis 0 to 3000; labels: 15.6GB (1 x T4), 15.6GB (4 Nodes)]
Cost to run the ETL workloads on Google Cloud Platform
[Values shown: $0.90, $0.04]
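The figures on these slides can be turned into ratios with a few lines of arithmetic. This is a sketch under an assumption: that 84.40 s and 0.87 s are the two XGBoost demo timings being compared, and that $0.90 and $0.04 are the corresponding costs. The transcript does not say which value belongs to which configuration, so only the ratios are computed. `ratio` is a hypothetical helper name.

```python
def ratio(larger, smaller):
    """Return how many times `larger` exceeds `smaller`."""
    return larger / smaller


# Values as printed on the slides; pairing them up is an assumption.
time_ratio = ratio(84.40, 0.87)   # seconds
cost_ratio = ratio(0.90, 0.04)    # US dollars

print(f"time ratio: {time_ratio:.1f}x")
print(f"cost ratio: {cost_ratio:.1f}x")
```

Under that pairing the time ratio is roughly 97x, which is in line with the "100x" in the deck title, and the cost ratio is roughly 22.5x.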
GCP: 5 x n1-standard-4 (Tesla T4 GPU) w/ Local NVME
• TPC-H SF100 Query Times - NVME Storage
GCP: 5 x n1-standard-4 (Tesla T4 GPU)
• TPC-H SF100 Query Times - GCS Storage
GCP: 15 x n1-standard-4 (Tesla T4 GPU)
• TPC-H SF300 Query Times - GCS Storage
• TPC-H SF100 vs SF300 - GCS Storage
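The SF100 runs use 5 nodes and the SF300 run uses 15, so the data volume per node stays about the same across the two. A short sketch of that arithmetic, assuming the usual TPC-H convention that scale factor N corresponds to roughly N GB of raw data; `gb_per_node` is a hypothetical helper name.

```python
def gb_per_node(scale_factor, nodes):
    """Approximate raw data per node, in GB.

    Assumes TPC-H scale factor N is roughly N GB of raw data.
    """
    return scale_factor / nodes


sf100_per_node = gb_per_node(100, 5)    # 5 x n1-standard-4
sf300_per_node = gb_per_node(300, 15)   # 15 x n1-standard-4

print(f"SF100: {sf100_per_node:.0f} GB per node")
print(f"SF300: {sf300_per_node:.0f} GB per node")
```

Both configurations come out at about 20 GB per node, so the SF100 vs SF300 comparison reads as a test of how query times hold up when data and cluster size grow together.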
Demos
Scale out with RAPIDS
[Diagram axes: Scale Up / Accelerate (vertical), Scale out / Parallelize (horizontal)]

PyData: Single CPU core, in-memory data
NumPy, Pandas, Scikit-Learn, Numba and many more

Dask: Multi-core and Distributed PyData
• NumPy -> Dask Array
• Pandas -> Dask DataFrame
• Scikit-Learn -> Dask-ML
• … -> Dask Futures

RAPIDS and Others: Accelerated on single GPU
• NumPy -> CuPy/PyTorch/..
• Pandas -> cuDF
• Scikit-Learn -> cuML
• Numba -> Numba

RAPIDS, BlazingSQL + Dask + OpenUCX: Multi-GPU
• On single Node (DGX)
• Or across a cluster
GET STARTED NOW
It’s easy to get started with BlazingSQL + RAPIDS.ai

CONDA: GET STARTED
BlazingSQL can be installed with conda (miniconda, or the full Anaconda distribution) from the blazingsql channel.
https://anaconda.org/blazingsql

DOCKER HUB: TRY NOW
To run BlazingSQL on your own infrastructure, you can use our container on Docker Hub.
https://hub.docker.com/u/blazingdb

GITHUB: INSTALL
BlazingSQL, the GPU-accelerated SQL engine of the RAPIDS ecosystem, is now 100% open-source, licensed under Apache 2.0!
https://github.com/BlazingDB/