operational analytics - on-demand.gputechconf.com · end-to-end accelerated gpu data science...

100x Operational Analytics

The RAPIDS SQL Engine

SQL in Python on GPUs

gdf = bc.sql('select count(*) from table').get()

@blazingsql

conda install

launch a notebook

run queries

Faster

Cheaper

Easier

End-to-End Accelerated GPU Data Science

Introducing the Open-Source RAPIDS Library Suite

Stages, all in GPU memory: Data Preparation, Model Training, Visualization

• cuDF, cuIO: DataFrame
• cuML: Machine Learning
• cuGraph: Graph Analytics
• PyTorch, Chainer, MxNet: Deep Learning
• cuXfilter <> pyViz: Visualization

• BlazingSQL: SQL Engine

Storage Plugins

Data Lake, supported:

• AWS S3
• Google Cloud Storage
• HDFS

Data Lake, coming soon:

• Azure Blob

File Readers (cuIO):

• CSV
• JSON
• Apache Parquet
• Apache ORC

[Diagram: YOUR DATA >> BlazingSQL > cuDF > XGBoost >> MACHINE LEARNING. Inputs: CSV, GDF, Pandas, Parquet, JSON. Stages in GPU memory: ETL, Feature Engineering.]

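from blazingsql import BlazingContext
import cudf

# Create the context that runs SQL on the GPU
bc = BlazingContext()

# Register the S3 bucket under the prefix 'bsql'
bc.s3('bsql', bucket_name='bsql', access_key_id='<access_key>', secret_key='<secret_key>')

# Create a table from the files under s3://bsql/orders/
bc.create_table('orders', 's3://bsql/orders/')

# Run the query and fetch the result into gdf
gdf = bc.sql('select * from orders').get()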
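The snippet above hard-codes its credentials as placeholders. A minimal sketch of the same flow that reads them from environment variables might look like the following. The variable names (BSQL_BUCKET, BSQL_ACCESS_KEY_ID, BSQL_SECRET_KEY) and the helpers s3_settings_from_env, table_uri and load_orders are hypothetical; the bc.s3, bc.create_table and bc.sql(...).get() calls are copied from the slide, not checked against a specific BlazingSQL release.

```python
import os


def s3_settings_from_env(environ=os.environ):
    """Collect the keyword arguments for bc.s3() from environment variables.

    The variable names are hypothetical; pick whatever your deployment uses.
    """
    return {
        "bucket_name": environ["BSQL_BUCKET"],
        "access_key_id": environ["BSQL_ACCESS_KEY_ID"],
        "secret_key": environ["BSQL_SECRET_KEY"],
    }


def table_uri(prefix, path):
    """Build an 's3://<prefix>/<path>/' URI in the form used on the slide."""
    return f"s3://{prefix}/{path.strip('/')}/"


def load_orders(prefix="bsql"):
    """Register the bucket, create the 'orders' table and query it.

    blazingsql is imported lazily so the helpers above remain usable on a
    machine without a GPU. Calling this function requires a GPU environment.
    """
    from blazingsql import BlazingContext

    bc = BlazingContext()
    bc.s3(prefix, **s3_settings_from_env())
    bc.create_table("orders", table_uri(prefix, "orders"))
    # .get() follows the slide; assumed to match the release shown there.
    return bc.sql("select * from orders").get()
```

Keeping the credential lookup and the URI construction in small pure functions means they can be checked without a GPU; only load_orders touches BlazingSQL.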

[Charts: XGBoost Demo Timings (BlazingSQL > cuDF > XGBoost) and Netflow Demo Timings (BlazingSQL > cuDF > Graphistry). Axis: time in seconds. Labels: T4 GPU; 4 nodes; 15.6 GB (1 x T4); 15.6 GB (4 nodes).]

$0.90: cost to run the ETL workloads on Google Cloud Platform

TPC-H benchmark charts:

• TPC-H SF100 Query Times, NVMe storage. GCP: 5 x n1-standard-4 (Tesla T4 GPU) with local NVMe
• TPC-H SF100 Query Times, GCS storage. GCP: 5 x n1-standard-4 (Tesla T4 GPU)
• TPC-H SF300 Query Times, GCS storage. GCP: 15 x n1-standard-4 (Tesla T4 GPU)
• TPC-H SF100 vs SF300, GCS storage

Scale out with RAPIDS

PyData (single CPU core, in-memory data): NumPy, Pandas, Scikit-Learn, Numba and many more

RAPIDS and others (accelerated on a single GPU):

• NumPy -> CuPy/PyTorch/..
• Pandas -> cuDF
• Scikit-Learn -> cuML
• Numba -> Numba

Dask (scale out / parallelize; multi-core and distributed PyData):

• NumPy -> Dask Array
• Pandas -> Dask DataFrame
• Scikit-Learn -> Dask-ML
• … -> Dask Futures

RAPIDS, BlazingSQL + Dask + OpenUCX (multi-GPU, on a single node (DGX) or across a cluster)
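The "Pandas -> cuDF" mapping above can be read as a near drop-in swap of the DataFrame library. The sketch below picks whichever library is installed and falls back to pandas when cuDF is not available. The helper names pick_backend and demo are hypothetical, and the sketch assumes only the small part of the DataFrame API that both libraries share.

```python
import importlib
import importlib.util


def pick_backend(candidates=("cudf", "pandas")):
    """Return the name of the first installed module in candidates, or None.

    find_spec only checks that the module can be located; it does not import
    it, so this runs on machines without a GPU.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def demo():
    """Build a small DataFrame with whichever backend is available."""
    name = pick_backend()
    if name is None:
        raise RuntimeError("neither cudf nor pandas is installed")
    xdf = importlib.import_module(name)
    # The same constructor and column sum work in both libraries.
    df = xdf.DataFrame({"a": [1, 2, 3]})
    return name, df["a"].sum()
```

Calling demo() on a GPU machine with RAPIDS installed would use cuDF; elsewhere it uses pandas. Code that relies on less common pandas features may still need changes before it runs on cuDF.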

GET STARTED NOW

It’s easy to get started with BlazingSQL + RAPIDS.ai

• CONDA (GET STARTED): BlazingSQL can be installed with conda (miniconda, or the full Anaconda distribution) from the blazingsql channel. https://anaconda.org/blazingsql
• DOCKER HUB (TRY NOW): To run BlazingSQL on your own infrastructure, you can use our container on Docker Hub. https://hub.docker.com/u/blazingdb
• GITHUB (INSTALL): BlazingSQL, the GPU-accelerated SQL engine of the RAPIDS ecosystem, is now 100% open-source, licensed under Apache 2.0! https://github.com/BlazingDB/
