How NVIDIA Data Science Workstations Accelerate Data Science Workflows - Step By Step

Post on 03-Aug-2020

PRESENTERS

Luke Wignall
Director, Technical Product [email protected]
Twitter: @lwignall
linkedin.com/in/lukewignall

Steve Harpster
Manager, Technical Product [email protected]
Twitter: @Nvsharpster
linkedin.com/in/steve-harpster-b212683

DATA SCIENCE IS THE KEY TO MODERN BUSINESS

CONSUMER INTERNET
• Ad Personalization
• Click Through Rate Optimization
• Churn Reduction

FINANCIAL SERVICES
• Claim Fraud
• Customer Service Chatbots/Routing
• Risk Evaluation

MANUFACTURING
• Remaining Useful Life Estimation
• Failure Prediction
• Demand Forecasting

TELECOM
• Detect Network/Security Anomalies
• Forecasting Network Performance
• Network Resource Optimization (SON)

RETAIL
• Supply Chain & Inventory Management
• Price Management / Markdown Optimization
• Promotion Prioritization And Ad Targeting

AUTOMOTIVE
• Personalization & Intelligent Customer Interactions
• Connected Vehicle Predictive Maintenance
• Forecasting, Demand, & Capacity Planning

OIL & GAS
• Sensor Data Tag Mapping
• Anomaly Detection
• Robust Fault Prediction

HEALTHCARE
• Improve Clinical Care
• Drive Operational Efficiency
• Speed Up Drug Discovery

CHALLENGES AFFECTING DATA SCIENCE TODAY
What Concerns are Limiting Data Science Productivity?

INCREASING DATA ONSLAUGHT
• Data sets are continuing to dramatically increase in size
• Multitude of sources
• Different formats, varying quality

SLOW CPU PROCESSING
• End of Moore’s law, CPUs aren’t getting faster
• Many popular data science tools have been CPU-only
• Can only throw so many CPUs at a job

COMPLEX INSTALLATION & MANAGEMENT
• Time consuming to install software
• Nearly impossible to manage all version conflicts
• Updates often break other software

NVIDIA GPU ACCELERATED DATA SCIENCE

Workstation Server Cloud

NVIDIA POWERED DATA SCIENCE WORKSTATIONS
NVIDIA GPU-Accelerated Data Science

Integrated hardware and software solution for Data Science

BUILT ON QUADRO RTX

RTX 6000/8000
GPU Architecture:  Turing
CUDA Cores:        4608
RT Cores:          72
Tensor Cores:      576
Memory BW:         Up to 672 GB/s
NVLink:            2-way (2 & 3 slot), 100 GB/s bidirectional
Display Support:   4x DP + 1x VirtualLink

GPU memory
RTX 8000:  48GB / 96GB w/NVLink
RTX 6000:  24GB / 48GB w/NVLink
GV100:     32GB / 64GB, Double Precision (FP64)

QUADRO REQUIRED FOR THE LARGEST WORKLOADS
Data Science Data Sets Require GPU Memory Only Available with Quadro

Quadro value for data science
• Large Quadro memory lets data scientists process more data to improve model training and accuracy.
• Quadro performance completes tasks faster.

Data Science Workload Example:

[Chart: Max Data Set Size (in months), 2080Ti vs. RTX 6000*] Quadro holds more data
[Chart: End To End Training Time (3 months of data, relative performance), 2080Ti vs. RTX 6000, bar label 1.40] Quadro 40% faster

Sample data set using Home Mortgage data in the US for 2016. A single GeForce RTX 2080Ti can only load 3 months' worth of data. A single Quadro RTX 6000 GPU can load 6 months of data at a time; an RTX 8000 can load an entire year’s worth of data.

*A single RTX 8000 can load the entire year’s worth of data.
Test system: dual Gold [email protected] 3.7GHz Turbo (Skylake), ETL with Dask + Pandas

CUDA-X AI DISTRIBUTION
Delivering Best User Experience with Validated Interoperability

arrow-cpp 0.12

cython 0.29

pyarrow 0.12

xgboost 0.8

nltk 3.2.5

numba 0.42

caffe-gpu 1.0.0

lime 0.1.1.32

anaconda-navigator 1.9.6

bokeh 1.0.4

cmake 3.12

opencv 3.4.2

pandas 0.23.4

pytorch 1.0.1

scikit-learn 0.20.2

scipy 1.2.0

statsmodels 0.9.0

torchvision 0.2.1

cffi 1.11.5

chainer 5.1.0

ipyvolume 0.5.1

pytest 4.3.0

python-graphviz 0.8.4

setuptools 40.8.0

cuda100 1.0.0

cudatoolkit 10.0.130

cudf 0.5.1

cuml 0.5.1

cupy-cuda100 5.2.0

nvstrings 0.2.0

matplotlib 3.0.2

python 3.7

dask 1.1.1

dask-core 1.1.1

dask-cuda 0.0.1

dask-cudf 0.0.1

dask-xgboost 0.1.5

distributed 1.25.3

faiss-gpu 1.5.0

ipywidgets 7.4.2

jupyterlab 0.35.4

numpy 1.15.4

(Almost) One Click Install: Assembled into one script, installed and tested on proven hardware

Highlighted libraries are GPU accelerated.
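One way to sanity-check an environment against a validated list like this one is to compare the installed versions with the pinned ones. The following is a minimal sketch, not part of the NVIDIA install script: it copies a handful of pins from the slide, relies only on Python's standard `importlib.metadata` (Python 3.8 or later), and the helper names `installed_version` and `check_pins` are hypothetical. The names on the slide are conda package names, which do not always match the distribution names that `importlib.metadata` sees, so treat the name mapping as an assumption to verify per package.

```python
from importlib import metadata

# A few pins copied from the slide; the full list is much longer.
PINS = {"numpy": "1.15.4", "pandas": "0.23.4", "scipy": "1.2.0", "dask": "1.1.1"}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each package to (expected, found, status).

    status is "ok" when the found version equals the pin or extends it
    (so a pin of "1.1.1" accepts "1.1.1.post1"), "missing" when the
    package is not installed, and "mismatch" otherwise.
    """
    report = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found is None:
            status = "missing"
        elif found == expected or found.startswith(expected + "."):
            status = "ok"
        else:
            status = "mismatch"
        report[name] = (expected, found, status)
    return report


if __name__ == "__main__":
    for name, (expected, found, status) in check_pins(PINS).items():
        print(f"{name:10s} expected {expected:8s} found {found} -> {status}")
```

Passing the lookup function as a parameter keeps the comparison logic testable without installing any of the pinned packages.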

GPU-ACCELERATED DATA SCIENCE WORKFLOW WITH NVIDIA RAPIDS

Built on CUDA-X AI

DATA

DATA PREPARATION

GPU-accelerated compute for in-memory data preparation

Simplified implementation using familiar data science tools

Python drop-in pandas replacement built on CUDA C++.

GPU-accelerated Spark

PREDICTIONS
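The "drop-in pandas replacement" on this slide is cuDF. As a rough sketch of what that can look like in practice, the code below picks cuDF when it imports cleanly and falls back to pandas otherwise, then runs a filter and a sum using only calls that exist in both libraries. The helper names, the column names and the sample values are hypothetical, and cuDF covers much but not all of the pandas API, so real code may need adjustments.

```python
import importlib


def load_dataframe_backend():
    """Return cudf if it imports cleanly, else pandas, else None."""
    for name in ("cudf", "pandas"):
        try:
            return importlib.import_module(name)
        except Exception:
            # cudf can fail for reasons other than ImportError (e.g. no usable GPU).
            continue
    return None


def summarize_loans(xdf, rows):
    """Filter and aggregate with calls shared by pandas and cuDF.

    `rows` maps column name -> list of values; the column names used
    here are made up for illustration.
    """
    df = xdf.DataFrame(rows)
    large = df[df["balance"] > 100_000]  # boolean-mask filter
    return {
        "n_large": len(large),
        "total_large": float(large["balance"].sum()),
    }


if __name__ == "__main__":
    backend = load_dataframe_backend()
    if backend is None:
        print("neither cudf nor pandas is installed")
    else:
        sample = {"state": ["CA", "TX", "CA"], "balance": [250_000, 90_000, 120_000]}
        print(backend.__name__, summarize_loans(backend, sample))
```

Because the library module is passed in as `xdf`, the same function body runs on CPU or GPU without edits, which is the property the slide is advertising.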

DATA SET VIDEO

JUPYTER ETL VIDEO

GPU-ACCELERATED DATA SCIENCE WORKFLOW WITH NVIDIA RAPIDS

Built on CUDA-X AI

MODEL TRAINING

GPU acceleration of today’s most popular ML algorithms, such as XGBoost

Also available are PCA, K-means, k-NN, DBSCAN, tSVD, and many more

Easy-to-adopt, scikit-learn-like interface

DATA PREDICTIONS
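To illustrate the "scikit-learn-like interface" claim, here is a small sketch that fits K-means through whichever implementation is available: cuML's `cuml.cluster.KMeans` if it imports, otherwise scikit-learn's `sklearn.cluster.KMeans`. The helper names and the synthetic points are hypothetical, and the sketch assumes both classes accept `n_clusters` and `random_state` and expose `labels_` after `fit`, which holds for the versions I am aware of but is worth checking against your installed release.

```python
import importlib


def load_kmeans():
    """Return a KMeans class from cuML if available, else scikit-learn, else None."""
    for module_name in ("cuml.cluster", "sklearn.cluster"):
        try:
            return importlib.import_module(module_name).KMeans
        except Exception:
            # Covers a missing package as well as GPU initialisation failures.
            continue
    return None


def cluster_two_blobs(KMeans):
    """Fit 2-cluster K-means on six synthetic points and return the labels."""
    # Imported here so the module still loads when NumPy is not installed.
    import numpy as np

    # Two well-separated groups of 2-D points (made up for illustration).
    X = np.array(
        [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
         [10.0, 10.0], [10.1, 9.9], [9.9, 10.2]],
        dtype=np.float32,
    )
    model = KMeans(n_clusters=2, random_state=0).fit(X)
    return [int(label) for label in model.labels_]


if __name__ == "__main__":
    KMeans = load_kmeans()
    if KMeans is None:
        print("neither cuml nor scikit-learn is installed")
    else:
        print(KMeans.__module__, cluster_two_blobs(KMeans))
```

The estimator is constructed, fitted and inspected with the same three steps in both libraries, so switching between CPU and GPU comes down to which class is imported.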

JUPYTER TRAINING VIDEO

GPU-ACCELERATED DATA SCIENCE WORKFLOW WITH NVIDIA RAPIDS

Built on CUDA-X AI

VISUALIZATION

Effortless exploration of datasets, billions of records in milliseconds

Dynamic interaction with data = faster ML model development

Data visualization ecosystem (Graphistry & OmniSci), integrated with RAPIDS

DATA PREDICTIONS

VIZ DEMOS

FASTER PERFORMANCE, REAL WORLD BENEFITS

Dataset: Mortgage Data 2015-2016
CPU: dual Gold [email protected] 3.7GHz Turbo (Skylake), ETL with Dask + Pandas
End-to-end time = Data Prep + Conversion + Training + Validation

[Three bar charts, each comparing CPU, 1x RTX8000 and 2x RTX8000; y-axis: Seconds (lower is better)]
Data Prep: ~30X faster than CPU
Training w/XGBoost: ~8X faster than CPU
End-to-end: ~10X faster than CPU
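The end-to-end definition above is a sum of stage times, so the end-to-end speedup is a time-weighted blend of the stage speedups rather than their average. The sketch below makes that concrete. The timings are HYPOTHETICAL placeholders, not values read from the charts; they were chosen only so that the resulting ratios resemble the slide's approximate figures, and the function names are made up for this example.

```python
STAGES = ("data_prep", "conversion", "training", "validation")


def end_to_end(timings):
    """End-to-end time is the sum of the four stage times (seconds)."""
    return sum(timings[stage] for stage in STAGES)


def speedups(cpu, gpu):
    """Per-stage and end-to-end speedup of `gpu` over `cpu` (higher is better)."""
    result = {stage: cpu[stage] / gpu[stage] for stage in STAGES}
    result["end_to_end"] = end_to_end(cpu) / end_to_end(gpu)
    return result


# HYPOTHETICAL timings in seconds; not measurements from the presentation.
cpu = {"data_prep": 300.0, "conversion": 20.0, "training": 160.0, "validation": 20.0}
gpu = {"data_prep": 10.0, "conversion": 10.0, "training": 20.0, "validation": 10.0}

if __name__ == "__main__":
    for stage, factor in speedups(cpu, gpu).items():
        print(f"{stage:12s} {factor:5.1f}x")
```

With these placeholder numbers a 30x data-prep gain and an 8x training gain combine into a 10x end-to-end gain, because the slower conversion and validation stages take a larger share of the GPU run.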

NGC: GPU-OPTIMIZED SOFTWARE HUB
Simplifying DL, ML and HPC Workflows

50+ Containers: DL, ML, HPC

Pre-trained Models: NLP, Classification, Object Detection & more

Industry Workflows: Medical Imaging, Intelligent Video Analytics

Model Training Scripts: NLP, Image Classification, Object Detection & more

Innovate Faster

Deploy Anywhere

Simplify Deployments

NGC

NVIDIA NGC SUPPORT SERVICES
Minimize Downtime And Maximize System Utilization

Availability
• Available for NGC-Ready workstations
• Service agreement between NVIDIA & customer
• Purchase from OEM

Support by NVIDIA’s subject matter experts
24x7 portal, phone and email access to create support cases
Live support during local, regional business hours for technical assistance

Support Coverage
• NGC DL & ML containers
• NVIDIA drivers
• NV-docker
• CUDA

VOLTA TENSOR CORE GPU

NVIDIA ACCELERATED DATA SCIENCE
GPU-Accelerated Data Science for Today’s Data Led Business

Faster Time To Insight and More Accurate Models with the Power of NVIDIA GPUs
GPU-Acceleration Offers 10x Performance Improvement

Simplified Deployment from Integrated Hardware and Software
Built on CUDA-X AI, Ready-to-Run on NVIDIA-powered Systems

A Solution for Every User and Every Organization
Run on Cloud, Laptop, Workstation, Server, and Clusters

CUSTOMER EXPERIENCES

“Our initial look at the NVIDIA-Powered Lenovo AI workstation showed significant performance gains. Data scientists will appreciate being able to move more quickly through the analytics life cycle, which will allow them to address and support more analytics needs to transform business processes.”

-- Gavin Day, Senior Vice President for Technology at SAS

LEARN MORE

www.nvidia.com/datascienceworkstation
www.nvidia.com/datascience
www.rapids.ai

NVIDIA POWERED DATA SCIENCE WORKSTATIONS
A NEW BREED OF WORKSTATION FOR DATA SCIENCE

• Dual Quadro GPUs with up to 96 GB GPU memory with NVLink
• Pre-installed NVIDIA accelerated data science software
• Optional enterprise software support
• Designed & optimized to accelerate data science workflows

Q&A

THE REVIEWS ARE IN
Data Science Workstations are a Hit!

“The best part of it was really that everything ‘just works’.”

“NVIDIA’s new hardware & software is making it easier for organizations to process data right on the desktop…”

“The processing power brought to the data scientist by leveraging the Data Science Workstation provides a tremendous simplification for the development and testing phase of the life cycle.”

DSWS: COST EFFECTIVE DATA SCIENCE DEVELOPMENT

Alibaba Cloud Data Analysis instance with P100 GPU: $1,499.04 USD / month*

Months:                 3           6           9            12
Cumulative cloud cost:  $4,497.12   $8,994.24   $13,491.36   $17,988.48

Lenovo ThinkStation P520: $7,799 USD**
Intel Xeon W-2148 8-core CPU, 128 GB RAM, Quadro RTX 6000 GPU

*Based on Alibaba price calculator, alibabacloud.com, 12/11/2019. RTX 6000 ~3x performance of P100 based on internal testing on PyTorch ResNet-50 V1.5 training tests.
**Based on US pricing on Lenovo.com on 12/10/2019.

DSWS: COST EFFECTIVE DATA SCIENCE DEVELOPMENT

AWS P3.2xlarge: $1,524.24/month*
Months:                 3            6            9            12
Cumulative cloud cost:  $4,572.72    $9,145.44    $13,718.16   $18,290.88

AWS P3.8xlarge: $6,098.42/month*
Months:                 3            6            9            12
Cumulative cloud cost:  $18,295.26   $36,590.52   $54,885.78   $73,181.04

Lenovo ThinkStation P520: $7,799 USD**
Lenovo ThinkStation P920: $24,648 USD**

*Based on AWS calculator, calculator.s3.amazonaws.com, on 9/16/2019.
**Based on US pricing on Lenovo.com on 12/10/2019.
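
The cost comparison on these two slides comes down to a break-even calculation: the first month in which cumulative cloud spend reaches the one-time workstation price. The sketch below works that out from the monthly rates and list prices quoted on the slides. The function names are hypothetical, and pairing P3.2xlarge with the P520 and P3.8xlarge with the P920 is an assumption based on the order in which the slide lists them. The calculation deliberately ignores power, maintenance, resale value and any performance difference between the GPUs.

```python
import math


def cumulative_cloud_cost(monthly_rate, months):
    """Total cloud spend after `months` months at a flat monthly rate (USD)."""
    return round(monthly_rate * months, 2)


def break_even_month(workstation_price, monthly_rate):
    """First whole month at which cumulative cloud spend >= workstation price."""
    return math.ceil(workstation_price / monthly_rate)


# (label, cloud USD per month, workstation USD one-time); figures from the slides,
# pairings for the two AWS rows are an assumption.
COMPARISONS = [
    ("Alibaba P100 vs ThinkStation P520", 1499.04, 7799.00),
    ("AWS P3.2xlarge vs ThinkStation P520", 1524.24, 7799.00),
    ("AWS P3.8xlarge vs ThinkStation P920", 6098.42, 24648.00),
]

if __name__ == "__main__":
    for label, monthly, price in COMPARISONS:
        month = break_even_month(price, monthly)
        spent = cumulative_cloud_cost(monthly, month)
        print(f"{label}: break-even in month {month} (cloud spend ${spent:,.2f})")
```

Under these assumptions each workstation pays for itself within roughly five to six months of continuous cloud use.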