TRANSCRIPT
How NVIDIA Data Science Workstations Accelerate Data Science Workflows - Step By Step
2
Luke Wignall
Director, Technical Product [email protected]
Twitter: @lwignall
linkedin.com/in/lukewignall
PRESENTERS
Steve Harpster
Manager, Technical Product [email protected]
Twitter: @Nvsharpster
linkedin.com/in/steve-harpster-b212683
3
DATA SCIENCE IS THE KEY TO MODERN BUSINESS
CONSUMER INTERNET
Ad Personalization
Click Through Rate Optimization
Churn Reduction
FINANCIAL SERVICES
Claim Fraud
Customer Service Chatbots/Routing
Risk Evaluation
MANUFACTURING
Remaining Useful Life Estimation
Failure Prediction
Demand Forecasting
TELECOM
Detect Network/Security Anomalies
Forecasting Network Performance
Network Resource Optimization (SON)
RETAIL
Supply Chain & Inventory Management
Price Management / Markdown Optimization
Promotion Prioritization And Ad Targeting
AUTOMOTIVE
Personalization & Intelligent Customer Interactions
Connected Vehicle Predictive Maintenance
Forecasting, Demand, & Capacity Planning
OIL & GAS
Sensor Data Tag Mapping
Anomaly Detection
Robust Fault Prediction
HEALTHCARE
Improve Clinical Care
Drive Operational Efficiency
Speed Up Drug Discovery
4
CHALLENGES AFFECTING DATA SCIENCE TODAY
What Concerns are Limiting Data Science Productivity?
INCREASING DATA ONSLAUGHT
Data sets are continuing to dramatically increase in size
Multitude of sources
Different formats, varying quality
SLOW CPU PROCESSING
End of Moore’s law, CPUs aren’t getting faster
Many popular data science tools have been CPU-only
Can only throw so many CPUs at a job
COMPLEX INSTALLATION & MANAGEMENT
Time consuming to install software
Nearly impossible to manage all version conflicts
Updates often break other software
5
NVIDIA GPU ACCELERATED DATA SCIENCE
Workstation Server Cloud
6
NVIDIA POWERED DATA SCIENCE WORKSTATIONS
NVIDIA GPU-Accelerated Data Science
Integrated hardware and software solution for Data Science
7
BUILT ON QUADRO RTX
RTX 6000/8000
GPU Architecture: Turing
CUDA Cores: 4608
RT Cores: 72
Tensor Cores: 576
Memory BW: Up to 672 GB/s
NVLink: 2-way (2 & 3 slot), 100 GB/s bidirectional
Display Support: 4x DP + 1x VirtualLink
RTX 8000: 48GB / 96GB w/NVLink
RTX 6000: 24GB / 48GB w/NVLink
GV100: 32GB / 64GB, Double Precision (FP64)
8
QUADRO REQUIRED FOR THE LARGEST WORKLOADS
Data Science Data Sets Require GPU Memory Only Available with Quadro
[Chart: Max Data Set Size (in months), 2080Ti vs. RTX 6000]
Quadro value for data science
• Large Quadro memory lets data scientists process more data to improve model training and accuracy.
• Quadro performance completes tasks faster
Sample data set using Home Mortgage data in the US for 2016. A single GeForce RTX 2080Ti can only load 3 months' worth of data. A single Quadro RTX 6000 GPU can load 6 months of data at a time, and an RTX 8000 can load an entire year’s worth of data.
[Chart: End To End Training Time (3 months of data, relative performance), 2080Ti vs. RTX 6000; data label: 1.40]
Data Science Workload Example:
*A single RTX 8000 can load the entire year’s worth of data
Test system: dual Gold [email protected] 3.7GHz Turbo (Skylake), ETL with Dask + Pandas
Quadro holds more data
Quadro 40% faster
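The capacity claim on this slide can be checked with simple arithmetic. The sketch below divides each card's memory by the number of months it is said to load, giving a rough upper bound on the GPU memory one month of this data set needs. The 24 GB and 48 GB figures come from the Quadro RTX spec slide. The 11 GB figure for the GeForce RTX 2080 Ti is that card's published memory size and is an addition here, not a number from the deck.

```python
# (GPU, memory in GB, months of mortgage data the slide says it can load)
# Assumption: 11 GB is the published memory size of the GeForce RTX 2080 Ti;
# it does not appear anywhere in the slides.
CAPACITY = [
    ("GeForce RTX 2080 Ti", 11, 3),
    ("Quadro RTX 6000", 24, 6),
    ("Quadro RTX 8000", 48, 12),
]


def implied_gb_per_month(memory_gb, months_loaded):
    """Upper bound on GPU memory consumed per month of data."""
    return memory_gb / months_loaded


if __name__ == "__main__":
    for gpu, memory_gb, months in CAPACITY:
        per_month = implied_gb_per_month(memory_gb, months)
        print(f"{gpu}: {memory_gb} GB / {months} months = {per_month:.2f} GB per month")
```

All three cards land near 4 GB per month, which is consistent with the slide's point that the loadable data set size scales with GPU memory.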
9
CUDA-X AI DISTRIBUTION
Delivering Best User Experience with Validated Interoperability
arrow-cpp 0.12
cython 0.29
pyarrow 0.12
xgboost 0.8
nltk 3.2.5
numba 0.42
caffe-gpu 1.0.0
lime 0.1.1.32
anaconda-navigator 1.9.6
bokeh 1.0.4
cmake 3.12
opencv 3.4.2
pandas 0.23.4
pytorch 1.0.1
scikit-learn 0.20.2
scipy 1.2.0
statsmodels 0.9.0
torchvision 0.2.1
cffi 1.11.5
chainer 5.1.0
ipyvolume 0.5.1
pytest 4.3.0
python-graphviz 0.8.4
setuptools 40.8.0
cuda100 1.0.0
cudatoolkit 10.0.130
cudf 0.5.1
cuml 0.5.1
cupy-cuda100 5.2.0
nvstrings 0.2.0
matplotlib 3.0.2
python 3.7
dask 1.1.1
dask-core 1.1.1
dask-cuda 0.0.1
dask-cudf 0.0.1
dask-xgboost 0.1.5
distributed 1.25.3
faiss-gpu 1.5.0
ipywidgets 7.4.2
jupyterlab 0.35.4
numpy 1.15.4
(Almost) One Click Install: Assembled into one script, installed and tested on proven hardware
Highlighted libraries are GPU accelerated.
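A pinned list like this one is easy to sanity-check in code. The following is a minimal sketch, not NVIDIA's install script: it parses "name version" lines, flags any package listed more than once, and reports what is installed locally. The function names and the short excerpt are illustrative. The repeated matplotlib line is included on purpose to exercise the check, and conda package names do not always match the names that `importlib.metadata` knows, so a missing entry is not proof that the package is absent.

```python
from collections import defaultdict
from importlib import metadata

# Illustrative excerpt in the slide's "name version" style.
PINS_TEXT = """\
pandas 0.23.4
numpy 1.15.4
matplotlib 3.0.2
python 3.7
matplotlib 3.0.2
"""


def parse_pins(text):
    """Turn 'name version' lines into a list of (name, version) tuples."""
    pins = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            pins.append((parts[0], parts[1]))
    return pins


def find_repeats(pins):
    """Return {name: sorted distinct versions} for names listed more than once.

    More than one distinct version for a name means the pins conflict.
    """
    seen = defaultdict(list)
    for name, version in pins:
        seen[name].append(version)
    return {name: sorted(set(v)) for name, v in seen.items() if len(v) > 1}


def installed_versions(pins):
    """Map each pinned name to (pinned version, installed version or None)."""
    report = {}
    for name, pinned in pins:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (pinned, installed)
    return report


if __name__ == "__main__":
    pins = parse_pins(PINS_TEXT)
    print("repeated entries:", find_repeats(pins))
    for name, (pinned, installed) in installed_versions(pins).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```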
10
GPU-ACCELERATED DATA SCIENCE WORKFLOW WITH NVIDIA RAPIDS
Built on CUDA-X AI
DATA
DATA PREPARATION
GPU-accelerated compute for in-memory data preparation
Simplified implementation using familiar data science tools
Python drop-in pandas replacement built on CUDA C++.
GPU-accelerated Spark
PREDICTIONS
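The "drop-in pandas replacement" idea can be sketched as follows, assuming the replacement is the RAPIDS cuDF library (listed as `cudf` in the distribution). The same DataFrame code runs against cuDF when it imports cleanly and falls back to pandas otherwise. The column names and values are hypothetical, and cuDF mirrors much of the pandas API but not all of it, so real pipelines may need small adjustments.

```python
import importlib


def load_dataframe_lib():
    """Return cudf if it imports cleanly, else pandas, else None."""
    for name in ("cudf", "pandas"):
        try:
            return importlib.import_module(name)
        except Exception:  # package missing, or no usable GPU/driver for cudf
            continue
    return None


def mean_balance_by_state(lib, rows):
    """Group by 'state' and average 'balance' with whichever library was loaded."""
    df = lib.DataFrame(rows)
    result = df.groupby("state")["balance"].mean()
    if hasattr(result, "to_pandas"):  # cudf result: copy back to host memory
        result = result.to_pandas()
    return result.to_dict()


# Hypothetical sample data, not taken from the mortgage data set.
ROWS = {"state": ["CA", "CA", "TX"], "balance": [100.0, 300.0, 50.0]}

if __name__ == "__main__":
    lib = load_dataframe_lib()
    if lib is None:
        print("neither cudf nor pandas is installed")
    else:
        print(lib.__name__, mean_balance_by_state(lib, ROWS))
```

Only the import changes between the CPU and GPU paths. That is the sense in which the slide calls the library a drop-in replacement.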
11
DATA SET VIDEO
12
JUPYTER ETL VIDEO
13
GPU-ACCELERATED DATA SCIENCE WORKFLOW WITH NVIDIA RAPIDS
Built on CUDA-X AI
MODEL TRAINING
GPU acceleration of today’s most popular ML algorithms, such as XGBoost
Also available are PCA, K-means, k-NN, DBSCAN, tSVD, and many more
Easy-to-adopt, scikit-learn-like interface
DATA PREDICTIONS
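To make "scikit-learn-like interface" concrete, here is a toy, CPU-only k-means in plain Python that follows the same convention: configure in the constructor, call `fit`, then call `predict`, with learned state in a trailing-underscore attribute. It is not the RAPIDS implementation. The class name and the deterministic initialisation are assumptions made for illustration.

```python
class TinyKMeans:
    """Toy k-means exposing the fit/predict convention (illustrative only)."""

    def __init__(self, n_clusters=2, max_iter=100):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.cluster_centers_ = None

    @staticmethod
    def _sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def _nearest(self, point):
        centers = self.cluster_centers_
        return min(range(len(centers)), key=lambda i: self._sq_dist(point, centers[i]))

    def fit(self, X):
        # Deterministic init: the first n_clusters points. Real libraries
        # use random or k-means++ initialisation instead.
        self.cluster_centers_ = [list(p) for p in X[: self.n_clusters]]
        for _ in range(self.max_iter):
            groups = [[] for _ in self.cluster_centers_]
            for p in X:
                groups[self._nearest(p)].append(p)
            # New center = per-coordinate mean; an empty cluster keeps its center.
            new_centers = [
                [sum(col) / len(g) for col in zip(*g)] if g else c
                for g, c in zip(groups, self.cluster_centers_)
            ]
            if new_centers == self.cluster_centers_:
                break
            self.cluster_centers_ = new_centers
        return self

    def predict(self, X):
        return [self._nearest(p) for p in X]


if __name__ == "__main__":
    X = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
    model = TinyKMeans(n_clusters=2).fit(X)
    print(model.cluster_centers_, model.predict([[1.0, 1.0], [9.0, 9.0]]))
```

Because the calling pattern is the same, code written against this convention can usually swap one estimator for another by changing the import and the constructor.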
14
JUPYTER TRAINING VIDEO
15
GPU-ACCELERATED DATA SCIENCE WORKFLOW WITH NVIDIA RAPIDS
Built on CUDA-X AI
VISUALIZATION
Effortless exploration of datasets, billions of records in milliseconds
Dynamic interaction with data = faster ML model development
Data visualization ecosystem (Graphistry & OmniSci), integrated with RAPIDS
DATA PREDICTIONS
16
VIZ DEMOS
17
FASTER PERFORMANCE, REAL WORLD BENEFITS
CPU: dual Gold [email protected] 3.7GHz Turbo (Skylake), ETL with Dask + Pandas
End-to-end time = Data Prep + Conversion + Training + Validation
Dataset: Mortgage Data 2015-2016
[Charts: Data Prep, Training w/XGBoost, and End-to-end time in seconds (lower is better), each comparing CPU, 1x RTX8000, and 2x RTX8000]
~30X Faster than CPU
~8X Faster than CPU
~10X Faster than CPU
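The end-to-end definition on this slide (Data Prep + Conversion + Training + Validation) and the "X faster" ratios can be written out as below. The stage timings are made-up placeholders chosen only to show the arithmetic. They are not the measured values behind the charts, which did not survive in this transcript.

```python
def end_to_end_seconds(data_prep, conversion, training, validation):
    """End-to-end time as defined on the slide: the sum of the four stages."""
    return data_prep + conversion + training + validation


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU run is than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


# Placeholder timings in seconds (hypothetical, NOT measurements from the deck).
CPU_RUN = {"data_prep": 300.0, "conversion": 20.0, "training": 160.0, "validation": 20.0}
GPU_RUN = {"data_prep": 10.0, "conversion": 5.0, "training": 20.0, "validation": 15.0}

if __name__ == "__main__":
    for stage in CPU_RUN:
        print(f"{stage}: {speedup(CPU_RUN[stage], GPU_RUN[stage]):.1f}x")
    total = speedup(end_to_end_seconds(**CPU_RUN), end_to_end_seconds(**GPU_RUN))
    print(f"end-to-end: {total:.1f}x")
```

The end-to-end ratio is not an average of the per-stage ratios. Stages that take longer on the GPU weigh more heavily, which is why an end-to-end figure can sit well below the best single-stage speedup.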
18
NGC: GPU-OPTIMIZED SOFTWARE HUB
Simplifying DL, ML and HPC Workflows
50+ Containers: DL, ML, HPC
Pre-trained Models: NLP, Classification, Object Detection & more
Industry Workflows: Medical Imaging, Intelligent Video Analytics
Model Training Scripts: NLP, Image Classification, Object Detection & more
Innovate Faster
Deploy Anywhere
Simplify Deployments
NGC
19
NVIDIA NGC SUPPORT SERVICES
Minimize Downtime And Maximize System Utilization
Availability
• Available for NGC-Ready workstations
• Service agreement between NVIDIA & customer
• Purchase from OEM
Support by NVIDIA’s subject matter experts
24x7 portal, phone and email access to create support cases
Live support during local, regional business hours for technical assistance
Support Coverage
• NGC DL & ML containers
• NVIDIA drivers
• NV-docker
• CUDA
20
VOLTA TENSOR CORE GPU
NVIDIA ACCELERATED DATA SCIENCE
GPU-Accelerated Data Science for Today’s Data Led Business
Simplified Deployment from Integrated Hardware and Software
Faster Time To Insight and More Accurate Models with the Power of NVIDIA GPUs
A Solution for Every User and Every Organization
GPU-Acceleration Offers 10x Performance Improvement
Built on CUDA-X AI, Ready-to-Run on NVIDIA-powered Systems
Run on Cloud, Laptop, Workstation, Server, and Clusters
21
CUSTOMER EXPERIENCES
“Our initial look at the NVIDIA-Powered Lenovo
AI workstation showed significant performance
gains. Data scientists will appreciate being able
to move more quickly through the analytics life
cycle, which will allow them to address and
support more analytics needs to transform
business processes.”
-- Gavin Day, Senior Vice President for
Technology at SAS
22
LEARN MORE
www.nvidia.com/datascienceworkstation
www.nvidia.com/datascience
www.rapids.ai
23
NVIDIA POWERED DATA SCIENCE WORKSTATIONS
A NEW BREED OF WORKSTATION FOR DATA SCIENCE
• Dual Quadro GPUs with up to 96 GB GPU memory with NVLink
• Pre-installed NVIDIA accelerated data science software
• Optional enterprise software support
• Designed & optimized to accelerate data science workflows
24
Q&A
26
THE REVIEWS ARE IN
Data Science Workstations are a Hit!
“The best part of it was really that everything ‘just works’.”
“NVIDIA’s new hardware & software is making it easier for organizations to process data right on the desktop…”
“The processing power brought to the data scientist by leveraging the Data Science Workstation provides a tremendous simplification for the development and testing phase of the life cycle.”
27
DSWS: COST EFFECTIVE DATA SCIENCE DEVELOPMENT
Alibaba Cloud Data Analysis instance with P100 GPU*: $1,499.04 USD / month
Months: 3 / 6 / 9 / 12
Cumulative cost: $4,497.12 USD / $8,994.24 USD / $13,491.36 USD / $17,988.48 USD
Lenovo ThinkStation P520: $7,799 USD**
Lenovo ThinkStation P520 configuration: Intel Xeon W-2148 8-core CPU, 128 GB RAM, Quadro RTX 6000 GPU
*Based on the Alibaba price calculator, alibabacloud.com, 12/1102019. RTX 6000 ~3x performance of P100 based on internal testing on PyTorch ResNet-50 V1.5 training tests.
**Based on US pricing on Lenovo.com on 12/10/2019
28
DSWS: COST EFFECTIVE DATA SCIENCE DEVELOPMENT
AWS P3.2xlarge: $1,524.24/month*
Months: 3 / 6 / 9 / 12
Cumulative cost: $4,572.72 / $9,145.44 / $13,718.16 / $18,290.88
Lenovo ThinkStation P520: $7,799 USD**
AWS P3.8xlarge: $6,098.42/month*
Months: 3 / 6 / 9 / 12
Cumulative cost: $18,295.26 / $36,590.52 / $54,885.78 / $73,181.04
Lenovo ThinkStation P920: $24,648 USD**
*Based on the AWS calculator, calculator.s3.amazonaws.com, on 9/16/2019
**Based on US pricing on Lenovo.com on 12/10/2019
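The cost comparison on these two slides reduces to one question: after how many months does cumulative cloud spend exceed the one-time workstation price? The sketch below uses the monthly rates and list prices quoted on the slides and `Decimal` arithmetic so the cents stay exact. It is a simplification that ignores power, support contracts, resale value, and the performance differences noted in the footnotes.

```python
import math
from decimal import Decimal


def cumulative_cost(monthly_usd, months):
    """Total cloud spend after a whole number of months."""
    return monthly_usd * months


def break_even_months(workstation_usd, monthly_usd):
    """First whole month in which cumulative cloud spend reaches the workstation price."""
    return math.ceil(workstation_usd / monthly_usd)


# (cloud instance, monthly rate, workstation, one-time price), as quoted on the slides.
COMPARISONS = [
    ("Alibaba P100 instance", Decimal("1499.04"), "ThinkStation P520", Decimal("7799")),
    ("AWS P3.2xlarge", Decimal("1524.24"), "ThinkStation P520", Decimal("7799")),
    ("AWS P3.8xlarge", Decimal("6098.42"), "ThinkStation P920", Decimal("24648")),
]

if __name__ == "__main__":
    for cloud, monthly, workstation, price in COMPARISONS:
        months = break_even_months(price, monthly)
        spent = cumulative_cost(monthly, months)
        print(f"{cloud} vs {workstation}: month {months} (cloud spend ${spent})")
```

With these figures the cloud spend passes the workstation price within the first six months in all three pairings.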