gpu data warehouse for massive€¦ · but your database wasn’t. built to handle this level of...
TRANSCRIPT
GPU DATA WAREHOUSE
DATA ANALYTICSFOR MASSIVE
OBJECTIVES• Introduction to SQream• The Data Analytics Scalability Problem• What is SQream DB• How SQream works with IBM• SQream Case Studies• Questions and Next Steps
HQ in 7 WTC New York | R&D in Tel Aviv
CORPORATE PROFILEFOUNDED IN 2010
with Alibaba CloudStrategic Partnership
Patents10
Employees70+
2008<1-4TB
2010<10TB
2016TB-PB
YOUR DATA STORES AREGROWING EXPONENTIALLY
TechnologyCPU
TechnologyGPU
BUT YOUR DATABASE WASN’TBUILT TO HANDLE THIS LEVEL OF DATA
1970s-1990s 1990-2010MPP zone
2005-2010In-Memory zone
2010…Massive Data zoneClassic Relational zone
SQL QUERIES AND BI ANALYTICS
ARE TAKING WAYTOO LONG
VALUABLE INSIGHTSGO UNDISCOVERED
BI Lost90%
Data Analyzed<10%
NEW PARADIGM REQUIRED TO HANDLE MASSIVE DATA
SQREAM DBGPU-ACCELERATED DATA WAREHOUSE
100xfaster
Queries
10%of resources
Cost
20xmore data
Analyze
SQREAM DB
• Massively parallel engine• Faster and smaller than CPUs
POWERED BY GPUs
• Terabytes to petabytes• Not limited by RAM
• Ingests 3 TB/hr/GPU• Powerful columnar storage• Always-on compression
• Familiar ANSI SQL• Standard connectors
• 100 TB in a 2U server• Highly cost-efficient
• Python, AI, Jupyter, etc. • Built for data science
COMPLEMENTS YOUR EXISTING DATA STORES
MASSIVELY SCALABLE
SQL DATABASE
EXTENSIBLE FOR ML/AI
MINIMAL FOOTPRINT
LIGHTNING FAST
HOW IT WORKS
Chunking
Data Data Data
Automatic adaptivecompression
Data Data Data
GPU
Parallel chunkprocessing
Data Skipping
Data Data Data
Columnar process+ Metadata tagging
Data DataDataData
Raw data
Data Data Data
Data Data DataData Data Data
YOUR ANALYTICSSIMPLIFY AND ACCELERATE
Aggregations,indexing, cubingMPP 100 TB
Analyticstake hours
Complex ETL
MPP 100 TB
Resultswithin minutes
INTEGRATES WITHYOUR BI ECOSYSTEM
Java | Python | SQL | R | C++
- Data Sources -
- ETL and others -
- SQream DB-
Cloud Infrastructure
- BI and Visualization -
SQREAM DB
• Independent scale of workload and data volume
GPUGPU
GPUGPU
GPUGPU
GPUGPU
GPUGPU
GPUGPU
GPUGPU
GPUGPU
• Hybrid cloud deployment• Simple and flexible software
management
DECOUPLED COMPUTE AND STORAGE BENEFITS
SCALE-UP OR OUT• Scale up by expanding attached storage • Scale out by adding additional compute nodes
FAST AND SIMPLEBIG DATA EXPLORATION Query raw data directly Immediate ad-hoc querying More dimensions, more joins More insights, better accuracy Enhanced business intelligence
Multiple JOINs on any field
Time Series
RegularExpressions
ANSI-92Compatible
Window Analysis
ODBC, JDBC Python
Connectivity
HIGH THROUGHPUT CONVERGED• SQream DB designed for high-throughput
• IBM Power Systems is the only NVLinkCPU-to-GPU enabled architecture
• IBM AC922, with POWER9 and NVLINK can transfer data at up to 300GB/s, almost 9.5x faster than PCIe 3.0 found in x86-based architectures, reducing classic I/O bottlenecks
2xNVIDIA
Tesla V100
2xNVIDIA
Tesla V100
IBM Power 9
IBM Power 9
UP TO 3.7x FASTER QUERIESSQREAM DB ON POWER9
• SQream DB on Power9 is between 150% to 370% faster than comparable x86 architectures
• The CPU-GPU NVLink bandwidth is key to performance in complex queries
IBM Power9 AC922:2x POWER9 16C @ 3.8GHz | 256 GB DDR4 2666 MHz | SSD storage | 4x NVIDIA Tesla V100 (SXM2 NVLINK - 16GB)Dell PowerEdge R740:2x Intel Xeon Silver 4112 CPU @ 2.60GHz | 256GB DDR4 2666MHz | SSD storage | 4x NVIDIA Tesla V100 (PCIe - 16GB)
52.83
10.35
84.578.57
14.06
2.8
30.29 29.01
0
10
20
30
40
50
60
70
80
90
TPC-H Query 8 TPC-H Query 6 TPC-H Query 19 TPC-H Query 17
Que
ry ti
me
(sec
onds
)Lo
wer
is b
ette
r
Query
SQream DB performanceIBM Power9 vs Intel Xeon (Skylake)
Dell PowerEdge R740 IBM Power9 AC922
UP TO 2x FASTER LOADINGSQREAM DB ON POWER9
• SQream DB relies on CPU as well as GPUs for loading
• IBM’s Power9 multi-core architecture makes loading much faster than comparable x86 based systems
• IBM Power9 system loaded data nearly twice as fast as the x86 based machine
IBM Power9 AC922:2x POWER9 16C @ 3.8GHz | 256 GB DDR4 2666 MHz | SSD storage | 4x NVIDIA Tesla V100 (SXM2 NVLINK - 16GB)Dell PowerEdge R740:2x Intel Xeon Silver 4112 CPU @ 2.60GHz | 256GB DDR4 2666MHz | SSD storage | 4x NVIDIA Tesla V100 (PCIe - 16GB)
1,929
1,094
-
500
1,000
1,500
2,000
2,500
Load Time (seconds)
Load
Tim
e (s
econ
ds)
Low
er is
bet
ter
Load time for 6 billion TPC-H records
Dell Poweredge R740 IBM Power9 AC922
FINANCEFraud analysis Risk consolidationCustomized services
RETAILMonitor CompetitorsCustomer Experience Operational Decisions
TELECOMCustomer 360Competitive AnalysisNetwork Optimization
HEALTHCARECare ManagementIOT DevicesGenomic Research
UNDERSTAND 40 MILLION CUSTOMERSTELECOM
HP DL380g9with NVIDIA Tesla GPUs96 GB RAM + 6 TB storage
$200K
40 NODES5 full racks7600 CPU cores
$10,000,000
18M
10M
360M
120M
Ingest time
Reporting time
Ownership Cost
INCREASE REVENUESAD-TECH
Tesla GPUs
AcquisitionSources
85 TB/day in ad impressions for constructing bidding histograms
Data
2x NVIDIA
Queries take5 hours
Extract
Data Ingest Queries take5 minutes
Tesla GPUs
AcquisitionSources
Data
8x NVIDIA
ExtractNot feasible
X
Queries take5 minutes
INCREASE REVENUESAD-TECH
360 TB/day ingested to enhance bid histogram accuracy
Data Ingest
BUSINESS INSIGHTSWHOLESALE
$30 Billion Company - Supply Chain Use Case
DISCOVER NEW
Query Time Reduced from 30 Minutes on Exadata to 30 Seconds on SQream
Vast insights
untapped datauncovered from
SQream and Orange demonstrate 100x cost performance, removing limits of databases.”“Pascal Déchamboux | Director of Software
SQream helps us keep pace with rapidly increasing data for real customer benefits.”“Suppachai Panichayunon, Head Solution Architect
WHAT OURCUSTOMERS SAY
SQream is helping us to cut years of cancer research on large genomic datasets.”“Prof. Gideon Rechavi, Head of Cancer Research
We saw a cost effective opportunity to obtain analytic capabilities we couldn’t have before.““RF Group Leader
PROOF OF VALUE
NDA
SQream Introduction
PoC Planand Scope
Use case in detail KPIs Schema and queries Data samples
Kickoff
HW and SW recommendations ETL process design
Internal Evaluation
Use case validation Load and test Query optimization
Business Value Identification• What are your SQL analytics use cases?• What do you use now for database and visualization?• How much data do you perform analytics on now in terabytes? And how
much would you like to analyze now and in 1 to 3 years, in terabytes?• How much data is in your data lake and what is the rate of growth?• What kind of query response times do you get now and what are you
looking for?• What are the data sources and do you feel you analyze enough dimensions?• How many users? Total and concurrent?
FEEL FREE TO
ADDRESS
Headquarters, 7 WTC 250 Greenwich Street New York, New York
Joel Sehr, VP Sales, [email protected] | www.sqream.com
WE ARE SOCIAL
CONTACT