![Page 1: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/1.jpg)
1@arnon86 S7456
Leveraging Customer Behavioral Data
to Drive Revenue
the GPU way
![Page 2: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/2.jpg)
Hi! Arnon Shimoni
Senior Solutions Architect
I like hardware & parallel / concurrent stuff
In my 4th year at SQream Technologies
Send gifs to @arnon86 or [email protected]
![Page 3: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/3.jpg)
tl;dr
• GPUs are good number crunchers, which makes them good for data processing
• SQream DB with GPUs is fast
• Rethink current solutions: the GPU can help
• Simple hardware is good enough. There's no need to throw lots of hardware, or shovel money, at the problem!
![Page 4: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/4.jpg)
SQream DB – an SQL database powered by GPUs
Fast
• Columnar storage
• Always-on compression
• 2 TB / hour / GPU ingest speed
Scalable
• 10 TB to 1 PB with ease
SQL Database
• Familiar ANSI SQL
• Standard connectors (ODBC, JDBC)
Extensible for AI
• Python, Jupyter, etc.
• Data science
Powered by GPUs
• Massively parallel engine
• Relies on GPUs for power, not RAM
![Page 5: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/5.jpg)
This story starts at MWC last year
That’s my ear!
![Page 6: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/6.jpg)
SQream knows telecoms
We’ve helped operators with
• Better analysis of network events
• Speeding up CDR preparations
• More history with security management (SIEM)
• And now – customer behaviour
![Page 7: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/7.jpg)
There is a lot of data about customers in telecoms
• Where and when they wake up and where they spend their days (daily grinders)
• When and where they were Instagramming (when and where data was used)
• How frustrated they got (what the network experience was in each location)
• What modes of transport they use
• How close they are to competitor locations
But are operators actually using this data? Are they getting anything actionable?
Are they looking at the entire customer base, and not just a single customer?
![Page 8: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/8.jpg)
“You know, Telefonica has this multi-million dollar product based on Hadoop for selling this customer behaviour data to 3rd party companies.
Have you thought about maybe getting the same solution for your company, but much simpler?”
![Page 9: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/9.jpg)
“Oh, and we’ll do it for you with a single machine”
![Page 10: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/10.jpg)
Why their current setup wasn’t good enough for this
• Data scientists and BI professionals have only short windows of time to run queries, because of overloaded systems
• Windows cut even shorter due to long overnight loading
• Queries take hours, and iterations become painful
Long queries → Coffee breaks → Bathroom breaks → Unhappy managers → Unhappy everyone
![Page 11: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/11.jpg)
Databases that displease data scientists
• When data scientists or BI professionals want to ask questions that no one has asked before, these systems tend to ‘break’ and not deliver what’s expected
• They’re just not designed for ad-hoc querying
• Legacy databases require indexing and a lot of manual tuning
• Newer databases like Vertica also require creating projections, which is time-consuming and inflexible
• Distributed databases don’t perform well when JOIN operations are necessary
• In-memory databases are very painful on the wallet if you need more than a couple of terabytes
![Page 12: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/12.jpg)
Picking the wrong databases will cause pain!
Just some of what we saw
• Cloudera – for the BI team
• Teradata – for the marketing team
• Oracle Exadata – transactional, for CDR collection and customer records
• Vertica, Netezza – for financial
• Lots of Greenplum – to collect from many sources, for marketing and BI
![Page 13: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/13.jpg)
Chanel says racks are fashionable. Our customers think otherwise
![Page 14: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/14.jpg)
SQream DB software in a standard 2U server
Configured with 96 GB RAM and a single Tesla K80
for a $4,000 total investment.
Designed to handle ~40 TB of telecom data
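A back-of-the-envelope sketch of what the 2 TB / hour / GPU ingest figure quoted earlier would mean for a ~40 TB data set. The function name `load_hours` is hypothetical, and the calculation assumes the rate is sustained and scales linearly with the number of GPUs, which the slides do not state.

```python
def load_hours(data_tb: float, tb_per_hour_per_gpu: float = 2.0, gpus: int = 1) -> float:
    """Estimate hours needed to ingest `data_tb` terabytes.

    Assumption (not from the slides): the per-GPU rate is sustained
    for the whole load and adds up linearly across GPUs.
    """
    if data_tb < 0 or tb_per_hour_per_gpu <= 0 or gpus < 1:
        raise ValueError("data_tb must be >= 0, rate > 0 and gpus >= 1")
    return data_tb / (tb_per_hour_per_gpu * gpus)


# ~40 TB at 2 TB/hour on one GPU, then on two GPUs
one_gpu = load_hours(40)
two_gpus = load_hours(40, gpus=2)
print(one_gpu, two_gpus)
```

Under these assumptions a full initial load is a matter of hours rather than days, even on a single server.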
![Page 15: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/15.jpg)
Sample dashboards generated
Dashboard showing 3G/4G data throughput throughout the day (Morning, Lunch, Evening, Night, …).
Larger circles represent more data throughput.
Colour becomes darker as the day progresses.
Dark-outline circles mean more night-time traffic.
Dashboard aggregates directly off SQream DB, with no intermediate steps.
Represents a 3-table join (3.3B rows ⋈ 40M rows ⋈ 300K rows)
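The kind of query behind such a dashboard can be sketched as a three-table join with a time-of-day aggregation. This is an illustration only: it uses SQLite as a stand-in engine, and every table name, column name, hour boundary, and data value below is hypothetical rather than taken from the actual deployment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usage_events (subscriber_id INTEGER, cell_id INTEGER, hour INTEGER, bytes INTEGER);
CREATE TABLE subscribers  (subscriber_id INTEGER PRIMARY KEY, plan TEXT);
CREATE TABLE cells        (cell_id INTEGER PRIMARY KEY, lat REAL, lon REAL);
""")

# Tiny made-up data set; the real tables were 3.3B, 40M and 300K rows.
conn.executemany("INSERT INTO usage_events VALUES (?, ?, ?, ?)", [
    (1, 100, 8, 500), (2, 100, 9, 300),    # morning traffic on cell 100
    (2, 200, 13, 200), (1, 200, 23, 700),  # lunch and night traffic on cell 200
    (3, 100, 8, 999),                      # unknown subscriber, dropped by the join
])
conn.executemany("INSERT INTO subscribers VALUES (?, ?)", [(1, "prepaid"), (2, "postpaid")])
conn.executemany("INSERT INTO cells VALUES (?, ?, ?)", [(100, 1.5, 2.5), (200, 3.5, 4.5)])

# Join events to subscribers and cells, bucket by part of day, sum throughput.
QUERY = """
SELECT c.cell_id, c.lat, c.lon,
       CASE WHEN e.hour BETWEEN 6 AND 10  THEN 'Morning'
            WHEN e.hour BETWEEN 11 AND 14 THEN 'Lunch'
            WHEN e.hour BETWEEN 15 AND 21 THEN 'Evening'
            ELSE 'Night' END AS day_part,
       SUM(e.bytes) AS total_bytes
FROM usage_events AS e
JOIN subscribers AS s ON s.subscriber_id = e.subscriber_id
JOIN cells       AS c ON c.cell_id = e.cell_id
GROUP BY c.cell_id, c.lat, c.lon, day_part
ORDER BY c.cell_id, day_part
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each output row maps to one circle on the map: position from the cell's coordinates, size from `total_bytes`, and colour from `day_part`.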
![Page 16: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/16.jpg)
![Page 17: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/17.jpg)
Saving hours on reporting with SQream DB
Augmenting legacy MPP with a faster, easier-to-use GPU-powered analytics database
• Data sources: CDR 4G, CDR 3G, Non-CDR
• Legacy path: ETL process and aggregations on 80 nodes, 5 hours
• SQream DB path: direct loading at a 2 TB/h ingest rate, 20 minutes (15x faster)
• Output: dozens of reports
![Page 18: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/18.jpg)
The cost of performance
Legacy MPP: 80 nodes, 5 full racks, 960 CPU cores, 5.12 TB RAM
SQream DB v1.9.6: HP DL380g9 with NVIDIA Tesla K80, 96 GB RAM + 6 TB storage
• ETL time: 300 m vs 20 m (15x faster)
• Reporting time: 120 m vs 10 m (12x faster)
• TCO w/license: $10,000,000 vs $200,000 (50x more cost effective)
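The headline ratios on this slide can be checked against the before/after figures it shows. A minimal sketch, with a hypothetical helper name `improvement`:

```python
def improvement(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# Figures as shown on the slide: minutes for ETL and reporting, dollars for TCO.
etl = improvement(300, 20)
reporting = improvement(120, 10)
tco = improvement(10_000_000, 200_000)
print(f"ETL {etl:g}x, reporting {reporting:g}x, TCO {tco:g}x")
```

The three results match the 15x, 12x and 50x claims, which also confirms which pair of numbers belongs to which metric.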
![Page 19: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/19.jpg)
That wasn’t an anomaly
We’ve done it against Netezza, Teradata, Oracle, Vertica, and even Hadoop-based systems.
Netezza: 8 full 42U racks, 56 S-Blades, 7 TB RAM
SQream DB v1.9.7: Dell C4130 with 4x NVIDIA Tesla K80, 512 GB RAM + iSCSI JBOD (20 TB)
• Average query time (seconds): 33.70 (Netezza) vs 31.70 (SQream DB)
• Processing units (S-Blades / GPUs): 56 vs 4
• Compression ratio: 4.0 vs 4.7
• Cost of ownership ($): 12,000,000 vs 500,000
![Page 20: Leveraging Customer Behavioral Data to Drive Revenueon-demand.gputechconf.com/gtc/2017/presentation/s7456-arnon-shimoni... · Leveraging Customer Behavioral Data to Drive Revenue](https://reader034.vdocuments.net/reader034/viewer/2022042022/5e79cff2a23e160c6671e810/html5/thumbnails/20.jpg)
Find out more about SQream’s high performance
GPU-driven database software
www.sqream.com or [email protected]