gpu-accelerated analytics enable new enterprise solutions

16
GPU-Accelerated Analytics Enable New Enterprise Solutions Mark Johnson, Business Development November 14, 2016

Upload: kinetica

Post on 15-Jan-2017

129 views

Category:

Software


0 download

TRANSCRIPT

Page 1: GPU-Accelerated Analytics Enable New Enterprise Solutions

GPU-Accelerated Analytics Enable New Enterprise Solutions

Mark Johnson, Business DevelopmentNovember 14, 2016

Page 2: GPU-Accelerated Analytics Enable New Enterprise Solutions

Evolution of Data Processing

2

DATA WAREHOUSE

RDBMS and Data Warehouse technologies enable organizations to store and analyze growing volumes of data on high performance machines, but at high cost.

DISTRIBUTED STORAGE AFFORDABLE IN-MEMORY

GPU-ACCELERATED COMPUTE

Hadoop and MapReduce enable distributed storage and processing across multiple machines.

Storing massive volumes of data becomes more affordable, but performance is slow.

Affordable memory allows for faster data read and write. HBase, HANA, and MemSQL provide faster analytics.

At scale compute processing now becomes the bottleneck.

GPU cores bulk process tasks in parallel–far more efficient for compute-intensive tasks than CPUs.

1990 - 2000s 2005… 2010… 2016…

Page 3: GPU-Accelerated Analytics Enable New Enterprise Solutions

GPU Acceleration Overcomes Processing Bottlenecks

3

4,000+ cores per device in many cases, versus 8 to 32 cores per

typical CPU-based device.

High performance computing trend to using GPUs to solve

massive processing challengesGPU acceleration brings high performance compute to commodity hardware

Parallel processing is ideal for scanning entire dataset & brute force compute.

GPUs are designed around thousands of small, efficient cores that are well suited to performing repeated similar instructions in parallel. This makes them well-suited to the compute-intensive workloads required of large data sets.

Presenter
Presentation Notes
Acclerated by NVIDIA
Page 4: GPU-Accelerated Analytics Enable New Enterprise Solutions

Wins IDC HPC Innovation Excellence Award for work with US Postal Service.

Kinetica Background

2009 2012

United States Army Intelligence seeks a means to assess terrorist and other national security threats.

No database in the market was fast or flexible enough to met their needs.

Founders Amit Vij and Nima Negahban start on the pioneering use of GPUs while building a GPU-accelerated database from the ground up.

2014

Commercialization entered production with USPS.

2016

Rebranded to Kinetica. Seed funding. Moved HQ to San Francisco. Expanded management team. Hired field team.

Wins IDC HPC Innovation Excellence Award for work with US Army.

GPUdb goes live with the US Army Intelligence.

Patent granted for “Method and system for improving computational concurrency using a multi-threaded GPU calculation engine”

42

Presenter
Presentation Notes
Created to Identify Terrorist Threats in Real Time Kinetica incubated in 2009 as a massively-parallel computational engine for US Army INSCOM Ingests 200+ sources of streaming data Incorporates geospatial and temporal data Real-time, actionable threat intelligence First high-performance database to leverage the power of GPUs (deployed 2012)
Page 5: GPU-Accelerated Analytics Enable New Enterprise Solutions

GPU-accelerated database operations

Natural Language Processing based full-text search

Native GIS & IP address object

support

Real-time data handlers to ingest

structured and unstructured data

Deep integration with open source and commercial

frameworks and applications: Hadoop, Spark, NiFi, Accumulo,

H20, Tableau, Kibana and Caravel

Distributed visualization pipeline built-In

Kinetica: A Distributed, In-Memory Database

Predictable scale out for data ingestion and

querying

No typical tuning, indexing, and

tweaking

Page 6: GPU-Accelerated Analytics Enable New Enterprise Solutions

Enabling New Enterprise Solutions

6

Logistics, Fleet OptimizationTracking carrier movements in real time, to reallocate resources on the fly based upon personnel, environmental and seasonal data. 200,000+ USPS devices emit location once every minute, so > ¼ billion events captured and analyzed daily, with several times that amount available in a trailing window.

CybersecurityFusing multiple data feeds with real-time anomaly detection to protect its financial clients against current and emerging cyber threats.

Utility: Smart GridGeospatially visualizing smart grid while ingesting real-time vector data streaming from smart meters. Striving for optimized energy generation and uptime based on fluctuating usage patterns and unpredictable natural disasters.

Real-time Threat IntelligenceIngests 200+ sources of streaming data, producing 200B new records per day; incorporates geo-spatial and temporal data to identify terrorist threats in real time.

Presenter
Presentation Notes
US ARMY again mention USPS is the single largest logistic entity in the country, moving more individual items in four hours than the combination of UPS, FedEx, and DHL move all year. Distributed analysis - USPS’ parallel cluster is able to serve up to 15,000 simultaneous sessions, providing the service’s managers and analysts with the capability to instantly analyze their areas of responsibility via dashboards and to query it as if it were a relational database. At scale With 200,000 USPS devices emitting location once every minute, that amounts to more than a quarter billion events captured and analyzed daily tracked on 10 nodes. Ironnet several major banks, and Sis PG&E - Data silos between gas, electric, distribution/transportation, fiber vs land-based.We can consolidate feeds, fuse analytics, proactive on utility data, prevent crisis, full real time visibility on workforce, meters, grids. �
Page 7: GPU-Accelerated Analytics Enable New Enterprise Solutions

Impressive Performance vs. HANA

9

IN-MEMORY KNOCKOUT

30 nodes Kinetica cluster tested against 100+ node HANA cluster holding 3 years of retail data (150b rows) with a range of queries SELECT

GROUP BY (1)

GROUP BY (2)

0 5 10 15 20 25 30 35 40 45 50

Kinetica HANA

Source: Retail customer data for 10 most complex SQL queries that were being run on HANA, 2016

Time (s)

JOIN

GROUP BY

Page 8: GPU-Accelerated Analytics Enable New Enterprise Solutions

Enabling New Enterprise Solutions

10

Retail: Customer 360/customer sentiment, supply chain optimizationCorrelating data from point of sales (POS) systems, social media streams, weather forecasts, and even wearable devices. Better able to track inventory in real time, enabling efficient replenishment and avoiding out-of-stock situations.

Powering High Performance “Analytics as a Service” SolutionDelivering customer-focused services by leveraging all available transactional data. Currently no ability for business user to do customized analytics; IT has to. Query response times taking 10s of minutes, some over 2 hours, thus limiting ability to analyze and use data.

Fin servicesLarge scale risk aggregations and billion+ row joins in sub-second time (5TB+ tables choke on RDBMS joins and Hadoop is too slow).

Manufacturing IoTLive streaming analytics on component functionality to ensure safety (avoid failures) and validate warranty claims .

Ridesharing/Connected CarsView all passengers and drivers to monitor behavioral analytics. Sensor enabled predictive maintenance on connected cars.

Presenter
Presentation Notes
Use cases in play include: Real-time sentiment analysis Real-time anomaly detection to prevent fraud Reallocate resources on the fly based upon personnel, environmental and seasonal data Track terrorist and other national security threats in real time Optimize energy generation and uptime based on fluctuating usage patterns and unpredictable natural disasters Improve customer engagements by correlating data from point of sales (POS) systems, social media streams, weather forecasts, and other data. Track inventory in real-time, enabling efficient replenishment and avoiding out-of-stock situations Connected home, connected car, connected cities More rapid query response time and data discovery capabilities for end users Faster iterations to risk modeling and modeling optimization
Page 9: GPU-Accelerated Analytics Enable New Enterprise Solutions

Distributed Visualization PipelineKinetica native visualization ideal for fast moving, location-based IoT data.

Improving visual rendering of data–particularly geospatial and temporal data.

• Visualizes billions of data points • Updates in real time• Delivers full gamut of geospatial visualization• Native GIS & IP address object support

13

Presenter
Presentation Notes
renderers such as feature, class break, heat map, and geo-fencing
Page 10: GPU-Accelerated Analytics Enable New Enterprise Solutions

Efficiency = SavingsHardware costs and footprint are a fraction of standard in-memory databases.

12

Presenter
Presentation Notes
Let’s get some real-life facts in here
Page 11: GPU-Accelerated Analytics Enable New Enterprise Solutions

Availability

13

Pricing$300K/TB/year

Certified to run on premise with: Or in the cloud: Accelerated by:

Coming soon:

Page 12: GPU-Accelerated Analytics Enable New Enterprise Solutions

Thank you

Page 13: GPU-Accelerated Analytics Enable New Enterprise Solutions

Retail - Redesign Business Without Redesigning Your Data

Multidimensional query on “Organic” with outgoing sales on map

Outgoing sales of organic products in Texas is far less compared to California and Florida

CASE STUDY

Presenter
Presentation Notes
This search enabled us to go across the current attributes of data tables of departments that were probably created several decades ago. See results of “organic” in multiple departments that include Produce, Dairy, Grocery Dry Goods, and even Lawn and Garden. Out of the 5.1 Billion transactions in the test cluster, we conducted a text search on the word “organic” which yielded 3.2 million transactions with the word ”organic” mentioned somewhere in the metadata. We followed this by viewing the Outbound Activity Geo Comparison Report.
Page 14: GPU-Accelerated Analytics Enable New Enterprise Solutions

US Army INSCOM

US Army’s in-memory computational engine for any data with a geospatial or temporal attribute for a major joint cloud initiative within the Intelligence Community (IC ITE).

Intel analysts are able to conduct near real-time analytics and fuse SIGINT, ISR, and GEOINT streaming big data feeds and visualize in a web browser.

First time in history military analysts are able to query and visualize billions to trillions of near real-time objects in a production environment.

Major executive military and congressional visibility.

Oracle Spatial (92 Minutes)

42x Lower Space28x Lower Cost38x Lower Power Cost

U.S Army INSCOM Shift from Oracle to GPUdb

GPUdb(20ms)

1 GPUdb server vs 42 servers with Oracle 10gR2 (2011)

CASE STUDY

Presenter
Presentation Notes
Game changer for military; first time in history they can fuse UAV live track data with SIGINT data to target on the fly with high accuracy 1 server with GPUdb can replace 42 server with Oracle 10gR2. 42 servers doing a geospatial query took 92 minutes; same data, same query 1 GPUdb server took less than 1 second
Page 15: GPU-Accelerated Analytics Enable New Enterprise Solutions

Real-time route optimization

DISTRIBUTED ANALYSISUSPS’ parallel cluster is able to serve up to 15,000 simultaneous sessions, providing the service’s managers and analysts with the capability to instantly analyze their areas of responsibility via dashboards.

AT SCALEWith 200,000 USPS devices emitting location once every minute, that amounts to more than a quarter billion events captured and analyzed daily… tracked on 10 nodes.

USPS is the single largest logistic entity in the country, moving more individual items in four hours than the combination of UPS, FedEx, and DHL move all year.

CASE STUDY

Page 16: GPU-Accelerated Analytics Enable New Enterprise Solutions

View Mark Johnson’s presentation at SC16 on YouTube:https://www.youtube.com/watch?v=OdoSSfKwH5A

18