Transcript
Page 1: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Enabling Real-Time Analytics Using Hadoop Map/Reduce

Copyright © 2013 by ScaleOut Software, Inc.

Briefing on New Product Release: ScaleOut hServer™ V2

October 14, 2013

Bill Bain, CEO

David Brinker, COO

Page 2: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

2 ScaleOut Software, Inc.

ScaleOut hServer V2:

•  World’s first Hadoop MapReduce engine integrated with a scalable, in-memory data grid

•  Full Hadoop MapReduce support for “live” fast-changing data

•  20x performance improvement in benchmark tests

•  Significant new technology to simplify development and maximize ease of use

What’s New Today

Page 3: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  Develops and markets software middleware for:

    •  Scaling application performance and

    •  Performing real-time analytics using

    •  In-memory data storage and computing

•  Executive Team:

•  Dr. William Bain, Founder & CEO

•  Career focused on parallel computing – Bell Labs, Intel, Microsoft

•  3 prior start-ups, last acquired by Microsoft and product now ships as Network Load Balancing in Windows Server

•  David Brinker, COO

•  25 years software business and executive management experience

•  Mentor Graphics, Cadence, Webridge

•  Eight years market experience in Windows & Linux; 400 customers

About ScaleOut Software

Page 4: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  ScaleOut StateServer®

•  In-Memory Data Grid for Windows and Linux

•  Scales application performance.

•  Industry-leading performance and ease of use

•  ScaleOut GeoServer® adds

    •  WAN based data replication for DR

    •  Breakthrough technology for global data access

•  ScaleOut Analytics Server® adds

    •  Real-time data analysis for “live” data

•  Comprehensive management tools

•  Introducing ScaleOut hServer™ V2

    •  Full Hadoop Map/Reduce engine (20X faster*)

    •  Hadoop Map/Reduce on live, in-memory data

ScaleOut Software Products

[Diagram: ScaleOut StateServer In-Memory Data Grid with four GridService nodes]

*in benchmark testing

Page 5: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

ScaleOut Analytics Server stores and analyzes “live” data:

•  In-memory storage holds live data sets which are continuously updated and accessed within operational systems.

    •  Examples: stock ticker data, business rules, order & inventory data

•  Integrated analytics engine tracks important patterns & trends.

•  Data-parallel analysis delivers results in msec. to seconds.

IMDGs Perform Real-Time Analytics

Page 6: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Integrate analysis into a stock trading platform:

•  The IMDG holds market data and hedging strategies.

•  Updates to market data continuously flow through the IMDG.

•  The IMDG performs repeated map/reduce analysis on hedging strategies and alerts traders in real time.

•  IMDG automatically and dynamically scales its throughput to handle new hedging strategies by adding servers.

Example in Financial Services
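
The pattern on this slide (repeated map/reduce over hedging strategies while market data keeps changing) can be sketched in a few lines. This is only an illustration under stated assumptions: plain Python dictionaries stand in for the IMDG, and every name here (`market_prices`, `strategies`, `trigger_below`, `analyze`) is hypothetical and not part of any ScaleOut or Hadoop API.

```python
from collections import defaultdict

# Hypothetical stand-ins for data held in the grid (not a real IMDG API).
market_prices = {"AAPL": 100.0, "MSFT": 50.0}
strategies = {
    "s1": {"symbol": "AAPL", "trigger_below": 95.0, "trader": "alice"},
    "s2": {"symbol": "MSFT", "trigger_below": 45.0, "trader": "bob"},
    "s3": {"symbol": "AAPL", "trigger_below": 90.0, "trader": "bob"},
}


def map_strategy(strategy_id, strategy, prices):
    """Map step: emit (trader, strategy_id) when the price falls below the trigger."""
    if prices[strategy["symbol"]] < strategy["trigger_below"]:
        yield strategy["trader"], strategy_id


def reduce_alerts(pairs):
    """Reduce step: group the triggered strategy ids by trader."""
    alerts = defaultdict(list)
    for trader, strategy_id in pairs:
        alerts[trader].append(strategy_id)
    return {trader: sorted(ids) for trader, ids in alerts.items()}


def analyze(all_strategies, prices):
    """One full map/reduce pass over the current contents of the 'grid'."""
    pairs = []
    for strategy_id, strategy in all_strategies.items():
        pairs.extend(map_strategy(strategy_id, strategy, prices))
    return reduce_alerts(pairs)


first = analyze(strategies, market_prices)   # no trigger has fired yet
market_prices["AAPL"] = 92.0                 # a live market-data update arrives
second = analyze(strategies, market_prices)  # the next pass sees the new price
```

Because each pass reads whatever is in memory at that moment, the second run picks up the price update without any reload step. A real grid would run the map step in parallel on each server.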

Page 7: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Example Uses

Online loan apps & banking

Portfolio management

Trading systems

Reservations systems

Ecommerce shopping

Customer service sites

Streaming entertainment

Configuration engines

Gaming

Customers

•  400 unique customers

•  35 Fortune 500 customers

•  32 countries

•  9,000 servers licensed

•  50% have multiple deployments

Customer mix by industry (% in $$s):

•  Entertain. & Commun.: 13%

•  Financial & Insurance: 26%

•  Ecommerce Sales: 17%

•  Ecommerce Services: 19%

•  Travel & Transport.: 4%

•  Gov't & Education: 10%

•  Software: 8%

•  Other: 3%

Page 8: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  In-Memory Data Grids have become key in several fast-growth markets.

•  Drivers:

•  Cloud computing / virtualization

•  Hardware enablement

•  Competitive pressure

•  Exploding workloads

•  Big data analysis

•  ScaleOut addresses scalability and analytics.

IMDGs Seeing Wide Adoption

Market sizes:

•  Big Data Analytics: $18B [1]

•  Enterprise Software: $292B [2]

•  HPC / Grid Computing: $25B [3]

•  In-Memory Data Grids: $355M [4]

Sources:

[1] Wikibon 2013

[2] Gartner 2010, rolled forward to 2013

[3] Market Research Media 2015, rolled back to 2013

[4] Gartner 2011, rolled forward to 2013

Page 9: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Big Data Analytics $18B

Analytics Market

Batch “Business Intelligence”

•  Static data sets

•  Petabytes

•  Disk storage

•  Hours to minutes

•  Best uses: analyzing warehoused data; mining for long-term trends

Real-time “Operational Intelligence”

•  Live data sets

•  Gigabytes to terabytes

•  In-memory storage

•  Minutes to seconds

•  Best uses: tracking live data; immediately identifying trends and capturing opportunities

Offerings shown on the real-time-to-batch spectrum: Analytics Server, hServer, Hadoop, IBM, Teradata, SAS, SAP

Page 10: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  Run continuous Hadoop on live data, while it’s being updated.

    •  “Capture perishable business opportunities and identify issues.”

    •  Real-time risk analysis

    •  Credit card fraud detection

    •  ...

•  Accelerate Hadoop on static data with a one-line code change.

    •  “Speed up Hadoop execution by >10X for faster business insights.”

    •  Process simulations

    •  Financial modeling

    •  ...

•  Quickly prototype Hadoop code.

    •  “Validate your Hadoop code before it goes into batch processing.”

    •  Fast-turn debug and tuning

    •  No need to install Hadoop stack

    •  ...

ScaleOut hServer Targeted Use Cases

Page 11: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  Typically used for very large, static, offline datasets

•  Data must be copied from disk-based storage (e.g., HDFS) into memory for analysis.

•  Hadoop Map/Reduce adds lengthy batch scheduling overhead.

Problem: Hadoop Cannot Efficiently Perform Real-Time Analytics

Page 12: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Benefits:

•  Enables real-time analysis using Hadoop M/R APIs.

•  Accelerates data access by staging data in memory.

•  Eliminates batch scheduling and data shuffling overheads of standard Hadoop distributions.

•  Analyzes “live” data.

•  Allows Hadoop M/R programs to run without change.

•  Eliminates complexity in Hadoop deployment.

•  Enables rapid prototyping.

Solution: Integrate Hadoop M/R into In-Memory Data Grid

Page 13: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Enables Hadoop Map/Reduce to perform real-time analysis:

•  Adds full Map/Reduce engine to SOAS IMDG.

•  Delivers results in msec. to seconds instead of minutes or hours.

•  Benchmark results show 20X speedup.

•  Has flexible options for data storage/access:

    •  Hadoop programs can access/store key/value pairs using either IMDG or HDFS.

•  Automatically caches HDFS data in IMDG for fast access.

•  Allows dynamic updates to key/value pairs during analysis to support “live” data.

•  Ships as open source Java library combined with SOAS IMDG.

Introducing ScaleOut hServer™ V2

Page 14: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  ScaleOut hServer adds Grid Record Reader for accessing key/value pairs held in the IMDG.

•  Hadoop programs optionally can output results to IMDG with Grid Record Writer.

•  Grid Record Reader optimizes access to key/value pairs to eliminate network overhead.

•  Applications can access and update key/value pairs as operational data during analysis.

Enabling Access to IMDG Data
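
One way to picture how a reader can avoid network overhead is to let each server read only the key/value pairs it already holds. The sketch below illustrates that idea and nothing more: `owner_of`, `LocalPartitionReader`, and the three-host grid are assumptions made for the example, not the actual Grid Record Reader.

```python
import zlib

NUM_HOSTS = 3  # assumed grid size, for illustration only


def owner_of(key):
    """Assign each key to one host using a stable hash (hypothetical placement rule)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_HOSTS


class LocalPartitionReader:
    """Yields only the key/value pairs owned by one host.

    In this sketch a single dict holds every pair and the reader filters it;
    in a real grid each host would physically store only its own partition.
    """

    def __init__(self, host_id, grid):
        self.host_id = host_id
        self.grid = grid

    def __iter__(self):
        for key, value in self.grid.items():
            if owner_of(key) == self.host_id:
                yield key, value


# Twelve hypothetical order records spread across the hosts by hash.
grid = {f"order-{i}": i * 10 for i in range(12)}

# Each host sums its local values (map side); the partial sums are then combined.
per_host_totals = [
    sum(value for _, value in LocalPartitionReader(host, grid))
    for host in range(NUM_HOSTS)
]
total = sum(per_host_totals)
```

Every pair is read exactly once, by the host that owns it, so only the small per-host totals would need to cross the network.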

Page 15: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  ScaleOut hServer adds Dataset Record Reader (wrapper) to cache HDFS data during program execution.

•  Hadoop automatically retrieves data from ScaleOut IMDG on subsequent runs.

•  Dataset Record Reader stores and retrieves data with minimum network and memory overheads.

•  Tests with the Terasort benchmark have demonstrated 11X lower access latency than HDFS without the IMDG.

Enabling Fast Access to HDFS Data
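
The wrapper idea on this slide (cache records during the first run, serve them from memory on later runs) can be sketched as follows. This is a minimal illustration, assuming a plain dictionary as the cache and a hypothetical `read_from_slow_storage` function in place of HDFS; `CachingRecordReader` is an invented name, not the product's Dataset Record Reader.

```python
class CachingRecordReader:
    """Wraps an underlying reader and caches its records by dataset id."""

    def __init__(self, dataset_id, underlying_reader_factory, cache):
        self.dataset_id = dataset_id
        self.underlying_reader_factory = underlying_reader_factory
        self.cache = cache

    def __iter__(self):
        # Later runs: serve the records straight from the in-memory cache.
        if self.dataset_id in self.cache:
            yield from self.cache[self.dataset_id]
            return
        # First run: read from the underlying storage and remember each record.
        records = []
        for record in self.underlying_reader_factory():
            records.append(record)
            yield record
        # Store the dataset only after it has been read completely.
        self.cache[self.dataset_id] = records


storage_reads = {"count": 0}  # counts how often the slow storage is opened


def read_from_slow_storage():
    """Hypothetical stand-in for reading a file split from disk-based storage."""
    storage_reads["count"] += 1
    yield from [(0, "line one"), (1, "line two")]


cache = {}
first_run = list(CachingRecordReader("input.txt", read_from_slow_storage, cache))
second_run = list(CachingRecordReader("input.txt", read_from_slow_storage, cache))
```

Both runs return the same records, but the underlying storage is opened only once. Writing to the cache only after a complete read keeps a partially consumed dataset from being served as if it were whole.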

Page 16: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

ScaleOut hServer Editions

•  Offered in community and commercial editions

•  Community Edition can be used for evaluation or production

•  Hybrid open source / proprietary licensing

Editions                  Community          Commercial
# Servers                 Up to 4            100s
Expected data set size    256GB (max)        GB - TBs
Pricing                   Free               Subscription & perpetual
Support                   Community Forum    Full support

Page 17: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  IMDGs help scale application performance and analyze “live” data in real-time.

•  Hadoop focuses on analyzing large, static (offline) datasets held in file systems.

•  ScaleOut hServer V2 introduces breakthrough technology enabling Hadoop applications to perform real-time analytics:

•  Integrates Hadoop Map/Reduce engine with SOAS’s IMDG.

•  Accelerates Map/Reduce execution by 20X in benchmark tests.

•  Enables Hadoop applications to analyze “live,” in-memory data.

•  Offers flexible access to both in-memory and file-based data.

•  Eliminates complex Hadoop deployment and tuning.

•  Offers a fast, easy-to-use platform for rapid prototyping.

Summary

Page 18: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

A few examples:

•  Equity trading: to minimize risk during a trading day

•  Ecommerce: to optimize real-time shopping activity

•  Reservations systems: to identify issues, reroute, etc.

•  Credit cards: to detect fraud in real time

•  Smart grids: to optimize power distribution & detect issues

Online Systems Need Real-Time Analysis

Page 19: ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  ScaleOut Software conducted an informal survey at the Strata 2013 Conference (Santa Clara).

•  Based on 150 responses:

•  78% of organizations generate fast-changing data.

•  60% use Hadoop and 78% plan to expand usage of Hadoop within 12 months.

•  Only 42% consider Hadoop to be an effective platform for real-time analysis, but…

•  93% would benefit from real-time data analytics.

•  71% consider a 10X improvement in performance meaningful.

•  Take-away: Hadoop users need real-time analytics.

Hadoop Users Need Real-Time Analytics

