Complex Analytics with NoSQL Data Store in Real Time

Complex Analytics with NoSQL Data Store in Real Time Nested Queries, Projection, Transactions and more Nati Shalom @natishalom slideshare.net/giganati

Upload: nati-shalom

Post on 18-Dec-2014

Category:

Technology


DESCRIPTION

NoSQL data stores are often limited in the types of queries they can support because of the distributed nature of the data. In this session we will learn patterns for overcoming this limitation and combining multiple query semantics with NoSQL-based engines. We will specifically demonstrate a combination of key/value, SQL-like, document-model, and graph-based queries, as well as more advanced topics such as handling partial updates and querying through projection. We will also demonstrate how to create a mashup between those APIs, i.e. write fast through the key/value API and execute complex queries on that same data through SQL. See more at: http://nosql2014.dataversity.net/sessionPop.cfm?confid=81&proposalid=6335#sthash.PNSZi5TJ.dpuf

TRANSCRIPT

Page 1: Complex Analytics with NoSQL Data Store in Real Time

Complex Analytics with NoSQL Data Store in Real Time

Nested Queries, Projection, Transactions and more

Nati Shalom @natishalom slideshare.net/giganati

Page 2: Complex Analytics with NoSQL Data Store in Real Time

What are we here to discuss?

Making Sense of the Exploding Data World

What that World Could Look Like if Disk Is No Longer the Bottleneck

Live Demo

Page 3: Complex Analytics with NoSQL Data Store in Real Time

Making Sense of The Exploding Data World

Page 4: Complex Analytics with NoSQL Data Store in Real Time

[Chart: Data Volume (GB, TB, PB) versus Data Velocity (Yr, Mo, Day, Hr, Min, Sec, MS, µS), positioning Data Warehouse, Data Mining, Machine Learning, Business Intelligence, Exploratory Analytics, OLTP, High Throughput OLTP, Operational Intelligence, and Streaming]

Capacity and Performance Drive New Data Management Technologies

Page 5: Complex Analytics with NoSQL Data Store in Real Time

Let’s Look at Tradeoffs of Some Selected Solutions

Page 6: Complex Analytics with NoSQL Data Store in Real Time

SQL Queries

• Query: SQL
• Semantics:
  • CRUD
  • Aggregation
  • Projection
  • Partial update
• Performance: 100s/sec
• Consistency: Transactional
• Scaling: Mostly Scale-Up
• Availability: Disk Based

Page 7: Complex Analytics with NoSQL Data Store in Real Time

NoSQL

• Query: Proprietary but rich
• Semantics:
  • CRUD
  • Limited Aggregation (Map/Reduce)
  • No Projection
  • No Partial update
• Performance: 1000s/sec
• Consistency: Eventual
• Scaling: Mostly Scale-Out
• Availability: Based on replication

Page 8: Complex Analytics with NoSQL Data Store in Real Time

IMDG

• Query: Proprietary but rich
• Semantics:
  • CRUD
  • Aggregation API + Map/Reduce
  • Projection (GigaSpaces)
  • Partial Update (GigaSpaces)
• Performance: 100k/sec
• Consistency: Transactional
• Scaling: Mostly Scale-Out
• Availability: Replication

Page 9: Complex Analytics with NoSQL Data Store in Real Time

Key/Value

• Query: Key, Value
• Semantics:
  • Mostly Read
  • No Aggregation
  • No Projection
  • No Partial update
• Performance: Millions/sec
• Consistency: Atomic
• Scaling: Mostly Scale-Out
• Availability: Limited (varies quite substantially between implementations)

Page 10: Complex Analytics with NoSQL Data Store in Real Time

Stream Processing (Storm)

• Semantics:
  – Event-driven data processing
• Used for continuous updates
  – No need for a costly “SELECT FOR UPDATE”
• Performance: 10s of millions of updates/sec

[Diagram: Spouts feeding Bolts]
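The event-driven update pattern on this slide can be sketched in a few lines of plain Python. This is only an illustration of the spout/bolt idea, not the Storm API: `click_spout`, `CountBolt`, and `run_topology` are hypothetical names, and the sample events are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def click_spout(events: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Hypothetical spout: emits one event at a time from a source."""
    for event in events:
        yield event


class CountBolt:
    """Hypothetical bolt: keeps a running count per page as events arrive."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, event: Dict[str, str]) -> None:
        # State is updated in place for each event, so there is no
        # read-lock-write round trip (the "SELECT FOR UPDATE" pattern).
        self.counts[event["page"]] += 1


def run_topology(events: Iterable[Dict[str, str]]) -> Counter:
    """Wire the spout to the bolt and drain the stream."""
    bolt = CountBolt()
    for event in click_spout(events):
        bolt.process(event)
    return bolt.counts


sample_events = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
result = run_topology(sample_events)
print(result["/home"], result["/cart"])
```

In a real topology the bolt state would be partitioned across workers; the sketch keeps everything in one process to show only the data flow.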

Page 11: Complex Analytics with NoSQL Data Store in Real Time

Common Assumption

Disk is the bottleneck

[Chart: performance (powers of 10) versus year (2000, 2010, 2020). CPU performance = 100X per decade (100X, then 10,000X); HDD latency (seek and rotate) = little improvement. Source: GigaOM Research]
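The 100X and 10,000X figures are consistent with compounding a 100X-per-decade improvement over one and two decades. A minimal check, assuming the two labels mark 2010 and 2020 relative to a 2000 baseline (`compound_growth` is a hypothetical helper):

```python
def compound_growth(factor_per_decade: float, decades: int) -> float:
    """Total improvement after compounding a per-decade factor."""
    return factor_per_decade ** decades


cpu_gain_2010 = compound_growth(100, 1)  # one decade after 2000
cpu_gain_2020 = compound_growth(100, 2)  # two decades after 2000
print(cpu_gain_2010, cpu_gain_2020)
```

If HDD seek and rotate latency barely changes over the same period, the gap between CPU and disk grows by roughly the same factors.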

Page 12: Complex Analytics with NoSQL Data Store in Real Time

Capacity and Performance Drive New Data Management Technologies

(Source: IDC, 2013)

Big Data (Hadoop)

NoSQL

In Memory, Stream Processing

RDBMS

Page 13: Complex Analytics with NoSQL Data Store in Real Time

There’s No One Size Fits All

Page 14: Complex Analytics with NoSQL Data Store in Real Time

A Typical App Looks Like This..

Front End Analytics

RT

Batch

STORM

The Data Flow Complexity

Page 15: Complex Analytics with NoSQL Data Store in Real Time

What if Disk Was no Longer the Bottleneck?

FLASH Closes the CPU to Storage Gap

Page 16: Complex Analytics with NoSQL Data Store in Real Time

Our Application Could Look Like This..

Front End

High Speed Data Store

(Using Flash/NVM)

Key/Value

SQL

Document

Graph

Transactional

Map/Reduce

Disk Becomes the new Tape

StreamBase

Common Data Store Serving Multiple Semantics/APIs

Page 17: Complex Analytics with NoSQL Data Store in Real Time

We're not there yet ..

But..

Page 18: Complex Analytics with NoSQL Data Store in Real Time

We Can Use a High Speed Data Bus for Integrating All of Our Data Sources

Front End Analytics

RT

Batch

STORM

High Speed Data Bus (Built-In Caching)

RT Transactional Data Access

Direct Access

RT Streaming

Hadoop Synch

MySQL Synch

Mongo Synch

Page 19: Complex Analytics with NoSQL Data Store in Real Time

High Speed Data Bus (Zoom In)

Page 20: Complex Analytics with NoSQL Data Store in Real Time

Designed for Transactional and Analytics Scenarios..

Homeland Security

Real Time Search

Social

eCommerce

User Tracking & Engagement

Financial Services

Page 21: Complex Analytics with NoSQL Data Store in Real Time

Many APIs – Same Data

Key/Value, SQL, Document, Graph, Transactional, Map/Reduce
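One way to picture several APIs over the same data is a key/value style write path and a SQL read path sharing a single store. The sketch below uses Python's built-in `sqlite3` purely as a stand-in; it is not the GigaSpaces API, and the `users` table, the `put`/`get` helpers, and the sample records are all assumptions made for illustration.

```python
import sqlite3
from typing import Dict, Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, country TEXT, spend REAL)"
)


def put(key: str, value: Dict) -> None:
    """Key/value style write: one record addressed by its key."""
    conn.execute(
        "INSERT OR REPLACE INTO users (id, name, country, spend) VALUES (?, ?, ?, ?)",
        (key, value["name"], value["country"], value["spend"]),
    )


def get(key: str) -> Optional[Dict]:
    """Key/value style read by key."""
    row = conn.execute(
        "SELECT name, country, spend FROM users WHERE id = ?", (key,)
    ).fetchone()
    if row is None:
        return None
    return {"name": row[0], "country": row[1], "spend": row[2]}


# Write through the key/value API...
put("u1", {"name": "Ana", "country": "US", "spend": 120.0})
put("u2", {"name": "Ben", "country": "UK", "spend": 80.0})
put("u3", {"name": "Cy", "country": "US", "spend": 30.0})

# ...and run a richer query on the same data through SQL.
rows = conn.execute(
    "SELECT country, SUM(spend) FROM users GROUP BY country ORDER BY country"
).fetchall()
print(get("u2"), rows)
```

The point of the design is that neither API owns a private copy of the data, so a record written by key is immediately visible to the SQL query.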

Page 22: Complex Analytics with NoSQL Data Store in Real Time

Let’s take a closer look..

Page 23: Complex Analytics with NoSQL Data Store in Real Time

Nested Queries & Projections
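As a rough illustration of what a nested query with projection does, the sketch below filters documents on a nested field and returns only the requested paths. It is plain Python over dictionaries, not the product's query language; `get_path`, `query`, and the sample documents are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'address.city' inside nested dicts."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def query(
    docs: Iterable[Dict[str, Any]],
    where_path: str,
    equals: Any,
    projection: List[str],
) -> List[Dict[str, Any]]:
    """Return only the projected paths of documents whose nested field matches."""
    return [
        {path: get_path(doc, path) for path in projection}
        for doc in docs
        if get_path(doc, where_path) == equals
    ]


people = [
    {"name": "Ana", "address": {"city": "London", "zip": "N1"}, "orders": 3},
    {"name": "Ben", "address": {"city": "Paris", "zip": "75001"}, "orders": 1},
    {"name": "Cy", "address": {"city": "London", "zip": "E2"}, "orders": 7},
]

londoners = query(people, "address.city", "London", ["name", "address.zip"])
print(londoners)
```

Projection matters in a distributed store because only the requested fields travel over the network, not the whole object.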

Page 24: Complex Analytics with NoSQL Data Store in Real Time

Aggregations.
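Aggregation over partitioned data is commonly done in two phases: each partition computes a partial result, and the partials are then merged. A minimal sketch of a grouped average under that assumption (the partition contents and the helper names `map_partition` and `reduce_partials` are made up for illustration):

```python
from typing import Dict, Iterable, List, Tuple

Partial = Dict[str, Tuple[float, int]]  # group key -> (sum, count)


def map_partition(rows: Iterable[Dict], group_by: str, field: str) -> Partial:
    """Phase 1: compute per-group sum and count inside one partition."""
    partial: Partial = {}
    for row in rows:
        total, count = partial.get(row[group_by], (0.0, 0))
        partial[row[group_by]] = (total + row[field], count + 1)
    return partial


def reduce_partials(partials: Iterable[Partial]) -> Dict[str, float]:
    """Phase 2: merge the partial sums and counts, then derive the averages."""
    merged: Partial = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            merged_total, merged_count = merged.get(key, (0.0, 0))
            merged[key] = (merged_total + total, merged_count + count)
    return {key: total / count for key, (total, count) in merged.items()}


partition_a: List[Dict] = [
    {"country": "US", "spend": 100.0},
    {"country": "UK", "spend": 40.0},
]
partition_b: List[Dict] = [
    {"country": "US", "spend": 50.0},
    {"country": "UK", "spend": 60.0},
    {"country": "US", "spend": 30.0},
]

averages = reduce_partials(
    map_partition(p, "country", "spend") for p in (partition_a, partition_b)
)
print(averages)
```

Carrying (sum, count) pairs rather than per-partition averages keeps the merged result exact when partitions hold different numbers of rows.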

Page 25: Complex Analytics with NoSQL Data Store in Real Time

Fast Update …

Remains with strong consistency!
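A fast update that stays consistent can be pictured as sending a small delta to the store and applying it atomically where the data lives, instead of reading, modifying, and rewriting the whole object. The sketch below is a single-process stand-in with a lock; `Store` and its `change` method are hypothetical and do not represent any product API.

```python
import threading
from typing import Any, Dict, Optional


class Store:
    """Hypothetical in-memory store that applies partial updates atomically."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()

    def write(self, key: str, obj: Dict[str, Any]) -> None:
        with self._lock:
            self._data[key] = dict(obj)

    def change(
        self,
        key: str,
        increments: Optional[Dict[str, int]] = None,
        sets: Optional[Dict[str, Any]] = None,
    ) -> None:
        """Apply a small delta in place; the caller never ships the whole object."""
        with self._lock:
            obj = self._data[key]
            for field, delta in (increments or {}).items():
                obj[field] = obj.get(field, 0) + delta
            for field, value in (sets or {}).items():
                obj[field] = value

    def read(self, key: str) -> Dict[str, Any]:
        with self._lock:
            return dict(self._data[key])


store = Store()
store.write("u1", {"name": "Ana", "visits": 0, "last_page": None})


def record_visits() -> None:
    for _ in range(1000):
        store.change("u1", increments={"visits": 1}, sets={"last_page": "/home"})


threads = [threading.Thread(target=record_visits) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

final = store.read("u1")
print(final["visits"])
```

Because each delta is applied under the lock, concurrent writers do not lose updates, which is the property the slide refers to as strong consistency.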

Page 26: Complex Analytics with NoSQL Data Store in Real Time

Transaction Support
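Transaction support means a group of updates either all take effect or none do. The sketch below shows that all-or-nothing behavior with Python's built-in `sqlite3` as a stand-in for a transactional store; the `accounts` table and the `transfer` helper are assumptions for illustration only.

```python
import sqlite3
from typing import Dict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance REAL NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("a", 100.0), ("b", 20.0)]
    )


def transfer(src: str, dst: str, amount: float) -> bool:
    """Move money between two accounts as a single unit of work."""
    try:
        # The connection context manager commits on success and rolls
        # back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False


def balances() -> Dict[str, float]:
    return dict(conn.execute("SELECT id, balance FROM accounts"))


ok = transfer("a", "b", 30.0)
rejected = transfer("a", "b", 500.0)
print(ok, rejected, balances())
```

The second transfer credits account `b` before the debit fails, so the unchanged balance afterwards shows that the rollback undid the partial work.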

Page 27: Complex Analytics with NoSQL Data Store in Real Time

Benchmark setup:
- 1KB object size and uniform distribution
- 2 sockets 2.8GHz CPU with total 24 cores, CentOS 5.8, 2 FusionIO SLC PCIe cards RAID
- YCSB measurements performed by SanDisk

[Chart: "No Read / 100% Write" versus "100% Read / No Write"; legend: FDF-GigaSpaces on SSDs (also labeled ZetaScale-GigaSpaces on SSDs), Stock GigaSpaces in DRAM; axis 0 to 160; plotted values 62, 121, 17, 56]

Assumptions: 1TB Flash = $2K; 1TB RAM = $20K

The Performance of RAM at a Cost/Capacity Closer to Disk

ZetaScale-GigaSpaces Provides 2x – 3.6x Better TPS/$, 1:50 More Capacity

ZetaScale™ – XAP MemoryXtend

[Chart: Capacity; XAP = 20, XAP Extend = 1000 (1:50); axis 0 to 1200]

242k Read/Sec

Page 28: Complex Analytics with NoSQL Data Store in Real Time

Data is Moving to Cloud

Source: Managing Storage: Trends, Challenges, and Options (2013-2014). (EMC, 2013)

Page 29: Complex Analytics with NoSQL Data Store in Real Time

Orchestration Needs to Be Integrated into the Database Solution to Make It Cloud Ready

Page 31: Complex Analytics with NoSQL Data Store in Real Time

Many APIs, Same Data

Data Bus (Integration with Storm)

Built In Orchestration

Demo References

Click on the relevant box to get the demo

Page 32: Complex Analytics with NoSQL Data Store in Real Time

Summary

Page 33: Complex Analytics with NoSQL Data Store in Real Time

Nati Shalom

Check out the slides at http://www.slideshare.net/giganati