NoSQL meetup July 2011

Shay Hassidim Deputy CTO GigaSpaces Inc. NoSQL meetup July 2011 Real-Time processing with In-Memory-Data-Grid and NoSQL [email protected]

Upload: shayhassidim

Post on 11-May-2015


DESCRIPTION

Real-Time processing with In-Memory-Data-Grid and NoSQL Database

TRANSCRIPT

Page 1: NoSQL meetup July 2011

Shay Hassidim

Deputy CTO

GigaSpaces Inc.

NoSQL meetup

July 2011

Real-Time processing with In-Memory-Data-Grid and NoSQL

[email protected]

Page 2: NoSQL meetup July 2011

® Copyright 2011 GigaSpaces Inc. All Rights Reserved

Agenda

• Slides – 30 min

• Live Demos – 45 min

• Q&A – 15 min

Page 3: NoSQL meetup July 2011

Real-Time Processing Use Cases

• Risk – Calculation engines

• Call Center Management

• E-commerce – auction monitoring, inventory

• Gaming – Multi-user, on-line gaming

• On-line marketing – Improve conversion rate

• Weather reporting

• Traffic analysis

• Supply-Chain optimization

• Manufacturing - Quality management in

• Shipment & Delivery Monitoring

• Fraud Detection

Page 4: NoSQL meetup July 2011

Note the Time dimension

• Real-Time (msec/sec): Processing

• Near Real-Time (min): Correlating

• Batch (hours/days): Research

Page 5: NoSQL meetup July 2011

Data resolution & processing models

Processing

• Mostly event driven

• High resolution – every tick counts

Correlating

• Ad-hoc queries

• Mid resolution – aggregated counters

Research

• Pre-generated reports

• Cross-grain resolution – trends, ...
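The step from high-resolution events (every tick) to mid-resolution aggregated counters can be illustrated with a minimal Python sketch. The tick layout `(epoch_seconds, symbol)` and the function name `aggregate_per_minute` are assumptions made for this example; they do not come from the talk.

```python
from collections import Counter

def aggregate_per_minute(ticks):
    """Roll raw (epoch_seconds, symbol) ticks up into per-minute counters."""
    counters = Counter()
    for ts, symbol in ticks:
        minute = ts - ts % 60  # start of the minute that contains this tick
        counters[(minute, symbol)] += 1
    return counters

# Hypothetical high-resolution input: one entry per tick.
ticks = [(0, "ABC"), (15, "ABC"), (59, "XYZ"), (60, "ABC"), (125, "XYZ")]

# Mid-resolution view that an ad-hoc query could read instead of raw ticks.
per_minute = aggregate_per_minute(ticks)
```

The raw ticks serve the event-driven processing stage, while the counters are the kind of pre-aggregated data that correlation queries would typically read.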

Page 6: NoSQL meetup July 2011

Traditional Processing - RDBMS

• Scale-up Database

– Use traditional RDBMS

– Stored procedure

– Flash memory to reduce I/O

– Read-only replica

• Limitations

– Doesn’t scale on write

– Extremely expensive (HW + SW)

Page 7: NoSQL meetup July 2011

Traditional Processing - CEP

• Process the data as it comes

• Maintain a small fraction of the data in-memory

• Pros:

– Low-latency

– Relatively low-cost

• Cons:

– Hard to scale (mostly limited to scale-up)

– Not agile - queries must be pre-generated

– Fairly complex
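As a rough illustration of the CEP style described on this slide, the following Python sketch processes events one at a time and keeps only a sliding window of recent events in memory. The class name, the window length, and the threshold rule are all hypothetical; this is not the API of any CEP product named in the talk.

```python
from collections import deque

class SlidingWindowCounter:
    """Pre-registered rule: fire when `threshold` events fall inside the window."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = deque()  # timestamps of recent events, oldest first

    def on_event(self, timestamp):
        """Process one event as it arrives; return True if the rule fires."""
        self.events.append(timestamp)
        # Evict events that have fallen out of the window, so only a small
        # fraction of the data is ever held in memory.
        while self.events and self.events[0] <= timestamp - self.window_seconds:
            self.events.popleft()
        return len(self.events) >= self.threshold

rule = SlidingWindowCounter(window_seconds=10, threshold=3)
fired = [rule.on_event(t) for t in [0, 4, 8, 30, 31, 32]]
# fired == [False, False, True, False, False, True]
```

The rule has to be defined before the events arrive, which is the "queries must be pre-generated" limitation listed under Cons.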

Page 8: NoSQL meetup July 2011


In-Memory Database

• Scale up

• Pros

– Scale both on write & read

– Fits the event-driven model (CEP style), ad-hoc query model

– SQL

• Cons

– Cost of memory vs. disk

– Memory capacity is limited

– SQL

Page 9: NoSQL meetup July 2011

NoSQL DB

• Distributed database

– HBase, Cassandra, MongoDB

• Pros

– Scale on write/read

– Elastic

• Cons

– High latency on read (tunable)

– Consistency tradeoffs are hard

– Non-transactional

Page 10: NoSQL meetup July 2011


Hadoop Map/Reduce

• Distributed batch processing

• Pros

– Designed to process massive amounts of data

– Mature

– Low cost

• Cons

– Not real-time

– New programming model

– HDFS must be carefully tuned to improve data locality

Page 11: NoSQL meetup July 2011

So what’s the bottom line?

A one-size-fits-all model doesn’t cut it.

The solution has to be a combination of several technologies and patterns...

Page 12: NoSQL meetup July 2011


About GigaSpaces XAP…

Middleware (MW)

• Application platform

• Java, .Net, C++

• Real-time processing

Free Edition

• All functionality

• Limited capacity

Open

• Entire client-side source code provided

Page 13: NoSQL meetup July 2011


GigaSpaces

GigaSpaces delivers software middleware that provides enterprises and ISVs with end-to-end application scalability and cloud-enablement for mission-critical applications for hundreds of tier-1 organizations worldwide.

Page 14: NoSQL meetup July 2011

GigaSpaces XAP Components

Diagram labels:

• In-Memory Data Grid

• Virtualize all middleware components

• 1 clustering model for all components

• Run entire application in-memory… transaction-safe

• Java, .Net, C++, Ruby, Groovy, Jython, Spring, JPA, JMS, JDBC, Schema-Free

• Real-time automated deployment, monitoring, management

• Customize application management rules & workflows

Page 15: NoSQL meetup July 2011


Other Solutions…

Caching

• Alachisoft

• IBM eXtreme Scale

• Microsoft Velocity

• Oracle Coherence

• JBoss Infinispan

• ScaleOut Software

• Terracotta Ehcache

• Tibco ActiveSpaces

• VMware GemFire

• GridGain

• Hazelcast

JMS

• AQ, MQ, ActiveMQ…

App Server

• WebLogic, WebSphere, JBoss AS, Tomcat…

Orchestration

• Chef, Puppet, RightScale, Nolio…

CEP

• Esper, Aleri, StreamBase…

Page 16: NoSQL meetup July 2011


RT Processing with IMDG and NoSQL DB

Diagram labels: Event Sources, Analytics Application, Write behind, Generate Patterns

In-Memory Data Grid / RT Processing Grid

• Light event processing

• Map-reduce

• Event-driven

• Execute code with data

• Transactional

• Secured

• Elastic

NoSQL DB

• Low-cost storage

• Write/read scalability

• Dynamic scaling

• Raw data and aggregated data
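The write-behind link between the in-memory grid and the NoSQL DB can be sketched in a few lines of Python. This is a simplified illustration, not the XAP implementation: the class `WriteBehindCache` is hypothetical, a plain dict stands in for the NoSQL database, and the flush is triggered synchronously by a batch-size threshold, whereas a real write-behind would normally flush asynchronously in the background.

```python
class WriteBehindCache:
    """In-memory store that queues writes and persists them in batches."""

    def __init__(self, backing_store, batch_size=2):
        self.memory = {}                 # data served to readers
        self.pending = []                # writes not yet persisted
        self.backing_store = backing_store
        self.batch_size = batch_size

    def write(self, key, value):
        self.memory[key] = value         # visible to readers immediately
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def read(self, key):
        return self.memory.get(key)

    def flush(self):
        # One bulk write to the backing store instead of one write per update.
        self.backing_store.update(self.pending)
        self.pending.clear()

nosql_db = {}                            # stand-in for the NoSQL database
cache = WriteBehindCache(nosql_db, batch_size=2)
cache.write("trade:1", 100)
after_first_write = dict(nosql_db)       # still empty: nothing persisted yet
cache.write("trade:2", 250)              # reaches the batch size, triggers flush
```

Reads are answered from memory at low latency, while the database only sees batched writes, which is the division of labour the slide attributes to the two tiers.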

Page 17: NoSQL meetup July 2011


Use Case

Calculation Engine Design Patterns

With XAP

Page 18: NoSQL meetup July 2011

Main Features Used


Data Partitioning: Transparent content-based data partitioning to evenly and intelligently distribute data across your data-grid cluster

Querying: Sophisticated query engine with support for SQL and example based queries

Indexing: Predefined and ad-hoc property indexing for blazing fast data access

Locking Support: Locking and transaction isolation for robust and hassle-free data access

Write Behind: Asynchronous and reliable propagation of data to any external data source

Master-Worker Support: Intuitive and highly scalable master-worker implementation for distributing computation-intensive tasks

Distributed Code Execution: Dynamic code shipment and map/reduce execution across the grid for optimized processing and data access

Content Based Routing: Routing of events to relevant cluster members based on their content

Workflow Support: Implement complex workflows using event propagation and sophisticated event filtering

Admin API: Comprehensive and intuitive API for monitoring and controlling every aspect of your cluster and application
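To make content-based data partitioning and routing concrete, here is a minimal Python sketch in which a routing value taken from the object's content decides which partition stores it. The slides do not describe how XAP computes the target partition, so the CRC32-modulo scheme, the `route` and `write` helpers, and the `account` routing field are assumptions for illustration only.

```python
import zlib

def route(routing_value, partition_count):
    """Map a routing value to a partition index using a stable hash."""
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % partition_count

# Four in-memory partitions, each a simple dict keyed by trade id.
partitions = [dict() for _ in range(4)]

def write(trade):
    """Store a trade in the partition selected by its 'account' field."""
    index = route(trade["account"], len(partitions))
    partitions[index][trade["id"]] = trade
    return index

first = write({"id": 1, "account": "ACC-7", "amount": 10})
second = write({"id": 2, "account": "ACC-7", "amount": 20})
# Both trades share a routing value, so they land in the same partition.
```

Because objects with the same routing value always end up together, code sent to that partition can process related data without crossing the network.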

Page 19: NoSQL meetup July 2011


Elastic Calculation Engine - Colocated Logic

Diagram labels: Client, Calculation Request, Task (set 1), Task (set 2), Task (set 3), Task (set x), Partition 1, Partition 2, Partition 3, Partition x, Bulk Lazy Load, Async persistency

Step 1 - The client sends a calculation Task to each partition with the specific Trade IDs required.

Step 2 - The Task reads all the Trade objects and performs the NPV calculation for each Trade. The result is sent back to the client for final aggregation.

Step 3 - The Calculation Task searches for all Trades. Any missing Trades are loaded in a lazy manner from the DB in one bulk query.

Step 4 - Intermediate results are retrieved from each partition and reduced.

The Data-Grid and the calculations Grid scale together.
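The four steps of the colocated pattern can be sketched in plain Python. Everything here is a stand-in chosen for the example: the `Partition` class, the dict used as the database, the cash-flow lists, and the 10% discount rate are assumptions, and the tasks run sequentially rather than in parallel on separate grid nodes.

```python
def npv(cash_flows, rate):
    """Net present value, where cash_flows[t] is paid at the end of period t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

class Partition:
    def __init__(self, trades, db):
        self.trades = dict(trades)   # trade id -> cash flows held in memory
        self.db = db                 # stand-in for the backing database
        self.bulk_loads = 0

    def run_task(self, trade_ids, rate):
        # Step 3: find missing trades and load them in one bulk query.
        missing = [tid for tid in trade_ids if tid not in self.trades]
        if missing:
            self.trades.update({tid: self.db[tid] for tid in missing})
            self.bulk_loads += 1
        # Step 2: compute NPV per trade next to the data, return a partial sum.
        return sum(npv(self.trades[tid], rate) for tid in trade_ids)

db = {1: [-100, 110], 2: [-100, 121], 3: [0, 110], 4: [50]}
partitions = [Partition({1: db[1]}, db), Partition({3: db[3], 4: db[4]}, db)]

# Step 1: the client sends each partition the trade ids it is responsible for.
requests = [[1, 2], [3, 4]]
partials = [p.run_task(ids, rate=0.10) for p, ids in zip(partitions, requests)]

# Step 4: the client reduces the intermediate results.
total = sum(partials)
```

Only the small partial sums travel back to the client; the trade data stays in the partition that owns it.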

Page 20: NoSQL meetup July 2011


Elastic Calculation Engine - Remote Logic

Diagram labels: Client, Calculation Request, Write Requests, Take Result, Write Result, Take Results, Calculation Engines, Partition 1, Partition 2, Partition 3, Partition x, Bulk Lazy load, Async persistency

Step 1 - The client sends calculation Requests to the space cluster.

Step 2 - Each Calculation engine consumes a different Request, processes it, and writes the Result back into the space, using a local cache for reference data.

Step 3 - The Calculation logic searches for all Trades. Any missing Trades are loaded in a lazy manner from the DB in one bulk query and written into the space to be reused later.

Step 4 - The client consumes all the calculation results and performs final aggregation.

The Data Grid and the calculations Grid scale independently; the calculation Grid scales on demand, separately from the Data-Grid.
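The remote-logic variant is essentially a master-worker arrangement, which the following Python sketch imitates with two thread-safe queues standing in for Request and Result objects in the space. The queue-based design, the `calculation_engine` function, and the use of `sum` as the "calculation" are assumptions for the example; lazy loading and the local reference-data cache from Steps 2 and 3 are left out.

```python
import queue
import threading

requests = queue.Queue()   # stand-in for Request objects written to the space
results = queue.Queue()    # stand-in for Result objects written to the space

def calculation_engine():
    """Worker loop: take a request, process it, write the result back."""
    while True:
        request = requests.get()        # Step 2: each engine takes a different request
        if request is None:             # sentinel value: shut this engine down
            break
        request_id, values = request
        results.put((request_id, sum(values)))

# The number of engines is independent of where the data lives.
engines = [threading.Thread(target=calculation_engine) for _ in range(3)]
for engine in engines:
    engine.start()

work = {1: [1, 2, 3], 2: [10, 20], 3: [5]}
for item in work.items():
    requests.put(item)                  # Step 1: the client writes requests

# Step 4: the client takes one result per request and aggregates them.
collected = dict(results.get() for _ in work)
grand_total = sum(collected.values())

for _ in engines:
    requests.put(None)                  # stop all engines
for engine in engines:
    engine.join()
```

Adding or removing engine threads changes processing capacity without touching the queues, which mirrors the point that the calculation tier scales separately from the data tier.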

Page 21: NoSQL meetup July 2011


Demos

• Simple IMDG Operations

– IMDG write, read, execute…

• IMDG and NoSQL DB Integration

– Cassandra

– MongoDB

• Calculation Engine

– Small scale Demo

– Large scale Demo – on the Cloud

Page 22: NoSQL meetup July 2011