NoSQL Meetup, July 2011
Real-Time Processing with In-Memory Data Grid and NoSQL Database
Shay Hassidim, Deputy CTO, GigaSpaces Inc.
® Copyright 2011 GigaSpaces Inc. All Rights Reserved
Agenda
• Slides – 30 min
• Live Demos – 45 min
• Q&A – 15 min
Real-Time Processing Use Cases
• Risk – Calculation engines
• Call Center Management
• E-commerce – Auction monitoring, inventory
• Gaming – Multi-user, on-line gaming
• On-line marketing – Improve conversion rate
• Weather reporting
• Traffic analysis
• Supply-Chain optimization
• Manufacturing – Quality management
• Shipment & Delivery Monitoring
• Fraud Detection
Note the Time dimension
• Real-Time (msec/sec) – Processing
• Near Real-Time (min) – Correlating
• Batch (hours/days) – Research
Data resolution & processing models
Processing
• Mostly event driven
• High resolution – every tick counts
Correlating
• Ad-hoc queries
• Mid resolution – aggregated counters
Research
• Pre-generated reports
• Cross-grain resolution – trends, ...
Traditional Processing - RDBMS
• Scale-up Database
– Use traditional RDBMS
– Stored procedure
– Flash memory to reduce I/O
– Read-only replica
• Limitations
– Doesn’t scale on write
– Extremely expensive (HW + SW)
Traditional Processing - CEP
• Process the data as it comes
• Maintain a small fraction of the data in-memory
• Pros
– Low-latency
– Relatively low-cost
• Cons
– Hard to scale (mostly limited to scale-up)
– Not agile - queries must be pre-generated
– Fairly complex
In-Memory Database
• Scale up
• Pros
– Scale both on write & read
– Fits the event-driven model (CEP style), ad-hoc query model
– SQL
• Cons
– Cost of memory vs. disk
– Memory capacity is limited
– SQL
NoSQL DB
• Distributed database
– HBase, Cassandra, MongoDB
• Pros
– Scale on write/read
– Elastic
• Cons
– High latency on read (tunable)
– Consistency tradeoffs are hard
– Non-transactional
Hadoop Map/Reduce
• Distributed batch processing
• Pros
– Designed to process massive amounts of data
– Mature
– Low cost
• Cons
– Not real-time
– New programming model
– HDFS must be carefully tuned to improve data locality
So what’s the bottom line?
A one-size-fits-all model doesn’t cut it.
The solution has to be a combination of several technologies and patterns...
About GigaSpaces XAP…
MW
• Application platform
• Java, .Net, C++
• Real-time processing
Free Edition
• All functionality
• Limited capacity
Open
• Entire client-side source code provided
GigaSpaces
GigaSpaces delivers software middleware that provides enterprises and ISVs with end-to-end application scalability and cloud-enablement for mission-critical applications for hundreds of tier-1 organizations worldwide.
GigaSpaces XAP Components
• Virtualize all middleware components
• One clustering model for all components
• Run the entire application in-memory, transaction-safe
• Java, .Net, C++, Ruby, Groovy, Jython, Spring, JPA, JMS, JDBC, schema-free
• Real-time automated deployment, monitoring, management
• Customize application management rules & workflows
• In-Memory Data Grid
Other Solutions…
Caching
• Alachisoft
• IBM eXtreme Scale
• Microsoft Velocity
• Oracle Coherence
• JBoss Infinispan
• ScaleOut Software
• Terracotta-EHCache
• Tibco ActiveSpaces
• VMware GemFire
• GridGain
• Hazelcast
JMS
• AQ, MQ, ActiveMQ…
App Server
• WebLogic, WebSphere, JBoss AS, Tomcat…
Orchestration
• Chef, Puppet, RightScale, Nolio…
CEP
• Esper, Aleri, StreamBase…
RT Processing with IMDG and NoSQL DB
Analytics Application
Event Sources
Write behind
- In-Memory Data Grid
- RT Processing Grid
• Light event processing
• Map-reduce
• Event-driven
• Execute code with data
• Transactional
• Secured
• Elastic
NoSQL DB
• Low-cost storage
• Write/read scalability
• Dynamic scaling
• Raw data and aggregated data
Generate Patterns
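The "Write behind" link between the data grid and the NoSQL DB can be illustrated with a minimal sketch. This is plain Python, not the XAP API: the class name `WriteBehindGrid`, its methods, and the dict standing in for the NoSQL store are all assumptions made for illustration. Writes are acknowledged once they reach memory, and a background thread propagates them to the store in arrival order.

```python
import queue
import threading


class WriteBehindGrid:
    """In-memory store that persists asynchronously (hypothetical names)."""

    def __init__(self, backing_store):
        self._data = {}                      # in-memory copy, serves all reads
        self._pending = queue.Queue()        # updates not yet persisted
        self._backing_store = backing_store  # stand-in for the NoSQL DB
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, key, value):
        # The caller only waits for the in-memory update.
        self._data[key] = value
        self._pending.put((key, value))

    def read(self, key):
        return self._data.get(key)

    def _drain(self):
        # Background thread: propagate updates in arrival order.
        while True:
            key, value = self._pending.get()
            self._backing_store[key] = value
            self._pending.task_done()

    def flush(self):
        # Block until every queued update has reached the backing store.
        self._pending.join()


nosql_db = {}  # a dict standing in for Cassandra or MongoDB
grid = WriteBehindGrid(nosql_db)
for tick in range(5):
    grid.write(f"event-{tick}", {"value": tick * 10})
grid.flush()
```

The trade-off in this sketch is the usual one for asynchronous persistence: low write latency in exchange for a window in which the store lags behind memory. A production system would also need the reliability guarantees the slides mention, which this sketch does not attempt.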
Use Case
Calculation Engine Design Patterns
With XAP
Main Features Used
Data Partitioning: Transparent content-based data partitioning to evenly and intelligently distribute data across your data-grid cluster
Querying: Sophisticated query engine with support for SQL and example based queries
Indexing: Predefined and ad-hoc property indexing for blazing fast data access
Locking Support: Locking and transaction isolation for robust and hassle-free data access
Write Behind: Asynchronous and reliable propagation of data to any external data source
Master-Worker Support: Intuitive and highly scalable master-worker implementation for distributing computation-intensive tasks
Distributed Code Execution: Dynamic code shipment and map/reduce execution across the grid for optimized processing and data access
Content Based Routing: Routing of events to relevant cluster members based on their content
Workflow Support: Implement complex workflows using event propagation and sophisticated event filtering
Admin API: Comprehensive and intuitive API for monitoring and controlling every aspect of your cluster and application
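Data partitioning and content-based routing from the list above can be sketched generically. The slides do not describe how XAP computes a partition, so the hash-modulo scheme below is an assumption, as are the names `partition_for`, `write`, `read` and the cluster size of four. It shows only the general idea: a property of the object decides which partition owns it, so reads and writes for the same key always reach the same member.

```python
import zlib

NUM_PARTITIONS = 4  # assumed cluster size


def partition_for(routing_value, num_partitions=NUM_PARTITIONS):
    # A stable hash, so the same routing value always maps to the same partition.
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % num_partitions


# Each dict stands in for the memory of one cluster member.
partitions = [dict() for _ in range(NUM_PARTITIONS)]


def write(trade):
    # Route by content: the trade's own id selects the owning partition.
    partitions[partition_for(trade["id"])][trade["id"]] = trade


def read(trade_id):
    # A read by id touches exactly one partition.
    return partitions[partition_for(trade_id)].get(trade_id)


for i in range(100):
    write({"id": f"T{i}", "notional": 1000 + i})
```

For example, `read("T42")` consults a single partition rather than the whole cluster, which is what keeps routed reads and event delivery cheap as the grid grows.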
Elastic Calculation Engine - Colocated Logic
[Diagram labels: Client; Calculation Request; Task (set 1), Task (set 2), Task (set 3), Task (set x); Partition 1, Partition 2, Partition 3, Partition x; Bulk lazy load; Async persistency]
The Data-Grid and the calculations Grid scale together.
Step 1 - The client sends a calculation Task to each partition with the specific Trade IDs required.
Step 2 - The Task reads all the Trade objects and performs the NPV calculation for each Trade. The result is sent back to the client for final aggregation.
Step 3 - The calculation Task searches for all Trades. Any missing Trades are loaded lazily from the DB in one bulk query.
Step 4 - Intermediate results are retrieved from each partition and reduced.
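The four steps of the colocated pattern can be sketched in plain Python. Nothing here is the XAP API: `Partition`, `FakeDatabase`, `route`, `calculate`, the flat 5% discount rate and the sample cash flows are all assumptions chosen to keep the example self-contained. Each partition runs its task next to its own data, loads any missing trades with a single bulk query, and returns a partial sum that the client reduces.

```python
from concurrent.futures import ThreadPoolExecutor

RATE = 0.05  # assumed flat discount rate


def npv(cash_flows, rate=RATE):
    # The cash flow at index t is discounted by (1 + rate) ** t.
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


class FakeDatabase:
    """Stand-in for the backing DB; maps trade id to cash flows."""

    def __init__(self, rows):
        self.rows = rows

    def bulk_load(self, trade_ids):
        return {tid: self.rows[tid] for tid in trade_ids}


class Partition:
    def __init__(self, database):
        self.trades = {}       # trades already held in memory
        self.database = database
        self.bulk_loads = 0    # number of bulk queries issued so far

    def run_task(self, trade_ids):
        # Step 3: load whatever is missing in one bulk query.
        missing = [tid for tid in trade_ids if tid not in self.trades]
        if missing:
            self.trades.update(self.database.bulk_load(missing))
            self.bulk_loads += 1
        # Step 2: compute NPV per trade, next to the data.
        return sum(npv(self.trades[tid]) for tid in trade_ids)


db = FakeDatabase({f"T{i}": [-100.0, 60.0, 60.0] for i in range(8)})
partitions = [Partition(db) for _ in range(4)]


def route(trade_id):
    # Hypothetical routing rule: numeric part of the id, modulo partition count.
    return int(trade_id[1:]) % len(partitions)


def calculate(trade_ids):
    # Step 1: build one task per partition, carrying only that partition's ids.
    per_partition = {}
    for tid in trade_ids:
        per_partition.setdefault(route(tid), []).append(tid)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(partitions[p].run_task, ids)
                   for p, ids in per_partition.items()]
        # Step 4: reduce the intermediate results on the client.
        return sum(f.result() for f in futures)


total = calculate([f"T{i}" for i in range(8)])
```

Calling `calculate` a second time with the same ids issues no further bulk queries, because the trades loaded in Step 3 stay in each partition's memory.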
Elastic Calculation Engine - Remote Logic
[Diagram labels: Client; Calculation Request; Write Requests; Take Results; Write Result; Calculation Engines; Partition 1, Partition 2, Partition 3, Partition x; Bulk lazy load; Async persistency]
The Calculation Engines scale on demand, separately from the Data-Grid.
The Data Grid and the calculations Grid scale independently.
Step 1 - The client sends calculation Requests to the space cluster.
Step 2 - Each Calculation engine consumes a different Request, processes it and writes the Result back into the space, using a local cache for reference data.
Step 3 - The Calculation logic searches for all Trades. Any missing Trades are loaded lazily from the DB in one bulk query and written into the space to be reused later.
Step 4 - The client consumes all the calculation results and performs final aggregation.
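The remote-logic pattern is a master-worker arrangement, which can be sketched with two queues standing in for the space. This is an illustration in plain Python, not the XAP API: the queues, the `calculation_engine` function, the `reference_cache` dict, the 5% rate and the sample cash flows are assumptions. The lazy load of Step 3 is omitted to keep the sketch short.

```python
import queue
import threading

requests = queue.Queue()  # stand-in for Request objects written to the space
results = queue.Queue()   # stand-in for Result objects written to the space
reference_cache = {"rate": 0.05}  # local cache for reference data (assumed value)


def npv(cash_flows, rate):
    # The cash flow at index t is discounted by (1 + rate) ** t.
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def calculation_engine():
    # Step 2: take a Request, process it, write the Result back.
    while True:
        request = requests.get()
        if request is None:  # sentinel: shut this engine down
            break
        trade_id, cash_flows = request
        results.put((trade_id, npv(cash_flows, reference_cache["rate"])))


# The number of engines is chosen independently of where the data lives.
engines = [threading.Thread(target=calculation_engine) for _ in range(3)]
for engine in engines:
    engine.start()

# Step 1: the client writes one Request per trade.
trades = {f"T{i}": [-100.0, 60.0, 60.0] for i in range(10)}
for item in trades.items():
    requests.put(item)

# Step 4: the client takes every Result and performs the final aggregation.
collected = dict(results.get() for _ in trades)
total = sum(collected.values())

# Stop the engines once all results are in.
for _ in engines:
    requests.put(None)
for engine in engines:
    engine.join()
```

Because each engine takes a different Request from the shared queue, adding engines raises throughput without touching the data layout, which is the sense in which the two grids scale independently.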
Demos
• Simple IMDG Operations
– IMDG write, read, execute…
• IMDG and NoSQL DB Integration
– Cassandra
– MongoDB
• Calculation Engine
– Small scale Demo
– Large scale Demo – on the Cloud