TRANSCRIPT
Copyright © 2006, GemStone Systems Inc. All Rights Reserved.
Increasing computation throughput with Grid Data Caching
Jags Ramnarayan, Chief Architect
GemStone Systems
Background on GemStone Systems
• Known for its Object Database technology since 1982
• Now specializes in memory-oriented distributed data management
  – 12 pending patents
• Over 200 installed customers in the Global 2000
• Grid focus driven by:
  – Very high performance with predictable throughput, latency and availability
    • Capital markets – risk analytics, pricing, etc.
    • Large e-commerce portals – real-time fraud
    • Federal intelligence
Batch to real-time – long jobs to short tasks

Traditional HPC: batch oriented, long jobs, highly distributed; consume and produce a lot of data.
Examples: risk analytics in finance, protein sequencing, EDA.

Emerging Grid apps: short tasks, real-time event driven, transactional data, less data.
Examples: SOA grids, e-commerce, finance – real-time pricing.

Software as a Service, SLA guarantees, price pressure – cost/transaction.
Future is the “Cloud”?
Increasing focus on DATA management
• Workloads where
  – task duration is getting shorter
  – latency of data access is important
  – consistency in data is crucial
  – high availability is not enough; it has to be continuously available
  – common data is shared across thousands of parallel activities
Accessing data in Grid today
• Direct access to the enterprise database, or
• Federated data access layer
• Exposed to the weakest-link problem
  – only as fast as the slowest data source
  – only as available as the weakest link
  – can only scale as well as the weakest link
• Distributed/parallel file systems
  – What if too many tasks go after the same data?
  – Disk access speed is still 1000X slower than memory
  – Data consistency challenges
Might be controversial here
Impact to Grid SLA
A CPU-bound job can become an IO-bound one:
- Tasks waiting for data to arrive
- Tasks waiting for data to be written to disk
- Tasks waiting for data to be shared with other concurrent tasks
Introducing the memory-oriented data fabric

• Pool memory (and disk) across the cluster/Grid
• Managed as a single unit
• Replicate data for high concurrent load and HA
• Distribute (partition) data for high data volume and scale
• Gracefully expand capacity to meet scalability/performance goals

[Diagram: distributed applications over a Distributed Data Space, backed by data warehouses and relational databases]
How does it work?
[Diagram: Java, C++ and C# clients over the fabric, backed by other databases and the file system]

• When data is stored, it is transparently replicated and/or partitioned
• Redundant storage can be in memory and/or on disk – ensures continuous availability
• Machine nodes can be added dynamically to expand storage capacity or to handle increased client load
• Shared-nothing disk persistence
  – Each cache instance can optionally persist to disk
• Synchronous read-through, write-through, or asynchronous write-behind to other data sources and sinks
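The read-through / write-through / write-behind behavior described above can be sketched as follows. This is a minimal illustrative sketch, not the GemFire API; `FabricCache`, `loader`, and `writer` are hypothetical names standing in for a cache instance and its external data source/sink.

```python
import queue

class FabricCache:
    """Minimal sketch of a cache with a pluggable loader (source)
    and writer (sink). All names here are hypothetical."""

    def __init__(self, loader, writer, write_behind=False):
        self._data = {}
        self._loader = loader          # called on a cache miss (read-through)
        self._writer = writer          # called to propagate puts
        self._write_behind = write_behind
        self._pending = queue.Queue()  # queued writes in write-behind mode

    def get(self, key):
        # Read-through: on a miss, load from the backing source and cache it.
        if key not in self._data:
            self._data[key] = self._loader(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        if self._write_behind:
            # Write-behind: enqueue; a background drain flushes asynchronously.
            self._pending.put((key, value))
        else:
            # Write-through: propagate synchronously before returning.
            self._writer(key, value)

    def flush(self):
        # Drain queued write-behind entries to the sink.
        while not self._pending.empty():
            self._writer(*self._pending.get())

# Usage: a dict stands in for the backing database.
backing = {"t1": 100}
writes = []
cache = FabricCache(loader=backing.get,
                    writer=lambda k, v: writes.append((k, v)),
                    write_behind=True)
cache.get("t1")       # miss -> loaded from the backing source
cache.put("t2", 200)  # queued, not yet written
cache.flush()         # now propagated to the sink
```

In a real fabric the flush would run on a background thread with batching; the queue here only illustrates the asynchronous decoupling.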
Predictably scale with partitioning
[Diagram: distributed apps, each with a local cache and partitioning metadata; partitions A1–I1 spread across three data nodes; single-hop access]

By keeping data spread across many nodes in memory, we can exploit the CPU and network capacity on each node simultaneously to provide linear scalability.

- Parallel loading by many Grid nodes – limited only by CPU and the network backbone
- With partitioning metadata on each compute node, access to any single piece of data is a single hop
- As changes are redundantly and synchronously managed, availability and consistency are preserved
- Dynamically detect load changes and add or remove data nodes
- Automatic data re-partitioning will condition the load
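The single-hop idea above rests on every compute node holding a replica of the partitioning metadata (bucket → owning node). A minimal sketch, with hypothetical node names and a fixed bucket count echoing the A1–I1 layout in the diagram:

```python
import zlib

NODES = ["node0", "node1", "node2"]   # hypothetical grid members
N_BUCKETS = 9                          # fixed bucket space, as in A1..I1

def bucket_of(key):
    """Uniform hash of a key into the fixed bucket space."""
    return zlib.crc32(str(key).encode()) % N_BUCKETS

# Partitioning metadata replicated on every compute node.
# With this map local, any get/put reaches the owning node in one hop.
bucket_owner = {b: NODES[b % len(NODES)] for b in range(N_BUCKETS)}

def route(key):
    """Single hop: look up the owner locally, then go straight to it."""
    return bucket_owner[bucket_of(key)]

def rebalance(new_nodes):
    """When nodes join or leave, only bucket *ownership* moves;
    the key -> bucket mapping stays stable, so keys need no rehash."""
    return {b: new_nodes[b % len(new_nodes)] for b in range(N_BUCKETS)}
```

Keeping the bucket count fixed while ownership moves is what lets capacity expand without repartitioning every key.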
Collocate data for near infinite scale
[Diagram: same partitioned layout as before – local cache with partitioning metadata, single-hop access to partitions A1–I1]

• Different partitioning policies
• Hash partitioning
  – Suitable for key-based access
  – Uniform random hashing
• Dramatically scale by keeping all related data together
• Application-managed associations
  – Orders hash partitioned, but associated line items are collocated
• Application managed
  – Grouped on data object field(s)
  – Customize what is collocated
  – Example: ‘Manage all Sept trades in one data partition’
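Collocation comes down to hashing a *routing key* rather than the full entry key. A minimal sketch of the two policies above, using hypothetical key formats (`order:<id>`, `item:<order_id>:<n>`) and a hypothetical trade record:

```python
import zlib

N_PARTITIONS = 8

def routing_key(entry_key):
    """Extract the portion of the key that drives placement.
    Line items route on their order id, so an order and all of
    its line items land in the same partition."""
    kind, rest = entry_key.split(":", 1)
    if kind == "item":
        return rest.split(":", 1)[0]   # group line items by order id
    return rest                         # orders route on their own id

def partition_of(entry_key):
    return zlib.crc32(routing_key(entry_key).encode()) % N_PARTITIONS

def trade_partition(trade):
    """Field-based grouping (the 'Sept trades' example):
    partition on the month field so all Sept trades collocate."""
    return zlib.crc32(trade["month"].encode()) % N_PARTITIONS
```

Because order 17 and its line items share the routing key `"17"`, a join or aggregate over them never crosses a partition boundary.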
Move business logic to data
[Diagram: functions f1, f2, … fn submitted through a FIFO queue to data fabric resources; Submit(f1) → AggregateHighValueTrades(<input data>, “where trades.month = ‘Sept’”) executes on the node holding the Sept trades]

Principle: move the task to the computational resource with most of the relevant data before considering other nodes, where data transfer becomes necessary.

Fabric function execution service:
• Data dependency hints – routing key, collection of keys, “where clause(s)”
• Serial or parallel execution
• “Map Reduce”
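The function-routing principle can be sketched as a tiny map-reduce over partitioned data: each node runs the function against its local subset (filtered by the data dependency hint), and the caller reduces the partial results. This is an illustrative sketch, not the GemFire API; `Fabric`, `execute`, and the trade records are all hypothetical.

```python
import zlib

class Fabric:
    """Sketch of shipping a function to the nodes that hold its data."""

    def __init__(self, n_nodes=3):
        self.nodes = [dict() for _ in range(n_nodes)]   # each node's local data

    def put(self, key, value):
        self.nodes[zlib.crc32(key.encode()) % len(self.nodes)][key] = value

    def execute(self, fn, where=None):
        """Map step: run fn on each node against its local subset
        (parallel in spirit, sequential here), filtered by an optional
        data-dependency hint. Returns the partial results to reduce."""
        partials = []
        for local in self.nodes:
            subset = {k: v for k, v in local.items()
                      if where is None or where(v)}
            if subset:
                partials.append(fn(subset))
        return partials

# Usage: aggregate September trades where they live, then reduce.
fabric = Fabric()
fabric.put("t1", {"month": "Sept", "value": 500})
fabric.put("t2", {"month": "Aug",  "value": 900})
fabric.put("t3", {"month": "Sept", "value": 700})

partials = fabric.execute(
    fn=lambda subset: sum(t["value"] for t in subset.values()),
    where=lambda t: t["month"] == "Sept")
total = sum(partials)   # reduce step on the caller
```

Only the small partial sums cross the network; the trade data itself never moves.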
Parallel queries
Query execution for the hash policy:
- Parallelize the query to each relevant node
- Each node executes the query in parallel, using local indexes on its data subset
- Query results are streamed to the coordinating node
- Individual results are unioned into the final result set

This “scatter-gather” algorithm can waste CPU cycles.

Partition the data on the common filter. For instance, if most queries are filtered on a trade symbol, the query predicate can be analyzed to prune partitions.

1. select * from Trades where trade.month = ‘August’
2. Parallel query execution
3. Parallel streaming of results
4. Results returned
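The scatter-gather flow with partition pruning can be sketched as follows. This is an illustrative sketch, not GemFire's query engine; here the data is assumed to be partitioned on the `symbol` field, so an equality filter on that field touches exactly one partition, while any other query scatters to all partitions and unions the results.

```python
import zlib

N_PARTITIONS = 4
partitions = [[] for _ in range(N_PARTITIONS)]   # one list per data node

def insert(trade):
    # Data is partitioned on the common filter column: the trade symbol.
    partitions[zlib.crc32(trade["symbol"].encode()) % N_PARTITIONS].append(trade)

def query(symbol=None, predicate=lambda t: True):
    if symbol is not None:
        # Prune: an equality filter on the partitioning key
        # routes to the single owning partition.
        targets = [partitions[zlib.crc32(symbol.encode()) % N_PARTITIONS]]
        pred = lambda t: t["symbol"] == symbol and predicate(t)
    else:
        # Scatter-gather: every partition scans its local subset.
        targets = partitions
        pred = predicate
    results = []
    for part in targets:               # each "node" scans locally
        results.extend(t for t in part if pred(t))
    return results                      # gather: union of partial results
```

A real engine would also stream partial results back and use local indexes instead of scans; the sketch only shows the prune-versus-scatter decision.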
Key lessons
• Apps should think about capitalizing on memory across the Grid (it is abundant)
• Keep IO cycles to a minimum through main-memory caching of operational data sets
  – Scavenge Grid memory and avoid data source access
• Achieve near-infinite scale for your Grid apps by horizontally partitioning your data and behavior
  – Read Pat Helland’s “Life Beyond Distributed Transactions”
    (http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf)
• Get more info on the GemFire data fabric
  – http://www.gemstone.com/gemfire