TRANSCRIPT
Copyright © 2006, GemStone Systems Inc. All Rights Reserved.
Increasing computation throughput with Grid Data Caching
Jags Ramnarayan, Chief Architect
GemStone Systems
Background on GemStone Systems
• Known for its Object Database technology since 1982
• Now specializes in memory-oriented distributed data management
  – 12 pending patents
• Over 200 installed customers in the Global 2000
• Grid focus driven by:
  – Very high performance with predictable throughput, latency and availability
    • Capital markets – risk analytics, pricing, etc.
    • Large e-commerce portals – real-time fraud
    • Federal intelligence
Batch to real-time – long jobs to short tasks

Traditional HPC: batch oriented, long jobs, highly distributed; consume and produce a lot of data.
Examples: risk analytics in finance, protein sequencing, EDA.

Emerging Grid apps: short tasks, real-time event driven, transactional data, less data.
Examples: SOA grids, e-commerce, finance – real-time pricing.

Software as a Service, SLA guarantees, price pressure – cost/transaction.
Future is the “Cloud”?
Increasing focus on DATA management
• Workloads where
  – task duration is getting shorter
  – latency of data access is important
  – consistency in data is crucial
  – high availability is not enough; it has to be continuously available
  – common data is shared across thousands of parallel activities
Accessing data in Grid today
• Direct access to the enterprise database, or
• Federated data access layer
• Exposed to the weakest-link problem
  – only as fast as the slowest data source
  – only as available as the weakest link
  – can only scale as well as the weakest link
• Distributed/parallel file systems
  – What if too many tasks go after the same data?
  – Disk access speed is still 1000X slower than memory
  – Data consistency challenges
Might be controversial here
Impact to Grid SLA
A CPU-bound job can become an IO-bound one:
- Tasks waiting for data to arrive
- Tasks waiting for data to be written to disk
- Tasks waiting for data to be shared with other concurrent tasks
Introducing the memory-oriented data fabric

• Pool memory (and disk) across the cluster/Grid
• Managed as a single unit
• Replicate data for high concurrent load and HA
• Distribute (partition) data for high data volume and scale
• Gracefully expand capacity to meet scalability/performance goals

[Diagram: distributed applications over a Distributed Data Space, backed by data warehouses and relational databases]
How does it work?
[Diagram: Java, C++ and C# clients over the fabric, backed by other databases and the file system]

• When data is stored, it is transparently replicated and/or partitioned
• Redundant storage can be in memory and/or on disk – ensures continuous availability
• Machine nodes can be added dynamically to expand storage capacity or to handle increased client load
• Shared-nothing disk persistence
  – Each cache instance can optionally persist to disk
• Synchronous read-through, write-through, or asynchronous write-behind to other data sources and sinks
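The read-through / write-through / write-behind behavior described above can be sketched as follows. This is a minimal illustrative sketch, not the GemFire API; `FabricCache`, `loader`, and `writer` are hypothetical names standing in for a cache instance and its external data source/sink.

```python
import queue

class FabricCache:
    """Minimal sketch of a cache with a pluggable loader (source)
    and writer (sink). All names here are hypothetical."""

    def __init__(self, loader, writer, write_behind=False):
        self._data = {}
        self._loader = loader          # called on a cache miss (read-through)
        self._writer = writer          # called to propagate puts
        self._write_behind = write_behind
        self._pending = queue.Queue()  # queued writes in write-behind mode

    def get(self, key):
        # Read-through: on a miss, load from the backing source and cache it.
        if key not in self._data:
            self._data[key] = self._loader(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        if self._write_behind:
            # Write-behind: enqueue; a background drain flushes asynchronously.
            self._pending.put((key, value))
        else:
            # Write-through: propagate synchronously before returning.
            self._writer(key, value)

    def flush(self):
        # Drain queued write-behind entries to the sink.
        while not self._pending.empty():
            self._writer(*self._pending.get())

# Usage: a dict stands in for the backing database.
backing = {"t1": 100}
writes = []
cache = FabricCache(loader=backing.get,
                    writer=lambda k, v: writes.append((k, v)),
                    write_behind=True)
cache.get("t1")       # miss -> loaded from the backing source
cache.put("t2", 200)  # queued, not yet written
cache.flush()         # now propagated to the sink
```

In a real fabric the flush would run on a background thread with batching; the queue here only illustrates the asynchronous decoupling.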
Predictably scale with partitioning
[Diagram: distributed apps, each with a local cache and partitioning metadata; partitions A1–I1 spread across three data nodes; single-hop access]

By keeping data spread across many nodes in memory, we can exploit the CPU and network capacity on each node simultaneously to provide linear scalability.

- Parallel loading by many Grid nodes – limited only by CPU and the network backbone
- With partitioning metadata on each compute node, access to any single piece of data is a single hop
- As changes are redundantly and synchronously managed, availability and consistency are preserved
- Dynamically detect load changes and add or remove data nodes
- Automatic data re-partitioning will condition the load
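The single-hop idea above rests on every compute node holding a replica of the partitioning metadata (bucket → owning node). A minimal sketch, with hypothetical node names and a fixed bucket count echoing the A1–I1 layout in the diagram:

```python
import zlib

NODES = ["node0", "node1", "node2"]   # hypothetical grid members
N_BUCKETS = 9                          # fixed bucket space, as in A1..I1

def bucket_of(key):
    """Uniform hash of a key into the fixed bucket space."""
    return zlib.crc32(str(key).encode()) % N_BUCKETS

# Partitioning metadata replicated on every compute node.
# With this map local, any get/put reaches the owning node in one hop.
bucket_owner = {b: NODES[b % len(NODES)] for b in range(N_BUCKETS)}

def route(key):
    """Single hop: look up the owner locally, then go straight to it."""
    return bucket_owner[bucket_of(key)]

def rebalance(new_nodes):
    """When nodes join or leave, only bucket *ownership* moves;
    the key -> bucket mapping stays stable, so keys need no rehash."""
    return {b: new_nodes[b % len(new_nodes)] for b in range(N_BUCKETS)}
```

Keeping the bucket count fixed while ownership moves is what lets capacity expand without repartitioning every key.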
Collocate data for near infinite scale
[Diagram: same partitioned layout as before – local cache with partitioning metadata, single-hop access to partitions A1–I1]

• Different partitioning policies
• Hash partitioning
  – Suitable for key-based access
  – Uniform random hashing
• Dramatically scale by keeping all related data together
• Application-managed associations
  – Orders hash partitioned, but associated line items are collocated
• Application managed
  – Grouped on data object field(s)
  – Customize what is collocated
  – Example: ‘Manage all Sept trades in one data partition’
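Collocation comes down to hashing a *routing key* rather than the full entry key. A minimal sketch of the two policies above, using hypothetical key formats (`order:<id>`, `item:<order_id>:<n>`) and a hypothetical trade record:

```python
import zlib

N_PARTITIONS = 8

def routing_key(entry_key):
    """Extract the portion of the key that drives placement.
    Line items route on their order id, so an order and all of
    its line items land in the same partition."""
    kind, rest = entry_key.split(":", 1)
    if kind == "item":
        return rest.split(":", 1)[0]   # group line items by order id
    return rest                         # orders route on their own id

def partition_of(entry_key):
    return zlib.crc32(routing_key(entry_key).encode()) % N_PARTITIONS

def trade_partition(trade):
    """Field-based grouping (the 'Sept trades' example):
    partition on the month field so all Sept trades collocate."""
    return zlib.crc32(trade["month"].encode()) % N_PARTITIONS
```

Because order 17 and its line items share the routing key `"17"`, a join or aggregate over them never crosses a partition boundary.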
Move business logic to data
[Diagram: functions f1, f2, … fn submitted through a FIFO queue to data fabric resources; Submit(f1) → AggregateHighValueTrades(<input data>, “where trades.month = ‘Sept’”) executes on the node holding the Sept trades]

Principle: move the task to the computational resource with most of the relevant data before considering other nodes, where data transfer becomes necessary.

Fabric function execution service:
• Data dependency hints – routing key, collection of keys, “where clause(s)”
• Serial or parallel execution
• “Map Reduce”
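The function-routing principle can be sketched as a tiny map-reduce over partitioned data: each node runs the function against its local subset (filtered by the data dependency hint), and the caller reduces the partial results. This is an illustrative sketch, not the GemFire API; `Fabric`, `execute`, and the trade records are all hypothetical.

```python
import zlib

class Fabric:
    """Sketch of shipping a function to the nodes that hold its data."""

    def __init__(self, n_nodes=3):
        self.nodes = [dict() for _ in range(n_nodes)]   # each node's local data

    def put(self, key, value):
        self.nodes[zlib.crc32(key.encode()) % len(self.nodes)][key] = value

    def execute(self, fn, where=None):
        """Map step: run fn on each node against its local subset
        (parallel in spirit, sequential here), filtered by an optional
        data-dependency hint. Returns the partial results to reduce."""
        partials = []
        for local in self.nodes:
            subset = {k: v for k, v in local.items()
                      if where is None or where(v)}
            if subset:
                partials.append(fn(subset))
        return partials

# Usage: aggregate September trades where they live, then reduce.
fabric = Fabric()
fabric.put("t1", {"month": "Sept", "value": 500})
fabric.put("t2", {"month": "Aug",  "value": 900})
fabric.put("t3", {"month": "Sept", "value": 700})

partials = fabric.execute(
    fn=lambda subset: sum(t["value"] for t in subset.values()),
    where=lambda t: t["month"] == "Sept")
total = sum(partials)   # reduce step on the caller
```

Only the small partial sums cross the network; the trade data itself never moves.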
Parallel queries
Query execution for the hash policy:
- Parallelize the query to each relevant node
- Each node executes the query in parallel, using local indexes on its data subset
- Query results are streamed to the coordinating node
- Individual results are unioned into the final result set

This “scatter-gather” algorithm can waste CPU cycles.

Partition the data on the common filter. For instance, if most queries are filtered on a trade symbol, the query predicate can be analyzed to prune partitions.

1. select * from Trades where trade.month = ‘August’
2. Parallel query execution
3. Parallel streaming of results
4. Results returned
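The scatter-gather flow with partition pruning can be sketched as follows. This is an illustrative sketch, not GemFire's query engine; here the data is assumed to be partitioned on the `symbol` field, so an equality filter on that field touches exactly one partition, while any other query scatters to all partitions and unions the results.

```python
import zlib

N_PARTITIONS = 4
partitions = [[] for _ in range(N_PARTITIONS)]   # one list per data node

def insert(trade):
    # Data is partitioned on the common filter column: the trade symbol.
    partitions[zlib.crc32(trade["symbol"].encode()) % N_PARTITIONS].append(trade)

def query(symbol=None, predicate=lambda t: True):
    if symbol is not None:
        # Prune: an equality filter on the partitioning key
        # routes to the single owning partition.
        targets = [partitions[zlib.crc32(symbol.encode()) % N_PARTITIONS]]
        pred = lambda t: t["symbol"] == symbol and predicate(t)
    else:
        # Scatter-gather: every partition scans its local subset.
        targets = partitions
        pred = predicate
    results = []
    for part in targets:               # each "node" scans locally
        results.extend(t for t in part if pred(t))
    return results                      # gather: union of partial results
```

A real engine would also stream partial results back and use local indexes instead of scans; the sketch only shows the prune-versus-scatter decision.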
Key lessons
• Apps should think about capitalizing on memory across the Grid (it is abundant)
• Keep IO cycles to a minimum through main-memory caching of operational data sets
  – Scavenge Grid memory and avoid data source access
• Achieve near-infinite scale for your Grid apps by horizontally partitioning your data and behavior
  – Read Pat Helland’s “Life Beyond Distributed Transactions”
    (http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf)
• Get more info on the GemFire data fabric
  – http://www.gemstone.com/gemfire