Container Independent Failover Framework - Mobicents 2011 Summit

Page 1: Container Independent failover framework - Mobicents Summit 2011

Container Independent Failover Framework

Mobicents 2011 Summit

Page 2: Container Independent failover framework - Mobicents Summit 2011

Agenda

• Current Scaling Limitations
• DataGrid and Eventual Consistency

Page 3: Container Independent failover framework - Mobicents Summit 2011

Current Scaling Limitations

Mobicents 2011 Summit

Page 4: Container Independent failover framework - Mobicents Summit 2011

Cluster Replication Default

• Total replication
• Each node has to store a copy of the state of the N other nodes present in the cluster
• Involves keeping 100% of the state consistent across the entire cluster
• Becomes very expensive quickly, does not scale well, and degrades performance very fast (see the worked example below)
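To make the cost concrete (illustrative numbers, not from the slides): with 10 nodes each owning 1 GB of session state, total replication forces every node to hold all 10 GB, and every write must be propagated to the 9 other nodes, so cluster-wide memory use and replication traffic grow roughly quadratically with the node count.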

Page 5: Container Independent failover framework - Mobicents Summit 2011

Buddy Replication

• Data is replicated to a finite number of nodes in the cluster rather than to the entire cluster
• Each node has its own data, plus the backup data of N (configurable) other node(s)
• Data is replicated only to the buddy nodes
• Synchronous and asynchronous replication modes supported
• If a node fails, its data is still backed up on its buddies
• If failover happens on a non-buddy node that looks for this data, the data "gravitates" to this new non-buddy node, which now owns the data and acts as the new backup node (see the sketch below)

(Diagram: nodes A, B, C, D, E; arrows indicate the replication direction)
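As a rough illustration of the buddy selection and gravitation behaviour described above, the following standalone Java sketch picks the next N members of the cluster view as buddies and shows how a non-buddy node would "gravitate" a failed node's data to itself. This is a toy model, not JBoss Cache code; all names and the single-buddy setup are made up for the example.

```java
import java.util.*;

// Toy model of buddy replication: each node owns its data and keeps the
// backup data of the node(s) it is a buddy for. Not JBoss Cache code.
public class BuddyReplicationSketch {

    // Ordered cluster view, e.g. the five nodes from the diagram.
    static List<String> view = Arrays.asList("A", "B", "C", "D", "E");

    // Pick the next numBuddies members after 'node' in the view (wrapping around),
    // mirroring the "next member" buddy locator idea.
    static List<String> buddiesOf(String node, int numBuddies) {
        int idx = view.indexOf(node);
        List<String> buddies = new ArrayList<>();
        for (int i = 1; i <= numBuddies; i++) {
            buddies.add(view.get((idx + i) % view.size()));
        }
        return buddies;
    }

    // backupData.get(X) holds the backup copies that node X keeps for the nodes it backs up.
    static Map<String, Map<String, String>> backupData = new HashMap<>();

    public static void main(String[] args) {
        // Node A owns a call state; its single buddy (B) keeps the backup copy.
        String owner = "A";
        String buddy = buddiesOf(owner, 1).get(0);               // -> "B"
        backupData.computeIfAbsent(buddy, k -> new HashMap<>())
                  .put("call-42", "A's dialog state");

        // Node A fails and the next request lands on a non-buddy node, D.
        // D does not have the data locally, so it "gravitates" it from the buddy
        // that still holds the backup, takes ownership, and picks a new buddy.
        String failoverNode = "D";
        String state = backupData.get(buddy).remove("call-42");   // a network fetch in reality
        String newBuddy = buddiesOf(failoverNode, 1).get(0);      // -> "E"
        backupData.computeIfAbsent(newBuddy, k -> new HashMap<>())
                  .put("call-42", state);
        System.out.println(failoverNode + " now owns call-42, backed up on " + newBuddy);
    }
}
```

The gravitation step is exactly what the limitation slide that follows warns about: the fetch and ownership transfer happen regardless of how loaded the new owner already is.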

Page 6: Container Independent failover framework - Mobicents Summit 2011

Buddy Replication Limitations

• Sounded promising to control memory growth and network utilization and allow larger scaling
• Gravitation is expensive, is uncontrolled with regard to the current server load, and can make the entire cluster fail

Page 7: Container Independent failover framework - Mobicents Summit 2011

N-Node Mini Clusters

• Scales very well
• Higher management and operational costs
• Firing up a new instance may often require firing up a new cluster instead
• If an entire cluster goes down, calls fail
• Requires a SIP LB aware of the topology

Page 8: Container Independent failover framework - Mobicents Summit 2011

DataGrids and Eventual Consistency

Mobicents 2011 Summit

Page 9: Container Independent failover framework - Mobicents Summit 2011

CAP (Brewer's) Theorem

• Consistency: all nodes see the same data at the same time
• Availability: a guarantee that every request receives a response about whether it was successful or failed
• Partition Tolerance: the system continues to operate despite arbitrary message loss

• A distributed system can satisfy any two of these guarantees at the same time, but not all three

Page 10: Container Independent failover framework - Mobicents Summit 2011

Drop Partition Tolerance

Page 11: Container Independent failover framework - Mobicents Summit 2011

Drop Availability

Network Partition Failure: all affected nodes wait until the partition is whole again before replying, thus losing availability

Page 12: Container Independent failover framework - Mobicents Summit 2011

Drop Consistency

Network Partition Failure: consistency is broken

Page 13: Container Independent failover framework - Mobicents Summit 2011

DataGrid vs JBoss Cache

• DataGrid is similar to JBoss Cache Buddy Replication, except:
o transaction atomicity is not preserved
o data is eventually replicated (same as JBoss Cache in asynchronous mode)
o Loosening consistency allows much better scaling and better overall performance, as it is less intensive in terms of node synchronisation

• Failover to the correct buddy is ensured by consistent hashing of the keys (see the sketch below)
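The consistent-hashing lookup mentioned in the last bullet can be sketched as follows. This is a simplified standalone illustration (node names and the hash mixing are made up for the example), not Infinispan's actual implementation:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: each key maps to the first node whose
// position on the ring is >= hash(key), wrapping around at the end.
public class ConsistentHashSketch {

    private final SortedMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node) {
        ring.put(hash(node), node);
    }

    void removeNode(String node) {
        ring.remove(hash(node));
    }

    // Owner of a key: first ring position at or after the key's hash.
    String ownerOf(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private int hash(String s) {
        // Spread the hash a bit; a real grid uses a stronger hash function.
        return Integer.rotateLeft(s.hashCode() * 0x9E3779B9, 17) & Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        ConsistentHashSketch ch = new ConsistentHashSketch();
        for (String n : new String[] {"node-A", "node-B", "node-C"}) ch.addNode(n);

        String sessionId = "sip-session-42";
        System.out.println(sessionId + " owned by " + ch.ownerOf(sessionId));

        // When the owner fails, only the keys that hashed to it move to the next
        // node on the ring; every other key keeps its current owner.
        ch.removeNode(ch.ownerOf(sessionId));
        System.out.println(sessionId + " fails over to " + ch.ownerOf(sessionId));
    }
}
```

Because every node computes the same ring from the same membership view, any node can locate the correct owner (or the failover target) of a key without a broadcast lookup.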

Page 14: Container Independent failover framework - Mobicents Summit 2011

Customers and Community HA Use Cases

• No High Availability: one node
• High Availability with no replication: up to hundreds of nodes
• High Availability with replication but controlled load: fixed low number of nodes (< 5)
o Usually needs high consistency
• High Availability with replication and uncontrolled load: up to hundreds of nodes and more
o Eventual consistency can't be avoided
• GeoLocation failover needs to be added to the last two items
o Eventual consistency will typically help here

Page 15: Container Independent failover framework - Mobicents Summit 2011

Peer to Peer Mode with DIST, with L1 Cache

The application container or the apps will need to access sessions, dialogs and other objects in the distributed store. Each container maintains a quick-lookup local cache of deserialized objects; in SIP Servlets this is implemented for sessions. If that fails (the session needed by the app is not in the local cache), the container looks it up in Infinispan. Each Infinispan instance has a data store with the items hashed to that specific node (DIST-assigned serialized data); this is where the majority of the memory is consumed. However, the data in this store is distributed based on hash values and is not cached based on LRU or any other common caching rule. To help with that, Infinispan allows enabling an L1 cache, which applies an LRU policy to commonly accessed items that are missing from the DIST-assigned store.
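A minimal sketch of this two-level lookup is shown below: a local map of deserialized sessions in front of a clustered Infinispan cache in DIST mode with L1 enabled. The class and field names (SessionStoreSketch, SipSessionData, the "sip-sessions" cache name, numOwners=2) are assumptions for illustration, and the configuration API shown is the Infinispan 5.x embedded style, which differs across versions; this is not the Mobicents implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

// Sketch of the two-level lookup described above: a local deserialized cache
// in front of a DIST-mode Infinispan cache with L1 enabled. Illustrative only.
public class SessionStoreSketch {

    // Hypothetical application-level session object (not a Mobicents class).
    static class SipSessionData implements java.io.Serializable {
        final String dialogState;
        SipSessionData(String dialogState) { this.dialogState = dialogState; }
    }

    private final ConcurrentMap<String, SipSessionData> localDeserialized = new ConcurrentHashMap<>();
    private final Cache<String, SipSessionData> distributed;

    SessionStoreSketch() {
        // DIST mode with 2 owners per entry and an L1 near-cache for remotely owned entries.
        ConfigurationBuilder cfg = new ConfigurationBuilder();
        cfg.clustering().cacheMode(CacheMode.DIST_SYNC)
           .hash().numOwners(2)
           .l1().enable();
        DefaultCacheManager manager = new DefaultCacheManager(
                new GlobalConfigurationBuilder().transport().defaultTransport().build());
        manager.defineConfiguration("sip-sessions", cfg.build());
        distributed = manager.getCache("sip-sessions");
    }

    SipSessionData getSession(String id) {
        // 1. Fast path: deserialized object already cached on this node.
        SipSessionData local = localDeserialized.get(id);
        if (local != null) {
            return local;
        }
        // 2. Fall back to the grid; DIST routes the request to an owner node,
        //    and L1 keeps a short-lived local copy of remotely owned entries.
        SipSessionData remote = distributed.get(id);
        if (remote != null) {
            localDeserialized.put(id, remote);
        }
        return remote;
    }

    void putSession(String id, SipSessionData data) {
        localDeserialized.put(id, data);
        distributed.put(id, data); // stored on the hash-designated owner nodes
    }
}
```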

Page 16: Container Independent failover framework - Mobicents Summit 2011

Data Grid HA Model

In this model the data is primarily stored in a remote data grid, partitioned based on some key value. Communication with the grid is done only over the network, with protocols such as Hot Rod, memcached or Thrift. In this model, caching deserialized data would be very difficult if concurrent changes are allowed. Protocols for network lookup and transport in grid systems rarely allow delivery of asynchronous events to the clients (in this case the local node is the client of the grid), so we will never be eagerly notified that some session has changed and thus know whether we can use the cached deserialized local copy. An L1 cache for serialized chunks, however, may have a function to check whether the data is up to date, which still requires at least one network request by itself, even without locking the data. Each grid protocol supports different operations and features that can be used to optimize the network access and support caching more efficiently. Local cache invalidation is more difficult in this model, as it is not guaranteed that the HA grid architecture allows notifications to keep the caches up to date with remote changes.
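For reference, accessing such a remote grid from the local node over Hot Rod could look roughly like the sketch below. It uses Infinispan's Hot Rod Java client; the server address, cache name and the byte-array value type are illustrative assumptions, and the client API varies across Infinispan versions.

```java
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

// Sketch of the data-grid HA model: the local node only talks to the grid
// over the network, so every lookup that misses a local cache costs a round trip.
public class RemoteGridSketch {

    public static void main(String[] args) {
        // Remote grid endpoint; host, port and cache name are assumptions for the example.
        ConfigurationBuilder cfg = new ConfigurationBuilder();
        cfg.addServer().host("datagrid.example.com").port(11222);
        RemoteCacheManager manager = new RemoteCacheManager(cfg.build());

        // Values are kept serialized in the grid; the client marshals them on put/get.
        RemoteCache<String, byte[]> sessions = manager.getCache("sip-sessions");

        // Store the serialized session state under its key; the grid decides
        // which server(s) own it based on the key.
        sessions.put("sip-session-42", new byte[] {});

        // Failover lookup from any node: one network request, and no notification
        // arrives here if another node changes the entry in the meantime.
        byte[] serialized = sessions.get("sip-session-42");
        System.out.println("fetched " + (serialized == null ? 0 : serialized.length) + " bytes");

        manager.stop();
    }
}
```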

Page 17: Container Independent failover framework - Mobicents Summit 2011

Thank you !

http://telestax.com/