distributed caching: essential...

Dist r ibuted Caching: Essent ial Lessons

Considerat ions for m axim izing scalable perform ance and reliability in J2 EE clusters

About: Overview

Applicat ion developm ent considerat ions for achieving

m axim um scalable perform ance and reliability in

clustered J2 EE environm ents, im proving scalability and

scalable perform ance of applicat ions through the use of

clustered caching to re liably share live data am ong

clustered JVMs in the applicat ion t ier , providing

t ransparent fa il- over as a key elem ent of uninterrupted

operat ion, and reduced load on the database t ier as a

key elem ent of scalability.

About: Cameron Purdy and Tangosol

Cam eron Purdy is President of Tangosol, and is a cont r ibutor to Java and XML specificat ions

Tangosol is the JCache ( JSR1 0 7 ) specificat ion lead and a m em ber of the W ork Manager ( JSR1 3 7 ) expert group.

Tangosol Coherence is the leading clustered caching and data gr id product for Java and J2 EE environm ents. Coherence enables in- m em ory data m anagem ent for clustered J2 EE applicat ions and applicat ion servers, and m akes sharing, m anaging and caching data in a cluster as sim ple as on a single server .

Caching Topologies

Replicated Topology: Descript ion

Requirem ent : Extrem e Perform ance. Zero- Latency

Access.

Solut ion : Fully Replicate data to a ll m em bers of the

cluster .

Result : Zero Latency Access. Since the data is pushed

( replicated) to each cluster m em ber, it is available in

each JVM for use w ithout w ait ing. This provides the

highest possible speed for read accesses.

Replicated Topology: Read Access

Replicated Topology: Lim itat ions

Cost Per Update : Updat ing a replicated cache requires

pushing the new version of the data to a ll other cluster

m em bers, w hich w ill lim it scalability if there are a high

frequency of updates per m em ber.

Cost Per Entry : The data is replicated to every cluster

m em ber, so Java heap space is used on each m em ber,

w hich w ill im pact perform ance for large caches.

Replicated Topology: Write Operat ions

Replicated Topology: Conclusions

Perform ance : Very good read perform ance

Scalability : The scalability of replicat ion is

inversely proport ional to the num ber of

m em bers, the frequency of updates per

m em ber, and the size of the updates

Uses : Sm all read- intensive caches that benefit

from a data push m odel

Part it ioned Topology: Descript ion

Requirem ent : Extrem e Scalability.

Solut ion : Shared- Nothing Architecture. Autom at ically

Part it ion data across all cluster m em bers.

Result : Linear Scalability. By part it ioning the data

evenly, the per- port throughput ( the m axim um am ount

of w ork that can be perform ed by each server) rem ains

constant as servers are added, up to the extent of the

sw itched fabric.

Part it ioned Topology: Read Access

Part it ioned Topology: Benefits

Part it ioned : The size of the cache and the

processing pow er available grow linear ly w ith

the size of the cluster.

Load- Balanced : The responsibility for

m anaging the data is autom at ically load-

balanced across the cluster.

Part it ioned Topology: Benefits

Ow nership : Exact ly one node in the cluster is

responsible for each piece of data in the cache.Supports cache- through architectures

Supports data- grid capabilit ies

Point - To- Point : The com m unicat ion for the

Part it ioned cache is a ll point - to- point ,

enabling linear scalability.

Part it ioned Topology: Write Operat ions

Part it ioned Topology: Failover

Failover : All Coherence cache services provide

failover and failback w ithout any data loss,

and that includes part it ioned caches.Configurable level of redundancy ( backup count )

Any cluster node can fail w ithout the loss of data

Data is explicit ly backed up on different physical servers

Never a m om ent w hen the cluster is not ready for a server

to die ( no data vulnerability, no SPOF)

Part it ioned Topology: Failover

Part it ioned Topology: Cache Servers

Locat ion Transparency : The JCache API and its

behavior are the sam e w ith a local, replicated or

part it ioned cache.

Local Storage Enabled : Cluster nodes w ith local storage

enabled w ill provide the cache and backup storage for

the part it ioned cache. All cluster nodes w ill have the

sam e exact view of the data, because of Locat ion

Transparency.

Part it ioned Topology: Cache Servers

Part it ioned Topology: Conclusions

Perform ance : Fixed cost .

Scalability : Linear scalability of both cache capacity and

throughput as the num ber of m em bers increases.

Designed for scaling on m odern sw itched netw orks.

Uses : Any size caches, scaling w ith the size of the

cluster or data gr id. Both read- and w rite- intensive use

cases. Ability to offload heap usage to other JVMs.

Load- balancing. Resilient to server fa ilure.

Near Topology: Descript ion

Requirem ent : Extrem e Perform ance. Extrem e

Scalability.

Solut ion : Local L1 I n- Mem ory cache in front of a

Clustered L2 Part it ioned Cache.

Result : Zero Latency Access to recent ly- used and

frequent ly- used data. Scalable cache capacity and cache

throughput , w ith a fixed cost for w orst - case.

Near Topology: Read Access

Near Topology: Coherency

Read- Only/ Read- Most ly : Expiry- based Near

Caching allow s the data to be read unt il its

configured expiry

Event - Based Seppuku : Evict ion by event . The

Near Cache can autom at ically listen to a ll

cache events, or only those cache events that

apply to the data it has cached locally.

Near Topology: Write Operat ions

Near Topology: Cache Servers

Potent Com binat ion : Com bining the benefits of Near Caching w ith the dedicated Cache Servers can provide

the best of both w orlds for m any com m on use cases.

Bulging At The Heap : This topology is very popular for applicat ion server environm ents that w ant to cache very large data sets, but do not w ant to use the applicat ion server heap to do so.

Balanced : The applicat ion server w ill use a tunable am ount of m em ory to cache recent ly- and frequent ly-used objects in a local cache.

Near Topology: Cache Servers

Near Topology: Conclusions

Perform ance : Zero- latency for com m on data. Fixed cost for the rem ainder of the data.

Scalability : Linear scalability of both cache capacity and throughput as the num ber of m em bers increases. Som e reduct ion in scalability w hen using event - based Seppuku.

Uses : Any size caches, scaling w ith the size of the cluster . Both read- and w rite- intensive use cases. Part icular ly good for read- intensive caches that have t ight data access pat terns. Killer Cache Server config.

Cache-Aside Architecture

Cache- Aside refers to an architecture in w hich

the applicat ion developer m anages the caching

of data from a data source

Adding cache- aside to an exist ing applicat ion:Check the cache before reading from the data source

Put data into the cache after reading from the data source

Evict or update the cache w hen updat ing the data source

Cache-Through: Architecture

Cache- Through places the cache betw een the client of the data source and the data source itself, requir ing access to the data source to go through the cache.

A Cache Loader represents access to a data source. W hen a cache is asked for data, if it is a cache m iss, then any data that it cannot provide it w ill a t tem pt to load by delegat ing to the Cache Loader.

A Cache Store is an extension to Cache Loader that adds the set of operat ions generally referred to as Create, Read, Update and Delete ( CRUD)

Cache-Through: Coherence

Coherence Cache- Through operat ions are

alw ays m anaged by the ow ner of the data

w ithin the cluster.

Concurrent access operat ions are com bined by

the ow ner, great ly reducing database load.

W rite- Through keeps the cache and database

in sync by keeping the cache aw are of updates

Cache-Through: Read- & Write-Through

Cache-Through: Conclusions

Perform ance : Reduces latency for database access by interposing a cache betw een the applicat ion and the data. Database m odificat ions addit ionally involve a cache update.

Scalability : Channels and com bines data accesses. May significant ly reduce database load.

Uses : Any t im e a cache needs to t ransparent ly load data from a database. Clean encapsulat ion of data source access in one place: Cache Loader / Store.

Write-Behind: Descript ion

Coherence W rite- Behind accepts cache

m odificat ions direct ly into the cache

The m odificat ions are then asynchronously

w rit ten through the Cache Store, opt ionally

after a specified delay

The w rite- behind data is clustered, m aking it

resilient to server fa ilure

Write-Behind: I llust rat ion

Write-Behind: Conclusions

Perform ance : Low - latency for cached data reads and data m odificat ions.

Scalability : The sam e extrem e read/ w rite scalability of the cache, and significant ly ( 9 0 % + ) reduced load on the database

Coalesced w rite- through of m ult iple m odificat ions to an object

Batched w rite- through of m odificat ions to m ult iple objects

Uses : W hen w rite perform ance is im portant , data source load is high, and/ or w hen an applicat ion has to be able to cont inue w hen the data source is dow n.

Only use w hen all w rites to the data source com e through the cache

Topology Quiz: Exam ple Use Case

W hat cache topology w ould be opt im al for this

applicat ion?Caching 1 0 GB of financial port folio data

Read- heavy, updated night ly

Several hundred users

Several thousand requests per m inute



applicat ion?Caching user preferences for an in- house applicat ion

Several hundred concurrent users

Preferences updated a few t im es per day



applicat ion?Caching brow sing history for an online retailer to support

personalizat ion and real- t im e assistance

Many updates, few reads



applicat ion?Logging user interact ions to a database for internal

audit ing purposes

1 0 0 0 updates per m inute

Lesson 1: Use an MVC Architecture

Model/ View / Controller ( MVC, aka Model2 )Model: Dom ain- specific representat ion of the inform at ion

the applicat ion displays and on w hich it operates

View : Renders the m odel into a form suitable for

interact ion, typically a user interface elem ent or docum ent

Controller: Responds to events typically are user act ions

or service requests and invokes changes on the m odel

Lesson 1: Use an MVC Architecture

Clear delineat ion of responsibility; for

exam ple:Cache is used in the Model

Cache Loader / Store is the DAO

Cache contains the applicat ion s POJOs / Value Objects

View pulls data from the Model

Goal is to ensure that a ll accesses are served from cache

Controller affects the Model

Modificat ions via W rite- Through or W rite- Behind

Lesson 2: Specify the Data Access

There are m ult iple w ays to access dataORM ( JDO, EJB3 , Hibernate, etc.)

Cache API ( i.e . t ransparent ly via a Cache Loader)

JDBC ( or other direct integrat ion API )

Most applicat ions have a best w ayLarge- scale set oriented access usually indicates JDBC

Mix of set - and ident ity- oriented access indicates ORM

I dent ity- oriented access m ay indicate Cache API


Picking the w rong approach is disastrousRDBMS ( JDBC) opt im ized for set - based queries and

operat ions, including joins and aggregates, but crum bles

w ith heavy row - level access ( 1 + N access pat tern, etc.)

ORMs can bog dow n bady on large set - based access

JCache API is built around ident ity- based access, not set -

based access


Not a lw ays an obvious best choiceSom e applicat ions have a m ix of intensive row - level and

large set - level operat ions, w hich lend them selves poorly to

any single approach

Even a w ell- architected and carefully- designed applicat ion

w ill often have a few except ions to the rule that require

the specified approach to Data Access to be circum vented

I t is som et im es necessary to use different approaches for

different classes of data w ithin the sam e applicat ion


Opt im izat ions available for eachI t is often possible to cache JDBC result sets

Most ORMs have effect ive support for pluggable caches

Som e ORMs have opt im izat ions for set - based operat ions

that can t ranslate som e operat ions direct ly into opt im ized

SQL that perform s the ent ire operat ion w ithin the RDBMS

Coherence provides extensive query support , including

paralle l query w ith indexes and cost - based opt im izat ions

Lesson 3: Design a Domain Model

Dom ain Model includes tw o aspects of

applicat ion m odelingData Model: Describes the state that the applicat ion

m aintains, both in term s of persistent data ( e .g. the

system of record ) and runt im e data ( sessions, queued

events, requests, responses, etc.)

Behavioral Model: Describes the various act ions that can

affect the state of the applicat ion. Very sim ilar to the

concepts behind SOA, but at a m uch low er level.


Dom ain Model is not dissim ilar from SOAData Model: Analogous to the inform at ion encapsulated

and m anaged behind a set of services

Behavioral Model: Analogous to the set of services exposed

by a broker

The concept of a Dom ain Model is technology-

neutral. I t can even exist only in the abstract .Modeling is a tool, not a religion


Dom ain Model has valueAllow s the Data Model to exist independent ly of the behavioral m odel, support ing the separat ion of a cont roller from the m odel in an MVC architecture

The behavioral m odel is the basis for the events that a cont roller is required to support

W ith an abstract data m odel that reflects applicat ion concerns instead technology concerns, it is m uch m ore likely that the result ing data m odel im plem entat ion w ill be m ore easily used by the view and the cont roller


Dom ain Model is not newOO developers have been using Dom ain Modeling for years

Applicat ion Developers that double as DBAs have used

m odeling to get their ideas dow n into a design that could

provide both opt im al applicat ion im plem entat ion and

opt im al database organizat ion

SOA is the publishing of a behavior m odel that is intended

to be publicly- accessible, w ith data m odels often direct ly

reflected in the service request and response data

Lesson 4: Find the natural granular ity

Every applicat ion has a natural granularity for

its data.Relat ional data m odels have a norm alized granularity from

w hich tables naturally em erge

Opt im ized JDBC- based applicat ions have a statem ent

execut ion granularity and a Result Set granularity

ORM- based applicat ions and cache- intensive applicat ions

often have an OO granularity that m irrors the data m odel

Caches have an ident ity granularity of access

Lesson 4: Find the natural granular ity

Caches w ill typically exist for each m ajor class of applicat ion object

e.g. Accounts, Sym bols, Posit ions, Orders, Execut ions, etc.

Each cache w ill tend to have a natural keye.g. account id, sym bol, account id + sym bol, order id, etc.

Applicat ion objects tend to be com plexContain ow ned objects, e .g. Purchase Order contains Lines

Lesson 5: Decouple using I dent ity

Store, Load, Provide and Manage the I dent ity

of related m odel objects

Provide accessors for related m odel objects by

using I dent ity de- reference ( i.e . cache access)

Read- only m odels tend to have m ore lee- w aySoft references

Transient reference fields

Lesson 5: Decouple using I dent ity

Sim plifies m anagem ent of large object graphs

Enables efficient lazy loading of object graphs

W orks w ell w ith

Lesson 6: Use an I mmutable Data Model

From the View , the Model should t reated as if

it is read- only

From the point of view of the Controller , the

m odel that is shared across threads should be

t reated as im m utable, for exam ple just in case

the View is using it on a different thread

Lesson 6: Use an I mmutable Data Model

Since m ost applicat ions are not read- only, the

Controller does have to m odify the data

represented by the Model

W hen the Controller needs to m odify the

Model, it can obtain m utable clones of the

shared m odel, and m anage them

Lesson 7: Use Cache Transact ions

t ransact ionally.

W hen the Controller obtains cached values

w ithin a t ransact ion, the values are actually

clones of the m aster cached values

The Controller m akes its m odificat ions to the

Model in a t ransact ionally isolated and

consistent m anner

Lesson 7: Use Cache Transact ions

For m axim um scalability, m ost t ransact ions should be opt im ist ic. Just as w ith any opt im ist ic t ransact ion approach, this im plies that the applicat ion m ust handle and/ or ret ry t ransact ions w hose opt im ist ic checks fail

Cache Transact ions can integrate w ith the container s Transact ion Manager via the JEE Connector Architecture

Lesson 8: Use Queries Wisely

Cache Queries are opt im ized, and they are run in paralle l across a cluster, but they are probably at least an order of m agnitude m ore expensive than ident ity- based operat ions

I f you use queries, m ake sure to use indexes; the Coherence query opt im izer can use m ult iple indexes on a single query, even if they don t perfect ly cover the query

Lesson 9: Opt im ize Serializat ion

Objects that are stored in a cache m ay need to

be serialized, and Java s default object

seria lizat ion is relat ively inefficient

I m plem ent ing the Externalizable interface

m ay help slight ly

Serializat ion using data st ream s instead of

object st ream s can m ake a phenom enal im pact


Coherence provides the ExternalizableLiteinterface to support seria lizat ion to data st ream s, and the Tangosol Xm lBeanfram ew ork w hich im plem ents ExternalizableLite for derived value objects

Serializat ion perform ance im provem ents are up to an order of m agnitude, and the reduct ion in size can be up to 8 0 % .


Since Java does not have an object - cloning interface, classes that do not have a public clone( ) m ethod m ay require serializat ion and deseria lizat ion in order to be cloned

Since Cache Transact ions m ay need to clone an object to create a copy w ithin a local t ransact ion, opt im ized serializat ion can even im prove the perform ance of t ransact ions

Lesson 10: Use Good I dent it ies

An I dent ity im plem entat ion m ust provide

correct hashCode( ) and equals( )

im plem entat ions, and cache the hashcode!

A good toStr ing( ) im plem entat ion helps w ith

debugging ( and not just for I dent it ies!)

I f feasible, m ake your I dent ity classes

im m utable

Lesson 10: Use Good I dent it ies

An I dent ity m ust be Serializable, and its

seria lized form should be stable: Tw o

instances should ser ialize to the sam e binary

value if and only if equals( ) returns t rue

Java s Str ing, I nteger, Long, etc. are perfect

Tangosol Xm lBean is a good base class for ( or

exam ple of) com plex ident it ies

Lesson 11: Cache in the Right Scope

HTTP Session objects can be used for caching

user- or session- specific inform at ion; don t

use them as a cache for global inform at ion

Conversely, don t use a global cache for user-

specific caching w hen the HTTP Session w ould

do just fine

Lesson 12: Never Assume it Works

W e have seen caches in product ion that literally w ere not even get t ing used, and w e have seen caches that had not even been configured or w ere badly m is- configured

W hile load test ing, use JMX to m onitor w hat caches exist , how big they are, w hat their hit rates are, and w hether their average access t im es and access counts seem correct

Audience Response

Quest ion?

Dist r ibuted Caching: Essent ial Lessons

Considerat ions for m axim izing scalable perform ance and reliability in J2 EE clusters

distributed caching: essential...

Documents