Distributed Caching: Essential Lessons
Considerations for maximizing scalable performance and reliability in J2EE clusters
About: Overview
Application development considerations for achieving maximum scalable performance and reliability in clustered J2EE environments, improving scalability and scalable performance of applications through the use of clustered caching to reliably share live data among clustered JVMs in the application tier, providing transparent fail-over as a key element of uninterrupted operation, and reduced load on the database tier as a key element of scalability.
About: Cameron Purdy and Tangosol
Cameron Purdy is President of Tangosol, and is a contributor to Java and XML specifications.
Tangosol is the JCache (JSR 107) specification lead and a member of the Work Manager (JSR 237) expert group.
Tangosol Coherence is the leading clustered caching and data grid product for Java and J2EE environments. Coherence enables in-memory data management for clustered J2EE applications and application servers, and makes sharing, managing and caching data in a cluster as simple as on a single server.
Caching Topologies
Replicated Topology: Description
Requirement: Extreme Performance. Zero-Latency Access.
Solution: Fully Replicate data to all members of the cluster.
Result: Zero-Latency Access. Since the data is pushed (replicated) to each cluster member, it is available in each JVM for use without waiting. This provides the highest possible speed for read accesses.
Replicated Topology: Read Access
Replicated Topology: Limitations
Cost Per Update: Updating a replicated cache requires pushing the new version of the data to all other cluster members, which will limit scalability if there is a high frequency of updates per member.
Cost Per Entry: The data is replicated to every cluster member, so Java heap space is used on each member, which will impact performance for large caches.
Replicated Topology: Write Operations
Replicated Topology: Conclusions
Performance: Very good read performance
Scalability: The scalability of replication is inversely proportional to the number of members, the frequency of updates per member, and the size of the updates
Uses: Small read-intensive caches that benefit from a data push model
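The trade-off described for the replicated topology (local reads, updates pushed to every member) can be sketched in a few lines. This is only an in-process model under stated assumptions: the class name ReplicatedCacheSketch and its methods are hypothetical, each map stands in for one JVM, and nothing here is the Coherence API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-process model of a replicated cache; not the Coherence API.
final class ReplicatedCacheSketch {
    // Each map stands in for the full copy held by one cluster member (JVM).
    private final List<Map<String, String>> members = new ArrayList<>();
    private long copiesPushed = 0;

    ReplicatedCacheSketch(int memberCount) {
        for (int i = 0; i < memberCount; i++) {
            members.add(new HashMap<>());
        }
    }

    // A read is served from the member's own copy; no network hop is modeled.
    String get(int member, String key) {
        return members.get(member).get(key);
    }

    // An update is pushed to every member, so its cost grows with cluster size.
    void put(String key, String value) {
        for (Map<String, String> copy : members) {
            copy.put(key, value);
            copiesPushed++;
        }
    }

    long copiesPushed() {
        return copiesPushed;
    }

    public static void main(String[] args) {
        ReplicatedCacheSketch cache = new ReplicatedCacheSketch(4);
        cache.put("USD/EUR", "0.79");
        System.out.println("member 3 reads: " + cache.get(3, "USD/EUR"));
        System.out.println("copies pushed for one update: " + cache.copiesPushed());
    }
}
```

One update costs one copy per member, which is why the slide ties scalability to member count, update frequency and update size.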
Partitioned Topology: Description
Requirement: Extreme Scalability.
Solution: Shared-Nothing Architecture. Automatically Partition data across all cluster members.
Result: Linear Scalability. By partitioning the data evenly, the per-port throughput (the maximum amount of work that can be performed by each server) remains constant as servers are added, up to the extent of the switched fabric.
Partitioned Topology: Read Access
Partitioned Topology: Benefits
Partitioned: The size of the cache and the processing power available grow linearly with the size of the cluster.
Load-Balanced: The responsibility for managing the data is automatically load-balanced across the cluster.
Partitioned Topology: Benefits
Ownership: Exactly one node in the cluster is responsible for each piece of data in the cache.
Supports cache-through architectures
Supports data-grid capabilities
Point-To-Point: The communication for the Partitioned cache is all point-to-point, enabling linear scalability.
Partitioned Topology: Write Operations
Partitioned Topology: Failover
Failover: All Coherence cache services provide failover and failback without any data loss, and that includes partitioned caches.
Configurable level of redundancy (backup count)
Any cluster node can fail without the loss of data
Data is explicitly backed up on different physical servers
Never a moment when the cluster is not ready for a server to die (no data vulnerability, no SPOF)
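The ownership and backup ideas above can be illustrated with a small model: each key has exactly one owning node and one backup copy on a different node. This is a sketch under assumptions, not how Coherence assigns partitions. The class PartitionedCacheSketch is hypothetical, a plain hash stands in for the real partitioning scheme, the backup count is fixed at one, and the redistribution of data after a failure (failback) is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a partitioned cache with one backup per entry.
final class PartitionedCacheSketch {
    private final List<Map<String, String>> primary = new ArrayList<>();
    private final List<Map<String, String>> backup = new ArrayList<>();
    private final boolean[] alive;

    PartitionedCacheSketch(int nodeCount) {
        if (nodeCount < 2) {
            throw new IllegalArgumentException("need at least 2 nodes for a backup");
        }
        alive = new boolean[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            primary.add(new HashMap<>());
            backup.add(new HashMap<>());
            alive[i] = true;
        }
    }

    // Exactly one node owns each key (assumption: simple hash partitioning).
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), alive.length);
    }

    // The backup copy always lives on a different node than the owner.
    int backupOf(String key) {
        return (ownerOf(key) + 1) % alive.length;
    }

    // Point-to-point: a write touches only the owner and its backup node.
    void put(String key, String value) {
        primary.get(ownerOf(key)).put(key, value);
        backup.get(backupOf(key)).put(key, value);
    }

    // Reads go to the owner, or to the backup if the owner has failed.
    String get(String key) {
        int owner = ownerOf(key);
        if (alive[owner]) {
            return primary.get(owner).get(key);
        }
        return backup.get(backupOf(key)).get(key);
    }

    // Simulates the loss of one node and everything stored on it.
    void fail(int node) {
        alive[node] = false;
        primary.get(node).clear();
        backup.get(node).clear();
    }

    public static void main(String[] args) {
        PartitionedCacheSketch cache = new PartitionedCacheSketch(3);
        cache.put("order-1", "filled");
        cache.fail(cache.ownerOf("order-1"));
        System.out.println("after owner failure: " + cache.get("order-1"));
    }
}
```

The sketch survives a single node failure only; a real implementation would promote the backup and create a new backup elsewhere so the cluster is again ready for the next failure.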
Partitioned Topology: Failover
Partitioned Topology: Cache Servers
Location Transparency: The JCache API and its behavior are the same with a local, replicated or partitioned cache.
Local Storage Enabled: Cluster nodes with local storage enabled will provide the cache and backup storage for the partitioned cache. All cluster nodes will have the same exact view of the data, because of Location Transparency.
Partitioned Topology: Cache Servers
Partitioned Topology: Conclusions
Performance: Fixed cost.
Scalability: Linear scalability of both cache capacity and throughput as the number of members increases. Designed for scaling on modern switched networks.
Uses: Any size caches, scaling with the size of the cluster or data grid. Both read- and write-intensive use cases. Ability to offload heap usage to other JVMs. Load-balancing. Resilient to server failure.
Near Topology: Description
Requirement: Extreme Performance. Extreme Scalability.
Solution: Local L1 In-Memory cache in front of a Clustered L2 Partitioned Cache.
Result: Zero-Latency Access to recently-used and frequently-used data. Scalable cache capacity and cache throughput, with a fixed cost for worst-case.
Near Topology: Read Access
Near Topology: Coherency
Read-Only/Read-Mostly: Expiry-based Near Caching allows the data to be read until its configured expiry
Event-Based Seppuku: Eviction by event. The Near Cache can automatically listen to all cache events, or only those cache events that apply to the data it has cached locally.
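A near cache with event-based eviction can be sketched as a small local map in front of a larger back map. The names here (NearCacheSketch, onBackCacheEvent) are hypothetical, the back map merely stands in for the clustered L2 cache, and event delivery is simulated by calling the listener method directly.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical near cache: bounded local L1 in front of a back (L2) map.
final class NearCacheSketch {
    private final Map<String, String> back;   // stands in for the clustered L2 cache
    private final Map<String, String> front;  // local L1, bounded, least-recently-used eviction
    private int backReads = 0;

    NearCacheSketch(Map<String, String> back, final int frontSize) {
        this.back = back;
        // An access-ordered LinkedHashMap gives a simple LRU front cache.
        this.front = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > frontSize;
            }
        };
    }

    // Serve from L1 when possible; otherwise pay the fixed cost of an L2 read.
    String get(String key) {
        String value = front.get(key);
        if (value == null) {
            value = back.get(key);
            backReads++;
            if (value != null) {
                front.put(key, value);
            }
        }
        return value;
    }

    // Eviction by event: drop the local copy when the back cache reports a change.
    void onBackCacheEvent(String key) {
        front.remove(key);
    }

    int backReads() {
        return backReads;
    }

    public static void main(String[] args) {
        Map<String, String> back = new HashMap<>();
        back.put("IBM", "81.20");
        NearCacheSketch near = new NearCacheSketch(back, 100);
        near.get("IBM");
        near.get("IBM");
        System.out.println("back reads after two gets: " + near.backReads());
    }
}
```

An expiry-based variant would store a timestamp with each front entry instead of listening for events, trading freshness for less event traffic.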
Near Topology: Write Operations
Near Topology: Cache Servers
Potent Combination: Combining the benefits of Near Caching with the dedicated Cache Servers can provide the best of both worlds for many common use cases.
Bulging At The Heap: This topology is very popular for application server environments that want to cache very large data sets, but do not want to use the application server heap to do so.
Balanced: The application server will use a tunable amount of memory to cache recently- and frequently-used objects in a local cache.
Near Topology: Cache Servers
Near Topology: Conclusions
Performance: Zero-latency for common data. Fixed cost for the remainder of the data.
Scalability: Linear scalability of both cache capacity and throughput as the number of members increases. Some reduction in scalability when using event-based Seppuku.
Uses: Any size caches, scaling with the size of the cluster. Both read- and write-intensive use cases. Particularly good for read-intensive caches that have tight data access patterns. Killer Cache Server config.
Cache-Aside Architecture
Cache-Aside refers to an architecture in which the application developer manages the caching of data from a data source
Adding cache-aside to an existing application:
Check the cache before reading from the data source
Put data into the cache after reading from the data source
Evict or update the cache when updating the data source
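The three cache-aside steps can be sketched as follows. The class CacheAsideSketch is hypothetical, and a plain map stands in for the data source so that the example stays self-contained; in a real application that map would be a DAO or JDBC call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache-aside wrapper; the application code manages the cache itself.
final class CacheAsideSketch {
    private final Map<Integer, String> cache = new HashMap<>();
    private final Map<Integer, String> dataSource;  // stand-in for a database
    private int dataSourceReads = 0;

    CacheAsideSketch(Map<Integer, String> dataSource) {
        this.dataSource = dataSource;
    }

    String read(int id) {
        String value = cache.get(id);        // step 1: check the cache first
        if (value == null) {
            value = dataSource.get(id);      // cache miss: read the data source
            dataSourceReads++;
            if (value != null) {
                cache.put(id, value);        // step 2: put the result into the cache
            }
        }
        return value;
    }

    void update(int id, String value) {
        dataSource.put(id, value);           // write to the data source
        cache.remove(id);                    // step 3: evict the stale cache entry
    }

    int dataSourceReads() {
        return dataSourceReads;
    }

    public static void main(String[] args) {
        Map<Integer, String> db = new HashMap<>();
        db.put(1, "alice");
        CacheAsideSketch app = new CacheAsideSketch(db);
        app.read(1);
        app.read(1);
        System.out.println("data source reads: " + app.dataSourceReads());
    }
}
```

Evicting on update (rather than updating the cached value) is the simpler of the two options the slide mentions; the next read reloads the fresh value.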
Cache-Through: Architecture
Cache-Through places the cache between the client of the data source and the data source itself, requiring access to the data source to go through the cache.
A Cache Loader represents access to a data source. When a cache is asked for data that it does not hold (a cache miss), it will attempt to load that data by delegating to the Cache Loader.
A Cache Store is an extension to Cache Loader that adds the set of operations generally referred to as Create, Read, Update and Delete (CRUD)
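The loader and store roles can be sketched with two small interfaces. These are illustrative assumptions only: SketchCacheLoader and SketchCacheStore are hypothetical names chosen to avoid implying the exact JCache or Coherence signatures, and MapBackedStore is a stand-in for a real database-backed implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader: read access to a data source.
interface SketchCacheLoader<K, V> {
    V load(K key);
}

// Hypothetical store: extends the loader with write and delete operations.
interface SketchCacheStore<K, V> extends SketchCacheLoader<K, V> {
    void store(K key, V value);
    void erase(K key);
}

// A cache that sits between its clients and the data source.
final class CacheThroughSketch<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final SketchCacheStore<K, V> store;

    CacheThroughSketch(SketchCacheStore<K, V> store) {
        this.store = store;
    }

    // Read-through: on a miss, delegate to the loader and keep the result.
    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = store.load(key);
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    // Write-through: the data source is updated before the cache entry.
    void put(K key, V value) {
        store.store(key, value);
        entries.put(key, value);
    }

    void remove(K key) {
        store.erase(key);
        entries.remove(key);
    }
}

// Stand-in data source that counts how often it is asked to load.
final class MapBackedStore implements SketchCacheStore<String, String> {
    final Map<String, String> rows = new HashMap<>();
    int loads = 0;

    @Override public String load(String key) { loads++; return rows.get(key); }
    @Override public void store(String key, String value) { rows.put(key, value); }
    @Override public void erase(String key) { rows.remove(key); }

    public static void main(String[] args) {
        MapBackedStore db = new MapBackedStore();
        db.rows.put("a", "1");
        CacheThroughSketch<String, String> cache = new CacheThroughSketch<>(db);
        cache.get("a");
        cache.get("a");
        System.out.println("loads: " + db.loads);
    }
}
```

All data source access is encapsulated in one place (the store implementation), so application code only ever talks to the cache.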
Cache-Through: Coherence
Coherence Cache-Through operations are always managed by the owner of the data within the cluster.
Concurrent access operations are combined by the owner, greatly reducing database load.
Write-Through keeps the cache and database in sync by keeping the cache aware of updates
Cache-Through: Read- & Write-Through
Cache-Through: Conclusions
Performance: Reduces latency for database access by interposing a cache between the application and the data. Database modifications additionally involve a cache update.
Scalability: Channels and combines data accesses. May significantly reduce database load.
Uses: Any time a cache needs to transparently load data from a database. Clean encapsulation of data source access in one place: Cache Loader / Store.
Write-Behind: Description
Coherence Write-Behind accepts cache modifications directly into the cache
The modifications are then asynchronously written through the Cache Store, optionally after a specified delay
The write-behind data is clustered, making it resilient to server failure
Write-Behind: Illustration
Write-Behind: Conclusions
Performance: Low-latency for cached data reads and data modifications.
Scalability: The same extreme read/write scalability of the cache, and significantly (90%+) reduced load on the database
Coalesced write-through of multiple modifications to an object
Batched write-through of modifications to multiple objects
Uses: When write performance is important, data source load is high, and/or when an application has to be able to continue when the data source is down.
Only use when all writes to the data source come through the cache
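Coalescing and batching, the two effects credited above with reducing database load, can be shown with a pending-write queue keyed by object. This is a simplified, single-JVM sketch: WriteBehindSketch is a hypothetical class, flush() is called explicitly where a real implementation would run it on a background thread after the configured delay, and the clustering of the queue is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical write-behind cache with a coalescing, batching write queue.
final class WriteBehindSketch {
    private final Map<String, String> cache = new HashMap<>();
    // Pending writes keyed by object: a later write replaces an earlier one (coalescing).
    private final Map<String, String> pending = new LinkedHashMap<>();
    // Stand-in for the data source: records each batch that was written.
    private final List<Map<String, String>> batchesWritten = new ArrayList<>();

    // Modifications are accepted into the cache immediately; the store write is deferred.
    void put(String key, String value) {
        cache.put(key, value);
        pending.put(key, value);
    }

    String get(String key) {
        return cache.get(key);
    }

    // Writes all pending modifications as one batch; returns the number of rows written.
    int flush() {
        if (pending.isEmpty()) {
            return 0;
        }
        Map<String, String> batch = new LinkedHashMap<>(pending);
        pending.clear();
        batchesWritten.add(batch);
        return batch.size();
    }

    List<Map<String, String>> batchesWritten() {
        return batchesWritten;
    }

    public static void main(String[] args) {
        WriteBehindSketch cache = new WriteBehindSketch();
        cache.put("acct-1", "balance=10");
        cache.put("acct-1", "balance=20");  // coalesced with the previous write
        cache.put("acct-2", "balance=5");
        System.out.println("rows written in one batch: " + cache.flush());
    }
}
```

Three cache modifications result in one store operation carrying two rows, which also illustrates why every writer must go through the cache: a direct database write would be overwritten by the deferred batch.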
Topology Quiz: Example Use Case
What cache topology would be optimal for this application?
Caching 10 GB of financial portfolio data
Read-heavy, updated nightly
Several hundred users
Several thousand requests per minute
Topology Quiz: Example Use Case
What cache topology would be optimal for this application?
Caching user preferences for an in-house application
Several hundred concurrent users
Preferences updated a few times per day
Topology Quiz: Example Use Case
What cache topology would be optimal for this application?
Caching browsing history for an online retailer to support personalization and real-time assistance
Many updates, few reads
Topology Quiz: Example Use Case
What cache topology would be optimal for this application?
Logging user interactions to a database for internal auditing purposes
1000 updates per minute
Lesson 1: Use an MVC Architecture
Model/View/Controller (MVC, aka Model2)
Model: Domain-specific representation of the information the application displays and on which it operates
View: Renders the model into a form suitable for interaction, typically a user interface element or document
Controller: Responds to events, which are typically user actions or service requests, and invokes changes on the model
Lesson 1: Use an MVC Architecture
Clear delineation of responsibility; for example:
Cache is used in the Model
Cache Loader / Store is the DAO
Cache contains the application's POJOs / Value Objects
View pulls data from the Model
Goal is to ensure that all accesses are served from cache
Controller affects the Model
Modifications via Write-Through or Write-Behind
Lesson 2: Specify the Data Access
There are multiple ways to access data
ORM (JDO, EJB3, Hibernate, etc.)
Cache API (i.e. transparently via a Cache Loader)
JDBC (or other direct integration API)
Most applications have a best way
Large-scale set-oriented access usually indicates JDBC
Mix of set- and identity-oriented access indicates ORM
Identity-oriented access may indicate Cache API
Lesson 2: Specify the Data Access
Picking the wrong approach is disastrous
RDBMS (JDBC) is optimized for set-based queries and operations, including joins and aggregates, but crumbles with heavy row-level access (1+N access pattern, etc.)
ORMs can bog down badly on large set-based access
JCache API is built around identity-based access, not set-based access
Lesson 2: Specify the Data Access
Not always an obvious best choice
Some applications have a mix of intensive row-level and large set-level operations, which lend themselves poorly to any single approach
Even a well-architected and carefully-designed application will often have a few exceptions to the rule that require the specified approach to Data Access to be circumvented
It is sometimes necessary to use different approaches for different classes of data within the same application
Lesson 2: Specify the Data Access
Optimizations available for each
It is often possible to cache JDBC result sets
Most ORMs have effective support for pluggable caches
Some ORMs have optimizations for set-based operations that can translate some operations directly into optimized SQL that performs the entire operation within the RDBMS
Coherence provides extensive query support, including parallel query with indexes and cost-based optimizations
Lesson 3: Design a Domain Model
Domain Model includes two aspects of application modeling
Data Model: Describes the state that the application maintains, both in terms of persistent data (e.g. the system of record) and runtime data (sessions, queued events, requests, responses, etc.)
Behavioral Model: Describes the various actions that can affect the state of the application. Very similar to the concepts behind SOA, but at a much lower level.
Lesson 3: Design a Domain Model
Domain Model is not dissimilar from SOA
Data Model: Analogous to the information encapsulated and managed behind a set of services
Behavioral Model: Analogous to the set of services exposed by a broker
The concept of a Domain Model is technology-neutral. It can even exist only in the abstract.
Modeling is a tool, not a religion
Lesson 3: Design a Domain Model
Domain Model has value
Allows the Data Model to exist independently of the behavioral model, supporting the separation of a controller from the model in an MVC architecture
The behavioral model is the basis for the events that a controller is required to support
With an abstract data model that reflects application concerns instead of technology concerns, it is much more likely that the resulting data model implementation will be more easily used by the view and the controller
Lesson 3: Design a Domain Model
Domain Model is not new
OO developers have been using Domain Modeling for years
Application Developers that double as DBAs have used modeling to get their ideas down into a design that could provide both optimal application implementation and optimal database organization
SOA is the publishing of a behavior model that is intended to be publicly-accessible, with data models often directly reflected in the service request and response data
Lesson 4: Find the natural granularity
Every application has a natural granularity for its data.
Relational data models have a normalized granularity from which tables naturally emerge
Optimized JDBC-based applications have a statement execution granularity and a Result Set granularity
ORM-based applications and cache-intensive applications often have an OO granularity that mirrors the data model
Caches have an identity granularity of access
Lesson 4: Find the natural granularity
Caches will typically exist for each major class of application object
e.g. Accounts, Symbols, Positions, Orders, Executions, etc.
Each cache will tend to have a natural key
e.g. account id, symbol, account id + symbol, order id, etc.
Application objects tend to be complex
Contain owned objects, e.g. Purchase Order contains Lines
Lesson 5: Decouple using Identity
Store, Load, Provide and Manage the Identity of related model objects
Provide accessors for related model objects by using Identity de-reference (i.e. cache access)
Read-only models tend to have more leeway
Soft references
Transient reference fields
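Decoupling by identity can be sketched with two model classes: the order stores only the identity of its account and resolves it through the account cache on demand. The classes SketchAccount and SketchOrder are hypothetical, a plain map stands in for the account cache, and the transient reference field is the kind of shortcut the slide suggests only for read-only models.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model object that lives in its own cache, keyed by account id.
final class SketchAccount implements Serializable {
    private static final long serialVersionUID = 1L;
    final String accountId;
    final String ownerName;

    SketchAccount(String accountId, String ownerName) {
        this.accountId = accountId;
        this.ownerName = ownerName;
    }
}

// Hypothetical model object that refers to its account by identity only.
final class SketchOrder implements Serializable {
    private static final long serialVersionUID = 1L;
    final String orderId;
    final String accountId;                  // identity of the related object, stored with the order
    private transient SketchAccount account; // transient reference: never serialized with the order

    SketchOrder(String orderId, String accountId) {
        this.orderId = orderId;
        this.accountId = accountId;
    }

    // Accessor that de-references the identity through the cache (lazy loading).
    SketchAccount getAccount(Map<String, SketchAccount> accountCache) {
        if (account == null) {
            account = accountCache.get(accountId);
        }
        return account;
    }

    public static void main(String[] args) {
        Map<String, SketchAccount> accountCache = new HashMap<>();
        accountCache.put("A1", new SketchAccount("A1", "Pat"));
        SketchOrder order = new SketchOrder("O7", "A1");
        System.out.println("order owner: " + order.getAccount(accountCache).ownerName);
    }
}
```

Because only the account id travels with the order, serializing an order never drags the account (or the rest of the object graph) along with it.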
Lesson 5: Decouple using Identity
Simplifies management of large object graphs
Enables efficient lazy loading of object graphs
Works well with
Lesson 6: Use an Immutable Data Model
From the View, the Model should be treated as if it is read-only
From the point of view of the Controller, the model that is shared across threads should be treated as immutable, for example just in case the View is using it on a different thread
Lesson 6: Use an Immutable Data Model
Since most applications are not read-only, the Controller does have to modify the data represented by the Model
When the Controller needs to modify the Model, it can obtain mutable clones of the shared model, and manage them
Lesson 7: Use Cache Transactions
transactionally.
When the Controller obtains cached values within a transaction, the values are actually clones of the master cached values
The Controller makes its modifications to the Model in a transactionally isolated and consistent manner
Lesson 7: Use Cache Transactions
For maximum scalability, most transactions should be optimistic. Just as with any optimistic transaction approach, this implies that the application must handle and/or retry transactions whose optimistic checks fail
Cache Transactions can integrate with the container's Transaction Manager via the JEE Connector Architecture
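The optimistic approach and its retry obligation can be sketched with versioned entries. This is a generic illustration, not the Coherence transaction API: OptimisticCacheSketch and its methods are hypothetical, an immutable snapshot plays the role of the cloned value, and a version number serves as the optimistic check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical cache with version-checked (optimistic) commits.
final class OptimisticCacheSketch {
    // Immutable snapshot of a cached value together with its version.
    static final class Versioned {
        final int value;
        final long version;

        Versioned(int value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> master = new HashMap<>();

    synchronized void init(String key, int value) {
        master.put(key, new Versioned(value, 0));
    }

    // The caller works on a snapshot, isolated from later changes to the master.
    synchronized Versioned read(String key) {
        return master.get(key);
    }

    // Optimistic check: commit only if the entry is unchanged since it was read.
    synchronized boolean commit(String key, Versioned readCopy, int newValue) {
        Versioned current = master.get(key);
        if (current == null || current.version != readCopy.version) {
            return false;
        }
        master.put(key, new Versioned(newValue, current.version + 1));
        return true;
    }

    // Retry loop the application must supply; returns the number of attempts used.
    int update(String key, IntUnaryOperator change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Versioned snapshot = read(key);
            if (commit(key, snapshot, change.applyAsInt(snapshot.value))) {
                return attempt;
            }
        }
        throw new IllegalStateException("optimistic check kept failing for " + key);
    }

    public static void main(String[] args) {
        OptimisticCacheSketch cache = new OptimisticCacheSketch();
        cache.init("position", 100);
        cache.update("position", quantity -> quantity + 25, 3);
        System.out.println("position: " + cache.read("position").value);
    }
}
```

No lock is held while the caller computes the new value, which is what keeps the optimistic approach scalable; the cost is that a conflicting commit fails and must be retried.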
Lesson 8: Use Queries Wisely
Cache Queries are optimized, and they are run in parallel across a cluster, but they are probably at least an order of magnitude more expensive than identity-based operations
If you use queries, make sure to use indexes; the Coherence query optimizer can use multiple indexes on a single query, even if they don't perfectly cover the query
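Why an index matters for queries can be seen in a small single-JVM model: without an index every entry must be examined, while an index maps an attribute value straight to the matching keys. The class IndexedCacheSketch is hypothetical and says nothing about how the Coherence query optimizer works internally.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical cache of order id -> symbol with a reverse index on symbol.
final class IndexedCacheSketch {
    private final Map<String, String> symbolByOrderId = new HashMap<>();
    private final Map<String, Set<String>> orderIdsBySymbol = new HashMap<>();
    private int entriesScanned = 0;

    // Every update maintains the index as well as the cache entry.
    void put(String orderId, String symbol) {
        String previous = symbolByOrderId.put(orderId, symbol);
        if (previous != null) {
            orderIdsBySymbol.get(previous).remove(orderId);
        }
        orderIdsBySymbol.computeIfAbsent(symbol, s -> new TreeSet<>()).add(orderId);
    }

    // Identity-based access: a single lookup.
    String get(String orderId) {
        return symbolByOrderId.get(orderId);
    }

    // Query without an index: examines every entry in the cache.
    Set<String> queryByScan(String symbol) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> entry : symbolByOrderId.entrySet()) {
            entriesScanned++;
            if (entry.getValue().equals(symbol)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Query with an index: one lookup yields the matching keys.
    Set<String> queryByIndex(String symbol) {
        return new TreeSet<>(orderIdsBySymbol.getOrDefault(symbol, Collections.emptySet()));
    }

    int entriesScanned() {
        return entriesScanned;
    }

    public static void main(String[] args) {
        IndexedCacheSketch cache = new IndexedCacheSketch();
        cache.put("o1", "IBM");
        cache.put("o2", "ORCL");
        cache.put("o3", "IBM");
        System.out.println("IBM orders: " + cache.queryByIndex("IBM"));
    }
}
```

The index is not free: each put now does extra work, which is one reason identity-based access remains the cheapest operation.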
Lesson 9: Optimize Serialization
Objects that are stored in a cache may need to be serialized, and Java's default object serialization is relatively inefficient
Implementing the Externalizable interface may help slightly
Serialization using data streams instead of object streams can make a phenomenal impact
Lesson 9: Optimize Serialization
Coherence provides the ExternalizableLite interface to support serialization to data streams, and the Tangosol XmlBean framework which implements ExternalizableLite for derived value objects
Serialization performance improvements are up to an order of magnitude, and the reduction in size can be up to 80%.
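The size difference between object streams and data streams can be checked with the standard java.io classes alone. The sketch below does not use ExternalizableLite; the Position class and its writeTo/readFrom methods are hypothetical stand-ins for a hand-written data-stream format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSketch {
    // Hypothetical value object with both default and hand-written serialization.
    static final class Position implements Serializable {
        private static final long serialVersionUID = 1L;
        final String symbol;
        final long quantity;
        final double price;

        Position(String symbol, long quantity, double price) {
            this.symbol = symbol;
            this.quantity = quantity;
            this.price = price;
        }

        // Writes only the field values; no class metadata goes into the stream.
        void writeTo(DataOutput out) throws IOException {
            out.writeUTF(symbol);
            out.writeLong(quantity);
            out.writeDouble(price);
        }

        static Position readFrom(DataInput in) throws IOException {
            return new Position(in.readUTF(), in.readLong(), in.readDouble());
        }
    }

    // Java's default serialization: the stream also carries a class descriptor.
    static byte[] viaObjectStream(Position position) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(position);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] viaDataStream(Position position) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            position.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Position fromDataStream(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return Position.readFrom(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Position position = new Position("IBM", 500L, 81.25);
        System.out.println("object stream bytes: " + viaObjectStream(position).length);
        System.out.println("data stream bytes:   " + viaDataStream(position).length);
    }
}
```

The exact savings depend on the class; small objects benefit most because the class descriptor dominates their default serialized form.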
Lesson 9: Optimize Serialization
Since Java does not have an object-cloning interface, classes that do not have a public clone() method may require serialization and deserialization in order to be cloned
Since Cache Transactions may need to clone an object to create a copy within a local transaction, optimized serialization can even improve the performance of transactions
Lesson 10: Use Good Identities
An Identity implementation must provide correct hashCode() and equals() implementations, and cache the hash-code!
A good toString() implementation helps with debugging (and not just for Identities!)
If feasible, make your Identity classes immutable
Lesson 10: Use Good Identities
An Identity must be Serializable, and its serialized form should be stable: Two instances should serialize to the same binary value if and only if equals() returns true
Java's String, Integer, Long, etc. are perfect
Tangosol XmlBean is a good base class for (or example of) complex identities
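The identity guidelines above can be combined in one small composite key. PositionKey is a hypothetical class (account id plus symbol); it does not use XmlBean. The cached hash code is kept in a transient field so that it never becomes part of the serialized form.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical immutable composite identity: account id + symbol.
final class PositionKey implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String accountId;
    private final String symbol;
    // Cached hash code; transient so equal keys serialize to the same bytes
    // whether or not hashCode() has been called. Zero means "not yet computed".
    private transient int hash;

    PositionKey(String accountId, String symbol) {
        this.accountId = Objects.requireNonNull(accountId, "accountId");
        this.symbol = Objects.requireNonNull(symbol, "symbol");
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PositionKey)) {
            return false;
        }
        PositionKey that = (PositionKey) other;
        return accountId.equals(that.accountId) && symbol.equals(that.symbol);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // Computed from the same fields that equals() compares.
            h = 31 * accountId.hashCode() + symbol.hashCode();
            hash = h;
        }
        return h;
    }

    @Override
    public String toString() {
        return "PositionKey{accountId=" + accountId + ", symbol=" + symbol + "}";
    }

    public static void main(String[] args) {
        PositionKey key = new PositionKey("A1", "IBM");
        System.out.println(key + " equals copy: " + key.equals(new PositionKey("A1", "IBM")));
    }
}
```

Immutability is what makes caching the hash code safe: the fields it is derived from can never change after construction.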
Lesson 11: Cache in the Right Scope
HTTP Session objects can be used for caching user- or session-specific information; don't use them as a cache for global information
Conversely, don't use a global cache for user-specific caching when the HTTP Session would do just fine
Lesson 12: Never Assume it Works
We have seen caches in production that literally were not even getting used, and we have seen caches that had not even been configured or were badly mis-configured
While load testing, use JMX to monitor what caches exist, how big they are, what their hit rates are, and whether their average access times and access counts seem correct
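The figures worth watching (size, access count, hit rate) are cheap to collect. The sketch below shows only the counters; CacheStatsSketch is a hypothetical class, and the JMX registration that would expose these values to a monitoring console is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache wrapper that records the statistics a monitor would read.
final class CacheStatsSketch {
    private final Map<String, String> entries = new HashMap<>();
    private long hits = 0;
    private long misses = 0;

    void put(String key, String value) {
        entries.put(key, value);
    }

    // Every lookup is counted as either a hit or a miss.
    String get(String key) {
        String value = entries.get(key);
        if (value != null) {
            hits++;
        } else {
            misses++;
        }
        return value;
    }

    int size() {
        return entries.size();
    }

    long accessCount() {
        return hits + misses;
    }

    // Fraction of lookups served from the cache; 0.0 if it was never accessed.
    double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        CacheStatsSketch cache = new CacheStatsSketch();
        cache.put("a", "1");
        cache.get("a");
        cache.get("b");
        System.out.println("accesses=" + cache.accessCount() + " hitRate=" + cache.hitRate());
    }
}
```

An access count of zero under load is the signature of a cache that is configured but never used, which is exactly the situation the lesson warns about.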
Audience Response
Question?