IBM Extreme Transaction Processing (XTP) Patterns: Leveraging WebSphere eXtreme Scale as an in-line database buffer


Lan Vuong
WebSphere Technical Evangelist, XTP
IBM RTP
April 2009

© Copyright International Business Machines Corporation 2009. All rights reserved.

Keywords: write-behind cache, JPA loader, WebSphere eXtreme Scale, caching, cache, XTP,ObjectGrid, Performance, Extreme Transaction Processing, OpenJPA

In this paper, we will illustrate how to optimize the performance of an application by leveraging WebSphere eXtreme Scale as the intermediary between the database and the application. This article will provide an overview of the theory and implementation of the write-behind caching solution and JPA loader concepts. We will then review an example business case coupled with sample code to demonstrate how to deploy these features.


Contents

Introduction
Defining and Configuring Key Concepts
    What is a "write-behind" cache?
    Configuring the write-behind function
    What is a JPA loader?
    JPA Loader Configuration
Exploring an Example Business Case
    Use Case: Portal Personalization
        1. Populating the database
        2. Warming the cache
        3. Generating load on the grid
        4. Results
Conclusion
Resources
Acknowledgements

Introduction

Applications typically use a data cache to increase performance, especially when the application predominantly executes read-only transactions. Such applications update the database directly for changes in the data. The issue is that as load increases, the response time of these updates grows: databases are not good at executing many concurrent transactions with a small number of records per transaction, but they are much better at executing batched transactions. Eventually the database saturates its CPU or disks, and from that point response time rises as additional load is added. Conventional in-memory caches are also limited to storing only what fits in the free memory of a single JVM. Once we need to cache more data than this, thrashing occurs: the cache continuously evicts data to make room for other data, the required records must be read from the database again and again, and the cache becomes useless, exposing the database to the full read load.

This paper shows how WebSphere eXtreme Scale allows all of the free memory of a cluster of JVMs, rather than just the free memory of a single JVM, to be used as a cache. This allows the capacity of the cache to scale linearly as more JVMs are incorporated. If those JVMs run on additional physical servers with their own CPU, memory and network, then read requests can be serviced with linearly scalable throughput at constant response time. The same is true for update requests thanks to its write-behind technology. The linear scalability of WebSphere eXtreme Scale makes it ideal for extreme transaction processing (XTP) scenarios. XTP is defined by Gartner as "an application style aimed at supporting the design, development, deployment, management and maintenance of distributed TP applications characterized by exceptionally demanding performance, scalability, availability, security, manageability and dependability requirements." (1)

In this paper, we will illustrate how to optimize the performance of an application by leveraging WebSphere eXtreme Scale as the intermediary between the database and the application. WebSphere eXtreme Scale is a highly available, distributed in-memory cache with many advanced features to boost application performance. The write-behind function batches updates to the back-end database asynchronously within a user-configurable interval of time. The obvious advantage of this approach is fewer database calls, and therefore reduced transaction load and faster access to objects in the grid. It also yields faster response times than the write-through caching scenario, where an update to the cache results in an immediate update to the database; in the write-behind case, transactions no longer have to wait for the database write operation to finish. Additionally, it protects the application from database failure, because the write-behind buffer holds changes, protected by memory replication, until it can propagate them to the database.

With this in-line database buffer, a loader is needed to synchronize data between the grid and the back-end database. Any user-written loader will work with write-behind, but in this paper we will use the built-in JPA loader to discuss this capability. The Java Persistence API (JPA) is a specification that allows mapping between Java objects and relational databases. WebSphere eXtreme Scale 6.1.0.3 onwards includes a built-in JPA loader which uses this specification to automatically map the cache data to the relational data in the database. Customers can use a JPA-compliant object-relational mapper such as OpenJPA or Hibernate with this loader.

This article will provide an overview of the theory and implementation of the write-behind caching solution and JPA loader concepts. We will then review an example business case coupled with sample code to demonstrate how to deploy these features.

Defining and Configuring Key Concepts

What is a "write-behind" cache?

In a write-behind cache, data reads and updates are all serviced by the cache, but unlike a write-through cache, updates are not immediately propagated to the data store. Instead, updates occur in the cache, which tracks the list of dirty records and periodically flushes the current set of dirty records to the data store. As an additional performance improvement, the cache performs conflation on these dirty records. Conflation means that if the same record is updated, or dirtied, multiple times within the buffering period, only the last update is kept. This can significantly improve performance in scenarios where values change very frequently, such as stock prices in financial markets. If a stock price changes 100 times a second, then with a 30-second buffering period that would normally mean 30 x 100 = 3,000 updates to the loader every 30 seconds; conflation reduces that to one update. The list of dirty updates is replicated to ensure it survives if a JVM exits. The customer can specify the level of replication, with essentially two choices: synchronous and asynchronous. Synchronous replication means no data loss when a JVM exits, but it is slower because the primary must wait for the replicas to acknowledge that they have received the change. Asynchronous replication is much faster (typically at least 6x), but changes from the very latest transactions may be lost if a JVM exits before they are replicated.
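To make conflation concrete, here is a minimal, illustrative sketch in plain Java (it is not WebSphere eXtreme Scale code and the class name is invented): keeping the pending updates keyed by record key means repeated updates to the same key within a buffering interval collapse into one pending write, and a periodic drain hands the survivors to the loader as a single batch.

import java.util.LinkedHashMap;
import java.util.Map;

public class ConflatingBuffer<K, V> {

    // latest pending value per key; a later update for the same key replaces the earlier one
    private final Map<K, V> dirty = new LinkedHashMap<K, V>();

    // called on every update that commits to the cache
    public synchronized void recordUpdate(K key, V latestValue) {
        dirty.put(key, latestValue); // conflation: only the last value per key survives
    }

    // called when the buffering interval elapses (or a count threshold is reached);
    // the returned batch is what a loader would push to the database in one transaction
    public synchronized Map<K, V> drain() {
        Map<K, V> batch = new LinkedHashMap<K, V>(dirty);
        dirty.clear();
        return batch;
    }
}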

The dirty record list is written to the data source using a large batch transaction. If the data source is not available, the grid continues processing requests and will try again later. The grid can offer constant response times as it scales, because changes are committed to the grid alone and transactions can commit even if the database is down. If the grid JVMs fail while dirty records are buffered, the grid automatically fails over and retries the flush on the backup server. It also creates additional replicas after a failure to reduce the risk of losing the list of dirty records in a second failure. The main issue with this approach is that the database is not always up to date. This may not be a problem, as this style of grid can be used to preprocess large amounts of data at grid speed before writing it to the data source for later processing or reporting.

A write-behind cache may not be suitable for all situations. The nature of write-behind means that, for a time, changes which the user sees as committed are not yet reflected in the database. This delay is called "cache write latency" or "database staleness"; the delay between database changes and the cache being updated (or invalidated) to reflect them is called "cache read latency" or "cache staleness". If all parts of the system access the data through the cache (for example, through a common interface), then write-behind is acceptable, because the cache always has the correct latest record. A system using write-behind is expected to make all changes through the cache and through no other path.
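To illustrate what "all access through the cache" looks like in practice, here is a hedged sketch of a client that reads and updates a profile solely through the grid's Session and ObjectMap APIs, assuming the UserGrid configuration shown in the listings later in this paper. The catalog endpoint, the key value and type, and the setLastName accessor are assumptions made for the example and are not part of the sample download.

import com.ibm.websphere.objectgrid.ClientClusterContext;
import com.ibm.websphere.objectgrid.ObjectGrid;
import com.ibm.websphere.objectgrid.ObjectGridManager;
import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
import com.ibm.websphere.objectgrid.ObjectMap;
import com.ibm.websphere.objectgrid.Session;

import com.ibm.personalization.model.User;

public class ProfileClient {
    public static void main(String[] args) throws Exception {
        ObjectGridManager ogm = ObjectGridManagerFactory.getObjectGridManager();

        // connect to the catalog service; the endpoint is an assumption for this sketch
        ClientClusterContext ccc = ogm.connect("cataloghost:2809", null, null);
        ObjectGrid grid = ogm.getObjectGrid(ccc, "UserGrid");

        Session session = grid.getSession();
        ObjectMap users = session.getMap("Map");

        // read and update entirely through the grid; the write-behind loader
        // propagates the change to the database later
        session.begin();
        Integer key = Integer.valueOf(42);          // key type is an assumption
        User user = (User) users.getForUpdate(key);
        user.setLastName("Smith");                  // hypothetical setter on the entity
        users.update(key, user);
        session.commit();
    }
}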

Either a sparse cache or a complete cache can be used with the write-behind feature. A sparse cache stores only a subset of the data and can be populated lazily. Sparse caches are normally accessed by key, since not all of the data is available in the cache and thus queries cannot be run against the cache alone. A complete cache contains all the data but can take a long time to load initially. A third method is a compromise between these two options: it preloads the cache with a subset of the data in a short amount of time and then lazily loads the rest. The subset that is preloaded is roughly 20% of the total number of records, but it fulfills 80% of the requests.

An application using WebSphere eXtreme Scale in this manner is typically limited to scenarios that access partitionable data models using simple CRUD (Create, Read, Update, and Delete) patterns.

Configuring the write-behind function

The write-behind function is enabled in the objectgrid.xml configuration by adding the writeBehind attribute to the backingMap element, as shown below. The value of the attribute uses the syntax "[T(time)][;][C(count)]", which specifies when the database updates occur. The updates are written to the persistent store when either the specified time in seconds has passed or the number of changes in the queue map has reached the count value. For example, writeBehind="T180;C1000" flushes the queued changes every 180 seconds, or sooner if 1000 changes accumulate first.


Listing 1. An example of write-behind configuration

<objectGrid name="UserGrid">
    <backingMap name="Map" pluginCollectionRef="User"
        lockStrategy="PESSIMISTIC" writeBehind="T180;C1000"/>
</objectGrid>

What is a JPA loader?

Loaders are needed to read and write data from the database when using WebSphere eXtreme Scale as an in-memory cache. Starting with WebSphere eXtreme Scale 6.1.0.3, there are two built-in loaders that interact with JPA providers to map relational data to the ObjectGrid maps: the JPALoader and the JPAEntityLoader. The JPALoader is used for caches that store POJOs, and the JPAEntityLoader is used for caches that store ObjectGrid entities.

JPA Loader Configuration

To configure a JPA loader, changes must be made to the objectgrid.xml, and a persistence.xml file must be added to the META-INF directory.

A transaction callback must be defined to receive transaction commit and rollback events and forward them to the JPA layer. Configure the transaction callback by adding a JPATxCallback bean to the objectGrid definition. The persistenceUnitName property identifies the persistence unit in persistence.xml that holds the JPA entity metadata. Configure a loader by adding a JPALoader or JPAEntityLoader bean; the entityClassName property is required for the JPA loaders.

Here is a sample objectgrid.xml:

Listing 2. Sample objectgrid.xml

<?xml version="1.0" encoding="UTF-8"?>
<objectGridConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/config ../objectGrid.xsd"
    xmlns="http://ibm.com/ws/objectgrid/config">
    <objectGrids>
        <objectGrid name="UserGrid" txTimeout="30">
            <bean id="TransactionCallback"
                className="com.ibm.websphere.objectgrid.jpa.JPATxCallback">
                <property name="persistenceUnitName" type="java.lang.String"
                    value="userPUDB2"/>
            </bean>
            <backingMap name="Map" pluginCollectionRef="User"
                lockStrategy="PESSIMISTIC" writeBehind="T180;C1000"/>
        </objectGrid>
    </objectGrids>
    <backingMapPluginCollections>
        <backingMapPluginCollection id="User">
            <bean id="Loader"
                className="com.ibm.websphere.objectgrid.jpa.JPALoader">
                <property name="entityClassName" type="java.lang.String"
                    value="com.ibm.personalization.model.User"/>
            </bean>
        </backingMapPluginCollection>
    </backingMapPluginCollections>
</objectGridConfig>

Figure 1. Write-behind cache and JPA loader topology (diagram: an eXtreme Scale server hosting the ObjectGrid with its write-behind buffer and JPA loader, which uses a JPA provider to read from and write to the database)

The persistence unit used by the loader is configured via the persistence.xml file, which should be stored in your application's META-INF folder. This configuration file specifies a particular JPA provider for the persistence unit along with provider-specific properties.

Here is a sample persistence.xml using the OpenJPA provider:

Listing 3. Sample persistence.xml

<?xml version="1.0"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0">
    <persistence-unit name="userPUDB2">
        <provider>org.apache.openjpa.persistence.PersistenceProviderImpl</provider>
        <class>com.ibm.personalization.model.User</class>
        <class>com.ibm.personalization.model.UserAccount</class>
        <class>com.ibm.personalization.model.UserTransaction</class>
        <properties>
            <property name="openjpa.ConnectionURL"
                value="jdbc:db2://myserver:50001/userdb"/>
            <property name="openjpa.ConnectionDriverName"
                value="com.ibm.db2.jcc.DB2Driver"/>
            <property name="openjpa.ConnectionUserName" value="db2inst1"/>
            <property name="openjpa.ConnectionPassword" value="mypassword"/>
            <property name="openjpa.jdbc.DBDictionary" value="db2"/>
            <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema"/>
            <property name="openjpa.Log"
                value="DefaultLevel=WARN, MetaData=INFO, Runtime=INFO, Tool=INFO, JDBC=INFO, SQL=WARN, Enhance=INFO"/>
        </properties>
    </persistence-unit>
</persistence>

Exploring an Example Business Case

An online banking website with a growing number of users is experiencing slow response times and scalability issues in its environment. The bank needs a way to support its clients with the existing hardware. Next we will walk through this use case to see how the write-behind feature can help resolve the issue.

Use Case: Portal Personalization

Instead of pulling user profile information directly from the database, the bank will preload the cache with the profiles from the database. This means the cache, rather than the database, can service the read requests. We have customers loading well over 100 GB of records into the cache for this kind of scenario. Profile updates were also written directly to the database in the old system; this limited the number of concurrent updates per second achievable with an acceptable response time, because the database machine would saturate. The new system writes profile changes to the grid, and these changes are then pushed to the database using the write-behind technology. This allows the grid to service the updates with the usual grid quality of service and performance, and it completely decouples the single-instance database from the read and write profile operations. The customer can now scale up the profile service simply by adding more JVMs or servers to the grid and will see linear scaling of throughput with constant response time. The database is no longer a bottleneck, as vastly fewer transactions are sent to the back-end. The quicker responses lead to faster page loads and a better user experience, as well as cost-effective scaling of the profile service and better availability, since the database is no longer a single point of failure. The grid recovers from typical failures in under a second, and a failure impacts only the subset of the data on that server; the rest of the data remains available.

In this case, we are using a DB2 database with the OpenJPA provider. The data model for this scenario is a User which contains one-to-many UserAccounts and UserTransactions. The following code snippet from the User class shows this relationship:

Listing 4: Use case entity relationships

@OneToMany(mappedBy = "user", fetch = FetchType.EAGER, cascade = { CascadeType.ALL })
private Set<UserAccount> accounts = new HashSet<UserAccount>();

@OneToMany(mappedBy = "user", fetch = FetchType.EAGER, cascade = { CascadeType.ALL })
private Set<UserTransaction> transactions = new HashSet<UserTransaction>();
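For context, the following is a hedged sketch of how the surrounding User entity might be declared. Only the two collections above appear verbatim in the sample; the key field, the profile attributes, and the accessor names are assumptions for illustration.

import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.OneToMany;

@Entity
public class User implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    private int userId;          // assumed key field

    private String firstName;    // assumed profile attributes
    private String lastName;

    @OneToMany(mappedBy = "user", fetch = FetchType.EAGER, cascade = { CascadeType.ALL })
    private Set<UserAccount> accounts = new HashSet<UserAccount>();

    @OneToMany(mappedBy = "user", fetch = FetchType.EAGER, cascade = { CascadeType.ALL })
    private Set<UserTransaction> transactions = new HashSet<UserTransaction>();

    public Set<UserAccount> getAccounts() { return accounts; }
    public Set<UserTransaction> getTransactions() { return transactions; }
    // remaining getters and setters omitted
}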

1. Populating the database

The sample code includes the class PopulateDB, which loads some user data into the database. The DB2 database connection information is defined in the persistence.xml shown earlier. The persistence unit name listed in the persistence.xml is used to create the JPA EntityManagerFactory. User objects are created and then persisted to the database in batches.

Here is a paraphrased code snippet to show the flow:

Listing 5: Database population example

javax.persistence.EntityManagerFactory emf = null;
synchronized (PopulateDB.class) {
    emf = Persistence.createEntityManagerFactory(puName);
}
javax.persistence.EntityManager em = emf.createEntityManager();
for (int i = start; i < end; i++) {
    if ((i - start) % BATCH_SIZE == 0) {
        em.getTransaction().begin();
    }
    User user = createUser(i, totalUsers);
    em.persist(user);
    if (((i - start) + 1) % BATCH_SIZE == 0) {
        em.getTransaction().commit();
        em.clear();
    }
}
if (em.getTransaction().isActive()) {
    em.getTransaction().commit();
    em.clear();
}
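The createUser helper called above is part of the sample download and is not reproduced in this paper. As a rough illustration, a hypothetical version that would sit in PopulateDB might look like the following; the setters and the way children are attached are assumptions, chosen so that the CascadeType.ALL mappings persist the accounts and transactions along with the user.

private static User createUser(int id, int totalUsers) {
    // totalUsers could drive derived attributes; it is unused in this sketch
    User user = new User();
    user.setUserId(id);                    // assumed setters on the entity classes

    UserAccount account = new UserAccount();
    account.setUser(user);                 // owning side of the one-to-many
    user.getAccounts().add(account);

    UserTransaction txn = new UserTransaction();
    txn.setUser(user);
    user.getTransactions().add(txn);

    return user;
}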

Here’s a sample command to populate the database:

$JAVA_HOME/bin/java -Xms1024M -Xmx1024M -verbose:gc -classpath $TEST_CLASSPATH ogdriver.PopulateDB 1000000 5

2. Warming the cache

After the database is loaded, the cache is preloaded using data grid agents. The records are written to the cache in batches so there are fewer trips between the client and the server. Multiple clients should also be used to speed up the warm-up time. The cache can be warmed up with a "hot" set of data, a subset of all the records, and the remaining data will be loaded lazily. Preloading the cache increases the chances of a cache hit and reduces the need to retrieve data from back-end tiers. For this example, data matching the database records was inserted directly into the cache rather than loaded from the database, to expedite the execution time.

Here is a code snippet showing the batched inserts into the grid:

Listing 6: Cache preloading example

public void putAll(Map<K,V> batch, BackingMap bmap) throws Exception {
    Map<Integer, Map<K,V>> pmap = convertToPartitionEntryMap(bmap, batch);
    Iterator<Map<K,V>> items = pmap.values().iterator();
    ArrayList<Future<Boolean>> results = new ArrayList<Future<Boolean>>();
    while (items.hasNext()) {
        Map<K,V> perPartitionEntries = items.next();
        // we need one key for partition routing, so get the first one
        K key = perPartitionEntries.keySet().iterator().next();
        // invoke the agent to add the batch of records to the grid
        InsertAgent<K,V> ia = new InsertAgent<K,V>();
        ia.batch = perPartitionEntries;
        Future<Boolean> fv = threadPool.submit(new InserterThread(bmap.getName(), key, ia));
        results.add(fv);
    }
    Iterator<Future<Boolean>> iter = results.iterator();
    while (iter.hasNext()) {
        Future<Boolean> fv = iter.next();
        Boolean r = fv.get();
        if (r.booleanValue() == false) {
            throw new RuntimeException("Put failed");
        }
    }
}

Here’s a sample command to preload the cache:

$JAVA_HOME/bin/java -Xms1024M -Xmx1024M -verbose:gc -classpath $TEST_CLASSPATH ogdriver.ClientDriver -load -m Map -n 1000000 -g UserGrid -nt 5 -r 1000 -t 200000 -c $CATALOG_1
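The InsertAgent and InserterThread used in Listing 6 also ship with the sample download rather than appearing in this paper. As a sketch of the general idea, an agent of this kind could be written against the DataGrid ReduceGridAgent interface, as shown below; the field name, the use of insert, and the error handling are assumptions, not the sample's actual implementation. Each agent instance carries a batch of entries that all route to one partition and inserts them on the partition primary, so the client makes a single remote call per partition.

import java.io.Serializable;
import java.util.Collection;
import java.util.Map;

import com.ibm.websphere.objectgrid.ObjectGridException;
import com.ibm.websphere.objectgrid.ObjectMap;
import com.ibm.websphere.objectgrid.Session;
import com.ibm.websphere.objectgrid.datagrid.ReduceGridAgent;

public class InsertAgent<K, V> implements ReduceGridAgent, Serializable {

    private static final long serialVersionUID = 1L;

    // the entries to insert; all keys in this map route to the same partition
    Map<K, V> batch;

    public Object reduce(Session session, ObjectMap map) {
        return reduce(session, map, null);
    }

    public Object reduce(Session session, ObjectMap map, Collection keys) {
        try {
            for (Map.Entry<K, V> entry : batch.entrySet()) {
                // insert locally on the partition primary; no per-record client round trip
                map.insert(entry.getKey(), entry.getValue());
            }
            return Boolean.TRUE;
        } catch (ObjectGridException e) {
            return Boolean.FALSE;
        }
    }

    public Object reduceResults(Collection results) {
        // one result per partition invocation; report failure if any batch failed
        return results.contains(Boolean.FALSE) ? Boolean.FALSE : Boolean.TRUE;
    }
}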

3. Generating load on the grid

The sample code includes a client driver that mimics operations on the grid to demonstrate how the write-behind caching function increases performance. The client has several options to tweak the load behavior. The following command will generate load across 500K records in the "UserGrid" grid using 10 threads at a rate of 200 requests per thread.

$JAVA_HOME/bin/java -Xms1024M -Xmx1024M -verbose:gc -classpath $TEST_CLASSPATH ogdriver.ClientDriver -m Map -n 500000 -g UserGrid -nt 10 -r 200 -c $CATALOG_1

All of the available options are documented in the Options class.

4. Results

Use of the write-behind feature can lead to a significant improvement in performance. We ran the sample code using write-through and write-behind to compare response time and database CPU utilization. Data matching the records in the database was inserted into the cache to avoid the warm-up time and produce a consistent read response time, so that the write response times could be compared directly. The charts below show read and write response times for both cases. The write-through scenario results in higher response times for updates, whereas in the write-behind scenario update times are nearly the same as the reads. More JVMs can be added to increase the capacity of the cache without changing the response times, as the database is no longer a bottleneck.


Listing 7: Chart of response times for write-through cache scenario

(Chart: "Write-through Response Times"; response time in ms on the y-axis, 0 to 700, against time on the x-axis, with separate series for reads and writes.)

Listing 8: Chart of response times for write-behind cache scenario

(Chart: "Write-behind Response Times"; response time in ms on the y-axis, 0 to 30, against time on the x-axis, with separate series for reads and writes.)

The database CPU utilization charts illustrate the improvement in back-end load when using write-behind. Rather than placing a constant load on the back-end as the write-through scenario does, the write-behind case results in low CPU usage, with load appearing on the back-end only when the buffer interval is reached. The write-behind configuration should be tuned to best match your environment with regard to the ratio of write transactions, the frequency with which the same record is updated, and the database update latency. For example, the more often the same hot records are re-updated within a buffering interval, the more conflation reduces the number of statements in each flush.

Listing 9: Chart of database CPU utilization for write-through cache scenario

(Chart: "Write-through Database CPU Utilization"; database server CPU on the y-axis, 0 to 20, against time on the x-axis.)


Listing 10: Chart of database CPU utilization for write-behind cache scenario

(Chart: "Write-behind Database CPU Utilization"; database server CPU on the y-axis, 0 to 20, against time on the x-axis.)

Conclusion

This article reviewed the write-behind caching scenario, the JPA loader, and batched agent preloading, and showed how these WebSphere eXtreme Scale functions can be deployed together to provide an extreme transaction processing solution. The write-behind caching function reduces back-end load, decreases transaction response time, and isolates the application from back-end failure. These benefits and its simple configuration make write-behind caching a truly powerful feature.

Resources

1. Gartner Group - http://www.gartner.com/

2. User’s Guide to WebSphere eXtreme Scale - http://www.redbooks.ibm.com/abstracts/sg247683.html

3. WebSphere eXtreme Scale Wiki - http://www.ibm.com/developerworks/wikis/display/objectgrid/Getting+started

Acknowledgements

Thanks to Billy Newport, Jian Tang, Thuc Nguyen, Tom Alcott, and Art Jolin for their help with this article.


Getting Started with sample code

Contents of download file:
o OGClientDriver.zip

Required Libraries:
o WebSphere eXtreme Scale trial download
o OpenJPA
o args4j JAR
o DB2

Trademarks

IBM, the IBM logo, WebSphere and DB2 are trademarks or registered trademarks of the International Business Machines Corporation in the United States, other countries, or both.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Disclaimers

© IBM Corporation 2009. All Rights Reserved.

The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM's current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

References in this publication to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth, savings or other results.

All performance data contained in this publication was obtained in the specific operating environment and under the conditions described above and is presented as an illustration only. Performance obtained in other operating environments may vary, and customers should conduct their own testing.