developing polyglot persistence applications #javaone 2012

Post on 10-May-2015

3.352 Views

Category:

Technology

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

NoSQL databases such as Redis, MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. However, using a NoSQL database means giving up the benefits of the relational model such as SQL, constraints and ACID transactions. For some applications, the solution is polyglot persistence: using SQL and NoSQL databases together. In this talk, you will learn about the benefits and drawbacks of polyglot persistence and how to design applications that use this approach. We will explore the architecture and implementation of an example application that uses MySQL as the system of record and Redis as a very high-performance database that handles queries from the front-end. You will learn about mechanisms for maintaining consistency across the various databases.

TRANSCRIPT

DEVELOPING POLYGLOT PERSISTENCE APPLICATIONS

Chris Richardson

Author of POJOs in ActionFounder of the original CloudFoundry.com

@crichardsoncrichardson@vmware.comhttp://plainoldobjects.com/

Presentation goal

The benefits and drawbacks of polyglot persistence

and

How to design applications that use this approach

About Chris

(About Chris)

About Chris()

About Chris

About Chris

http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/

vmc push About-Chris

Developer Advocate for CloudFoundry.com

Signup at http://cloudfoundry.com promo code: cfjavaone

Agenda

• Why polyglot persistence?

• Using Redis as a cache

• Optimizing queries using Redis materialized views

• Synchronizing MySQL and Redis

• Tracking changes to entities

• Using a modular asynchronous architecture

Food to Go

• Take-out food delivery service

• “Launched” in 2006

Food To Go Architecture

Order taking

Restaurant Management

MySQL Database

CONSUMER RESTAURANT OWNER

Success ⇒ Growth challenges

• Increasing traffic

• Increasing data volume

• Distribute across a few data centers

• Increasing domain model complexity

Limitations of relational databases

• Scalability

• Distribution

• Schema updates

• O/R impedance mismatch

• Handling semi-structured data

Solution: Spend Money

http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG

OR

http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/#

Solution: Use NoSQL

Benefits

• Higher performance

• Higher scalability

• Richer data-model

• Schema-less

Drawbacks

• Limited transactions

• Limited querying

• Relaxed consistency

• Unconstrained data

Example NoSQL Databases

Database Key features

Cassandra Extensible column store, very scalable, distributed

Neo4j Graph database

MongoDB Document-oriented, fast, scalable

Redis Key-value store, very fast

http://nosql-database.org/ lists 122+ NoSQL databases

• Advanced key-value store

• Very fast, e.g. 100K reqs/sec

• Optional persistence

• Transactions with optimistic locking

• Master-slave replication

• Sharding using client-side consistent hashing

RedisK1 V1

K2 V2

... ...

Sorted sets

myset a

5.0

b

10.

Key Value

ScoreMembers are sorted by score

Adding members to a sorted setRedis Server

zadd myset 5.0 a myset a

5.0

Key Score Value

Adding members to a sorted setRedis Server

zadd myset 10.0 b myset a

5.0

b

10.

Adding members to a sorted setRedis Server

zadd myset 1.0 c myset a

5.0

b

10.

c

1.0

Retrieving members by index range

Redis Server

zrange myset 0 1

myset a

5.0

b

10.

c

1.0ac

Key StartIndex

EndIndex

Retrieving members by score

Redis Server

zrangebyscore myset 1 6

myset a

5.0

b

10.

c

1.0ac

Key Min value

Max value

Redis use cases• Replacement for Memcached

• Session state

• Cache of data retrieved from system of record (SOR)

• Replica of SOR for queries needing high-performance

• Handling tasks that overload an RDBMS

• Hit counts - INCR

• Most recent N items - LPUSH and LTRIM

• Randomly selecting an item – SRANDMEMBER

• Queuing – Lists with LPOP, RPUSH, ….

• High score tables – Sorted sets and ZINCRBY

• …

Redis is great but there are tradeoffs

• Low-level query language: PK-based access only

• Limited transaction model:

• Read first and then execute updates as batch

• Difficult to compose code

• Data must fit in memory

• Single-threaded server : run multiple with client-side sharding

• Missing features such as access control, ...

And don’t forget:

An RDBMS is fine for many applications

The future is polyglot

IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg

e.g. Netflix• RDBMS• SimpleDB• Cassandra• Hadoop/Hbase

Agenda

• Why polyglot persistence?

• Using Redis as a cache

• Optimizing queries using Redis materialized views

• Synchronizing MySQL and Redis

• Tracking changes to entities

• Using a modular asynchronous architecture

Increase scalability by caching

Order taking

Restaurant Management

MySQL Database

CONSUMER RESTAURANT OWNER

Cache

Caching Options

• Where:

• Hibernate 2nd level cache

• Explicit calls from application code

• Caching aspect

• Cache technologies: Ehcache, Memcached, Infinispan, ...

Redis is also an option

Using Redis as a cache

• Spring 3.1 cache abstraction

• Annotations specify which methods to cache

• CacheManager - pluggable back-end cache

• Spring Data for Redis

• Simplifies the development of Redis applications

• Provides RedisTemplate (analogous to JdbcTemplate)

• Provides RedisCacheManager

Using Spring 3.1 Caching@Servicepublic class RestaurantManagementServiceImpl implements RestaurantManagementService {

private final RestaurantRepository restaurantRepository;

@Autowired public RestaurantManagementServiceImpl(RestaurantRepository restaurantRepository) { this.restaurantRepository = restaurantRepository; } @Override public void add(Restaurant restaurant) { restaurantRepository.add(restaurant); }

@Override @Cacheable(value = "Restaurant") public Restaurant findById(int id) { return restaurantRepository.findRestaurant(id); }

@Override @CacheEvict(value = "Restaurant", key="#restaurant.id") public void update(Restaurant restaurant) { restaurantRepository.update(restaurant); }

Cache result

Evict from cache

Configuring the Redis Cache Manager

<cache:annotation-driven />

<bean id="cacheManager" class="org.springframework.data.redis.cache.RedisCacheManager" > <constructor-arg ref="restaurantTemplate"/> </bean>

Enables caching

Specifies CacheManager implementation

The RedisTemplate used to access Redis

TimeRange MenuItem

Domain object to key-value mapping?

Restaurant

TimeRange MenuItem

ServiceArea

K1 V1

K2 V2

... ...

RedisTemplate

• Analogous to JdbcTemplate

• Encapsulates boilerplate code, e.g. connection management

• Maps Java objects ⇔ Redis byte[]’s

Serializers: object ⇔ byte[]

• RedisTemplate has multiple serializers

• DefaultSerializer - defaults to JdkSerializationRedisSerializer

• KeySerializer

• ValueSerializer

• HashKeySerializer

• HashValueSerializer

Serializing a Restaurant as JSON@Configurationpublic class RestaurantManagementRedisConfiguration {

@Autowired private RestaurantObjectMapperFactory restaurantObjectMapperFactory; private JacksonJsonRedisSerializer<Restaurant> makeRestaurantJsonSerializer() { JacksonJsonRedisSerializer<Restaurant> serializer =

new JacksonJsonRedisSerializer<Restaurant>(Restaurant.class); ... return serializer; }

@Bean @Qualifier("Restaurant") public RedisTemplate<String, Restaurant> restaurantTemplate(RedisConnectionFactory factory) { RedisTemplate<String, Restaurant> template = new RedisTemplate<String, Restaurant>(); template.setConnectionFactory(factory); JacksonJsonRedisSerializer<Restaurant> jsonSerializer = makeRestaurantJsonSerializer(); template.setValueSerializer(jsonSerializer); return template; }

}

Serialize restaurants using Jackson JSON

Caching with Redis

Order taking

Restaurant Management

MySQL Database

CONSUMER RESTAURANT OWNER

RedisCacheFirst Second

Agenda

• Why polyglot persistence?

• Using Redis as a cache

• Optimizing queries using Redis materialized views

• Synchronizing MySQL and Redis

• Tracking changes to entities

• Using a modular asynchronous architecture

Finding available restaurants

Available restaurants =Serve the zip code of the delivery address

AND Are open at the delivery time

public interface AvailableRestaurantRepository {

List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime); ...}

Food to Go – Domain model (partial)

class Restaurant { long id; String name; Set<String> serviceArea; Set<TimeRange> openingHours; List<MenuItem> menuItems;}

class MenuItem { String name; double price;}

class TimeRange { long id; int dayOfWeek; int openTime; int closeTime;}

Database schemaID Name …

1 Ajanta

2 Montclair Eggshop

Restaurant_id zipcode

1 94707

1 94619

2 94611

2 94619

Restaurant_id dayOfWeek openTime closeTime

1 Monday 1130 1430

1 Monday 1730 2130

2 Tuesday 1130 …

RESTAURANT table

RESTAURANT_ZIPCODE table

RESTAURANT_TIME_RANGE table

Finding available restaurants on Monday, 6.15pm for 94619 zipcode

Straightforward three-way join

select r.*from restaurant r inner join restaurant_time_range tr on r.id =tr.restaurant_id inner join restaurant_zipcode sa on r.id = sa.restaurant_id where ’94619’ = sa.zip_code and tr.day_of_week=’monday’ and tr.openingtime <= 1815 and 1815 <= tr.closingtime

How to scale queries?

Option #1: Query caching

• [ZipCode, DeliveryTime] ⇨ list of available restaurants

BUT

• Long tail queries

• Update restaurant ⇨ Flush entire cache

Ineffective

Option #2: Master/Slave replication

MySQL Master

MySQL Slave 1

MySQL Slave 2

MySQL Slave N

Writes Consistent reads

Queries(Inconsistent reads)

Master/Slave replication

• Mostly straightforward

BUT

• Assumes that SQL query is efficient

• Complexity of administration of slaves

• Doesn’t scale writes

Option #3: Redis materialized views

Order taking

Restaurant Management

MySQL Database

CONSUMER RESTAURANT OWNER

CacheRedis

Systemof RecordCopy

findAvailable()update()

BUT how to implement findAvailableRestaurants() with Redis?!

?select r.*from restaurant r inner join restaurant_time_range tr on r.id =tr.restaurant_id inner join restaurant_zipcode sa on r.id = sa.restaurant_id where ’94619’ = sa.zip_code and tr.day_of_week=’monday’

and tr.openingtime <= 1815 and 1815 <= tr.closingtime

K1 V1

K2 V2

... ...

Where we need to be

ZRANGEBYSCORE myset 1 6

select value,scorefrom sorted_setwhere key = ‘myset’ and score >= 1 and score <= 6

=

key value score

sorted_set

We need to denormalize

Think materialized view

Simplification #1: Denormalization

Restaurant_id Day_of_week Open_time Close_time Zip_code

1 Monday 1130 1430 94707

1 Monday 1130 1430 94619

1 Monday 1730 2130 94707

1 Monday 1730 2130 94619

2 Monday 0700 1430 94619

SELECT restaurant_id FROM time_range_zip_code WHERE day_of_week = ‘Monday’ AND zip_code = 94619 AND 1815 < close_time AND open_time < 1815

Simpler query: No joins Two = and two <

Simplification #2: Application filtering

SELECT restaurant_id, open_time FROM time_range_zip_code WHERE day_of_week = ‘Monday’ AND zip_code = 94619 AND 1815 < close_time AND open_time < 1815

Even simpler query• No joins• Two = and one <

Simplification #3: Eliminate multiple =’s with concatenation

SELECT restaurant_id, open_time FROM time_range_zip_code WHERE zip_code_day_of_week = ‘94619:Monday’ AND 1815 < close_time

Restaurant_id Zip_dow Open_time Close_time

1 94707:Monday 1130 1430

1 94619:Monday 1130 1430

1 94707:Monday 1730 2130

1 94619:Monday 1730 2130

2 94619:Monday 0700 1430

key

range

Simplification #4: Eliminate multiple RETURN VALUES with concatenation

SELECT open_time_restaurant_id, FROM time_range_zip_code WHERE zip_code_day_of_week = ‘94619:Monday’ AND 1815 < close_time

zip_dow open_time_restaurant_id close_time

94707:Monday 1130_1 1430

94619:Monday 1130_1 1430

94707:Monday 1730_1 2130

94619:Monday 1730_1 2130

94619:Monday 0700_2 1430

...

zip_dow open_time_restaurant_id close_time

94707:Monday 1130_1 1430

94619:Monday 1130_1 1430

94707:Monday 1730_1 2130

94619:Monday 1730_1 2130

94619:Monday 0700_2 1430

...

Sorted Set [ Entry:Score, …]

[1130_1:1430, 1730_1:2130]

[0700_2:1430, 1130_1:1430, 1730_1:2130]

Using a Redis sorted set as an index

Key

94619:Monday

94707:Monday

94619:Monday 0700_2 1430

Querying with ZRANGEBYSCORE

ZRANGEBYSCORE 94619:Monday 1815 2359

{1730_1}

1730 is before 1815 Ajanta is open

Delivery timeDelivery zip and day

Sorted Set [ Entry:Score, …]

[1130_1:1430, 1730_1:2130]

[0700_2:1430, 1130_1:1430, 1730_1:2130]

Key

94619:Monday

94707:Monday

Adding a Restaurant

Text

@Componentpublic class AvailableRestaurantRepositoryImpl implements AvailableRestaurantRepository {

@Override public void add(Restaurant restaurant) { addRestaurantDetails(restaurant); addAvailabilityIndexEntries(restaurant); }

private void addRestaurantDetails(Restaurant restaurant) { restaurantTemplate.opsForValue().set(keyFormatter.key(restaurant.getId()), restaurant); } private void addAvailabilityIndexEntries(Restaurant restaurant) { for (TimeRange tr : restaurant.getOpeningHours()) { String indexValue = formatTrId(restaurant, tr); int dayOfWeek = tr.getDayOfWeek(); int closingTime = tr.getClosingTime(); for (String zipCode : restaurant.getServiceArea()) { redisTemplate.opsForZSet().add(closingTimesKey(zipCode, dayOfWeek), indexValue,

closingTime); } } }

key member

score

Store as JSON

Finding available Restaurants@Componentpublic class AvailableRestaurantRepositoryImpl implements AvailableRestaurantRepository { @Override public List<AvailableRestaurant>

findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) { String zipCode = deliveryAddress.getZip(); int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime); int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime); String closingTimesKey = closingTimesKey(zipCode, dayOfWeek);

Set<String> trsClosingAfter = redisTemplate.opsForZSet().rangeByScore(closingTimesKey, timeOfDay, 2359);

Set<String> restaurantIds = new HashSet<String>(); for (String tr : trsClosingAfter) { String[] values = tr.split("_"); if (Integer.parseInt(values[0]) <= timeOfDay) restaurantIds.add(values[1]); } Collection<String> keys = keyFormatter.keys(restaurantIds); return availableRestaurantTemplate.opsForValue().multiGet(keys); }

Find those that close after

Filter out those that open after

Retrieve open restaurants

Sorry Ted!

http://en.wikipedia.org/wiki/Edgar_F._Codd

Agenda

• Why polyglot persistence?

• Using Redis as a cache

• Optimizing queries using Redis materialized views

• Synchronizing MySQL and Redis

• Tracking changes to entities

• Using a modular asynchronous architecture

MySQL & Redis need to be consistent

Two-Phase commit is not an option

• Redis does not support it

• Even if it did, 2PC is best avoided http://www.infoq.com/articles/ebay-scalability-best-practices

AtomicConsistentIsolatedDurable

Basically AvailableSoft stateEventually consistent

BASE: An Acid Alternative http://queue.acm.org/detail.cfm?id=1394128

Updating Redis #FAILbegin MySQL transaction update MySQL update Redisrollback MySQL transaction

Redis has updateMySQL does not

begin MySQL transaction update MySQLcommit MySQL transaction<<system crashes>> update Redis

MySQL has updateRedis does not

Updating Redis reliablyStep 1 of 2

begin MySQL transactionupdate MySQLqueue CRUD event in MySQL

commit transaction

ACID

Event IdOperation: Create, Update, DeleteNew entity state, e.g. JSON

Updating Redis reliably Step 2 of 2

for each CRUD event in MySQL queue get next CRUD event from MySQL queue

If CRUD event is not duplicate then Update Redis (incl. eventId)end ifbegin MySQL transaction

mark CRUD event as processedcommit transaction

ID JSON processed?

ENTITY_CRUD_EVENT

EntityCrudEventProcessor

RedisUpdater

apply(event)

Step 1 Step 2

EntityCrudEventRepository

INSERT INTO ...

Timer

SELECT ... FROM ...

Redis

Updating Redis

WATCH restaurant:lastSeenEventId:≪restaurantId≫

Optimistic locking

Duplicate detection

lastSeenEventId = GET restaurant:lastSeenEventId:≪restaurantId≫

if (lastSeenEventId >= eventId) return;

Transaction

MULTI SET restaurant:lastSeenEventId:≪restaurantId≫ eventId

... update the restaurant data...

EXEC

Agenda

• Why polyglot persistence?

• Using Redis as a cache

• Optimizing queries using Redis materialized views

• Synchronizing MySQL and Redis

• Tracking changes to entities

• Using a modular asynchronous architecture

How do we generate CRUD events?

Change tracking options

• Explicit code

• Hibernate event listener

• Service-layer aspect

• CQRS/Event-sourcing

ID JSON processed?

ENTITY_CRUD_EVENT

EntityCrudEventRepository

HibernateEventListener

Hibernate event listenerpublic class ChangeTrackingListener implements PostInsertEventListener, PostDeleteEventListener, PostUpdateEventListener { @Autowired private EntityCrudEventRepository entityCrudEventRepository; private void maybeTrackChange(Object entity, EntityCrudEventType eventType) { if (isTrackedEntity(entity)) { entityCrudEventRepository.add(new EntityCrudEvent(eventType, entity)); } }

@Override public void onPostInsert(PostInsertEvent event) { Object entity = event.getEntity(); maybeTrackChange(entity, EntityCrudEventType.CREATE); }

@Override public void onPostUpdate(PostUpdateEvent event) { Object entity = event.getEntity(); maybeTrackChange(entity, EntityCrudEventType.UPDATE); }

@Override public void onPostDelete(PostDeleteEvent event) { Object entity = event.getEntity(); maybeTrackChange(entity, EntityCrudEventType.DELETE); }

Agenda

• Why polyglot persistence?

• Using Redis as a cache

• Optimizing queries using Redis materialized views

• Synchronizing MySQL and Redis

• Tracking changes to entities

• Using a modular asynchronous architecture

Original architecture

WAR

RestaurantManagement

...

Drawbacks of this monolithic architecture

• Obstacle to frequent deployments

• Overloads IDE and web container

• Obstacle to scaling development

• Technology lock-in

WAR

RestaurantManagement

...

Need a more modular architecture

Using a message broker

Asynchronous is preferred

JSON is fashionable but binary format is more efficient

Modular architecture

Order taking

Restaurant Management

MySQL Database

RedisCacheRedis RabbitMQ

CONSUMER RESTAURANT OWNER

Event Publisher

Timer

Benefits of a modular asynchronous architecture

• Scales development: develop, deploy and scale each service independently

• Redeploy UI frequently/independently

• Improves fault isolation

• Eliminates long-term commitment to a single technology stack

• Message broker decouples producers and consumers

Step 2 of 2

for each CRUD event in MySQL queue get next CRUD event from MySQL queue

Publish persistent message to RabbitMQbegin MySQL transaction

mark CRUD event as processedcommit transaction

Message flowEntityCrudEvent

Processor

RedisUpdater

RABBITMQ

AvailableRestaurantManagementService

REDIS

Spring Integration glue code

RedisUpdater ⇒ AMQP<beans>

<int:gateway id="redisUpdaterGateway" service-interface="net...RedisUpdater" default-request-channel="eventChannel" />

<int:channel id="eventChannel"/>

<int:object-to-json-transformer input-channel="eventChannel" output-channel="amqpOut"/>

<int:channel id="amqpOut"/>

<amqp:outbound-channel-adapter channel="amqpOut" amqp-template="rabbitTemplate" routing-key="crudEvents" exchange-name="crudEvents" />

</beans>

Creates proxy

AMQP ⇒ Available...Service<beans> <amqp:inbound-channel-adapter channel="inboundJsonEventsChannel" connection-factory="rabbitConnectionFactory" queue-names="crudEvents"/>

<int:channel id="inboundJsonEventsChannel"/> <int:json-to-object-transformer input-channel="inboundJsonEventsChannel" type="net.chrisrichardson.foodToGo.common.JsonEntityCrudEvent" output-channel="inboundEventsChannel"/> <int:channel id="inboundEventsChannel"/> <int:service-activator input-channel="inboundEventsChannel" ref="availableRestaurantManagementServiceImpl" method="processEvent"/></beans>

Invokes service

Summary

• Each SQL/NoSQL database = set of tradeoffs

• Polyglot persistence: leverage the strengths of SQL and NoSQL databases

• Use Redis as a distributed cache

• Store denormalized data in Redis for fast querying

• Reliable database synchronization required

Questions?

@crichardson crichardson@vmware.comhttp://slideshare.net/chris.e.richardson/

Sign up for CloudFoundry.com using promo code cfjavaone

top related