Memory is the new disk, disk is the new tape

Bela Ban, JBoss / Red Hat

Upload: openblend-society

Post on 29-Nov-2014

Motivation

● We want to store our data in memory

– Memory access is faster than disk access

– Even across a network

– A DB requires network communication, too

● The disk is used for archival purposes

● Not a replacement for DBs!

– Only a key-value store

– NoSQL

Problems

● #1: How do we provide memory large enough to store the data (e.g. 2 TB of memory)?

● #2: How do we guarantee persistence?

– Survival of data between reboots / crashes

#1: Large memory

● We aggregate the memory of all nodes in a cluster into a large virtual memory space

– 100 nodes of 10 GB == 1 TB of virtual memory

#2: Persistence

● We store keys redundantly on multiple nodes

– Unless all nodes on which key K is stored crash at the same time, K is persistent

● We can also store the data on disk

– To prevent data loss in case all cluster nodes crash

– This can be done asynchronously, on a background thread
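
The asynchronous disk write can be sketched as a write-behind store: the caller updates memory and returns at once, while a background thread appends the change to a log file. This is only a minimal illustration under assumptions, not how any particular product does it; the class name `WriteBehindStore`, the JSON-lines log format, and the `close()` method are all invented for the example.

```python
import json
import queue
import threading


class WriteBehindStore:
    """In-memory dict whose writes are persisted by a background thread.

    Hypothetical sketch: names and the log format are assumptions.
    """

    def __init__(self, path):
        self._data = {}
        self._path = path
        self._pending = queue.Queue()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def put(self, key, value):
        # The caller only pays for the in-memory update; the disk write is queued.
        self._data[key] = value
        self._pending.put((key, value))

    def get(self, key):
        return self._data.get(key)

    def _drain(self):
        # Background thread: append each queued update to the log file.
        with open(self._path, "a", encoding="utf-8") as log:
            while True:
                item = self._pending.get()
                if item is None:  # sentinel pushed by close()
                    return
                log.write(json.dumps(item) + "\n")
                log.flush()

    def close(self):
        # Let the writer finish everything queued so far, then stop it.
        self._pending.put(None)
        self._writer.join()
```

In this sketch, updates that are still in the queue when the process dies are lost, which is why the in-memory redundancy across nodes remains the first line of defence and the disk is only the fallback.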

How do we provide redundancy?

Store every key on every node

A   B   C   D
K1  K1  K1  K1
K2  K2  K2  K2
K3  K3  K3  K3
K4  K4  K4  K4

● RAID 1

● Pro: data is available everywhere

– No network round trip

– Data loss only when all nodes crash

● Con: we can only use 25% of our memory

Store every key on 1 node only

A   B   C   D
K1  K2  K3  K4

● RAID 0, JBOD

● Pro: we can use 100% of our memory

● Con: data loss on node crash

– No redundancy

Store every key on K nodes

A B C D

K1 K1

K2 K2

K3 K3

K4 K4

● K is configurable (2 in the example)

● Variable RAID

● Pro: we can use a variable % of our memory

– User determines tradeoff between memory consumption and risk of data loss
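
The tradeoff between the three layouts comes down to simple arithmetic: if every key is stored `copies` times, the cluster holds total memory divided by `copies` worth of distinct data. The helper below is a hypothetical illustration (the function name and parameters are not from the talk), and it assumes keys are spread evenly across nodes.

```python
def usable_capacity_gb(nodes, gb_per_node, copies):
    """Distinct data (in GB) the cluster can hold when every key is stored `copies` times.

    Assumes an even spread of keys over the nodes.
    """
    if not 1 <= copies <= nodes:
        raise ValueError("copies must be between 1 and the number of nodes")
    return nodes * gb_per_node / copies


# The four-node example from the slides, with an assumed 10 GB per node.
for copies in (1, 2, 4):
    usable = usable_capacity_gb(nodes=4, gb_per_node=10, copies=copies)
    print(f"{copies} copies per key: {usable:.0f} GB usable, {100 / copies:.0f}% of total")

# The earlier aggregation example: 100 nodes of 10 GB, one copy per key.
print(usable_capacity_gb(nodes=100, gb_per_node=10, copies=1), "GB")
```

With four nodes this reproduces the 100% (one copy) and 25% (a copy on every node) figures from the previous slides, and puts the two-copy example at 50%.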

So how do we determine on which nodes the keys are stored?

Consistent hashing

● Given a key K and a set of nodes, CH(K) will always pick the same node P for K

– We can also pick a list {P,Q} for K

● Anyone 'knows' that K is on P

● If P leaves, CH(K) will pick another node Q and rebalance affected keys

● A good CH will rebalance 1/N keys at most (where N = number of cluster nodes)
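
One common way to get these properties is a hash ring, sketched below. This is an illustrative stand-in, not necessarily the function used by the software discussed in the talk; the class `HashRing`, the use of MD5, and the `vnodes` parameter are assumptions made for the example.

```python
import bisect
import hashlib


def _point(name):
    # Map a string to a position on the ring (a 128-bit integer).
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring-based consistent hash: CH(K) is the next node clockwise from K."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets `vnodes` positions on the ring to even out the key distribution.
        self._ring = sorted(
            (_point(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def owners(self, key, count=1):
        """Return `count` distinct nodes for `key`: primary owner first, then backups."""
        count = min(count, self._node_count)
        index = bisect.bisect(self._points, _point(key))
        found = []
        while len(found) < count:
            node = self._ring[index % len(self._ring)][1]
            if node not in found:
                found.append(node)
            index += 1
        return found


keys = [f"K{i}" for i in range(1000)]
before = HashRing(["A", "B", "C", "D"])
after = HashRing(["A", "C", "D"])  # node B has left the cluster

moved = [k for k in keys if before.owners(k)[0] != after.owners(k)[0]]
print("owners of K2:", before.owners("K2", count=2))
print("keys whose primary owner changed:", len(moved), "of", len(keys))
```

Because the mapping depends only on the key and the node set, any node can compute it locally. When B leaves, only the keys whose primary owner was B get a new primary; with a ring like this the share is roughly, not exactly, 1/N.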

Example

A B C D

K1 K1

K2 K2

K3 K3

K4 K4

● K2 is stored on B (primary owner) and C (backup owner)

Example

A B C D

K1 K1

K2 K2

K3 K3

K4 K4

● Node B now crashes

Example

● C (the backup owner of K2) copies K2 to D

– C is now the primary owner of K2

● A copies K1 to C

– C is now the backup owner of K1

A B C D

K1 K1 K1

K2 K2 K2

K3 K3

K4 K4

Rebalancing

● Unless all N owners of a key K crash exactly at the same time, K is always stored redundantly

● When fewer than N owners crash, rebalancing will copy/move keys to other nodes, so that we have N owners again
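
The rebalancing step can be sketched for the crash example above. The function below is a hypothetical illustration: it assumes each surviving primary copies its key to the next live node in ring order, a policy chosen here only because it reproduces the transfers described for K1 and K2. K3 and K4 are left out because their owners are not stated in the text.

```python
def rebalance(owners, crashed, ring, copies):
    """Return (new_owners, transfers) after `crashed` leaves the cluster.

    `owners` maps key -> [primary, backup, ...]; `ring` lists nodes in ring order.
    Assumed policy: top up each owner list with the next live nodes after the primary.
    """
    live = [node for node in ring if node != crashed]
    new_owners = {}
    transfers = []  # (key, source node, target node)
    for key, nodes in owners.items():
        survivors = [node for node in nodes if node != crashed]
        if not survivors:
            raise RuntimeError(f"all owners of {key} crashed; reload it from disk")
        start = live.index(survivors[0]) + 1
        candidates = [n for n in live[start:] + live[:start] if n not in survivors]
        missing = max(0, copies - len(survivors))
        for target in candidates[:missing]:
            transfers.append((key, survivors[0], target))
            survivors.append(target)
        new_owners[key] = survivors
    return new_owners, transfers


owners = {"K1": ["A", "B"], "K2": ["B", "C"]}
new_owners, transfers = rebalance(owners, crashed="B", ring=["A", "B", "C", "D"], copies=2)
print(transfers)   # [('K1', 'A', 'C'), ('K2', 'C', 'D')]
print(new_owners)  # {'K1': ['A', 'C'], 'K2': ['C', 'D']}
```

The first surviving owner is promoted to primary, so C becomes the primary owner of K2 and the backup owner of K1, as in the example.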

Enter ReplCache

● ReplCache is a distributed hashmap spanning the entire cluster

● Operations: put(K,V), get(K), remove(K)

● For every key, we can define how many times we'd like it to be stored in the cluster

– 1: RAID 0

– -1: RAID 1

– N: variable RAID
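
The per-key replication count can be illustrated with a toy, single-process model. This is not the ReplCache API: the class `ToyReplCache`, the `repl_count` parameter name, and the modulo-based placement are assumptions for the sketch, and each "node" is just a local dict rather than a cluster member.

```python
import zlib


class ToyReplCache:
    """Toy model of a hashmap with a per-key replication count (all names hypothetical)."""

    def __init__(self, node_names):
        self._names = list(node_names)
        self.nodes = {name: {} for name in self._names}

    def _targets(self, key, repl_count):
        if repl_count == -1:  # -1: store on every node (RAID 1)
            return list(self._names)
        if not 1 <= repl_count <= len(self._names):
            raise ValueError("repl_count must be -1 or between 1 and the node count")
        # Simple placement for illustration only; this is not consistent hashing.
        start = zlib.crc32(key.encode("utf-8")) % len(self._names)
        return [self._names[(start + i) % len(self._names)] for i in range(repl_count)]

    def put(self, key, value, repl_count=1):
        self.remove(key)  # drop stale copies in case the count changed
        for name in self._targets(key, repl_count):
            self.nodes[name][key] = value

    def get(self, key):
        for store in self.nodes.values():
            if key in store:
                return store[key]
        return None

    def remove(self, key):
        for store in self.nodes.values():
            store.pop(key, None)

    def copies(self, key):
        # How many nodes currently hold the key.
        return sum(key in store for store in self.nodes.values())


cache = ToyReplCache(["A", "B", "C", "D"])
cache.put("session:1", "alice", repl_count=1)    # RAID 0: a single copy
cache.put("config", "v42", repl_count=-1)        # RAID 1: a copy on every node
cache.put("cart:7", ["book"], repl_count=2)      # variable RAID: two copies
print(cache.copies("session:1"), cache.copies("config"), cache.copies("cart:7"))
```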

Use of ReplCache

[Figure: architecture diagram. Labels: HTTP; Apache with mod_jk; three JBoss instances, each with a Servlet and a ReplCache; Cluster; DB]

Demo

Use cases

● JBoss AS: session distribution using Infinispan

– For data scalability, sessions are stored only N times in a cluster

● GridFS (Infinispan)

– I/O over grid

– Files are chunked into slices, each slice is stored in the grid (redundantly if needed)

– Store a 4GB DVD in a grid where each node has only 2GB of heap
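
The chunking idea behind the GridFS use case can be sketched in a few lines. This does not use Infinispan's actual API: a plain dict stands in for the grid, and the function names and the `name#index` key scheme are assumptions. In a real grid each chunk key could land on a different node, which is how a file larger than any single heap can still be stored.

```python
def store_file(grid, name, data, chunk_size):
    """Split `data` into fixed-size chunks and store each under its own key."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        grid[f"{name}#{index}"] = chunk
    # Metadata entry so a reader knows how many chunks to fetch.
    grid[f"{name}#meta"] = {"length": len(data), "chunks": len(chunks)}


def read_file(grid, name):
    """Reassemble the file by fetching its chunks in order."""
    meta = grid[f"{name}#meta"]
    return b"".join(grid[f"{name}#{i}"] for i in range(meta["chunks"]))


grid = {}  # stand-in for the distributed key-value store
payload = bytes(range(256)) * 10  # 2560 bytes of sample data
store_file(grid, "dvd.iso", payload, chunk_size=1000)
print(grid["dvd.iso#meta"])
print(read_file(grid, "dvd.iso") == payload)
```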

Use cases

● Hibernate Over Grid (OGM)

– Replaces DB backend with Infinispan-backed grid

Conclusion

● Given enough nodes in a cluster, we can provide persistence for data

● Unlike RAID, where everything is stored fully redundantly (even /tmp), we can define persistence guarantees per key

● Ideal for data sets which need to be accessed quickly

– For the paranoid we can still stream to disk

Conclusion

● Data is distributed over a grid

– Cache is closer to clients

– No bottleneck to the DBMS

– Keys are on different nodes

Conclusion

[Figure: diagram of Client nodes and Cache nodes]

Questions?

● Demo (JGroups)

– http://www.jgroups.org

● Infinispan

– http://www.infinispan.org

● OGM

– http://community.jboss.org/en/hibernate/ogm