whoami(1)
15 years of experience, proud to be a programmer
Writes software for information extraction, NLP, opinion mining (@scale), and a lot of other buzzwords
Implements scalable architectures
Member of the JUG-Torino coordination team
[email protected]
github.com/robfrank
twitter.com/robfrankie
linkedin.com/in/robfrank
http://www.celi.it
http://www.blogmeter.it
From the site
Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps
and hyperloglogs.
Clients in every known language
Articles, books, presentations
On High Scalability every other day
Ecosystem
Architecture
Single-threaded server
Yes: single threaded server
Remember that when you need to scale
Single Linux server can handle 500k req/s
Main features
In-memory K/V store
But with durable persistence
Master-slave async replication
Transactions
Pub/Sub
Server-side Lua scripting
Main features
Keys with TTL
LRU eviction
Keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs
Redis Cluster on the go (3.0.0-rc1)
K/V store
Key-value (KV) stores use the associative array (also known as a map or dictionary) as their fundamental data model. In this model, data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection. (Wikipedia)
K/V store
Key → value types (diagram):
“plain text”: String/blobs/bitmaps
name rob, surname frank: HashTable: Objects
A C E D B F: Linked lists
A B C D E F: Sets
Persistence
Configurable, two flavors
RDB: perfect for backup
AOF: append-only log, replayed at startup
Use AOF + RDB for rock solid persistence
Automatic cache warm-up at startup!!
Only RAM: switch off persistence
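The "append-only log, replayed at startup" idea can be illustrated with a toy model. This is plain Python, not Redis code: `ToyStore`, its in-memory `log` list and the `replay` helper are hypothetical names, and the real AOF is a file in the Redis protocol format rather than a list of tuples.

```python
class ToyStore:
    """Toy key-value store that records every write in an append-only log."""

    def __init__(self):
        self.data = {}
        self.log = []  # stands in for the AOF file on disk (assumption)

    def set(self, key, value):
        self.log.append(("SET", key, value))
        self.data[key] = value

    def incr(self, key):
        self.log.append(("INCR", key))
        self.data[key] = int(self.data.get(key, 0)) + 1
        return self.data[key]


def replay(log):
    """Rebuild a store by re-applying the logged writes in order."""
    store = ToyStore()
    for entry in log:
        if entry[0] == "SET":
            store.set(entry[1], entry[2])
        elif entry[0] == "INCR":
            store.incr(entry[1])
    return store


original = ToyStore()
original.set("user:1", "frank")
original.incr("count:1")
original.incr("count:1")

# Simulated restart: the state is rebuilt from the log alone.
restored = replay(original.log)
```

Because every write is logged before it is applied, replaying the log in order reproduces the same data, which is also why the cache comes back warm after a restart.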
Basics
SET user:1 frank
GET user:1 → frank
EXISTS user:1 → 1
EXPIRE user:1 3600
INCR count:1
GET count:1 → 1
Basics
KEYS user:* → user:1, user:2
MSET user:1 frank user:2 coder
MGET user:1 user:2 → frank, coder
HMSET userdetail:3 name rob surname frank
HGETALL userdetail:3 → name: rob, surname: frank
Transactions
MULTI
INCR counter:1
INCR counter:2
EXEC
> 1
> 1
WATCH counter:3
val = GET counter:3
val = val + 1
MULTI
SET counter:3 $val
EXEC
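The WATCH / MULTI / EXEC sequence is a check-and-set: if the watched key changes between WATCH and EXEC, the transaction is not applied and the client is expected to retry. The following is a minimal sketch of that behaviour in plain Python, without a Redis server. `VersionedStore`, `watch`, `exec_set` and `increment` are hypothetical names, and the per-key version number is an assumption used to model "the key was modified".

```python
class VersionedStore:
    """Toy store that tracks a version per key to model WATCH semantics."""

    def __init__(self):
        self.values = {}
        self.versions = {}

    def get(self, key):
        return self.values.get(key, 0)

    def set(self, key, value):
        self.values[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1

    def watch(self, key):
        # Remember the version seen at WATCH time.
        return self.versions.get(key, 0)

    def exec_set(self, key, value, watched_version):
        # Apply the write only if nobody touched the key since WATCH.
        if self.versions.get(key, 0) != watched_version:
            return False  # transaction aborted, caller should retry
        self.set(key, value)
        return True


def increment(store, key, interfere=None):
    """Optimistic increment: retry until the check-and-set succeeds."""
    attempts = 0
    while True:
        attempts += 1
        seen = store.watch(key)
        val = store.get(key) + 1
        if interfere is not None and attempts == 1:
            interfere()  # simulate another client writing in between
        if store.exec_set(key, val, seen):
            return val, attempts


store = VersionedStore()
store.set("counter:3", 5)
result, attempts = increment(
    store, "counter:3", interfere=lambda: store.set("counter:3", 10)
)
```

In this run the first attempt is discarded because another writer set the key to 10, and the retry stores 11. For a plain counter, INCR avoids the retry loop entirely.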
Atomic counters
Operators for key increment
INCR counter:1
GET counter:1 → 1
INCRBY counter:1 9
GET counter:1 → 10
Lua scripting
Server-side Lua scripting
A “sort of” stored procedure
Scripts are sandboxed
Atomic execution ← bear in mind
SCRIPT LOAD "return {KEYS[1],KEYS[2]}"
"3905aac1828a8f75707b48e446988eaaeb173f13"
EVALSHA 3905aac1828a8f75707b48e446988eaaeb173f13 2 user:1 user:2
1) "user:1"
2) "user:2"
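The 40-character value returned by SCRIPT LOAD is the SHA1 digest of the script body, so a client can also compute it locally before calling EVALSHA. Here is a small sketch using only the Python standard library. `script_sha` is an illustrative name, and since the digest depends on every byte of the script, no claim is made that it reproduces the value shown on the slide.

```python
import hashlib


def script_sha(script: str) -> str:
    """Return the 40-character hex SHA1 digest of a script body."""
    return hashlib.sha1(script.encode("utf-8")).hexdigest()


sha = script_sha("return {KEYS[1],KEYS[2]}")
print(sha)
```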
Caching: server level
Configure REDIS as a cache
maxmemory 1024mb
maxmemory-policy allkeys-lru
When the memory limit is reached, any key can be evicted, chosen by an approximated LRU algorithm
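To make the eviction policy concrete, here is a sketch of an exact LRU cache in plain Python. It differs from Redis in two labelled ways: Redis approximates LRU by sampling keys, and it limits memory rather than the number of entries. `LRUCache` and `max_entries` are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    """Exact LRU cache bounded by entry count (Redis bounds memory instead)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def set(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used


cache = LRUCache(max_entries=2)
cache.set("doc:1", "a")
cache.set("doc:2", "b")
cache.get("doc:1")       # doc:1 becomes the most recently used entry
cache.set("doc:3", "c")  # over the limit: doc:2 is evicted
```

Reading a key counts as a use, so frequently requested entries survive while cold ones are dropped first.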
Caching: TTL on key
Set a timeout on a key
SET doc:1 "mydoc.txt"
EXPIRE doc:1 10
Or
SETEX doc:1 10 "mydoc.txt"
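The effect of a TTL can be sketched with a toy store in plain Python. `TTLStore` is a hypothetical name, and passing the current time as `now` is an assumption made to keep the example deterministic; Redis uses the server clock.

```python
class TTLStore:
    """Toy store with per-key expiry; the caller supplies the current time."""

    def __init__(self):
        self.values = {}
        self.expires_at = {}

    def setex(self, key, seconds, value, now):
        self.values[key] = value
        self.expires_at[key] = now + seconds

    def get(self, key, now):
        deadline = self.expires_at.get(key)
        if deadline is not None and now >= deadline:
            # Drop the key once its time to live has elapsed.
            self.values.pop(key, None)
            self.expires_at.pop(key, None)
        return self.values.get(key)


store = TTLStore()
store.setex("doc:1", 10, "mydoc.txt", now=0)
before = store.get("doc:1", now=9)   # still within the 10 second TTL
after = store.get("doc:1", now=10)   # TTL elapsed, key is gone
```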
Duplicate detection
Real-time stream of documents from the Internet
20% to 50% of documents are duplicated
DUPLICATES ARE EVIL
And customers don’t pay for that :(
Avoid duplicated documents
Acting on producers was TOO HARD
Filter them out before heavy document analysis (NLP)
Documents
Each kind of document has its own natural id
twitter: status id
facebook: post id
forum: URL
blog: URL
We don’t want these IDs inside our system
Duplicate and ID generation
[Diagram: three Producers, Duplicate detector - ID generation, Analysis and Storage stages; volume labels 2M, 3M, 3M, 1M, 1M, 5M]
Map external keys to internal UID
Generate an ID for each document
IDs are generated using daily named counters:
INCR day:20141028 → 12576
INCR day:20141010 → 23412576
Cache generated ID
tw_1234578688 → day:20141028;12576
Map external keys to internal UID
Documents are internally stored on different storage systems with their generated id
globalId → 20141028:3456789
Operations
Natural keys are cached with a TTL
Documents out of time are parked in a staging area
Duplicated documents are usually dropped
LRU cache, counters and Lua
Lua scripts are executed atomically
Wrote a simple script to:
return previous mapped id
or generate id and store key and id in cache
EVALSHA "sha" 2 20141028 tw_1234566 → 20141028:123
GET tw_1234566 → 20141028:123
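The logic of that script can be modelled in plain Python as a sketch. This is not the script from the talk: `IdMapper` and `get_or_create` are hypothetical names, the lock stands in for the atomic execution Redis gives a script, the returned duplicate flag is an addition for illustration, and the TTL on cached natural keys is left out.

```python
import threading


class IdMapper:
    """Toy model of the get-or-generate step; a lock models script atomicity."""

    def __init__(self):
        self.lock = threading.Lock()
        self.counters = {}  # day -> last issued number (like INCR on a day key)
        self.cache = {}     # natural key -> internal id

    def get_or_create(self, day, natural_key):
        with self.lock:
            known = self.cache.get(natural_key)
            if known is not None:
                return known, True  # duplicate: reuse the mapped id
            self.counters[day] = self.counters.get(day, 0) + 1
            new_id = f"{day}:{self.counters[day]}"
            self.cache[natural_key] = new_id
            return new_id, False


mapper = IdMapper()
first = mapper.get_or_create("20141028", "tw_1234566")
again = mapper.get_or_create("20141028", "tw_1234566")
other = mapper.get_or_create("20141028", "tw_7654321")
```

Doing the lookup and the counter increment as one atomic step is what prevents two producers that deliver the same document at the same time from receiving two different ids.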
Alternatives
PostgreSQL: sequence(s), table OR hstore
Hazelcast (we are Java based): in memory, write your own persistence
References
http://redis.io/
http://redis.io/commands
http://stackoverflow.com/questions/tagged/redis
http://try.redis.io/