Navigating the NoSQL Landscape using Lego Mindstorms and Java
Michael Nitschinger Developer Advocate, Couchbase Inc.
{“about”: “me”}
• Developer Advocate at Couchbase, Inc.
• Maintainer of the Couchbase Java SDK
• Speaking at Conferences and Meetups
• Living and Working here in Vienna, Austria
What we’ll talk about
• What are the limits of RDBMS solutions?
• What are the different NoSQL taxonomies?
• Which NoSQL solution is right for me?
Growth is the New Reality
• Instagram gained nearly 1 million users overnight when they expanded to Android
Showcase: Draw Something
Does it work with an RDBMS backend?
Application Scales Out Just add more commodity web servers
Database Scales Up Get a bigger, more complex server
Note – Relational database technology is great for what it is great for, but it is not great for this.
Some alternatives to scale out your RDBMS
Scale out your RDBMS
• Run many SQL Servers
• Data is sharded (on the app level!)
• Memcached/Cache for faster response time
• Writes are still slow
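A minimal sketch of what sharding "on the app level" can look like: the application itself hashes a key to decide which SQL server owns it. The class name `ShardRouter`, the server names, and the hash-modulo scheme are assumptions made for illustration; they are not taken from the talk.

```java
import java.util.List;

// Hypothetical application-level shard router: the app, not the database,
// decides which SQL server owns a given key.
class ShardRouter {
    private final List<String> servers;

    ShardRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = List.copyOf(servers);
    }

    // Map a key to a server with a simple hash-modulo scheme.
    String serverFor(String key) {
        int index = Math.floorMod(key.hashCode(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("sql-1", "sql-2", "sql-3"));
        System.out.println("user:42 -> " + router.serverFor("user:42"));
    }
}
```

With a plain modulo, changing the number of servers remaps most keys, so growing the cluster means moving data by hand.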
Scale out with RDBMS
Is this a good approach to scale?
• Lot of components to deploy
• Scale by Hand
  - Caching
  - Sharding/Replication
Learn From Others
This Scenario Costs Time and Money. Scaling SQL is potentially disastrous when going Viral: Very risky time for major code changes and migrations... You have no Time when skyrocketing up!
The Relational Model
• Formulated and proposed by Edgar Codd in 1969.
  http://en.wikipedia.org/wiki/Relational_model
• Based on Relational Algebra
  - which is based on Set Theory
• Not all Problems fit into Set Theory
  - e.g. Graph Theory
  - Relationships
  - Recommendations
http://en.wikipedia.org/wiki/Honeywell_316
Lacking market solutions, users forced to invent
• Bigtable (November 2006)
• Dynamo (October 2007)
• Cassandra (August 2008)
• Voldemort (February 2009)
Very few organizations want to (fewer can) build and maintain database software technology. But every organization building interactive web applications needs this technology.
• No schema required before inserting data
• No schema change required to change data format
• Auto-sharding without application participation
• Distributed queries
• Integrated main memory caching
• Data synchronization (mobile, multi-datacenter)
Survey: Schema inflexibility #1 adoption driver
What is the biggest data management problem driving your use of NoSQL in the coming year?
  Lack of flexibility/rigid schemas   49%
  Inability to scale out data         35%
  High latency/low performance        29%
  Costs                               16%
  All of these                        12%
  Other                               11%
Source: Couchbase NoSQL Survey, December 2011, n=1351
NoSQL database matches application logic tier architecture
Data layer now scales with linear cost and constant performance
Application Scales Out Just add more commodity web servers
Database Scales Out Just add more commodity data servers
Scaling out flattens the cost and performance curves.
NoSQL Database Servers
NoSQL Taxonomy
The CAP Theorem
• In a distributed System:
  - Consistency
  - Availability
  - Partition Tolerance
• When Partition happens
  - Choose either Consistency (only respond to subset)
  - or Availability (accept stale data and conflict writes)
  - Conflict Resolution!
[Diagram: C, A, P]
Clarification
• Big Data
  - Large scale datastore (“>= 100TB or Petabytes”)
  - Optimized for Batch Processing
  - Data Warehouse
• Big Users
  - very high get/set rate (thousands of ops/s)
  - working set in RAM
  - latency and throughput matters most
  - (near) Real-Time use cases
The Key-Value Store / “Cache” – the foundation of NoSQL
[Diagram: Key and its Opaque Binary Value]
Memcached – the NoSQL precursor
[Diagram: Memcached: Key and its Opaque Binary Value]
• In-memory only
• Limited set of operations
  - Blob Storage: Set, Add, Replace, CAS
  - Retrieval: Get
  - Structured Data: Append, Increment
“Simple and fast.”
Challenges:
  - cold cache
  - disruptive elasticity
  - missing persistence
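To make the operation names on this slide concrete, here is a toy in-memory store that imitates the semantics of Set, Add, Replace, Get and CAS. It is a sketch under assumptions: `TinyCache`, `Entry` and the version counter are hypothetical names, and the code does not talk to a real memcached server or use any client library.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy store imitating a few memcached-style commands (illustration only).
class TinyCache {
    // Every write gets a fresh version number, used as the "CAS token".
    static final class Entry {
        final byte[] value;
        final long version;

        Entry(byte[] value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Entry> data = new HashMap<>();
    private long nextVersion = 1;

    // set: store unconditionally
    synchronized void set(String key, byte[] value) {
        data.put(key, new Entry(value, nextVersion++));
    }

    // add: store only if the key does not exist yet
    synchronized boolean add(String key, byte[] value) {
        if (data.containsKey(key)) return false;
        set(key, value);
        return true;
    }

    // replace: store only if the key already exists
    synchronized boolean replace(String key, byte[] value) {
        if (!data.containsKey(key)) return false;
        set(key, value);
        return true;
    }

    // get: returns value plus version, or null on a miss
    synchronized Entry get(String key) {
        return data.get(key);
    }

    // cas: store only if nobody wrote the key since expectedVersion was read
    synchronized boolean cas(String key, byte[] value, long expectedVersion) {
        Entry current = data.get(key);
        if (current == null || current.version != expectedVersion) return false;
        set(key, value);
        return true;
    }

    public static void main(String[] args) {
        TinyCache cache = new TinyCache();
        cache.set("greeting", "hello".getBytes(StandardCharsets.UTF_8));
        Entry read = cache.get("greeting");
        boolean first = cache.cas("greeting", "hi".getBytes(StandardCharsets.UTF_8), read.version);
        boolean second = cache.cas("greeting", "hey".getBytes(StandardCharsets.UTF_8), read.version);
        System.out.println(first + " " + second); // prints "true false"
    }
}
```

The second `cas` fails because the first one already changed the version, which is how a stale writer gets detected. Everything lives in a `HashMap`, so a restart starts with a cold cache, one of the challenges listed above.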
NoSQL catalog
[Diagram: grid with rows Cache (memory only) and Database (memory/disk); columns: Key-Value; entries: Memcached]
Redis – More “Structured Data” commands
[Diagram: Redis: Key and its “Data Structures”: Blob, List, Set, Hash, …]
• Disk Persistence (eventual consistency on the disk)!
• Vast set of operations
  - Blob Storage: Set, Add, Replace, CAS
  - Retrieval: Get, Pub-Sub
  - Structured Data: Strings, Hashes, Lists, Sets, Sorted sets
Challenges:
  - clustering (to come)
  - RAM limit (no eviction)
NoSQL catalog
[Diagram: grid with rows Cache (memory only) and Database (memory/disk); columns: Key-Value, Data Structure; entries: Memcached, Redis]
Membase – From key-value cache to database
• Disk-based with built-in memcached cache
• Cache refill on restart
• Memcached compatible (drop in replacement)
• Highly-available (data replication)
• Add or remove capacity to live cluster
“Simple, fast, elastic.”
[Diagram: Membase: Key and its Opaque Binary Value]
NoSQL catalog
[Diagram: grid with rows Cache (memory only) and Database (memory/disk); columns: Key-Value, Data Structure; entries: Memcached, Redis, Membase]
Couchbase – Document-oriented database
[Diagram: Couchbase: Key and its JSON & Opaque OBJECT (“DOCUMENT”)]
  { “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }
• Auto-sharding
• Disk-based with built-in memcached cache
• Cache refill on restart
• Memcached compatible (drop in replace)
• Highly-available (data replication)
• Add or remove capacity to live cluster
• When values are JSON objects (“documents”): Create indices, views and query against the views
• Chooses Consistency over Availability
NoSQL catalog
[Diagram: grid with rows Cache (memory only) and Database (memory/disk); columns: Key-Value, Data Structure, Document; entries: Memcached, Redis, Membase, Couchbase]
MongoDB – Document-oriented database
[Diagram: MongoDB: Key and its BSON OBJECT (“DOCUMENT”)]
  { “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }
• Disk-based with in-memory “caching”
• BSON (“binary JSON”) format and wire protocol
• Master-slave replication
• Auto-sharding
• Values are BSON objects
• Supports ad hoc queries – best when indexed
• More similar to RDBMS modeling than Caches
• Scaling over sharding requires special nodes
NoSQL catalog
[Diagram: grid with rows Cache (memory only) and Database (memory/disk); columns: Key-Value, Data Structure, Document; entries: Memcached, Redis, Membase, Couchbase, MongoDB]
Cassandra – Column overlays
• Disk-based system
• Clustered
• External caching required for low-latency reads
• “Columns” are overlaid on the data
• Not all rows must have all columns
• Supports efficient queries on columns
• Restart required when adding columns
• Multi-Data-Center replication supported
• Column-Model may be complex to start with
• Chooses Availability over Consistency
[Diagram: Cassandra: Key and its Opaque Binary Value, overlaid with Column 1, Column 2, Column 3 (not present)]
NoSQL catalog
[Diagram: grid with rows Cache (memory only) and Database (memory/disk); columns: Key-Value, Data Structure, Document, Column; entries: Memcached, Redis, Membase, Couchbase, MongoDB, Cassandra]
Neo4j – Graph database
• Disk-based system
• External caching required for low-latency reads
• Nodes, relationships and paths
• Properties on nodes
• Delete, Insert, Traverse, etc.
[Diagram: Neo4j: five Key / Opaque Binary Value pairs]
NoSQL catalog
[Diagram: grid with rows Cache (memory only) and Database (memory/disk); columns: Key-Value, Data Structure, Document, Column, Graph; entries: Memcached, Redis, Membase, Couchbase, MongoDB, Cassandra, Neo4j]
NoSQL catalog
[Diagram: same grid; entries: Memcached, Redis, Riak, Couchbase, MongoDB, Cassandra, Neo4j, HBase, InfiniteGraph, Coherence, Membase]
What about Hadoop?
Hadoop: Big Data Swiss Army Knife
• Oozie: Workflow, coordination
• Sqoop: Data connector to import/export data
• Hive: SQL-like interface
• Pig: High level programming language
• Mahout: Machine learning library
• Whirr: Hadoop management tools for cloud services
• Flume: Aggregator
• Map Reduce: Framework to process large volume of data
• HBase: Key Value data store
• Zookeeper: Centralized configuration management
• HDFS: Distributed file system
So what? Connecting Hadoop
[Diagram: three numbered data flows labelled: click stream events; profiles, campaigns; profiles, real time campaign statistics]
40 milliseconds to respond with the decision.
Which one is right for me?
Lack of Flexibility / Rigid Schema
• Aggregate Data Models (Martin Fowler)
  - Flexible Data Structure
  - Optimized Access
  - Easy to distribute data
o::1001
{
  uid: ji22jd,
  customer: Ann,
  line_items: [
    { sku: 0321293533, quan: 3, unit_price: 48.0 },
    { sku: 0321601912, quan: 1, unit_price: 39.0 },
    { sku: 0131495054, quan: 1, unit_price: 51.0 }
  ],
  payment: { type: Amex, expiry: 04/2001, last5: 12345 }
}
http://martinfowler.com/bliki/AggregateOrientedDatabase.html
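The order aggregate on this slide could be modelled in Java as one object graph that is stored and fetched under a single key. This is a sketch only: the field names mirror the slide, while the class names `OrderAggregate` and `LineItem` and the `total()` helper are assumptions added for illustration.

```java
import java.util.List;

// One aggregate = one unit of storage and retrieval, kept under one key such as "o::1001".
class OrderAggregate {
    static final class LineItem {
        final String sku;
        final int quantity;
        final double unitPrice;

        LineItem(String sku, int quantity, double unitPrice) {
            this.sku = sku;
            this.quantity = quantity;
            this.unitPrice = unitPrice;
        }
    }

    final String customer;
    final List<LineItem> lineItems;

    OrderAggregate(String customer, List<LineItem> lineItems) {
        this.customer = customer;
        this.lineItems = List.copyOf(lineItems);
    }

    // Everything needed for the total is inside the aggregate: no joins required.
    double total() {
        double sum = 0.0;
        for (LineItem item : lineItems) {
            sum += item.quantity * item.unitPrice;
        }
        return sum;
    }

    public static void main(String[] args) {
        OrderAggregate order = new OrderAggregate("Ann", List.of(
                new LineItem("0321293533", 3, 48.0),
                new LineItem("0321601912", 1, 39.0),
                new LineItem("0131495054", 1, 51.0)));
        System.out.println(order.total()); // prints "234.0"
    }
}
```

Because line items live inside the order rather than in a separate table, the whole order can sit on one node, which is what makes aggregates easy to distribute.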
Use Cases
Key Value
• Session Management
• User Profile/Preferences
• Shopping Cart
Document
• Event Logging
• Content Management
• Web Analytics
• E-Commerce Application
Columns
• Event Logging
• Content Management
• Counters
Graph
• Connected Data / Social Networks
• Routing, Dispatch
• Recommendations based on Social Graph
Production Environment
[Diagram: US DATA CENTER, EMEA DC, APAC DC]
How do I want to scale out?
• Modify cluster topology should be simple
  - Add, Remove, Configure Nodes on a running system
• What is the impact of topology changes?
  - Sharding, Caching of the data
  - Availability of the service during cluster changes
• More hardware = More failures
  - Availability, reliability of the system: failover support
Add Nodes to Cluster
• Two servers added
  - One-click operation
• Docs automatically rebalanced across cluster
  - Even distribution of docs
  - Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger number of servers
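The rebalance steps above can be sketched with a partition table: keys hash to a fixed set of partitions, and the cluster map records which server owns each partition. Adding a server then only reassigns some partitions. The partition count of 64, the CRC32 hash and all names here are assumptions for illustration, not the actual Couchbase implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical cluster map: partition -> owning server.
class ClusterMap {
    static final int PARTITIONS = 64; // assumption: small fixed partition count

    private final List<String> servers = new ArrayList<>();
    private final String[] owner = new String[PARTITIONS];

    ClusterMap(List<String> initialServers) {
        servers.addAll(initialServers);
        for (int p = 0; p < PARTITIONS; p++) {
            owner[p] = servers.get(p % servers.size()); // initial even spread
        }
    }

    // The key-to-partition mapping never changes, only partition ownership does.
    static int partitionOf(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % PARTITIONS);
    }

    String serverFor(String key) {
        return owner[partitionOf(key)];
    }

    String ownerOf(int partition) {
        return owner[partition];
    }

    // Hand the new server its fair share, taking only from servers above the new average.
    int addServer(String newServer) {
        servers.add(newServer);
        int target = PARTITIONS / servers.size();
        Map<String, Integer> counts = new HashMap<>();
        for (String s : owner) {
            counts.merge(s, 1, Integer::sum);
        }
        int moved = 0;
        for (int p = 0; p < PARTITIONS && moved < target; p++) {
            String current = owner[p];
            if (counts.get(current) > target) {
                counts.merge(current, -1, Integer::sum);
                owner[p] = newServer;
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        ClusterMap map = new ClusterMap(List.of("server-1", "server-2", "server-3"));
        int moved = map.addServer("server-4");
        System.out.println("partitions moved: " + moved); // prints "partitions moved: 16"
    }
}
```

Only 16 of 64 partitions change owner when the fourth server joins. Clients that hold the updated map keep calling `serverFor` unchanged.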
[Diagram: COUCHBASE SERVER CLUSTER with SERVER 1 to SERVER 5, each holding ACTIVE and REPLICA docs; APP SERVER 1 and APP SERVER 2 each use the COUCHBASE Client Library with its CLUSTER MAP for READ/WRITE/UPDATE]
User Configured Replica Count = 1
Fail Over Node
[Diagram: SERVER 1 to SERVER 5, each holding ACTIVE and REPLICA docs]
• App servers accessing docs
• Requests to Server 3 fail
• Cluster detects server failed
  - Promotes replicas of docs to active
  - Updates cluster map
• Requests for docs now go to appropriate server
• Typically rebalance would follow
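The failover sequence above can be sketched as a map update: for every partition whose active copy lived on the failed server, the replica is promoted to active. This is a simplified illustration with a replica count of 1; `FailoverMap` and its methods are hypothetical names, not part of any Couchbase API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cluster map holding one active and one replica server per partition.
class FailoverMap {
    private final String[] active;
    private final String[] replica;

    FailoverMap(String[] active, String[] replica) {
        this.active = active.clone();
        this.replica = replica.clone();
    }

    // Promote replicas for partitions whose active copy was on the failed server.
    // Returns the partitions that were promoted.
    List<Integer> failOver(String failedServer) {
        List<Integer> promoted = new ArrayList<>();
        for (int p = 0; p < active.length; p++) {
            if (failedServer.equals(active[p])) {
                active[p] = replica[p]; // replica becomes the active copy
                replica[p] = null;      // no replica again until a rebalance
                promoted.add(p);
            } else if (failedServer.equals(replica[p])) {
                replica[p] = null;      // replica lost, active copy keeps serving
            }
        }
        return promoted;
    }

    String activeFor(int partition) {
        return active[partition];
    }

    public static void main(String[] args) {
        FailoverMap map = new FailoverMap(
                new String[] {"server-1", "server-2", "server-3", "server-3"},
                new String[] {"server-2", "server-3", "server-1", "server-2"});
        System.out.println(map.failOver("server-3")); // prints "[2, 3]"
        System.out.println(map.activeFor(2));         // prints "server-1"
    }
}
```

After the promotion the affected partitions have no replica, which is why a rebalance would typically follow.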
[Diagram: APP SERVER 1 and APP SERVER 2, each with the COUCHBASE Client Library and CLUSTER MAP, in front of the COUCHBASE SERVER CLUSTER]
User Configured Replica Count = 1
Performance
• What is my working set?
  - Different Patterns based on the Application
  - Social Games vs. Analytics
• What do I need to cache / how often?
  - Put your data in RAM
  - Read/Write rates
• How to design my data model?
  - Trim towards your “hot code path”
  - Aggregate Model
  - Easy to change
Management and Monitoring
• Do not forget about Operations!
  - Service Reliability Engineering Team will thank you!
• Manage your cluster easily:
  - Command Line, Administration Console to change cluster topology
• Monitor “your NoSQL”
  - Analyze the overall status of your cluster
  - View and fix bottlenecks
Conclusion
• One Size Does Not Fit All
• Overview of the NoSQL types
• Choose the right solution for your application
• Don’t mix Big Data with Big Users!
Q&A