nosql - apacheconarchive.apachecon.com/.../b_1000_eifrem_nosql.pdf · 2011-11-11 · analogously,...

71
NOSQL the past, present and future Emil Eifrem [email protected] @emileifrem #neo4j 1 Thursday, November 10, 2011

Upload: others

Post on 08-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

NOSQLthe past, present and future

Emil Eifrem

[email protected]

@emileifrem

#neo4j

1Thursday, November 10, 2011

Page 2: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

So what’s the plan?

๏The title page already told you, but out of order(*). Actual order:

•The Present

•The Past

•And The Future of NOSQL

๏Then lunch.

2

(*) Out of order? Well, the title page did inorder traversal (left node, root, right node), and since most folks aren’t graph geeks I’m going to go with the more intuitive preorder traversal (root, left, right) in the actual presentation.

2Thursday, November 10, 2011

Page 3: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Anyway...

NOSQL: The Present

3

3Thursday, November 10, 2011

Page 4: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

First off: the name

4

๏WE ALL HATES IT, M’KAY?

4Thursday, November 10, 2011

Page 5: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

NOSQL is NOT...

๏ NO to SQL

๏ NEVER SQL

5

5Thursday, November 10, 2011

Page 6: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Not Only SQL

6

NOSQL is simply

6Thursday, November 10, 2011

Page 7: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

7

Four trends

NOSQL - Why now?

7Thursday, November 10, 2011

Page 8: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Source: IDC 2007

Trend 1:data set size

200740

8Thursday, November 10, 2011

Page 9: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

200740

2010

988

Trend 1:data set size

Source: IDC 20079Thursday, November 10, 2011

Page 10: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Trend 2: Connectedness

101990

Info

rmat

ion

conn

ectiv

ity

2000 2010 2020

web 1.0 web 2.0 “web 3.0”

10Thursday, November 10, 2011

Page 11: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Trend 2: Connectedness

11

Text documents

1990

Info

rmat

ion

conn

ectiv

ity

2000 2010 2020

web 1.0 web 2.0 “web 3.0”

11Thursday, November 10, 2011

Page 12: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Trend 2: Connectedness

12

Text documents

1990

Info

rmat

ion

conn

ectiv

ity

Hypertext

2000 2010 2020

web 1.0 web 2.0 “web 3.0”

12Thursday, November 10, 2011

Page 13: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Trend 2: Connectedness

13

Text documents

1990

Info

rmat

ion

conn

ectiv

ity Folksonomies

Tagging

User-generated content

Wikis

RSS

Blogs

Hypertext

2000 2010 2020

web 1.0 web 2.0 “web 3.0”

13Thursday, November 10, 2011

Page 14: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Trend 2: Connectedness

14

Text documents

1990

Info

rmat

ion

conn

ectiv

ity Folksonomies

Tagging

User-generated content

Wikis

RSS

Blogs

Hypertext

2000 2010 2020

web 1.0 web 2.0 “web 3.0”

Ontologies

RDF

GiantGlobal

Graph (GGG)

14Thursday, November 10, 2011

Page 15: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Trend 2: Connectedness

15

Text documents

1990

Info

rmat

ion

conn

ectiv

ity Folksonomies

Tagging

User-generated content

Wikis

RSS

Blogs

Hypertext

2000 2010 2020

web 1.0 web 2.0 “web 3.0”

Ontologies

RDF

GiantGlobal

Graph (GGG)

15Thursday, November 10, 2011

Page 16: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Trend 3: Semi-structure

16

๏ Individualization of content

• In the salary lists of the 1970s, all elements had exactly one job

• In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?

๏All encompassing “entire world views”

• Store more data about each entity

๏Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”)

16Thursday, November 10, 2011

Page 17: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Aside: RDBMS performance

17Data complexity

Perfo

rman

ce

Relational database

Requirement of application

17Thursday, November 10, 2011

Page 18: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Aside: RDBMS performance

18Data complexity

Perfo

rman

ce

Relational database

Requirement of application

18Thursday, November 10, 2011

Page 19: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Aside: RDBMS performance

19Data complexity

Perfo

rman

ce

Salary List Relational database

Requirement of application

19Thursday, November 10, 2011

Page 20: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Aside: RDBMS performance

20Data complexity

Perfo

rman

ce

Majority ofWebapps

Salary List Relational database

Requirement of application

20Thursday, November 10, 2011

Page 21: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Aside: RDBMS performance

21Data complexity

Perfo

rman

ce

Majority ofWebapps

Social network

Location-based services

Salary List

}custom

Relational database

Requirement of application

21Thursday, November 10, 2011

Page 22: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Trend 4: Architecture

22

DB

Application

1980s: Application (<-- note lack of s)

22Thursday, November 10, 2011

Page 23: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Trend 4: Architecture

23

DB

Application

1990s: Database as integration hub

Application Application

23Thursday, November 10, 2011

Page 24: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

DBDB DB

Trend 4: Architecture

24

Service

2000s: (moving towards) Decoupled serviceswith their own backend

Service Service

24Thursday, November 10, 2011

Page 25: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Why NOSQL Now?

๏Trend 1: Size

๏Trend 2: Connectedness

๏Trend 3: Semi-structure

๏Trend 4: Architecture

25

25Thursday, November 10, 2011

Page 26: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

26

Four product categories

NOSQL

26Thursday, November 10, 2011

Page 27: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Category 1: Key-Value stores

27

๏Lineage:

• “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)

๏Data model:

•Global key-value mapping

•Think: Globally available HashMap/Dict/etc

๏Examples:

• Project Voldemort

•Riak

• 27Thursday, November 10, 2011

Page 28: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Category II: ColumnFamily (BigTable) stores

28

๏Lineage:

• “Bigtable: A Distributed Storage System for Structured Data” (2006)

๏Data model:

•A big table, with column families

๏Examples:

•HBase

•HyperTable

•Cassandra28Thursday, November 10, 2011

Page 29: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Category III: Document databases

29

๏Lineage:

• Lotus Notes

๏Data model:

•Collections of documents

•A document is a key-value collection

๏Examples:

•CouchDB

•MongoDB

29Thursday, November 10, 2011

Page 30: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Document db: An example

30

๏How would we model a blogging software?

๏One stab:

•Represent each Blog as a Collection of Post documents

•Represent Comments as nested documents in the Post documents

30Thursday, November 10, 2011

Page 31: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Document db: Creating a blog post

31

import com.mongodb.Mongo;import com.mongodb.DB;import com.mongodb.DBCollection;import com.mongodb.BasicDBObject;import com.mongodb.DBObject;// ...Mongo mongo = new Mongo( "localhost" ); // Connect to MongoDB// ...DB blogs = mongo.getDB( "blogs" ); // Access the blogs databaseDBCollection myBlog = blogs.getCollection( "Thobe’s blog" );

DBObject blogPost = new BasicDBObject();blogPost.put( "title", "ApacheCon 2011" );blogPost.put( "pub_date", new Date() );blogPost.put( "body", "Publishing a post about ApacheCon in my

MongoDB blog!" );blogPost.put( "tags", Arrays.asList( "conference", "names" ) );blogPost.put( "comments", new ArrayList() );

myBlog.insert( blogPost );

31Thursday, November 10, 2011

Page 32: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Retrieving posts// ...import com.mongodb.DBCursor;// ...

public Object getAllPosts( String blogName ) {DBCollection blog = db.getCollection( blogName );return renderPosts( blog.find() );

}

private Object renderPosts( DBCursor cursor ) {// order by publication date (descending)cursor = cursor.sort( new BasicDBObject( "pub_date", -1 ) );// ...

}

32

32Thursday, November 10, 2011

Page 33: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Category IV: Graph databases

33

๏Lineage:

• Euler and graph theory

๏Data model:

•Nodes with properties

•Typed relationships with properties

๏Examples:

• InfiniteGraph

•Neo4j

33Thursday, November 10, 2011

Page 34: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Property Graph model

34

34Thursday, November 10, 2011

Page 35: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Property Graph model

35

LIVES WITHLOVES

OWNSDRIVES

LOVES

35Thursday, November 10, 2011

Page 36: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Property Graph model

36

LIVES WITHLOVES

OWNSDRIVES

LOVESname: “James”age: 32twitter: “@spam”

name: “Mary”age: 35

brand: “Volvo”model: “V70”

property type: “car”

36Thursday, November 10, 2011

Page 37: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Graphs are whiteboard friendly

37Image credits: Tobias Ivarsson

An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database.With a Graph Database the model from the whiteboard is implemented directly.

37Thursday, November 10, 2011

Page 38: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Graphs are whiteboard friendly

38

thobe

Wardrobe Strength

Joe project blog

Hello Joe

Neo4j performance analysis

Modularizing Jython

Image credits: Tobias Ivarsson

An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database.With a Graph Database the model from the whiteboard is implemented directly.

38Thursday, November 10, 2011

Page 39: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Graph db: Creating a social graph

39

GraphDatabaseService graphDb = new EmbeddedGraphDatabase(GRAPH_STORAGE_LOCATION );

Transaction tx = graphDb.beginTx();try {

Node mrAnderson = graphDb.createNode();mrAnderson.setProperty( "name", "Thomas Anderson" );mrAnderson.setProperty( "age", 29 );

Node morpheus = graphDb.createNode();morpheus.setProperty( "name", "Morpheus" );morpheus.setProperty( "rank", "Captain" );

Relationship friendship = mrAnderson.createRelationshipTo(morpheus, SocialGraphTypes.FRIENDSHIP );

tx.success();} finally {

tx.finish();}

39Thursday, November 10, 2011

Page 40: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Graph db: How do I know this person?Node me = ...Node you = ...

PathFinder shortestPathFinder = GraphAlgoFactory.shortestPath(Traversals.expanderForTypes(

SocialGraphTypes.FRIENDSHIP, Direction.BOTH ),/* maximum depth: */ 4 );

Path shortestPath = shortestPathFinder.findSinglePath(me, you);

for ( Node friend : shortestPath.nodes() ) {System.out.println( friend.getProperty( "name" ) );

}

40

40Thursday, November 10, 2011

Page 41: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Four emerging NOSQL categories

๏ Key-Value stores

๏ ColumnFamiy stores

๏ Document databases

๏ Graph databases

41

41Thursday, November 10, 2011

Page 42: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Scaling to size vs. Scaling to complexity

42

Coping with Size

Coping with Complexity

Key/Value stores

ColumnFamily stores

Document databases

Graph databases

My subjective view: > 90% of use cases

100+ billion of nodesand relationships

42Thursday, November 10, 2011

Page 43: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

43

a brief excursion into the past

NOSQL

43Thursday, November 10, 2011

Page 44: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

44

2006 2007 2008 2009 2010 2011 2012 2013 2014 ... 20192005

NOSQL

1976 1977 1978 1979 1980 1981 1982 1983 1984 ... 19891975

SQL

Graph theory researched

Bigtable paper

Dynamo paper

Cambrian explosion of

projects

Oracle announces a NOSQL db

“NOSQL” coined $1.5B

“NOSQL” market?

Only four left?

Only the “Big

Four” left

$1.5BRDBMSmarket

Oracle takes VC funding

Cambrian explosion of

vendors

Oracle starts selling

products

All databases still

in-house

Oracle incorporated

“SEQUEL” invented

RDBMS theory

researched

44Thursday, November 10, 2011

Page 45: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

45

The Future

NOSQL

45Thursday, November 10, 2011

Page 46: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Four trends

46

๏ More ACIDity

• Mongo adding durable logging storage in 1.7

• Tunable consistency in Apache Cassandra

• Roger Bodamer

‣ uptime ( CP + average developer )>=uptime ( AP + average developer )http://www.slideshare.net/iammutex/q-con-sf10rogerbodamer

• Makes sense - why push the burden to the developer when eventually consistency is not needed in most scenarios?

46Thursday, November 10, 2011

Page 47: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

47

๏ More query languages

• In the past year, many prominent NOSQL databases have invested heavily in query languages

•Cassandra: CQL

•Couchbase: UnQL

•Neo4j: Cypher

•Mongo’s had it from the get go? <--- One reason for their popularity?

Four trends

47Thursday, November 10, 2011

Page 48: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

48

๏ More schemas?

•Analogously, why push the full burden of schema freedom to the developer?

•Over time, I believe we will see more schema-like support in most NOSQL stores

•At least in document databases and graph databases, who have the richest models

•Granted, we haven’t really seen that yet

Four trends

48Thursday, November 10, 2011

Page 49: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

49

๏ Polyglot persistence will drive middleware support

•The era of the One Size Fits All Database is over

• Ergo, any given system will typically work at runtime with multiple databases

•That’s all fine and dandy, except it’s not because it’s a pain

•This trend will demand a lot of middleware support

Four trends

49Thursday, November 10, 2011

Page 50: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Middleware support?๏Lemme tell you the story about Mike and his restaurant site

50

50Thursday, November 10, 2011

Page 51: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Domain & data model

51

Restaurant@Entitypublic class Restaurant {

@Id @GeneratedValue private Long id; private String name; private String city; private String state; private String zipCode;

UserAccount@Entity@Table(name = "user_account")public class UserAccount { @Id @GeneratedValue private Long id; private String userName; private String firstName; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites;

51Thursday, November 10, 2011

Page 52: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Step 1: Buildsing a web site

52

MySQL

Tomcat

One box

52Thursday, November 10, 2011

Page 53: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Step II: Whoa, ppl are actually using it?

53

MySQL

Tomcat

Two boxes

53Thursday, November 10, 2011

Page 54: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Step III: That’s a LOT of pages served...

54

MySQL

Tomcat n boxesTomcat Tomcat

1 box

54Thursday, November 10, 2011

Page 55: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Step IV: Our DB is completely overwhelmed...

55

MySQL (m)

Tomcat n boxesTomcat Tomcat

MySQL (s) n boxes

55Thursday, November 10, 2011

Page 56: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Step V: Our DBs are STILL overwhelmed

?56

56Thursday, November 10, 2011

Page 57: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

What does the site look like now?

57

57Thursday, November 10, 2011

Page 58: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Step V: Our DBs are STILL overwhelmed๏Turns out the problem is due to joins

๏A while back Mike introduced a new feature

•Recommend restaurants based on the user’s friends (and friends of friends)

•Whoa, recommendations aren’t just simple get and put!

•They’re killing us with joins

๏What about sharding?

๏What about SSDs?58

58Thursday, November 10, 2011

Page 59: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Polyglot persistence (Not Only SQL)๏How did we get into this situation?

๏Well, data sets are increasingly less uniform

•Parts of Mike’s data fits well in an RDBMS

•But parts of it is graph-shaped

‣It fits much better in a graph database!

‣And I’m sure that there is or will be very key-value-esque parts of the dataset

๏Simple, just store some of it in a graph db and some of it in MySQL!But what does the code look like? 59

59Thursday, November 10, 2011

Page 60: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

We were here

60

Restaurant@Entitypublic class Restaurant {

@Id @GeneratedValue private Long id; private String name; private String city; private String state; private String zipCode;

UserAccount@Entity@Table(name = "user_account")public class UserAccount { @Id @GeneratedValue private Long id; private String userName; private String firstName; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites;

60Thursday, November 10, 2011

Page 61: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Then we added recommendations

61

Recommendation

@RelationshipEntitypublic class Recommendation { @StartNode private UserAccount user; @EndNode private Restaurant restaurant; private int stars; private String comment;

Restaurant@Entity@NodeEntity(partial = true)public class Restaurant { @Id @GeneratedValue private Long id; private String name; private String city; private String state; private String zipCode;

UserAccount@Entity@Table(name = "user_account")@NodeEntity(partial = true)public class UserAccount { @Id @GeneratedValue private Long id; private String userName; private String firstName; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites;

@GraphProperty String nickname; @RelatedTo(type = "friends", elementClass = UserAccount.class) Set<UserAccount> friends; @RelatedToVia(type = "recommends", elementClass = Recommendation.class) Iterable<Recommendation> recommendations;

61Thursday, November 10, 2011

Page 62: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Then we added recommendations

62

Recommendation

@RelationshipEntitypublic class Recommendation { @StartNode private UserAccount user; @EndNode private Restaurant restaurant; private int stars; private String comment;

Restaurant@Entity@NodeEntity(partial = true)public class Restaurant { @Id @GeneratedValue private Long id; private String name; private String city; private String state; private String zipCode;

UserAccount@Entity@Table(name = "user_account")@NodeEntity(partial = true)public class UserAccount { @Id @GeneratedValue private Long id; private String userName; private String firstName; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites;

@GraphProperty String nickname; @RelatedTo(type = "friends", elementClass = UserAccount.class) Set<UserAccount> friends; @RelatedToVia(type = "recommends", elementClass = Recommendation.class) Iterable<Recommendation> recommendations;

62Thursday, November 10, 2011

Page 63: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

And we’re back to talking about Middleware๏This example was from the Spring Data project

• Specifically Spring Data Neo4j

•Available at: http://www.springsource.org/spring-data/neo4j

• Spring Data Neo4j 2.0 will be released RSN, RC out NOW

๏But this really should be available in any middleware stack

•Ask Redhat / JBoss

•Ask David & the brogrammers

•Ask Sun / Oracle

•Ask Microsoft 63

63Thursday, November 10, 2011

Page 64: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Four NOSQL trends

๏ More ACIDity

๏ More query languages

๏ More schemas

๏ More middleware support

64

64Thursday, November 10, 2011

Page 65: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

65

Conclusion

65Thursday, November 10, 2011

Page 66: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Ait, what’s your point?๏There’s an explosion of ‘nosql’ databases out there

• Some are immature and experimental

• Some are coming out of years of battle-hardened production

๏NOSQL is about finding the right tool for the job

• Sometimes that’s an RDBMS

•But increasingly commonly a NOSQL db is the perfect fit

๏We will have heterogenous data backends in the future

•Now the rest of the stack needs to step up and help developers cope with that 66

66Thursday, November 10, 2011

Page 67: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Not Only SQLis here

67

Key takeaway

67Thursday, November 10, 2011

Page 68: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Not Only SQLis here to stay

68

Key takeaway

68Thursday, November 10, 2011

Page 69: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

Not Only SQLis FUN - dl & play

around now!69

Key takeaway

69Thursday, November 10, 2011

Page 70: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

70

Image credits: Lost! Sorry... :(

Questions?

70Thursday, November 10, 2011

Page 71: NOSQL - ApacheConarchive.apachecon.com/.../B_1000_Eifrem_NoSQL.pdf · 2011-11-11 · Analogously, why push the full burden of schema freedom to the developer? • Over time, I believe

http://neotechnology.com

71Thursday, November 10, 2011