nosql - apacheconarchive.apachecon.com/.../b_1000_eifrem_nosql.pdf · 2011-11-11 · analogously,...
TRANSCRIPT
NOSQLthe past, present and future
Emil Eifrem
@emileifrem
#neo4j
1Thursday, November 10, 2011
So what’s the plan?
๏The title page already told you, but out of order(*). Actual order:
•The Present
•The Past
•And The Future of NOSQL
๏Then lunch.
2
(*) Out of order? Well, the title page did inorder traversal (left node, root, right node), and since most folks aren’t graph geeks I’m going to go with the more intuitive preorder traversal (root, left, right) in the actual presentation.
2Thursday, November 10, 2011
Anyway...
NOSQL: The Present
3
3Thursday, November 10, 2011
First off: the name
4
๏WE ALL HATES IT, M’KAY?
4Thursday, November 10, 2011
NOSQL is NOT...
๏ NO to SQL
๏ NEVER SQL
5
5Thursday, November 10, 2011
Not Only SQL
6
NOSQL is simply
6Thursday, November 10, 2011
7
Four trends
NOSQL - Why now?
7Thursday, November 10, 2011
Source: IDC 2007
Trend 1:data set size
200740
8Thursday, November 10, 2011
200740
2010
988
Trend 1:data set size
Source: IDC 20079Thursday, November 10, 2011
Trend 2: Connectedness
101990
Info
rmat
ion
conn
ectiv
ity
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
10Thursday, November 10, 2011
Trend 2: Connectedness
11
Text documents
1990
Info
rmat
ion
conn
ectiv
ity
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
11Thursday, November 10, 2011
Trend 2: Connectedness
12
Text documents
1990
Info
rmat
ion
conn
ectiv
ity
Hypertext
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
12Thursday, November 10, 2011
Trend 2: Connectedness
13
Text documents
1990
Info
rmat
ion
conn
ectiv
ity Folksonomies
Tagging
User-generated content
Wikis
RSS
Blogs
Hypertext
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
13Thursday, November 10, 2011
Trend 2: Connectedness
14
Text documents
1990
Info
rmat
ion
conn
ectiv
ity Folksonomies
Tagging
User-generated content
Wikis
RSS
Blogs
Hypertext
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
Ontologies
RDF
GiantGlobal
Graph (GGG)
14Thursday, November 10, 2011
Trend 2: Connectedness
15
Text documents
1990
Info
rmat
ion
conn
ectiv
ity Folksonomies
Tagging
User-generated content
Wikis
RSS
Blogs
Hypertext
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
Ontologies
RDF
GiantGlobal
Graph (GGG)
15Thursday, November 10, 2011
Trend 3: Semi-structure
16
๏ Individualization of content
• In the salary lists of the 1970s, all elements had exactly one job
• In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?
๏All encompassing “entire world views”
• Store more data about each entity
๏Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”)
16Thursday, November 10, 2011
Aside: RDBMS performance
17Data complexity
Perfo
rman
ce
Relational database
Requirement of application
17Thursday, November 10, 2011
Aside: RDBMS performance
18Data complexity
Perfo
rman
ce
Relational database
Requirement of application
18Thursday, November 10, 2011
Aside: RDBMS performance
19Data complexity
Perfo
rman
ce
Salary List Relational database
Requirement of application
19Thursday, November 10, 2011
Aside: RDBMS performance
20Data complexity
Perfo
rman
ce
Majority ofWebapps
Salary List Relational database
Requirement of application
20Thursday, November 10, 2011
Aside: RDBMS performance
21Data complexity
Perfo
rman
ce
Majority ofWebapps
Social network
Location-based services
Salary List
}custom
Relational database
Requirement of application
21Thursday, November 10, 2011
Trend 4: Architecture
22
DB
Application
1980s: Application (<-- note lack of s)
22Thursday, November 10, 2011
Trend 4: Architecture
23
DB
Application
1990s: Database as integration hub
Application Application
23Thursday, November 10, 2011
DBDB DB
Trend 4: Architecture
24
Service
2000s: (moving towards) Decoupled serviceswith their own backend
Service Service
24Thursday, November 10, 2011
Why NOSQL Now?
๏Trend 1: Size
๏Trend 2: Connectedness
๏Trend 3: Semi-structure
๏Trend 4: Architecture
25
25Thursday, November 10, 2011
26
Four product categories
NOSQL
26Thursday, November 10, 2011
Category 1: Key-Value stores
27
๏Lineage:
• “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)
๏Data model:
•Global key-value mapping
•Think: Globally available HashMap/Dict/etc
๏Examples:
• Project Voldemort
•Riak
• 27Thursday, November 10, 2011
Category II: ColumnFamily (BigTable) stores
28
๏Lineage:
• “Bigtable: A Distributed Storage System for Structured Data” (2006)
๏Data model:
•A big table, with column families
๏Examples:
•HBase
•HyperTable
•Cassandra28Thursday, November 10, 2011
Category III: Document databases
29
๏Lineage:
• Lotus Notes
๏Data model:
•Collections of documents
•A document is a key-value collection
๏Examples:
•CouchDB
•MongoDB
29Thursday, November 10, 2011
Document db: An example
30
๏How would we model a blogging software?
๏One stab:
•Represent each Blog as a Collection of Post documents
•Represent Comments as nested documents in the Post documents
30Thursday, November 10, 2011
Document db: Creating a blog post
31
import com.mongodb.Mongo;import com.mongodb.DB;import com.mongodb.DBCollection;import com.mongodb.BasicDBObject;import com.mongodb.DBObject;// ...Mongo mongo = new Mongo( "localhost" ); // Connect to MongoDB// ...DB blogs = mongo.getDB( "blogs" ); // Access the blogs databaseDBCollection myBlog = blogs.getCollection( "Thobe’s blog" );
DBObject blogPost = new BasicDBObject();blogPost.put( "title", "ApacheCon 2011" );blogPost.put( "pub_date", new Date() );blogPost.put( "body", "Publishing a post about ApacheCon in my
MongoDB blog!" );blogPost.put( "tags", Arrays.asList( "conference", "names" ) );blogPost.put( "comments", new ArrayList() );
myBlog.insert( blogPost );
31Thursday, November 10, 2011
Retrieving posts// ...import com.mongodb.DBCursor;// ...
public Object getAllPosts( String blogName ) {DBCollection blog = db.getCollection( blogName );return renderPosts( blog.find() );
}
private Object renderPosts( DBCursor cursor ) {// order by publication date (descending)cursor = cursor.sort( new BasicDBObject( "pub_date", -1 ) );// ...
}
32
32Thursday, November 10, 2011
Category IV: Graph databases
33
๏Lineage:
• Euler and graph theory
๏Data model:
•Nodes with properties
•Typed relationships with properties
๏Examples:
• InfiniteGraph
•Neo4j
33Thursday, November 10, 2011
Property Graph model
34
34Thursday, November 10, 2011
Property Graph model
35
LIVES WITHLOVES
OWNSDRIVES
LOVES
35Thursday, November 10, 2011
Property Graph model
36
LIVES WITHLOVES
OWNSDRIVES
LOVESname: “James”age: 32twitter: “@spam”
name: “Mary”age: 35
brand: “Volvo”model: “V70”
property type: “car”
36Thursday, November 10, 2011
Graphs are whiteboard friendly
37Image credits: Tobias Ivarsson
An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database.With a Graph Database the model from the whiteboard is implemented directly.
37Thursday, November 10, 2011
Graphs are whiteboard friendly
38
thobe
Wardrobe Strength
Joe project blog
Hello Joe
Neo4j performance analysis
Modularizing Jython
Image credits: Tobias Ivarsson
An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database.With a Graph Database the model from the whiteboard is implemented directly.
38Thursday, November 10, 2011
Graph db: Creating a social graph
39
GraphDatabaseService graphDb = new EmbeddedGraphDatabase(GRAPH_STORAGE_LOCATION );
Transaction tx = graphDb.beginTx();try {
Node mrAnderson = graphDb.createNode();mrAnderson.setProperty( "name", "Thomas Anderson" );mrAnderson.setProperty( "age", 29 );
Node morpheus = graphDb.createNode();morpheus.setProperty( "name", "Morpheus" );morpheus.setProperty( "rank", "Captain" );
Relationship friendship = mrAnderson.createRelationshipTo(morpheus, SocialGraphTypes.FRIENDSHIP );
tx.success();} finally {
tx.finish();}
39Thursday, November 10, 2011
Graph db: How do I know this person?Node me = ...Node you = ...
PathFinder shortestPathFinder = GraphAlgoFactory.shortestPath(Traversals.expanderForTypes(
SocialGraphTypes.FRIENDSHIP, Direction.BOTH ),/* maximum depth: */ 4 );
Path shortestPath = shortestPathFinder.findSinglePath(me, you);
for ( Node friend : shortestPath.nodes() ) {System.out.println( friend.getProperty( "name" ) );
}
40
40Thursday, November 10, 2011
Four emerging NOSQL categories
๏ Key-Value stores
๏ ColumnFamiy stores
๏ Document databases
๏ Graph databases
41
41Thursday, November 10, 2011
Scaling to size vs. Scaling to complexity
42
Coping with Size
Coping with Complexity
Key/Value stores
ColumnFamily stores
Document databases
Graph databases
My subjective view: > 90% of use cases
100+ billion of nodesand relationships
42Thursday, November 10, 2011
43
a brief excursion into the past
NOSQL
43Thursday, November 10, 2011
44
2006 2007 2008 2009 2010 2011 2012 2013 2014 ... 20192005
NOSQL
1976 1977 1978 1979 1980 1981 1982 1983 1984 ... 19891975
SQL
Graph theory researched
Bigtable paper
Dynamo paper
Cambrian explosion of
projects
Oracle announces a NOSQL db
“NOSQL” coined $1.5B
“NOSQL” market?
Only four left?
Only the “Big
Four” left
$1.5BRDBMSmarket
Oracle takes VC funding
Cambrian explosion of
vendors
Oracle starts selling
products
All databases still
in-house
Oracle incorporated
“SEQUEL” invented
RDBMS theory
researched
44Thursday, November 10, 2011
45
The Future
NOSQL
45Thursday, November 10, 2011
Four trends
46
๏ More ACIDity
• Mongo adding durable logging storage in 1.7
• Tunable consistency in Apache Cassandra
• Roger Bodamer
‣ uptime ( CP + average developer )>=uptime ( AP + average developer )http://www.slideshare.net/iammutex/q-con-sf10rogerbodamer
• Makes sense - why push the burden to the developer when eventually consistency is not needed in most scenarios?
46Thursday, November 10, 2011
47
๏ More query languages
• In the past year, many prominent NOSQL databases have invested heavily in query languages
•Cassandra: CQL
•Couchbase: UnQL
•Neo4j: Cypher
•Mongo’s had it from the get go? <--- One reason for their popularity?
Four trends
47Thursday, November 10, 2011
48
๏ More schemas?
•Analogously, why push the full burden of schema freedom to the developer?
•Over time, I believe we will see more schema-like support in most NOSQL stores
•At least in document databases and graph databases, who have the richest models
•Granted, we haven’t really seen that yet
Four trends
48Thursday, November 10, 2011
49
๏ Polyglot persistence will drive middleware support
•The era of the One Size Fits All Database is over
• Ergo, any given system will typically work at runtime with multiple databases
•That’s all fine and dandy, except it’s not because it’s a pain
•This trend will demand a lot of middleware support
Four trends
49Thursday, November 10, 2011
Middleware support?๏Lemme tell you the story about Mike and his restaurant site
50
50Thursday, November 10, 2011
Domain & data model
51
Restaurant@Entitypublic class Restaurant {
@Id @GeneratedValue private Long id; private String name; private String city; private String state; private String zipCode;
UserAccount@Entity@Table(name = "user_account")public class UserAccount { @Id @GeneratedValue private Long id; private String userName; private String firstName; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites;
51Thursday, November 10, 2011
Step 1: Buildsing a web site
52
MySQL
Tomcat
One box
52Thursday, November 10, 2011
Step II: Whoa, ppl are actually using it?
53
MySQL
Tomcat
Two boxes
53Thursday, November 10, 2011
Step III: That’s a LOT of pages served...
54
MySQL
Tomcat n boxesTomcat Tomcat
1 box
54Thursday, November 10, 2011
Step IV: Our DB is completely overwhelmed...
55
MySQL (m)
Tomcat n boxesTomcat Tomcat
MySQL (s) n boxes
55Thursday, November 10, 2011
Step V: Our DBs are STILL overwhelmed
?56
56Thursday, November 10, 2011
What does the site look like now?
57
57Thursday, November 10, 2011
Step V: Our DBs are STILL overwhelmed๏Turns out the problem is due to joins
๏A while back Mike introduced a new feature
•Recommend restaurants based on the user’s friends (and friends of friends)
•Whoa, recommendations aren’t just simple get and put!
•They’re killing us with joins
๏What about sharding?
๏What about SSDs?58
58Thursday, November 10, 2011
Polyglot persistence (Not Only SQL)๏How did we get into this situation?
๏Well, data sets are increasingly less uniform
•Parts of Mike’s data fits well in an RDBMS
•But parts of it is graph-shaped
‣It fits much better in a graph database!
‣And I’m sure that there is or will be very key-value-esque parts of the dataset
๏Simple, just store some of it in a graph db and some of it in MySQL!But what does the code look like? 59
59Thursday, November 10, 2011
We were here
60
Restaurant@Entitypublic class Restaurant {
@Id @GeneratedValue private Long id; private String name; private String city; private String state; private String zipCode;
UserAccount@Entity@Table(name = "user_account")public class UserAccount { @Id @GeneratedValue private Long id; private String userName; private String firstName; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites;
60Thursday, November 10, 2011
Then we added recommendations
61
Recommendation
@RelationshipEntitypublic class Recommendation { @StartNode private UserAccount user; @EndNode private Restaurant restaurant; private int stars; private String comment;
Restaurant@Entity@NodeEntity(partial = true)public class Restaurant { @Id @GeneratedValue private Long id; private String name; private String city; private String state; private String zipCode;
UserAccount@Entity@Table(name = "user_account")@NodeEntity(partial = true)public class UserAccount { @Id @GeneratedValue private Long id; private String userName; private String firstName; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites;
@GraphProperty String nickname; @RelatedTo(type = "friends", elementClass = UserAccount.class) Set<UserAccount> friends; @RelatedToVia(type = "recommends", elementClass = Recommendation.class) Iterable<Recommendation> recommendations;
61Thursday, November 10, 2011
Then we added recommendations
62
Recommendation
@RelationshipEntitypublic class Recommendation { @StartNode private UserAccount user; @EndNode private Restaurant restaurant; private int stars; private String comment;
Restaurant@Entity@NodeEntity(partial = true)public class Restaurant { @Id @GeneratedValue private Long id; private String name; private String city; private String state; private String zipCode;
UserAccount@Entity@Table(name = "user_account")@NodeEntity(partial = true)public class UserAccount { @Id @GeneratedValue private Long id; private String userName; private String firstName; private String lastName; @Temporal(TemporalType.TIMESTAMP) private Date birthDate; @ManyToMany(cascade = CascadeType.ALL) private Set<Restaurant> favorites;
@GraphProperty String nickname; @RelatedTo(type = "friends", elementClass = UserAccount.class) Set<UserAccount> friends; @RelatedToVia(type = "recommends", elementClass = Recommendation.class) Iterable<Recommendation> recommendations;
62Thursday, November 10, 2011
And we’re back to talking about Middleware๏This example was from the Spring Data project
• Specifically Spring Data Neo4j
•Available at: http://www.springsource.org/spring-data/neo4j
• Spring Data Neo4j 2.0 will be released RSN, RC out NOW
๏But this really should be available in any middleware stack
•Ask Redhat / JBoss
•Ask David & the brogrammers
•Ask Sun / Oracle
•Ask Microsoft 63
63Thursday, November 10, 2011
Four NOSQL trends
๏ More ACIDity
๏ More query languages
๏ More schemas
๏ More middleware support
64
64Thursday, November 10, 2011
65
Conclusion
65Thursday, November 10, 2011
Ait, what’s your point?๏There’s an explosion of ‘nosql’ databases out there
• Some are immature and experimental
• Some are coming out of years of battle-hardened production
๏NOSQL is about finding the right tool for the job
• Sometimes that’s an RDBMS
•But increasingly commonly a NOSQL db is the perfect fit
๏We will have heterogenous data backends in the future
•Now the rest of the stack needs to step up and help developers cope with that 66
66Thursday, November 10, 2011
Not Only SQLis here
67
Key takeaway
67Thursday, November 10, 2011
Not Only SQLis here to stay
68
Key takeaway
68Thursday, November 10, 2011
Not Only SQLis FUN - dl & play
around now!69
Key takeaway
69Thursday, November 10, 2011
70
Image credits: Lost! Sorry... :(
Questions?
70Thursday, November 10, 2011
http://neotechnology.com
71Thursday, November 10, 2011