Try NoSQL: it doesn't hurt and it's fun
DESCRIPTION
This presentation is a review of the NoSQL space that I gave at the X Jornades de Programari Lliure in Barcelona. It covers the NoSQL movement, its use cases, a technology review, and a special look at graph databases, and more. Special thanks to @Hagenburger, @sbitxu, @jannis, the inspiration of the great @jimwebber, and the amazing community.
TRANSCRIPT
Pere Urbón-Bayes
Moviepilot GmbH
NoSQL
Thursday, 30 June 2011
What are we going to talk about?
Where are we, and where do we come from?
NoSQL. { “motivation” : “use cases” }
Graph databases.
....
{
"if_you":{
"are_the_master_of": [ "movies", "data analytics", "ruby", "git", "nosql" ],
"love":"recommendation systems",
"would_love_to_know_about":"graph_databases",
"believe_in":"open source"
},
"join_us":"true",
"contact_with":"jobs at moviepilot.com"
}
Moviepilot is a leading provider and discovery service for movies and series based in Berlin!
History
Coming and going
1960s: Navigational databases.
1970: Relational databases.
Edgar Codd's relational algebra.
Late 1970s: SQL DBMSs.
SQL, DB2, Ingres, PostgreSQL, Sybase, ...
Where are we now?
[Timeline figure, 1990 / 2000 / 2010 / 2030: Text files, Social networks, Blogs, Tagging, RDF, Semantic Web, Folksonomies, Linked Data, Business Intelligence, RDBMS]
Is everything related?
Where are we now?
[Timeline figure: 1980, 1990, 2000, 2010, 2020]
What are our apps like?
Data warehousing and Business Intelligence.
Stream processing.
Text search.
Scientific processing.
Semi-structured and unstructured data.
What are our apps like?
Need to scale horizontally.
Partitioning and replication.
OLTP and OLAP.
Web 2.0.
Performance, performance, performance.
Flexibility.
Big, even huge, datasets.
.....?
NoSQL
select fun, profit from real_world where relational=false and barcelona=true;
Carlo Strozzi, 1998.
Eric Evans (Rackspace) and Johan Oskarsson (Last.fm), early 2009.
no:sql(east) 2009, no:sql(eu) 2010.
NoSQL
Ability to scale horizontally.
Replication and distribution.
Weaker concurrency model.
Smart use of resources.
Access through different endpoints.
Dynamic schema environment.
Leave more business logic to the application side.
Store. Dismantle. Rebuild.
Brick, Window, Roof.
Unstructured?
Enjoy.
Unstructured vs. Structured
ACID
Atomicity: All operations are executed, or none is.
Consistency: Data is consistent after the transaction.
Isolation: Transactions are independent.
Durability: Changes persist, even if failures occur.
Helps
Understand data. Persistence guaranteed.
Hurts
Horizontal scale. High availability.
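To make the atomicity point on this slide concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The accounts table, the names, and the transfer function are assumptions made up for illustration; they are not part of the talk. A transfer that would overdraw an account fails halfway through, and the whole transaction is rolled back.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two made-up accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; all or nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) on a fresh database returns False and leaves both balances untouched, even though the credit to bob had already been executed inside the transaction.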
“There is a magic bullet! It's called relaxing the requirements.”
- Evan Weaver, @evan
CAP
Consistency: Each client has the same view.
Availability: All clients can read and write.
Partition Tolerance: Works well across different network partitions.
Examples shown on the C-A-P diagram: mysql, redis, riak.
Only two!
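The "only two" trade-off can be sketched with a toy two-replica store. This is an illustration under stated assumptions, not a description of how mysql, redis, or riak behave: the class name, the "CP" and "AP" mode labels, and the partitioned flag are all hypothetical. During a partition, the "CP" mode refuses writes to stay consistent, while the "AP" mode accepts them and lets the replicas diverge.

```python
class TwoReplicaStore:
    """Toy store with two replicas; `mode` is "CP" or "AP" (assumed labels)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False  # set to True to simulate a network split

    def write(self, replica_index, key, value):
        """Return True if the write was accepted."""
        if not self.partitioned:
            # Healthy network: every replica receives the write.
            for replica in self.replicas:
                replica[key] = value
            return True
        if self.mode == "CP":
            # Give up availability to keep both replicas identical.
            return False
        # "AP": stay available, accept the write on one side only.
        self.replicas[replica_index][key] = value
        return True

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)
```

With partitioned set to True, a "CP" store rejects the write and both replicas still agree; an "AP" store accepts it, and the two replicas then return different values for the same key until they are reconciled.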
“You have database problem. You research blog and HN. You start use NoSQL product. Now you not know anymore if you have problem.”
- Devops BORAT, @devops_borat
NoSQL systems
Most common
Column DBs.
Document DBs.
Key-Value DBs.
Graph DBs.
Object DBs.
Other systems
XML Databases
Grid Databases.
RDF.
....
Column Databases
A DBMS that stores its content by column rather than by row. This has advantages for data warehouses.
More efficient for aggregates and when data access is column-oriented.
Better suited to OLAP than to OLTP.
First implementations: early 1970s.
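The row-versus-column distinction above can be sketched in a few lines. The sample records and field names are made up for illustration. In the row layout an aggregate has to walk every full record; in the column layout it reads a single contiguous list, which is why column stores tend to suit OLAP-style queries.

```python
# Row-oriented layout: one record per row (made-up sample data).
rows = [
    {"title": "movie_a", "year": 2009, "rating": 8},
    {"title": "movie_b", "year": 2010, "rating": 9},
    {"title": "movie_c", "year": 2011, "rating": 7},
]


def to_columns(rows):
    """Pivot a list of uniform records into one list per field."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def average_rating_rows(rows):
    # Touches every record, including fields the query does not need.
    return sum(row["rating"] for row in rows) / len(rows)


def average_rating_columns(columns):
    # Touches only the "rating" column.
    ratings = columns["rating"]
    return sum(ratings) / len(ratings)


columns = to_columns(rows)
```

Both functions return the same average; the difference is how much data each one has to read to compute it.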
Apache Cassandra
Designed to handle very large amounts of data spread across multiple commodity servers.
High availability with no single point of failure (SPOF).
Born at Facebook, to power Inbox Search.
Hybrid system, between column and row storage.
Initial release 2008. Version 0.8.1 on 28/06/11.
Key-Value Databases
Allow the user to store key-value pairs, where the key usually consists of a string and the value is a simple primitive.
Suited for use cases where properties and values are enough, e.g. profiles, logs, etc.
Variants: eventually consistent, hierarchical, multivalued, etc.
First implementations: around 1980.
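The key-value model described above boils down to a very small interface. The following in-memory sketch is only an illustration of that interface; the class and method names are assumptions, and it has no persistence, replication, or networking.

```python
class KeyValueStore:
    """Minimal in-memory key-value store with string keys."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys are strings in this sketch")
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed."""
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("profile:42:name", "John Smith")
store.put("profile:42:visits", 3)
```

There are no queries beyond lookup by key, which is what keeps such stores simple to partition: the key alone decides where a value lives.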
Redis.io
Open-source, networked, in-memory, persistent, journaled key-value datastore.
Bindings for the major languages.
The data structure storage system.
Master-slave replication. High performance.
Initial release 2009. Version 2.2.7 on 11/05/11.
Document Databases
A DBMS where the default unit of storage is a document: XML, JSON, YAML, ...
More complex than a key-value store.
Suited for multi-document apps: news, CVs, ...
Eventual consistency, limited atomicity and isolation.
One of the first: Lotus Notes, 1989.
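A rough sketch of the document model, assuming JSON documents held in a plain Python list: the sample documents, field names, and the find helper are hypothetical and do not correspond to any particular product's API. Note that the documents in one collection do not have to share the same fields.

```python
import json

# Made-up documents; each one carries its own structure.
raw_documents = [
    '{"type": "news", "title": "NoSQL meetup", "tags": ["nosql", "barcelona"]}',
    '{"type": "cv", "name": "John Smith", "skills": ["ruby", "git"]}',
    '{"type": "news", "title": "Graph databases", "author": "anonymous"}',
]

collection = [json.loads(raw) for raw in raw_documents]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]
```

For example, find(collection, type="news") returns the two news documents, even though only one of them has a "tags" field and only the other has an "author" field.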
OrientDB
Open-source database written in Java.
Schema-[full,less,mix] modes.
Supports SQL, ACID compliance, HTTP, REST and JSON. Distributed and scalable.
Light and embeddable. Bindings for most languages.
Initial release 2010. Version 1.0rc2 on 17/06/11.
Graph Databases
A database that uses graph structures with nodes, edges, and properties.
Suited for associative datasets and for mapping object-oriented application structures. Avoids expensive joins.
Powerful for graph-like operations such as shortest paths, community detection, etc.
First implementations: around 2007.
Graph Databases
What is a graph?
Graph G = (V, E), where V = {v1, v2, ..., vN} and E = {e1, e2, ..., eM}
Directed / Undirected
Mixed
Multigraph
Weighted
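The definition and variants listed on this slide can be sketched as an adjacency structure built from an edge list. The function name and the sample edges are assumptions for illustration. This sketch covers directed, undirected, and weighted graphs; it does not cover multigraphs, because a second edge between the same pair of vertices would overwrite the first.

```python
def build_adjacency(edges, directed=False):
    """Build {vertex: {neighbor: weight}} from (u, v, weight) triples."""
    adjacency = {}
    for u, v, weight in edges:
        adjacency.setdefault(u, {})[v] = weight
        adjacency.setdefault(v, {})  # make sure v appears even with no out-edges
        if not directed:
            adjacency[v][u] = weight
    return adjacency


# E as a list of weighted edges over V = {v1, v2, v3} (made-up sample).
edges = [("v1", "v2", 1.0), ("v2", "v3", 2.5)]

directed_graph = build_adjacency(edges, directed=True)
undirected_graph = build_adjacency(edges)
```

In the directed version v3 has no outgoing edges, while in the undirected version every edge can be followed in both directions.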
Graph Databases
The Property Graph
Abstractions
Nodes and Relationships.
Properties on both.
Example: John Smith liked http://www.example.com on 01/10/11.
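The example on this slide maps naturally onto the property graph abstractions: two nodes, one relationship, and properties on each. The following is a minimal sketch; the class names, the "LIKED" type label, and the property keys are assumptions, not the API of any specific graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)


# "John Smith liked http://www.example.com on 01/10/11"
john = Node(1, {"name": "John Smith"})
page = Node(2, {"url": "http://www.example.com"})
liked = Relationship(john, page, "LIKED", {"at": "01/10/11"})
```

The date belongs to the relationship rather than to either node, which is the point of allowing properties on both.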
Graph Databases
Applications
Task planning
Scheduling
Process assignment
Routing
Logistics
League planning
Pattern Recognition
Dependency analysis
Impact analysis
Network flow
Traffic analysis and optimization
Delivery optimization
Optimization of tasks
Graph Databases
Applications
Recommendations
Heuristics (PageRank)
Local
Shortest Paths
Hammock Functions
Walks
Search algorithms
Shooting stars
K-nearest neighbors
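As one instance of the shortest-path item in the list above, here is a minimal breadth-first search over an unweighted graph. It is a generic textbook sketch, not the algorithm of any particular graph database; the function name and the sample graph are made up for illustration.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return a fewest-hops path from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start.
                path = []
                current = goal
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            queue.append(neighbor)
    return None


# Made-up directed graph: who links to whom.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
```

Because breadth-first search explores the graph level by level, the first time it reaches the goal it has used the fewest possible hops. Weighted graphs would need a different algorithm, such as Dijkstra's.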
Graph Databases
Applications
Semantic web
RDF (OWL) Store
RDF-Sail
SPARQL
Linked data (Open Data)
Link analysis
Structure mining
Graph Databases
Vendors
Neo4j: Open-source NoSQL graph database.
HyperGraphDB: An AI and semantic web graph database.
InfoGrid: The Internet graph database.
sones: SaaS .NET graph database.
OrientDB: The document-graph database.
FlockDB: The Twitter graph database.
Pregel: Graph processing at Google.
Demo time
Thanks!
Questions?
Pere Urbón-Bayes
Moviepilot GmbH