NoSQL
TRANSCRIPT
{
  "name": "Radu Vunvulea",
  "company": "iQuest",
  "userType": "enthusiastic",
  "technologies": [".NET", "JS", "Azure", "Web", "Mobile", "SL"],
  "blog": "vunvulearadu.blogspot.com",
  "email": "[email protected]",
  "socialMedia": {
    "twitter": "@RaduVunvulea",
    "fb": "radu.vunvulea"
  }
}
Who am I?
In the early 1980s, relational databases began to be defined. One of the proponents of relational database theory was Edgar F. Codd, who published 13 rules that set out to define a relational database. This was the beginning of the formalized scientific groundwork done to lay down specific rules for the existence of the relational aspects of a database.
Source: http://www.ehow.com
Relevant rules
• Relational facilities
• Information is represented only in one way
• All data must be accessible
• All views that are theoretically updatable must be updatable by the system
• Insert, Update, Delete for any retrieval sets
• Non-Relational Database
• But it is too long
• It is not so cool
• This name would not catch on
A better name would be
…so we are back to
NoSQL
• More and more connections between data
• Everything is linked to something more… and more… and so on
• Hyperlinks
• Tags
• RSS
• RDF
• Attributes
• User content
Database trends – 1 Connections
• From a flat architecture to a coupled one, and now to a decoupled one based on services
Database trends – 2 Architecture
[Diagram: App and DB boxes]
• Since Web 2.0, data no longer has a rigidly fixed structure (it is more flexible)
• How many phone numbers could a person have in 1970? And now…
Database trends – 3 No fixed structure
• 2006 – 160
• 2008 – 390
• 2010 – 998
• 2012 – 2000+
• First column is the year
• Second column is in exabytes (EB); 1 EB = 1,000,000 terabytes (TB)
Database trends – 4 Data Size
• Designed to handle massive load
• Can scale to massive amounts of data
• Based on Key-Value collections
• Dynamic ring partitioning
• Dynamic replication
• Ex.: Dynomite
Key-Value
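The "ring partition" and "replication" bullets can be illustrated with a small consistent-hashing sketch. This is only a toy under stated assumptions, not how any particular store implements it: the `HashRing` class, the node names, and the use of MD5 for ring positions are all hypothetical choices made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: a key belongs to the next node clockwise."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Sorted (position, node) pairs form the ring.
        self.ring = sorted((_hash(node), node) for node in nodes)

    def add_node(self, node):
        """Insert a node at its hashed position, keeping the ring sorted."""
        bisect.insort(self.ring, (_hash(node), node))

    def nodes_for(self, key):
        """Return the distinct nodes that should hold `key` (primary first)."""
        positions = [pos for pos, _ in self.ring]
        start = bisect.bisect_right(positions, _hash(key)) % len(self.ring)
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c"], replicas=2)
owners = ring.nodes_for("user:42")  # primary node plus one replica
```

When a node joins the ring, only the keys on the arc just before it change owner; every other key stays where it was, which is what makes the partitioning "dynamic".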
• Like a column-oriented relational database, but with a twist
• Tables similar to an RDBMS, but they handle semi-structured data
• Based on Google’s BigTable paper
• Data model:
  • Columns -> column families -> ACL
  • Datums keyed by: row, column, time, index
  • Row range -> tablet -> distribution
• Ex.: Cassandra
Big Table
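The "datums keyed by row, column, time" idea can be sketched as a sparse nested map. This is a rough illustration only: `ToyBigTable` and its methods are hypothetical names, and real systems add persistence, column-family ACLs, and distribution of row ranges across machines.

```python
from collections import defaultdict


class ToyBigTable:
    """Sparse map: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self.rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        versions = self.rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, end_row):
        """Yield row keys in sorted order within [start_row, end_row)."""
        for row in sorted(self.rows):
            if start_row <= row < end_row:
                yield row


table = ToyBigTable()
table.put("com.example/index", "contents:html", 1, "v1")
table.put("com.example/index", "contents:html", 3, "v3")
table.put("com.other/", "anchor:x", 1, "a")
```

Because rows are kept in key order, a contiguous row range can be scanned cheaply, and it is such a range that gets assigned to a machine for distribution.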
• Similar to Key-Value, but the DB knows what the Value is
• Inspired by Lotus Notes
• Data model:
  • Collections of Key-Value collections
• Documents are often versioned
• Ex.: MongoDB
Document Database
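"The DB knows what the Value is" means the store can look inside a document and filter on its fields. A minimal sketch of that idea follows, assuming documents are plain nested dictionaries; the `matches` and `find` helpers and the second sample document are hypothetical and do not reproduce any real database API.

```python
def matches(document, query):
    """True if every dotted path in `query` matches the value in `document`."""
    for path, expected in query.items():
        value = document
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                return False
            value = value[part]
        if isinstance(value, list):
            # A list field matches if it contains the expected value.
            if expected not in value:
                return False
        elif value != expected:
            return False
    return True


def find(collection, query):
    """Return all documents in the collection that satisfy the query."""
    return [doc for doc in collection if matches(doc, query)]


people = [
    {"_id": 1, "name": "Radu Vunvulea", "technologies": [".NET", "Azure"],
     "socialMedia": {"twitter": "@RaduVunvulea"}},
    {"_id": 2, "name": "Ana", "technologies": ["JS"]},  # hypothetical document
]
azure_people = find(people, {"technologies": "Azure"})
```

The two documents do not share the same fields, which a plain key-value store would not care about but also could not query.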
• Focuses on modeling the structure of data: the interconnectivity
• Scales with the complexity of data
• Inspired by mathematical Graph Theory
• Data model:
  • Property Graph -> Nodes
  • Relationships/Edges between Nodes
  • Key-Value pairs on both
  • Possible Edge Labels and/or Node/Edge Types
• Ex.: Neo4j
Graph Database
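The property graph model above (nodes, labeled edges, key-value pairs on both) can be sketched in a few lines. This is an in-memory toy, not the storage or query model of any real graph database; the `PropertyGraph` class and the sample nodes are assumptions made for illustration.

```python
from collections import deque


class PropertyGraph:
    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = []  # (source id, label, target id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        self.edges.append((source, label, target, props))

    def neighbors(self, node_id, label=None):
        """Targets of outgoing edges, optionally filtered by edge label."""
        return [target for source, edge_label, target, _ in self.edges
                if source == node_id and (label is None or edge_label == label)]

    def reachable(self, start, label=None):
        """Breadth-first traversal; node ids reachable from `start`."""
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in self.neighbors(queue.popleft(), label):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)
        return seen


graph = PropertyGraph()
graph.add_node("alice", kind="person")
graph.add_node("bob", kind="person")
graph.add_node("carol", kind="person")
graph.add_node("acme", kind="company")
graph.add_edge("alice", "KNOWS", "bob", since=2010)
graph.add_edge("bob", "KNOWS", "carol", since=2012)
graph.add_edge("alice", "WORKS_AT", "acme")
```

A question such as "whom does alice know, directly or indirectly?" becomes a traversal over `KNOWS` edges rather than a chain of joins.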
• Not part of the NoSQL community
• Still a good solution for a lot of problems
• Focuses on matching the OOP paradigm
• Easy to use
• Simple to integrate
• Neither gaining nor losing traction
Object Database
• Easy to deploy
• No OS management
• Scaling
• Monitoring
• Publish from different source controls
• Supports different technologies (PHP, node.js, .NET)
• Low-cost support – shared mode
• Reserved mode – dedicated instance
• Each site runs in an isolated environment
Web Sites
• Replication
  • Write to many
  • Master/Slave replication
• Master re-election
• Failover
  • Either by another machine taking over
  • Or by the client knowing (client-side failover)
Availability
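The "client knowing" style of failover can be sketched as a client that holds the full replica list and retries on the next one when a machine is down. Everything here is hypothetical (the `Replica`, `FailoverClient`, and `ReplicaDown` names, and the synchronous replication on write); it only illustrates the idea from the slide.

```python
class ReplicaDown(Exception):
    """Raised when a replica cannot serve a request."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

    def read(self, key):
        if not self.alive:
            raise ReplicaDown(self.name)
        return self.data.get(key)

    def write(self, key, value):
        if not self.alive:
            raise ReplicaDown(self.name)
        self.data[key] = value


class FailoverClient:
    """Client that knows every replica and fails over on its own."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def write(self, key, value):
        # Write to the master, then replicate to every live slave.
        self.master.write(key, value)
        for slave in self.slaves:
            if slave.alive:
                slave.write(key, value)

    def read(self, key):
        # Try the master first, then each slave in order.
        for replica in [self.master, *self.slaves]:
            try:
                return replica.read(key)
            except ReplicaDown:
                continue
        raise ReplicaDown("no replica available")


master, slave = Replica("master"), Replica("slave-1")
client = FailoverClient(master, [slave])
client.write("k", "v")
master.alive = False        # simulate the master going down
value = client.read("k")    # served by the slave instead
```

In this sketch writes still require the master; promoting a slave when the master is lost is the re-election step, which is left out here.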
• Most NoSQL systems sacrifice consistency
• Some NoSQL systems don’t have transactions
• Atomic single operations
• Because of this, some operations are impossible to implement
Correctness
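A rough sketch of what "atomic single operations" without transactions looks like: each call on one key is atomic, but nothing groups two calls together. The `AtomicStore` class is a hypothetical in-process stand-in (a lock plays the role of the server), not the API of any real store.

```python
import threading


class AtomicStore:
    """Toy store: each single-key operation is atomic, nothing spans keys."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def incr(self, key, amount=1):
        """Atomically add `amount` to a counter and return the new value."""
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]

    def compare_and_set(self, key, expected, new):
        """Set `key` to `new` only if it currently equals `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


store = AtomicStore()


def vote(times):
    for _ in range(times):
        store.incr("votes")


threads = [threading.Thread(target=vote, args=(1000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Moving an amount from one key to another would take two separate `incr` calls; a failure between them leaves the data inconsistent, which is the kind of operation that cannot be implemented safely without transactions.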
• NoSQL is Batman
• Durability is sacrificed
  • On-disk durability
  • Multiple-replica durability
Performance
• Why
  • Dynamic queries
  • Content is stored as documents
  • Big databases that need to be very fast
• Where
  • Properties can be queried and indexed
  • Can be used for a voting system, CMS or comment storage
MongoDB
• Why
  • When you make a lot of updates and inserts
  • Reading data is not the main purpose of the database (writes are faster than reads)
  • Content is stored as columns
  • High availability
• Where
  • Can be used with success for logging
  • Financial industry, or any place where a lot of data needs to be written
  • Shopping basket of an e-commerce application
Cassandra
• Why
  • For data that doesn’t change very often (insert and read, NOT update)
  • We have a lot of predefined queries and we need versioning support
• Where
  • A great database for CMS and CRM
CouchDB
• Why
  • When we need high concurrency
  • When we need latency to be as low as possible
• Where
  • Backend of a game or a system that offers data in real time
Membase
• Why
  • When we need to make a lot of updates
  • When the database is not too big and can be kept in memory
• Where
  • Can be used for real-time communication, for example stock market prices
Redis
• Facebook
  • HBase – Facebook messages
  • Scribe – Real-time click logs
  • Hive – SQL queries -> MapReduce jobs
  • Hadoop
    • Web analytics warehouse
    • Distributed datastore
    • MySQL backup
Examples
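The "SQL queries -> MapReduce jobs" bullet can be pictured with a toy version of a query such as `SELECT page, COUNT(*) FROM clicks GROUP BY page`. This is a single-process sketch with hypothetical function names and sample data; it shows the shape of the map, shuffle, and reduce phases, not how Hive actually plans or runs a job.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit (key, 1) for each row, like the map side of a GROUP BY count."""
    for row in rows:
        yield row["page"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each group to get the count per key."""
    return {key: sum(values) for key, values in groups.items()}


clicks = [{"page": "/home"}, {"page": "/about"}, {"page": "/home"}]
counts = reduce_phase(shuffle(map_phase(clicks)))
```

In a real cluster the map and reduce functions run on many machines in parallel, and the shuffle moves the data between them.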
• Twitter
  • Hadoop – Analytics
  • HBase – People search
  • Scribe – Log collection framework
  • FlockDB – Social graph analysis
Examples