non relational database-nosql


Upload: ram-kumar

Post on 13-Apr-2015

DESCRIPTION

This was my Paper presentation regarding nosql database

Non-Relational Databases-NoSQL

Ramkumar.R
Department of Computer Science
Pope John Paul II College of Education
Pondicherry

Abstract

Many organizations collect vast amounts of customer, scientific, sales, and other data for future analysis. Traditionally, most of these organizations have stored structured data in relational databases for subsequent access and analysis. However, a growing number of developers and users have begun turning to various types of non-relational databases, now frequently called NoSQL databases. Non-relational databases, including hierarchical, graph, and object-oriented databases, have been around since the late 1960s. However, new types of NoSQL databases are being developed, and only now are they beginning to gain market traction. Different NoSQL databases take different approaches; what they have in common is that they are not relational. Their primary advantage is that, unlike relational databases, they handle unstructured data such as word-processing files, e-mail, multimedia, and social media efficiently. Numerous companies and organizations have developed NoSQL databases. The approach’s most influential champions are primarily Web 2.0 companies with huge, growing data and infrastructure needs, such as Amazon and Google. They developed the Dynamo and BigTable NoSQL databases, respectively, which have inspired many of today’s NoSQL applications. This paper discusses the limitations of SQL and the characteristics, advantages, concerns, and challenges of NoSQL databases.

Keywords: NoSQL, Schema, Auto-sharding, Scaling, Big data.

1. Introduction

In computing, NoSQL (commonly interpreted as "not only SQL") is a broad class of database management systems identified by non-adherence to the widely used relational database management system model. NoSQL databases are not built primarily on tables, and generally do not use SQL for data manipulation. NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key–value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.

NoSQL database management systems are useful when working with huge quantities of data whose nature does not require a relational model. The data can be structured, but NoSQL is used when what really matters is the ability to store and retrieve great quantities of data, not the relationships between the elements. Usage examples might be to store millions of key–value pairs in one or a few associative arrays or to store millions of data records. This organization is particularly useful for statistical or real-time analyses of growing lists of elements (such as Twitter posts or Internet server logs from a large group of users).
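The usage pattern described above can be sketched in a few lines of Python. This is only an illustration, not any particular product's API: the `store`, `append`, and `get` names and the log entries are assumptions made up for the example. Records are appended under a key and retrieved by key, and a running statistic is computed over the growing lists.

```python
from collections import Counter, defaultdict

# Hypothetical append-only store: each key maps to a growing list of records.
store = defaultdict(list)

def append(key, record):
    """Append a record under a key; no schema or relationships are involved."""
    store[key].append(record)

def get(key):
    """Retrieve every record stored under a key."""
    return store[key]

# Illustrative server-log entries (made-up data).
append("user:1", {"path": "/home", "status": 200})
append("user:1", {"path": "/cart", "status": 500})
append("user:2", {"path": "/home", "status": 200})

# A simple statistic computed over all of the growing lists.
status_counts = Counter(
    record["status"] for records in store.values() for record in records
)
```

The store never inspects the records it holds, which is what keeps appending and retrieval cheap in this model.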

2. History

The term NoSQL was coined by Carlo Strozzi in 1998. He used it to name his open-source, lightweight database, which did not have an SQL interface.

In early 2009, when last.fm wanted to organize an event on open-source distributed databases, Eric Evans, a Rackspace employee, reused the term to refer to databases that are non-relational and distributed, and that do not conform to atomicity, consistency, isolation, and durability (ACID), the four defining features of traditional relational database systems.

In the same year, NoSQL was discussed and debated at length at the NoSQL conference held in Atlanta, USA. After that, the discussion and practice of NoSQL gained momentum, and NoSQL saw unprecedented growth.

3. Characteristics

No schema required: Data can be inserted in a NoSQL database without first defining a rigid database schema. As a corollary, the format of the data being inserted can be changed at any time, without application disruption. This provides immense application flexibility, which ultimately delivers substantial business flexibility.

Auto-sharding (sometimes called “elasticity”): A NoSQL database automatically spreads data across servers, without requiring applications to participate. Servers can be added or removed from the data layer without application downtime, with data (and I/O) automatically spread across the servers. Most NoSQL databases also support data replication, storing multiple copies of data across the cluster, and even across data centers, to ensure high availability and support disaster recovery.

Distributed query support: NoSQL database systems retain their full query expressive power even when distributed across hundreds or thousands of servers.

Integrated caching: To reduce latency and increase sustained data throughput, advanced NoSQL database technologies transparently cache data in system memory. This behavior is transparent to the application developer and the operations team, in contrast to RDBMS technology, where a caching tier is usually a separate infrastructure tier that must be developed against, deployed on separate servers, and explicitly managed by the operations team.
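To make the schema-free inserts, auto-sharding, and replication described above concrete, here is a minimal sketch. It assumes a toy in-memory cluster; `shard_for`, `ShardedStore`, and the node names are hypothetical. Placement uses a plain hash modulo the number of servers, so adding a server rehashes every key. Production systems usually use schemes such as consistent hashing to limit that data movement.

```python
import hashlib

def shard_for(key, servers, replicas=2):
    """Return the servers holding a key: a primary followed by replica copies."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(servers)
    count = min(replicas, len(servers))
    return [servers[(start + i) % len(servers)] for i in range(count)]

class ShardedStore:
    """Toy cluster: one dict per server, with keys placed by hash."""

    def __init__(self, servers, replicas=2):
        self.servers = list(servers)
        self.replicas = replicas
        self.data = {server: {} for server in self.servers}

    def put(self, key, value):
        # The caller never chooses a server; placement is automatic.
        for server in shard_for(key, self.servers, self.replicas):
            self.data[server][key] = value

    def get(self, key):
        for server in shard_for(key, self.servers, self.replicas):
            if key in self.data[server]:
                return self.data[server][key]
        raise KeyError(key)

    def add_server(self, server):
        """Add a server and redistribute all keys without caller involvement."""
        items = {}
        for contents in self.data.values():
            items.update(contents)
        self.servers.append(server)
        self.data = {name: {} for name in self.servers}
        for key, value in items.items():
            self.put(key, value)

cluster = ShardedStore(["node-a", "node-b", "node-c"])
# Values of different shapes are accepted because no schema is declared.
cluster.put("user:42", {"name": "Arun"})
cluster.put("user:43", {"name": "Kiran", "tags": ["admin"]})
cluster.add_server("node-d")
```

The application code calls only `put` and `get`. Which servers hold a key, and how many copies exist, is decided inside the store.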

4. NoSQL Database Classification

Key-value stores: Data is saved as a value under a unique key. Lookups by key are very fast, and the store can scale to very large sizes.

Column stores: They are similar to relational databases, but they store all of the values for a column together instead of storing each record (row) together.

Document stores: They store data without a fixed schema, as buckets of key-value pairs inside a self-contained object. This data structure is reminiscent of an associative array in PHP.

Graph databases: They store data in a flexible graph model that contains a node for each object. Nodes have properties and relationships to other nodes.
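The four models above can be pictured with ordinary Python data structures. This is a rough sketch for intuition only: the variable names, helper functions, and sample values are illustrative assumptions, not how any specific database lays out its data.

```python
# 1. Key-value store: a unique key maps to a value the store does not interpret.
key_value = {"session:9f2": "cart=3;lang=en"}

# 2. Column store: the values of each column are kept together.
columns = {
    "name": ["Arun", "Kiran"],
    "age": [34, 31],
}

def read_row(index):
    """Rebuild one record by picking the same position from every column."""
    return {name: values[index] for name, values in columns.items()}

# Reading a single column touches only that column's values.
average_age = sum(columns["age"]) / len(columns["age"])

# 3. Document store: a self-contained object of nested key-value pairs.
document = {
    "name": "Arun",
    "children": [{"name": "Rihit", "age": 8}],
}

# 4. Graph database: nodes with properties, plus relationships between nodes.
nodes = {
    "arun": {"label": "Person"},
    "kiran": {"label": "Person"},
}
relationships = [("arun", "SPOUSE_OF", "kiran")]

def related(node_id, relation):
    """Return the nodes reached from node_id over the given relationship."""
    return [
        target
        for source, rel, target in relationships
        if source == node_id and rel == relation
    ]
```

The column layout makes whole-column aggregates cheap, while the document layout keeps everything about one entity in a single object.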

5. Categories of NoSQL database

Category: Document oriented
Description: Data is stored as documents. An example document might look like: FirstName="Arun", Address="St. Xavier's Road", Spouse=[{Name:"Kiran"}], Children=[{Name:"Rihit", Age:8}]
Name of the database: CouchDB, Jackrabbit, MongoDB, OrientDB, SimpleDB, Terrastore

Category: XML database
Description: Data is stored in XML format.
Name of the database: BaseX, eXist, MarkLogic Server, etc.

Category: Graph databases
Description: Data is stored as a collection of nodes, where nodes are analogous to objects in a programming language. Nodes are connected using edges.
Name of the database: AllegroGraph, DEX, Neo4j, FlockDB, Sones GraphDB

Category: Key-value store
Description: In the key-value store category of NoSQL databases, a user can store data in a schema-less way. Values, which may be strings, hashes, lists, sets, or sorted sets, are stored against unique keys.
Name of the database: Cassandra, Riak, Redis, memcached, BigTable

6. Major NoSQL Databases

Dynamo: Dynamo was created by Amazon.com and is the most prominent key-value NoSQL database. Amazon needed a highly scalable distributed platform for its e-commerce businesses, so it developed Dynamo. Amazon S3 uses Dynamo as the storage mechanism.

Cassandra: Cassandra was open-sourced by Facebook and is a column-oriented NoSQL database.

BigTable: BigTable is Google's proprietary column-oriented database. Google allows the use of BigTable, but only through Google App Engine.

SimpleDB: SimpleDB is another Amazon database. Used with Amazon EC2 and S3, it is part of Amazon Web Services, which charges fees depending on usage.

CouchDB: CouchDB, along with MongoDB, is an open-source document-oriented NoSQL database.

Neo4j: Neo4j is an open-source graph database.

7. Limitation of SQL

The structure of data in a relational database is predefined by the layout of the tables and the fixed names and types of the columns.

Scaling: Users can scale a relational database by running it on a more powerful (and more expensive) computer. To scale beyond a certain point, though, it must be distributed across multiple servers. Relational databases don’t work easily in a distributed manner because joining their tables across a distributed system is difficult.

Complexity: With relational databases, users must convert all data into tables. When the data doesn’t fit easily into a table, the database’s structure can be complex, difficult, and slow to work with.

Data: Using SQL is convenient with structured data. However, using the language with other types of information is difficult because it’s designed to work with structured, relationally organized databases with fixed table information.

Large feature set: Relational databases offer a big feature set and data integrity. However, NoSQL proponents say database users often don’t need all of those features, or the cost and complexity they add.
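The predefined structure mentioned at the start of this section can be demonstrated with Python's built-in `sqlite3` module. This is a small sketch under assumed names: the `customers` table, the `twitter_handle` column, and the sample rows are invented for the example. A row carrying an undeclared field is rejected until the schema itself is changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (name) VALUES (?)", ("Arun",))

try:
    # The table has no 'twitter_handle' column, so this insert is rejected.
    conn.execute(
        "INSERT INTO customers (name, twitter_handle) VALUES (?, ?)",
        ("Kiran", "@kiran"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# The schema must be altered before the new field can be stored.
conn.execute("ALTER TABLE customers ADD COLUMN twitter_handle TEXT")
conn.execute(
    "INSERT INTO customers (name, twitter_handle) VALUES (?, ?)",
    ("Kiran", "@kiran"),
)
rows = conn.execute(
    "SELECT name, twitter_handle FROM customers ORDER BY id"
).fetchall()
conn.close()
```

Rows inserted before the schema change hold no value for the new column, which is part of the bookkeeping that a fixed schema imposes.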

8. Benefits of NoSQL

Elastic scaling: For years, database administrators have relied on scale up (buying bigger servers as database load increases) rather than scale out (distributing the database across multiple hosts as load increases). However, as transaction rates and availability requirements increase, and as databases move into the cloud or onto virtualized environments, the economic advantages of scaling out on commodity hardware become irresistible.

Unlike relational databases, which are hard to distribute, the new breed of NoSQL databases is designed to expand transparently to take advantage of new nodes, and these databases are usually designed with low-cost commodity hardware in mind.

Big data: Just as transaction rates have grown out of recognition over the last decade, the volumes of data being stored have also increased massively. NoSQL systems are designed with such volumes in mind.

Goodbye DBAs: NoSQL databases are generally designed from the ground up to require less management. Automatic repair, automatic data distribution, and simpler data models lead to lower administration and tuning requirements.

Economics: NoSQL databases typically use clusters of cheap commodity servers to manage the exploding data and transaction volumes, while RDBMSs tend to rely on expensive proprietary servers and storage systems.

Flexible data models: NoSQL databases have far more relaxed, or even nonexistent, data model restrictions. NoSQL key-value stores and document databases allow the application to store virtually any structure it wants in a data element. Even the more rigidly defined BigTable-based NoSQL databases (Cassandra, HBase) typically allow new columns to be created without too much fuss.

9. Concerns and Challenges

NoSQL databases face several challenges.

Overhead and complexity: Because NoSQL databases don’t work with SQL, they require manual query programming, which can be fast for simple tasks but time-consuming for others. In addition, complex query programming for the databases can be difficult.

Reliability: Relational databases natively support ACID, while NoSQL databases don’t. NoSQL databases thus don’t natively offer the degree of reliability that ACID provides. If users want NoSQL databases to apply ACID constraints to a data set, they must perform additional programming.

Consistency: Because NoSQL databases don’t natively support ACID transactions, they could also compromise consistency unless manual support is provided. Not providing consistency enables better performance and scalability but is a problem for certain types of applications and transactions, such as those involved in banking.

Unfamiliarity with the technology: Most organizations are unfamiliar with NoSQL databases and thus may not feel knowledgeable enough to choose one or even to determine that the approach might be better for their purposes.

Limited ecosystem: Unlike commercial relational databases, many open-source NoSQL applications don’t yet come with customer support or management tools.
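The reliability and consistency items above note that applications must add their own programming when the database does not provide ACID guarantees. One common approach, shown here only as an assumed illustration, is a version-checked write (optimistic concurrency): an update is applied only if the record has not changed since it was read. `VersionedStore` and `withdraw` are hypothetical names, and this single-process sketch relies on the store performing the check and the write as one atomic step, which a real database would have to guarantee.

```python
class VersionedStore:
    """Hypothetical in-memory key-value store that tracks a version per key."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); a missing key reads as version 0, None."""
        return self._data.get(key, (0, None))

    def write_if_unchanged(self, key, expected_version, value):
        """Write only if nobody else has written since expected_version."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False
        self._data[key] = (current_version + 1, value)
        return True

def withdraw(store, account, amount, max_retries=5):
    """Debit an account, retrying when a concurrent write is detected."""
    for _ in range(max_retries):
        version, balance = store.read(account)
        if balance is None or balance < amount:
            return False  # unknown account or insufficient funds
        if store.write_if_unchanged(account, version, balance - amount):
            return True
    return False

bank = VersionedStore()
bank.write_if_unchanged("acct:1", 0, 100)   # create the account with 100
withdraw(bank, "acct:1", 30)                # balance becomes 70
stale = bank.write_if_unchanged("acct:1", 1, 0)  # based on an old read
```

The stale write is refused because the account has moved on to a newer version, so a banking-style update cannot silently overwrite a concurrent change.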

10. Production deployment

A large number of companies use NoSQL. To name a few:

Google
Facebook
Mozilla
Adobe
Foursquare
LinkedIn
McGraw-Hill Education
