Big Data and NoSQL Cloud Computing : Module 4

Upload: christian-dustin-sanders

Post on 25-Dec-2015

Page 1: Big Data and NoSQL Cloud Computing : Module 4. Objectives How Big Is Big Data?What can we do with it? What is NoSQL?

Big Data and NoSQL

Cloud Computing : Module 4

Objectives

How Big Is Big Data?

What can we do with it?

What is NoSQL?

Big Data

Kilo- Mega- Giga- Tera- Peta- ??

Structured and Unstructured

Should be mined for the benefit of organizations

http://mashable.com/2012/06/22/data-created-every-minute/

http://mashable.com/2012/06/22/data-created-every-minute/

90% of the world’s data was created in the past two years

Where does this data come from?

• 1. Machine-generated/sensor data: e.g. logs, call records
• 2. Social data: e.g. Facebook, Twitter
• 3. Traditional enterprise data: e.g. web store transactions

3 V’s of Big Data

Volume
• Terabytes of tweets -> product sentiment analysis
• Annual meter readings -> predict power consumption

Velocity
• Scrutinize 5 million trade events created each day to identify potential fraud
• Analyze 500 million daily call detail records in real time to predict customer churn faster

Variety
• Monitor hundreds of live video feeds from surveillance cameras to target points of interest
• Exploit the 80% data growth in images, video and documents to improve customer satisfaction

Benefits

• Determine root causes of failures, issues and defects in near-real time, potentially saving billions of dollars annually.
• Optimize routes for many thousands of package delivery vehicles while they are on the road.
• Analyze millions of SKUs to determine prices that maximize profit and clear inventory.
• Generate retail coupons at the point of sale based on the customer's current and past purchases.
• Send tailored recommendations to mobile devices while customers are in the right area to take advantage of offers.
• Recalculate entire risk portfolios in minutes.
• Quickly identify customers who matter the most.
• Use clickstream analysis and data mining to detect fraudulent behavior.

Big Data Technologies and Platforms

Hadoop and Hadoop Stack

HDFS, MapReduce, Pig & Hive

Hadoop Distributions – MapR, Cloudera, Hortonworks

NoSQL Databases – MongoDB, Neo4j, Cassandra

Deployment Options – Microsoft Azure, Amazon Elastic MapReduce

Scale Up vs Scale Out

Scale Up

Adding resources to a single node, i.e. adding more CPU, more RAM, etc. to a single computer.

Advantages:
• Lower infrastructure (Ethernet) costs
• Lower power consumption than running multiple servers
• Fewer servers to manage

Disadvantages:
• Higher hardware (server) costs
• Big impact in case of a node failure

Scale Out

Adding more nodes or servers to the system, i.e. if there is one computer in the system, scaling out means adding more computers to it.

Advantages:
• Lower hardware (server) costs
• More flexible
• Fault tolerance

Disadvantages:
• Many hypervisor images to maintain
• Higher infrastructure (Ethernet) costs
• High power consumption

NoSQL

Why NoSQL?

Simple, Scalable, Cheap

Can handle petabytes of data while keeping that data available whenever it is needed

No Schema Required

• There is no need to define a rigid schema before inserting data into a NoSQL database. You can change the format of the data at any time without disrupting the application, which gives the application flexibility.
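As a rough illustration of the "no schema required" point, the sketch below uses a plain JavaScript array as a stand-in for a collection. The names `posts` and `tagCount` and the sample documents are assumptions made for this example; nothing here talks to a real database.

```javascript
// Hypothetical in-memory "collection": a plain array standing in for a document store.
const posts = [];

// Two documents with different shapes can sit side by side; no schema was declared.
posts.push({ title: "MongoDB Overview", likes: 100 });
posts.push({
  title: "Scaling Out",
  tags: ["sharding", "replication"],   // a field the first document does not have
  author: { name: "A. Writer" },       // a nested document
});

// Without a schema, the reading code has to tolerate missing fields itself.
function tagCount(doc) {
  return Array.isArray(doc.tags) ? doc.tags.length : 0;
}

const tagCounts = posts.map(tagCount);
console.log(tagCounts);
```

The trade-off is that the flexibility moves responsibility into the application: code that reads the documents must cope with every shape that was ever written.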

Replication

• Multiple copies of the data can be stored across the cluster and across data centers. In case of a disaster there is a high probability that the data can be recovered, which ensures high availability of the data.

Distributed

• NoSQL database systems support distributed queries, i.e. the expressive power of queries is not affected when the data is distributed across hundreds or thousands of servers.

Integrated Caching

• NoSQL databases transparently cache data in system memory, reducing latency and increasing sustained data throughput. This behaviour is transparent to the application developer and the operations team.

NoSQL Database Types

• Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.
• Graph stores are used to store information about networks, such as social connections. Graph stores include Neo4j and HyperGraphDB.
• Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key"), together with its value. Examples of key-value stores are Riak and Voldemort. Some key-value stores, such as Redis, allow each value to have a type, such as "integer", which adds functionality.
• Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
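The four data models can be pictured with ordinary JavaScript structures. This is only a sketch of the shapes involved, not of how any of the named products store data internally; all variable names and sample values are assumptions.

```javascript
// Key-value: an opaque value looked up by its key.
const keyValueStore = new Map([["user:1", "Ada"], ["user:2", "Grace"]]);

// Document: each key is paired with a nested structure.
const documentStore = new Map([
  ["user:1", { name: "Ada", langs: ["en", "fr"], address: { city: "London" } }],
]);

// Wide-column: the values of one column are kept together instead of row by row.
const columnStore = {
  name: ["Ada", "Grace"],
  age: [36, 45],
};
// An aggregate over one column only touches that column's array.
const averageAge =
  columnStore.age.reduce((sum, age) => sum + age, 0) / columnStore.age.length;

// Graph: nodes plus the edges between them (here, who follows whom).
const followsGraph = new Map([["Ada", ["Grace"]], ["Grace", []]]);

console.log(keyValueStore.get("user:2"));
console.log(documentStore.get("user:1").address.city);
console.log(averageAge);
console.log(followsGraph.get("Ada"));
```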

Why Not RDBMS?

Reading Data
• A caching tier accelerates only data reads.
• Cold cache thrash – Caches are temporary, so whenever an application seeks some data it first looks in the caching tier; when the data is not there, it is forced to read it from the RDBMS, which delays both reads and writes.
• Another tier to manage – With an RDBMS, caching is deployed as a separate infrastructure tier, and inserting another tier into the existing architecture adds complexity.
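The read path described above can be sketched as follows. The two Maps are hypothetical stand-ins for the RDBMS and the separate caching tier, and `readThroughCache` is a name invented for this example.

```javascript
// Hypothetical stand-ins: `database` plays the RDBMS, `cacheTier` the separate cache.
const database = new Map([["user:1", { name: "Ada" }]]);
const cacheTier = new Map();
let databaseReads = 0;

function readThroughCache(key) {
  if (cacheTier.has(key)) {
    return cacheTier.get(key);      // cache hit: the database is not touched
  }
  databaseReads += 1;               // cache miss: fall through to the database
  const value = database.get(key);
  if (value !== undefined) {
    cacheTier.set(key, value);      // populate the cache for later reads
  }
  return value;
}

readThroughCache("user:1");  // miss: cache is cold
readThroughCache("user:1");  // hit
cacheTier.clear();           // simulate a cache restart
readThroughCache("user:1");  // miss again: every key must be re-read once
console.log(databaseReads);
```

After a restart every first read goes back to the database, which is the "cold cache" load the slide refers to.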

Partitioning (Sharding)
• The application needs to be partition-aware.
• When you fill a shard, it is highly disruptive to re-shard.
• Relationships are broken, i.e. referential integrity is no longer guaranteed.
• You lose some of the most important benefits of the relational model.
• You have to create and maintain a schema on every server.
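To see why re-sharding is disruptive, here is a rough sketch that assumes the simplest possible routing rule: shard = key mod shard count. The function `shardFor` and the numeric keys are illustrative assumptions, not something from the slides; real deployments often use range-based or consistent-hashing schemes precisely to reduce this kind of movement.

```javascript
// Hypothetical routing rule: a numeric key is assigned to shard (key mod shardCount).
function shardFor(key, shardCount) {
  return key % shardCount;
}

// Count how many of the keys 0..999 land on a different shard
// when a fifth shard is added to a four-shard setup.
const totalKeys = 1000;
let movedKeys = 0;
for (let key = 0; key < totalKeys; key += 1) {
  if (shardFor(key, 4) !== shardFor(key, 5)) {
    movedKeys += 1;
  }
}

console.log(`${movedKeys} of ${totalKeys} keys change shard`);
```

Under this rule most keys have to be copied to a different server when one shard is added, and the application's routing logic has to change at the same moment.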

Schema
• RDBMS technology requires the strict definition of a “schema” before any data is stored in the database. The schema is integral because it defines the structure of the database. In an RDBMS, changes such as capturing new information or changing the data formats and content of the application are extremely disruptive and are therefore frequently avoided.

MongoDB – Quick Tutorial

What is MongoDB

Document Database

{
  id: "00e8da9d",
  type: "Film",
  pricing: { ... },
  details: {
    title: "The Matrix",
    director: [ "Andy Wachowski", "Larry Wachowski" ],
    writer: [ "Andy Wachowski", "Larry Wachowski" ],
    ...,
    aspect_ratio: "1.66:1"
  },
  ...
}

Installing MongoDB

Download the binary files for the desired release of MongoDB.

Download the binaries from https://www.mongodb.org/downloads.

Extract the files from the downloaded archive.

tar -zxvf mongodb-linux-x86_64-2.6.1.tgz

Copy the extracted archive to the target directory.

Copy the extracted folder to the location from which MongoDB will run.

mkdir -p mongodb

cp -R -n mongodb-linux-x86_64-2.6.1/ mongodb

Ensure the location of the binaries is in the PATH variable.

The MongoDB binaries are in the bin/ directory of the archive. To ensure that the binaries are in your PATH, you can modify your PATH.

For example, you can add the following line to your shell’s rc file (e.g. ~/.bashrc):

export PATH=<mongodb-install-directory>:$PATH

Replace <mongodb-install-directory> with the path to the MongoDB binaries.

Running MongoDB

Create the data directory.

The following example command creates the default /data/db directory:

mkdir -p /data/db

Set permissions for the data directory.

Before running mongod for the first time, ensure that the user account running mongod has read and write permissions for the directory.

Run MongoDB.

To run MongoDB, run the mongod process at the system prompt. If necessary, specify the path of the mongod or the data directory. See the following examples.

Run without specifying paths

If your system PATH variable includes the location of the mongod binary and if you use the default data directory (i.e., /data/db), simply enter mongod at the system prompt:

mongod

Stop MongoDB as needed.

To stop MongoDB, press Control+C in the terminal where the mongod instance is running.

Where to Go Further?

http://docs.mongodb.org/manual/tutorial/

https://university.mongodb.com/

Handling Databases

Connect to a mongod:

mongo

From the mongo shell, display the list of databases with the following operation:

show dbs

Switch to a new database named mydb with the following operation:

use mydb

Confirm that your session has the mydb database as context by checking the value of the db object, which returns the name of the current database:

db

Inserting Data into Collections

j = { name : "mongo" }

k = { x : 3 }

db.testData.insert( j )

db.testData.insert( k )

Dropping a Database

db.dropDatabase()

> { "dropped" : "mydb", "ok" : 1 }

Inserting Data

SQL

INSERT INTO post (title, description, tags, likes) VALUES ('MongoDB Overview', 'MongoDB is no sql database', 'database', 100);

MongoDB

db.post.insert([
  {
    title: 'MongoDB Overview',
    description: 'MongoDB is no sql database',
    tags: 'database',
    likes: 100
  }
])

Retrieving Data

SQL SELECT Statements and the equivalent MongoDB find() Statements

SQL: SELECT * FROM users
MongoDB: db.users.find()

SQL: SELECT id, user_id, status FROM users
MongoDB: db.users.find( { }, { user_id: 1, status: 1 } )

SQL: SELECT user_id, status FROM users
MongoDB: db.users.find( { }, { user_id: 1, status: 1, _id: 0 } )

SQL: SELECT * FROM users WHERE status = "A"
MongoDB: db.users.find( { status: "A" } )

SQL: SELECT user_id, status FROM users WHERE status = "A"
MongoDB: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )

SQL: SELECT * FROM users WHERE status != "A"
MongoDB: db.users.find( { status: { $ne: "A" } } )

SQL: SELECT * FROM users WHERE status = "A" AND age = 50
MongoDB: db.users.find( { status: "A", age: 50 } )

SQL: SELECT * FROM users WHERE status = "A" OR age = 50
MongoDB: db.users.find( { $or: [ { status: "A" } , { age: 50 } ] } )

SQL: SELECT * FROM users WHERE age > 25
MongoDB: db.users.find( { age: { $gt: 25 } } )

SQL: SELECT * FROM users WHERE age < 25
MongoDB: db.users.find( { age: { $lt: 25 } } )
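To make the filter documents above less abstract, the sketch below evaluates the same kinds of filters against a plain array. It is a toy written for this example, not MongoDB's implementation: `matches`, `findUsers` and the sample users are assumptions, and only equality, $ne, $gt, $lt and a top-level $or are handled.

```javascript
// Hypothetical helper: returns true when `doc` satisfies a find()-style filter.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    if (field === "$or") {
      // $or takes an array of sub-filters; one match is enough.
      return condition.some((subFilter) => matches(doc, subFilter));
    }
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. { age: { $gt: 25 } }.
      return Object.entries(condition).every(([operator, operand]) => {
        switch (operator) {
          case "$ne": return value !== operand;
          case "$gt": return value > operand;
          case "$lt": return value < operand;
          default: throw new Error(`unsupported operator: ${operator}`);
        }
      });
    }
    // Plain form, e.g. { status: "A" }; several fields are combined with AND.
    return value === condition;
  });
}

const sampleUsers = [
  { user_id: "abc", status: "A", age: 50 },
  { user_id: "bcd", status: "B", age: 20 },
  { user_id: "cde", status: "A", age: 30 },
];

const findUsers = (filter) => sampleUsers.filter((user) => matches(user, filter));

console.log(findUsers({ status: "A", age: 50 }).map((u) => u.user_id));
console.log(findUsers({ $or: [{ status: "A" }, { age: 20 }] }).map((u) => u.user_id));
console.log(findUsers({ age: { $gt: 25 } }).map((u) => u.user_id));
```

Listing several fields in one filter behaves like SQL's AND, while OR has to be spelled out with $or, which mirrors the rows in the comparison above.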

Retrieving Data

db.mycol.find({"tags":"mongodb","title": "MongoDB Overview"}).pretty()

{
  "_id": ObjectId("7df78ad8902c"),
  "title": "MongoDB Overview",
  "description": "MongoDB is no sql database",
  "tags": ["mongodb", "database", "NoSQL"],
  "likes": "100"
}
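Note that the query above asks for "tags": "mongodb" even though tags is an array in the returned document: an equality condition on an array field matches when any element equals the value. The sketch below imitates that rule in plain JavaScript; `fieldEquals` and `samplePost` are hypothetical names used only for this illustration.

```javascript
// Hypothetical helper imitating equality matching on scalar or array fields.
function fieldEquals(doc, field, expected) {
  const value = doc[field];
  return Array.isArray(value) ? value.includes(expected) : value === expected;
}

const samplePost = {
  title: "MongoDB Overview",
  tags: ["mongodb", "database", "NoSQL"],
  likes: "100",
};

// Both conditions of the query must hold for the document to be returned.
const isReturned =
  fieldEquals(samplePost, "tags", "mongodb") &&
  fieldEquals(samplePost, "title", "MongoDB Overview");

// A tag that is not in the array does not match.
const matchesOtherTag = fieldEquals(samplePost, "tags", "sql");

console.log(isReturned, matchesOtherTag);
```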

That’s All Folks!