Introduction to Big Data and NoSQL

Introduction to Big Data and NoSQL. SQL Azure Saturday, April 21, 2012. Don Demsak, Advisory Solutions Architect, EMC Consulting, www.donxml.com


Page 1: Introduction to Big Data and  NoSQL

1

Introduction to Big Data and NoSQL
SQL Azure Saturday
April 21, 2012

Don Demsak
Advisory Solutions Architect
EMC Consulting
www.donxml.com

Page 2: Introduction to Big Data and  NoSQL

2

Meet Don

• Advisory Solutions Architect
  – EMC Consulting
• Application Architecture, Development & Design
• DonXml.com, Twitter: donxml
• Email – [email protected]
• SlideShare - http://www.slideshare.net/dondemsak

Page 3: Introduction to Big Data and  NoSQL

3

The era of Big Data

Page 4: Introduction to Big Data and  NoSQL

4

How did we get here?

• Expensive
  – Processors
  – Disk space
  – Memory
  – Operating Systems
  – Software
  – Programmers

• Monoculture
  – Limit CPU cycles
  – Limit disk space
  – Limit memory
  – Limited OS Development
  – Limited Software
  – Programmers
    • Mono-lingual
    • Mono-persistence

Page 5: Introduction to Big Data and  NoSQL

5

Typical RDBMS Implementations
• Fixed table schemas
• Small but frequent reads/writes
• Large batch transactions
• Focus on ACID
  – Atomicity
  – Consistency
  – Isolation
  – Durability

Page 6: Introduction to Big Data and  NoSQL

6

How we scale RDBMS implementations

Page 7: Introduction to Big Data and  NoSQL

7

1st Step – Build a relational database

Database

Page 8: Introduction to Big Data and  NoSQL

8

2nd Step – Table Partitioning

Database

p1 p2 p3

Page 9: Introduction to Big Data and  NoSQL

9

3rd Step – Database Partitioning

[Diagram: three separate stacks, one per customer (Customer #1, #2, #3), each with Browser, Web Tier, B/L Tier, and Database]
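The database-partitioning step can be illustrated with a small routing sketch: each request is sent to the database that belongs to its customer. This is only an illustration; the customer IDs, the connection strings, and the `databaseFor` helper are hypothetical and do not come from the slides.

```javascript
// Hypothetical mapping: one database per customer, as in the diagram.
const customerDatabases = new Map([
  ["customer-1", "db-server-1/customer1"],
  ["customer-2", "db-server-2/customer2"],
  ["customer-3", "db-server-3/customer3"],
]);

// Resolve which database a request for the given customer should go to.
function databaseFor(customerId) {
  const target = customerDatabases.get(customerId);
  if (target === undefined) {
    throw new Error(`No database registered for ${customerId}`);
  }
  return target;
}
```

The business-logic tier would call `databaseFor` once per request; adding a customer means adding a database and one map entry, which is why this approach scales out but multiplies the number of databases to operate.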

Page 10: Introduction to Big Data and  NoSQL

10

4th Step – Move to the cloud?

[Diagram: three stacks, one per customer (Customer #1, #2, #3), each with Browser, Web Tier, B/L Tier, and SQL Azure Federation]

Page 11: Introduction to Big Data and  NoSQL

11

There have to be other ways

Page 12: Introduction to Big Data and  NoSQL

12

Polyglot Persistence

Page 13: Introduction to Big Data and  NoSQL

13

Polyglot Programmer

Page 14: Introduction to Big Data and  NoSQL

14

Page 15: Introduction to Big Data and  NoSQL

15

Where Did NoSQL Originate?
• 1998 - Carlo Strozzi
  – NoSQL project - lightweight open-source relational DB with no SQL interface
• 2009 - Eric Evans (Rackspace) reintroduced the term when Johan Oskarsson of Last.fm wanted to organize an event to discuss open-source distributed databases

Page 16: Introduction to Big Data and  NoSQL

16

NoSQL (loose) Definition
• (often) Open source
• Non-relational
• Distributed
• (often) don’t guarantee ACID

Page 17: Introduction to Big Data and  NoSQL

17

Atlanta 2009
• No:sql(east) conference
  – select fun, profit from real_world where relational=false
• Billed as “conference of no-rel datastores”

Page 18: Introduction to Big Data and  NoSQL

18

Types Of NoSQL Data Stores

Page 19: Introduction to Big Data and  NoSQL

19

5 Groups of Data Models

Relational

Document

Key Value

Graph

Column Family

Page 20: Introduction to Big Data and  NoSQL

20

Document Store
• Apache Jackrabbit
• CouchDB
• MongoDB
• SimpleDB
• XML Databases
  – MarkLogic Server
  – eXist

Page 21: Introduction to Big Data and  NoSQL

21

Document?
• Okay, think of a web page...
  – Relational model requires a column per tag
  – Lots of empty columns
  – Wasted space
• Document model just stores the pages as is
  – Saves on space
  – Very flexible.
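The web page comparison can be made concrete with a small sketch. It contrasts fixed-schema rows, where every page carries every column, with documents that store only the parts a page actually has. The column names and sample pages are invented for illustration.

```javascript
// Relational-style rows: every page has every column, so many cells are null.
const columns = ["title", "h1", "video", "table", "form"];
const pageRows = [
  { title: "Home", h1: "Welcome", video: null, table: null, form: null },
  { title: "Contact", h1: null, video: null, table: null, form: "contact-form" },
];

// Document-style: each page stores only the fields it actually uses.
const pageDocuments = [
  { title: "Home", h1: "Welcome" },
  { title: "Contact", form: "contact-form" },
];

// Count the empty cells in the fixed-schema representation.
function countEmptyCells(rows, cols) {
  let empty = 0;
  for (const row of rows) {
    for (const col of cols) {
      if (row[col] === null || row[col] === undefined) empty += 1;
    }
  }
  return empty;
}

// Count the fields actually stored in the document representation.
function countStoredFields(docs) {
  return docs.reduce((sum, doc) => sum + Object.keys(doc).length, 0);
}
```

For this sample data the rows hold ten cells of which six are empty, while the documents hold four fields in total. A real store has its own per-document overhead, so the space saving is indicative only.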

Page 22: Introduction to Big Data and  NoSQL

22

Graph Storage
• AllegroGraph
• Core Data
• Neo4j
• DEX
• FlockDB
• Microsoft Trinity (research project)
  – http://research.microsoft.com/en-us/projects/trinity/

Page 23: Introduction to Big Data and  NoSQL

23

What’s a graph?
• Graph consists of
  – Nodes (‘stations’ of the graph)
  – Edges (lines between them)
• FlockDB
  – Created by the Twitter folks
  – Nodes = Users
  – Edges = Nature of relationship between nodes.

Page 24: Introduction to Big Data and  NoSQL

24

Key/Value Stores
• On disk
• Cache in RAM
• Eventually Consistent
  – Weak Definition
    • “If no updates occur for a period, eventually all updates will propagate through the system and all replicas will be consistent”
  – Strong Definition
    • “for a given update and a given replica eventually either the update reaches the replica or the replica retires”
• Ordered
  – Distributed Hash Table allows lexicographical processing
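The weak definition of eventual consistency can be demonstrated with a toy in-memory store. This is a sketch under simplifying assumptions, not how any particular product works: all writes go to replica 0, updates are delivered to the other replicas in order, and `EventuallyConsistentStore` and its methods are hypothetical names.

```javascript
// A toy replicated key/value store. Writes land on replica 0 immediately
// and reach the other replicas only when propagate() is called.
class EventuallyConsistentStore {
  constructor(replicaCount) {
    this.replicas = Array.from({ length: replicaCount }, () => new Map());
    this.pending = []; // updates not yet delivered to the other replicas
  }

  // Apply the write to replica 0 and queue it for every other replica.
  put(key, value) {
    this.replicas[0].set(key, value);
    for (let i = 1; i < this.replicas.length; i++) {
      this.pending.push({ target: i, key, value });
    }
  }

  // Read from one specific replica; it may return stale data.
  read(key, replicaIndex) {
    return this.replicas[replicaIndex].get(key);
  }

  // Deliver every queued update in the order it was written.
  propagate() {
    for (const { target, key, value } of this.pending) {
      this.replicas[target].set(key, value);
    }
    this.pending = [];
  }
}
```

Between `put` and `propagate`, a read from replica 1 or 2 returns the old value (or `undefined`); once no more updates arrive and propagation has run, every replica returns the same value.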

Page 25: Introduction to Big Data and  NoSQL

25

Key/Value Examples
• Azure AppFabric Cache
• memcached
• VMware vFabric GemFire

Page 26: Introduction to Big Data and  NoSQL

26

Object Databases
• Db4o
• GemStone/S
• InterSystems Caché
• Objectivity/DB
• ZODB

Page 27: Introduction to Big Data and  NoSQL

27

Tabular
• BigTable
• Mnesia
• HBase
• Hypertable
• Azure Table Storage
• SQL Server 2012

Page 28: Introduction to Big Data and  NoSQL

28

Azure Table Storage Demo

Page 29: Introduction to Big Data and  NoSQL

29

Big Data

Page 30: Introduction to Big Data and  NoSQL

30

Big Data Definition
• Volumes & volumes of data
• Unstructured
• Semi-structured
• Not suited for Relational Databases
• Often utilizes MapReduce frameworks

Page 31: Introduction to Big Data and  NoSQL

31

Big Data Examples
• Cassandra
• Hadoop
• Greenplum
• Azure Storage
• EMC Atmos
• Amazon S3
• SQL Azure (with Federations support)

Page 32: Introduction to Big Data and  NoSQL

32

Real World Example
• Twitter
  – The challenges
    • Needs to store many graphs
      – Who you are following
      – Who’s following you
      – Who you receive phone notifications from, etc.
    • To deliver a tweet requires rapid paging of followers
    • Heavy write load as followers are added and removed
    • Set arithmetic for @mentions (intersection of users).
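The "set arithmetic" and "rapid paging" challenges can be sketched in a few lines. The example intersects the follower sets of two users and returns the result one page at a time. The user names, the `followers` map, and both helper functions are made up for illustration and say nothing about Twitter's actual data structures.

```javascript
// Hypothetical follower sets keyed by user name.
const followers = new Map([
  ["alice", new Set(["carol", "dave", "erin", "frank"])],
  ["bob", new Set(["dave", "erin", "grace"])],
]);

// Users who follow both a and b, sorted so that paging is stable.
function mutualFollowers(a, b) {
  const setA = followers.get(a) || new Set();
  const setB = followers.get(b) || new Set();
  return [...setA].filter((user) => setB.has(user)).sort();
}

// Return one zero-based page of a result list.
function page(items, pageNumber, pageSize) {
  const start = pageNumber * pageSize;
  return items.slice(start, start + pageSize);
}
```

At Twitter's scale the follower sets do not fit in one process, which is why the slides treat paging and intersection as storage-level requirements, not application code.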

Page 33: Introduction to Big Data and  NoSQL

33

What did they try?
• Started with Relational Databases
• Tried Key-Value storage of denormalized lists
• Did it work?
  – Nope
• Either good at
  – Handling the write load
  – Or paging large amounts of data
  – But not both

Page 34: Introduction to Big Data and  NoSQL

34

What did they need?
• Simplest possible thing that would work
• Allow for horizontal partitioning
• Allow write operations to
  – Arrive out of order
  – Or be processed more than once
• Failures should result in redundant work
  – Not lost work!

Page 35: Introduction to Big Data and  NoSQL

35

The Result was FlockDB
• Stores graph data
• Not optimized for graph traversal operations
• Optimized for large adjacency lists
  – List of all edges in a graph
    • Key is the edge, value is a set of the node end points
• Optimized for fast read and write
• Optimized for page-able set arithmetic.

Page 36: Introduction to Big Data and  NoSQL

36

How Does it Work?
• Stores graphs as sets of edges between nodes
• Data is partitioned by node
  – All queries can be answered by a single partition
• Write operations are idempotent
  – Can be applied multiple times without changing the result
• And commutative
  – Changing the order of operands doesn’t change the result.
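One common way to get writes that are both idempotent and commutative is to attach a timestamp to every write and let the newest one win. The sketch below illustrates that idea only; it is not FlockDB's actual implementation, and the write shape (`from`, `to`, `state`, `timestamp`) is an assumption. It also assumes that two different writes to the same edge never share a timestamp.

```javascript
// Apply one edge write to the store. The newest timestamp wins, so
// replays and reordering cannot change the final state.
function applyWrite(edges, write) {
  const key = `${write.from}->${write.to}`;
  const current = edges.get(key);
  if (current === undefined || write.timestamp > current.timestamp) {
    edges.set(key, { state: write.state, timestamp: write.timestamp });
  }
  return edges;
}

// Apply a list of writes, in the given order, to an empty store.
function applyAll(writes) {
  return writes.reduce((edges, write) => applyWrite(edges, write), new Map());
}

// Hypothetical writes: alice follows bob, then unfollows him.
const follow = { from: "alice", to: "bob", state: "normal", timestamp: 1 };
const unfollow = { from: "alice", to: "bob", state: "removed", timestamp: 2 };
```

Applying `[follow, unfollow]`, `[unfollow, follow]`, or a list with duplicates all leave the edge in the `removed` state, which is the property that lets failed jobs be retried and queues be drained out of order.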

Page 37: Introduction to Big Data and  NoSQL

37

Working With Big Data

Page 38: Introduction to Big Data and  NoSQL

38

ACID
• Atomicity
  – All or Nothing
• Consistency
  – Valid according to all defined rules
• Isolation
  – No transaction should be able to interfere with another transaction
• Durability
  – Once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors

Page 39: Introduction to Big Data and  NoSQL

39

BASE
• Basically Available
  – High availability but not always consistent
• Soft state
  – Background cleanup mechanism
• Eventual consistency
  – Given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system and all the replicas will be consistent.

Page 40: Introduction to Big Data and  NoSQL

40

Traditional (relational) Approach

Extract

Transform

Load

Transactional Data Store

Data Warehouse

Page 41: Introduction to Big Data and  NoSQL

41

Big Data Approach
• MapReduce Pattern/Framework
  – an Input Reader
  – Map Function – To transform to a common shape (format)
  – a partition function
  – a compare function
  – Reduce Function
  – an Output Writer
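The six parts listed above can be wired together in a single-process sketch. It counts words; real frameworks run the same stages across many machines. All function names are illustrative, and partitioning on the first character code is an arbitrary choice made for this example.

```javascript
// Input reader: split raw input into records (one non-empty line each).
function inputReader(text) {
  return text.split("\n").filter((line) => line.length > 0);
}

// Map: transform each record into [key, value] pairs of a common shape.
function mapFn(line) {
  return line.toLowerCase().split(/\s+/).filter(Boolean).map((word) => [word, 1]);
}

// Partition: decide which reducer handles a given key.
function partitionFn(key, reducerCount) {
  return key.charCodeAt(0) % reducerCount;
}

// Compare: order the keys inside a partition before reducing.
function compareFn(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Reduce: combine all values emitted for one key.
function reduceFn(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Output writer: collect the [key, result] pairs into a plain object.
function outputWriter(results) {
  return Object.fromEntries(results);
}

// Run all stages in sequence.
function runMapReduce(text, reducerCount = 2) {
  const partitions = Array.from({ length: reducerCount }, () => new Map());
  for (const record of inputReader(text)) {
    for (const [key, value] of mapFn(record)) {
      const bucket = partitions[partitionFn(key, reducerCount)];
      if (!bucket.has(key)) bucket.set(key, []);
      bucket.get(key).push(value);
    }
  }
  const results = [];
  for (const bucket of partitions) {
    const keys = [...bucket.keys()].sort(compareFn);
    for (const key of keys) results.push([key, reduceFn(key, bucket.get(key))]);
  }
  return outputWriter(results);
}
```

Calling `runMapReduce("big data\nbig nosql")` yields a count of 2 for `big` and 1 each for `data` and `nosql`. Because every occurrence of a key lands in the same partition, each reducer can work independently of the others.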

Page 42: Introduction to Big Data and  NoSQL

42

MongoDB Example

// map function: emit one { count : 1 } record per tag of each document
m = function(){
    this.tags.forEach(
        function(z){
            emit( z , { count : 1 } );
        }
    );
};

// reduce function: sum the counts emitted for one tag
r = function( key , values ){
    var total = 0;
    for ( var i=0; i<values.length; i++ )
        total += values[i].count;
    return { count : total };
};

// execute: run the job over the "things" collection, writing to "myoutput"
res = db.things.mapReduce(m, r, { out : "myoutput" } );
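To see what the tag-count job computes without a running MongoDB server, the map and reduce functions can be exercised with a small local harness. This is a sketch only: `emit` is a stand-in for the function the MongoDB shell provides, `localMapReduce` is a hypothetical helper, and the sample documents are invented. MongoDB itself may skip the reduce call for keys with a single value and may call reduce more than once per key; the harness calls it exactly once per key.

```javascript
// Stand-in for MongoDB's emit(): collect [key, value] pairs in memory.
const emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// Same logic as the map function on the slide.
function mapTags() {
  this.tags.forEach((tag) => emit(tag, { count: 1 }));
}

// Same logic as the reduce function on the slide.
function reduceCounts(key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i].count;
  return { count: total };
}

// Run map over every document, group by key, then reduce each group.
function localMapReduce(docs, map, reduce) {
  emitted.length = 0;
  docs.forEach((doc) => map.call(doc));
  const grouped = new Map();
  for (const [key, value] of emitted) {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  const out = {};
  for (const [key, values] of grouped) out[key] = reduce(key, values);
  return out;
}

// Invented sample documents standing in for the "things" collection.
const things = [
  { _id: 1, tags: ["dog", "cat"] },
  { _id: 2, tags: ["cat"] },
  { _id: 3, tags: ["mouse", "cat", "dog"] },
];
const tagCounts = localMapReduce(things, mapTags, reduceCounts);
```

For these documents `tagCounts` holds a count of 3 for `cat`, 2 for `dog`, and 1 for `mouse`. Note that the reduce output has the same shape as the emitted values, which is what allows MongoDB to feed reduce results back into reduce.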

Page 43: Introduction to Big Data and  NoSQL

43

MongoDB Demo

Page 44: Introduction to Big Data and  NoSQL

44

Big Data on Azure
• Azure Table Storage
  – Azure Service Bus
• SQL Azure Federations
• MongoDB on Azure
  – http://www.mongodb.org/display/DOCS/MongoDB+on+Azure
• Hadoop on Azure
  – https://www.hadooponazure.com/

Page 45: Introduction to Big Data and  NoSQL

45

Using Azure for Computing

[Diagram: a Client, a Master (Job/Task Scheduler), and three Workers connected via sockets; the Master and each Worker have their own Data]

Page 46: Introduction to Big Data and  NoSQL

46

Moving to Event Based Architecture

[Diagram: Web Roles pass requests (Req) through a Queue to Worker Roles. Callout: monitor queue length against user’s expectations.]
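The callout about monitoring queue length against the user's expectations can be turned into a simple sizing rule. The sketch below estimates how many worker roles are needed to drain the current backlog within a target wait time. The function name, its parameters, and the assumption that every message takes the same time to process are illustrative and not taken from the slides.

```javascript
// Estimate the number of workers needed so that the current backlog
// is processed within targetWaitSeconds. Always keeps at least one worker.
function workersNeeded(queueLength, secondsPerMessage, targetWaitSeconds) {
  const totalWorkSeconds = queueLength * secondsPerMessage;
  return Math.max(1, Math.ceil(totalWorkSeconds / targetWaitSeconds));
}
```

For example, 100 queued messages at 2 seconds each with a 20 second target gives 10 workers. A monitoring job could evaluate this periodically and scale the worker role count up or down, leaving the web roles untouched.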

Page 47: Introduction to Big Data and  NoSQL

47

Aggregate Stores

Page 48: Introduction to Big Data and  NoSQL

48

Visualizing Aggregates

ID: 1001
Customer: Ann

Line Items
32411234     2   $48   $96
707423234    1   $56   $56
125145       1   $24   $24

Payment Details
Card: AmEx
CC#: 12343
Expiration: 07/2015

Relational tables: Orders, Customers, Order Lines, Credit Cards

Page 49: Introduction to Big Data and  NoSQL

49

Visualizing Aggregates

ID: 1001
Customer: Ann

Line Items
32411234     2   $48   $96
707423234    1   $56   $56
125145       1   $24   $24

Payment Details
Card: AmEx
CC#: 12343
Expiration: 07/2015

{“SalesOrdersView”:{ ID: 1001, Customer: Ann, LineItems: []……………..…………….……………..}}
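The aggregate on this slide can be written out as one nested document. The values come from the slide, but the field names (`itemId`, `quantity`, `unitPrice`, `ccNumber`, and so on) and the reading of the line item columns as item ID, quantity, and unit price are assumptions made for this sketch.

```javascript
// One sales order stored as a single aggregate, not as rows in
// Orders, Customers, Order Lines, and Credit Cards.
const salesOrder = {
  id: 1001,
  customer: "Ann",
  lineItems: [
    { itemId: "32411234", quantity: 2, unitPrice: 48 },
    { itemId: "707423234", quantity: 1, unitPrice: 56 },
    { itemId: "125145", quantity: 1, unitPrice: 24 },
  ],
  paymentDetails: { card: "AmEx", ccNumber: "12343", expiration: "07/2015" },
};

// Everything needed to total the order lives inside the aggregate,
// so no joins are required.
function orderTotal(order) {
  return order.lineItems.reduce(
    (sum, item) => sum + item.quantity * item.unitPrice,
    0
  );
}
```

Reading or writing the whole order is a single operation on one document, which is what makes aggregates a natural unit for partitioning across machines. The trade-off is that queries cutting across aggregates, such as all orders containing a given item, become harder.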

Page 50: Introduction to Big Data and  NoSQL

50

MongoDB on Azure Demo

Page 51: Introduction to Big Data and  NoSQL

51

Next Steps
• Learn a NoSQL product
  – Great place to start – AppFabric Cache, Azure Table Storage, MongoDB
• Pick a new programming language to learn
  – Not Java or C#/VB
  – Node.js, JavaScript, F#

Page 52: Introduction to Big Data and  NoSQL

52

THANK YOU