ApacheCon Europe 2012 - Real-Time Big Data in practice with Cassandra

Real-Time Big Data in practice with Cassandra Michaël Figuière @mfiguiere


Post on 13-Dec-2014

Page 1: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Real-Time Big Data in practice with Cassandra

Michaël Figuière @mfiguiere

Page 2: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Speaker

Michaël Figuière

@mfiguiere


Page 3: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Ring Architecture

[Diagram: a Cassandra cluster drawn as a ring of nodes]

Page 4: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Ring Architecture

[Diagram: ring of nodes; three of the nodes are marked as replicas]
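The slide only shows that three nodes of the ring act as replicas. As a minimal sketch of how such a ring can pick them, the Python below assumes that each node owns a token, that a row key is hashed onto the ring, and that the replicas are the first node at or after that token plus the next ones clockwise. The `Ring` class, the MD5-based `token` function, and the node names are hypothetical and only illustrate the idea; they are not Cassandra's implementation.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Hash a string to a position on the ring (assumption: MD5 as the hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring: one token per node, replicas chosen clockwise from the key's token."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, row_key):
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token(row_key)) % len(self.entries)
        count = min(self.replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


nodes = ["node1", "node2", "node3", "node4", "node5", "node6"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("jbellis")  # three distinct nodes out of six
```

Because placement depends only on the hash of the key, any node can compute the replicas for a key without consulting a central directory.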

Page 5: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Linear Scalability

[Chart: Client Writes/s by Node Count - Replication Factor = 3]

Page 6: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Client / Server Communication

[Diagram: four clients and a ring of nodes, three of them replicas; one node is labelled "Node?"]

Page 7: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Client / Server Communication

[Diagram: four clients and a ring of nodes, three of them replicas]

Coordinator node: forwards all R/W requests to the corresponding replicas

Page 8: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Tunable Consistency

3 replicas

Replica values over time: A A A

Page 9: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Tunable Consistency

Write ‘B’: write and wait for an acknowledgement from one node

Replica values over time: A A A -> B A A

Page 10: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

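Tunable Consistency

R + W < N
(R: replicas that must answer a read, W: replicas that must acknowledge a write, N: number of replicas)

Write and wait for an acknowledgement from one node
Read waiting for one node to answer

Replica values over time: A A A -> B A A -> B A A

The read may be answered by a replica that still holds A.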

Page 11: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Tunable Consistency

R + W = N

Write and wait for acknowledgements from two nodes
Read waiting for one node to answer

Replica values over time: A A A -> B B A -> B B A

The read may still be answered by the one replica that holds A.

Page 12: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Tunable Consistency

R + W > N

Write and wait for acknowledgements from two nodes
Read waiting for two nodes to answer

Replica values over time: A A A -> B B A -> B B A

At least one of the replicas answering the read has acknowledged the write.
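The three cases (R + W < N, R + W = N, R + W > N) can be checked by brute force. The sketch below enumerates every set of W replicas that could have acknowledged a write and every set of R replicas that could answer a read, and tests whether the two sets always share at least one replica. The function name `always_overlaps` is hypothetical; the code is an illustration of the arithmetic, not of Cassandra's request handling.

```python
from itertools import combinations


def always_overlaps(n, w, r):
    """True if every set of w written replicas intersects every set of r read replicas."""
    replicas = range(n)
    return all(
        set(written) & set(read)
        for written in combinations(replicas, w)
        for read in combinations(replicas, r)
    )


weak = always_overlaps(3, 1, 1)      # R + W < N
boundary = always_overlaps(3, 2, 1)  # R + W = N
strong = always_overlaps(3, 2, 2)    # R + W > N
```

Only the last case is guaranteed to include a replica holding the latest write, which is why R + W > N is the condition for consistent reads.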

Page 13: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Tunable Consistency

R = W = QUORUM

QUORUM = (N / 2) + 1, with N / 2 rounded down

Replica values over time: A A A -> B B A -> B B A
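As a small worked example of the quorum formula, the Python below computes the quorum for a few replication factors using integer division, and also how many replicas can be unavailable while a quorum is still reachable. The function names are hypothetical.

```python
def quorum(replication_factor):
    """QUORUM = (N / 2) + 1, using integer (floor) division."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that may be down while a quorum can still answer."""
    return replication_factor - quorum(replication_factor)


# (N, quorum, tolerated failures) for a few replication factors
table = [(n, quorum(n), tolerated_failures(n)) for n in (1, 2, 3, 5)]
```

With R = W = QUORUM, R + W is always greater than N, so quorum reads and quorum writes overlap on at least one replica.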

Page 14: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Request Path

[Diagram: four clients and a ring of nodes with a coordinator node and three replicas; arrows numbered 1, 2 (x3), 3 (x3) and 4 trace a request from a client through the coordinator node to the replicas and back]
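The request path can be sketched in a few lines of Python. This is a simplified, single-process illustration under several assumptions: the `Replica` and `Coordinator` classes are hypothetical, requests are sent one after another instead of in parallel, every value carries a timestamp, and the newest timestamp wins on read.

```python
class Replica:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        """Store the value if it is newer; return True as the acknowledgement."""
        if not self.up:
            return False
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)
        return True

    def read(self, key):
        return self.data.get(key)


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value, timestamp, required_acks):
        # Forward to every replica, then count acknowledgements.
        acks = sum(r.write(key, value, timestamp) for r in self.replicas)
        return acks >= required_acks

    def read(self, key, required_answers):
        live = [r for r in self.replicas if r.up]
        if len(live) < required_answers:
            raise RuntimeError("not enough replicas answered")
        answers = [r.read(key) for r in live[:required_answers]]
        found = [a for a in answers if a is not None]
        # Newest timestamp wins among the answers received.
        return max(found)[1] if found else None


replicas = [Replica("r1"), Replica("r2"), Replica("r3", up=False)]
coordinator = Coordinator(replicas)
ok = coordinator.write("k", "B", timestamp=2, required_acks=2)
value = coordinator.read("k", required_answers=2)
all_acked = coordinator.write("k", "C", timestamp=3, required_acks=3)
```

In this sketch a write that does not gather enough acknowledgements is reported as failed, but it is not rolled back on the replicas that did accept it.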

Page 15: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Column Family Data Model

Row Key    Columns (name: value)
jbellis    name: Jonathan | email: [email protected] | address: 123 main | state: TX
dhutch     name: Daria | email: [email protected] | address: 45 2nd st | state: CA
egilmore   name: Eric | email: [email protected]

Page 16: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Column Family Data Model

Row Key    Columns
jbellis    dhutch | egilmore | datastax | mzcassie
dhutch     egilmore
egilmore   datastax | mzcassie

Page 17: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

CQL3 Data Model

Timeline Table

user_id     tweet_id   author        body
gmason      1765       phenry        Give me liberty or give me death
gmason      1742       gwashington   I chopped down the cherry tree
ahamilton   1797       jadams        A government of laws, not men
ahamilton   1742       gwashington   I chopped down the cherry tree

Partition Key: user_id
Remaining Key: tweet_id

Page 18: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

CQL3 Data Model

Timeline Table

user_id     tweet_id   author        body
gmason      1765       phenry        Give me liberty or give me death
gmason      1742       gwashington   I chopped down the cherry tree
ahamilton   1797       jadams        A government of laws, not men
ahamilton   1742       gwashington   I chopped down the cherry tree

CQL:

CREATE TABLE timeline (
    user_id varchar,
    tweet_id uuid,
    author varchar,
    body varchar,
    PRIMARY KEY (user_id, tweet_id)
);

Page 19: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

CQL3 Data Model

Timeline Table

user_id     tweet_id   author        body
gmason      1765       phenry        Give me liberty or give me death
gmason      1742       gwashington   I chopped down the cherry tree
ahamilton   1797       jadams        A government of laws, not men
ahamilton   1742       gwashington   I chopped down the cherry tree

Timeline Physical Layout

Row Key     Columns ([tweet_id, column]: value)
gmason      [1742, author]: gwashington | [1742, body]: I chopped down the... | [1765, author]: phenry | [1765, body]: Give me liberty or give...
ahamilton   [1742, author]: gwashington | [1742, body]: I chopped down the... | [1797, author]: jadams | [1797, body]: A government of laws...
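The mapping from the timeline table to its physical layout can be sketched as a small transformation. The Python below is only an illustration: the function `physical_layout` and its parameter names are hypothetical, and plain dictionaries stand in for storage rows. Each table row is folded into the storage row of its partition key, with column names of the form (remaining key, column name), kept sorted.

```python
from collections import defaultdict

rows = [
    {"user_id": "gmason", "tweet_id": 1765, "author": "phenry",
     "body": "Give me liberty or give me death"},
    {"user_id": "gmason", "tweet_id": 1742, "author": "gwashington",
     "body": "I chopped down the cherry tree"},
    {"user_id": "ahamilton", "tweet_id": 1797, "author": "jadams",
     "body": "A government of laws, not men"},
    {"user_id": "ahamilton", "tweet_id": 1742, "author": "gwashington",
     "body": "I chopped down the cherry tree"},
]


def physical_layout(rows, partition_key, remaining_key):
    """Group table rows by partition key; each cell is keyed by (remaining key, column)."""
    partitions = defaultdict(dict)
    for row in rows:
        for column, value in row.items():
            if column in (partition_key, remaining_key):
                continue
            partitions[row[partition_key]][(row[remaining_key], column)] = value
    # Cells inside a storage row are kept sorted by their composite name.
    return {key: dict(sorted(cells.items())) for key, cells in partitions.items()}


layout = physical_layout(rows, "user_id", "tweet_id")
```

All tweets of one user's timeline end up in a single storage row, ordered by tweet_id, which is what makes reading a timeline a single-row lookup.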

Page 20: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Denormalized Data Model

Data duplicated over several tables
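One common way to apply this is to copy a piece of data into every table that will need to read it. The Python below is a minimal in-memory sketch under that assumption: the `followers` mapping, the `tweets` and `timelines` tables, and the `post_tweet` function are hypothetical and are not taken from the talk. Each new tweet is stored once by id and then duplicated into the timeline of every follower.

```python
from collections import defaultdict

# Hypothetical follower lists: author -> users who follow that author.
followers = {
    "gwashington": ["gmason", "ahamilton"],
    "phenry": ["gmason"],
}

tweets = {}                    # tweet_id -> (author, body)
timelines = defaultdict(dict)  # user_id -> {tweet_id: (author, body)}


def post_tweet(tweet_id, author, body):
    """Write the tweet once, then duplicate it into each follower's timeline."""
    tweets[tweet_id] = (author, body)
    for user_id in followers.get(author, []):
        timelines[user_id][tweet_id] = (author, body)


post_tweet(1742, "gwashington", "I chopped down the cherry tree")
post_tweet(1765, "phenry", "Give me liberty or give me death")
```

The write does more work, but reading a timeline then needs no join: everything for one user is already in that user's own row.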

Page 21: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Real-Time Analytics

Google Analytics gives you immediate statistics about your website traffic

Page 22: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Web Analytics Data Model

Analytics Table

url              time    views   from_search   direct   from_referrer
/index.html      12:00   354     300           20       34
/index.html      12:01   402     333           25       44
/contacts.html   12:00   23      3             0        20
/contacts.html   12:01   20      4             1        15

CQL:

CREATE TABLE analytics (
    url varchar,
    time timestamp,
    views counter,
    from_search counter,
    direct counter,
    from_referrer counter,
    PRIMARY KEY (url, time)
);

Page 23: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Web Analytics Data Model

Analytics Table

url              time    views   from_search   direct   from_referrer
/index.html      12:00   354     300           20       34
/index.html      12:01   402     333           25       44
/contacts.html   12:00   23      3             0        20
/contacts.html   12:01   20      4             1        15

CQL:

UPDATE analytics
SET views = views + 1,
    from_search = from_search + 1
WHERE url = '/index.html'
AND time = '2012-10-06 12:00';

Page 24: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Web Analytics Data Model

Analytics Table

url              time    views   from_search   direct   from_referrer
/index.html      12:00   354     300           20       34
/index.html      12:01   402     333           25       44
/contacts.html   12:00   23      3             0        20
/contacts.html   12:01   20      4             1        15

CQL:

SELECT * FROM analytics
WHERE url = '/index.html';
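The counter-based model can be mimicked in memory to see how the UPDATE and SELECT statements behave together. The Python below is a sketch, not a Cassandra client: `record_view` and `select_by_url` are hypothetical helpers, and a nested dictionary stands in for the analytics table keyed by (url, time).

```python
from collections import defaultdict

# (url, minute) -> counter name -> count; missing counters start at 0.
analytics = defaultdict(lambda: defaultdict(int))


def record_view(url, minute, source):
    """Mirror of the UPDATE: bump 'views' and the counter for the traffic source."""
    row = analytics[(url, minute)]
    row["views"] += 1
    row[source] += 1


def select_by_url(url):
    """Mirror of the SELECT: all time buckets for one url, ordered by time."""
    return [
        (minute, dict(counters))
        for (row_url, minute), counters in sorted(analytics.items())
        if row_url == url
    ]


record_view("/index.html", "2012-10-06 12:00", "from_search")
record_view("/index.html", "2012-10-06 12:00", "from_search")
record_view("/index.html", "2012-10-06 12:00", "direct")
record_view("/index.html", "2012-10-06 12:01", "from_referrer")
record_view("/contacts.html", "2012-10-06 12:00", "direct")

index_stats = select_by_url("/index.html")
```

Because each page view only increments counters in the row for its url and minute, statistics are available as soon as the view is recorded, without a separate aggregation pass.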

Page 25: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

Online Business Intelligence

[Diagram: Application, Cassandra, Hadoop]

Labels in the diagram:
Storage for application in production
Using results in production
Distributed batch processing
Storage for results

Page 26: ApacheCon Europe 2012 - Real Time Big Data in practice with Cassandra

@mfiguiere

blog.datastax.com

Stay Tuned!