[globant summer take over] empowering big data with cassandra

Post on 07-Apr-2017

Category:

Education

Empowering Big Data with Cassandra

./me

> Renato Carelli
- DevOps + Infra @ Big Data
- Hardening Enthusiast
- Cloud evangelist
- Bitcoin speculator

./intro

./intro/CAP

[Diagram: CAP triangle. Vertices: Consistency, Availability, Partition tolerance. Labels: CA, CP, AP, N/A]

./intro/RDBMS

[Figure. Labels: Data, Performance, Complexity, Costs, Manag. time, Issues]

./intro/NoSQL

> Unstructured (not really a DB)
- General file storage
- Text files
- Log files

> Key Value
- Complex models
- Flexible business logic
- Semi-structured data
- High volumes

> Column
- OLAP
- Analytics
- NOT FOR UPDATES

> Graph
- Relations between entities (social graphs)

> Document
- Agile development
- Flexible data-models
- Too many types. Eg: Corporate areas

Examples shown: Data Store, BigQuery, GFS, BigTable, CloudStore

./intro/BigData

./intro/specs

./intro/history

BigTable (2006) {data modeling}

Dynamo (2007) {design}

Open Source (2008)

./intro/version_history

jul/08 (0.1)
jul/09 (0.3)
apr/10 (0.6)
jan/11 (0.7)
jun/11 (0.8)
oct/11 (1.0)
apr/12 (1.1)
jan/13 (1.2)
sept/13 (2.0)
sept/14 (2.1)
sept/15 (3.0.0-rc1)

./infra

./infra/features

[Diagram: four nodes (N1, N2, N3, N4) arranged in a ring]

> Masterless

> Distributed

> Decentralized [p2p]

> Elastically Scalable

> Highly Available

> Fault-Tolerant

> Tunably Consistent

./infra/benchmark

[Chart: Ops/sec vs. Nodes]

./infra/references

N1 C* Node

Connection Failed

Connection Established

Updated Data

Outdated Data

ACK

Slow Connection Established

./infra/token

Murmur3Partitioner:

-2^63 to +2^63 -1

token(‘Globant’) = -6148914691517517206

./infra/token

DemoPartitioner:

1 to 100

token(‘Globant’) = 68

./infra/token_ring

Node 1: 1 - 25
Node 2: 26 - 50
Node 3: 51 - 75
Node 4: 76 - 100

‘Glob’ = 17

‘ant’ = 94

‘Globant’ = 68

~/Images/pic.png = 69

~/media/movie.mkv = 34
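
The range lookup on this slide can be sketched in a few lines. This is only an illustration of the DemoPartitioner ring (tokens 1 to 100, four equal ranges); the names `RANGE_ENDS`, `NODES` and `owner` are made up for the example, and the token values are copied from the slide, not computed by a real partitioner.

```python
from bisect import bisect_left

# Upper bound of each node's range, as on the slide: 1-25, 26-50, 51-75, 76-100.
RANGE_ENDS = [25, 50, 75, 100]
NODES = ["Node 1", "Node 2", "Node 3", "Node 4"]

def owner(token):
    """Return the node whose token range contains the given token."""
    if not 1 <= token <= 100:
        raise ValueError("DemoPartitioner tokens run from 1 to 100")
    return NODES[bisect_left(RANGE_ENDS, token)]

# Token values exactly as given on the slide (hypothetical partitioner output).
SLIDE_TOKENS = {
    "Glob": 17,
    "ant": 94,
    "Globant": 68,
    "~/Images/pic.png": 69,
    "~/media/movie.mkv": 34,
}

for key, token in SLIDE_TOKENS.items():
    print(f"{key!r} -> token {token} -> {owner(token)}")
```

With these ranges, ‘Glob’ lands on Node 1, ‘ant’ on Node 4, and ‘Globant’ on Node 3.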

./infra/token_ring/replication

Node 1: 1 - 25
Node 2: 26 - 50
Node 3: 51 - 75
Node 4: 76 - 100

‘Glob’ = 17

RF = 3
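
A rough sketch of how RF = 3 could place the ‘Glob’ partition on this demo ring, assuming SimpleStrategy-style placement (the owner of the token plus the next nodes clockwise). The helper name `replicas` and the constant `RANGE_SIZE` are invented for the example.

```python
NODES = ["Node 1", "Node 2", "Node 3", "Node 4"]
RANGE_SIZE = 25  # each node owns 25 tokens: 1-25, 26-50, 51-75, 76-100

def replicas(token, rf):
    """Owner of the token plus the next rf - 1 nodes walking clockwise."""
    if rf > len(NODES):
        raise ValueError("RF cannot exceed the number of nodes in this sketch")
    first = (token - 1) // RANGE_SIZE
    return [NODES[(first + i) % len(NODES)] for i in range(rf)]

print(replicas(17, 3))  # 'Glob' = 17
```

For token 17 this gives Node 1, Node 2 and Node 3; a token owned by Node 4 wraps around the ring to Node 1 and Node 2.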

./infra/token_ring/vnodes

What about virtual nodes (vnodes)?

Available since C* 1.2

./infra/coordinator

[Diagram: four-node ring (N1, N2, N3, N4); a read request arrives and one node is labeled Coordinator]

> read
RF = 3
CL = TWO
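
A minimal sketch of the coordinator's bookkeeping for this read, assuming it only needs to count answers: with RF = 3 there are three replicas, and CL = TWO means two of them must respond. The function name `coordinate_read` and the up/down map are hypothetical; which nodes hold the replicas is also assumed here.

```python
def coordinate_read(replica_status, cl):
    """replica_status maps replica name -> True if it answers the coordinator.

    Returns the replicas whose answers satisfy the consistency level,
    or raises if too few replicas answered.
    """
    answered = [name for name, up in replica_status.items() if up]
    if len(answered) < cl:
        raise RuntimeError(f"{len(answered)} replica(s) answered, CL requires {cl}")
    return answered[:cl]  # the coordinator can reply once `cl` replicas answered

# RF = 3, CL = TWO: one replica may be down and the read still succeeds.
status = {"N1": True, "N2": False, "N3": True}
print(coordinate_read(status, cl=2))
```

If two of the three replicas were down, the same call would raise instead of returning data.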

./infra/replication

How many copies of each piece of data (partition) do we want in the system?

./infra/replication

> Replication Factor
> Replication Strategy

Keyspace-based!

./infra/replication

[Diagram: four-node ring (N1, N2, N3, N4)]

RF = 3

./infra/replication

CREATE KEYSPACE Globant WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

./infra/replication

[Diagram: two data centers, East and West; each shows nodes N1, N2, N3, N4 and racks R1, R2]

RF = {‘w’:3, ‘e’:2}

./infra/replication

CREATE KEYSPACE Globant WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'w' : 3, 'e' : 2};

./infra/consistency_level

How many replicas/nodes (based on RF) must respond to declare success?

./infra/consistency_level

Query-based!

./infra/consistency_level

CL = { ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, ALL }

[Diagram: four-node ring (N1, N2, N3, N4), shown once per request below]

> write, CL = QUORUM

> read, CL = ALL

> read, CL = QUORUM
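
As a rough guide to what each level asks for, the sketch below computes how many replica acknowledgements a request needs, using the per-data-center RF from the NetworkTopologyStrategy slide ('w': 3, 'e': 2). The function names and the `local_dc` parameter are invented for the example; treating ANY as one acknowledgement is a simplification, since a stored hint can also satisfy it.

```python
def quorum(rf):
    """Majority of rf replicas."""
    return rf // 2 + 1

def required_acks(cl, rf_by_dc, local_dc):
    """Acknowledgements needed for consistency level `cl`."""
    total_rf = sum(rf_by_dc.values())
    if cl in ("ANY", "ONE", "LOCAL_ONE"):
        return 1  # simplification: ANY may also be met by a hint
    if cl == "TWO":
        return 2
    if cl == "THREE":
        return 3
    if cl == "QUORUM":
        return quorum(total_rf)
    if cl == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])
    if cl == "EACH_QUORUM":
        return sum(quorum(rf) for rf in rf_by_dc.values())
    if cl == "ALL":
        return total_rf
    raise ValueError(f"unknown consistency level: {cl}")

rf_by_dc = {"w": 3, "e": 2}
for level in ("ONE", "QUORUM", "LOCAL_QUORUM", "EACH_QUORUM", "ALL"):
    print(level, required_acks(level, rf_by_dc, local_dc="w"))
```

With a single data center and RF = 3, QUORUM comes out as 2, which is the value used in the R + W > RF slides.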

./infra/consistency_level

Latest timestamp wins!
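
In code form, "latest timestamp wins" is a one-line reduction over the replica responses. A sketch, assuming each response is a `(value, write_timestamp)` pair; the name `resolve` is made up for the example.

```python
def resolve(responses):
    """Pick the value with the highest write timestamp among replica responses."""
    value, _timestamp = max(responses, key=lambda r: r[1])
    return value

# Two replicas still hold the old value; the newest write wins anyway.
print(resolve([("old", 100), ("new", 200), ("old", 100)]))
```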

./infra/consistency_level/immediate

{ R + W > RF }

./infra/consistency_level/immediate

+Reads
> Write CL: ALL
> Read CL: ONE

[Diagram: four-node ring (N1, N2, N3, N4)]

RF = 3
> write, CL = ALL
> read, CL = ONE

{ R + W > RF }

{ 1 + 3 > 3 }

./infra/consistency_level/immediate

+Writes
> Write CL: ONE
> Read CL: ALL

[Diagram: four-node ring (N1, N2, N3, N4)]

RF = 3
> write, CL = ONE
> read, CL = ALL

{ R + W > RF }

{ 3 + 1 > 3 }

./infra/consistency_level/immediate

Balanced
> Write CL: QUORUM
> Read CL: QUORUM

[Diagram: four-node ring (N1, N2, N3, N4)]

RF = 3
> write, CL = QUORUM
> read, CL = QUORUM

{ R + W > RF }

{ 2 + 2 > 3 }
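
The three cases reduce to the same check: a read is guaranteed to overlap the latest write when R + W > RF. A small sketch with RF = 3; the function name is invented, and the ONE/ONE case is added only as a counterexample.

```python
def is_immediately_consistent(r, w, rf):
    """True when every read set must overlap every write set."""
    return r + w > rf

RF = 3
scenarios = {
    "+Reads (write ALL, read ONE)": (1, 3),         # (R, W)
    "+Writes (write ONE, read ALL)": (3, 1),
    "Balanced (write QUORUM, read QUORUM)": (2, 2),
    "Counterexample (write ONE, read ONE)": (1, 1),
}
for name, (r, w) in scenarios.items():
    print(f"{name}: {r} + {w} > {RF} is {is_immediately_consistent(r, w, RF)}")
```

Only the ONE/ONE combination fails the check, so it is the only one that can return outdated data.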

./infra/read_repair

> Query ALL replicas when reading
- Data from one.
- Checksum + Timestamp from others.

./infra/read_repair

> If there is a mismatch:
- Pull all data and merge
- Write back to out of sync replicas
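
The two steps above can be sketched as follows. This is an illustration only: replicas are modeled as a dict of `(value, timestamp)` pairs, MD5 is used as a stand-in checksum, and the merge is simplified to "latest timestamp wins". The names `digest` and `read_with_repair` are hypothetical.

```python
import hashlib

def digest(value):
    """Stand-in checksum for a replica's value."""
    return hashlib.md5(value.encode()).hexdigest()

def read_with_repair(replicas):
    """replicas maps name -> (value, timestamp); stale entries are fixed in place."""
    names = list(replicas)
    data_value, _ = replicas[names[0]]            # full data from one replica
    expected = digest(data_value)
    others = [digest(replicas[n][0]) for n in names[1:]]  # checksums from the rest
    if all(d == expected for d in others):
        return data_value                         # everything in sync
    # Mismatch: pull all data, merge (latest timestamp wins), write back.
    newest = max(replicas.values(), key=lambda r: r[1])
    for name in names:
        if replicas[name] != newest:
            replicas[name] = newest
    return newest[0]

replicas = {"N1": ("v1", 1), "N2": ("v2", 2), "N3": ("v1", 1)}
print(read_with_repair(replicas), replicas)
```

After the call, all three replicas hold the newest value.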

./infra/read_repair

Table-based!

./infra/read_repair

[Diagram: four-node ring (N1, N2, N3, N4); responses labeled DATA, SUM, SUM]

./infra/read_repair

ALTER TABLE Globant.foobar WITH read_repair_chance = 0.2;

./infra/read_repair

> Weak Consistency: return results + repair

> Strong Consistency: repair + return results

./infra/hinted_handoff

> Recovery mechanism
- Stored @ Coordinator's system.hints
- 3 h default TTL
- DataCenter-based!
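
A toy model of the mechanism, assuming the coordinator keeps a hint whenever the target replica is down and replays it when the replica returns, dropping hints older than the 3 h TTL. The class, its methods and the explicit `now` argument are invented for the example; the list stands in for system.hints.

```python
HINT_TTL_SECONDS = 3 * 60 * 60  # 3 h, as on the slide

class Coordinator:
    def __init__(self):
        self.hints = []  # stand-in for system.hints: (target, mutation, stored_at)

    def write(self, target, mutation, target_up, now):
        """Deliver the mutation, or keep a hint if the target is down."""
        if target_up:
            return "delivered"
        self.hints.append((target, mutation, now))
        return "hinted"

    def replay(self, target, now):
        """Return the target's hints still within TTL and drop all of its hints."""
        fresh = [m for t, m, at in self.hints
                 if t == target and now - at <= HINT_TTL_SECONDS]
        self.hints = [h for h in self.hints if h[0] != target]
        return fresh

c = Coordinator()
print(c.write("N2", "x=1", target_up=False, now=0))
print(c.replay("N2", now=3600))
```

A node that stays down longer than the TTL gets nothing replayed, which is one of the cases where a full repair is needed.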

./infra/nodetool

$ nodetool repair
> Recovering a failed node
> Infrequently read data (read repair chance)
> Tombstone GC period (gc_grace_seconds)

./internals

./internals/write_path

[Diagram: C and node N. MEMORY: Memtable (1 table). STORAGE: CommitLog, SSTables.]

Rows shown:
partition key 1 | n: pwnConf | city: mdq | year: 2
partition key 2 | n: EkoParty | city: caba | year: 11
partition key 3 | n: SasoConf | city: cur | year: 3

./internals/write_path

[Diagram: same layout (C, node N, Memtable, CommitLog, SSTables), now with a (Flush) step]

Rows shown after (Flush):
partition key 1 | n: pwnConf | city: mdq | year: 2
partition key 2 | n: EkoParty | city: caba | year: 11
partition key 3 | n: SasoConf | city: cur | year: 3

./internals/write_path

[Diagram: same layout (C, node N, Memtable, CommitLog, SSTables), now with a Compaction step]

Rows shown:
partition key 1 | n: pwnConf | city: mdq | year: 2
partition key 2 | n: EkoParty | city: caba | year: 11
partition key 3 | n: SasoConf | city: cur | year: 3

./internals/write_path

[Diagram: same layout (C, node N, Memtable, CommitLog, SSTables) after Compaction. Labels: APPEND ONLY, APPEND ONLY, IMMUTABLE]
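
The write path in these slides can be imitated with a small in-memory model: every write is appended to the commit log and put in the memtable, a full memtable is flushed to an immutable SSTable, and compaction merges SSTables. This is a sketch under simplifying assumptions (a tiny flush threshold, tuples as SSTables, clearing the commit log on flush); the class and method names are invented.

```python
class Node:
    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only
        self.memtable = {}     # in memory
        self.sstables = []     # each SSTable is an immutable, sorted tuple
        self.memtable_limit = memtable_limit

    def write(self, key, row):
        self.commit_log.append((key, row))  # durability first
        self.memtable[key] = row
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a new SSTable and start fresh."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def compact(self):
        """Merge all SSTables into one; newer tables overwrite older ones."""
        merged = {}
        for table in self.sstables:  # oldest first
            merged.update(dict(table))
        self.sstables = [tuple(sorted(merged.items()))]

node = Node()
node.write(1, {"n": "pwnConf", "city": "mdq", "year": 2})
node.write(2, {"n": "EkoParty", "city": "caba", "year": 11})
node.write(3, {"n": "SasoConf", "city": "cur", "year": 3})  # triggers a flush
print(len(node.sstables), node.memtable)
```

Nothing is ever updated in place: a later write to the same key lands in a newer SSTable and only replaces the old row at compaction time.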

./hands-on

./hands-on/stress

> 1.3M writes/sec (1.3 writes/µs)

> 160K reads/sec (160 reads/ms)

> Collisions?

./hands-on/stress

Custom Apps

./me/contact

> Renato Carelli
- mailto: renato.carelli@globant.com
- mailto: renato@carelli.com.ar
- telegram: @renato

We are hiring DevOps!
> mailto: solange.domijan@globant.com

Thanks!