apache cassandra and datastax - meetupfiles.meetup.com/3139382/apachecassandraanddatastax.pdf ·...

46
DataStax EMEA Apache Cassandra and DataStax

Upload: others

Post on 02-Feb-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

DataStax EMEA

Apache Cassandra and DataStax

Page 2: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Agenda

2

1. Apache Cassandra2. Cassandra Query Language3. DataStax Enterprise4. Realtime Analytics5. What´s New

Page 3: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

About me

3

Christian JohannsenSolutions Engineer @ DataStax

@cjohannsen81

Page 4: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Apache Cassandra

4

Page 5: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

What is Apache Cassandra

5

• Apache Cassandra is a distributed (“NoSQL”) database• massively scalable and available • providing extreme performance• designed to handle big data workloads • across multiple data center • no single point of failure

Dynamo

BigTable

BigTable: http://research.google.com/archive/bigtable-osdi06.pdf Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

Page 6: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Cassandra Architecture

6

DATA STORE (BIG TABLES)

CLUSTER (DYNAMO)

API (CQL & RPC)

DISKS

Node1

Client request

Node2

CLUSTER (DYNAMO)

API (CQL & RPC)

DISKS

DATA STORE (BIG TABLES)

Page 7: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Scale, anybody?

7

• Cassandra works from very small to very large deployments • Cassandra footprint @ Apple* • 75,000+ nodes • 10’s of petabytes of data • Millions ops/second • Largest cluster 1000+ nodes

*Cassandra Summit, USA, September 2014

Page 8: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Why Cassandra?

8

• Masterless Architecture with read/write anywhere design• Continuous Availability with no single point of failure• Multi-Data Center and Availability Zone support• Flexible data model for unstructured, semi-structured and

structured data• Linear scalable performance with online expansion (scale-out and

scale-up)• Security with integrated authentication• Operationally simple• CQL - Cassandra Query Language

Page 9: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Cassandra Adoption

9

Source: db-engines.com, Feb. 2014

Page 10: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

• Client reads or writes to any node• Node coordinates with others (gossip

protocol)• Data read or replicated in parallel• RF = 3 in this example• Each node is strong 60% of the clusters

Data i.e. 3/5

Cassandra - Locally Distributed

10

Node 1 1st copy

Node 4

Node 5

Node 2 2nd copy

Node 3 3rd copy

Node 2 2nd copy

Page 11: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Cassandra - Rack/Zone aware

11

Node 1 1st copy

Node 4

Node 2

Node 3 2nd copy

Rack 1

Rack 2Rack 2

Rack 3

Rack 1

Node 5 3rd copy

• Cassandra is aware of which rack or zone each node resides in

• It will attempt to place each data copy in a different rack

• RF=3 in this example

Page 12: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Cassandra - DC/Region aware

12

• Active Everywhere – reads/writes in multiple data centres • Client writes local • Data syncs across WAN • Replication Factor per DC • Different number of nodes per

data center Node 1 1st copy

Node 4

Node 5 Node 2 2nd copy

Node 3 3rd copy

Node 1 1st copy

Node 4

Node 5 Node 2 2nd copy

Node 3 3rd copy

DC: EUROPEDC: USA

Page 13: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Cassandra - Tuneable Consistency

13

• Consistency Level (CL) • Client specifies per operation • Handles multi-data center operations

• ALL = All replicas ack • QUORUM = > 51% of replicas ack • LOCAL_QUORUM = > 51% in local DC ack • ONE = Only one replica acks • Plus more…. (see docs)

• Blog: Eventual Consistency != Hopeful Consistencyhttp://planetcassandra.org/blog/post/a-netflix-experiment-eventual-consistency-hopeful-consistency-by-christos-kalantzis/

Node 1 1st copy

Node 4

Node 5 Node 2 2nd copy

Node 3 3rd copy

ParallelWrite

WriteCL=QUORUM

5 μs ack

12 μs ack

500 μs ack

12 μs ack

Page 14: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Cassandra - Node failure

14

• A single node failure shouldn’t bring failure. • Replication Factor + Consistency Level = Success

• This example: • RF = 3 • CL = QUORUM

Node 1 1st copy

Node 4

Node 5 Node 2 2nd copy

Node 3 3rd copy

ParallelWrite

WriteCL=QUORUM

5 μs ack

12 μs ack

12 μs ack

>51% ack – so request is a success

Page 15: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Cassandra - Node Recovery

15

• When a write is performed and a replica node for the row is unavailable the coordinator will store a hint locally (3 hours)

• When the node recovers, the coordinator replays the missed writes. • Note: a hinted write does not count the consistency level • Note: you should still run repairs across your cluster

Node 1 1st copy

Node 4

Node 5 Node 2 2nd copy

Node 3 3rd copy

Stores Hints while Node 3 is offline

Page 16: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Cassandra Rack/Zone Failure

16

• Cassandra will place the data in as many different racks or availability zones as it can.

• This example: • RF = 3 • CL = QUORUM • AZ/Rack 2 fails

• Data copies still available in Node 1 and Node 5

• Quorum can be honored i.e. > 51% ack

Node 1 1st copy

Node 4

Node 2

Node 3 2nd copy

Rack 1

Rack 2Rack 2

Rack 3

Rack 1

Node 5 3rd copy

request is a success

Page 17: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Cassandra is fast!

17

• University of Toronto study:

http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf

Page 18: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Why is Cassandra so fast?

18

• write-optimised - sequential writes to disk

• fast merging - when SSTable big enough merged with existing

• single layout on disk

Page 19: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Cassandra Query Language

19

Page 20: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

CQL

20

• Cassandra Query Language

• CQL is intended to provide a common, simpler and easier to use interface into Cassandra - and you probably already know it!

• e.g. SELECT * FROM users

• Usual statements: • CREATE / DROP / ALTER TABLE / SELECT

Page 21: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

CQLSH

21

• Command line interface comes with Cassandra• Allows some other Statements

Command DescriptionCAPTURE Captures command output and appends it to a fileCONSISTENCY Shows the current consistency level, or given a level, sets itCOPY Imports and exports CSV (comma-separated values) dataDESCRIBE Provides information about a Cassandra cluster or data objectsEXIT Terminates cqlshSHOW Shows the Cassandra version, host, or data type assumptionsSOURCE Executes a file containing CQL statementsTRACING Enables or disables request tracing

Page 22: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

CQL Basics

22

CREATE KEYSPACE league WITH REPLICATION = {‘class’:’NetworkTopologyStrategy’, ‘DataCentre1’:3, ‘DataCentre2’: 2};

USE league;

CREATE TABLE teams ( team_name varchar, player_name varchar, jersey int, PRIMARY KEY (team_name, player_name));SELECT * FROM teams WHERE team_name = ‘Mighty Mutts’ and player_name = ‘Lucky’;

INSERT INTO teams (team_name, player_name, jersey) VALUES ('Mighty Mutts',’Felix’,90);

Page 23: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

CQL Data Types

23

Page 24: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

What ´s up with DataStax?

24

Page 25: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

DataStax at a glance

25

Founded in April 2010

~27 600+

Santa Clara, Austin, New York, London, Sydney, Paris

370+Employees Percent Customers

Page 26: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

26

Certified,Enterprise-ready

Cassandra

Security Analytics Search Visual Monitoring

Management Services In-Memory

Dev. IDE & Drivers

Professional Services

Support & Training

Commercial Confidence

Enterprise Functionality

DataStax adds value

Page 27: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

DataStax Analytics

27

• Designed for running analytics on Cassandra data

• There are 4 ways to do Analytics on Cassandra data:1. Integrated Search (Solr)2. Integrated Batch Analytics (MapReduce, Hive, Pig, Mahout) on

Cassandra3. External Batch Analytics (Hadoop; certified with Cloudera,

HortonWorks)4. Integrated Near Real-Time Analytics (Spark)

Page 28: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

OpsCenter 5.0

28

• Manage multiple clusters and nodes • Add and remove nodes • Administer individual nodes or in bulk • Configure clusters • Perform rolling restarts • Automatically repair data • Rebalance data • Backup management • Capacity planning

Page 29: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

OpsCenter - New Cluster Example

29

A new, 10-node Cassandra (or Hadoop) cluster with OpsCenter running in 3 minutes… A new, 10-node DSE cluster with OpsCenter running on AWS in 3 minutes…

Done1 2 3

Page 30: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

DevCenter 1.1

30

• Visual Query Tool for Developers and Administrators • Easily create and run Cassandra Queries • Visually navigate database objects • Context-based suggestions

Page 31: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

DataStax Office Demo

31

• 32 Raspberry Pi´s • 16 per DataStax Enterprise 4.5 Cluster • Managed in OpsCenter 5.0 • “Red Button” downs one DataCenter • Not the Performance-Demo but

• Availability • Commodity Hardware

Page 32: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Native Drivers

32

• Different Native Drivers available: Java, Python etc. • Load Balancing Policies (Client Driver receives Updates) • Data Centre Aware • Latency Aware • Token Aware

• Reconnection policies • Retry policies • Downgrading Consistency • Plus others..

• http://www.datastax.com/download/clientdrivers

Page 33: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

DataStax Enterprise

33

Feature Open Source Datastax EnterpriseDatabase SoftwareData Platform Latest Community Cassandra Production Certified CassandraCore security features Yes YesEnterprise security features No YesBuilt-in automatic management services No YesIntegrated analytics No YesIntegrated enterprise search No YesWorkload/Workflow Isolation No YesEasy migration of RDBMS and log data No YesCertified Service Packs No YesCertified platform support No YesManagement SoftwareOpsCenter Basic functionality Advanced functionalityServicesCommunity Support Yes YesDatastax 24x7x365 Support No YesQuarterly Performance Reviews No YesHot Fixes No YesBug Escalation Privilege No YesCustom Builds No OptionEOL Support No YesLicensing Free Subscription

Page 34: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

DataStax Comparison

34

Standard Pro MaxServer Data Management ComponentsProduction-certified Cassandra Yes Yes YesAdvanced security option Yes Yes YesRepair service Yes Yes YesCapacity planning service Yes Yes YesEnterprise search (built-in Solr) No Yes YesAnalytics (built-in Hadoop) No No YesManagement ToolsOpsCenter Enterprise Yes Yes YesSupport ServicesExpert Support 24x7x1 24x7x1 24x7x1Partner Development Support Business

hoursBusiness hours Business

hoursCertified service packs Yes Yes YesHot fixes Yes Yes YesBug escalation Yes Yes YesQuarterly performance reviews No No YesBi-weekly call with support team No No YesCustom builds No No Option

Page 35: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

© 2014 DataStax Confidential. Do not distribute without consent.

Netflix Delights Customers with Personal RecommendationsWorld’s leading streaming media provider with digital revenue $1.5BN+Tailors content delivery based on viewing preference data captured in CassandraIncreased market cap by 600% since 2012Introduction of ‘Profiles’ drove throughput to over 10M transactions per secondReplaced Oracle in six data centers, worldwide, 100% in the cloud

Use Case: Personalization35

Page 36: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

36

Page 37: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Realtime Analytics

37

Page 38: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

What is Spark?

38

• Apache Project since 2010 - Analytics Framework • 10-100x faster than Hadoop MapReduce • In-Memory Storage for Read&Write data • Single JVM Processor per node • Rich Scala, Java and Python API´s • 2x-5x less code • Interactive Shell

Page 39: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Spark Architecture

39

Page 40: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Why Spark on Cassandra?

40

• Data model independent queries • cross-table operations (JOIN, UNION, etc.)! • complex analytics (e.g. machine learning) • data transformation, aggregation etc. • stream processing (coming soon) • all nodes are Spark workers • by default resilient to worker failures • first node promoted as Spark Master • Standby Master promoted on failure • Master HA available in Datastax Enterprise

Page 41: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

How to Spark on Cassandra?

41

• DataStax Cassandra Spark driver • OpenSource: https://github.com/datastax/cassandra-driver-spark

• Compatible with • Spark 0.9+ • Cassandra 2.0+ • DataStax Enterprise 4.5+

Page 42: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

What´s new?!

42

Page 43: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

2.1 Release - User Defined Types

43

CREATE TYPE address ( street text, city text, zip_code int, phones set<text> )

CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses map<text, address> )

SELECT id, name, addresses.city, addresses.phones FROM users;

id | name | addresses.city | addresses.phones

--------------------+----------------+-------------------------- 63bf691f | chris | Berlin | {’0201234567', ’0796622222'}

Page 44: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

2.1 Release - Secondary Indexes on collections

44

CREATE TABLE songs (

id uuid PRIMARY KEY,

artist text,

album text,

title text,

data blob,

tags set<text>

);

CREATE INDEX song_tags_idx ON songs(tags);

SELECT * FROM songs WHERE tags CONTAINS 'blues';

id | album | artist | tags | title

----------+---------------+-------------------+-----------------------+------------------

5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind

Page 45: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

How to start in production?

45

• DataStax Enterprise or Community• Hardware:

• min. 8GB RAM - optimal price-performance sweet spot is 16GB to 64GB• 8-Core CPU - Cassandra is so efficient in writing that the CPU is the

limiting factor• SSD-Disks - Commitlog + 50% Compaction and ext3/4 or xfs file-system

• Nodes - Cluster recommendation is 3 nodes as minimum

• Alternative: Use the Amazon Images (http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningEC2_c.html)

Page 46: Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf · Cassandra - Node Recovery 15 • When a write is performed and a replica node for

Thanks! Let´s see a demo!

46