running cassandra on amazon’s ecs - meetupfiles.meetup.com/7439192/cassandra-ecs.pdf · •...

Post on 21-Sep-2018


Running Cassandra on Amazon’s ECS

Anirvan Chakraborty

@anirvan_c

Agenda

• Motivation
• Docker
• Cassandra
• ECS
• Cassandra on Docker best practices
• Cassandra on ECS

Motivation

• Ease of development
• Polyglot support for languages, frameworks and components
• Operational simplicity
• Quick feedback loop

Docker

Docker history

• Came out of dotCloud, a PaaS company
• Was originally written in Python
• Was rewritten in Go in February 2013
• Docker 0.1 was released in March 2013
• Docker 1.10 is the latest release

Docker tag line

Build, ship and run any app, anywhere

• Build: package your application in a container
• Ship: move it between machines
• Run: execute that container with your application
• Any application: as long as it runs on Linux
• Anywhere: local VM, bare metal, cloud instances

Why Docker?

• Deploy reliably and consistently
• Execution is fast and lightweight
• Simplicity
• Developer-friendly workflow
• Fantastic community

Apache Cassandra

What is Apache Cassandra?

• Fast distributed database
• High availability
• Linear scalability
• Predictable performance
• No single point of failure
• Multi-DC
• Easy to manage
• Can use commodity hardware
• Not a drop-in replacement for an RDBMS

Hash ring

• Data is partitioned around the ring
• Location of data in the ring is determined by the partition key
• Data is replicated to N servers based on RF
• All nodes hold data and can answer read or write queries

source: https://academy.datastax.com/courses/ds101-introduction-cassandra/introduction-cassandra-overview
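A minimal sketch of the ring idea, not Cassandra's actual implementation. Assumptions: MD5 stands in for the partitioner hash (Cassandra's default is Murmur3Partitioner), each node owns a single token (no vnodes), and replicas are the next RF nodes clockwise from the key's token. The `Ring` class and the node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string onto the ring. MD5 is only a stand-in hash here."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class Ring:
    """Toy hash ring: one token per node, replicas chosen clockwise."""

    def __init__(self, nodes, rf):
        self.rf = rf
        # Sort nodes by their token so the list represents the ring order.
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Return the nodes that hold the given partition key."""
        size = len(self.entries)
        # First node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % size
        count = min(self.rf, size)
        return [self.entries[(start + i) % size][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], rf=3)
owners = ring.replicas("user:42")
```

Any of the nodes in `owners` can serve a read or write for that key, which is the point of the last bullet above.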

CAP Tradeoff

• During a network partition it is impossible to be both consistent and highly available

• Latency between data centres also makes consistency impractical

• Cassandra chooses Availability & Partition tolerance over Consistency

Replication

• Choose a "replication factor" or RF
• Data is always replicated to each replica
• If a node is down, missing data is replayed via hinted handoff

source: https://academy.datastax.com/courses/ds101-introduction-cassandra/introduction-cassandra-overview
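Hinted handoff can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions (one coordinator, no hint expiry, no network), and the `Replica` and `Coordinator` classes are hypothetical names, not Cassandra APIs.

```python
class Replica:
    """Toy replica: a dict of data plus an up/down flag."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    """Writes to every replica and keeps hints for the ones that are down."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # pending (replica, key, value) tuples

    def write(self, key, value):
        """Return the number of replicas that acknowledged the write."""
        acks = 0
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value
                acks += 1
            else:
                # Remember the write so it can be replayed later.
                self.hints.append((replica, key, value))
        return acks

    def replay_hints(self):
        """Deliver stored hints to replicas that have come back up."""
        still_pending = []
        for replica, key, value in self.hints:
            if replica.up:
                replica.data[key] = value
            else:
                still_pending.append((replica, key, value))
        self.hints = still_pending


a, b, c = Replica("a"), Replica("b"), Replica("c")
coordinator = Coordinator([a, b, c])
c.up = False
acks = coordinator.write("k", "v")   # c misses the write, a hint is stored
c.up = True
coordinator.replay_hints()           # c receives the missed write
```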

Consistency level

source: https://academy.datastax.com/courses/ds101-introduction-cassandra/introduction-cassandra-overview

• Per-query consistency
• ALL, QUORUM, ONE
• How many replicas must respond OK for the query to succeed
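The arithmetic behind these levels can be sketched as follows. The function names are hypothetical. The quorum size is floor(RF / 2) + 1, and a read is guaranteed to overlap the latest write when the write and read acknowledgement counts sum to more than RF.

```python
def required_acks(level: str, rf: int) -> int:
    """Number of replicas that must respond OK at a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read set must intersect every write set (W + R > RF)."""
    return required_acks(write_level, rf) + required_acks(read_level, rf) > rf
```

With RF = 3, QUORUM writes plus QUORUM reads overlap (2 + 2 > 3), while ONE plus ONE does not (1 + 1 ≤ 3), which is the availability versus consistency trade-off made per query.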

Amazon EC2 Container Service

What is ECS?

…is a highly scalable, fast, container management service that makes it easy to run, stop, and manage Docker containers on a cluster of Amazon EC2 instances.

http://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html

What is ECS?

Amazon Docker as a Service

https://www.expeditedssl.com/aws-in-plain-english

How does ECS work

http://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html

Cluster

http://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html

Container Instance

http://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html

ECS Agent

http://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html

Task

http://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html

ECS Service

http://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html

Typical ECS workflow

• Build a Docker image using whatever you want.
• Push the image to a registry.
• Create a JSON file describing your task definition.
• Register this task definition with ECS.
• Make sure that your cluster has enough resources.
• Start a new task from the task definition.
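The last four steps can be sketched with boto3. This is an illustration, not the setup used in the talk: the family name, volume name, host path, memory size and image are all assumptions, and `register_and_run` needs boto3 installed and AWS credentials configured.

```python
import json


def cassandra_task_definition(image: str, memory_mib: int = 4096) -> dict:
    """Build a task definition dict (all names and sizes are assumptions)."""
    return {
        "family": "cassandra",
        "networkMode": "host",
        "containerDefinitions": [
            {
                "name": "cassandra",
                "image": image,
                "memory": memory_mib,
                "essential": True,
                "mountPoints": [
                    {
                        "sourceVolume": "cassandra-data",
                        "containerPath": "/var/lib/cassandra",
                    }
                ],
            }
        ],
        "volumes": [
            {"name": "cassandra-data", "host": {"sourcePath": "/data/cassandra"}}
        ],
    }


def register_and_run(task_def: dict, cluster: str) -> dict:
    """Register the task definition with ECS and start one task from it."""
    import boto3  # imported here so the builder above works without boto3

    ecs = boto3.client("ecs")
    registered = ecs.register_task_definition(**task_def)
    arn = registered["taskDefinition"]["taskDefinitionArn"]
    return ecs.run_task(cluster=cluster, taskDefinition=arn, count=1)


task_def = cassandra_task_definition("example/cassandra:latest")
task_def_json = json.dumps(task_def, indent=2)  # the JSON file from step 3
```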

Tips & tricks

Dockerize C* Dev Environment

• Make it run as slow, but as stable, as possible!
• Super low memory settings in cassandra-env.sh
  • MAX_HEAP_SIZE="128M"
  • HEAP_NEWSIZE="24M"
• Remove caches in dev mode in cassandra.yaml
  • key_cache_size_in_mb: 0
  • reduce_cache_sizes_at: 0
  • reduce_cache_capacity_to: 0
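One way to apply the cache settings while building a dev image is to patch the top-level keys in the YAML text. A minimal sketch: the helper name `apply_overrides` is hypothetical, it only handles flat `key: value` lines, and whether a given Cassandra version still accepts each of these keys is not checked here.

```python
import re

# The dev-mode values listed on the slide.
DEV_OVERRIDES = {
    "key_cache_size_in_mb": "0",
    "reduce_cache_sizes_at": "0",
    "reduce_cache_capacity_to": "0",
}

# Matches a top-level key, optionally commented out ("# key: value").
KEY_LINE = re.compile(r"^(?:#\s*)?([A-Za-z_0-9]+):")


def apply_overrides(yaml_text: str, overrides: dict) -> str:
    """Replace (or append) top-level keys in cassandra.yaml text."""
    remaining = dict(overrides)
    lines = []
    for line in yaml_text.splitlines():
        match = KEY_LINE.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            lines.append(f"{key}: {remaining.pop(key)}")
        else:
            lines.append(line)
    # Keys that were not present in the file are appended at the end.
    lines.extend(f"{key}: {value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


sample = "cluster_name: 'Test'\nkey_cache_size_in_mb:\n# reduce_cache_sizes_at: 0.85\n"
patched = apply_overrides(sample, DEV_OVERRIDES)
```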

Dockerize C* Production

• Use host networking (--net=host) for better network performance
• Put data, commitlog and saved_caches on volumes mounted from the underlying host
• Run Cassandra in the foreground (-f)
• Tune the JVM heap for optimal size
• Tune the JVM garbage collector for your workload
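The first four points map onto a `docker run` invocation, sketched here as a Python helper that builds the argument list. Assumptions: the image has `cassandra` on its PATH and keeps its directories under /var/lib/cassandra, its cassandra-env.sh honours the MAX_HEAP_SIZE and HEAP_NEWSIZE environment variables, and the host path and heap sizes are placeholders to tune per workload.

```python
def docker_run_command(
    image: str,
    host_root: str = "/data/cassandra",
    max_heap: str = "8G",
    heap_new: str = "800M",
) -> list:
    """Build a docker run argv following the production tips (a sketch)."""
    cmd = ["docker", "run", "-d", "--name", "cassandra", "--net=host"]
    # One bind mount per Cassandra directory, backed by the host filesystem.
    for sub in ("data", "commitlog", "saved_caches"):
        cmd += ["-v", f"{host_root}/{sub}:/var/lib/cassandra/{sub}"]
    # Heap settings passed as environment variables (assumed to be honoured).
    cmd += ["-e", f"MAX_HEAP_SIZE={max_heap}", "-e", f"HEAP_NEWSIZE={heap_new}"]
    # -f keeps Cassandra in the foreground so the container stays alive.
    cmd += [image, "cassandra", "-f"]
    return cmd


command = docker_run_command("example/cassandra:latest")
```

The list can be passed to `subprocess.run(command, check=True)` on a host with Docker installed.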

Dockerize C* on ECS

• Simple service discovery using the ECS API
• Custom Dockerfile and entry-point script to control Cassandra configuration
• Clean up the downed node and repair the cluster on node failover
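Service discovery through the ECS API could look roughly like this in an entry-point helper. This is a sketch and not the script from the talk: it assumes boto3 and AWS credentials are available inside the container, ignores pagination, and picks seeds with a hypothetical `choose_seeds` rule (the lowest few private IPs, so every node computes the same list).

```python
def choose_seeds(private_ips, max_seeds: int = 3) -> str:
    """Pick a deterministic seed list so all nodes agree on it."""
    return ",".join(sorted(private_ips)[:max_seeds])


def discover_cluster_ips(cluster: str) -> list:
    """Private IPs of the EC2 instances registered in an ECS cluster."""
    import boto3  # imported here so choose_seeds works without boto3

    ecs = boto3.client("ecs")
    ec2 = boto3.client("ec2")
    arns = ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]
    if not arns:
        return []
    described = ecs.describe_container_instances(
        cluster=cluster, containerInstances=arns
    )
    instance_ids = [ci["ec2InstanceId"] for ci in described["containerInstances"]]
    reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
    return [
        instance["PrivateIpAddress"]
        for reservation in reservations
        for instance in reservation["Instances"]
        if "PrivateIpAddress" in instance
    ]


seeds = choose_seeds(["10.0.0.7", "10.0.0.2", "10.0.0.5", "10.0.0.9"])
```

The resulting string is in the comma-separated form that the seed provider's `seeds` parameter in cassandra.yaml expects, so the entry-point script can write it into the configuration before starting Cassandra.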

Where to next?

• Consider GlusterFS for Cassandra and Spark on ECS

• Consider Weave for networking with Docker on ECS

Thanks!

Twitter: @cakesolutions

Tel: 0845 617 1200

Email: enquiries@cakesolutions.net

Resources

https://academy.datastax.com/courses/ds101-introduction-cassandra

http://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html

http://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html

https://www.expeditedssl.com/aws-in-plain-english

https://github.com/aws/amazon-ecs-agent

https://blog.docker.com/2015/03/why-i-love-docker-and-why-youll-love-it-too/

http://www.datastax.com/resources/whitepapers/best-practices-running-datastax-enterprise-within-docker
