Running Cassandra on Amazon’s ECS
Anirvan Chakraborty
@anirvan_c
Agenda
• Motivation
• Docker
• Cassandra
• ECS
• Cassandra on Docker best practices
• Cassandra on ECS
Motivation
• Ease of development
• Support for polyglot languages, frameworks and components
• Operational simplicity
• Quick feedback loop
Docker
Docker history
• Came out of dotCloud, a PaaS company
• Was originally written in Python
• Was rewritten in Go (Golang) in February 2013
• Docker 0.1 was released in March 2013
• Docker 1.10 is the latest release (at the time of this talk)
Docker tag line
Build, ship and run any app, anywhere
• Build: package your application in a container
• Ship: move it between machines
• Run: execute that container with your application
• Any application: as long as it runs on Linux
• Anywhere: local VM, bare metal, cloud instances
Why Docker?
• Deploy reliably & consistently
• Execution is fast and lightweight
• Simplicity
• Developer-friendly workflow
• Fantastic community
Apache Cassandra
What is Apache Cassandra?
• Fast distributed database
• High availability
• Linear scalability
• Predictable performance
• No single point of failure
• Multi-DC (multiple data centre) support
• Easy to manage
• Can use commodity hardware
• Not a drop-in replacement for an RDBMS
Hash ring
• Data is partitioned around the ring
• The location of data in the ring is determined by its partition key
• Data is replicated to N servers based on the replication factor (RF)
• All nodes hold data and can answer read or write queries
source: https://academy.datastax.com/courses/ds101-introduction-cassandra/introduction-cassandra-overview
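The ring placement described above can be sketched in a few lines. This is a minimal sketch, not Cassandra's implementation: it assumes one token per node (no vnodes), uses MD5 from the standard library in place of the Murmur3 hash used by Cassandra's default partitioner, and places replicas the way SimpleStrategy does, on the next RF-1 nodes clockwise. The `TokenRing` class and the node names are hypothetical.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: one token per node, SimpleStrategy-style replica placement."""

    def __init__(self, nodes: list[str], rf: int) -> None:
        if not 1 <= rf <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.rf = rf
        # Assumption for this sketch: a node's token is the hash of its name.
        # Real clusters assign tokens via initial_token or vnodes.
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def _token(value: str) -> int:
        # MD5 stands in for Cassandra's Murmur3 partitioner here.
        return int.from_bytes(hashlib.md5(value.encode()).digest(), "big")

    def replicas(self, partition_key: str) -> list[str]:
        """Return the RF nodes responsible for a partition key."""
        token = self._token(partition_key)
        # The first node whose token is >= the key's token owns it (wrapping around).
        start = bisect.bisect_left(self.tokens, token) % len(self.ring)
        # The remaining replicas are the next RF-1 nodes walking clockwise.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], rf=3)
print(ring.replicas("user:42"))
```

Because placement depends only on the key's hash, any node can compute which replicas own a key, which is what lets every node answer reads and writes.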
CAP Tradeoff
• During a network partition, a system cannot be both consistent and highly available
• Latency between data centres also makes strong consistency across them impractical
• Cassandra favours Availability & Partition tolerance over Consistency (by default; consistency is tunable per query)
Replication
• Choose a “replication factor” (RF)
• Every write is sent to each replica
• If a node is down, the writes it missed are replayed to it via hinted handoff when it comes back
source: https://academy.datastax.com/courses/ds101-introduction-cassandra/introduction-cassandra-overview
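A toy model of the write path above: the coordinator sends each write to every replica, stores a hint for any replica that is down, and replays those hints once the replica recovers. This is a simplified sketch; the `Replica` and `Coordinator` classes are hypothetical, and it ignores hint expiry (`max_hint_window_in_ms` in cassandra.yaml), timestamps and conflict resolution, all of which real Cassandra handles.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict[str, str] = field(default_factory=dict)


@dataclass
class Coordinator:
    replicas: list[Replica]
    # Hints queued per downed replica: (key, value) pairs still to deliver.
    hints: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def write(self, key: str, value: str) -> int:
        """Send the write to every replica; return how many acknowledged it."""
        acks = 0
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value
                acks += 1
            else:
                # Replica is down: keep a hint so it still receives the write later.
                self.hints.setdefault(replica.name, []).append((key, value))
        return acks

    def recover(self, replica: Replica) -> None:
        """Mark a replica as up again and replay its hinted writes in order."""
        replica.up = True
        for key, value in self.hints.pop(replica.name, []):
            replica.data[key] = value


a, b, c = Replica("a"), Replica("b"), Replica("c", up=False)
coord = Coordinator([a, b, c])
acks = coord.write("user:42", "alice")  # only a and b acknowledge
coord.recover(c)                        # c receives the hinted write
```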
Consistency level
source: https://academy.datastax.com/courses/ds101-introduction-cassandra/introduction-cassandra-overview
• Consistency is chosen per query
• Levels include ALL, QUORUM and ONE
• The level sets how many replicas must respond OK for the query to succeed
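The replica counts behind these levels follow from the replication factor: ONE needs 1, ALL needs RF, and QUORUM needs a majority, floor(RF/2) + 1. A common rule of thumb is that reads are guaranteed to see the latest acknowledged write when read replicas + write replicas > RF, because the two replica sets must then overlap. The helpers below are a hypothetical illustration of that arithmetic for a single data centre, not a driver API.

```python
def required_replicas(level: str, rf: int) -> int:
    """Replicas that must acknowledge a request at the given consistency level."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    table = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return table[level.upper()]


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return required_replicas(read_level, rf) + required_replicas(write_level, rf) > rf


for rf in (1, 3, 5):
    print(rf, required_replicas("QUORUM", rf))
```

With RF = 3, QUORUM reads and writes each need 2 replicas and 2 + 2 > 3, so they overlap; ONE/ONE (1 + 1) does not, which trades consistency for latency and availability.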
Amazon EC2 Container Service
What is ECS?
…is a highly scalable, fast, container management service that makes it easy to run, stop, and manage Docker containers on a cluster of Amazon EC2 instances.
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html
Amazon Docker as a Service
https://www.expeditedssl.com/aws-in-plain-english
How does ECS work
Diagrams (source: http://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html):
• Cluster
• Container Instance
• ECS Agent
• Task
• ECS Service
Typical ECS workflow
• Build a Docker image using whatever tooling you want
• Push the image to a registry
• Create a JSON file describing your task definition
• Register this task definition with ECS
• Make sure that your cluster has enough resources
• Start a new task from the task definition
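As a sketch of the "create JSON file" step, the snippet below builds a task definition for a single Cassandra container and prints it as JSON. The field names (`family`, `networkMode`, `containerDefinitions`, `portMappings`, `mountPoints`, `volumes`) come from the ECS task definition format; the family name, image tag, memory size and host path are hypothetical values chosen for illustration.

```python
import json


def cassandra_task_definition(image: str, memory_mb: int, host_data_path: str) -> dict:
    """Build an ECS task definition for one Cassandra container (illustrative values)."""
    return {
        "family": "cassandra",  # hypothetical family name
        "networkMode": "host",  # share the instance's network stack
        "containerDefinitions": [
            {
                "name": "cassandra",
                "image": image,
                "memory": memory_mb,  # hard memory limit in MiB
                "essential": True,
                # With host networking, hostPort must equal containerPort.
                "portMappings": [
                    {"containerPort": 9042, "hostPort": 9042},  # CQL native protocol
                    {"containerPort": 7000, "hostPort": 7000},  # inter-node communication
                ],
                "mountPoints": [
                    {"sourceVolume": "cassandra-data", "containerPath": "/var/lib/cassandra"}
                ],
            }
        ],
        "volumes": [
            {"name": "cassandra-data", "host": {"sourcePath": host_data_path}}
        ],
    }


task_def = cassandra_task_definition("cassandra:3.0", 4096, "/data/cassandra")
print(json.dumps(task_def, indent=2))
```

Saved to a file such as `cassandra-task.json`, this can be registered with `aws ecs register-task-definition --cli-input-json file://cassandra-task.json`, and a task started from it with `aws ecs run-task`.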
Tips & tricks
Dockerize C* Dev Environment
• Make it run as stably as possible, even if slowly
• Super-low memory settings in cassandra-env.sh:
  • MAX_HEAP_SIZE="128M"
  • HEAP_NEWSIZE="24M"
• Disable caches in dev mode in cassandra.yaml:
  • key_cache_size_in_mb: 0
  • reduce_cache_sizes_at: 0
  • reduce_cache_capacity_to: 0
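One way to apply the low-memory heap settings without editing cassandra-env.sh is to pass them as environment variables: the stock cassandra-env.sh computes its own defaults only when MAX_HEAP_SIZE and HEAP_NEWSIZE are unset, and expects the two to be set as a pair. The sketch below assumes an image whose startup runs that stock script (the official `cassandra` image is one example); the container name and image tag are hypothetical. The cache settings in cassandra.yaml still have to be changed in the image or in a mounted config file.

```python
import shlex


def dev_run_args(image: str = "cassandra:3.0", name: str = "cassandra-dev") -> list[str]:
    """Build `docker run` arguments for a small, slow-but-stable development node."""
    heap = {"MAX_HEAP_SIZE": "128M", "HEAP_NEWSIZE": "24M"}  # values from the slide
    args = ["docker", "run", "-d", "--name", name]
    for key, value in heap.items():
        # cassandra-env.sh expects both variables to be set together.
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args


print(shlex.join(dev_run_args()))
```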
Dockerize C* Production
• Use host networking (--net=host) for better network performance
• Put the data, commitlog and saved_caches directories on volumes mounted from the underlying host
• Run Cassandra in the foreground (-f) so the container keeps running
• Tune the JVM heap to a suitable size
• Tune the JVM garbage collector for your workload
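When MAX_HEAP_SIZE is not set, the cassandra-env.sh shipped with the 2.x/3.x series derives the heap from system memory: take RAM/2 capped at 1024 MB and RAM/4 capped at 8192 MB, and use the larger; the young generation is min(100 MB per CPU core, heap/4). Inside a container the script reads the host's memory (from /proc/meminfo), not the container's limit, which is one reason to size the heap explicitly. The function below is an approximation of that heuristic for estimating what an instance would get; check it against the cassandra-env.sh of your Cassandra version.

```python
def default_heap_sizes(system_memory_mb: int, cpu_cores: int) -> tuple[int, int]:
    """Approximate cassandra-env.sh defaults; returns (MAX_HEAP_SIZE, HEAP_NEWSIZE) in MB."""
    half = system_memory_mb // 2
    quarter = half // 2
    half = min(half, 1024)        # cap 1/2 of RAM at 1 GB
    quarter = min(quarter, 8192)  # cap 1/4 of RAM at 8 GB
    max_heap = max(half, quarter)
    # Young generation: 100 MB per core, but never more than 1/4 of the heap.
    heap_new = min(100 * cpu_cores, max_heap // 4)
    return max_heap, heap_new


for ram_mb, cores in [(2048, 2), (16384, 4), (61440, 8)]:
    print(ram_mb, cores, default_heap_sizes(ram_mb, cores))
```

For example, a 16 GB, 4-core host gets a 4096 MB heap with a 400 MB young generation, regardless of any memory limit placed on the container.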
Dockerize C* on ECS
• Simple service discovery using the ECS API
• Custom Dockerfile and entry-point script to control Cassandra configuration
• Clean up the downed node and repair the cluster on node failover
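Service discovery through the ECS API typically chains a few calls: `ListTasks` and `DescribeTasks` give each running task's container instance ARN, `DescribeContainerInstances` maps that ARN to an EC2 instance ID, and EC2 `DescribeInstances` gives the instance's private IP. The sketch below assumes those responses have already been fetched (for example with boto3's `describe_tasks`, `describe_container_instances` and `describe_instances`) and only joins them into a comma-separated seed list, such as an entry-point script might write into the seed provider's `seeds` parameter in cassandra.yaml. The `seed_list` helper and the sample ARNs, IDs and addresses are hypothetical.

```python
import ipaddress


def seed_list(tasks_resp: dict, instances_resp: dict, ec2_resp: dict, limit: int = 3) -> str:
    """Join ECS and EC2 API responses into a comma-separated list of seed IPs."""
    # Task -> container instance ARN, keeping only tasks that are running.
    instance_arns = {
        t["containerInstanceArn"]
        for t in tasks_resp["tasks"]
        if t.get("lastStatus") == "RUNNING"
    }
    # Container instance ARN -> EC2 instance ID.
    ec2_ids = {
        ci["ec2InstanceId"]
        for ci in instances_resp["containerInstances"]
        if ci["containerInstanceArn"] in instance_arns
    }
    # EC2 instance ID -> private IP, sorted numerically so the choice is deterministic.
    ips = sorted(
        (
            inst["PrivateIpAddress"]
            for reservation in ec2_resp["Reservations"]
            for inst in reservation["Instances"]
            if inst["InstanceId"] in ec2_ids
        ),
        key=ipaddress.ip_address,
    )
    # A few stable seeds are enough; every node does not need to be a seed.
    return ",".join(ips[:limit])


tasks = {"tasks": [
    {"containerInstanceArn": "arn:ci/1", "lastStatus": "RUNNING"},
    {"containerInstanceArn": "arn:ci/2", "lastStatus": "RUNNING"},
    {"containerInstanceArn": "arn:ci/3", "lastStatus": "STOPPED"},
]}
instances = {"containerInstances": [
    {"containerInstanceArn": "arn:ci/1", "ec2InstanceId": "i-aaa"},
    {"containerInstanceArn": "arn:ci/2", "ec2InstanceId": "i-bbb"},
    {"containerInstanceArn": "arn:ci/3", "ec2InstanceId": "i-ccc"},
]}
ec2 = {"Reservations": [
    {"Instances": [
        {"InstanceId": "i-aaa", "PrivateIpAddress": "10.0.1.12"},
        {"InstanceId": "i-bbb", "PrivateIpAddress": "10.0.2.7"},
    ]},
    {"Instances": [{"InstanceId": "i-ccc", "PrivateIpAddress": "10.0.3.9"}]},
]}
print(seed_list(tasks, instances, ec2))
```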
Where to next?
• Consider GlusterFS for Cassandra and Spark on ECS
• Consider Weave for networking with Docker on ECS
Resources
https://academy.datastax.com/courses/ds101-introduction-cassandra
http://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html
https://www.expeditedssl.com/aws-in-plain-english
https://github.com/aws/amazon-ecs-agent
https://blog.docker.com/2015/03/why-i-love-docker-and-why-youll-love-it-too/
http://www.datastax.com/resources/whitepapers/best-practices-running-datastax-enterprise-within-docker