Apache Cassandra at Wayin

Cassandra in the Cloud

Jamey Wood, August 28, 2013

Upload: planet-cassandra

Post on 14-Dec-2014

DESCRIPTION

Jamey Wood presents on Apache Cassandra at Wayin for the Colorado Cassandra Users Group on August 28th, 2013. http://www.meetup.com/Colorado-Cassandra-Meetup/

TRANSCRIPT

Page 1: Apache Cassandra at Wayin

Cassandra in the Cloud

August 28, 2013

Jamey Wood

Page 2: Apache Cassandra at Wayin

04/10/2023 2

Wayin: History

Founded in 2011

Located in beautiful Denver, Colorado

Global clients in largest corporations, sports teams, agencies, and publishers

$20M raised

Co-founded by Scott McNealy

Twitter Certified May 2013

Page 3: Apache Cassandra at Wayin

Wayin: Mission

Transforming Social Media into Brand Experiences

Page 4: Apache Cassandra at Wayin

Why it Works

Marketing is becoming more reactive, and the ability to own, brand, curate and customize relevant experiences in the moment is more valuable now than it has ever been.

Page 5: Apache Cassandra at Wayin

How it Works

[Architecture diagram. Components shown:]

• Clients

• AWS services: Route 53, ELB Load Balancer, CloudFront, S3, SQS

• API Server Scaling Group (API servers): auto-scaled based on machine load

• Tracking Server Scaling Group (tracking servers): auto-scaled based on queue length

• DB Server Scaling Groups (Cassandra): scaled based on data volume

Page 6: Apache Cassandra at Wayin

Challenge 1: Provisioning and Deployment

CloudFormation, Auto Scaling Groups, and the Cassandra Ring

[Diagram: CloudFormation creates one DB Auto Scaling Group per availability zone (us-east-1a, us-east-1b, us-east-1c). Over time, instances from the three zones (1a, 1b, 1c) join the Cassandra ring that serves the clients.]

Page 7: Apache Cassandra at Wayin

Challenge 1: Provisioning and Deployment

Pitfalls and Opportunities

• Auto Scaling Groups are helpful for automatically replacing terminated instances, but certain actions can be problematic.

• Be familiar with as-suspend-processes options.

• Token management is important to keep Cassandra ring balanced, properly distributed across availability zones, etc. Also important to be able to bring up rings (and launch replacement servers) in a fully automated fashion.

• Netflix’s “Priam” open source tool can provide this kind of token management (and more).
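The token-management bullet above can be illustrated with a minimal sketch. It assumes RandomPartitioner (token range 0 to 2^127 - 1), one token per node, and the same number of nodes in every availability zone. It is not Priam's algorithm; the class and method names are hypothetical. Tokens are spaced evenly around the ring, and consecutive ring slots rotate through the zones so that neighbouring replicas tend to land in different zones.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Illustrative token planner; an assumption-laden sketch, not Priam's logic. */
final class TokenPlanSketch {
    // RandomPartitioner tokens live in [0, 2^127).
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /** One ring slot: the zone that owns it and its initial token. */
    static final class Assignment {
        final String zone;
        final BigInteger token;

        Assignment(String zone, BigInteger token) {
            this.zone = zone;
            this.token = token;
        }
    }

    /** Evenly spaced tokens, with slots assigned to zones in rotation. */
    static List<Assignment> plan(List<String> zones, int nodesPerZone) {
        int total = zones.size() * nodesPerZone;
        List<Assignment> ring = new ArrayList<>();
        for (int slot = 0; slot < total; slot++) {
            // token = slot * 2^127 / total
            BigInteger token = RING_SIZE
                    .multiply(BigInteger.valueOf(slot))
                    .divide(BigInteger.valueOf(total));
            String zone = zones.get(slot % zones.size());
            ring.add(new Assignment(zone, token));
        }
        return ring;
    }
}
```

Because each slot's token depends only on its position, a replacement server launched for a given zone and slot can recompute the same token without any manual step, which is what makes fully automated replacement possible.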

Page 8: Apache Cassandra at Wayin

Challenge 2: Migration

[Diagram: MongoDB → Jackson → Cassandra]

{ "_id": "abc", "author": "John Doe", "body": "some text", … }

id: "abc" author: "John Doe" data: "{ … }"

id: "def" author: "Jane Doe" data: "{ … }"

id: "ghi" author: "Jim Doe" data: "{ … }"

id: "jkl" author: "Jill Doe" data: "{ … }"

Page 9: Apache Cassandra at Wayin

Challenge 3: Volatile Performance

Managing EC2 I/O

[Graph: IO Performance for 45 EC2 Instances over Time]

Source for EC2 IO Performance Graph: http://blog.scalyr.com/2012/10/16/a-systematic-look-at-ec2-io/

Mitigation: md(4) RAID0 across Ephemeral Disks

Page 10: Apache Cassandra at Wayin

Challenge 3: Volatile Performance

Client Resiliency

new ConnectionPoolConfigurationImpl("MyConnectionPool")
    // Will resort hosts per token partition every 10 seconds
    .setLatencyAwareUpdateInterval(10000)
    // Will clear the latency every 10 seconds
    .setLatencyAwareResetInterval(10000)
    // Will sort hosts if a host is more than 100% slower than the best and always
    // assign connections to the fastest host, otherwise will use round robin
    .setLatencyAwareBadnessThreshold(2)
    // Uses last 100 latency samples. These samples are in a FIFO queue and
    // will just cycle themselves
    .setLatencyAwareWindowSize(100);

Astyanax Example: Configuring Latency Awareness
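To make the meaning of the window size and badness threshold concrete, here is a small stand-alone sketch of the idea. It is not Astyanax's implementation and uses only the JDK; the class and method names are hypothetical, and the update and reset intervals are reduced to explicit method calls. Each host keeps a FIFO window of recent latency samples, and hosts are re-sorted by mean latency only when the slowest host exceeds the threshold times the fastest.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative latency-aware host ordering; not the Astyanax code. */
final class LatencyAwareSketch {
    private final int windowSize;
    private final double badnessThreshold;
    // Insertion order doubles as the default (round-robin style) order.
    private final Map<String, Deque<Long>> samples = new LinkedHashMap<>();

    LatencyAwareSketch(int windowSize, double badnessThreshold) {
        this.windowSize = windowSize;
        this.badnessThreshold = badnessThreshold;
    }

    /** Adds a sample, evicting the oldest one once the window is full. */
    void record(String host, long latencyMicros) {
        Deque<Long> window = samples.computeIfAbsent(host, h -> new ArrayDeque<>());
        if (window.size() == windowSize) {
            window.removeFirst();
        }
        window.addLast(latencyMicros);
    }

    /** Mean of the current window, or 0 when the host has no samples. */
    double mean(String host) {
        Deque<Long> window = samples.get(host);
        if (window == null || window.isEmpty()) {
            return 0.0;
        }
        long sum = 0;
        for (long value : window) {
            sum += value;
        }
        return (double) sum / window.size();
    }

    /** Default order unless some host is too slow; then fastest first. */
    List<String> preferredOrder() {
        List<String> hosts = new ArrayList<>(samples.keySet());
        double best = Double.MAX_VALUE;
        double worst = 0.0;
        for (String host : hosts) {
            if (samples.get(host).isEmpty()) {
                continue; // hosts without samples do not affect the comparison
            }
            best = Math.min(best, mean(host));
            worst = Math.max(worst, mean(host));
        }
        if (worst > badnessThreshold * best) {
            // Stable sort; hosts with no samples (mean 0) are tried first.
            hosts.sort(Comparator.comparingDouble(this::mean));
        }
        return hosts;
    }

    /** Clears all windows, as a periodic reset would. */
    void reset() {
        samples.values().forEach(Deque::clear);
    }
}
```

With a threshold of 2, a host is demoted only once its mean latency is more than double the best host's, which keeps ordinary jitter from reshuffling the pool.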

Page 11: Apache Cassandra at Wayin

Challenge 4: Sorting

[Diagram: Cassandra ring with nodes in zones 1a, 1b, and 1c]

• Single wide rows make it easy to code sorting/slicing logic, but can lead to performance hotspots.

• Good rule of thumb is to keep individual rows below 10MB in size[1].

• Our current solution involves using “bucketed” wide rows (spreading the data for a given sorting range across multiple keys/servers, and then collating that data during reads).

• More info:

1. http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/

2. http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
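The "bucketed" wide-row approach described in the bullets above can be sketched roughly as follows. This is an in-memory stand-in and not Wayin's code: the row-key format, the hash-based bucket choice, and all names are assumptions made for illustration. Writes for one logical sorted list are spread over several row keys, and a read takes the first few columns from every bucket and collates them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative bucketed wide rows, with a map standing in for Cassandra. */
final class BucketedRowsSketch {
    private final int bucketCount;
    // row key -> columns kept sorted by column name (here, a numeric sort value)
    private final Map<String, TreeMap<Long, String>> rows = new HashMap<>();

    BucketedRowsSketch(int bucketCount) {
        this.bucketCount = bucketCount;
    }

    /** Hypothetical key format: one physical row per (feed, bucket). */
    static String rowKey(String feed, int bucket) {
        return feed + ":" + bucket;
    }

    /** Picks a bucket from the item's hash so that writes spread across keys. */
    void write(String feed, long sortValue, String item) {
        int bucket = Math.floorMod(item.hashCode(), bucketCount);
        rows.computeIfAbsent(rowKey(feed, bucket), k -> new TreeMap<>())
            .put(sortValue, item);
    }

    /** Reads up to limit columns from each bucket, then collates them. */
    List<String> readFirst(String feed, int limit) {
        List<Map.Entry<Long, String>> candidates = new ArrayList<>();
        for (int bucket = 0; bucket < bucketCount; bucket++) {
            TreeMap<Long, String> row = rows.get(rowKey(feed, bucket));
            if (row == null) {
                continue;
            }
            int taken = 0;
            for (Map.Entry<Long, String> column : row.entrySet()) {
                if (taken++ == limit) {
                    break;
                }
                candidates.add(column);
            }
        }
        candidates.sort(Map.Entry.comparingByKey());
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> column : candidates) {
            if (result.size() == limit) {
                break;
            }
            result.add(column.getValue());
        }
        return result;
    }

    /** Number of physical rows currently in use. */
    int rowCount() {
        return rows.size();
    }
}
```

The trade-off is that a read touches every bucket, so the bucket count bounds both how far a hot list is spread and how many slices each read has to merge.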

Page 12: Apache Cassandra at Wayin

Challenge 5: Monitoring

Nagios Reports

[Graph: Nagios Report: RecentReadLatency]

Page 13: Apache Cassandra at Wayin

Challenge 5: Monitoring

Nagios Setup

Monitor Cassandra using JMX Nagios Plugin / NRPE (Nagios Remote Plugin Executor)

http://wiki.apache.org/cassandra/JmxInterface

ColumnFamilies/RecentReadLatencyMicros for some_table table:

    check_jmx -U service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi \
        -O org.apache.cassandra.db:columnfamily=some_table,keyspace=some_keyspace,type=ColumnFamilies
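For readers who want to see what such a check does under the hood, the following is a minimal sketch using only the JDK's javax.management classes. It reads the same RecentReadLatencyMicros attribute over JMX and maps it to the conventional Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL). The class name, method names, and thresholds are hypothetical, and this is not the check_jmx plugin's source.

```java
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Illustrative JMX read-latency check; thresholds are caller-supplied. */
final class ReadLatencyCheckSketch {

    /** Builds the MBean name used in the check_jmx command above. */
    static ObjectName columnFamilyBean(String keyspace, String table) {
        try {
            return new ObjectName("org.apache.cassandra.db:type=ColumnFamilies"
                    + ",keyspace=" + keyspace
                    + ",columnfamily=" + table);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Maps a value to a Nagios status: 0 OK, 1 WARNING, 2 CRITICAL. */
    static int nagiosStatus(double value, double warn, double crit) {
        if (value >= crit) {
            return 2;
        }
        if (value >= warn) {
            return 1;
        }
        return 0;
    }

    /** Connects to a node's JMX port and reads RecentReadLatencyMicros. */
    static double readLatencyMicros(String host, int port,
                                    String keyspace, String table) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            Object value = mbeans.getAttribute(
                    columnFamilyBean(keyspace, table), "RecentReadLatencyMicros");
            return ((Number) value).doubleValue();
        }
    }
}
```

A caller would pass the result of readLatencyMicros("127.0.0.1", 7199, "some_keyspace", "some_table") to nagiosStatus with warning and critical thresholds chosen for the table in question, then exit with the returned code.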

Page 14: Apache Cassandra at Wayin

Challenge 6: We’re Hiring!

Looking for great developers to work with Cassandra (amongst other things)

http://www.wayin.com/about-us/careers

Senior Software Engineer

Work with great people and great technologies:

• Cassandra

• JVM

• Jetty

• Jersey

• Jackson

• AWS

Vice President of Sales

Work with great brands and agencies:

• Denver Broncos

• Atlanta Falcons

• St. Louis Rams

• San Jose Sharks

• Chevrolet

• Bank of America

• Turtlewax