Post on 21-Jan-2018

Category:

Technology

Counting Image Views using Redis Cluster

Seandon Mooy, DevOps Engineer

@erulabs

Or… how I stopped map-reducing and learned to love the stream

3 Billion!

Delay!

Failures!

Also… I may not be the best zookeeper

Challenges with HBase

Roughly 5% of all requests through Thrift were failing… So many tunables!

Optimized timeouts, added circuit breakers, etc.

Trickle of working requests during outage means circuit breakers are hard to design…

“HBase down == Imgur down”

Downtime == sadtime :(
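
The circuit-breaker difficulty mentioned above can be illustrated with a small sketch. It is not the breaker the team used. The class names, thresholds, and the simulated traffic pattern are all assumptions made for illustration. It compares a breaker that counts consecutive failures (which any occasional success resets) with one that tracks the failure rate over a sliding window.

```python
from collections import deque


class ConsecutiveBreaker:
    """Opens after `threshold` failures in a row; any success resets the count."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold


class RateBreaker:
    """Opens when the failure rate over the last `window` calls reaches `max_rate`."""

    def __init__(self, window: int = 20, max_rate: float = 0.5):
        self.results = deque(maxlen=window)
        self.max_rate = max_rate

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    @property
    def open(self) -> bool:
        if len(self.results) < self.results.maxlen:
            return False  # not enough samples yet
        failures = self.results.count(False)
        return failures / len(self.results) >= self.max_rate


def simulate_outage(breaker, calls: int = 40, success_every: int = 4) -> bool:
    """Feed a mostly-failing stream in which every Nth call still succeeds."""
    for i in range(1, calls + 1):
        breaker.record(i % success_every == 0)
    return breaker.open
```

With these made-up numbers, the trickle of successes keeps the consecutive-failure breaker closed for the whole simulated outage, while the rate-based breaker opens.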

Solution?

Redis Cluster!

Fastly

ViewCount V2 - Real time with less complexity!

TCP syslog stream

Ingest service

Parses syslog lines, reports metrics via statsd
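
As a rough sketch of what "parses syslog lines, reports metrics via statsd" could look like: the slides do not show the actual log format, so the line layout, the regular expression, and the metric name below are hypothetical. The only part taken from a real protocol is the plain-text statsd counter format, `name:value|c`.

```python
import re
from typing import Optional

# Hypothetical line layout: "<priority>timestamp host tag: METHOD /path STATUS"
LINE_RE = re.compile(
    r"^<\d+>\S+ \S+ \S+: (?P<method>[A-Z]+) "
    r"/(?P<image_id>[A-Za-z0-9]+)(?:\.\w+)? (?P<status>\d{3})$"
)


def parse_view(line: str) -> Optional[str]:
    """Return the image ID for a successful GET, or None for anything else."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    if match["method"] != "GET" or match["status"] != "200":
        return None
    return match["image_id"]


def statsd_counter(name: str, value: int = 1) -> bytes:
    """Format a counter in the plain-text statsd wire format: name:value|c"""
    return f"{name}:{value}|c".encode("ascii")
```

An ingest worker built this way would call `parse_view` on each line read from the TCP stream and send the bytes from `statsd_counter` to the statsd daemon over UDP.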

Redis 3.2 cluster!

HBase Backfill service

Internet

API service

ViewCount V2 - Results:

Request latency:
  min: 1ms
  max: 16.9ms
  median: 1.6ms
  p95: 2.6ms
  p99: 4.6ms
Codes:
  200: 10000

20 billion commands!

Over 400GB in memory!

Things to be aware of:

1. Redis Cluster shard maps - redirections, etc. Monitor redirections - gracefully restart workers after shard moves.

2. AOF can slow down / fail large “redis-trib.rb” operations. Make sure to disable before / re-enable after!

3. Not all legacy systems support Redis Cluster, and if they do… they might not support it well (PHP-FPM)!

4. Over memory capacity behavior? Previously we would hard-crash - now we’d LRU old 1-view images.

Neither is good, but for us, one is much less painful.
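
For the first item in the list above, it may help to see what a shard map boils down to. The sketch below follows the Redis Cluster specification: a key's slot is CRC16 (XMODEM variant) of the key modulo 16384, with `{hash tag}` handling, and a redirection arrives as an error of the form `MOVED <slot> <host>:<port>`. The function names are illustrative. Cluster-aware client libraries normally do this internally, so this is only a sketch of the mechanism.

```python
from typing import Tuple


def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0, as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster slots, honouring {hash tags}."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


def parse_moved(error: str) -> Tuple[int, str, int]:
    """Parse a 'MOVED <slot> <host>:<port>' redirection into (slot, host, port)."""
    kind, slot, address = error.split()
    if kind != "MOVED":
        raise ValueError(f"not a MOVED redirection: {error!r}")
    host, _, port = address.rpartition(":")
    return int(slot), host, int(port)
```

A worker that caches the slot-to-node map could count `MOVED` replies as its "redirections" metric and refresh the map (or restart gracefully) when the count rises after a shard move.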

ViewCount V3?

Approaching the point of minimal gains for man-hours, but what else might be fun?

1. Moving PHP7 off the NodeJS API and directly to Redis Cluster. Downsides: dealing with shard maps is complex in a stateless / process-per-request environment!

2. Using Redis 3's BITFIELD or HSET to save on key storage costs. Downsides: complicates the system; the current design reduces “hit-by-a-bus” issues because keys are just hashes and values are just counts!

3. Dealing with the nature of TCP streams (TCP is not HTTP!). One connection to rule them all! Node’s Cluster module helps, but perhaps Rust or Golang? Downsides: vertical scaling is non-obvious on EC2.
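
The hash-based idea in the second item above could look roughly like the following sketch. The slides do not describe a concrete scheme, so the `views:` prefix, the two-character split, and the in-memory `FakeHashStore` (standing in for the real HINCRBY and HGET commands) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

FIELD_CHARS = 2  # hypothetical: trailing characters of the ID become the hash field


def bucket_for(image_id: str) -> Tuple[str, str]:
    """Split an image ID into (hash key, field) so many counters share one key."""
    if len(image_id) <= FIELD_CHARS:
        return "views:", image_id
    return "views:" + image_id[:-FIELD_CHARS], image_id[-FIELD_CHARS:]


class FakeHashStore:
    """In-memory stand-in for HINCRBY / HGET, for illustration only."""

    def __init__(self) -> None:
        self.hashes: Dict[str, Dict[str, int]] = defaultdict(dict)

    def hincrby(self, key: str, field: str, amount: int = 1) -> int:
        new_value = self.hashes[key].get(field, 0) + amount
        self.hashes[key][field] = new_value
        return new_value

    def hget(self, key: str, field: str) -> Optional[int]:
        return self.hashes.get(key, {}).get(field)


def record_view_bucketed(store: FakeHashStore, image_id: str) -> int:
    """Increment the counter for one image inside its shared hash."""
    key, field = bucket_for(image_id)
    return store.hincrby(key, field)
```

Whether this saves memory in practice depends on Redis keeping each hash in its compact small-hash encoding, which is governed by settings such as `hash-max-ziplist-entries` in Redis 3.2. The extra indirection is the added complexity the slide warns about.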

ViewCount V2 - Results:

Redis is:

Faster - Imgur response time decreased ~50ms

Cheaper - EC2 cost reduced by 75%

Simpler - No Java, no MR, no ZK, no third parties, just INCR + GET!

More fun! - I got to talk at RedisConf17!
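
The "just INCR + GET" point can be sketched in a few lines. This assumes a client object exposing `incr(key)` and `get(key)` in the style of common Redis client libraries. The `views:` key prefix and the function names are hypothetical and not taken from the talk.

```python
def view_key(image_id: str) -> str:
    """Build the counter key for one image (the prefix is an assumed convention)."""
    return f"views:{image_id}"


def record_view(client, image_id: str) -> int:
    """Ingest side: one INCR per parsed view; returns the new count."""
    return client.incr(view_key(image_id))


def view_count(client, image_id: str) -> int:
    """API side: one GET per lookup; a missing key means zero views."""
    value = client.get(view_key(image_id))
    return 0 if value is None else int(value)
```

Because INCR creates a missing key at zero before incrementing, the ingest path needs no separate "create counter" step, which is a large part of why the design stays simple.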

Acknowledgment

Imgur DevOps Team

Imgur Platform Team