![Page 1: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/1.jpg)
Raghavendra D Prabhu
rprabhu@yelp.com
@randomsurfer
Distributed Systems
Taskerman: A Distributed Cluster Task Manager
![Page 2: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/2.jpg)
Yelp’s Mission
Connecting people with great local businesses.
![Page 3: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/3.jpg)
Datastore Ecosystem @
![Page 4: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/4.jpg)
Cassandra
Elasticsearch
Zookeeper
PostgreSQL
![Page 5: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/5.jpg)
….
● Memcached
● Redis
● Spark
● Redshift
● DynamoDB
● PaaStorm
● S3
And many more..
![Page 6: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/6.jpg)
Distributed Systems
● Several TB in Cassandra clusters with tens of nodes each
● Close to a million messages/second in streaming pipeline
● Several TB in Elasticsearch with several hundred nodes in each
● Many PB archived to S3 every month
● Multi-AZ Multi-Region
● And growing…
![Page 7: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/7.jpg)
![Page 8: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/8.jpg)
“Need to run logical backup on a fleet without disruption to ingress traffic”
“Run anti-entropy repair on Cassandra cluster without spiking read latency”
“Reboot 1000 instances without taking a millennium, but without bringing down the site either”
“Upgrade an Elasticsearch cluster from m3.medium to m3.xlarge safely without downtime”
![Page 9: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/9.jpg)
Pet vs Cattle
![Page 10: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/10.jpg)
Maintenance Cost
Engineering Efficiency
Scalability
![Page 11: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/11.jpg)
Taskerman
![Page 12: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/12.jpg)
![Page 13: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/13.jpg)
![Page 14: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/14.jpg)
![Page 15: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/15.jpg)
Requirements
● Safe
● Security
● Generic and Extensible
● Distributed
● Loosely coupled
● Cluster awareness
![Page 16: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/16.jpg)
Desirable
● Schedulable
● Reusable
● Auditability
  ○ Not Ad-hoc
  ○ More Declarative, Less Imperative
  ○ Config Management
● Maintainability
● Observability
● Resilience
![Page 17: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/17.jpg)
Safety
● Paramount*
● Serialized execution
  ○ ‘m’ out of ‘n’
  ○ Disjoint jobs.
● Avoid cascade
● Privilege escalation
● Pull-based
* Unless oncall is automated too.
![Page 18: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/18.jpg)
![Page 19: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/19.jpg)
![Page 20: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/20.jpg)
Fallacies of Distributed Systems
● Network is reliable
● Latency is zero
● Bandwidth is infinite
● Network is secure
● There is one administrator
● Transport cost is zero
● Network is homogeneous
● Topology doesn't change
![Page 21: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/21.jpg)
Quotes
There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors. @secretGeek
There are only two hard problems in distributed systems:
2. Exactly-once delivery
1. Guaranteed order of messages
2. Exactly-once delivery
@mathiasverraes
![Page 22: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/22.jpg)
Building Blocks
● Scheduler
● Router
● Co-ordinator
● Transport
● Executor
● Error handler
● Configuration
● Monitoring
● Tooling
![Page 23: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/23.jpg)
[Figure: Flow of task. Labels: Task Scheduler, Workqueue, Router, Queue, Node Queues (Q1, Q2, Q3), tasks T1, T2, T3, Cluster, Lease, Retries, Failure, Dead Letter Queue, Zookeeper, EC2 API]
![Page 24: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/24.jpg)
#Anatomy of a Taskerman Task
# Restart action for 2 nodes of geo_counter
# cassandra cluster owned by gsi
{
    'action': 'cassandra_task:restart',
    'version': 1.2,
    'limit': 2,
    'cluster_name': 'cassandra:geo_counter',
    'discovery': 'aws_tags',
    'owner': 'gsi',
    'task_id': 'abcd-ef123',
![Page 25: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/25.jpg)
#Anatomy of a Taskerman Task
    'taskerman_params': {
        'action_args': {'force': true},        # force=true for restart
        'workqueue_args': {'retry_count': 3},  # retry_count for queue
    },
    'nodes': [],       # [a,b,c,d] to skip discovery
    'destnode': '',
}
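The task is split across two slides. As a minimal sketch, here it is joined into one Python dict, with the boolean written as Python's `True`. The helpers `validate_task` and `split_action`, and the list of required fields, are hypothetical illustrations and not part of Taskerman as presented.

```python
import json

# The task from the slides, joined into one Python dict.
TASK = {
    "action": "cassandra_task:restart",
    "version": 1.2,
    "limit": 2,
    "cluster_name": "cassandra:geo_counter",
    "discovery": "aws_tags",
    "owner": "gsi",
    "task_id": "abcd-ef123",
    "taskerman_params": {
        "action_args": {"force": True},
        "workqueue_args": {"retry_count": 3},
    },
    "nodes": [],
    "destnode": "",
}

# Assumption: which fields are mandatory is not stated on the slides.
REQUIRED_FIELDS = ("action", "cluster_name", "owner", "task_id", "limit")


def validate_task(task):
    """Return a list of problems; an empty list means the task looks well-formed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in task]
    if "action" in task and ":" not in task["action"]:
        problems.append("action should look like '<task>:<action>'")
    if task.get("limit", 1) < 1:
        problems.append("limit must be at least 1")
    return problems


def split_action(task):
    """Split 'cassandra_task:restart' into ('cassandra_task', 'restart')."""
    task_type, _, action = task["action"].partition(":")
    return task_type, action


# JSON round-trip, e.g. for carrying the task as a queue message body.
message_body = json.dumps(TASK)
```

Serializing to JSON is one plausible way to carry such a task through a queue; the slides do not say which wire format is used.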
![Page 26: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/26.jpg)
![Page 27: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/27.jpg)
![Page 28: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/28.jpg)
Task Scheduler
● Runs on Chronos
● Emits a task
● Enqueues into global queue
● Ad-hoc invocation
● Deployment granularities
● Task tracking
● Yelpsoa-configs
PaaSTA
![Page 29: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/29.jpg)
![Page 30: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/30.jpg)
WorkQueue
● AWS SQS
● Best-effort FIFO
● Reliable and cheap
● Low latency
● Properties
  ○ Read without delete
  ○ Visibility timeout
  ○ Retry
  ○ Dead Letter Queue
AWS SQS
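The queue properties listed above (read without delete, visibility timeout, retry, dead letter queue) can be illustrated with a toy in-memory model. This is not the SQS API: `ToyQueue` and its methods are hypothetical names, time is passed in explicitly to keep the example deterministic, and `max_receives` only loosely mirrors the receive-count threshold of an SQS redrive policy.

```python
class ToyQueue:
    """In-memory stand-in for a queue with visibility timeout and a dead letter queue."""

    def __init__(self, visibility_timeout=30.0, max_receives=3):
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self.messages = {}      # message id -> [body, visible_at, receive_count]
        self.dead_letters = []  # bodies that exhausted their retries
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self.messages[self._next_id] = [body, 0.0, 0]
        return self._next_id

    def receive(self, now):
        """Return (id, body) of the oldest visible message, or None.

        The message is hidden, not deleted: it becomes visible again after
        the visibility timeout unless the consumer deletes it first.
        """
        for msg_id in sorted(self.messages):
            body, visible_at, count = self.messages[msg_id]
            if visible_at > now:
                continue  # still leased to another consumer
            if count >= self.max_receives:
                # Retries exhausted: move the message to the dead letter queue.
                self.dead_letters.append(body)
                del self.messages[msg_id]
                continue
            self.messages[msg_id] = [body, now + self.visibility_timeout, count + 1]
            return msg_id, body
        return None

    def delete(self, msg_id):
        """Acknowledge success; the message will not be redelivered."""
        self.messages.pop(msg_id, None)
```

A consumer that crashes mid-task simply never calls `delete`, so the task reappears after the timeout. That is what makes delivery at-least-once rather than exactly-once.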
![Page 31: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/31.jpg)
![Page 32: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/32.jpg)
Task Router
● Stateless Marathon worker
● Routes tasks to clusters
● Custom routing logic
● At-least once delivery
● ‘DNS’ of Taskerman
● Pluggable discovery
  ○ AWS
  ○ Smartstack
PaaSTA
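A rough sketch of pluggable discovery and routing, under stated assumptions: the slides do not show how the router fans tasks out, so this version places one copy of the task on each target node's queue. The backend function, the node names, and `route` itself are hypothetical; a real backend would query AWS tags or Smartstack rather than return fixed data.

```python
from collections import defaultdict


def discover_by_aws_tags(cluster_name):
    """Hypothetical backend returning canned data instead of calling AWS."""
    known = {"cassandra:geo_counter": ["node-a", "node-b", "node-c"]}
    return known.get(cluster_name, [])


# Backends are looked up by the task's 'discovery' field, so new ones plug in here.
DISCOVERY_BACKENDS = {"aws_tags": discover_by_aws_tags}


def route(task, node_queues):
    """Copy the task into the queue of each target node and return the targets."""
    if task.get("nodes"):
        targets = list(task["nodes"])  # an explicit node list skips discovery
    else:
        discover = DISCOVERY_BACKENDS[task["discovery"]]
        targets = discover(task["cluster_name"])
    for node in targets:
        # Each copy records its destination in 'destnode'.
        node_queues[node].append(dict(task, destnode=node))
    return targets


queues = defaultdict(list)
targets = route(
    {"cluster_name": "cassandra:geo_counter", "discovery": "aws_tags", "nodes": []},
    queues,
)
```

Because the router keeps no state of its own, a restarted worker can pick up where the queue left off, at the cost of possibly routing a task twice.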
![Page 33: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/33.jpg)
![Page 34: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/34.jpg)
TaskRunner
● The executor of Taskerman
● Dequeues a task and executes it
  ○ Pre-defined reviewed code.
● Cron-ed on node
● Zookeeper for coordination
● Task deleted upon success
● Dead letter queue upon failed retries
![Page 35: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/35.jpg)
class TestTaskRunner(TaskRunner):
    def __init__(self, task, *args, **kwargs):
        # State mgmt and datastore specific
        # (further arguments are elided on the slide)
        pass

    def pre_check(self):
        # Is the task safe to execute on this cluster?
        pass

    def execute_action(self):
        # Actual execution of task:action
        pass

    def post_check(self):
        # Is the cluster good after execution, or is it on fire?
        pass
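One way the three hooks could be driven is sketched below. Only the hook names `pre_check`, `execute_action` and `post_check` come from the slide; the base-class body, the `run` method, and `RestartRunner` are assumptions made for illustration.

```python
class TaskRunner:
    """Hypothetical base class; only the hook names come from the slide."""

    def __init__(self, task):
        self.task = task

    def pre_check(self):
        return True

    def execute_action(self):
        raise NotImplementedError

    def post_check(self):
        return True

    def run(self):
        """Return True on success; False leaves the task in the queue for retry."""
        if not self.pre_check():
            return False  # cluster not safe right now, do nothing
        self.execute_action()
        return self.post_check()


class RestartRunner(TaskRunner):
    """Toy subclass that records its steps instead of touching a real cluster."""

    def __init__(self, task, healthy=True):
        super().__init__(task)
        self.healthy = healthy
        self.log = []

    def pre_check(self):
        self.log.append("pre_check")
        return self.healthy

    def execute_action(self):
        self.log.append("execute:" + self.task["action"])

    def post_check(self):
        self.log.append("post_check")
        return True
```

Returning False from `run` without deleting the task fits the queue semantics described earlier: the task becomes visible again and is retried, or ends up in the dead letter queue.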
![Page 36: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/36.jpg)
![Page 37: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/37.jpg)
![Page 38: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/38.jpg)
Zookeeper
● Distributed Coordinator
● Non Blocking Lease
  ○ Time-based lease
  ○ Global lease
● Ephemeral locks
● Atomic Counters
  ○ Statistics
  ○ Circuit breaker
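The idea of a non-blocking, time-based lease that lets at most 'm' out of 'n' nodes act at once can be sketched with an in-memory table. This is not Zookeeper and makes no claim about how Taskerman stores leases there; `LeaseTable` and its methods are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class LeaseTable:
    """In-memory stand-in for a non-blocking, time-based lease with a holder limit."""

    def __init__(self, limit, ttl):
        self.limit = limit   # at most `limit` holders at once ('m' out of 'n')
        self.ttl = ttl       # seconds until an unreleased lease expires
        self.holders = {}    # node -> expiry time

    def try_acquire(self, node, now):
        """Return True if `node` holds a lease after this call; never blocks."""
        # Drop expired leases so a crashed holder cannot block others forever.
        self.holders = {n: t for n, t in self.holders.items() if t > now}
        if node in self.holders or len(self.holders) < self.limit:
            self.holders[node] = now + self.ttl
            return True
        return False

    def release(self, node):
        """Give the lease back early, e.g. after the task succeeds."""
        self.holders.pop(node, None)
```

A node that fails to get a lease just returns and tries again on its next cron run, which is what keeps the scheme non-blocking.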
![Page 39: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/39.jpg)
Zookeeper: Challenges
● Staleness
  ○ Nodes can go down
● Garbage collection
  ○ Cleanup of ZK data structures
● Composition
● Starvation
● Uptime
![Page 40: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/40.jpg)
![Page 41: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/41.jpg)
Deployment
● Puppet
● Terraform
● Yelpsoa-configs
● PaaSTA
● Jenkins
● AWS Lambda
PaaSTA
![Page 42: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/42.jpg)
![Page 43: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/43.jpg)
Failure handling
● Multiple vectors of failure
● Idempotency
● Pessimistic approach
  ○ Job retry
● Separation of state
● Mutability
● Highly available components
● Circuit breakers
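A circuit breaker, as listed above, can be sketched as a failure counter that refuses further work once a threshold is reached. This is a generic, single-process illustration with hypothetical names; in a distributed setup the count would live in a shared counter instead, and the reset-on-success policy is an assumption.

```python
class CircuitBreaker:
    """Stop calling an action after `max_failures` consecutive failures."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self):
        """True once the breaker has tripped and further calls are refused."""
        return self.failures >= self.max_failures

    def call(self, func, *args):
        if self.open:
            raise RuntimeError("circuit open: refusing to run more tasks")
        try:
            result = func(*args)
        except Exception:
            self.failures += 1  # count the failure, then let it propagate
            raise
        self.failures = 0  # a success resets the consecutive-failure count
        return result
```

The point of tripping the breaker is to avoid a cascade: if restarts keep failing, the remaining nodes are left alone until a human looks at it.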
![Page 44: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/44.jpg)
Debugging
![Page 45: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/45.jpg)
Failure detection
● Heartbeat ping
  ○ End-to-end monitoring
● Dead Letter Queue
  ○ Recycle bin of failed tasks.
  ○ Hooks into human side of monitoring
● Status check
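A heartbeat-based check might look like the sketch below. It assumes each node records the time its last heartbeat task completed; `stale_nodes` and the data layout are hypothetical, since the slides do not describe how heartbeats are stored.

```python
def stale_nodes(last_heartbeat, now, max_age):
    """Return the sorted names of nodes whose last heartbeat is older than max_age seconds."""
    return sorted(
        node for node, seen in last_heartbeat.items() if now - seen > max_age
    )


# Example: node "b" last reported 80 seconds ago and the threshold is 60.
late = stale_nodes({"a": 100, "b": 40}, now=120, max_age=60)
```

Because the heartbeat travels the same path as a real task, a stale entry points at a break somewhere between scheduler, queue, router and runner rather than at one component in isolation.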
![Page 46: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/46.jpg)
![Page 47: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/47.jpg)
Monitoring
● End-to-end logging
  ○ Un/structured
● Metrics
  ○ Counters
  ○ Queue lengths
● Aggregation and dashboards
● Staleness checks
● Dead Letter Queue
● Multi-modal Alerting
![Page 48: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/48.jpg)
Use cases
● Restarts
● Reboots
● Instance Replacement
● Integration tests
● Kafka config reload
● Failure injection
● Backup and restore
● Search indexing
● .. and many more.
![Page 49: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/49.jpg)
Scheduled Backups
● Safety
● Cassandra
● Elasticsearch
● Common issues
● Constraints
  ○ Limit
  ○ Healthcheck
  ○ Mutual exclusion
![Page 50: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/50.jpg)
Secure Infrastructure
$ uptime
 06:52:54 up 99 days, 19:20, 1 user, load average: 0.02, 0.03, 0.07
$ ps -eo pid,cmd,lstart | grep ..
10058 zookeeper Tue Dec 5 05:23:43 2017
![Page 51: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/51.jpg)
www.yelp.com/careers/
We're Hiring!
![Page 52: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/52.jpg)
![Page 53: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/53.jpg)
![Page 54: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/54.jpg)
![Page 55: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/55.jpg)
@YelpEngineering
fb.com/YelpEngineers
engineeringblog.yelp.com
github.com/yelp
![Page 56: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/56.jpg)
Q & A
● Slides will also be uploaded to slideshare.net/slidunder.
![Page 57: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/57.jpg)
Q & A
❖ Q: What challenges remain with Taskerman?
  ➢ A:
❖ Q: …
  ➢ A: …
![Page 58: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/58.jpg)
![Page 59: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/59.jpg)
Image Credits
● https://www.elastic.co/products/elasticsearch
● https://zookeeper.apache.org/
● https://kafka.apache.org/
● https://www.flickr.com/photos/dapuglet/6291424431
● http://www.alamy.com/stock-photo/cattle-penning.html
● http://www.firstcallsigns.co.uk/content/images/thumbs/0000927_EE80127.jpeg
● https://sensuapp.org/img/logo-flat-white.png
● https://thumbs.gfycat.com/FocusedCompetentEyas-max-1mb.gif
● https://www.percona.com/sites/default/files/dashboard.png
● https://www.sales-initiative.com/downloads/2856/download/resilience.jpg?cb=29f43ac82cea225ab3ee370d7580760d
● http://izquotes.com/quotes-pictures/quote-a-distributed-system-is-one-in-which-the-failure-of-a-computer-you-didn-t-even-know-existed-can-leslie-lamport-346227.jpg
● https://pbs.twimg.com/media/DRCfqaCWsAczqTz.jpg
● https://upload.wikimedia.org/wikipedia/en/thumb/e/e0/Iron_Man_bleeding_edge.jpg/220px-Iron_Man_bleeding_edge.jpg
● https://github.com/mesos/chronos
● https://github.com/mesosphere
![Page 60: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/60.jpg)
Image Credits
● http://www.networknuts-web.biz/wp-content/uploads/2014/10/cron-logo.png
● http://www.pvhc.net/img195/ojfspebrvfblupftgajb.png
● https://fun-damentals.com/wp-content/uploads/2016/05/a-resilience.png
● http://www.azquotes.com/picture-quotes/quote-debugging-is-twice-as-hard-as-writing-the-code-in-the-first-place-therefore-if-you-write-brian-kernighan-66-91-06.jpg
● https://thenounproject.com/
● https://aws.amazon.com/
● https://www.splunk.com/
● https://www.terraform.io/
● http://yelp.com
● http://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
![Page 61: Raghavendra D Prabhu rprabhu@yelp.com @randomsurfer ...€¦ · “Need to run logical backup on a fleet without disruption to ingress traffic” “Run anti-entropy repair on Cassandra](https://reader033.vdocuments.net/reader033/viewer/2022050221/5f665232578fe244b513590e/html5/thumbnails/61.jpg)
Further Reading
● https://engineeringblog.yelp.com/2015/03/using-services-to-break-down-monoliths.html
● http://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
● https://martinfowler.com/bliki/TwoHardThings.html
● https://zookeeper.apache.org/
● https://www.terraform.io/
● https://github.com/Yelp/service-principles
● https://en.wikipedia.org/wiki/Law_of_Demeter