How Spotify Scales Apache Storm Pipelines

Upload: kinshuk-mishra

Post on 12-Jul-2015

Category:

Data & Analytics


TRANSCRIPT

Page 1: How Spotify scales Apache Storm Pipelines
Page 9: How Spotify scales Apache Storm Pipelines

[Architecture diagram: Log Events (Apache Kafka); Real-time Personalization Pipeline (Apache Storm); User Profile Store (Apache Cassandra); Entity Metadata Store (Apache Cassandra)]


Page 24: How Spotify scales Apache Storm Pipelines

[Deployment timeline diagram, t1 to t8]

Steps: Build v2, Deactivate v1, Submit v2, Check v2 graphs, Kill v1

Storm Cluster states over time: Running v1; Running v1 and Running v2; Running v2
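The rollout steps on this slide can be sketched as an ordered plan of `storm` CLI invocations. This is a minimal sketch, not the presenters' tooling: the topology names, jar path and main class are hypothetical, passing the topology name as an argument to the main class is an assumption, and the build and graph-check steps are left as manual steps because the slides do not say how they are performed.

```python
from typing import List, Optional, Tuple

# A step is a label from the slide plus the command to run.
# None means the step is manual or handled by tooling not shown in the slides.
Step = Tuple[str, Optional[List[str]]]


def plan_redeploy(
    old_topology: str,
    new_topology: str,
    jar_path: str,
    main_class: str,
    kill_wait_secs: int = 30,
) -> List[Step]:
    """Return the v1 -> v2 rollout steps in the order listed on the slide."""
    return [
        ("Build v2", None),
        # Deactivating stops the old topology's spouts from emitting new tuples.
        ("Deactivate v1", ["storm", "deactivate", old_topology]),
        # Assumption: the main class takes the topology name as its argument.
        ("Submit v2", ["storm", "jar", jar_path, main_class, new_topology]),
        ("Check v2 graphs", None),
        # -w gives in-flight tuples time to finish before workers shut down.
        ("Kill v1", ["storm", "kill", old_topology, "-w", str(kill_wait_secs)]),
    ]


if __name__ == "__main__":
    plan = plan_redeploy(
        "personalization-v1",        # hypothetical topology names
        "personalization-v2",
        "pipeline-v2.jar",           # hypothetical artifact
        "com.example.PipelineTopology",
    )
    for label, argv in plan:
        print(label, "->", " ".join(argv) if argv else "(manual step)")
```

Returning the plan as data rather than executing it keeps the sketch safe to run anywhere; each command list could be handed to `subprocess.run` once the manual checks are in place.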

Page 27: How Spotify scales Apache Storm Pipelines

PagerDuty, in-house solution


Page 31: How Spotify scales Apache Storm Pipelines

[Architecture diagram: Log Events (Apache Kafka); Real-time Personalization Pipeline (Apache Storm); User Profile Store (Apache Cassandra); Entity Metadata Store (Apache Cassandra)]

Page 34: How Spotify scales Apache Storm Pipelines

● Use separate tables for data with different TTLs, and set gc_grace_seconds=0. Read repairs are disabled.

● Use DateTieredCompactionStrategy for short-lived data.

● Control the number of open connections from the Storm topology to Cassandra.

● Configure the snitch to ensure proper request routing.
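The table-level settings above could be expressed as a CQL statement along the following lines. This is a sketch under stated assumptions: the keyspace, table name and columns are hypothetical, and mapping "read repairs are disabled" to `read_repair_chance = 0` and `dclocal_read_repair_chance = 0` is an assumption (these options exist in Cassandra 2.x and 3.x but were removed in 4.0). Connection limits and snitch configuration are not covered here.

```python
def short_lived_table_cql(keyspace: str, table: str, ttl_seconds: int) -> str:
    """Build a CREATE TABLE statement for one TTL class of short-lived data."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    lines = [
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} (",
        # Hypothetical schema; the slides do not show the real one.
        "    user_id text,",
        "    event_time timestamp,",
        "    payload text,",
        "    PRIMARY KEY (user_id, event_time)",
        f") WITH default_time_to_live = {ttl_seconds}",
        "  AND gc_grace_seconds = 0",
        # Assumption: read repair is disabled through these two options.
        "  AND read_repair_chance = 0",
        "  AND dclocal_read_repair_chance = 0",
        "  AND compaction = {'class': 'DateTieredCompactionStrategy'};",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # One table per TTL class, e.g. a one-day table and a seven-day table.
    print(short_lived_table_cql("personalization", "events_1d", 86400))
    print(short_lived_table_cql("personalization", "events_7d", 7 * 86400))
```

Generating one statement per TTL keeps rows with the same lifetime in the same table, which is the point of the first bullet.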

Page 36: How Spotify scales Apache Storm Pipelines

● 1 worker per node per topology
● 1 executor per core for CPU-bound tasks
● 1-10 executors per core for IO-bound tasks
● Compute the total parallelism possible and distribute it amongst slow and fast tasks: high parallelism for slow tasks, low for fast tasks.

* Parallelism tuning inspired by P. Taylor Goetz's Strata 2014 talk
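The last rule of thumb can be made concrete with a little arithmetic. The sketch below is one possible reading, not the presenters' method: it computes a total executor budget from nodes, cores and an executors-per-core factor, then splits it in proportion to an assumed relative cost per tuple for each task. The task names, cost figures and the proportional-split rule are all assumptions for illustration.

```python
from typing import Dict


def total_executors(nodes: int, cores_per_node: int, executors_per_core: int = 1) -> int:
    """Executor budget: 1 per core for CPU-bound work, up to 10 for IO-bound work."""
    if not 1 <= executors_per_core <= 10:
        raise ValueError("executors_per_core should be between 1 and 10")
    return nodes * cores_per_node * executors_per_core


def distribute_parallelism(total: int, task_costs: Dict[str, float]) -> Dict[str, int]:
    """Give every task one executor, then split the rest in proportion to cost."""
    if total < len(task_costs):
        raise ValueError("need at least one executor per task")
    total_cost = sum(task_costs.values())
    remaining = total - len(task_costs)
    shares = {name: remaining * cost / total_cost for name, cost in task_costs.items()}
    allocation = {name: 1 + int(share) for name, share in shares.items()}
    # Hand out executors lost to rounding, largest fractional share first.
    leftover = total - sum(allocation.values())
    by_fraction = sorted(shares, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for name in by_fraction[:leftover]:
        allocation[name] += 1
    return allocation


if __name__ == "__main__":
    budget = total_executors(nodes=4, cores_per_node=8)  # 32 executors
    # Hypothetical relative cost per tuple: higher means slower.
    costs = {"parse": 1.0, "enrich": 6.0, "write": 3.0}
    print(distribute_parallelism(budget, costs))
```

With these made-up costs, the slow "enrich" task receives the largest share of the 32 executors and the fast "parse" task the smallest, matching the slide's guidance.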

Page 38: How Spotify scales Apache Storm Pipelines

● Think about constraints in external vs in-process caching
  ○ External caching
    ■ Network IO
    ■ Latency
    ■ Another point of failure
  ○ In-process caching
    ■ Limited memory
    ■ No persistence
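The in-process constraints listed above show up directly in code. The following is a minimal sketch of a bounded, least-recently-used cache that could live inside a worker process; the class name and capacity are hypothetical, and the slides do not say which caching approach or library was actually used.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class BoundedLruCache:
    """In-process cache with a hard entry limit and no persistence."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._entries:
            return None  # caller falls back to the backing store
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            # Limited memory: evict the least recently used entry.
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)


if __name__ == "__main__":
    cache = BoundedLruCache(capacity=2)
    cache.put("track:1", {"title": "a"})
    cache.put("track:2", {"title": "b"})
    cache.get("track:1")                  # touch track:1 so track:2 is oldest
    cache.put("track:3", {"title": "c"})  # evicts track:2
    print(cache.get("track:2"), len(cache))
```

The cache avoids network IO and an extra point of failure, but everything in it is lost when the worker restarts, and its capacity has to fit within the worker's memory.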
