Collecting 600M events/day


Page 1: Collecting 600M events/day

600,000,000 events/day
Lars Marius Garshol, [email protected]
2016-05-30, AWS Meetup Oslo
http://twitter.com/larsga

Page 2: Collecting 600M events/day

Schibsted? Collecting events?

Why?

Page 3: Collecting 600M events/day

Schibsted

3

30 countries
200 million users/month
20 billion pageviews/month

Page 4: Collecting 600M events/day

Three parts

4

Page 5: Collecting 600M events/day

Tinius

5

Page 6: Collecting 600M events/day

Restructuring

6

Page 7: Collecting 600M events/day

User data is central

7

Page 8: Collecting 600M events/day

Event collection

• Event collection pipeline exists to enable strategy

• Collect information on user behaviour
  • across all web properties

• Use to
  • target ads
  • understand users
  • later: collect backend data

8

Page 9: Collecting 600M events/day

Events/day

9

Page 10: Collecting 600M events/day

Helicopter crash

10

Page 11: Collecting 600M events/day

The pipeline
Rælingen, 1960. Photo by Per Solrud

Page 12: Collecting 600M events/day

Architecture

12

[Diagram: Collector → Kinesis → Storage → S3 → Piper → Kinesis]

Page 13: Collecting 600M events/day

The collector

• Very vulnerable
  • traffic comes with sudden spikes
  • can’t expect clients to resend if collector fails

• A very thin layer in front of Kinesis
  • persist the data as quickly as possible
  • we can work on the data later, once it’s stored

• Use Kinesis because it has very low write latency
  • generally <100 ms (see the write sketch below)

13
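A hedged illustration of the "thin layer in front of Kinesis" idea, using the AWS Java SDK from Scala; the stream name, payload and object name are invented, and this is not the production collector:

import java.nio.ByteBuffer
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder

object KinesisWriteSketch {
  def main(args: Array[String]): Unit = {
    val kinesis = AmazonKinesisClientBuilder.defaultClient()
    val payload = """{"type":"pageview","userId":"anon-123"}"""
    // the partition key decides which shard the record lands on
    val result = kinesis.putRecord("events-stream",                       // assumed stream name
                                   ByteBuffer.wrap(payload.getBytes("utf-8")),
                                   "anon-123")
    println(s"Persisted as sequence number ${result.getSequenceNumber}")
  }
}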

Page 14: Collecting 600M events/day

Kinesis

14

[Diagram: an endpoint distributing records across Kinesis shards 1–4]

Page 15: Collecting 600M events/day

Complications

15

[Diagram: Collector and CIS]

Steps:
1. Start batching events
2. Get ID from CIS
3. If opt out, stop
4. Send events

Page 16: Collecting 600M events/day

Consumers

• Ad targeting
• Personalization
• User profiling
• Individual sites
• Schibsted internal CMS
• …

16

[Diagram: Piper, with anonymized and identifiable event feeds]

Page 17: Collecting 600M events/day

Storage

• The last step writes events into S3 as JSON

• Each event ~2000 bytes = ~1200 GB/day
  • this turns out to be very slow to load in Spark

• Switched over to using batch jobs coalescing JSON into Parquet (see the sketch below)
  • some difficulties with nested -> flat structure
  • unpredictable schema variations from site to site
  • huge performance gain

17
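A minimal sketch of the JSON-to-Parquet coalescing idea with the Spark DataFrame API (Spark 2 style); bucket paths and the coalesce factor are assumptions, not the real batch job:

import org.apache.spark.sql.SparkSession

object JsonToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("json-to-parquet").getOrCreate()
    // Spark infers a schema from the JSON; nested objects become structs,
    // which is where the nested -> flat difficulties come in
    val events = spark.read.json("s3a://events-bucket/raw/2016-05-30/")
    events.coalesce(32)          // fewer, larger files are much faster to load later
      .write.parquet("s3a://events-bucket/parquet/2016-05-30/")
    spark.stop()
  }
}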

Page 18: Collecting 600M events/day

S3

• AWS storage system
  • kind of like a file system over HTTP

• You can write to s3://bucket/path/to/file
  • files are blobs
  • have to write the entire file when writing, no random access
  • eventually consistent

• Very strong guarantee on durability
  • but latency

18
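To illustrate the "write the entire file" point, a hedged sketch using the AWS Java SDK; the bucket name, key and payload are made up:

import com.amazonaws.services.s3.AmazonS3ClientBuilder

object S3WriteSketch {
  def main(args: Array[String]): Unit = {
    val s3 = AmazonS3ClientBuilder.defaultClient()
    // the whole object is written in one call; there is no way to append to
    // or modify part of an existing object
    s3.putObject("events-bucket", "path/to/file.json", """{"type":"pageview"}""")
  }
}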

Page 19: Collecting 600M events/day

Why not S3 directly?

• Writing to S3 takes ~2-4 seconds
  • a long time to wait before the events are safe

• Reading takes a while, too

• Not good for “realtime” usages
  • of which we have a few

19

Page 20: Collecting 600M events/day

Working in the cloud

Page 21: Collecting 600M events/day

Amazon Web Services

• For each application, build a Linux image (AMI)

• Set up an auto-scaling group
  • define min and max number of nodes
  • define rules to scale up/down based on metrics (CPU, …)

• Amazon ELB in front

• Also provides Kinesis, S3, …

21

Page 22: Collecting 600M events/day
Page 23: Collecting 600M events/day

Scaling policies

23

Page 24: Collecting 600M events/day

Metrics

24

Page 25: Collecting 600M events/day

Monitors

25

Page 26: Collecting 600M events/day

Probe

26

Page 27: Collecting 600M events/day

Anomaly detection

27

Page 28: Collecting 600M events/day

CloudFormation

• Describe wanted infrastructure in JSON (see the template sketch below)
  • one Kinesis stream
  • one S3 bucket
  • SQS queue with notifications from said S3 bucket
  • another S3 bucket

• Run a command, and the infrastructure is created

• Use this to create a new test environment every time we run integration tests

28
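A minimal sketch of such a template; the resource names and shard count are assumptions, and the actual S3-to-SQS notification wiring (NotificationConfiguration plus a queue policy) is omitted:

{
  "Resources": {
    "EventStream": {
      "Type": "AWS::Kinesis::Stream",
      "Properties": { "ShardCount": 4 }
    },
    "RawEventsBucket": {
      "Type": "AWS::S3::Bucket"
    },
    "NewFileQueue": {
      "Type": "AWS::SQS::Queue"
    }
  }
}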

Page 29: Collecting 600M events/day

CloudFormation (2)

• Doesn’t work perfectly
  • had to show CF some dependencies it couldn’t figure out for itself

• Tight AWS API limits in some cases
  • can’t have more than 5 Kinesis streams creating/deleting simultaneously

• Have been able to work around all of this

29
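One concrete form such a hint takes is CloudFormation's DependsOn attribute; the fragment below reuses the invented resource names from the sketch above (a known case is an S3 bucket whose notification configuration needs the SQS queue to exist first):

"RawEventsBucket": {
  "Type": "AWS::S3::Bucket",
  "DependsOn": "NewFileQueue"
}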

Page 30: Collecting 600M events/day

Data collector

Page 31: Collecting 600M events/day

The collector

31

Page 32: Collecting 600M events/day

Basics

• Scala application

• Based on Netty/Finagle/Finatra

• Framework scales very well
  • if the app is overloaded, the only thing that happens is that the number of open connections piles up
  • eventually hits max open files, causing failures

32

Page 33: Collecting 600M events/day

First design

33

[Diagram: parse JSON → validate → insert fields → Kinesis PUT (wait) → send response]

Page 34: Collecting 600M events/day

First version

34

[Diagram: parse JSON → validate → insert fields → queue JSON → send response; separate thread: get from queue → Kinesis PUT (wait)]

Page 35: Collecting 600M events/day

Error handling?

• I had no experience with AWS - didn’t know what to expect
  • could Kinesis go down for 1 minute? 1 hour? 1 day?
  • anecdotal reports of single nodes being unable to contact Kinesis while others were working fine

• Considered
  • batching on disk with fallback to S3
  • feeding batched events back to collector
  • …

• In the end decided to wait for more experience

35

Page 36: Collecting 600M events/day

Ooops!

36

Page 37: Collecting 600M events/day

Queue grows

37

Page 38: Collecting 600M events/day

Kaboom

38

Page 39: Collecting 600M events/day

GC eats the CPU

39

Page 40: Collecting 600M events/day

Failure to recover

40

Page 41: Collecting 600M events/day

Redesign

41

[Diagram: request thread → extend JSON → fixed-size queue → serialize JSON → fixed-size queue → Kinesis writer, with temp storage for overflow]

Page 42: Collecting 600M events/day

Latency spike!

42

Page 43: Collecting 600M events/day

No increase in CPU

43

Page 44: Collecting 600M events/day

Frankfurt saves the day

44

Page 45: Collecting 600M events/day

Network outage

45

Page 46: Collecting 600M events/day

Network out is slow

46

Page 47: Collecting 600M events/day

Events spill to disk

47

Page 48: Collecting 600M events/day

Application survives

48

Page 49: Collecting 600M events/day

Disk usage

49

Page 50: Collecting 600M events/day

Design flaws

• Collector shouldn’t parse the JSON
  • this is a waste of CPU resources

• Collector should just pack JSON plus extra fields into some efficient serialization
  • then write to Kinesis

• Let later stages deal with the tidying up
  • not done yet, because it requires changes to several components
  • quite possibly also a custom Spark reader

50

Page 51: Collecting 600M events/day

Kinesis -> S3

Page 52: Collecting 600M events/day

Storage

• Very simple application
  • uses Kinesis Client Library (KCL) to read from Kinesis
  • stores data to S3

• Does nothing else
  • data in Kinesis lives 24 hours
  • therefore want something simple and fool-proof

52

[Diagram: Kinesis → Storage → S3]

Page 53: Collecting 600M events/day

Reading from Kinesis

53

[Diagram: Kinesis shards 1–4 distributed across five worker nodes]

Page 54: Collecting 600M events/day

KCL API

public class RecordProcessor implements IRecordProcessor {

  public RecordProcessor(AmazonS3 s3, String bucket,
                         ThreadLocal<SimpleDateFormat> date_formatter) { ... }

  public void initialize(InitializationInput initializationInput) {
    this.shard_id = initializationInput.getShardId();
  }

  public void processRecords(ProcessRecordsInput context) { ... }

  public void shutdown(ShutdownInput context) {
    log.info("Shutting down because {}", context.getShutdownReason());
  }
}

54

Page 55: Collecting 600M events/day

Falling behind

55

Page 56: Collecting 600M events/day

Kinesis read limits

• KCL can give us max 10,000 records per read
  • but it never does
  • even if the stream contains many more records

• Experiment (see the sketch below)
  • write lots of 80-byte records into a test stream
  • then add lots of 2000-byte records
  • read the stream with KCL, observe results

56
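A hedged sketch of the write half of this experiment (the stream name and record count are invented); a second run with 2000-byte payloads fills the rest of the stream:

import java.nio.ByteBuffer
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder

object FillTestStream {
  def main(args: Array[String]): Unit = {
    val kinesis = AmazonKinesisClientBuilder.defaultClient()
    val payload = Array.fill[Byte](80)('x'.toByte)   // 80-byte records; use 2000 for the second batch
    for (i <- 1 to 100000)
      kinesis.putRecord("kcl-read-test", ByteBuffer.wrap(payload), s"key-$i")
  }
}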

Page 57: Collecting 600M events/day

Result

Wrote 10000 records (815960 bytes) in 2409 ms, 4 records/ms
Wrote 10000 records (816980 bytes) in 790 ms, 12 records/ms
Wrote 10000 records (816270 bytes) in 750 ms, 13 records/ms
Wrote 10000 records (817690 bytes) in 742 ms, 13 records/ms
Wrote 10000 records (817990 bytes) in 929 ms, 10 records/ms
Wrote 10000 records (817990 bytes) in 798 ms, 12 records/ms
Wrote 10000 records (819000 bytes) in 720 ms, 13 records/ms
Wrote 10000 records (816980 bytes) in 724 ms, 13 records/ms
Wrote 10000 records (817990 bytes) in 833 ms, 12 records/ms
Wrote 10000 records (818080 bytes) in 726 ms, 13 records/ms
Wrote 10000 records (818000 bytes) in 730 ms, 13 records/ms
Wrote 10000 records (818180 bytes) in 721 ms, 13 records/ms
Wrote 9535 records (6176176 bytes) in 2432 ms, 3 records/ms
Wrote 3309 records (6934426 bytes) in 1991 ms, 1 records/ms
Wrote 3309 records (6933172 bytes) in 1578 ms, 2 records/ms
Wrote 3309 records (6934878 bytes) in 1667 ms, 1 records/ms
Wrote 3310 records (6934916 bytes) in 1599 ms, 2 records/ms
Wrote 3309 records (6934319 bytes) in 1614 ms, 2 records/ms
Wrote 3309 records (6933975 bytes) in 2054 ms, 1 records/ms

57

Bigger records = fewer per batch

Page 58: Collecting 600M events/day

The relevant knob

• KCL has a setting for the sleep between reads (see the configuration sketch below)

• Used to have this at 10,000 ms
  • this in order to not get so many small JSON files
  • these are slow to read back out of S3

• As a result of this investigation, reduced to 5000 ms
  • much later, reduced further to 3000 ms

58
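For reference, a hedged sketch of where that knob sits in the KCL configuration; the application, stream and worker names are made up:

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration

object StorageConfigSketch {
  val config =
    new KinesisClientLibConfiguration(
        "storage-app",                            // application name (used for the checkpoint table)
        "events-stream",                          // Kinesis stream to read
        new DefaultAWSCredentialsProviderChain(),
        "worker-1")                               // worker id
      .withMaxRecords(10000)                      // upper bound per GetRecords call
      .withIdleTimeBetweenReadsInMillis(3000)     // the sleep between reads discussed above
}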

Page 59: Collecting 600M events/day

Results

59

New setting deployed

Page 60: Collecting 600M events/day

Piper

• Home-made app which streams events through a tree of stages

• Reads storage’s S3 output
  • using SQS notification to be informed of new files (see the sketch below)

• Feeds output through lots of stages
  • coalescing into larger files
  • demuxing by site
  • filter to Kinesis
  • …

60
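A hedged sketch of the SQS notification polling; the queue URL and handling are invented, and a real S3 event notification body is JSON naming the new object:

import scala.collection.JavaConverters._
import com.amazonaws.services.sqs.AmazonSQSClientBuilder

object SqsPollSketch {
  def main(args: Array[String]): Unit = {
    val sqs = AmazonSQSClientBuilder.defaultClient()
    val queueUrl = "https://sqs.eu-west-1.amazonaws.com/123456789012/new-s3-files"
    for (msg <- sqs.receiveMessage(queueUrl).getMessages.asScala) {
      println(s"New S3 object: ${msg.getBody}")            // JSON body names the bucket and key
      sqs.deleteMessage(queueUrl, msg.getReceiptHandle)    // acknowledge so it isn't redelivered
    }
  }
}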

Page 61: Collecting 600M events/day

Analytics platform

Page 62: Collecting 600M events/day

Tools

• Analytics jobs are written in Apache Spark
  • much easier to write code for than Hadoop
  • also more efficient

• Used to be deployed on separate clusters
  • this was very expensive
  • now switching over to a shared cluster
  • this is somewhat painful, as we’re still learning

62

Page 63: Collecting 600M events/day

Spark

63

Page 64: Collecting 600M events/day

Dependencies

64

[Diagram: job dependencies — Storage output feeding anonymize (Anon, Ident) and demux (Site 1, Site 2, Site 3)]

Page 65: Collecting 600M events/day

Luigi

• A job scheduler developed by Spotify
  • use Python code to declare parameters, dependencies, and outputs
  • Luigi will schedule jobs accordingly
  • locking to ensure only one of each job running at a time

• Python code also for actually starting the job
  • many ready-made wrappers for known job types

65

Page 66: Collecting 600M events/day

A Luigi task
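A minimal sketch of what such a task can look like, in the spirit of the previous slide; task names, paths and the trivial run() body are illustrative assumptions, not Schibsted's actual job:

import luigi

class StorageOutput(luigi.ExternalTask):
    """Raw events already written by the storage app (assumed to exist)."""
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget('raw/%s.json' % self.date)

class AnonymizeEvents(luigi.Task):
    """Depends on the raw output and writes an anonymized copy."""
    date = luigi.DateParameter()

    def requires(self):
        return StorageOutput(self.date)

    def output(self):
        return luigi.LocalTarget('anon/%s.json' % self.date)

    def run(self):
        with self.input().open() as fin, self.output().open('w') as fout:
            for line in fin:
                fout.write(line)   # a real job would strip identifying fields here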

Page 67: Collecting 600M events/day

Luigi issues

• No cron-like functionality
  • has to be handled by other means (like cron)

• Single master only
  • uses file locking on disk to prevent simultaneous execution of tasks

67

Page 68: Collecting 600M events/day

knox

• Schibsted internal tool for working with data

• knox job deploy: deploy job in cluster

• knox job status: what’s the status of my job?

• knox job kill: stop running job

• knox job disable: don’t run this job again

• knox job list: what jobs exist?

68

Page 69: Collecting 600M events/day

Cluster architecture

69

[Diagram: cluster architecture — Chronos, a Luigi server, Crane and knox on top of a Mesos cluster running Job #1–#3, plus a Spark master with four Spark nodes]

Page 70: Collecting 600M events/day
Page 71: Collecting 600M events/day

Synchronous events

Page 72: Collecting 600M events/day

New requirements

• Receive backend events from other applications
  • new ad, ad updated, …

• Can’t lose events
  • want to be able to tie business logic to them

• Must confirm event written to Kinesis with 200 OK
  • clients can resend

72

Page 73: Collecting 600M events/day

Remember this?

73

[Diagram: parse JSON → validate → insert fields → Kinesis PUT (wait) → send response]

Page 74: Collecting 600M events/day

Throttling

74

try {
  openRequests++
  if (openRequests > MAX_SIMULTANEOUS_REQUESTS)
    response.status(509)
  else if (kinesis.sendToKinesis(request.content))
    response.status(200)
  else
    response.status(500)
} finally {
  openRequests--
}

openRequests never bigger than 2…

Page 75: Collecting 600M events/day

What’s going on?

• Worker-based web servers
  • allocate a fixed number of worker threads/processes
  • have enough that while some may block on I/O there are always some threads making progress

• Event-based web servers
  • small number of worker threads
  • use polling interfaces to multiplex between connections
  • less context switching => more efficient

75

Page 76: Collecting 600M events/day

Finatra

• Event-based framework

• response.status(200) doesn’t return a Response
  • it returns a Future[Response] that’s already populated

• This means, if we’re blocked we can return a Future[Response] that completes when we’re done
  • allows Finatra to continue on another connection that’s not blocked

76

Page 77: Collecting 600M events/day

Canonical solution

77

[Diagram: request thread hands off the task to a ThreadPool and gets a future; a pool thread processes the JSON and does the Kinesis PUT]
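A hedged sketch of that canonical approach with Finagle's FuturePool; the KinesisWriter trait and the pool size are assumptions, not the real code:

import java.util.concurrent.Executors
import com.twitter.finagle.http.Response
import com.twitter.util.{Future, FuturePool}

// hypothetical blocking writer used by the sketch below
trait KinesisWriter { def sendBlocking(payload: Array[Byte]): Boolean }

object CanonicalSolutionSketch {
  private val pool = FuturePool(Executors.newFixedThreadPool(64))

  def handle(payload: Array[Byte], kinesis: KinesisWriter): Future[Response] =
    pool {
      // runs on a pool thread, so blocking on the Kinesis PUT doesn't stall the event loop
      val ok = kinesis.sendBlocking(payload)
      val resp = Response()
      resp.statusCode = if (ok) 200 else 500
      resp
    }
}

The weaknesses listed on the next slide (many threads, a hard cap on concurrency, one Kinesis request per event) follow directly from this shape.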

Page 78: Collecting 600M events/day

Weaknesses

• Now suddenly we have a lot of threads again
  • back to the context-switching we were supposed to avoid

• Hard, low limit on the number of simultaneous requests

• Every event is a separate request to Kinesis
  • very inefficient

78

Page 79: Collecting 600M events/day

What’s a Future?

public interface Future<V> {

  public boolean isDone();
  // true iff the value is there, false otherwise

  public V get();
  // loop until the value is there, then return it
  // if computing the value failed, throw the exception
}

79

Page 80: Collecting 600M events/day

Redesign

80

[Diagram: the redesigned pipeline from before (extend JSON → fixed-size queue → serialize JSON → fixed-size queue → temp storage), with a SyncEndpt that adds each event with a callback; the callback resolves the Future]

Page 81: Collecting 600M events/day

Actual code

val promise = new Promise[Response]

kinesis.add(request.contentString.getBytes("utf-8"),
  // the writer will call this function with the outcome, which
  // causes Finatra to send the response to the client
  (success : Boolean) =>
    if (success)
      promise.setValue(response.status(200))
    else
      promise.setValue(response.status(509))
)

promise

81

Page 82: Collecting 600M events/day

Benefits

• Number of threads is kept minimal
  • not all that context-switching

• Hard limit on requests is much higher

• No extra moving parts

• Synchronous requests sent together with other requests
  • much more efficient

82

Page 83: Collecting 600M events/day

Winding up

Page 84: Collecting 600M events/day

Conclusion

• Schibsted is still only just getting started
  • the basics are now in place
  • starting to generate money
  • a lot more work to do

• Use of AWS saves us from a lot of hassle

• Working at this scale
  • causes different challenges from what I’m used to
  • a lot more error handling/retries/scaling

84

Page 85: Collecting 600M events/day