aws kinesis

Post on 22-Jan-2018

Category:

Software

SCRATCHING THE SURFACE OF AWS KINESIS

QUESTION TIME

QUESTION

What (Kafka?) cluster do you need to ingest:

- 10,000 records/sec

- record size 512 bytes

KINESIS

AWS KINESIS

WHAT IS KINESIS?

▸ Platform for streaming data on AWS

▸ "sometimes TBs per hour"

MOTIVATION

▸ Highly Scalable

▸ Durable

▸ Elastic

▸ Replay-able reads

KINESIS COMPONENTS

▸ Streams

▸ Firehose

▸ Analytics

KINESIS STREAMS

AWS KINESIS STREAMS

TERMINOLOGY

▸ Streams - ordered sequence of data records

▸ Data record - Sequence Number, Partition Key, Data Blob

▸ 1MB max

▸ Retention period - 24h (default) up to 7d

▸ Producers, Consumers

▸ Shards
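To make the link between partition keys and shards concrete: Kinesis hashes each partition key with MD5 into a 128-bit hash key, and each shard owns a contiguous range of that key space. The sketch below assumes the shards split the key space into equal ranges, which holds for a freshly created stream but not necessarily after resharding. The function name `shard_index_for_key` is made up for illustration.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_index_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash-key range contains the key.

    Assumption: shards own equal, contiguous slices of the hash-key space.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_KEY_SPACE // shard_count
    # The last shard absorbs the remainder of the integer division.
    return min(hash_key // range_size, shard_count - 1)


if __name__ == "__main__":
    for user_id in ("33", "34", "35"):
        print(user_id, "->", shard_index_for_key(user_id, 4))
```

Records that share a partition key always land on the same shard, which is what preserves their relative order.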

KINESIS STREAMS - SHARDS

▸ Fixed unit of capacity

▸ Read

▸ 5 transactions / sec

▸ 2MB / sec

▸ Write

▸ 1000 records / sec

▸ 1MB / sec

QUESTION

What cluster do you need to ingest:

- 10,000 records/sec

- record size 512 bytes

KINESIS STREAMS - SHARD CALCULATION

Requirement                     Write capacity of one shard
10,000 records/sec              1,000 records/sec
512 bytes/record (~5 MB/sec)    1 MB/sec

▸ 10 shards (the record rate is the limit: 10,000 / 1,000 = 10; throughput alone needs only about 5)

▸ 10,000 records / sec

▸ 10 MB / sec
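The shard calculation can be sketched as taking the larger of two requirements: shards needed for the record rate and shards needed for the byte rate. The per-shard write limits come from the slides; treating 1 MB as 1,048,576 bytes and the function name `shards_needed` are assumptions for illustration.

```python
import math

SHARD_WRITE_RECORDS_PER_SEC = 1_000
SHARD_WRITE_BYTES_PER_SEC = 1024 * 1024  # 1 MB/sec, taken as 2**20 bytes (assumption)


def shards_needed(records_per_sec: int, record_size_bytes: int) -> int:
    """Return the minimum shard count that satisfies both write limits."""
    by_record_rate = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    by_byte_rate = math.ceil(
        records_per_sec * record_size_bytes / SHARD_WRITE_BYTES_PER_SEC
    )
    return max(by_record_rate, by_byte_rate, 1)


if __name__ == "__main__":
    print(shards_needed(10_000, 512))
```

For the question in the slides, the record rate is the binding constraint, so small records leave about half of the byte capacity unused.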

DEMO

▸ Create a stream with 1 shard

▸ Put 10 records / sec

▸ Read in batches every sec

{ RecordId: 11, KeyPressCount: 22, UserId: 33 }
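A minimal producer sketch for records shaped like the one above. It is not the code used in the demo. It assumes `UserId` is used as the partition key and builds entries in the shape that boto3's `put_records` expects. The helper names `build_entries`, `chunked` and `send` are hypothetical, and `send` additionally assumes that boto3 is installed and AWS credentials are configured.

```python
import json
from typing import Dict, Iterator, List

MAX_RECORDS_PER_PUT = 500  # PutRecords accepts at most 500 records per call


def build_entries(events: List[dict], key_field: str = "UserId") -> List[Dict[str, object]]:
    """Turn events into PutRecords entries: JSON bytes plus a partition key."""
    return [
        {"Data": json.dumps(event).encode("utf-8"), "PartitionKey": str(event[key_field])}
        for event in events
    ]


def chunked(entries: list, size: int = MAX_RECORDS_PER_PUT) -> Iterator[list]:
    """Yield consecutive slices of at most `size` entries."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def send(stream_name: str, events: List[dict]) -> int:
    """Send events to the stream and return how many records were rejected.

    Assumption: boto3 is installed and credentials/region are configured.
    """
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("kinesis")
    failed = 0
    for batch in chunked(build_entries(events)):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response["FailedRecordCount"]
    return failed
```

A real producer would also retry the rejected records, since `put_records` can partially fail when a shard is throttled.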

KINESIS FIREHOSE

AWS KINESIS FIREHOSE

TERMINOLOGY

▸ Firehose delivery stream

▸ record - 1MB max

▸ data producer

▸ buffer size, buffer interval
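Buffer size and buffer interval work together: Firehose delivers a batch as soon as either threshold is reached. The class below is a simplified local model of that rule, not Firehose code. `DeliveryBuffer` is a made-up name, and unlike the real service it only checks the interval when a new record arrives.

```python
from typing import List, Optional


class DeliveryBuffer:
    """Collect records and flush when the size or the interval limit is hit."""

    def __init__(self, max_bytes: int, max_seconds: float) -> None:
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self._records: List[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None

    def add(self, record: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer a record; return the batch if a flush condition is met."""
        if self._opened_at is None:
            self._opened_at = now  # the interval starts with the first record
        self._records.append(record)
        self._size += len(record)
        size_reached = self._size >= self.max_bytes
        interval_elapsed = now - self._opened_at >= self.max_seconds
        if size_reached or interval_elapsed:
            return self.flush()
        return None

    def flush(self) -> List[bytes]:
        """Return the buffered records and reset the buffer."""
        batch, self._records = self._records, []
        self._size = 0
        self._opened_at = None
        return batch
```

Passing `now` explicitly keeps the model deterministic; a larger buffer size or interval means fewer, larger deliveries at the cost of higher latency.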

DATA DELIVERY

▸ KINESIS STREAM

▸ S3

▸ Redshift

▸ Elasticsearch

DEMO

KINESIS ANALYTICS

AWS KINESIS ANALYTICS

TERMINOLOGY

▸ Input

▸ Application Code

▸ In-App-Streams

▸ Pumps

▸ Streaming SQL

▸ Output

STREAMING SQL

▸ Tumbling Window

[...] GROUP BY FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') SECOND / 10 TO SECOND)

▸ Sliding Window

SELECT AVG(change) OVER W1 as avg_change FROM "SOURCE_SQL_STREAM_001" WINDOW W1 AS (PARTITION BY ticker_symbol RANGE INTERVAL '10' SECOND PRECEDING)
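The difference between the two window types can be illustrated outside of SQL with a small Python model. This is only a sketch: the tumbling aggregate in the slide is elided, so a per-window count is used here as a stand-in, and including an event exactly 10 seconds old in the sliding window is an assumption. Function names and the event tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[float, str, float]  # (timestamp in seconds, ticker_symbol, change)


def tumbling_counts(events: List[Event], width: float = 10.0) -> Dict[int, int]:
    """Count events per fixed, non-overlapping window of `width` seconds."""
    counts: Dict[int, int] = defaultdict(int)
    for ts, _ticker, _change in events:
        counts[int(ts // width)] += 1  # same idea as FLOOR(... / 10)
    return dict(counts)


def sliding_averages(events: List[Event], width: float = 10.0) -> List[Event]:
    """For each event, average `change` over the same ticker's events
    from the preceding `width` seconds, the current event included."""
    history: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    result: List[Event] = []
    for ts, ticker, change in sorted(events):
        window = [(t, c) for t, c in history[ticker] if ts - t <= width]
        window.append((ts, change))
        history[ticker] = window  # drop rows that fell out of the window
        average = sum(c for _, c in window) / len(window)
        result.append((ts, ticker, average))
    return result
```

A tumbling window emits one row per window, while a sliding window emits one row per incoming event, each computed over the rows that precede it.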

STREAMING SQL - TUMBLING WINDOW

DEMO

▸ Check BDMeetup Application in AWS Console

▸ Producer / Consumer

PRODUCER APP → BDMEETUP STREAM → ANALYTICS APP → BDMEETUP-OUTPUT STREAM → CONSUMER APP

THANK YOU