processing big data in motion - ieee computer society · amazon kinesis firehose • zero admin:...

Processing Big Data in Motion Streaming Data Ingestion and Processing Roger Barga, General Manager, Kinesis Streaming Services, AWS April 7, 2016


Page 1: Processing Big Data in Motion - IEEE Computer Society · Amazon Kinesis Firehose • Zero Admin: Capture and deliver streaming data into S3, Redshift, and other destinations without

Processing Big Data in Motion: Streaming Data Ingestion and Processing

Roger Barga, General Manager, Kinesis Streaming Services, AWS

April 7, 2016


Riding the Streaming Rapids

[Timeline figure, 2007 to 2016. Labels: Relational Semantics and Implementation; Complex Event Processing over Streaming Data; Streaming Map Reduce & Machine Learning over Streams; Azure Stream Analytics]


Interest in and demand for stream data processing is rapidly increasing*…

* Understatement of the year…


Most data is produced continuously

Why?

Metering Record:

    {
        "payerId": "Joe",
        "productCode": "AmazonS3",
        "clientProductCode": "AmazonS3",
        "usageType": "Bandwidth",
        "operation": "PUT",
        "value": "22490",
        "timestamp": "1216674828"
    }

Common Log Entry:

    127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Syslog Entry:

    <165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"]

MQTT Record:

    “SeattlePublicWater/Kinesis/123/Realtime” – 412309129140

NASDAQ OMX Record:

    <R,AMZN,T,G,R1>


Time is money

• Perishable Insights (Forrester)

Why?

• Hourly server logs: how your systems were misbehaving an hour ago
• Weekly / monthly bill: what you spent this past billing cycle
• Daily fraud reports: tell you if there was fraud yesterday

• CloudWatch metrics: what just went wrong now
• Real-time spending alerts/caps: guaranteeing you can’t overspend
• Real-time detection: blocks fraudulent use now


Time is money

• Perishable Insights (Forrester)
• A more efficient implementation
• Most ‘Big Data’ deployments process continuously generated data (batched)

Why?


Availability: variety of stream data processing systems, active ecosystem but still early days…

Why?


Disruptive: foundational for business-critical workflows.

Enables a new class of applications & services that process data continuously.

Why?


You need to begin thinking about applications & services in terms of streams of data and continuous processing.

A change in perspective is worth 80 IQ points… – Alan Kay


Agenda

• Scalable & Durable Data Ingest
– A quick word on our motivation
– Kinesis Streams, through a simple example

• Continuous Stream Data Processing
– Kinesis Client Library (KCL)
– One select design challenge: dynamic resharding
– How customers are using Kinesis Streams today

• Building on Kinesis Streams
– Kinesis Firehose
– AWS Event Driven Computing


Our Motivation for Continuous Processing

AWS Metering service
• 100s of millions of billing records per second
• Terabytes++ per hour
• Hundreds of thousands of sources
• For each customer: gather all metering records & compute monthly bill
• Auditors guarantee 100% accuracy at month's end

Seems perfectly reasonable to run as a batch, but relentless pressure for real time…

With a Data Warehouse to load
• 1000s of extract-transform-load (ETL) jobs every day
• Hundreds of thousands of files per load cycle
• Thousands of daily users, hundreds of queries per hour


Other Service Teams, Similar Requirements
• CloudWatch Logs and CloudWatch Metrics
• CloudFront API logging
• ‘Snitch’ internal datacenter hardware metrics


Right Tool for the Job: Enable Streaming Data Ingestion and Processing

Real-time Ingest
• Highly Scalable
• Durable
• Replayable Reads

Continuous Processing
• Support multiple simultaneous data processing applications
• Load-balancing incoming streams, scale out processing
• Fault-tolerance, Checkpoint / Replay


Example application: twitter-trends.com website

[Diagram labels: twitter-trends.com; Elastic Beanstalk]


twitter-trends.com

Too big to handle on one box


twitter-trends.com

The solution: streaming map/reduce

[Diagram: three workers each maintain a local "My top-10" list; the local lists are combined into a "Global top-10"]
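The streaming map/reduce idea can be sketched in a few lines. This is only an illustration, not code from the talk: the function names `local_top_n` and `global_top_n` and the sample hashtags are hypothetical, and each worker is assumed to see a disjoint slice of the stream.

```python
from collections import Counter


def local_top_n(hashtags, n=10):
    """Map step: one worker counts the tags it saw and keeps its top n."""
    return Counter(hashtags).most_common(n)


def global_top_n(local_lists, n=10):
    """Reduce step: sum the per-worker counts, then keep the overall top n."""
    merged = Counter()
    for local in local_lists:
        for tag, count in local:
            merged[tag] += count
    return merged.most_common(n)


# Three hypothetical workers, each reading a different part of the stream.
worker_inputs = [
    ["#aws", "#aws", "#kinesis"],
    ["#aws", "#iot"],
    ["#kinesis", "#aws"],
]
local_lists = [local_top_n(tags) for tags in worker_inputs]
top = global_top_n(local_lists, n=2)
print(top)
```

Merging truncated local lists is approximate in general: a tag that narrowly misses every local top-10 can still belong in the global top-10, so a production design may keep longer local lists than it finally reports.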


twitter-trends.com

Core concepts

• Data record
• Stream
• Partition key
• Shard
• Worker
• Sequence number

[Diagram: a shard holding data records with sequence numbers 14, 17, 18, 21, 23; workers produce "My top-10" lists that feed a "Global top-10"]
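How a partition key selects a shard can be sketched as follows. Kinesis documents that the partition key is hashed with MD5 to a 128-bit integer and that each shard owns a range of that hash space. The even split in `even_ranges` and all function names here are assumptions made for illustration.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def hash_key(partition_key):
    """Hash a partition key to an integer in [0, 2**128)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(num_shards):
    """Assumed layout: split the hash space into equal, contiguous ranges."""
    size = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash ranges do not cover the hash space")


ranges = even_ranges(4)
print(shard_for("user-42", ranges))
```

Records with the same partition key always land in the same shard, which is what gives per-key ordering by sequence number.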


twitter-trends.com

How this relates to Kinesis

[Diagram labels: Kinesis; Kinesis application]


Kinesis Streaming Data Ingestion

• Streams are made of Shards
• Each Shard ingests data up to 1 MB/sec, and up to 1000 TPS
• Producers use a PUT call to store data in a Stream: PutRecord {Data, PartitionKey, StreamName}
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours, 7 days if extended retention is ‘ON’
• Scale Kinesis streams by adding or removing Shards
• Replay data from within the retention period
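The per-shard limits above translate directly into a shard count. The sketch below is a back-of-the-envelope helper, not an AWS API: `shards_needed` and its parameters are hypothetical, 1 MB is taken as 1,000,000 bytes, and every consumer is assumed to read the full stream.

```python
SHARD_INGEST_BYTES_PER_SEC = 1_000_000   # 1 MB/sec per shard (from the slide)
SHARD_INGEST_RECORDS_PER_SEC = 1_000     # 1000 TPS per shard (from the slide)
SHARD_EGRESS_BYTES_PER_SEC = 2_000_000   # 2 MB/sec per shard (from the slide)


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def shards_needed(records_per_sec, avg_record_bytes, consumers=1):
    """Smallest shard count that satisfies all three per-shard limits."""
    ingest_bytes = records_per_sec * avg_record_bytes
    by_bytes = ceil_div(ingest_bytes, SHARD_INGEST_BYTES_PER_SEC)
    by_records = ceil_div(records_per_sec, SHARD_INGEST_RECORDS_PER_SEC)
    by_egress = ceil_div(ingest_bytes * consumers, SHARD_EGRESS_BYTES_PER_SEC)
    return max(1, by_bytes, by_records, by_egress)


# 3,000 small records per second, read by three applications.
print(shards_needed(3_000, 500, consumers=3))
```

In this example the record-rate and egress limits, not ingest bandwidth, decide the answer, which is why all three limits are checked.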


Real-Time Streaming Data Ingestion

[Diagram: Amazon Web Services, three Availability Zones (AZ, AZ, AZ)]

• Millions of sources producing 100s of terabytes per hour
• Front End: Authentication, Authorization
• Durable, highly consistent storage replicates data across three data centers (availability zones)
• Ordered stream of events supports multiple readers
• Real-time dashboards and alarms
• Machine learning algorithms or sliding window analytics
• Aggregate analysis in Hadoop or a data warehouse
• Aggregate and archive to S3
• Custom-built Streaming Applications (KCL)

Inexpensive: $0.028 per million puts
Inexpensive: $0.014 per 1,000,000 PUT Payload Units

[Latency labels on the diagram: 25 – 40 ms; 100 – 150 ms]


Kinesis Client Library


twitter-trends.com

Using the Kinesis API directly

Worker (one per shard):

    // Start reading at the tip of the shard.
    iterator = getShardIterator(shardId, LATEST);
    while (true) {
        [records, iterator] = getNextRecords(iterator, maxRecsToReturn);
        process(records);
    }

    process(records): {
        for (record in records) {
            updateLocalTop10(record);
        }
        // Periodically publish this worker's list to DynamoDB (DDB).
        if (timeToDoOutput()) {
            writeLocalTop10ToDDB();
        }
    }

Aggregator:

    // Merge every worker's local list into the global top-10.
    while (true) {
        localTop10Lists = scanDDBTable();
        updateGlobalTop10List(localTop10Lists);
        sleep(10);
    }
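The worker loop above can be exercised without AWS by swapping the stream for an in-memory stand-in. Everything in this sketch is hypothetical: `InMemoryShard` only imitates the iterator-and-batch shape of the pseudocode and is not the Kinesis API. It reads from `TRIM_HORIZON` rather than `LATEST` so that the preloaded records are visible.

```python
from collections import Counter


class InMemoryShard:
    """Hypothetical stand-in for one shard: an append-only list of records."""

    def __init__(self, records):
        self._records = list(records)

    def get_shard_iterator(self, position="TRIM_HORIZON"):
        # TRIM_HORIZON starts at the oldest record, LATEST just past the newest.
        return 0 if position == "TRIM_HORIZON" else len(self._records)

    def get_next_records(self, iterator, max_records):
        # Return one batch plus the iterator to use for the next call.
        batch = self._records[iterator:iterator + max_records]
        return batch, iterator + len(batch)


def run_worker(shard, max_records=2):
    """Drain the shard in batches and return this worker's local top-10."""
    local_counts = Counter()
    iterator = shard.get_shard_iterator("TRIM_HORIZON")
    while True:
        records, iterator = shard.get_next_records(iterator, max_records)
        if not records:  # a real worker would sleep and poll again
            break
        for record in records:
            local_counts[record] += 1
    return local_counts.most_common(10)


shard = InMemoryShard(["#aws", "#iot", "#aws", "#kinesis", "#aws"])
local_top10 = run_worker(shard)
print(local_top10)
```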


twitter-trends.com

Challenges with using the Kinesis API directly

• Manual creation of workers and assignment to shards
• How many workers per EC2 instance?
• How many EC2 instances?

[Diagram labels: Kinesis; Kinesis application]

twitter-trends.com

Using the Kinesis Client Library

[Diagram labels: Kinesis; Kinesis application; shard management table]

twitter-trends.com

Elasticity and Load Balancing

[Diagram sequence: Kinesis; workers in an Auto Scaling group; shard management table]

twitter-trends.com

Fault Tolerance Support

[Diagram sequence: Kinesis; shard management table; workers in Availability Zone 1 and Availability Zone 3. Availability Zone 1 is marked failed (X), and the final frame shows only Availability Zone 3.]


Worker Fail Over

Amazon.com Confidential 37

[Diagram sequence: Shard-0, Shard-1, and Shard-2 are processed by Worker1, Worker2, and Worker3. Worker2 is marked failed (X) from step 2 onward. The lease table evolves as follows.]

Step 1

LeaseKey   LeaseOwner   LeaseCounter
Shard-0    Worker1      85
Shard-1    Worker2      94
Shard-2    Worker3      76

Step 2

LeaseKey   LeaseOwner   LeaseCounter
Shard-0    Worker1      86
Shard-1    Worker2      94
Shard-2    Worker3      77

Step 3

LeaseKey   LeaseOwner   LeaseCounter
Shard-0    Worker1      87
Shard-1    Worker2      94
Shard-2    Worker3      78

Step 4

LeaseKey   LeaseOwner   LeaseCounter
Shard-0    Worker1      88
Shard-1    Worker3      95
Shard-2    Worker3      79
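The lease-counter sequence can be replayed with a small simulation. This is a simplified sketch rather than the KCL implementation: the `Lease` class, `heartbeat`, and `take_expired` are hypothetical names, and "expired" is reduced to "the counter has not moved since the last snapshot".

```python
from dataclasses import dataclass


@dataclass
class Lease:
    owner: str
    counter: int


def heartbeat(leases, live_workers):
    """Each live worker renews its leases by bumping the lease counter."""
    for lease in leases.values():
        if lease.owner in live_workers:
            lease.counter += 1


def take_expired(leases, last_seen, taker):
    """`taker` claims every lease whose counter has not moved since `last_seen`."""
    taken = []
    for shard, lease in leases.items():
        if lease.owner != taker and lease.counter == last_seen[shard]:
            lease.owner = taker
            lease.counter += 1
            taken.append(shard)
    return taken


leases = {
    "Shard-0": Lease("Worker1", 85),
    "Shard-1": Lease("Worker2", 94),
    "Shard-2": Lease("Worker3", 76),
}
live = {"Worker1", "Worker3"}  # Worker2 has failed and stops renewing

snapshot = {shard: lease.counter for shard, lease in leases.items()}
for _ in range(3):
    heartbeat(leases, live)
taken = take_expired(leases, snapshot, "Worker3")
print(taken, leases)
```

Because a healthy owner keeps advancing its counter, other workers can detect failure without talking to each other; they only need to re-read the shared table.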


Worker Load Balancing

[Diagram sequence: Worker4 joins Worker1 and Worker3 while Worker2 is still marked failed (X). The Shard-2 lease moves from Worker3 to Worker4.]

Before

LeaseKey   LeaseOwner   LeaseCounter
Shard-0    Worker1      88
Shard-1    Worker3      96
Shard-2    Worker3      78

After

LeaseKey   LeaseOwner   LeaseCounter
Shard-0    Worker1      88
Shard-1    Worker3      96
Shard-2    Worker4      79
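One way to picture the rebalancing step is a newcomer taking a single lease from the busiest worker. The rule below is an assumption made for illustration, not the KCL's actual policy; `steal_one_lease` is a hypothetical name, and picking the highest-numbered shard is arbitrary.

```python
from collections import Counter


def steal_one_lease(owners, newcomer):
    """Move one lease from the busiest worker to `newcomer`.

    Returns the shard that moved, or None if the load is already even.
    """
    load = Counter(owners.values())
    busiest, busiest_count = load.most_common(1)[0]
    # Only steal when that leaves the workers more evenly loaded than before.
    if busiest_count - load[newcomer] < 2:
        return None
    shard = max(s for s, w in owners.items() if w == busiest)  # arbitrary pick
    owners[shard] = newcomer
    return shard


owners = {"Shard-0": "Worker1", "Shard-1": "Worker3", "Shard-2": "Worker3"}
moved = steal_one_lease(owners, "Worker4")
second = steal_one_lease(owners, "Worker4")
print(moved, second, owners)
```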


Resharding

[Diagram sequence: Shard-0 is split into Shard-1 and Shard-2, processed by Worker1 and Worker2. The lease table evolves as follows.]

Step 1

LeaseKey   LeaseOwner   LeaseCounter   checkpoint
Shard-0    Worker1      90             SHARD_END

Step 2

LeaseKey   LeaseOwner   LeaseCounter   checkpoint
Shard-0    Worker1      90             SHARD_END
Shard-1                 0              TRIM_HORIZON
Shard-2                 0              TRIM_HORIZON

Step 3

LeaseKey   LeaseOwner   LeaseCounter   checkpoint
Shard-0    Worker1      90             SHARD_END
Shard-1    Worker1      2              TRIM_HORIZON
Shard-2    Worker2      3              TRIM_HORIZON

Step 4

LeaseKey   LeaseOwner   LeaseCounter   checkpoint
Shard-1    Worker1      2              TRIM_HORIZON
Shard-2    Worker2      3              TRIM_HORIZON
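The resharding sequence can be modeled as three operations on the lease table. This is a sketch of the steps the slides show, under the assumption that child leases appear only after the parent reaches `SHARD_END`. The `Lease` class and the `split`, `assign`, and `cleanup` helpers are hypothetical, and the lease counters are simplified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    owner: Optional[str]
    counter: int
    checkpoint: str
    parents: tuple = ()


def split(table, parent, children):
    """Create unowned child leases once the parent has been fully consumed."""
    if table[parent].checkpoint != "SHARD_END":
        raise RuntimeError("parent shard still has unread records")
    for child in children:
        table[child] = Lease(None, 0, "TRIM_HORIZON", parents=(parent,))


def assign(table, shard, worker):
    """A worker takes a lease and bumps its counter."""
    table[shard].owner = worker
    table[shard].counter += 1


def cleanup(table):
    """Drop finished parents whose child leases all have owners."""
    for shard in list(table):
        children = [l for l in table.values() if shard in l.parents]
        done = table[shard].checkpoint == "SHARD_END"
        if done and children and all(l.owner for l in children):
            del table[shard]


table = {"Shard-0": Lease("Worker1", 90, "SHARD_END")}
split(table, "Shard-0", ["Shard-1", "Shard-2"])
assign(table, "Shard-1", "Worker1")
assign(table, "Shard-2", "Worker2")
cleanup(table)
print(sorted(table))
```

Finishing the parent before starting the children preserves per-key ordering across the split.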


Putting this into production

Cost & Scale

• 500MM tweets/day = ~5,800 tweets/sec
• 2 KB/tweet is ~12 MB/sec (~1 TB/day)
• $0.015/hour per shard, $0.014/million PUTs
• Kinesis cost is $0.47/hour
• Redshift cost is $0.850/hour (for a 2 TB node)
• Total: $1.32/hour
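The cost figures can be reproduced from the inputs on the slide. The sketch assumes one PUT per tweet, takes "2k" as 2,000 bytes, and uses the 1 MB/sec per-shard ingest limit stated earlier in the deck; the variable names are illustrative.

```python
import math

TWEETS_PER_DAY = 500_000_000
TWEET_BYTES = 2_000          # "2k/tweet", taken here as 2,000 bytes
SHARD_HOURLY_USD = 0.015
USD_PER_MILLION_PUTS = 0.014
REDSHIFT_HOURLY_USD = 0.850

tweets_per_sec = TWEETS_PER_DAY / 86_400             # about 5,800
mb_per_sec = tweets_per_sec * TWEET_BYTES / 1e6      # about 12
shards = math.ceil(mb_per_sec)                       # one shard per 1 MB/sec

puts_per_hour = tweets_per_sec * 3_600               # one PUT per tweet (assumed)
kinesis_hourly = (shards * SHARD_HOURLY_USD
                  + puts_per_hour / 1e6 * USD_PER_MILLION_PUTS)
total_hourly = kinesis_hourly + REDSHIFT_HOURLY_USD

print(f"{shards} shards, Kinesis ${kinesis_hourly:.2f}/hour, total ${total_hourly:.2f}/hour")
```

Roughly $0.18 of the Kinesis figure is shard-hours and $0.29 is PUT charges, so at this record size the per-PUT price dominates.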


Design Challenge(s)

• Dynamic Resharding & Scale Out

• Enforcing Quotas (think proxy fleet with 1000s of servers)

• Distributed Denial of Service Attack (unintentional)

• Dynamic Load Balancing on Storage Servers

• Heterogeneous Workloads (tip of stream vs 7 day)

• Optimizing Fleet Utilization (proxy, control, data planes)

• Avoid Scaling Cliffs

• …


Kinesis Streams: Streaming Data the AWS Way

• Pay as you go, no up front costs
• Elastically scalable
• Choose the service, or combination of services, for your specific use cases.
• Real-time latencies
• Easy to provision, deploy, and manage


Sushiro: Kaiten Sushi Restaurants

380 stores stream data from sushi plate sensors to Kinesis



Real-Time Streaming Data with Kinesis Streams

• 5 billion events/wk from connected devices | IoT
• 17 PB of game data per season | Entertainment
• 100 billion ad impressions/day, 30 ms response time | Ad Tech
• 100 GB/day click streams, 250+ sites | Enterprise
• 50 billion ad impressions/day, sub-50 ms responses | Ad Tech
• 17 million events/day | Technology
• 1 billion transactions per day | Bitcoin
• 1 TB+/day game data analyzed in real-time | Gaming


Streams provide a foundational abstraction on which to build higher-level services


Amazon Kinesis Firehose

• Zero Admin: Capture and deliver streaming data into S3, Redshift, and other destinations without writing an application or managing infrastructure
• Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into S3 and other destinations in as little as 60 secs, set up in minutes
• Seamless elasticity: Seamlessly scales to match data throughput

Capture and submit streaming data to Firehose

Firehose loads streaming data continuously into S3 and Redshift

Analyze streaming data using your favorite BI tools
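The "batch, compress" behavior can be pictured as a buffer that flushes on size or age. This is a hypothetical sketch and not how Firehose is implemented: `DeliveryBuffer` and its thresholds are illustrative, the clock is passed in for determinism, and encryption is left out.

```python
import gzip


class DeliveryBuffer:
    """Hypothetical buffer that flushes when a size or age threshold is crossed."""

    def __init__(self, max_bytes=1_000_000, max_age_sec=60):
        self.max_bytes = max_bytes
        self.max_age_sec = max_age_sec
        self._records = []
        self._size = 0
        self._opened_at = None

    def add(self, record, now):
        """Buffer one record; return a gzip-compressed batch if a flush is due."""
        if self._opened_at is None:
            self._opened_at = now
        self._records.append(record)
        self._size += len(record)
        too_big = self._size >= self.max_bytes
        too_old = now - self._opened_at >= self.max_age_sec
        return self.flush() if too_big or too_old else None

    def flush(self):
        """Compress the buffered records as one newline-delimited batch."""
        batch = gzip.compress(b"\n".join(self._records))
        self._records, self._size, self._opened_at = [], 0, None
        return batch


buf = DeliveryBuffer(max_bytes=1_000, max_age_sec=60)
first = buf.add(b"abc", now=0.0)
second = buf.add(b"def", now=30.0)
batch = buf.add(b"ghi", now=61.0)   # the buffer is now older than 60 seconds
print(gzip.decompress(batch))
```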


Amazon Kinesis Firehose: Fully Managed Service for Delivering Data Streams into AWS Destinations

[Diagram labels: Data Sources; AWS Endpoint; [Batch, Compress, Encrypt]; S3; Redshift]

• No Partition Keys
• No Provisioning
• End to End Elastic


AWS Event-Driven Computing

• Compute in response to recently occurring events

• Newly arrived/changed data

– Example: generate thumbnail for an image uploaded to S3

• Newly occurring system state changes

– Example: EC2 instance created

– Example: DynamoDB table deleted

– Example: Auto-scaling group membership change

– Example: RDS-HA primary fail-over occurs


Event Driven Computing in AWS Today

• S3 event notifications [Diagram labels: SQS]
• DynamoDB Update Streams
• CloudTrail event log for API calls [Diagram labels: S3; Customer 1, Customer 2, Customer 3]


Event Driven Computing in AWS Tomorrow

• Single event log for asynchronous service events
• Event logs from other data storage services

[Diagram labels: Customer 1, Customer 2, Customer 3]


A Unified Event Log Approach

• SQS (Unordered Events)
• Kinesis (Ordered Events)


Ordered Event Log Using Kinesis Streams and the Kinesis Client Library

[Diagram labels: Kinesis; shard management table; User State; AWS EDC; ASG]

Use of the KCL
• Mostly writing business logic

EDC Rules Language
• Simple CloudWatch actions in response to matching rules.


Event Logs for Customers’ Services

Vision: customers’ services and applications leverage the AWS event log infrastructure

[Diagram labels: Cust. 2593, Cust. 7302, Cust. 3826; Widget A, Widget B, Widget C; www.widget.com]

• Per-customer control plane events sent to customer’s unified control plane log
• Per-entity data plane event logs


Closing Thoughts

Streaming data is highly prevalent and relevant;

Stream data processing is on the rise;

A key part of business critical workflows today, a powerful abstraction for building a new class of applications & data intensive services tomorrow.

A rich area for distributed systems, programming model, IoT, and new service(s) research.


Questions