aws real-time event processing

40
April 21, 2015 Seattle AWS Big Data Platform

Upload: amazon-web-services

Post on 16-Jul-2015

2.178 views

Category:

Technology


3 download

TRANSCRIPT

Page 1: AWS Real-Time Event Processing

April 21, 2015

Seattle

AWS Big Data Platform

Page 2: AWS Real-Time Event Processing

Agenda Overview

10:00 AM Registration

10:30 AM Introduction to Big Data @ AWS

12:00 PM Lunch + Registration for Technical Sessions

12:30 PM Use Case Technical Deep Dive Sessions •  Data Collection and Storage

•  Real-time Event Processing

•  Analytics

Page 3: AWS Real-Time Event Processing

Collect   Process   Analyze  Store  

Data Collection and Storage

Data Processing

Data Analysis

Event Processing

Primitive Patterns

S3 Kinesis DynamoDB RDS  (Aurora)

MySQL

AWS  Lambda

KCL  Apps EMR Redshi?

Machine Learning

Page 4: AWS Real-Time Event Processing

Real-Time Event Processing

•  Examples:

Page 5: AWS Real-Time Event Processing

Processing framework

Page 6: AWS Real-Time Event Processing

Two main processing patterns

Page 7: AWS Real-Time Event Processing

Real-time event processing frameworks

Kinesis Client Library

AWS Lambda

Page 8: AWS Real-Time Event Processing

Amazon KCL

Page 9: AWS Real-Time Event Processing

Shard 1

Shard 2

Shard 3

Shard n

Shard 4

KCL Worker 1

KCL Worker 2

EC2 Instance

KCL Worker 3

KCL Worker 4

EC2 Instance

KCL Worker n

EC2 Instance

Kinesis

Kinesis Client Library (KCL)

Page 10: AWS Real-Time Event Processing

KCL Design Components

KCL restarts the processing of the shard at the last known processed record if a worker fails

Page 11: AWS Real-Time Event Processing

Processing with Kinesis Client Library •  Connects to the stream and enumerates the shards

•  Instantiates a record processor for every shard it manages

•  Checkpoints processed records in Amazon DynamoDB

•  Balances shard-worker associations when the worker instance count changes

•  Balances shard-worker associations when shards are split or merged

Page 12: AWS Real-Time Event Processing

Best practices for KCL applications

Page 13: AWS Real-Time Event Processing

Amazon Kinesis Connector

S3 Dynamo DB Redshift

Kinesis

Page 14: AWS Real-Time Event Processing

Amazon Kinesis connector application

Connector Pipeline

Transformed

Filtered

Buffered

Emitted

Incoming Records

Outgoing to Endpoints

Page 15: AWS Real-Time Event Processing

Real-time Monitoring dashboard with KCL

Amazon Kinesis

Kinesis-enabled Application

Producer on Amazon EC2

Amazon DynamoDB

Dashboard on Amazon EC2

2 sec sliding-window analysis over streaming clickstream data

Page 16: AWS Real-Time Event Processing

Monitoring Demo Kinesis Client Library

Page 17: AWS Real-Time Event Processing

AWS Lambda

Page 18: AWS Real-Time Event Processing

Event-Driven Compute in the Cloud

Page 19: AWS Real-Time Event Processing

No Infrastructure to Manage

Page 20: AWS Real-Time Event Processing

Automatic Scaling

Page 21: AWS Real-Time Event Processing

Bring your own code

Page 22: AWS Real-Time Event Processing

Fine-grained pricing

Free Tier 1M requests and 400,000 GB-s of compute.

Every month, every customer.

Never pay for idle.

Page 23: AWS Real-Time Event Processing

Data Triggers: Amazon S3

Amazon S3 Bucket Events AWS Lambda

Original image Thumbnailed image

1

2

3

Page 24: AWS Real-Time Event Processing

Data Triggers: Amazon DynamoDB

AWS Lambda Amazon DynamoDB Table and Stream

Send Amazon SNS notifications

Update another table

Page 25: AWS Real-Time Event Processing

Calling Lambda Functions

Page 26: AWS Real-Time Event Processing

Writing Lambda Functions

Page 27: AWS Real-Time Event Processing

How can you use these features?

“I want to send customized messages to

different users”

SNS + Lambda

“I want to send an offer when a user runs out of lives in

my game”

Amazon Cognito + Lambda + SNS

“I want to transform the

records in a click stream or an IoT

data stream”

Amazon Kinesis + Lambda

Page 28: AWS Real-Time Event Processing

Real-Time Alerting Demo AWS Lambda

Page 29: AWS Real-Time Event Processing

Stream Processing Apache Spark Apache Storm Amazon EMR

Page 30: AWS Real-Time Event Processing

Read Data Directly into Hive, Pig, Streaming and Cascading Real time sources into Batch Oriented Systems Multi-Application Support & Check-pointing

Amazon EMR integration

Page 31: AWS Real-Time Event Processing

CREATE  TABLE  call_data_records  (      start_time  bigint,      end_time  bigint,      phone_number  STRING,      carrier  STRING,      recorded_duration  bigint,      calculated_duration  bigint,      lat  double,      long  double  )  ROW  FORMAT  DELIMITED  FIELDS  TERMINATED  BY  ","  STORED  BY  'com.amazon.emr.kinesis.hive.KinesisStorageHandler'  TBLPROPERTIES("kinesis.stream.name"=”MyTestStream");  

Amazon EMR integration: Hive

Page 32: AWS Real-Time Event Processing

DStream

RDD@T1 RDD@T2

Messages

Receiver

Spark Streaming – Basic concepts

http://spark.apache.org/docs/latest/streaming-kinesis-integration.html

Page 33: AWS Real-Time Event Processing

Spark Streaming

Page 34: AWS Real-Time Event Processing

Processing Amazon Kinesis streams

Amazon Kinesis

Spark-Streaming

Page 35: AWS Real-Time Event Processing

Weblog Demo Kinesis + Spark Streaming

Page 36: AWS Real-Time Event Processing

Storm

Page 37: AWS Real-Time Event Processing

Apache Storm: Basic Concepts

https://github.com/awslabs/kinesis-storm-spout

Page 38: AWS Real-Time Event Processing

Launches Workers

Storm architecture

Master Node

Cluster Coordination

Worker Processes

Worker

Nimbus

Zookeeper

Zookeeper

Zookeeper

Supervisor

Supervisor

Supervisor

Supervisor

Worker

Worker

Worker

Page 39: AWS Real-Time Event Processing

Real-time: Event-based processing

Kinesis Storm Spout

Producer Amazon Kinesis

Apache Storm

ElastiCache (Redis) Node.js Client

(D3)

http://blogs.aws.amazon.com/bigdata/post/Tx36LYSCY2R0A9B/Implement-a-Real-time-Sliding-Window-Application-Using-Amazon-Kinesis-and-Apache

Page 40: AWS Real-Time Event Processing

Thank You