AWS Real-Time Event Processing

Post on 16-Jul-2015

Category:

Technology

April 21, 2015

Seattle

AWS Big Data Platform

Agenda Overview

10:00 AM Registration

10:30 AM Introduction to Big Data @ AWS

12:00 PM Lunch + Registration for Technical Sessions

12:30 PM Use Case Technical Deep Dive Sessions

•  Data Collection and Storage

•  Real-time Event Processing

•  Analytics

Collect   Process   Analyze  Store  

Data Collection and Storage

Data Processing

Data Analysis

Event Processing

Primitive Patterns

S3 Kinesis DynamoDB RDS  (Aurora)

MySQL

AWS  Lambda

KCL Apps EMR Redshift

Machine Learning

Real-Time Event Processing

•  Examples:

Processing framework

Two main processing patterns

Real-time event processing frameworks

Kinesis Client Library

AWS Lambda

Amazon KCL

[Diagram: a Kinesis stream with Shards 1 through n, read by KCL Workers 1 through n running on EC2 instances]

Kinesis Client Library (KCL)

KCL Design Components

KCL restarts the processing of the shard at the last known processed record if a worker fails

Processing with Kinesis Client Library

•  Connects to the stream and enumerates the shards

•  Instantiates a record processor for every shard it manages

•  Checkpoints processed records in Amazon DynamoDB

•  Balances shard-worker associations when the worker instance count changes

•  Balances shard-worker associations when shards are split or merged
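The KCL behaviors listed above can be illustrated with a small simulation. This is only a sketch, not the KCL API: `assign_shards`, `CheckpointStore`, and `process_shard` are hypothetical names, the checkpoint store is an in-memory stand-in for the DynamoDB table, round-robin assignment stands in for KCL's balancing, and checkpointing after every record is a simplification.

```python
from collections import defaultdict


def assign_shards(shards, workers):
    """Spread shards across workers round-robin (a simplified stand-in
    for the shard-worker balancing that KCL performs)."""
    assignment = defaultdict(list)
    for i, shard in enumerate(shards):
        assignment[workers[i % len(workers)]].append(shard)
    return dict(assignment)


class CheckpointStore:
    """In-memory stand-in for the DynamoDB table that holds checkpoints."""

    def __init__(self):
        self._last_seq = {}

    def checkpoint(self, shard_id, sequence_number):
        self._last_seq[shard_id] = sequence_number

    def resume_after(self, shard_id):
        # -1 means nothing has been processed for this shard yet.
        return self._last_seq.get(shard_id, -1)


def process_shard(shard_id, records, store, handle, fail_at=None):
    """Process records past the last checkpoint, checkpointing each one.

    `records` is a list of (sequence_number, payload) pairs. `fail_at`
    simulates a worker crash just before that sequence number is handled.
    """
    start = store.resume_after(shard_id)
    for seq, payload in records:
        if seq <= start:
            continue  # already processed before the failure
        if seq == fail_at:
            raise RuntimeError("simulated worker failure")
        handle(payload)
        store.checkpoint(shard_id, seq)
```

Calling `process_shard` again after a simulated failure resumes at the first record after the last checkpoint, which is the restart behavior the slide describes.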

Best practices for KCL applications

Amazon Kinesis Connector

S3 DynamoDB Redshift

Kinesis

Amazon Kinesis connector application

Connector Pipeline

Transformed

Filtered

Buffered

Emitted

Incoming Records

Outgoing to Endpoints
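The four pipeline stages can be sketched in a few lines. This is an illustration under assumptions, not the Amazon Kinesis Connector Library interface: `run_pipeline` and its parameters are hypothetical, and the buffer flushes on record count only.

```python
import json


def run_pipeline(raw_records, transform, keep, buffer_size, emit):
    """Push incoming records through transform -> filter -> buffer -> emit."""
    buffer = []
    for raw in raw_records:
        record = transform(raw)      # Transformed
        if not keep(record):         # Filtered
            continue
        buffer.append(record)        # Buffered
        if len(buffer) >= buffer_size:
            emit(buffer)             # Emitted to the endpoint
            buffer = []
    if buffer:
        emit(buffer)  # flush the final partial batch


# Usage: keep only successful requests and emit them in batches of two.
batches = []
raw = [
    '{"user": "a", "status": 200}',
    '{"user": "b", "status": 500}',
    '{"user": "c", "status": 200}',
    '{"user": "d", "status": 200}',
]
run_pipeline(raw, json.loads, lambda r: r["status"] == 200, 2, batches.append)
```

Here `batches.append` plays the role of the emitter; a real connector would write each batch to S3, DynamoDB, or Redshift.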

Real-time Monitoring dashboard with KCL

Amazon Kinesis

Kinesis-enabled Application

Producer on Amazon EC2

Amazon DynamoDB

Dashboard on Amazon EC2

2 sec sliding-window analysis over streaming clickstream data
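A sliding-window count of this kind might look like the following sketch. The class name `SlidingWindowCounter` is hypothetical, events are assumed to arrive in timestamp order, and the demo application's actual implementation is not shown in the slides.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts events per key over the trailing `window_seconds`."""

    def __init__(self, window_seconds=2.0):
        self.window_seconds = window_seconds
        self._events = deque()    # (timestamp, key), oldest first
        self._counts = Counter()

    def add(self, timestamp, key):
        self._events.append((timestamp, key))
        self._counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window ending at `now`.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_key = self._events.popleft()
            self._counts[old_key] -= 1
            if self._counts[old_key] == 0:
                del self._counts[old_key]

    def counts(self):
        return dict(self._counts)
```

Each clickstream record would call `add(timestamp, page)`, and the dashboard would read `counts()` to display the most recent two seconds.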

Monitoring Demo Kinesis Client Library

AWS Lambda

Event-Driven Compute in the Cloud

No Infrastructure to Manage

Automatic Scaling

Bring your own code

Fine-grained pricing

Free Tier 1M requests and 400,000 GB-s of compute.

Every month, every customer.

Never pay for idle.
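To make the compute allowance concrete: GB-seconds are memory (in GB) multiplied by execution time (in seconds), so the number of free seconds depends on the memory configured for the function. The sketch below uses only the 400,000 GB-s figure from the slide; the function names are hypothetical and no prices are assumed.

```python
FREE_GB_SECONDS = 400_000  # monthly free compute allowance, from the slide


def free_seconds(memory_mb):
    """Seconds of execution covered by the free compute allowance."""
    return FREE_GB_SECONDS / (memory_mb / 1024)


def billable_gb_seconds(invocations, avg_duration_s, memory_mb):
    """GB-seconds consumed beyond the free allowance (never negative)."""
    used = invocations * avg_duration_s * (memory_mb / 1024)
    return max(0.0, used - FREE_GB_SECONDS)


print(free_seconds(128))   # 3200000.0 seconds at 128 MB
print(free_seconds(1024))  # 400000.0 seconds at 1 GB
```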

Data Triggers: Amazon S3

Amazon S3 Bucket Events AWS Lambda

Original image Thumbnailed image
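A function triggered by S3 bucket events first has to work out which object changed. The sketch below shows that step only and leaves the image resizing out. It assumes the standard S3 event notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`); `plan_thumbnails` and the `thumbnails/` prefix are hypothetical.

```python
from urllib.parse import unquote_plus


def thumbnail_key(key):
    """Hypothetical naming rule: store thumbnails under 'thumbnails/'."""
    return "thumbnails/" + key


def plan_thumbnails(event, context):
    """Return (bucket, source_key, thumbnail_key) for each object in the event."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        jobs.append((bucket, key, thumbnail_key(key)))
    return jobs
```

If thumbnails are written back to the same bucket that triggers the function, the trigger should exclude the thumbnail prefix so the function does not invoke itself.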

Data Triggers: Amazon DynamoDB

AWS Lambda Amazon DynamoDB Table and Stream

Send Amazon SNS notifications

Update another table
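A function attached to a DynamoDB stream receives batches of change records. The sketch below builds a notification message for each inserted item. It assumes the stream is configured to include new images, and the attribute name `user_id` and the function name `messages_for_inserts` are hypothetical. Publishing the messages to Amazon SNS is left out.

```python
def messages_for_inserts(event, context):
    """Collect a notification message for each newly inserted item."""
    messages = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # this sketch ignores MODIFY and REMOVE events
        new_image = record["dynamodb"].get("NewImage", {})
        # Stream images use DynamoDB's typed JSON, e.g. {"S": "alice"}.
        user = new_image.get("user_id", {}).get("S", "unknown")
        messages.append(f"New item for user {user}")
    return messages
```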

Calling Lambda Functions

Writing Lambda Functions

How can you use these features?

“I want to send customized messages to different users”

SNS + Lambda

“I want to send an offer when a user runs out of lives in my game”

Amazon Cognito + Lambda + SNS

“I want to transform the records in a click stream or an IoT data stream”

Amazon Kinesis + Lambda
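A record transformation of this kind could be sketched as follows. Kinesis payloads reach the function base64-encoded under `Records[].kinesis.data`; the JSON fields `page` and `user` and the function name `transform_clicks` are assumptions made for illustration.

```python
import base64
import json


def transform_clicks(event, context):
    """Decode each Kinesis record and return a trimmed-down click event."""
    transformed = []
    for record in event.get("Records", []):
        # Kinesis record payloads are base64-encoded in the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        click = json.loads(payload)
        transformed.append({"page": click["page"], "user": click["user"].lower()})
    return transformed
```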

Real-Time Alerting Demo AWS Lambda

Stream Processing Apache Spark Apache Storm Amazon EMR

•  Read Data Directly into Hive, Pig, Streaming and Cascading

•  Real time sources into Batch Oriented Systems

•  Multi-Application Support & Check-pointing

Amazon EMR integration

CREATE TABLE call_data_records (
    start_time bigint,
    end_time bigint,
    phone_number STRING,
    carrier STRING,
    recorded_duration bigint,
    calculated_duration bigint,
    lat double,
    long double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED BY 'com.amazon.emr.kinesis.hive.KinesisStorageHandler'
TBLPROPERTIES("kinesis.stream.name"="MyTestStream");

Amazon EMR integration: Hive

DStream

RDD@T1 RDD@T2

Messages

Receiver

Spark Streaming – Basic concepts
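The DStream idea, a stream cut into a sequence of small batches (RDD@T1, RDD@T2, ...), can be illustrated without Spark. The sketch below is a plain-Python simulation, not the Spark Streaming API: `discretize` is a hypothetical helper, and unlike Spark it skips intervals in which no messages arrived.

```python
from collections import defaultdict


def discretize(messages, batch_interval):
    """Group (timestamp, value) messages into consecutive interval batches."""
    batches = defaultdict(list)
    for timestamp, value in messages:
        batches[int(timestamp // batch_interval)].append(value)
    return [batches[i] for i in sorted(batches)]


# Usage: a 1-second batch interval turns four messages into three batches.
messages = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d")]
batches = discretize(messages, 1.0)
```

Each resulting batch corresponds to one RDD in the slide's diagram, and a streaming job applies the same transformation to every batch in turn.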

http://spark.apache.org/docs/latest/streaming-kinesis-integration.html

Spark Streaming

Processing Amazon Kinesis streams

Amazon Kinesis

Spark-Streaming

Weblog Demo Kinesis + Spark Streaming

Storm

Apache Storm: Basic Concepts

https://github.com/awslabs/kinesis-storm-spout

Storm architecture

Master Node: Nimbus

Cluster Coordination: Zookeeper (3 nodes)

Launches Workers: Supervisor (4 nodes)

Worker Processes: Worker (4 processes)

Real-time: Event-based processing

Kinesis Storm Spout

Producer, Amazon Kinesis, Apache Storm, ElastiCache (Redis), Node.js, Client (D3)

http://blogs.aws.amazon.com/bigdata/post/Tx36LYSCY2R0A9B/Implement-a-Real-time-Sliding-Window-Application-Using-Amazon-Kinesis-and-Apache

Thank You
