AWS Real-Time Event Processing
April 21, 2015
Seattle
AWS Big Data Platform
Agenda Overview
10:00 AM Registration
10:30 AM Introduction to Big Data @ AWS
12:00 PM Lunch + Registration for Technical Sessions
12:30 PM Use Case Technical Deep Dive Sessions
• Data Collection and Storage
• Real-time Event Processing
• Analytics
Collect Process Analyze Store
Data Collection and Storage
Data Processing
Data Analysis
Event Processing
Primitive Patterns
S3 Kinesis DynamoDB RDS (Aurora)
MySQL
AWS Lambda
KCL Apps EMR Redshift
Machine Learning
Real-Time Event Processing
• Examples:
Processing framework
Two main processing patterns
Real-time event processing frameworks
Kinesis Client Library
AWS Lambda
Amazon KCL
Shard 1
Shard 2
Shard 3
Shard n
Shard 4
KCL Worker 1
KCL Worker 2
EC2 Instance
KCL Worker 3
KCL Worker 4
EC2 Instance
KCL Worker n
EC2 Instance
Kinesis
Kinesis Client Library (KCL)
KCL Design Components
KCL restarts the processing of the shard at the last known processed record if a worker fails
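The restart behavior above can be illustrated with a minimal sketch. It is not the KCL API: `CheckpointStore`, `process_shard`, and the integer sequence numbers are hypothetical stand-ins, and the in-memory dictionary only plays the role that a durable checkpoint table would play.

```python
class CheckpointStore:
    """In-memory stand-in for a durable checkpoint table (hypothetical helper)."""

    def __init__(self):
        self._positions = {}

    def save(self, shard_id, sequence_number):
        self._positions[shard_id] = sequence_number

    def load(self, shard_id):
        # None means the shard has never been checkpointed.
        return self._positions.get(shard_id)


def process_shard(shard_id, records, store, handle, checkpoint_every=2):
    """Process (sequence_number, payload) pairs, skipping everything up to the last checkpoint."""
    last_done = store.load(shard_id)
    since_checkpoint = 0
    current = None
    for sequence_number, payload in records:
        if last_done is not None and sequence_number <= last_done:
            continue  # covered by a checkpoint written before the failure
        handle(payload)
        current = sequence_number
        since_checkpoint += 1
        if since_checkpoint >= checkpoint_every:
            store.save(shard_id, current)
            since_checkpoint = 0
    if since_checkpoint > 0:
        store.save(shard_id, current)  # final checkpoint for the tail of the batch


records = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
store = CheckpointStore()
processed = []


def crash_on_d(payload):
    # Simulate a worker that dies while handling record "d".
    if payload == "d":
        raise RuntimeError("simulated worker failure")
    processed.append(payload)


try:
    process_shard("shard-1", records, store, crash_on_d)
except RuntimeError:
    pass  # a replacement worker takes over below

process_shard("shard-1", records, store, processed.append)
print(processed)
```

In this sketch record "c" is handled twice, because it was processed after the last checkpoint and before the failure. Resuming from a checkpoint gives at-least-once processing, so record handlers are usually written to tolerate repeats.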
Processing with Kinesis Client Library
• Connects to the stream and enumerates the shards
• Instantiates a record processor for every shard it manages
• Checkpoints processed records in Amazon DynamoDB
• Balances shard-worker associations when the worker instance count changes
• Balances shard-worker associations when shards are split or merged
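The balancing described in the last two bullets can be sketched as follows. This is a simplified round-robin assignment for illustration only, not the KCL's actual algorithm; `assign_shards` and the shard and worker names are hypothetical.

```python
def assign_shards(shard_ids, worker_ids):
    """Spread shards across workers round-robin (illustrative only)."""
    if not worker_ids:
        raise ValueError("at least one worker is required")
    assignment = {worker: [] for worker in worker_ids}
    for index, shard in enumerate(sorted(shard_ids)):
        worker = worker_ids[index % len(worker_ids)]
        assignment[worker].append(shard)
    return assignment


shards = ["shard-1", "shard-2", "shard-3", "shard-4"]

# Two workers: each one ends up with two record processors.
two_workers = assign_shards(shards, ["worker-1", "worker-2"])

# Scaling out to four workers: recomputing the assignment leaves one shard each.
four_workers = assign_shards(shards, ["worker-1", "worker-2", "worker-3", "worker-4"])

print(two_workers)
print(four_workers)
```

The same recomputation would apply when the shard list changes after a split or merge: the input list differs, and the assignment is rebuilt.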
Best practices for KCL applications
Amazon Kinesis Connector
S3 DynamoDB Redshift
Kinesis
Amazon Kinesis connector application
Connector Pipeline
Transformed
Filtered
Buffered
Emitted
Incoming Records
Outgoing to Endpoints
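The four pipeline stages above (transform, filter, buffer, emit) can be sketched in a few lines. This is a minimal illustration, not the connector library's interface: the function names, the JSON record shape, and the `status` field are all assumptions.

```python
import json


def transform(raw):
    """Transform: parse an incoming JSON string into a dictionary."""
    return json.loads(raw)


def keep(record):
    """Filter: keep only records flagged as ok (hypothetical rule)."""
    return record.get("status") == "ok"


class Buffer:
    """Buffer: collect records until a size threshold is reached."""

    def __init__(self, max_records):
        self.max_records = max_records
        self.items = []

    def add(self, record):
        self.items.append(record)
        return len(self.items) >= self.max_records  # True when ready to flush

    def drain(self):
        items, self.items = self.items, []
        return items


def run_pipeline(incoming, emit, max_records=2):
    """Push each incoming record through transform, filter, buffer, then emit."""
    buffer = Buffer(max_records)
    for raw in incoming:
        record = transform(raw)
        if not keep(record):
            continue
        if buffer.add(record):
            emit(buffer.drain())
    leftover = buffer.drain()
    if leftover:
        emit(leftover)  # flush whatever remains at the end of the input


incoming = [
    '{"id": 1, "status": "ok"}',
    '{"id": 2, "status": "bad"}',
    '{"id": 3, "status": "ok"}',
    '{"id": 4, "status": "ok"}',
]
batches = []
run_pipeline(incoming, batches.append)  # emit into a list instead of a real endpoint
print(batches)
```

Here `emit` is just `list.append`; in a real connector it would write each batch to an endpoint such as S3, DynamoDB, or Redshift.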
Real-time Monitoring dashboard with KCL
Amazon Kinesis
Kinesis-enabled Application
Producer on Amazon EC2
Amazon DynamoDB
Dashboard on Amazon EC2
2 sec sliding-window analysis over streaming clickstream data
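A sliding-window count of this kind can be sketched as below. It is a simplified, single-process illustration, not the demo's code: `SlidingWindowCounter` is a hypothetical name, and the sketch assumes events arrive with non-decreasing timestamps.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events per key over the last `window_seconds` seconds."""

    def __init__(self, window_seconds=2.0):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key), oldest first
        self.counts = {}

    def add(self, timestamp, key):
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window ending at `now`.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def snapshot(self):
        return dict(self.counts)


counter = SlidingWindowCounter(window_seconds=2.0)
clicks = [(0.0, "/home"), (0.5, "/cart"), (1.0, "/home"), (2.5, "/home")]
for timestamp, page in clicks:
    counter.add(timestamp, page)

print(counter.snapshot())
```

After the click at 2.5 seconds, the two earliest clicks are older than the 2-second window and no longer count, so only the two most recent "/home" clicks remain.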
Monitoring Demo Kinesis Client Library
AWS Lambda
Event-Driven Compute in the Cloud
No Infrastructure to Manage
Automatic Scaling
Bring your own code
Fine-grained pricing
Free Tier 1M requests and 400,000 GB-s of compute.
Every month, every customer.
Never pay for idle.
Data Triggers: Amazon S3
Amazon S3 Bucket Events AWS Lambda
Original image Thumbnailed image
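A minimal sketch of a function reacting to an S3 bucket event is shown below. It only reads the bucket name and object key from the event and plans where a thumbnail would go; the actual image resizing and the upload are left out. The `thumbnails/` prefix, the bucket name, and the `thumbnail_key` helper are assumptions made for illustration.

```python
from urllib.parse import unquote_plus


def thumbnail_key(key):
    """Hypothetical naming scheme: place thumbnails under a separate prefix."""
    return "thumbnails/" + key


def lambda_handler(event, context):
    """Collect one thumbnail job per object reported in the S3 event."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        jobs.append({"bucket": bucket, "source": key, "target": thumbnail_key(key)})
    return jobs


# Trimmed-down sample event with only the fields the handler reads.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-photos"},
                "object": {"key": "uploads/my+cat.jpg"},
            }
        }
    ]
}
result = lambda_handler(sample_event, None)
print(result)
```

Writing thumbnails under a different prefix (or to a different bucket) than the originals matters in practice: if the output lands where the trigger is listening, the function would invoke itself again.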
Data Triggers: Amazon DynamoDB
AWS Lambda Amazon DynamoDB Table and Stream
Send Amazon SNS notifications
Update another table
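A sketch of a function consuming a DynamoDB table stream might look like the following. It only builds notification messages for newly inserted items; actually publishing them to SNS or writing to another table is omitted. The `user` and `score` attributes are hypothetical.

```python
def lambda_handler(event, context):
    """Build one notification message per newly inserted item."""
    notifications = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY and REMOVE events in this sketch
        new_image = record["dynamodb"].get("NewImage", {})
        # Stream images use DynamoDB's typed attribute format, e.g. {"S": "text"}.
        user = new_image.get("user", {}).get("S")
        score = new_image.get("score", {}).get("N")
        notifications.append(f"New item for {user} with score {score}")
    return notifications


# Trimmed-down sample event with only the fields the handler reads.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {"NewImage": {"user": {"S": "alice"}, "score": {"N": "42"}}},
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"user": {"S": "bob"}}},
        },
    ]
}
messages = lambda_handler(sample_event, None)
print(messages)
```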
Calling Lambda Functions
Writing Lambda Functions
How can you use these features?
“I want to send customized messages to different users”
SNS + Lambda
“I want to send an offer when a user runs out of lives in my game”
Amazon Cognito + Lambda + SNS
“I want to transform the records in a click stream or an IoT data stream”
Amazon Kinesis + Lambda
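The record-transformation use case can be sketched as a function that decodes each Kinesis record and reshapes it. Record payloads arrive base64-encoded in the event; the JSON click format with `page` and `user` fields, and the `make_record` helper used to build test input, are assumptions for illustration.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode each Kinesis record and return a normalized click record."""
    transformed = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        click = json.loads(payload)
        # Example transformation: lower-case the page path, keep the user id.
        transformed.append({"page": click["page"].lower(), "user": click["user"]})
    return transformed


def make_record(obj):
    """Test helper: wrap a dictionary the way it would appear in a Kinesis event."""
    data = base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


sample_event = {
    "Records": [
        make_record({"page": "/Home", "user": "u1"}),
        make_record({"page": "/Cart", "user": "u2"}),
    ]
}
clicks = lambda_handler(sample_event, None)
print(clicks)
```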
Real-Time Alerting Demo AWS Lambda
Stream Processing Apache Spark Apache Storm Amazon EMR
• Read data directly into Hive, Pig, Streaming and Cascading
• Real-time sources into batch-oriented systems
• Multi-application support and checkpointing
Amazon EMR integration
CREATE TABLE call_data_records (
  start_time bigint,
  end_time bigint,
  phone_number STRING,
  carrier STRING,
  recorded_duration bigint,
  calculated_duration bigint,
  lat double,
  long double
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ","
STORED BY 'com.amazon.emr.kinesis.hive.KinesisStorageHandler'
TBLPROPERTIES("kinesis.stream.name"="MyTestStream");
Amazon EMR integration: Hive
DStream
RDD@T1 RDD@T2
Messages
Receiver
Spark Streaming – Basic concepts
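The DStream idea, a stream cut into one batch (RDD) per time interval, can be illustrated without Spark. The sketch below is plain Python, not the Spark Streaming API; `discretize` and the message tuples are hypothetical.

```python
def discretize(messages, batch_seconds):
    """Group (timestamp, value) messages into consecutive fixed-length batches."""
    batches = {}
    for timestamp, value in messages:
        index = int(timestamp // batch_seconds)  # which interval the message falls in
        batches.setdefault(index, []).append(value)
    # Return batches in time order; intervals with no messages are skipped here.
    return [batches[index] for index in sorted(batches)]


messages = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (3.5, "d")]
micro_batches = discretize(messages, batch_seconds=1.0)

# The same operation is then applied to every batch, as with per-RDD processing.
batch_sizes = [len(batch) for batch in micro_batches]

print(micro_batches)
print(batch_sizes)
```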
http://spark.apache.org/docs/latest/streaming-kinesis-integration.html
Spark Streaming
Processing Amazon Kinesis streams
Amazon Kinesis
Spark-Streaming
Weblog Demo Kinesis + Spark Streaming
Storm
Apache Storm: Basic Concepts
https://github.com/awslabs/kinesis-storm-spout
Launches Workers
Storm architecture
Master Node
Cluster Coordination
Worker Processes
Worker
Nimbus
Zookeeper
Zookeeper
Zookeeper
Supervisor
Supervisor
Supervisor
Supervisor
Worker
Worker
Worker
Real-time: Event-based processing
Kinesis Storm Spout
Producer Amazon Kinesis
Apache Storm
ElastiCache (Redis) Node.js Client
(D3)
http://blogs.aws.amazon.com/bigdata/post/Tx36LYSCY2R0A9B/Implement-a-Real-time-Sliding-Window-Application-Using-Amazon-Kinesis-and-Apache
Thank You