Introduction to AWS Kinesis

Wellington AWS Meetup Introduction to Kinesis

Upload: steven-ensslen

Posted on 13-Apr-2017





TRANSCRIPT

Page 1: Introduction to AWS Kinesis

Wellington AWS Meetup
Introduction to Kinesis

Page 2: Introduction to AWS Kinesis

Who Am I?

• Team Leader/Architect in Business Intelligence/databases
• 17 years' experience
• MCSE BI, OCP DBA, MCDBA
• AWS-ASA-2505

Page 3: Introduction to AWS Kinesis

Who are OptimalBI?

• Wellington-based BI consultancy
• “Making Information Visible”

Page 4: Introduction to AWS Kinesis

Talk Outline

1. Why do we need Kinesis?
2. What is Kinesis?
3. Demo
4. How does it fit into an existing data warehouse?
5. When to use Kinesis

Page 5: Introduction to AWS Kinesis

Big Data

1. Volume
2. Velocity
3. Variety

Page 6: Introduction to AWS Kinesis

Kinesis is an answer to Velocity

Machine learning looks simple: data is collected, magic happens, and we output it to our users.

Page 7: Introduction to AWS Kinesis

Traditional Business Intelligence

[Diagram: Data Store, Data Warehouse, Query Tool]

• Periodic, batch Extract-Transform-Load
• Persistent data source
• High latency

Page 8: Introduction to AWS Kinesis

Internet of Things

• Large number of sensors
• Self-registering
• Pushing data
• May or may not retain any historic data

= Only one chance to get data

Page 9: Introduction to AWS Kinesis

Batch ETL

• Data needs to wait somewhere between loads.
• If data is only loaded six hours per day, then four times as much hardware is needed.
• Latency of hours
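
The "four times" figure comes from squeezing a full day of data into a six-hour window. A rough sketch of that arithmetic, assuming data arrives evenly over 24 hours and that hardware scales linearly with throughput (both simplifications; the function name is illustrative):

```python
def batch_hardware_factor(load_hours_per_day: float) -> float:
    """Throughput needed by a batch window relative to loading continuously.

    Assumes data arrives evenly over 24 hours and that hardware scales
    linearly with required throughput.
    """
    if not 0 < load_hours_per_day <= 24:
        raise ValueError("load window must be between 0 and 24 hours")
    return 24 / load_hours_per_day


print(batch_hardware_factor(6))   # six-hour window, as on the slide
print(batch_hardware_factor(24))  # continuous loading
```

Loading continuously, as a stream does, brings the factor down to 1.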

Page 10: Introduction to AWS Kinesis

DIY Streaming ETL

“Realtime” “ETL” cluster

Page 11: Introduction to AWS Kinesis

DIY Streaming ETL 2.0

Add a queue

Page 12: Introduction to AWS Kinesis

DIY Streaming ETL 3+

Cluster more

Getting messy, still problems

Page 13: Introduction to AWS Kinesis

Problems with DIY Streaming ETL

1. Message queues deliver once. If you want to fan out to many readers, the application in front needs to know about each of them and queue the same message repeatedly.
2. Order of message delivery is not guaranteed.
3. If the program reading data crashes partway through aggregating, messages are lost.

Page 14: Introduction to AWS Kinesis

What is Kinesis?

• Kinesis is like a message queue, but more scalable and with multiple readers of each message.
• Kinesis is like a NoSQL database, but with message delivery and daily purging.
• Kinesis is like an Enterprise Service Bus focused on analytics.
• For a limited, if common, use case Kinesis is the best of all.
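
The difference between a deliver-once queue and a stream with multiple readers can be sketched with two toy classes. This is an in-memory illustration only, not the Kinesis API; the class and reader names are made up for the example.

```python
from collections import deque


class DeliverOnceQueue:
    """Toy message queue: each message goes to exactly one reader."""

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def get(self):
        # The message is removed as it is read, so no other reader sees it.
        return self._messages.popleft() if self._messages else None


class ReplayableStream:
    """Toy stream: an append-only log with one read position per reader."""

    def __init__(self):
        self._log = []
        self._positions = {}

    def put(self, message):
        self._log.append(message)

    def read(self, reader, limit=10):
        # Each reader advances its own position; the log itself is untouched.
        start = self._positions.get(reader, 0)
        batch = self._log[start:start + limit]
        self._positions[reader] = start + len(batch)
        return batch

    def rewind(self, reader, position=0):
        # Replay: a reader that crashed can go back and re-read.
        self._positions[reader] = position


queue = DeliverOnceQueue()
stream = ReplayableStream()
for event in ["a", "b", "c"]:
    queue.put(event)
    stream.put(event)

queue_reader_1 = [queue.get(), queue.get(), queue.get()]
queue_reader_2 = queue.get()           # nothing left for a second reader

dashboard = stream.read("dashboard")   # sees all three events
archiver = stream.read("archiver")     # also sees all three events
stream.rewind("dashboard")
replayed = stream.read("dashboard")    # same events again after a rewind
```

With the queue, the second reader gets nothing unless the producer queues every message twice. With the stream, each reader keeps its own position, which is also what makes replay after a crash possible.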

Page 15: Introduction to AWS Kinesis

Kinesis Qualities

• Scalable
• Elastic
• Durable
• Fault Tolerant
• Replayable

Page 16: Introduction to AWS Kinesis

Kinesis Components

• Each Queue/DB is called a Stream
• Each Stream scales by adding Shards
• Each Shard provides 1 MB/s in and 2 MB/s out
• Shards are only $0.44/day, so autoscale them to give some safety margin
• Also pay about 2 cents per million puts
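
A back-of-envelope sizing sketch using the per-shard figures from the slide. The prices are the slide's numbers (they vary by region and over time), and the workload, the 25% safety margin, and the function names are assumptions made for illustration.

```python
import math

# Figures quoted on the slide; prices vary by region and change over time.
SHARD_IN_MB_PER_S = 1.0
SHARD_OUT_MB_PER_S = 2.0
SHARD_USD_PER_DAY = 0.44
USD_PER_MILLION_PUTS = 0.02


def shards_needed(in_mb_per_s, out_mb_per_s, safety_margin=0.25):
    """Shards required to cover both write and read throughput plus headroom."""
    for_writes = in_mb_per_s / SHARD_IN_MB_PER_S
    for_reads = out_mb_per_s / SHARD_OUT_MB_PER_S
    return max(1, math.ceil(max(for_writes, for_reads) * (1 + safety_margin)))


def daily_cost_usd(shards, puts_per_day):
    """Approximate daily bill: shard charge plus put charge."""
    return shards * SHARD_USD_PER_DAY + puts_per_day / 1_000_000 * USD_PER_MILLION_PUTS


# Hypothetical workload: 3 MB/s written, read by two consumers (6 MB/s out),
# 50 million puts per day.
shards = shards_needed(in_mb_per_s=3, out_mb_per_s=6)
cost = daily_cost_usd(shards, puts_per_day=50_000_000)
print(shards, round(cost, 2))
```

For this hypothetical workload the sketch arrives at 4 shards and roughly $2.76 per day, which is why leaving some spare shards as a safety margin is cheap.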

Page 17: Introduction to AWS Kinesis

Kinesis Client Library

• Kinesis expects you to write bespoke producer and consumer programs.
• KCL provides automatic multi-threading with one worker thread per shard.
• Similar to Hadoop: the framework handles the lifting, and the bespoke program does the “reduce”.
• You have to autoscale the EC2 groups.
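
The split between framework and bespoke code can be sketched with plain Python threads. This is not the KCL API; it only imitates the shape described above, with one worker per shard and a user-supplied "reduce". The shard names and records are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical records already split across three shards.
SHARDS = {
    "shard-0": [3, 5, 8],
    "shard-1": [1, 1],
    "shard-2": [10],
}


def process_shard(shard_id, records):
    """The bespoke part: reduce one shard's records to a partial result."""
    return shard_id, sum(records)


def run_workers(shards):
    """The framework part: start one worker thread per shard and collect results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(process_shard, sid, recs) for sid, recs in shards.items()]
        return dict(f.result() for f in futures)


partials = run_workers(SHARDS)
total = sum(partials.values())
print(partials, total)
```

Only `process_shard` would change from one application to the next; the worker management stays the same.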

Page 18: Introduction to AWS Kinesis

Kinesis Application

[Diagram: Amazon Kinesis and three Auto Scaling groups of instances]

Page 19: Introduction to AWS Kinesis

Existing Kinesis Connectors

Sending:
• HTTP POST
• AWS SDK
• Log4j
• Flume
• Fluentd

Reading:
• Get* APIs
• Amazon Kinesis Client Library + Connector Library
• Apache Storm
• Amazon Elastic MapReduce

Page 20: Introduction to AWS Kinesis

Standard AWS Demo Script

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-kinesis.html

1. Hive already running in EMR
2. Create Kinesis Stream
3. Start Producer
4. Configure Hive as consumer
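
A minimal producer sketch, not the producer used in the AWS demo. It assumes a client that behaves like `boto3.client("kinesis")` and uses only its `put_record(StreamName=..., Data=..., PartitionKey=...)` call. The stream name, the event fields, the choice of `sensor_id` as partition key, and the `FakeKinesisClient` stand-in are all hypothetical.

```python
import json


def send_events(client, stream_name, events):
    """Put each event on the stream, using its sensor id as the partition key.

    `client` is expected to behave like boto3.client("kinesis"): only its
    put_record(StreamName=..., Data=..., PartitionKey=...) method is used.
    """
    responses = []
    for event in events:
        responses.append(
            client.put_record(
                StreamName=stream_name,
                Data=json.dumps(event).encode("utf-8"),
                PartitionKey=str(event["sensor_id"]),
            )
        )
    return responses


class FakeKinesisClient:
    """Stand-in so the sketch runs without AWS credentials."""

    def __init__(self):
        self.records = []

    def put_record(self, StreamName, Data, PartitionKey):
        self.records.append((StreamName, Data, PartitionKey))
        return {"SequenceNumber": str(len(self.records))}


fake = FakeKinesisClient()
events = [{"sensor_id": 7, "temp_c": 21.5}, {"sensor_id": 9, "temp_c": 19.0}]
send_events(fake, "demo-stream", events)
```

Against a real stream, the fake would be swapped for a boto3 Kinesis client created with valid credentials and region.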

Page 21: Introduction to AWS Kinesis

Integrating Kinesis into an existing Data Warehouse

1. Access data in near real-time
2. Facilitate more-traditional ETL
3. Archive

Page 22: Introduction to AWS Kinesis
Page 23: Introduction to AWS Kinesis

Near Real-time Data

1. Analyze individual transactions
2. Send alerts for both individual transactions and trends
3. Aggregate to feed a live dashboard
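
The three uses above can be sketched over an in-memory list of transactions. The data, the alert threshold, the 60-second window, and the "total halved" trend rule are all assumptions chosen for illustration; a real consumer would read records from the stream instead.

```python
from collections import defaultdict

# Hypothetical transactions: (seconds since start, amount).
TRANSACTIONS = [(1, 20.0), (15, 950.0), (42, 35.0), (61, 40.0), (75, 45.0), (119, 30.0)]

LARGE_AMOUNT = 500.0      # alert threshold for a single transaction (assumed)
WINDOW_SECONDS = 60       # dashboard bucket size (assumed)


def individual_alerts(transactions, threshold=LARGE_AMOUNT):
    """Flag single transactions at or above the threshold."""
    return [t for t in transactions if t[1] >= threshold]


def window_totals(transactions, window=WINDOW_SECONDS):
    """Sum amounts per fixed window; these totals could feed a live dashboard."""
    totals = defaultdict(float)
    for timestamp, amount in transactions:
        totals[timestamp // window * window] += amount
    return dict(totals)


def trend_alerts(totals, drop_ratio=0.5):
    """Flag windows whose total fell below drop_ratio of the previous window."""
    starts = sorted(totals)
    return [b for a, b in zip(starts, starts[1:]) if totals[b] < totals[a] * drop_ratio]


totals = window_totals(TRANSACTIONS)
print(individual_alerts(TRANSACTIONS), totals, trend_alerts(totals))
```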

Page 24: Introduction to AWS Kinesis

Facilitate Traditional ETL

1. Write lightly transformed data to S3 to batch COPY into Redshift
2. Pre-compute aggregates, then write them to S3
3. Provide a durable, replayable buffer in front of traditional ETL tools

Page 25: Introduction to AWS Kinesis

Archive

1. In addition to using your data, Kinesis makes it easy to log the full incoming data set to S3.
2. An object store makes more sense for write-once/read-never data than a database.

Page 26: Introduction to AWS Kinesis

When to use Kinesis

1. Internet of Things (IoT)
2. You need near-real-time access to data.
3. You have more than one consumer for each piece of data.

Page 27: Introduction to AWS Kinesis

Thanks

1. Our sponsors:
   • API Talent
   • AWS
   • OptimalPeople
2. Bronwyn and Wyn
3. AWS for images on slides