BDT201 AWS Data Pipeline - AWS re:Invent 2012

Posted on 01-Jul-2015

DESCRIPTION

In this session, we'll review the features and architecture of the new AWS Data Pipeline service and explain how you can use it to better manage your data-driven workloads. We'll then go over a few examples of setting up and provisioning a pipeline in the system.

TRANSCRIPT

[Slide diagram: data stores Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, On Premise, and HDFS (Amazon EMR); one slide pairs Amazon DynamoDB with Amazon S3]

Pipeline structure: Input Datanode -> Activity -> [Output Datanode]

With checks: Input Datanode with precondition check -> Activity with failure & delay notifications -> Output Datanode

[Slide diagram labels: Compute Resources, Data, Data Stores]
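
The component slide above can be read as a small control flow: check the input's precondition, run the activity, and notify on failure or delay. The following is a toy model of that flow in Python, not the service's API. The names `run_activity` and `notify` and the `max_seconds` threshold are hypothetical, chosen only for illustration, and the delay check here is simplified to run after the activity completes.

```python
import time
from typing import Callable, List


def run_activity(
    precondition: Callable[[], bool],
    activity: Callable[[], None],
    notify: Callable[[str], None],
    max_seconds: float,
) -> str:
    """Run one activity guarded by a precondition; return a status string."""
    # The input data node is only usable once its precondition holds.
    if not precondition():
        return "waiting"
    started = time.monotonic()
    try:
        activity()
    except Exception as exc:
        # Failure notification.
        notify(f"failed: {exc}")
        return "failed"
    # Delay notification: the activity finished, but later than expected.
    if time.monotonic() - started > max_seconds:
        notify("delayed")
    return "finished"


messages: List[str] = []


def broken_copy() -> None:
    # Stand-in for an activity that fails.
    raise RuntimeError("copy error")


status_ok = run_activity(lambda: True, lambda: None, messages.append, 60.0)
status_wait = run_activity(lambda: False, lambda: None, messages.append, 60.0)
status_fail = run_activity(lambda: True, broken_copy, messages.append, 60.0)
```

Passing the precondition and the notifier as plain callables keeps the sketch independent of any particular data store or messaging system.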

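Schedule: Start, Interval, [End]

Example: Start = Noon Today, Interval = 1 hour, giving periods 12-1pm, 1-2pm, 2-3pm, …..

[Timeline diagrams: hourly periods 12-1pm, 1-2pm, 2-3pm, with X markers and a 1 day span]

Intervals: Hourly, Daily, Weekly, Monthly, Yearly, Quarterly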
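
The schedule slide describes a start, an interval, and an optional end. The following Python sketch shows how such a schedule expands into the periods on the timeline (12-1pm, 1-2pm, 2-3pm). The function name `periods`, its `limit` parameter, and the calendar date are assumptions made for illustration; they are not part of the service.

```python
from datetime import datetime, timedelta
from typing import Iterator, Optional, Tuple


def periods(
    start: datetime,
    interval: timedelta,
    end: Optional[datetime] = None,
    limit: int = 3,
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (period_start, period_end) pairs until `end` or `limit` periods."""
    current = start
    produced = 0
    # A period is emitted only if it fits entirely before the optional end.
    while produced < limit and (end is None or current + interval <= end):
        yield current, current + interval
        current += interval
        produced += 1


# Arbitrary date standing in for "noon today" on the slide.
noon = datetime(2012, 11, 28, 12, 0)
hourly = list(periods(noon, timedelta(hours=1)))
labels = [f"{a:%H:%M}-{b:%H:%M}" for a, b in hourly]

# With an explicit end two hours after the start, only two periods fit.
bounded = list(periods(noon, timedelta(hours=1), end=noon + timedelta(hours=2), limit=10))
```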

Example: S3 logs (hourly) + Geolocation data -> Per-geography usage computation (hourly) -> Redshift results

With preconditions: S3 logs (hourly; Precondition: files exist) + Geolocation data (Precondition: ./geo_available) -> Per-geography usage computation (hourly) -> Redshift results

Example: Dynamo event data + RDS demographics -> Hive-based analysis (hourly) -> Redshift results

[Slide diagram: combined pipeline. Stages: Hourly click updates, Hourly event analysis, Daily reporting SQL. Nodes: Amazon S3 logs, Custom Precondition, EMR usage-by-geo job, Amazon EC2 report generation, Amazon DynamoDB event data, Amazon RDS demographics, Amazon Redshift DW table (shown twice), Hive script]

We Manage: EC2 Instances, EMR Clusters

You Manage: On Premise Resources, EC2 Instances, EMR Clusters

{
  "objects" : [
    {
      "name" : "My Copy",
      "type" : "Copy Action",
      "input" : { "ref" : "My RDS Data" },
      "output" : { "ref" : "My S3 Data" },
      "runsOn" : { "ref" : "My Instance" },
      "schedule" : { "ref" : "My Schedule" }
    },
    {
      "name" : "My Instance",
      "type" : "EC2Instance",
      "instanceType" : "m1.small",
      "schedule" : { "ref" : "My Schedule" }
    },
    …..
  ]
}
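
In the definition above, objects point at each other by name through `"ref"` fields. As a minimal sketch of how those references could be checked, the Python below loads the two objects shown on the slide and lists every name that is referenced but not defined. The helper `unresolved_refs` is hypothetical and not part of any AWS tool. The objects elided on the slide are deliberately left out, so their names show up as unresolved.

```python
from typing import Any, Dict, List, Set

# Transcribed from the slide; the elided objects are not reconstructed.
definition: Dict[str, Any] = {
    "objects": [
        {
            "name": "My Copy",
            "type": "Copy Action",
            "input": {"ref": "My RDS Data"},
            "output": {"ref": "My S3 Data"},
            "runsOn": {"ref": "My Instance"},
            "schedule": {"ref": "My Schedule"},
        },
        {
            "name": "My Instance",
            "type": "EC2Instance",
            "instanceType": "m1.small",
            "schedule": {"ref": "My Schedule"},
        },
    ]
}


def unresolved_refs(pipeline: Dict[str, Any]) -> List[str]:
    """Return the sorted names that are referenced but never defined."""
    defined: Set[str] = {obj["name"] for obj in pipeline["objects"]}
    referenced: Set[str] = set()
    for obj in pipeline["objects"]:
        for value in obj.values():
            # A reference is a nested object of the form {"ref": "<name>"}.
            if isinstance(value, dict) and "ref" in value:
                referenced.add(value["ref"])
    return sorted(referenced - defined)


missing = unresolved_refs(definition)
```

Here `missing` names the two data nodes and the schedule, which matches the objects the slide elides; `My Instance` resolves because it is defined in the excerpt.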

                  On AWS        On Premise
High Frequency    $1/month      $2.50/month
Low Frequency     $.60/month    $1.50/month

We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
