(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS re:Invent 2014

November 13, 2014 | Las Vegas, NV

Eddie Fagin, VP Engineering, MediaMath

Ian Hummel, Sr. Director Engineering, MediaMath

Adi Krishnan, Sr. PM Amazon Kinesis

Upload: amazon-web-services

Post on 26-Jul-2015



Canonical Data Flow With Amazon Kinesis

Data Warehousing

• Query engine approach
• Pre-computations such as indices and dimensional views improve performance
• Historical, structured data
• Amazon Redshift

Hadoop Style Processing

• Hive/SQL-on-Hadoop/M-R/Spark
• Batch programs, or other abstractions breaking down into MR-style computations
• Historical, semi-structured data
• Amazon EMR

Stream Processing

• Custom computations of relatively simple complexity
• Continuous processing (filters, sliding windows, aggregates) on infinite data streams
• Semi-structured or structured data, generated continuously in real time
• Amazon Kinesis
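The stream-processing bullets mention filters, sliding windows, and aggregates over unbounded streams. As a rough illustration only, the sketch below combines all three in plain Python. It does not use the Kinesis API; the function name `sliding_window_sum`, the `(timestamp, value)` event shape, and the `keep` predicate are assumptions made for this example, not anything shown in the talk.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, Tuple

Event = Tuple[int, int]  # (timestamp in seconds, value); hypothetical event shape


def sliding_window_sum(
    events: Iterable[Event],
    window_seconds: int,
    keep: Callable[[Event], bool] = lambda event: True,
) -> Iterator[Tuple[int, int]]:
    """Yield (timestamp, sum of values in the trailing window) per kept event.

    Assumes events arrive in non-decreasing timestamp order. The window
    covers the half-open interval (ts - window_seconds, ts].
    """
    window: Deque[Event] = deque()
    total = 0
    for event in events:
        if not keep(event):  # filter step
            continue
        ts, value = event
        window.append(event)
        total += value
        # Evict events that have fallen out of the trailing window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total  # aggregate step


if __name__ == "__main__":
    sample = [(0, 1), (1, 2), (2, 3), (5, 4)]
    for ts, total in sliding_window_sum(sample, window_seconds=3):
        print(ts, total)
```

Because the function is a generator, it processes one event at a time and keeps only the current window in memory, which is what makes the approach workable on a stream that never ends.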

Amazon Kinesis

• Real-time processing
• High throughput; elastic
• Easy to use
• S3, Redshift, DynamoDB integrations

Amazon Kinesis

Amazon Web Services

• Durable, highly consistent storage replicates data across three data centers (availability zones)
• Aggregate and archive to S3
• Millions of sources producing 100s of terabytes per hour
• Front end: authentication, authorization
• Ordered stream of events supports multiple readers
• Real-time dashboards and alarms
• Machine learning algorithms or sliding window analytics
• Aggregate analysis in Hadoop or a data warehouse
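The phrase "ordered stream of events supports multiple readers" can be pictured as an append-only log in which every reader keeps its own position. The sketch below is an in-memory toy model of that idea, not the Kinesis service or its client API; the class names `OrderedStream` and `Reader` are hypothetical.

```python
from typing import Any, List


class OrderedStream:
    """Toy append-only log; a record's list index is its sequence number."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def put(self, data: Any) -> int:
        """Append a record and return its sequence number."""
        self._records.append(data)
        return len(self._records) - 1

    def read(self, start: int, limit: int = 10) -> List[Any]:
        """Return up to `limit` records starting at sequence number `start`."""
        return self._records[start:start + limit]


class Reader:
    """Independent consumer that tracks its own position in the stream."""

    def __init__(self, stream: OrderedStream) -> None:
        self._stream = stream
        self.position = 0

    def poll(self, limit: int = 10) -> List[Any]:
        batch = self._stream.read(self.position, limit)
        self.position += len(batch)
        return batch


if __name__ == "__main__":
    stream = OrderedStream()
    for event in ["win", "pixel", "segment"]:
        stream.put(event)

    dashboard = Reader(stream)  # e.g. a real-time dashboard consumer
    archiver = Reader(stream)   # e.g. an archive-to-S3 consumer
    print(dashboard.poll(limit=2))
    print(archiver.poll())
    print(dashboard.poll())
```

Reading never removes records, so a slow consumer (such as an archiver) and a fast one (such as a dashboard) see the same events in the same order without interfering with each other.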

Inexpensive: $0.028 per million puts
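To make the per-put price concrete, the following sketch turns the slide's figure into a small calculator. Only the $0.028 per million puts comes from the slide; the 10,000 puts per second workload is a made-up example, and the calculation deliberately ignores any other charges, which the slide does not list.

```python
from decimal import Decimal

# Price quoted on the slide (2014): $0.028 per million puts.
PRICE_PER_MILLION_PUTS = Decimal("0.028")


def put_cost(num_puts: int) -> Decimal:
    """Return the put charge in US dollars for `num_puts` put operations."""
    return PRICE_PER_MILLION_PUTS * Decimal(num_puts) / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 puts per second sustained for one day.
    puts_per_day = 10_000 * 60 * 60 * 24
    print(puts_per_day, put_cost(puts_per_day))
```

`Decimal` is used instead of `float` so that the currency arithmetic stays exact.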

Amazon EMR

• Hadoop/HDFS clusters
• Hive, Impala, MapReduce
• Easy to use; fully managed
• On-demand and spot pricing

Diagram labels (layout not recoverable), in order of appearance:

• Warehouse (analytics, decisioning, optimization, archive); Bidder Data (wins); Site Events; 3rd Party Segments
• Firehose (Kinesis); Decisioning & Optimization; Real-time Analytics; Archive; S3
• Bidder Data (wins); Site Events; 3rd Party Segments; App (metadata); Data mart (Oracle/Postgres); Qubole; Redshift; Hadoop; Scripts; Attribution
• Bidders; S3; EMR; Recurring partition jobs/process jobs; Partners/clients/tools/internal services; Pixels; Realtime Firehose; Netezza

http://bit.ly/awsevals