Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis


Upload: amazon-web-services

Post on 11-Jan-2017

TRANSCRIPT

Page 1: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

August 11, 2016

Slashing Big Data Complexity: How Comcast X1 Powers Analytics with Amazon Kinesis

Charlie Hammell, Solutions Architect, Comcast

Liam Morrison, Solutions Architect, AWS

Page 2: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

What to expect from this session

• Streaming scenarios

• Amazon Kinesis overview

• Comcast X1 Platform

• Challenges with streaming data

• Schema management

Page 3: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Streaming data scenarios across segments

| Scenarios | 1. Accelerated Ingest-Transform-Load | 2. Continual Metrics Generation | 3. Responsive Data Analysis |
|---|---|---|---|
| Ad/Marketing Tech | Publisher, bidder data aggregation | Advertising metrics like coverage, yield, conversion | Analytics on user engagement with ads, optimized bid/buy engines |
| IoT | Sensor, device telemetry data ingestion | IT operational metrics dashboards | Sensor operational intelligence, alerts, and notifications |
| Gaming | Online customer engagement data aggregation | Consumer engagement metrics for level success, transition rates, CTR | Clickstream analytics, leaderboard generation, player-skill match engines |
| Consumer Engagement | Online customer engagement data aggregation | Consumer engagement metrics like page views, CTR | Clickstream analytics, recommendation engines |

Page 4: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Amazon Kinesis makes it easy to work with real-time streaming data

Amazon Kinesis Streams

• For technical developers

• Collect and stream data for ordered, replayable, real-time processing

Amazon Kinesis Firehose

• For all developers, data scientists

• Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service

Amazon Kinesis Analytics

• For all developers, data scientists

• Easily analyze data streams using standard SQL queries

Page 5: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Amazon Kinesis Streams

Easy administration: Simply create a new stream and set the desired level of capacity with shards. Scale to match your data throughput rate and volume.

Build real-time applications: Perform continual processing on streaming big data using Amazon Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.

Low cost: Cost-efficient for workloads of any scale.

Page 6: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Sending & reading data from Kinesis Streams

Sending: AWS SDK, AWS Mobile SDK, Amazon Kinesis Producer Library, Amazon Kinesis Agent, LOG4J, Flume, Fluentd

Consuming: Get* APIs, Kinesis Client Library + Connector Library, AWS Lambda, Amazon EMR, Apache Storm, Apache Spark, Kinesis Analytics

Page 7: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Amazon Kinesis Firehose

Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other destinations without writing an application or managing infrastructure.

Direct-to-data-store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations.

Seamless elasticity: Seamlessly scale to match data throughput without intervention.

• Capture and submit streaming data to Firehose

• Firehose loads streaming data continuously into Amazon S3 and Amazon Redshift

• Analyze streaming data using your favorite BI tools

Page 8: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Streaming data scenarios across segments (this slide repeats the scenarios table from Page 3)

Page 9: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Sending & reading data from Kinesis Firehose

Sending: AWS SDK, AWS Mobile SDK, Kinesis Agent, AWS IoT

Destinations: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service

Page 10: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Amazon Kinesis Analytics

Analyze data streams continuously with standard SQL

New!

Apply SQL on streams: Easily connect to data streams and apply existing SQL skills.

Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies.

Scale elastically: Elastically scales to match data throughput without operator intervention.

• Connect to Kinesis streams, Firehose delivery streams

• Run standard SQL queries against data streams

• Analytics can send processed data to analytics tools so you can create alerts and respond in real time

Page 11: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Use SQL to build real-time applications

Easily write SQL code to process streaming data

Connect to streaming source

Continuously deliver SQL results

Page 12: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Amazon Kinesis at Comcast

Charlie Hammell, Solutions Architect, Comcast

Page 14: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

The challenge

• Comcast now syndicates the X1 Platform to other video providers

• Syndication includes providing telemetry data (data related to performance and reliability), anonymized and secured, to improve the X1 experience

  • Stream quality status

  • VOD usage

  • Error rates and status

• Solution: The data bus

Page 15: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Delivering X1 telemetry to partners

Fairmount

X1 Platform

· STB telemetry

· Mobile player actions

· IP VOD player actions

· Screen errors

Service 1

Service 2

Service 3

Partner 1

Partner 2

Partner 3

Page 16: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

The Data Bus

Page 17: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Producer1

Producer2

Producer3

Consumer1

Consumer2

Consumer3

Total connections: 18

Why a data bus?

Page 18: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Why a data bus?

Producer1

Producer2

Producer3

Consumer1

Consumer2

Consumer3

Total connections: 24

Consumer4

Page 19: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Producer1

Producer2

Producer3

Consumer1

Consumer2

Consumer3

Total connections: 12

Why a data bus?

Page 20: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Why a data bus?

Consumer4

Producer1

Producer2

Producer3

Consumer1

Consumer2

Consumer3

Total connections: 14
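
The totals on the four "Why a data bus?" slides (18 and 24 without a bus, 12 and 14 with one) are consistent with counting two connection endpoints per link. That counting rule is an inference from the numbers, not something the slides state. A minimal sketch under that assumption, with illustrative function names:

```python
def point_to_point_connections(producers: int, consumers: int) -> int:
    """Every producer links directly to every consumer.

    Each link is counted twice (one endpoint per side); this convention is
    an assumption chosen because it reproduces the slide totals.
    """
    return 2 * producers * consumers


def bus_connections(producers: int, consumers: int) -> int:
    """Every producer and every consumer holds one link to the bus,
    again counted twice per link."""
    return 2 * (producers + consumers)


if __name__ == "__main__":
    for consumers in (3, 4):
        print(
            f"3 producers, {consumers} consumers:",
            point_to_point_connections(3, consumers), "direct vs",
            bus_connections(3, consumers), "via bus",
        )
```

Adding a fourth consumer costs six more connections in the direct layout but only two with the bus, which is the point the slides make.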

Page 21: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Characteristics of a data bus

Remember: Syndication includes providing telemetry data, anonymized and secured, to cable partners

• The bus decouples publishers and subscribers

• The bus has extensible features

• The bus has topics

• The bus is reusable

Page 22: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Where we started

X1 Services

1

2

X1 reporting and

analytics (Tableau,

other apps)

Partners

Partner 1

Partner 2

Apache Storm

Page 23: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Data bus challenges using Apache Kafka

• Mean Time Between Failures: two weeks

• Mean Time To Recovery: four hours

• Impact: affected syndication subscribers, extensive overtime effort for staff

• Root causes: data re-balancing, infrastructure issues, ZooKeeper problems, overloading by other users

• Weak or missing features:

  • Multi-tenant guardrails

  • Elastic scale

  • Security

  • Geo-distributed high availability
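
To put the two-week MTBF and four-hour MTTR in perspective, the standard steady-state formula availability = MTBF / (MTBF + MTTR) can be applied. The slides do not report an availability figure, so the result below is a derived illustration, not a number from the talk:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Two weeks between failures, four hours to recover (values from the slide).
kafka_bus = availability(mtbf_hours=14 * 24, mttr_hours=4)

if __name__ == "__main__":
    print(f"availability: {kafka_bus:.4f}")
```

Under these figures the bus was unavailable for roughly 1.2% of the time.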

Page 24: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Toes in the Managed Services Waters

Page 25: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Migrating toward managed services

X1 Services

1

2

X1 reporting and analytics (Tableau, other apps)

Partners

Apache Storm

Partner 1

Kinesis Stream

Partner 2

Kinesis Stream

Kinesis

Streams

Page 26: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

More managed services

X1 Services

1

2

Partners

Partner 1

Kinesis Stream

Partner 2

Kinesis Stream

3

4

Kinesis

Streams

Kinesis

Analytics

Kinesis

Firehose

Amazon Aurora

Amazon Aurora

S3

EMR

Spark

AWS

Lambda

Page 27: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

The data bus foundation

• Multi-tenancy

• Elastic scale

• Security

• High availability

Page 28: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Multi-tenancy

• Read, write limits

• Protects me from others (and others from me)

[Diagram labels: Producer App (KPL), Data Bus Stream, Stream/Topic, Shard, Consumer App (KCL)]

Page 29: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Elastic scale: how Kinesis scales

• Streams are made of shards

• Each shard ingests data up to 1 MB/sec and up to 1000 TPS

• Each shard emits up to 2 MB/sec

• Scale Kinesis streams by splitting or merging shards
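
The per-shard limits above translate directly into a shard count estimate: take whichever of the three limits is hit first. The following is a rough sizing sketch, assuming steady traffic and evenly distributed partition keys; the function name and the example workload are illustrative, not from the talk:

```python
import math

# Per-shard limits as stated on the slide.
WRITE_MB_PER_SHARD = 1.0        # ingest, MB/sec
WRITE_RECORDS_PER_SHARD = 1000  # ingest, records (TPS) per second
READ_MB_PER_SHARD = 2.0         # egress, MB/sec


def shards_needed(write_mb_per_sec: float,
                  records_per_sec: float,
                  read_mb_per_sec: float) -> int:
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_sec / READ_MB_PER_SHARD),
    )


if __name__ == "__main__":
    # Hypothetical workload: 5 MB/sec in, 12,000 small records/sec,
    # and two consumers that each read the full stream (10 MB/sec out).
    print(shards_needed(5.0, 12_000, 10.0))
```

In this hypothetical workload the record rate, not the byte rate, drives the shard count, which is the situation that batching small records is meant to address.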

Page 30: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Elastic scale: how batching helps

[Diagram: Batching consists of two steps. Aggregating: multiple user records (User Record 1, 2, ... A, K, L, ... S, AA, BB, ... ZZ) are packed into single Kinesis records (Kinesis Record 1, ... C, ... M). Collecting: multiple Kinesis records are sent together in one PutRecords request.]
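
The two batching steps can be sketched in a few lines. This is a simplified illustration, not the Kinesis Producer Library's actual wire format: the 4-byte length prefix used to frame user records is an assumption made for the example, and the request size check ignores partition keys. The limits used (1 MB per record, 500 records and 5 MB per PutRecords request) are the documented service limits.

```python
from typing import Iterable, List

MAX_KINESIS_RECORD_BYTES = 1024 * 1024      # 1 MB per Kinesis record
MAX_RECORDS_PER_REQUEST = 500               # PutRecords record-count limit
MAX_REQUEST_BYTES = 5 * 1024 * 1024         # PutRecords payload limit


def aggregate(user_records: Iterable[bytes],
              max_bytes: int = MAX_KINESIS_RECORD_BYTES) -> List[bytes]:
    """Aggregating: pack many small user records into fewer Kinesis records."""
    kinesis_records: List[bytes] = []
    current = bytearray()
    for record in user_records:
        # Illustrative framing only: length prefix followed by the payload.
        framed = len(record).to_bytes(4, "big") + record
        if len(framed) > max_bytes:
            raise ValueError("user record too large to aggregate")
        if current and len(current) + len(framed) > max_bytes:
            kinesis_records.append(bytes(current))
            current = bytearray()
        current += framed
    if current:
        kinesis_records.append(bytes(current))
    return kinesis_records


def collect(kinesis_records: Iterable[bytes],
            max_count: int = MAX_RECORDS_PER_REQUEST,
            max_bytes: int = MAX_REQUEST_BYTES) -> List[List[bytes]]:
    """Collecting: group Kinesis records into PutRecords-sized requests."""
    requests: List[List[bytes]] = []
    current: List[bytes] = []
    size = 0
    for record in kinesis_records:
        if current and (len(current) >= max_count or size + len(record) > max_bytes):
            requests.append(current)
            current, size = [], 0
        current.append(record)
        size += len(record)
    if current:
        requests.append(current)
    return requests
```

With 1,000 user records of 100 bytes each, aggregation alone reduces the count against the 1000 TPS per-shard limit from 1,000 records to a single Kinesis record.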

Page 31: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

• IAM credentials

• Federation

• Cross-account trust

Data bus security

Page 32: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Data bus security: cross-account trust

Account that owns the role: [email protected], Acct ID: 111122223333

Role: kinesis-role

Permissions assigned to kinesis-role:

{ "Statement": [{
    "Action": ["kinesis:DescribeStream", "kinesis:PutRecord", "kinesis:PutRecords"],
    "Effect": "Allow",
    "Resource": "arn:kinesis:*:111122223333:st…"
}]}

kinesis-role trusts AWS Identity and Access Management (IAM) users from the AWS account [email protected] (123456789012):

{ "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "123456789012"},
    "Action": "sts:AssumeRole"
}]}

Partner account: [email protected], Acct ID: 123456789012

Permissions assigned to partner, granting permission to assume kinesis-role in account B:

{ "Statement": [{
    "Effect": "Allow",
    "Action": "sts:AssumeRole",
    "Resource": "arn:aws:iam::111122223333:role/kinesis-role"
}]}

Flow:

• Authenticate with user tokens

• Get temporary security credentials for kinesis-role (STS)

• Call AWS APIs using temporary security credentials of kinesis-role

[Diagram labels: STS, Kinesis Streams, Lambda, Publisher]
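
The cross-account flow on this slide (get temporary credentials for kinesis-role, then call Kinesis with them) might look roughly like the sketch below from the partner's side. It is not Comcast's code. The helper names, session name, region, stream name, and partition key are hypothetical; `assume_role` and `put_record` are the standard boto3 STS and Kinesis client calls. Clients are passed in so the helpers can be exercised without AWS access.

```python
def assume_kinesis_role(sts_client, role_arn, session_name="partner-publisher"):
    """Exchange the caller's own credentials for temporary credentials of the role.

    Returns keyword arguments that can be passed to boto3.client(...).
    """
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def publish(kinesis_client, stream_name, data, partition_key):
    """Write one record using a client built from the temporary credentials."""
    return kinesis_client.put_record(
        StreamName=stream_name, Data=data, PartitionKey=partition_key
    )


def run_against_aws():
    """Not called automatically: needs boto3 and credentials for account 123456789012."""
    import boto3  # third-party AWS SDK for Python

    temp = assume_kinesis_role(
        boto3.client("sts"), "arn:aws:iam::111122223333:role/kinesis-role"
    )
    # Region and stream name are placeholders, not values from the talk.
    kinesis = boto3.client("kinesis", region_name="us-east-1", **temp)
    return publish(kinesis, "example-stream", b"hello", "example-key")
```

The temporary credentials expire, so a long-running publisher would need to call `assume_kinesis_role` again before they lapse.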

Page 33: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Opaque HA

AZ 3

AZ 1

AZ 2

Applications

1

2

3

4

Kinesis Endpoint

Amazon

Kinesis

Kinesis

Streams

Page 34: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

The Data Bus Ecosystem

Page 35: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

• Schema management

• Self-service message routing

• Security governance

Page 36: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Avro sample

Page 37: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Serialized Avro container

Schema

Binary Data

Page 38: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Serialized Avro container (non-X1 example)

Avro schema

Binary encoded

message

Page 39: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Avro containers over streaming

[Diagram: four stream segments, each labeled "1 sec/1 MB", and each carrying a full Avro container (Schema + Binary Data).]

Page 40: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Data bus schema header

Magic Bytes: schema_id, reserved, major version, minor version, reserved, reserved, reserved, reserved

Avro encoded body: Core Header + Message Data

60% Reduction!
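
A fixed header in front of the Avro body can be packed and parsed with a few lines of code. The slide lists the header fields but not their widths, so the layout below (eight one-byte fields in slide order) is purely an assumption for illustration, as are the function names:

```python
import struct

# ASSUMED layout: eight unsigned bytes, in the order the slide lists the fields:
# schema_id, reserved, major version, minor version, reserved x4.
HEADER = struct.Struct(">8B")


def pack_message(schema_id: int, major: int, minor: int, avro_body: bytes) -> bytes:
    """Prefix an already Avro-encoded body with the fixed header."""
    header = HEADER.pack(schema_id, 0, major, minor, 0, 0, 0, 0)
    return header + avro_body


def unpack_message(message: bytes):
    """Split a message into (schema_id, major, minor, avro_body)."""
    schema_id, _reserved, major, minor, *_more_reserved = HEADER.unpack_from(message)
    return schema_id, major, minor, message[HEADER.size:]
```

Because the header only identifies the schema, the schema text itself no longer has to travel with every message.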

Page 41: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Avro records over streaming

[Diagram: four stream segments, each labeled "1 sec/1 MB", and each carrying a Magic Byte Header followed by Binary Data.]

Page 42: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Data bus schema registry

[Diagram labels: Producer (format stream to schema), Kinesis Streams, Consumer (validate stream against schema), Schema Registry]

No schema = smaller payload
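
The registry idea (messages carry only a schema identifier, and both sides look the schema up) can be illustrated with an in-memory stand-in. This sketch does not use Avro: a dictionary of field names to Python types stands in for an Avro schema and JSON stands in for Avro binary encoding, so every class, function, and field name here is hypothetical.

```python
import json


class SchemaRegistry:
    """In-memory stand-in for a shared schema registry service."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema_id, version, fields):
        self._schemas[(schema_id, version)] = dict(fields)

    def lookup(self, schema_id, version):
        return self._schemas[(schema_id, version)]


def matches(record, fields):
    """True if the record has exactly the schema's fields with the right types."""
    return set(record) == set(fields) and all(
        isinstance(record[name], expected) for name, expected in fields.items()
    )


def produce(registry, schema_id, version, record):
    """Producer side: format the record to the schema, ship only the schema reference."""
    if not matches(record, registry.lookup(schema_id, version)):
        raise ValueError("record does not match schema")
    body = json.dumps(record, sort_keys=True).encode()
    return {"schema_id": schema_id, "version": version, "body": body}


def consume(registry, message):
    """Consumer side: fetch the schema by reference and validate the payload."""
    record = json.loads(message["body"])
    if not matches(record, registry.lookup(message["schema_id"], message["version"])):
        raise ValueError("message does not match schema")
    return record
```

A real deployment would cache registry lookups on both sides so that the registry is not on the per-message path.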

Page 43: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Self-service message routing

The data bus ecosystem

Page 44: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Pace of innovation

Thousands of changes

per month

Page 45: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Self-service data bus message routing

[Diagram: within the X1 Platform, an X1 Service Producer, the XBI Kinesis Stream, and a Publishing Agent to Partner driven by a Self-Service Endpoint Configuration ("Partner configures this"). On the partner side, two groups of streams, labeled Partner Schema v. 1.2 and Schema v. 2.0; each group has one Partner Kinesis Stream per Partner stack: Test, UAT, PreProd, and Production.]
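
One way to picture self-service routing is as a lookup table that the partner edits and the publishing agent consults for each message. The slide does not show the configuration format, so the table shape, stream names, and function names below are all assumptions made for illustration:

```python
from typing import Callable, Dict, List, Tuple

# (schema version, partner stack) -> partner stream name. All values are made up.
EndpointConfig = Dict[Tuple[str, str], str]


def streams_for(config: EndpointConfig, schema_version: str) -> List[str]:
    """Partner streams subscribed to a schema version, ordered by stack name."""
    return [
        stream
        for (version, _stack), stream in sorted(config.items())
        if version == schema_version
    ]


def publish_to_partner(config: EndpointConfig,
                       schema_version: str,
                       message: bytes,
                       send: Callable[[str, bytes], None]) -> List[str]:
    """Fan one message out to every stream configured for its schema version."""
    targets = streams_for(config, schema_version)
    for stream in targets:
        send(stream, message)
    return targets


if __name__ == "__main__":
    config = {
        ("1.2", "Production"): "partner-prod-v1",
        ("1.2", "Test"): "partner-test-v1",
        ("2.0", "Test"): "partner-test-v2",
    }
    publish_to_partner(config, "2.0", b"payload", lambda s, m: print(s, len(m)))
```

In this picture a partner could trial schema v. 2.0 on its Test stack by adding one entry, without any change on the producer side.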

Page 46: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Security governance

The data bus ecosystem

Page 47: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Governance

• Policy

• Practices

• Procedures

Page 48: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Retrospective

• Mean Time Between Failures: so far ∞

• Mean Time To Recovery: 0

1. Multi-tenant guardrails: clear and enforced by the platform

2. Elastic scale: OK via the API (looking forward to a checkbox)

3. Security: IAM, SAML federation, cross-account trust

4. Multi-data-center high availability: yes

Page 49: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

How to get started

Decide:

• High-impact, higher risk

• Low-impact, lower risk

Pick a data flow—preferably a new one

Get an eager developer who wants the challenge (and the resume perks)

Pitch it to the end consumer (if not your team)

Choose a schema approach—it really matters

Decide on RT processing framework: Spark, Storm, AWS Lambda, Kinesis Analytics?

Build a producer proxy to pull in the data—don’t ask the producer to bother

Build a consumer or send it to S3 through Firehose

Evaluate and take next steps

Page 50: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Remember to complete your evaluations!

Page 51: Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics with Amazon Kinesis

Thank you!