AWS re:Invent 2013 Scalable Media Processing in the Cloud

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Scalable Media Processing Phil Cluff, British Broadcasting Corporation David Sayed, Amazon Web Services November 13, 2013

Upload: david-sayed

Post on 10-Jun-2015

DESCRIPTION

Presentation from AWS re:Invent 2013. See session video here: http://www.youtube.com/watch?v=MjZdiDotRU8 Presentation is in two parts: (1) Introduction to moving workloads to the cloud, (2) deep dive on how the BBC moved their playout to the cloud.

TRANSCRIPT

Page 1: AWS re:Invent 2013 Scalable Media Processing in the Cloud

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Scalable Media Processing

Phil Cluff, British Broadcasting Corporation

David Sayed, Amazon Web Services

November 13, 2013

Page 2: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Agenda

• Media workflows
• Where AWS fits
• Cloud media processing approaches
• BBC iPlayer in the cloud

Page 3: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Media Workflows

[Diagram: content types (Featurettes, Interviews, 2D Movie, 3D Movie, Archive Materials, Stills) and delivery targets (Networks, Theatrical, DVD/BD, Online, Mobile Apps, Archive, MSOs), linked by three boxes each labeled "Media Workflow".]

Page 4: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Where AWS Fits Into Media Processing

Amazon Web Services

Ingest, Index, Process, Package, Protect, QC, Auth., Track, Playback

Media Asset Management

Analytics and Monetization

Page 5: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Media Processing Approaches

3 Phases

Page 6: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Cloud Media Processing Approaches

Phase 1: Lift processing from the premises and shift to the cloud

Page 7: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Lift and Shift

[Diagram: a stack labeled "Media Processing Operation" over "OS" and "Storage", shown once on its own and twice more on EC2.]

Page 8: AWS re:Invent 2013 Scalable Media Processing in the Cloud

The Problem with Lift and Shift

[Diagram: on EC2, the "Media Processing Operation" over "OS" and "Storage" is expanded into a "Monolithic Media Processing Operation" that bundles Ingest Operation, Post-processing, Export, Workflow, and Parameters.]

Page 9: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Cloud Media Processing Approaches: Phase 2

Phase 1: Lift processing from the premises and shift to the cloud

Phase 2: Refactor and optimize to leverage cloud resources

Page 10: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Refactor and Optimization Opportunities

“Deconstruct monolithic media processing operations”

– Ingest
– Atomic media processing operation
– Post-processing
– Export
– Workflow
– Parameters

Page 11: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Refactoring and Optimization Example

[Diagram labels: API Calls; SWF; three EC2 instances, each with EBS; Source S3 Bucket; Output S3 Bucket.]

Page 12: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Cloud Media Processing Approaches

Phase 1: Lift processing from the premises and shift to the cloud

Phase 2: Refactor and optimize to leverage cloud resources

Phase 3: Decomposed, modular cloud-native architecture

Page 13: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Decomposition and Modularization Ideas for Media Processing

• Decouple *everything* that is not part of atomic media processing operation

• Use managed services where possible for workflow, queues, databases, etc.

• Manage
  – Capacity
  – Redundancy
  – Latency
  – Security

Page 14: AWS re:Invent 2013 Scalable Media Processing in the Cloud

BBC iPlayer in the Cloud

AKA “Video Factory”

Phil Cluff
Principal Software Engineer & Team Lead
BBC Media Services

Page 15: AWS re:Invent 2013 Scalable Media Processing in the Cloud

• The UK’s biggest video & audio on-demand service
  – And it’s free!
• Over 7 million requests every day
  – ~2% of overall consumption of BBC output
• Over 500 unique hours of content every week
  – Available immediately after broadcast, for at least 7 days
• Available on over 1000 devices including
  – PC, iOS, Android, Windows Phone, Smart TVs, Cable Boxes…
• Both streaming and download (iOS, Android, PC)
• 20 million app downloads to date

Sources: BBC iPlayer Performance Pack August 2013; http://www.bbc.co.uk/blogs/internet/posts/Video-Factory

Page 16: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Video: “Where Next?”

Page 17: AWS re:Invent 2013 Scalable Media Processing in the Cloud

What Is Video Factory?

• Complete in-house rebuild of ingest, transcode, and delivery workflows for BBC iPlayer

• Scalable, message-driven cloud-based architecture

• The result of 1 year of development by ~18 engineers

Page 18: AWS re:Invent 2013 Scalable Media Processing in the Cloud

And here they are!

Page 19: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Why Did We Build Video Factory?

• Old system
  – Monolithic
  – Slow
  – Couldn’t cope with spikes
  – Mixed ownership with third party
• Video Factory
  – Highly scalable, reliable
  – Completely elastic transcode resource
  – Complete ownership

Page 20: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Why Use the Cloud?

• Background of 6 channels, spikes up to 24 channels, 6 days a week
• A perfect pattern for an elastic architecture

[Chart: Off-Air Transcode Requests for 1 week]

Page 21: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Video Factory – Architecture

• Entirely message driven
  – Amazon Simple Queue Service (SQS)
    • Some Amazon Simple Notification Service (SNS)
  – We use lots of classic message patterns
• ~20 small components
  – Singular responsibility: “Do one thing, and do it well”
    • Share libraries if components do things that are alike
    • Control bloat
  – Components have contracts of behavior
    • Easy to test

Page 22: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Video Factory – Workflow

[Diagram: Video Factory workflow.
Inputs: SDI Broadcast Video Feed (x 24), Playout Data Feed.
Components: Broadcast Encoder, RTP Chunker, Live Ingest Logic, Transcode Abstraction Layer, Amazon Elastic Transcoder, Elemental Cloud, DRM, QC, Editorial Clipping, MAM.
Stores: Amazon S3 Mezzanine, Time Addressable Media Store, Amazon S3 Distribution Renditions.
Other labels: Mezzanine, Playout Video, Transcoded Video, Metadata, SMPTE Timecode, Mezzanine Video Capture.]

Page 23: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Detail

• Mezzanine video capture
• Transcode abstraction
• Eventing demonstration

Page 24: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Mezzanine Video Capture

Page 25: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Mezzanine Capture

[Diagram: SDI Broadcast Video Feed (x 24) -> Broadcast Grade Encoder -> RTP Chunker -> Chunk Uploader -> Amazon S3 Mezzanine Chunks -> Chunk Concatenator -> Amazon S3 Mezzanine.
Link labels: MPEG2 Transport Stream (H.264) on RTP Multicast, 30 MB HD/10 MB SD; MPEG2 Transport Stream (H.264) Chunks; 3 GB HD/1 GB SD; Control Messages; SMPTE Timecode.]

Page 26: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Concatenating Chunks

• Build file using Amazon S3 multipart requests
  – 10 GB Mezzanine file constructed in under 10 seconds
• Amazon S3 multipart APIs are very helpful
  – Component only makes REST API calls
    • Small instances; still gives very high performance
• Be careful: Amazon S3 isn’t immediately consistent when dealing with multipart built files
  – Mitigated with rollback logic in message-based applications
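The concatenation step can be sketched roughly as follows. This is a minimal illustration, not the BBC's implementation: it assumes a boto3-style S3 client is passed in, and the function name, bucket, and key names are hypothetical. It uses S3's server-side part copy, so the component only issues API calls and never downloads chunk data. Note that S3 requires every part except the last to be at least 5 MiB.

```python
def concatenate_chunks(s3, bucket, chunk_keys, dest_key):
    """Build dest_key from existing chunk objects using S3 multipart copy.

    `s3` is assumed to be a boto3-style S3 client (hypothetical wiring).
    Every chunk except the last must be at least 5 MiB.
    """
    upload = s3.create_multipart_upload(Bucket=bucket, Key=dest_key)
    upload_id = upload["UploadId"]
    parts = []
    try:
        # Part numbers start at 1 and define the order of the final object.
        for part_number, chunk_key in enumerate(chunk_keys, start=1):
            result = s3.upload_part_copy(
                Bucket=bucket,
                Key=dest_key,
                UploadId=upload_id,
                PartNumber=part_number,
                CopySource={"Bucket": bucket, "Key": chunk_key},
            )
            parts.append({
                "PartNumber": part_number,
                "ETag": result["CopyPartResult"]["ETag"],
            })
        s3.complete_multipart_upload(
            Bucket=bucket,
            Key=dest_key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Roll back so no orphaned parts remain, then re-raise so the
        # message-level retry logic can decide what happens next.
        s3.abort_multipart_upload(
            Bucket=bucket, Key=dest_key, UploadId=upload_id)
        raise
    return parts
```

Aborting on failure is one simple way to express the rollback idea from the slide; a real component would likely also verify the finished object before acknowledging its queue message.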

Page 27: AWS re:Invent 2013 Scalable Media Processing in the Cloud

By Numbers – Mezzanine Capture

• 24 channels
  – 6 HD, 18 SD
  – 16 TB of Mezzanine data every day per capture
• 200,000 chunks every day
  – And Amazon S3 has never lost one
  – That’s ~2 (UK) billion RTP packets every day… per capture
• Broadcast grade resiliency
  – Several data centers / 2 copies each

Page 28: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Transcode Abstraction

Page 29: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Transcode Abstraction

• Abstract away from single supplier
  – Avoid vendor lock-in
  – Choose suppliers based on performance and quality and broadcaster-friendly feature sets
  – BBC: Elemental Cloud (GPU), Amazon Elastic Transcoder, in-house for subtitles
• Smart routing & smart bundling
  – Save money on non-time-critical transcode
  – Save time & money by bundling together “like” outputs
• Hybrid cloud friendly
  – Route a baseline of transcode to local encoders, and spike to cloud
• Who has the next game changer?
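The routing and bundling ideas can be illustrated with a small sketch. Everything here is an assumption made for illustration: the request fields, the backend names, and the routing rules are hypothetical and do not describe the BBC's actual router.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscodeRequest:
    source_key: str      # mezzanine object to transcode
    profile: str         # output rendition name (hypothetical values)
    time_critical: bool  # True when the output is needed quickly


def choose_backend(request):
    """Pick a backend name using illustrative routing rules only."""
    if request.profile == "subtitles":
        return "in-house-subtitles"
    if request.time_critical:
        return "fast-backend"
    return "low-cost-backend"


def bundle_requests(requests):
    """Group requests that share a source and backend into one job each."""
    bundles = defaultdict(list)
    for request in requests:
        key = (request.source_key, choose_backend(request))
        bundles[key].append(request.profile)
    return dict(bundles)
```

Because callers only ever see `TranscodeRequest`, a new supplier can be added by changing `choose_backend` without touching the components that submit work.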

Page 30: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Transcode Abstraction

[Diagram: a Transcode Request enters the Transcode Router, which feeds three backends: Amazon Elastic Transcoder Backend, Elemental Backend, and Subtitle Extraction Backend.
Other labels: Amazon Elastic Transcoder, Elemental Cloud, Amazon S3 Mezzanine, Amazon S3 Distribution Renditions.
Link labels: REST, SQS.]

Page 31: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Transcode Abstraction - Future

[Diagram: a Transcode Request enters the Transcode Router, which feeds the Amazon Elastic Transcoder Backend, Elemental Backend, Subtitle Extraction Backend, and an added "Unknown Future Backend X" marked "?".
Other labels: Amazon Elastic Transcoder, Elemental Cloud, Amazon S3 Mezzanine, Amazon S3 Distribution Renditions.
Link labels: REST, SQS.]

Page 32: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Example – A Simple Elastic Transcoder Backend

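[Diagram: XML Transcode Request -> Get Message from Queue -> Unmarshal and Validate Message -> Initialize Transcode -> Wait for SNS Callback over HTTP -> XML Transcode Status Message.
The backend sends a POST to Amazon Elastic Transcoder and receives a POST back (via SNS). The steps are enclosed in a box labeled "SQS Message Transaction".]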
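A rough sketch of the four steps in this diagram follows. The XML element names are hypothetical, and the transcoder call and the SNS callback are injected as plain functions so the sketch does not depend on any particular client library; none of this is taken from the BBC's code.

```python
import xml.etree.ElementTree as ET


def parse_request(xml_text):
    """Unmarshal and validate a transcode request (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    if root.tag != "transcodeRequest":
        raise ValueError("unexpected root element: " + root.tag)
    source = root.findtext("source")
    preset = root.findtext("preset")
    if not source or not preset:
        raise ValueError("source and preset are required")
    return {"source": source, "preset": preset}


def build_status(job_id, state):
    """Marshal the outgoing XML status message."""
    status = ET.Element("transcodeStatus")
    ET.SubElement(status, "jobId").text = job_id
    ET.SubElement(status, "state").text = state
    return ET.tostring(status, encoding="unicode")


def handle_message(xml_text, start_transcode, wait_for_callback):
    """Run one queue message through the steps in the diagram.

    start_transcode(source, preset) returns a job id, and
    wait_for_callback(job_id) returns the final job state. Both are
    assumptions standing in for the transcoder and the SNS callback.
    """
    request = parse_request(xml_text)
    job_id = start_transcode(request["source"], request["preset"])
    state = wait_for_callback(job_id)
    return build_status(job_id, state)
```

In a queue-backed deployment, the incoming message would only be deleted from the queue after `handle_message` returns, which is what keeps the whole sequence inside a single message transaction.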

Page 33: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Example – Add Error Handling

[Diagram: XML Transcode Request -> Get Message from Queue -> Unmarshal and Validate Message -> Initialize Transcode -> Wait for SNS Callback over HTTP -> XML Transcode Status Message, with POST to Amazon Elastic Transcoder and POST back (via SNS), inside the "SQS Message Transaction" box.
Added error outputs: Bad Message Queue, Fail Queue, Dead Letter Queue.]

Page 34: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Example – Add Monitoring Eventing

[Diagram: XML Transcode Request -> Get Message from Queue -> Unmarshal and Validate Message -> Initialize Transcode -> Wait for SNS Callback over HTTP -> XML Transcode Status Message, with POST to Amazon Elastic Transcoder and POST back (via SNS), inside the "SQS Message Transaction" box.
Error outputs: Bad Message Queue, Fail Queue, Dead Letter Queue.
Added: four "Monitoring Events" outputs.]

Page 35: AWS re:Invent 2013 Scalable Media Processing in the Cloud

BBC eventing framework

• Key-value pairs pushed into Splunk
  – Business-level events, e.g.:
    • Message consumed
    • Transcode started
  – System-level events, e.g.:
    • HTTP call returned status 404
    • Application’s heap size
    • Unhandled exception
• Fixed model for “context” data
  – Identifiable workflows, grouping of events; transactions
  – Saves us a LOT of time diagnosing failures
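As a sketch of what a key-value event with fixed context data might look like, the helper below renders one log line. The field names (`workflow_id`, `component`, and so on) are hypothetical; the slides do not specify the BBC's actual event model.

```python
import time


def format_event(name, context, timestamp=None, **fields):
    """Render an event as space-separated key="value" pairs.

    `context` holds the fixed fields shared by every event in a workflow,
    so related events can be grouped when searching the logs.
    """
    pairs = {
        "event": name,
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    }
    pairs.update(context)  # fixed context fields, same for every event
    pairs.update(fields)   # event-specific fields
    return " ".join('%s="%s"' % (key, value) for key, value in pairs.items())


context = {"workflow_id": "wf-42", "component": "transcode-router"}
print(format_event("transcode_started", context, backend="elemental"))
```

Keeping the context keys identical across components is the design choice that makes it possible to follow one piece of content through the whole workflow.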

Page 36: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Component Development – General Development & Architecture

• Java applications
  – Run inside Apache Tomcat on m1.small EC2 instances
  – Run at least 3 of everything
  – Autoscale on queue depth
• Built on top of the Apache Camel framework
  – A platform for building message-driven applications
  – Reliable, well-tested SQS backend
  – Camel route builders Java DSL
    • Full of messaging patterns
• Developed with Behavior-Driven Development (BDD) & Test-Driven Development (TDD)
  – Cucumber
• Deployed continuously
  – Many times a day, 5 days a week
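The "autoscale on queue depth" rule can be expressed as a simple calculation. The sketch below is only an illustration: the minimum of 3 follows the "run at least 3 of everything" bullet, while the per-instance capacity and the upper limit are made-up numbers.

```python
import math


def desired_instances(queue_depth, messages_per_instance=10,
                      minimum=3, maximum=20):
    """Return how many instances to run for a given queue depth.

    messages_per_instance and maximum are illustrative assumptions.
    """
    needed = math.ceil(queue_depth / messages_per_instance)
    return max(minimum, min(maximum, needed))
```

With these assumed values, an empty queue still keeps 3 instances running, and a backlog of 45 messages asks for 5.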

Page 37: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Error Handling Messaging Patterns

• We use several message patterns
  – Bad message queue
  – Dead letter queue
  – Fail queue
• Key concept
  – Never lose a message
  – Message is either in-flight, done, or in an error queue somewhere
• All require human intervention for the workflow to continue
  – Not necessarily a bad thing

Page 38: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Message Patterns – Bad Message Queue

• Wrapped in a message wrapper which contains context
• Never retried
• Very rare in production systems
• Implemented as an exception handler on the route builder

The message doesn’t unmarshal to the object it should OR

We could unmarshal the object, but it doesn’t meet our validation rules

Page 39: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Message Patterns – Dead Letter Queue

• Message is an exact copy of the input message
• Retried several times before being put on the DLQ
• Can be common, even in production systems
• Implemented as a bean in the route builder for SQS

We tried processing the message a number of times, and something we weren’t expecting went wrong each time

Page 40: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Message Patterns – Fail Queue

• Wrapped in a message wrapper that contains context
• Requires some level of knowledge of the system to be retried
• Often evolve from understanding the causes of DLQ’d messages
• Implemented as an exception handler on the route builder

Something I knew could go wrong went wrong
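The three error queues from the preceding slides can be summarized in one sketch. This is plain Python rather than the Camel route builders the slides mention, and the function, exception, and queue names are hypothetical.

```python
class KnownFailure(Exception):
    """An anticipated error that the component knows how to describe."""


def process_with_error_queues(raw, unmarshal, handler, max_attempts=3):
    """Return (destination, payload) for one incoming message.

    unmarshal(raw) is assumed to raise ValueError for messages that
    cannot be parsed or that fail validation.
    """
    try:
        message = unmarshal(raw)
    except ValueError as error:
        # Bad message queue: never retried, wrapped with context.
        return "bad-message-queue", {"original": raw, "reason": str(error)}
    for attempt in range(1, max_attempts + 1):
        try:
            return "done", handler(message)
        except KnownFailure as error:
            # Fail queue: something we knew could go wrong, with context.
            return "fail-queue", {"original": raw, "reason": str(error)}
        except Exception:
            if attempt == max_attempts:
                # Dead letter queue: exact copy of the input message.
                return "dead-letter-queue", raw
```

Every path returns a destination, which mirrors the key concept that a message is always in flight, done, or sitting in an error queue.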

Page 41: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Demonstration – Eventing Framework

Page 42: AWS re:Invent 2013 Scalable Media Processing in the Cloud
Page 43: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Questions?

[email protected]@amazon.com

@GeneticGenesis
@dsayed

Page 44: AWS re:Invent 2013 Scalable Media Processing in the Cloud

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

MED302