Automated Media Workflows in the Cloud (MED304) | AWS re:Invent 2013


DESCRIPTION

Ingesting, storing, processing and delivering a large library of content involves massive complexity. This session walks through sample code that leverages AWS Services to perform all these tasks while coordinating the activities with Amazon Simple Workflow Service (SWF). Along the journey you are introduced to best practices for cost optimization, monitoring, reporting, and exception or error handling. In addition to the sample workflow, a guest speaker from Netflix takes the audience on a deep dive into their “digital supply chain” where you learn how they have automated their processes in moving data all the way from the studios to the last mile. Services covered include Amazon SWF, Amazon Simple Storage Service (S3), Amazon Glacier, Amazon Elastic Compute Cloud (EC2), Amazon Elastic Transcoder, Amazon Mechanical Turk, and Amazon CloudFront.

TRANSCRIPT


© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

MED304 - Automated Media Workflows in the Cloud

John Mancuso, Amazon Web Services

November 14, 2013


Agenda

• Why automate

• Workflow steps

• Automating the workflow

• Demo of an end-to-end media workflow

• How Netflix approaches their digital supply chain


Why Automate?

[Figure: format progression (Analog, VCD, DVD, 720p, 1080p (3D), 2K, 4K); labels: size, users, format]


Scenario

• At any given time, company X produces 10 broadcast-quality shows

• Each show consists of 200 30-minute episodes per year

• High-res post-production copies of each show are temporarily stored at company X’s studio in Tokyo

• The content must be made available for distribution to consumers via web, mobile devices, and media players

• The high-res content must be archived for future access
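The scenario's numbers imply a steady ingest load. A quick check in Python, using only the figures on the slide; the per-day rate assumes an even spread across 365 days, which is an illustrative assumption:

```python
# Back-of-the-envelope content volume for the company X scenario.
# Counts come from the slide; the daily rate assumes an even 365-day spread.
SHOWS = 10
EPISODES_PER_SHOW_PER_YEAR = 200
EPISODE_MINUTES = 30

episodes_per_year = SHOWS * EPISODES_PER_SHOW_PER_YEAR
hours_per_year = episodes_per_year * EPISODE_MINUTES / 60
episodes_per_day = episodes_per_year / 365

print(f"{episodes_per_year} episodes/year")
print(f"{hours_per_year:.0f} hours of new content/year")
print(f"~{episodes_per_day:.1f} episodes to ingest per day")
```

Roughly 2,000 episodes (1,000 hours) a year, or five to six new high-res files every day, is the volume that makes manual handling impractical.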


Media Workflow

[Diagram: Ingest → Processing → Discovery & Delivery]


Media Workflow

[Diagram: Ingest → Processing → Discovery & Delivery, coordinated by Amazon Simple Workflow Service (SWF) and backed by Amazon Storage Services: Amazon S3 (Standard & RRS) and Amazon Glacier]



Ingest

[Diagram: ingest into Amazon S3 (US East). Image courtesy of porbital / FreeDigitalPhotos.net]


Ingest – Data Transfer

[Diagram: server side → AWS Command Line Interface (CLI) → Amazon S3, using Amazon S3 parallel multipart uploads]
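The speedup from parallel multipart uploads comes from splitting the file into parts and sending them concurrently. A minimal sketch of that idea, assuming a caller-supplied `upload_part(part_number, chunk) -> etag` function as a hypothetical stand-in for an S3 UploadPart call; the 5 MiB minimum part size (all parts but the last) and the 10,000-part limit are real S3 constraints, while the 8 MiB default and 10 workers are illustrative choices:

```python
from concurrent.futures import ThreadPoolExecutor

MIN_PART = 5 * 1024 * 1024   # S3 minimum part size (every part except the last)
MAX_PARTS = 10_000           # S3 maximum number of parts per upload

def plan_parts(size, part_size=8 * 1024 * 1024):
    """Split `size` bytes into (part_number, offset, length) tuples."""
    if part_size < MIN_PART:
        raise ValueError("part_size must be at least 5 MiB")
    parts = []
    offset, number = 0, 1
    while offset < size:
        length = min(part_size, size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts

def upload_parallel(data, upload_part, part_size=8 * 1024 * 1024, max_workers=10):
    """Upload parts concurrently; return [(part_number, etag)] in part order.

    `upload_part(part_number, chunk) -> etag` is a hypothetical stand-in
    for a real S3 UploadPart call.
    """
    parts = plan_parts(len(data), part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            (n, pool.submit(upload_part, n, data[off:off + length]))
            for n, off, length in parts
        ]
        # Collect results in part order, as CompleteMultipartUpload expects
        return [(n, f.result()) for n, f in futures]
```

In practice the AWS CLI (`aws s3 cp`) and boto3's managed transfers already do this internally; in boto3 the behavior is tuned with `boto3.s3.transfer.TransferConfig` (`multipart_threshold`, `multipart_chunksize`, `max_concurrency`) passed to `upload_file`.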


Ingest – Data Transfer

[Diagram: Tsunami UDP transfer to Amazon EC2, then upload to Amazon S3. Image courtesy of porbital / FreeDigitalPhotos.net]


Ingest – Timing Comparison, 885 MB Video File

Method | Time | Reduction
Single thread to S3 | 13 minutes 25 seconds | --
Multiple threads to S3 | 1 minute | 93% reduction
Tsunami UDP + multiple threads | 15 seconds + 7 seconds = 22 seconds | 63% further reduction

Instance size: CC2.8xlarge

OS: Amazon Linux
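The percentages in the table follow directly from the timings. Recomputing them in Python, and converting to effective throughput under the assumption of decimal megabytes (1 MB = 10^6 bytes):

```python
# Recompute the slide's reductions from its own timings for the 885 MB file.
FILE_MB = 885
timings_s = {
    "single thread to S3": 13 * 60 + 25,
    "multiple threads to S3": 60,
    "Tsunami UDP + multiple threads": 15 + 7,
}

def reduction(before, after):
    """Percentage reduction going from `before` to `after` seconds."""
    return 100 * (before - after) / before

single, multi, tsunami = timings_s.values()
print(f"multi-thread vs single: {reduction(single, multi):.0f}% reduction")
print(f"Tsunami vs multi-thread: {reduction(multi, tsunami):.0f}% further reduction")
for name, secs in timings_s.items():
    # MB -> megabits, divided by seconds
    print(f"{name}: {FILE_MB * 8 / secs:.0f} Mbps")
```

The effective rate climbs from roughly 9 Mbps single-threaded to about 118 Mbps with parallel threads and about 322 Mbps end to end with Tsunami UDP.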


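Ingest – Code Snippet

import os
import subprocess

def execCMD(cmd_s):
    # Minimal stand-in for the execCMD helper: run a shell command, raise on failure
    subprocess.check_call(cmd_s, shell=True)

def doWork_INGEST(remoteIP, remoteFileName, s3Key_HighRes):
    # Transfer from the studio server to this EC2 instance using Tsunami UDP
    cmd_s = '/usr/local/bin/tsunami connect {} set rate 500m get {} quit'
    cmd_s = cmd_s.format(remoteIP, remoteFileName)
    execCMD(cmd_s)
    # Upload to S3 using the AWS CLI (s3Bucket_HighRes is a module-level setting)
    s3Path = 's3://{}/{}'.format(s3Bucket_HighRes, s3Key_HighRes)
    cmd_s = 'aws s3 cp {} {} --region us-east-1'.format(remoteFileName, s3Path)
    execCMD(cmd_s)
    # Delete the local copy now that it is stored in S3
    os.remove(remoteFileName)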



Processing

• Transcoding

• Thumbnail selection

• Archiving of high-res videos


Processing – Transcoding

[Diagram: Amazon S3 → Amazon Elastic Transcoder → Amazon S3 (RRS)]


Transcoding – Code Snippet

from boto.elastictranscoder.layer1 import ElasticTranscoderConnection

def doWork_PROCESS_TRANSCODE(s3Key_HighRes, s3PreFix_TranscodeRoot):
    etc = ElasticTranscoderConnection()
    # Let Elastic Transcoder detect the input's properties
    job_input_name = {"Key": s3Key_HighRes, "FrameRate": "auto", "Resolution": "auto",
                      "AspectRatio": "auto", "Interlaced": "auto", "Container": "auto"}
    # Two outputs (MP4 and HLS), each with its own thumbnail series
    job_outputs = [
        {"Key": "MP4.mp4", "ThumbnailPattern": "MP4{count}", "Rotate": "auto", "PresetId": ET_PresetId_MP4},
        {"Key": "HLS", "ThumbnailPattern": "HLS{count}", "Rotate": "auto", "PresetId": ET_PresetId_HLS}]
    # ET_Pipeline_ID and the preset IDs are module-level settings
    job = etc.create_job(pipeline_id=ET_Pipeline_ID, input_name=job_input_name,
                         outputs=job_outputs, output_key_prefix=s3PreFix_TranscodeRoot)
    jid = job['Job']['Id']
    # Ideally you would use Elastic Transcoder's SNS notifications to signal SWF on completion
    waitForCompletion(etc, jid)
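The snippet calls a `waitForCompletion(etc, jid)` helper that the slide does not show. A minimal polling sketch, assuming `etc.read_job(jid)` returns `{"Job": {"Status": ...}}` as boto 2's `ElasticTranscoderConnection.read_job` does, with Elastic Transcoder's terminal statuses `Complete`, `Canceled`, and `Error`; the poll interval and timeout defaults are arbitrary choices:

```python
import time

# Elastic Transcoder job statuses after which the job will not change again
TERMINAL = {"Complete", "Canceled", "Error"}

def waitForCompletion(etc, jid, poll_seconds=30, timeout_seconds=6 * 3600):
    """Poll an Elastic Transcoder job until it reaches a terminal status.

    Returns "Complete" on success; raises RuntimeError if the job ends as
    Canceled or Error, and TimeoutError if it runs past the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = etc.read_job(jid)["Job"]["Status"]
        if status in TERMINAL:
            if status != "Complete":
                raise RuntimeError(f"transcode job {jid} ended with {status}")
            return status
        if time.monotonic() > deadline:
            raise TimeoutError(f"transcode job {jid} still {status}")
        time.sleep(poll_seconds)
```

Raising on failure lets the activity worker's exception handler report the task as failed to SWF. As the slide's comment notes, SNS notifications avoid polling altogether.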


Processing – Thumbnail Selection

[Diagram: Amazon S3 (RRS), Amazon DynamoDB, Amazon Mechanical Turk]


Thumbnail Selection – Code Snippet

def getRequest(s3WebPath_Thumbnails):
    # Parameters for an MTurk CreateHIT request based on a HIT layout
    request_params = {"Title": "Thumbnail Selection",
                      "Description": "Please choose a thumbnail",
                      "MaxAssignments": "1",
                      "HITLayoutId": MTurk_HITLAYOUTID,
                      "Reward": {"Amount": "0.10", "CurrencyCode": "USD"},
                      "LifetimeInSeconds": "300",
                      "AssignmentDurationInSeconds": "300",
                      # One layout parameter per candidate thumbnail
                      "HITLayoutParameter": [
                          {"Name": "image1", "Value": s3WebPath_Thumbnails + "MP400001.png"},
                          # ...
                          {"Name": "image10", "Value": s3WebPath_Thumbnails + "MP400010.png"},
                      ]
                      }
    return request_params


Thumbnail Selection – Code Snippet

import mturkcore
from boto.mturk.connection import MTurkConnection

def doWork_PROCESS_THUMBNAIL(s3PreFix_Thumbnails):
    m = mturkcore.MechanicalTurk()
    mtc = MTurkConnection()
    # Public S3 website URL of the thumbnails (s3Bucket_Thumbs is a module-level setting)
    s3WebPath_Thumbnails = 'http://{}.s3-website-us-east-1.amazonaws.com/{}'
    s3WebPath_Thumbnails = s3WebPath_Thumbnails.format(s3Bucket_Thumbs, s3PreFix_Thumbnails)
    request_params = getRequest(s3WebPath_Thumbnails)
    hit = m.create_request("CreateHIT", request_params)
    hid = hit['CreateHITResponse']['HIT']['HITId']
    # Wait for an answer
    answer = getAnswer(mtc, hid)
    # Get the image name from the answer: "image3" -> "00003" -> "MP400003.png"
    answer = answer[5:]
    answer = answer.zfill(5)
    imagekey = '{}MP4{}.png'
    imagekey = imagekey.format(s3WebPath_Thumbnails, answer)
    return imagekey
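The last few lines turn a worker's answer into a thumbnail URL by stripping the five-character `image` prefix and zero-padding to five digits, the width of Elastic Transcoder's `{count}` placeholder. The same mapping as a standalone function with input validation; the function name and the `prefix` parameter are illustrative, not from the talk:

```python
def thumbnail_key_from_answer(answer, base_url, prefix="MP4"):
    """Map an MTurk answer such as 'image3' to its thumbnail URL.

    Strips the 'image' prefix used in the HIT layout parameter names,
    zero-pads to five digits ({count} width), and rebuilds the file name.
    """
    if not answer.startswith("image"):
        raise ValueError(f"unexpected answer: {answer!r}")
    number = answer[len("image"):]
    if not number.isdigit():
        raise ValueError(f"unexpected answer: {answer!r}")
    return f"{base_url}{prefix}{number.zfill(5)}.png"
```

Validating the answer means a malformed response fails loudly instead of producing a URL for a thumbnail that does not exist.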


Processing – Archiving of High-res Videos

[Diagram: Amazon S3 → Amazon Glacier]


Archiving – Code Snippet

def doWork_PROCESS_ARCHIVE(s3Key_HighRes):
    # Move the high-res video to a path in S3 configured to archive
    # to Amazon Glacier with a lifecycle policy
    s3PathA = 's3://{}/{}'.format(s3Bucket_HighRes, s3Key_HighRes)
    s3PathB = 's3://{}/toArchive/{}'.format(s3Bucket_HighRes, s3Key_HighRes)
    cmd_s = 'aws s3 mv {} {} --region us-east-1'.format(s3PathA, s3PathB)
    execCMD(cmd_s)
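The archive step relies on a lifecycle rule for the `toArchive/` prefix, which the slide does not show. A sketch of such a rule, built as a Python dict in the shape of the current S3 lifecycle configuration API (`Rules`, `Filter`, `Transitions`); the rule ID and the transition delay of 0 days are assumptions:

```python
import json

def glacier_lifecycle(prefix="toArchive/", days=0, rule_id="archive-high-res"):
    """Build an S3 lifecycle configuration that transitions objects under
    `prefix` to the GLACIER storage class `days` days after creation."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }

# Print the JSON form, e.g. to save as lifecycle.json
print(json.dumps(glacier_lifecycle(), indent=2))
```

The JSON could be applied with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`, or the dict passed to boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`. Once the rule is in place, the workflow only has to move objects under the prefix.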



Discovery & Delivery

[Diagram: Amazon S3 (RRS), Amazon CloudFront, and a CMS running on Amazon EC2]


Automating the Workflow



Amazon Simple Workflow Service (SWF)

• SWF – Maintains distributed application state
  – Tracks workflow executions
  – Dispatches tasks (activities & deciders)
  – Retains history
  – Provides visibility

• Activity tasks – Do the “work” associated with a workflow step

• Decider tasks – Determine which activity task should come next

• Activities & deciders can run anywhere (on premises or in the cloud)


Decider Logic

• Start: Task = GetDecisionTask
• Task exists? If not, poll again.
• If it does, build an EventList from the history with the ['ActivityTaskCompleted', 'WorkflowExecutionStarted'] events.
• All activities completed? If yes, signal completion of the execution.
• If not, NextActivity = ACTIVITIES[len(EventList)].
• Is this the first activity? If yes, NextActivity.Input = Execution Input; if no, NextActivity.Input = PreviousActivity.Result.
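The decider logic above can be sketched as a pure function over the execution history. This assumes SWF-style event dicts using the history field names `eventType`, `workflowExecutionStartedEventAttributes.input`, and `activityTaskCompletedEventAttributes.result`. The returned decision dict is a simplified stand-in, not the real RespondDecisionTaskCompleted payload, and the activity names in the test are hypothetical. The index is the number of completed activities, which equals `len(EventList) - 1` because the filtered list also holds the single start event:

```python
def decide(history, activities):
    """Return the next decision for a strictly sequential workflow.

    `history` is a list of SWF-style event dicts; `activities` is the
    ordered list of activity names.
    """
    # Keep only the event types the flowchart cares about
    events = [e for e in history
              if e["eventType"] in ("WorkflowExecutionStarted", "ActivityTaskCompleted")]
    completed = [e for e in events if e["eventType"] == "ActivityTaskCompleted"]

    # All activities done: signal completion of the execution
    if len(completed) == len(activities):
        return {"decisionType": "CompleteWorkflowExecution"}

    next_activity = activities[len(completed)]
    if not completed:
        # First activity receives the workflow execution input
        start = next(e for e in events if e["eventType"] == "WorkflowExecutionStarted")
        task_input = start["workflowExecutionStartedEventAttributes"]["input"]
    else:
        # Later activities receive the previous activity's result
        task_input = completed[-1]["activityTaskCompletedEventAttributes"]["result"]
    return {"decisionType": "ScheduleActivityTask",
            "activity": next_activity, "input": task_input}
```

Because the function only reads the history, the decider itself stays stateless: SWF's retained history is the workflow's state, and any decider process can pick up the next decision task.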


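Activity Worker – Code Snippet

import json
import boto.swf.layer1 as swf
# Workflow settings (domain, workflow_type, activities) and helpers (get_rand, doWork_*)
from mwf_Ingest import *

swf_l1 = swf.Layer1()
while True:
    # Long-poll SWF for the next activity task on this task list
    task = swf_l1.poll_for_activity_task(domain['name'], workflow_type['task_list'])
    # An empty taskToken means the poll timed out with no work
    if task.get('taskToken'):
        task_token = task['taskToken']
        task_input = json.loads(task['input'])
        try:
            if task['activityType']['name'] == activities[0]['name']:
                remoteIP = task_input['remoteIP']
                remoteFileName = task_input['remoteFileName']
                # Random S3 key that keeps the original file extension
                s3Key_HighRes = get_rand() + remoteFileName[remoteFileName.rindex('.'):]
                doWork_INGEST(remoteIP, remoteFileName, s3Key_HighRes)
                # Pass the new key to the next activity through the task result
                dataToPass = {'s3Key_HighRes': s3Key_HighRes}
                task_status_s = json.dumps(dataToPass)
                out = swf_l1.respond_activity_task_completed(task_token, task_status_s)
        except Exception as e:
            # Report the failure so the decider can react (reason is limited to 256 characters)
            out = swf_l1.respond_activity_task_failed(task_token, details='', reason=str(e)[:256])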
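The worker above matches on `activities[0]` with an `if` statement; with one branch per workflow step, a lookup table keeps the loop flat. A sketch of that routing, assuming each handler is a plain function that takes the decoded input dict and returns a JSON-serializable result; the function name and the `(status, payload)` return convention are illustrative, not part of boto:

```python
import json

def handle_activity_task(task, handlers):
    """Route a polled SWF activity task to a handler by activity type name.

    `task` is the dict returned by a poll (activityType, input, ...);
    `handlers` maps activity names to functions of the decoded input.
    Returns ("completed", result_json) or ("failed", reason).
    """
    name = task["activityType"]["name"]
    handler = handlers.get(name)
    if handler is None:
        return ("failed", f"no handler for activity {name!r}")
    try:
        result = handler(json.loads(task["input"]))
        return ("completed", json.dumps(result))
    except Exception as exc:
        # SWF limits the failure reason to 256 characters
        return ("failed", str(exc)[:256])
```

In the polling loop, a `"completed"` outcome would map to `respond_activity_task_completed(task_token, result)` and a `"failed"` outcome to `respond_activity_task_failed(task_token, reason=reason)`. Adding a workflow step then means registering one more handler.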


Workflow Steps

• Start workflow execution

• Ingest (transfer file to Amazon EC2 using Tsunami UDP & upload to Amazon S3)

• Transcode file (multiple output formats)

• Select thumbnail

• Archive high-res file

• Signal completion of execution


Scalability & Fault Tolerance Analysis

Step | Is Scalable? | Is Fault Tolerant?
Ingest
Transcode
Archive to Amazon Glacier
Amazon Mechanical Turk for thumbnails
Delivery with Amazon CloudFront
Automation elements


Demo

External references: MTurkCore, Boto


Netflix’s Transcoding Transformation

Tony Koinov, Director Engineering, Netflix


Netflix Media in AWS

• Matrix: The Netflix media pipeline

• MAPLE: New generation media pipeline

• Concluding thoughts


Netflix Media Pipeline

[Diagram: Netflix media pipeline with FTP and media processing on EC2, storage in S3, and delivery through Open Connect]


Driving to Hollywood Game



Rules of the Game

• 200 MPH! | 85 MPH
• Purchase only | Lease, cancel anytime
• Quantities limited | Unlimited quantity
• It breaks, you fix it | It breaks, replace it, no charge
• Pay for parking | No parking, just walk away
• Obsolete in 1 year | Brand new each year


Industry Heritage: Optimize for Latency

• Interactive editing
  – Master creation
  – DVD/Blu-ray authoring
  – Edits for television


Netflix 2008

• Custom data center

• Custom GPU encoders

• Fixed size

• New format needed – PC, Mac, Xbox

• Content library doubled

• Frequent HW failures

• Fail! Catalog incomplete


Fall 2009 – Launch Netflix PS3 Player

• First 100% AWS transcode

• New format, unique to Netflix PS3 player

• Encode recipe nailed down late

• 3 weeks, transcode entire catalog


Netflix 2009 to Present

• US East AWS

• Variable sized EC2 farm

• S3 for storage

• Optimized for throughput, not latency

• No more missed deadlines – Devices, catalogs, countries


Spring 2010 – Launch Netflix iPad Player

• Launch April 10th

• Apple approached us in mid February

• Grew EC2 farm to 4,000 instances

• Entire library transcoded in 2 weeks

• New format ready for launch




For Netflix, Throughput Trumps Latency

• Think horizontal, not vertical

• Priuses move more people than Ferraris

• Frequent re-encodes of growing libraries

• Netflix is nimble because of AWS



More Proof That Horizontal Wins

• New countries, new content

• Codec innovation



AWS Handles Netflix Scale

• 6 regional catalogs

• 4 formats supported today – 1 VC-1, 3 H.264

– Multiple bit rates per format

• 10s of 1000s of hours of content

• Petabytes of S3 storage




New Generation: Address Faults and Latency

• More than 1 week for a 4K transcode

• 2–3 days for an HD transcode

• Fault intolerant

• Maintenance is challenging

• Often too slow
  – Day after broadcast
  – Redelivery of damaged content

[Diagram: S3 and EC2 (C1 Medium) instances; labels ~700 Mbps and 10-16 Mbps]


MAPLE: Massively Parallel Encoding

• 5-minute chunks – Close to real time

• Fault tolerant

• Easy maintenance

• Address low latency use cases
  – Day after broadcast
  – Redelivery of damaged content

[Diagram: S3 and EC2]
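The core idea behind chunked encoding is that a title split into fixed-length pieces can be encoded by many instances at once, so wall-clock time depends on chunk length rather than title length. A rough sketch of that arithmetic, not Netflix's MAPLE implementation: the 300-second chunk length matches the slide's 5-minute chunks, while `encode_speed` (fraction of real time, so 0.5 means a 300-second chunk takes 600 seconds) and the worker counts are hypothetical parameters:

```python
def chunk_ranges(duration_s, chunk_s=300):
    """Split a title of `duration_s` seconds into (start, end) chunks.

    The last chunk may be shorter than `chunk_s`.
    """
    return [(start, min(start + chunk_s, duration_s))
            for start in range(0, duration_s, chunk_s)]

def wall_clock_estimate(duration_s, encode_speed, chunk_s=300, workers=None):
    """Rough upper bound on encode wall-clock seconds, assuming each worker
    encodes at `encode_speed` x real time and chunks are spread evenly.

    `workers=None` means one worker per chunk.
    """
    chunks = chunk_ranges(duration_s, chunk_s)
    workers = workers or len(chunks)
    rounds = -(-len(chunks) // workers)   # ceiling division
    return rounds * chunk_s / encode_speed
```

For a 2-hour title at half real-time speed, one worker per chunk finishes in about 600 seconds, while a single worker needs about 14,400 seconds. A failed chunk is also cheap to retry, which is one reason chunking helps fault tolerance.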



We Would Do It All Over Again

• Don’t be fooled by IT cost comparisons
  – We don’t administer the gear

• 6,000 EC2 instances

• Petabytes of storage

• High network traffic
  – Storage is durable
  – It is a moving target

• You cannot put a price on nimble
