Building a Scalable Digital Asset Management Platform in the Cloud (MED402) | AWS re:Invent 2013
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
MED402: Building a Scalable Video / Digital Asset
Management (DAM) Platform in the Cloud
Michael Limcaco – Enterprise Solutions Architect (AWS)
Jonathan Rivers – Director, Technical Operations (PBS)
November 15, 2013
Agenda
• The big picture
• Architecture
• Build-out exercise
• Customer case study (PBS)
• Observations and summary
Big Picture: Enterprise Media Architecture
(Diagram: Camera, Physical Media, and Live Stream sources; Media Files, RTMP, MPEG-TS, and HD-SDI inputs; Integrated Workflow; Transcoders; Store output profile and file; Content Management, Discovery & Delivery)
Big Picture: Digital Asset Management (DAM)
(Diagram: the same enterprise media architecture with the DAM called out: Ingest, Processing, Discovery & Delivery; Workflow Management; Storage)
Key DAM Requirements
• Ingest
• Metadata extraction
• Create renditions
• Build the catalog
• Enable rich search
• Manage storage lifecycle
• Provide efficient delivery of media assets
Why Scalable?
• Increasing volume, variety, velocity
– Collectors, cameras, sensors and sources
  • Ex: UGC, raw source, mezzanine, B-roll, creative collateral
  • Final content
– Formats and standards
  • Transport, containers, codecs, metadata
  • SD, HD, 4K … 8K
– Devices and user expectations
• Opportunities through cloud enablement
– Media platform as a service
– Multitenancy
What about Search? Ugh …
• Core elements
– Project, keyword, asset name, tags, date/time capture, timecode range, subject, format, size
• Extended structured search
– Dublin Core, XMP, MPEG-7, IPTC, EXIF, FCXML, SMPTE, MISB
• Unstructured search
– Comments, notes, transcript, closed captioning
Enough Theory …
Let’s Build a DAM in the Cloud!
The User Experience
(Notional Reference Client)
(Demo)
Architecture
(Diagram: DAM Web Interface; DAM Web Service on AWS Elastic Beanstalk; Event Handler with two mailboxes; Metadata Processing and Rendition Processing on EC2 workers in Auto Scaling groups; Catalog on DynamoDB and Amazon CloudSearch; DAM Storage & Archive in S3 buckets for renditions and metadata sidecar files; Delivery Cache)
Tools Available to Us
Need           | Description                                  | AWS Service
Ingest         | Integrate with existing file-based workflows | Amazon S3
Metadata       | Process inline and sidecar files             | EC2 / Elastic Beanstalk
Renditions     | Autogenerate thumbnails and proxies          | Amazon Elastic Transcoder
Catalog part 1 | Administrative entities, simple retrieval    | Amazon DynamoDB
Catalog part 2 | Field and free-form search                   | Amazon CloudSearch
Storage        | Nearline, online, offline infinite storage   | Amazon S3, Amazon Glacier
Delivery       | Global caching and streaming footprint       | Amazon CloudFront
Catalog: A Word on Why DynamoDB
(Diagram: media containers with differing structure map onto sparse catalog records)
Container-A: Header, Layer-1, Layer-2
Container-B: Header, Layer-1, Layer-2
Container-C: Header
NoSQL Data Model
Core Elem1 | Core Elem2 | Elem from A | Elem from B
Name_A     | Size       | Some_Field  |
Name_B     | Size       |             | Some_Field
Name_C     | Size       |             |
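The data model above is the case for a schemaless store: each record carries the core attributes plus only the container-specific ones it actually has. A minimal Node.js sketch of that idea follows. The `toItem` helper, the attribute names derived from the slide's column labels, and the sizes are all illustrative assumptions, not part of the presented system.

```javascript
// Build a DynamoDB-style item: every record has the core attributes, and
// container-specific attributes are present only when a value is known.
// toItem and the attribute names are hypothetical, chosen for illustration.
function toItem(core, extras) {
  const item = {
    CORE_ELEM1: { S: core.name },
    CORE_ELEM2: { N: String(core.size) }, // DynamoDB numbers travel as strings
  };
  for (const [key, value] of Object.entries(extras || {})) {
    // Skip missing values so the item stays sparse.
    if (value !== undefined && value !== null && value !== '') {
      item[key] = { S: String(value) };
    }
  }
  return item;
}

// Sizes are made-up sample values.
const catalogItems = [
  toItem({ name: 'Name_A', size: 1024 }, { ELEM_FROM_A: 'Some_Field' }),
  toItem({ name: 'Name_B', size: 2048 }, { ELEM_FROM_B: 'Some_Field' }),
  toItem({ name: 'Name_C', size: 512 }),
];

for (const item of catalogItems) {
  console.log(Object.keys(item).join(', '));
}
```

No column has to exist for every row, so a new container type can add attributes without a schema migration.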
Catalog: A Word on Why CloudSearch
• Video and text
– Header fields with textual descriptions, synopsis, comments
– Tracks with speech to text, closed caption data
– Links to scripts
• Video and structured elements
– XMP dynamic media
– Sidecar files
• A managed search engine dedicated to these kinds of problems
– Case folding, stemming, stopword removal, synonyms
– Also accent normalization, UTF-8 normalization, etc.
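To make the text-processing terms above concrete, here is a toy Node.js sketch of case folding, accent normalization, stopword removal, and stemming. It is not how CloudSearch implements them: the stopword list and the suffix rules are tiny placeholders chosen for illustration.

```javascript
// Toy versions of the text processing a search engine applies at index time.
// The stopword list and suffix rules are placeholders, not CloudSearch's own.
const STOPWORDS = new Set(['a', 'an', 'and', 'of', 'the', 'to']);

function fold(text) {
  return text
    .normalize('NFD')                 // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase();                   // case folding
}

function stem(token) {
  // Naive suffix stripping; real stemmers handle far more cases.
  if (token.length > 5 && token.endsWith('ing')) return token.slice(0, -3);
  if (token.length > 3 && token.endsWith('s') && !token.endsWith('ss')) {
    return token.slice(0, -1);
  }
  return token;
}

function analyze(text) {
  return fold(text)
    .split(/[^\p{L}\p{N}]+/u)         // break on anything that is not a letter or digit
    .filter(Boolean)
    .filter((token) => !STOPWORDS.has(token))
    .map(stem);
}

const analyzed = analyze('The R\u00e9sum\u00e9 of the Streaming Captions');
console.log(analyzed);
```

With a managed search service these steps are configuration rather than code, which is the point of the slide.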
Other Goodies
• Back-end services
– AWS CLI
– Open source decode utilities
  • ExifTool
  • MediaInfo
– ETL support
  • Talend (representative)
• Front-end services
– Node.js + AWS Node SDK
(Diagram: Media Content; Amazon S3 storage for source, renditions, and metadata sidecar files; EC2 Crawler; Amazon SNS topic; Amazon SQS queues for rendition jobs and metadata processing jobs; Rendition Workers and Metadata Workers in EC2 Auto Scaling groups (ASG); Elastic Transcoder for proxy / thumbnail generation; DAM Catalog on Amazon DynamoDB and Amazon CloudSearch; DAM Web Service on AWS Elastic Beanstalk; CloudFront download distribution)
Walkthrough
(Dual Screen)
Setup
• Amazon Simple Storage Service (S3) buckets ready to go
– External staging locations
– Internal working locations
• Amazon Simple Notification Service (SNS) + Amazon Simple Queue Service (SQS) wired up
• Catalog data models established
– Amazon DynamoDB table “catalog” created
– Amazon CloudSearch search domain “catalog” created
1. Ingest, Crawl, Notify
a. End user initiates data copy
b. EC2 worker scans Amazon S3 staging bucket
c. EC2 worker copies or moves content
d. EC2 worker broadcasts “NEW DATA” event (SNS)
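The crawl step can be sketched in memory as follows. A real worker would list the staging bucket and publish to the topic through the AWS SDK; here both are replaced by plain data so the logic can run on its own. The bucket name, the message shape, and the `planIngest` helper are assumptions made for illustration.

```javascript
// In-memory sketch of the crawl step: compare the staged keys with what has
// already been ingested and emit one "NEW DATA" event per new object.
// planIngest, the message shape, and the bucket name are hypothetical.
function planIngest(stagedKeys, alreadyIngested, workingBucket) {
  const events = [];
  for (const key of stagedKeys) {
    if (alreadyIngested.has(key)) continue; // seen on an earlier scan
    events.push({
      event: 'NEW DATA',
      bucket: workingBucket, // where the worker copied or moved the content
      key: key,
    });
    alreadyIngested.add(key);
  }
  return events;
}

const ingested = new Set(['clips/old.mp4']);
const newDataEvents = planIngest(
  ['clips/old.mp4', 'clips/01_01_SoccerF_05_A.mp4'],
  ingested,
  'dam-working'
);
console.log(JSON.stringify(newDataEvents));
```

Tracking what has been ingested keeps repeated scans of the staging bucket from announcing the same object twice.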
2. Metadata Extraction
a. EC2 worker polls inbox (SQS)
b. EC2 worker pulls down media asset from Amazon S3
c. EC2 worker parses media files
d. EC2 worker pumps metadata through ETL flow to prepare for catalog insertion
e. EC2 worker inserts into catalog (Amazon DynamoDB)
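As one way to picture step c, the sketch below parses a text report with "Key : Value" lines, the layout of MediaInfo's default text output, into a flat object. The sample report is hand-written for illustration, and `parseReport` is a hypothetical helper rather than the code used in the session.

```javascript
// Parse "Key : Value" lines into a flat object.
// The sample text is hand-written; parseReport is a hypothetical helper.
function parseReport(text) {
  const record = {};
  for (const line of text.split('\n')) {
    const sep = line.indexOf(' : ');
    if (sep === -1) continue; // section headings such as "General" have no separator
    const key = line.slice(0, sep).trim();
    const value = line.slice(sep + 3).trim();
    if (key && !(key in record)) record[key] = value; // keep the first occurrence
  }
  return record;
}

const report = [
  'General',
  'Complete name                            : 01_01_SoccerF_05_A.mp4',
  'Format                                   : MPEG-4',
  'Codec ID                                 : mp42',
].join('\n');

const parsedReport = parseReport(report);
console.log(parsedReport);
```

The flat object is a convenient hand-off point to the ETL flow in step d.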
Preparing for Amazon DynamoDB Insert
{
"COMPLETE_NAME" :
{ "S" : "01_01_SoccerF_05_A.mp4" },
"FORMAT" :
{ "S" : "MPEG-4" },
"CODEC_ID" :
{ "S" : "mp42" }
}
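The session used an ETL tool to produce this shape. As a rough equivalent, the Node.js sketch below converts a flat metadata record into the same typed attribute-value JSON. The key normalization (upper case, underscores) mirrors the example above; `toAttributeValues` itself is a hypothetical helper.

```javascript
// Convert a flat metadata record into DynamoDB's typed attribute-value JSON.
// toAttributeValues is a hypothetical helper, not part of any SDK.
function toAttributeValues(record) {
  const item = {};
  for (const [key, value] of Object.entries(record)) {
    if (value === undefined || value === null || value === '') continue;
    // "Codec ID" -> "CODEC_ID"
    const name = key.trim().toUpperCase().replace(/[^A-Z0-9]+/g, '_');
    item[name] = typeof value === 'number'
      ? { N: String(value) } // numbers are sent as strings
      : { S: String(value) };
  }
  return item;
}

const dynamoItem = toAttributeValues({
  'Complete name': '01_01_SoccerF_05_A.mp4',
  'Format': 'MPEG-4',
  'Codec ID': 'mp42',
});
console.log(JSON.stringify(dynamoItem, null, 2));
```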
Model It and Deploy to EC2! (Talend)
3. Catalog Processing
a. Store metadata record in Amazon DynamoDB
b. Reflect searchable subset to Amazon CloudSearch
c. Go crazy (HTTP GET)
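Step b can be sketched as building an "add" document for a CloudSearch document batch, assuming the 2011-02-01 document format (`type`, `id`, `version`, `lang`, `fields`). Which attributes count as searchable and how ids are derived are assumptions made here; `toSearchDocument` is a hypothetical helper.

```javascript
// Build one "add" document for a CloudSearch 2011-02-01 document batch.
// The searchable subset and the id scheme are assumptions for illustration.
const SEARCHABLE = ['COMPLETE_NAME', 'FORMAT', 'CODEC_ID'];

function toSearchDocument(item, nowMs) {
  const fields = {};
  for (const name of SEARCHABLE) {
    if (item[name] && item[name].S !== undefined) {
      fields[name.toLowerCase()] = item[name].S; // index field names are lowercase
    }
  }
  return {
    type: 'add',
    // Restrict ids to lowercase letters, digits, and underscores.
    id: item.COMPLETE_NAME.S.toLowerCase()
      .replace(/[^a-z0-9]+/g, '_')
      .replace(/^_+/, ''),
    version: Math.floor(nowMs / 1000), // epoch seconds, so later updates win
    lang: 'en',
    fields: fields,
  };
}

const searchDoc = toSearchDocument(
  {
    COMPLETE_NAME: { S: '01_01_SoccerF_05_A.mp4' },
    FORMAT: { S: 'MPEG-4' },
    CODEC_ID: { S: 'mp42' },
  },
  Date.parse('2013-11-15T00:00:00Z')
);
console.log(JSON.stringify([searchDoc]));
```

Only the searchable subset goes to the index. The full record stays in DynamoDB and can be fetched by key once a search hit identifies it.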
Querying the Catalog (Amazon CloudSearch)
• In Node.js
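var query = "complete_name:'-STRAWBERRY'"; // boolean query (bq) on one field
var returnFields = 'complete_name,text_relevance,codec_id_info,duration,file_size,encoded_date';

var optionsget = {
  host : 'cloudsearch.demo.aws.com', // here only the domain name
  port : 80,
  // URL-encode both values so colons and commas survive the request line
  path : '/2011-02-01/search?bq=' + encodeURIComponent(query) +
         '&return-fields=' + encodeURIComponent(returnFields),
  method : 'GET'
};
• http://cloudsearch.demo.aws.com/2011-02-01/search?bq=complete_name : …<field=value>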
Customer Case Study (PBS)
Merlin: PBS CMS/DAM
• Code name Merlin
• Structured metadata
• 200+ web object records daily
– 29,046 web objects
• 150+ Video objects daily
– 91,436 videos
• Users from over 150 stations and 30 national producers
– Frontline
– Downton Abbey
– PBS Newshour
What’s It Do?
• Large multitenant system
– 1200 registered users
• 250 million streams per month
• 20 million unique viewers
• 8 PB of video delivered monthly
Getting Data In
• 33 ingestible web feeds
– Content editors
– Web page listings
• Batch video ingest API
– Video content editors
– External workflow integration
• Manually entered videos
– Video content editors from all 50 states
– Number of user accounts
System Overview
(Diagram: RSS, Ingest API, and User Input; DAM (Merlin); Workflow Service; Content API; Search Util; Amazon S3; Amazon RDS (shown twice); Amazon CloudSearch; Amazon SWF; CDN)
Basic Workflow
• Object registered with Merlin
• Images registered and processed with ITS
– Stored in CDN-fronted Amazon S3 bucket
• Videos registered with VTS
– Jobs sent to Zencoder for processing
– Video stored in CDN-fronted Amazon S3 bucket
• Objects ready for clients
– Objects rendered for consumption in Amazon S3
– Objects registered with APIs
– Objects discoverable
Making It Discoverable
• Search util service
• Runs every hour
– Re-indexes last several hours each time
• Polls APIs
– Content API
– Modified time
• Updates Amazon CloudSearch index
– 2 primary indexes
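The hourly run with an overlapping window can be sketched as a filter on modified time. The three-hour look-back and the record shape below are assumptions; the slide only says "last several hours".

```javascript
// Select objects whose modified time falls inside an overlapping look-back
// window. The window length and the record shape are assumptions.
const HOUR_MS = 60 * 60 * 1000;

function selectForReindex(objects, nowMs, lookbackHours) {
  const cutoff = nowMs - lookbackHours * HOUR_MS;
  return objects.filter((o) => o.modifiedMs >= cutoff);
}

const runTime = Date.parse('2013-11-15T12:00:00Z');
const contentObjects = [
  { id: 'a', modifiedMs: runTime - 0.5 * HOUR_MS },
  { id: 'b', modifiedMs: runTime - 2.5 * HOUR_MS },
  { id: 'c', modifiedMs: runTime - 5 * HOUR_MS },
];

const reindexBatch = selectForReindex(contentObjects, runTime, 3);
console.log(reindexBatch.map((o) => o.id));
```

Because each run covers more than one hour, an object missed by a failed or late run is picked up by the next one, at the cost of re-sending some unchanged documents.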
Search Considerations
• Hidden objects
• Rights management
• Partitioned search
– Local station search
– Results by geo
– Restrict results for international customers
• Unify and normalize existing APIs
– Flatten data model
• Users looking for programs
– Specific searches
– Suitable for structured data
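One way to apply considerations like hidden objects and partitioned search is to append filter clauses to every user query. The sketch below composes a boolean query string, assuming the 2011-02-01 `bq` syntax of the form `(and field:'value' ...)`. The field names (`title`, `hidden`, `station`, `region`) and the sample values are hypothetical, not PBS's actual index fields.

```javascript
// Compose a boolean query (bq) that adds partition filters to the user's terms.
// Field names and the escaping rule are assumptions for illustration.
function quote(value) {
  // Escape backslashes and single quotes inside a quoted term.
  return "'" + String(value).replace(/\\/g, '\\\\').replace(/'/g, "\\'") + "'";
}

function buildQuery(terms, opts) {
  const clauses = ['title:' + quote(terms), 'hidden:0']; // never return hidden objects
  if (opts.station) clauses.push('station:' + quote(opts.station));
  if (opts.region) clauses.push('region:' + quote(opts.region));
  return '(and ' + clauses.join(' ') + ')';
}

const stationQuery = buildQuery('nature', { station: 'station_a' });
console.log(stationQuery);
console.log('/2011-02-01/search?bq=' + encodeURIComponent(stationQuery));
```

Keeping the filters in one helper ensures that rights and visibility rules are applied to every search consumer.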
Challenges
• No native time field
– Convert dates to integers
– Epoch time
• Versioning of documents
– Epoch for versioning
• Exposing two versions of most fields
– Text searchable
– Facets (copy of text version)
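The three workarounds above can be sketched together in Node.js: dates become epoch integers, the document version is the epoch time of the last modification, and each text field gets a facet copy. The field names and the `toIndexDocument` helper are assumptions, not PBS's actual schema.

```javascript
// Dates become epoch seconds, the version is the modification epoch,
// and text fields are duplicated into facet fields.
// Field names and toIndexDocument are hypothetical.
function toEpochSeconds(isoDate) {
  const ms = Date.parse(isoDate);
  if (Number.isNaN(ms)) throw new Error('unparseable date: ' + isoDate);
  return Math.floor(ms / 1000);
}

function toIndexDocument(obj) {
  return {
    version: toEpochSeconds(obj.modified), // a later edit always has a higher version
    fields: {
      title: obj.title,                    // text-searchable
      title_facet: obj.title,              // copy of the text version, used as a facet
      airdate: toEpochSeconds(obj.airdate) // integer stand-in for a date field
    },
  };
}

const indexDoc = toIndexDocument({
  title: 'Sample Program',
  airdate: '2013-11-15T00:00:00Z',
  modified: '2013-11-15T01:00:00Z',
});
console.log(JSON.stringify(indexDoc));
```

Storing dates as integers also allows range filters such as "aired since last week" to be expressed as numeric ranges.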
Search Consumers (PBS.org)
Site Search
Search Consumers (Video Portal)
Site Search Programs A-Z
Xbox / OTT
Summary
• Build an enterprise-scale DAM platform now
– Managed storage and archive (Amazon S3, Amazon Glacier)
– Managed database for catalog processing (Amazon DynamoDB, Amazon Relational Database Service [RDS])
– Managed search (CloudSearch)
• Application development accelerators
– Elastic Beanstalk harness (web, API, and worker roles)
– Reduced effort with the AWS CLI
• (Almost) fire and forget
AWS Marketplace Can Help
• AWS online software store
– Customer can find, research, buy software
– Simple pricing, aligns with EC2 usage model
– 1-click launch in minutes
– Marketplace billing integrated into your AWS account
– 1,000+ products across 24 categories
• Digital asset management related options include:
– WebDAM – centralize, store, manage and distribute collateral
– Digital asset management cloud – web-based open source DAM
– Widen – manage and distribute digital media and brand assets with user roles and permissions
– Adobe Experience Manager – unified asset management including mobile
Learn more at: http://aws.amazon.com/marketplace
“DAM!”