epfl / montreux jazz: a petabyte scale live video archive case


Posted on 24-Nov-2021



EPFL / Montreux Jazz: A Petabyte Scale Live Video Archive Case Study

Paul Speciale

VP Products

Amplidata Inc.


Agenda

• Intro

• Live Archives for Big Unstructured Data

• Jazz Montreux

• EPFL & Montreux Sounds intro

• Jazz Montreux Live Video Archive

• Optimized Object Storage

• Q&A


Introduction

• Paul Speciale – VP Products Amplidata

• Amplidata Incorporated

– Founders of DCT (acquired by Veritas/Symantec; its technology became NetBackup PureDisk deduplication)

– Technology has been incubated since 2005 at Incubaid (www.incubaid.com), a Belgium-based R&D center (Lochristi, outside Gent)

• US Headquarters in Redwood City, CA

– Worldwide support centers in Redwood City, CA; Belgium; and Taipei, Taiwan

• Products

– AmpliStor Optimized Object Storage (OOS) systems, with customers in media & entertainment, enterprise IT, social media, and scientific research

– BitSpread (erasure coding) and BitDynamics (self-healing agent) software, sold through OEMs


Storage facts and trends

• Data growth to an estimated 35 ZB by 2020

• Much of that growth (80%) is driven by unstructured data

• Billions of large objects and files

[Figure: examples of unstructured data: media archives, online images, large files, medical images, online storage, online movies]


Storage facts and trends: Media & Entertainment Industry

Media & Entertainment (M&E) is driving huge capacity requirements, in file sizes, in numbers of files, and in total storage capacity in use, driven by HD and 3D video formats:

“Petabytes are peanuts”

3 TB per hour for 4K video

2X for 3D formats

25 TB per day of HD shooting
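To make these rules of thumb concrete, the short Python sketch below turns them into capacity estimates. It assumes the slide's round figures (3 TB per hour of 4K, 2x for 3D, 25 TB per day of HD shooting) and decimal units (1 PB = 1000 TB); the function names are hypothetical, and real rates vary with codec and bitrate.

```python
# Rough capacity estimator built from the rule-of-thumb rates on this slide.
# The rates are approximations, not measurements of any particular codec.

TB_PER_HOUR_4K = 3.0    # slide figure: ~3 TB per hour of 4K video
STEREO_3D_FACTOR = 2.0  # slide figure: "2X for 3D formats"


def capacity_tb(hours: float, is_3d: bool = False) -> float:
    """Approximate capacity (TB) needed for `hours` of 4K footage."""
    factor = STEREO_3D_FACTOR if is_3d else 1.0
    return hours * TB_PER_HOUR_4K * factor


def days_to_fill_pb(tb_per_day: float, petabytes: float = 1.0) -> float:
    """Days of shooting at `tb_per_day` before `petabytes` of storage is full."""
    return petabytes * 1000.0 / tb_per_day  # decimal units: 1 PB = 1000 TB


if __name__ == "__main__":
    print(capacity_tb(10))              # 10 hours of 4K -> 30.0
    print(capacity_tb(10, is_3d=True))  # same footage in 3D -> 60.0
    print(days_to_fill_pb(25))          # HD shooting at 25 TB/day -> 40.0
```

At 25 TB per day, a single petabyte fills in about 40 days of shooting, which is the point behind “Petabytes are peanuts”.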


Big Data for Analytics


• In the 1990s, we experienced an explosion of data captured for analytics purposes:

– Academic Research

– Chemical R&D facilities

– Travel industry

– Geo-industry, oil & gas

– Financial / Trading

– Agriculture

• In the 2000s, online applications & social media triggered a flood of trend data


Big Data for Analytics

• Data is captured as many small log files and concatenated into “Big Data”

• Relational databases were not optimal:

– Too much data, too big

– Insufficient performance for analytics

• This stimulated innovations:

– Hadoop, MapReduce, GFS

– XML databases

• => This is Big Data for Analytics


Big Data Evolution:

Big Unstructured Data


• Today, the Big Data trend refers to both Big Data for Analytics and Big Unstructured Data:

– Media

– Streaming

– Business

– Scientific

• Fundamentally different data, but with many similarities:

– Immense capacities

– Large numbers of transactions or objects

• Unstructured data is traditionally stored on host file systems, but:

– Host file systems impose fixed limits and do not scale up to the size we need

– File systems do not meet performance requirements, because the host limits access


Big Unstructured Data Archives

• Big Unstructured Data Archives

– This data has typically been viewed as a cost to manage, not a business value

– Volumes are increasing exponentially, especially with media data (video, images)

• Dormant Archives

– Often archived to tape (for low cost), then difficult to access

– Data archives are an organization & management burden

– Creates unmanaged “Grandma’s Attic”


Reactivating Dormant Big Data

• Companies are starting to see the value of the data in their archives:

– Documents of individuals can be valuable to others (sharing & reuse)

– Some companies have legal reasons to keep data available

– Unexplored analytics opportunities

– This data can be mined and monetized

• This is the story of the Jazz Montreux Project

– Once-dormant data was reactivated onto a “live archive”

– The data was then monetized in several ways


Live Archives

But how do we store all this data in a cost-efficient way?

“Building cost-efficient Live Archives”


Live Archive Requirements

What are the requirements?

– Tape is a difficult option: access latency is key

– Data has to be always available online

– Direct interface to the applications

– Petabyte scalability

– Extreme reliability, integrity

– Cost-efficient

– Security

Disk Storage (online, low-latency access)

+ Open application APIs (App & Cloud-enabled)

+ Ultra-high data durability (Erasure Coding)

= Optimized Object Storage


Jazz Montreux & EPFL intro


Montreux Jazz Festival

• Started 45 years ago in Montreux, Switzerland

– Held every July since 1967

– Has become the premier jazz & music festival globally

• Founder Claude Nobs

– Claude is a big technology fan (aka “geek”)

– Recorded all concerts on video tape (12 different video tape and audio tape formats)

– Started recording in HD video formats in the early 1990s


Video Storage

• All recordings were being “archived” at Claude’s chalet on Lake Geneva!


Jazz Montreux in Popular Culture

• Music References

– Keith Jarrett, Frank Zappa, Led Zeppelin, BB King, Aretha

• “Smoke on the Water” by Deep Purple (1972)

– The lyrics reference the Montreux Jazz Festival (www.elyrics.net)

– They also mention Claude Nobs (“Funky Claude”):

“We all came out to Montreux On the Lake Geneva shoreline To make records with a mobile We didn't have much time

Frank Zappa and the Mothers Were at the best place around But some stupid with a flare gun Burned the place to the ground, now

Smoke on the water A fire in the sky Smoke on the water

But burning down

You know, they burned down the gambling house It died with an awful sound Funky Claude was running in and out He was pulling kids out the ground

When it all was over We had to find another place Swiss time was running out It seemed that we would lose the race, now”


Metamedia Center

Project Sponsor:

Alexandre Delidais

MetaMedia center

[email protected]


EPFL Metamedia Center

• EPFL (Swiss Federal Institute of Technology)

– Based in Lausanne (near Lake Geneva)

– Mission (education, research, technology transfer)

– University of 8000+ students (IT, Life Sciences, Architecture)

– 350 labs

• Metamedia Center

– Make a bridge between research projects and the industry

– Accelerate technology transfer

– Applicable fields: Media & ICT

– Flagship project: Montreux Sounds Digital project

• A Resource for research


The Montreux project

• Montreux Sounds Digital project

– Cultural project financed by sponsors

• 3 Main Goals:

– Save the archives and sustain them for the long run

– Make the archives accessible for cultural and scientific projects

– Monetize the archives online

=> Need for an active archive


The key requirements…

• Use cases

– Static archive

• data loss protection, security

– File system for daily usage

• Large file repository: systems need to support files up to 1TB each

• Large file data transfer

– High speed streaming: peak throughput >> 1Gbps

AND

– Low cost of ownership, low power costs, simple administration
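A quick calculation shows why 1 TB files and peak throughput well above 1 Gbps go together. This is a back-of-the-envelope sketch with a hypothetical helper name; it assumes decimal units and considers only raw link speed, optionally scaled by an assumed efficiency factor.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `size_tb` terabytes over a `link_gbps` link.

    Decimal units: 1 TB = 8e12 bits and 1 Gbps = 1e9 bit/s. `efficiency` is an
    assumed fraction of line rate actually achieved (protocol overhead, disk
    limits); 1.0 means the ideal line rate.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_tb * 8e12
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600.0


if __name__ == "__main__":
    for gbps in (1, 10):
        print(f"1 TB over {gbps} Gbps: {transfer_hours(1, gbps):.2f} h")
    # Same 1 TB file at an assumed 70% of line rate on a 1 Gbps link:
    print(f"{transfer_hours(1, 1, efficiency=0.7):.2f} h")
```

At an ideal 1 Gbps, one 1 TB file takes about 2.2 hours to move; at 10 Gbps it takes roughly 13 minutes. For daily work on files this size, a single 1 Gbps link is a bottleneck.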


Storage Strategy

• First choice was LTO tape (2010)

– The main thought was that it was “cheap”

– At least 7-10 years of sustainability

– Already an IT solution (versus pro video)

– No satisfactory disk based solution

• First interest in Amplidata

– Met with CTO and learned about disk-based Object Storage

– Low power system – a key element for EPFL (demanded by University)

– High durability “bit perfect preservation” Erasure Coding (BitSpread)

– Online Performance with low access latency

• Additional requirements

– Long life cycle (“a hundred-year archive”)

– High security, protect access to media by unauthorized users

– High availability (built-in redundancy to prevent loss of data)

– “Fast” and easy replication mechanisms (in case of loss of data)

– Open access APIs
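“Bit perfect preservation” is commonly verified with fixity checks: record a cryptographic digest for each file at ingest, then periodically recompute and compare it. The slides do not describe how AmpliStor verifies integrity internally, so the sketch below is a generic, hypothetical audit using Python's standard `hashlib` (SHA-256). It reads files in chunks so very large video files are never loaded into memory at once.

```python
import hashlib
from typing import Dict, Iterable, List

CHUNK_SIZE = 1 << 20  # 1 MiB per read keeps memory flat even for huge files


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read incrementally."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(paths: Iterable[str]) -> Dict[str, str]:
    """Record a reference digest for every file at ingest time."""
    return {path: sha256_of(path) for path in paths}


def audit(manifest: Dict[str, str]) -> List[str]:
    """Return the paths whose current content no longer matches the manifest."""
    return [path for path, expected in manifest.items() if sha256_of(path) != expected]
```

An empty list from `audit` means every file is still bit-for-bit identical to what was ingested; any path it returns is a candidate for restoring from a redundant copy.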


The Live Archive Solution

• Selected AmpliStor Optimized Object Storage as the scale-out archive system

– EPFL acquired two AmpliStor systems in Fall 2011 (1.2PB)

– Low-power packaging (1U/20TB appliances, 3 Watts per TB)

– 45 years of Montreux Jazz festival data loaded onto AmpliStor from LTO tapes

– LTO will be maintained during initial phases

• Addressed the 3 main objectives

– Save the recordings in a secure archive (static archive)

– Make the archive available for cultural and scientific projects (live archive)

– Scale and maintain the archives

– Enable end-user access in a series of Montreux Jazz Cafés
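The packaging figures on this slide allow a rough sizing exercise. The sketch below assumes the 1.2 PB is raw capacity across both systems, that every appliance holds 20 TB, and that the 3 W/TB figure covers the whole system; it ignores controller nodes, networking, cooling, and erasure-coding overhead (so usable capacity would be lower). The function names are hypothetical.

```python
import math


def appliances_needed(capacity_tb: float, tb_per_appliance: float = 20.0) -> int:
    """Whole 1U appliances needed to reach `capacity_tb` of raw capacity."""
    return math.ceil(capacity_tb / tb_per_appliance)


def power_watts(capacity_tb: float, watts_per_tb: float = 3.0) -> float:
    """Steady-state power draw implied by a flat watts-per-TB figure."""
    return capacity_tb * watts_per_tb


def annual_kwh(watts: float, hours_per_year: float = 24 * 365) -> float:
    """Energy used in a year of continuous operation, in kWh."""
    return watts * hours_per_year / 1000.0


if __name__ == "__main__":
    capacity = 1200.0  # 1.2 PB expressed in TB (decimal units)
    print(appliances_needed(capacity))        # 60
    print(power_watts(capacity))              # 3600.0
    print(annual_kwh(power_watts(capacity)))  # 31536.0
```

Under these assumptions, 1.2 PB corresponds to about 60 appliances (60U of rack space) drawing roughly 3.6 kW, or about 31,500 kWh per year of continuous operation.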


Object Storage Simplifies the Problem

• File System organization of data becomes a burden

• File systems impose limitations on numbers of files & directories

• Very time-consuming to organize data

• Object Storage simplifies this problem

• Flat “Namespaces” (not file systems) - without storage limits

• Lets the applications talk directly to the storage

• Use “Object” application APIs to let applications directly manage objects & metadata

• File Gateways can be used as a transition bridge

• Bring legacy data and apps into Object Storage

[Figure: multiple applications talking directly to storage through an Object API]
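The contrast between a directory tree and a flat namespace can be sketched with a toy in-memory store. `FlatObjectStore` and its `put`/`get`/`head`/`find` methods are hypothetical names for illustration, not AmpliStor's API or any real object-storage protocol; the metadata fields in the example are also made up. The point is only that objects are addressed by a key in a single namespace and carry their own metadata, so applications locate data by querying metadata rather than by walking folders.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class FlatObjectStore:
    """Toy in-memory object store with one flat namespace (hypothetical API).

    Objects are addressed by an opaque key rather than a directory path, and
    each object carries its own application-defined metadata.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, data: bytes, metadata: Optional[Dict[str, str]] = None,
            key: Optional[str] = None) -> str:
        """Store `data`; generate a random key if the caller does not supply one."""
        key = key if key is not None else uuid.uuid4().hex
        self._objects[key] = StoredObject(bytes(data), dict(metadata or {}))
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key].data

    def head(self, key: str) -> Dict[str, str]:
        """Return a copy of the object's metadata without touching its data."""
        return dict(self._objects[key].metadata)

    def find(self, **criteria: str) -> List[str]:
        """Keys of all objects whose metadata matches every given field exactly."""
        return sorted(
            key for key, obj in self._objects.items()
            if all(obj.metadata.get(name) == value for name, value in criteria.items())
        )

    def delete(self, key: str) -> None:
        del self._objects[key]


if __name__ == "__main__":
    store = FlatObjectStore()
    key = store.put(b"<video bytes>", {"collection": "montreux", "format": "HD"})
    print(store.head(key))
    print(store.find(format="HD") == [key])  # True
```

Because there are no directories, there is no per-directory file limit to manage; in a real system the dictionary would be replaced by a distributed, durable metadata service.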


Optimized Object Storage

• Single highly-durable disk tier with:

– Scale-out architecture with dedicated controllers & storage nodes

– Low power “Green” SATA disks, low-power processors in storage nodes

– Erasure coding architecture utilizes the entire storage pool
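The slides do not describe how BitSpread encodes data, so the sketch below uses the simplest possible erasure code instead: one XOR parity fragment over k data fragments (the same idea as RAID 5). Each of the k + 1 fragments could live on a different storage node, and any single lost fragment can be rebuilt from the survivors. Production erasure codes such as Reed-Solomon add several redundancy fragments so they survive multiple simultaneous failures; the function names here are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def split(data: bytes, k: int) -> List[bytes]:
    """Split `data` into k equal-length fragments, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Return k data fragments plus one XOR parity fragment (k + 1 total)."""
    fragments = split(data, k)
    return fragments + [reduce(xor, fragments)]


def decode(fragments: List[Optional[bytes]], k: int, length: int) -> bytes:
    """Rebuild the original bytes when at most one fragment is missing (None).

    The XOR of all k + 1 fragments is zero, so any one missing fragment equals
    the XOR of the k survivors.
    """
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost fragment")
    fragments = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor, survivors)
    return b"".join(fragments[:k])[:length]  # drop parity and padding


if __name__ == "__main__":
    original = b"Montreux Jazz Festival archive"
    stored: List[Optional[bytes]] = list(encode(original, k=4))
    stored[2] = None  # simulate a failed storage node
    print(decode(stored, k=4, length=len(original)) == original)  # True
```

Losing any one of the five fragments, data or parity, still returns the original bytes; losing two raises an error, which is the gap that codes with more redundancy fragments close.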


Petabyte Scalability and Beyond

Systems scalability

• Metadata support for millions to billions of objects

• Incremental capacity growth with no disruption or downtime

Manageability at higher scale

• Add resources incrementally and grow as a Single System View

• Manage from a “Single Pane of Glass”

• Scale performance and capacity separately

• Migration and seamless growth across newer generations of component technologies (processors, disk densities)


Ultra-High Levels of Data Integrity

• Data needs to be archived for lifetimes

– Expect “bit perfect” integrity to store gold-copy of critical assets

– Consolidate multiple copies of data into a single highly-durable tier

• Ensuring the integrity of long-term unstructured data archive requires new data protection algorithms, to:

– Address the increasing capacity of disk drives

– Solve issues related to long RAID rebuild windows

“Object storage systems based on erasure-coding can not only protect data from higher numbers of drive failures, but also against the failure of entire storage modules and data corruption.”
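The quoted claim can be made concrete with a little arithmetic. The slides do not say which parameters AmpliStor uses, so the 10+6 code below is an arbitrary, hypothetical example compared against 3-way replication. The sketch assumes an MDS code (any k of the n = k + m fragments rebuild the object, as with Reed-Solomon) and that each fragment sits on a different drive or storage module.

```python
from fractions import Fraction
from typing import NamedTuple


class Protection(NamedTuple):
    name: str
    fragments: int           # pieces written per object
    tolerated_failures: int  # pieces that can be lost without losing data
    overhead: Fraction       # raw capacity consumed per unit of user data


def replication(copies: int) -> Protection:
    """Full copies: any one copy suffices, so copies - 1 losses are survivable."""
    return Protection(f"{copies}-way replication", copies, copies - 1, Fraction(copies))


def erasure_code(k: int, m: int) -> Protection:
    """k data + m redundancy fragments; assumes any k of the k + m rebuild the object."""
    return Protection(f"{k}+{m} erasure code", k + m, m, Fraction(k + m, k))


def raw_capacity_tb(usable_tb: float, scheme: Protection) -> float:
    """Raw disk capacity needed to hold `usable_tb` of user data."""
    return usable_tb * scheme.overhead.numerator / scheme.overhead.denominator


if __name__ == "__main__":
    for scheme in (replication(3), erasure_code(10, 6)):
        print(scheme.name, scheme.tolerated_failures,
              float(scheme.overhead), raw_capacity_tb(1000, scheme))
```

Under these assumptions the 10+6 layout survives six lost drives or modules while consuming 1.6x raw capacity, versus two failures at 3x for triple replication, which is how a single erasure-coded tier can replace multiple copies of the data.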


The future

• Next steps

– Grow the archive as digitization progresses

– Use the system in EPFL education projects

– Duplicate the system for the Montreux organization

– DVD and TV footage production

• Streaming in Montreux Jazz Cafés

– Zurich, Geneva and Paris today

– More planned (New York, Sydney, Tokyo)


Thank You

Q&A
