1
Introduction to Archiving Movies in a Digital World
Dave Cavena, Sun Microsystems
January, 2007
2
Agenda
Overview
Archiving
Archived content integrity
Proposed model
Costs
Alternatives?
Summary
Conclusion
3
Overview
Has the time come to begin archiving movies digitally?
Only archiving remains reliant on film
Digital image archive technology is mature
A viable, scalable, cost-effective COTS model
What are the alternatives?
4
Archiving
The stories of an Age
Fiduciary responsibility
A Digital Content Archive can store these assets without degradation, forever
5
Archiving
Any movie archived in 1907 is playable in 2007
Will a celluloid movie archived in 2007 be playable in 2107?
Is it time to start digital archiving of this irreplaceable content?
6
Archiving
Will the Archive be the only time the story exists on film?
What are celluloid archive and repurposing costs?
A Digital Content Archive provides image and cost advantages over celluloid
Can be accomplished with COTS Technology
7
Archived Content Integrity
Irreplaceable content
Multiple copies
Multiple libraries
Automated audit, copy
Algorithmic assurance of bit integrity
Error Correction Codes (ECC)
Bit Error Detection
Bit Error Correction
8
Archived Content Integrity
ECC standard on tape drives
COTS technology
Bit Error Rates*
Bit Error Rates (BER) differ by manufacturer
ECC undetected BER = $10^{-33}$
Four copies = $10^{-128}$
ECC uncorrected BER = $10^{-19}$
Four copies = $10^{-76}$
10TB Digital Intermediate = $10^{14}$ bits
One uncorrectable bit error in $10^{62}$ movies ($10^{-76} \times 10^{14}$)
* Sun T10000 drive
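The four-copy figures follow from treating each copy as failing independently, so the per-copy rates multiply. Below is a minimal Python sketch of that arithmetic, working in base-10 exponents to avoid floating-point underflow. The function names are illustrative, and independence between copies is an assumption. Note that raising $10^{-33}$ to the fourth power gives $10^{-132}$, slightly smaller than the four-copy undetected figure quoted on the slide.

```python
def combined_exponent(per_copy_exponent: int, copies: int) -> int:
    """Base-10 exponent of the chance that every copy fails on the same bit,
    assuming independent failures: (10**e)**n == 10**(e*n)."""
    return per_copy_exponent * copies


def movies_per_error_exponent(ber_exponent: int, bits_exponent: int) -> int:
    """Exponent k such that one bit error is expected per 10**k movies."""
    return -(ber_exponent + bits_exponent)


UNCORRECTED = -19  # per-copy uncorrected BER exponent quoted on the slide
UNDETECTED = -33   # per-copy undetected BER exponent quoted on the slide
MOVIE_BITS = 14    # 10TB Digital Intermediate is roughly 10**14 bits

four_uncorrected = combined_exponent(UNCORRECTED, 4)
four_undetected = combined_exponent(UNDETECTED, 4)
movies_exponent = movies_per_error_exponent(four_uncorrected, MOVIE_BITS)

print(f"four-copy uncorrected BER: 10^{four_uncorrected}")
print(f"four-copy undetected BER:  10^{four_undetected}")
print(f"one uncorrectable bit error per 10^{movies_exponent} movies")
```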
9
Archived Content Integrity
Generational data integrity
20 generations of compute/disk front-end
5 generations of libraries
Unknown generations of application file formats
At least 12 rewrites of the content onto new media
What is the generational impact on the algorithmic BER?
10
Archived Content Integrity
For this application it doesn’t matter how many times the data is accessed or how many generations of rewrite occur
Probability that the ECC will fail to correct damage during any given access is $10^{-19}$.
The probability it will fail one or more times during N accesses is 1 minus the probability that it will succeed N times in a row:
$1 - (1 - 10^{-19})^N$
For N much less than $10^{19}$, this is well approximated by
$N \times 10^{-19}$
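The approximation can be checked numerically. The sketch below compares the exact expression with the linear approximation for one million accesses. It is only an illustration: the function name is hypothetical, and `math.log1p` and `math.expm1` are used because the direct formula loses all precision in double-precision arithmetic when the per-access probability is this small.

```python
import math


def failure_probability(p: float, n: float) -> float:
    """Exact 1 - (1 - p)**n, evaluated stably for very small p."""
    return -math.expm1(n * math.log1p(-p))


p = 1e-19  # per-access chance the ECC fails to correct damage
n = 1e6    # number of accesses

exact = failure_probability(p, n)
approx = n * p
# Direct evaluation: 1 - p rounds to exactly 1.0 in double precision,
# so the naive formula collapses to zero.
naive = 1 - (1 - p) ** n

print(f"exact  = {exact:.6e}")
print(f"approx = {approx:.6e}")
print(f"naive  = {naive:.6e}")
```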
11
Archived Content Integrity
Example
Assume a movie accessed one million times
The chance of an uncorrectable bit error per read is $10^{-19}$
The chance of an uncorrectable bit error on any one of $10^{6}$ reads is $10^{6} \times 10^{-19} = 10^{-13}$
For a single copy
It reasonably can be assumed for the purposes of this application that the ability to detect and correct errors in transcription is perfect.
12
Archived Content Integrity
Other Strategies
Secure Hash Algorithm, SHA-256*
Checksum failure probability of $2^{-256}$, or approximately $10^{-77}$
Four-copy BER = $10^{-308}$
One undetected bit loss in $10^{294}$ movies
Birthday collisions don’t apply; not defending against traffic analysis, just using it as a good checksum
Voting bit-by-bit
Can make a 10TB DCDM into 40 1TB files, 31 of which would have to be damaged to preclude rebuilding the original
* Developed by the NSA, publicly available, peer-reviewed, easy to implement
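Two of these strategies can be sketched with the Python standard library: auditing each copy against a SHA-256 digest recorded at ingest, and rebuilding content by voting across copies. This is a small in-memory illustration, not an archive implementation. The function names and sample data are hypothetical, the vote is taken per byte rather than per bit for brevity, and a real audit would hash tape-sized files in chunks. The 40-file rebuild scheme is not shown.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def audit(copies: list[bytes], expected_digest: str) -> list[bool]:
    """True for each copy whose digest matches the one recorded at ingest."""
    return [sha256_hex(copy) == expected_digest for copy in copies]


def majority_vote(copies: list[bytes]) -> bytes:
    """Rebuild content from equal-length copies by taking the most common
    byte value at each position."""
    return bytes(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )


original = b"frame data " * 4        # stand-in for archived content
digest = sha256_hex(original)        # recorded when the content is ingested

damaged = bytearray(original)
damaged[3] ^= 0x01                   # flip a single bit in one copy
copies = [original, bytes(damaged), original, original]

audit_result = audit(copies, digest)
rebuilt = majority_vote(copies)

print(audit_result)
print(sha256_hex(rebuilt) == digest)
```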
13
Archive Model
Enterprise class tape library
Front-end server and disk
Ingest and prepare Archive Object for writing to tape library
Hierarchical Storage Manager, HSM
Two complete and identical systems, geographically separate
Two copies of each movie on each library
14
Archive Model
Computers and disk front-ends reach EOSL
5-yr replacement
Tape drives reach EOSL
10-yr replacement
Libraries reach EOSL
20-yr replacement
Tape media has a finite lifetime*
Replace tapes every 10 years
Audit every tape every six months
Re-write from pristine copies as necessary
*National Media Lab, IBM, Sun, others, publish 30 years as viable tape media lifetime
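The replacement intervals can be turned into generation counts over the life of the archive. The sketch below assumes the 100-year horizon used in the cost slides, and the variable names are illustrative. Scheduled media replacement alone accounts for 10 rewrites; any rewrites triggered by audits would add to that.

```python
HORIZON_YEARS = 100  # assumed archive horizon, taken from the cost slides

replacement_interval_years = {
    "compute/disk front-end": 5,
    "tape drives": 10,
    "libraries": 20,
    "tape media": 10,
}

# Number of hardware or media generations over the horizon.
generations = {
    item: HORIZON_YEARS // interval
    for item, interval in replacement_interval_years.items()
}

AUDITS_PER_YEAR = 2  # every tape is audited every six months
audits_over_horizon = HORIZON_YEARS * AUDITS_PER_YEAR

for item, count in generations.items():
    print(f"{item}: {count} generations")
print(f"audit passes over {HORIZON_YEARS} years: {audits_over_horizon}")
```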
15
Archive Model
Application software and file formats
Proposed archive model HSM uses an open tarball format, readable even without the application
When a tape is audited, rewritten or copied, the new copy can be created in the new file format
This is feasible because the underlying data format remains digitally fixed; only the file format and/or storage medium change
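The practical value of an open tar container is that any standard tar reader can recover the content. The sketch below uses Python's `tarfile` module as a stand-in to write a small archive object and read it back; it is not the HSM's actual on-tape layout, and the member names and payloads are hypothetical.

```python
import io
import tarfile


def write_archive_object(members: dict[str, bytes]) -> bytes:
    """Pack named byte strings into an uncompressed tar stream."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_archive_object(blob: bytes) -> dict[str, bytes]:
    """Recover every regular file using only the standard tar reader."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        return {
            member.name: tar.extractfile(member).read()
            for member in tar.getmembers()
            if member.isfile()
        }


# Hypothetical frame files standing in for a movie's image data.
frames = {
    "reel1/frame0001.dpx": b"\x00\x01",
    "reel1/frame0002.dpx": b"\x02\x03",
}
blob = write_archive_object(frames)
recovered = read_archive_object(blob)
print(sorted(recovered))
```

Because the payload bytes are unchanged by the container, the same content could later be repacked into a newer container format during a scheduled rewrite.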
16
Archive Model
Institutional memory must be created
Two or more sites are required, geographically separate
No network connectivity
Archive content in the clear
Same as current model
Lost key or algorithm will render archive useless
Can be encrypted for transport (tape drive HW encryption becoming the norm)
When copying tapes, send old ones to another location
17
Archive Model
Oil & Gas has been archiving digital images for decades
Medical is doing this with far higher transaction rates
Library of Congress doing it now
"Storing National Treasures" http://www.enterprisestorageforum.com/sans/features/article.php/3586066
"Sun Rises at the Library of Congress" http://www.enterprisestorageforum.com/sans/features/article.php/3619646
18
Costs
Can digital compete with celluloid?
Film archiving cost
$100K / 100 years / feature
2,000 movies = $200M
10TB archive object, 20 objects/year, 100 years
$45,000/movie (list)
$16,000/movie (Archive pricing)
2,000 movies = $32M
100TB archive object
$67,000/movie (Archive pricing)
2,000 movies = $79M
19
Costs
10TB Archive Object – List price
Chart: Dollars vs. Movies in Archive
Movies in Archive    Dollars
10                   $2,601,160
100                  $408,631
500                  $114,218
1000                 $73,094
1500                 $57,330
2000                 $45,493
20
Costs
10TB Archive Object – List price
(both libraries, two copies/movie/library)
Description                  Cost           Share
Compute/Disk + mtce          $3,200,000     4%
Library and Drives + mtce    $16,112,000    18%
Media                        $2,699,000     3%
SAM License + mtce           $68,974,400    75%
Total                        $90,985,400
21
Costs
10TB Archive Object – Archive price
(both libraries, two copies/movie/library)
Description                  Cost           Share
Compute/Disk + mtce          $3,200,000     10%
Library and Drives + mtce    $16,112,000    50%
Media                        $2,699,000     8%
SAM License + mtce           $10,519,184    32%
Total                        $32,530,184
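The totals, shares and per-movie figures for the 10TB archive-price case can be reproduced from the component costs. The sketch below uses only numbers from the slides (2,000 movies from 20 objects per year over 100 years, and the $100K per-feature film figure); the variable names are illustrative.

```python
ARCHIVE_PRICE_10TB = {
    "Compute/Disk + mtce": 3_200_000,
    "Library and Drives + mtce": 16_112_000,
    "Media": 2_699_000,
    "SAM License + mtce": 10_519_184,
}

MOVIES = 20 * 100                # 20 archive objects per year for 100 years
FILM_COST_PER_FEATURE = 100_000  # celluloid archiving cost per feature

total = sum(ARCHIVE_PRICE_10TB.values())
per_movie = total / MOVIES
# Each component's share of the total, rounded to a whole percent.
shares = {
    name: round(100 * cost / total)
    for name, cost in ARCHIVE_PRICE_10TB.items()
}
film_total = FILM_COST_PER_FEATURE * MOVIES

print(f"digital total: ${total:,}")
print(f"per movie:     ${per_movie:,.0f}")
print(f"shares (%):    {shares}")
print(f"film total:    ${film_total:,}")
```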
22
Costs
10TB Archive Object – Archive price
Chart: Dollars vs. Movies in Archive
Movies in Archive    Dollars
10                   $2,005,850
100                  $236,852
500                  $54,072
1000                 $29,706
1500                 $21,259
2000                 $16,265
23
Costs
100TB Archive Object – List price
Chart: Dollars vs. Movies in Archive
Movies in Archive    Dollars
20                   $6,488,535
100                  $5,006,303
500                  $3,791,715
1000                 $3,180,640
1500                 $2,641,742
2000                 $2,121,252
24
Costs
100TB Archive Object – List price
(both libraries, two copies/movie/library)
Description                  Price             Share
Compute/Disk + Mtce          $6,400,000        0%
Library and Drives + Mtce    $28,133,000       1%
Media                        $26,128,000       1%
SAM License                  $80,612,300       2%
SAM Mtce                     $4,101,230,000    96%
Total                        $4,242,503,300
25
Costs
100TB Archive Object – Archive price
(both libraries, two copies/movie/library)
Description                  Price          Share
Compute/Disk + Mtce          $6,400,000     8%
Library and Drives + Mtce    $28,133,000    36%
Media                        $26,128,000    33%
SAM License                  $8,061,230     10%
SAM Mtce                     $10,499,149    13%
Total                        $79,221,379
26
Costs
100TB Archive Object – Archive price
Chart: Dollars vs. Movies in Archive
Movies in Archive    Dollars
20                   $1,462,466
100                  $505,009
500                  $170,432
1000                 $112,159
1500                 $85,527
2000                 $67,171
27
Alternatives
An unaddressed question… Does celluloid have a future – at all?
Replaced by commercial photographers globally
Precipitous drop in market share and manufacturer jobs
Environmentally unfriendly to manufacture and process
Celluloid may not be an option
Film may not even exist in 100 years
Film infrastructure – labs, chemicals, workers, etc. – may not exist
28
Summary
The technology required to store and maintain irreplaceable digital image content for archive durations is mature, proven and in use today
A Digital Content Archive will extend the quick responsiveness of a studio’s Library to the Archive
The return on these increasingly expensive assets easily can be extended – forever
… all using COTS technology
29
Conclusion
The pivotal and immutable point is that this can be done beginning today.
The experience Sun brings to the project has already been recognized by, and is being broadened by, the Library of Congress and other locations around the world undertaking the digitization of their media assets using solutions from Sun Microsystems.
The time is now to begin serious efforts to test and implement studio Digital Content Archives.