![Page 1: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/1.jpg)
Timed Text at Netflix
Rohit Puri ([email protected])
Engineering Manager, Video Systems
Digital Supply Chain, Netflix
February 19, 2016
![Page 2: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/2.jpg)
The Netflix Content Processing System
• Video systems team develops cloud-scalable systems and tools
  – Audio/Video: ingestion/inspections and packaging/DRM
    • e.g., IMF, QuickTime, and MP4-DASH
  – Timed Text: entire processing pipeline
    • e.g., W3C TTML, W3C WebVTT
[Diagram: source from content partner, then Ingestion and Inspections, Transcoding, and Packaging and DRM, ending with output downloadable to CDN]
![Page 3: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/3.jpg)
Acknowledgements
• Dae Kim ([email protected])
• Shinjan Tiwary ([email protected])
• Harold Sutherland ([email protected])
• David Ronca ([email protected])
• Glenn Adams (Skynav Inc.) - member W3C Timed Text Working Group (TTWG)
![Page 4: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/4.jpg)
Netflix after January 6, 2016
• > 75 million subscribers
• ~ 190 countries
• > 12,000,000,000 hours streamed in Q4 2015
• 20+ languages
![Page 5: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/5.jpg)
Talk Outline
• History and Legacy Workflow
• New Workflow
• Standards Activity and Roadmap
![Page 7: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/7.jpg)
A Brief History of Timed Text
• Latin Alphabet (2014 and before)
  – First subtitles delivered in 2009. Bottom-centered, yellow, italics and underline options (dfxp-simplesdh)
  – Follow-up 2012 offering. Multiple colors, background, outline, generic font family, font size, position information (dfxp-ls-sdh) (ls => ‘less simple’)
  – First WebVTT output profile was created in 2013
• Global Subtitles (2015+)
  – TTML2 (Timed Text Markup Language) based Japanese subtitles in 2015
  – Bidirectional text with worldwide launch in January 2016
![Page 8: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/8.jpg)
Legacy Workflow (2014 and before)
• Two-step procedure
  – source inspection
  – source conversion to output
• Sources
  – CEA-608 based Scenarist Closed Captions (.scc)
  – EBU Subtitling data exchange format (.stl)
  – SubRip (.srt)
  – Timed Text Markup Language 1 (.ttml, .dfxp)
• Outputs
  – feature-restricted TTML1 outputs
  – feature-restricted WebVTT outputs
![Page 9: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/9.jpg)
Legacy Workflow: Inspections
[Diagram: .scc, .stl, .ttml, and .srt sources each pass through a format-specific parser with syntax inspections (SCC, STL, TTML1, SRT) into a proprietary canonical model, which then undergoes semantic inspections]
![Page 10: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/10.jpg)
Legacy Workflow: Conversions
[Diagram: .scc, .stl, .ttml, and .srt sources are read by the SCC, STL, TTML1, and SRT parsers into a proprietary canonical model, from which a TTML writer and a WebVTT writer generate the outputs]
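The parser, canonical model, and writer arrangement on this slide can be illustrated with a minimal sketch. It is not the Netflix implementation: the `Cue` model, `parse_srt`, and `write_webvtt` are hypothetical names, the model carries timing and text only, and the SRT handling ignores styling and positioning.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """One timed-text event in a hypothetical, format-neutral model."""
    start_ms: int
    end_ms: int
    text: str


# SRT timing line, e.g. "00:00:01,000 --> 00:00:02,500"
_SRT_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(source: str) -> List[Cue]:
    """Parse SRT text into the model; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _SRT_TIME.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:])
                cues.append(Cue(_to_ms(*g[:4]), _to_ms(*g[4:]), text))
                break
    return cues


def _vtt_time(ms: int) -> str:
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def write_webvtt(cues: List[Cue]) -> str:
    """Serialize the model as a bare-bones WebVTT document."""
    parts = ["WEBVTT"]
    for cue in cues:
        timing = f"{_vtt_time(cue.start_ms)} --> {_vtt_time(cue.end_ms)}"
        parts.append(f"{timing}\n{cue.text}")
    return "\n\n".join(parts) + "\n"
```

Because every parser targets the same model, adding a source format means writing one parser rather than one converter per output.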
![Page 12: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/12.jpg)
Japanese Subtitles
• Essential Features
  – vertical text
  – ruby annotations
  – horizontal-in-vertical
  – Unicode
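To make the features above concrete, here is a small sketch that builds a TTML2-style fragment with Python's standard library. The attribute names (`tts:writingMode`, `tts:ruby`, `tts:textCombine`) follow my reading of the TTML2 styling vocabulary and should be checked against the specification; the sample text, region id, and `build_sample` helper are assumptions and do not represent any Netflix profile.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTS = "http://www.w3.org/ns/ttml#styling"
XML = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TT)
ET.register_namespace("tts", TTS)


def q(namespace: str, name: str) -> str:
    """Return a namespace-qualified name in ElementTree's {uri}name form."""
    return f"{{{namespace}}}{name}"


def build_sample() -> ET.Element:
    tt = ET.Element(q(TT, "tt"), {q(XML, "lang"): "ja"})
    layout = ET.SubElement(ET.SubElement(tt, q(TT, "head")), q(TT, "layout"))
    # Vertical text: lines run top to bottom, columns right to left.
    ET.SubElement(layout, q(TT, "region"),
                  {q(XML, "id"): "right", q(TTS, "writingMode"): "tbrl"})
    div = ET.SubElement(ET.SubElement(tt, q(TT, "body")), q(TT, "div"))
    p = ET.SubElement(div, q(TT, "p"),
                      {"region": "right", "begin": "00:00:01.000",
                       "end": "00:00:03.000"})
    # Ruby annotation: base text with its reading attached.
    ruby = ET.SubElement(p, q(TT, "span"), {q(TTS, "ruby"): "container"})
    base = ET.SubElement(ruby, q(TT, "span"), {q(TTS, "ruby"): "base"})
    base.text = "漢字"
    reading = ET.SubElement(ruby, q(TT, "span"), {q(TTS, "ruby"): "text"})
    reading.text = "かんじ"
    ruby.tail = "は"
    # Horizontal-in-vertical: the digits are set side by side in one cell.
    combine = ET.SubElement(p, q(TT, "span"), {q(TTS, "textCombine"): "all"})
    combine.text = "12"
    combine.tail = "個"
    return tt


if __name__ == "__main__":
    print(ET.tostring(build_sample(), encoding="unicode"))
```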
![Page 13: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/13.jpg)
Japanese Subtitles Challenges
• New source format - Videotron Lambda (.cap)
  – inaccessible specification
  – ambiguity in format
  – non-interoperable vendor implementations
• TTML1 does not support essential Japanese features; TTML2 needed
• Netflix device SDK did not support essential rendering features
  – delivery of image-based subtitles
  – significant investment in open source software (OSS) tools
![Page 14: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/14.jpg)
Japanese Subtitles Conversion Workflow (1-off)
• .cap sources converted to TTML2, and then to image subtitles or WebVTT (the WebVTT specification appears incomplete w.r.t. Japanese)
• Image subtitles archive contains .png images + manifest with timing and positioning information
• Green modules available as OSS (https://github.com/skynav/ttt)
[Diagram: modules cap2tt, TTX, TTPE, WebVTT writer, and Archiver; intermediate artifacts TTML2, TTML2 ISD, and .png images]
![Page 15: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/15.jpg)
New Workflow: Inspections
[Diagram: an input file goes through source format and charset detection, then to the matching parser with syntax inspections (SCC, STL, SRT, CAP, or the TTML family: TTML1, TTML2, IMSC1), into a TTML2-based canonical model that undergoes semantic inspections and is serialized to disk]
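The first stage of this workflow, source format and charset detection, might look roughly like the sketch below. It is an illustration under stated assumptions rather than the actual detector: the function names, the extension table, the format labels, and the fallback order are all hypothetical, and only byte-order marks plus a UTF-8 trial decode are used for the charset.

```python
import codecs
from pathlib import PurePath

# UTF-32 must be tested before UTF-16: the UTF-32 LE mark starts with
# the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Hypothetical mapping from file extension to a source-format label.
_EXTENSIONS = {
    ".scc": "SCC",
    ".stl": "EBU-STL",
    ".srt": "SRT",
    ".cap": "LambdaCAP",
    ".ttml": "TTML",
    ".dfxp": "TTML",
}


def detect_charset(data: bytes) -> str:
    """Guess a Python codec name from a byte-order mark or a UTF-8 trial decode."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"


def detect_format(filename: str, data: bytes) -> str:
    """Prefer a content signature where one is known, else use the extension."""
    if data.startswith(b"Scenarist_SCC V1.0"):
        return "SCC"
    return _EXTENSIONS.get(PurePath(filename).suffix.lower(), "unknown")
```

A production detector would need legacy encodings and content sniffing for every format; binary sources such as EBU-STL declare their own character code table, so the charset guess does not apply to them.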
![Page 16: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/16.jpg)
New Workflow: Conversions
• Configurable filter-based architecture
• Bank of model-domain filters
• Output writer generates text or images
[Diagram: the deserialized model passes through the model-domain filter chain (Filter 1 to Filter N), then through the output generator (output-specific filters and output writer) to produce the output file]
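A configurable chain of model-domain filters can be sketched as follows. This is one plausible shape for such an architecture, not the design described in the talk: the `Cue` model, the filter names, the `FILTERS` registry, and the configuration format are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Cue:
    """Hypothetical model element: timing plus text."""
    start_ms: int
    end_ms: int
    text: str


Model = List[Cue]
Filter = Callable[[Model], Model]


def drop_empty() -> Filter:
    """Filter factory: remove cues whose text is blank."""
    return lambda model: [cue for cue in model if cue.text.strip()]


def shift(offset_ms: int) -> Filter:
    """Filter factory: move every cue by a fixed offset."""
    def apply(model: Model) -> Model:
        return [replace(cue, start_ms=cue.start_ms + offset_ms,
                        end_ms=cue.end_ms + offset_ms) for cue in model]
    return apply


# The "bank" of filters, addressable by name from configuration.
FILTERS: Dict[str, Callable[..., Filter]] = {
    "drop_empty": drop_empty,
    "shift": shift,
}


def build_chain(config: List[dict]) -> Filter:
    """Compose the configured filters, applied in the order listed."""
    filters = [FILTERS[step["name"]](**step.get("args", {})) for step in config]

    def run(model: Model) -> Model:
        for apply_filter in filters:
            model = apply_filter(model)
        return model
    return run


if __name__ == "__main__":
    chain = build_chain([
        {"name": "drop_empty"},
        {"name": "shift", "args": {"offset_ms": 500}},
    ])
    print(chain([Cue(0, 1000, "Hi"), Cue(1000, 2000, "  ")]))
```

Each filter takes a model and returns a model, so an output profile becomes a list of filter names and arguments followed by a writer.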
![Page 17: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/17.jpg)
Global Subtitles
• i18n-grade timed text processing workflow
• All languages of this world (+Klingon)
• New TTML2-based output profile (“nflx-ttml-gsdh”)
  – vertical text
  – horizontal-in-vertical
  – ruby annotations
  – bidirectional text
• Both text and image delivery options
![Page 19: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/19.jpg)
Our Experience with Source Formats
• SCC
  – limited to Latin alphabet
  – non-standard use of SMPTE timecode
• EBU-STL
  – dated¹, ambiguous, non-interoperable industry practices
  – no support for Asian character set
• SRT
  – Too simple - positioning information not used in practice
• LambdaCAP
  – ambiguous format - hard to find official specification
• TTML1
  – not self-contained - needs “Document Processing Context”
  – no support for Japanese rendering features
¹ “The medium for exchange is a 3.5-inch high-density portable magnetic disk (microfloppy). The disk is formatted for 1.44 Mbytes (2 sides, 80 tracks, 18 sectors/track).”
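As background for the SCC timecode point, the sketch below converts a 29.97 fps drop-frame SMPTE timecode to elapsed media time. It shows only the standard drop-frame arithmetic; how a particular SCC file actually uses its timecodes is the ambiguity the slide refers to, and the function name and the assumption of drop-frame input are mine.

```python
from fractions import Fraction


def drop_frame_to_seconds(timecode: str) -> Fraction:
    """Convert an HH:MM:SS;FF drop-frame timecode (29.97 fps) to seconds.

    Drop-frame counting skips frame numbers 00 and 01 at the start of
    every minute, except minutes divisible by ten.
    """
    hh, mm, ss, ff = (int(part) for part in timecode.replace(";", ":").split(":"))
    total_minutes = hh * 60 + mm
    dropped = 2 * (total_minutes - total_minutes // 10)
    frame_count = (total_minutes * 60 + ss) * 30 + ff - dropped
    # Each frame lasts 1001/30000 seconds at 29.97 fps.
    return Fraction(frame_count * 1001, 30000)


if __name__ == "__main__":
    print(float(drop_frame_to_seconds("01:00:00;00")))
```

Treating the same digits as non-drop-frame at 29.97 fps would instead place 01:00:00:00 at 3603.6 seconds, which is why a converter has to know which convention a source follows.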
![Page 20: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/20.jpg)
TTWG Standards
• IMSC1 (Internet Media Subtitles and Captions) is a TTML1-based W3C Candidate Recommendation; mandatory for IMF
  – multiple Netflix-sponsored implementations were announced to TTWG on February 1, 2016
  – Netflix plans 100% support for IMF
  – Netflix ingest implementation will support IMSC-T (text profile)
• We have multiple TTML2 implementations in development - will support TTML2-based IMSC2
• Netflix is enthusiastic about HTMLCue
![Page 21: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/21.jpg)
Open Source Activity
• regxmllib (https://github.com/sandflow/regxmllib)
  – sponsored by Netflix
  – tools that provide essential building blocks for authoring of IMF CPL
• ttt (https://github.com/skynav/ttt)
  – sponsored by Netflix
  – tools for validation and rendering of TTML1/2
• photon (https://github.com/Netflix/photon) (December 2015)
  – developed at Netflix
  – complete set of tools for validation of IMF packages
![Page 22: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/22.jpg)
Talk Summary
• Subtitles experience core to Netflix business
• Netflix committed to TTWG standards
  – multiple IMSC1 and TTML2 implementations declared or in flight
• Netflix plans to be 100% IMF
• Netflix is actively involved in OSS efforts around timed text as well as IMF
![Page 23: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/23.jpg)
2015 Tech Emmy for Netflix
• Netflix was a co-recipient of a 2015 Technology and Engineering Emmy Award for “Standardization and Pioneering Development of Non-Live Broadcast Captioning”
![Page 24: Timed Text At Netflix](https://reader031.vdocuments.net/reader031/viewer/2022012305/58aa2e471a28abbb108b52a3/html5/thumbnails/24.jpg)
Questions?
• Rohit Puri ([email protected])
• https://www.linkedin.com/in/rohit-puri-0a13b02