international iso/iec standard 14496-12 · 2018-06-07 · electronic or mechanical, including...
Reference number
ISO/IEC 14496-12:2015(E)
© ISO/IEC 2015
INTERNATIONAL STANDARD
ISO/IEC 14496-12
Fifth edition 2015-12-15
Information technology — Coding of audio-visual objects —
Part 12: ISO base media file format
Technologies de l'information — Codage des objets audiovisuels —
Partie 12: Format ISO de base pour les fichiers médias
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2015
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.
ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
E-mail [email protected]
Published in Switzerland
Contents
1 Scope
2 Normative references
3 Terms, definitions, and abbreviated terms
3.1 Terms and definitions
3.2 Abbreviated terms
4 Object-structured File Organization
4.1 File Structure
4.2 Object Structure
4.3 File Type Box
5 Design Considerations
5.1 Usage
5.1.1 Introduction
5.1.2 Interchange
5.1.3 Content Creation
5.1.4 Preparation for streaming
5.1.5 Local presentation
5.1.6 Streamed presentation
5.2 Design principles
6 ISO Base Media File organization
6.1 Presentation structure
6.1.1 File Structure
6.1.2 Object Structure
6.1.3 Meta Data and Media Data
6.1.4 Track Identifiers
6.2 Metadata Structure (Objects)
6.2.1 Box
6.2.2 Data Types and fields
6.2.3 Box Order
6.2.4 URIs as type indicators
6.3 Brand Identification
7 Streaming Support
7.1 Handling of Streaming Protocols
7.2 Protocol ‘hint’ tracks
7.3 Hint Track Format
8 Box Structures
8.1 File Structure and general boxes
8.1.1 Media Data Box
8.1.2 Free Space Box
8.1.3 Progressive Download Information Box
8.2 Movie Structure
8.2.1 Movie Box
8.2.2 Movie Header Box
8.3 Track Structure
8.3.1 Track Box
8.3.2 Track Header Box
8.3.3 Track Reference Box
8.3.4 Track Group Box
8.4 Track Media Structure
8.4.1 Media Box
8.4.2 Media Header Box
8.4.3 Handler Reference Box
8.4.4 Media Information Box
8.4.5 Media Information Header Boxes
8.4.6 Extended language tag
8.5 Sample Tables
8.5.1 Sample Table Box
8.5.2 Sample Description Box
8.5.3 Degradation Priority Box
8.5.4 Sample Scale Box
8.6 Track Time Structures
8.6.1 Time to Sample Boxes
8.6.2 Sync Sample Box
8.6.3 Shadow Sync Sample Box
8.6.4 Independent and Disposable Samples Box
8.6.5 Edit Box
8.6.6 Edit List Box
8.7 Track Data Layout Structures
8.7.1 Data Information Box
8.7.2 Data Reference Box
8.7.3 Sample Size Boxes
8.7.4 Sample To Chunk Box
8.7.5 Chunk Offset Box
8.7.6 Padding Bits Box
8.7.7 Sub-Sample Information Box
8.7.8 Sample Auxiliary Information Sizes Box
8.7.9 Sample Auxiliary Information Offsets Box
8.8 Movie Fragments
8.8.1 Movie Extends Box
8.8.2 Movie Extends Header Box
8.8.3 Track Extends Box
8.8.4 Movie Fragment Box
8.8.5 Movie Fragment Header Box
8.8.6 Track Fragment Box
8.8.7 Track Fragment Header Box
8.8.8 Track Fragment Run Box
8.8.9 Movie Fragment Random Access Box
8.8.10 Track Fragment Random Access Box
8.8.11 Movie Fragment Random Access Offset Box
8.8.12 Track fragment decode time
8.8.13 Level Assignment Box
8.8.14 Sample Auxiliary Information in Movie Fragments
8.8.15 Track Extension Properties Box
8.8.16 Alternative Startup Sequence Properties Box
8.8.17 Metadata and user data in movie fragments
8.9 Sample Group Structures
8.9.1 Introduction
8.9.2 Sample to Group Box
8.9.3 Sample Group Description Box
8.9.4 Representation of group structures in Movie Fragments
8.10 User Data
8.10.1 User Data Box
8.10.2 Copyright Box
8.10.3 Track Selection Box
8.10.4 Track kind
8.11 Metadata Support
8.11.1 The Meta box
8.11.2 XML Boxes
8.11.3 The Item Location Box
8.11.4 Primary Item Box
8.11.5 Item Protection Box
8.11.6 Item Information Box
8.11.7 Additional Metadata Container Box
8.11.8 Metabox Relation Box
8.11.9 URL Forms for meta boxes
8.11.10 Static Metadata
8.11.11 Item Data Box
8.11.12 Item Reference Box
8.11.13 Auxiliary video metadata
8.12 Support for Protected Streams
8.12.1 Protection Scheme Information Box
8.12.2 Original Format Box
8.12.3 IPMPInfoBox
8.12.4 IPMP Control Box
8.12.5 Scheme Type Box
8.12.6 Scheme Information Box
8.13 File Delivery Format Support
8.13.1 Introduction
8.13.2 FD Item Information Box
8.13.3 File Partition Box
8.13.4 FEC Reservoir Box
8.13.5 FD Session Group Box
8.13.6 Group ID to Name Box
8.13.7 File Reservoir Box
8.14 Sub tracks
8.14.1 Introduction
8.14.2 Backward compatibility
8.14.3 Sub Track box
8.14.4 Sub Track Information box
8.14.5 Sub Track Definition box
8.14.6 Sub Track Sample Group box
8.15 Post-decoder requirements on media
8.15.1 General
8.15.2 Transformation
8.15.3 Restricted Scheme Information box
8.15.4 Scheme for stereoscopic video arrangements
8.16 Segments
8.16.1 Introduction
8.16.2 Segment Type Box
8.16.3 Segment Index Box
8.16.4 Subsegment Index Box
8.16.5 Producer Reference Time Box
8.17 Support for Incomplete Tracks
8.17.1 General
8.17.2 Transformation
8.17.3 Complete Track Information Box
9 Hint Track Formats
9.1 RTP and SRTP Hint Track Format
9.1.1 Introduction
9.1.2 Sample Description Format
9.1.3 Sample Format
9.1.4 SDP Information
9.1.5 Statistical Information
9.2 ALC/LCT and FLUTE Hint Track Format
9.2.1 Introduction
9.2.2 Design principles
9.2.3 Sample Description Format
9.2.4 Sample Format
9.3 MPEG-2 Transport Hint Track Format
9.3.1 Introduction
9.3.2 Design Principles
9.3.3 Sample Description Format
9.3.4 Sample Format
9.3.5 Protected MPEG 2 Transport Stream Hint Track
9.4 RTP, RTCP, SRTP and SRTCP Reception Hint Tracks
9.4.1 RTP Reception Hint Track
9.4.2 RTCP Reception Hint Track
9.4.3 SRTP Reception Hint Track
9.4.4 SRTCP Reception Hint Tracks
9.4.5 Protected RTP Reception Hint Track
9.4.6 Recording Procedure
9.4.7 Parsing Procedure
10 Sample Groups
10.1 Random Access Recovery Points
10.2 Rate Share Groups
10.2.1 Introduction
10.2.2 Rate Share Sample Group Entry
10.2.3 Relationship between tracks
10.2.4 Bitrate allocation
10.3 Alternative Startup Sequences
10.3.4 Examples
10.4 Random Access Point (RAP) Sample Grouping
10.5 Temporal level sample grouping
10.6 Stream access point sample group
11 Extensibility
11.1 Objects
11.2 Storage formats
11.3 Derived File formats
12 Media-specific definitions
12.1 Video media
12.1.1 Media handler
12.1.2 Video media header
12.1.3 Sample entry
12.1.4 Pixel Aspect Ratio and Clean Aperture
12.1.5 Colour information
12.2 Audio media
12.2.1 Media handler
12.2.2 Sound media header
12.2.3 Sample entry
12.2.4 Channel layout
12.2.5 Downmix Instructions
12.2.6 DRC Information
12.2.7 Audio stream loudness
12.3 Metadata media
12.3.1 Media handler
12.3.2 Media header
12.3.3 Sample entry
12.4 Hint media
12.4.1 Media handler
12.4.2 Hint media header
12.4.3 Sample entry
12.5 Text media
12.5.1 Media handler
12.5.2 Media header
12.5.3 Sample entry
12.6 Subtitle media
12.6.1 Media handler
12.6.2 Subtitle media header
12.6.3 Sample entry
12.7 Font media
12.7.1 Media handler
12.7.2 Media header
12.7.3 Sample entry
12.8 Transformed media
Annex A (informative) Overview and Introduction
A.1 Section Overview
A.2 Core Concepts
A.3 Physical structure of the media
A.4 Temporal structure of the media
A.5 Interleave
A.6 Composition
A.7 Random access
A.8 Fragmented movie files
Annex B (void)
Annex C (informative) Guidelines on deriving from this specification
C.1 Introduction
C.2 General Principles
C.2.1 General
C.2.2 Base layer operations
C.3 Boxes
C.4 Brand Identifiers
C.4.1 Introduction
C.4.2 Usage of the Brand
C.4.3 Introduction of a new brand
C.4.4 Player Guideline
C.4.5 Authoring Guideline
C.4.6 Example
C.5 Storage of new media types
C.6 Use of Template fields
C.7 Tracks
C.7.1 Data Location
C.7.2 Time
C.7.3 Media Types
C.7.4 Coding Types
C.7.5 Sub-sample information
C.7.6 Sample Dependency
C.7.7 Sample Groups
C.7.8 Track-level
C.7.9 Protection
C.8 Construction of fragmented movies
C.9 Meta-data
C.10 Registration
C.11 Guidelines on the use of sample groups, timed metadata tracks, and sample auxiliary information
Annex D (informative) Registration Authority
D.1 Code points to be registered
D.2 Procedure for the request of an MPEG-4 registered identifier value
D.3 Responsibilities of the Registration Authority
D.4 Contact information for the Registration Authority
D.5 Responsibilities of Parties Requesting a RID
D.6 Appeal Procedure for Denied Applications
D.7 Registration Application Form
D.7.1 Contact Information of organization requesting a RID
D.7.2 Request for a specific RID
D.7.3 Short description of RID that is in use and date system was implemented
D.7.4 Statement of an intention to apply the assigned RID
D.7.5 Date of intended implementation of the RID
D.7.6 Authorized representative
D.7.7 For official use of the Registration Authority
Annex E (normative) File format brands
E.1 Introduction
E.2 The ‘isom’ brand
E.3 The ‘avc1’ brand
E.4 The ‘iso2’ brand
E.5 The ‘mp71’ brand
E.6 The ‘iso3’ brand
E.7 The ‘iso4’ brand
E.8 The ‘iso5’ brand
E.9 The ‘iso6’ brand
E.10 The ‘iso7’ brand
E.11 The ‘iso8’ brand
E.12 The ‘iso9’ brand
Annex F (void)
Annex G (informative) URI-labelled metadata forms
G.1 UUID-labelled metadata
G.2 ISO OID-labelled metadata
G.3 SMPTE-labelled metadata
Annex H (informative) Processing of RTP streams and reception hint tracks
H.1 Introduction
H.1.1 Overview
H.1.2 Structure
H.1.3 Terms and definitions
H.2 Synchronization of RTP streams
H.3 Recording of RTP streams
H.3.1 Introduction
H.3.2 Compensation for unequal starting for position of received RTP streams
H.3.3 Recording of SDP
H.3.4 Creation of a sample within an RTP reception hint track
H.3.5 Representation of RTP timestamps
H.3.6 Recording operations to facilitate inter-stream synchronization in playback
H.3.7 Representation of reception times
H.3.8 Creation of media samples
H.3.9 Creation of hint samples referring to media samples
H.4 Playing of recorded RTP streams
H.4.1 Introduction
H.4.2 Preparation for the playback
H.4.3 Decoding of a sample within an RTP reception hint track
H.4.4 Lip synchronization
H.4.5 Random access
H.5 Re-sending recorded RTP streams
H.5.1 Introduction
H.5.2 Re-sending RTP packets
H.5.3 RTCP Processing
Annex I (normative) Stream Access Points
I.1 Introduction
I.2 SAP properties
I.2.1 General
I.2.2 SAP properties for layers
I.3 SAP types
Annex J (normative) MIME Type Registration of Segments
J.1 Introduction
J.2 Registration
Annex K: Segment Index Examples (informative)
K.1 Introduction
K.2 Examples
K.2.1 Simple one-level indexing
K.2.2 Hierarchical
K.2.3 Daisy-chain
K.2.4 Combination hierarchical and daisy-chain
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the different types of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the WTO principles in the Technical Barriers to Trade (TBT) see the following URL: Foreword - Supplementary information
The committee responsible for this document is ISO/IEC JTC 1, Information technology, SC 29, Coding of audio, picture, multimedia and hypermedia information.
This fifth edition cancels and replaces the fourth edition (ISO/IEC 14496-12:2012), which has been technically revised. It also incorporates the Amendments ISO/IEC 14496-12:2012/Amd 1:2013, ISO/IEC 14496-12:2012/Amd 2:2014, ISO/IEC 14496-12:2012/Amd 3:2015 and the Technical Corrigenda ISO/IEC 14496-12:2012/Cor 1:2013, ISO/IEC 14496-12:2012/Cor 2:2014 and ISO/IEC 14496-12:2012/Cor 3:2015.
ISO/IEC 14496 consists of the following parts, under the general title Information technology — Coding of audio-visual objects:
Part 1: Systems
Part 2: Visual
Part 3: Audio
Part 4: Conformance testing
Part 5: Reference software
Part 6: Delivery Multimedia Integration Framework (DMIF)
Part 7: Optimized reference software for coding of audio-visual objects
Part 8: Carriage of ISO/IEC 14496 contents over IP networks
Part 9: Reference hardware description
Part 10: Advanced Video Coding
Part 11: Scene description and application engine
Part 12: ISO base media file format
Part 13: Intellectual Property Management and Protection (IPMP) extensions
Part 14: MP4 file format
Part 15: Carriage of NAL unit structured video in the ISO Base Media File Format
Part 16: Animation Framework eXtension (AFX)
Part 17: Streaming text format
Part 18: Font compression and streaming
Part 19: Synthesized texture stream
Part 20: Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
Part 21: MPEG-J Graphics Framework eXtensions (GFX)
Part 22: Open Font Format
Part 23: Symbolic Music Representation
Part 24: Audio and systems interaction
Part 25: 3D Graphics Compression Model
Part 26: Audio conformance
Part 27: 3D Graphics conformance
Part 28: Composite font representation
Part 29: Web video coding
Part 30: Timed text and other visual overlays in ISO base media file format
Part 31: Video Coding for Browsers
Introduction
The ISO Base Media File Format is designed to contain timed media information for a presentation in a flexible, extensible format that facilitates interchange, management, editing, and presentation of the media. This presentation may be ‘local’ to the system containing the presentation, or may be via a network or other stream delivery mechanism.
The file structure is object-oriented; a file can be decomposed into constituent objects very simply, and the structure of the objects inferred directly from their type.
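As an informal illustration of this decomposition (not part of the specification text), the sketch below walks the objects in a byte buffer. It assumes the header layout given in the object structure clause: a 32-bit big-endian size followed by a four-character type, where a size of 1 signals a following 64-bit size and a size of 0 means the object runs to the end of the enclosing range. The function name `iter_boxes` and the sample bytes are hypothetical and chosen only for this example.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(
    data: bytes, offset: int = 0, end: Optional[int] = None
) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, payload_end) for each object in data[offset:end].

    Hypothetical helper: assumes the size/type header layout described above.
    """
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        # Compact header: 32-bit big-endian size, then a four-character type.
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > end:
                raise ValueError("truncated 64-bit size field")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # A size of 0 means the object runs to the end of the enclosing range.
            size = end - offset
        if size < header_len or offset + size > end:
            raise ValueError(f"bad object size {size} at offset {offset}")
        yield raw_type.decode("latin-1"), offset + header_len, offset + size
        offset += size


# Made-up sample: a 16-byte 'ftyp', an empty 'free', and an 'mdat' of size 0.
sample = (
    struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 0)
    + struct.pack(">I4s", 8, b"free")
    + struct.pack(">I4s", 0, b"mdat")
    + b"abc"
)

for box_type, start, stop in iter_boxes(sample):
    print(box_type, stop - start)
```

Because each header carries both the size and the type, a reader can skip objects it does not recognise and recurse into containers by calling the same function on a payload range.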
The file format is designed to be independent of any particular network protocol while enabling efficient support for them in general.
The ISO Base Media File Format is a base format for media file formats.
It is intended that the ISO Base Media File Format shall be jointly maintained by WG1 and WG11. Consequently, a subdivision of work created ISO/IEC 15444-12 and ISO/IEC 14496-12 in order to document the ISO Base Media File Format and to facilitate the joint maintenance.
This technically identical text is published as ISO/IEC 14496-12 for MPEG-4, and as ISO/IEC 15444-12 for JPEG 2000, and reference to this specification should be made accordingly. The recommendation is to reference one, for example ISO/IEC 14496-12, and append to the reference a parenthetical comment identifying the other, for example “(technically identical to ISO/IEC 15444-12)”.
The International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC) draw attention to the fact that it is claimed that compliance with this document may involve the use of patents.
The ISO and IEC take no position concerning the evidence, validity and scope of this patent right.
The holder of this patent right has assured the ISO and IEC that he is willing to negotiate licences under reasonable and non-discriminatory terms and conditions with applicants throughout the world. In this respect, the statement of the holder of this patent right is registered with the ISO and IEC. Information may be obtained from the companies listed in Annex B.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights other than those identified in Annex B. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO (www.iso.org/patents) and IEC (http://patents.iec.ch) maintain on-line databases of patents relevant to their standards. Users are encouraged to consult the databases for the most up to date information concerning patents.
INTERNATIONAL STANDARD ISO/IEC 14496-12:2015(E)
Information technology — Coding of audio-visual objects —
Part 12: ISO base media file format
1 Scope
This part of ISO/IEC 14496 specifies the ISO base media file format, which is a general format forming the basis for a number of other more specific file formats. This format contains the timing, structure, and media information for timed sequences of media data, such as audio-visual presentations.
This part of ISO/IEC 14496 is applicable to MPEG-4, but its technical content is identical to that of ISO/IEC 15444-12, which is applicable to JPEG 2000.
2 Normative references
The following documents, in whole or in part, are normatively referenced in this document and are indispensable for its application. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 639-2:1998, Codes for the representation of names of languages — Part 2: Alpha-3 code
ISO/IEC 9834-8:2005, Information technology — Open Systems Interconnection — Procedures for the operation of OSI Registration Authorities: Generation and registration of Universally Unique Identifiers (UUIDs) and their use as ASN.1 Object Identifier components
ISO/IEC 11578:1996, Information technology — Open Systems Interconnection — Remote Procedure Call (RPC)
ISO/IEC 14496-1:2010, Information technology — Coding of audio-visual objects — Part 1: Systems
ISO/IEC 14496-10, Information technology — Coding of audio-visual objects — Part 10: Advanced Video Coding
ISO/IEC 14496-14, Information technology — Coding of audio-visual objects — Part 14: MP4 file format
ISO/IEC 15444-1, Information technology — JPEG 2000 image coding system: Core coding system
ISO/IEC 15444-3, Information technology — JPEG 2000 image coding system: Motion JPEG 2000
ISO/IEC 15938-1, Information technology — Multimedia content description interface — Part 1: Systems
ISO/IEC 23001-1, Information technology — MPEG systems technologies — Part 1: Binary MPEG format for XML
ISO/IEC 23002-3, Information technology — MPEG video technologies — Part 3: Representation of auxiliary video and supplemental information
ISO/IEC 29199-2:2012, Information technology — JPEG XR image coding system — Part 2: Image coding specification
ISO 15076-1:2010, Image technology colour management — Architecture, profile format and data structure — Part 1: Based on ICC.1:2010
IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, FREED, N. and BORENSTEIN, N., November 1996
IETF RFC 2046, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, FREED, N. and BORENSTEIN, N., November 1996
IETF RFC 3550, RTP: A Transport Protocol for Real-Time Applications, SCHULZRINNE, H. et al., July 2003
IETF RFC 3711, The Secure Real-time Transport Protocol (SRTP), BAUGHER, M. et al., March 2004
IETF RFC 5052, Forward Error Correction (FEC) Building Block, WATSON, M. et al., August 2007
IETF RFC 5905, Network Time Protocol Version 4: Protocol and Algorithms Specification, MILLS, D. et al., June 2010
SMIL 1.0, Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, <http://www.w3.org/TR/REC-smil/>
Rec. ITU-R TF.460-6, Standard-frequency and time-signal emissions (Annex I for the definition of UTC)
ISO/IEC 23003-4, Information technology – MPEG audio technologies – Part 4: Dynamic range control
ITU-R, Recommendation ITU-R BS.1770-3, Algorithm to measure audio programme loudness and true-peak audio level, August 2012
ITU-R, Recommendation ITU-R BS.1771-1, Requirements for loudness and true-peak indicating meters, January 2012
EBU R 128-2014, Loudness normalization and permitted maximum level of audio signals, June 2014
EBU Tech 3341, Loudness Metering: EBU mode metering to supplement loudness normalization in accordance with EBU R 128
EBU Tech 3342, Loudness Range: A measure to supplement loudness normalisation in accordance with EBU R 128, Geneva, August 2011
ETSI TS 101 154 V1.11.1, Digital Video Broadcasting (DVB); Specification for the use of Video and Audio Coding in Broadcasting Applications based on the MPEG-2 Transport Stream, November 2012
ATSC Document A/85:2011, ATSC Recommended Practice: Techniques for Establishing and Maintaining Audio Loudness for Digital Television, July 2011
ATSC Doc. A/52:2012, ATSC Standard: Digital Audio Compression (AC-3, E-AC-3)
IETF RFC 5646, BCP 47, Tags for Identifying Languages, PHILLIPS, A. et al., September 2009
3 Terms, definitions, and abbreviated terms
3.1 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1.1 box
object-oriented building block defined by a unique type identifier and length
Note 1 to entry: Called 'atom' in some specifications, including the first definition of MP4.
3.1.2 chunk
contiguous set of samples for one track
3.1.3 container box
box whose sole purpose is to contain and group a set of related boxes
Note 1 to entry: Container boxes are normally not derived from 'fullbox'.
3.1.4 hint track
special track which does not contain media data, but instead contains instructions for packaging one or more tracks into a streaming channel
3.1.5 hinter
tool that is run on a file containing only media, to add one or more hint tracks to the file and so facilitate streaming
3.1.6 ISO Base Media File
name of the files conforming to the file format described in this specification
3.1.7 leaf subsegment
subsegment that does not contain any indexing information that would enable its further division into subsegments
3.1.8 media data box
box which can hold the actual media data for a presentation ('mdat')
3.1.9 movie box
container box whose sub-boxes define the metadata for a presentation ('moov')
3.1.10 movie-fragment relative addressing
signalling of offsets for media data in movie fragments that is relative to the start of those movie fragments, specifically setting the flags base-data-offset-present to 0 and default-base-is-moof to 1 in TrackFragmentHeaderBoxes
Note 1 to entry: Setting the default-base-is-moof flag to 1 is only relevant for movie fragments that contain more than one track run (either in the same or several tracks).
3.1.11 presentation
one or more motion sequences, possibly combined with audio
3.1.12 random access point (RAP)
sample in a track that starts at the ISAU of a SAP of type 1 or 2 or 3 as defined in Annex I; informally, a sample from which, when decoding starts, the sample itself and all samples following in composition order can be correctly decoded
3.1.13 random access recovery point
sample in a track with presentation time equal to the TSAP of a SAP of type 4 as defined in Annex I; informally, a sample that can be correctly decoded after having decoded a number of samples that are before this sample in decoding order, sometimes known as gradual decoding refresh
3.1.14 sample
all the data associated with a single timestamp
Note 1 to entry: No two samples within a track can share the same time-stamp.
Note 2 to entry: In non-hint tracks, a sample is, for example, an individual frame of video, a series of video frames in decoding order, or a compressed section of audio in decoding order; in hint tracks, a sample defines the formation of one or more streaming packets.
3.1.15 sample description
structure which defines and describes the format of some number of samples in a track
3.1.16 sample table
packed directory for the timing and physical layout of the samples in a track
3.1.17 sync sample
sample in a track that starts at the ISAU of a SAP of type 1 or 2 as defined in Annex I; informally, a media sample that starts a new independent sequence of samples; if decoding starts at the sync sample, it and succeeding samples in decoding order can all be correctly decoded, and the resulting set of decoded samples forms the correct presentation of the media starting at the decoded sample that has the earliest composition time; a media format may provide a more precise definition of a sync sample for that format
3.1.18 segment
portion of an ISO base media file format file, consisting of either (a) a movie box, with its associated media data (if any) and other associated boxes or (b) one or more movie fragment boxes, with their associated media data, and other associated boxes
3.1.18 subsegment
time interval of a segment formed from movie fragment boxes, that is also a valid segment
3.1.19 track
timed sequence of related samples (q.v.) in an ISO base media file
Note 1 to entry: For media data, a track corresponds to a sequence of images or sampled audio; for hint tracks, a track corresponds to a streaming channel.
3.2 Abbreviated terms
For the purposes of this document, the following abbreviated terms apply.
ALC Asynchronous Layered Coding
FD File Delivery
FDT File Delivery Table
FEC Forward Error Correction
FLUTE File Delivery over Unidirectional Transport
IANA Internet Assigned Numbers Authority
LCT Layered Coding Transport
MBMS Multimedia Broadcast/Multicast Service
4 Object-structured File Organization
4.1 File Structure
Files are formed as a series of objects, called boxes in this specification. All data is contained in boxes; there is no other data within the file. This includes any initial signature required by the specific file format.
All object-structured files conformant to this section of this specification (all Object-Structured files) shall contain a File Type Box.
4.2 Object Structure
An object in this terminology is a box.
Boxes start with a header which gives both size and type. The header permits compact or extended size (32 or 64 bits) and compact or extended types (32 bits or full Universal Unique IDentifiers, i.e. UUIDs). The standard boxes all use compact types (32-bit) and most boxes will use the compact (32-bit) size. Typically only the Media Data Box(es) need the 64-bit size.
The size is the entire size of the box, including the size and type header, fields, and all contained boxes. This facilitates general parsing of the file.
The definitions of boxes are given in the syntax description language (SDL) defined in MPEG-4 (see reference in Clause 2). Comments in the code fragments in this specification indicate informative material.
The fields in the objects are stored with the most significant byte first, commonly known as network byte order or big-endian format. When fields smaller than a byte are defined, or fields span a byte boundary, the bits are assigned from the most significant bits in each byte to the least significant. For example, a field of two bits followed by a field of six bits has the two bits in the high order bits of the byte.
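As an informal illustration of the byte and bit ordering just described, the following Python sketch unpacks a byte that holds a two-bit field followed by a six-bit field, and reads a 32-bit field in big-endian order. It is not part of the specification; the function names `split_2_6` and `read_u32_be` are hypothetical.

```python
def split_2_6(byte_value):
    """Split one byte into a 2-bit field followed by a 6-bit field."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("expected a single byte value")
    two_bit = byte_value >> 6      # first-declared field sits in the high-order bits
    six_bit = byte_value & 0x3F    # second field takes the remaining low-order bits
    return two_bit, six_bit


def read_u32_be(data):
    """Read an unsigned 32-bit field stored most significant byte first."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    return int.from_bytes(data[:4], "big")
```

For instance, the byte 0b10000101 carries the value 2 in the two-bit field and 5 in the six-bit field.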
aligned(8) class Box (unsigned int(32) boxtype,
        optional unsigned int(8)[16] extended_type) {
    unsigned int(32) size;
    unsigned int(32) type = boxtype;
    if (size==1) {
        unsigned int(64) largesize;
    } else if (size==0) {
        // box extends to end of file
    }
    if (boxtype=='uuid') {
        unsigned int(8)[16] usertype = extended_type;
    }
}
The semantics of these two fields are:
size is an integer that specifies the number of bytes in this box, including all its fields and contained boxes; if size is 1 then the actual size is in the field largesize; if size is 0, then this box is the last one in the file, and its contents extend to the end of the file (normally only used for a Media Data Box)
type identifies the box type; standard boxes use a compact type, which is normally four printable characters, to permit ease of identification, and is shown so in the boxes below. User extensions use an extended type; in this case, the type field is set to 'uuid'.
Boxes with an unrecognized type shall be ignored and skipped.
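The header rules above can be sketched in Python as follows. This is an informal illustration under stated assumptions, not a normative parser: the function names `read_box_header` and `iter_top_level_boxes` are hypothetical, and the whole file is assumed to be available in memory as a bytes object. A reader built on it can skip an unrecognized box simply by moving on to the next offset.

```python
import io
import struct


def read_box_header(stream):
    """Read one box header.

    Returns (box_type, header_size, total_size, usertype), or None at end of data.
    total_size is None when the size field is 0 (box extends to end of file).
    """
    start = stream.read(8)
    if len(start) == 0:
        return None
    if len(start) < 8:
        raise ValueError("truncated box header")
    size, box_type = struct.unpack(">I4s", start)   # big-endian size, 4-byte type
    header_size = 8
    if size == 1:                                   # actual size is in largesize
        (size,) = struct.unpack(">Q", stream.read(8))
        header_size += 8
    elif size == 0:                                 # last box, runs to end of file
        size = None
    usertype = None
    if box_type == b"uuid":                         # extended type follows
        usertype = stream.read(16)
        header_size += 16
    return box_type, header_size, size, usertype


def iter_top_level_boxes(data):
    """Yield (box_type, offset, total_size) for each top-level box in data."""
    stream = io.BytesIO(data)
    while True:
        offset = stream.tell()
        header = read_box_header(stream)
        if header is None:
            break
        box_type, header_size, size, _usertype = header
        if size is None:
            size = len(data) - offset
        if size < header_size:
            raise ValueError("box size smaller than its header")
        yield box_type, offset, size
        stream.seek(offset + size)                  # skip the body, known or not
```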
Many objects also contain a version number and flags field:
aligned(8) class FullBox(unsigned int(32) boxtype, unsigned int(8) v, bit(24) f)
        extends Box(boxtype) {
    unsigned int(8) version = v;
    bit(24) flags = f;
}
The semantics of these two fields are:
version is an integer that specifies the version of this format of the box.
flags is a map of flags
Boxes with an unrecognized version shall be ignored and skipped.
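A minimal Python sketch of reading the two FullBox fields is given below for illustration. It assumes the caller has already consumed the Box header and passes the remaining payload; the function name `parse_fullbox_fields` is hypothetical.

```python
import struct


def parse_fullbox_fields(payload):
    """Return (version, flags) from the first four bytes after the Box header."""
    if len(payload) < 4:
        raise ValueError("FullBox payload shorter than 4 bytes")
    (word,) = struct.unpack(">I", payload[:4])
    version = word >> 24          # unsigned int(8) version
    flags = word & 0x00FFFFFF     # bit(24) flags
    return version, flags
```

A reader would typically compare the returned version with the versions it understands for that box type, and skip the box otherwise.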
4.3 File Type Box
4.3.1 Definition
Box Type: 'ftyp'
Container: File
Mandatory: Yes
Quantity: Exactly one (but see below)
Files written to this version of this specification must contain a file-type box. For compatibility with an earlier version of this specification, files may be conformant to this specification and not contain a file-type box. Files with no file-type box should be read as if they contained an FTYP box with major_brand='mp41', minor_version=0, and the single compatible brand 'mp41'.
A media-file structured to this part of this specification may be compatible with more than one detailed specification, and it is therefore not always possible to speak of a single 'type' or 'brand' for the file. This means that the utility of the filename extension and Multipurpose Internet Mail Extension (MIME) type are somewhat reduced.
This box must be placed as early as possible in the file (e.g. after any obligatory signature, but before any significant variable-size boxes such as a Movie Box, Media Data Box, or Free Space). It identifies which specification is the 'best use' of the file, and a minor version of that specification; and also a set of other specifications to which the file complies. Readers implementing this format should attempt to read files that are marked as compatible with any of the specifications that the reader implements. Any incompatible change in a specification should therefore register a new 'brand' identifier to identify files conformant to the new specification.
The minor version is informative only. It does not appear for compatible-brands, and must not be used to determine the conformance of a file to a standard. It may allow more precise identification of the major specification, for inspection, debugging, or improved decoding.
Files would normally be externally identified (e.g. with a file extension or MIME type) in a way that identifies the 'best use' (major brand), or the brand that the author believes will provide the greatest compatibility.
This section of this specification does not define any brands. However, see subclause 6.3 below for brands for files conformant to the whole specification and not just this section. All file format brands defined in this specification are included in Annex E with a summary of which features they require.
4.3.2 Syntax
aligned(8) class FileTypeBox
        extends Box('ftyp') {
    unsigned int(32) major_brand;
    unsigned int(32) minor_version;
    unsigned int(32) compatible_brands[]; // to end of the box
}
4.3.3 Semantics
This box identifies the specifications to which this file complies.
Each brand is a printable four-character code, registered with ISO, that identifies a precise specification.
major_brand – is a brand identifier
minor_version – is an informative integer for the minor version of the major brand
compatible_brands – is a list, to the end of the box, of brands
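The File Type Box layout and the reader guidance in 4.3.1 can be sketched as follows. This is an illustrative Python fragment, not a normative algorithm: the names `parse_ftyp_payload`, `reader_should_attempt` and `DEFAULT_FTYP` are hypothetical, the payload is assumed to be the box contents after the Box header, and the compatibility check is one possible reading of the guidance (it consults compatible_brands only and never minor_version).

```python
import struct

# Values to assume when a file has no file-type box (see 4.3.1).
DEFAULT_FTYP = (b"mp41", 0, [b"mp41"])


def parse_ftyp_payload(payload):
    """Return (major_brand, minor_version, compatible_brands) from an 'ftyp' payload."""
    if len(payload) < 8 or len(payload) % 4 != 0:
        raise ValueError("malformed file-type box payload")
    major_brand, minor_version = struct.unpack(">4sI", payload[:8])
    # compatible_brands runs to the end of the box, four bytes per brand
    compatible = [payload[i:i + 4] for i in range(8, len(payload), 4)]
    return major_brand, minor_version, compatible


def reader_should_attempt(ftyp, supported_brands):
    """True if any compatible brand is one the reader implements.

    Pass ftyp=None for a file that has no file-type box.
    """
    _major, _minor, compatible = ftyp if ftyp is not None else DEFAULT_FTYP
    return any(brand in supported_brands for brand in compatible)
```

In the sketch, brands are kept as four-byte strings rather than integers so that they stay readable when printed.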
5 Design Considerations
5.1 Usage
5.1.1 Introduction
The file format is intended to serve as a basis for a number of operations. In these various roles, it may be used in different ways, and different aspects of the overall design exercised.
5.1.2 Interchange
When used as an interchange format, the files would normally be self-contained (not referencing media in other files), contain only the media data actually used in the presentation, and not contain any information related to streaming. This will result in a small, protocol-independent, self-contained file, which contains the core media data and the information needed to operate on it.
The following diagram gives an example of a simple interchange file, containing two streams.
Figure 1 — Simple interchange file (diagram: an ISO file whose moov box contains trak (video), trak (audio) and other boxes, and whose mdat box holds interleaved, time-ordered video and audio frames)
5.1.3 Content Creation
During content creation, a number of areas of the format can be exercised to useful effect, particularly:
the ability to store each elementary stream separately (not interleaved), possibly in separate files.
the ability to work in a single presentation that contains media data and other streams (e.g. editing the audio track in the uncompressed format, to align with an already-prepared video track).
These characteristics mean that presentations may be prepared, edits applied, and content developed and integrated without iteratively re-writing the presentation on disc (which would be necessary if interleave was required and unused data had to be deleted), and also without iteratively decoding and re-encoding the data (which would be necessary if the data had to be stored in an encoded state).
In the following diagram, a set of files being used in the process of content creation is shown.
Figure 2 — Content Creation File (diagram labels: a media file holding video frames, possibly un-ordered with other unused data; an ISO file with other boxes (inc. moov) and an mdat box holding video and audio frames, possibly un-ordered with other unused data; an ISO file whose moov box contains trak (audio), trak (video) and other boxes)
5.1.4 Preparation for streaming
When prepared for streaming, the file must contain information to direct the streaming server in the process of sending the information. In addition, it is helpful if these instructions and the media data are interleaved so that excessive seeking can be avoided when serving the presentation. It is also important that the original media data be retained unscathed, so that the files may be verified, or re-edited or otherwise re-used. Finally, it is helpful if a single file can be prepared for more than one protocol, so differing servers may use it over disparate protocols.
5.1.5 Local presentation
'Locally' viewing a presentation (i.e. directly from the file, not over a streamed interconnect) is an important application; it is used when a presentation is distributed (e.g. on CD or DVD ROM), during the process of development, and when verifying the content on streaming servers. Such local viewing must be supported, with full random access. If the presentation is on CD or DVD ROM, interleave is important as seeking may be slow.
5.1.6 Streamed presentation
When a server operates from the file to make a stream, the resulting stream must be conformant with the specifications for the protocol(s) used, and should contain no trace of the file-format information in the file itself. The server needs to be able to random access the presentation. It can be useful to re-use server content (e.g. to make excerpts) by referencing the same media data from multiple presentations; it can also assist streaming if the media data can be on read-only media (e.g. CD) and not copied, merely augmented, when prepared for streaming.
The following diagram shows a presentation prepared for streaming over a multiplexing protocol; only one hint track is required.
Figure 3 — Hinted Presentation for Streaming (diagram: an ISO file whose moov box contains trak (video), trak (audio), trak (hint) and other boxes, and whose mdat box holds interleaved, time-ordered video and audio frames, and hint instructions)
5.2 Design principles
The file structure is object-oriented; a file can be decomposed into constituent objects very simply, and the structure of the objects inferred directly from their type.
Media-data is not 'framed' by the file format; the file format declarations that give the size, type and position of media data units are not physically contiguous with the media data. This makes it possible to subset the media-data, and to use it in its natural state, without requiring it to be copied to make space for framing. The metadata is used to describe the media data by reference, not by inclusion.
Similarly the protocol information for a particular streaming protocol does not frame the media data; the protocol headers are not physically contiguous with the media data. Instead, the media data can be included by reference. This makes it possible to represent media data in its natural state, not favouring any protocol. It also makes it possible for the same set of media data to serve for local presentation, and for multiple protocols.
The protocol information is built in such a way that the streaming servers need to know only about the protocol and the way it should be sent; the protocol information abstracts knowledge of the media so that the servers are, to a large extent, media-type agnostic. Similarly the media-data, stored as it is in a protocol-unaware fashion, enables the media tools to be protocol-agnostic.
The file format does not require that a single presentation be in a single file. This enables both sub-setting and re-use of content. When combined with the non-framing approach, it also makes it possible to include media data in files not formatted to this specification (e.g. 'raw' files containing only media data and no declarative information, or file formats already in use in the media or computer industries).
The file format is based on a common set of designs and a rich set of possible structures and usages. The same format serves all usages; translation is not required. However, when used in a particular way (e.g. for local presentation), the file may need structuring in certain ways for optimal behaviour (e.g. time-ordering of the data). No normative structuring rules are defined by this specification, unless a restricted profile is used.
6 ISO Base Media File organization
6.1 Presentation structure
6.1.1 File Structure
A presentation may be contained in several files. One file contains the metadata for the whole presentation, and is formatted to this specification. This file may also contain all the media data, whereupon the presentation is self-contained. The other files, if used, are not required to be formatted to this specification; they are used to contain media data, and may also contain unused media data, or other information. This specification concerns the structure of the presentation file only. The format of the media-data files is constrained by this specification only in that the media-data in the media files must be capable of description by the metadata defined here.
These other files may be ISO files, image files, or other formats. Only the media data itself, such as JPEG 2000 images, is stored in these other files; all timing and framing (position and size) information is in the ISO base media file, so the ancillary files are essentially free-format.
If an ISO file contains hint tracks, the media tracks that reference the media data from which the hints were built shall remain in the file, even if the data within them is not directly referenced by the hint tracks; after deleting all hint tracks, the entire un-hinted presentation shall remain. Note that the media tracks may, however, refer to external files for their media data.
Annex A provides an informative introduction, which may be of assistance to first-time readers.
6.1.2 Object Structure
The file is structured as a sequence of objects; some of these objects may contain other objects. The sequence of objects in the file shall contain exactly one presentation metadata wrapper (the Movie Box). It is usually close to the beginning or end of the file, to permit its easy location. The other objects found at this level may be a File-Type box, Free Space Boxes, Movie Fragments, Meta-data, or Media Data Boxes.
6.1.3 Meta Data and Media Data
The metadata is contained within the metadata wrapper (the Movie Box); the media data is contained either in the same file, within Media Data Box(es), or in other files. The media data is composed of images or audio data; the media data objects, or media data files, may contain other un-referenced information.
6.1.4 Track Identifiers
The track identifiers used in an ISO file are unique within that file; no two tracks shall use the same identifier.
The next track identifier value stored in next_track_ID in the Movie Header Box generally contains a value one greater than the largest track identifier value found in the file. This enables easy generation of a track identifier under most circumstances. However, if this value is equal to all ones (32-bit unsigned maxint), then a search for an unused track identifier is needed for all additions.
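The allocation rule above might be sketched as follows. This Python fragment is illustrative only: the function name `allocate_track_id` is hypothetical, the set of identifiers already in use is assumed to have been collected by the caller, and the search starts at 1 on the assumption (not stated in this subclause) that zero is not a usable track identifier.

```python
MAX_U32 = 0xFFFFFFFF  # "all ones": signals that a search is required


def allocate_track_id(next_track_id, used_ids):
    """Return (new_track_id, updated_next_track_id)."""
    if next_track_id != MAX_U32 and next_track_id not in used_ids:
        # Common case: take the stored value and advance it by one.
        return next_track_id, next_track_id + 1
    # Sentinel (or stale) value: search for any identifier not yet in use.
    for candidate in range(1, MAX_U32):
        if candidate not in used_ids:
            return candidate, next_track_id
    raise ValueError("no unused track identifier available")
```

The defensive check that the stored value is not already in use goes slightly beyond the text, which only says the value is "generally" one greater than the largest identifier.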
6.2 Metadata Structure (Objects)
6.2.1 Box
Type fields not defined here are reserved. Private extensions shall be achieved through the 'uuid' type. In addition, the following types are not and will not be used, or used only in their existing sense, in future versions of this specification, to avoid conflict with existing content using earlier pre-standard versions of this format:
clip, crgn, matt, kmat, pnot, ctab, load, imap; these track reference types (as found in the reference_type of a Track Reference Box): tmcd, chap, sync, scpt, ssrc.
A number of boxes contain index values into sequences in other boxes. These indexes start with the value 1 (1 is the first entry in the sequence).
6.2.2 Data Types and fields
In a number of boxes in this specification, there are two variant forms: version 0 using 32-bit fields, and version 1 using 64-bit sizes for those same fields. In general, if a version 0 box (32-bit field sizes) can be used, it should be; version 1 boxes should be used only when the 64-bit field sizes they permit are required. Values for counters, offsets, times, durations etc. in this format do not 'wrap' to 0 when the maximum value that can be stored in their field is reached; appropriately large fields must be used for all values.
For convenience during content creation there are creation and modification times stored in the file. These can be 32-bit or 64-bit numbers, counting seconds since midnight, Jan. 1, 1904, which is a convenient date for leap-year calculations. 32 bits are sufficient until approximately year 2040. These times shall be expressed in Universal Time Coordinated (UTC), and therefore may need adjustment to local time if displayed.
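As a worked illustration of the time representation described above, the following Python sketch converts between the stored second count and a UTC datetime. The names `EPOCH_1904`, `to_datetime` and `from_datetime` are hypothetical and the fragment is not part of the specification.

```python
from datetime import datetime, timedelta, timezone

# Midnight, Jan. 1, 1904, UTC: the origin for creation and modification times.
EPOCH_1904 = datetime(1904, 1, 1, tzinfo=timezone.utc)


def to_datetime(seconds_since_1904):
    """Convert a stored time (32-bit or 64-bit second count) to a UTC datetime."""
    return EPOCH_1904 + timedelta(seconds=seconds_since_1904)


def from_datetime(moment):
    """Convert a timezone-aware datetime to whole seconds since the 1904 origin."""
    return int((moment - EPOCH_1904).total_seconds())
```

With this origin, the largest 32-bit value falls in February 2040, which is consistent with the "approximately year 2040" limit mentioned above; later times need the 64-bit (version 1) fields.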
Fixed-point numbers are signed or unsigned values resulting from dividing an integer by an appropriate power of 2. For example, a 30.2 fixed-point number is formed by dividing a 32-bit integer by 4.
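The fixed-point convention can be illustrated with a short Python sketch. The helper names `fixed_to_float` and `float_to_fixed` are hypothetical; the second argument is the number of fractional bits (2 for a 30.2 value, 16 for 16.16, 30 for 2.30).

```python
def fixed_to_float(raw, frac_bits):
    """Interpret an integer as a fixed-point value with frac_bits fractional bits."""
    return raw / (1 << frac_bits)


def float_to_fixed(value, frac_bits):
    """Encode a number as the nearest fixed-point integer."""
    return round(value * (1 << frac_bits))
```

For a 30.2 value the divisor is 4, so the stored integer 10 represents 2.5.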
Fields shown as "template" in the box descriptions are optional in the specifications that use this specification. If the field is used in another specification, that use must be conformant with its definition here, and the specification must define whether the use is optional or mandatory. Similarly, fields marked "pre-defined" were used in an earlier version of this specification. For both kinds of fields, if a field of that kind is not used in a specification, then it should be set to the indicated default value. If the field is not used it must be copied un-inspected when boxes are copied, and ignored on reading.
Matrix values which occur in the headers specify a transformation of video images for presentation. Not all derived specifications use matrices; if they are not used, they shall be set to the identity matrix. If a matrix is used, the point (p,q) is transformed into (p', q') using the matrix as follows:
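$$
\begin{pmatrix} p & q & 1 \end{pmatrix}
\begin{pmatrix} a & b & u \\ c & d & v \\ x & y & w \end{pmatrix}
= \begin{pmatrix} m & n & z \end{pmatrix}
$$
$$
m = ap + cq + x; \quad n = bp + dq + y; \quad z = up + vq + w
$$
$$
p' = m/z; \quad q' = n/z
$$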
The coordinates {p,q} are on the decompressed frame, and {p', q'} are at the rendering output. Therefore, for example, the matrix {2,0,0, 0,2,0, 0,0,1} exactly doubles the pixel dimension of an image. The co-ordinates transformed by the matrix are not normalized in any way, and represent actual sample locations. Therefore {x,y} can, for example, be considered a translation vector for the image.
The co-ordinate origin is located at the upper left corner, and X values increase to the right, and Y values increase downwards. {p,q} and {p',q'} are to be taken as absolute pixel locations relative to the upper left hand corner of the original image (after scaling to the size determined by the track header's width and height) and the transformed (rendering) surface, respectively.
Each track is composed using its matrix as specified into an overall image; this is then transformed and composed according to the matrix at the movie level in the MovieHeaderBox. It is application-dependent whether the resulting image is 'clipped' to eliminate pixels, which have no display, to a vertical rectangular region within a window, for example. So for example, if only one video track is displayed and it has a translation to {20,30}, and a unity matrix is in the MovieHeaderBox, an application may choose not to display the empty "L" shaped region between the image and the origin.
All the values in a matrix are stored as 16.16 fixed-point values, except for u, v and w, which are stored as 2.30 fixed-point values.
The values in the matrix are stored in the order {a,b,u, c,d,v, x,y,w}.
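The matrix storage order and the point transformation can be sketched together in Python. This is an informal illustration: the function names `decode_matrix` and `transform_point` are hypothetical, and the nine stored values are assumed to be signed 32-bit big-endian integers occupying 36 consecutive bytes.

```python
import struct


def decode_matrix(raw):
    """Decode 36 bytes stored in the order {a,b,u, c,d,v, x,y,w} into floats."""
    if len(raw) != 36:
        raise ValueError("a matrix occupies nine 32-bit values")
    a, b, u, c, d, v, x, y, w = struct.unpack(">9i", raw)
    f16 = float(1 << 16)   # 16.16 fixed point for a, b, c, d, x, y
    f30 = float(1 << 30)   # 2.30 fixed point for u, v, w
    return (a / f16, b / f16, u / f30,
            c / f16, d / f16, v / f30,
            x / f16, y / f16, w / f30)


def transform_point(matrix, p, q):
    """Map (p, q) on the decompressed frame to (p', q') at the rendering output."""
    a, b, u, c, d, v, x, y, w = matrix
    m = a * p + c * q + x
    n = b * p + d * q + y
    z = u * p + v * q + w
    return m / z, n / z
```

With the doubling matrix {2,0,0, 0,2,0, 0,0,1}, the point (3, 4) maps to (6, 8); with an identity matrix whose x and y are 20 and 30, the origin maps to (20, 30).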
6.2.3 Box Order
An overall view of the normal encapsulation structure is provided in the following informative Table 1 (Box types, structure, and cross-reference). In the event of a conflict between this table and the prose, the prose prevails. The order of boxes within their containers is not necessarily indicated in the table.
The table shows those boxes that may occur at the top-level in the left-most column; indentation is used to show possible containment. Thus, for example, a Track Header Box (tkhd) is found in a Track Box (trak), which is found in a Movie Box (moov). Not all boxes need to be used in all files; the mandatory boxes are marked with an asterisk (*). See the description of the individual boxes for a discussion of what must be assumed if the optional boxes are not present.
User data objects shall be placed only in Movie or Track Boxes, and objects using an extended type may be placed in a wide variety of containers, not just the top level.
In order to improve interoperability and utility of the files, the following rules and guidelines shall be followed for the order of boxes:
1) The file type box 'ftyp' shall occur before any variable-length box (e.g. movie, free space, media data). Only a fixed-size box such as a file signature, if required, may precede it.
2) It is strongly recommended that all header boxes be placed first in their container: these boxes are the Movie Header, Track Header, Media Header, and the specific media headers inside the Media Information Box (e.g. the Video Media Header).
3) Any Movie Fragment Boxes shall be in sequence order (see subclause 8.8.5).
4) It is recommended that the boxes within the Sample Table Box be in the following order: Sample Description, Time to Sample, Sample to Chunk, Sample Size, Chunk Offset.
5) It is strongly recommended that the Track Reference Box and Edit List (if any) should precede the Media Box, and the Handler Reference Box should precede the Media Information Box, and the Data Information Box should precede the Sample Table Box.
6) It is recommended that User Data Boxes be placed last in their container, which is either the Movie Box or Track Box.
7) It is recommended that the Movie Fragment Random Access Box, if present, be last in the file.
8) It is recommended that the progressive download information box be placed as early as possible in files, for maximum utility.
Table 1 — Box types, structure, and cross-reference (Informative)
Box type  Mandatory  Clause    Description
ftyp      *          4.3       file type and compatibility
pdin                 8.1.3     progressive download information
moov      *          8.2.1     container for all the metadata
mvhd      *          8.2.2     movie header, overall declarations
meta                 8.11.1    metadata
trak      *          8.3.1     container for an individual track or stream
tkhd      *          8.3.2     track header, overall information about the track
tref                 8.3.3     track reference container
trgr                 8.3.4     track grouping indication
edts                 8.6.4     edit list container
elst                 8.6.6     an edit list
meta                 8.11.1    metadata
mdia      *          8.4       container for the media information in a track
mdhd      *          8.4.2     media header, overall information about the media
hdlr      *          8.4.3     handler, declares the media (handler) type
elng                 8.4.6     extended language tag
minf      *          8.4.4     media information container
vmhd                 12.1.2    video media header, overall information (video track only)
smhd                 12.2.2    sound media header, overall information (sound track only)
hmhd                 12.4.2    hint media header, overall information (hint track only)
sthd                 12.6.2    subtitle media header, overall information (subtitle track only)
nmhd                 8.4.5.2   Null media header, overall information (some tracks only)
dinf      *          8.7.1     data information box, container
dref      *          8.7.2     data reference box, declares source(s) of media data in track
stbl      *          8.5.1     sample table box, container for the time/space map
stsd      *          8.5.2     sample descriptions (codec types, initialization etc.)
stts      *          8.6.1.2   (decoding) time-to-sample
ctts                 8.6.1.3   (composition) time to sample
cslg                 8.6.1.4   composition to decode timeline mapping
stsc      *          8.7.4     sample-to-chunk, partial data-offset information
stsz                 8.7.3.2   sample sizes (framing)
stz2                 8.7.3.3   compact sample sizes (framing)
stco      *          8.7.5     chunk offset, partial data-offset information
co64                 8.7.5     64-bit chunk offset
stss                 8.6.2     sync sample table
stsh                 8.6.3     shadow sync sample table
padb                 8.7.6     sample padding bits
stdp                 8.7.6     sample degradation priority
sdtp                 8.6.4     independent and disposable samples
sbgp                 8.9.2     sample-to-group
sgpd                 8.9.3     sample group description
subs                 8.7.7     sub-sample information
saiz                 8.7.8     sample auxiliary information sizes
saio                 8.7.9     sample auxiliary information offsets
udta                 8.10.1    user-data
mvex                 8.8.1     movie extends box
mehd                 8.8.2     movie extends header box
trex      *          8.8.3     track extends defaults
leva                 8.8.13    level assignment
moof                 8.8.4     movie fragment
mfhd      *          8.8.5     movie fragment header
meta                 8.11.1    metadata
traf                 8.8.6     track fragment
tfhd      *          8.8.7     track fragment header
trun                 8.8.8     track fragment run
sbgp                 8.9.2     sample-to-group
sgpd                 8.9.3     sample group description
subs                 8.7.7     sub-sample information
saiz                 8.7.8     sample auxiliary information sizes
saio                 8.7.9     sample auxiliary information offsets
tfdt                 8.8.12    track fragment decode time
meta                 8.11.1    metadata
mfra                 8.8.9     movie fragment random access
tfra                 8.8.10    track fragment random access
mfro      *          8.8.11    movie fragment random access offset
mdat                 8.2.2     media data container
free                 8.1.2     free space
skip                 8.1.2     free space
udta                 8.10.1    user-data
cprt                 8.10.2    copyright etc.
tsel                 8.10.3    track selection box
strk                 8.14.3    sub track box
stri                 8.14.4    sub track information box
strd                 8.14.5    sub track definition box
meta                 8.11.1    metadata
hdlr      *          8.4.3     handler, declares the metadata (handler) type
dinf                 8.7.1     data information box, container
dref                 8.7.2     data reference box, declares source(s) of metadata items
iloc                 8.11.3    item location
ipro                 8.11.5    item protection
sinf                 8.12.1    protection scheme information box
frma                 8.12.2    original format box
schm                 8.12.5    scheme type box
schi                 8.12.6    scheme information box
iinf                 8.11.6    item information
xml                  8.11.2    XML container
bxml                 8.11.2    binary XML container
pitm                 8.11.4    primary item reference
fiin                 8.13.2    file delivery item information
paen                 8.13.2    partition entry
fire                 8.13.7    file reservoir
fpar                 8.13.3    file partition
fecr                 8.13.4    FEC reservoir
segr                 8.13.5    file delivery session group
gitn                 8.13.6    group id to name
idat                 8.11.11   item data
iref                 8.11.12   item reference
meco                 8.11.7    additional metadata container
mere                 8.11.8    metabox relation
meta                 8.11.1    metadata
styp                 8.16.2    segment type
sidx                 8.16.3    segment index
ssix                 8.16.4    subsegment index
prft                 8.16.5    producer reference time
6.2.4 URIs as type indicators
When URIs are used as a type indicator (e.g. in a sample entry or for un-timed meta-data), the URI must be absolute, not relative, and the format and meaning of the data must be defined by the URI in question. This identification may be hierarchical, in that an initial sub-string of the URI might identify the overall nature or family of the data (e.g. urn:oid: identifies that the metadata is labelled by an ISO-standard object identifier).
The URI should be, but is not required to be, de-referencable. It may be string compared by readers with the set of URI types it knows and recognizes. URIs provide a large non-colliding non-registered space for type identifiers.
If the URI contains a domain name (e.g. it is a URL), then it should also contain a month-date in the form mmyyyy. That date must be near the time of the definition of the extension, and it must be true that the URI was defined in a way authorized by the owner of the domain name at that date. (This avoids problems when domain names change ownership.)
6.3 Brand Identification
The definitions of the brands that apply to the file format are found in Annex E.
7 Streaming Support
7.1 Handling of Streaming Protocols
The file format supports streaming of media data over a network as well as local playback. The process of sending protocol data units is time-based, just like the display of time-based data, and is therefore suitably described by a time-based format. A file or ‘movie’ that supports streaming includes information about the data units to stream. This information is included in additional tracks of the file called “hint” tracks. Hint tracks may also be used to record a stream; these are called Reception Hint Tracks, to differentiate them from plain (or server, or transmission) hint tracks.
Transmission or server hint tracks contain instructions to assist a streaming server in the formation of packets for transmission. These instructions may contain immediate data for the server to send (e.g. header information) or reference segments of the media data. These instructions are encoded in the file in the same way that editing or presentation information is encoded in a file for local playback. Instead of editing or presentation information, information is provided which allows a server to packetize the media data in a manner suitable for streaming using a specific network transport.
The same media data is used in a file that contains hints, whether it is for local playback, or streaming over a number of different protocols. Separate ‘hint’ tracks for different protocols may be included within the same file and the media will play over all such protocols without making any additional copies of the media itself. In addition, existing media can be easily made streamable by the addition of appropriate hint tracks for specific protocols. The media data itself need not be recast or reformatted in any way.
This approach to streaming and recording is more space efficient than an approach that requires that the media information be partitioned into the actual data units that will be transmitted for a given transport and media format. Under such an approach, local playback requires either re-assembling the media from the packets, or having two copies of the media, one for local playback and one for streaming. Similarly, streaming such media over multiple protocols using this approach requires multiple copies of the media data for each transport. This is inefficient with space, unless the media data has been heavily transformed for streaming (e.g. by the application of error-correcting coding techniques, or by encryption).
Reception hint tracks may be used when one or more packet streams of data are recorded. Reception hint tracks indicate the order, reception timing, and contents of the received packets among other things.
NOTE Players may reproduce the packet stream that was received based on the reception hint tracks and process the reproduced packet stream as if it was newly received.
7.2 Protocol ‘hint’ tracks
Support for streaming is based upon the following three design parameters:
The media data is represented as a set of network-independent standard tracks, which may be played, edited, and so on, as normal;
There is a common declaration and base structure for hint tracks; this common format is protocol independent, but contains the declarations of which protocol(s) are described in the hint track(s);
There is a specific design of the hint tracks for each protocol that may be transmitted; all these designs use the same basic structure. For example, there may be designs for RTP (for the Internet) and MPEG-2 transport (for broadcast), or for new standard or vendor-specific protocols.
The resulting streams, sent by the servers under the direction of the server hint tracks or reconstructed from the reception hint tracks, need contain no trace of file-specific information. This design does not require that the file structures or declaration style be used either in the data on the wire or in the decoding station. For example, a file using ITU-T H.261 video and DVI audio, streamed under RTP, results in a packet stream that is fully compliant with the IETF specifications for packing those codings into RTP.
7.3 Hint Track Format
Hint tracks are used to describe elementary stream data in the file. Each protocol or each family of related protocols has its own hint track format. A server hint track format and a reception hint track format for the same protocol are distinguishable from the associated four-character code of the sample description entry. In other words, a different four-character code is used for a server hint track and a reception hint track of the same protocol. The syntax of the server hint track format and the reception hint track format for the same protocol should be the same or compatible so that a reception hint track can be used for re-sending of the stream provided that the potential degradations of the received streams are handled appropriately. Most protocols will need only one sample description format for each track.
Servers find their hint tracks by first finding all hint tracks, and then looking within that set for server hint tracks using their protocol (sample description format). If there are choices at this point, then the server chooses on the basis of preferred protocol or by comparing features in the hint track header or other protocol-specific information in the sample descriptions. Particularly in the absence of server hint tracks, servers may also use reception hint tracks of their protocol. However, servers should handle potential degradations of the received stream described by the used reception hint track appropriately.
Tracks having the track_in_movie flag set are candidates for playback, regardless of whether they are media tracks or reception hint tracks.
Hint tracks construct streams by pulling data out of other tracks by reference. These other tracks may be hint tracks or elementary stream tracks. The exact form of these pointers is defined by the sample format for the protocol, but in general they consist of four pieces of information: a track reference index, a sample number, an offset, and a length. Some of these may be implicit for a particular protocol. These ‘pointers’ always point to the actual source of the data. If a hint track is built ‘on top’ of another hint track, then the second hint track must have direct references to the media track(s) used by the first where data from those media tracks is placed in the stream.
All hint tracks use a common set of declarations and structures.
Hint tracks are linked to the elementary stream tracks they carry, by track references of type ‘hint’.
They use a handler-type of ‘hint’ in the Handler Reference Box.
They use a Hint Media Header Box.
They use a hint sample entry in the sample description, with a name and format unique to the protocol they represent.
Server hint tracks are usually marked as disabled for local playback, with their track header track_in_movie and track_in_preview flags set to 0.
Hint tracks may be created by an authoring tool, or may be added to an existing presentation by a hinting tool. Such a tool serves as a ‘bridge’ between the media and the protocol, since it intimately understands both. This permits authoring tools to understand the media format, but not protocols, and for servers to understand protocols (and their hint tracks) but not the details of media data.
Hint tracks do not use separate composition times; the ‘ctts’ table is not present in hint tracks. The process of hinting computes transmission times correctly as the decoding time.
NOTE 1: Servers using reception hint tracks as hints for sending of the received streams should handle the potential degradations of the received streams, such as transmission delay jitter and packet losses, gracefully and ensure that the constraints of the protocols and contained data formats are obeyed regardless of the potential degradations of the received streams.
NOTE 2: Conversion of received streams to media tracks allows existing players compliant with earlier versions of the ISO base media file format to process recorded files as long as the media formats are supported. However, most media coding standards only specify the decoding of error-free streams, and consequently it should be ensured that the content in media tracks can be correctly decoded. Players may utilize reception hint tracks for handling of degradations caused by the transmission, i.e., content that may not be correctly decoded is located only within reception hint tracks. The need for having a duplicate of the correct media samples in both a media track and a reception hint track can be avoided by including data from the media track by reference into the reception hint track.
8 Box Structures
8.1 File Structure and general boxes
8.1.1 Media Data Box
8.1.1.1 Definition
Box Type: ‘mdat’
Container: File
Mandatory: No
Quantity: Zero or more
This box contains the media data. In video tracks, this box would contain video frames. A presentation may contain zero or more Media Data Boxes. The actual media data follows the type field; its structure is described by the metadata (see particularly the sample table, subclause 8.5, and the item location box, subclause 8.11.3).
In large presentations, it may be desirable to have more data in this box than a 32-bit size would permit. In this case, the large variant of the size field, above in subclause 4.2, is used.
There may be any number of these boxes in the file (including zero, if all the media data is in other files). The metadata refers to media data by its absolute offset within the file (see subclause 8.7.5, the Chunk Offset Box); so Media Data Box headers and free space may easily be skipped, and files without any box structure may also be referenced and used.
8.1.1.2 Syntax
aligned(8) class MediaDataBox extends Box(‘mdat’) {
   bit(8) data[];
}
8.1.1.3 Semantics
data is the contained media data
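The way a reader skips Media Data Box headers, including the large size variant, can be illustrated with a small box walker. This is only a sketch in Python, assuming the box header layout of subclause 4.2 (a 32-bit size, a four-character type, a 64-bit largesize when size is 1, and size 0 meaning the box extends to the end of the file). The name iter_boxes is hypothetical and is not defined by this specification.

```python
import struct

def iter_boxes(buf, start=0, end=None):
    """Yield (type, payload_offset, payload_size) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:
            # size == 1 signals that a 64-bit largesize follows the type field
            if pos + 16 > end:
                raise ValueError("truncated largesize at offset %d" % pos)
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:
            # size == 0 means the box runs to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError("invalid box size at offset %d" % pos)
        yield box_type.decode("latin-1"), pos + header, size - header
        pos += size
```

A reader that only needs the metadata can use the yielded sizes to step over ‘mdat’ and ‘free’ payloads without reading them.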
8.1.2 Free Space Box
8.1.2.1 Definition
Box Types: ‘free’, ‘skip’
Container: File or other box
Mandatory: No
Quantity: Zero or more
The contents of a free-space box are irrelevant and may be ignored, or the object deleted, without affecting the presentation. (Care should be exercised when deleting the object, as this may invalidate the offsets used in the sample table, unless this object is after all the media data.)
8.1.2.2 Syntax
aligned(8) class FreeSpaceBox extends Box(free_type) {
   unsigned int(8) data[];
}
8.1.2.3 Semantics
free_type may be ‘free’ or ‘skip’.
8.1.3 Progressive Download Information Box
8.1.3.1 Definition
Box Types: ‘pdin’
Container: File
Mandatory: No
Quantity: Zero or One
The Progressive download information box aids the progressive download of an ISO file. The box contains pairs of numbers (to the end of the box) specifying combinations of effective file download bitrate in units of bytes/sec and a suggested initial playback delay in units of milliseconds.
A receiving party can estimate the download rate it is experiencing, and from that obtain an upper estimate for a suitable initial delay by linear interpolation between pairs, or by extrapolation from the first or last entry.
It is recommended that the progressive download information box be placed as early as possible in files, for maximum utility.
8.1.3.2 Syntax
aligned(8) class ProgressiveDownloadInfoBox
      extends FullBox(‘pdin’, version = 0, 0) {
   for (i=0; ; i++) {   // to end of box
      unsigned int(32) rate;
      unsigned int(32) initial_delay;
   }
}
8.1.3.3 Semantics
rate is a download rate expressed in bytes/second
initial_delay is the suggested delay to use when playing the file, such that if download continues at the given rate, all data within the file will arrive in time for its use and playback should not need to stall.
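The interpolation described for this box can be sketched as follows. This is an illustrative Python sketch, not part of the specification. The function names are hypothetical. It assumes that extrapolation extends the straight line through the two nearest entries and that a negative result is clamped to zero, since the text does not fix either detail.

```python
import struct

def parse_pdin_pairs(payload):
    """Split the bytes after the FullBox version/flags word into (rate, initial_delay) pairs."""
    return [struct.unpack_from(">II", payload, offset)
            for offset in range(0, len(payload) - 7, 8)]

def suggested_delay_ms(pairs, measured_rate):
    """Estimate an initial delay (ms) for a measured download rate (bytes/second)."""
    if not pairs:
        raise ValueError("no (rate, initial_delay) pairs")
    points = sorted(pairs)
    if len(points) == 1:
        return float(points[0][1])
    if measured_rate <= points[0][0]:
        (r0, d0), (r1, d1) = points[0], points[1]      # extrapolate below the first entry
    elif measured_rate >= points[-1][0]:
        (r0, d0), (r1, d1) = points[-2], points[-1]    # extrapolate above the last entry
    else:
        for (r0, d0), (r1, d1) in zip(points, points[1:]):
            if r0 <= measured_rate <= r1:
                break                                  # interpolate inside this segment
    if r1 == r0:
        return float(max(d0, d1))                      # duplicate rates: take the safer delay
    delay = d0 + (d1 - d0) * (measured_rate - r0) / (r1 - r0)
    return max(0.0, delay)                             # assumption: never return a negative delay
```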
8.2 Movie Structure
8.2.1 Movie Box
8.2.1.1 Definition
Box Type: ‘moov’
Container: File
Mandatory: Yes
Quantity: Exactly one
The metadata for a presentation is stored in the single Movie Box which occurs at the top-level of a file. Normally this box is close to the beginning or end of the file, though this is not required.
8.2.1.2 Syntax
aligned(8) class MovieBox extends Box(‘moov’){
}
8.2.2 Movie Header Box
8.2.2.1 Definition
Box Type: ‘mvhd’
Container: Movie Box (‘moov’)
Mandatory: Yes
Quantity: Exactly one
This box defines overall information which is media-independent, and relevant to the entire presentation considered as a whole.
8.2.2.2 Syntax
aligned(8) class MovieHeaderBox extends FullBox(‘mvhd’, version, 0) {
   if (version==1) {
      unsigned int(64)  creation_time;
      unsigned int(64)  modification_time;
      unsigned int(32)  timescale;
      unsigned int(64)  duration;
   } else { // version==0
      unsigned int(32)  creation_time;
      unsigned int(32)  modification_time;
      unsigned int(32)  timescale;
      unsigned int(32)  duration;
   }
   template int(32)  rate = 0x00010000;   // typically 1.0
   template int(16)  volume = 0x0100;   // typically, full volume
   const bit(16)  reserved = 0;
   const unsigned int(32)[2]  reserved = 0;
   template int(32)[9]  matrix =
      { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
      // Unity matrix
   bit(32)[6]  pre_defined = 0;
   unsigned int(32)  next_track_ID;
}
8.2.2.3 Semantics
version is an integer that specifies the version of this box (0 or 1 in this specification)
creation_time is an integer that declares the creation time of the presentation (in seconds since midnight, Jan. 1, 1904, in UTC time)
modification_time is an integer that declares the most recent time the presentation was modified (in seconds since midnight, Jan. 1, 1904, in UTC time)
timescale is an integer that specifies the time-scale for the entire presentation; this is the number of time units that pass in one second. For example, a time coordinate system that measures time in sixtieths of a second has a time scale of 60.
duration is an integer that declares the length of the presentation (in the indicated timescale). This property is derived from the presentation’s tracks: the value of this field corresponds to the duration of the longest track in the presentation. If the duration cannot be determined then duration is set to all 1s.
rate is a fixed point 16.16 number that indicates the preferred rate to play the presentation; 1.0 (0x00010000) is normal forward playback
volume is a fixed point 8.8 number that indicates the preferred playback volume. 1.0 (0x0100) is full volume.
matrix provides a transformation matrix for the video; (u,v,w) are restricted here to (0,0,1), hex values (0,0,0x40000000).
next_track_ID is a non-zero integer that indicates a value to use for the track ID of the next track to be added to this presentation. Zero is not a valid track ID value. The value of next_track_ID shall be larger than the largest track-ID in use. If this value is equal to all 1s (32-bit maxint), and a new media track is to be added, then a search must be made in the file for an unused track identifier.
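The field layout and fixed-point conventions above can be made concrete with a short decoder. The following Python sketch is illustrative only. The name parse_mvhd and the returned dictionary keys are hypothetical. It assumes the payload starts at the FullBox version byte (that is, after the box size and type) and it does not decode the matrix.

```python
import struct
from datetime import datetime, timedelta, timezone

# Times in this box count seconds from midnight, Jan. 1, 1904, UTC.
EPOCH_1904 = datetime(1904, 1, 1, tzinfo=timezone.utc)

def parse_mvhd(payload):
    """Decode selected fields from a Movie Header Box body (bytes after size and type)."""
    version = payload[0]
    if version == 1:
        creation, modification, timescale, duration = struct.unpack_from(">QQIQ", payload, 4)
        pos, all_ones = 4 + 28, 0xFFFFFFFFFFFFFFFF
    else:
        creation, modification, timescale, duration = struct.unpack_from(">IIII", payload, 4)
        pos, all_ones = 4 + 16, 0xFFFFFFFF
    rate, volume = struct.unpack_from(">ih", payload, pos)
    # Skip rate (4), volume (2), reserved (2 + 8), matrix (36) and pre_defined (24).
    (next_track_id,) = struct.unpack_from(">I", payload, pos + 76)
    return {
        "creation_time": EPOCH_1904 + timedelta(seconds=creation),
        "modification_time": EPOCH_1904 + timedelta(seconds=modification),
        "timescale": timescale,
        # All 1s means the duration could not be determined.
        "duration_seconds": None if duration == all_ones else duration / timescale,
        "rate": rate / 65536.0,     # fixed point 16.16
        "volume": volume / 256.0,   # fixed point 8.8
        "next_track_ID": next_track_id,
    }
```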
8.3 Track Structure
8.3.1 Track Box
8.3.1.1 Definition
Box Type: ‘trak’
Container: Movie Box (‘moov’)
Mandatory: Yes
Quantity: One or more
This is a container box for a single track of a presentation. A presentation consists of one or more tracks. Each track is independent of the other tracks in the presentation and carries its own temporal and spatial information. Each track will contain its associated Media Box.
Tracks are used for two purposes: (a) to contain media data (media tracks) and (b) to contain packetization information for streaming protocols (hint tracks).
There shall be at least one media track within an ISO file, and all the media tracks that contributed to the hint tracks shall remain in the file, even if the media data within them is not referenced by the hint tracks; after deleting all hint tracks, the entire un-hinted presentation shall remain.
8.3.1.2 Syntax
aligned(8) class TrackBox extends Box(‘trak’) {
}
8.3.2 Track Header Box
8.3.2.1 Definition
Box Type: ‘tkhd’
Container: Track Box (‘trak’)
Mandatory: Yes
Quantity: Exactly one
This box specifies the characteristics of a single track. Exactly one Track Header Box is contained in a track.
In the absence of an edit list, the presentation of a track starts at the beginning of the overall presentation. An empty edit is used to offset the start time of a track.
The default value of the track header flags for media tracks is 7 (track_enabled, track_in_movie, track_in_preview). If in a presentation all tracks have neither track_in_movie nor track_in_preview set, then all tracks shall be treated as if both flags were set on all tracks. Server hint tracks should have the track_in_movie and track_in_preview set to 0, so that they are ignored for local playback and preview.
Under the ‘iso3’ brand or brands that share its requirements, the width and height in the track header are measured on a notional 'square' (uniform) grid. Track video data is normalized to these dimensions (logically) before any transformation or placement caused by a layup or composition system. Track (and movie) matrices, if used, also operate in this uniformly-scaled space.
The duration field here does not include the duration of following movie fragments, if any, but only of the media in the enclosing Movie Box. The Movie Extends Header box may be used to document the duration including movie fragments, when desired and possible.
8.3.2.2 Syntax
aligned(8) class TrackHeaderBox
   extends FullBox(‘tkhd’, version, flags){
   if (version==1) {
      unsigned int(64)  creation_time;
      unsigned int(64)  modification_time;
      unsigned int(32)  track_ID;
      const unsigned int(32)  reserved = 0;
      unsigned int(64)  duration;
   } else { // version==0
      unsigned int(32)  creation_time;
      unsigned int(32)  modification_time;
      unsigned int(32)  track_ID;
      const unsigned int(32)  reserved = 0;
      unsigned int(32)  duration;
   }
   const unsigned int(32)[2]  reserved = 0;
   template int(16)  layer = 0;
   template int(16)  alternate_group = 0;
   template int(16)  volume = {if track_is_audio 0x0100 else 0};
   const unsigned int(16)  reserved = 0;
   template int(32)[9]  matrix=
      { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
      // unity matrix
   unsigned int(32)  width;
   unsigned int(32)  height;
}
8.3.2.3 Semantics
version is an integer that specifies the version of this box (0 or 1 in this specification)
flags is a 24-bit integer with flags; the following values are defined:
   Track_enabled: Indicates that the track is enabled. Flag value is 0x000001. A disabled track (the low bit is zero) is treated as if it were not present.
   Track_in_movie: Indicates that the track is used in the presentation. Flag value is 0x000002.
   Track_in_preview: Indicates that the track is used when previewing the presentation. Flag value is 0x000004.
   Track_size_is_aspect_ratio: Indicates that the width and height fields are not expressed in pixel units. The values have the same units but these units are not specified. The values are only an indication of the desired aspect ratio. If the aspect ratios of this track and other related tracks are not identical, then the respective positioning of the tracks is undefined, possibly defined by external contexts. Flag value is 0x000008.
creation_time is an integer that declares the creation time of this track (in seconds since midnight, Jan. 1, 1904, in UTC time).
modification_time is an integer that declares the most recent time the track was modified (in seconds since midnight, Jan. 1, 1904, in UTC time).
track_ID is an integer that uniquely identifies this track over the entire life-time of this presentation. Track IDs are never re-used and cannot be zero.
duration is an integer that indicates the duration of this track (in the timescale indicated in the Movie Header Box). The value of this field is equal to the sum of the durations of all of the track’s edits. If there is no edit list, then the duration is the sum of the sample durations, converted into the timescale in the Movie Header Box. If the duration of this track cannot be determined then duration is set to all 1s.
layer specifies the front-to-back ordering of video tracks; tracks with lower numbers are closer to the viewer. 0 is the normal value, and -1 would be in front of track 0, and so on.
alternate_group is an integer that specifies a group or collection of tracks. If this field is 0 there is no information on possible relations to other tracks. If this field is not 0, it should be the same for tracks that contain alternate data for one another and different for tracks belonging to different such groups. Only one track within an alternate group should be played or streamed at any one time, and must be distinguishable from other tracks in the group via attributes such as bitrate, codec, language, packet size etc. A group may have only one member.
volume is a fixed 8.8 value specifying the track's relative audio volume. Full volume is 1.0 (0x0100) and is the normal value. Its value is irrelevant for a purely visual track. Tracks may be composed by combining them according to their volume, and then using the overall Movie Header Box volume setting; or more complex audio composition (e.g. MPEG-4 BIFS) may be used.
matrix provides a transformation matrix for the video; (u,v,w) are restricted here to (0,0,1), hex (0,0,0x40000000).
width and height fixed-point 16.16 values are track-dependent as follows:
For text and subtitle tracks, they may, depending on the coding format, describe the suggested size of the rendering area. For such tracks, the value 0x0 may also be used to indicate that the data may be rendered at any size, that no preferred size has been indicated and that the actual size may be determined by the external context or by reusing the width and height of another track. For those tracks, the flag track_size_is_aspect_ratio may also be used.
For non-visual tracks (e.g. audio), they should be set to zero.
For all other tracks, they specify the track's visual presentation size. These need not be the same as the pixel dimensions of the images, which is documented in the sample description(s); all images in the sequence are scaled to this size, before any overall transformation of the track represented by the matrix. The pixel dimensions of the images are the default values.
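The flag values, the fallback rule for presentations in which no track has track_in_movie or track_in_preview set, and the 16.16 encoding of width and height can be sketched as follows. This Python sketch is illustrative only. The function and constant names are hypothetical; the flag values are those listed in the semantics above.

```python
TRACK_ENABLED = 0x000001
TRACK_IN_MOVIE = 0x000002
TRACK_IN_PREVIEW = 0x000004
TRACK_SIZE_IS_ASPECT_RATIO = 0x000008

FLAG_BITS = {
    "track_enabled": TRACK_ENABLED,
    "track_in_movie": TRACK_IN_MOVIE,
    "track_in_preview": TRACK_IN_PREVIEW,
    "track_size_is_aspect_ratio": TRACK_SIZE_IS_ASPECT_RATIO,
}

def decode_flags(flags):
    """Return a name -> bool mapping for the 24-bit track header flags."""
    return {name: bool(flags & bit) for name, bit in FLAG_BITS.items()}

def effective_flags(all_track_flags):
    """If no track has track_in_movie or track_in_preview set, treat both as set on all tracks."""
    mask = TRACK_IN_MOVIE | TRACK_IN_PREVIEW
    if all((flags & mask) == 0 for flags in all_track_flags):
        return [flags | mask for flags in all_track_flags]
    return list(all_track_flags)

def fixed_16_16(value):
    """Convert a fixed-point 16.16 field such as width or height to a float."""
    return value / 65536.0
```

For example, a width field of 0x01400000 decodes to 320.0 under this convention.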
8.3.3 Track Reference Box
8.3.3.1 Definition
Box Type: ‘tref’
Container: Track Box (‘trak’)
Mandatory: No
Quantity: Zero or one
This box provides a reference from the containing track to another track in the presentation. These references are typed. A ‘hint’ reference links from the containing hint track to the media data that it hints. A content description reference ‘cdsc’ links a descriptive or metadata track to the content which it describes. The ‘hind’ dependency indicates that the referenced track(s) may contain media data required for decoding of the track containing the track reference. The referenced tracks shall be hint tracks. The ‘hind’ dependency can, for example, be used for indicating the dependencies between hint tracks documenting layered IP multicast over RTP.
At most one Track Reference Box can be contained within the Track Box.
If this box is not present, the track is not referencing any other track in any way. The reference array is sized to fill the reference type box.
8.3.3.2 Syntax
aligned(8) class TrackReferenceBox extends Box(‘tref’) {
}
aligned(8) class TrackReferenceTypeBox (unsigned int(32) reference_type) extends Box(reference_type) {
   unsigned int(32) track_IDs[];
}
8.3.3.3 Semantics
The Track Reference Box contains track reference type boxes.
track_ID is an integer that provides a reference from the containing track to another track in the presentation. track_IDs are never re-used and cannot be equal to zero.
The reference_type shall be set to one of the following values, or a value registered or from a derived specification or registration:
‘hint’ the referenced track(s) contain the original media for this hint track.
‘cdsc’ this track describes the referenced track.
‘font’ this track uses fonts carried/defined in the referenced track.
‘hind’ this track depends on the referenced hint track, i.e., it should only be used if the referenced hint track is used.
‘vdep’ this track contains auxiliary depth video information for the referenced video track.
‘vplx’ this track contains auxiliary parallax video information for the referenced video track.
‘subt’ this track contains subtitle, timed text or overlay graphical information for the referenced track or any track in the alternate group to which the track belongs, if any.
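Because the track_IDs array is sized to fill each reference type box, a reader derives the entry count from the box size. The following Python sketch illustrates this; it is not part of the specification. The name parse_tref is hypothetical, and the sketch assumes that every contained box uses the compact 32-bit size form.

```python
import struct

def parse_tref(payload):
    """Map each reference type to its list of track IDs, given the 'tref' box payload."""
    refs = {}
    pos = 0
    while pos + 8 <= len(payload):
        size, ref_type = struct.unpack_from(">I4s", payload, pos)
        if size < 8 or pos + size > len(payload):
            raise ValueError("invalid track reference type box at offset %d" % pos)
        count = (size - 8) // 4  # the array fills the rest of the box
        track_ids = struct.unpack_from(">%dI" % count, payload, pos + 8)
        if 0 in track_ids:
            raise ValueError("track_ID cannot be zero")
        refs.setdefault(ref_type.decode("latin-1"), []).extend(track_ids)
        pos += size
    return refs
```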
8.3.4 Track Group Box
8.3.4.1 Definition
Box Type: ‘trgr’
Container: Track Box (‘trak’)
Mandatory: No
Quantity: Zero or one
This box enables indication of groups of tracks, where each group shares a particular characteristic or the tracks within a group have a particular relationship. The box contains zero or more boxes, and the particular characteristic or the relationship is indicated by the box type of the contained boxes. The contained boxes include an identifier, which can be used to determine which tracks belong to the same track group. The tracks that contain the same type of a contained box within the Track Group Box and have the same identifier value within these contained boxes belong to the same track group.
Track groups shall not be used to indicate dependency relationships between tracks. Instead, the Track Reference Box is used for such purposes.
8.3.4.2 Syntax
aligned(8) class TrackGroupBox extends Box('trgr') {
}
aligned(8) class TrackGroupTypeBox(unsigned int(32) track_group_type) extends FullBox(track_group_type, version = 0, flags = 0)
{
   unsigned int(32) track_group_id;
   // the remaining data may be specified for a particular track_group_type
}
8.3.4.3 Semantics
track_group_type indicates the grouping type and shall be set to one of the following values, or a value registered, or a value from a derived specification or registration:
'msrc' indicates that this track belongs to a multi-source presentation. The tracks that have the same value of track_group_id within a Group Type Box of track_group_type 'msrc' are mapped as being originated from the same source. For example, a recording of a video telephony call may have both audio and video for both participants, and the value of track_group_id associated with the audio track and the video track of one participant differs from the value of track_group_id associated with the tracks of the other participant.
The pair of track_group_id and track_group_type identifies a track group within the file. The tracks that contain a particular track group type box having the same value of track_group_id belong to the same track group.
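Since a track group is identified by the pair of track_group_type and track_group_id, a reader can collect group membership by keying on that pair. The following Python sketch is illustrative only. The function name and the input tuples are hypothetical; they stand for values already read from each track's Track Group Box.

```python
from collections import defaultdict

def group_tracks(entries):
    """Group track IDs by (track_group_type, track_group_id).

    entries: iterable of (track_ID, track_group_type, track_group_id) tuples.
    """
    groups = defaultdict(list)
    for track_id, group_type, group_id in entries:
        groups[(group_type, group_id)].append(track_id)
    return dict(groups)
```

Applied to the video telephony example, the audio and video tracks of each participant end up under one key, and the two participants under different keys.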
8.4 Track Media Structure
8.4.1 Media Box
8.4.1.1 Definition
Box Type: ‘mdia’
Container: Track Box (‘trak’)
Mandatory: Yes
Quantity: Exactly one
The media declaration container contains all the objects that declare information about the media data within a track.
8.4.1.2 Syntax
aligned(8) class MediaBox extends Box(‘mdia’) {
}
8.4.2 Media Header Box
8.4.2.1 Definition
Box Type: ‘mdhd’
Container: Media Box (‘mdia’)
Mandatory: Yes
Quantity: Exactly one
The media header declares overall information that is media-independent, and relevant to characteristics of the media in a track.
8.4.2.2 Syntax
aligned(8) class MediaHeaderBox extends FullBox(‘mdhd’, version, 0) {
   if (version==1) {
      unsigned int(64)  creation_time;
      unsigned int(64)  modification_time;
      unsigned int(32)  timescale;
      unsigned int(64)  duration;
   } else { // version==0
      unsigned int(32)  creation_time;
      unsigned int(32)  modification_time;
      unsigned int(32)  timescale;
      unsigned int(32)  duration;
   }
   bit(1)  pad = 0;
   unsigned int(5)[3]  language;   // ISO-639-2/T language code
   unsigned int(16)  pre_defined = 0;
}
8.4.2.3 Semantics
version is an integer that specifies the version of this box (0 or 1)
creation_time is an integer that declares the creation time of the media in this track (in seconds since midnight, Jan. 1, 1904, in UTC time).
modification_time is an integer that declares the most recent time the media in this track was modified (in seconds since midnight, Jan. 1, 1904, in UTC time).
timescale is an integer that specifies the time-scale for this media; this is the number of time units that pass in one second. For example, a time coordinate system that measures time in sixtieths of a second has a time scale of 60.
duration is an integer that declares the duration of this media (in the scale of the timescale). If the duration cannot be determined then duration is set to all 1s.
language declares the language code for this media. See ISO 639-2/T for the set of three character codes. Each character is packed as the difference between its ASCII value and 0x60. Since the code is confined to being three lower-case letters, these values are strictly positive.
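The packing of the language field can be worked through in a few lines. The following Python sketch is illustrative only and the function names are hypothetical. It treats the pad bit and the three 5-bit values as one 16-bit big-endian word, with the first character in the most significant position.

```python
def pack_language(code):
    """Pack a three-letter lower-case ISO 639-2/T code into the 16-bit pad + language word."""
    if len(code) != 3 or not all("a" <= ch <= "z" for ch in code):
        raise ValueError("expected three lower-case ASCII letters")
    value = 0
    for ch in code:
        value = (value << 5) | (ord(ch) - 0x60)  # each character becomes a 5-bit value in 1..26
    return value  # the top (pad) bit stays 0

def unpack_language(value):
    """Recover the three-letter code from the 16-bit word."""
    return "".join(chr(((value >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))
```

For example, 'und' packs to 0x55C4 and 'eng' packs to 0x15C7.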
8.4.3 Handler Reference Box
8.4.3.1 Definition
Box Type: ‘hdlr’
Container: Media Box (‘mdia’) or Meta Box (‘meta’)
Mandatory: Yes
Quantity: Exactly one
This box within a Media Box declares the media type of the track, and thus the process by which the media-data in the track is presented. For example, a format for which the decoder delivers video would be stored in a video track, identified by being handled by a video handler. The documentation of the storage of a media format identifies the media type which that format uses.
This box, when present within a Meta Box, declares the structure or format of the 'meta' box contents.
There is a general handler for metadata streams of any type; the specific format is identified by the sample entry, as for video or audio, for example.
8.4.3.2 Syntax
aligned(8) class HandlerBox extends FullBox(‘hdlr’, version = 0, 0) {
   unsigned int(32)  pre_defined = 0;
   unsigned int(32)  handler_type;
   const unsigned int(32)[3]  reserved = 0;
   string  name;
}
8.4.3.3 Semantics
version is an integer that specifies the version of this box
handler_type
   -- when present in a media box, contains a value as defined in clause 12, or a value from a derived specification, or registration.
   -- when present in a meta box, contains an appropriate value to indicate the format of the meta box contents. The value ‘null’ can be used in the primary meta box to indicate that it is merely being used to hold resources.
name is a null-terminated string in UTF-8 characters which gives a human-readable name for the track type (for debugging and inspection purposes).
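The fixed part of this box is 24 bytes (version and flags, pre_defined, handler_type, and three reserved words), followed by the name string. The following Python sketch shows one way to read it. It is illustrative only: the name parse_hdlr is hypothetical, the payload is assumed to start at the FullBox version byte, and a missing terminator is tolerated by reading to the end of the payload.

```python
import struct

def parse_hdlr(payload):
    """Return (handler_type, name) from a Handler Reference Box body."""
    _version_flags, _pre_defined, handler_type = struct.unpack_from(">II4s", payload, 0)
    # name starts after version/flags (4), pre_defined (4), handler_type (4), reserved (12)
    raw_name = payload[24:].split(b"\x00", 1)[0]
    return handler_type.decode("latin-1"), raw_name.decode("utf-8", errors="replace")
```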
8.4.4 Media Information Box
8.4.4.1 Definition
Box Type: ‘minf’
Container: Media Box (‘mdia’)
Mandatory: Yes
Quantity: Exactly one
This box contains all the objects that declare characteristic information of the media in the track.
8.4.4.2 Syntax
aligned(8) class MediaInformationBox extends Box(‘minf’) {
}
8.4.5 Media Information Header Boxes
8.4.5.1 Definition
There is a different media information header for each track type (corresponding to the media handler-type); the matching header shall be present, which may be one of those defined in clause 12, or one defined in a derived specification.
The type of media header used is determined by the definition of the media type and must match the media handler.
8.4.5.2 Null Media Header Box
8.4.5.2.1 Definition
Box Types: ‘nmhd’
Container: Media Information Box (‘minf’)
Mandatory: Yes
Quantity: Exactly one specific media header shall be present
Streams for which no specific media header is identified use a null Media Header Box, as defined here.
8.4.5.2.2 Syntax
aligned(8) class NullMediaHeaderBox
   extends FullBox(‘nmhd’, version = 0, flags) {
}
8.4.5.2.3 Semantics
version is an integer that specifies the version of this box.
flags is a 24-bit integer with flags (currently all zero).
8.4.6 Extended language tag
8.4.6.1 Definition
Box Type: ‘elng’
Container: Media Box (‘mdia’)
Mandatory: No
Quantity: Zero or one
The extended language tag box represents media language information, based on the RFC 4646 (Best Common Practices, BCP 47) industry standard. It is an optional peer of the media header box, and must occur after the media header box.
The extended language tag can provide better language information than the language field in the Media Header, including information such as region, script, variation, and so on, as parts (or subtags).
The extended language tag box is optional, and if it is absent the media language should be used. The extended language tag overrides the media language if they are not consistent.
For best compatibility with earlier players, if an extended language tag is specified, the most compatible language code should be specified in the language field of the Media Header box (for example, "eng" if the extended language tag is "en-UK"). If there is no reasonably compatible tag, the packed form of 'und' can be used.
8.4.6.2 Syntax
aligned(8) class ExtendedLanguageBox extends FullBox(‘elng’, 0, 0) {
   string  extended_language;
}
8.4.6.3 Semantics
extended_language is a NULL-terminated C string containing an RFC 4646 (BCP 47) compliant language tag string, such as "en-US", "fr-FR", or "zh-CN".
8.5 Sample Tables
8.5.1 Sample Table Box
8.5.1.1 Definition
Box Type: ‘stbl’
Container: Media Information Box (‘minf’)
Mandatory: Yes
Quantity: Exactly one
The sample table contains all the time and data indexing of the media samples in a track. Using the tables here, it is possible to locate samples in time, determine their type (e.g. I-frame or not), and determine their size, container, and offset into that container.
If the track that contains the Sample Table Box references no data, then the Sample Table Box does not need to contain any sub-boxes (this is not a very useful media track).
If the track that the Sample Table Box is contained in does reference data, then the following sub-boxes are required: Sample Description, Sample Size, Sample To Chunk, and Chunk Offset. Further, the Sample Description Box shall contain at least one entry. A Sample Description Box is required because it contains the data reference index field which indicates which Data Reference Box to use to retrieve the media samples. Without the Sample Description, it is not possible to determine where the media samples are stored. The Sync Sample Box is optional. If the Sync Sample Box is not present, all samples are sync samples.
A.7 provides a narrative description of random access using the structures defined in the Sample Table Box.
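The rule that an absent Sync Sample Box makes every sample a sync sample affects how a reader picks a random access point. The following Python sketch is illustrative only and does not reproduce the procedure in A.7. The function name is hypothetical, and it assumes the sync sample numbers are 1-based and listed in increasing order.

```python
from bisect import bisect_right

def sync_sample_at_or_before(sample_number, sync_sample_numbers=None):
    """Return the latest sync sample at or before sample_number, or None if there is none.

    sync_sample_numbers is None when the track has no Sync Sample Box,
    in which case every sample is a sync sample.
    """
    if sync_sample_numbers is None:
        return sample_number
    index = bisect_right(sync_sample_numbers, sample_number)
    return sync_sample_numbers[index - 1] if index else None
```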
8.5.1.2 Syntax
aligned(8) class SampleTableBox extends Box(‘stbl’) {
}
8.5.2 Sample Description Box
8.5.2.1 Definition
Box Types: ‘stsd’
Container: Sample Table Box (‘stbl’)
Mandatory: Yes
Quantity: Exactly one
The sample description table gives detailed information about the coding type used, and any initialization information needed for that coding.
The information stored in the sample description box after the entry-count is both track-type specific as documented here, and can also have variants within a track type (e.g. different codings may use different specific information after some common fields, even within a video track).
Which type of sample entry form is used is determined by the media handler, using a suitable form, such as one defined in clause 12, or defined in a derived specification, or registration.
Multiple descriptions may be used within a track.
Note Though the count is 32 bits, the number of items is usually much fewer, and is restricted by the fact that the reference index in the sample table is only 16 bits.
If the ‘format’ field of a SampleEntry is unrecognized, neither the sample description itself, nor the associated media samples, shall be decoded.
Note The definition of sample entries specifies boxes in a particular order, and this is usually also followed in derived specifications. For maximum compatibility, writers should construct files respecting the order both within specifications and as implied by the inheritance, whereas readers should be prepared to accept any box order.
All string fields shall be null-terminated, even if unused. “Optional” means there is at least one null byte.
Entries that identify the format by MIME type, such as a TextSubtitleSampleEntry, TextMetaDataSampleEntry, or SimpleTextSampleEntry, all of which contain a MIME type, may be used to identify the format of streams for which a MIME type applies. A MIME type applies if the contents of the string in the optional configuration box (without its null termination), followed by the contents of a set of samples, starting with a sync sample and ending at the sample immediately preceding a sync sample, are concatenated in their entirety, and the result meets the decoding requirements for documents of that MIME type. Non-sync samples should be used only if that format specifies the behaviour of ‘progressive decoding’, and then the sample times indicate when the results of such progressive decoding should be presented (according to the media type).
Note The samples in a track that is all sync samples are therefore each a valid document for that MIME type.
In some classes derived from SampleEntry, namespace and schema_location are used both to identify the XML document content and to declare “brand” or profile compatibility. Multiple namespace identifiers indicate that the track conforms to the specification represented by each of the identifiers, some of which may identify supersets of the features present. A decoder should be able to decode all the namespaces in order to be able to decode and present correctly the media associated with this sample entry.
Note Additionally, namespace identifiers may represent performance constraints, such as limits on document size, font size, drawing rate, etc., as well as syntax constraints such as features that are not permitted or ignored.
8.5.2.2 Syntax
aligned(8) abstract class SampleEntry (unsigned int(32) format) extends Box(format) {
   const unsigned int(8)[6] reserved = 0;
   unsigned int(16) data_reference_index;
}
class BitRateBox extends Box(‘btrt’) {
   unsigned int(32) bufferSizeDB;
   unsigned int(32) maxBitrate;
   unsigned int(32) avgBitrate;
}
aligned(8) class SampleDescriptionBox (unsigned int(32) handler_type) extends FullBox('stsd', version, 0) {
   int i;
   unsigned int(32) entry_count;
   for (i = 1; i <= entry_count; i++) {
      SampleEntry();   // an instance of a class derived from SampleEntry
   }
}
8.5.2.3 Semantics
version is set to zero unless the box contains an AudioSampleEntryV1, whereupon version must be 1
entry_count is an integer that gives the number of entries in the following table
SampleEntry is the appropriate sample entry.
data_reference_index is an integer that contains the index of the data reference to use to retrieve data associated with samples that use this sample description. Data references are stored in Data Reference Boxes. The index ranges from 1 to the number of data references.
bufferSizeDB gives the size of the decoding buffer for the elementary stream in bytes.
maxBitrate gives the maximum rate in bits/second over any window of one second.
avgBitrate gives the average rate in bits/second over the entire presentation.
8.5.3 Degradation Priority Box
8.5.3.1 Definition
Box Type: ‘stdp’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one
This box contains the degradation priority of each sample. The values are stored in the table, one for each sample. The size of the table, sample_count, is taken from the sample_count in the Sample Size Box ('stsz'). Specifications derived from this define the exact meaning and acceptable range of the priority field.
8.5.3.2 Syntax
aligned(8) class DegradationPriorityBox extends FullBox(‘stdp’, version = 0, 0) {
   int i;
   for (i=0; i < sample_count; i++) {
      unsigned int(16) priority;
   }
}
8.5.3.3 Semantics
version is an integer that specifies the version of this box.
priority is an integer specifying the degradation priority for each sample.
8.5.4 Sample Scale Box
(empty sub-clause)
8.6 Track Time Structures
8.6.1 Time to Sample Boxes
8.6.1.1 Definition
The composition times (CT) and decoding times (DT) of samples are derived from the Time to Sample Boxes, of which there are two types. The decoding time is defined in the Decoding Time to Sample Box, giving time deltas between successive decoding times. The composition times are derived in the Composition Time to Sample Box as composition time offsets from decoding time. If the composition times and decoding times are identical for every sample in the track, then only the Decoding Time to Sample Box is required; the composition time to sample box must not be present.
The time to sample boxes must give non-zero durations for all samples with the possible exception of the last one. Durations in the ‘stts’ box are strictly positive (non-zero), except for the very last entry, which may be zero. This rule derives from the rule that no two time-stamps in a stream may be the same. Great care must be taken when adding samples to a stream, that the sample that was previously last may need to have a non-zero duration established, in order to observe this rule. If the duration of the last sample is indeterminate, use an arbitrary small value and a ‘dwell’ edit.
Some coding systems may allow samples that are used only for reference and not output (e.g. a non-displayed reference frame in video). When any such non-output sample is present in a track, the following applies:
1) A non-output sample shall be given a composition time which is outside the time-range of the samples that are output.
2) An edit list shall be used to exclude the composition times of the non-output samples.
3) When the track includes a CompositionOffsetBox (‘ctts’),
   a. version 1 of the CompositionOffsetBox shall be used,
   b. the value of sample_offset shall be set equal to the most negative number possible (for 32-bit values, -2^31) for each non-output sample,
   c. the CompositionToDecodeBox (‘cslg’) should be contained in the SampleTableBox (‘stbl’) of the track, and
   d. when the CompositionToDecodeBox is present for the track, the value of the leastDecodeToDisplayDelta field in the box shall be equal to the smallest composition offset in the CompositionOffsetBox excluding the sample_offset values for non-output samples.
Note Thus, leastDecodeToDisplayDelta is greater than -2^31.
In the following example, there is a sequence of I, P, and B frames, each with a decoding time delta of 10. The samples are stored as follows, with the indicated values for their decoding time deltas and composition time offsets (the actual CT and DT are given for reference). The re-ordering occurs because the predicted P frames must be decoded before the bi-directionally predicted B frames. The value of DT for a sample is always the sum of the deltas of the preceding samples. Note that the total of the decoding deltas is the duration of the media in this track.
Table 2 — Closed GOP Example
GOP                   /------------------------\  /------------------------\
                      I1  P4  B2  B3  P7  B5  B6  I8 P11  B9 B10 P14 B12 B13
DT                     0  10  20  30  40  50  60  70  80  90 100 110 120 130
CT                    10  40  20  30  70  50  60  80 110  90 100 140 120 130
Decode delta          10  10  10  10  10  10  10  10  10  10  10  10  10  10
Composition offset    10  30   0   0  30   0   0  10  30   0   0  30   0   0
Table 3 — Open GOP Example
GOP                   /--------------------\  /--------------------\
                      I3  B1  B2  P6  B4  B5  I9  B7  B8 P12 B10 B11
DT                     0  10  20  30  40  50  60  70  80  90 100 110
CT                    30  10  20  60  40  50  90  70  80 120 100 110
Decode delta          10  10  10  10  10  10  10  10  10  10  10  10
Composition offset    30   0   0  30   0   0  30   0   0  30   0   0
8.6.1.2 Decoding Time to Sample Box
8.6.1.2.1 Definition
Box Type: ‘stts’
Container: Sample Table Box (‘stbl’)
Mandatory: Yes
Quantity: Exactly one
This box contains a compact version of a table that allows indexing from decoding time to sample number. Other tables give sample sizes and pointers, from the sample number. Each entry in the table gives the number of consecutive samples with the same time delta, and the delta of those samples. By adding the deltas a complete time-to-sample map may be built.
The Decoding Time to Sample Box contains decode time deltas: DT(n+1) = DT(n) + STTS(n) where STTS(n) is the (uncompressed) table entry for sample n.
The sample entries are ordered by decoding time stamps; therefore the deltas are all non-negative.
The DT axis has a zero origin; DT(i) = SUM(for j=0 to i-1 of delta(j)), and the sum of all deltas gives the length of the media in the track (not mapped to the overall timescale, and not considering any edit list).
The Edit List Box provides the initial CT value if it is non-empty (non-zero).
8.6.1.2.2 Syntax
aligned(8) class TimeToSampleBox extends FullBox(’stts’, version = 0, 0) {
   unsigned int(32) entry_count;
   int i;
   for (i=0; i < entry_count; i++) {
      unsigned int(32) sample_count;
      unsigned int(32) sample_delta;
   }
}
For example with Table 2, the entry would be:
Sample count   Sample-delta
14             10
8.6.1.2.3 Semantics
version is an integer that specifies the version of this box.
entry_count is an integer that gives the number of entries in the following table.
sample_count is an integer that counts the number of consecutive samples that have the given duration.
sample_delta is an integer that gives the delta of these samples in the time-scale of the media.
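The run-length expansion described for the ‘stts’ table can be sketched as follows. This is only an illustrative sketch in Python, not part of the specification; the function name decode_times and the representation of the table as (sample_count, sample_delta) tuples are assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def decode_times(stts_entries: Iterable[Tuple[int, int]]) -> Tuple[List[int], int]:
    """Expand (sample_count, sample_delta) runs into per-sample decoding times.

    Hypothetical helper: returns the DT of every sample (zero origin) and the
    sum of all deltas, i.e. the media duration in media timescale units.
    """
    times: List[int] = []
    dt = 0  # the DT axis has a zero origin
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            times.append(dt)    # DT(n)
            dt += sample_delta  # DT(n+1) = DT(n) + STTS(n)
    return times, dt


# The single entry given for Table 2: 14 samples, each with a delta of 10.
times, duration = decode_times([(14, 10)])
```

With the Table 2 entry, times reproduces the DT row of the table and duration is the total of the decoding deltas.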
8.6.1.3 Composition Time to Sample Box
8.6.1.3.1 Definition
Box Type: ‘ctts’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one
This box provides the offset between decoding time and composition time. In version 0 of this box the decoding time must be less than the composition time, and the offsets are expressed as unsigned numbers such that CT(n) = DT(n) + CTTS(n) where CTTS(n) is the (uncompressed) table entry for sample n. In version 1 of this box, the composition timeline and the decoding timeline are still derived from each other, but the offsets are signed. It is recommended that for the computed composition timestamps, there is exactly one with the value 0 (zero).
For either version of the box, each sample must have a unique composition timestamp value, that is, the timestamp for two samples shall never be the same.
It may be true that there is no frame to compose at time 0; the handling of this is unspecified (systems might display the first frame for longer, or a suitable fill colour).
When version 1 of this box is used, the CompositionToDecodeBox may also be present in the sample table to relate the composition and decoding timelines. When backwards-compatibility or compatibility with an unknown set of readers is desired, version 0 of this box should be used when possible. In either version of this box, but particularly under version 0, if it is desired that the media start at track time 0, and the first media sample does not have a composition time of 0, an edit list may be used to ‘shift’ the media to time 0.
The composition time to sample table is optional and must only be present if DT and CT differ for any samples.
Hint tracks do not use this box.
For example in Table 2
Sample count Sample_offset
1 10
1 30
2 0
1 30
2 0
1 10
1 30
2 0
1 30
2 0
8.6.1.3.2 Syntax
aligned(8) class CompositionOffsetBox extends FullBox(‘ctts’, version, 0) {
   unsigned int(32) entry_count;
   int i;
   if (version==0) {
      for (i=0; i < entry_count; i++) {
         unsigned int(32) sample_count;
         unsigned int(32) sample_offset;
      }
   }
   else if (version == 1) {
      for (i=0; i < entry_count; i++) {
         unsigned int(32) sample_count;
         signed int(32) sample_offset;
      }
   }
}
8.6.1.3.3 Semantics
version is an integer that specifies the version of this box.
entry_count is an integer that gives the number of entries in the following table.
sample_count is an integer that counts the number of consecutive samples that have the given offset.
sample_offset is an integer that gives the offset between CT and DT, such that CT(n) = DT(n) + CTTS(n).
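The relation CT(n) = DT(n) + CTTS(n) can be checked against the Table 2 values with a short sketch. The Python function composition_times and the tuple representation of both tables are assumptions for illustration only; the sketch also checks the rule that no two composition timestamps may be equal.

```python
from typing import List, Sequence, Tuple


def composition_times(
    stts_entries: Sequence[Tuple[int, int]],
    ctts_entries: Sequence[Tuple[int, int]],
) -> List[int]:
    """Hypothetical helper: compute CT(n) = DT(n) + CTTS(n) for every sample."""
    # Expand the decoding time runs (sample_count, sample_delta).
    dts: List[int] = []
    dt = 0
    for count, delta in stts_entries:
        for _ in range(count):
            dts.append(dt)
            dt += delta
    # Expand the composition offset runs (sample_count, sample_offset).
    offsets = [offset for count, offset in ctts_entries for _ in range(count)]
    if len(offsets) != len(dts):
        raise ValueError("stts and ctts describe different sample counts")
    cts = [d + o for d, o in zip(dts, offsets)]
    if len(set(cts)) != len(cts):
        raise ValueError("two samples share a composition timestamp")
    return cts


# The 'stts' and 'ctts' tables listed for the Table 2 example.
table2_stts = [(14, 10)]
table2_ctts = [(1, 10), (1, 30), (2, 0), (1, 30), (2, 0),
               (1, 10), (1, 30), (2, 0), (1, 30), (2, 0)]
table2_ct = composition_times(table2_stts, table2_ctts)
```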
8.6.1.4 Composition to Decode Box
8.6.1.4.1 Definition
Box Type: ‘cslg’
Container: Sample Table Box (‘stbl’) or Track Extension Properties Box (‘trep’)
Mandatory: No
Quantity: Zero or one
When signed composition offsets are used, this box may be used to relate the composition and decoding timelines, and deal with some of the ambiguities that signed composition offsets introduce.
Note that all these fields apply to the entire media (not just that selected by any edits). It is recommended that any edits, explicit or implied, not select any portion of the composition timeline that does not map to a sample. For example, if the smallest composition time is 1000, then the default edit from 0 to the media duration leaves the period from 0 to 1000 associated with no media sample. Player behaviour, and what is composed in this interval, is undefined under these circumstances. It is recommended that the smallest computed CTS be zero, or match the beginning of the first edit.
The composition duration of the last sample in a track might be (often is) ambiguous or unclear; the field for composition end time can be used to clarify this ambiguity and, with the composition start time, establish a clear composition duration for the track.
When the Composition to Decode Box is included in the Sample Table Box, it documents the composition and decoding time relations of the samples in the Movie Box only, not including any subsequent movie fragments. When the Composition to Decode Box is included in the Track Extension Properties Box, it documents the composition and decoding time relations of the samples in all movie fragments following the Movie Box.
Version 1 of this box supports 64-bit timestamps and should only be used if needed (at least one value does not fit into 32 bits).
8.6.1.4.2 Syntax
class CompositionToDecodeBox extends FullBox(‘cslg’, version, 0) {
   if (version==0) {
      signed int(32) compositionToDTSShift;
      signed int(32) leastDecodeToDisplayDelta;
      signed int(32) greatestDecodeToDisplayDelta;
      signed int(32) compositionStartTime;
      signed int(32) compositionEndTime;
   }
   else {
      signed int(64) compositionToDTSShift;
      signed int(64) leastDecodeToDisplayDelta;
      signed int(64) greatestDecodeToDisplayDelta;
      signed int(64) compositionStartTime;
      signed int(64) compositionEndTime;
   }
}
8.6.1.4.3 Semantics
compositionToDTSShift: if this value is added to the composition times (as calculated by the CTS offsets from the DTS), then for all samples, their CTS is guaranteed to be greater than or equal to their DTS, and the buffer model implied by the indicated profile/level will be honoured; if leastDecodeToDisplayDelta is positive or zero, this field can be 0; otherwise it should be at least (- leastDecodeToDisplayDelta)
leastDecodeToDisplayDelta: the smallest composition offset in the CompositionTimeToSample box in this track
greatestDecodeToDisplayDelta: the largest composition offset in the CompositionTimeToSample box in this track
compositionStartTime: the smallest computed composition time (CTS) for any sample in the media of this track
compositionEndTime: the composition time plus the composition duration, of the sample with the largest computed composition time (CTS) in the media of this track; if this field takes the value 0, the composition end time is unknown.
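As a rough illustration of how these five fields relate to the per-sample timing, the sketch below derives them from already expanded decoding times and signed composition offsets. The Python function cslg_fields, its parameters, and the choice of the smallest permitted shift are assumptions for the example; it ignores the buffer model and does not exclude non-output samples, both of which a real writer would need to consider.

```python
from typing import Dict, Sequence


def cslg_fields(
    decode_times: Sequence[int],
    offsets: Sequence[int],
    last_composition_duration: int,
) -> Dict[str, int]:
    """Hypothetical helper deriving 'cslg' field values for one track.

    last_composition_duration is the (assumed known) composition duration of
    the sample with the largest composition time.
    """
    cts = [d + o for d, o in zip(decode_times, offsets)]
    least = min(offsets)
    return {
        # Smallest shift that makes every CTS >= DTS; the buffer model is ignored.
        "compositionToDTSShift": 0 if least >= 0 else -least,
        "leastDecodeToDisplayDelta": least,
        "greatestDecodeToDisplayDelta": max(offsets),
        "compositionStartTime": min(cts),
        "compositionEndTime": max(cts) + last_composition_duration,
    }


# Four samples with decode delta 10 and signed offsets; CTS = 0, 30, 10, 20.
fields = cslg_fields([0, 10, 20, 30], [0, 20, -10, -10], 10)
```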
8.6.2 Sync Sample Box
8.6.2.1 Definition
Box Type: ‘stss’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one
This box provides a compact marking of the sync samples within the stream. The table is arranged in strictly increasing order of sample number.
If the sync sample box is not present, every sample is a sync sample.
8.6.2.2 Syntax
aligned(8) class SyncSampleBox extends FullBox(‘stss’, version = 0, 0) {
   unsigned int(32) entry_count;
   int i;
   for (i=0; i < entry_count; i++) {
      unsigned int(32) sample_number;
   }
}
8.6.2.3 Semantics
version is an integer that specifies the version of this box.
entry_count is an integer that gives the number of entries in the following table. If entry_count is zero, there are no sync samples within the stream and the following table is empty.
sample_number gives the numbers of the samples that are sync samples in the stream.
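Because the table is in strictly increasing order, a reader seeking to a sample can find the sync sample at or before it with a binary search. The following Python sketch shows one way to do this; the function name previous_sync_sample and the use of None to represent an absent box are assumptions for the example.

```python
import bisect
from typing import Optional, Sequence


def previous_sync_sample(
    sync_sample_numbers: Optional[Sequence[int]], sample_number: int
) -> Optional[int]:
    """Hypothetical helper: the sync sample at or before sample_number (1-based).

    sync_sample_numbers is the expanded 'stss' table, or None when the box is
    absent, in which case every sample is a sync sample.
    """
    if sync_sample_numbers is None:
        return sample_number
    # Index of the first table entry strictly greater than sample_number.
    i = bisect.bisect_right(sync_sample_numbers, sample_number)
    if i == 0:
        return None  # no sync sample at or before this sample
    return sync_sample_numbers[i - 1]
```

For the closed GOP example, where samples 1 and 8 are the I frames, a seek to sample 5 would start decoding at sample 1.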
8.6.3 Shadow Sync Sample Box
8.6.3.1 Definition
Box Type: ‘stsh’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one
The shadow sync table provides an optional set of sync samples that can be used when seeking or for similar purposes. In normal forward play they are ignored.
Each entry in the ShadowSyncTable consists of a pair of sample numbers. The first entry (shadowed-sample-number) indicates the number of the sample that a shadow sync will be defined for. This should always be a non-sync sample (e.g. a frame difference). The second sample number (sync-sample-number) indicates the sample number of the sync sample (i.e. key frame) that can be used when there is a need for a sync sample at, or before, the shadowed-sample-number.
The entries in the ShadowSyncBox shall be sorted based on the shadowed-sample-number field.
The shadow sync samples are normally placed in an area of the track that is not presented during normal play (edited out by means of an edit list), though this is not a requirement. The shadow sync table can be ignored and the track will play (and seek) correctly if it is ignored (though perhaps not optimally).
The ShadowSyncSample replaces, not augments, the sample that it shadows (i.e. the next sample sent is shadowed-sample-number+1). The shadow sync sample is treated as if it occurred at the time of the sample it shadows, having the duration of the sample it shadows.
Hinting and transmission might become more complex if a shadow sample is used also as part of normal playback, or is used more than once as a shadow. In this case the hint track might need separate shadow syncs, all of which can get their media data from the one shadow sync in the media track, to allow for the different time-stamps etc. needed in their headers.
8.6.3.2 Syntax
aligned(8) class ShadowSyncSampleBox extends FullBox(‘stsh’, version = 0, 0) {
   unsigned int(32) entry_count;
   int i;
   for (i=0; i < entry_count; i++) {
      unsigned int(32) shadowed_sample_number;
      unsigned int(32) sync_sample_number;
   }
}
8.6.3.3 Semantics
version is an integer that specifies the version of this box.
entry_count is an integer that gives the number of entries in the following table.
shadowed_sample_number gives the number of a sample for which there is an alternative sync sample.
sync_sample_number gives the number of the alternative sync sample.
8.6.4 Independent and Disposable Samples Box
8.6.4.1 Definition
Box Type: ‘sdtp’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one
This optional table answers three questions about sample dependency:
1) does this sample depend on others (e.g. is it an I-picture)?
2) do no other samples depend on this one?
3) does this sample contain multiple (redundant) encodings of the data at this time-instant (possibly with different dependencies)?
In the absence of this table:
1) the sync sample table (partly) answers the first question; in most video codecs, I-pictures are also sync points,
2) the dependency of other samples on this one is unknown.
3) the existence of redundant coding is unknown.
When performing ‘trick’ modes, such as fast-forward, it is possible to use the first piece of information to locate independently decodable samples. Similarly, when performing random access, it may be necessary to locate the previous sync sample or random access recovery point, and roll-forward from the sync sample or the pre-roll starting point of the random access recovery point to the desired point. While rolling forward, samples on which no others depend need not be retrieved or decoded.
The value of ‘sample_is_depended_on’ is independent of the existence of redundant codings. However, a redundant coding may have different dependencies from the primary coding; if redundant codings are available, the value of ‘sample_depends_on’ documents only the primary coding.
A leading sample (usually a picture in video) is defined relative to a reference sample, which is the immediately prior sample that is marked as “sample_depends_on” having no dependency (an I picture). A leading sample has both a composition time before the reference sample, and possibly also a decoding dependency on a sample before the reference sample. Therefore if, for example, playback and decoding were to start at the reference sample, those samples marked as leading would not be needed and might not be decodable. A leading sample itself must therefore not be marked as having no dependency.
For tracks with a handler_type that is not ‘vide’, ‘soun’, ‘hint’ or ‘auxv’, if another sample with sample_depends_on=2 or another sample tagged as a “Sync Sample” has already been processed and unless specified otherwise, a sample tagged with sample_depends_on=2 and sample_has_redundancy=1 can be discarded, and its duration added to the duration of the preceding one, to maintain the timing of subsequent samples.
The size of the table, sample_count, is taken from the sample_count in the Sample Size Box ('stsz') or Compact Sample Size Box (‘stz2’).
8.6.4.2 Syntax
aligned(8) class SampleDependencyTypeBox extends FullBox(‘sdtp’, version = 0, 0) {
   for (i=0; i < sample_count; i++) {
      unsigned int(2) is_leading;
      unsigned int(2) sample_depends_on;
      unsigned int(2) sample_is_depended_on;
      unsigned int(2) sample_has_redundancy;
   }
}
8.6.4.3 Semantics
is_leading takes one of the following four values:
0: the leading nature of this sample is unknown;
1: this sample is a leading sample that has a dependency before the referenced I-picture (and is therefore not decodable);
2: this sample is not a leading sample;
3: this sample is a leading sample that has no dependency before the referenced I-picture (and is therefore decodable);
sample_depends_on takes one of the following four values:
0: the dependency of this sample is unknown;
1: this sample does depend on others (not an I picture);
2: this sample does not depend on others (I picture);
3: reserved
sample_is_depended_on takes one of the following four values:
0: the dependency of other samples on this sample is unknown;
1: other samples may depend on this one (not disposable);
2: no other sample depends on this one (disposable);
3: reserved
sample_has_redundancy takes one of the following four values:
0: it is unknown whether there is redundant coding in this sample;
1: there is redundant coding in this sample;
2: there is no redundant coding in this sample;
3: reserved
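Each sample therefore occupies one byte in the table, holding four 2-bit fields with the first declared field in the most significant bits. A minimal Python sketch of unpacking one such byte follows; the function name unpack_sdtp_byte and the dictionary result are assumptions for illustration.

```python
from typing import Dict


def unpack_sdtp_byte(value: int) -> Dict[str, int]:
    """Hypothetical helper: split one 'sdtp' table byte into its 2-bit fields."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte")
    return {
        "is_leading": (value >> 6) & 0x3,             # bits 7..6
        "sample_depends_on": (value >> 4) & 0x3,      # bits 5..4
        "sample_is_depended_on": (value >> 2) & 0x3,  # bits 3..2
        "sample_has_redundancy": value & 0x3,         # bits 1..0
    }


# 0x24 = 0b00_10_01_00: an I picture that other samples may depend on.
flags = unpack_sdtp_byte(0x24)
```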
8.6.5 Edit Box
8.6.5.1 Definition
Box Type: ‘edts’
Container: Track Box (‘trak’)
Mandatory: No
Quantity: Zero or one
An Edit Box maps the presentation time-line to the media time-line as it is stored in the file. The Edit Box is a container for the edit lists.
The Edit Box is optional. In the absence of this box, there is an implicit one-to-one mapping of these time-lines, and the presentation of a track starts at the beginning of the presentation. An empty edit is used to offset the start time of a track.
8.6.5.2 Syntax
aligned(8) class EditBox extends Box(‘edts’) { }
8.6.6 Edit List Box
8.6.6.1 Definition
Box Type: ‘elst’
Container: Edit Box (‘edts’)
Mandatory: No
Quantity: Zero or one
This box contains an explicit timeline map. Each entry defines part of the track time-line: by mapping part of the media time-line, or by indicating ‘empty’ time, or by defining a ‘dwell’, where a single time-point in the media is held for a period.
NOTE Edits are not restricted to fall on sample times. This means that when entering an edit, it can be necessary to (a) back up to a sync point, and pre-roll from there and then (b) be careful about the duration of the first sample; it might have been truncated if the edit enters it during its normal duration. If this is audio, that frame might need to be decoded, and then the final slicing done. Likewise, the duration of the last sample in an edit might need slicing.
Starting offsets for tracks (streams) are represented by an initial empty edit. For example, to play a track from its start for 30 seconds, but at 10 seconds into the presentation, we have the following edit list:
Entry-count = 2
Segment-duration = 10 seconds
Media-Time = -1
Media-Rate = 1
Segment-duration = 30 seconds (could be the length of the whole track)
Media-Time = 0 seconds
Media-Rate = 1
A non-empty edit may insert a portion of the media timeline that is not present in the initial movie, and is present only in subsequent movie fragments. Particularly in an empty initial movie of a fragmented movie file (when there are no media samples yet present), the segment_duration of this edit may be zero, whereupon the edit provides the offset from media composition time to movie presentation time, for the movie and subsequent movie fragments. It is recommended that such an edit be used to establish a presentation time of 0 for the first presented sample, when composition offsets are used.
For example, if the composition time of the first composed frame is 20, then the edit that maps the media time from 20 onwards to movie time 0 onwards, would read:
Entry-count = 1
Segment-duration = 0
Media-Time = 20
Media-Rate = 1
8.6.6.2 Syntax
aligned(8) class EditListBox extends FullBox(‘elst’, version, 0) {
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++) {
      if (version==1) {
         unsigned int(64) segment_duration;
         int(64) media_time;
      } else { // version==0
         unsigned int(32) segment_duration;
         int(32) media_time;
      }
      int(16) media_rate_integer;
      int(16) media_rate_fraction = 0;
   }
}
8.6.6.3 Semantics
version is an integer that specifies the version of this box (0 or 1)
entry_count is an integer that gives the number of entries in the following table
segment_duration is an integer that specifies the duration of this edit segment in units of the timescale in the Movie Header Box
media_time is an integer containing the starting time within the media of this edit segment (in media time scale units, in composition time). If this field is set to -1, it is an empty edit. The last edit in a track shall never be an empty edit. Any difference between the duration in the Movie Header Box, and the track’s duration is expressed as an implicit empty edit at the end.
media_rate specifies the relative rate at which to play the media corresponding to this edit segment. If this value is 0, then the edit is specifying a ‘dwell’: the media at media-time is presented for the segment-duration. Otherwise this field shall contain the value 1.
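The three kinds of edit (empty, dwell, and normal) can be told apart from media_time and media_rate alone. The Python sketch below lays edit entries out on the presentation timeline; the function describe_edits, the tuple layout, and the movie timescale of 1000 used in the example are assumptions, and the sketch does not convert between the movie and media timescales.

```python
from typing import List, Optional, Sequence, Tuple

# (segment_duration, media_time, media_rate_integer) as read from the box.
EditEntry = Tuple[int, int, int]


def describe_edits(
    entries: Sequence[EditEntry],
) -> List[Tuple[int, int, str, Optional[int]]]:
    """Hypothetical helper: (start, duration, kind, media_time) per edit.

    start and duration are in movie timescale units; media_time stays in
    media timescale units and is None for an empty edit.
    """
    result = []
    start = 0
    for segment_duration, media_time, media_rate in entries:
        if media_time == -1:
            result.append((start, segment_duration, "empty", None))
        elif media_rate == 0:
            result.append((start, segment_duration, "dwell", media_time))
        else:
            result.append((start, segment_duration, "normal", media_time))
        start += segment_duration
    return result


# The two-entry example above, assuming a movie timescale of 1000 units/second.
edits = describe_edits([(10000, -1, 1), (30000, 0, 1)])
```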
8.7 Track Data Layout Structures
8.7.1 Data Information Box
8.7.1.1 Definition
Box Type: ‘dinf’
Container: Media Information Box (‘minf’) or Meta Box (‘meta’)
Mandatory: Yes (required within ‘minf’ box) and No (optional within ‘meta’ box)
Quantity: Exactly one
The data information box contains objects that declare the location of the media information in a track.
8.7.1.2 Syntax
aligned(8) class DataInformationBox extends Box(‘dinf’) { }
8.7.2 Data Reference Box
8.7.2.1 Definition
Box Type: ‘dref’
Container: Data Information Box (‘dinf’)
Mandatory: Yes
Quantity: Exactly one
Box Types: ‘url ‘, ‘urn ‘
Container: Data Reference Box (‘dref’)
Mandatory: Yes (at least one of ‘url ‘ or ‘urn ‘ shall be present)
Quantity: One or more
The data reference object contains a table of data references (normally URLs) that declare the location(s) of the media data used within the presentation. The data reference index in the sample description ties entries in this table to the samples in the track. A track may be split over several sources in this way.
If the flag is set indicating that the data is in the same file as this box, then no string (not even an empty one) shall be supplied in the entry field.
The entry_count in the DataReferenceBox shall be 1 or greater; each DataEntryBox within the DataReferenceBox shall be either a DataEntryUrnBox or a DataEntryUrlBox.
NOTE Though the count is 32 bits, the number of items is usually much fewer, and is restricted by the fact that the reference index in the sample table is only 16 bits.
When a file that has data entries with the flag set indicating that the media data is in the same file, is split into segments for transport, the value of this flag does not change, as the file is (logically) reassembled after the transport operation.
8.7.2.2 Syntax
aligned(8) class DataEntryUrlBox (bit(24) flags) extends FullBox(‘url ’, version = 0, flags) {
   string location;
}
aligned(8) class DataEntryUrnBox (bit(24) flags) extends FullBox(‘urn ’, version = 0, flags) {
   string name;
   string location;
}
aligned(8) class DataReferenceBox extends FullBox(‘dref’, version = 0, 0) {
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++) {
      DataEntryBox(entry_version, entry_flags) data_entry;
   }
}
8.7.2.3 Semantics
version is an integer that specifies the version of this box
entry_count is an integer that counts the actual entries
entry_version is an integer that specifies the version of the entry format
entry_flags is a 24-bit integer with flags; one flag is defined (0x000001) which means that the media data is in the same file as the Movie Box containing this data reference.
data_entry is a URL or URN entry. Name is a URN, and is required in a URN entry. Location is a URL, and is required in a URL entry and optional in a URN entry, where it gives a location to find the resource with the given name. Each is a null-terminated string using UTF-8 characters. If the self-contained flag is set, the URL form is used and no string is present; the box terminates with the entry-flags field. The URL type should be of a service that delivers a file (e.g. URLs of type file, http, ftp etc.), and which services ideally also permit random access. Relative URLs are permissible and are relative to the file containing the Movie Box that contains this data reference.
8.7.3 Sample Size Boxes
8.7.3.1 Definition
Box Type: ‘stsz’, ‘stz2’
Container: Sample Table Box (‘stbl’)
Mandatory: Yes
Quantity: Exactly one variant must be present
This box contains the sample count and a table giving the size in bytes of each sample. This allows the media data itself to be unframed. The total number of samples in the media is always indicated in the sample count.
There are two variants of the sample size box. The first variant has a fixed size 32-bit field for representing the sample sizes; it permits defining a constant size for all samples in a track. The second variant permits smaller size fields, to save space when the sizes are varying but small. One of these boxes must be present; the first version is preferred for maximum compatibility.
NOTE A sample size of zero is not prohibited in general, but it must be valid and defined for the coding system, as defined by the sample entry, that the sample belongs to.
8.7.3.2 Sample Size Box
8.7.3.2.1 Syntax
aligned(8) class SampleSizeBox extends FullBox(‘stsz’, version = 0, 0) {
   unsigned int(32) sample_size;
   unsigned int(32) sample_count;
   if (sample_size==0) {
      for (i=1; i <= sample_count; i++) {
         unsigned int(32) entry_size;
      }
   }
}
8.7.3.2.2 Semantics
version is an integer that specifies the version of this box
sample_size is an integer specifying the default sample size. If all the samples are the same size, this field contains that size value. If this field is set to 0, then the samples have different sizes, and those sizes are stored in the sample size table. If this field is not 0, it specifies the constant sample size, and no array follows.
sample_count is an integer that gives the number of samples in the track; if sample-size is 0, then it is also the number of entries in the following table.
entry_size is an integer specifying the size of a sample, indexed by its number.
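A reader of the ‘stsz’ box must branch on sample_size to decide whether a table follows. The following Python sketch parses the box content with the standard struct module; the function name parse_stsz and the convention that the payload starts at the FullBox version/flags word (after the size and type fields) are assumptions for the example.

```python
import struct
from typing import List


def parse_stsz(payload: bytes) -> List[int]:
    """Hypothetical helper: return the size of every sample described by 'stsz'.

    payload is the box content after the 8-byte size/type header, i.e. it
    starts with the 32-bit version/flags word of the FullBox.
    """
    # Big-endian: version/flags, sample_size, sample_count.
    _version_flags, sample_size, sample_count = struct.unpack_from(">III", payload, 0)
    if sample_size != 0:
        # Constant sample size: no array follows.
        return [sample_size] * sample_count
    # Otherwise sample_count 32-bit entry_size values follow.
    return list(struct.unpack_from(">%dI" % sample_count, payload, 12))


# Three samples of varying size, and four samples of constant size 512.
varying = parse_stsz(struct.pack(">IIIIII", 0, 0, 3, 100, 200, 300))
constant = parse_stsz(struct.pack(">III", 0, 512, 4))
```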
8.7.3.3 Compact Sample Size Box
8.7.3.3.1 Syntax
aligned(8) class CompactSampleSizeBox extends FullBox(‘stz2’, version = 0, 0) {
   unsigned int(24) reserved = 0;
   unsigned int(8) field_size;
   unsigned int(32) sample_count;
   for (i=1; i <= sample_count; i++) {
      unsigned int(field_size) entry_size;
   }
}
8.7.3.3.2 Semantics
version is an integer that specifies the version of this box
field_size is an integer specifying the size in bits of the entries in the following table; it shall take the value 4, 8 or 16. If the value 4 is used, then each byte contains two values: (entry[i] << 4) + entry[i+1]; if the sizes do not fill an integral number of bytes, the last byte is padded with zeros.
sample_count is an integer that gives the number of entries in the following table
entry_size is an integer specifying the size of a sample, indexed by its number.
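The 4-bit packing is the only case where entries do not fall on byte boundaries. A small Python sketch of unpacking the table for each permitted field_size follows; the function name unpack_compact_sizes and its arguments are assumptions for illustration, with data being the bytes that follow sample_count.

```python
from typing import List


def unpack_compact_sizes(data: bytes, field_size: int, sample_count: int) -> List[int]:
    """Hypothetical helper: decode the 'stz2' entry_size table."""
    if field_size == 4:
        sizes: List[int] = []
        for byte in data:
            sizes.append(byte >> 4)    # first entry in the high nibble
            sizes.append(byte & 0x0F)  # second entry in the low nibble
        # Drop the zero padding nibble when sample_count is odd.
        return sizes[:sample_count]
    if field_size == 8:
        return list(data[:sample_count])
    if field_size == 16:
        return [int.from_bytes(data[2 * i:2 * i + 2], "big") for i in range(sample_count)]
    raise ValueError("field_size shall be 4, 8 or 16")


# Three 4-bit sizes (1, 2, 3) packed into two bytes, the last nibble padded with zero.
nibbles = unpack_compact_sizes(bytes([0x12, 0x30]), 4, 3)
```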
8.7.4 Sample To Chunk Box
8.7.4.1 Definition
Box Type: ‘stsc’
Container: Sample Table Box (‘stbl’)
Mandatory: Yes
Quantity: Exactly one
Samples within the media data are grouped into chunks. Chunks can be of different sizes, and the samples within a chunk can have different sizes. This table can be used to find the chunk that contains a sample, its position, and the associated sample description.
The table is compactly coded. Each entry gives the index of the first chunk of a run of chunks with the same characteristics. By subtracting one entry here from the previous one, you can compute how many chunks are in this run. You can convert this to a sample count by multiplying by the appropriate samples-per-chunk.
8.7.4.2 Syntax
aligned(8) class SampleToChunkBox extends FullBox(‘stsc’, version = 0, 0) {
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++) {
      unsigned int(32) first_chunk;
      unsigned int(32) samples_per_chunk;
      unsigned int(32) sample_description_index;
   }
}
8.7.4.3 Semantics
version is an integer that specifies the version of this box
entry_count is an integer that gives the number of entries in the following table
first_chunk is an integer that gives the index of the first chunk in this run of chunks that share the same samples‐per‐chunk and sample‐description‐index; the index of the first chunk in a track has the value 1 (the first_chunk field in the first record of this box has the value 1, identifying that the first sample maps to the first chunk).
samples_per_chunk is an integer that gives the number of samples in each of these chunks
sample_description_index is an integer that gives the index of the sample entry that describes the samples in this chunk. The index ranges from 1 to the number of sample entries in the Sample Description Box
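The run-length coding of this table can be walked as in the following sketch. It is an illustration only: locate_sample is a hypothetical helper, entries are assumed to be already parsed into (first_chunk, samples_per_chunk, sample_description_index) tuples, and chunk_count is assumed to come from the Chunk Offset Box, since the last run has no following entry to bound it.

```python
def locate_sample(entries, chunk_count, sample_number):
    """Map a 1-based sample number to (chunk index, index within chunk,
    sample_description_index) using 'stsc' entries."""
    if sample_number < 1:
        raise ValueError("sample numbers start at 1")
    first_sample = 1  # number of the first sample in the current run
    for i, (first_chunk, per_chunk, description_index) in enumerate(entries):
        # The run ends where the next entry begins, or at the last chunk.
        if i + 1 < len(entries):
            next_first_chunk = entries[i + 1][0]
        else:
            next_first_chunk = chunk_count + 1
        run_samples = (next_first_chunk - first_chunk) * per_chunk
        if sample_number < first_sample + run_samples:
            offset = sample_number - first_sample
            chunk = first_chunk + offset // per_chunk
            return chunk, offset % per_chunk, description_index
        first_sample += run_samples
    raise ValueError("sample_number is beyond the samples described")
```

The returned chunk index is 1-based, matching first_chunk; the index within the chunk is 0-based so that it can be used directly to sum the sizes of the preceding samples in that chunk.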
8.7.5 Chunk Offset Box
8.7.5.1 Definition
Box Type: ‘stco’, ‘co64’
Container: Sample Table Box (‘stbl’)
Mandatory: Yes
Quantity: Exactly one variant must be present
The chunk offset table gives the index of each chunk into the containing file. There are two variants, permitting the use of 32‐bit or 64‐bit offsets. The latter is useful when managing very large presentations. At most one of these variants will occur in any single instance of a sample table.
Offsets are file offsets, not the offset into any box within the file (e.g. Media Data Box). This permits referring to media data in files without any box structure. It does also mean that care must be taken when constructing a self‐contained ISO file with its metadata (Movie Box) at the front, as the size of the Movie Box will affect the chunk offsets to the media data.
8.7.5.2 Syntax
aligned(8) class ChunkOffsetBox extends FullBox(‘stco’, version = 0, 0) {
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++) {
      unsigned int(32) chunk_offset;
   }
}
aligned(8) class ChunkLargeOffsetBox extends FullBox(‘co64’, version = 0, 0) {
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++) {
      unsigned int(64) chunk_offset;
   }
}
8.7.5.3 Semantics
version is an integer that specifies the version of this box
entry_count is an integer that gives the number of entries in the following table
chunk_offset is a 32 or 64 bit integer that gives the offset of the start of a chunk into its containing media file.
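As a rough sketch of how the two variants differ in practice, the code below reads either table and rebases the offsets, which is the adjustment needed when a Movie Box placed at the front of the file changes size. The function names are hypothetical, and `payload` is assumed to be the box contents after the 8-byte box header (that is, starting at the FullBox version and flags).

```python
import struct


def parse_chunk_offsets(box_type, payload):
    """Return the chunk_offset values of an 'stco' or 'co64' payload."""
    _version_and_flags, entry_count = struct.unpack_from(">II", payload, 0)
    code = {"stco": "I", "co64": "Q"}[box_type]  # 32-bit or 64-bit entries
    return list(struct.unpack_from(f">{entry_count}{code}", payload, 8))


def shift_chunk_offsets(offsets, delta):
    """Rebase file offsets, e.g. after the Movie Box grows by `delta` bytes.

    If a shifted value no longer fits in 32 bits, the table would have to be
    written as 'co64' instead of 'stco'; that decision is left to the caller.
    """
    return [offset + delta for offset in offsets]
```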
8.7.6 Padding Bits Box
8.7.6.1 Definition
Box Type: ‘padb’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one
In some streams the media samples do not occupy all bits of the bytes given by the sample size, and are padded at the end to a byte boundary. In some cases, it is necessary to record externally the number of padding bits used. This table supplies that information.
8.7.6.2 Syntax
aligned(8) class PaddingBitsBox extends FullBox(‘padb’, version = 0, 0) {
   unsigned int(32) sample_count;
   int i;
   for (i=0; i < ((sample_count + 1)/2); i++) {
      bit(1) reserved = 0;
      bit(3) pad1;
      bit(1) reserved = 0;
      bit(3) pad2;
   }
}
8.7.6.3 Semantics
sample_count – counts the number of samples in the track; it should match the count in other tables
pad1 – a value from 0 to 7, indicating the number of bits at the end of sample (i*2)+1.
pad2 – a value from 0 to 7, indicating the number of bits at the end of sample (i*2)+2
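The two-values-per-byte layout can be unpacked as sketched below. The helper name unpack_padding_bits is hypothetical, and `data` is assumed to contain only the bytes that follow sample_count.

```python
def unpack_padding_bits(sample_count, data):
    """Return one padding-bit count (0..7) per sample from a 'padb' table."""
    pads = []
    for byte in data[:(sample_count + 1) // 2]:
        pads.append((byte >> 4) & 0x07)  # pad1: bits 6..4, for sample (i*2)+1
        pads.append(byte & 0x07)         # pad2: bits 2..0, for sample (i*2)+2
    return pads[:sample_count]           # an odd count leaves pad2 unused
```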
8.7.7 Sub-Sample Information Box
8.7.7.1 Definition
Box Type: ‘subs’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or more
This box, named the Sub‐Sample Information box, is designed to contain sub‐sample information.
A sub‐sample is a contiguous range of bytes of a sample. The specific definition of a sub‐sample shall be supplied for a given coding system (e.g. for ISO/IEC 14496‐10, Advanced Video Coding). In the absence of such a specific definition, this box shall not be applied to samples using that coding system.
If subsample_count is 0 for any entry, then those samples have no subsample information and no array follows. The table is sparsely coded; the table identifies which samples have sub‐sample structure by recording the difference in sample‐number between each entry. The first entry in the table records the sample number of the first sample having sub‐sample information.
NOTE It is possible to combine subsample_priority and discardable such that when subsample_priority is smaller than a certain value, discardable is set to 1. However, since different systems may use different scales of priority values, it is safer to keep the two fields separate, giving a clean solution for discardable sub‐samples.
When more than one Sub‐Sample Information box is present in the same container box, the value of flags shall differ in each of these Sub‐Sample Information boxes. The semantics of flags, if any, shall be supplied for a given coding system. If flags have no semantics for a given coding system, the flags shall be 0.
8.7.7.2 Syntax
aligned(8) class SubSampleInformationBox extends FullBox(‘subs’, version, flags) {
   unsigned int(32) entry_count;
   int i,j;
   for (i=0; i < entry_count; i++) {
      unsigned int(32) sample_delta;
      unsigned int(16) subsample_count;
      if (subsample_count > 0) {
         for (j=0; j < subsample_count; j++) {
            if (version == 1) {
               unsigned int(32) subsample_size;
            } else {
               unsigned int(16) subsample_size;
            }
            unsigned int(8) subsample_priority;
            unsigned int(8) discardable;
            unsigned int(32) codec_specific_parameters;
         }
      }
   }
}
8.7.7.3 Semantics
version is an integer that specifies the version of this box (0 or 1 in this specification)
entry_count is an integer that gives the number of entries in the following table.
sample_delta is an integer that specifies the sample number of the sample having sub‐sample structure. It is coded as the difference between the desired sample number, and the sample number indicated in the previous entry. If the current entry is the first entry, the value indicates the sample number of the first sample having sub‐sample information, that is, the value is the difference between the sample number and zero (0).
subsample_count is an integer that specifies the number of sub‐samples for the current sample. If there is no sub‐sample structure, then this field takes the value 0.
subsample_size is an integer that specifies the size, in bytes, of the current sub‐sample.
subsample_priority is an integer specifying the degradation priority for each sub‐sample. Higher values of subsample_priority indicate sub‐samples which are important to, and have a greater impact on, the decoded quality.
discardable equal to 0 means that the sub‐sample is required to decode the current sample, while equal to 1 means the sub‐sample is not required to decode the current sample but may be used for enhancements, e.g., the sub‐sample consists of supplemental enhancement information (SEI) messages.
codec_specific_parameters is defined by the codec in use. If no such definition is available, this field shall be set to 0.
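The delta coding of sample numbers and the version-dependent width of subsample_size can be seen in the following sketch. It is illustrative only: parse_subs_payload is a hypothetical name, `payload` is assumed to be the box contents after the 8-byte box header, and each sub-sample is returned as a plain tuple (subsample_size, subsample_priority, discardable, codec_specific_parameters).

```python
import struct


def parse_subs_payload(payload):
    """Return [(sample_number, [sub-sample tuples])] from a 'subs' payload."""
    version = payload[0]
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    pos = 8
    sample_number = 0  # deltas accumulate from zero
    table = []
    for _entry in range(entry_count):
        sample_delta, subsample_count = struct.unpack_from(">IH", payload, pos)
        pos += 6
        sample_number += sample_delta
        subsamples = []
        for _sub in range(subsample_count):
            if version == 1:
                (size,) = struct.unpack_from(">I", payload, pos)
                pos += 4
            else:
                (size,) = struct.unpack_from(">H", payload, pos)
                pos += 2
            priority, discardable, params = struct.unpack_from(">BBI", payload, pos)
            pos += 6
            subsamples.append((size, priority, discardable, params))
        table.append((sample_number, subsamples))
    return table
```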
8.7.8 Sample Auxiliary Information Sizes Box
8.7.8.1 Definition
Box Type: ‘saiz’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or More
Per‐sample sample auxiliary information may be stored anywhere in the same file as the sample data itself; for self‐contained media files, this is typically in a Media Data box or a box from a derived specification. It is stored either (a) in multiple chunks, with the number of samples per chunk, as well as the number of chunks, matching the chunking of the primary sample data or (b) in a single chunk for all the samples in a movie sample table (or a movie fragment). The Sample Auxiliary Information for all samples contained within a single chunk (or track run) is stored contiguously (similarly to sample data).
Sample Auxiliary Information, when present, is always stored in the same file as the samples to which it relates as they share the same data reference (‘dref’) structure. However, this data may be located anywhere within this file, using auxiliary information offsets (‘saio’) to indicate the location of the data.
Whether sample auxiliary information is permitted or required may be specified by the brands or the coding format in use. The format of the sample auxiliary information is determined by aux_info_type. If aux_info_type and aux_info_type_parameter are omitted then the implied value of aux_info_type is either (a) in the case of transformed content, such as protected content, the scheme_type included in the Protection Scheme Information box or otherwise (b) the sample entry type. The default value of the aux_info_type_parameter is 0. Some values of aux_info_type may be restricted to be used only with particular track types. A track may have multiple streams of sample auxiliary information of different types. The types are registered at the registration authority.
While aux_info_type determines the format of the auxiliary information, several streams of auxiliary information having the same format may be used when their value of aux_info_type_parameter differs. The semantics of aux_info_type_parameter for a particular aux_info_type value must be specified along with specifying the semantics of the particular aux_info_type value and the implied auxiliary information format.
This box provides the size of the auxiliary information for each sample. For each instance of this box, there must be a matching SampleAuxiliaryInformationOffsetsBox with the same values of aux_info_type and aux_info_type_parameter, providing the offset information for this auxiliary information.
NOTE For discussions on the use of sample auxiliary information versus other mechanisms, see Annex C.8.
8.7.8.2 Syntax
aligned(8) class SampleAuxiliaryInformationSizesBox extends FullBox(‘saiz’, version = 0, flags) {
   if (flags & 1) {
      unsigned int(32) aux_info_type;
      unsigned int(32) aux_info_type_parameter;
   }
   unsigned int(8) default_sample_info_size;
   unsigned int(32) sample_count;
   if (default_sample_info_size == 0) {
      unsigned int(8) sample_info_size[ sample_count ];
   }
}
8.7.8.3 Semantics
aux_info_type is an integer that identifies the type of the sample auxiliary information. At most one occurrence of this box with the same values for aux_info_type and aux_info_type_parameter shall exist in the containing box.
aux_info_type_parameter identifies the “stream” of auxiliary information having the same value of aux_info_type and associated to the same track. The semantics of aux_info_type_parameter are determined by the value of aux_info_type.
default_sample_info_size is an integer specifying the sample auxiliary information size for the case where all the indicated samples have the same sample auxiliary information size. If the size varies then this field shall be zero.
sample_count is an integer that gives the number of samples for which a size is defined. For a Sample Auxiliary Information Sizes box appearing in the Sample Table Box this must be the same as, or less than, the sample_count within the Sample Size Box or Compact Sample Size Box. For a Sample Auxiliary Information Sizes box appearing in a Track Fragment box this must be the same as, or less than, the sum of the sample_count entries within the Track Fragment Run boxes of the Track Fragment. If this is less than the number of samples, then auxiliary information is supplied for the initial samples, and the remaining samples have no associated auxiliary information.
sample_info_size gives the size of the sample auxiliary information in bytes. This may be zero to indicate samples with no associated auxiliary information.
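A reader might expand this box into one size per sample roughly as follows. This is a sketch under assumptions: parse_saiz_payload is a hypothetical name, `payload` is the box contents after the 8-byte box header, and absent aux_info_type fields are reported as None rather than resolved to their implied values.

```python
import struct


def parse_saiz_payload(payload):
    """Return (aux_info_type, aux_info_type_parameter, sizes) from 'saiz'."""
    flags = int.from_bytes(payload[1:4], "big")
    pos = 4
    aux_info_type = aux_info_type_parameter = None
    if flags & 1:
        aux_info_type, aux_info_type_parameter = struct.unpack_from(">II", payload, pos)
        pos += 8
    default_size, sample_count = struct.unpack_from(">BI", payload, pos)
    pos += 5
    if default_size == 0:
        # One explicit 8-bit size per sample follows.
        sizes = list(payload[pos:pos + sample_count])
    else:
        # Every indicated sample shares the default size.
        sizes = [default_size] * sample_count
    return aux_info_type, aux_info_type_parameter, sizes
```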
8.7.9 Sample Auxiliary Information Offsets Box
8.7.9.1 Definition
Box Type: ‘saio’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or More
For an introduction to sample auxiliary information, see the definition of the Sample Auxiliary Information Sizes Box.
This box provides the position information for the sample auxiliary information, in a way similar to the chunk offsets for sample data.
8.7.9.2 Syntax
aligned(8) class SampleAuxiliaryInformationOffsetsBox extends FullBox(‘saio’, version, flags) {
   if (flags & 1) {
      unsigned int(32) aux_info_type;
      unsigned int(32) aux_info_type_parameter;
   }
   unsigned int(32) entry_count;
   if ( version == 0 ) {
      unsigned int(32) offset[ entry_count ];
   } else {
      unsigned int(64) offset[ entry_count ];
   }
}
8.7.9.3 Semantics
aux_info_type and aux_info_type_parameter are defined as in the SampleAuxiliaryInformationSizesBox
entry_count gives the number of entries in the following table. For a Sample Auxiliary Information Offsets box appearing in a Sample Table Box this must be equal to one or to the value of the entry_count field in the Chunk Offset Box or Chunk Large Offset Box. For a Sample Auxiliary Information Offsets Box appearing in a Track Fragment box, this must be equal to one or to the number of Track Fragment Run boxes in the Track Fragment Box.
offset gives the position in the file of the Sample Auxiliary Information for each Chunk or Track Fragment Run. If entry_count is one, then the Sample Auxiliary Information for all Chunks or Runs is contiguous in the file in chunk or run order. When in the Sample Table Box, the offsets are absolute. In a track fragment box, this value is relative to the base offset established by the track fragment header box (‘tfhd’) in the same track fragment (see 8.8.14).
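The two permitted values of entry_count lead to two ways of locating per-sample auxiliary data, sketched below. The helper aux_info_positions is hypothetical; it assumes the offsets come from this box, the per-sample sizes from the matching Sample Auxiliary Information Sizes Box, and the number of samples in each chunk or track run from the sample tables. For a track fragment, the caller is assumed to add the base offset afterwards.

```python
def aux_info_positions(offsets, sizes, samples_per_group):
    """Return [(position, size)] for each sample that has auxiliary data.

    `samples_per_group` lists the sample count of each chunk (or track run)
    in order; `offsets` has either one entry or one entry per group.
    """
    if len(offsets) not in (1, len(samples_per_group)):
        raise ValueError("entry_count must be one or match the group count")
    positions = []
    sample = 0
    pos = offsets[0]
    for group, count in enumerate(samples_per_group):
        if len(offsets) > 1:
            pos = offsets[group]  # each chunk or run has its own offset
        # With a single offset, data simply continues from the previous group.
        for size in sizes[sample:sample + count]:
            positions.append((pos, size))
            pos += size
        sample += count
    return positions
```

If the sizes list is shorter than the total sample count, only the initial samples receive positions, in line with the sample_count rule of the sizes box.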
8.8 Movie Fragments
8.8.1 Movie Extends Box
8.8.1.1 Definition
Box Type: ‘mvex’
Container: Movie Box (‘moov’)
Mandatory: No
Quantity: Zero or one
This box warns readers that there might be Movie Fragment Boxes in this file. To know of all samples in the tracks, these Movie Fragment Boxes must be found and scanned in order, and their information logically added to that found in the Movie Box.
There is a narrative introduction to Movie Fragments in Annex A.
8.8.1.2 Syntax
aligned(8) class MovieExtendsBox extends Box(‘mvex’){ }
8.8.2 Movie Extends Header Box
8.8.2.1 Definition
Box Type: ‘mehd’
Container: Movie Extends Box (‘mvex’)
Mandatory: No
Quantity: Zero or one
The Movie Extends Header is optional, and provides the overall duration, including fragments, of a fragmented movie. If this box is not present, the overall duration must be computed by examining each fragment.
8.8.2.2 Syntax
aligned(8) class MovieExtendsHeaderBox extends FullBox(‘mehd’, version, 0) {
   if (version==1) {
      unsigned int(64) fragment_duration;
   } else { // version==0
      unsigned int(32) fragment_duration;
   }
}
8.8.2.3 Semantics
fragment_duration is an integer that declares the length of the presentation of the whole movie including fragments (in the timescale indicated in the Movie Header Box). The value of this field corresponds to the duration of the longest track, including movie fragments. If an MP4 file is created in real‐time, such as used in live streaming, it is not likely that the fragment_duration is known in advance and this box may be omitted.
8.8.3 Track Extends Box
8.8.3.1 Definition
Box Type: ‘trex’
Container: Movie Extends Box (‘mvex’)
Mandatory: Yes
Quantity: Exactly one for each track in the Movie Box
This sets up default values used by the movie fragments. By setting defaults in this way, space and complexity can be saved in each Track Fragment Box.
The sample flags field in sample fragments (default_sample_flags here and in a Track Fragment Header Box, and sample_flags and first_sample_flags in a Track Fragment Run Box) is coded as a 32‐bit value. It has the following structure:
bit(4) reserved=0;
unsigned int(2) is_leading;
unsigned int(2) sample_depends_on;
unsigned int(2) sample_is_depended_on;
unsigned int(2) sample_has_redundancy;
bit(3) sample_padding_value;
bit(1) sample_is_non_sync_sample;
unsigned int(16) sample_degradation_priority;
The is_leading, sample_depends_on, sample_is_depended_on and sample_has_redundancy values are defined as documented in the Independent and Disposable Samples Box.
The flag sample_is_non_sync_sample provides the same information as the sync sample table [8.6.2]. When this value is set 0 for a sample, it is the same as if the sample were not in a movie fragment and marked with an entry in the sync sample table (or, if all samples are sync samples, the sync sample table were absent).
The sample_padding_value is defined as for the padding bits table. The sample_degradation_priority is defined as for the degradation priority table.
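The bit positions implied by the structure above can be extracted with shifts and masks, as in this sketch. The function name decode_sample_flags is hypothetical; the field names and widths are those of the structure, counted from the most significant bit.

```python
def decode_sample_flags(flags):
    """Split a 32-bit sample flags value into its named fields."""
    return {
        # bits 31..28 are reserved
        "is_leading": (flags >> 26) & 0x3,
        "sample_depends_on": (flags >> 24) & 0x3,
        "sample_is_depended_on": (flags >> 22) & 0x3,
        "sample_has_redundancy": (flags >> 20) & 0x3,
        "sample_padding_value": (flags >> 17) & 0x7,
        "sample_is_non_sync_sample": (flags >> 16) & 0x1,
        "sample_degradation_priority": flags & 0xFFFF,
    }
```

For example, the value 0x01010000 decodes to sample_depends_on equal to 1 with sample_is_non_sync_sample set, and 0x02000000 decodes to sample_depends_on equal to 2 with the non-sync bit clear.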
8.8.3.2 Syntax
aligned(8) class TrackExtendsBox extends FullBox(‘trex’, 0, 0){
   unsigned int(32) track_ID;
   unsigned int(32) default_sample_description_index;
   unsigned int(32) default_sample_duration;
   unsigned int(32) default_sample_size;
   unsigned int(32) default_sample_flags;
}
8.8.3.3 Semantics
track_ID identifies the track; this shall be the track ID of a track in the Movie Box
default_ these fields set up defaults used in the track fragments.
8.8.4 Movie Fragment Box
8.8.4.1 Definition
Box Type: ‘moof’
Container: File
Mandatory: No
Quantity: Zero or more
The movie fragments extend the presentation in time. They provide the information that would previously have been in the Movie Box. The actual samples are in Media Data Boxes, as usual, if they are in the same file. The data reference index is in the sample description, so it is possible to build incremental presentations where the media data is in files other than the file containing the Movie Box.
The Movie Fragment Box is a top‐level box, (i.e. a peer to the Movie Box and Media Data boxes). It contains a Movie Fragment Header Box, and then one or more Track Fragment Boxes.
NOTE There is no requirement that any particular movie fragment extend all tracks present in the movie header, and there is no restriction on the location of the media data referred to by the movie fragments. However, derived specifications may make such restrictions.
8.8.4.2 Syntax
aligned(8) class MovieFragmentBox extends Box(‘moof’){ }
8.8.5 Movie Fragment Header Box
8.8.5.1 Definition
Box Type: ‘mfhd’
Container: Movie Fragment Box (‘moof’)
Mandatory: Yes
Quantity: Exactly one
The movie fragment header contains a sequence number, as a safety check. The sequence number usually starts at 1 and increases for each movie fragment in the file, in the order in which they occur. This allows readers to verify integrity of the sequence in environments where undesired re‐ordering might occur.
8.8.5.2 Syntax
aligned(8) class MovieFragmentHeaderBox extends FullBox(‘mfhd’, 0, 0){
   unsigned int(32) sequence_number;
}
8.8.5.3 Semantics
sequence_number a number associated with this fragment
8.8.6 Track Fragment Box
8.8.6.1 Definition
Box Type: ‘traf’
Container: Movie Fragment Box (‘moof’)
Mandatory: No
Quantity: Zero or more
Within the movie fragment there is a set of track fragments, zero or more per track. The track fragments in turn contain zero or more track runs, each of which documents a contiguous run of samples for that track. Within these structures, many fields are optional and can be defaulted.
It is possible to add 'empty time' to a track using these structures, as well as adding samples. Empty inserts can be used in audio tracks doing silence suppression, for example.
8.8.6.2 Syntax
aligned(8) class TrackFragmentBox extends Box(‘traf’){ }
8.8.7 Track Fragment Header Box
8.8.7.1 Definition
Box Type: ‘tfhd’
Container: Track Fragment Box (‘traf’)
Mandatory: Yes
Quantity: Exactly one
Each movie fragment can add zero or more fragments to each track; and a track fragment can add zero or more contiguous runs of samples. The track fragment header sets up information and defaults used for those runs of samples.
The base‐data‐offset, if explicitly provided, is a data offset that is identical to a chunk offset in the Chunk Offset Box, i.e. applying to the complete file (e.g. starting with a file‐type box and movie box). In circumstances when the complete file does not exist or its size is unknown, it may be impossible to use an explicit base‐data‐offset; then, offsets need to be established relative to the movie fragment.
The following flags are defined in the tf_flags:
0x000001 base‐data‐offset‐present: indicates the presence of the base‐data‐offset field. This provides an explicit anchor for the data offsets in each track run (see below). If not provided and if the default‐base‐is‐moof flag is not set, the base‐data‐offset for the first track in the movie fragment is the position of the first byte of the enclosing Movie Fragment Box, and for second and subsequent track fragments, the default is the end of the data defined by the preceding track fragment. Fragments 'inheriting' their offset in this way must all use the same data‐reference (i.e., the data for these tracks must be in the same file).
0x000002 sample‐description‐index‐present: indicates the presence of this field, which over‐rides, in this fragment, the default set up in the Track Extends Box.
0x000008 default‐sample‐duration‐present
0x000010 default‐sample‐size‐present
0x000020 default‐sample‐flags‐present
0x010000 duration‐is‐empty: this indicates that the duration provided in either default‐sample‐duration, or by the default‐duration in the Track Extends Box, is empty, i.e. that there are no samples for this time interval. It is an error to make a presentation that has both edit lists in the Movie Box, and empty‐duration fragments.
0x020000 default‐base‐is‐moof: if base‐data‐offset‐present is 1, this flag is ignored. If base‐data‐offset‐present is zero, this indicates that the base‐data‐offset for this track fragment is the position of the first byte of the enclosing Movie Fragment Box. Support for the default‐base‐is‐moof flag is required under the ‘iso5’ brand, and it shall not be used in brands or compatible brands earlier than iso5.
NOTE The use of the default‐base‐is‐moof flag breaks the compatibility to earlier brands of the file format, because it sets the anchor point for offset calculation differently than earlier. Therefore, the default‐base‐is‐moof flag cannot be set when earlier brands are included in the File Type box.
8.8.7.2 Syntax
aligned(8) class TrackFragmentHeaderBox extends FullBox(‘tfhd’, 0, tf_flags){
   unsigned int(32) track_ID;
   // all the following are optional fields
   unsigned int(64) base_data_offset;
   unsigned int(32) sample_description_index;
   unsigned int(32) default_sample_duration;
   unsigned int(32) default_sample_size;
   unsigned int(32) default_sample_flags;
}
8.8.7.3 Semantics
base_data_offset the base offset to use when calculating data offsets
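Because every field after track_ID is optional, a reader has to consult tf_flags before reading each one. The sketch below shows one way to do that; parse_tfhd_payload and the TFHD_OPTIONAL_FIELDS table are hypothetical names, and `payload` is assumed to be the box contents after the 8-byte box header.

```python
import struct

# (flag bit, field name, struct format) in the order the fields are coded.
TFHD_OPTIONAL_FIELDS = [
    (0x000001, "base_data_offset", ">Q"),
    (0x000002, "sample_description_index", ">I"),
    (0x000008, "default_sample_duration", ">I"),
    (0x000010, "default_sample_size", ">I"),
    (0x000020, "default_sample_flags", ">I"),
]


def parse_tfhd_payload(payload):
    """Return the fields present in a 'tfhd' payload as a dict."""
    tf_flags = int.from_bytes(payload[1:4], "big")
    (track_id,) = struct.unpack_from(">I", payload, 4)
    fields = {
        "track_ID": track_id,
        "duration_is_empty": bool(tf_flags & 0x010000),
        "default_base_is_moof": bool(tf_flags & 0x020000),
    }
    pos = 8
    for bit, name, fmt in TFHD_OPTIONAL_FIELDS:
        if tf_flags & bit:
            (fields[name],) = struct.unpack_from(fmt, payload, pos)
            pos += struct.calcsize(fmt)
    return fields
```

Fields whose flag is clear are simply absent from the result, so the caller can fall back to the Track Extends Box defaults.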
8.8.8 Track Fragment Run Box
8.8.8.1 Definition
Box Type: ‘trun’
Container: Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or more
Within the Track Fragment Box, there are zero or more Track Run Boxes. If the duration‐is‐empty flag is set in the tf_flags, there are no track runs. A track run documents a contiguous set of samples for a track.
The number of optional fields is determined from the number of bits set in the lower byte of the flags, and the size of a record from the bits set in the second byte of the flags. This procedure shall be followed, to allow for new fields to be defined.
If the data‐offset is not present, then the data for this run starts immediately after the data of the previous run, or at the base‐data‐offset defined by the track fragment header if this is the first run in a track fragment. If the data‐offset is present, it is relative to the base‐data‐offset established in the track fragment header.
The following flags are defined:
0x000001 data‐offset‐present.
0x000004 first‐sample‐flags‐present; this over‐rides the default flags for the first sample only. This makes it possible to record a group of frames where the first is a key and the rest are difference frames, without supplying explicit flags for every sample. If this flag and field are used, sample‐flags shall not be present.
0x000100 sample‐duration‐present: indicates that each sample has its own duration, otherwise the default is used.
0x000200 sample‐size‐present: each sample has its own size, otherwise the default is used.
0x000400 sample‐flags‐present; each sample has its own flags, otherwise the default is used.
0x000800 sample‐composition‐time‐offsets‐present; each sample has a composition time offset (e.g. as used for I/P/B video in MPEG).
The composition offset values in the composition time‐to‐sample box and in the track run box may be signed or unsigned. The recommendations given in the composition time‐to‐sample box concerning the use of signed composition offsets also apply here.
8.8.8.2 Syntax
aligned(8) class TrackRunBox extends FullBox(‘trun’, version, tr_flags) {
   unsigned int(32) sample_count;
   // the following are optional fields
   signed int(32) data_offset;
   unsigned int(32) first_sample_flags;
   // all fields in the following array are optional
   {
      unsigned int(32) sample_duration;
      unsigned int(32) sample_size;
      unsigned int(32) sample_flags;
      if (version == 0)
         { unsigned int(32) sample_composition_time_offset; }
      else
         { signed int(32) sample_composition_time_offset; }
   }[ sample_count ]
}
8.8.8.3 Semantics
sample_count the number of samples being added in this run; also the number of rows in the following table (the rows can be empty)
data_offset is added to the implicit or explicit data_offset established in the track fragment header.
first_sample_flags provides a set of flags for the first sample only of this run.
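The way tr_flags governs both the optional header fields and the shape of each row can be sketched as follows. parse_trun_payload is a hypothetical name, `payload` is assumed to be the box contents after the 8-byte box header, and rows are returned as dicts containing only the fields that are actually coded.

```python
import struct


def parse_trun_payload(payload):
    """Return data_offset, first_sample_flags and per-sample rows of 'trun'."""
    version = payload[0]
    tr_flags = int.from_bytes(payload[1:4], "big")
    (sample_count,) = struct.unpack_from(">I", payload, 4)
    pos = 8
    run = {"data_offset": None, "first_sample_flags": None, "samples": []}
    if tr_flags & 0x000001:
        (run["data_offset"],) = struct.unpack_from(">i", payload, pos)  # signed
        pos += 4
    if tr_flags & 0x000004:
        (run["first_sample_flags"],) = struct.unpack_from(">I", payload, pos)
        pos += 4
    # Composition offsets are unsigned in version 0 and signed otherwise.
    offset_format = ">I" if version == 0 else ">i"
    row_fields = [
        (0x000100, "sample_duration", ">I"),
        (0x000200, "sample_size", ">I"),
        (0x000400, "sample_flags", ">I"),
        (0x000800, "sample_composition_time_offset", offset_format),
    ]
    for _sample in range(sample_count):
        row = {}
        for bit, name, fmt in row_fields:
            if tr_flags & bit:
                (row[name],) = struct.unpack_from(fmt, payload, pos)
                pos += 4
        run["samples"].append(row)
    return run
```

When none of the per-sample flags is set, each row is an empty dict and every sample takes its values from the defaults in the track fragment header or the Track Extends Box.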
8.8.9 Movie Fragment Random Access Box
8.8.9.1 Definition
Box Type: ‘mfra’
Container: File
Mandatory: No
Quantity: Zero or one
The Movie Fragment Random Access Box (‘mfra’) provides a table which may assist readers in finding sync samples in a file using movie fragments. It contains a track fragment random access box for each track for which information is provided (which may not be all tracks). It is usually placed at or near the end of the file; the last box within the Movie Fragment Random Access Box provides a copy of the length field from the Movie Fragment Random Access Box. Readers may attempt to find this box by examining the last 32 bits of the file, or scanning backwards from the end of the file for a Movie Fragment Random Access Offset Box and using the size information in it, to see if that locates the beginning of a Movie Fragment Random Access Box.
This box provides only a hint as to where sync samples are; the movie fragments themselves are definitive. It is recommended that readers take care in both locating and using this box as modifications to the file after it was created may render either the pointers, or the declaration of sync samples, incorrect.
8.8.9.2 Syntax
aligned(8) class MovieFragmentRandomAccessBox extends Box(‘mfra’) { }
8.8.10 Track Fragment Random Access Box
8.8.10.1 Definition
Box Type: ‘tfra’
Container: Movie Fragment Random Access Box (‘mfra’)
Mandatory: No
Quantity: Zero or one per track
Each entry contains the location and the presentation time of the sync sample. Note that not every sync sample in the track needs to be listed in the table.
The absence of this box does not mean that all the samples are sync samples. Random access information in the ‘trun’, ‘traf’ and ‘trex’ shall be set appropriately regardless of the presence of this box.
8.8.10.2 Syntax
aligned(8) class TrackFragmentRandomAccessBox extends FullBox(‘tfra’, version, 0) {
   unsigned int(32) track_ID;
   const unsigned int(26) reserved = 0;
   unsigned int(2) length_size_of_traf_num;
   unsigned int(2) length_size_of_trun_num;
   unsigned int(2) length_size_of_sample_num;
   unsigned int(32) number_of_entry;
   for (i=1; i <= number_of_entry; i++) {
      if (version==1) {
         unsigned int(64) time;
         unsigned int(64) moof_offset;
      } else {
         unsigned int(32) time;
         unsigned int(32) moof_offset;
      }
      unsigned int((length_size_of_traf_num+1) * 8) traf_number;
      unsigned int((length_size_of_trun_num+1) * 8) trun_number;
      unsigned int((length_size_of_sample_num+1) * 8) sample_number;
   }
}
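The variable-width number fields in the syntax above can be read as in this sketch, which may help when following the field semantics. parse_tfra_payload is a hypothetical name and `payload` is assumed to be the box contents after the 8-byte box header.

```python
def parse_tfra_payload(payload):
    """Return (track_ID, entries) from a 'tfra' payload."""
    version = payload[0]
    track_id = int.from_bytes(payload[4:8], "big")
    packed = int.from_bytes(payload[8:12], "big")  # 26 reserved bits + 3 x 2 bits
    traf_len = ((packed >> 4) & 0x3) + 1    # field lengths are coded minus one
    trun_len = ((packed >> 2) & 0x3) + 1
    sample_len = (packed & 0x3) + 1
    number_of_entry = int.from_bytes(payload[12:16], "big")
    word = 8 if version == 1 else 4         # width of time and moof_offset
    pos = 16

    def read(length):
        nonlocal pos
        if pos + length > len(payload):
            raise ValueError("payload is truncated")
        value = int.from_bytes(payload[pos:pos + length], "big")
        pos += length
        return value

    entries = []
    for _entry in range(number_of_entry):
        time = read(word)
        moof_offset = read(word)
        traf_number = read(traf_len)
        trun_number = read(trun_len)
        sample_number = read(sample_len)
        entries.append({
            "time": time,
            "moof_offset": moof_offset,
            "traf_number": traf_number,
            "trun_number": trun_number,
            "sample_number": sample_number,
        })
    return track_id, entries
```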
8.8.10.3 Semantics
track_ID is an integer identifying the track_ID.
length_size_of_traf_num indicates the length in bytes of the traf_number field minus one.
length_size_of_trun_num indicates the length in bytes of the trun_number field minus one.
length_size_of_sample_num indicates the length in bytes of the sample_number field minus one.
number_of_entry is an integer that gives the number of the entries for this track. If this value is zero, it indicates that every sample is a sync sample and no table entry follows.
time is a 32 or 64 bit integer that indicates the presentation time of the sync sample in units defined in the ‘mdhd’ of the associated track.
moof_offset is a 32 or 64 bit integer that gives the offset of the ‘moof’ used in this entry. Offset is the byte‐offset between the beginning of the file and the beginning of the ‘moof’.
traf_number indicates the ‘traf’ number that contains the sync sample. The number ranges from 1 (the first ‘traf’ is numbered 1) in each ‘moof’.
trun_number indicates the ‘trun’ number that contains the sync sample. The number ranges from 1 in each ‘traf’.
sample_number indicates the sample number of the sync sample. The number ranges from 1 in each ‘trun’.
8.8.11 Movie Fragment Random Access Offset Box
8.8.11.1 Definition
Box Type: ‘mfro’
Container: Movie Fragment Random Access Box (‘mfra’)
Mandatory: Yes
Quantity: Exactly one
The Movie Fragment Random Access Offset Box provides a copy of the length field from the enclosing Movie Fragment Random Access Box. It is placed last within that box, so that the size field is also last in the enclosing Movie Fragment Random Access Box. When the Movie Fragment Random Access Box is also last in the file this permits its easy location. The size field here must be correct. However, neither the presence of the Movie Fragment Random Access Box, nor its placement last in the file, are assured.
8.8.11.2 Syntax
aligned(8) class MovieFragmentRandomAccessOffsetBox extends FullBox(‘mfro’, version, 0) {
   unsigned int(32) size;
}
8.8.11.3 Semantics
size is an integer that gives the number of bytes of the enclosing ‘mfra’ box. This field is placed last in the enclosing box to assist readers scanning from the end of the file in finding the ‘mfra’ box.
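Locating the ‘mfra’ box from the end of the file could be attempted as sketched below. This is a cautious illustration, not a required procedure: find_mfra_offset is a hypothetical name, the whole file is assumed to be in memory, and the ‘mfra’ box is assumed to use the compact 32-bit size form. Because neither the presence of the box nor its position at the end of the file is assured, every step is verified and None is returned on any mismatch.

```python
def find_mfra_offset(file_bytes):
    """Return the file offset of a trailing 'mfra' box, or None if not found."""
    # An 'mfro' box is 16 bytes: size(32), type(32), version/flags(32), size(32).
    if len(file_bytes) < 16 or file_bytes[-12:-8] != b"mfro":
        return None
    mfra_size = int.from_bytes(file_bytes[-4:], "big")
    start = len(file_bytes) - mfra_size
    # An 'mfra' box holds at least its own 8-byte header plus the 'mfro' box.
    if mfra_size < 24 or start < 0:
        return None
    if file_bytes[start + 4:start + 8] != b"mfra":
        return None
    if int.from_bytes(file_bytes[start:start + 4], "big") != mfra_size:
        return None
    return start
```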
8.8.12 Track fragment decode time
8.8.12.1 Definition
Box Type: ‘tfdt’
Container: Track Fragment box (‘traf’)
Mandatory: No
Quantity: Zero or one
The Track Fragment Base Media Decode Time Box provides the absolute decode time, measured on the media timeline, of the first sample in decode order in the track fragment. This can be useful, for example, when performing random access in a file; it is not necessary to sum the sample durations of all preceding samples in previous fragments to find this value (where the sample durations are the deltas in the Decoding Time to Sample Box and the sample_durations in the preceding track runs).
The Track Fragment Base Media Decode Time Box, if present, shall be positioned after the Track Fragment Header Box and before the first Track Fragment Run box.
NOTE The decode timeline is a media timeline, established before any explicit or implied mapping of media time to presentation time, for example by an edit list or similar structure.
If the time expressed in the track fragment decode time (‘tfdt’) box exceeds the sum of the durations of the samples in the preceding movie and movie fragments, then the duration of the last sample preceding this track fragment is extended such that the sum now equals the time given in this box. In this way, it is possible to generate a fragment containing a sample when the time of the next sample is not yet known.
In particular, an empty track fragment (with no samples, but with a track fragment decode time box) may be used to establish the duration of the last sample.
8.8.12.2 Syntax
aligned(8) class TrackFragmentBaseMediaDecodeTimeBox extends FullBox(‘tfdt’, version, 0) {
   if (version==1) {
      unsigned int(64) baseMediaDecodeTime;
   } else { // version==0
      unsigned int(32) baseMediaDecodeTime;
   }
}
8.8.12.3 Semantics
version is an integer that specifies the version of this box (0 or 1 in this specification).
baseMediaDecodeTime is an integer equal to the sum of the decode durations of all earlier samples in the media, expressed in the media's timescale. It does not include the samples added in the enclosing track fragment.
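The rule that a later baseMediaDecodeTime extends the duration of the last preceding sample can be expressed in a few lines. Both function names are hypothetical; `payload` is assumed to be the box contents after the 8-byte box header, and `durations` is assumed to be the list of decode durations of all samples that precede the track fragment.

```python
def parse_tfdt_payload(payload):
    """Return baseMediaDecodeTime from a 'tfdt' payload."""
    width = 8 if payload[0] == 1 else 4  # 64-bit in version 1, else 32-bit
    return int.from_bytes(payload[4:4 + width], "big")


def extend_last_duration(durations, base_media_decode_time):
    """Stretch the last preceding sample so durations sum to the given time."""
    total = sum(durations)
    if durations and base_media_decode_time > total:
        gap = base_media_decode_time - total
        return durations[:-1] + [durations[-1] + gap]
    return list(durations)
```

The second helper leaves the durations unchanged when the decode time equals their sum, which is the usual case for contiguous fragments.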
8.8.13 Level Assignment Box
8.8.13.1 Definition
Box Type: ‘leva’
Container: Movie Extends Box (‘mvex’)
Mandatory: No
Quantity: Zero or one
Levels specify subsets of the file. Samples mapped to level n may depend on any samples of levels m, where m <= n, and shall not depend on any samples of levels p, where p > n. For example, levels can be specified according to temporal level (e.g., temporal_id of SVC or MVC).
Levels cannot be specified for the initial movie. When the Level Assignment box is present, it applies to all movie fragments subsequent to the initial movie.
For the context of the Level Assignment box, a fraction is defined to consist of one or more Movie Fragment boxes and the associated Media Data boxes, possibly including only an initial part of the last Media Data Box. Within a fraction, data for each level shall appear contiguously. Data for levels within a fraction shall appear in increasing order of level value. All data in a fraction shall be assigned to levels.
NOTE In the context of DASH (ISO/IEC 23009‐1), each subsegment indexed within a Subsegment Index box is a fraction.
The Level Assignment box provides a mapping from features, such as scalability layers, to levels. A feature can be specified through a track, a sub‐track within a track, or a sample grouping of a track.
When padding_flag is equal to 1 this indicates that a conforming fraction can be formed by concatenating any positive integer number of levels within a fraction and padding the last Media Data box by zero bytes up to the full size that is indicated in the header of the last Media Data box. For example, padding_flag can be set equal to 1 when the following conditions are true:
Each fraction contains two or more AVC, SVC, or MVC [ISO/IEC 14496‐15] tracks of the same video bitstream.
The samples for each track of a fraction are contiguous and in decoding order in a Media Data box.
The samples of the first AVC, SVC, or MVC level contain extractor NAL units for including the video coding NAL units from the other levels of the same fraction.
8.8.13.2 Syntax
aligned(8) class LevelAssignmentBox extends FullBox(‘leva’, 0, 0) {
   unsigned int(8) level_count;
   for (j=1; j <= level_count; j++) {
      unsigned int(32) track_id;
      unsigned int(1) padding_flag;
      unsigned int(7) assignment_type;
      if (assignment_type == 0) {
         unsigned int(32) grouping_type;
      }
      else if (assignment_type == 1) {
         unsigned int(32) grouping_type;
         unsigned int(32) grouping_type_parameter;
      }
      else if (assignment_type == 2) {} // no further syntax elements needed
      else if (assignment_type == 3) {} // no further syntax elements needed
      else if (assignment_type == 4) {
         unsigned int(32) sub_track_id;
      }
      // other assignment_type values are reserved
   }
}
8.8.13.3 Semantics
level_count specifies the number of levels each fraction is grouped into. level_count shall be greater than or equal to 2.
track_id for loop entry j specifies the track identifier of the track assigned to level j.
padding_flag equal to 1 indicates that a conforming fraction can be formed by concatenating any positive integer number of levels within a fraction and padding the last Media Data box by zero bytes up to the full size that is indicated in the header of the last Media Data box. The semantics of padding_flag equal to 0 are that this is not assured.
assignment_type indicates the mechanism used to specify the assignment to a level. assignment_type values greater than 4 are reserved, while the semantics for the other values are specified as follows. The sequence of assignment_types is restricted to be a set of zero or more of type 2 or 3, followed by zero or more of exactly one type.
   0: sample groups are used to specify levels, i.e., samples mapped to different sample group description indexes of a particular sample grouping lie in different levels within the identified track; other tracks are not affected and must have all their data in precisely one level;
   1: as for assignment_type 0 except assignment is by a parameterized sample group;
   2, 3: level assignment is by track (see the Subsegment Index Box for the difference in processing of these levels);
   4: the respective level contains the samples for a sub-track. The sub-tracks are specified through the Sub Track box; other tracks are not affected and must have all their data in precisely one level.
grouping_type and grouping_type_parameter, if present, specify the sample grouping used to map sample group description entries in the Sample Group Description box to levels. Level n contains the samples that are mapped to the sample group description entry having index n in the Sample Group Description box having the same values of grouping_type and grouping_type_parameter, if present, as those provided in this box.
sub_track_id specifies that the sub-track identified by sub_track_id within loop entry j is mapped to level j.
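The syntax and semantics above can be illustrated with a minimal, informative Python sketch. It assumes the caller has already stripped the box header and the FullBox version/flags, so that payload begins at level_count. The function name parse_leva_payload and the dictionary layout are illustrative choices and not part of this specification; grouping_type is returned as four raw bytes purely for readability.

```python
import struct


def parse_leva_payload(payload: bytes) -> list:
    """Parse the fields that follow the FullBox header of a 'leva' box."""
    level_count = payload[0]
    pos = 1
    levels = []
    for level in range(1, level_count + 1):
        # track_id (32 bits) followed by one byte holding
        # padding_flag (1 bit) and assignment_type (7 bits).
        track_id, packed = struct.unpack_from(">IB", payload, pos)
        pos += 5
        entry = {
            "level": level,
            "track_id": track_id,
            "padding_flag": packed >> 7,
            "assignment_type": packed & 0x7F,
        }
        kind = entry["assignment_type"]
        if kind == 0:
            (entry["grouping_type"],) = struct.unpack_from(">4s", payload, pos)
            pos += 4
        elif kind == 1:
            entry["grouping_type"], entry["grouping_type_parameter"] = (
                struct.unpack_from(">4sI", payload, pos)
            )
            pos += 8
        elif kind in (2, 3):
            pass  # assignment by track: no further syntax elements
        elif kind == 4:
            (entry["sub_track_id"],) = struct.unpack_from(">I", payload, pos)
            pos += 4
        else:
            raise ValueError(f"reserved assignment_type {kind}")
        levels.append(entry)
    return levels
```

The sketch does not check the constraint that level_count is at least 2, nor the restriction on the sequence of assignment_type values; a validating reader would add both.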
8.8.14 Sample Auxiliary Information in Movie Fragments
When sample auxiliary information (8.7.8 and 8.7.9) is present in the Movie Fragment box, the offsets in the Sample Auxiliary Information Offsets Box are treated the same as the data_offset in the Track Fragment Run box, that is, they are relative to any base data offset established for that track fragment. If movie fragment relative addressing is used (no base data offset is provided in the track fragment header) and auxiliary information is present, then the default_base_is_moof flag must also be set in the flags of that track fragment header.
If only one offset is provided, then the Sample Auxiliary Information for all the track runs in the fragment is stored contiguously, otherwise exactly one offset must be provided for each track run.
If the field default_sample_info_size is non-zero in one of these boxes, then the size of the auxiliary information is constant for the identified samples.
In addition, if:
this box is present in the movie box,
and default_sample_info_size is non-zero in the box in the movie box,
and the sample auxiliary information sizes box is absent in a movie fragment,
then the auxiliary information has this same constant size for every sample in the movie fragment also; it is then not necessary to repeat the box in the movie fragment.
8.8.15 Track Extension Properties Box
8.8.15.1 Definition
Box Type: ‘trep’
Container: Movie Extends Box (‘mvex’)
Mandatory: No
Quantity: Zero or more. (Zero or one per track)
This box can be used to document or summarize characteristics of the track in the subsequent movie fragments. It may contain any number of child boxes.
8.8.15.2 Syntax
class TrackExtensionPropertiesBox extends FullBox(‘trep’, 0, 0) {
   unsigned int(32) track_id;
   // Any number of boxes may follow
}
8.8.15.3 Semantics
track_id indicates the track for which the track extension properties are provided in this box.
8.8.16 Alternative Startup Sequence Properties Box
8.8.16.1 Definition
Box Type: ‘assp’
Container: Track Extension Properties Box (‘trep’)
Mandatory: No
Quantity: Zero or one
This box indicates the properties of alternative startup sequence sample groups in the subsequent track fragments of the track indicated in the containing Track Extension Properties box.
Version 0 of the Alternative Startup Sequence Properties box shall be used if version 0 of the Sample to Group box is used for the alternative startup sequence sample grouping. Version 1 of the Alternative Startup Sequence Properties box shall be used if version 1 of the Sample to Group box is used for the alternative startup sequence sample grouping.
8.8.16.2 Syntax
class AlternativeStartupSequencePropertiesBox extends FullBox(‘assp’, version, 0) {
   if (version == 0) {
      signed int(32) min_initial_alt_startup_offset;
   }
   else if (version == 1) {
      unsigned int(32) num_entries;
      for (j=1; j <= num_entries; j++) {
         unsigned int(32) grouping_type_parameter;
         signed int(32) min_initial_alt_startup_offset;
      }
   }
}
8.8.16.3 Semantics
min_initial_alt_startup_offset: No value of sample_offset[1] of the referred sample group description entries of the alternative startup sequence sample grouping shall be smaller than min_initial_alt_startup_offset. In version 0 of this box, the alternative startup sequence sample grouping using version 0 of the Sample to Group box is referred to. In version 1 of this box, the alternative startup sequence sample grouping using version 1 of the Sample to Group box is referred to as further constrained by grouping_type_parameter.
num_entries indicates the number of alternative startup sequence sample groupings documented in this box.
grouping_type_parameter indicates which one of the alternative sample groupings this loop entry applies to.
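As an informative illustration of the two versions of this box, the following Python sketch reads the fields after the FullBox header. The function name parse_assp_payload is hypothetical, as is the choice to return a dictionary keyed by grouping_type_parameter (with the key None standing in for the single version 0 value).

```python
import struct


def parse_assp_payload(version: int, payload: bytes) -> dict:
    """Return {grouping_type_parameter: min_initial_alt_startup_offset}."""
    if version == 0:
        # A single signed 32-bit offset; there is no grouping_type_parameter.
        (offset,) = struct.unpack_from(">i", payload, 0)
        return {None: offset}
    if version == 1:
        (num_entries,) = struct.unpack_from(">I", payload, 0)
        result = {}
        for j in range(num_entries):
            # Each loop entry: unsigned parameter, then signed offset.
            parameter, offset = struct.unpack_from(">Ii", payload, 4 + 8 * j)
            result[parameter] = offset
        return result
    raise ValueError(f"unsupported version {version}")
```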
8.8.17 Metadata and user data in movie fragments
When meta boxes occur in movie fragment or track fragment boxes, the following applies. The file must have been fragmented such that any meta-data needed in the movie or track fragment is formed from the union of the meta-data in the movie box and the fragment, not considering or using meta-data in any other fragment. Meta-data in a movie or track fragment is logically ‘arriving late’ but is valid for the entire track. When a file is de-fragmented, the meta-data in the movie or track fragments must be merged into the movie or track boxes, respectively. This process allows for ‘just in time’ delivery of support resources, and bandwidth management, while preserving the essentially atemporal nature of untimed meta-data. If meta-data truly changes over time, a timed meta-data track may be needed.
If, during this merge, there are either (a) meta-data items with the same item_ID or (b) user-data items with the same type, then the following applies:
a) all occurrences of the data (user-data box or meta-data item) must be ‘true’ for the entire movie including all fragments;
b) the occurrences in higher-numbered movie fragments (‘later’ occurrences) may be more accurate or ‘preferred’;
c) in particular, data in an empty initial movie atom may be only estimates or ‘not to exceed’ values, and data in a final otherwise empty movie fragment may be the ‘final’ or most accurate values.
8.9 Sample Group Structures
8.9.1 Introduction
This clause specifies a generic mechanism for representing a partition of the samples in a track. A sample grouping is an assignment of each sample in a track to be a member of one sample group, based on a grouping criterion. A sample group in a sample grouping is not limited to being contiguous samples and may contain non-adjacent samples. As there may be more than one sample grouping for the samples in a track, each sample grouping has a type field to indicate the type of grouping. For example, a file might contain two sample groupings for the same track: one based on an assignment of samples to layers and another to sub-sequences.
Sample groupings are represented by two linked data structures: (1) a SampleToGroup box represents the assignment of samples to sample groups; (2) a SampleGroupDescription box contains a sample group entry for each sample group describing the properties of the group. There may be multiple instances of the SampleToGroup and SampleGroupDescription boxes based on different grouping criteria. These are distinguished by a type field used to indicate the type of grouping.
A grouping of a particular grouping type may use a parameter in the sample to group mapping; if so, the meaning of the parameter must be documented with the group. An example of this might be documenting the sync points in a multiplex of several video streams; the group definition might be ‘Is an I frame’, and the group parameter might be the identifier of each stream. Since the sample to group box occurs once for each stream, it is now both compact, and informs the reader about each stream separately.
One example of using these tables is to represent the assignments of samples to layers. In this case each sample group represents one layer, with an instance of the SampleToGroup box describing which layer a sample belongs to.
8.9.2 Sample to Group Box
8.9.2.1 Definition
Box Type: ‘sbgp’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or more.
This table can be used to find the group that a sample belongs to and the associated description of that sample group. The table is compactly coded with each entry giving the index of the first sample of a run of samples with the same sample group descriptor. The sample group description ID is an index that refers to a SampleGroupDescription box, which contains entries describing the characteristics of each sample group.
There may be multiple instances of this box if there is more than one sample grouping for the samples in a track. Each instance of the SampleToGroup box has a type code that distinguishes different sample groupings. There shall be at most one instance of this box with a particular grouping type in a Sample Table Box or Track Fragment Box. The associated SampleGroupDescription shall indicate the same value for the grouping type.
Version 1 of this box should only be used if a grouping type parameter is needed.
8.9.2.2 Syntax
aligned(8) class SampleToGroupBox
   extends FullBox(‘sbgp’, version, 0)
{
   unsigned int(32) grouping_type;
   if (version == 1) {
      unsigned int(32) grouping_type_parameter;
   }
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++)
   {
      unsigned int(32) sample_count;
      unsigned int(32) group_description_index;
   }
}
8.9.2.3 Semantics
version is an integer that specifies the version of this box, either 0 or 1.
grouping_type is an integer that identifies the type (i.e. criterion used to form the sample groups) of the sample grouping and links it to its sample group description table with the same value for grouping type. At most one occurrence of this box with the same value for grouping_type (and, if used, grouping_type_parameter) shall exist for a track.
grouping_type_parameter is an indication of the sub-type of the grouping.
entry_count is an integer that gives the number of entries in the following table.
sample_count is an integer that gives the number of consecutive samples with the same sample group descriptor. If the sum of the sample count in this box is less than the total sample count, or there is no sample-to-group box that applies to some samples (e.g. it is absent from a track fragment), then the reader should associate the samples that have no explicit group association with the default group defined in the SampleGroupDescription box, if any, or else with no group. It is an error for the total in this box to be greater than the sample_count documented elsewhere, and the reader behaviour would then be undefined.
group_description_index is an integer that gives the index of the sample group entry which describes the samples in this group. The index ranges from 1 to the number of sample group entries in the SampleGroupDescription Box, or takes the value 0 to indicate that this sample is a member of no group of this type.
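The run-length coding of this table can be illustrated with a short, informative Python sketch. It assumes payload starts immediately after the FullBox header. The names parse_sbgp_payload and group_index_for_sample, and the default_index argument (standing in for a default taken from the SampleGroupDescription box, if any), are illustrative only.

```python
import struct


def parse_sbgp_payload(version: int, payload: bytes):
    """Return (grouping_type, grouping_type_parameter, runs)."""
    grouping_type = payload[0:4]
    pos = 4
    parameter = None
    if version == 1:
        (parameter,) = struct.unpack_from(">I", payload, pos)
        pos += 4
    (entry_count,) = struct.unpack_from(">I", payload, pos)
    pos += 4
    # Each run is (sample_count, group_description_index).
    runs = [struct.unpack_from(">II", payload, pos + 8 * i)
            for i in range(entry_count)]
    return grouping_type, parameter, runs


def group_index_for_sample(runs, sample_number: int, default_index: int = 0) -> int:
    """Map a 1-based sample number to its group_description_index."""
    if sample_number < 1:
        raise ValueError("sample numbers are 1-based")
    first = 1  # number of the first sample covered by the current run
    for sample_count, index in runs:
        if sample_number < first + sample_count:
            return index
        first += sample_count
    # Not covered by any run: fall back to the default group, else no group.
    return default_index
```

A lookup walks the runs in order, so a reader that performs many lookups might precompute cumulative sample counts and use a binary search instead.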
8.9.3 Sample Group Description Box
8.9.3.1 Definition
Box Type: ‘sgpd’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or more, with one for each Sample to Group Box.
This description table gives information about the characteristics of sample groups. The descriptive information is any other information needed to define or characterize the sample group.
There may be multiple instances of this box if there is more than one sample grouping for the samples in a track. Each instance of the SampleGroupDescription box has a type code that distinguishes different sample groupings. There shall be at most one instance of this box with a particular grouping type in a Sample Table Box or Track Fragment Box. The associated SampleToGroup shall indicate the same value for the grouping type.
The information is stored in the sample group description box after the entry-count. An abstract entry type is defined and sample groupings shall define derived types to represent the description of each sample group. For video tracks, an abstract VisualSampleGroupEntry is used with similar types for audio and hint tracks.
NOTE In version 0 of the entries the base classes for sample group description entries are neither boxes nor have a size that is signaled. For this reason, use of version 0 entries is deprecated. When defining derived classes, ensure either that they have a fixed size, or that the size is explicitly indicated with a length field. An implied size (e.g. achieved by parsing the data) is not recommended as this makes scanning the array difficult.
8.9.3.2 Syntax
// Sequence Entry
abstract class SampleGroupDescriptionEntry (unsigned int(32) grouping_type)
{
}

abstract class VisualSampleGroupEntry (unsigned int(32) grouping_type)
   extends SampleGroupDescriptionEntry (grouping_type)
{
}

abstract class AudioSampleGroupEntry (unsigned int(32) grouping_type)
   extends SampleGroupDescriptionEntry (grouping_type)
{
}

abstract class HintSampleGroupEntry (unsigned int(32) grouping_type)
   extends SampleGroupDescriptionEntry (grouping_type)
{
}

abstract class SubtitleSampleGroupEntry (unsigned int(32) grouping_type)
   extends SampleGroupDescriptionEntry (grouping_type)
{
}

abstract class TextSampleGroupEntry (unsigned int(32) grouping_type)
   extends SampleGroupDescriptionEntry (grouping_type)
{
}

aligned(8) class SampleGroupDescriptionBox (unsigned int(32) handler_type)
   extends FullBox('sgpd', version, 0){
   unsigned int(32) grouping_type;
   if (version==1) { unsigned int(32) default_length; }
   if (version>=2) {
      unsigned int(32) default_sample_description_index;
   }
   unsigned int(32) entry_count;
   int i;
   for (i = 1 ; i <= entry_count ; i++){
      if (version==1) {
         if (default_length==0) {
            unsigned int(32) description_length;
         }
      }
      SampleGroupEntry (grouping_type);
         // an instance of a class derived from SampleGroupEntry
         // that is appropriate and permitted for the media type
   }
}
8.9.3.3 Semantics
version is an integer that specifies the version of this box.
grouping_type is an integer that identifies the SampleToGroup box that is associated with this sample group description. If grouping_type_parameter is not defined for a given grouping_type, then there shall be only one occurrence of this box with this grouping_type.
default_sample_description_index specifies the index of the sample group description entry which applies to all samples in the track for which no sample to group mapping is provided through a SampleToGroup box. The default value of this field is zero (indicating that the samples are mapped to no group of this type).
entry_count is an integer that gives the number of entries in the following table.
default_length indicates the length of every group entry (if the length is constant), or zero (0) if it is variable.
description_length indicates the length of an individual group entry, in the case it varies from entry to entry and default_length is therefore 0.
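The interaction of default_length and description_length can be shown with the following informative Python sketch, which splits the entry array into raw byte strings without interpreting them. It assumes payload starts after the FullBox header and follows the syntax exactly as printed in 8.9.3.2. Because that syntax signals entry lengths only for version 1, the sketch takes a hypothetical fixed_entry_size argument for other versions, which the caller would derive from the definition of the grouping type; the function name is likewise illustrative.

```python
import struct


def parse_sgpd_payload(version: int, payload: bytes, fixed_entry_size=None):
    """Return (grouping_type, default_sample_description_index, entries)."""
    grouping_type = payload[0:4]
    pos = 4
    default_length = None
    default_index = 0  # zero means "mapped to no group of this type"
    if version == 1:
        (default_length,) = struct.unpack_from(">I", payload, pos)
        pos += 4
    if version >= 2:
        (default_index,) = struct.unpack_from(">I", payload, pos)
        pos += 4
    (entry_count,) = struct.unpack_from(">I", payload, pos)
    pos += 4
    entries = []
    for _ in range(entry_count):
        if version == 1:
            length = default_length
            if default_length == 0:
                # Variable-size entries carry their own length field.
                (length,) = struct.unpack_from(">I", payload, pos)
                pos += 4
        elif fixed_entry_size is not None:
            length = fixed_entry_size  # assumption supplied by the caller
        else:
            raise ValueError("entry size is not signalled for this version")
        entries.append(payload[pos:pos + length])
        pos += length
    return grouping_type, default_index, entries
```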
8.9.4 Representation of group structures in Movie Fragments
Support for Sample Group structures within Movie fragments is provided by the use of the SampleToGroup Box with the container for this Box being the Track Fragment Box (‘traf’). The definition, syntax and semantics of this Box are as specified in subclause 8.9.2.
The SampleToGroup Box can be used to find the group that a sample in a track fragment belongs to and the associated description of that sample group. The table is compactly coded with each entry giving the index of the first sample of a run of samples with the same sample group descriptor. The sample group description ID is an index that refers to a SampleGroupDescription Box, which contains entries describing the characteristics of each sample group and is present in the SampleTableBox.
There may be multiple instances of the SampleToGroup Box if there is more than one sample grouping for the samples in a track fragment. Each instance of the SampleToGroup Box has a type code that distinguishes different sample groupings. The associated SampleGroupDescription shall indicate the same value for the grouping type.
The total number of samples represented in any SampleToGroup Box in the track fragment must match the total number of samples in all the track fragment runs. Each SampleToGroup Box documents a different grouping of the same samples.
Zero or more SampleGroupDescription boxes may also be present in a Track Fragment Box. These definitions are additional to the definitions provided in the Sample Table of the track in the Movie Box. Group definitions within a movie fragment can also be referenced and used from within that same movie fragment.
Within the SampleToGroup box in that movie fragment, the group description indexes for groups defined within the same fragment start at 0x10001, i.e. the index value 1, with the value 1 in the top 16 bits. This means there must be fewer than 65536 group definitions for this track and grouping type in the sample table in the Movie Box.
When changing the size of movie fragments, or removing them, these fragment-local group definitions will need to be merged into the definitions in the movie box, or into the new movie fragments, and the index numbers in the SampleToGroup box(es) adjusted accordingly. It is recommended that, in this process, identical (and hence duplicate) definitions not be made in any SampleGroupDescription box, but that duplicates be merged and the indexes adjusted accordingly.
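The split between movie-level and fragment-local group description indexes can be sketched as follows. This is an informative Python illustration only; the function name and the returned scope labels ("none", "stbl", "traf") are hypothetical, and treating the value 0x10000 as an error is an assumption of the sketch, since that value corresponds to no fragment-local entry.

```python
FRAGMENT_LOCAL_BASE = 0x10000  # value 1 in the top 16 bits


def resolve_group_description_index(index: int):
    """Return (scope, entry_index) for a group_description_index read
    from a SampleToGroup box inside a track fragment."""
    if index == 0:
        return ("none", 0)  # sample is a member of no group of this type
    if index < FRAGMENT_LOCAL_BASE:
        # Refers to the SampleGroupDescription box in the sample table.
        return ("stbl", index)
    if index == FRAGMENT_LOCAL_BASE:
        raise ValueError("0x10000 does not identify any entry")
    # Refers to the SampleGroupDescription box in the same track fragment;
    # 0x10001 is fragment-local entry 1.
    return ("traf", index - FRAGMENT_LOCAL_BASE)
```

When fragments are merged or removed, a tool would use such a mapping in reverse: fragment-local entries are appended to (or matched against) the movie-level table and each index is rewritten to its new position.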
8.10 User Data
8.10.1 User Data Box
8.10.1.1 Definition
Box Type: ‘udta’
Container: Movie Box (‘moov’), Track Box (‘trak’), Movie Fragment Box (‘moof’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or one
This box contains objects that declare user information about the containing box and its data (presentation or track).
The User Data Box is a container box for informative user-data. This user data is formatted as a set of boxes with more specific box types, which declare more precisely their content.
The handling of user-data in movie fragments is described in 8.8.17.
8.10.1.2 Syntax
aligned(8) class UserDataBox extends Box(‘udta’) { }
8.10.2 Copyright Box
8.10.2.1 Definition
Box Type: ‘cprt’
Container: User data box (‘udta’)
Mandatory: No
Quantity: Zero or more
The Copyright box contains a copyright declaration which applies to the entire presentation, when contained within the Movie Box, or, when contained in a track, to that entire track. There may be multiple copyright boxes using different language codes.
8.10.2.2 Syntax
aligned(8) class CopyrightBox
   extends FullBox(‘cprt’, version = 0, 0) {
   const bit(1) pad = 0;
   unsigned int(5)[3] language;   // ISO-639-2/T language code
   string notice;
}
8.10.2.3 Semantics
language declares the language code for the following text. See ISO 639-2/T for the set of three character codes. Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three lower-case letters, so these values are strictly positive.
notice is a null-terminated string in either UTF-8 or UTF-16 characters, giving a copyright notice. If UTF-16 is used, the string shall start with the BYTE ORDER MARK (0xFEFF), to distinguish it from a UTF-8 string. This mark does not form part of the final string.
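The packing of the language field can be worked through in a few lines of Python. This is an informative sketch; the function names pack_language and unpack_language are illustrative. The 16-bit value returned includes the pad bit (always 0) as its most significant bit, followed by the three 5-bit character values.

```python
def pack_language(code: str) -> int:
    """Pack a three-letter lower-case code into the 16-bit pad+language field."""
    if len(code) != 3 or not all("a" <= c <= "z" for c in code):
        raise ValueError("language must be three lower-case letters")
    value = 0
    for c in code:
        # Each character is stored as (ASCII value - 0x60) in 5 bits.
        value = (value << 5) | (ord(c) - 0x60)
    return value  # pad bit is the (zero) top bit of the 16-bit field


def unpack_language(value: int) -> str:
    """Recover the three-letter code from the 16-bit field."""
    return "".join(chr(((value >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))
```

For example, "und" packs as (21 << 10) | (14 << 5) | 4 = 0x55C4, and 0x15C7 unpacks to "eng".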
8.10.3 Track Selection Box
8.10.3.1 Introduction
A typical presentation stored in a file contains one alternate group per media type: one for video, one for audio, etc. Such a file may include several video tracks, although, at any point in time, only one of them should be played or streamed. This is achieved by assigning all video tracks to the same alternate group. (See subclause 8.3.2 for the definition of alternate groups.)
All tracks in an alternate group are candidates for media selection, but it may not make sense to switch between some of those tracks during a session. One may for instance allow switching between video tracks at different bitrates and keep frame size but not allow switching between tracks of different frame size. In the same manner it may be desirable to enable selection, but not switching, between tracks of different video codecs or different audio languages.
The distinction between tracks for selection and switching is addressed by assigning tracks to switch groups in addition to alternate groups. One alternate group may contain one or more switch groups. All tracks in an alternate group are candidates for media selection, while tracks in a switch group are also available for switching during a session. Different switch groups represent different operation points, such as different frame size, high/low quality, etc.
For the case of non-scalable bitstreams, several tracks may be included in a switch group. The same also applies to non-layered scalable bitstreams, such as traditional AVC streams.
By labelling tracks with attributes it is possible to characterize them. Each track can be labelled with a list of attributes which can be used to describe tracks in a particular switch group or differentiate tracks that belong to different switch groups.
8.10.3.2 Definition
Box Type: ‘tsel’
Container: User Data Box (‘udta’)
Mandatory: No
Quantity: Zero or One
The track selection box is contained in the user data box of the track it modifies.
8.10.3.3 Syntax
aligned(8) class TrackSelectionBox
   extends FullBox(‘tsel’, version = 0, 0) {
   template int(32) switch_group = 0;
   unsigned int(32) attribute_list[];   // to end of the box
}
8.10.3.4 Semantics
switch_group is an integer that specifies a group or collection of tracks. If this field is 0 (default value) or if the Track Selection box is absent there is no information on whether the track can be used for switching during playing or streaming. If this integer is not 0 it shall be the same for tracks that can be used for switching between each other. Tracks that belong to the same switch group shall belong to the same alternate group. A switch group may have only one member.
attribute_list is a list, to the end of the box, of attributes. The attributes in this list should be used as descriptions of tracks or differentiation criteria for tracks in the same alternate or switch group. Each differentiating attribute is associated with a pointer to the field or information that distinguishes the track.
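Since attribute_list has no count and simply runs to the end of the box, a reader derives the number of attributes from the box size. The following informative Python sketch assumes payload covers exactly the bytes after the FullBox header up to the end of the box, and that each attribute is presented as a four-character ASCII code; the function name is illustrative.

```python
import struct


def parse_tsel_payload(payload: bytes):
    """Return (switch_group, [attribute four-character codes])."""
    (switch_group,) = struct.unpack_from(">i", payload, 0)
    # Every remaining complete 32-bit word is one attribute.
    attributes = [payload[i:i + 4].decode("ascii")
                  for i in range(4, len(payload) - 3, 4)]
    return switch_group, attributes
```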
8.10.3.5 Attributes
The following attributes are descriptive:

Name                             Attribute   Description
Temporal scalability             ‘tesc’      The track can be temporally scaled.
Fine-grain SNR scalability       ‘fgsc’      The track can be scaled in terms of quality.
Coarse-grain SNR scalability     ‘cgsc’      The track can be scaled in terms of quality.
Spatial scalability              ‘spsc’      The track can be spatially scaled.
Region-of-interest scalability   ‘resc’      The track can be region-of-interest scaled.
View scalability                 ‘vwsc’      The track can be scaled in terms of number of views.
The following attributes are differentiating:

Name              Attribute   Pointer
Codec             ‘cdec’      Sample Entry (in Sample Description box of media track)
Screen size       ‘scsz’      Width and height fields of Visual Sample Entries.
Max packet size   ‘mpsz’      Maxpacketsize field in RTP Hint Sample Entry
Media type        ‘mtyp’      Handler type in Handler box (of media track)
Media language    ‘mela’      Language field in Media Header box
Bitrate           ‘bitr’      Total size of the samples in the track divided by the duration in the track header box
Frame rate        ‘frar’      Number of samples in the track divided by duration in the track header box
Number of views   ‘nvws’      Number of views in the sub track
Descriptive attributes characterize the tracks they modify, whereas differentiating attributes differentiate between tracks that belong to the same alternate or switch groups. The pointer of a differentiating attribute indicates the location of the information that differentiates the track from other tracks with the same attribute.
8.10.4 Track kind
8.10.4.1 Definition
Box Type: ‘kind’
Container: User data box (‘udta’) in a track
Mandatory: No
Quantity: Zero or more
The Kind box labels a track with its role or kind.
It contains a URI, possibly followed by a value. If only a URI occurs, then the kind is defined by that URI; if a value follows, then the naming scheme for the value is identified by the URI. Both the URI and the value are null-terminated C strings.
More than one of these may occur in a track, with different contents but with appropriate semantics (e.g. two schemes that both define a kind that indicates sub-titles).
8.10.4.2 Syntax
aligned(8) class KindBox
   extends FullBox(‘kind’, version = 0, 0) {
   string schemeURI;
   string value;
}
8.10.4.3 Semantics
schemeURI is a NULL-terminated C string declaring either the identifier of the kind, if no value follows, or the identifier of the naming scheme for the following value.
value is a name from the declared scheme.
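Reading the two null-terminated strings can be sketched in Python as below. This is informative only: the function name is hypothetical, the strings are assumed to be UTF-8, and an empty value is taken to mean that only a URI occurs. The URIs in the usage note are made-up examples.

```python
def parse_kind_payload(payload: bytes):
    """Return (schemeURI, value) from the bytes after the FullBox header."""
    uri_raw, terminator, rest = payload.partition(b"\x00")
    if not terminator:
        raise ValueError("schemeURI is not null-terminated")
    # value runs up to its own terminator; it may be empty or absent.
    value_raw = rest.split(b"\x00", 1)[0]
    return uri_raw.decode("utf-8"), value_raw.decode("utf-8")
```

For instance, the payload b"urn:example:roles\x00subtitle\x00" would yield the scheme "urn:example:roles" and the value "subtitle".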
8.11 Metadata Support
A common base structure is used to contain general metadata, called the meta box.
8.11.1 The Meta box
8.11.1.1 Definition
Box Type: ‘meta’
Container: File, Movie Box (‘moov’), Track Box (‘trak’), Additional Metadata Container Box (‘meco’), Movie Fragment Box (‘moof’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or one (in File, ‘moov’, and ‘trak’), One or more (in ‘meco’)
A meta box contains descriptive or annotative metadata. The ‘meta’ box is required to contain a ‘hdlr’ box indicating the structure or format of the ‘meta’ box contents. That metadata is located either within a box within this box (e.g. an XML box), or is located by the item identified by a primary item box.
All other contained boxes are specific to the format specified by the handler box.
The other boxes defined here may be defined as optional or mandatory for a given format. If they are used, then they must take the form specified here. These optional boxes include a data-information box, which documents other files in which metadata values (e.g. pictures) are placed, and an item location box, which documents where in those files each item is located (e.g. in the common case of multiple pictures stored in the same file). At most one meta box may occur at each of the file level, movie level, or track level, unless they are contained in an additional metadata container box (‘meco’).
If an Item Protection Box occurs, then some or all of the meta-data, including possibly the primary resource, may have been protected and be un-readable unless the protection system is taken into account.
The handling of meta-data in movie fragments is described in 8.8.17.
8.11.1.2 Syntax
aligned(8) class MetaBox (handler_type)
   extends FullBox(‘meta’, version = 0, 0) {
   HandlerBox(handler_type) theHandler;
   PrimaryItemBox primary_resource;      // optional
   DataInformationBox file_locations;    // optional
   ItemLocationBox item_locations;       // optional
   ItemProtectionBox protections;        // optional
   ItemInfoBox item_infos;               // optional
   IPMPControlBox IPMP_control;          // optional
   ItemReferenceBox item_refs;           // optional
   ItemDataBox item_data;                // optional
   Box other_boxes[];                    // optional
}
8.11.1.3 Semantics
The structure or format of the metadata is declared by the handler. In the case that the primary data is identified by a primary item, and that primary item has an item information entry with an item_type, the handler type may be the same as the item_type.
8.11.2 XML Boxes
8.11.2.1 Definition
Box Type: ‘xml ’ or ‘bxml’
Container: Meta box (‘meta’)
Mandatory: No
Quantity: Zero or one
When the primary data is in XML format and it is desired that the XML be stored directly in the meta-box, one of these forms may be used. The Binary XML Box may only be used when there is a single well-defined binarization of the XML for that defined format as identified by the handler.
Within an XML box the data is in UTF-8 format unless the data starts with a byte-order-mark (BOM), which indicates that the data is in UTF-16 format.
8.11.2.2 Syntax
aligned(8) class XMLBox
   extends FullBox(‘xml ’, version = 0, 0) {
   string xml;
}

aligned(8) class BinaryXMLBox
   extends FullBox(‘bxml’, version = 0, 0) {
   unsigned int(8) data[];   // to end of box
}
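The UTF-8 versus UTF-16 rule for the XML box can be illustrated with the informative Python sketch below. The function name is hypothetical, and stripping a trailing null terminator is an assumption of the sketch based on the string type used in the syntax.

```python
import codecs


def decode_xml_payload(payload: bytes) -> str:
    """Decode the xml string of an XML box (bytes after the FullBox header)."""
    if payload.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        # Python's "utf-16" codec uses the byte-order mark to pick the
        # byte order and removes the mark from the decoded text.
        text = payload.decode("utf-16")
    else:
        text = payload.decode("utf-8")
    return text.rstrip("\x00")  # drop the null terminator, if present
```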
8.11.3 The Item Location Box
8.11.3.1 Definition
Box Type: ‘iloc’
Container: Meta box (‘meta’)
Mandatory: No
Quantity: Zero or one
The item location box provides a directory of resources in this or other files, by locating their container, their offset within that container, and their length. Placing this in binary format enables common handling of this data, even by systems which do not understand the particular metadata system (handler) used. For example, a system might integrate all the externally referenced metadata resources into one place, re-adjusting offsets and references accordingly.
The box starts with three or four values, specifying the size in bytes of the offset field, length field, base_offset field, and, in versions 1 and 2 of this box, the extent_index fields, respectively. These values must be from the set {0, 4, 8}.
The construction_method field indicates the ‘construction method’ for the item:
i) file_offset: by the usual absolute file offsets into the file at data_reference_index; (construction_method == 0)
ii) idat_offset: by box offsets into the idat box in the same meta box; neither the data_reference_index nor extent_index fields are used; (construction_method == 1)
iii) item_offset: by item offset into the items indicated by the extent_index field, which is only used (currently) by this construction method. (construction_method == 2).
The extent_index is only used for the method item_offset; it indicates the 1-based index of the item reference with referenceType ‘iloc’ linked from this item. If index_size is 0, then the value 1 is implied; the value 0 is reserved.
Items may be stored fragmented into extents, e.g. to enable interleaving. An extent is a contiguous subset of the bytes of the resource; the resource is formed by concatenating the extents. If only one extent is used (extent_count = 1) then either or both of the offset and length may be implied:
If the offset is not identified (the field has a length of zero), then the beginning of the source (offset 0) is implied.
If the length is not specified, or specified as zero, then the entire length of the source is implied. References into the same file as this metadata, or items divided into more than one extent, should have an explicit offset and length, or use a MIME type requiring a different interpretation of the file, to avoid infinite recursion.
The size of the item is the sum of the extent lengths.
NOTE Extents may be interleaved with the chunks defined by the sample tables of tracks.
The offsets are relative to a data origin. That origin is determined as follows:
1) when the Meta box is in a Movie Fragment, and the construction_method specifies a file offset, and the data reference indicates ‘same file’, the data origin is the first byte of the enclosing Movie Fragment Box (as for the default-base-is-moof flag in the Track Fragment Header);
2) in all other cases when the construction_method specifies a file offset, the data origin is the beginning of the file identified by the data reference;
3) when the construction_method specifies offsets into the Item Data box, the data origin is the beginning of data[] in the Item Data box;
4) when the data reference specifies another item, the data origin is the first byte of the concatenated data (of all the extents) of that item;
NOTE There are offset calculations in other parts of this file format based on the beginning of a box header; in contrast, item data offsets are calculated relative to the box contents.
The data-reference index may take the value 0, indicating a reference into the same file as this metadata, or an index into the data-reference table.
Some referenced data may itself use offset/length techniques to address resources within it (e.g. an MP4 file might be ‘included’ in this way). Normally such offsets in the item itself are relative to the beginning of the containing file. The field ‘base offset’ provides an additional offset for offset calculations within that contained data. For example, if an MP4 file is included within a file formatted to this specification, then normally data-offsets within that MP4 section are relative to the beginning of file; the base offset adds to those offsets.
If an item is constructed from other items, and those source items are protected, the offset and length information apply to the source items after they have been de-protected. That is, the target item data is formed from unprotected source data.
For maximum compatibility, version 0 of this box should be used in preference to version 1 with construction_method==0, or version 2 when possible. Similarly, version 2 of this box should only be used when support for large item_ID values (exceeding 65535) is required or expected to be required.
8.11.3.2 Syntax
aligned(8) class ItemLocationBox extends FullBox(‘iloc’, version, 0) {
   unsigned int(4) offset_size;
   unsigned int(4) length_size;
   unsigned int(4) base_offset_size;
   if ((version == 1) || (version == 2)) {
      unsigned int(4) index_size;
   } else {
      unsigned int(4) reserved;
   }
   if (version < 2) {
      unsigned int(16) item_count;
   } else if (version == 2) {
      unsigned int(32) item_count;
   }
   for (i=0; i<item_count; i++) {
      if (version < 2) {
         unsigned int(16) item_ID;
      } else if (version == 2) {
         unsigned int(32) item_ID;
      }
      if ((version == 1) || (version == 2)) {
         unsigned int(12) reserved = 0;
         unsigned int(4) construction_method;
      }
      unsigned int(16) data_reference_index;
      unsigned int(base_offset_size*8) base_offset;
      unsigned int(16) extent_count;
      for (j=0; j<extent_count; j++) {
         if (((version == 1) || (version == 2)) && (index_size > 0)) {
            unsigned int(index_size*8) extent_index;
         }
         unsigned int(offset_size*8) extent_offset;
         unsigned int(length_size*8) extent_length;
      }
   }
}
8.11.3.3 Semantics
offset_size is taken from the set {0, 4, 8} and indicates the length in bytes of the offset field.
length_size is taken from the set {0, 4, 8} and indicates the length in bytes of the length field.
base_offset_size is taken from the set {0, 4, 8} and indicates the length in bytes of the base_offset field.
index_size is taken from the set {0, 4, 8} and indicates the length in bytes of the extent_index field.
item_count counts the number of resources in the following array.
item_ID is an arbitrary integer ‘name’ for this resource which can be used to refer to it (e.g. in a URL).
construction_method is taken from the set 0 (file), 1 (idat) or 2 (item).
data_reference_index is either zero (‘this file’) or a 1-based index into the data references in the data information box.
base_offset provides a base value for offset calculations within the referenced data. If base_offset_size is 0, base_offset takes the value 0, i.e. it is unused.
extent_count provides the count of the number of extents into which the resource is fragmented; it must have the value 1 or greater.
extent_index provides an index as defined for the construction method.
extent_offset provides the absolute offset, in bytes from the data origin of the container, of this extent data. If offset_size is 0, extent_offset takes the value 0.
extent_length provides the absolute length in bytes of this metadata item extent. If length_size is 0, extent_length takes the value 0. If the value is 0, then the length of the extent is the length of the entire referenced container.
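The following Python sketch is informative only. It illustrates one way a reader might walk the ItemLocationBox syntax above, given the box body (the bytes that follow the size and type fields). The function names read_uint and parse_iloc and the dictionary keys are hypothetical and are not defined by this specification; error handling for truncated input is omitted.

```python
def read_uint(buf, pos, size):
    """Read a big-endian unsigned integer of `size` bytes (0 bytes reads as 0)."""
    return int.from_bytes(buf[pos:pos + size], "big"), pos + size


def parse_iloc(payload):
    """Parse an 'iloc' FullBox body, starting at the version byte."""
    version = payload[0]
    pos = 4  # skip version (1 byte) and flags (3 bytes)
    offset_size, length_size = payload[pos] >> 4, payload[pos] & 0x0F
    base_offset_size, index_size = payload[pos + 1] >> 4, payload[pos + 1] & 0x0F
    pos += 2
    if version not in (1, 2):
        index_size = 0  # this nibble is reserved in version 0
    id_bytes = 2 if version < 2 else 4  # width of item_count and item_ID
    item_count, pos = read_uint(payload, pos, id_bytes)
    items = []
    for _ in range(item_count):
        item_id, pos = read_uint(payload, pos, id_bytes)
        construction_method = 0
        if version in (1, 2):
            word, pos = read_uint(payload, pos, 2)
            construction_method = word & 0x0F  # upper 12 bits are reserved
        dref_index, pos = read_uint(payload, pos, 2)
        base_offset, pos = read_uint(payload, pos, base_offset_size)
        extent_count, pos = read_uint(payload, pos, 2)
        extents = []
        for _ in range(extent_count):
            extent_index = None
            if index_size > 0:
                extent_index, pos = read_uint(payload, pos, index_size)
            extent_offset, pos = read_uint(payload, pos, offset_size)
            extent_length, pos = read_uint(payload, pos, length_size)
            extents.append({"index": extent_index,
                            "offset": extent_offset,
                            "length": extent_length})
        items.append({"item_ID": item_id,
                      "construction_method": construction_method,
                      "data_reference_index": dref_index,
                      "base_offset": base_offset,
                      "extents": extents})
    return items
```

A zero-width field reads as 0 in this sketch, which matches the rule that base_offset, extent_offset and extent_length take the value 0 when their size field is 0.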
8.11.4 Primary Item Box
8.11.4.1 Definition
Box Type: ‘pitm’
Container: Meta box (‘meta’)
Mandatory: No
Quantity: Zero or one
For a given handler, the primary data may be one of the referenced items when it is desired that it be stored elsewhere, or divided into extents; or the primary metadata may be contained in the meta-box (e.g. in an XML box). Either this box must occur, or there must be a box within the meta-box (e.g. an XML box) containing the primary information in the format required by the identified handler.
8.11.4.2 Syntax
aligned(8) class PrimaryItemBox extends FullBox(‘pitm’, version, 0) {
   if (version == 0) {
      unsigned int(16) item_ID;
   } else {
      unsigned int(32) item_ID;
   }
}
8.11.4.3 Semantics
item_ID is the identifier of the primary item. Version 1 should only be used when large item_ID values (exceeding 65535) are required or expected to be required.
8.11.5 Item Protection Box
8.11.5.1 Definition
Box Type: ‘ipro’
Container: Meta box (‘meta’)
Mandatory: No
Quantity: Zero or one
The item protection box provides an array of item protection information, for use by the Item Information Box.
8.11.5.2 Syntax
aligned(8) class ItemProtectionBox extends FullBox(‘ipro’, version = 0, 0) {
   unsigned int(16) protection_count;
   for (i=1; i<=protection_count; i++) {
      ProtectionSchemeInfoBox protection_information;
   }
}
8.11.6 Item Information Box
8.11.6.1 Definition
Box Type: ‘iinf’
Container: Meta Box (‘meta’)
Mandatory: No
Quantity: Zero or one
The Item information box provides extra information about selected items, including symbolic (‘file’) names. It may optionally occur, but if it does, it must be interpreted, as item protection or content encoding may have changed the format of the data in the item. If both content encoding and protection are indicated for an item, a reader should first un-protect the item, and then decode the item’s content encoding. If more control is needed, an IPMP sequence code may be used.
This box contains an array of entries, and each entry is formatted as a box. This array is sorted by increasing item_ID in the entry records.
Four versions of the item info entry are defined. Version 1 includes additional information to version 0 as specified by an extension type. For instance, it shall be used with extension type 'fdel' for items that are referenced by the file partition box ('fpar'), which is defined for source file partitionings and applies to file delivery transmissions. Versions 2 and 3 provide an alternative structure in which metadata item types are indicated by a 32-bit (typically 4-character) registered or defined code; two of these codes are defined to indicate a MIME type or metadata typed by a URI. Version 2 supports 16-bit item_ID values, whereas version 3 supports 32-bit item_ID values.
If no extension is desired, the box may terminate without the extension_type field and the extension; if, in addition, content_encoding is not desired, that field also may be absent and the box terminate before it. If an extension is desired without an explicit content_encoding, a single null byte, signifying the empty string, must be supplied for the content_encoding, before the indication of extension_type.
If file delivery item information is needed and a version 2 or 3 ItemInfoEntry is used, then the file delivery information is stored as a separate item of type ‘fdel’ that is also linked by an item reference from the item, to the file delivery information, of type ‘fdel’. There must be exactly one such reference if file delivery information is needed.
It is possible that there are valid URI forms for MPEG-7 metadata (e.g. a schema URI with a fragment identifying a particular element), and it may be possible that these structures could be used for MPEG-7. However, there is explicit support for MPEG-7 in ISO base media file format family files, and this explicit support is preferred as it allows, among other things:
a) incremental update of the metadata (logically, I/P coding, in video terms) whereas this draft is ‘I-frame only’;
b) binarization and thus compaction;
c) the use of multiple schemas.
Therefore, the use of these structures for MPEG-7 is deprecated (and undocumented).
Information on URI forms for some metadata systems can be found in Annex G.
Version 1 of ItemInfoBox should only be used when support for a large number of itemInfoEntries (exceeding 65535) is required or expected to be required.
8.11.6.2 Syntax
aligned(8) class ItemInfoExtension(unsigned int(32) extension_type) {
}
aligned(8) class FDItemInfoExtension() extends ItemInfoExtension (‘fdel’) {
   string content_location;
   string content_MD5;
   unsigned int(64) content_length;
   unsigned int(64) transfer_length;
   unsigned int(8) entry_count;
   for (i=1; i <= entry_count; i++)
      unsigned int(32) group_id;
}
aligned(8) class ItemInfoEntry extends FullBox(‘infe’, version, 0) {
   if ((version == 0) || (version == 1)) {
      unsigned int(16) item_ID;
      unsigned int(16) item_protection_index;
      string item_name;
      string content_type;
      string content_encoding; //optional
   }
   if (version == 1) {
      unsigned int(32) extension_type; //optional
      ItemInfoExtension(extension_type); //optional
   }
   if (version >= 2) {
      if (version == 2) {
         unsigned int(16) item_ID;
      } else if (version == 3) {
         unsigned int(32) item_ID;
      }
      unsigned int(16) item_protection_index;
      unsigned int(32) item_type;
      string item_name;
      if (item_type == ‘mime’) {
         string content_type;
         string content_encoding; //optional
      } else if (item_type == ‘uri ’) {
         string item_uri_type;
      }
   }
}
aligned(8) class ItemInfoBox extends FullBox(‘iinf’, version, 0) {
   if (version == 0) {
      unsigned int(16) entry_count;
   } else {
      unsigned int(32) entry_count;
   }
   ItemInfoEntry[ entry_count ] item_infos;
}
8.11.6.3 Semantics
item_ID contains either 0 for the primary resource (e.g., the XML contained in an ‘xml ’ box) or the ID of the item for which the following information is defined.
item_protection_index contains either 0 for an unprotected item, or the one-based index into the item protection box defining the protection applied to this item (the first box in the item protection box has the index 1).
item_name is a null-terminated string in UTF-8 characters containing a symbolic name of the item (source file for file delivery transmissions).
item_type is a 32-bit value, typically 4 printable characters, that is a defined valid item type indicator, such as ‘mime’.
content_type is a null-terminated string in UTF-8 characters with the MIME type of the item. If the item is content encoded (see below), then the content type refers to the item after content decoding.
item_uri_type is a string that is an absolute URI, that is used as a type indicator.
content_encoding is an optional null-terminated string in UTF-8 characters used to indicate that the binary file is encoded and needs to be decoded before it is interpreted. The values are as defined for Content-Encoding for HTTP/1.1. Some possible values are “gzip”, “compress” and “deflate”. An empty string indicates no content encoding. Note that the item is stored after the content encoding has been applied.
extension_type is a printable four-character code that identifies the extension fields of version 1 with respect to version 0 of the Item information entry.
content_location is a null-terminated string in UTF-8 characters containing the URI of the file as defined in HTTP/1.1 (RFC 2616).
content_MD5 is a null-terminated string in UTF-8 characters containing an MD5 digest of the file. See HTTP/1.1 (RFC 2616) and RFC 1864.
content_length gives the total length (in bytes) of the (un-encoded) file.
transfer_length gives the total length (in bytes) of the (encoded) file. Note that transfer length is equal to content length if no content encoding is applied (see above).
entry_count provides a count of the number of entries in the following array.
group_ID indicates a file group to which the file item (source file) belongs. See 3GPP TS 26.346 for more details on file groups.
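The following Python sketch is informative only. It shows how the version 2 and version 3 layout of ItemInfoEntry might be read from the box body, including the optional content_encoding string, which is treated here as present only when bytes remain in the box. The names read_cstring and parse_infe_v2_v3 and the dictionary keys are hypothetical.

```python
def read_cstring(buf, pos):
    """Read a null-terminated UTF-8 string; return it and the next position."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def parse_infe_v2_v3(payload):
    """Parse an 'infe' FullBox body (version 2 or 3), starting at the version byte."""
    version = payload[0]
    if version not in (2, 3):
        raise ValueError("this sketch handles versions 2 and 3 only")
    pos = 4  # skip version and flags
    id_bytes = 2 if version == 2 else 4
    entry = {"item_ID": int.from_bytes(payload[pos:pos + id_bytes], "big")}
    pos += id_bytes
    entry["item_protection_index"] = int.from_bytes(payload[pos:pos + 2], "big")
    pos += 2
    entry["item_type"] = payload[pos:pos + 4].decode("latin-1")
    pos += 4
    entry["item_name"], pos = read_cstring(payload, pos)
    if entry["item_type"] == "mime":
        entry["content_type"], pos = read_cstring(payload, pos)
        if pos < len(payload):  # content_encoding is optional
            entry["content_encoding"], pos = read_cstring(payload, pos)
    elif entry["item_type"] == "uri ":
        entry["item_uri_type"], pos = read_cstring(payload, pos)
    return entry
```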
8.11.7 Additional Metadata Container Box
8.11.7.1 Definition
Box Type: ‘meco’
Container: File, Movie Box (‘moov’), or Track Box (‘trak’)
Mandatory: No
Quantity: Zero or one
The additional metadata container box includes one or more meta boxes. It can be carried at the top level of the file, in the Movie Box (‘moov’), or in the Track Box (‘trak’) and shall only be present if it is accompanied by a meta box in the same container. A meta box that is not contained in the additional metadata container box is the preferred (primary) meta box. Meta boxes in the additional metadata container box complement or give alternative metadata information. The usage of multiple meta boxes may be desirable when, e.g., a single handler is not capable of processing all metadata. All meta boxes at a certain level, including the preferred one and those contained in the additional metadata container box, must have different handler types.
A meta box contained in an additional metadata container box shall contain a primary item box or the primary data box required by the handler (e.g., an XML Box). It shall not include boxes or syntax elements concerning items other than the primary item indicated by the present primary item box or XML box. URLs in a meta box contained in an additional metadata container box are relative to the context of the preferred meta box.
8.11.7.2 Syntax
aligned(8) class AdditionalMetadataContainerBox extends Box('meco') { }
8.11.8 Metabox Relation Box
8.11.8.1 Definition
Box Type: ‘mere’
Container: Additional Metadata Container Box (‘meco’)
Mandatory: No
Quantity: Zero or more
The meta box relation box indicates a relation between two meta boxes at the same level, i.e., the top level of the file, the Movie Box, or Track Box. The relation between two meta boxes is unspecified if there is no meta box relation box for those meta boxes. Meta boxes are referenced by specifying their handler types.
8.11.8.2 Syntax
aligned(8) class MetaboxRelationBox extends FullBox('mere', version=0, 0) {
   unsigned int(32) first_metabox_handler_type;
   unsigned int(32) second_metabox_handler_type;
   unsigned int(8) metabox_relation;
}
8.11.8.3 Semantics
first_metabox_handler_type indicates the first meta box to be related.
second_metabox_handler_type indicates the second meta box to be related.
metabox_relation indicates the relation between the two meta boxes. The following values are defined:
1 The relationship between the boxes is unknown (which is the default when this box is not present);
2 the two boxes are semantically un-related (e.g., one is presentation, the other annotation);
3 the two boxes are semantically related but complementary (e.g., two disjoint sets of meta-data expressed in two different meta-data systems);
4 the two boxes are semantically related but overlap (e.g., two sets of meta-data neither of which is a subset of the other); neither is ‘preferred’ to the other;
5 the two boxes are semantically related but the second is a proper subset or weaker version of the first; the first is preferred;
6 the two boxes are semantically related and equivalent (e.g., two essentially identical sets of meta-data expressed in two different meta-data systems).
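The following Python sketch is informative only. It maps the six metabox_relation values listed above onto an enumeration and reads the three fields of a MetaboxRelationBox body. The enumeration member names and the function parse_mere are hypothetical labels chosen for this illustration.

```python
from enum import IntEnum


class MetaboxRelation(IntEnum):
    """Values of metabox_relation; member names are illustrative only."""
    UNKNOWN = 1
    UNRELATED = 2
    COMPLEMENTARY = 3
    OVERLAPPING = 4
    SECOND_IS_SUBSET = 5
    EQUIVALENT = 6


def parse_mere(payload):
    """Parse a 'mere' FullBox body, starting at the version byte."""
    first = payload[4:8].decode("latin-1")    # first_metabox_handler_type
    second = payload[8:12].decode("latin-1")  # second_metabox_handler_type
    relation = MetaboxRelation(payload[12])   # raises ValueError if undefined
    return first, second, relation
```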
8.11.9 URL Forms for meta boxes
When a meta-box is used, then URLs may be used to refer to items in the meta-box, either using an absolute URL, or using a relative URL. Absolute URLs may only be used to refer to items in a file-level meta box.
When interpreting data that is in the context of a meta-box (i.e. the file for a file-level meta-box, the presentation for a movie-level meta-box, or the track for a track-level meta-box), the items in the meta-box are treated as shadowing files in the same location as that from which the container file came. This shadowing means that a reference to another file in the same location as the container file may be resolved to an item within the container file itself. Items can be addressed within the container file by appending a fragment to the URL for the container file itself. That fragment starts with the “#” character and consists of either:
a) item_ID=<n>, identifying the item by its ID (the ID may be 0 for the primary resource);
b) item_name=<item_name>, when the item information box is used.
If a fragment within the contained item must be addressed, then the initial “#” character of that fragment is replaced by “*”.
Consider the following example: <http://a.com/d/v.qrv#item_name=tree.html*branch1>. We assume that v.qrv is a file with a meta-box at the file level. First, the client strips the fragment and fetches v.qrv from a.com using HTTP. It then inspects the top-level meta box and adds the items in it, logically, to its cache of the directory “d” on a.com. It then re-forms the URL as <http://a.com/d/tree.html#branch1>. Note that the fragment has been elevated to a full file name, and the first “*” has been transformed back into a “#”. The client then either finds an item named tree.html in the meta box, or fetches tree.html from a.com, and it then finds the anchor “branch1” within tree.html. If within that html, a file was referenced using a relative URL, e.g. “flower.gif”, then the client converts this to an absolute URL using the normal rules: <http://a.com/d/flower.gif> and again it checks to see if flower.gif is a named item (and hence shadowing a separate file of this name), and then if it is not, fetches flower.gif from a.com.
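The following Python sketch is informative only. It reproduces the URL re-forming step of the example above using the standard urllib.parse module: the fragment is stripped, split at the first “*”, and, for an item_name selector, the item name is resolved against the container URL. The function name resolve_item_url and the returned dictionary keys are hypothetical; the shadowing lookup in the meta box and the HTTP fetch are outside the scope of the sketch.

```python
from urllib.parse import urldefrag, urljoin


def resolve_item_url(url):
    """Split a container URL carrying an item fragment and re-form the item URL."""
    container_url, fragment = urldefrag(url)
    # The first "*" separates the item selector from a fragment inside the item.
    item_part, _, inner_fragment = fragment.partition("*")
    key, _, value = item_part.partition("=")
    if key not in ("item_ID", "item_name") or not value:
        raise ValueError("fragment does not address an item")
    rewritten = None
    if key == "item_name":
        # The item name is elevated to a file name beside the container file.
        rewritten = urljoin(container_url, value)
        if inner_fragment:
            rewritten += "#" + inner_fragment
    return {"container": container_url, "key": key, "value": value,
            "inner_fragment": inner_fragment, "rewritten": rewritten}
```

Relative references found inside the item, such as “flower.gif”, can be made absolute with the same urljoin call against the re-formed URL before the reader checks whether a named item shadows them.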
8.11.10 Static Metadata
This section defines the storage of static (un-timed) metadata in the ISO file format family.
Reader support for metadata in general is optional, and therefore it is also optional for the formats defined here or elsewhere, unless made mandatory by a derived specification.
8.11.10.1 Simple textual
There is existing support for simple textual tags in the form of the user-data boxes; currently only one is defined: the copyright notice. Other metadata is permitted using this simple form if:
a) it uses a registered box-type or it uses the UUID escape (the latter is permitted today);
b) it uses a registered tag, the equivalent MPEG-7 construct must be documented as part of the registration.
8.11.10.2 Other forms
When other forms of metadata are desired, then a ‘meta’ box as defined above may be included at the appropriate level of the document. If the document is intended to be primarily a metadata document per se, then the meta box is at file level. If the metadata annotates an entire presentation, then the meta box is at the movie level; an entire stream, at the track level.
8.11.10.3 MPEG-7 metadata
MPEG-7 metadata is stored in meta boxes conforming to this specification.
1) The handler-type is ‘mp7t’ for textual metadata in Unicode format;
2) The handler-type is ‘mp7b’ for binary metadata compressed in the BIM format. In this case, the binary XML box contains the configuration information immediately followed by the binarized XML.
3) When the format is textual, there is either another box in the metadata container ‘meta’, called ‘xml ’, which contains the textual MPEG-7 document, or there is a primary item box identifying the item containing the MPEG-7 XML.
4) When the format is binary, there is either another box in the metadata container ‘meta’, called ‘bxml’, which contains the binary MPEG-7 document, or a primary item box identifying the item containing the MPEG-7 binarized XML.
5) If an MPEG-7 box is used at the file level, then the brand ‘mp71’ should be a member of the compatible-brands list in the file-type box.
8.11.11 Item Data Box
8.11.11.1 Definition
Box Type: ‘idat’
Container: Metadata box (‘meta’)
Mandatory: No
Quantity: Zero or one
This box contains the data of metadata items that use the construction method indicating that an item’s data extents are stored within this box.
8.11.11.2 Syntax
aligned(8) class ItemDataBox extends Box(‘idat’) {
   bit(8) data[];
}
8.11.11.3 Semantics
data is the contained metadata.
8.11.12 Item Reference Box
8.11.12.1 Definition
Box Type: ‘iref’
Container: Metadata box (‘meta’)
Mandatory: No
Quantity: Zero or one
The item reference box allows the linking of one item to others via typed references. All the references for one item of a specific type are collected into a single item type reference box, whose type is the reference type, and which has a ‘from item ID’ field indicating which item is linked. The items linked to are then represented by an array of ‘to item ID’s. All these single item type reference boxes are then collected into the item reference box. The reference types defined for the track reference box defined in 8.3.3 may be used here if appropriate, or other registered reference types. Version 1 of ItemReferenceBox with SingleItemTypeReferenceBoxLarge should only be used when large from_item_ID or to_item_ID values (exceeding 65535) are required or expected to be required.
NOTE: This design makes it fairly easy to find all the references of a specific type, or from a specific item.
An item reference of type ‘font’ may be used to indicate that an item uses fonts carried/defined in the referenced item.
8.11.12.2 Syntax
aligned(8) class SingleItemTypeReferenceBox(referenceType) extends Box(referenceType) {
   unsigned int(16) from_item_ID;
   unsigned int(16) reference_count;
   for (j=0; j<reference_count; j++) {
      unsigned int(16) to_item_ID;
   }
}
aligned(8) class SingleItemTypeReferenceBoxLarge(referenceType) extends Box(referenceType) {
   unsigned int(32) from_item_ID;
   unsigned int(16) reference_count;
   for (j=0; j<reference_count; j++) {
      unsigned int(32) to_item_ID;
   }
}
aligned(8) class ItemReferenceBox extends FullBox(‘iref’, version, 0) {
   if (version==0) {
      SingleItemTypeReferenceBox references[];
   } else if (version==1) {
      SingleItemTypeReferenceBoxLarge references[];
   }
}
8.11.12.3 Semantics
reference_type contains an indication of the type of the reference.
from_item_ID contains the ID of the item that refers to other items.
reference_count is the number of references.
to_item_ID contains the ID of the item referred to.
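The following Python sketch is informative only. It iterates over the single item type reference boxes inside an ItemReferenceBox body, using 16-bit item IDs for version 0 and 32-bit item IDs for version 1. The function name parse_iref is hypothetical, and the sketch assumes every contained box uses the compact 32-bit size field (it rejects sizes below 8 rather than handling them).

```python
def parse_iref(payload):
    """Parse an 'iref' FullBox body; return (reference_type, from_ID, [to_IDs])."""
    version = payload[0]
    id_bytes = 2 if version == 0 else 4
    pos, references = 4, []  # skip version and flags
    while pos < len(payload):
        size = int.from_bytes(payload[pos:pos + 4], "big")
        if size < 8:
            raise ValueError("unsupported or corrupt box size")
        reference_type = payload[pos + 4:pos + 8].decode("latin-1")
        p = pos + 8
        from_item_id = int.from_bytes(payload[p:p + id_bytes], "big")
        p += id_bytes
        reference_count = int.from_bytes(payload[p:p + 2], "big")
        p += 2
        to_item_ids = [
            int.from_bytes(payload[p + i * id_bytes:p + (i + 1) * id_bytes], "big")
            for i in range(reference_count)
        ]
        references.append((reference_type, from_item_id, to_item_ids))
        pos += size  # advance to the next contained box
    return references
```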
8.11.13 Auxiliary video metadata
An auxiliary video track used for depth or parallax information may carry a meta-data item of type ‘auvd’ (auxiliary video descriptor); the data of that item is exactly one si_rbsp() as specified in ISO/IEC 23002-3. (Note that si_rbsp() is externally framed, and the length is supplied by the item location information in the file format). There may be more than one of these meta-data items (e.g. one for parallax info and one for depth, in the case that the same stream serves).
8.12 Support for Protected Streams
This section documents the file-format transformations which are used for protected content. These transformations can be used under several circumstances:
They must be used when the content has been transformed (e.g. by encryption) in such a way that it can no longer be decoded by the normal decoder;
They may be used when the content should only be decoded when the protection system is understood and implemented.
The transformation functions by encapsulating the original media declarations. The encapsulation changes the four-character-code of the sample entries, so that protection-unaware readers see the media stream as a new stream format.
Because the format of a sample entry varies with media-type, a different encapsulating four-character-code is used for each media type (audio, video, text etc.). They are:
Stream (Track) Type Sample-Entry Code
Video encv
Audio enca
Text enct
System encs
The transformation of the sample description is described by the following procedure:
1) The four-character-code of the sample description is replaced with a four-character-code indicating protection encapsulation: these codes vary only by media-type. For example, ‘mp4v’ is replaced with ‘encv’ and ‘mp4a’ is replaced with ‘enca’.
2) A ProtectionSchemeInfoBox (defined below) is added to the sample description, leaving all other boxes unmodified.
3) The original sample entry type (four-character-code) is stored within the ProtectionSchemeInfoBox, in a new box called the OriginalFormatBox (defined below);
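The following Python sketch is informative only and illustrates the three steps above on a deliberately simplified in-memory model. A sample entry is represented as a dictionary with a ‘format’ four-character-code and a list of serialized child boxes; this model, the function names make_box and protect_sample_entry, and the media-type keys are assumptions made for the illustration. The ‘sinf’ box built here contains only the ‘frma’ box; scheme type and scheme information boxes, when used, would be appended to it.

```python
import struct

# Encapsulating sample-entry codes from the table above.
ENCAPSULATION_CODES = {"video": "encv", "audio": "enca",
                       "text": "enct", "system": "encs"}


def make_box(box_type, payload):
    """Serialize a plain Box: 32-bit size, four-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def protect_sample_entry(entry, media_type):
    """Return a protected copy of a (hypothetical) sample-entry dictionary."""
    # Step 3: remember the original four-character-code in an OriginalFormatBox.
    frma = make_box(b"frma", entry["format"].encode("ascii"))
    # Step 2: add a ProtectionSchemeInfoBox, leaving other boxes unmodified.
    sinf = make_box(b"sinf", frma)
    # Step 1: replace the code with the encapsulation code for the media type.
    return {"format": ENCAPSULATION_CODES[media_type],
            "boxes": list(entry["boxes"]) + [sinf]}
```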
There are then three methods for signalling the nature of the protection, which may be used individually or in combination.
1) When MPEG-4 systems is used, then IPMP must be used to signal that the streams are protected.
2) IPMP descriptors may also be used outside the MPEG-4 systems context using boxes containing IPMP descriptors.
3) The protection applied may also be described using the scheme type and information boxes.
When IPMP is used outside of MPEG-4 systems, then a ‘global’ IPMPControlBox may also occur within the ‘moov’ atom.
NOTE When MPEG-4 systems is used, an MPEG-4 systems terminal can effectively treat, for example, ‘encv’ with an Original Format of ‘mp4v’ exactly the same as ‘mp4v’, by using the IPMP descriptors.
8.12.1 Protection Scheme Information Box
8.12.1.1 Definition
Box Types: ‘sinf’
Container: Protected Sample Entry, or Item Protection Box (‘ipro’)
Mandatory: Yes
Quantity: One or more
The Protection Scheme Information Box contains all the information required both to understand the encryption transform applied and its parameters, and also to find other information such as the kind and location of the key management system. It also documents the original (unencrypted) format of the media. The Protection Scheme Information Box is a container Box. It is mandatory in a sample entry that uses a code indicating a protected stream.
When used in a protected sample entry, this box must contain the original format box to document the original format. At least one of the following signalling methods must be used to identify the protection applied:
a) MPEG-4 systems with IPMP: no other boxes, when IPMP descriptors in MPEG-4 systems streams are used;
b) Scheme signalling: a SchemeTypeBox and SchemeInformationBox, when these are used (either both must occur, or neither).
At least one protection scheme information box must occur in a protected sample entry. When more than one occurs, they are equivalent, alternative, descriptions of the same protection. Readers should choose one to process.
8.12.1.2 Syntax
aligned(8) class ProtectionSchemeInfoBox(fmt) extends Box('sinf') {
   OriginalFormatBox(fmt) original_format;
   SchemeTypeBox scheme_type_box; // optional
   SchemeInformationBox info; // optional
}
8.12.2 Original Format Box
8.12.2.1 Definition
Box Types: ‘frma’
Container: Protection Scheme Information Box (‘sinf’), Restricted Scheme Information Box (‘rinf’), or Complete Track Information Box (‘cinf’)
Mandatory: Yes when used in a protected sample entry, in a restricted sample entry, or in a sample entry for an incomplete track.
Quantity: Exactly one.
The Original Format Box ‘frma’ contains the four-character-code of the original un-transformed sample description:
8.12.2.2 Syntax
aligned(8) class OriginalFormatBox(codingname) extends Box ('frma') {
   unsigned int(32) data_format = codingname;
      // format of decrypted, encoded data (in case of protection)
      // or un-transformed sample entry (in case of restriction
      // and complete track information)
}
8.12.2.3 Semantics
data_format is the four-character-code of the original un-transformed sample entry (e.g. ‘mp4v’ if the stream contains protected or restricted MPEG-4 visual material).
8.12.3 IPMPInfoBox
(empty sub-clause)
8.12.4 IPMP Control Box
(empty sub-clause)
8.12.5 Scheme Type Box
8.12.5.1 Definition
Box Types: ‘schm’
Container: Protection Scheme Information Box (‘sinf’), Restricted Scheme Information Box (‘rinf’), or SRTP Process box (‘srpp’)
Mandatory: No
Quantity: Zero or one in ‘sinf’, depending on the protection structure; exactly one in ‘rinf’ and ‘srpp’
The Scheme Type Box (‘schm’) identifies the protection or restriction scheme.
8.12.5.2 Syntax
aligned(8) class SchemeTypeBox extends FullBox('schm', 0, flags) {
   unsigned int(32) scheme_type; // 4CC identifying the scheme
   unsigned int(32) scheme_version; // scheme version
   if (flags & 0x000001) {
      unsigned int(8) scheme_uri[]; // browser uri
   }
}
8.12.5.3 Semantics
scheme_type is the code defining the protection or restriction scheme.
scheme_version is the version of the scheme (used to create the content).
scheme_uri allows for the option of directing the user to a web-page if they do not have the scheme installed on their system. It is an absolute URI formed as a null-terminated string in UTF-8 characters.
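The following Python sketch is informative only. It reads the fields of a SchemeTypeBox body and returns the scheme URI only when the least significant flag bit is set, as in the syntax above. The function name parse_schm is hypothetical, and the scheme code ‘abcd’ used in the usage note is a placeholder, not a registered scheme.

```python
def parse_schm(payload):
    """Parse a 'schm' FullBox body, starting at the version byte."""
    flags = int.from_bytes(payload[1:4], "big")
    scheme_type = payload[4:8].decode("latin-1")
    scheme_version = int.from_bytes(payload[8:12], "big")
    scheme_uri = None
    if flags & 0x000001:
        # scheme_uri is a null-terminated UTF-8 string filling the rest of the box.
        end = payload.index(b"\x00", 12)
        scheme_uri = payload[12:end].decode("utf-8")
    return scheme_type, scheme_version, scheme_uri
```

For instance, a body with flags 0x000001, scheme_type ‘abcd’, scheme_version 0x00010000 and the string “http://example.com/” yields those three values; with flags 0 the third value is None.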
8.12.6 Scheme Information Box
8.12.6.1 Definition
Box Types: ‘schi’
Container: Protection Scheme Information Box (‘sinf’), Restricted Scheme Information Box (‘rinf’), or SRTP Process box (‘srpp’)
Mandatory: No
Quantity: Zero or one
The Scheme Information Box is a container Box that is only interpreted by the scheme being used. Any information the encryption or restriction system needs is stored here. The content of this box is a series of boxes whose type and format are defined by the scheme declared in the Scheme Type Box.
8.12.6.2 Syntax
aligned(8) class SchemeInformationBox extends Box('schi') {
   Box scheme_specific_data[];
}
8.13 File Delivery Format Support
8.13.1 Introduction
Files intended for transmission over ALC/LCT or FLUTE are stored as items in a top-level meta box (‘meta’). The item location box (‘iloc’) specifies the actual storage location of each item within the container file as well as the file size of each item. File name, content type (MIME type), etc., of each item are provided by version 1 of the item information box (‘iinf’).
Pre-computed FEC reservoirs are stored as additional items in the meta box. If a source file is split into several source blocks, FEC reservoirs for each source block are stored as separate items. The relationship between FEC reservoirs and original source items is recorded in the partition entry box ('paen') located in the FD item information box ('fiin').
Pre-composed File reservoirs are stored as additional items in the container file. If a source file is split into several source blocks, each source block is stored as a separate item called a File reservoir. The relationship between File reservoirs and original source items is recorded in the partition entry box ('paen') located in the FD item information box ('fiin').
See subclause 9.2 for more details on the usage of the file delivery format.
8.13.2 FD Item Information Box
8.13.2.1 Definition
Box Type: ‘fiin’
Container: Meta Box (‘meta’)
Mandatory: No
Quantity: Zero or one
The FD item information box is optional, although it is mandatory for files using FD hint tracks. It provides information on the partitioning of source files and how FD hint tracks are combined into FD sessions. Each partition entry provides details on a particular file partitioning, FEC encoding and associated File and FEC reservoirs. It is possible to provide multiple entries for one source file (identified by its item ID) if alternative FEC encoding schemes or partitionings are used in the file. All partition entries are implicitly numbered and the first entry has number 1.
8.13.2.2 Syntax
aligned(8) class PartitionEntry extends Box('paen') {
   FilePartitionBox blocks_and_symbols;
   FECReservoirBox FEC_symbol_locations; //optional
   FileReservoirBox File_symbol_locations; //optional
}
aligned(8) class FDItemInformationBox extends FullBox('fiin', version = 0, 0) {
   unsigned int(16) entry_count;
   PartitionEntry partition_entries[ entry_count ];
   FDSessionGroupBox session_info; //optional
   GroupIdToNameBox group_id_to_name; //optional
}
8.13.2.3 Semantics
entry_count provides a count of the number of entries in the following array.
The semantics of the boxes are described where the boxes are documented.
8.13.3 File Partition Box
8.13.3.1 Definition
Box Type: ‘fpar’
Container: Partition Entry (‘paen’)
Mandatory: Yes
Quantity: Exactly one
The File Partition box identifies the source file and provides a partitioning of that file into source blocks and symbols. Further information about the source file, e.g., filename, content location and group IDs, is contained in the Item Information box ('iinf'), where the Item Information entry corresponding to the item ID of the source file is of version 1 and includes a File Delivery Item Information Extension ('fdel'). Version 1 of FilePartitionBox should only be used when support for large item_ID or entry_count values (exceeding 65535) is required or expected to be required.
8.13.3.2 Syntax
aligned(8) class FilePartitionBox extends FullBox('fpar', version, 0) {
   if (version == 0) {
      unsigned int(16) item_ID;
   } else {
      unsigned int(32) item_ID;
   }
   unsigned int(16) packet_payload_size;
   unsigned int(8) reserved = 0;
   unsigned int(8) FEC_encoding_ID;
   unsigned int(16) FEC_instance_ID;
   unsigned int(16) max_source_block_length;
   unsigned int(16) encoding_symbol_length;
   unsigned int(16) max_number_of_encoding_symbols;
   string scheme_specific_info;
   if (version == 0) {
      unsigned int(16) entry_count;
   } else {
      unsigned int(32) entry_count;
   }
   for (i=1; i <= entry_count; i++) {
      unsigned int(16) block_count;
      unsigned int(32) block_size;
   }
}
8.13.3.3 Semantics
item_ID references the item in the item location box ('iloc') that the file partitioning applies to.
packet_payload_size gives the target ALC/LCT or FLUTE packet payload size of the partitioning algorithm. Note that UDP packet payloads are larger, as they also contain ALC/LCT or FLUTE headers.
FEC_encoding_ID identifies the FEC encoding scheme and is subject to IANA registration (see RFC 5052). Note that i) value zero corresponds to the "Compact No-Code FEC scheme" also known as "Null-FEC" (RFC 3695); ii) value one corresponds to the “MBMS FEC” (3GPP TS 26.346); iii) for values in the range of 0 to 127, inclusive, the FEC scheme is Fully-Specified, whereas for values in the range of 128 to 255, inclusive, the FEC scheme is Under-Specified.
FEC_instance_ID provides a more specific identification of the FEC encoder being used for an Under-Specified FEC scheme. This value should be set to zero for Fully-Specified FEC schemes and shall be ignored when parsing a file with FEC_encoding_ID in the range of 0 to 127, inclusive. FEC_instance_ID is scoped by the FEC_encoding_ID. See RFC 5052 for further details.
max_source_block_length gives the maximum number of source symbols per source block.
encoding_symbol_length gives the size (in bytes) of one encoding symbol. All encoding symbols of one item have the same length, except the last symbol which may be shorter.
max_number_of_encoding_symbols gives the maximum number of encoding symbols that can be generated for a source block for those FEC schemes in which the maximum number of encoding symbols is relevant, such as FEC encoding ID 129 defined in RFC 5052. For those FEC schemes in which the maximum number of encoding symbols is not relevant, the semantics of this field is unspecified.
scheme_specific_info is a base64-encoded null-terminated string of the scheme-specific object transfer information (FEC-OTI-Scheme-Specific-Info). The definition of the information depends on the FEC encoding ID.
entry_count gives the number of entries in the list of (block_count, block_size) pairs that provides a partitioning of the source file. Starting from the beginning of the file, each entry indicates how the next segment of the file is divided into source blocks and source symbols.
block_count indicates the number of consecutive source blocks of size block_size.
block_size indicates the size of a block (in bytes). A block_size that is not a multiple of the encoding_symbol_length symbol size indicates with Compact No-Code FEC that the last source symbol includes padding that is not stored in the item. With MBMS FEC (3GPP TS 26.346) the padding may extend across multiple symbols but the size of padding should never be more than encoding_symbol_length.
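The following Python sketch is informative only. It expands the list of (block_count, block_size) pairs into one record per source block, giving the byte offset of the block within the source file, its size, and the number of source symbols it occupies when the last symbol may be padded. It also classifies FEC_encoding_ID values by the ranges given above. The function names source_blocks and fec_scheme_class are hypothetical, and the symbol count is a simple ceiling division that does not model scheme-specific padding rules.

```python
def source_blocks(entries, encoding_symbol_length):
    """Expand (block_count, block_size) pairs into (offset, size, symbols) tuples."""
    blocks, offset = [], 0
    for block_count, block_size in entries:
        # Ceiling division: a partial last symbol still counts as one symbol.
        symbols = -(-block_size // encoding_symbol_length)
        for _ in range(block_count):
            blocks.append((offset, block_size, symbols))
            offset += block_size  # padding is not stored in the item
    return blocks


def fec_scheme_class(fec_encoding_id):
    """Classify an 8-bit FEC_encoding_ID by the ranges stated in the semantics."""
    if not 0 <= fec_encoding_id <= 255:
        raise ValueError("FEC_encoding_ID is an 8-bit value")
    return "Fully-Specified" if fec_encoding_id <= 127 else "Under-Specified"
```

The total number of tuples returned is the total number of source blocks, which is the value that the entry count of a corresponding FEC reservoir box is expected to match.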
8.13.4 FEC Reservoir Box
8.13.4.1 Definition
Box Type: ‘fecr’
Container: Partition Entry (‘paen’)
Mandatory: No
Quantity: Zero or one
The FEC reservoir box associates the source file identified in the file partition box ('fpar') with FEC reservoirs stored as additional items. It contains a list that starts with the first FEC reservoir associated with the first source block of the source file and continues sequentially through the source blocks of the source file. Version 1 of FECReservoirBox should only be used when support for large item_ID values and entry_count (exceeding 65535) is required or expected to be required.
8.13.4.2 Syntax
aligned(8) class FECReservoirBox extends FullBox('fecr', version, 0) {
   if (version == 0) {
      unsigned int(16) entry_count;
   } else {
      unsigned int(32) entry_count;
   }
   for (i=1; i <= entry_count; i++) {
      if (version == 0) {
         unsigned int(16) item_ID;
      } else {
         unsigned int(32) item_ID;
      }
      unsigned int(32) symbol_count;
   }
}
8.13.4.3 Semantics
entry_count gives the number of entries in the following list. An entry count here should match the total number of blocks in the corresponding file partition box.
item_ID indicates the location of the FEC reservoir associated with a source block.
symbol_count indicates the number of repair symbols contained in the FEC reservoir.
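The following Python sketch is informative only. It reads the entry list of a FECReservoirBox body, using 16-bit fields for version 0 and 32-bit fields otherwise, and returns one (item_ID, symbol_count) pair per source block in order. The function name parse_fecr is hypothetical.

```python
def parse_fecr(payload):
    """Parse a 'fecr' FullBox body; return a list of (item_ID, symbol_count)."""
    version = payload[0]
    width = 2 if version == 0 else 4  # width of entry_count and item_ID
    pos = 4  # skip version and flags
    entry_count = int.from_bytes(payload[pos:pos + width], "big")
    pos += width
    entries = []
    for _ in range(entry_count):
        item_id = int.from_bytes(payload[pos:pos + width], "big")
        pos += width
        symbol_count = int.from_bytes(payload[pos:pos + 4], "big")
        pos += 4
        entries.append((item_id, symbol_count))
    return entries
```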
8.13.5 FD Session Group Box
8.13.5.1 Definition
Box Type: ‘segr’
Container: FD Information Box (‘fiin’)
Mandatory: No
Quantity: Zero or One
The FD session group box is optional, although it is mandatory for files containing more than one FD hint track. It contains a list of sessions as well as all file groups and hint tracks that belong to each session. An FD session sends simultaneously over all FD hint tracks (channels) that are listed in the FD session group box for a particular FD session.
Only one session group should be processed at any time. The first listed hint track in a session group specifies the base channel. If the server has no preference between the session groups, the default choice should be the first session group. The group IDs of all file groups containing the files referenced by the hint tracks shall be included in the list of file groups. The file group IDs can in turn be translated into file group names (using the group ID to name box) that can be included by the server in FDTs.
8.13.5.2 Syntax
aligned(8) class FDSessionGroupBox extends Box('segr') {
   unsigned int(16) num_session_groups;
   for(i=0; i < num_session_groups; i++) {
      unsigned int(8) entry_count;
      for (j=0; j < entry_count; j++) {
         unsigned int(32) group_ID;
      }
      unsigned int(16) num_channels_in_session_group;
      for(k=0; k < num_channels_in_session_group; k++) {
         unsigned int(32) hint_track_id;
      }
   }
}
8.13.5.3 Semantics
num_session_groups specifies the number of session groups.
entry_count gives the number of entries in the following list comprising all file groups that the session group complies with. The session group contains all files included in the listed file groups as specified by the item information entry of each source file. Note that the FDT for the session group should only contain those groups that are listed in this structure.
group_ID indicates a file group that the session group complies with.
num_channels_in_session_group specifies the number of channels in the session group. The value of num_channels_in_session_group shall be a positive integer.
hint_track_id specifies the track ID of the FD hint track belonging to a particular session group. Note that one FD hint track corresponds to one LCT channel.
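The nested loops of the syntax above might be read as in the following Python sketch. It assumes the payload begins immediately after the Box header ('segr' is a plain Box, so there is no version/flags field); the function name and the returned tuple layout are hypothetical.

```python
import struct
from typing import List, Tuple


def parse_segr_payload(payload: bytes) -> List[Tuple[List[int], List[int]]]:
    """Return one (group_IDs, hint_track_ids) pair per session group."""
    (num_session_groups,) = struct.unpack_from(">H", payload, 0)
    pos = 2
    groups = []
    for _ in range(num_session_groups):
        entry_count = payload[pos]                      # unsigned int(8)
        pos += 1
        group_ids = list(struct.unpack_from(f">{entry_count}I", payload, pos))
        pos += 4 * entry_count
        (num_channels,) = struct.unpack_from(">H", payload, pos)
        pos += 2
        track_ids = list(struct.unpack_from(f">{num_channels}I", payload, pos))
        pos += 4 * num_channels
        groups.append((group_ids, track_ids))
    return groups
```

With this representation, the base channel of a session group is the first element of its hint track list, and a server with no preference would pick the first element of the returned list.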
8.13.6 Group ID to Name Box
8.13.6.1 Definition
Box Type: ‘gitn’
Container: FD Information Box (‘fiin’)
Mandatory: No
Quantity: Zero or One
The Group ID to Name box associates file group names to file group IDs used in the version 1 item information entries in the item information box ('iinf').
8.13.6.2 Syntax
aligned(8) class GroupIdToNameBox
   extends FullBox('gitn', version = 0, 0) {
   unsigned int(16) entry_count;
   for (i=1; i <= entry_count; i++) {
      unsigned int(32) group_ID;
      string group_name;
   }
}
8.13.6.3 Semantics
entry_count gives the number of entries in the following list.
group_ID indicates a file group.
group_name is a null-terminated string in UTF-8 characters containing a file group name.
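A minimal Python sketch of reading this mapping follows. It assumes the payload starts at the FullBox version byte and that every name is properly null-terminated; the function name is hypothetical.

```python
import struct
from typing import Dict


def parse_gitn_payload(payload: bytes) -> Dict[int, str]:
    """Return {group_ID: group_name} from a 'gitn' payload."""
    pos = 4                                   # skip version (8 bits) and flags (24 bits)
    (entry_count,) = struct.unpack_from(">H", payload, pos)
    pos += 2
    names = {}
    for _ in range(entry_count):
        (group_id,) = struct.unpack_from(">I", payload, pos)
        pos += 4
        end = payload.index(b"\x00", pos)     # locate the terminating null byte
        names[group_id] = payload[pos:end].decode("utf-8")
        pos = end + 1
    return names
```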
8.13.7 File Reservoir Box
8.13.7.1 Definition
Box Type: ‘fire’
Container: Partition Entry (‘paen’)
Mandatory: No
Quantity: Zero or One
The File reservoir box associates the source file identified in the file partition box ('fpar') with File reservoirs stored as additional items. It contains a list that starts with the first File reservoir associated with the first source block of the source file and continues sequentially through the source blocks of the source file. Version 1 of FileReservoirBox should only be used when support for large item_ID or entry_count values (exceeding 65535) is required or expected to be required.
8.13.7.2 Syntax
aligned(8) class FileReservoirBox
   extends FullBox('fire', version, 0) {
   if (version == 0) {
      unsigned int(16) entry_count;
   } else {
      unsigned int(32) entry_count;
   }
   for (i=1; i <= entry_count; i++) {
      if (version == 0) {
         unsigned int(16) item_ID;
      } else {
         unsigned int(32) item_ID;
      }
      unsigned int(32) symbol_count;
   }
}
8.13.7.3 Semantics
entry_count gives the number of entries in the following list. An entry count here should match the total number of blocks in the corresponding file partition box.
item_ID indicates the location of the File reservoir associated with a source block.
symbol_count indicates the number of source symbols contained in the File reservoir.
8.14 Sub tracks
8.14.1 Introduction
Sub tracks are used to assign parts of tracks to alternate and switch groups in the same way as (entire) tracks can be assigned to alternate and switch groups to indicate whether those tracks are alternatives to each other and whether it makes sense to switch between them during a session. Sub tracks are suitable for layered media, e.g., SVC and MVC, where media alternatives often are incommensurate with track structures. By defining alternate and switch groups at sub-track level it is possible to use existing rules for media selection and switching for such layered codecs. The overall syntax is generic for all kinds of media and backward compatible with track-level definitions. Sub-track level alternate and switch groups use the same numbering as track level groups. The numberings are global over all tracks such that groups can be defined across track and sub-track boundaries.
In order to define sub tracks, media-specific definitions are required. Definitions for SVC and MVC are specified in the AVC file format (ISO/IEC 14496-15). Another way is to define sample groups and map them to sub tracks using the Sub Track Sample Group box defined here. The syntax can also be extended to include other media-specific definitions.
For each sub track that shall be defined a Sub Track box shall be included in the User Data box of the corresponding track. The Sub Track box contains objects that define and provide information about a sub track in the same track. The Track Selection box for this same track is already located here.
8.14.2 Backward compatibility
The default is to assign alternate and switch groups to 0 (zero) for (entire) tracks, which means that there is no information on alternate and/or switch groups for those (entire) tracks. However, file readers that are aware of sub-track definitions will be able to find sub-track information on alternate and switch groups even if the track indication is set to 0. This way it is possible to indicate that a file can be used by legacy readers by including the appropriate brand in the file type box. A file creator that requires a reader to be aware of sub-track information should not include legacy brands.
The same method of assigning sub track information can also be applied if all parts of a track except a sub track belong to the same alternate or switch group. Then the overall definitions can be made on track level as usual and specific assignments can be made at sub-track level. For sub tracks without specific assignments, track level assignments apply by default. As before, if a file creator requires a reader to be aware of sub-track information it should not include legacy brands (which would otherwise indicate that sub track information can be skipped).
8.14.3 Sub Track box
8.14.3.1 Definition
Box Type: ‘strk’
Container: User Data box (‘udta’) of the corresponding Track box (‘trak’)
Mandatory: No
Quantity: Zero or more
This box contains objects that define and provide information about a sub track in the present track.
8.14.3.2 Syntax
aligned(8) class SubTrack extends Box(‘strk’) { }
8.14.4 Sub Track Information box
8.14.4.1 Definition
Box Type: ‘stri’
Container: Sub Track box (‘strk’)
Mandatory: Yes
Quantity: One
8.14.4.2 Syntax
aligned(8) class SubTrackInformation
   extends FullBox(‘stri’, version = 0, 0){
   template int(16) switch_group = 0;
   template int(16) alternate_group = 0;
   template unsigned int(32) sub_track_ID = 0;
   unsigned int(32) attribute_list[]; // to the end of the box
}
8.14.4.3 Semantics
switch_group is an integer that specifies a group or collection of tracks and/or sub tracks. If this field is 0 (default value), then there is no information on whether the sub track can be used for switching during playing or streaming. If this integer is not 0 it shall be the same for tracks and/or sub tracks that can be used for switching between each other. Tracks that belong to the same switch group shall belong to the same alternate group. A switch group may have only one member.
alternate_group is an integer that specifies a group or collection of tracks and/or sub tracks. If this field is 0 (default value), then there is no information on possible relations to other tracks/sub-tracks. If this field is not 0, it should be the same for tracks/sub-tracks that contain alternate data for one another and different for tracks/sub-tracks belonging to different such groups. Only one track/sub-track within an alternate group should be played or streamed at any one time.
sub_track_ID is an integer. A non-zero value uniquely identifies the sub track locally within the track. A zero value (default) means that sub track ID is not assigned.
attribute_list is a list, to the end of the box, of attributes. The attributes in this list should be used as descriptions of sub tracks or differentiating criteria for tracks and sub tracks in the same alternate or switch group.
The following attributes are descriptive:

Name                            Attribute  Description
Temporal scalability            ‘tesc’     The sub-track can be temporally scaled.
Fine-grain SNR scalability      ‘fgsc’     The sub-track can be scaled in terms of quality.
Coarse-grain SNR scalability    ‘cgsc’     The sub-track can be scaled in terms of quality.
Spatial scalability             ‘spsc’     The sub-track can be spatially scaled.
Region-of-interest scalability  ‘resc’     The sub-track can be region-of-interest scaled.
View scalability                ‘vwsc’     The sub-track can be scaled in terms of number of views.

The following attributes are differentiating:

Name             Attribute  Pointer
Bitrate          ‘bitr’     Total size of the samples in the track divided by the duration in the track header box
Frame rate       ‘frar’     Number of samples in the track divided by duration in the track header box
Number of views  ‘nvws’     Number of views in the sub track
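To make the layout of the Sub Track Information box concrete, the following Python sketch reads the fixed fields and then treats the remainder of the payload as a list of four-character attribute codes. It assumes the payload starts at the FullBox version byte; the type and function names are hypothetical, and decoding the codes as Latin-1 text is a convenience choice of this sketch.

```python
import struct
from typing import List, NamedTuple


class SubTrackInfo(NamedTuple):
    switch_group: int
    alternate_group: int
    sub_track_id: int
    attributes: List[str]


def parse_stri_payload(payload: bytes) -> SubTrackInfo:
    # Bytes 0..3 hold version/flags; the two group fields are signed int(16).
    switch_group, alternate_group, sub_track_id = struct.unpack_from(">hhI", payload, 4)
    rest = payload[12:]                       # attribute_list runs to the end of the box
    attributes = [rest[i:i + 4].decode("latin-1") for i in range(0, len(rest) - 3, 4)]
    return SubTrackInfo(switch_group, alternate_group, sub_track_id, attributes)
```

A reader could then, for instance, treat two sub tracks with the same non-zero alternate_group as alternatives and use codes such as ‘bitr’ to choose between them.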
8.14.5 Sub Track Definition box
8.14.5.1 Definition
Box Type: ‘strd’
Container: Sub Track box (‘strk’)
Mandatory: Yes
Quantity: One
This box contains objects that provide a definition of the sub track.
8.14.5.2 Syntax
aligned(8) class SubTrackDefinition extends Box(‘strd’) { }
8.14.6 Sub Track Sample Group box
8.14.6.1 Definition
Box Type: ‘stsg’
Container: Sub Track Definition box (‘strd’)
Mandatory: No
Quantity: Zero or more
This box defines a sub track as one or more sample groups by referring to the corresponding sample group descriptions describing the samples of each group.
8.14.6.2 Syntax
aligned(8) class SubTrackSampleGroupBox
   extends FullBox(‘stsg’, 0, 0){
   unsigned int(32) grouping_type;
   unsigned int(16) item_count;
   for(i = 0; i< item_count; i++)
      unsigned int(32) group_description_index;
}
8.14.6.3 Semantics
grouping_type is an integer that identifies the sample grouping. The value shall be the same as in the corresponding SampleToGroup and SampleGroupDescription boxes.
item_count counts the number of sample groups listed in this box.
group_description_index is an integer that gives the index of the sample group entry which describes the samples in the group.
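A short Python sketch of reading this box is given below, under the assumption that the payload starts at the FullBox version byte. The function name is hypothetical, and grouping_type is returned as text purely for readability.

```python
import struct
from typing import List, Tuple


def parse_stsg_payload(payload: bytes) -> Tuple[str, List[int]]:
    """Return (grouping_type, [group_description_index, ...])."""
    grouping_type = payload[4:8].decode("latin-1")   # after version/flags
    (item_count,) = struct.unpack_from(">H", payload, 8)
    indices = list(struct.unpack_from(f">{item_count}I", payload, 10))
    return grouping_type, indices
```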
8.15 Post-decoder requirements on media
8.15.1 General
In order to handle situations where the file author requires certain actions on the player or renderer, this Subclause specifies a mechanism that enables players to simply inspect a file to find out such requirements for rendering a bitstream and stops legacy players from decoding and rendering files that require further processing. The mechanism applies to any type of video codec. In particular it applies to AVC and for this case specific signalling is defined in the AVC file format (ISO/IEC 14496-15) that allows a file author to list occurring SEI message IDs and distinguish between required and non-required actions for the rendering process.
The mechanism is similar to the content protection transformation where sample entries are hidden behind generic sample entries, ‘encv’, ‘enca’, etc., indicating encrypted or encapsulated media. The analogous mechanism for restricted video uses a transformation with the generic sample entry ‘resv’. The method may be applied when the content should only be decoded by players that present it correctly.
8.15.2 Transformation
The method is applied as follows:
1) The four-character-code of the sample entry is replaced by a new sample entry code ‘resv’ meaning restricted video.
2) A Restricted Scheme Info box is added to the sample description, leaving all other boxes unmodified.
3) The original sample entry type is stored within an Original Format box contained in the Restricted Scheme Info box.
A RestrictedSchemeInfoBox is formatted exactly the same as a ProtectionSchemeInfoBox, except that it uses the identifier ‘rinf’ instead of ‘sinf’ (see below).
The original sample entry type is contained in the Original Format box located in the Restricted Scheme Info box (in an identical way to the Protection Scheme Info box for encrypted media).
The exact nature of the restriction is defined in the SchemeTypeBox, and the data needed for that scheme is stored in the SchemeInformationBox, again, analogously to protection information.
Note that restriction and protection can be applied at the same time. The order of the transformations follows from the four-character code of the sample entry. For instance, if the sample entry type is ‘resv’, undoing the above transformation may result in a sample entry type ‘encv’, indicating that the media is protected.
Note that if the file author only wants to provide advisory information without stopping legacy players from playing the file, the Restricted Scheme Info box may be placed inside the sample entry without applying the four-character-code transformation. In this case it is not necessary to include an Original Format box.
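The order in which stacked transformations are undone can be sketched in Python as follows. The sample entry is modelled here as a plain dictionary, which is purely a hypothetical in-memory representation: "type" holds the sample entry code, and "rinf" or "sinf" hold the data_format read from the Original Format box (box type ‘frma’, defined elsewhere in this standard) inside the respective scheme info box.

```python
from typing import Dict, List, Tuple

# Generic sample entry code -> scheme info box that records the original format.
WRAPPERS: Dict[str, str] = {"resv": "rinf", "encv": "sinf", "enca": "sinf"}


def unwrap_sample_entry(entry: dict) -> Tuple[str, List[str]]:
    """Return (original sample entry type, list of transformations undone)."""
    undone: List[str] = []
    fourcc = entry["type"]
    while fourcc in WRAPPERS:
        if fourcc in undone:                      # guard against malformed loops
            raise ValueError("cyclic original format chain")
        undone.append(fourcc)
        fourcc = entry[WRAPPERS[fourcc]]["frma"]  # original format of this layer
    return fourcc, undone
```

A player that does not understand every scheme named in the undone layers would be expected not to decode the track.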
8.15.3 Restricted Scheme Information box
8.15.3.1 Definition
Box Types: ‘rinf’
Container: Restricted Sample Entry or Sample Entry
Mandatory: Yes
Quantity: Exactly one
The Restricted Scheme Information Box contains all the information required to understand both the restriction scheme applied and its parameters. It also documents the original (un-transformed) sample entry type of the media. The Restricted Scheme Information Box is a container Box. It is mandatory in a sample entry that uses a code indicating a restricted stream, i.e., ‘resv’.
When used in a restricted sample entry, this box must contain the original format box to document the original sample entry type and a Scheme type box. A Scheme Information box may be required depending on the restriction scheme.
8.15.3.2 Syntax
aligned(8) class RestrictedSchemeInfoBox(fmt) extends Box('rinf') {
   OriginalFormatBox(fmt) original_format;
   SchemeTypeBox scheme_type_box;
   SchemeInformationBox info; // optional
}
8.15.4 Scheme for stereoscopic video arrangements
8.15.4.1 General
When stereo-coded video frames are decoded, the decoded frames either contain a representation of two spatially packed constituent frames that form a stereo pair (frame packing) or only one view of a stereo pair (left and right views in different tracks). Restrictions due to stereo-coded video are contained in the Stereo Video box.
The SchemeType ‘stvi’ (stereoscopic video) is used.
8.15.4.2 Stereo video box
8.15.4.2.1 Definition
Box Type: ‘stvi’
Container: Scheme Information box (‘schi’)
Mandatory: Yes (when the SchemeType is ‘stvi’)
Quantity: One
The Stereo Video box is used to indicate that decoded frames either contain a representation of two spatially packed constituent frames that form a stereo pair or contain one of two views of a stereo pair. The Stereo Video box shall be present when the SchemeType is ‘stvi’.
8.15.4.2.2 Syntax
aligned(8) class StereoVideoBox extends FullBox(‘stvi’, version = 0, 0)
{
   template unsigned int(30) reserved = 0;
   unsigned int(2) single_view_allowed;
   unsigned int(32) stereo_scheme;
   unsigned int(32) length;
   unsigned int(8)[length] stereo_indication_type;
   Box[] any_box; // optional
}
8.15.4.2.3 Semantics
single_view_allowed is an integer. A zero value indicates that the content may only be displayed on stereoscopic displays. When (single_view_allowed & 1) is equal to 1, it is allowed to display the right view on a monoscopic single-view display. When (single_view_allowed & 2) is equal to 2, it is allowed to display the left view on a monoscopic single-view display.
stereo_scheme is an integer that indicates the stereo arrangement scheme used and the stereo indication type according to the used scheme. The following values for stereo_scheme are specified:
   1: the frame packing scheme as specified by the Frame packing arrangement Supplemental Enhancement Information message of ISO/IEC 14496-10 [ISO/IEC 14496-10]
   2: the arrangement type scheme as specified in Annex L of ISO/IEC 13818-2 [ISO/IEC 13818-2:2000/Amd.4]
   3: the stereo scheme as specified in ISO/IEC 23000-11 for both frame/service compatible and 2D/3D mixed services.
Other values of stereo_scheme are reserved.
length indicates the number of bytes for the stereo_indication_type field.
stereo_indication_type indicates the stereo arrangement type according to the used stereo indication scheme. The syntax and semantics of stereo_indication_type depend on the value of stereo_scheme. The syntax and semantics for stereo_indication_type for the following values of stereo_scheme are specified as follows:
   stereo_scheme equal to 1: The value of length shall be 4 and stereo_indication_type shall be unsigned int(32) which contains the frame_packing_arrangement_type value from Table D-8 of ISO/IEC 14496-10 [ISO/IEC 14496-10] (‘Definition of frame_packing_arrangement_type’).
   stereo_scheme equal to 2: The value of length shall be 4 and stereo_indication_type shall be unsigned int(32) which contains the type value from Table L-1 of ISO/IEC 13818-2 [ISO/IEC 13818-2:2000/Amd.4] (‘Definition of arrangement_type’).
   stereo_scheme equal to 3: The value of length shall be 2 and stereo_indication_type shall contain two syntax elements of unsigned int(8). The first syntax element shall contain the stereoscopic composition type from Table 4 of ISO/IEC 23000-11:2009. The least significant bit of the second syntax element shall contain the value of is_left_first as specified in 8.4.3 of ISO/IEC 23000-11:2009, while the other bits are reserved and shall be set to 0.
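The bit tests on single_view_allowed and the scheme-dependent interpretation of stereo_indication_type might be sketched in Python as follows. The payload is assumed to start at the FullBox version byte; the function name and dictionary keys are hypothetical, and any trailing boxes (any_box) are ignored in this sketch.

```python
import struct


def parse_stvi_payload(payload: bytes) -> dict:
    # Bytes 4..7: 30 reserved bits followed by the 2-bit single_view_allowed.
    (packed,) = struct.unpack_from(">I", payload, 4)
    single_view_allowed = packed & 0x3
    stereo_scheme, length = struct.unpack_from(">II", payload, 8)
    raw = payload[16:16 + length]
    if stereo_scheme in (1, 2) and length == 4:
        indication = struct.unpack(">I", raw)[0]   # one unsigned int(32)
    elif stereo_scheme == 3 and length == 2:
        indication = (raw[0], raw[1] & 1)          # (composition type, is_left_first)
    else:
        indication = raw                           # reserved scheme: keep raw bytes
    return {
        "right_view_mono_ok": bool(single_view_allowed & 1),
        "left_view_mono_ok": bool(single_view_allowed & 2),
        "stereo_scheme": stereo_scheme,
        "stereo_indication_type": indication,
    }
```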
The following applies when the Stereo Video box is used:
In the Track Header box:
   width and height specify the visual presentation size of a single view after unpacking.
In the Sample Description box:
   frame_count shall be 1, because the decoder physically outputs a single frame. In other words, the constituent frames included within a frame-packed picture are not documented by frame_count.
   width and height document the pixel counts of a frame-packed picture (and not the pixel counts of a single view within a frame-packed picture).
   the Pixel Aspect Ratio box documents the pixel aspect ratio of each view when the view is displayed on a monoscopic single-view display. For example, in many spatial frame packing arrangements, the Pixel Aspect Ratio box therefore indicates 2:1 or 1:2 pixel aspect ratio, as the spatial resolution of one view of frame-packed video is typically halved along one coordinate axis compared to that of the single-view video of the same format.
8.16 Segments
8.16.1 Introduction
Media presentations may be divided into segments for delivery. For example, it is possible (e.g. in HTTP streaming) to form files that contain a segment, or concatenated segments, which would not necessarily form ISO base media file format compliant files (e.g. they do not contain a movie box).
This Subclause defines specific boxes that may be used in such segments.
8.16.2 Segment Type Box
Box Type: ‘styp’
Container: File
Mandatory: No
Quantity: Zero or more
If segments are stored in separate files (e.g. on a standard HTTP server) it is recommended that these ‘segment files’ contain a segment-type box, which must be first if present, to enable identification of those files, and declaration of the specifications with which they are compliant.
A segment type has the same format as an 'ftyp' box [4.3], except that it takes the box type 'styp'. The brands within it may include the same brands that were included in the 'ftyp' box that preceded the ‘moov’ box, and may also include additional brands to indicate the compatibility of this segment with various specification(s).
Valid segment type boxes shall be the first box in a segment. Segment type boxes may be removed if segments are concatenated (e.g. to form a full file), but this is not required. Segment type boxes that are not first in their files may be ignored.
8.16.3 Segment Index Box
8.16.3.1 Definition
Box Type: ‘sidx’
Container: File
Mandatory: No
Quantity: Zero or more
The Segment Index box ('sidx') provides a compact index of one media stream within the media segment to which it applies. It is designed so that it can be used not only with media formats based on this specification (i.e. segments containing sample tables or movie fragments), but also other media formats (for example, MPEG-2 Transport Streams [ISO/IEC 13818-1]). For this reason, the formal description of the box given here is deliberately generic, and then at the end of this Subclause the specific definitions for segments using movie fragments are given.
Each Segment Index box documents how a (sub)segment is divided into one or more subsegments (which may themselves be further subdivided using Segment Index boxes).
A subsegment is defined as a time interval of the containing (sub)segment, and corresponds to a single range of bytes of the containing (sub)segment. The durations of all the subsegments sum to the duration of the containing (sub)segment.
Each entry in the Segment Index box contains a reference type that indicates whether the reference points directly to the media bytes of a referenced leaf subsegment, or to a Segment Index box that describes how the referenced subsegment is further subdivided; as a result, the segment may be indexed in a ‘hierarchical’ or ‘daisy-chain’ or other form by documenting time and byte offset information for other Segment Index boxes applying to portions of the same (sub)segment.
Each Segment Index box provides information about a single media stream of the Segment, referred to as the reference stream. If provided, the first Segment Index box in a segment, for a given media stream, shall document the entirety of that media stream in the segment, and shall precede any other Segment Index box in the segment for the same media stream.
If a segment index is present for at least one media stream but not all media streams in the segment, then normally a media stream in which not every access unit is independently coded, such as video, is selected to be indexed. For any media stream for which no segment index is present, referred to as non-indexed stream, the media stream associated with the first Segment Index box in the segment serves as a reference stream in a sense that it also describes the subsegments for any non-indexed media stream.
NOTE 1 Further restrictions may be specified in derived specifications.
Segment Index boxes may be inline in the same file as the indexed media or, in some cases, in a separate file containing only indexing information.
A Segment Index box contains a sequence of references to subsegments of the (sub)segment documented by the box. The referenced subsegments are contiguous in presentation time. Similarly, the bytes referred to by a Segment Index box are always contiguous in both the media file, and the separate index segment, or in the single file if indexes are placed within the media file. The referenced size gives the count of the number of bytes in the material referenced.
NOTE 2 A media segment may be indexed by more than one “top-level” Segment Index box; these are independent of each other, and each indexes one media stream within the media segment. In segments containing multiple media streams the referenced bytes may contain media from multiple streams, even though the Segment Index box provides timing information for only one media stream.
In the file containing the Segment Index box, the anchor point for a Segment Index box is the first byte after that box. If there are two files, the anchor point in the media file is the beginning of the top-level segment (i.e. the beginning of the segment file if each segment is stored in a separate file). The material in the file containing media (which may also be the file that contains the segment index boxes) starts at the indicated offset from the anchor point. If there are two files, the material in the index file starts at the anchor point, i.e. immediately following the Segment Index box.
Within the two constraints (a) that, in time, the subsegments are contiguous, that is, each entry in the loop is consecutive from the immediately preceding one and (b) within a given file (integrated file, media file, or index side file) the referenced bytes are contiguous, there are a number of possibilities, including:
1) a reference to a segment index box may include, in its byte count, immediately following Segment Index boxes that document subsegments;
2) in an integrated file, using the first_offset field, it is possible to separate Segment Index boxes from the media that they refer to;
3) in an integrated file, it is possible to locate Segment Index boxes for subsegments close to the media they index;
4) when a separate file containing Segment Indexes is used, it is possible for the loop entries to be of ‘mixed type’, some to Segment Index boxes in the index segment, some to media subsegments in the media file.
NOTE 3 Profiles may be used to restrict the placement of segment indexes, or the overall complexity of the indexing.
The Segment Index box documents the presence of Stream Access Points (SAPs), as specified in Annex I, in the referenced subsegments. The annex specifies characteristics of SAPs, such as ISAU, ISAP and TSAP, as well as SAP types, which are all used in the semantics below. A subsegment starts with a SAP when the subsegment contains a SAP, and for the first SAP, ISAU is the index of the first access unit that follows ISAP, and ISAP is contained in the subsegment.
For segments based on this specification (i.e. based on movie sample tables or movie fragments):
   an access unit is a sample;
   a subsegment is a self-contained set of one or more consecutive movie fragments; a self-contained set contains one or more Movie Fragment boxes with the corresponding Media Data box(es), and a Media Data Box containing data referenced by a Movie Fragment Box must follow that Movie Fragment box and precede the next Movie Fragment box containing information about the same track;
   Segment Index boxes shall be placed before subsegment material they document, that is, before any Movie Fragment (‘moof’) box of the documented material of the subsegment;
   streams are tracks in the file format, and stream IDs are track IDs;
   a subsegment contains a stream access point if a track fragment within the subsegment for the track with track_ID equal to reference_ID contains a stream access point;
   initialisation data for SAPs consists of the movie box;
   presentation times are in the movie timeline, that is they are composition times after the application of any edit list for the track;
   the ISAP is a position exactly pointing to the start of a top-level box, such as a movie fragment box 'moof';
   a SAP of type 1 or type 2 is indicated as a sync sample, or by sample_is_non_sync_sample equal to 0 in the movie fragment;
   a SAP of type 3 is marked as a member of a sample group of type ‘rap ’;
   a SAP of type 4 is marked as a member of a sample group of type ‘roll’ where the value of the roll_distance field is greater than 0.
NOTE 4 For SAPs of type 5 and 6, no specific signalling in the ISO base media file format is supported.
8.16.3.2 Syntax
aligned(8) class SegmentIndexBox extends FullBox(‘sidx’, version, 0) {
   unsigned int(32) reference_ID;
   unsigned int(32) timescale;
   if (version==0) {
      unsigned int(32) earliest_presentation_time;
      unsigned int(32) first_offset;
   }
   else {
      unsigned int(64) earliest_presentation_time;
      unsigned int(64) first_offset;
   }
   unsigned int(16) reserved = 0;
   unsigned int(16) reference_count;
   for(i=1; i <= reference_count; i++)
   {
      bit (1) reference_type;
      unsigned int(31) referenced_size;
      unsigned int(32) subsegment_duration;
      bit(1) starts_with_SAP;
      unsigned int(3) SAP_type;
      unsigned int(28) SAP_delta_time;
   }
}
8.16.3.3 Semantics
reference_ID provides the stream ID for the reference stream; if this Segment Index box is referenced from a “parent” Segment Index box, the value of reference_ID shall be the same as the value of reference_ID of the “parent” Segment Index box;
timescale provides the timescale, in ticks per second, for the time and duration fields within this box; it is recommended that this match the timescale of the reference stream or track; for files based on this specification, that is the timescale field of the Media Header Box of the track;
earliest_presentation_time is the earliest presentation time of any content in the reference stream in the first subsegment, in the timescale indicated in the timescale field; the earliest presentation time is derived from media in access units, or parts of access units, that are not omitted by an edit list (if any);
first_offset is the distance in bytes, in the file containing media, from the anchor point, to the first byte of the indexed material;
reference_count provides the number of referenced items;
reference_type: when set to 1 indicates that the reference is to a segment index (‘sidx’) box; otherwise the reference is to media content (e.g., in the case of files based on this specification, to a movie fragment box); if a separate index segment is used, then entries with reference type 1 are in the index segment, and entries with reference type 0 are in the media file;
referenced_size: the distance in bytes from the first byte of the referenced item to the first byte of the next referenced item, or in the case of the last entry, the end of the referenced material;
subsegment_duration: when the reference is to Segment Index box, this field carries the sum of the subsegment_duration fields in that box; when the reference is to a subsegment, this field carries the difference between the earliest presentation time of any access unit of the reference stream in the next subsegment (or the first subsegment of the next segment, if this is the last subsegment of the segment, or the end presentation time of the reference stream if this is the last subsegment of the stream) and the earliest presentation time of any access unit of the reference stream in the referenced subsegment; the duration is in the same units as earliest_presentation_time;
starts_with_SAP indicates whether the referenced subsegments start with a SAP. For the detailed semantics of this field in combination with other fields, see the table below.
SAP_type indicates a SAP type as specified in Annex I, or the value 0. Other type values are reserved. For the detailed semantics of this field in combination with other fields, see the table below.
SAP_delta_time: indicates TSAP of the first SAP, in decoding order, in the referenced subsegment for the reference stream. If the referenced subsegments do not contain a SAP, SAP_delta_time is reserved with the value 0; otherwise SAP_delta_time is the difference between the earliest presentation time of the subsegment, and the TSAP (note that this difference may be zero, in the case that the subsegment starts with a SAP).
Table 4 — Semantics of SAP and reference type combinations
starts_with_SAP  SAP_type           reference_type  Meaning
0                0                  0 or 1          No information of SAPs is provided.
0                1 to 6, inclusive  0 (media)       The subsegment contains (but may not start with) a SAP of the given SAP_type and the first SAP of the given SAP_type corresponds to SAP_delta_time.
0                1 to 6, inclusive  1 (index)       All the referenced subsegments contain a SAP of at most the given SAP_type and none of these SAPs is of an unknown type.
1                0                  0 (media)       The subsegment starts with a SAP of an unknown type.
1                0                  1 (index)       All the referenced subsegments start with a SAP which may be of an unknown type.
1                1 to 6, inclusive  0 (media)       The referenced subsegment starts with a SAP of the given SAP_type.
1                1 to 6, inclusive  1 (index)       All the referenced subsegments start with a SAP of at most the given SAP_type and none of these SAPs is of an unknown type.
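The bit packing of the reference loop and the use of the anchor point can be illustrated with a Python sketch. It assumes the payload starts at the FullBox version byte and, for the byte-range computation, that index and media are in a single file so that all references are laid out contiguously from anchor + first_offset. The names Reference, parse_sidx_payload and locate are hypothetical.

```python
import struct
from typing import List, NamedTuple, Tuple


class Reference(NamedTuple):
    reference_type: int
    referenced_size: int
    subsegment_duration: int
    starts_with_sap: int
    sap_type: int
    sap_delta_time: int


def parse_sidx_payload(payload: bytes):
    version = payload[0]
    reference_id, timescale = struct.unpack_from(">II", payload, 4)
    fmt = ">II" if version == 0 else ">QQ"        # 32-bit or 64-bit time/offset
    ept, first_offset = struct.unpack_from(fmt, payload, 12)
    pos = 12 + struct.calcsize(fmt)
    _reserved, reference_count = struct.unpack_from(">HH", payload, pos)
    pos += 4
    refs: List[Reference] = []
    for _ in range(reference_count):
        a, duration, b = struct.unpack_from(">III", payload, pos)
        pos += 12
        refs.append(Reference(a >> 31, a & 0x7FFFFFFF, duration,
                              b >> 31, (b >> 28) & 0x7, b & 0x0FFFFFFF))
    return reference_id, timescale, ept, first_offset, refs


def locate(refs: List[Reference], anchor: int, first_offset: int,
           ept: int) -> List[Tuple[int, int, int]]:
    """Return (first byte, end byte exclusive, start time) for each reference."""
    out = []
    offset, time = anchor + first_offset, ept
    for r in refs:
        out.append((offset, offset + r.referenced_size, time))
        offset += r.referenced_size               # bytes are contiguous
        time += r.subsegment_duration             # times are contiguous
    return out
```

Here anchor is the file position of the first byte after the Segment Index box, and times are in units of timescale. A client could turn each returned byte range into an HTTP range request, following references with reference_type equal to 1 by parsing the nested Segment Index box found at that position.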
8.16.4 Subsegment Index Box
8.16.4.1 Definition
Box Type: ‘ssix’
Container: File
Mandatory: No
Quantity: Zero or more
The Subsegment Index box ('ssix') provides a mapping from levels (as specified by the Level Assignment box) to byte ranges of the indexed subsegment. In other words, this box provides a compact index for how the data in a subsegment is ordered according to levels into partial subsegments. It enables a client to easily access data for partial subsegments by downloading ranges of data in the subsegment.
Each byte in the subsegment shall be explicitly assigned to a level, and hence the range count must be 2 or greater. If the range is not associated with any information in the level assignment, then any level that is not included in the level assignment may be used.
There shall be 0 or 1 Subsegment Index boxes per each Segment Index box that indexes only leaf subsegments, i.e. that only indexes subsegments but no segment indexes. A Subsegment Index box, if any, shall be the next box after the associated Segment Index box. A Subsegment Index box documents the subsegments that are indicated in the immediately preceding Segment Index box.
In general, the media data constructed from the byte ranges is incomplete, i.e. it does not conform to the media format of the entire subsegment.
For leaf subsegments based on this specification (i.e. based on movie sample tables and movie fragments):
Each level shall be assigned to exactly one partial subsegment, i.e. byte ranges for one level shall be contiguous.
Levels of partial subsegments shall be assigned by increasing numbers within a subsegment, i.e., samples of a partial subsegment may depend on any samples of preceding partial subsegments in the same subsegment, but not the other way around. For example, each partial subsegment contains samples having an identical temporal level and partial subsegments appear in increasing temporal level order within the subsegment.
When a partial subsegment is accessed in this way, for any assignment_type other than 3, the final Media Data box may be incomplete, that is, less data is accessed than the length indication of the Media Data Box indicates is present. The length of the Media Data box may need adjusting, or padding used. The padding_flag in the Level Assignment Box indicates whether this missing data can be replaced by zeros. If not, the sample data for samples assigned to levels that are not accessed is not present, and care should be taken not to attempt to process such samples.
The data ranges corresponding to partial subsegments include both Movie Fragment boxes and Media Data boxes. The first partial subsegment, i.e. the lowest level, will correspond to a Movie Fragment box as well as (parts of) Media Data box(es), whereas subsequent partial subsegments (higher levels) may correspond to (parts of) Media Data box(es) only.
NOTE assignment_type equal to 0 (specified in the Level Assignment box 'leva') can be used, for example, together with the temporal level sample grouping ('tele') when frames of a video bitstream are temporally ordered within subsegments; assignment_type equal to 2 can be used, for example, when each view of a multiview video bitstream is contained in a separate track and the track fragments for all the views are contained in a single movie fragment. assignment_type equal to 3 may be used, for example, when audio and video movie fragments (including the respective Media Data boxes) are interleaved. The first level can be specified to contain the audio movie fragments (including the respective Media Data boxes), whereas the second level can be specified to contain both audio and video movie fragments (including all Media Data boxes).
8.16.4.2 Syntax
aligned(8) class SubsegmentIndexBox extends FullBox('ssix', 0, 0) {
   unsigned int(32) subsegment_count;
   for (i=1; i <= subsegment_count; i++)
   {
      unsigned int(32) range_count;
      for (j=1; j <= range_count; j++) {
         unsigned int(8) level;
         unsigned int(24) range_size;
      }
   }
}
8.16.4.3 Semantics
subsegment_count is a positive integer specifying the number of subsegments for which partial subsegment information is specified in this box. subsegment_count shall be equal to reference_count (i.e., the number of movie fragment references) in the immediately preceding Segment Index box.
range_count specifies the number of partial subsegment levels into which the media data is grouped. This value shall be greater than or equal to 2.
range_size indicates the size of the partial subsegment.
level specifies the level to which this partial subsegment is assigned.
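The syntax and semantics above can be illustrated with a short reader. The following is a minimal sketch, not a normative implementation; it assumes the caller has already located the 'ssix' box and stripped its 8-byte box header, and the function names parse_ssix_payload and byte_ranges_up_to_level are hypothetical. It relies on the rule that every byte of a subsegment is assigned to a level, so that range offsets can be accumulated from the first byte of the subsegment.

```python
import struct


def parse_ssix_payload(payload: bytes):
    """Parse the body of an 'ssix' box (everything after the box header).

    Returns one list per subsegment; each list holds (level, range_size)
    tuples in the order the ranges appear in the subsegment.
    """
    # FullBox fields: 8-bit version and 24-bit flags, both 0 for 'ssix'.
    version_flags, subsegment_count = struct.unpack_from(">II", payload, 0)
    if version_flags != 0:
        raise ValueError("'ssix' is defined with version 0 and flags 0")
    pos = 8
    subsegments = []
    for _ in range(subsegment_count):
        (range_count,) = struct.unpack_from(">I", payload, pos)
        pos += 4
        if range_count < 2:
            raise ValueError("range_count shall be greater than or equal to 2")
        ranges = []
        for _ in range(range_count):
            # 8-bit level followed by 24-bit range_size, packed in 32 bits.
            (word,) = struct.unpack_from(">I", payload, pos)
            pos += 4
            ranges.append((word >> 24, word & 0xFFFFFF))
        subsegments.append(ranges)
    return subsegments


def byte_ranges_up_to_level(ranges, max_level):
    """Return (offset, size) pairs, relative to the start of the subsegment,
    for every range whose level is not greater than max_level."""
    selected = []
    offset = 0
    for level, size in ranges:
        if level <= max_level:
            selected.append((offset, size))
        offset += size
    return selected
```

A client that only wants level 1 of a subsegment would request the ranges returned by byte_ranges_up_to_level(ranges, 1), offset by the position of the subsegment in the file.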
8.16.5 Producer Reference Time Box
8.16.5.1 Definition
Box Type: 'prft'
Container: File
Mandatory: No
Quantity: Zero or more
The producer reference time box supplies relative wall-clock times at which movie fragments, or files containing movie fragments (such as segments) were produced. When these files are both produced and consumed in real time, this can provide clients with information to enable consumption and production to proceed at equivalent rates, thus avoiding possible buffer overflow or underflow.
This box is related to the next movie fragment box that follows it in bitstream order. It must follow any segment type or segment index box (if any) in the segment, and occur before the following movie fragment box (to which it refers). If a segment file contains any producer reference time boxes, then the first of them shall occur before the first movie fragment box in that segment.
The box contains a time value measured on a clock which increments at the same rate as a UTC-synchronized NTP [RFC 5905] clock, using NTP format. This is associated with a media time for one of the tracks in the movie fragment. That media time should be in the range of times in that track in the associated movie fragment.
Producer reference times should be associated with at most one track.
8.16.5.2 Syntax
aligned(8) class ProducerReferenceTimeBox extends FullBox('prft', version, 0) {
   unsigned int(32) reference_track_ID;
   unsigned int(64) ntp_timestamp;
   if (version==0) {
      unsigned int(32) media_time;
   } else {
      unsigned int(64) media_time;
   }
}
8.16.5.3 Semantics
reference_track_ID provides the track_ID for the reference track.
ntp_timestamp indicates a UTC time in NTP format corresponding to decoding_time.
media_time corresponds to the same time as ntp_timestamp, but in the time units used for the reference track, and is measured on this media clock as the media is produced.
NOTE In most cases this timestamp will not be equal to the timestamp of the first sample of the adjacent segment of the reference track, but it is recommended it be in the range of the segment containing this producer reference time box.
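As an illustration of the fields above, the sketch below reads a 'prft' payload and converts ntp_timestamp to seconds since the Unix epoch. It is a sketch under stated assumptions: the input is the box body after the 8-byte box header, the 64-bit NTP format is taken to be 32 bits of seconds since 1900-01-01 followed by a 32-bit binary fraction (as in RFC 5905), and the function name parse_prft_payload is hypothetical.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_EPOCH_DELTA = 2208988800


def parse_prft_payload(payload: bytes) -> dict:
    """Parse the body of a 'prft' box (everything after the box header)."""
    version = payload[0]  # the 24-bit flags in payload[1:4] are 0
    reference_track_id, ntp_timestamp = struct.unpack_from(">IQ", payload, 4)
    # media_time is 32 bits in version 0 and 64 bits otherwise.
    fmt = ">I" if version == 0 else ">Q"
    (media_time,) = struct.unpack_from(fmt, payload, 16)
    seconds = ntp_timestamp >> 32
    fraction = (ntp_timestamp & 0xFFFFFFFF) / 2**32
    return {
        "version": version,
        "reference_track_ID": reference_track_id,
        "ntp_timestamp": ntp_timestamp,
        "unix_time": seconds - NTP_UNIX_EPOCH_DELTA + fraction,
        "media_time": media_time,
    }
```

Dividing media_time by the timescale of the reference track gives the media position, in seconds, that corresponds to unix_time.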
8.17 Support for Incomplete Tracks
8.17.1 General
This Subclause documents the sample entry formats for tracks that are incomplete. Incomplete tracks may contain samples that are marked empty or not received using the sample format.
Incomplete tracks may result, for example, when subsegments are received partially according to level assignments and padding_flag in the Level Assignment box indicates that the data in a Media Data box that is not received can be replaced by zeros. Consequently, sample data assigned to non-accessed levels is not present, and care should be taken not to attempt to process such samples. However, in partially received subsegments some tracks might remain complete in content while other tracks might be incomplete and only contain data that is included by reference into the complete tracks.
This Subclause specifies support for sample entry formats for incomplete tracks. With this support, readers can detect incomplete tracks from their sample entries and avoid processing such tracks or take the possibility of empty or not received samples into account when processing such tracks.
The support for incomplete tracks is similar to the content protection transformation where sample entries are hidden behind generic sample entries, such as 'encv' and 'enca'. Because the format of a sample entry varies with media-type, a different encapsulating four-character-code is used for incomplete tracks of each media type (audio, video, text etc.). They are:
Stream (Track) Type     Sample-Entry Code
Video                   icpv
Audio                   icpa
Text                    icpt
System                  icps
Hint                    icph
Timed Metadata          icpm
Sample data of incomplete tracks may be included into samples of other tracks by reference, and hence an incomplete track should not be removed as long as any track reference points to it.
NOTE The choice of level by the original recording client may vary over time, and at times represent the complete track. The level is not indicated here, and it is not required that the sample entry change from 'incomplete' to 'complete' when all levels were, in fact, received, for a period. Note also that the 'original format' may have indicated encryption, if partial reception and decryption works for that encryption format.
8.17.2 Transformation
The sample entry for a track that becomes incomplete, e.g. through partial reception, should be modified as follows:
1) The four-character-code of the sample entry, e.g. 'avc1', is replaced by a new sample entry code 'icpv' meaning an incomplete track.
2) A Complete Track Information box is added to the sample description, leaving all other boxes unmodified.
3) The original sample entry type, e.g. 'avc1', is stored within an Original Format box contained in the Complete Track Information box.
After transformation, an example AVC sample entry might look like:
class IncompleteAVCSampleEntry() extends VisualSampleEntry ('icpv') {
   CompleteTrackInfoBox();
   AVCConfigurationBox config;
   MPEG4BitRateBox ();                 // optional
   MPEG4ExtensionDescriptorsBox ();    // optional
}
8.17.3 Complete Track Information Box
8.17.3.1 Definition
Box Types: 'cinf'
Container: Sample Entry for an Incomplete Track
Mandatory: Yes
Quantity: Exactly one
The Complete Track Information Box contains, within the Original Format Box, the sample entry format of the complete track that was transformed to the present incomplete track. It may contain optional boxes, for example including information required to process samples of the present incomplete track. The Complete Track Information Box is a container box. It is mandatory in a sample entry that uses a code indicating an incomplete track.
8.17.3.2 Syntax
aligned(8) class CompleteTrackInfoBox(fmt) extends Box('cinf') {
   OriginalFormatBox(fmt) original_format;
}
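The transformation in 8.17.2 can be sketched on serialized boxes as follows. This is an illustrative sketch only. It assumes that the Original Format box uses the four-character code 'frma' with a single 32-bit data_format field (as defined for protected streams elsewhere in this specification), that box sizes fit in 32 bits, and that the Complete Track Information box is simply appended after the existing child boxes, whereas the example syntax in 8.17.2 lists it first. The helper names box and mark_incomplete are hypothetical.

```python
import struct


def box(fourcc: bytes, payload: bytes) -> bytes:
    """Serialize a plain box: 32-bit size, four-character type, payload."""
    return struct.pack(">I", 8 + len(payload)) + fourcc + payload


def mark_incomplete(sample_entry: bytes, incomplete_code: bytes = b"icpv") -> bytes:
    """Rewrite a serialized sample entry so that it signals an incomplete track.

    incomplete_code should match the media type of the track, e.g. b"icpv"
    for video or b"icpa" for audio.
    """
    size, original_format = struct.unpack_from(">I4s", sample_entry, 0)
    if size != len(sample_entry):
        raise ValueError("expected exactly one sample entry with a 32-bit size")
    # Step 3: keep the original four-character code inside 'cinf'/'frma'.
    cinf = box(b"cinf", box(b"frma", original_format))
    # Steps 1 and 2: new code, existing contents unmodified, 'cinf' added.
    return box(incomplete_code, sample_entry[8:] + cinf)
```

A reader reverses the steps by looking up 'frma' inside 'cinf' to learn which sample entry format the remaining child boxes belong to.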
9 Hint Track Formats
9.1 RTP and SRTP Hint Track Format
9.1.1 Introduction
RTP is the real-time transport protocol defined by the IETF (RFC 3550 and 3551) and is currently defined to be able to carry a limited set of media types (principally audio and video) and codings. The packing of MPEG-4 elementary streams into RTP is under discussion in both bodies. However, it is clear that the way the media is packetized does not differ in kind from the existing techniques used for other codecs in RTP, and supported by this scheme.
In standard RTP, each media stream is sent as a separate RTP stream; multiplexing is achieved by using IP's port-level multiplexing, not by interleaving the data from multiple streams into a single RTP session. However, if MPEG is used, it may be necessary to multiplex several media tracks into one RTP track (e.g. when using MPEG-2 transport in RTP, or FlexMux). Each hint track is therefore tied to a set of media tracks by track references. The hint tracks extract data from their media tracks by indexing through this table. Hint track references to media tracks have the reference type 'hint'.
This design decides the packet size at the time the server hint track is created; therefore, in the declarations for the hint track, we indicate the chosen packet size. This is in the sample-description. Note that it is valid for there to be several RTP hint tracks for each media track, with different packet size choices. Similarly the time-scale for the RTP clock is provided. The timescale of the server hint track is usually chosen to match the timescale of the media tracks, or a suitable value is picked for the server. In some cases, the RTP timescale is different (e.g. 90 kHz for some MPEG payloads), and this permits that variation. Session description (SAP/SDP) information is stored in user-data boxes in the track.
RTP hint tracks do not use the composition time offset table ('ctts'). Instead, the hinting process for server hint tracks establishes the correct transmission order and time-stamps, perhaps using the transmission time offset to set transmission times.
Hinted content may require the use of SRTP for streaming by using the hint track format for SRTP, defined here. SRTP hint tracks are formatted identically to RTP hint tracks, except that:
1) the sample entry name is changed from 'rtp ' to 'srtp' to indicate to the server that SRTP is required;
2) an extra box is added to the sample entry which can be used to instruct the server in the nature of the on-the-fly encryption and integrity protection that must be applied.
9.1.2 Sample Description Format
RTP server hint tracks are hint tracks (media handler 'hint'), with an entry-format in the sample description of 'rtp ':
class RtpHintSampleEntry() extends SampleEntry ('rtp ') {
   uint(16) hinttrackversion = 1;
   uint(16) highestcompatibleversion = 1;
   uint(32) maxpacketsize;
   box additionaldata[];
}
The hinttrackversion is currently 1; the highest compatible version field specifies the oldest version with which this track is backward-compatible.
The maxpacketsize indicates the size of the largest packet that this track will generate.
The additionaldata is a set of boxes, from the following.
class timescaleentry() extends Box('tims') {
   uint(32) timescale;
}
class timeoffset() extends Box('tsro') {
   int(32) offset;
}
class sequenceoffset extends Box('snro') {
   int(32) offset;
}
The timescale entry is required. The other two are optional. The offsets over-ride the default server behaviour, which is to choose a random offset. A value of 0, therefore, will cause the server to apply no offset to the timestamp or sequence number respectively.
An SRTP Hint Sample entry is used when SRTP processing is required.
class SrtpHintSampleEntry() extends SampleEntry ('srtp') {
   uint(16) hinttrackversion = 1;
   uint(16) highestcompatibleversion = 1;
   uint(32) maxpacketsize;
   box additionaldata[];
}
Fields and boxes are defined as for the RtpHintSampleEntry ('rtp ') of the ISO Base Media File Format. However, an SRTPProcessBox shall be included in an SrtpHintSampleEntry as one of the additionaldata boxes.
9.1.2.1 SRTP Process box 'srpp'
Box Type: 'srpp'
Container: SrtpHintSampleEntry
Mandatory: Yes
Quantity: Exactly one
The SRTP Process Box may instruct the server as to which SRTP algorithms should be applied.
aligned(8) class SRTPProcessBox extends FullBox('srpp', version, 0) {
   unsigned int(32) encryption_algorithm_rtp;
   unsigned int(32) encryption_algorithm_rtcp;
   unsigned int(32) integrity_algorithm_rtp;
   unsigned int(32) integrity_algorithm_rtcp;
   SchemeTypeBox scheme_type_box;
   SchemeInformationBox info;
}
The SchemeTypeBox and SchemeInformationBox have the syntax defined above for protected media tracks. They serve to provide the parameters required for applying SRTP. The SchemeTypeBox is used to indicate the necessary key-management and security policy for the stream in extension to the defined algorithmic pointers provided by the SRTPProcessBox. The key-management functionality is also used to establish all the necessary SRTP parameters as listed in section 8.2 of the SRTP specification. The exact definition of protection schemes is out of the scope of the file format.
The algorithms for encryption and integrity protection are defined by SRTP. The following format identifiers are defined here. An entry of four spaces ($20$20$20$20) may be used to indicate that the choice of algorithm for either encryption or integrity protection is decided by a process outside the file format.
Format          Algorithm
$20$20$20$20    The choice of algorithm for either encryption or integrity protection is decided by a process outside the file format.
ACM1            Encryption using AES in Counter Mode with 128-bit key, as defined in Section 4.1.1 of the SRTP specification.
AF81            Encryption using AES in F8-mode with 128-bit key, as defined in Section 4.1.2 of the SRTP specification.
ENUL            Encryption using the NULL-algorithm as defined in Section 4.1.3 of the SRTP specification.
SHM2            Integrity protection using HMAC-SHA-1 with 160-bit key, as defined in Section 4.2.1 of the SRTP specification.
ANUL            Integrity protection not applied to RTP (but still applied to RTCP). Note: this is valid only for integrity_algorithm_rtp.
9.1.3 Sample Format
Each sample in a server hint track will generate one or more RTP packets, whose RTP timestamp is the same as the hint sample time. Therefore, all the packets made by one sample have the same timestamp. However, provision is made to ask the server to 'warp' the actual transmission times, for data-rate smoothing, for example.
Each sample contains two areas: the instructions to compose the packets, and any extra data needed when sending those packets (e.g. an encrypted version of the media data). Note that the size of the sample is known from the sample size table.
aligned(8) class RTPsample {
   unsigned int(16) packetcount;
   unsigned int(16) reserved;
   RTPpacket packets[packetcount];
   byte extradata[];
}
9.1.3.1 Packet Entry format
Each packet in the packet entry table has the following structure:
aligned(8) class RTPpacket {
   int(32) relative_time;
   // the next fields form initialization for the RTP
   // header (16 bits), and the bit positions correspond
   bit(2) RTP_version;
   bit(1) P_bit;
   bit(1) X_bit;
   bit(4) CSRC_count;
   bit(1) M_bit;
   bit(7) payload_type;
   unsigned int(16) RTPsequenceseed;
   unsigned int(13) reserved = 0;
   unsigned int(1) extra_flag;
   unsigned int(1) bframe_flag;
   unsigned int(1) repeat_flag;
   unsigned int(16) entrycount;
   if (extra_flag) {
      uint(32) extra_information_length;
      box extra_data_tlv[];
   }
   dataentry constructors[entrycount];
}
The semantics of the fields for RTP server hint tracks is specified below. RTP reception hint tracks use the same packet structure. The semantics of the fields when the packet structure is used in an RTP reception hint track is specified in subclause 9.4.1.4.
In server hint tracks, the relative_time field 'warps' the actual transmission time away from the sample time. This allows traffic smoothing.
The following 2 bytes exactly overlay the RTP header; they assist the server in making the RTP header (the server fills in the remaining fields). Within these 2 bytes, the fields RTP_version and CSRC_count are reserved in server (transmission) hint tracks and the server fills in these fields.
The sequence seed is the basis for the RTP sequence number. If a hint track causes multiple copies of the same RTP packet to be sent, then the seed value would be the same for them all. The server normally adds a random offset to this value (but see above, under 'sequenceoffset').
extra_flag equal to 1 indicates that there is extra information before the constructors, in the form of type-length-value sets.
extra_information_length indicates the length in bytes of all extra information before the constructors, which includes the four bytes of the extra_information_length field. The subsequent boxes before the constructors, referred to as the TLV boxes, are aligned on 32-bit boundaries. The box size of any TLV box indicates the actual bytes used, not the length required for padding to 32-bit boundaries. The value of extra_information_length includes the required padding for 32-bit boundaries.
The rtpoffsetTLV ('rtpo') gives a 32-bit signed integer offset to the actual RTP time-stamp to place in the packet. This enables packets to be placed in the hint track in decoding order, but have their presentation time-stamp in the transmitted packet be in a different order. This is necessary for some MPEG payloads.
The bframe_flag indicates a disposable 'b-frame'. The repeat_flag indicates a 'repeat packet', one that is sent as a duplicate of a previous packet. Servers may wish to optimize handling of these packets.
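The packet entry layout described above can be read with a few fixed-width unpacking steps. The sketch below is illustrative rather than normative. It assumes the TLV area is 32-bit aligned relative to the start of the extra information, it only interprets the 'rtpo' TLV, and the function name parse_rtp_packet_entry and the dictionary keys are hypothetical.

```python
import struct


def parse_rtp_packet_entry(buf: bytes, pos: int = 0):
    """Parse one RTPpacket entry; return (fields, position after the entry)."""
    relative_time, hdr, seed, flags, entrycount = struct.unpack_from(
        ">iHHHH", buf, pos)
    pos += 12
    fields = {
        "relative_time": relative_time,
        # The 16 header bits overlay the first two bytes of the RTP header.
        "P_bit": (hdr >> 13) & 1,
        "X_bit": (hdr >> 12) & 1,
        "M_bit": (hdr >> 7) & 1,
        "payload_type": hdr & 0x7F,
        "RTPsequenceseed": seed,
        # 13 reserved bits, then the three one-bit flags.
        "extra_flag": (flags >> 2) & 1,
        "bframe_flag": (flags >> 1) & 1,
        "repeat_flag": flags & 1,
    }
    if fields["extra_flag"]:
        (extra_len,) = struct.unpack_from(">I", buf, pos)
        end = pos + extra_len          # the length includes its own 4 bytes
        tlv = pos + 4
        while tlv + 8 <= end:
            size, kind = struct.unpack_from(">I4s", buf, tlv)
            if size < 8:
                raise ValueError("malformed TLV box")
            if kind == b"rtpo":
                (fields["rtp_offset"],) = struct.unpack_from(">i", buf, tlv + 8)
            tlv += (size + 3) & ~3     # skip padding to the next 32-bit boundary
        pos = end
    # Every constructor occupies exactly 16 bytes.
    fields["constructors"] = [
        bytes(buf[pos + 16 * i: pos + 16 * (i + 1)]) for i in range(entrycount)
    ]
    return fields, pos + 16 * entrycount
```

Returning the end position lets a caller iterate over the packetcount entries of an RTPsample; whatever remains after the last entry is the extradata area.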
9.1.3.2 Constructor format
There are various forms of the constructor. Each constructor is 16 bytes, to make iteration easier. The first byte is a union discriminator:
aligned(8) class RTPconstructor(type) {
   unsigned int(8) constructor_type = type;
}
aligned(8) class RTPnoopconstructor
   extends RTPconstructor(0)
{
   uint(8) pad[15];
}
aligned(8) class RTPimmediateconstructor
   extends RTPconstructor(1)
{
   unsigned int(8) count;
   unsigned int(8) data[count];
   unsigned int(8) pad[14 - count];
}
aligned(8) class RTPsampleconstructor
   extends RTPconstructor(2)
{
   signed int(8) trackrefindex;
   unsigned int(16) length;
   unsigned int(32) samplenumber;
   unsigned int(32) sampleoffset;
   unsigned int(16) bytesperblock = 1;
   unsigned int(16) samplesperblock = 1;
}
aligned(8) class RTPsampledescriptionconstructor
   extends RTPconstructor(3)
{
   signed int(8) trackrefindex;
   unsigned int(16) length;
   unsigned int(32) sampledescriptionindex;
   unsigned int(32) sampledescriptionoffset;
   unsigned int(32) reserved;
}
The immediate mode permits the insertion of payload-specific headers (e.g. the RTP H.261 header). For hint tracks where the media is sent 'in the clear', the sample entry then specifies the bytes to copy from the media track, by giving the sample number, data offset, and length to copy. The track reference may index into the table of track references (a strictly positive value), name the hint track itself (-1), or the only associated media track (0). (The value zero is therefore equivalent to the value 1.)
The bytesperblock and samplesperblock concern compressed audio, using a scheme prior to MP4, in which the audio framing was not evident in the file. These fields have the fixed values of 1 for MP4 files.
The sampledescription mode allows sending of sample descriptions (which would contain elementary stream descriptors), by reference, as part of an RTP packet. The index is the index of a SampleEntry in a Sample Description Box, and the offset is relative to the beginning of that SampleEntry.
For complex cases (e.g. encryption or forward error correction), the transformed data would be placed into the hint samples, in the extradata field, and then sample mode referencing the hint track itself would be used.
Notice that there is no requirement that successive packets transmit successive bytes from the media stream. For example, to conform with RTP-standard packing of H.261, it is sometimes required that a byte be sent at the end of one packet and also at the beginning of the next (when a macroblock boundary falls within a byte).
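Because every constructor is exactly 16 bytes and starts with a discriminator byte, a reader can dispatch on that byte. The following sketch decodes the four constructor forms defined above; the function name parse_rtp_constructor and the returned dictionary layout are hypothetical choices made for the illustration.

```python
import struct


def parse_rtp_constructor(c: bytes) -> dict:
    """Decode one 16-byte RTP constructor into a dictionary."""
    if len(c) != 16:
        raise ValueError("every constructor is exactly 16 bytes")
    kind = c[0]  # union discriminator
    if kind == 0:
        return {"type": "noop"}
    if kind == 1:
        count = c[1]
        if count > 14:
            raise ValueError("immediate data is limited to 14 bytes")
        return {"type": "immediate", "data": c[2:2 + count]}
    if kind == 2:
        ref, length, number, offset, bpb, spb = struct.unpack_from(">bHIIHH", c, 1)
        return {"type": "sample", "trackrefindex": ref, "length": length,
                "samplenumber": number, "sampleoffset": offset,
                "bytesperblock": bpb, "samplesperblock": spb}
    if kind == 3:
        ref, length, index, offset, _reserved = struct.unpack_from(">bHIII", c, 1)
        return {"type": "sampledescription", "trackrefindex": ref,
                "length": length, "sampledescriptionindex": index,
                "sampledescriptionoffset": offset}
    raise ValueError("unknown constructor type %d" % kind)
```

A server building a packet would then copy the immediate bytes directly, or resolve trackrefindex (-1 for the hint track itself, 0 or 1 for the only or first referenced media track) and copy length bytes starting at the given offset.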
9.1.4 SDP Information
Streaming servers using RTSP and SDP usually use SDP as the description format; and there are necessary relationships between the SDP information, and the RTP streams, such as the mapping of payload IDs to MIME names. Provision is therefore made for the hinter to leave fragments of SDP information in the file, to assist the server in forming a full SDP description. Note that there are required SDP entries, which the server should also generate. The information here is only partial.
SDP information is formatted as a set of boxes within user-data boxes, at both the movie and the track level. The text in the movie-level SDP box should be placed before any media-specific lines (before the first 'm=' in the SDP file).
9.1.4.1 Movie SDP information
At the movie level, within the user-data ('udta') box, a hint information container box may occur:
aligned(8) class moviehintinformation extends box('hnti') {
}
aligned(8) class rtpmoviehintinformation extends box('rtp ') {
   uint(32) descriptionformat = 'sdp ';
   char sdptext[];
}
The hint information box may contain information for multiple protocols; only RTP is defined here. The RTP box may contain information for various description formats; only SDP is defined here. The sdptext is correctly formatted as a series of lines, each terminated by <crlf>, as required by SDP.
9.1.4.2 Track SDP Information
At the track level, the structure is similar; however, we already know that this track is an RTP hint track, from the sample description. Therefore the child box merely specifies the description format.
aligned(8) class trackhintinformation extends box('hnti') {
}
aligned(8) class rtptracksdphintinformation extends box('sdp ') {
   char sdptext[];
}
The sdptext is correctly formatted as a series of lines, each terminated by <crlf>, as required by SDP.
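As an illustration of the track-level structure, the sketch below serializes an 'hnti' box holding an 'sdp ' box whose text is a series of CRLF-terminated lines. It assumes 32-bit box sizes and ASCII text without a terminating null; the helper names and the SDP lines used with it are hypothetical and not taken from this specification.

```python
import struct


def box(fourcc: bytes, payload: bytes) -> bytes:
    """Serialize a plain box: 32-bit size, four-character type, payload."""
    return struct.pack(">I", 8 + len(payload)) + fourcc + payload


def track_sdp_hint_box(sdp_lines) -> bytes:
    """Build the track-level 'hnti' box that carries an 'sdp ' child box."""
    # Each line is terminated by exactly one CRLF, as SDP requires.
    sdptext = "".join(line.rstrip("\r\n") + "\r\n" for line in sdp_lines)
    return box(b"hnti", box(b"sdp ", sdptext.encode("ascii")))
```

The resulting bytes would be placed inside the user-data box of the hint track; a server reads them back and merges the lines into the media section it generates for that track.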
9.1.5 Statistical Information
In addition to the statistics in the hint media header, the hinter may place extra data in a hint statistics box, in the track user-data box. This is a container box with a variety of sub-boxes that it may contain.
aligned(8) class hintstatisticsbox extends box('hinf') {
}

aligned(8) class hintBytesSent extends box('trpy') {
   uint(64) bytessent; }     // total bytes sent, including 12-byte RTP headers
aligned(8) class hintPacketsSent extends box('nump') {
   uint(64) packetssent; }   // total packets sent
aligned(8) class hintBytesSent extends box('tpyl') {
   uint(64) bytessent; }     // total bytes sent, not including RTP headers

aligned(8) class hintBytesSent extends box('totl') {
   uint(32) bytessent; }     // total bytes sent, including 12-byte RTP headers
aligned(8) class hintPacketsSent extends box('npck') {
   uint(32) packetssent; }   // total packets sent
aligned(8) class hintBytesSent extends box('tpay') {
   uint(32) bytessent; }     // total bytes sent, not including RTP headers

aligned(8) class hintmaxrate extends box('maxr') {  // maximum data rate
   uint(32) period;          // in milliseconds
   uint(32) bytes; }         // max bytes sent in any period 'period' long
                             // including RTP headers

aligned(8) class hintmediaBytesSent extends box('dmed') {
   uint(64) bytessent; }     // total bytes sent from media tracks
aligned(8) class hintimmediateBytesSent extends box('dimm') {
   uint(64) bytessent; }     // total bytes sent immediate mode
aligned(8) class hintrepeatedBytesSent extends box('drep') {
   uint(64) bytessent; }     // total bytes in repeated packets

aligned(8) class hintminrelativetime extends box('tmin') {
   int(32) time; }           // smallest relative transmission time, milliseconds
aligned(8) class hintmaxrelativetime extends box('tmax') {
   int(32) time; }           // largest relative transmission time, milliseconds

aligned(8) class hintlargestpacket extends box('pmax') {
   uint(32) bytes; }         // largest packet sent, including RTP header
aligned(8) class hintlongestpacket extends box('dmax') {
   uint(32) time; }          // longest packet duration, milliseconds

aligned(8) class hintpayloadID extends box('payt') {
   uint(32) payloadID;       // payload ID used in RTP packets
   uint(8) count;
   char rtpmap_string[count]; }
NOTE Not all these sub-boxes may be present, and there may be multiple 'maxr' boxes, covering different periods.
9.2 ALC/LCT and FLUTE Hint Track Format
9.2.1 Introduction
The file format supports multicast/broadcast delivery of files with FEC protection. Files to be delivered are stored as items in a container file (defined by the file format) and the meta box is amended with information on how the files are partitioned into source symbols. For each source block of a FEC encoding, additional parity symbols can be pre-computed and stored as FEC reservoir items. The partitioning depends on the FEC scheme, the target packet size, and the desired FEC overhead. Pre-composed source symbols can be stored as File reservoir items to minimize duplicate information in the container file especially with MBMS-FEC. The actual transmission is governed by hint tracks that contain server instructions that facilitate the encapsulation of source and FEC symbols into packets.
FD hint tracks have been designed for the ALC/LCT (Asynchronous Layered Coding/Layered Coding Transport) and FLUTE (File Delivery over Unidirectional Transport) protocols. LCT provides transport level support for reliable content delivery and stream delivery protocols. ALC is a protocol instantiation of the LCT building block, and it serves as a base protocol for massively scalable reliable multicast distribution of arbitrary binary objects. FLUTE builds on top of ALC/LCT and defines a protocol for unidirectional delivery of files.
FLUTE defines a File Delivery Table (FDT), which carries metadata associated with the files delivered in the ALC/LCT session, and provides mechanisms for in-band delivery and updates of FDT. In contrast, ALC/LCT relies on other means for out-of-band delivery of file metadata, e.g., an electronic service guide that is normally delivered to clients well in advance of the ALC/LCT session combined with update fragments that can be sent during the ALC/LCT session.
File partitionings and FEC reservoirs can be used independently of FD hint tracks and vice versa. The former aid the design of hint tracks and allow alternative hint tracks, e.g., with different FEC overheads, to re-use the same FEC symbols. They also provide means to access source symbols and additional FEC symbols independently for post-delivery repair, which may be performed over ALC/LCT or FLUTE or out-of-band via another protocol. In order to reduce complexity when a server follows hint track instructions, hint tracks refer directly to data ranges of items or data copied into hint samples.
It is recommended that a server sends a different set of FEC symbols for each retransmission of a file.
The syntax for using the meta box as a container file for source files is defined in 8.10.4, partitions, file and FEC reservoirs are defined in 8.13, while the syntax for FD hint tracks is defined in 9.2.
9.2.2 Design principles
The support for file delivery is designed to optimize the server transmission process by enabling ALC/LCT or FLUTE servers to follow simple instructions. It is enough to follow one pre-defined sequence of instructions per channel in order to transmit one session. The file format enables storage of pre-computed source blocks and symbol partitionings, i.e., files may be partitioned into symbols which fit an intended packet size, and pre-computing a certain amount of FEC-symbols that also can be used for post-session repair. The file format also allows storage of alternative ALC/LCT or FLUTE transmission session instructions that may lead to equivalent end results. Such alternatives may be intended for different channel conditions because of higher FEC protection or even by using different error correction schemes. Alternative sessions can refer to a common set of symbols. The hint tracks are flexible and can be used to compose FDT fragments and interleaving of such fragments within the actual object transmission. Several hint tracks can be combined into one or more sessions involving simultaneous transmission over multiple channels.
It is important to distinguish between the definition of sessions for transmission and the scheduling of such sessions. ALC/LCT and FLUTE server files only address optimization of the server transmission process. In order to ensure maximal usage and flexibility of such pre-defined sessions, all details regarding scheduling, addresses, etc. are kept outside the definition of the file format. External scheduling applications decide such details, which are not important for optimizing transmission sessions per se. In particular, the following information is out-of-scope of the file format: time scheduling, target addresses and ports, source addresses and ports, and so-called Transmission Session Identifiers (TSI).
The sample numbers associated with the samples of a file delivery hint track provide a numbered sequence. Hint track sample times provide send times of ALC/LCT or FLUTE packets for a default bitrate. Depending on the actual transmission bitrate, an ALC/LCT or FLUTE server may apply linear time scaling. Sample times may simplify the scheduling process, but it is up to the server to send ALC/LCT or FLUTE packets in a timely manner.
A schematic picture of a file containing three alternative hint tracks with different FEC overhead for a source file is provided in Figure 4. In this example, each source block consists of only one sub-block.
[Figure: storage format of a single file. The file item holds the source symbols Src Sym [0-5119] and Src Sym [5120-10240]; the FEC reservoir items hold FEC for Src Block #1 and FEC for Src Block #2. Track #1 (10 % FEC) uses FEC Sym #1 [0-511] and FEC Sym #2 [0-511]; track #2 (~12 % FEC) uses FEC Sym #1 [0-614] and FEC Sym #2 [0-614]; track #3 (14 % FEC) uses FEC Sym #1 [0-716] and FEC Sym #2 [0-716].]
Figure 4: Different FEC overheads of a source file provided by alternative hint tracks.
The source file in the above figure is partitioned into 2 source blocks containing symbols of a fixed size. FEC redundancy symbols are calculated for both source blocks and stored as FEC reservoir items. As the hint tracks reference the same items in the file there is no duplication of information. The original source symbols and FEC reservoirs can also be used by repair servers that don't use hint tracks.
9.2.3 Sample Description Format
9.2.3.1 Definition
FD hint tracks are tracks with handler_type 'hint' and with the entry-format 'fdp ' in the sample description box. The FD hint sample entry is contained in the sample description box ('stsd').
9.2.3.2 Syntax
class FDHintSampleEntry() extends SampleEntry ('fdp ') {
   unsigned int(16) hinttrackversion = 1;
   unsigned int(16) highestcompatibleversion = 1;
   unsigned int(16) partition_entry_ID;
   unsigned int(16) FEC_overhead;
   Box additionaldata[];     //optional
}
9.2.3.3 Semantics
partition_entry_ID indicates the partition entry in the FD item information box. A zero value indicates that no partition entry is associated with this sample entry, e.g., for FDT. If the corresponding FD hint track contains only overhead data this value should indicate the partition entry whose overhead data is in question.
FEC_overhead is a fixed 8.8 value indicating the percentage protection overhead used by the hint sample(s). The intention of providing this value is to provide characteristics to help a server select a session group (and corresponding FD hint tracks). If the corresponding FD hint track contains only overhead data this value should indicate the protection overhead achieved by using all FD hint tracks in a session group up to the FD hint track in question.
The hinttrackversion and highestcompatibleversion fields have the same interpretation as in the RTP hint sample entry described in 9.1.2. As additional data a timescale entry box may be provided. If not provided, there is no indication given on timing of packets.
File entries needed for an FDT or an electronic service guide can be created by observing all sample entries of a hint track and the corresponding item information boxes of the items referenced by the above partition entry IDs. No sample entries shall be included in the hint track if they are not referenced by any sample.
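The FEC_overhead field above is stored as a 16-bit fixed-point 8.8 number, i.e. the upper 8 bits carry the integer part and the lower 8 bits the fraction in units of 1/256. The conversion can be sketched as follows; the function names are hypothetical.

```python
def decode_fixed_8_8(raw: int) -> float:
    """Convert a 16-bit 8.8 fixed-point value to a float percentage."""
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("8.8 fixed-point values occupy 16 bits")
    return raw / 256.0


def encode_fixed_8_8(percent: float) -> int:
    """Convert a percentage to the nearest 8.8 fixed-point value."""
    raw = round(percent * 256)
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("value does not fit in 8.8 fixed point")
    return raw
```

For example, a stored value of 0x0A00 corresponds to a 10 % overhead and 0x0C80 to 12.5 %.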
9.2.4 Sample Format
9.2.4.1 Sample Container
Each FD sample in the hint track will generate one or more FD packets.
Each sample contains two areas: the instructions to compose the packets, and any extra data needed when sending those packets (e.g., encoding symbols that are copied into the sample instead of residing in items for source files or FEC). Note that the size of the sample is known from the sample size table.
aligned(8) class FDsample extends Box('fdsa') {
   FDPacketBox packetbox[];
   ExtraDataBox extradata;   //optional
}
Sample numbers of FD samples define the order they shall be processed by the server. Likewise, FD packet boxes in each FD sample should appear in the order they shall be processed. If the timescale entry box is present in the FD hint sample entry, then sample times are defined and provide relative send times of packets for a default bitrate. Depending on the actual transmission bitrate, a server may apply linear time scaling. Sample times may simplify the scheduling process, but it is up to the server to send packets in a timely manner.
9.2.4.2 Packet Entry Format
Each packet in the FD sample has the following structure (References: RFC 3926, 3450, 3451):
aligned(8) class FDpacketBox extends Box('fdpa') {
   LCTheaderTemplate LCT_header_info;
   unsigned int(16) entrycount1;
   LCTheaderExtension header_extension_constructors[ entrycount1 ];
   unsigned int(16) entrycount2;
   dataentry packet_constructors[ entrycount2 ];
}
The LCT header info contains LCT header templates for the current FD packet. Header extension constructors are structures which are used for constructing the LCT header extensions. Packet constructors are used for constructing the FEC payload ID and the source symbols in an FD packet.
9.2.4.3 LCT Header Template Format
The LCT header template is defined as follows:
aligned(8) class LCTheaderTemplate {
   unsigned int(1)  sender_current_time_present;
   unsigned int(1)  expected_residual_time_present;
   unsigned int(1)  session_close_bit;
   unsigned int(1)  object_close_bit;
   unsigned int(4)  reserved;
   unsigned int(16) transport_object_identifier;
}
It can be used by a server to form an LCT header for a packet. Note that some parts of the header depend on the server policy and are not included in the template. Some field lengths also depend on the LCT header bits assigned by the server. The server may also need to change the value of the Transport Object Identifier (TOI).
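As an illustration only, the three-byte template above could be unpacked as follows. This is a minimal sketch that assumes the usual most-significant-bit-first packing of the syntax description; the function and tuple names are hypothetical and not part of this specification.

```python
import struct
from typing import NamedTuple


class LCTHeaderTemplate(NamedTuple):
    """Fields of LCTheaderTemplate (hypothetical container type)."""
    sender_current_time_present: bool
    expected_residual_time_present: bool
    session_close_bit: bool
    object_close_bit: bool
    transport_object_identifier: int


def parse_lct_header_template(buf: bytes) -> LCTHeaderTemplate:
    """Unpack one flags byte and a 16-bit TOI, both big-endian."""
    if len(buf) < 3:
        raise ValueError("LCT header template needs 3 bytes")
    flags, toi = struct.unpack(">BH", buf[:3])
    # The four flags occupy the top four bits; the low four bits are reserved.
    return LCTHeaderTemplate(
        bool(flags & 0x80),
        bool(flags & 0x40),
        bool(flags & 0x20),
        bool(flags & 0x10),
        toi,
    )
```

A server would typically treat the returned TOI as a default, since the text above allows the server to change it.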
9.2.4.4 LCT Header Extension Constructor Format
The LCT header extension constructor format is defined as follows:
aligned(8) class LCTheaderextension {
   unsigned int(8) header_extension_type;
   if (header_extension_type > 127) {
      unsigned int(8) content[3];
   }
   else {
      unsigned int(8) length;
      if (length > 0) {
         unsigned int(8) content[(length*4) - 2];
      }
   }
}
A positive value of the length field specifies the length of the constructor content in multiples of 32-bit words. A zero value means that the header is generated by the server.
The usage and rules for LCT header extensions are defined in RFC 3451 (LCT RFC). The header_extension_type contains the LCT Header Extension Type (HET) value.
HET values between 0 and 127 are used for variable-length (multiple 32-bit word) extensions. HET values between 128 and 255 are used for fixed-length (one 32-bit word) extensions. If the header_extension_type is smaller than 128, then the length field corresponds to the LCT Header Extension Length (HEL) as defined in RFC 3451. The content field always corresponds to the Header Extension Content (HEC).
NOTE A server can identify packets including FDT by observing whether EXT_FDT (header_extension_type == 192) is present.
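The length rules for header extension constructors can be sketched as below. This is an illustrative reading of the syntax, not a normative parser; the function names and the dictionary layout are assumptions made for the example.

```python
EXT_FDT = 192  # HET value named in the NOTE above


def parse_lct_header_extension(buf: bytes, pos: int = 0):
    """Return (extension, next_pos) for one constructor starting at pos."""
    het = buf[pos]
    if het > 127:
        # Fixed length: one 32-bit word, i.e. the HET byte plus 3 content bytes.
        length, start, size = None, pos + 1, 3
    else:
        # Variable length: length counts 32-bit words, including HET and HEL.
        length = buf[pos + 1]
        start = pos + 2
        size = length * 4 - 2 if length > 0 else 0
    end = start + size
    if end > len(buf):
        raise ValueError("truncated header extension constructor")
    return {"het": het, "length": length, "content": buf[start:end]}, end


def includes_fdt(extensions) -> bool:
    """True if any parsed extension is EXT_FDT."""
    return any(ext["het"] == EXT_FDT for ext in extensions)
```

A length of zero yields empty content, which matches the statement that the server generates such a header itself.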
9.2.4.5 Packet Constructor Format
There are various forms of the constructor. Each constructor is 16 bytes in order to make iteration easier. The first byte is a union discriminator. The packet constructors are used to include FEC payload ID as well as source and parity symbols in an FD packet.
aligned(8) class FDconstructor(type) {
   unsigned int(8) constructor_type = type;
}
aligned(8) class FDnoopconstructor extends FDconstructor(0) {
   unsigned int(8) pad[15];
}
aligned(8) class FDimmediateconstructor extends FDconstructor(1) {
   unsigned int(8) count;
   unsigned int(8) data[count];
   unsigned int(8) pad[14 - count];
}
aligned(8) class FDsampleconstructor extends FDconstructor(2) {
   signed int(8)    trackrefindex;
   unsigned int(16) length;
   unsigned int(32) samplenumber;
   unsigned int(32) sampleoffset;
   unsigned int(16) bytesperblock = 1;
   unsigned int(16) samplesperblock = 1;
}
aligned(8) class FDitemconstructor extends FDconstructor(3) {
   unsigned int(16) item_ID;
   unsigned int(16) extent_index;
   unsigned int(64) data_offset;   //offset in byte within extent
   unsigned int(24) data_length;   //non-zero length in byte within extent or
                                   //if (data_length==0) rest of extent
}
aligned(8) class FDitemconstructorLarge extends FDconstructor(5) {
   unsigned int(32) item_ID;
   unsigned int(32) extent_index;
   unsigned int(64) data_offset;   //offset in byte within extent
   unsigned int(24) data_length;   //non-zero length in byte within extent or
                                   //if (data_length==0) rest of extent
}
aligned(8) class FDxmlboxconstructor extends FDconstructor(4) {
   unsigned int(64) data_offset;   //offset in byte within XMLBox or BinaryXMLBox
   unsigned int(32) data_length;
   unsigned int(24) reserved;
}
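Because the first byte acts as a union discriminator, a reader can dispatch on it. The following minimal sketch decodes constructor types 0 to 4 from a 16-byte record, assuming big-endian fields; the function name and dictionary keys are hypothetical. Type 5 is left unsupported because its declared fields, as written above, add up to 20 bytes rather than 16.

```python
import struct

CONSTRUCTOR_SIZE = 16  # stated record size for packet constructors


def parse_fd_constructor(rec: bytes) -> dict:
    """Decode one 16-byte FD packet constructor (types 0 to 4 only)."""
    if len(rec) != CONSTRUCTOR_SIZE:
        raise ValueError("expected a 16-byte constructor record")
    ctype, body = rec[0], rec[1:]
    if ctype == 0:
        return {"type": "noop"}
    if ctype == 1:
        count = body[0]
        if count > 14:
            raise ValueError("immediate data does not fit in the record")
        return {"type": "immediate", "data": body[1:1 + count]}
    if ctype == 2:
        names = ("trackrefindex", "length", "samplenumber",
                 "sampleoffset", "bytesperblock", "samplesperblock")
        # signed 8-bit, 16-bit, two 32-bit and two 16-bit fields: 15 bytes
        return {"type": "sample", **dict(zip(names, struct.unpack(">bHIIHH", body)))}
    if ctype == 3:
        item_id, extent_index, data_offset = struct.unpack(">HHQ", body[:12])
        return {"type": "item", "item_ID": item_id, "extent_index": extent_index,
                "data_offset": data_offset,
                "data_length": int.from_bytes(body[12:15], "big")}
    if ctype == 4:
        data_offset, data_length = struct.unpack(">QI", body[:12])
        return {"type": "xmlbox", "data_offset": data_offset,
                "data_length": data_length}
    raise ValueError(f"unsupported constructor type {ctype}")
```

Iterating over packet_constructors then reduces to slicing the payload in 16-byte steps and calling this function on each slice.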
9.2.4.6 Extra Data Box
Each sample of an FD hint track may include extra data stored in an extra data box:
aligned(8) class ExtraDataBox extends Box(‘extr’) {
   FECInformationBox feci;
   bit(8) extradata[];
}
9.2.4.7 FEC Information Box
9.2.4.7.1 Definition
Box Type: ‘feci’
Container: Extra Data Box (‘extr’)
Mandatory: No
Quantity: Zero or One
The FEC Information box stores FEC encoding ID, FEC instance ID and FEC payload ID which are needed when sending an FD packet.
9.2.4.7.2 Syntax
aligned(8) class FECInformationBox extends Box('feci') {
   unsigned int(8)  FEC_encoding_ID;
   unsigned int(16) FEC_instance_ID;
   unsigned int(16) source_block_number;
   unsigned int(16) encoding_symbol_ID;
}
9.2.4.7.3 Semantics
FEC_encoding_ID identifies the FEC encoding scheme and is subject to IANA registration (see RFC 5052), in which (i) value zero corresponds to the "Compact No-Code FEC scheme" also known as "Null-FEC" (RFC 3695); (ii) value one corresponds to the “MBMS FEC” (3GPP TS 26.346); (iii) for values in the range of 0 to 127, inclusive, the FEC scheme is Fully-Specified, whereas for values in the range of 128 to 255, inclusive, the FEC scheme is Under-Specified.
FEC_instance_ID provides a more specific identification of the FEC encoder being used for an Under-Specified FEC scheme. This value should be set to zero for Fully-Specified FEC schemes and shall be ignored when parsing a file with FEC_encoding_ID in the range of 0 to 127, inclusive. FEC_instance_ID is scoped by the FEC_encoding_ID. See RFC 5052 for further details.
source_block_number identifies from which source block of the object the encoding symbol(s) in the FD packet are generated.
encoding_symbol_ID identifies which specific encoding symbol(s) generated from the source block are carried in the FD packet.
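The field semantics above can be illustrated with a short sketch that reads the seven payload bytes of a ‘feci’ box (after the box header) and applies the rule that FEC_instance_ID is ignored for Fully-Specified schemes. The function name and returned keys are assumptions for the example.

```python
import struct


def parse_fec_information(payload: bytes) -> dict:
    """Decode the FECInformationBox payload (box header already removed)."""
    if len(payload) < 7:
        raise ValueError("FEC information payload needs 7 bytes")
    encoding_id, instance_id, sbn, esi = struct.unpack(">BHHH", payload[:7])
    fully_specified = encoding_id <= 127  # 128 to 255 is Under-Specified
    return {
        "FEC_encoding_ID": encoding_id,
        "fully_specified": fully_specified,
        # The instance ID is ignored when the scheme is Fully-Specified.
        "FEC_instance_ID": None if fully_specified else instance_id,
        "source_block_number": sbn,
        "encoding_symbol_ID": esi,
    }
```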
9.3 MPEG-2 Transport Hint Track Format
9.3.1 Introduction
MPEG-2 TS (Transport Stream) is a stream multiplex which can carry one or more programs, consisting of audio, video and other media. The file format supports the storage of MPEG-2 TS in a hint track. An MPEG-2 TS hint track can be used both for storage of received TS packets (as a reception hint track) and as a server hint track used for the generation of an MPEG-2 TS.
The MPEG-2 TS hint track definition supports so-called “precomputed hints”. Precomputed hints make no use of including data by reference from other tracks, but rather MPEG-2 TS packets are stored as such. This allows reusing the MPEG-2 TS packets stored in a separate file. Furthermore, precomputed hints facilitate simple recording operation.
In addition to precomputed hint samples, it is possible to include media data by reference to media tracks into hint samples. Conversion of a received transport stream to media tracks would allow existing players compliant with earlier versions of the ISO base media file format to process recorded files as long as the media formats are also supported. Storing the original transport headers retains valuable information for error concealment and the reconstruction of the original transport stream.
9.3.2 Design Principles
The design principles of the MPEG-2 TS Hint Track Format are as follows.
A sequence of samples in an MPEG-2 TS Hint Track is a set of precomputed and constructed MPEG-2 TS packets. Precomputed packets are TS packets which are stored unchanged in the case of reception or will be sent as is. This is especially important where data cannot be de-multiplexed and elementary streams cannot be created, e.g. when the transport stream is encrypted and is not allowed to be stored decrypted. Therefore, it is necessary to be able to store the MPEG-2 TS as such in a hint track. Constructed packets use the same approach as RTP hint tracks, i.e., the sample contains instructions for a streaming server to construct the packet. The actual media data is contained in other tracks. A track reference of type ‘hint’ is used.
9.3.2.1 Reusing existing Transport Streams
It was desired to reuse existing TS instances, and therefore an additional mechanism exists to cover a wide variety of existing TS recordings. These recordings may consist not only of TS packets but may also have preceding or trailing data with each TS packet. A specific case for preceding data is a 4-byte timestamp in front of each TS packet to remove the jitter of a transmission system. A specific case for trailing data is the addition of FEC when a TS packet is transmitted over an error-prone channel.
9.3.2.2 Timing
MPEG-2 TS defines a single clock for each program, running at 27 MHz, whose sampled value is transported as PCRs in the TS for clock recovery. The timescale of MPEG-2 TS Hint Tracks is recommended to be 90000, or an integer division or multiple thereof.
The decoding time of a sample in an MPEG-2 TS Hint Track is the reception/transmission time of the first bit of that packet or packet group. It is recommended that this time be derived from the PCR timestamps of the TS: if the PCR times are used, piece-wise linearity can be assumed and the ‘stts’ table compacts sensibly. The optional ‘tsti’ box in the sample description can be used to signal whether reception timing with or without clock recovery was used when the hint track is a reception hint track. In the case of a server hint track, PCR timing is assumed.
NOTE: When there are multiple packets in a sample, they cannot be given independent transmission time offsets.
9.3.2.3 Packet Grouping
The sample format for MPEG-2 Transport Stream Hint Tracks allows multiple TS packets in one sample. Specific applications, such as some IPTV applications, convey TS packets in an RTP stream. Only one reception timestamp can be derived for all TS packets carried in one RTP packet. Another application for storing multiple TS packets in a sample is SPTSs, where a sample contains all the TS packets for a GoP. In this case every sample is a random access point.
Note that random access to every TS packet is not possible by the means of the file format if multiple TS packets per sample are used.
In the case of an MPTS, only one packet per sample should be used. This facilitates the use of the sample group mechanism on a per-packet basis.
9.3.2.4 Random-access points
A sync sample is a point at which processing of a track may begin without error. Both MPTS and SPTS are supported by MPEG-2 TS Hint Tracks; however, a random access point, marked as a sync sample, is normally only defined for SPTS, where it specifies the beginning of a packet that contains the first byte of an independently decodable media access unit (e.g. MPEG-2 video I-frames or MPEG-4 AVC IDR pictures) of a stream that uses differential coding. For MPTS, the sync sample table would normally be present but empty, indicating that there is no point in the track at which processing of the entire track may begin without error. It is recommended that the PSI/SI be in the Sample Description so that true random access with just the media data is possible.
Note that in the case of an MPTS, the sync sample table is present but empty (which means essentially that no sample is a sync sample).
Note also that in the case of an SPTS, samples including multiple TS packets should have a sync point (e.g. GoP boundary) at the start of a sample. The sync sample table then marks the samples that are sync points (e.g. the start of GoPs); if the sync sample table is absent, all the samples are sync points. If the sync sample table is present but empty, the sync sample positions are unknown and may not be at the start of samples.
NOTE: An application searching for a key frame can start reading at that location, but in general it also has to read further MPEG-2 TS packets (with regard to the file format, these are subsequent samples) so that the decoder can decode a complete frame.
9.3.2.5 Application as a Reception Hint Track
Reception hint tracks may be used when one or more packet streams of data are recorded. They indicate the order, reception timing, and contents of the received packets, among other things.
NOTE 1: Players may reproduce the packet stream that was received based on the reception hint tracks and process the reproduced packet stream as if it was newly received.
Reception hint tracks have the same structure as hint tracks for servers.
The format of the reception hint samples is indicated by the sample description for the reception hint track. Each protocol has its own reception hint sample format and name.
NOTE 2: Servers using reception hint tracks as hints for sending of the received streams should handle the potential degradations of the received streams, such as transmission delay jitter and packet losses, gracefully and ensure that the constraints of the protocols and contained data formats are obeyed regardless of the potential degradations of the received streams.
NOTE 3: As with server hint tracks, the sample formats of reception hint tracks may enable construction of packets by pulling data out of other tracks by reference. These other tracks may be hint tracks or media tracks. The exact form of these pointers is defined by the sample format for the protocol, but in general they consist of four pieces of information: a track reference index, a sample number, an offset, and a length. Some of these may be implicit for a particular protocol. These 'pointers' always point to the actual source of the data, i.e., indirect data referencing is disallowed. If a hint track is built 'on top' of another hint track, then the second hint track must have direct references to the media track(s) used by the first where data from those media tracks is placed in the stream.
If received data is extracted to media tracks, the de-hinting process must ensure that the media streams are valid, i.e. the streams must be error-free (which requires e.g. error concealment).
A sample with a size of zero is permitted in reception hint tracks, and such samples may be ignored.
9.3.3 Sample Description Format
9.3.3.1 Introduction
The sample description for an MPEG2-TS reception hint track contains all static metadata that describes the stream or a portion thereof, especially the PSI/SI tables. MPEG-2 TS reception hint tracks use an entry-format in the sample description of 'rm2t' (which indicates MPEG-2 Transport Stream). The entry-format for MPEG2-TS server hint tracks is 'sm2t'.
The static metadata documents e.g. PSI/SI tables. The presence of static metadata is optional. When present, the static metadata shall be valid for the MPEG2-TS packets it describes. Consequently, if a piece of static metadata changes in the stream, a new sample entry is needed for the first sample at or after the change. If static metadata is not present in the sample entry, structures such as PSI/SI tables stored in the MPEG2-TS packets are valid, and the stream must be scanned in order to find out which values of static metadata are valid for a particular sample.
9.3.3.2 Syntax
class MPEG2TSReceptionSampleEntry extends MPEG2TSSampleEntry(‘rm2t’) {
}
class MPEG2TSServerSampleEntry extends MPEG2TSSampleEntry(‘sm2t’) {
}
class MPEG2TSSampleEntry(name) extends HintSampleEntry(name) {
   uint(16) hinttrackversion = 1;
   uint(16) highestcompatibleversion = 1;
   uint(8)  precedingbyteslen;
   uint(8)  trailingbyteslen;
   uint(1)  precomputed_only_flag;
   uint(7)  reserved;
   box      additionaldata[];
}
9.3.3.3 Semantics
hinttrackversion is currently 1; the highestcompatibleversion field specifies the oldest version with which this track is backward-compatible.
precedingbyteslen indicates the number of bytes that precede each MPEG2-TS packet (which may e.g. be a time-code from an external recording device).
trailingbyteslen indicates the number of bytes that are at the end of each MPEG2-TS packet (which may e.g. contain checksums or other data that was added by a recording device).
precomputed_only_flag, if set to 1, indicates that the associated samples are purely precomputed.
additionaldata is a set of boxes. This set can contain boxes that describe one common version of the PSI/SI tables by means of the 'tPAT' box or the 'tPMT' box, or other data, e.g. boxes that are only valid for a sample (which contains multiple packets) and describe the initial conditions of the STC, or boxes that define the content of the preceding or trailing data. There shall be at most one of each of PATBox, TSTimingBox, InitialSampleTimeBox present within additionaldata.
The following optional boxes for additionaldata are defined:
aligned(8) class PATBox() extends Box(‘tPAT’) {
   uint(3)  reserved;
   uint(13) PID;
   uint(8)  sectiondata[];
}
aligned(8) class PMTBox() extends Box(‘tPMT’) {
   uint(3)  reserved;
   uint(13) PID;
   uint(8)  sectiondata[];
}
aligned(8) class ODBox () extends Box (‘tOD ’) {
   uint(3)  reserved;
   uint(13) PID;
   uint(8)  sectiondata[];
}
aligned(8) class TSTimingBox() extends Box(‘tsti’) {
   uint(1)  timing_derivation_method;
   uint(2)  reserved;
   uint(13) PID;
}
aligned(8) class InitialSampleTimeBox() extends Box(‘istm’) {
   uint(32) initialsampletime;
   uint(32) reserved;
}
The 'tPAT' box contains the section data of the PAT, and each 'tPMT' box contains the section data of one of the PMTs.
In the case of an SPTS, it is strongly recommended that the 'tPMT' box is present in the additionaldata. If the PMT is not present in the sample data, then it shall be present in the additionaldata. If the 'tPMT' box is present, it shall be the PMT for the program contained in the sample data (although the recorded stream may contain other programs and be an MPTS).
PID is the PID of the MPEG2-TS packets from which the data was extracted. In the case of the 'tPAT' box this value is always 0.
sectiondata extends to the end of the box and is the complete MPEG2-TS table, containing the concatenated sections, of an identical version number.
initialsampletime specifies the initial value of the sample times in case the sample times do not start from 0. Unlike media tracks, MPEG-2 TS hint tracks usually have sample times not starting from 0, e.g., PCR times and reception times. Since ‘stts’ only stores the delta between sample times, this field is required for reconstructing the original sample times:
OriginalSampleTime(n) = initialsampletime + STTS(n).
In case PCR times are used for sample times, the reconstructed sample time can be used to initialize the STC when the sample is randomly accessed. Note that this field may need to be updated after editing.
timing_derivation_method is a flag which specifies the method which was used to set the sample time for a given PID. The values for timing_derivation_method are as follows:
0x0 reception time: the sample timing is derived from the reception time. It is not guaranteed that the STC was recovered for derivation of the reception time.
0x1 piecewise linearity between PCRs: the sample time is derived from a reconstructed STC for this program. Piecewise linearity between adjacent PCRs is assumed and all TS packets in the samples have a constant duration in this range.
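The two timing-related boxes can be illustrated together. The sketch below reads the 16-bit payload of a ‘tsti’ box and reconstructs original sample times from initialsampletime. It assumes that STTS(n) denotes the decoding time obtained by accumulating the ‘stts’ deltas of all earlier samples, starting from 0; the function names and the flat list of per-sample deltas are hypothetical.

```python
from itertools import accumulate


def parse_ts_timing_box(payload: bytes) -> dict:
    """Decode TSTimingBox: 1-bit method, 2 reserved bits, 13-bit PID."""
    if len(payload) < 2:
        raise ValueError("'tsti' payload needs 2 bytes")
    word = int.from_bytes(payload[:2], "big")
    return {"timing_derivation_method": word >> 15, "PID": word & 0x1FFF}


def original_sample_times(initialsampletime: int, sample_deltas: list) -> list:
    """Return initialsampletime + STTS(n) for every sample n.

    sample_deltas holds one 'stts' delta per sample, already expanded
    from the run-length form used in the box.
    """
    if not sample_deltas:
        return []
    # Decoding times are the running sum of the deltas of earlier samples.
    decoding_times = [0, *accumulate(sample_deltas[:-1])]
    return [initialsampletime + dt for dt in decoding_times]
```

For example, with initialsampletime 1000 and deltas 10, 10 and 20, the reconstructed times are 1000, 1010 and 1020.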
9.3.4 Sample Format
Each sample of an MPEG-2 TS Hint track consists of a set of
- pre-computed packets: one or more MPEG-2 TS packets with the associated headers and trailers
- constructed packets: instructions to compose one or more MPEG2-TS packets with the associated headers and trailers by pointing to data of another track.
Note that each MPEG-2 TS packet in the sample may be preceded by a preheader (precedingbytes), or followed by a posttrailer (trailingbytes), as detailed in the Sample Description Format. The sizes of the preheader and the posttrailer are specified by precedingbyteslen and trailingbyteslen, respectively, in the sample description to allow compact sample tables with fewer chunks.
It is possible for a mixture of precomputed and constructed samples to occur in the same track. If padding of the transport stream packet is required, this can be accomplished with the adaptation_field or explicitly by using the MPEG2TSImmediateConstructor as appropriate.
NOTE 1: The number of MPEG-2 TS packets in the sample can be derived from the sample size table directly if the sample consists of pre-computed packets only, which can be concluded if the precomputed_only_flag in the sample entry is set. The number of MPEG-2 TS packets in the sample may be variable or restricted, e.g. extensions of this file format may define a sample to contain exactly one packet.
NOTE 2: It is possible to compact common sequences of bytes in transport packets by including those bytes in one or more packets directly, for example in their precedingbytes or trailingbytes section, and then using the MPEG2TSSampleConstructor in other places to refer to them; this is especially relevant for runs of 0xFF bytes.
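The derivation mentioned in NOTE 1 can be sketched as simple arithmetic. The example assumes that every packet in a precomputed-only sample is stored in full with sync byte 0x47 (188 bytes) plus its preceding and trailing bytes, and that no removed null packets are represented; the function name is hypothetical.

```python
TS_PACKET_SIZE = 188  # sync byte plus 187 remaining bytes


def precomputed_packet_count(sample_size: int,
                             precedingbyteslen: int,
                             trailingbyteslen: int) -> int:
    """Number of TS packets in a sample made only of precomputed packets."""
    unit = precedingbyteslen + TS_PACKET_SIZE + trailingbyteslen
    count, remainder = divmod(sample_size, unit)
    if remainder:
        raise ValueError("sample size is not a whole number of packets")
    return count
```

A recording with a 4-byte timestamp in front of each packet and a sample size of 1344 bytes would therefore hold 7 packets.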
9.3.4.1 Syntax
// Constructor format
aligned(8) abstract class MPEG2TSConstructor (uint(8) type) {
   uint(8) constructor_type = type;
}
aligned(8) class MPEG2TSImmediateConstructor extends MPEG2TSConstructor(1) {
   uint(8) immediatedatalen;
   uint(8) data[immediatedatalen];
}
aligned(8) class MPEG2TSSampleConstructor extends MPEG2TSConstructor(2) {
   uint(8)  sampledatalen;
   uint(16) trackrefindex;
   uint(32) samplenumber;
   uint(32) sampleoffset;
}
// Packet format
aligned(8) class MPEG2TSPacketRepresentation {
   uint(8) precedingbytes[precedingbyteslen];
   uint(8) sync_byte;
   if (sync_byte == 0x47) {
      uint(8) packet[187];
   } else if (sync_byte == 0x00 || sync_byte == 0x01) {
      uint(8) headerdatalen;
      uint(4) reserved;
      uint(4) num_constructors;
      bit(1)  transport_error_indicator;
      bit(1)  payload_unit_start_indicator;
      bit(1)  transport_priority;
      bit(13) PID;
      bit(2)  transport_scrambling_control;
      bit(2)  adaptation_field_control;
      bit(4)  continuity_counter;
      if (sync_byte == 0x00 &&
          (adaptation_field_control == '10' || adaptation_field_control == '11')) {
         uint(8) adaptation_field[headerdatalen-3];
      }
      MPEG2TSConstructor constructors[num_constructors];
   } else if (sync_byte == 0xFF) {
      // implicit null packet that has been removed
   }
   uint(8) trailingbytes[trailingbyteslen];
}
// Sample format
aligned(8) class MPEG2TSSample {
   MPEG2TSPacketRepresentation sample[];
}
9.3.4.2 Semantics
precedingbytes contains any extra data preceding the packet, typically provided by the recording device. For example, this may include a timestamp.
sync_byte: if this value is 0x47, then the packet representation contains a transport stream packet (a precomputed reception hint track sample), with the remaining bytes following in the field packet. The values 0x00 and 0x01 are used for constructed packet representation(s). If MPEG2TSSampleConstructor is used to construct packet representation(s), it points to a track indexed by trackrefindex in the track reference box with reference type 'hint'. If this value is 0xFF, it implies that a null packet has been removed at this position. All other values are currently reserved.
trackrefindex indexes in the track reference box with reference type 'hint' to indicate with which media track the current sample is associated. The samplenumber and sampleoffset fields in the MPEG2TSSampleConstructor point into this media track. The trackrefindex starts from value 1. The value 0 is reserved for future use.
packet: The MPEG-2 TS packet, apart from the sync byte (0x47).
The MPEG2TSConstructor array is a collection of one or more constructor entries, to allow for multiple access units in one transport stream packet. An MPEG2TSImmediateConstructor can contain, amongst others, the PES header. An MPEG2TSSampleConstructor references data in the associated media track. The sum of headerdatalen and the datalen fields of all constructors of an MPEG2TSPacket must be equal to the length of the transport stream packet being constructed, minus 1 byte, which is 187.
trailingbytes contains any extra data following the packet. For example, this may include a checksum.
samplenumber indicates the sample within the referred track contained in the packet, and sampleoffset indicates the starting byte position of the referred media sample contained in the packet, of which sampledatalen bytes are included. sampleoffset starts from value 0.
immediatedatalen indicates the number of bytes within the field data that are included in the sample rather than data being included into the sample by reference to a media track.
headerdatalen indicates the length of the TS packet header (without the sync byte) in bytes. This field has the value 3 if the adaptation_field is not present or the value (adaptation_field_length+3), where adaptation_field_length is the first octet of the structure adaptation_field as defined in ISO/IEC 13818-1.
Neither the format of precedingbytes nor that of trailingbytes is defined by this specification.
The remaining fields (transport_error_indicator, payload_unit_start_indicator, transport_priority, PID, transport_scrambling_control, adaptation_field_control, continuity_counter, adaptation_field) of the sample structure contain a copy of the packet header of the TS packet, as defined in ISO/IEC 13818-1.
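To make the packet representation concrete, the following sketch walks one MPEG2TSPacketRepresentation as the syntax is written above, including the rule that headerdatalen plus all constructor data lengths must equal 187. It is an illustrative reading under the assumption of big-endian, most-significant-bit-first fields; the function name, tuple layouts and dictionary keys are hypothetical.

```python
import struct


def parse_packet_representation(buf: bytes, pos: int,
                                precedingbyteslen: int,
                                trailingbyteslen: int):
    """Return (representation, next_pos) for one packet representation."""
    rep = {"precedingbytes": buf[pos:pos + precedingbyteslen]}
    pos += precedingbyteslen
    sync = rep["sync_byte"] = buf[pos]
    pos += 1
    if sync == 0x47:                      # precomputed packet
        rep["packet"] = buf[pos:pos + 187]
        pos += 187
    elif sync in (0x00, 0x01):            # constructed packet
        headerdatalen = buf[pos]
        num_constructors = buf[pos + 1] & 0x0F
        header = int.from_bytes(buf[pos + 2:pos + 5], "big")  # 24 header bits
        pos += 5
        rep["PID"] = (header >> 8) & 0x1FFF
        adaptation_field_control = (header >> 4) & 0x3
        if sync == 0x00 and adaptation_field_control in (2, 3):
            rep["adaptation_field"] = buf[pos:pos + headerdatalen - 3]
            pos += headerdatalen - 3
        total, constructors = headerdatalen, []
        for _ in range(num_constructors):
            ctype = buf[pos]
            if ctype == 1:                # immediate data
                n = buf[pos + 1]
                constructors.append(("immediate", buf[pos + 2:pos + 2 + n]))
                pos += 2 + n
            elif ctype == 2:              # reference into a media track
                n, track, number, offset = struct.unpack(">BHII", buf[pos + 1:pos + 12])
                constructors.append(("sample", track, number, offset, n))
                pos += 12
            else:
                raise ValueError(f"unknown constructor type {ctype}")
            total += n
        if total != 187:
            raise ValueError("header and constructor lengths must sum to 187")
        rep["constructors"] = constructors
    elif sync != 0xFF:                    # 0xFF marks a removed null packet
        raise ValueError(f"reserved sync_byte value {sync:#04x}")
    rep["trailingbytes"] = buf[pos:pos + trailingbyteslen]
    return rep, pos + trailingbyteslen
```

Calling the function repeatedly until next_pos reaches the sample size yields all packet representations of a sample.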
9.3.5 Protected MPEG 2 Transport Stream Hint Track
9.3.5.1 Introduction
This Subclause defines a mechanism for marking media streams as protected. This works by changing the four character code of the SampleEntry, and appending boxes containing both details of the protection mechanism and the original four character code. However, in this case the track is not protected; it is an ‘in the clear’ hint track which contains protected data. This Subclause describes how hint tracks should be marked as carrying protected data, using a similar mechanism, and utilizing the same boxes.
9.3.5.2 Syntax
class ProtectedMPEG2TransportStreamSampleEntry
   extends MPEG2TransportStreamSampleEntry(‘pm2t’) {
   ProtectionSchemeInfoBox SchemeInformation;
}
9.3.5.3 Semantics
The SchemeInformation (‘sinf’) box (defined in 0) shall contain details of the protection scheme applied. This shall include the OriginalFormatBox which shall contain the original sample entry type of the MPEG-2 Transport Stream SampleEntry box.
9.4 RTP, RTCP, SRTP and SRTCP Reception Hint Tracks
9.4.1 RTP Reception Hint Track
9.4.1.1 Introduction
This Subclause specifies the reception hint track format for the real-time transport protocol (RTP), as defined in IETF RFC 3550.
RTP is used for real-time media transport over the Internet Protocol. Each RTP stream carries one media type, and one RTP reception hint track carries one RTP stream. Hence, recording of an audio-visual program results in at least two RTP reception hint tracks.
The design of the RTP reception hint track format follows as much as possible the design of the RTP server hint track format. This design should ensure that RTP packet transmission operates very similarly regardless of whether it is based on RTP reception hint tracks or RTP server hint tracks. Furthermore, the number of new data structures in the file format was consequently kept as small as possible.
The format of the RTP reception hint tracks allows storing the packet payloads in the hint samples, or converting the RTP packet payloads to media samples and including them by reference in the hint samples, or combining both approaches. As noted earlier, conversion of received streams to media tracks allows existing players compliant with earlier versions of the ISO base media file format to process recorded files as long as the media formats are also supported. Storing the original RTP headers retains valuable information for error concealment and the reconstruction of the original RTP stream. It is noted that the conversion of packet payloads to media samples may happen "off-line" after recording of the streams in precomputed RTP reception hint tracks has been completed.
9.4.1.2 Sample Description Format
The entry-format in the sample description for the RTP reception hint tracks is 'rrtp'. The syntax of the sample entry is the same as for RTP server hint tracks having the entry-format 'rtp '.
class ReceivedRtpHintSampleEntry() extends SampleEntry (‘rrtp’) {
   uint(16) hinttrackversion = 1;
   uint(16) highestcompatibleversion = 1;
   uint(32) maxpacketsize;
   box      additionaldata[];
}
The entry-format identifier in the sample description of the RTP reception hint track is different from the entry-format in the sample description of the RTP server hint track, in order to avoid using an RTP reception hint track that contains errors as a valid server hint track.
The additionaldata set of boxes may include the timescale entry ('tims') and time offset ('tsro') boxes. Moreover, the additionaldata may contain a timestamp synchrony box.
The timescale entry box (‘tims’) shall be present and the value of timescale shall be set to match the clock frequency of the RTP timestamps of the stream captured in the reception hint track.
The time offset box (‘tsro’) may be present. If the time offset box is not present, the value of the field offset is inferred to be equal to 0. The value of the field offset is used for the derivation of the RTP timestamp, as specified in 9.4.1.4.
RTP timestamps typically do not start from zero, especially if an RTP receiver 'tunes' into a stream. The time offset box should therefore be present in RTP reception hint tracks and the value of offset in the time offset box should be set equal to the first RTP timestamp of the RTP stream in reception order.
Zero or one timestampsynchrony boxes may be present in the additionaldata of the sample entry for an RTP reception hint track. If a timestampsynchrony box is not present, the value of timestamp_sync is inferred to be equal to 0.
class timestampsynchrony() extends Box(‘tssy’) {
   unsigned int(6) reserved;
   unsigned int(2) timestamp_sync;
}
timestamp_sync equal to 0 indicates that the RTP timestamps of the present RTP reception hint track derived from the Formula in 9.4.1.4 may or may not be synchronized with RTP timestamps of other RTP reception hint tracks.
timestamp_sync equal to 1 indicates that the RTP timestamps of the present RTP reception hint track derived from the Formula in 9.4.1.4 reflect the received RTP timestamps exactly (without corrected synchronization to any other RTP reception hint track).
timestamp_sync equal to 2 indicates that RTP timestamps of the present RTP reception hint track derived from the Formula in 9.4.1.4 are synchronized with RTP timestamps of other RTP reception hint tracks.
When timestamp_sync is equal to 0 or 1, a player should correct the inter-stream synchronization using stored RTCP sender reports. When timestamp_sync is equal to 2, the media contained in the RTP reception hint tracks can be played out synchronously according to the reconstructed RTP timestamps without synchronization correction using RTCP Sender Reports. If it is expected that the RTP reception hint track will be used for re-sending the recorded RTP stream, it is recommended that timestamp_sync be set equal to 0 or 1, because the stored RTCP sender reports can be reused.
timestamp_sync equal to 3 is reserved.
The value of timestamp_sync shall be identical for all RTP reception hint tracks present in a file.
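The interpretation of timestamp_sync described above can be summarized in a short sketch. It reads the single payload byte of a ‘tssy’ box and reports whether a player should correct inter-stream synchronization from stored RTCP sender reports; the function names are hypothetical.

```python
from typing import Optional


def parse_timestamp_sync(payload: Optional[bytes]) -> int:
    """Return timestamp_sync; an absent box is inferred as 0."""
    if payload is None:
        return 0
    value = payload[0] & 0x03  # the upper six bits are reserved
    if value == 3:
        raise ValueError("timestamp_sync value 3 is reserved")
    return value


def needs_rtcp_sync_correction(timestamp_sync: int) -> bool:
    """Values 0 and 1 call for correction using RTCP sender reports."""
    return timestamp_sync in (0, 1)
```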
When RTCP is also stored, using an RTCP hint track, the timestamp relationship between the RTP and RTCP hint tracks can only be maintained if the RTP timestamps are anchored by using a set time offset (‘tsro’) in the RTP track, and hence the time offset is mandatory if RTCP is stored in an RTCP hint track.
Zero or one ReceivedSsrcBox identified with the four-character code ‘rssr’ shall be present in the additionaldata of a sample descriptor entry of an RTP reception hint track:
class ReceivedSsrcBox extends Box(‘rssr’) {
   unsigned int(32) SSRC;
}
The SSRC value must equal the SSRC value in the header of all recorded SRTP packets described by the sample description.
9.4.1.3 Sample Format
The sample format of RTP reception hint tracks is identical to the syntax of the sample format of the RTP server hint tracks. Each sample in the reception hint track represents one or more received RTP packets. If media frames are not both fragmented and interleaved in an RTP stream, it is recommended that each sample represents all received RTP packets that have the same RTP timestamp, i.e., consecutive packets in RTP sequence number order with a common RTP timestamp.
Each RTP reception hint sample contains two areas: the instructions to compose the packet, and any extra data needed for composing the packet, such as a copy of the packet payload. Note that the size of the sample is known from the sample size table.
Since the reception time for the packets may vary, this variation can be signalled for each packet as specified subsequently.
A sample with a size of zero is permitted in reception hint tracks, and such samples may be ignored.
9.4.1.4 Packet Entry Format
Each packet in the packet entry table has the same structure as for server (transmission) hint tracks, in 9.1.3.1.
Where i is the sample number of a sample, the sum of the sample time DT(i) as specified in 8.6.1.2 and relative_time indicates the reception time of the packet. The clock source for the reception time is undefined and may be, for instance, the wall clock of the receiver. If the range of reception times of a reception hint track overlaps entirely or partly with the range of reception times of another reception hint track, the clock sources for these hint tracks shall be the same.
It is recommended that receivers use a constant value for sample_delta in the decoding time to sample box ('stts') as much as reasonable and smooth out packet scheduling and end-to-end delay variation by setting relative_time adaptively in stored reception hint samples. This arrangement of setting the values of sample_delta and relative_time can facilitate a compact decoding time to sample box. In this case timestamp_sync is set to 1, the sample durations are mostly constant, and the time offset (‘tsro’) is stored in the sample entry.
The values of RTP_version, P_bit, X_bit, CSRC_count, M_bit, payload_type, and RTPsequenceseed shall be set equal to the V, P, X, CC, M, PT and sequence number fields of the RTP packet captured in the sample.
The fields bframe_flag and repeat_flag are reserved in reception hint tracks and must be zero.
The semantics of extra_flag and extra_information_length are identical to those specified for the RTP server hint tracks.
The following TLV boxes are specified: rtphdrextTLV, rtpoffsetTLV, receivedCSRC.
If the X_bit is set, a single rtphdrextTLV box shall be present for storing the received RTP Header Extension.
aligned(8) class rtphdrextTLV extends Box(‘rtpx’) {
   unsigned int(8) data[];
}
data is the raw RTP Header Extension, which is application-specific.
The syntax of the rtpoffsetTLV box is specified in 9.1.3.1.
offset indicates a 32-bit signed integer offset to the RTP timestamp of the received RTP packet. Let i be the sample number of a sample, DT(i) be equal to DT as specified in 8.6.1.2 for sample number i, tsro.offset be the value of offset in the 'tsro' box of the referred reception hint sample entry, and % be the modulo operation (written mod in the formula). The value of offset shall be such that the following Formula is true:
$$\mathrm{RTPtimestamp} = \bigl(DT(i) + \mathrm{tsro.offset} + \mathrm{offset}\bigr) \bmod 2^{32} \qquad (1)$$
Formula (1): RTP timestamp calculation
NOTE 1: When each reception hint sample represents all received RTP packets that have the same RTP timestamp, the value of sample_delta in the decoding time to sample box can be set to match the RTP timestamp. In other words, DT(i), as specified above, can be set equal to (the RTP timestamp – tsro.offset – offset) (assuming that the resulting value would be greater than or equal to 0). This is recommended.
NOTE 2: RTP timestamps do not necessarily increase as a function of RTP sequence number in all RTP streams, i.e., transmission order and playback order of packets may not be identical. For example, many video coding schemes allow bi-prediction from previous and succeeding pictures in playback order. As samples appear in tracks in their decoding order, i.e., in reception order in case of RTP reception hint tracks, offset in the rtpoffsetTLV box can be used to warp the RTP timestamp away from the sample time DT(i).
ForthepurposeofeditsinEditListBoxes,thecompositiontimeofareceivedRTPpacketisinferredtobethesumofthesampletimeDT(i)andoffsetasspecifiedabove.
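The relationship between DT(i), tsro.offset, offset and the RTP timestamp can be checked with a few lines of Python. This is a minimal sketch; the function and parameter names are assumptions chosen for the example.

```python
RTP_TIMESTAMP_MODULUS = 1 << 32  # RTP timestamps are 32-bit values


def rtp_timestamp(dt_i: int, tsro_offset: int, offset: int) -> int:
    """RTP timestamp implied by sample time DT(i) and the two offsets."""
    # Python's % always yields a non-negative result for a positive
    # modulus, so a negative signed offset wraps around as intended.
    return (dt_i + tsro_offset + offset) % RTP_TIMESTAMP_MODULUS


def inferred_composition_time(dt_i: int, offset: int) -> int:
    """Composition time used for edit lists: DT(i) plus the per-packet offset."""
    return dt_i + offset
```

For instance, with DT(i) = 1000, tsro.offset = 0xFFFFFFF0 and offset = 32, the sum wraps past 2^32 and the resulting RTP timestamp is 1016.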
If the value of CSRC_count is not equal to zero, a receivedCSRC box may be present for storing the received CSRC header fields for each RTP packet. The receivedCSRC box is identified with the four-character code 'rcsr'.
aligned(8) class receivedCSRC extends Box('rcsr') {
   unsigned int(32) CSRC[]; // to end of the box
}
The number of entries in CSRC[] equals the CC value of received SRTP packets. The nth entry of CSRC[] shall equal the nth CSRC value of the RTP packet header.
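Because CSRC[] runs to the end of the box, the number of entries follows from the payload size. The sketch below shows one way a reader might decode the payload; it assumes the box header has already been stripped, and the function name is hypothetical.

```python
import struct


def parse_received_csrc(payload: bytes) -> list[int]:
    """Decode the body of a receivedCSRC ('rcsr') box into CSRC values."""
    if len(payload) % 4 != 0:
        raise ValueError("payload must be a whole number of 32-bit entries")
    count = len(payload) // 4
    # Box fields are stored big-endian, most significant byte first.
    return list(struct.unpack(f">{count}I", payload))
```

A reader could then compare the length of the returned list with the CSRC_count recorded for the packet.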
9.4.1.5 SDP information
Both movie and track SDP information may be present, as specified in 9.1.4.
9.4.2 RTCP Reception Hint Track
9.4.2.1 Introduction
This Subclause specifies the reception hint track format for the real-time control protocol (RTCP), defined in IETF RFC 3550.
RTCP is used for real-time transport of control information for an RTP session over the Internet Protocol. During streaming, each RTP stream typically has an accompanying RTCP stream that carries control information for the RTP stream. One RTCP reception hint track carries one RTCP stream and is associated to the corresponding RTP reception hint track through a track reference.
The format of the RTCP reception hint tracks allows the storage of RTCP Sender Reports in the hint samples.
The RTCP Sender Reports are of particular interest for stream recording, because they reflect the current status of the server, e.g., the relationship of the media timing (RTP timestamp of audio/video packets) to the server time (absolute time in NTP format). Knowledge of this relationship is also necessary for playback of recorded RTP reception hint tracks to be able to detect and correct clock drift and jitter.
The timestamp synchrony box as specified in 9.4.1.2 makes it possible to correct clock drift and jitter before playing a file, and therefore recording of RTCP streams is optional when timestamp_sync is equal to 2.
There is no server hint track equivalent for the RTCP reception hint track, since RTCP messages are generated on-the-fly during transmission.
9.4.2.2 General
There shall be zero or one RTCP reception hint track for each RTP reception hint track. An RTCP reception hint track shall contain a track reference box including a reference of type 'cdsc' to the associated RTP reception hint track.
When i is the sample number of a sample, the sample time DT(i) as specified in 8.6.1.2 indicates the reception time of the packet. The clock source for the reception time shall be the same as for the associated RTP reception hint track. The value of timescale in the Media Header Box of an RTCP reception hint track shall be equal to the value of timescale in the media header box of the associated RTP reception hint track.
9.4.2.3 Sample Description Format
The entry-format in the sample description for the RTCP reception hint tracks is 'rtcp'. It is otherwise identical in structure to the sample entry format for RTP. There are no defined boxes for the additionaldata field.
9.4.2.4 Sample Format
9.4.2.4.1 Introduction
Each sample in the reception hint track represents one or more received RTCP packets. Each sample contains two areas: the raw RTCP packets and any extra data needed. Note that the size of the sample is known from the sample size table, and that the size of an RTCP packet is indicated within the packet itself (as documented in RFC 3550), as a count one less than the number of 32-bit words in that packet.
9.4.2.4.2 Syntax
aligned(8) class receivedRTCPpacket {
   unsigned int(8) data[];
}
aligned(8) class receivedRTCPsample {
   unsigned int(16) packetcount;
   unsigned int(16) reserved;
   receivedRTCPpacket packets[packetcount];
}
9.4.2.4.3 Semantics
data contains a raw RTCP packet including the RTCP report header, the 20-byte sender information block and any number of report blocks. Note that the size of each RTCP packet is known by parsing the 16-bit length field of the RTCP header.
packetcount indicates the number of received RTCP packets contained in the sample.
packets contains the received RTCP packets.
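The following Python sketch illustrates how a reader might split a receivedRTCPsample into its packets, using the 16-bit length field of each RTCP header (a count one less than the number of 32-bit words, per RFC 3550). The function name is an assumption, and any extra data following the packets is simply ignored here.

```python
import struct


def parse_received_rtcp_sample(sample: bytes) -> list[bytes]:
    """Split a receivedRTCPsample into its raw RTCP packets."""
    packetcount, _reserved = struct.unpack_from(">HH", sample, 0)
    position = 4
    packets = []
    for _ in range(packetcount):
        if position + 4 > len(sample):
            raise ValueError("truncated RTCP header")
        # Bytes 2..3 of the RTCP header hold (number of 32-bit words) - 1.
        (words_minus_one,) = struct.unpack_from(">H", sample, position + 2)
        size = 4 * (words_minus_one + 1)
        if position + size > len(sample):
            raise ValueError("RTCP packet runs past the end of the sample")
        packets.append(sample[position:position + size])
        position += size
    return packets
```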
9.4.3 SRTP Reception Hint Track
9.4.3.1 Introduction
This Subclause specifies the reception hint track formats for the secure real-time transport protocol (SRTP), as defined in IETF RFC 3711.
SRTP is a secure extension of the real-time media transport (RTP) over the Internet Protocol. Each SRTP stream carries one media type, and one SRTP reception hint track carries one SRTP stream. Hence, recording of an audio-visual program results in at least two SRTP reception hint tracks.
The design of the SRTP reception hint track format follows the design of RTP reception hint tracks and reuses most of the framework provided by RTP reception hint tracks. The major difference between RTP and SRTP reception hint tracks is that the actual media payload is stored in an encrypted form for SRTP reception hint tracks, whereas it is unencrypted for RTP reception hint tracks. SRTP reception hint tracks provide additional boxes to store information necessary to decrypt encrypted content on playback. Additionally, all header fields of the SRTP packet header shall be stored with the payload, as this information is necessary to check the integrity of the received data. SRTP reception hint tracks are commonly used together with SRTCP reception hint tracks.
SRTP reception hint tracks may, for example, be used to store protected mobile TV content.
9.4.3.2 Sample Description Format
9.4.3.2.1 Sample Description Entry
The sample description format for SRTP reception hint tracks is identical to that for RTP reception hint tracks with the exception that the sample entry name is changed from 'rrtp' to 'rsrp' and that it may contain additional boxes:
class ReceivedSrtpHintSampleEntry() extends SampleEntry ('rsrp') {
   uint(16) hinttrackversion = 1;
   uint(16) highestcompatibleversion = 1;
   uint(32) maxpacketsize;
   box additionaldata[];
}
Fields and boxes are identical to those of the ReceivedRtpHintSampleEntry ('rrtp'). The additionaldata[] of each sample description entry of an SRTP Reception Hint Track shall contain exactly one ReceivedSsrcBox ('rssr').
Additionally, the additionaldata[] may contain the Received Cryptographic Context ID box and the Rollover Counter box defined below. Furthermore, an SRTP Process Box shall also be included as one of the additionaldata boxes. As the content is stored encrypted, the integrity and the encryption algorithm fields in the SRTP Process box specify the algorithm that was applied to the received stream. An entry of four spaces ($20$20$20$20) may be used to indicate that the algorithm is defined by means outside the scope of this document.
9.4.3.2.2 Received Cryptographic Context ID Box
Zero or one ReceivedCryptoContextIdBox, identified with the four-character code 'ccid', may be present in the additionaldata of a sample descriptor entry of an SRTP reception hint track. Information to recover the cryptographic context for the received SRTP stream may be stored here.
aligned(8) class ReceivedCryptoContextIdBox extends Box ('ccid') {
   unsigned int(16) destPort;
   unsigned int(8) ip_version;
   switch (ip_version) {
      case 4: // IPv4
         unsigned int(32) destIP;
         break;
      case 6: // IPv6
         unsigned int(64) destIP;
         break;
   }
}
The destPort and destIP parameters contain the port number and the IP address (as present in the received IPv4 or IPv6 packets), respectively, of the SRTP session via which the recorded SRTP packets were received. ip_version contains either 4 or 6 representing IPv4 or IPv6, respectively.
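A reader following the syntax above literally could decode the box body as in this Python sketch. It assumes the box header has already been removed, reads destIP with exactly the widths written in the syntax (32 bits for IPv4, 64 bits for IPv6), and uses a hypothetical function name.

```python
import struct


def parse_crypto_context_id(payload: bytes) -> dict:
    """Decode the body of a ReceivedCryptoContextIdBox ('ccid')."""
    dest_port, ip_version = struct.unpack_from(">HB", payload, 0)
    if ip_version == 4:
        (dest_ip,) = struct.unpack_from(">I", payload, 3)  # 32-bit field
    elif ip_version == 6:
        (dest_ip,) = struct.unpack_from(">Q", payload, 3)  # 64-bit field, as written in the syntax
    else:
        raise ValueError("ip_version must be 4 or 6")
    return {"destPort": dest_port, "ip_version": ip_version, "destIP": dest_ip}
```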
9.4.3.2.3 Rollover Counter Box
Zero or one RolloverCounterBox, identified with the four-character code 'sroc', may be present in the additionaldata of a sample descriptor entry of an SRTP reception hint track. Typically, the rollover counter value changes every 65536 SRTP packets.
aligned(8) class RolloverCounterBox extends Box ('sroc') {
   unsigned int(32) rollover_counter;
}
The rollover_counter is a non-zero integer that gives the value of the ROC field for all associated received SRTP packets.
NOTE: The rollover counter (ROC) is an element of the cryptographic context of an SRTP stream and depends on the absolute position of a packet in an RTP stream. Knowledge of the ROC value is necessary in order to decrypt a received SRTP packet. It is optional to use the rollover counter box, as RFC 4771 defines an optional mechanism to signal the ROC value explicitly in the authentication tag of an SRTP packet.
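To illustrate why the ROC typically changes every 65536 packets, the sketch below combines the ROC with the 16-bit RTP sequence number into the SRTP packet index, following the definition in RFC 3711 (index = 2^16 * ROC + SEQ). The function name is an assumption for this example.

```python
def srtp_packet_index(rollover_counter: int, sequence_number: int) -> int:
    """SRTP packet index as defined in RFC 3711: 2^16 * ROC + SEQ."""
    if not 0 <= sequence_number <= 0xFFFF:
        raise ValueError("the RTP sequence number is a 16-bit value")
    return (rollover_counter << 16) | sequence_number
```

When the sequence number wraps from 65535 to 0, the ROC is incremented so that the index keeps increasing.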
9.4.3.3 Sample and Packet Entry Format
Both the sample format and the packet entry format for SRTP reception hint tracks are identical to those of RTP reception hint tracks, defined in 9.4.1.3 and 9.4.1.4. The packet payload is stored as received in the SRTP packets, i.e., all information received in the SRTP packet excluding the header or, in other words, the encrypted payload together with the key identifier (MKI) and the authentication tag.
If the value of CSRC_count is not equal to zero for a received SRTP packet, the extra_data_tlv corresponding to this received SRTP packet shall contain exactly one receivedCSRC box ('rcsr').
9.4.4 SRTCP Reception Hint Tracks
9.4.4.1 Introduction
This Subclause specifies the reception hint track format for the secure real-time control protocol (SRTCP), defined in IETF RFC 3711.
SRTCP is used for real-time transport of control information for an SRTP session over the Internet Protocol. SRTCP takes for SRTP the role that RTCP takes for RTP, cf. 9.4.2. During streaming, each SRTP stream typically has an accompanying SRTCP stream that carries control information for the SRTP stream. One SRTCP reception hint track carries one SRTCP stream and is associated to the corresponding SRTP reception hint track through a track reference.
The format of the SRTCP reception hint tracks allows the storage of SRTCP Packets in the hint samples, e.g., of SRTCP Sender Reports.
The SRTCP Sender Reports are of particular interest for stream recording, because they reflect the current status of the server, e.g., the relationship of the media timing (SRTP timestamp of audio/video packets) to the server time (absolute time in NTP format). Knowledge of this relationship is also necessary for playback of recorded SRTP reception hint tracks in order to be able to detect and correct clock drift and jitter.
The timestamp synchrony box as specified in 9.4.1.2 makes it possible to correct clock drift and jitter before playing a file, and therefore recording of SRTCP streams is optional.
There is no server hint track equivalent for the SRTCP reception hint track, since SRTCP messages are generated on-the-fly during transmission.
9.4.4.2 General
There shall be zero or one SRTCP reception hint track for each SRTP reception hint track. An SRTCP reception hint track shall contain a track reference box including a reference of type 'cdsc' to the associated SRTP reception hint track.
When i is the sample number of a sample, the sample time DT(i) as specified in 8.6.1.2 indicates the reception time of the packet. The clock source for the reception time shall be the same as for the associated SRTP reception hint track. The value of timescale in the Media Header Box of an SRTCP reception hint track shall be equal to the value of timescale in the media header box of the associated SRTP reception hint track.
9.4.4.3 Sample Description Format
The entry-format in the sample description for the SRTCP reception hint tracks is 'stcp'. It is otherwise identical in structure to the sample entry format for RTCP. The encryption and authentication method of the SRTCP hint tracks are defined by the respective entries in SRTP Process box of the corresponding SRTP hint track.
NOTE: An equivalent to the ROC boxes defined for SRTP is not necessary for SRTCP, as the SRTCP packet contains an explicitly signalled initialization vector.
9.4.4.4 Sample Format
Sample format is the sample format for RTCP reception hint tracks as defined in 9.4.2.4.
9.4.5 Protected RTP Reception Hint Track
9.4.5.1 Introduction
This specification defines a mechanism for marking media streams as protected. This works by changing the four character code of the SampleEntry, and appending boxes containing both details of the protection mechanism and the original four character code. However, in this case the track is not protected; it is an 'in the clear' hint track which contains protected data. This Subclause describes how reception hint tracks should be marked as carrying protected data, using a similar mechanism, and utilizing the same boxes.
9.4.5.2 Syntax
class ProtectedRtpReceptionHintSampleEntry extends RtpReceptionHintSampleEntry ('prtp') {
   ProtectionSchemeInfoBox SchemeInformation;
}
9.4.5.3 Semantics
The SchemeInformation ('sinf') box shall contain details of the protection scheme applied. This shall include the OriginalFormatBox which shall contain the four character code 'rrtp' (the four character code of the original RTPReceptionHintSampleEntry box).
9.4.6 Recording Procedure
See Annex H.
9.4.7 Parsing Procedure
See Annex H.
10 Sample Groups
10.1 Random Access Recovery Points
10.1.1.1 Definition
In some coding systems it is possible to random access into a stream and achieve correct decoding after having decoded a number of samples. This is known as gradual decoding refresh. For example, in video, the encoder might encode intra-coded macroblocks in the stream, such that it knows that within a certain period the entire picture consists of pixels that are only dependent on intra-coded macroblocks supplied during that period.
Samples for which such gradual refresh is possible are marked by being a member of one of these groups. The definition of the groups allows the marking to occur at either the beginning of the period or the end. However, when used with a particular media type, the usage of these groups may be restricted to marking only one end (i.e. restricted to only positive or negative roll values). A roll-group is defined as that group of samples having the same roll distance.
The roll groups have the following semantics.
A VisualRollRecoveryEntry documents samples that enable entry points into streams that are alternatives to sync samples.
An AudioRollRecoveryEntry documents the pre-roll distance required in audio streams in which every sample can be independently decoded, but the decoder output is only assured to be correct after pre-rolling by the indicated number of samples.
An AudioPreRollEntry is used with audio streams in which not every sample is a sync sample; decoding can only start at a sync sample, but decoder output is only assured to be correct after pre-rolling by the indicated number of samples. This means that to achieve correct output when performing random access, first it is necessary to back up by the indicated pre-roll distance, and then (to enable decoding to start) find the nearest sync sample at, or preceding, that position.
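The two-step procedure described for AudioPreRollEntry (back up by the pre-roll distance, then find the nearest sync sample at or before that position) can be sketched as follows. The function name is hypothetical, pre_roll is assumed to be passed as a non-negative sample count, and sync_samples is assumed to be a sorted list of 1-based sync sample numbers.

```python
from bisect import bisect_right


def audio_preroll_start(target_sample: int, pre_roll: int,
                        sync_samples: list[int]) -> int:
    """Sample number at which decoding should start for random access."""
    # Step 1: back up by the indicated pre-roll distance (not before sample 1).
    backed_up = max(1, target_sample - pre_roll)
    # Step 2: nearest sync sample at, or preceding, that position.
    index = bisect_right(sync_samples, backed_up)
    if index == 0:
        raise ValueError("no sync sample at or before the backed-up position")
    return sync_samples[index - 1]
```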
10.1.1.2 Syntax
class VisualRollRecoveryEntry() extends VisualSampleGroupEntry ('roll') {
   signed int(16) roll_distance;
}
class AudioRollRecoveryEntry() extends AudioSampleGroupEntry ('roll') {
   signed int(16) roll_distance;
}
class AudioPreRollEntry() extends AudioSampleGroupEntry ('prol') {
   signed int(16) roll_distance;
}
10.1.1.3 Semantics
roll_distance is a signed integer that gives the number of samples that must be decoded in order for a sample to be decoded correctly. A positive value indicates the number of samples after the sample that is a group member that must be decoded such that at the last of these recovery is complete, i.e. the last sample is correct. A negative value indicates the number of samples before the sample that is a group member that must be decoded in order for recovery to be complete at the marked sample. The value zero must not be used; the sync sample table documents random access points for which no recovery roll is needed.
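One way to read the sign convention of roll_distance is shown in this Python sketch, which returns the sample where decoding starts and the sample at which recovery is complete. The function name and the tuple layout are assumptions made for illustration.

```python
def roll_recovery_range(member_sample: int, roll_distance: int) -> tuple[int, int]:
    """Return (first sample to decode, first sample that is correct)."""
    if roll_distance == 0:
        raise ValueError("roll_distance must not be zero")
    if roll_distance > 0:
        # Marked sample begins the period; recovery completes later.
        return member_sample, member_sample + roll_distance
    # Marked sample ends the period; decoding starts earlier.
    return member_sample + roll_distance, member_sample
```

For a group member at sample 10, a roll_distance of 3 gives decoding from sample 10 with correct output at sample 13, while a roll_distance of -3 gives decoding from sample 7 with correct output at sample 10.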
10.2 Rate Share Groups
10.2.1 Introduction
Rate share instructions are used by players and streaming servers to help allocate bitrates dynamically when several streams share a common bandwidth resource. The instructions are stored in the file as sample group entries and apply when scalable or alternative media streams at different bitrates are combined with other scalable or alternative tracks. The instructions are time-dependent as samples in a track may be associated with different sample group entries. In the simplest case, only one target rate share value is specified per media and time range as illustrated in Figure 5.
[Figure 5 image: vertical axis "A/V Rate Share (%)", horizontal axis "time"; areas labelled "Audio" and "Video"; annotation "Higher audio rate required"]
Figure 5 — Audio/Video rate share as function of time
In order to accommodate for rate share values that vary with the available bitrate, it is possible to specify more than one operation range. One may for instance indicate that audio requires a higher percentage (than video) at low available bitrates. Technically this is done by specifying two operation points as shown in Figure 6.
[Figure 6 image: vertical axis "Audio Rate Share (%)", horizontal axis "Available bitrate"; annotations "Higher audio rate required" and "Lower audio rate required"; operation points "OP 1" and "OP 2"]
Figure 6 — Audio rate share as function of available bitrate
Operation points are defined in terms of total available bandwidth. For more complex situations it is possible to specify more operation points.
In addition to target rate share values, it is also possible to specify maximum and minimum bitrates for a certain media, as well as discard priority.
10.2.2 Rate Share Sample Group Entry
10.2.2.1 Definition
Each sample of a track may be associated to (zero or) one of a number of sample group descriptions, each of which defines a record of rate-share information. Typically the same rate-share information applies to many consecutive samples and it may therefore be enough to define two or three sample group descriptions that can be used at different time intervals.
The grouping type 'rash' (short for rate share) is defined as the grouping criterion for rate share information. Zero or one sample-to-group box ('sbgp') for the grouping type 'rash' can be contained in the sample table box ('stbl') of a track. It shall reside in a hint track, if a hint track is used, otherwise in a media track.
Target rate share may be specified for several operation points that are defined in terms of the total available bitrate, i.e., the bitrate that should be shared. If only one operation point is defined, the target rate share applies to all available bitrates. If several operation points are defined, then each operation point specifies a target rate share. Target rate share values specified for the first and the last operation points also specify the target rate share values at lower and higher available bitrates, respectively. The target rate share between two operation points is specified to be in the range between the target rate shares of those operation points. One possibility is to estimate with linear interpolation.
10.2.2.2 Syntax
class RateShareEntry() extends SampleGroupDescriptionEntry('rash') {
   unsigned int(16) operation_point_count;
   if (operation_point_count == 1) {
      unsigned int(16) target_rate_share;
   }
   else {
      for (i=0; i < operation_point_count; i++) {
         unsigned int(32) available_bitrate;
         unsigned int(16) target_rate_share;
      }
   }
   unsigned int(32) maximum_bitrate;
   unsigned int(32) minimum_bitrate;
   unsigned int(8) discard_priority;
}
10.2.2.3 Semantics
operation_point_count is a non-zero integer that gives the number of operation points.
available_bitrate is a positive integer that defines an operation point (in kilobits per second). It is the total available bitrate that can be allocated in shares to tracks. Each entry shall be greater than the previous entry.
target_rate_share is an integer. A non-zero value indicates the percentage of available bandwidth that should be allocated to the media for each operation point. The value of the first (last) operation point applies to lower (higher) available bitrates than the operation point itself. The target rate share between operation points is bounded by the target rate shares of the corresponding operation points. A zero value indicates that no information on the preferred rate share percentage is provided.
maximum_bitrate is an integer. A nonzero value indicates (in kilobits per second) an upper threshold for which bandwidth should be allocated to the media. A higher bitrate than maximum bitrate should only be allocated if all other media in the session has fulfilled their quotas for target rate-share and maximum bitrate, respectively. A zero value indicates that no information on maximum bitrate is provided.
minimum_bitrate is an integer. A nonzero value indicates (in kilobits per second) a lower threshold for which bandwidth should be allocated to the media. If the allocated bandwidth would correspond to a smaller value, then no bitrate should be allocated. Instead preference should be given to other media in the session or alternate encodings of the same media. Zero minimum bitrate indicates that no information on minimum bitrate is provided.
discard_priority is an integer indicating the priority of the track when tracks are discarded to meet the constraints set by target rate share, maximum bitrate and minimum bitrate. Tracks are discarded in discard priority order and the track that has the highest discard priority value is discarded first.
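As an informal illustration, the Python sketch below decodes a RateShareEntry body according to the syntax in 10.2.2.2 and then estimates the target rate share at a given available bitrate. Linear interpolation is only the "one possibility" mentioned in 10.2.2.1, not a requirement; the function names and the representation of operation points as (available_bitrate, target_rate_share) tuples are assumptions.

```python
import struct


def parse_rate_share_entry(data: bytes) -> dict:
    """Decode the fields of a RateShareEntry ('rash') body."""
    (count,) = struct.unpack_from(">H", data, 0)
    position = 2
    points = []  # list of (available_bitrate or None, target_rate_share)
    if count == 1:
        (share,) = struct.unpack_from(">H", data, position)
        position += 2
        points.append((None, share))  # single point: applies to all bitrates
    else:
        for _ in range(count):
            bitrate, share = struct.unpack_from(">IH", data, position)
            position += 6
            points.append((bitrate, share))
    maximum, minimum, discard = struct.unpack_from(">IIB", data, position)
    return {"points": points, "maximum_bitrate": maximum,
            "minimum_bitrate": minimum, "discard_priority": discard}


def target_share_at(points: list, available_bitrate: int) -> float:
    """Estimate the target rate share (%) by linear interpolation."""
    if len(points) == 1:
        return float(points[0][1])
    if available_bitrate <= points[0][0]:
        return float(points[0][1])   # first point also covers lower bitrates
    if available_bitrate >= points[-1][0]:
        return float(points[-1][1])  # last point also covers higher bitrates
    for (b0, s0), (b1, s1) in zip(points, points[1:]):
        if b0 <= available_bitrate <= b1:
            return s0 + (s1 - s0) * (available_bitrate - b0) / (b1 - b0)
    raise ValueError("operation points must be in increasing bitrate order")
```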
10.2.3 Relationship between tracks
The purpose of defining rate share information is to aid a server or player extracting data from a track in combination with other tracks. Note that a server/player streams/plays tracks simultaneously if they belong to different alternate groups and can switch between tracks that belong to the same switch group within an alternate group. By default, all tracks are served/played simultaneously if no alternate groups are defined.
Rate share information should be provided for each track. A track that does not include rate share information has one operation point and can be treated as a constant-bitrate track with discard priority 128. Target rate share, minimum and maximum bitrates do not apply in this case.
Tracks that are alternates to each other shall (at each instance of time) define the same number of operation points at the same set of total available bitrates and have the same discard priorities. Note that the number and definition of operation points may depend on time. Alternate tracks may have different target rate shares, minimum and maximum bitrates.
10.2.4 Bitrate allocation
Rate share information on maximum bitrate, minimum bitrate, and target rate share can be combined for a track. If this is the case, the target rate share shall be applied to find an allocated bitrate before the impact of the maximum and minimum bitrates is considered.
When allocating bandwidth to several tracks, the following considerations apply:
1. In the case all tracks have explicit target rate share values and they don't sum up to 100 percent, treat them as weights, i.e., normalize them.
2. The total allocation shall not exceed total available bitrate.
3. In a choice between alternate tracks, the chosen track should be the track that causes the alternate group to have an allocation most closely in accord with its target rate share, or the track that desires the highest bitrate that can be allocated without discarding other tracks (see below).
4. Tracks must have an allocation between their minimum and maximum bitrates, or be discarded.
5. Tracks should have an allocation in accord with their target rate shares, but this may be distorted to allow some tracks to achieve their minima, or in case some have reached their maxima.
6. If an allocation cannot be done including a track from every alternate group, then tracks should be discarded in discard priority order.
7. The allocation must be re-calculated whenever the operating set for an active track (one that has been selected from an alternate group) changes or the available bitrate changes.
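The considerations above leave room for many allocation strategies. The following Python sketch shows one deliberately simplified possibility: it normalizes the target rate shares as weights, caps each track at its maximum bitrate, and, whenever some track would fall below its minimum, discards the track with the highest discard priority and retries. It does not redistribute bandwidth freed by a cap, does not distort shares to rescue a track, and ignores alternate groups. The function name and the dictionary layout are assumptions for this example.

```python
def allocate_bitrates(total_kbps: float, tracks: list) -> dict:
    """Simplified allocation: weights, maximum caps, then discards.

    Each track is a dict with the keys name, target_rate_share,
    minimum_bitrate, maximum_bitrate and discard_priority
    (a bitrate limit of 0 means "not provided").
    """
    active = list(tracks)
    while active:
        weight_sum = sum(t["target_rate_share"] for t in active)
        if weight_sum == 0:
            raise ValueError("sketch assumes explicit non-zero target rate shares")
        allocation = {}
        for t in active:
            # Target rate share is applied first, treated as a weight.
            kbps = total_kbps * t["target_rate_share"] / weight_sum
            if t["maximum_bitrate"]:
                kbps = min(kbps, t["maximum_bitrate"])
            allocation[t["name"]] = kbps
        starved = [t for t in active
                   if t["minimum_bitrate"]
                   and allocation[t["name"]] < t["minimum_bitrate"]]
        if not starved:
            return allocation
        # Highest discard priority value is discarded first.
        victim = max(active, key=lambda t: t["discard_priority"])
        active.remove(victim)
    return {}
```

Because capped bandwidth is not handed to other tracks, the total never exceeds the available bitrate; a fuller implementation would re-run this whenever the available bitrate or the set of active tracks changes.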
10.3 Alternative Startup Sequences
10.3.1 Definition
An alternative startup sequence contains a subset of samples of a track within a certain period starting from a sync sample or a sample marked by 'rap ' sample grouping, which are collectively referred to as the initial sample below. By decoding this subset of samples, the rendering of the samples can be started earlier than in the case when all samples are decoded.
An 'alst' sample group description entry indicates the number of samples in any of the respective alternative startup sequences, after which all samples should be processed.
Either version 0 or version 1 of the Sample to Group Box may be used with the alternative startup sequence sample grouping. If version 1 of the Sample to Group Box is used, grouping_type_parameter has no defined semantics but the same algorithm to derive alternative startup sequences should be used consistently for a particular value of grouping_type_parameter.
A player utilizing alternative startup sequences could operate as follows. First, an initial sync sample from which to start decoding is identified by using the Sync Sample Box, the sample_is_non_sync_sample flag for samples enclosed in track fragments, or the 'rap ' sample grouping. Then, if the initial sync sample is associated to a sample group description entry of type 'alst' where roll_count is greater than 0, the player can use the alternative startup sequence. The player then decodes only those samples that are mapped to the alternative startup sequence until the number of samples that have been decoded is equal to roll_count. After that, all samples are decoded.
10.3.2 Syntax
class AlternativeStartupEntry() extends VisualSampleGroupEntry ('alst') {
   unsigned int(16) roll_count;
   unsigned int(16) first_output_sample;
   for (i=1; i <= roll_count; i++)
      unsigned int(32) sample_offset[i];
   j=1;
   do { // optional, until the end of the structure
      unsigned int(16) num_output_samples[j];
      unsigned int(16) num_total_samples[j];
      j++;
   }
}
10.3.3 Semantics
roll_count indicates the number of samples in the alternative startup sequence. If roll_count is equal to 0, the associated sample does not belong to any alternative startup sequence and the semantics of first_output_sample are unspecified. The number of samples mapped to this sample group entry per one alternative startup sequence shall be equal to roll_count.
first_output_sample indicates the index of the first sample intended for output among the samples in the alternative startup sequence. The index of the sync initial sample starting the alternative startup sequence is 1, and the index is incremented by 1, in decoding order, per each sample in the alternative startup sequence.
sample_offset[i] indicates the decoding time delta of the i-th sample in the alternative startup sequence relative to the regular decoding time of the sample derived from the Decoding Time to Sample Box or the Track Fragment Header Box. The sync initial sample starting the alternative startup sequence is its first sample.
num_output_samples[j] and num_total_samples[j] indicate the sample output rate within the alternative startup sequence. The alternative startup sequence is divided into k consecutive pieces, where each piece has a constant sample output rate which is unequal to that of the adjacent pieces. The first piece starts from the sample indicated by first_output_sample. num_output_samples[j] indicates the number of the output samples of the j-th piece of the alternative startup sequence. num_total_samples[j] indicates the total number of samples, including those that are not in the alternative startup sequence, from the first sample in the j-th piece that is output to the earlier one (in composition order) of the sample that ends the alternative startup sequence and the sample that immediately precedes the first output sample of the (j+1)th piece.
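The layout of an AlternativeStartupEntry, including the optional trailing pairs that run to the end of the structure, could be decoded as in the following Python sketch. The function name and the returned dictionary are assumptions; the entry length is taken to be the length of the byte string passed in.

```python
import struct


def parse_alternative_startup_entry(data: bytes) -> dict:
    """Decode the fields of an AlternativeStartupEntry ('alst') body."""
    roll_count, first_output_sample = struct.unpack_from(">HH", data, 0)
    position = 4
    # One 32-bit sample_offset per sample of the startup sequence.
    sample_offset = list(struct.unpack_from(f">{roll_count}I", data, position))
    position += 4 * roll_count
    # Optional (num_output_samples, num_total_samples) pairs until the end.
    output_rates = []
    while position + 4 <= len(data):
        output_rates.append(struct.unpack_from(">HH", data, position))
        position += 4
    return {"roll_count": roll_count,
            "first_output_sample": first_output_sample,
            "sample_offset": sample_offset,
            "output_rates": output_rates}
```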
10.3.4 Examples
Hierarchical temporal scalability (e.g., in AVC and SVC) improves compression efficiency but increases the decoding delay due to reordering of the decoded pictures from the (de)coding order to output order. Deep temporal hierarchies have been demonstrated to be useful in terms of compression efficiency in some studies. When the temporal hierarchy is deep and the operation speed of the decoder is limited (to no faster than real-time processing), the initial delay from the start of the decoding to the start of rendering is substantial and may affect the end-user experience negatively.
Figure 7 illustrates a typical hierarchically scalable bitstream with five temporal levels. Figure 7a shows the example sequence in output order. Values enclosed in boxes indicate the frame_num value of the picture. Values in italics indicate a non-reference picture while the other pictures are reference pictures. Figure 7b shows the example sequence in decoding order. Figure 7c shows the example sequence in output order when assuming that the output timeline coincides with that of the decoding timeline and the decoding of one picture lasts one picture interval. It can be seen that playback of the stream starts five picture intervals later than the decoding of the stream started. If the pictures were sampled at 25 Hz, the picture interval is 40 msec, and the playback is delayed by 0.2 sec.
Figure 7 — Decoded picture buffering delay of an example sequence with five temporal levels
Thanks to the temporal hierarchy, it is possible to decode only a subset of the pictures at the beginning of the sequence. Consequently, rendering can be started faster but the displayed picture rate is lower at the beginning. In other words, a player can make a trade-off between the duration of the initial startup delay and the initial displayed picture rate. Figure 8 and Figure 9 show two examples of alternative startup sequences where a subset of the bitstream of Figure 7 is decoded.
The samples selected for decoding and the decoder output are presented in Figure 8a and Figure 8b, respectively. The reference picture having frame_num equal to 4 and the non-reference pictures having frame_num equal to 5 are not decoded. In this example, the rendering of pictures starts four picture intervals earlier than in Figure 7. When the picture rate is 25 Hz, the saving in startup delay is 160 msec. The saving in the startup delay comes with the disadvantage of a lower displayed picture rate at the beginning of the bitstream.
Figure 8 — An example of an alternative startup sequence
In the example of Figure 9, another way of selecting the pictures for decoding is presented. The decoding of the pictures that depend on the picture with frame_num equal to 3 is omitted and the decoding of non-reference pictures within the second half of the first group of pictures is omitted too. The decoded picture resulting from the sample with frame_num equal to 2 is the first one that is output. As a result, the output picture rate of the first group of pictures is half of normal picture rate, but the display process starts two frame intervals (80 msec in 25 Hz picture rate) earlier than in the conventional solution illustrated in Figure 7.
Figure 9 — Another example of an alternative startup sequence
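The delay figures quoted in these examples follow directly from the picture interval. The short Python sketch below reproduces them; the function name is an assumption for this illustration.

```python
def delay_msec(picture_intervals: int, picture_rate_hz: float) -> float:
    """Duration in milliseconds of a number of picture intervals."""
    return picture_intervals * 1000 / picture_rate_hz


# Figure 7: playback starts five picture intervals after decoding starts.
conventional_delay = delay_msec(5, 25)
# Figure 8: rendering starts four picture intervals earlier.
saving_figure_8 = delay_msec(4, 25)
# Figure 9: display starts two picture intervals earlier.
saving_figure_9 = delay_msec(2, 25)
```

At 25 Hz these evaluate to 200, 160 and 80 milliseconds, matching the values stated in the text.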
10.4 Random Access Point (RAP) Sample Grouping
10.4.1 Definition
A sync sample is specified to be a random access point after which all samples in decoding order can be correctly decoded. However, it may be possible to encode an "open" random access point, after which all samples in output order can be correctly decoded, but some samples following the random access point in decoding order and preceding the random access point in output order need not be correctly decodable. For example, an intra picture starting an open group of pictures can be followed in decoding order by (bi-)predicted pictures that however precede the intra picture in output order; though they possibly cannot be correctly decoded if the decoding starts from the intra picture, they are not needed.
Such "open" random-access samples can be marked by being a member of this group. Samples marked by this group must be random access points, and may also be sync points (i.e. it is not required that samples marked by the sync sample table be excluded).
10.4.2 Syntax
class VisualRandomAccessEntry() extends VisualSampleGroupEntry ('rap ') {
   unsigned int(1) num_leading_samples_known;
   unsigned int(7) num_leading_samples;
}
10.4.3 Semantics
num_leading_samples_known equal to 1 indicates that the number of leading samples is known for each sample in this group, and the number is specified by num_leading_samples. A leading sample is such a sample associated with an "open" random access point (RAP). It precedes the RAP in presentation order and immediately follows the RAP or another leading sample in decoding order, and when decoding starts from the RAP, the sample cannot be correctly decoded.
num_leading_samples specifies the number of leading samples for each sample in this group. When num_leading_samples_known is equal to 0, this field should be ignored.
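Both fields of the 'rap ' entry share a single byte, with the flag in the most significant bit. A minimal Python sketch of decoding it follows; the function name is hypothetical, and returning None for an unknown count is a choice made for this example.

```python
def parse_visual_random_access_entry(data: bytes) -> tuple:
    """Return (num_leading_samples_known, num_leading_samples or None)."""
    first_byte = data[0]
    known = first_byte >> 7          # 1-bit flag, most significant bit
    count = first_byte & 0x7F        # 7-bit count
    # When the flag is 0 the count field is to be ignored.
    return known, (count if known else None)
```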
10.5 Temporal level sample grouping
10.5.1 Definition
Many video codecs support temporal scalability where it is possible to extract one or more subsets of frames that can be independently decoded. A simple case is the extraction of I frames for a bitstream with a regular I-frame interval, e.g. IPPPIPPP…, where every 4th picture is an I frame. Also subsets of these I frames can be extracted for even lower frame rates. More elaborate situations with several temporal levels can be constructed using hierarchical B or P frames.
The Temporal Level sample grouping ('tele') provides a codec-independent sample grouping that can be used to group samples (access units) in a track (and potential track fragments) according to temporal level, where samples of one temporal level have no coding dependencies on samples of higher temporal levels. The temporal level equals the sample group description index (taking values 1, 2, 3, etc). The bitstream containing only the access units from the first temporal level to a higher temporal level remains conforming to the coding standard.
A grouping according to temporal level facilitates easy extraction of temporal subsequences, for instance using the Subsegment Indexing box in 0.
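The extraction described above can be sketched as follows. This is a minimal illustration that assumes the per-sample group description indices have already been resolved (for example from a sample-to-group mapping) into a plain list; the function name and the list representation are hypothetical.

```python
def select_up_to_level(sample_levels, max_level):
    """Return the indices of samples whose temporal level is in 1..max_level.

    sample_levels[i] is assumed to hold the 'tele' group description index
    of sample i; a value of 0 (no group) is treated as "not selected".
    """
    return [i for i, level in enumerate(sample_levels) if 1 <= level <= max_level]
```

For the IPPPIPPP… pattern with I frames at level 1 and P frames at level 2, selecting up to level 1 keeps every fourth sample.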
10.5.2 Syntax
class TemporalLevelEntry() extends VisualSampleGroupEntry('tele') {
   bit(1) level_independently_decodable;
   bit(7) reserved=0;
}
10.5.3 Semantics
The temporal level of samples in a sample group is equal to the sample group description index.
level_independently_decodable is a flag. 1 indicates that all samples of this level have no coding dependencies on samples of other levels. 0 indicates that no information is provided.
10.6 Stream access point sample group
10.6.1 Definition
A stream access point, as defined in Annex I, enables random access into a container of media stream(s). The SAP sample grouping identifies samples (the first byte of which is the position ISAU for a SAP as specified in Annex I) as being of the indicated SAP type.
The syntax and semantics of grouping_type_parameter are specified as follows.
{
   unsigned int(28) target_layers;
   unsigned int(4) layer_id_method_idc;
}
target_layers specifies the target layers for the indicated SAPs according to Annex I. The semantics of target_layers depends on the value of layer_id_method_idc. When layer_id_method_idc is equal to 0, target_layers is reserved.
layer_id_method_idc specifies the semantics of target_layers. layer_id_method_idc equal to 0 specifies that the target layers consist of all the layers represented by the track. layer_id_method_idc not equal to 0 is specified by derived media format specifications.
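For illustration only, the 32-bit grouping_type_parameter could be split into its two fields as sketched below. The function name is an assumption; the bit positions follow from the field order and widths given above (28 high-order bits, then 4 low-order bits).

```python
def split_sap_grouping_type_parameter(value: int):
    """Split a 32-bit grouping_type_parameter into (target_layers, layer_id_method_idc)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("grouping_type_parameter must fit in 32 bits")
    target_layers = value >> 4          # upper 28 bits
    layer_id_method_idc = value & 0xF   # lower 4 bits
    return target_layers, layer_id_method_idc
```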
10.6.2 Syntax
class SAPEntry() extends SampleGroupDescriptionEntry('sap ') {
   unsigned int(1) dependent_flag;
   unsigned int(3) reserved;
   unsigned int(4) SAP_type;
}
10.6.3 Semantics
reserved shall be equal to 0. Parsers shall allow and ignore all values of reserved.
dependent_flag shall be 0 for non-layered media. dependent_flag equal to 1 specifies that the reference layers, if any, for predicting the target layers may have to be decoded for accessing a sample of this sample group. dependent_flag equal to 0 specifies that the reference layers, if any, for predicting the target layers need not be decoded for accessing any SAP of this sample group.
SAP_type values equal to 0 and 7 are reserved; SAP_type values in the range of 1 to 6, inclusive, specify the SAP type, as specified in Annex I, of the associated samples (for which the first byte of a sample in this group is the position ISAU).
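A possible reading of the one-byte SAPEntry body is sketched below. The function name and the extra "sap_type_is_defined" key are assumptions for this example. The reserved bits are deliberately not checked, since parsers are required to allow and ignore them.

```python
def parse_sap_entry(payload: bytes) -> dict:
    """Decode the one-byte body of a 'sap ' sample group entry (illustrative)."""
    if len(payload) < 1:
        raise ValueError("'sap ' entry needs at least one byte")
    b = payload[0]
    sap_type = b & 0x0F                    # low 4 bits
    return {
        "dependent_flag": bool(b >> 7),    # top bit
        # bits 6..4 are reserved and ignored on purpose
        "sap_type": sap_type,
        # only values 1..6 are given a meaning by the semantics above
        "sap_type_is_defined": 1 <= sap_type <= 6,
    }
```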
11 Extensibility
11.1 Objects
The normative objects defined in this specification are identified by a 32-bit value, which is normally a set of four printable characters from the ISO 8859-1 character set.
To permit user extension of the format, to store new object types, and to permit the inter-operation of the files formatted to this specification with certain distributed computing environments, there are a type mapping and a type extension mechanism that together form a pair.
Commonly used in distributed computing are UUIDs (universal unique identifiers), which are 16 bytes. Any normative type specified here can be mapped directly into the UUID space by composing the four byte type value with the twelve byte ISO reserved value, 0xXXXXXXXX-0011-0010-8000-00AA00389B71. The four character code replaces the XXXXXXXX in the preceding number. These types are identified to ISO as the object types used in this specification.
User objects use the escape type ‘uuid’. They are documented above in subclause 6.2. After the size and type fields, there is a full 16-byte UUID.
Systems which wish to treat every object as having a UUID could employ the following algorithm:
size := read_uint32();
type := read_uint32();
if (type==‘uuid’)
   then uuid := read_uuid()
   else uuid := form_uuid(type, ISO_12_bytes);
Similarly when linearizing a set of objects into files formatted to this specification, the following is applied:
write_uint32( object_size(object) );
uuid := object_uuid_type(object);
if (is_ISO_uuid(uuid) )
   write_uint32( ISO_type_of(uuid) )
else {
   write_uint32(‘uuid’);
   write_uuid(uuid);
}
A file containing boxes from this specification that have been written using the ‘uuid’ escape and the full UUID is not compliant; systems are not required to recognize standard boxes written using the ‘uuid’ and an ISO UUID.
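The type mapping in this subclause can be sketched with the Python standard library as shown below. The helper names mirror the pseudocode (form_uuid, ISO_type_of) but are otherwise assumptions of this example; the twelve-byte constant is the ISO reserved value quoted in the text.

```python
import uuid

# Twelve-byte ISO reserved value: -0011-0010-8000-00AA00389B71
ISO_12_BYTES = bytes.fromhex("00110010800000AA00389B71")

def form_uuid(fourcc: bytes) -> uuid.UUID:
    """Map a four-character box type into the UUID space."""
    if len(fourcc) != 4:
        raise ValueError("box type must be exactly four bytes")
    return uuid.UUID(bytes=fourcc + ISO_12_BYTES)

def iso_type_of(u: uuid.UUID):
    """Return the four-byte type if u lies in the ISO range, else None."""
    raw = u.bytes
    return raw[:4] if raw[4:] == ISO_12_BYTES else None
```

A writer would emit the compact four-byte type whenever iso_type_of returns a value, and fall back to the ‘uuid’ escape followed by all 16 bytes otherwise.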
11.2 Storage formats
The main file containing the metadata may use other files to contain media-data. These other files may contain header declarations from a variety of standards, including this one.
If such a secondary file has a metadata declaration set in it, that metadata is not part of the overall presentation. This allows small presentation files to be aggregated into a larger overall presentation by building new metadata and referencing the media-data, rather than copying it.
The references into these other files need not use all the data in those files; in this way, a subset of the media-data may be used, or unwanted headers ignored.
11.3 Derived File formats
This specification may be used as the basis for a specific file format for a restricted purpose: for example, the MP4 file format for MPEG-4 and the Motion JPEG 2000 file format are both derived from it. When a derived specification is written, the following must be specified:
The name of the new format, and its brand and compatibility types for the File Type Box. Generally a new file extension will be used, a new MIME type, and Macintosh file type also, though the definition and registration of these are outside the scope of this specification.
Any template fields used must be explicitly declared; their use must be conformant with the specification here.
The exact ‘codingname’ and ‘protocol’ identifiers as used in the Sample Description must be defined. The format of the samples that these code-points identify must also be defined. However, it may be preferable to fit the new coding systems into an existing framework (e.g. the MPEG-4 systems framework), than to define new coding points at this level. For example, a new audio format could use a new codingname, or could use ‘mp4a’ and register new identifiers within the MPEG-4 audio framework.
New boxes may be defined, though this is discouraged.
If the derived specification needs a new track type other than those defined here or registered, then a new handler-type must be registered. The media header required for this track must be identified. If it is a new box, it must be defined and its box type registered. In general, it is expected that most systems can use existing track types.
Any new track reference types should be registered and defined.
As defined above, the Sample Description format may be extended with optional or required boxes. The usual syntax for doing this would be to define a new box with a specific name, extending (for example) VisualSampleEntry, and containing new boxes.
12 Media-specific definitions
12.1 Video media
12.1.1 Media handler
Video media uses the ‘vide’ handler type in the handler box of the media box, as defined in 8.4.3.
Auxiliary video media uses the ‘auxv’ handler type in the handler box of the media box, as defined in 8.4.3.
An auxiliary video track is coded the same as a video track, but uses this different handler type, and is not intended to be visually displayed (e.g. it contains depth information, or other monochrome or color two-dimensional information). Auxiliary video tracks are usually linked to a video track by an appropriate track reference.
12.1.2 Video media header
12.1.2.1 Definition
Box Types: ‘vmhd’
Container: Media Information Box (‘minf’)
Mandatory: Yes
Quantity: Exactly one
Video tracks use the VideoMediaHeaderBox in the media information box as defined in 8.4.5. The video media header contains general presentation information, independent of the coding, for video media. Note that the flags field has the value 1.
12.1.2.2 Syntax
aligned(8) class VideoMediaHeaderBox
   extends FullBox(‘vmhd’, version = 0, 1) {
   template unsigned int(16) graphicsmode = 0;   // copy, see below
   template unsigned int(16)[3] opcolor = {0, 0, 0};
}
12.1.2.3 Semantics
version is an integer that specifies the version of this box
graphicsmode specifies a composition mode for this video track, from the following enumerated set, which may be extended by derived specifications:
   copy = 0 copy over the existing image
opcolor is a set of 3 colour values (red, green, blue) available for use by graphics modes
12.1.3 Sample entry
12.1.3.1 Definition
Video tracks use VisualSampleEntry.
In video tracks, the frame_count field must be 1 unless the specification for the media format explicitly documents this template field and permits larger values. That specification must document both how the individual frames of video are found (their size information) and their timing established. That timing might be as simple as dividing the sample duration by the frame count to establish the frame duration.
The width and height in the video sample entry document the pixel counts that the codec will deliver; this enables the allocation of buffers. Since these are counts they do not take into account pixel aspect ratio.
12.1.3.2 Syntax
class VisualSampleEntry(codingname) extends SampleEntry (codingname){
   unsigned int(16) pre_defined = 0;
   const unsigned int(16) reserved = 0;
   unsigned int(32)[3] pre_defined = 0;
   unsigned int(16) width;
   unsigned int(16) height;
   template unsigned int(32) horizresolution = 0x00480000;   // 72 dpi
   template unsigned int(32) vertresolution = 0x00480000;    // 72 dpi
   const unsigned int(32) reserved = 0;
   template unsigned int(16) frame_count = 1;
   string[32] compressorname;
   template unsigned int(16) depth = 0x0018;
   int(16) pre_defined = -1;
   // other boxes from derived specifications
   CleanApertureBox clap;      // optional
   PixelAspectRatioBox pasp;   // optional
}
12.1.3.3 Semantics
resolution fields give the resolution of the image in pixels-per-inch, as a fixed 16.16 number
frame_count indicates how many frames of compressed video are stored in each sample. The default is 1, for one frame per sample; it may be more than 1 for multiple frames per sample
Compressorname is a name, for informative purposes. It is formatted in a fixed 32-byte field, with the first byte set to the number of bytes to be displayed, followed by that number of bytes of displayable data, and then padding to complete 32 bytes total (including the size byte). The field may be set to 0.
depth takes one of the following values
   0x0018: images are in colour with no alpha
width and height are the maximum visual width and height of the stream described by this sample description, in pixels
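As a non-normative sketch, the 70 bytes of fixed VisualSampleEntry fields (those that follow the fields inherited from SampleEntry) could be decoded as below. The function name and the returned keys are assumptions of this example, and decoding compressorname as UTF-8 is also an assumption, since the text above only calls the bytes "displayable data".

```python
import struct

# pre_defined(2) reserved(2) pre_defined(12) width height horizres vertres
# reserved(4) frame_count compressorname(32) depth pre_defined
_VISUAL_FIELDS = struct.Struct(">2x2x12xHHII4xH32sHh")

def parse_visual_sample_entry_fields(body: bytes) -> dict:
    """Decode the fixed fields of a VisualSampleEntry (illustrative)."""
    if len(body) < _VISUAL_FIELDS.size:
        raise ValueError("truncated VisualSampleEntry")
    (width, height, hres, vres, frame_count,
     cname, depth, _pre_defined) = _VISUAL_FIELDS.unpack_from(body)
    name_len = min(cname[0], 31)  # first byte is the displayed length
    return {
        "width": width,
        "height": height,
        "horizresolution": hres / 65536.0,  # 16.16 fixed point
        "vertresolution": vres / 65536.0,
        "frame_count": frame_count,
        "compressorname": cname[1:1 + name_len].decode("utf-8", errors="replace"),
        "depth": depth,
    }
```

Any optional boxes such as ‘clap’ or ‘pasp’ would start at offset 70 of the same buffer.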
12.1.4 Pixel Aspect Ratio and Clean Aperture
12.1.4.1 Definition
The pixel aspect ratio and clean aperture of the video may be specified using the ‘pasp’ and ‘clap’ sample entry boxes, respectively. These are both optional; if present, they over-ride the declarations (if any) in structures specific to the video codec, which structures should be examined if these boxes are absent. For maximum compatibility, these boxes should follow, not precede, any boxes defined in or required by derived specifications.
In the PixelAspectRatioBox, hSpacing and vSpacing have the same units, but those units are unspecified: only the ratio matters. hSpacing and vSpacing may or may not be in reduced terms, and they may reduce to 1/1. Both of them must be positive.
They are defined as the aspect ratio of a pixel, in arbitrary units. If a pixel appears H wide and V tall, then hSpacing/vSpacing is equal to H/V. This means that a square on the display that is n pixels tall needs to be n*vSpacing/hSpacing pixels wide to appear square.
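The arithmetic in the preceding paragraph can be checked with exact fractions, as in the sketch below. Both function names are hypothetical, and display_width assumes the common choice of scaling only the horizontal dimension onto a square-pixel display.

```python
from fractions import Fraction

def pixels_wide_for_square(n_tall: int, h_spacing: int, v_spacing: int) -> Fraction:
    """Coded pixels needed horizontally for a square that is n_tall pixels tall."""
    if h_spacing <= 0 or v_spacing <= 0:
        raise ValueError("hSpacing and vSpacing must be positive")
    return Fraction(n_tall * v_spacing, h_spacing)

def display_width(coded_width: int, h_spacing: int, v_spacing: int) -> Fraction:
    """Width on a square-pixel display when only the horizontal axis is scaled."""
    if h_spacing <= 0 or v_spacing <= 0:
        raise ValueError("hSpacing and vSpacing must be positive")
    return Fraction(coded_width * h_spacing, v_spacing)
```

For example, with hSpacing = 16 and vSpacing = 15, a coded width of 720 corresponds to 768 square pixels.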
NOTE When adjusting pixel aspect ratio, normally, the horizontal dimension of the video is scaled, if needed (i.e. if the final display system has a different pixel aspect ratio from the video source).
NOTE It is recommended that the original pixels, and the composed transform, be carried through the pipeline as far as possible. If the transformation resulting from ‘correcting’ pixel aspect ratio to a square grid, normalizing to the track dimensions, composition or placement (e.g. track and/or movie matrix), and normalizing to the display characteristics, is a unity matrix, then no re-sampling need be done. In particular, video should not be re-sampled more than once in the process of rendering, if at all possible.
There are notionally four values in the CleanApertureBox. These parameters are represented as a fraction N/D. The fraction may or may not be in reduced terms. We refer to the pair of parameters fooN and fooD as foo. For horizOff and vertOff, D must be positive and N may be positive or negative. For cleanApertureWidth and cleanApertureHeight, both N and D must be positive.
NOTE These are fractional numbers for several reasons. First, in some systems the exact width after pixel aspect ratio correction is integral, not the pixel count before that correction. Second, if video is resized in the full aperture, the exact expression for the clean aperture may not be integral. Finally, because this is represented using centre and offset, a division by two is needed, and so half-values can occur.
Consider the pixel dimensions as defined by the VisualSampleEntry width and height. If the picture centre of the image is at pcX and pcY, then horizOff and vertOff are defined as follows:
   pcX = horizOff + (width - 1)/2
   pcY = vertOff + (height - 1)/2;
Typically, horizOff and vertOff are zero, so the image is centred about the picture centre.
The leftmost/rightmost pixel and the topmost/bottommost line of the clean aperture fall at:
   pcX ± (cleanApertureWidth - 1)/2
   pcY ± (cleanApertureHeight - 1)/2;
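The formulas above can be evaluated exactly with rational arithmetic. The sketch below is illustrative: the function name and argument layout are assumptions, each clean aperture parameter is passed as an (N, D) pair, and offset numerators are assumed to have already been interpreted as signed values.

```python
from fractions import Fraction

def clean_aperture_edges(width, height, ca_width, ca_height, horiz_off, vert_off):
    """Return the clean aperture edges in pixel coordinates.

    width, height: VisualSampleEntry pixel counts.
    ca_width, ca_height, horiz_off, vert_off: (N, D) pairs from the 'clap' box.
    """
    w, h = Fraction(*ca_width), Fraction(*ca_height)
    pc_x = Fraction(*horiz_off) + Fraction(width - 1, 2)   # picture centre X
    pc_y = Fraction(*vert_off) + Fraction(height - 1, 2)   # picture centre Y
    return {
        "left": pc_x - (w - 1) / 2,
        "right": pc_x + (w - 1) / 2,
        "top": pc_y - (h - 1) / 2,
        "bottom": pc_y + (h - 1) / 2,
    }
```

With a 720x480 entry and a centred 704x480 clean aperture, the result is columns 8 to 711 and lines 0 to 479.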
12.1.4.2 Syntax
class PixelAspectRatioBox extends Box(‘pasp’){
   unsigned int(32) hSpacing;
   unsigned int(32) vSpacing;
}
class CleanApertureBox extends Box(‘clap’){
   unsigned int(32) cleanApertureWidthN;
   unsigned int(32) cleanApertureWidthD;
   unsigned int(32) cleanApertureHeightN;
   unsigned int(32) cleanApertureHeightD;
   unsigned int(32) horizOffN;
   unsigned int(32) horizOffD;
   unsigned int(32) vertOffN;
   unsigned int(32) vertOffD;
}
12.1.4.3 Semantics
hSpacing, vSpacing: define the relative width and height of a pixel;
cleanApertureWidthN, cleanApertureWidthD: a fractional number which defines the exact clean aperture width, in counted pixels, of the video image
cleanApertureHeightN, cleanApertureHeightD: a fractional number which defines the exact clean aperture height, in counted pixels, of the video image
horizOffN, horizOffD: a fractional number which defines the horizontal offset of clean aperture centre minus (width-1)/2. Typically 0.
vertOffN, vertOffD: a fractional number which defines the vertical offset of clean aperture centre minus (height-1)/2. Typically 0.
12.1.5 Colour information
12.1.5.1 Definition
Colour information may be supplied in one or more ColourInformationBoxes placed in a VisualSampleEntry. These should be placed in order in the sample entry starting with the most accurate (and potentially the most difficult to process), in progression to the least. These are advisory and concern rendering and colour conversion, and there is no normative behaviour associated with them; a reader may choose to use the most suitable. A ColourInformationBox with an unknown colour type may be ignored.
If used, an ICC profile may be a restricted one, under the code ‘rICC’, which permits simpler processing. That profile shall be of either the Monochrome or Three-Component Matrix-Based class of input profiles, as defined by ISO 15076-1. If the profile is of another class, then the ‘prof’ indicator must be used.
If colour information is supplied in both this box, and also in the video bitstream, this box takes precedence, and over-rides the information in the bitstream.
NOTE When an ICC profile is specified, SMPTE RP 177 “Derivation of Basic Television Color Equations” may be of assistance if there is a need to form the Y'CbCr to R'G'B' conversion matrix for the color primaries described by the ICC profile.
12.1.5.2 Syntax
class ColourInformationBox extends Box(‘colr’){
   unsigned int(32) colour_type;
   if (colour_type == ‘nclx’)   /* on-screen colours */
   {
      unsigned int(16) colour_primaries;
      unsigned int(16) transfer_characteristics;
      unsigned int(16) matrix_coefficients;
      unsigned int(1) full_range_flag;
      unsigned int(7) reserved = 0;
   }
   else if (colour_type == ‘rICC’)
   {
      ICC_profile;   // restricted ICC profile
   }
   else if (colour_type == ‘prof’)
   {
      ICC_profile;   // unrestricted ICC profile
   }
}
12.1.5.3 Semantics
colour_type: an indication of the type of colour information supplied. For colour_type ‘nclx’: these fields are exactly the four bytes defined for PTM_COLOR_INFO( ) in A.7.2 of ISO/IEC 29199-2 but note that the full range flag is here in a different bit position
ICC_profile: an ICC profile as defined in ISO 15076-1 or ICC.1:2010 is supplied.
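A reader's handling of the three colour types might look like the sketch below. The function name and the dictionary keys are assumptions; the payload is taken to be the box contents after the size and type fields, and unknown colour types are returned without further interpretation because such boxes may be ignored.

```python
import struct

def parse_colour_information(payload: bytes) -> dict:
    """Decode the body of a 'colr' box (illustrative)."""
    if len(payload) < 4:
        raise ValueError("'colr' box needs a colour_type")
    colour_type = payload[:4]
    if colour_type == b"nclx":
        if len(payload) < 11:
            raise ValueError("truncated 'nclx' colour information")
        primaries, transfer, matrix, flags = struct.unpack(">HHHB", payload[4:11])
        return {
            "colour_type": colour_type,
            "colour_primaries": primaries,
            "transfer_characteristics": transfer,
            "matrix_coefficients": matrix,
            "full_range_flag": bool(flags >> 7),  # top bit; low 7 bits reserved
        }
    if colour_type in (b"rICC", b"prof"):
        return {"colour_type": colour_type, "icc_profile": payload[4:]}
    return {"colour_type": colour_type}  # unknown type: caller may ignore
```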
12.2 Audio media
12.2.1 Media handler
Audio media uses the ‘soun’ handler type in the handler box of the media box, as defined in 8.4.3.
12.2.2 Sound media header
12.2.2.1 Definition
Box Types: ‘smhd’
Container: Media Information Box (‘minf’)
Mandatory: Yes
Quantity: Exactly one specific media header shall be present
Audio tracks use the SoundMediaHeaderBox in the media information box as defined in 8.4.5. The sound media header contains general presentation information, independent of the coding, for audio media. This header is used for all tracks containing audio.
12.2.2.2 Syntax
aligned(8) class SoundMediaHeaderBox
   extends FullBox(‘smhd’, version = 0, 0) {
   template int(16) balance = 0;
   const unsigned int(16) reserved = 0;
}
12.2.2.3 Semantics
version is an integer that specifies the version of this box
balance is a fixed-point 8.8 number that places mono audio tracks in a stereo space; 0 is centre (the normal value); full left is -1.0 and full right is 1.0.
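As a small illustration, the signed 8.8 fixed-point balance field can be converted to a floating-point value by dividing the signed 16-bit integer by 256. The function name is an assumption of this sketch.

```python
import struct

def decode_balance(raw: bytes) -> float:
    """Convert the 2-byte big-endian 8.8 fixed-point balance field to a float."""
    if len(raw) < 2:
        raise ValueError("balance field is two bytes")
    (value,) = struct.unpack(">h", raw[:2])  # signed 16-bit integer
    return value / 256.0
```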
12.2.3 Sample entry
12.2.3.1 Definition
Audio tracks use AudioSampleEntry or AudioSampleEntryV1.
The samplerate, samplesize and channelcount fields document the default audio output playback format for this media. The timescale for an audio track should be chosen to match the sampling rate, or be an integer multiple of it, to enable sample-accurate timing. When channelcount is a value greater than zero, it indicates the intended number of loudspeaker channels in the audio stream. A ChannelCount of 1 indicates mono audio, and 2 indicates stereo (left/right). When values greater than 2 are used, the codec configuration should identify the channel assignment.
When it is desired to indicate an audio sampling rate greater than the value that can be represented in the samplerate field, the following may be used:
an AudioSampleEntryV1 is used, which requires that the enclosing SampleDescriptionBox also take the version 1;
a SamplingRateBox may be present only in an AudioSampleEntryV1, and when present, it over-rides the samplerate field and documents the actual sampling rate;
when the SamplingRateBox is present, the media timescale should be the same as the sampling rate, or an integer division or multiple of it;
the samplerate field in the sample entry should contain a value left-shifted 16 bits (as for AudioSampleEntry) that matches the media timescale, or be an integer division or multiple of it.
An AudioSampleEntryV1 should only be used when needed; otherwise, for maximum compatibility, an AudioSampleEntry should be used. An AudioSampleEntryV1 must not occur in a SampleDescriptionBox with version set to 0.
The audio output format (samplerate, samplesize and channelcount fields) in the sample entry should be considered definitive only for codecs that do not record their own output configuration. If the audio codec has definitive information about the output format, it shall be taken as definitive; in this case the samplerate, samplesize and channelcount fields in the sample entry may be ignored, though sensible values should be chosen (for example, the highest possible sampling rate).
12.2.3.2 Syntax
// Audio Sequences
class AudioSampleEntry(codingname) extends SampleEntry (codingname){
   const unsigned int(32)[2] reserved = 0;
   template unsigned int(16) channelcount = 2;
   template unsigned int(16) samplesize = 16;
   unsigned int(16) pre_defined = 0;
   const unsigned int(16) reserved = 0 ;
   template unsigned int(32) samplerate = { default samplerate of media}<<16;
   ChannelLayout();
   // we permit any number of DownMix or DRC boxes:
   DownMixInstructions() [];
   DRCCoefficientsBasic() [];
   DRCInstructionsBasic() [];
   DRCCoefficientsUniDRC() [];
   DRCInstructionsUniDRC() [];
   Box ();   // further boxes as needed
}
aligned(8) class SamplingRateBox extends FullBox(‘srat’) {
   unsigned int(32) sampling_rate;
}
class AudioSampleEntryV1(codingname) extends SampleEntry (codingname){
   unsigned int(16) entry_version;   // must be 1,
                                     // and must be in an stsd with version ==1
   const unsigned int(16)[3] reserved = 0;
   template unsigned int(16) channelcount;   // must be correct
   template unsigned int(16) samplesize = 16;
   unsigned int(16) pre_defined = 0;
   const unsigned int(16) reserved = 0 ;
   template unsigned int(32) samplerate = 1<<16;
   // optional boxes follow
   SamplingRateBox();
   ChannelLayout();
   // we permit any number of DownMix or DRC boxes:
   DownMixInstructions() [];
   DRCCoefficientsBasic() [];
   DRCInstructionsBasic() [];
   DRCCoefficientsUniDRC() [];
   DRCInstructionsUniDRC() [];
   Box ();   // further boxes as needed
}
12.2.3.3 Semantics
ChannelCount is the number of channels such as 1 (mono) or 2 (stereo)
SampleSize is in bits, and takes the default value of 16
SampleRate when a SamplingRateBox is absent is the sampling rate; when a SamplingRateBox is present, is a suitable integer multiple or division of the actual sampling rate. This 32-bit field is expressed as a 16.16 fixed-point number (hi.lo)
sampling_rate is the actual sampling rate of the audio media, expressed as a 32-bit integer
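The 16.16 representation of the samplerate field can be illustrated as follows. The helper names are hypothetical; the 65535 Hz ceiling follows from the 16-bit integer part, which is why higher rates need the SamplingRateBox of an AudioSampleEntryV1.

```python
def encode_samplerate(rate_hz: int) -> int:
    """Return the 32-bit 16.16 field value for an integer sampling rate."""
    if not 0 <= rate_hz <= 0xFFFF:
        raise ValueError("rate does not fit the 16-bit integer part; "
                         "a SamplingRateBox would be needed")
    return rate_hz << 16

def decode_samplerate(field: int) -> float:
    """Return the sampling rate in Hz carried by a 16.16 field value."""
    return field / 65536.0
```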
12.2.4 Channel layout
12.2.4.1 Definition
Box Types: ‘chnl’
Container: Audio sample entry
Mandatory: No
Quantity: Zero or one
This box may appear in an audio sample entry to document the assignment of channels in the audio stream.
The channelcount field in the AudioSampleEntry must be correct; an AudioSampleEntryV1 is therefore required to signal values other than 2. The channel layout can be all or part of a standard layout (from an enumerated list), or a custom layout (which also allows a track to contribute part of an overall layout).
A stream may contain channels, objects, neither, or both. A stream that is neither channel nor object structured can implicitly be rendered in a variety of ways.
12.2.4.2 Syntax
aligned(8) class ChannelLayout extends FullBox(‘chnl’) {
   unsigned int(8) stream_structure;
   if (stream_structure & channelStructured) {   // 1
      unsigned int(8) definedLayout;
      if (definedLayout==0) {
         for (i = 1 ; i <= channelCount ; i++) {
            // channelCount comes from the sample entry
            unsigned int(8) speaker_position;
            if (speaker_position == 126) {   // explicit position
               signed int (16) azimuth;
               signed int (8) elevation;
            }
         }
      } else {
         unsigned int(64) omittedChannelsMap;
            // a ‘1’ bit indicates ‘not in this track’
      }
   }
   if (stream_structure & objectStructured) {   // 2
      unsigned int(8) object_count;
   }
}
12.2.4.3 Semantics
stream_structure is a field of flags that define whether the stream has channel or object structure (or both, or neither); the following flags are defined, all other values are reserved:
   1 the stream carries channels
   2 the stream carries objects
definedLayout is a ChannelConfiguration from ISO/IEC 23001-8;
speaker_position is an OutputChannelPosition from ISO/IEC 23001-8. If an explicit position is used, then the azimuth and elevation are as defined for speakers in ISO/IEC 23001-8.
azimuth is a signed value in degrees, as defined for LoudspeakerAzimuth in ISO/IEC 23001-8
elevation is a signed value, in degrees, as defined for LoudspeakerElevation in ISO/IEC 23001-8
omittedChannelsMap is a bit-map of omitted channels; the bits in the channel map are numbered from least-significant to most-significant, and correspond in that ordering with the order of the channels for the configuration as documented in ISO/IEC 23001-8 ChannelConfiguration. 1-bits in the channel map mean that a channel is absent. A zero value of the map therefore always means that the given standard layout is fully present.
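The bit numbering of omittedChannelsMap can be illustrated with the sketch below. The function name is hypothetical, and layout_channel_count is assumed to be the number of channels that the standard layout defines, which has to be looked up from ISO/IEC 23001-8 by the caller.

```python
def present_channels(omitted_channels_map: int, layout_channel_count: int):
    """Return the layout-order indices of channels present in this track.

    Bit i (counting from the least-significant bit) set to 1 means that
    channel i of the standard layout is absent.
    """
    if not 0 <= layout_channel_count <= 64:
        raise ValueError("the map covers at most 64 channels")
    return [i for i in range(layout_channel_count)
            if not (omitted_channels_map >> i) & 1]
```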
12.2.5 Downmix Instructions
12.2.5.1 Definition
Box Types: ‘dmix’
Container: Audio sample entry
Mandatory: No
Quantity: Zero or more
The downmix can be controlled by the production facility if necessary. For instance, some content may require more attenuation of the surround channels before downmixing to maintain intelligibility.
The downmix support is designed so that any downmix (e.g. from 7.1 to quad as well as to stereo) can be described.
It is possible to declare the loudness characteristics of the signal after downmix, and after DRC and downmix.
If targetChannelCount*baseChannelCount is odd, the box is padded with 4 bits set to 0xF. The targetChannelCount must be consistent with the targetLayout (if given), and must be less than or equal to the channelcount.
Each downmix is uniquely identified by an ID.
12.2.5.2 Syntax
aligned(8) class DownMixInstructions extends FullBox(‘dmix’) {
   unsigned int(8) targetLayout;
   unsigned int(1) reserved = 0;
   unsigned int(7) targetChannelCount;
   bit(1) in_stream;
   unsigned int(7) downmix_ID;
   if (in_stream==0) {   // downmix coefficients are out of stream and supplied here
      int i, j;
      for (i = 1 ; i <= targetChannelCount; i++){
         for (j=1; j <= baseChannelCount; j++) {
            bit(4) bs_downmix_coefficient;
         }
      }
   }
}
12.2.5.3 Semantics
targetLayout is a ChannelConfiguration from ISO/IEC 23001-8 and defines the resulting layout after downmix
targetChannelCount is the count of channels in the resulting stream, and must correspond with the target layout
downmix_ID is an arbitrary value that identifies this downmix, and must be unique among the DownMixInstructions in a given sample entry; there are two reserved values, 0 and 0x7F, which must not be used
in_stream has a value of 1 when the downmix coefficients are in the stream. Otherwise, it is zero.
bs_downmix_coefficient is encoded as defined in the following tables:

   Value         Hex Encoding (4 bits)
     0.00 dB     0x0
    -0.50 dB     0x1
    -1.00 dB     0x2
    -1.50 dB     0x3
    -2.00 dB     0x4
    -2.50 dB     0x5
    -3.00 dB     0x6
    -3.50 dB     0x7
    -4.00 dB     0x8
    -4.50 dB     0x9
    -5.00 dB     0xA
    -5.50 dB     0xB
    -6.00 dB     0xC
    -7.50 dB     0xD
    -9.00 dB     0xE
    -∞ dB        0xF
Table 5: Downmix Coefficient Encoding for non-LFE channels

   Value         Hex Encoding (4 bits)
    10.00 dB     0x0
     6.00 dB     0x1
     4.50 dB     0x2
     3.00 dB     0x3
     1.50 dB     0x4
     0.00 dB     0x5
    -1.50 dB     0x6
    -3.00 dB     0x7
    -4.50 dB     0x8
    -6.00 dB     0x9
   -10.00 dB     0xA
   -15.00 dB     0xB
   -20.00 dB     0xC
   -30.00 dB     0xD
   -40.00 dB     0xE
    -∞ dB        0xF
Table 6: Downmix Coefficient Encoding for LFE channel
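The two tables lend themselves to a direct lookup, sketched below together with a helper that unpacks the 4-bit codes from the box payload. All names are assumptions of this example; which base channel is the LFE depends on the layout and is left to the caller, and the conversion to a linear factor assumes that the dB values are amplitude gains (20·log10), which the text above does not state.

```python
import math

NON_LFE_DB = [0.0, -0.5, -1.0, -1.5, -2.0, -2.5, -3.0, -3.5,
              -4.0, -4.5, -5.0, -5.5, -6.0, -7.5, -9.0, -math.inf]
LFE_DB = [10.0, 6.0, 4.5, 3.0, 1.5, 0.0, -1.5, -3.0,
          -4.5, -6.0, -10.0, -15.0, -20.0, -30.0, -40.0, -math.inf]

def decode_coefficient(code: int, is_lfe: bool = False) -> float:
    """Map a 4-bit bs_downmix_coefficient to its value in dB."""
    if not 0 <= code <= 0xF:
        raise ValueError("coefficient code must be 4 bits")
    return (LFE_DB if is_lfe else NON_LFE_DB)[code]

def db_to_linear(db: float) -> float:
    """Amplitude factor for a dB value (assumed 20*log10 convention)."""
    return 0.0 if db == -math.inf else 10.0 ** (db / 20.0)

def unpack_coefficients(data: bytes, target_count: int, base_count: int):
    """Unpack codes, high nibble first, into target_count rows of base_count."""
    total = target_count * base_count
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    if len(nibbles) < total:
        raise ValueError("not enough coefficient data")
    flat = nibbles[:total]  # a trailing 0xF padding nibble is dropped here
    return [flat[r * base_count:(r + 1) * base_count] for r in range(target_count)]
```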
12.2.6 DRC Information
A DRC is used in the encoder to generate gain values using one of the pre-defined DRC characteristics as defined in ISO/IEC 23001-8; the coefficients are placed either in-stream or in an associated meta-data track.
For some content, such as some multi-channel content, it may be advantageous to use different DRC characteristics in different channels. For instance, if speech is exclusively present in the center channel, this feature can be very useful. It is supported by the assignment of DRC characteristics to audio channels.
It is possible to declare the loudness characteristics of the signal after DRC.
DRC support includes supporting in-stream DRC coefficients, and a separate track carrying them; the latter is particularly useful for legacy coding systems (including uncompressed audio) that have no provision for in-stream coefficients.
In the ISO base media file format, the audio content may be carried in multiple tracks where a base track contains the DRC metadata for all tracks. The additional tracks are referenced by the base track using a track reference of type ‘adda’ (additional audio). The channels processed by the DRC are all the channels in the base track, plus all the channels in track(s) referenced, in the order of the references. The DRC channel groups apply to all those channels (even if they are channels in a track that is disabled or not currently being played).
The boxes DRCCoefficientsBasic, DRCCoefficientsUniDRC, DRCInstructionsBasic, and DRCInstructionsUniDRC may occur in an AudioSampleEntry and are defined in ISO/IEC 23003-4.
12.2.7 Audio stream loudness
12.2.7.1 Introduction
Box Types: ‘ludt’
Container: Track user-data ‘udta’
Mandatory: No
Quantity: Zero or more
Loudness declarations are placed in user-data boxes, to enable their presence and update in movie fragments. In particular, in live scenarios, user-data in the initial movie atom may be a ‘promise not to exceed’ or ‘best guess’, and then user-data updates give better (but still generally valid) values. Thus, for example, a loudness range in this user data that is associated with a particular set of DRC instructions constitutes a ‘promise’ rather than a measurement, under these circumstances.
Several metadata values are available that describe aspects of the dynamic range. The size of the dynamic range can be useful in adjusting the DRC characteristic, e.g. the DRC is less aggressive if the dynamic range is small or the DRC can even be turned off.
True Peak and maximum loudness values can be useful for estimating the headroom, for instance when loudness normalization results in a positive gain [dB] or when headroom is needed to avoid clipping of the downmix. The DRC characteristic can then be adjusted to approach a headroom target. The peak level of the associated content is represented here in a coding-independent way.
The audio sound pressure level that the content was mixed to can also be documented. (If audio is listened to at a level other than the mixing level, this can affect the perceived tonal balance.)
The following measures may also be used:
Maximum of the Loudness Range derived from EBU-Tech 3342
Maximum Momentary Loudness derived from ITU-R BS.1771-1 or EBU-Tech 3341
Maximum Short-Term Loudness derived from ITU-R BS.1771-1 or EBU-Tech 3341
Short-Term Loudness defined in ITU-R BS.1771-1 or EBU-Tech 3341
Under some circumstances it can be desirable to indicate the loudness characteristics of an album, in each song that the album contains. A separate box can be specified for that purpose. The TrackLoudnessInfo and AlbumLoudnessInfo provide loudness information for the song, and for the entire album which contains the song, respectively.
The program loudness is measured using ITU-R BS.1770-3 over the associated content; the ‘anchor loudness’ is the loudness of the anchor content, where what that content is, is determined by the content author; one suitable value (especially for content for which the main content is speech) is ‘dialog normal level’ or DialNorm as defined in ATSC Doc. A/52:2012. ISO/IEC 23003-4 specifies the measurement systems, measurement methods and the coding of all loudness and peak-related values.
12.2.7.2 Syntax
aligned(8) class LoudnessBaseBox extends FullBox(loudnessType) {
   unsigned int(3) reserved = 0;
   unsigned int(7) downmix_ID;   // matching downmix
   unsigned int(6) DRC_set_ID;   // to match a DRC box
   signed int(12) bs_sample_peak_level;
   signed int(12) bs_true_peak_level;
   unsigned int(4) measurement_system_for_TP;
   unsigned int(4) reliability_for_TP;
   unsigned int(8) measurement_count;
   int i;
   for (i = 1 ; i <= measurement_count; i++){
      unsigned int(8) method_definition;
      unsigned int(8) method_value;
      unsigned int(4) measurement_system;
      unsigned int(4) reliability;
   }
}
aligned(8) class TrackLoudnessInfo extends LoudnessBaseBox(‘tlou’) { }
aligned(8) class AlbumLoudnessInfo extends LoudnessBaseBox (‘alou’) { }
aligned(8) class LoudnessBox extends Box(‘ludt’) {
   loudness TrackLoudnessInfo[];        // a set of one or more loudness boxes
   albumLoudness AlbumLoudnessInfo[];   // if applicable
}
12.2.7.3 Semantics
downmix_ID when zero, declares the loudness characteristics of the layout without downmix. If non-zero, this box declares the loudness after applying the downmix with the matching downmix_ID and must match a value in exactly one box in the sample entry of this track
DRC_set_ID when zero, declares the characteristics without applying a DRC. If non-zero, this box declares the loudness after applying the DRC with the matching DRC_set_ID and must match a value in exactly one box in the sample entry of this track
bs_sample_peak_level takes a value for the sample peak level as defined in ISO/IEC 23003-4; all other values are reserved
bs_true_peak_level takes a value for the true peak level as defined in ISO/IEC 23003-4; all other values are reserved
measurement_system_for_TP takes an index for the measurement system as defined in ISO/IEC 23003-4; all other values are reserved
method_definition takes an index for the measurement method as defined in ISO/IEC 23003-4; all others are reserved
measurement_system takes an index for the measurement system as defined in ISO/IEC 23003-4; all others are reserved
reliability and reliability_for_TP each take one of the following values (all other values are reserved):
   0: Reliability is unknown
   1: Value is reported/imported but unverified
   2: Value is a ‘not to exceed’ ceiling
   3: Value is measured and accurate
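Because the first seven fields of LoudnessBaseBox add up to 56 bits that do not fall on byte boundaries, one way to read them is to load the seven bytes into a single integer and shift. The sketch below is illustrative: the function names and returned keys are assumptions, the payload is taken to start after the FullBox version and flags, and the peak fields are returned as raw signed integers because their mapping to dB is defined in ISO/IEC 23003-4 rather than here.

```python
def _to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned field of the given width as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def parse_loudness_base(payload: bytes) -> dict:
    """Decode the body of a 'tlou' or 'alou' box (illustrative)."""
    if len(payload) < 7:
        raise ValueError("truncated loudness box")
    head = int.from_bytes(payload[:7], "big")  # 56 bits, MSB first
    count = head & 0xFF
    if len(payload) < 7 + 3 * count:
        raise ValueError("truncated measurement list")
    measurements = []
    for n in range(count):
        offset = 7 + 3 * n
        method_definition, method_value, packed = payload[offset:offset + 3]
        measurements.append({
            "method_definition": method_definition,
            "method_value": method_value,
            "measurement_system": packed >> 4,
            "reliability": packed & 0x0F,
        })
    return {
        # the top 3 bits (53..55) are reserved
        "downmix_ID": (head >> 46) & 0x7F,
        "DRC_set_ID": (head >> 40) & 0x3F,
        "bs_sample_peak_level": _to_signed((head >> 28) & 0xFFF, 12),
        "bs_true_peak_level": _to_signed((head >> 16) & 0xFFF, 12),
        "measurement_system_for_TP": (head >> 12) & 0xF,
        "reliability_for_TP": (head >> 8) & 0xF,
        "measurements": measurements,
    }
```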
12.3 Metadata media
12.3.1 Media handler
Timed metadata media uses the ‘meta’ handler type in the handler box of the media box, as defined in 8.4.3.
NOTE MPEG-7 streams, which are a specific kind of metadata stream, have their own handler declared, documented in the MP4 file format [ISO/IEC 14496-14].
NOTE Metadata tracks are linked to the track they describe using a track-reference of type ‘cdsc’.
12.3.2 Media header
Metadata tracks use a null media header (‘nmhd’), as defined in subclause 8.4.5.2.
12.3.3 Sample entry
12.3.3.1 Definition
Timed metadata tracks use MetaDataSampleEntry.
An optional BitRateBox may be present at the end of any MetaDataSampleEntry to signal the bit rate information of a stream. This can be used for buffer configuration. In case of XML metadata it can be used to choose the appropriate memory representation format (DOM, STX).
An optional bitrate box may be used in the URIMetaSampleEntry entry, as usual.
The URIMetaSampleEntry entry contains, in a box, the URI defining the form of the metadata, and optional initialization data. The format of both the samples and of the initialization data is defined by all or part of the URI form.
It may be the case that the URI identifies a format of metadata that allows there to be more than one ‘stated fact’ within each sample. However, all metadata samples in this format are effectively ‘I frames’, defining the entire set of metadata for the time interval they cover. This means that the complete set of metadata at any instant, for a given track, is contained in (a) the time-aligned samples of the track(s) (if any) describing that track, plus (b) the track metadata (if any), the movie metadata (if any) and the file metadata (if any).
If incrementally-changed metadata is needed, the MPEG-7 framework provides that capability.
Information on URI forms for some metadata systems can be found in Annex G.
12.3.3.2 Syntax
class MetaDataSampleEntry(codingname) extends SampleEntry (codingname) {
   Box[] other_boxes; // optional
}

class XMLMetaDataSampleEntry() extends MetaDataSampleEntry (‘metx’) {
   string content_encoding; // optional
   string namespace;
   string schema_location; // optional
   BitRateBox (); // optional
}

class TextConfigBox() extends FullBox (‘txtC’, 0, 0) {
   string text_config;
}

class TextMetaDataSampleEntry() extends MetaDataSampleEntry (‘mett’) {
   string content_encoding; // optional
   string mime_format;
   BitRateBox (); // optional
   TextConfigBox (); // optional
}

aligned(8) class URIBox extends FullBox(‘uri ’, version = 0, 0) {
   string theURI;
}

aligned(8) class URIInitBox extends FullBox(‘uriI’, version = 0, 0) {
   unsigned int(8) uri_initialization_data[];
}

class URIMetaSampleEntry() extends MetaDataSampleEntry (‘urim’) {
   URIBox the_label;
   URIInitBox init; // optional
   BitRateBox (); // optional
}
12.3.3.3 Semantics
content_encoding - is a null-terminated string in UTF-8 characters, and provides a MIME type which identifies the content encoding of the timed metadata. It is defined in the same way as for an ItemInfoEntry in this specification. If not present (an empty string is supplied) the timed metadata is not encoded. An example for this field is ‘application/zip’. Note that no MIME types for BiM [ISO/IEC 23001-1] and TeM [ISO/IEC 15938-1] currently exist. Thus the experimental MIME types ‘application/x-BiM’ and ‘text/x-TeM’ shall be used to identify these encoding mechanisms.
namespace is a null-terminated field consisting of a space-separated list, in UTF-8 characters, of one or more XML namespaces to which the sample documents conform. When used for metadata, this is needed for identifying its type, e.g. gBSD or AQoS [MPEG-21-7] and for decoding using XML aware encoding mechanisms such as BiM.
schema_location is an optional null-terminated field consisting of a space-separated list, in UTF-8 characters, of zero or more URLs for XML schema(s) to which the sample document conforms. If there is one namespace and one schema, then this field shall be the URL of the one schema. If there is more than one namespace, then the syntax of this field shall adhere to that for xsi:schemaLocation attribute as defined by [XML]. When used for metadata, this is needed for decoding of the timed metadata by XML aware encoding mechanisms such as BiM.
mime_format - provides a MIME type, in null-terminated UTF-8 characters, which identifies the content format of the samples. Examples for this field include ‘text/html’ and ‘text/plain’.
text_config - provides the initial text of each document, in null-terminated UTF-8 characters, which is prepended before the contents of each sync sample.
theURI is a URI formatted according to the rules in 6.2.4;
uri_initialization_data is opaque data whose form is defined in the documentation of the URI form.
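As an informal illustration of the string fields described above, the following Python sketch reads the two null-terminated UTF-8 strings of a ‘mett’ sample entry. It is only a sketch under stated assumptions: the function names are hypothetical, and the input is assumed to begin immediately after the 8-byte box header, at the SampleEntry fields (six reserved bytes and a 16-bit data_reference_index).

```python
def read_cstring(buf, pos):
    """Read one null-terminated UTF-8 string; return (text, next position)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def parse_mett_fields(payload):
    """Parse the string fields of a 'mett' entry (hypothetical helper).

    `payload` is assumed to start right after the box size and type,
    i.e. at the SampleEntry reserved bytes.
    """
    pos = 8  # skip reserved[6] and the 16-bit data_reference_index
    content_encoding, pos = read_cstring(payload, pos)
    mime_format, pos = read_cstring(payload, pos)
    return {
        # An empty string means the samples are not encoded.
        "content_encoding": content_encoding or None,
        "mime_format": mime_format,
        # Optional BitRateBox / TextConfigBox, if any, start here.
        "child_boxes_offset": pos,
    }
```

The same two-string layout applies to the ‘stxt’ and ‘sbtt’ entries defined later in this clause, so a reader could reuse such a helper for them.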
12.4 Hint media
12.4.1 Media handler
Hint media uses the ‘hint’ handler type in the handler box of the media box, as defined in 8.4.3.
12.4.2 Hint media header
12.4.2.1 Hint Media Header Box
Box Types: ‘hmhd’
Container: Media Information Box (‘minf’)
Mandatory: Yes
Quantity: Exactly one specific media header shall be present
Hint tracks use the HintMediaHeaderBox in the media information box, as defined in 8.4.5. The hint media header contains general information, independent of the protocol, for hint tracks. (A PDU is a Protocol Data Unit.)
12.4.2.2 Syntax
aligned(8) class HintMediaHeaderBox extends FullBox(‘hmhd’, version = 0, 0) {
   unsigned int(16) maxPDUsize;
   unsigned int(16) avgPDUsize;
   unsigned int(32) maxbitrate;
   unsigned int(32) avgbitrate;
   unsigned int(32) reserved = 0;
}
12.4.2.3 Semantics
version is an integer that specifies the version of this box
maxPDUsize gives the size in bytes of the largest PDU in this (hint) stream
avgPDUsize gives the average size of a PDU over the entire presentation
maxbitrate gives the maximum rate in bits/second over any window of one second
avgbitrate gives the average rate in bits/second over the entire presentation
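The fixed layout of the hint media header lends itself to a direct unpack. The following Python sketch is illustrative only; the function name is hypothetical, and the input is assumed to begin at the FullBox version and flags (that is, after the box size and type).

```python
import struct


def parse_hmhd(payload):
    """Unpack a HintMediaHeaderBox body (hypothetical helper).

    Layout assumed: 8-bit version + 24-bit flags, then the five fields
    listed in the syntax above, all big-endian.
    """
    (version_flags, max_pdu, avg_pdu,
     max_rate, avg_rate, _reserved) = struct.unpack(">IHHIII", payload[:20])
    return {
        "version": version_flags >> 24,
        "flags": version_flags & 0xFFFFFF,
        "maxPDUsize": max_pdu,   # bytes
        "avgPDUsize": avg_pdu,   # bytes
        "maxbitrate": max_rate,  # bits/second over any one-second window
        "avgbitrate": avg_rate,  # bits/second over the presentation
    }
```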
12.4.3 Sample entry
12.4.3.1 Definition
Hint tracks use an entry format specific to their protocol, with an appropriate name.
For hint tracks, the sample description contains appropriate declarative data for the streaming protocol being used, and the format of the hint track. The definition of the sample description is specific to the protocol.
The ‘protocol’ and ‘codingname’ fields are registered identifiers that uniquely identify the streaming protocol or compression format decoder to be used. A given protocol or codingname may have optional or required extensions to the sample description (e.g. codec initialization parameters). All such extensions shall be within boxes; these boxes occur after the required fields. Unrecognized boxes shall be ignored.
12.4.3.2 Syntax
class HintSampleEntry() extends SampleEntry (protocol) {
   unsigned int(8) data [];
}
12.5 Text media
12.5.1 Media handler
The timed text media type indicates that the associated decoder will process only text data. Timed text media uses the ‘text’ handler type in the handler box of the media box, as defined in 8.4.3.
12.5.2 Media header
Timed text tracks use a null media header (‘nmhd’), as defined in subclause 8.4.5.2.
12.5.3 Sample entry
12.5.3.1 Definition
Timed text tracks use PlainTextSampleEntry.
12.5.3.2 Syntax
class PlainTextSampleEntry(codingname) extends SampleEntry (codingname) {
}

class SimpleTextSampleEntry(codingname) extends PlainTextSampleEntry (‘stxt’) {
   string content_encoding; // optional
   string mime_format;
   BitRateBox (); // optional
   TextConfigBox (); // optional
}
12.5.3.3 Semantics
content_encoding - is a null-terminated string in UTF-8 characters, and provides a MIME type which identifies the content encoding of the timed text. It is defined in the same way as for an ItemInfoEntry in this specification. If not present (an empty string is supplied) the timed text is not encoded. An example for this field is ‘application/zip’.
mime_format - provides a MIME type, in null-terminated UTF-8 characters, which identifies the content format of the samples. Examples for this field include ‘text/html’ and ‘text/plain’.
12.6 Subtitle media
12.6.1 Media handler
The subtitle media type indicates that the associated decoder will process text data and possibly images. Subtitle media uses the ‘subt’ handler type in the handler box of the media box, as defined in 8.4.3.
12.6.2 Subtitle media header
12.6.2.1 Definition
Subtitle tracks use the SubtitleMediaHeaderBox in the media information box, as defined in 8.4.5. The subtitle media header contains general presentation information, independent of the coding, for subtitle media. This header is used for all tracks containing subtitles.
12.6.2.2 Syntax
aligned(8) class SubtitleMediaHeaderBox extends FullBox (‘sthd’, version = 0, flags = 0){
}
12.6.2.3 Semantics
version - is an integer that specifies the version of this box.
flags - is a 24-bit integer with flags (currently all zero).
12.6.3 Sample entry
12.6.3.1 Definition
Subtitle tracks use SubtitleSampleEntry.
12.6.3.2 Syntax
class SubtitleSampleEntry(codingname) extends SampleEntry (codingname) {
}

class XMLSubtitleSampleEntry() extends SubtitleSampleEntry (‘stpp’) {
   string namespace;
   string schema_location; // optional
   string auxiliary_mime_types; // optional, required if auxiliary resources are present
   BitRateBox (); // optional
}

class TextSubtitleSampleEntry() extends SubtitleSampleEntry (‘sbtt’) {
   string content_encoding; // optional
   string mime_format;
   BitRateBox (); // optional
   TextConfigBox (); // optional
}
12.6.3.3 Semantics
content_encoding - is a null-terminated string in UTF-8 characters, and provides a MIME type which identifies the content encoding of the subtitles. It is defined in the same way as for an ItemInfoEntry in this specification. If not present (an empty string is supplied) the subtitle samples are not encoded. An example for this field is ‘application/zip’.
namespace is a null-terminated field consisting of a space-separated list, in UTF-8 characters, of one or more XML namespaces to which the sample documents conform. When used for metadata, this is needed for identifying its type, e.g. gBSD or AQoS [MPEG-21-7] and for decoding using XML aware encoding mechanisms such as BiM.
schema_location is an optional null-terminated field consisting of a space-separated list, in UTF-8 characters, of zero or more URLs for XML schema(s) to which the sample document conforms. If there is one namespace and one schema, then this field shall be the URL of the one schema. If there is more than one namespace, then the syntax of this field shall adhere to that for xsi:schemaLocation attribute as defined by [XML]. When used for metadata, this is needed for decoding of the timed metadata by XML aware encoding mechanisms such as BiM.
mime_format - provides a MIME type, in null-terminated UTF-8 characters, which identifies the content format of the samples. Examples for this field include ‘text/html’ and ‘text/plain’.
auxiliary_mime_types indicates the media type of all auxiliary resources, such as images and fonts, if present, stored as subtitle subsamples. If there is more than one mime_type, then this field shall be a space-separated list. This field is null-terminated in UTF-8 characters.
12.7 Font media
12.7.1 Media handler
Font media uses the ‘fdsm’ handler type in the handler box of the media box, as defined in 8.4.3.
12.7.2 Media header
Font tracks use a NullMediaHeader.
12.7.3 Sample entry
12.7.3.1 Definition
Font streams use a FontSampleEntry.
12.7.3.2 Syntax
class FontSampleEntry(codingname) extends SampleEntry (codingname){
   //other boxes from derived specifications
   BitRateBox (); // optional
}
12.8 Transformed media
Protected media is described in 8.12.
Incomplete media is described in 8.17.
Restricted media is described in 8.15.
Annex A (informative)
Overview and Introduction
A.1 Section Overview
This section provides an introduction to the file format that potentially assists readers in understanding the overall concepts underlying the file format. It forms an informative annex to this specification.
A.2 Core Concepts
In the file format, the overall presentation is called a movie. It is logically divided into tracks; each track represents a timed sequence of media (frames of video, for example). Within each track, each timed unit is called a sample; this might be a frame of video or audio. Samples are implicitly numbered in sequence. Note that a frame of audio may decompress into a sequence of audio samples (in the sense this word is used in audio); in general, this specification uses the word sample to mean a timed frame or unit of data. Each track has one or more sample descriptions; each sample in the track is tied to a description by reference. The description defines how the sample may be decoded (e.g. it identifies the compression algorithm used).
Unlike many other multi-media file formats, this format, with its ancestors, separates several concepts that are often linked. Understanding this separation is key to understanding the file format. In particular:
- The physical structure of the file is not tied to the physical structures of the media itself. For example, many file formats ‘frame’ the media data, putting headers or other data immediately before or after each frame of video; this file format does not do this.
- Neither the physical structure of the file, nor the layout of the media, is tied to the time ordering of the media. Frames of video need not be laid down in the file in time order (though they may be).
This means that there are file structures that describe the placement and timing of the media; these file structures permit, but do not require, time-ordered files.
All the data within a conforming file is encapsulated in boxes (called atoms in predecessors of this file format). There is no data outside the box structure. All the metadata, including that defining the placement and timing of the media, is contained in structured boxes. This specification defines the boxes. The media data (frames of video, for example) is referred to by this metadata. The media data may be in the same file (contained in one or more boxes), or can be in other files; the metadata permits referring to other files by means of URLs. The placement of the media data within these secondary files is entirely described by the metadata in the primary file. They need not be formatted to this specification, though they may be; it is possible that there are no boxes, for example, in these secondary media files.
Tracks can be of various kinds. Three are important here. Video tracks contain samples that are visual; audio tracks contain audio media. Hint tracks are rather different; they contain instructions for a streaming server in how to form packets for a streaming protocol, from the media tracks in a file. Hint tracks can be ignored when a file is read for local playback; they are only relevant to streaming.
A.3 Physical structure of the media
The boxes that define the layout of the media data are found in the sample table. These include the data reference, the sample size table, the sample to chunk table, and the chunk offset table. Between them, these tables allow each sample in a track to be both located, and its size to be known.
The data references permit locating media within secondary media files. This allows a composition to be built from a ‘library’ of media in separate files, without actually copying the media into a single file. This greatly facilitates editing, for example.
The tables are compacted to save space. In addition, it is expected that the interleave will not be sample by sample, but that several samples for a single track will occur together, then a set of samples for another track, and so on. These sets of contiguous samples for one track are called chunks. Each chunk has an offset into its containing file (from the beginning of the file). Within the chunk, the samples are contiguously stored. Therefore, if a chunk contains two samples, the position of the second may be found by adding the size of the first to the offset for the chunk. The chunk offset table provides the offsets; the sample to chunk table provides the mapping from sample number to chunk number.
Note that in between the chunks (but not within them) there may be ‘dead space’, un-referenced by the media data. Thus, during editing, if some media data is not needed, it can simply be left unreferenced; the data need not be copied to remove it. Likewise, if the media data is in a secondary file formatted to a ‘foreign’ file format, headers or other structures imposed by that foreign format can simply be skipped.
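The rule that samples are stored contiguously within a chunk can be expressed in a few lines. The Python sketch below is illustrative; the function and parameter names are hypothetical, and the sizes are assumed to have already been read from the sample size table.

```python
def sample_offset(chunk_offset, sizes_in_chunk, index_in_chunk):
    """File offset of a sample inside a chunk (hypothetical helper).

    chunk_offset    offset of the chunk from the beginning of the file
    sizes_in_chunk  sizes of the chunk's samples, in storage order
    index_in_chunk  0-based position of the wanted sample in the chunk
    """
    # Samples in a chunk are contiguous, so skip the sizes that precede it.
    return chunk_offset + sum(sizes_in_chunk[:index_in_chunk])
```

For a chunk at offset 4096 holding samples of 100, 250 and 80 bytes, the second sample would start at 4196 and the third at 4446.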
A.4 Temporal structure of the media
Timing in the file can be understood by means of a number of structures. The movie, and each track, has a timescale. This defines a time axis which has a number of ticks per second. By suitable choice of this number, exact timing can be achieved. Typically, this is the sampling rate of the audio, for an audio track. For video, a suitable scale should be chosen. For example, a media TimeScale of 30000 and media sample durations of 1001 exactly define NTSC video (often, but incorrectly, referred to as 29.97) and provide 19.9 hours of time in 32 bits.
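The arithmetic behind the NTSC example can be checked with exact fractions. This Python sketch is illustrative only; the variable names are hypothetical. Note one assumption: the 19.9-hour figure appears to correspond to a signed 32-bit tick range (2^31 ticks); an unsigned 32-bit range would give roughly twice that.

```python
from fractions import Fraction

timescale = 30000        # ticks per second
sample_duration = 1001   # ticks per video frame

# Exact frame rate: 30000/1001, which is only approximately 29.97.
frame_rate = Fraction(timescale, sample_duration)

# Time span representable before the tick counter overflows, in hours.
hours_signed_32 = Fraction(2**31, timescale) / 3600
hours_unsigned_32 = Fraction(2**32, timescale) / 3600

print(float(frame_rate))
print(round(float(hours_signed_32), 2), round(float(hours_unsigned_32), 2))
```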
The time structure of a track may be affected by an edit list. These provide two key capabilities: the movement (and possible re-use) of portions of the time-line of a track, in the overall movie, and also the insertion of ‘blank’ time, known as empty edits. Note in particular that if a track does not start at the beginning of a presentation, an initial empty edit is needed.
The overall duration of each track is defined in headers; this provides a useful summary of the track. Each sample has a defined duration. The exact presentation time (its time-stamp) of a sample is defined by summing the durations of the preceding samples.
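The summation described above can be sketched as follows. This is a minimal Python illustration with a hypothetical function name; durations are in ticks of the media timescale, and any composition offsets or edit lists are ignored.

```python
def sample_start_times(durations):
    """Start time of each sample, as the sum of all preceding durations."""
    times, elapsed = [], 0
    for duration in durations:
        times.append(elapsed)
        elapsed += duration
    return times
```

With three samples of duration 1001, the start times are 0, 1001 and 2002 ticks.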
A.5 Interleave
The temporal and physical structures of the file may be aligned. This means that the media data has its physical order within its container in time order, as used. In addition, if the media data for multiple tracks is contained in the same file, this media data would be interleaved. Typically, in order to simplify the reading of the media data for one track, and to keep the tables compact, this interleave is done at a suitable time interval (e.g. 1 second), rather than sample by sample. This keeps the number of chunks down, and thus the chunk offset table small.
A.6 Composition
If multiple audio tracks are contained in the same file, they are implicitly mixed for playback. This mixing is affected by the overall track volume, and the left/right balance.
Likewise, video tracks are composed, by following their layer number (from back to front), and their composition mode. In addition, each track may be transformed by means of a matrix, and also the overall movie transformed by matrix. This permits both simple operations (e.g. pixel doubling, correction of 90° rotation) as well as more complex operations (shearing, arbitrary rotation, for example).
Derived specifications may over-ride this default composition of audio and video with more powerful systems (e.g. MPEG-4 BIFS).
A.7 Random access
This section describes how to seek. Seeking is accomplished primarily by using the child boxes contained in the sample table box. If an edit list is present, it must also be consulted.
If you want to seek a given track to a time T, where T is in the time scale of the movie header box, you could perform the following operations:
1) If the track contains an edit list, determine which edit contains the time T by iterating over the edits. The start time of the edit in the movie time scale must then be subtracted from the time T to generate T', the duration into the edit in the movie time scale. T' is next converted to the time scale of the track's media to generate T''. Finally, the time in the media scale to use is calculated by adding the media start time of the edit to T''.
2) The time-to-sample box for a track indicates what times are associated with which sample for that track. Use this box to find the first sample prior to the given time.
3) The sample that was located in step 2 may not be a sync sample. The sync sample table indicates which samples are in fact random access points. Using this table, you can locate which is the first sync sample prior to the specified time. The absence of the sync sample table indicates that all samples are synchronization points, and makes this problem easy. Having consulted the sync sample table, you probably wish to seek to whichever resultant sample is closest to, but prior to, the sample found in step 2.
4) At this point you know the sample that will be used for random access. Use the sample-to-chunk table to determine in which chunk this sample is located.
5) Knowing which chunk contained the sample in question, use the chunk offset box to figure out where that chunk begins.
6) Starting from this offset, you can use the information contained in the sample-to-chunk box and the sample size box to figure out where within this chunk the sample in question is located. This is the desired information.
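The six steps above can be sketched in Python as follows. This is an informal illustration under stated assumptions, not a normative procedure: all function names and the in-memory table shapes are hypothetical. Edits are (segment_duration, media_time) pairs with a negative media_time marking an empty edit; the time-to-sample table is a list of (sample_count, sample_delta) runs; the sample-to-chunk table is a list of (first_chunk, samples_per_chunk) runs; sample and chunk numbers are 1-based.

```python
from bisect import bisect_right


def movie_to_media_time(t, edits, movie_ts, media_ts):
    """Step 1: map movie time t to media time through the edit list."""
    start = 0
    for duration, media_time in edits:
        if start <= t < start + duration:
            if media_time < 0:
                return None  # t falls inside an empty edit
            return media_time + (t - start) * media_ts // movie_ts
        start += duration
    return None  # t lies beyond the last edit


def sample_at(media_time, stts):
    """Step 2: number of the sample whose interval covers media_time."""
    number, start = 1, 0
    for count, delta in stts:
        span = count * delta
        if media_time < start + span:
            return number + (media_time - start) // delta
        number += count
        start += span
    return number - 1  # past the end: clamp to the last sample


def previous_sync(sample, sync_samples):
    """Step 3: last sync sample at or before `sample`."""
    if sync_samples is None:  # no sync table: every sample is a sync point
        return sample
    i = bisect_right(sync_samples, sample)
    if i == 0:
        raise ValueError("no sync sample at or before the target")
    return sync_samples[i - 1]


def locate(sample, stsc, chunk_offsets, sample_sizes):
    """Steps 4 to 6: (file offset, size) of a sample."""
    first_in_chunk = 1
    for i, (first_chunk, per_chunk) in enumerate(stsc):
        last = stsc[i + 1][0] if i + 1 < len(stsc) else len(chunk_offsets) + 1
        for chunk in range(first_chunk, last):
            if sample < first_in_chunk + per_chunk:
                skipped = sum(sample_sizes[first_in_chunk - 1:sample - 1])
                return chunk_offsets[chunk - 1] + skipped, sample_sizes[sample - 1]
            first_in_chunk += per_chunk
    raise ValueError("sample number beyond the sample-to-chunk table")
```

A reader would chain these calls: convert T, find the covering sample, step back to a sync sample, then locate its bytes. The sketch ignores movie fragments and composition offsets.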
A.8 Fragmented movie files
This section introduces a technique that may be used in ISO files, where the construction of a single Movie Box in a movie is burdensome. This can arise in at least the following cases:
- Recording. At the moment, if a recording application crashes, runs out of disk, or some other incident happens, after it has written a lot of media to disk but before it writes the Movie Box, the recorded data is unusable. This occurs because the file format insists that all metadata (the Movie Box) be written in one contiguous area of the file.
- Recording. On embedded devices, particularly still cameras, there is not the RAM to buffer a Movie Box for the size of the storage available, and re-computing it when the movie is closed is too slow. The same risk of crashing applies, as well.
- HTTP fast-start. If the movie is of reasonable size (in terms of the Movie Box, if not time), the Movie Box can take an uncomfortable period to download before fast-start happens.
The basic 'shape' of the movie is set in an initial Movie Box: the number of tracks, the available sample descriptions, width, height, composition, and so on. However the Movie Box does not contain the information for the full duration of the movie; in particular, it may have few or no samples in its tracks.
To this minimal or empty movie, extra samples are added, in structures called movie fragments.
The basic design philosophy is the same as in the Movie Box; data is not 'framed'. However, the design is such that it can be treated as a 'framing' design if that is needed. The structures map readily to the Movie Box, so a fragmented presentation can be rewritten as a single Movie Box.
The approach is that defaults are set for each sample, both globally (once per track) and within each fragment. Only those fragments that have non-default values need include those values. This makes the common case (regular, repeating structures) compact, without disabling the incremental building of movies that have variations.
The regular Movie Box sets up the structure of the movie. It may occur anywhere in the file, though it is best for readers if it precedes the fragments. (This is not a rule, as trivial changes to the Movie Box that force it to the end of the file would then be impossible.) This Movie Box:
- must represent a valid movie in its own right (though the tracks may have no samples at all);
- has a box in it to indicate that fragments should be found and used;
- is used to contain the complete edit list (if any).
Note that software that doesn't understand fragments will play just this initial movie. Software that does understand fragments and gets a non-fragmented movie won't scan for fragments as the fragment indication box won't be found.
Annex B (void)
Annex C (informative)
Guidelines on deriving from this specification
C.1 Introduction
This Annex provides informative text to explain how to derive a specific file format from the ISO Base Media File Format.
ISO/IEC 14496-12 | ISO/IEC 15444-12 ISO Base Media Format defines the basic structure of the file format. Media-specific and user-defined extensions can be provided in other specifications that are derived from the ISO Base Media File Format.
C.2 General Principles
C.2.1 General
A number of existing file formats use the ISO Base Media File Format, not least the MPEG-4 MP4 File Format (ISO/IEC 14496-14), and the Motion JPEG 2000 MJ2 File Format (ISO/IEC 15444-3). When considering a new specification derived from the ISO Base Media File format, all the existing specifications should be used both as examples and a source of definitions and technology. Check with the registration authority to find what might already exist, and what specifications exist.
In particular, if an existing specification already covers how a particular media type is stored in the file format (e.g. MPEG-4 video in MP4), that definition should be used and a new one should not be invented. In this way specifications which share technology will also share the definition of how that technology is represented.
Be as permissive as possible with respect to the presence of other information in the file; indicate that unrecognized boxes and media may be ignored (not “should be ignored”). This permits the creation of hybrid files, drawing from more than one specification, and the creation of multi-format players, capable of handling more than one specification.
When layering on this specification, it's worth observing that there are some characteristics that are intentionally ‘parameters’ to the lower (Part 12) specification, that need to be specified. Equally, there are some characteristics of the Part 12 file format specification that are internal and should rarely be discussed by other specifications. Of course, there are some characteristics in a grey area in between.
Derived specifications are ideally written solely in terms of the parameters of the Part 12 file format; what a sample is, what its timestamps mean, and so on. Mentioning specific existing boxes in a derived specification may often turn out to be an error, except in limited cases (e.g. adding a user-data box, or an extension box).
C.2.2 Base layer operations
It should be possible to perform some operations on a Part 12 file without knowing anything about any potential derived specifications. These operations might include the obvious reading tracks, finding the data and timing for samples, and their sample description and track type, and so on. This might be done, for example, by a file-format inspector or general library like the reference software.
Less obvious are a class of manipulations of the files:
a) re-interleaving the data; making the media data in time order, with the samples for various tracks grouped into chunks of a sensible size, with the chunks interleaved;
b) making files that use data references self-contained, by copying the data from external files into the new file;
c) removing free space atoms and compacting the atom structure;
d) removing data from ‘mdat’ atoms that appears to be un-referenced by tracks or meta-data atoms;
e) removing sample entries that have no associated samples;
f) removing sample groups that have no associated samples;
g) extracting some tracks and making a new file with just those (e.g. an audio track from an audio/video presentation);
h) inserting, or removing, movie fragments, or re-fragmenting a movie.
This list is not exhaustive, of course.
C.3 Boxes
You can add boxes to the file format, but be careful about how they interact with other boxes. In particular, if they ‘cross-link’ into existing boxes, you might not be able to mark such files as compliant with Part 12.
You must register all new boxes, except those using the ‘uuid’ type. Likewise, you should register codec (sample entry) names, brands, track reference types, handlers (media types), group types, and protection scheme types. It really is a bad idea to use one of these without registration, as collisions may occur, or someone else may register the same identifier with a different meaning.
You should not write a box using the ‘UUID escape’ (the reserved ISO UUID pattern 0xXXXXXXXX-0011-0010-8000-00AA00389B71, where the four-character code replaces the X’s) if a simple four-character-code can be used, and ideally you shouldn’t design to use a UUID box; it’s better to place your data in known ‘expansion points’ of the file format if at all possible, or register a new box type if really needed.
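To make the reserved pattern concrete, the following Python sketch builds the UUID that corresponds to a four-character code. It is purely illustrative (the text above advises against writing boxes this way); the function and constant names are hypothetical, and the code is assumed to be plain ASCII.

```python
import uuid

# Fixed tail of the reserved ISO pattern: -0011-0010-8000-00AA00389B71
ISO_UUID_TAIL = bytes.fromhex("00110010800000AA00389B71")


def iso_uuid_for(fourcc):
    """UUID whose first four bytes are the four-character code."""
    code = fourcc.encode("ascii")
    if len(code) != 4:
        raise ValueError("a four-character code must be exactly 4 bytes")
    return uuid.UUID(bytes=code + ISO_UUID_TAIL)
```

For example, the code ‘moov’ (bytes 6D 6F 6F 76) maps to 6d6f6f76-0011-0010-8000-00aa00389b71.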
Don’t forget that all data in ISO files must be, or be contained in, boxes. You can introduce a signature, but it must ‘look like’ a box.
Do not require that any existing or new boxes you define be in a particular position, if at all possible. For example, the existing JPEG 2000 specifications require a signature box and that it be first in the file. If another specification also defines a signature box and also requires that it be first, then a file conformant to both specifications cannot be constructed.
It must be possible to ‘walk’ the top-level of a file by finding box lengths. Don’t forget that ‘implied length’ is permitted at file level.
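Walking the top level by box lengths can be sketched as below. This Python illustration uses a hypothetical function name and assumes the whole file is held in memory; it handles the 32-bit size, the 64-bit size signalled by a size of 1, and the implied length signalled by a size of 0 (box extends to the end of the file).

```python
import struct


def walk_top_level(data):
    """Yield (type, offset, size) for each top-level box in `data`."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # 64-bit size follows the box type.
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # Implied length: the box runs to the end of the file.
            size = len(data) - pos
        if size < header or pos + size > len(data):
            raise ValueError(f"bad box size at offset {pos}")
        yield box_type.decode("latin-1"), pos, size
        pos += size
```

Because only sizes and types are read, such a walker does not need to understand the contents of any box it passes over.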
Unless absolutely unavoidable, boxes should contain either data (e.g. in fields), or other boxes, but not both. All boxes containing data should be a full box to allow later changes to syntax and semantics. Boxes containing other boxes are known as container boxes, and are normally a plain (non-full) box, since their semantics will never change if they are documented to contain only boxes.
C.4 Brand Identifiers
C.4.1 Introduction
This section covers the use of brand identifiers in the file-type box, including:
- Introduction of a new brand.
- Player’s behaviour depending on the brand.
- Setting of the brand on the creation of the ISO Base Media file.
Brands identify a specification and make a simple set of statements:
a) the file conforms to all requirements of the identified specification;
b) the file contains nothing contrary to the identified specification;
c) a reader implementing potentially that single specification may read, interpret, and possibly present the file, ignoring data it does not recognize.
Specifications should therefore say (if they need a brand) “the brand that identifies files conformant to this specification is XXXX”, and register the brand.
C.4.2 Usage of the Brand
In order to identify the specifications to which the file complies, brands are used as identifiers in the file format. These brands are set in the File Type Box.
For example, a brand might indicate: (1) the codecs that may be present in the file, (2) how the data of each codec is stored, (3) constraints and extensions that are applied to the file.
New brands may be registered if it is necessary to make a new specification that is not fully conformant to the existing standards. For example, 3GPP allows using AMR and H.263 in the file format. Since these codecs were not supported in any standards at that time, 3GPP specified the usage of the SampleEntry and template fields in the ISO Base Media Format as well as defining new boxes to which these codecs refer. As the file format becomes more widely used, it is expected that more brands will be needed.
Brands are not additive; they stand alone. You cannot say: “this brand indicates that support for Y is also required” because the ‘also’ has no referent.
Systems that re-write files should remove brands that they do not recognize, as they do not know whether the file still conforms to that brand’s requirements (e.g. re-interleaving a file may take it out of conformance with a specification that requires a certain style of interleaving).
Note that the major brand usually implies the file extension, which in turn implies the MIME type. But these are not rules. In addition, when serving under a MIME type do not forget that MIME types can take parameters, and the list of compatible brands would often be useful to the receiving system.
C.4.3 Introduction of a new brand
A new brand can be defined if conformance to a new specification must be indicated. This generally means that for the definition of a new brand at least one of the following conditions should be satisfied:
1. Use of a codec that is not supported in any existing brands.
2. Use of more than one codec in a combination that is not supported in any existing brands. In addition, the playback of the file is allowed only when decoding of all the media in the file is supported by the player.
3. Use of constraints and/or extensions (Boxes, template fields, etc.) that are user-specific.
However, the file format contains both a major_brand field and a compatible_brands array. These fields are owned by the file author and the Part 12 specification. Do not write a specification that talks about these fields, merely about brands and what they mean. In particular, do not claim the major_brand field (“files conformant to this specification must set the major_brand to XXXX”) as a file could never be conformant to two such specifications written that way, and you also block someone else from deriving a specification from yours. However, brands that are only permitted as compatible brands may be defined.
Brands can be used as a tracer, however. It’s perfectly legal to have a brand which has no requirements, and is placed in a file as an ‘I was there’ point (or strictly “this brand requires that the file was last written by ZZZZ”).
C.4.4 Player Guideline
If more than one brand is present in the list of the compatible_brands, and one or more brands are supported by the player, the player shall play those aspects of the file that comply with those specifications. In this case, the player may not be able to decode unsupported media.
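A player's brand check might be sketched as follows. This Python illustration uses hypothetical function names and assumes the input is the body of the File Type Box after its 8-byte header: a 4-byte major brand, a 32-bit minor version, then the compatible brands as consecutive 4-byte codes.

```python
import struct


def parse_ftyp(payload):
    """Return (major_brand, minor_version, compatible_brands)."""
    major, minor_version = struct.unpack_from(">4sI", payload, 0)
    brands = [payload[i:i + 4].decode("latin-1")
              for i in range(8, len(payload) - 3, 4)]
    return major.decode("latin-1"), minor_version, brands


def playable_brands(compatible, supported):
    """Brands in the file that this (hypothetical) player implements."""
    return [brand for brand in compatible if brand in supported]
```

If the returned list is empty, the player implements none of the specifications the file claims, and how it proceeds is outside the scope of this sketch.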
C.4.5 Authoring Guideline
If the author wants to create a file that complies with more than one specification, the following considerations apply:
1. There must be nothing contrary to the specification identified by a brand within the file. For example, if a specification requires that files be self-contained, then the brand indication of that specification must not be used on non-self-contained files.
2. If the author is satisfied that a player compliant with only one of the specifications plays only that media compliant with that specification, then that brand may be indicated.
3. If the author requires that the media from more than one specification be played, then a new brand would be needed as this represents a new conformance requirement for the player.
C.4.6 Example
This section gives an example case in which a new brand can be defined.
First, consider two currently existing brands. If the brand ‘3gp5’ is in the list of the compatible_brands, it indicates that the file contains the media defined in 3GPP TS 26.234 (Release 5) in the way specified by the standard. For example, the file of ‘3gp5’ brand may contain H.263. Likewise, if the brand ‘mp42’ is in the list of the compatible_brands, it indicates that the file contains the media defined in the ISO/IEC 14496-14 in the specified way. For example, the file of ‘mp42’ brand may contain MP3. However, MP3 is not supported in ‘3gp5’ brand.
Suppose that the file contains H.263 and MP3, and has ‘3gp5’ and ‘mp42’ as the compatible_brands. If the player complies only with ‘3gp5’ and does not support MP3, the recommended behaviour of the player is to play only H.263. If the content’s author does not expect such behaviour, a new brand is defined to indicate that both H.263 and MP3 are supported in the file. Specifying the newly defined brand in the list of the compatible_brands prevents the above behaviour, and the file is played only when the player supports both H.263 and MP3.
C.5 Storage of new media types
There are two choices in the definition of how a new media type should be stored.
First, if MPEG-4 systems constructs are desired or acceptable, then:
a) a new ObjectTypeIndication should be requested and used;
b) the decoder specific information for this codec should be defined as an MPEG-4 descriptor;
c) the access unit format should be defined for this media.
The media then uses the MPEG-4 code-points in the file format; for example, a new video codec would use a sample entry of type ‘mp4v’.
If the MPEG-4 systems layer is not suitable or otherwise not desired, then:
a) a new sample entry four-character code should be requested and used;
b) any additional information needed by the decoder should be defined as boxes to be stored within the sample entry;
c) the file-format sample format should be defined for this media.
Note that in the second case, the registration authority will also allocate an object type indication for use in MPEG-4 systems.
C.6 Use of Template fields
Template fields are defined in the file format. If any are used in a derived specification, the use must be compatible with the base definition, and that use must be explicitly documented.
C.7 Tracks
C.7.1 Data Location
A track is a timed sequence of samples; each sample is defined by its data (the bytes it contains), their length and location. The length and data of a sample are external parameters to the file format; the location of the bytes is not.
The exact way that the data is stored is internal to the Part 12 file format. When defining what a sample in your format is, you should define the length and the data of a sample.
You should not mention the following boxes, however, as the way that they are structured is open to change, and the information that they store may be stored in other ways (e.g. sample size information may be in an stsz box, an stz2 box, or a movie fragment):
sample size (stsz), compact sample size (stz2)
Samples are, in fact, stored in contiguous runs of samples for one track; these runs are called chunks, and it is chunks from different tracks that are interleaved. But files may be re-interleaved or re-chunked; the following boxes are about how chunking is done:
chunk offsets (stco or co64), sample-to-chunk (stsc)
Most critically, locating data in a Part 12 file must be done through these boxes (or their equivalent in movie fragments). The media data box (‘mdat’) is merely one possible location, and looked at by itself, it can only be considered an un-ordered bag of un-identifiable bits. There is no assurance that the desirable material in a media-data box is the only data in the box or in any particular order, and, especially if data references are used, there is no assurance that any particular sample is even in a media-data box at all. Mentioning the media-data (‘mdat’) box in a derived specification is almost certainly a mistake, and attempting to define (or assume) its structure is usurping the Part 12 specification, and is an error.
It is perfectly permissible to require a certain style, duration, or size of interleaving in an integration specification (“this specification requires that the file be self-contained, and that the media-data be in decoding time order, interleaved on a granularity of no greater than one second”).
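To illustrate why sample location belongs to the file format and not to a derived specification, the following sketch shows how a reader might resolve sample offsets from already-parsed table contents. The function name and the tuple layout are assumptions made for this example; the sketch ignores movie fragments and data references, and assumes the sample-to-chunk entries are sorted by first_chunk.

```python
def sample_offsets(chunk_offsets, stsc_entries, sample_sizes):
    """Return the absolute byte offset of every sample, in sample order.

    chunk_offsets: one offset per chunk (contents of 'stco' or 'co64')
    stsc_entries:  (first_chunk, samples_per_chunk) runs, first_chunk is 1-based
    sample_sizes:  one size per sample (contents of 'stsz' or 'stz2')
    """
    offsets = []
    sample = 0
    for chunk_index, chunk_offset in enumerate(chunk_offsets, start=1):
        # The run that applies is the last one whose first_chunk <= chunk_index.
        per_chunk = 0
        for first_chunk, samples_per_chunk in stsc_entries:
            if first_chunk <= chunk_index:
                per_chunk = samples_per_chunk
        position = chunk_offset
        for _ in range(per_chunk):
            if sample >= len(sample_sizes):
                break
            offsets.append(position)
            # Samples within one chunk are contiguous.
            position += sample_sizes[sample]
            sample += 1
    return offsets


# Two chunks: the first holds two samples, the second holds one.
print(sample_offsets([100, 500], [(1, 2), (2, 1)], [10, 20, 30]))
```

Re-chunking the file changes `chunk_offsets` and `stsc_entries` but not the samples themselves, which is why a derived specification should define only the sample data and length.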
C.7.2 Time
Similarly, samples are parameterized in time in the file format by their decoding timestamp, and optionally by their composition timestamp. You should define what these mean for your media. However, the way that these are stored is again internal to the Part 12 file format.
You should not mention the following boxes, however, as the way that they are structured is open to change, and the information that they store may be stored in other ways:
time-to-sample box (stts), composition offsets (ctts)
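As an illustration of the reader-side view, the sketch below expands run-length timing tables into per-sample timestamps. The function names and the tuple layouts are assumptions made for this example, and the inputs are taken to be already parsed; a derived specification needs only to say what the resulting timestamps mean for its media.

```python
def decode_times(stts_entries):
    """Expand (sample_count, sample_delta) runs into per-sample decoding times."""
    times = []
    t = 0
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            times.append(t)
            t += sample_delta
    return times


def composition_times(dts, ctts_entries):
    """Add the per-sample composition offset to each decoding time."""
    offsets = [offset for count, offset in ctts_entries for _ in range(count)]
    return [d + o for d, o in zip(dts, offsets)]


dts = decode_times([(3, 10), (1, 20)])
print(dts)
print(composition_times(dts, [(1, 10), (1, 30), (2, 0)]))
```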
Likewise, the time-structure effect of edits should be preserved by the file format, but a Part 12 file simplifier may, for example, merge two adjacent edits that in fact belong together (e.g. two empty edits, or an edit that selects time A-B followed by one that selects B-C).
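A minimal sketch of such a simplification follows. It is not a normative procedure: the function name and tuple layout are hypothetical, and it assumes that movie and media timescales are equal and that every edit has a media rate of 1, so that durations and media times can be added directly.

```python
def merge_edits(edits):
    """Merge adjacent edits that belong together.

    Each edit is (segment_duration, media_time); media_time == -1 marks an
    empty edit. Assumes equal timescales and a media rate of 1 (see lead-in).
    """
    merged = []
    for duration, media_time in edits:
        if merged:
            prev_duration, prev_time = merged[-1]
            both_empty = prev_time == -1 and media_time == -1
            # A-B followed by B-C: the second edit starts where the first ends.
            contiguous = (prev_time != -1 and media_time != -1
                          and prev_time + prev_duration == media_time)
            if both_empty or contiguous:
                merged[-1] = (prev_duration + duration, prev_time)
                continue
        merged.append((duration, media_time))
    return merged


print(merge_edits([(5, -1), (5, -1), (10, 0), (20, 10), (5, 100)]))
```

The presented timeline is identical before and after the merge, which is the property the file format is expected to preserve.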
C.7.3 Media Types
There are a number of media types in the Part 12 specification: video, audio, meta-data, and so on. These are represented by track handler types and by media-specific media headers. It is possible to register new media handlers, but this is rarely required. It might be needed, for example, if a track type were needed for, say, laboratory instrument traces, or for a ‘timed aroma’ track. The registration authority should also be checked; the needed handler might be already defined in another derived specification.
C.7.4 Coding Types
The name of a sample entry identifies the coding format used. This is one of the principal ways that the Part 12 specification is parameterized; AVC (MPEG-4 Part 10) uses ‘avc1’, for example, as a sample entry type. Defining this name for a codec, and registering it, and then defining what extra boxes are in a sample entry for this codec, are primary ways that the Part 12 format is used. You should define these for your coding system. Note that technically the coding type is ‘scoped’ by the media type (though we try not to define the same four-character-code as two different codecs in two media types, such as video and audio, in order to avoid confusion).
C.7.5 Sub-sample information
The Part 12 specification can carry information about ‘sub-sample’ boundaries for each sample. However, the definition of what a sub-sample is, is specific to a coding system. You might wish to define it when defining how a coding system is stored.
C.7.6 Sample Dependency
The Part 12 format allows you to identify some of the decoding dependency information for a coding system. In particular, you should identify what constitutes a valid ‘sync’ or random access point (points from which decoding may be started). They can be marked in the file format (in the sync sample table, or by flags in movie fragments). How sync samples are marked should be of less concern.
Similarly, it is possible to indicate which samples:
a) depend on others, or can be decoded independently;
b) are depended on by others, or can be discarded without affecting decoding;
c) contain multiple encodings of the same information, possibly with different dependencies (are redundantly coded).
For most coding systems the meanings of these are self-evident and do not need spelling out; however, they may need explicit statement for some coding systems.
C.7.7 Sample Groups
Sample groups provide another way to describe samples and their characteristics. To use sample groups, you can define a group type, and then how a group is defined (the group description). The file format can then map a given sample to a single definition of a group of any given type. Defining new grouping types and the way that they are parameterized is an important way to parameterize the file format.
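The mapping from a sample to a single group definition can be sketched as below. The function name and the run layout are assumptions for this example; the runs are taken to be the already-parsed entries of one sample-to-group table for one grouping type.

```python
def group_index_for_sample(sbgp_entries, sample_number):
    """Return the group description index for a 1-based sample number.

    sbgp_entries: (sample_count, group_description_index) runs, in sample
    order. An index of 0 means the sample belongs to no group of this type.
    """
    remaining = sample_number
    for sample_count, group_description_index in sbgp_entries:
        if remaining <= sample_count:
            return group_description_index
        remaining -= sample_count
    # Samples beyond the last run are not members of any group of this type.
    return 0


runs = [(2, 1), (3, 0), (1, 2)]
print([group_index_for_sample(runs, n) for n in range(1, 8)])
```

The returned index selects one entry in the group description for that grouping type; what that entry contains is what a derived specification defines.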
C.7.8 Track-level
Tracks can be associated with each other in the file format, in two important ways. Track references are a typed link indicating a reference or dependency of one track to or on another (e.g. a meta-data track that describes a media track has a dependency on that media track, as it makes no sense without it). New track reference types can be registered and used in derived specifications.
Similarly, tracks may be grouped into sets of alternatives, where the reader is expected to be able to pick one that suits it (e.g. on the basis of supported codecs, bit-rates, screen sizes, and so on). 3GPP 26.234 has taken this concept and included user-data (a permitted extension) to give a hint as to why a track is a member of a group (‘I contain a different codec’).
Lastly, tracks may be enabled or disabled in the file format. Disabled tracks might be used, for example, for optional features (e.g. closed captions).
C.7.9 Protection
Similarly to the parameterization of coding schemes by using the sample entry type, and extra boxes in the sample entry, the Part 12 format allows protection to be applied to tracks, parameterized by the scheme type and the contents of the scheme information box. The scheme information box is ‘owned’ by the scheme type, to the extent that contained boxes there do not need to be registered, as they are already scoped by the scheme type.
Protection can be subtle; many encryption systems, for example, ‘chain’ together. It's tempting to encrypt ‘the contents of the mdat box’, but that is very badly non-resilient to minor changes to the file. It's also tempting to protect chunks, as they do seem to represent contiguous runs of media data for one track. But again, re-chunking the file may break the ability to de-protect.
Instead, consider modifying the sample, or introducing time-parallel meta-data, or using sample groups, to introduce enough context to enable both file-based manipulation and decryption. Time-parallel meta-data would be in a track, and a track reference should be used to indicate that the protected data depends on the parallel encryption-context track.
C.8 Construction of fragmented movies
When constructing a fragmented file for playback, there are some recommendations for structuring the content which would optimize playback and random access. The recommendations are as follows:
The file should consist of boxes in the following order:
- 'ftyp'
- 'moov'
- pair of 'moof' and 'mdat' (arbitrary number)
- 'mfra'
A 'moof' box consists of at most one 'traf' for each media. When the file contains a single video track and a single audio track, the 'moof' will contain two 'traf', one for the video and one for the audio.
For video, random accessible samples are stored as the first sample of each 'traf'. In the case of gradual decoder refresh, a random accessible sample and the corresponding recovery point are stored in the same movie fragment. For audio, samples having the closest presentation time for every video random accessible sample are stored as the first sample of each 'traf'. Hence, the first samples of each media in the 'moof' have approximately equal presentation times.
First (random accessible) samples are recorded in the 'mfra' for both video and audio.
All samples in ‘mdat’ are interleaved with an appropriate interleave depth.
The offset and the initial presentation time of every 'moof' are given in the 'mfra' for both audio and video.
The player will load the 'moov' and 'mfra' initially, and hold them in memory during playback. When random access is needed, the player will search 'mfra' in order to find the random access point having the closest presentation time for the indicated time.
Since the first sample in the 'moof' is random accessible, the player can directly jump in on the random access point. The player can read the 'moof' of the random access point from the beginning. The subsequent 'mdat' starts from the random accessible sample. As such, a two-step seeking would not be necessary for random access.
Note that an ‘mfra’ box is optional, and might never occur in a given file.
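The random access search described above can be sketched as follows, assuming the random access entries for one track have already been parsed into (time, moof_offset) pairs sorted by time. The function name is hypothetical, and this sketch picks the latest entry at or before the requested time, which is one reasonable reading of "closest"; a player might choose differently.

```python
import bisect


def find_random_access_point(tfra_entries, target_time):
    """Return the (time, moof_offset) entry to seek to, or None if there is none.

    tfra_entries: (time, moof_offset) pairs sorted by time.
    """
    if not tfra_entries:
        # 'mfra' is optional; a player needs another strategy in that case.
        return None
    times = [time for time, _ in tfra_entries]
    # Index of the last entry whose time is <= target_time.
    index = bisect.bisect_right(times, target_time) - 1
    # Before the first entry, fall back to the first random access point.
    return tfra_entries[max(index, 0)]


entries = [(0, 1000), (2000, 50000), (4000, 120000)]
print(find_random_access_point(entries, 2500))
```

Because the returned offset is that of a 'moof' whose first sample is random accessible, the player can start reading there without a second seek.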
C.9 Meta-data
Much of what is said above about tracks and their data applies to meta-data items, except that, of course, meta-data items have no time structure. In particular, the division of items into extents, allowing them to be interleaved, is again a property of the file format. It would be a mistake to design some new support based on extent structure.
C.10 Registration
Register! If in doubt, contact the registration authority at http://www.mp4ra.org. Registration is free, and so is the advice and help you will get. Not registering means that your use may conflict with someone else's, and your use is also un-traceable and therefore effectively undocumented. The RA is aware of many brands (at least) being cheerfully invented and used, but not registered. These people are ‘flying dangerously’; don't join them.
C.11 Guidelines on the use of sample groups, timed metadata tracks, and sample auxiliary information
The ISO Base Media File Format contains three mechanisms for timed metadata that can be associated with particular samples: sample groups, timed metadata tracks, and sample auxiliary information. Derived specifications may provide similar functionality with one or more of these three mechanisms. This Clause provides guidelines for derived specifications to choose between the three mechanisms.
Sample groups and timed metadata are less tightly coupled to the media data and are typically ‘descriptive’, whereas sample auxiliary information might be required for decoding.
Sample auxiliary information is only intended for use where the information is directly related to the sample on a one-to-one basis, and is required for the media sample processing and presentation. For general content, the existing solution of additional tracks should be used. Sample auxiliary information and sample media data are both addressed using byte pointers and size information, and so when the same bytes form the data for more than one sample it may be possible to share that data by re-using the same byte pointer.
Sample groups may be useful in the following occasions.
- When several samples share the same metadata values, it is space-efficient to specify the metadata in a Sample Group Description box and the association of samples to metadata in Sample to Group box(es).
- As the sample group information is stored in Movie box and Movie Fragment box(es), they provide an index to the data in the Media Data boxes. No data from the Media Data boxes need to be fetched, which may therefore reduce disk accesses when compared to timed metadata tracks and sample auxiliary information.
Timed metadata tracks may be useful in the following occasions.
- The same timed metadata track may be associated to more than one track. In other words, a timed metadata track may be more independent of the content of the associated tracks than sample groups and sample auxiliary information.
- It may be easier to append a file with a timed metadata track than with sample auxiliary information or sample groups, because sample auxiliary information and Sample to Group boxes have to reside in the same Track Fragment box as the associated samples, whereas timed metadata may reside in its own Movie Fragment box(es). For example, it may be easier to provide an additional subtitle track as timed metadata than to use sample auxiliary information.
- The duration of timed metadata samples need not match the duration of associated media or hint samples. In cases where the duration of timed metadata samples spans over multiple associated media or hint samples, timed metadata tracks may be more space-efficient than sample auxiliary information.
Sample auxiliary information may be useful in the following occasions.
- The data associated with samples is changing sufficiently frequently that specifying sample groups may not be justified from a storage space point of view.
- The amount of data associated with samples is so large that its carriage within the Movie box or Movie Fragment box (as required by sample grouping) would cause disadvantages. For example, in progressive downloading, it may be beneficial to make the size of Movie box small in order to keep the initial buffering time small.
- When each sample is associated with metadata, sample auxiliary information provides a more straightforward association of the auxiliary information to samples when compared to the same functionality with timed metadata tracks, which typically requires resolving sample decoding time to establish the association between timed metadata samples and media/hint samples.
Annex D (informative)
Registration Authority
D.1 Code points to be registered
The code-points within the file format are all 32-bit fields, normally four printable characters (commonly known as four-character-codes or 4CCs). An object type identifier is an 8-bit integer.
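The relationship between a four-character code and its 32-bit field value can be sketched as below. The helper names are hypothetical; the sketch assumes the code is stored as four ASCII bytes in big-endian order, as box types are in the file format.

```python
import struct


def fourcc_to_int(code):
    """Pack a four-character code into its 32-bit big-endian integer value."""
    data = code.encode("ascii")
    if len(data) != 4:
        raise ValueError("a four-character code must be exactly 4 bytes")
    return struct.unpack(">I", data)[0]


def int_to_fourcc(value):
    """Unpack a 32-bit integer into its four-character code."""
    return struct.pack(">I", value).decode("ascii")


print(hex(fourcc_to_int("isom")))
print(int_to_fourcc(0x6D6F6F76))
```

Codes shorter than four visible characters are padded with a space, so a grouping type such as ‘rap ’ still occupies all four bytes.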
The code-points that may be registered are:
1) File format box identifiers. Note that in some specifications boxes were known as atoms. Note that the introduction of new atom types is discouraged; in general other extensibility features of the file format should be used if possible.
2) File format track type identifiers. A pair of identifiers is usually used here, to identify the track type (audio, video, etc.) and, if required, a media-specific header atom (video media header, etc.). It is expected that the need for new track types is rare, however; most media should fall into existing types (e.g. video codecs should use video tracks, hint protocols use hint tracks, and so on).
3) File format sample description and sample format identifiers (also known as codec names). This includes audio and video codecs, and also protocol identifiers for hint tracks. Any registration of a new sample format will automatically be issued an object-type identifier also (see below), thus making the identification of the carriage of this format within the MPEG-4 systems object descriptor framework possible.
4) File format track reference identifiers. Dependencies between tracks are typed in the file format (for example, hint tracks depend on the media tracks they hint, using a track dependency of type ‘hint’).
5) This specification includes a ‘file type’ atom which includes a list of ‘brands’ which identify which specifications the file is conformant to. Bodies defining standards based on the structural definition of this file format would normally use a new brand to identify files conformant to their specification. Any registration of a new brand must specify the precise specification which the brand identifies.
6) Within the MPEG-4 object descriptor framework, the object type value is used to identify the format of the streams. An object type identifier may be requested independently of the file format identifiers above.
7) Sample groups associate typed information with groups of samples. The grouping type may be registered.
8) Both media and metadata can be protected and the protection scheme used identified with a registered protection scheme type.
These code-points are referred to in the rest of this annex as registered identifiers, abbreviated as RIDs.
D.2 Procedure for the request of an MPEG-4 registered identifier value
Requesters of an MPEG-4 code-point value as detailed above, to identify a private data format, shall apply to the Registration Authority. Registration forms shall be available from the Registration Authority. The requester shall provide the information specified in D.4. Companies and organizations are eligible to apply.
D.3 Responsibilities of the Registration Authority
The primary responsibilities of the Registration Authority administrating the registration of the private data format identifiers are outlined in this annex; certain other responsibilities may be found in the JTC 1 Directives. The Registration Authority shall:
a) implement a registration procedure for application for a unique RID in accordance with the JTC 1 Directives;
b) receive and process the applications for allocation of an identifier from application providers;
c) ascertain which applications received are in accordance with this registration procedure, and inform the requester within 30 days of receipt of the application of their assigned RID;
d) inform application providers whose request is denied, in writing, within 30 days of receipt of the application, and consider resubmissions of the application in a timely manner;
e) maintain an accurate register of the allocated identifiers. Revisions to format specifications shall be accepted and maintained by the Registration Authority;
f) make the contents of this register available upon request to National Bodies of JTC 1 that are members of ISO or IEC, to liaison organizations of ISO or IEC and to any interested party;
g) maintain a data base of RID request forms, granted and denied. Parties seeking technical information on the format of private data which has a RID shall have access to such information which is part of the data base maintained by the Registration Authority;
h) report its activities annually to JTC 1, the ITTF, and the SC 29 Secretariat, or their respective designees; and
i) accommodate the use of existing RIDs whenever possible.
D.4 Contact information for the Registration Authority
Apple Computer Inc.
One Infinite Loop, M/S 301-4B
Cupertino, California 95014 USA
E-mail: [email protected]
http://www.mp4ra.org/
D.5 Responsibilities of Parties Requesting a RID
The party requesting a format identifier shall:
a) apply using the Form and procedures supplied by the Registration Authority;
b) include a description of the purpose of the registered identifier, and the required technical details as specified in the application form;
c) provide contact information describing how a complete description can be obtained on a non-discriminatory basis;
d) agree to institute the intended use of the granted RID within a reasonable time frame; and
e) maintain a permanent record of the application form and the notification received from the Registration Authority of a granted RID.
D.6 Appeal Procedure for Denied Applications
The Registration Management Group is formed to have jurisdiction over appeals to denied requests for a RID. The RMG shall have a membership who is nominated by P- and L-members of the ISO technical committee responsible for ISO/IEC 14496. It shall have a convenor and secretariat nominated from its members. The Registration Authority is entitled to nominate one non-voting observing member.
The responsibilities of the RMG shall be:
a) to review and act on all appeals within a reasonable time frame;
b) to inform, in writing, organizations which make an appeal for reconsideration of their petition of the RMG's disposition of the matter;
c) to review the annual report of the Registration Authority's summary of activities; and
d) to supply Member Bodies of ISO and National Committees of IEC with information concerning the scope of operation of the Registration Authority.
D.7 Registration Application Form
D.7.1 Contact Information of organization requesting a RID
Organization Name:
Address:
Telephone:
Fax:
E‐mail:
Telex:
D.7.2 Request for a specific RID
NOTE If the system has already been implemented and is in use, fill in this item and item D.7.3 and skip to D.7.5; otherwise leave this space blank and skip to D.7.3.
D.7.3 Short description of RID that is in use and date system was implemented
D.7.4 Statement of an intention to apply the assigned RID
D.7.5 Date of intended implementation of the RID
D.7.6 Authorized representative
Name:
Title:
Address:
Email:
Signature__________________________________
D.7.7 For official use of the Registration Authority
Attachment 1: Attachment of technical details of the registered data format.
Attachment 2: Attachment of notification of appeal procedure for rejected applications.
Registration Rejected _____
Reason for rejection of the application:
Registration Granted _____ Registration Value ____________________
Annex E (normative)
File format brands
E.1 Introduction
The presence of a brand in the compatible_brands list of the ftyp box is a claim and a permission. It is a claim that the file conforms to all the requirements of that brand, and a permission to a reader implementing potentially only that brand to read the file.
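As an illustration, a reader might extract the brand list from a file type box as sketched below. The function name is hypothetical, and the sketch assumes the box is passed as a complete byte string and uses the compact 32-bit size form (it does not handle the 64-bit size or the "extends to end of file" size).

```python
import struct


def parse_ftyp(box):
    """Return (major_brand, minor_version, compatible_brands) from an 'ftyp' box."""
    size, box_type = struct.unpack(">I4s", box[:8])
    if box_type != b"ftyp":
        raise ValueError("not a file type box")
    major_brand, minor_version = struct.unpack(">4sI", box[8:16])
    # The rest of the box is a list of four-character brand codes.
    compatible = [box[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return major_brand.decode("ascii"), minor_version, compatible


# Build a small example box: major brand 'mp42', compatible with 'isom' and 'mp42'.
example = struct.pack(">I4s4sI", 24, b"ftyp", b"mp42", 0) + b"isom" + b"mp42"
print(parse_ftyp(example))
```

A reader that implements any one of the returned compatible brands has permission to read the file; the major brand alone does not decide this.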
In general, readers are required to implement all features documented for a brand unless one of the following applies:
a) the media they are using does not use or require a feature: for example, I-frame video does not need a sync sample table, and if composition re-ordering is not used, then no composition time offset table is needed; similarly, if content protection is not needed, then support for the structures of content protection is not required;
b) another specification with which the file is conformant forbids the use of a feature (for example, some derived specifications explicitly forbid use of movie fragments);
c) the context in which the product operates means that some structures are not relevant; for example, hint track structures are only relevant to products preparing content for, or performing, file delivery (such as streaming) for the protocol in the hint track.
The following sections list the brands defined in this specification; no inheritance is implied by the section order; when inheritance occurs, it is specifically stated. Other brands may be defined in other specifications. Note that if one brand is a subset of another (e.g., ‘isom’ requirements are a subset of the ‘iso2’ requirements) then:
a) files labelled as compatible with the subset can always be labelled as also compatible with the superset; a file compatible with ‘isom’ can always be labelled as compatible with ‘iso2’;
b) products supporting the superset automatically can support the subset; a product that supports ‘iso2’ also necessarily supports ‘isom’.
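The stated inheritance between the brands of this annex can be sketched as a simple lookup, from which the set of brands a product necessarily supports follows. The table below is transcribed from the "requires support for" statements in E.3 to E.12; the names `REQUIRES` and `implied_brands` are hypothetical, and ‘mp71’ is omitted because no inheritance is stated for it.

```python
# brand -> the brand whose features it is stated to require
REQUIRES = {
    "avc1": "isom",
    "iso2": "avc1",
    "iso3": "iso2",
    "iso4": "iso3",
    "iso5": "iso4",
    "iso6": "iso5",
    "iso7": "iso6",
    "iso8": "iso7",
    "iso9": "iso8",
}


def implied_brands(brand):
    """Return the brand followed by every brand it requires, transitively."""
    chain = [brand]
    while chain[-1] in REQUIRES:
        chain.append(REQUIRES[chain[-1]])
    return chain


print(implied_brands("iso4"))
```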
No brands defined here require support for any particular media type (e.g., video, audio, meta-data) or media encoding (e.g., a particular codec), or structures supporting a specific media type (e.g., Visual Sample Entries or the boxes contained in a specific kind of sample entry).
More specific identifiers can be used to identify precise versions of specifications providing more detail. These brands should not be used as the major brand; this base file format should be derived into another specification to be used. There is therefore no defined normal file extension, or mime type assigned to these brands, nor definition of the minor version when one of these brands is the major brand.
E.2 The ‘isom’ brand
The type ‘isom’ (ISO Base Media file) is defined in this section of this specification, as identifying files that conform to the first version of ISO Base Media File Format.
Support for the following structural boxes is required:
moov    container for all the meta-data
mvhd    movie header, overall declarations
trak    container for an individual track or stream
tkhd    track header, overall information about the track
tref    track reference container
edts    edit list container
elst    an edit list
mdia    container for the media information in a track
mdhd    media header, overall information about the media
hdlr    handler, at this level, the media (handler) type
minf    media information container
vmhd    video media header, overall information (video track only)
smhd    sound media header, overall information (sound track only)
hmhd    hint media header, overall information (hint track only)
<mpeg>  mpeg stream headers
dinf    data information atom, container
dref    data reference atom, declares source(s) of media in track
stbl    sample table atom, container for the time/space map
stts    (decoding) time-to-sample
ctts    composition time-to-sample table
stss    sync (key, I-frame) sample map
stsd    sample descriptions (codec types, initialization etc.)
stsz    sample sizes (framing)
stsc    sample-to-chunk, partial data-offset information
stco    chunk offset, partial data-offset information
co64    64-bit chunk offset
stsh    shadow sync
stdp    degradation priority
mdat    Media data container
free    free space
skip    free space
udta    user-data, copyright etc.
ftyp    file type and compatibility
stz2    compact sample sizes (framing)
padb    sample padding bits
mvex    movie extends box
mehd    movie extends header box
trex    track extends defaults
moof    movie fragment
mfhd    movie fragment header
traf    track fragment
tfhd    track fragment header
trun    track fragment run
mfra    movie fragment random access
tfra    track fragment random access
mfro    movie fragment random access offset
Hint tracks must be recognized, and in hint tracks, RTP protocol hint tracks.
Note that some requirements of the Track Header Box do not apply to this brand; see sub-clause 8.3.2.1.
Support for only version 0 of the ‘ctts’ box is required here; version 1 support is not required.
Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.
NOTE The default-base-is-moof flag (8.8.7.1) cannot be set where a file is marked with this brand.
E.3 The ‘avc1’ brand
The brand ‘avc1’ shall be used to indicate that the file is conformant with the ‘AVC Extensions’ in subclauses 8.6.4 and 8.9. If used without other brands, this implies that support for those extensions is required. The use of ‘avc1’ as a major-brand may be permitted by specifications; in that case, that specification defines the file extension and required behaviour.
The ‘avc1’ brand requires support for the ‘isom’ brand. In addition, support of the following boxes is required:
sdtp    independent and disposable samples
sbgp    sample-to-group
sgpd    sample group description
Within the sample groups, support for roll groups (grouping type ‘roll’) is required.
NOTE The default-base-is-moof flag (8.8.7.1) cannot be set where a file is marked with this brand.
Note that some requirements of the Track Header Box do not apply to this brand; see sub-clause 8.3.2.1.
Support for only version 0 of the ‘ctts’ box is required here; version 1 support is not required.
Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.
Support of Sample Group Description boxes in movie fragments is not required.
E.4 The ‘iso2’ brand
The brand ‘iso2’ shall be used to indicate compatibility with the second version of the ISO Base Media File Format; it may be used in addition to or instead of the ‘isom’ brand and the same usage rules apply. If used without the brand 'isom' identifying the first version of this specification, it indicates that support for some or all of the technology in subclauses 8.6.4, 8.8.15, 8.11.1 through 8.11.7, 8.11.10, 0, or the SRTP support in subclause 9.1, is required.
The ‘iso2’ brand requires support for all features of the ‘avc1’ brand.
In addition, support for the following boxes is required:
pdin    progressive download information
subs    sub-sample information
meta    metadata
iloc    item location
ipro    item protection
sinf    protection scheme information box
frma    original format box
schm    scheme type box
schi    scheme information box
iinf    item information (version field set to 0)
xml     XML container
bxml    binary XML container
pitm    primary item reference
In the context of RTP hint tracks, SRTP hint tracks must now be recognized. Content protection and generalized meta-data boxes support is required.
Only support for version 0 of the item information box, and version 0 of the item location box, is required.
Note that some requirements of the Track Header Box do not apply to this brand; see sub-clause 8.3.2.1.
Support for only version 0 of the ‘ctts’ box is required here; version 1 support is not required.
Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.
Support for Sample Group Description boxes in movie fragments is not required.
NOTE The default-base-is-moof flag (8.8.7.1) cannot be set where a file is marked with this brand.
Support for only 16-bit item_ID and item_count values in ‘meta’ box is required here; support for 32-bit item_ID and item_count values in ‘meta’ box is not required.
Support for ‘meta’ box in movie fragments is not required.
Support for only one ‘subs’ box per track is required here.
E.5 The ‘mp71’ brand
If a Meta-box with an MPEG-7 handler type is used at the file level, then the brand ‘mp71’ should be a member of the compatible-brands list in the file-type box.
E.6 The ‘iso3’ brand
The brand ‘iso3’ requires support for all features of the ‘iso2’ brand.
In addition, support for the following is required:
fiin    file delivery item information
paen    partition entry
fpar    file partition
fecr    FEC reservoir
segr    file delivery session group
gitn    group id to name
meco    additional metadata container
mere    metabox relation
Support for version 0 and version 1 of the item information box is required. Within the sample groups, support for rate share information (grouping type ‘rash’) is required. File delivery hint tracks (sample entry ‘fdp ’) must be recognized.
Support for only version 0 of the ‘ctts’ box is required here; version 1 support is not required.
Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.
Support for Sample Group Description boxes in movie fragments is not required.
Only support for version 0 of the item location box is required.
NOTE The default-base-is-moof flag (8.8.7.1) cannot be set where a file is marked with this brand.
Support for only 16-bit item_ID and item_count values in ‘meta’ box is required here; support for 32-bit item_ID and item_count values in ‘meta’ box is not required.
Support for ‘meta’ box in movie fragments is not required.
Support for only one ‘subs’ box per track is required here.
E.7 The ‘iso4’ brand
The brand ‘iso4’ requires support for all features of the ‘iso3’ brand.
Support for version 1 of the composition offset (‘ctts’ and ‘iloc’) boxes is required under this brand.
Support for version 1 of the item location box, version 2 of the item info box, and the new item data (‘idat’) and item reference (‘iref’) boxes is required.
In addition, support for the following is required:
trgr    track grouping indication
cslg    composition to decode timeline mapping
idat    item data
iref    item reference
Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.
Support for Sample Group Description boxes in movie fragments is not required.
NOTE The default-base-is-moof flag (8.8.7.1) cannot be set where a file is marked with this brand.
Support for only 16-bit item_ID and item_count values in ‘meta’ box is required here; support for 32-bit item_ID and item_count values in ‘meta’ box is not required.
Support for ‘meta’ box in movie fragments is not required.
Support for only one ‘subs’ box per track is required here.
Support for only 32-bit values in ‘cslg’ box is required here; support for 64-bit values in ‘cslg’ box is not required.
E.8 The ‘iso5’ brand
The brand ‘iso5’ requires support for all features of the ‘iso4’ brand.
Support for the default-base-is-moof flag is required under this brand.
Processing of restricted sample entries (i.e. ‘resv’) is required under this brand.
Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.
Support for Sample Group Description boxes in movie fragments is not required.
Support for only 16-bit item_ID and item_count values in ‘meta’ box is required here; support for 32-bit item_ID and item_count values in ‘meta’ box is not required.
Support for ‘meta’ box in movie fragments is not required.
Support for only one ‘subs’ box per track is required here.
Support for only 32-bit values in ‘cslg’ box is required here; support for 64-bit values in ‘cslg’ box is not required.
E.9 The ‘iso6’ brand
The brand ‘iso6’ requires support for all features of the ‘iso5’ brand.
Support for the following boxes is required under this brand:
saiz    sample auxiliary information sizes
saio    sample auxiliary information offsets
tfdt    track fragment decode time
styp    segment type
sidx    segment index
ssix    subsegment index
prft    producer reference time
Support for the following is required under this brand:
- Sample Group Description boxes in movie fragments;
- Signed composition offsets in track run boxes (i.e. version 1 of track run boxes);
- Within the sample groups, support for random access point information (grouping type ‘rap ’) is required.
Support for only 16-bit item_ID and item_count values in ‘meta’ box is required here; support for 32-bit item_ID and item_count values in ‘meta’ box is not required.
Support for ‘meta’ box in movie fragments is not required.
Support for only one ‘subs’ box per track is required here.
Support for only 32-bit values in ‘cslg’ box is required here; support for 64-bit values in ‘cslg’ box is not required.
E.10 The ‘iso7’ brand
The brand ‘iso7’ requires support for all features of the ‘iso6’ brand.
Support for the following boxes is required under this brand:
- trep track extension properties
- assp alternative startup sequence properties
Support for the following is required under this brand:
- Support for 32-bit item_ID and item_count values in the ‘meta’ box
- Recognizing incomplete tracks.
Support for the ‘meta’ box in movie fragments is not required.
Support for only one ‘subs’ box per track is required here.
Support for only 32-bit values in the ‘cslg’ box is required here; support for 64-bit values in the ‘cslg’ box is not required.
E.11 The ‘iso8’ brand
The brand ‘iso8’ requires support for all features of the ‘iso7’ brand.
Support for the following boxes is required under this brand:
- sthd subtitle media header, overall information (subtitle track only)
Support for the following is required under this brand:
- Support for the ‘meta’ box in movie fragments
- Support for one or more ‘subs’ boxes per track
Support for only 32-bit values in the ‘cslg’ box is required here; support for 64-bit values in the ‘cslg’ box is not required.
E.12 The ‘iso9’ brand
The brand ‘iso9’ requires support for all features of the ‘iso8’ brand.
Support for the following boxes is required under this brand:
- elng extended language tag
Support for the following is required under this brand:
- Support for 64-bit values in the ‘cslg’ box;
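Because each brand in E.8 to E.12 requires all features of the brand before it, the set of brands whose requirements apply can be found by walking the chain. The following is a minimal sketch; the table and function names are illustrative and not part of this specification, and the chain stops at ‘iso4’ only because that is the earliest brand named in these clauses.

```python
# Brand -> brand whose features it additionally requires (from E.8 to E.12).
BRAND_PARENT = {
    "iso5": "iso4",
    "iso6": "iso5",
    "iso7": "iso6",
    "iso8": "iso7",
    "iso9": "iso8",
}


def implied_brands(brand):
    """Return the brand followed by every brand whose features it requires."""
    chain = [brand]
    while chain[-1] in BRAND_PARENT:
        chain.append(BRAND_PARENT[chain[-1]])
    return chain
```

A reader that supports ‘iso7’, for example, would be expected to satisfy the requirements listed for every brand returned by `implied_brands("iso7")`.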
Annex F (void)
Annex G (informative)
URI-labelled metadata forms
G.1 UUID-labelled metadata
The format of the URI for UUID-labelled metadata is defined in IETF RFC 4122: A Universally Unique IDentifier (UUID) URN Namespace (July 2005).
There are no general statements about the form of the primary metadata, the initialization data for temporal metadata, or the temporal metadata itself. The form of all of these depends on the precise UUID and its definition.
Note that UUIDs cannot easily be traced to their point of origin, and so they may be unsuitable if it is desired that recipients of metadata be able to find, if needed, the associated documentation.
If traceability is needed, then a standardized metadata framework, such as MPEG-7, or a registered framework, such as SMPTE, or a de-referencable URL should be used.
G.2 ISO OID-labelled metadata
The format of the URI for OID-labelled metadata is defined in RFC 3061: A URN Namespace of Object Identifiers (February 2001).
There are no general statements about the form of the primary metadata, the initialization data for temporal metadata, or the temporal metadata itself. The form of all of these depends on the precise object identifier and its definition.
A number of more specific labelling systems can also be expressed as object identifiers. The more specific UUID form should be used.
Object identifiers starting {joint-iso-itu(2) uuid(25)} (i.e. starting urn:oid:2.25) should not be used; UUID URIs should be used directly.
Object identifiers starting {iso(1) identified-organizations(3) SMPTE(52) metadata-dictionary(1)} (i.e. urn:oid:1.3.52.1) should not be used, nor should any other OID being used as a label according to SMPTE 298M or 336M; the more specific SMPTE URI form should be used.
Object Identifiers are registered to specific organizations, and so it may be possible to identify the organization owning a particular identifier. However, some sections of the object identifier tree are delegated to unregistered uses (such as UUIDs, as noted above), and traceability is then lost.
If traceability is needed, then a standardized metadata framework, such as MPEG-7, or a registered framework, such as SMPTE, or a de-referencable URL should be used.
G.3 SMPTE-labelled metadata
The format of the URI for SMPTE-labelled metadata is defined in RFC 5119: A Uniform Resource Name (URN) Namespace for the Society of Motion Picture and Television Engineers (SMPTE).
The primary metadata is exactly the value (V) part of a KLV (key, length, value) triplet as defined in SMPTE 336M, with the key being the label given in the URN, and the length (L) being derived from the item length.
Similarly, each temporal metadata sample is the value (V) part of a KLV, where the key is the URN label given in the matching sample entry, and the length (L) is derived from the sample size (as given in the sample size or compact sample size tables).
The initialization data may be present. It contains the key (K) and value (V) of a KLV that provides an initialization context for the KLVs formed from the samples, with the length (L) being derived from the Data Box size. The first 16 bytes are a SMPTE label of the initialization data, stored as defined in SMPTE 336M, followed by the data.
The typical value of these bytes, as defined in SMPTE 377M, is ‘primer pack’ (in hexadecimal): 060E2B34 02050101 0D010201 01050100. If the label of the initialization data does not, in fact, identify a structure giving context information (such as a primer pack), the behaviour is undefined. This enables each sample to be a local set. The rules for the construction of local sets, as defined in SMPTE 377M, must be followed.
SMPTE 377M uses locators to locate other resources outside the metadata itself. For static metadata, these should use the item location box in the meta-box. For temporal metadata, external pointers may be used directly.
The initialization data may be absent, and the label then identifies a specific metadata item (e.g. a geographic locator) not needing a context.
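As an illustration of how a reader might rebuild a complete KLV triplet from a stored value, the sketch below prepends the 16-byte key and a length derived from the value size. It assumes the BER length encoding used with SMPTE 336M (short form below 128 bytes, long form otherwise); the function names are hypothetical and the code is not part of this specification.

```python
# 'primer pack' label quoted in G.3, as 16 bytes.
PRIMER_PACK_KEY = bytes.fromhex("060E2B34 02050101 0D010201 01050100")


def ber_length(n):
    """Encode a length as BER: short form below 0x80, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def klv_from_value(key, value):
    """Rebuild K + L + V; the length is derived from the stored value size."""
    if len(key) != 16:
        raise ValueError("a SMPTE label is expected to be 16 bytes")
    return key + ber_length(len(value)) + value


def split_initialization_data(data):
    """Split initialization data into its 16-byte label (K) and the data (V)."""
    if len(data) < 16:
        raise ValueError("initialization data shorter than a SMPTE label")
    return data[:16], data[16:]
```

The same `klv_from_value` helper would apply to a primary metadata item and to a temporal metadata sample, with the key taken from the URN label in each case.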
Annex H (informative)
Processing of RTP streams and reception hint tracks
H.1 Introduction
H.1.1 Overview
This Annex provides recommendations for recording of RTP streams and the use of recorded RTP streams for playback and re-sending.
H.1.2 Structure
This Annex is organized as follows:
- H.2 introduces the potential reasons why the playback of RTP streams might become unsynchronized and provides an overview of how proper synchronization is facilitated in recording and playback. It precedes the other Clauses, because both the recording unit and the player have to take actions to achieve proper synchronization.
- H.3 provides recommendations for storing RTP streams.
- H.4 provides recommendations on how to play files containing recorded RTP streams.
- H.5 provides recommendations for re-sending received RTP streams stored in files as described in H.3.
H.1.3 Terms and definitions
For the purposes of this annex, the following terms and definitions apply.
H.1.3.1 player
entity that parses a file, decodes at least a subset of the tracks in the file, and renders the decoded tracks
H.1.3.2 recording unit
entity that receives one or more packet streams of encapsulated and compressed media and stores the received media into a file
H.1.3.3 re-sending unit
entity that parses a file containing media that originates from one or more received packet streams of encapsulated and compressed media and transmits at least a subset of the media stored in the file
H.2 Synchronization of RTP streams
There are several potential sources of unsynchronized playback for received RTP streams. When RTP streams are recorded as RTP reception hint tracks, the necessary information for guaranteeing synchronized playback is also recorded. When RTP streams are recorded as media tracks, the synchronization of the playback of the media tracks has to be guaranteed by creating the composition times of the media samples appropriately. The following list describes the sources of unsynchronized playback for received RTP streams, summarizes the recommended synchronization means, and points to the relevant Clauses for further information.
1. The RTP timestamp of the first packet of the stream has a random offset. Hence, the RTP timestamps of two streams are shifted by the difference of their initial random offsets even if the potentially different clock rate of the RTP timestamps of the different streams were compensated. The random offset should be reflected in the value of the offset field of the 'tsro' box of the referred reception hint sample entry as described in H.3.5.
2. The first received and recorded packet of the different streams may not have an identical playback time as discussed in H.3.2. The unequal start time of the different recorded streams is compensated by parsing one or more RTCP Sender Reports to derive the playback time as the wallclock time of the sender and creating an initial offset of the playback using the Edit List box as described in H.3.2. The Edit List box is interpreted by the player as described in 0.
3. There is no guarantee that the clock for producing the RTP timestamps of a certain RTP stream runs at the same pace as the wallclock time of the sender, which is used to create the RTCP Sender Reports. For example, the RTP timestamps may be generated on the basis of a constant sampling frequency, e.g. 44.1 kHz for audio, and hence governed by the clock rate of the audio capturing hardware. However, the RTCP Sender Reports may be generated according to the system clock running at a different pace than the clock of the audio capturing hardware. Moreover, the clock used to generate RTP timestamps for audio might run at a different pace than the clock used to generate RTP timestamps for video (when both are normalized to the same clock tick frequency).
A similar problem in the player arises if the clock pacing the output of a decoded stream runs at a different pace than the wallclock of the player or the clocks pacing the rendering of different decoded streams are not synchronized.
The recommended approach for all these potential problems of clocks running at a different pace is to use RTCP Sender Reports to align the RTP timestamps of different streams onto the same wallclock timeline, which is used for inter-stream synchronization. This alignment can be done while recording the streams by modifying the representation of the recorded RTP timestamps or while playing the recorded streams by using the recorded RTCP Sender Reports as described in H.3.6. Moreover, it is recommended to pace the playback according to the audio playout rate as described in 0.
4. The wallclock of the sender may run at a different pace than the wallclock of the player.
It is recommended to play a recorded program at the pace of the wallclock of the player and to use the audio playout clock as the wallclock of the player. Consequently, the audio timescale does not typically have to be modified. Even if the wallclock of the player ran at a different pace than the wallclock of the sender, it is typically unnoticeable.
Pacing of the output of decoded media samples is described in 0.
H.3 Recording of RTP streams
H.3.1 Introduction
Recording of RTP streams can result in three basic file structures.
1. A file containing only RTP reception hint tracks. No media tracks are included. This file structure enables efficient processing of packet losses, but only players capable of parsing RTP reception hint tracks can play the file.
2. A file containing only media tracks. No RTP reception hint tracks are included. This file structure allows existing players compliant with the earlier versions of the ISO base media file format to process recorded files as long as the media formats are also supported. However, sophisticated processing of transmission errors is not possible due to reasons explained in subsequent clauses.
3. A file containing both RTP reception hint tracks and media tracks. This file structure has both the benefits mentioned above and should be used when the best possible interoperability with other file formats derived from the ISO base media file format is desired.
If an RTP stream being recorded is protected, a protected RTP reception hint track is used instead of an RTP reception hint track, while the operation of the recording unit remains unchanged otherwise. At the time of playback, the data included in the protected RTP reception hint track is unprotected first and then processed similarly to a conventional unprotected RTP stream. Alternatively, the RTP stream may be unprotected before storing it as an RTP reception hint track, but then care has to be taken that the rights to use the content in the protected RTP stream are obeyed.
Some of the recording operations are common for all the three file structures, while others differ. Table H.1 indicates which recording operations are required for the basic file structures.
Table H.1

| Recording operation | File containing only RTP reception hint tracks | File containing only media tracks | File containing both RTP reception hint tracks and media tracks |
|---|---|---|---|
| Compensation for unequal starting position of received RTP streams (H.3.2) | no, when RTCP reception hint tracks are stored; yes, otherwise | yes | no, when RTCP reception hint tracks are stored; yes, otherwise |
| Recording of SDP (H.3.3) | yes | no | yes, for RTP reception hint tracks only |
| Creation of a sample within an RTP reception hint track (H.3.4) | yes | no | yes, for RTP reception hint tracks only |
| Representation of RTP timestamps (H.3.5) | yes | no | yes, for RTP reception hint tracks only |
| Recording operations to facilitate inter-stream synchronization in playback (H.3.6) | yes | yes, the composition times of media tracks should be compensated as described in H.3.6.3 | yes |
| Representation of reception times (H.3.7) | yes | no | yes, for RTP reception hint tracks only |
| Creation of media samples (H.3.8) | no | yes | yes, for media tracks only |
| Creation of hint samples referring to media samples (H.3.9) | no | no | yes |
Some implementations may record first to RTP reception hint tracks only and create a file with a combination of media tracks and RTP reception hint tracks off-line.
H.3.2 Compensation for unequal starting position of received RTP streams
When the recording of RTP streams is started, it can happen that the presentation time of the first media sample in one RTP stream is not equal to the presentation time of the first media sample in another RTP stream at least due to the following reasons:
- The sampling frequencies of audio and video typically differ.
- Audio and video streams may not be perfectly interleaved in terms of presentation times in transmission order.
If RTCP reception hint tracks are stored, the compensation for unequal starting position of received RTP streams should be done at playback time and no Edit List box concerning RTP reception hint tracks should be created. If RTCP reception hint tracks are not stored or if media tracks are stored, it is essential that the recording unit indicates the relative initial delay of the streams in order to synchronize audio and video correctly at the beginning of the playback of the streams as described subsequently in this Clause. The recording unit should perform the following operations.
1. An RTCP Sender Report indicates which RTP timestamp corresponds to the wallclock time of the time instant the report was sent. At least the first RTCP Sender Report for each RTP stream should be parsed in order to establish an equivalence of an RTP timestamp of each RTP stream and a wallclock time of the sender. The wallclock timestamp of the earliest received RTP packet, in presentation order, is derived for each RTP stream by simple linear extrapolation.
2. The smallest wallclock timestamp derived above among all the received RTP streams is mapped to presentation timestamp zero in the movie timeline, i.e., is presented immediately at the beginning of the playback of the recorded file. The movie timeline is the master timeline for the playback of the file.
3. The media timeline for each track starts from 0. In order to shift the media timeline to a correct starting position in the movie timeline, an Edit box and an Edit List box are created for each of the other RTP tracks (which do not contain a packet having the earliest wallclock timestamp) as follows:
The Edit List box contains two entries:
a) The first entry is an empty edit (indicated by media_time equal to -1), and its duration (segment_duration) is equal to the difference of the presentation times of the earliest media sample among all the RTP streams and the earliest media sample of the track. Figure H.1 presents an example of how the segment_duration of the first entry in an Edit List box is derived.
b) The value of media_time of the second entry is equal to the composition time of the earliest sample in presentation order, and the value of segment_duration of the second entry spans over the entire track. As the actual duration of the track might not be known at the time of creating the Edit List box, it is recommended to set the segment_duration equal to the maximum possible value (either the maximum 32-bit unsigned integer or the maximum 64-bit unsigned integer, depending on which version of the box is used).
The value of media_rate_integer is equal to 1 in both the entries of the Edit List box.
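The operations above can be sketched as follows. This is an illustrative outline only: the class and function names are hypothetical, wallclock times are taken as seconds, RTP timestamp wrap-around is ignored, and segment_duration is expressed in movie timescale units.

```python
from dataclasses import dataclass


@dataclass
class EditEntry:
    segment_duration: int
    media_time: int
    media_rate_integer: int = 1


def first_packet_wallclock(sr_wallclock, sr_rtp_ts, first_rtp_ts, clock_rate):
    """Step 1: linear extrapolation from one Sender Report (no wrap-around)."""
    return sr_wallclock + (first_rtp_ts - sr_rtp_ts) / clock_rate


def edit_list_for_track(track_start, earliest_start, movie_timescale,
                        first_composition_time, version=0):
    """Steps 2 and 3: empty edit for the initial delay, then the whole track.

    Returns an empty list for the track that starts earliest, because no
    Edit List box is created for it.
    """
    delay = round((track_start - earliest_start) * movie_timescale)
    if delay <= 0:
        return []
    max_duration = 0xFFFFFFFF if version == 0 else 0xFFFFFFFFFFFFFFFF
    return [
        EditEntry(segment_duration=delay, media_time=-1),
        EditEntry(segment_duration=max_duration,
                  media_time=first_composition_time),
    ]
```

For an audio stream whose first packet maps to wallclock 99.0 s and a video stream whose first packet maps to 99.5 s, the video track would receive an empty edit of 0.5 s, while the audio track would receive no Edit List box.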
[Figure H.1: timeline showing the 1st audio sample and the 1st video sample]
Figure H.1 — An example of an Edit List box to compensate the unequal starting of the received RTP streams, segment_duration is copied to the first entry of the Edit List box
Some recording units may detect packets from which decoding can be started, such as IDR pictures of H.264/AVC streams, which are here referred to as random access points. If a stream contains a packet having the earliest wallclock timestamp among all the received streams and the same stream contains packets preceding, in decoding order, the first random access point of the stream, it is recommended not to store the packets preceding the first random access point of the stream and not to consider them when determining the earliest wallclock timestamp among all the received streams.
H.3.3 Recording of SDP
The SDP should be stored as follows. Session-level SDP, i.e., all lines before the first media-specific line (“m=” line), should be stored as Movie SDP information within the User Data box, as specified in 9.1.4.1. Each media-level section within the SDP description starts with an “m=” line and continues to the next media-level section or the end of the whole session description. Each media-level section should be stored as Track SDP information within the User Data box of the corresponding RTP reception hint track.
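The split between session-level and media-level SDP can be sketched as below. The function name is hypothetical, and writing the resulting text into the User Data boxes is outside the scope of the sketch.

```python
def split_sdp(sdp_text):
    """Return (session_lines, media_sections) for an SDP description.

    session_lines holds every line before the first "m=" line; each entry
    of media_sections is the list of lines of one media-level section.
    """
    session_lines = []
    media_sections = []
    current = None
    for line in sdp_text.splitlines():
        if not line:
            continue
        if line.startswith("m="):
            current = [line]
            media_sections.append(current)
        elif current is None:
            session_lines.append(line)
        else:
            current.append(line)
    return session_lines, media_sections
```

The session lines would go to the Movie SDP information, and each media section to the Track SDP information of the matching RTP reception hint track.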
H.3.4 Creation of a sample within an RTP reception hint track
It is recommended that each sample represents all received RTP packets that have the same RTP timestamp, i.e., consecutive packets in RTP sequence number order with a common RTP timestamp. The RTP sample structure is set to contain one RTP packet structure per each received RTP packet having the same RTP timestamp. Each RTP packet is recommended to contain one packet constructor of type 2 (RTP sample constructor). An RTP sample constructor copies a particular byte range, indicated by the sampleoffset and length fields of the constructor, of a particular sample, indicated by the samplenumber field of the constructor, by reference into the packet payload being constructed. The payload of each received RTP packet having the same RTP timestamp is copied to the extradata section of the sample. The track reference of each constructor is set to point to the hint track itself, i.e., is set equal to -1, and sampleoffset and length are set to match the location and size of the packet payload within the sample.
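The grouping of received packets into samples can be sketched as follows. The dictionary layout and the key names (including "trackrefindex") are illustrative only and do not reproduce the syntax of the hint sample; as a simplification, sampleoffset is counted from the start of the extradata section, whereas in a file it is counted within the sample.

```python
from itertools import groupby


def group_into_samples(packets):
    """Group packets into one sample per RTP timestamp.

    packets: list of (sequence_number, rtp_timestamp, payload) tuples,
    already in RTP sequence number order.
    """
    samples = []
    for rtp_ts, group in groupby(packets, key=lambda p: p[1]):
        extradata = b""
        constructors = []
        for seq, _, payload in group:
            constructors.append({
                "sequence_number": seq,
                "trackrefindex": -1,          # points to the hint track itself
                "sampleoffset": len(extradata),
                "length": len(payload),
            })
            extradata += payload              # payload stored with the sample
        samples.append({
            "rtp_timestamp": rtp_ts,
            "constructors": constructors,
            "extradata": extradata,
        })
    return samples
```

Because `groupby` only merges adjacent items, packets that share a timestamp but are not consecutive in sequence number order end up in separate samples, which matches the recommendation above.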
Figure H.2 presents a pseudo-code example of an RTP reception hint sample, which contains two RTP packets.
Figure H.2 — An example of a RTP reception hint sample containing two packets (their header and payload)
The use of an error occurrence indexing event to indicate an RTP packet loss is not recommended, because the RTPsequenceseed field can be used for detecting packet losses without any increase in the storage space. Furthermore, the minimum unit the error occurrence event can refer to is a sample (in an RTP reception hint track). Since a sample can contain many packets, it is ambiguous which ones of these packets the error occurrence indexing event concerns.
H.3.5 Representation of RTP timestamps
RTP timestamps are represented in an RTP reception hint track by a sum of three values, one of which is the decoding time DT in the media timeline of the track. The decoding time is run-length coded into the Decoding Time to Sample box and additionally to one or more Track Fragment Run boxes, if a sample resides in a movie fragment. The Decoding Time to Sample box includes a number of sample_count and sample_delta pairs, where sample_delta is the decoding time increment (i.e., the sample duration in terms of decoding time) for each sample in a set of consecutive samples, the number of which equals sample_count. The Track Fragment Run box indicates one pair of sample_count and sample_duration, where sample_duration is the decoding time increment (i.e., the sample duration) for each sample in a set of consecutive samples, the number of which equals sample_count. Each Track Fragment box can contain a number of Track Fragment Run boxes. The decoding time DT(i) for sample number i is derived by summing up the sample durations of all the samples preceding sample i from the Decoding Time to Sample box and, if needed, the Track Fragment Run boxes referring to any sample preceding sample i.
The RTP timestamp for sample i, RTPTS(i), is represented by a sum of three values specified as follows:
RTPTS(i) = (DT(i) + tsro.offset + offset) mod 2^32 (H.1)
where tsro.offset is the value of offset in the 'tsro' box of the referred reception hint sample entry and offset is the value included in the rtpoffsetTLV box in the RTP packet structure, and mod is the modulo operation.
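Formula (H.1) can be evaluated as in the following sketch, which first expands run-length coded (sample_count, sample_delta) pairs into decoding times. The function names are hypothetical; only entries of the Decoding Time to Sample box are handled, and Track Fragment Run boxes would be expanded in the same way.

```python
def decoding_times(stts_entries):
    """Expand (sample_count, sample_delta) pairs into DT(i) for every sample."""
    times = []
    t = 0
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            times.append(t)
            t += sample_delta
    return times


def rtp_timestamp(dt, tsro_offset, offset=0):
    """Formula (H.1): sum of the three values, modulo 2^32."""
    return (dt + tsro_offset + offset) % 2**32
```

The modulo makes the result wrap in the same way as the 32-bit RTP timestamp field, so a large tsro.offset near the top of the range yields small timestamps for later samples.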
A 'tsro' box should be present in RTP reception hint sample entries. The value of offset in any 'tsro' box of a track should be equal to the RTP timestamp of the first packet of the respective stream in RTP sequence number order.
Provided that no wrap-around of the RTP timestamp values over the maximum 32-bit unsigned integer happened between sample i-1 and i, the difference between consecutive unequal RTP timestamps, in RTP sequence number order, is
RTPTS_DIFF(i) = RTPTS(i) – RTPTS(i – 1) for any i > 1 (H.2)
RTPTS_DIFF(i) remains unchanged when the frame rate is constant, the number of frames in any packet is constant, and the transmission order is the same as the presentation order. These constraints are typically met by audio streams and temporally non-scalable video streams. If RTPTS_DIFF(i) is a constant denoted as RTPTS_DIFF, the following is recommended. The value of sample_delta in the Decoding Time to Sample box and, if movie fragments are used, the value of sample_duration in the Track Fragment Run box or boxes are set to RTPTS_DIFF, which results in compact Decoding Time to Sample and Track Fragment Run boxes. The rtpoffsetTLV box should not be used within the RTP reception hint samples, if RTCP reception hint tracks are used (see H.3.6). Otherwise (if RTCP reception hint tracks are not used), offset in the rtpoffsetTLV box should be set to 0.
When temporal scalability is used in a video stream, the transmission order and the playback order of packets are not identical, RTP timestamps do not increase as a function of RTP sequence number, and RTPTS_DIFF(i) is not constant. However, RTP timestamps typically have a constant behaviour in periods determined by the GOP_size, which is one plus the number of pictures between two consecutive pictures in the lowest temporal level in RTP sequence number order. For example, if two non-reference pictures are coded for each pair of reference pictures as illustrated in Figure H.3, GOP_size is equal to 3. Figure H.4 presents an example of a hierarchically temporally scalable bitstream with GOP_size equal to 4.
Figure H.3 — An example of a temporally scalable bitstream with GOP_size equal to 3
(RTP sequence numbers (SN) are normalized to start from 0, and one packet per frame is assumed. RTP timestamps (TS) are normalized to start from 0 and indicated as clock ticks lasting one frame interval. Inter prediction arrows are indicated for the first GOP only, while pictures in other GOPs are predicted similarly.)
Figure H.4 — An example of a hierarchically temporally scalable bitstream with GOP_size equal to 4
(RTP sequence numbers (SN) are normalized to start from 0, and one packet per frame is assumed. RTP timestamps (TS) are normalized to start from 0 and indicated as clock ticks lasting one frame interval.)
The RTP timestamp increment caused by one GOP is derived as follows, when no wrap-around of the RTP timestamp values over the maximum 32-bit unsigned integer happened between sample i and i + GOP_size, inclusive:
RTPTS_GOP_DIFF(i) = RTPTS(i + GOP_size) – RTPTS(i) (H.3)
If RTPTS_GOP_DIFF(i) is a constant equal to RTPTS_GOP_DIFF, when no sample i, i+1, …, i+GOP_size is a picture starting a so-called closed group of pictures, such as an IDR picture of H.264/AVC streams, the following is recommended. The value of sample_delta in the Decoding Time to Sample Box and, if movie fragments are used, the value of sample_duration in the Track Fragment Run box or boxes are set to RTPTS_GOP_DIFF / GOP_size. The rtpoffsetTLV box should not be used for pictures in the lowest temporal level, if RTCP reception hint tracks are used (see H.3.6). Otherwise (if RTCP reception hint tracks are not used), offset in the rtpoffsetTLV box should be set to 0. The value of offset in the rtpoffsetTLV box should be set for pictures in other temporal levels such that Formula (H.1) is fulfilled. Figure H.5 indicates how the decoding time and offset are set for a hierarchically temporally scalable video bitstream presented in Figure H.4.
[Figure H.5: for each picture (IDR, B, P) of the bitstream, the rows give the temporal level, DT, RTP TS (x clock tick of one frame interval) and offset.]
Figure H.5 — An example of setting the decoding time (DT) and the value of offset in the rtpoffsetTLV box of a hierarchically temporally scalable bitstream with GOP_size equal to 4.
(In this example, the decoding time increment between samples is set equal to RTPTS_GOP_DIFF / GOP_size to have a compact encoding of the decoding times. The value of offset in the rtpoffsetTLV box is adjusted for each sample to store a representation of the RTP timestamp. For this illustration, RTP timestamps and decoding times are normalized to start from 0 and indicated as clock ticks lasting one frame interval.)
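The per-sample offset that makes Formula (H.1) hold can be computed as in the sketch below. The function name is hypothetical, the offset is assumed to be stored as a signed 32-bit value, and the transmission order used in the usage note is an assumption chosen for illustration rather than a transcription of Figure H.5.

```python
def rtp_offsets(rtp_ts, sample_delta, tsro_offset=0):
    """Return the offset of each sample so that
    (DT(i) + tsro_offset + offset) mod 2^32 == rtp_ts[i],
    with DT(i) = i * sample_delta (constant decoding time increment).
    """
    offsets = []
    for i, ts in enumerate(rtp_ts):
        dt = i * sample_delta
        raw = (ts - dt - tsro_offset) % 2**32
        # Fold into the signed 32-bit range (assumed storage of offset).
        offsets.append(raw - 2**32 if raw >= 2**31 else raw)
    return offsets
```

For a hierarchical GOP of size 4 transmitted with timestamps 0, 4, 2, 1, 3, 8, 6, 5, 7 (in frame intervals) and a sample_delta of 1, this gives offsets 0, 3, 0, -2, -1, 3, 0, -2, -1.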
If no linear and periodical behaviour of RTP timestamps is detected from the received packets, and no two received packets of different samples have the same reception time, it is recommended to set the value of sample_delta in the Decoding Time to Sample Box and, if movie fragments are used, the value of sample_duration in the Track Fragment Run box or boxes to represent the reception time of the first packet of the sample. That is, the derived decoding time DT(i) should be equal to the reception time of the first packet of the sample minus the reception time of the first packet of the first received sample of the stream.
It is noted that composition timestamps are not explicitly indicated in the file for samples in any hint tracks. Consequently, for RTP reception hint tracks, the composition timestamps are inferred from the information related to the RTP timestamps indicated in the stored packet stream. For an RTP reception hint track that is not associated with an RTCP reception hint track, the composition time of a received RTP packet is inferred to be the sum of the sample time DT(i) and the value of the offset field in the rtpoffsetTLV box including the sample. For an RTP reception hint track that is associated with an RTCP reception hint track, the composition time is inferred as follows. Let the received RTP packet having the earliest RTP timestamp within the same track have composition time equal to 0. Any remaining RTP packet has a composition time equal to the RTP timestamp difference of the present RTP packet and the earliest RTP packet in presentation order with clock drift correction similar to H.3.6.3. The composition time refers to the media timeline of the track.
H.3.6 Recording operations to facilitate inter-stream synchronization in playback
H.3.6.1 General
Lip synchronization, i.e., correct synchronization between recorded RTP streams, during playback can be facilitated at least with the following two means:
1. An RTCP reception hint track is generated for each RTP reception hint track. The potential clock drift between the RTP timestamp clocks of different streams is corrected at the time when the file is parsed and the media streams included in the file are decoded and played. The clock drift correction is done in the same way as for RTP streams that are received and played simultaneously. This mode of operation is straightforward for the recording units. However, accessing a file from an exact playback position might be more cumbersome, because it requires compensation of the clock drift of all the recorded streams at the time of the access.
2. The potential clock drift between recorded RTP streams is corrected by modifying the RTP timestamps of one or more recorded streams. This mode of operation requires processing of RTCP Sender Reports at the time of recording and is hence more tedious for the recording units than creation of RTCP reception hint tracks. However, the operation of the player is straightforward.
Recording units should use the timestamp synchrony box [9.4.1.2] to indicate which lip synchronization approach has been used. The timestamp synchrony box includes the timestamp_sync field. timestamp_sync equal to 1 indicates that players should use RTCP reception hint tracks for lip synchronization. timestamp_sync equal to 2 indicates that players should use composition timestamps for lip synchronization.
Some implementations may create RTCP reception hint tracks first during the real-time recording operation and then compensate the clock drift by modifying RTP timestamps as an off-line post-processing step.
The following clauses provide more details about both approaches.
H.3.6.2 Facilitating lip synchronization based on RTCP Sender Reports
A recording unit stores all RTCP Sender Reports for a particular RTP stream as samples in the respective RTCP reception hint track.
H.3.6.3 Compensating clock drift in timestamps
It is not recommended to modify the RTP timestamps of the recorded audio streams. Such a modification would cause an audio timescale modification in the player, which is a non-trivial operation.
The recorded representation of the RTP timestamps of the video and other non-audio streams should be modified using the following procedure.
1. First, the wallclock timestamp a of a video frame is derived from the RTP timestamp corresponding to the video frame as a sum of the wallclock timestamp of the previous video frame and the difference of the RTP timestamps of the current and previous video frames in the units of the wallclock timeline.
2. Second, the playback time b for the video frame on the wallclock timeline is derived based on the RTCP Sender Reports. If no RTCP Sender Report that exactly indicates the wallclock time for the video frame is available, the wallclock time can be extrapolated assuming that the rate at which the RTP timestamp clock and the sender wallclock in RTCP Sender Reports deviate stays unchanged.
3. Third, based on the RTCP Sender Reports for audio, the audio RTP timestamp that is played simultaneously with the video frame at time b of the wallclock timeline is derived. There need not be an audio frame having exactly the derived audio RTP timestamp. The wallclock timestamp c of an audio sample is calculated from the derived audio RTP timestamp as a sum of the wallclock timestamp of the preceding audio frame and the difference of the RTP timestamps of the derived audio RTP timestamp and the RTP timestamp of the preceding audio frame.
The difference between a and c, if any, should be compensated in the fields that represent the video RTP timestamp in the file. In practice, the easiest way might be to add the difference to the offset field in the rtpoffsetTLV box, which is illustrated in Figure H.6. The other option, rewriting the Decoding Time to Sample box and the Track Fragment Run boxes (if any), might be more cumbersome to implement, because of the particular way of coding the sample times by a combination of sample counts and durations, and might require more storage space too.
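The three steps can be sketched as below. This is an illustration under stated assumptions, not a normative procedure: each stream is described by two Sender Reports given as (wallclock seconds, RTP timestamp) pairs, the clock relation is taken as linear between and beyond them, the first frame of each stream is used as the reference frame at nominal time 0, wrap-around is ignored, and the sign convention (c minus a, added to the video offset) is an assumption. All names are hypothetical.

```python
def wallclock_from_rtp(rtp_ts, sr0, sr1):
    """Sender wallclock for rtp_ts, on the line through two Sender Reports."""
    (w0, r0), (w1, r1) = sr0, sr1
    return w0 + (rtp_ts - r0) * (w1 - w0) / (r1 - r0)


def rtp_from_wallclock(wallclock, sr0, sr1):
    """RTP timestamp (possibly fractional) that corresponds to a wallclock."""
    (w0, r0), (w1, r1) = sr0, sr1
    return r0 + (wallclock - w0) * (r1 - r0) / (w1 - w0)


def video_offset_correction(video_rtp, video_rate, video_srs,
                            audio_rate, audio_srs,
                            video_ref=(0.0, 0), audio_ref=(0.0, 0)):
    """Return the correction, in video RTP ticks, for one video frame.

    video_ref and audio_ref are (nominal time, RTP timestamp) of the
    preceding frame of each stream.
    """
    # Step 1: a, from the RTP timestamp difference at the nominal clock rate.
    a = video_ref[0] + (video_rtp - video_ref[1]) / video_rate
    # Step 2: b, the sender wallclock of the frame from video Sender Reports.
    b = wallclock_from_rtp(video_rtp, *video_srs)
    # Step 3: audio RTP timestamp played at b, then its nominal time c.
    audio_rtp = rtp_from_wallclock(b, *audio_srs)
    c = audio_ref[0] + (audio_rtp - audio_ref[1]) / audio_rate
    return round((c - a) * video_rate)
```

With a 90 kHz video clock that matches the sender wallclock and a 48 kHz audio clock that runs 0.01 % fast, a video frame ten seconds into the stream would receive a correction of 90 ticks (1 ms).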
Figure H.6 — An example of correcting the lip synchronization in the RTP timestamp representation
H.3.7 Representation of reception times
As specified in 9.4.1.4, the reception time of a packet is indicated by the sum of the decoding time of the sample containing the packet and the value of relative_time of the RTP packet structure of the packet.
The reception time of the earliest received RTP packet should be zero, and the reception times of all subsequent packets should be relative to the reception time of the earliest received RTP packet.
The clock source for the reception time is undefined and may be, for instance, the wallclock of the receiver. If the range of reception times of a reception hint track overlaps entirely or partly with the range of reception times of another reception hint track, the clock sources for these hint tracks shall be the same.
The reception time of a packet should correspond to the time instant when the protocol stack layer underneath RTP, typically UDP, outputs the packet.
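As a small worked illustration of the relation between decoding time, relative_time and reception time, the sketch below derives relative_time for a packet from its arrival instant. The function names are hypothetical, and arrival instants are assumed to be receiver-clock seconds converted to the media timescale of the hint track.

```python
def relative_time_for(arrival, first_arrival, sample_dt, timescale):
    """relative_time such that sample_dt + relative_time is the reception
    time measured from the earliest received packet (which gets time 0)."""
    reception = round((arrival - first_arrival) * timescale)
    return reception - sample_dt


def reception_time(sample_dt, relative_time):
    """Reception time as the sum of the sample decoding time and relative_time."""
    return sample_dt + relative_time
```

A packet arriving 0.25 s after the first one, stored in a sample with a decoding time of 20000 at a 90 kHz timescale, would get a relative_time of 2500.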
H.3.8 Creation of media samples
Media samples are created from the received RTP packets as instructed by the relevant RTP payload specification and RTP itself. However, most media coding standards only specify the decoding of error-free streams and consequently it should be ensured that the content in media tracks can be correctly decoded by any standard-compliant media decoder. Handling of transmission errors therefore requires two steps: detection of transmission errors and inference of samples that can be decoded correctly. These steps are described in the subsequent paragraphs.
Lost RTP packets can be detected from a gap in RTP sequence number values. RTP packets containing bit errors are usually not forwarded to the application as their UDP checksum fails and packets are discarded in the protocol stack of the receiver. Consequently, bit-erroneous packets are usually treated as packet losses in the receiver.
The inference of media samples that can be correctly decoded depends on the media coding format and is therefore not described here in detail. Generally, inter-sample prediction is weak or non-existent in audio coding formats, whereas most video coding formats utilize inter prediction heavily. Consequently, a lost sample in many audio formats can often be replaced by a silent or error-concealed audio sample. It should be analyzed whether a loss of a video packet concerned a non-reference picture or a reference picture, or, more generally, in which level of the temporal scalability hierarchy the loss occurred. It should then be concluded which pictures may not be correctly decodable. For example, a loss of a non-reference picture does not affect the decoding of any other pictures, whereas a loss of a reference picture in the base temporal level typically affects all pictures until the next picture for random access, such as an IDR picture in H.264/AVC. Video tracks must not contain any samples dependent on any lost video sample.
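The detection of lost packets from gaps in RTP sequence numbers, mentioned above, can be sketched as follows. The function name is hypothetical; the input is assumed to be already arranged in sequence number order (reordering and duplicate removal are not shown), and the 16-bit wrap-around of the RTP sequence number is taken into account.

```python
def lost_sequence_numbers(received):
    """Return the sequence numbers missing between consecutive packets.

    received: RTP sequence numbers (0..65535) in sequence number order.
    """
    lost = []
    for prev, cur in zip(received, received[1:]):
        gap = (cur - prev) % 65536      # distance with 16-bit wrap-around
        for k in range(1, gap):
            lost.append((prev + k) % 65536)
    return lost
```

Which samples remain decodable after such a loss depends on the coding format, as described above, and is not covered by this sketch.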
H.3.9 Creation of hint samples referring to media samples
Media samples are created from the received RTP packets as explained in H.3.8. RTP reception hint tracks are created as explained in H.3.4, but the contents of the RTPpacket structure depend on the existence of the corresponding media sample as follows.
If the packet payload of the received RTP packet is represented in a media track, the track references of the relevant packet constructors are set to point to the media track and include the packet payload by reference. To save storage space and make file editing operations easier to implement, it is not recommended to keep a copy of the packet payload in the extra data section of the received RTP sample.
If the packet payload of the received RTP packet is not represented in a media track, the instance of the RTPpacket structure is created as explained in H.3.4.
H.4 Playing of recorded RTP streams
H.4.1 Introduction
This Clause describes operations required for playback of a file containing recorded RTP streams. It is organized as follows:
- Before RTP streams can be played, the contents of the file should be analyzed. In particular, alternative tracks representing the same media stream should be identified and one of these tracks should be selected for decoding and playback. The coding format should be detected in order to conclude up front that it can be decoded by the player. These preparation operations are described in more detail in H.4.2.
- If an RTP reception hint track is being processed, there are a few things to be taken into account as described in H.4.3. For example, packet losses should be detected and handled appropriately.
- The synchronization of the decoded media samples should be handled properly as described in H.4.4.
- If the RTP streams stored in a file are accessed from a position other than the beginning of the streams, proper inter-stream synchronization and decoder initialization are needed as described in H.4.5.
H.4.2 Preparation for the playback
In the preparation phase for playback, the player selects which tracks are played. The basic track structure of the file is parsed first. The tracks are grouped according to which alternate group they belong to. Tracks that belong to the same alternate group are indicated by the same value of alternate_group in the track header box. One track from each alternate group is selected for playback as follows.
If there is an RTP reception hint track in the alternate group, it is preferred for playback, because it contains an entire representation of the received RTP stream, unlike media tracks derived from the received RTP streams, which might use only such a subset of the received RTP packets that can be decoded by any standard-compliant decoder without capability for handling packet losses.
The compatibility of the player with the selected track should be ensured. For example, it should be examined whether the player is able to support the codec, the profile, and the level used in the track.
The codec, profile, and level used for the coded bitstream in an RTP reception hint track can be concluded from the SDP description of the RTP stream. The SDP descriptions are stored in the movie-level index track. If SDP is unchanged throughout the file, it may be additionally stored as Movie SDP information and Track SDP information within User Data boxes. If Track SDP information is present, it may be parsed to find out the codec, profile, and level used for the bitstream contained in the RTP reception hint track. If Movie SDP information or Track SDP information is not present, the movie-level index track is traversed to find and parse each SDP index and, consequently, the codec, profile, and level used for the bitstream contained in the RTP reception hint track.
If no RTP reception hint track exists in an alternate group, the sample entry or sample entries of the media tracks in the alternate group should be examined to find out which ones of them the player is able to support.
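The selection rule of this subclause might be sketched as below. The dictionary keys (`track_id`, `alternate_group`, `kind`) and the `is_supported` callback are illustrative assumptions, not structures defined by this specification; the sketch also relies on the track header convention that an alternate_group value of 0 means the track belongs to no alternate group.

```python
from collections import defaultdict


def select_tracks(tracks, is_supported):
    """Pick one playable track per alternate group, preferring RTP
    reception hint tracks over media tracks.

    tracks: list of dicts with hypothetical keys 'track_id',
            'alternate_group' and 'kind' ('rtp_reception_hint' or 'media').
    is_supported: callable deciding whether the player can decode a track
                  (codec, profile and level check, not modelled here).
    """
    groups = defaultdict(list)
    for track in tracks:
        group = track["alternate_group"]
        # Tracks with alternate_group == 0 are treated as their own group.
        key = ("group", group) if group != 0 else ("single", track["track_id"])
        groups[key].append(track)

    selected = []
    for members in groups.values():
        # False sorts before True, so reception hint tracks come first.
        ordered = sorted(members, key=lambda t: t["kind"] != "rtp_reception_hint")
        choice = next((t for t in ordered if is_supported(t)), None)
        if choice is not None:
            selected.append(choice["track_id"])
    return selected
```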
H.4.3 Decoding of a sample within an RTP reception hint track
The original RTP packets may be reconstructed from an RTP reception hint sample by creating the RTP packet header from the RTPpacket structures and by resolving the constructors of the RTPpacket structures. Hence, one approach for file players to process RTP reception hint tracks is to re-create the packet stream that was received and process the re-created packet stream as if it was newly received.
The relative_time field included in the RTPpacket structure may be used to schedule the insertion of the packet into the buffer for the RTP receiver. However, it may be more advisable to modify the decoding process of recorded RTP streams in such a manner that the decoder output buffers are kept as full as possible in order to avoid interruptions or jerky playback caused by late packets or occasional problems in real-time decoding in systems running other processes in addition to the player.
Packet losses should be detected from gaps in the RTP sequence number. The reaction to packet losses depends on the particular media decoder implementation and may also depend on user preferences.
H.4.4 Lip synchronization
The following steps are required for achieving correct synchronization between streams:
1. Inter-track synchronization at the start of the playback.
The starting position of the media timeline of a track may be shifted in the movie timeline of the file as described in the following two paragraphs.
For a media track and an RTP reception hint track that is not associated with an RTCP reception hint track, an Edit List box should be used to shift the starting position of the media timeline within the movie timeline as described in H.3.2. The media timelines of the tracks selected for playback are mapped to the movie timeline by parsing the Edit List boxes of the tracks, if present. The playback of each media track and each RTP reception hint track that is not associated with an RTCP reception hint track starts at the movie timeline position indicated in the Edit List box of the track or from the beginning of the movie timeline, if no Edit List box exists for the track.
For RTP reception hint tracks that are associated with respective RTCP reception hint tracks, the shifting of the starting position of the media timeline within the movie timeline is inferred as follows. The media timeline of the RTP reception hint track containing the earliest RTP packet (in presentation time on the sender wallclock timeline) among all RTP reception hint tracks is not shifted within the movie timeline (i.e., starts at time 0 on the movie timeline). The starting time of the media timeline of any other RTP reception hint track is equal to the timestamp difference of the earliest RTP packets of the present track and the track containing the earliest RTP packet among all RTP reception hint tracks.
2. Reconstruction of RTP timestamps and composition times on the media timeline (H.3.5).
3. Correction of RTP timestamps and composition times based on RTCP Sender Reports, if RTCP reception hint tracks are used.
The correction is done similarly to what is described in H.3.6.3. However, instead of adding the difference between times a and c into the representation of the RTP timestamps in the file, the difference is added during the playback to the presentation times of the video frames on the movie timeline.
4. Pacing the output of the decoded media samples.
It is recommended to play a recorded program at the pace of the wallclock of the player and to use the audio playout clock as the wallclock of the player. The audio playback is arranged to be continuous at the native sampling frequency of the audio signal. A presentation clock of the player runs at the pace of the audio playback, i.e., its value is always equal to (the number of the most recent uncompressed audio sample that was played out) ÷ (sampling frequency of the audio signal). The playback of the video track (and potential other continuous media tracks) is synchronized to the presentation clock of the player. In other words, when the presentation clock of the player meets the composition time of a video sample on the movie timeline, the video sample is played out.
Only if a file is being simultaneously recorded and played back and if the receiver wallclock runs faster than the sender wallclock, pacing the playback according to the rate of the receiver wallclock might not be recommended, and synchronizing the rate of the receiver wallclock to the rate of the sender wallclock may be done as follows.
The pace of the sender clock is recovered by creating a relationship between the reception times (according to the receiver clock) and the respective wallclock timestamps of the sender, which are reconstructed from RTCP Sender Reports. It is recommended to use the audio playout clock as the receiver clock. As the delay in the network and in the receiver may be varying, the relation between the reception times and the respective timestamps of the sender should be averaged over a large number of received packets. A timescale multiplication factor is concluded as a result of the averaging of the relation between the reception times and the respective timestamps of the sender.
A presentation time on a timeline of the receiver clock is derived for each sample. If RTCP reception hint tracks are in use, the presentation time is the composition time of the sample on the movie timeline, also including clock drift correction as described in step 3 above. If RTCP reception hint tracks are not in use, the presentation time is directly the composition time of the sample on the movie timeline. Then, for playback purposes only, the presentation times of the samples in all tracks being played should be multiplied by the timescale multiplication factor.
Time stretching of the signal should be done accordingly. Samples are played out at their presentation times.
In practice, the timescale multiplication factor and the mapping from the RTP timeline to the wallclock of the sender (step 3 above) may be implemented as a single operation.
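One possible way to obtain the timescale multiplication factor of step 4 is sketched below. The text only asks for averaging over many packets; a least-squares slope of reception time against sender time is used here as one such averaging, which is a choice made for this illustration. Function and parameter names are hypothetical, and both time series are assumed to be in seconds.

```python
def timescale_factor(sender_times, reception_times):
    """Estimate how fast the receiver clock runs relative to the sender
    clock as the least-squares slope of reception time over sender time.
    Requires at least two packets with distinct sender times."""
    n = len(sender_times)
    mean_s = sum(sender_times) / n
    mean_r = sum(reception_times) / n
    covariance = sum((s - mean_s) * (r - mean_r)
                     for s, r in zip(sender_times, reception_times))
    variance = sum((s - mean_s) ** 2 for s in sender_times)
    return covariance / variance


def scale_presentation_times(presentation_times, factor):
    """Apply the factor to presentation times, for playback purposes only."""
    return [t * factor for t in presentation_times]
```

A constant network delay shifts all reception times equally and therefore does not affect the slope; only a rate difference between the two clocks does.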
H.4.5 Random access
Random access refers to a non-linear access to the media streams represented in the file. In other words, in a random access operation the file is accessed from another sample than that which was previously played or the file is initially accessed from a position that is not the beginning of the movie timeline.
It is recommended to provide the random access functionality to the user relative to the movie timeline of the file rather than any other timelines, such as the sender wallclock timeline. By using the movie timeline as the basis, the number of steps for a random access operation is kept low.
First, it is derived which media frames are at a desired random access position (or closest to it, if there are none exactly at the desired random access position). In the case of media tracks, RTP reception hint tracks for audio, and any RTP reception hint tracks having the timestamp_sync field equal to 2 (indicating pre-compensated lip synchronization), the media frame closest to the desired random access position can be directly derived based on the composition timestamps (on the media timeline) shifted by the initial starting position indicated in the Edit List box, if any. In the case of non-audio RTP reception hint tracks having the timestamp_sync field equal to 1 (indicating the use of RTCP reception hint tracks), the presentation times of samples should be derived as described in H.4.4, until the closest presentation time to the desired random access position is found.
Second, decoding of many media bitstreams can be started only from frames of a particular type, such as an IDR picture of H.264/AVC. Player implementations may therefore have different approaches, including the following:
1. Discover the closest frame at or preceding the desired random access position from which decoding can be started, start decoding from that frame, and start rendering only from the desired random access point. This approach may imply some processing delay before the rendering is started.
2. Start decoding and rendering at or after the desired random access point using the earliest frame from which decoding can be started. Typically, audio playback would start earlier than video playback, but the processing delay before the rendering is started is smaller than in the previous option.
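The two approaches above differ only in which decodable starting frame is chosen. A minimal sketch, assuming the presentation times of all frames from which decoding can start are available as a sorted list on the movie timeline (function names are hypothetical):

```python
from bisect import bisect_left, bisect_right


def sync_at_or_before(sync_times, target):
    """Approach 1: latest decodable start at or preceding the target.
    Returns None if no such frame exists."""
    i = bisect_right(sync_times, target)
    return sync_times[i - 1] if i > 0 else None


def sync_at_or_after(sync_times, target):
    """Approach 2: earliest decodable start at or after the target.
    Returns None if no such frame exists."""
    i = bisect_left(sync_times, target)
    return sync_times[i] if i < len(sync_times) else None
```

With approach 1 the player decodes from the returned frame but renders only from the target; with approach 2 rendering starts at the returned frame itself.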
H.5 Re-sending recorded RTP streams
H.5.1 Introduction
It may be a desirable operation to re-send the RTP streams that have been recorded earlier to a file. For example, if RTP streams are received through a broadcast or streaming service and recorded into a file, it may be desirable to re-send them from one device to another device in a home environment using a WLAN connection. This Clause provides recommendations for re-sending of recorded RTP streams.
A communication system based on RTP includes a source endpoint (a.k.a., a sender) and a destination endpoint (a.k.a., a receiver) and may contain one or more mixers and translators. The sender and the receiver are the endpoints of the RTP and RTCP sessions. The behaviour of RTP translators and mixers is specified in RFC 3550 and clarified in RFC 5117. In general, the recording unit receiving RTP streams and storing them into a file acts as a destination endpoint, and a re-sending unit reading stored RTP streams from a file and sending them acts as a source. Typically, the payloads of the re-sent RTP stream are not modified, which makes a combination of a recording unit and a re-sending unit act similarly to a transport translator as described in RFC 5117. However, the essential characteristic of a translator is that receivers cannot detect its presence. Consequently, a combination of a recording unit and a re-sending unit cannot act as a transport translator, unless re-sending happens simultaneously with the recording of the original streams. As this case is considered rare, the discussion in this Clause regards a recording unit as a destination terminating the original RTP and RTCP sessions and a re-sending unit as a source of new RTP and RTCP sessions.
This Clause is organized as follows:
- H.5.2 includes recommendations on how to compose RTP packets from RTP reception hint tracks and how to schedule the transmission of the RTP packets.
- H.5.3 discusses how RTCP packets should be generated and how received RTCP packets should be processed.
H.5.2 Re-sending RTP packets
The packets are recommended to be constructed and transmitted as follows.
- The packet payloads are recommended to be constructed according to the constructors stored in the reception hint track, i.e., the packet payloads are recommended to be identical to those received, unless a different packet size is crucial for the network to which the packets are re-sent.
- The values of the header fields for the RTP packets created as suggested by an RTP reception hint track should be kept the same as in the respective RTPpacket structure except for the following cases:
  - The initial RTP timestamp offset and the RTP sequence number offset should be selected randomly regardless of the values stored in the offset field of the 'tsro' box of the referred reception hint sample entry or the values of the RTPsequenceseed field of the RTPpacket structure of any of the packets of the respective RTP reception hint track.
  - The value of the RTP timestamp field should be a sum of the random initial offset, the value of offset in the RTPpacket structure, and the decoding time of the respective RTP sample. If the sum exceeds the maximum unsigned 32-bit integer, it should be wrapped over.
  - The relative increments of the RTP sequence number should be the same as those recorded in the values of the RTPsequenceseed fields. Consequently, if there was a packet loss in the stream that was recorded, the stream that is re-sent also has a respective gap in the RTP sequence number, and the receiver is able to deduce a packet loss.
  - The value of the CSRC count field should always be zero, because no contributing sources of the previous RTP session that was recorded are actively modifying the streams for the RTP session for the stream being re-sent. The source identifier space (for both SSRC and CSRC) is session specific. Consequently, the CSRC list of the RTP header should be empty regardless of the potentially stored CSRC values for the received streams, which are included in the receivedCSRC TLV box in the RTPpacket structure.
  - The value of the payload type field may be dynamically selected depending on the signalling scheme in use.
  - The value of the SSRC field should be randomly selected and potential collisions should be handled as specified in RFC 3550. The SSRC value of a received stream may be stored in the ReceivedSsrcBox of the referred reception hint sample entry but it should be ignored when the stream is re-sent.
  - The recorded RTP header extensions, stored in rtphdrextTLV in the RTPpacket structure, if any, should be re-sent only if the re-sending unit can verify that they are valid for the re-sent stream. If the re-sending unit is not able to parse the semantics of the recorded RTP header extensions, they should not be re-sent.
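The timestamp and sequence number rules in the list above can be illustrated with the following sketch. The dictionary keys (`sequence_seed`, `offset`, `decoding_time`) are hypothetical stand-ins for the RTPsequenceseed and offset fields of the RTPpacket structure and the decoding time of the sample, and the decoding time is assumed to be already expressed in RTP timestamp units.

```python
import random


def resend_headers(packets, ts_offset=None, seq_offset=None):
    """Derive RTP timestamp and sequence number for re-sent packets.

    packets: list of dicts with hypothetical keys 'sequence_seed',
             'offset' and 'decoding_time' (RTP timestamp units assumed).
    ts_offset, seq_offset: random initial offsets; chosen here if omitted.
    """
    if ts_offset is None:
        ts_offset = random.getrandbits(32)
    if seq_offset is None:
        seq_offset = random.getrandbits(16)

    first_seed = packets[0]["sequence_seed"]
    headers = []
    for p in packets:
        # Sum of random offset, packet offset and decoding time, wrapped to 32 bits.
        timestamp = (ts_offset + p["offset"] + p["decoding_time"]) & 0xFFFFFFFF
        # Keep the recorded relative increments so that recorded gaps survive.
        sequence = (seq_offset + p["sequence_seed"] - first_seed) & 0xFFFF
        headers.append({
            "timestamp": timestamp,
            "sequence_number": sequence,
            "csrc_count": 0,  # CSRC list is always empty when re-sending
        })
    return headers
```

Passing the offsets explicitly makes the result reproducible, which is convenient for testing; a real re-sending unit would let them be chosen randomly.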
The reception time of a packet, represented by the sum of the decoding time of the RTP reception hint sample containing the packet and the value of the relative_time of the RTPpacket structure, equals the transmission time of the packet with a skew caused by the transmission delay and the processing delay in the protocol stack of the receiver. The skew of adjacent packets might not be equal due to transmission delay jitter and varying processing delay. Moreover, the protocol stack used when receiving the stream might differ from the protocol stack used for re-sending the stream. For these reasons, the reception times are often not applicable as such to pace the transmission of the re-sent packets. In all cases, the re-sending unit should verify that the re-sent packet stream complies with the buffering model in use, if any. If the re-sending unit can conclude that the network environments and protocol stacks used when receiving the stream and when re-sending the recorded stream are similar, reception times may be used as a basis for scheduling the packet transmission. The re-sending unit should make an effort to remove or conceal the transmission delay jitter in the recorded stream. If the re-sending unit is unable to conclude that the network environments and protocol stacks used when receiving the stream and when re-sending the recorded stream are similar, or is uncertain which kind of packet scheduling is appropriate, it may use the decoding time as the basis for scheduling.
H.5.3 RTCP Processing
RTCP Sender Reports and other RTCP messages are regenerated following the constraints specified in RFC 3550 rather than directly using the RTCP messages recorded in RTCP reception hint tracks, if any.
An RTCP Sender Report contains the wallclock time when the report was sent and the RTP timestamp corresponding to the same time as the indicated wallclock time. The RTP timestamp for an RTCP Sender Report is generated as follows. A presentation time on a timeline of a reference clock is derived for the sample corresponding to the indicated wallclock time in the RTCP Sender Report. The reference clock may be the wallclock of the re-sending unit initialized to 0 at the beginning of the session. The sample corresponding to the indicated wallclock time might not exist in the corresponding RTP reception hint track, because the sampling instants of the samples in the RTP reception hint tracks might not match with the transmission instants of the RTCP Sender Reports. However, as instructed by RFC 3550, the RTP timestamp is derived as if there was a sample in the RTP stream corresponding to the indicated wallclock time. The RTP timestamp for an RTCP Sender Report should be linearly interpolated from the RTP timestamps of the samples immediately preceding and following the wallclock time indicated in the RTCP Sender Report. In order to conclude the samples immediately preceding and following the wallclock time indicated in the RTCP Sender Report, presentation times on the timeline of the reference clock should be derived until the closest samples are discovered. If RTCP reception hint tracks are present for the RTP reception hint track being re-sent, the presentation time is the composition time of the sample on the movie timeline, also including clock drift correction as described in step 3 of H.4.4. If RTCP reception hint tracks are not present, the presentation time is directly the composition time of the sample on the movie timeline.
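The linear interpolation of the RTP timestamp might look like the sketch below. The function and parameter names are hypothetical; presentation times are assumed to be in seconds on the reference clock, and the two neighbouring samples are assumed to have distinct presentation times less than 2^32 timestamp units apart.

```python
def interpolate_rtp_timestamp(t_prev, rtp_prev, t_next, rtp_next, t_report):
    """Interpolate the RTP timestamp for a Sender Report sent at t_report,
    given the samples immediately preceding (t_prev, rtp_prev) and
    following (t_next, rtp_next) on the reference clock timeline."""
    # Timestamp distance modulo 2^32, so a wrap between the samples is handled.
    delta = (rtp_next - rtp_prev) & 0xFFFFFFFF
    fraction = (t_report - t_prev) / (t_next - t_prev)
    return (rtp_prev + round(delta * fraction)) & 0xFFFFFFFF
```

For a 90 kHz clock with samples at 1.0 s (timestamp 90000) and 2.0 s (timestamp 180000), a report sent at 1.5 s gets timestamp 135000.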
When handling the received RTCP Receiver Reports, it should be noted that the reported cumulative number of packets lost also includes the unsent packets that were never originally received and correspond to the gaps in the RTP sequence number in the RTP reception hint tracks. Any congestion management, retransmission, or other packet loss resilience method should take this into account.
Annex I (normative)
Stream Access Points
I.1 Introduction
This Annex defines a Stream Access Point (SAP) and specifies six types of SAPs.
A Stream Access Point (SAP) enables random access into a container of media stream(s). A container may contain more than one media stream, each being an encoded version of continuous media of certain media type. A SAP is a position in a container enabling playback of an identified media stream to be started using only (a) the information contained in the container starting from that position onwards, and (b) possible initialisation data from other part(s) of the container, or externally available. Derived specifications should specify if initialisation data is needed to access the container at a SAP, and how the initialisation data can be accessed.
A SAP for layered media may apply to all the layers, a particular set of layers, or only a single layer in a media stream. When a SAP applies to a set of layers that use inter prediction from a layer that is not a member of the set, there may be an indication if the SAP requires the correct decoding of the reference layer.
When SAPs are used with layered media, derived specifications should specify or provide means to indicate which layers SAPs apply to and whether SAPs require correct decoding of the reference layer.
I.2 SAP properties
I.2.1 General
For each SAP the properties ISAP, TSAP, ISAU, TDEC, TEPT, and TPTF are identified and defined as:
TSAP is the earliest presentation time of any access unit of the media stream such that all access units of the media stream with presentation time greater than or equal to TSAP can be correctly decoded using data in the Bitstream starting at ISAP and no data before ISAP.
ISAP is the greatest position in the Bitstream such that all access units of the media stream with presentation time greater than or equal to TSAP can be correctly decoded using Bitstream data starting at ISAP and no data before ISAP.
ISAU is the starting position in the Bitstream of the latest access unit in decoding order within the media stream such that all access units of the media stream with presentation time greater than or equal to TSAP can be correctly decoded using this latest access unit and access units following in decoding order and no access units earlier in decoding order.
NOTE ISAU is always greater than or equal to ISAP.
TDEC is the earliest presentation time of any access unit of the media stream that can be correctly decoded using data in the Bitstream starting at ISAU and no data before ISAU.
TEPT is the earliest presentation time of any access unit of the media stream starting at ISAU in the Bitstream.
TPTF is the presentation time of the first access unit of the media stream in decoding order in the Bitstream starting at ISAU.
For the purposes of these definitions, the SAP is the access unit that is described as located at ISAU and/or ISAP.
NOTE The distinction between ISAU and ISAP is only needed to distinguish between referring directly to the access unit, and referring to its containing structure.
I.2.2 SAP properties for layers
The following properties apply to layered media streams for which SAPs are indicated for one or more layers, referred to as the target layers. In the following properties, an access-unit partition refers to a unit that contains the coded data of a single time instance for the target layers, and a media stream partition refers to a sequence of access-unit partitions of the target layers in decoding order.
When the target layers cover all the layers of a media stream, the following properties are equivalent to those in I.2.1.
For each SAP the properties ISAP, TSAP, ISAU, TDEC, TEPT, and TPTF are identified and defined as:
TSAP is the earliest presentation time of any access-unit partition of the target layers such that all access-unit partitions of the target layers with presentation time greater than or equal to TSAP can be correctly decoded using data in the media stream partition starting at ISAP and no data before ISAP.
ISAP is the greatest position in the container of the media stream partition such that all access-unit partitions of the target layers with presentation time greater than or equal to TSAP can be correctly decoded using data of the media stream partition starting at ISAP and no data before ISAP.
ISAU is the starting position, in the media stream partition, of the latest access-unit partition in decoding order such that all access-unit partitions of the target layers with presentation time greater than or equal to TSAP can be correctly decoded using this latest access-unit partition and access-unit partitions following in decoding order and no access-unit partition earlier in decoding order.
NOTE ISAU is always greater than or equal to ISAP.
TDEC is the earliest presentation time of any access-unit partition of the target layers that can be correctly decoded using data in the media stream partition starting at ISAU and no data before ISAU.
TEPT is the earliest presentation time of any access-unit partition of the target layers starting at ISAU in the media stream partition.
TPTF is the presentation time of the first access-unit partition of the target layers in decoding order in the media stream partition starting at ISAU.
I.3 SAP types
Six types of SAPs are defined with properties as follows:
Type 1: TEPT = TDEC = TSAP = TPTF
Type 2: TEPT = TDEC = TSAP < TPTF
Type 3: TEPT < TDEC = TSAP <= TPTF
Type 4: TEPT <= TPTF < TDEC = TSAP
Type 5: TEPT = TDEC < TSAP
Type 6: TEPT < TDEC < TSAP
NOTE The type of SAP is dependent only on which Access Units are correctly decodable and their arrangement in presentation order. The types informally correspond with some common terms:
Type 1 corresponds to what is known in some coding schemes as a "Closed GoP random access point" (in which all access units, in decoding order, starting from ISAP can be correctly decoded, resulting in a continuous time sequence of correctly decoded access units with no gaps) and in addition the first access unit in decoding order is also the first access unit in presentation order.
Type 2 corresponds to what is known in some coding schemes as a "Closed GoP random access point", for which the first access unit in decoding order in the media stream starting from ISAU is not the first access unit in presentation order.
Type 3 corresponds to what is known in some coding schemes as an "Open GoP random access point", in which there are some access units in decoding order following ISAU that cannot be correctly decoded and have presentation times less than TSAP.
Type 4 corresponds to what is known in some coding schemes as a "Gradual Decoding Refresh (GDR) random access point", in which there are some access units in decoding order starting from and following ISAU that cannot be correctly decoded and have presentation times less than TSAP.
Type 5 corresponds to the case for which there is at least one access unit in decoding order starting from ISAP that cannot be correctly decoded and has presentation time greater than TDEC and where TDEC is the earliest presentation time of any access unit starting from ISAU.
Type 6 corresponds to the case for which there is at least one access unit in decoding order starting from ISAP that cannot be correctly decoded and has presentation time greater than TDEC and where TDEC is not the earliest presentation time of any access unit starting from ISAU.
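The six type conditions can be checked mechanically once the four time properties are known. The sketch below is only an illustration; the function name is hypothetical and the arguments are assumed to be presentation times in a common unit.

```python
def sap_type(t_ept, t_dec, t_sap, t_ptf):
    """Return the SAP type (1 to 6) matching the relations between
    TEPT, TDEC, TSAP and TPTF, or None if no type condition holds."""
    if t_ept == t_dec == t_sap == t_ptf:
        return 1
    if t_ept == t_dec == t_sap < t_ptf:
        return 2
    if t_ept < t_dec == t_sap <= t_ptf:
        return 3
    if t_ept <= t_ptf < t_dec == t_sap:
        return 4
    if t_ept == t_dec < t_sap:
        return 5
    if t_ept < t_dec < t_sap:
        return 6
    return None
```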
Annex J (normative)
MIME Type Registration of Segments
J.1 Introduction
This Annex provides the formal MIME registration of media segments formatted according to 8.16.
J.2 Registration
MIME media type name: video
MIME subtype name: iso.segment
Required parameters: none
Optional parameters: as specified by RFC 6381 and its successors
Encoding considerations: as for video/mp4
Security considerations: See section 5 of RFC 4337.
Interoperability considerations: A number of interoperating implementations exist within the ISO/IEC 14496 community, and that community has reference software for reading and writing the file format.
Published specification: ISO/IEC 14496-12:2012 (expected)
Applications: Multimedia
Additional information:
Magic number(s): none
File extension(s): m4s
Macintosh File Type Code(s): None
Person to contact for info: David Singer, [email protected]
Intended usage: Common
Author/Change controller: David Singer, ISO/IEC 14496 file format chair
Annex K (informative)
Segment Index Examples
K.1 Introduction
This annex gives some examples of the use of the segment index box, and what values are inserted in it when it is used in various different ‘styles’ or configurations.
In the following examples, the size of the i-th ‘sidx’ box is defined as S_{i,index}, the size of the i-th subsegment, e.g. the i-th ‘moof’ and ‘mdat’ boxes, is defined as S_{i,media}, the duration of the i-th subsegment is defined as D_i, the number of the last subsegment is defined as N, and the duration of the segment is defined as D_{segment}.
K.2 Examples
K.2.1 Simple one-level indexing
This example shows a simple segment index (Figure K.1). All entries of the top level sidx point to media content (segments comprising one or more movie fragments), i.e. reference_type is equal to 0. The values of referenced_size and subsegment_duration of each entry are calculated as shown in Table K.1.
Figure K. 1: Simple Segment Index
sidx entries  referenced_size  subsegment_duration
e0            S_i              D_i
e1            S_{i+1}          D_{i+1}
Table K.1: Simple Segment Index
K.2.2 Hierarchical
This example shows a hierarchical segment index (Figure K.2). All entries of the top level sidx point to another ‘sidx’ box, i.e. reference_type is equal to 1, and all entries of the second level sidx point to media content, i.e. reference_type is equal to 0. The values of referenced_size and subsegment_duration of each entry are calculated as shown in Table K.2.
Figure K. 2: Hierarchical segment index
sidx#    entries  referenced_size                                subsegment_duration
i-th     e0       S_{i+1,index} + S_{j,media} + S_{j+1,media}    D_j + D_{j+1}
         e1       S_{i+2,index} + S_{j+2,media} + S_{j+3,media}  D_{j+2} + D_{j+3}
(i+1)th  e0       S_{j,media}                                    D_j
         e1       S_{j+1,media}                                  D_{j+1}
(i+2)th  e0       S_{j+2,media}                                  D_{j+2}
         e1       S_{j+3,media}                                  D_{j+3}
Table K.2: Hierarchical segment index
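The top-level rows of the hierarchical table can be reproduced with a few lines of code. This is a sketch under assumptions: the function name and the input layout (one tuple per second-level ‘sidx’ box, holding its size and the sizes and durations of the subsegments it indexes) are invented for illustration.

```python
def top_level_entries(children):
    """Compute the entries of a top-level 'sidx' box in a two-level index.

    children: list of (child_index_size, [(media_size, duration), ...]),
              one tuple per second-level 'sidx' box (hypothetical layout).
    """
    entries = []
    for index_size, subsegments in children:
        entries.append({
            "reference_type": 1,  # entry points to another 'sidx' box
            # Size of the child index plus all media it references.
            "referenced_size": index_size + sum(size for size, _ in subsegments),
            # Total duration of the subsegments under the child index.
            "subsegment_duration": sum(duration for _, duration in subsegments),
        })
    return entries
```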
K.2.3 Daisy-chain
This example shows a daisy-chained segment index (Figure K.3). Each ‘sidx’ box has two entries: the first entry points to media content, i.e. reference_type is equal to 0, and the second (the last) entry points to the next ‘sidx’ box, i.e. reference_type is equal to 1. The values of referenced_size and subsegment_duration of each entry are calculated as shown in Table K.3.
Figure K. 3: Daisy-chained segment index
sidx#    entries  referenced_size  subsegment_duration
i-th     e0       S_{i,media}      D_i
         e1       S_{i+1,index}    \sum_{j=i+1}^{N} D_j = D_{segment} - \sum_{j=0}^{i} D_j
(i+1)th  e0       S_{i+1,media}    D_{i+1}
         e1       S_{i+2,index}    \sum_{j=i+2}^{N} D_j = D_{segment} - \sum_{j=0}^{i+1} D_j
Table K.3: Daisy-chained segment index
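The daisy-chain rows can be generated as sketched below, with the remaining duration of each chain entry computed as the segment duration minus the durations already covered. The function name and list-based inputs are assumptions made for this example; the sketch also assumes that the last ‘sidx’ box carries no chain entry because no further box follows it.

```python
def daisy_chain_entries(index_sizes, media_sizes, durations):
    """Compute the entries of each 'sidx' box in a daisy-chained index.

    index_sizes[i], media_sizes[i], durations[i] describe the i-th 'sidx'
    box and the i-th subsegment (hypothetical input layout).
    """
    total = sum(durations)  # duration of the whole segment
    elapsed = 0
    boxes = []
    for i, (media_size, duration) in enumerate(zip(media_sizes, durations)):
        elapsed += duration
        entries = [{
            "reference_type": 0,  # media content
            "referenced_size": media_size,
            "subsegment_duration": duration,
        }]
        if i + 1 < len(media_sizes):
            entries.append({
                "reference_type": 1,  # next 'sidx' box in the chain
                "referenced_size": index_sizes[i + 1],
                # Remaining duration: segment total minus durations 0..i.
                "subsegment_duration": total - elapsed,
            })
        boxes.append(entries)
    return boxes
```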
K.2.4 Combination hierarchical and daisy-chain
This example shows a hierarchical and daisy-chained segment index (Figure K.4), which is a combination of K.2.2 and K.2.3. The values of referenced_size and subsegment_duration of each entry are calculated as shown in Table K.4.
Figure K. 4: Combined segment index
sidx#    entries  referenced_size                                subsegment_duration
i-th     e0       S_{i+1,index} + S_{j,media} + S_{j+1,media}    D_j + D_{j+1}
         e1       S_{i+2,index} + S_{j+2,media} + S_{j+3,media}  D_{j+2} + D_{j+3}
         e2       S_{i+3,index} + S_{j+4,media}                  \sum_{k=j+4}^{N} D_k = D_{segment} - \sum_{k=0}^{j+3} D_k
(i+1)th  e0       S_{j,media}                                    D_j
         e1       S_{j+1,media}                                  D_{j+1}
(i+2)th  e0       S_{j+2,media}                                  D_{j+2}
         e1       S_{j+3,media}                                  D_{j+3}
(i+3)th  e0       S_{j+4,media}                                  D_{j+4}
         e1       S_{i+4,index}                                  \sum_{k=j+5}^{N} D_k = D_{segment} - \sum_{k=0}^{j+4} D_k
Table K.4: Combined segment index
Bibliography
[1] The QuickTime file format specification, in PDF: <http://developer.apple.com/documentation/QuickTime/QTFF/qtff.pdf>
[2] 3GPP TS 26.244, 3GPP file format (3GP)
[3] 3GPP TS 26.346, Multimedia Broadcast/Multicast Service (MBMS); Protocols and codecs
[4] OMA BCAST_Distribution-V1_0: File and Stream Distribution for Mobile Broadcast Services
[5] IETF RFC 3926, FLUTE - File Delivery over Unidirectional Transport, October 2004
[6] IETF RFC 3450, Asynchronous Layered Coding (ALC) Protocol Instantiation, December 2002
[7] IETF RFC 3451, Layered Coding Transport (LCT) Building Block, December 2002
[8] IETF RFC 3452, Forward Error Correction (FEC) Building Block, December 2002
[9] IETF RFC 3695, Compact Forward Error Correction (FEC) Schemes, February 2004
[10] IETF RFC 1864, The Content-MD5 Header Field, October 1995
[11] IETF RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1, June 1999
[12] IETF RFC 3061, A URN Namespace of Object Identifiers, February 2001
[13] IETF RFC 3550, RTP: A Transport Protocol for Real-Time Applications, July 2003
[14] IETF RFC 3551, RTP Profile for Audio and Video Conferences with Minimal Control, July 2003
[15] IETF RFC 4122, A Universally Unique IDentifier (UUID) URN Namespace, July 2005
[16] IETF RFC 4771, Integrity Transform Carrying Roll-Over Counter for the Secure Real-time Transport Protocol (SRTP), January 2007
[17] IETF RFC 5119, A Uniform Resource Name (URN) Namespace for the Society of Motion Picture and Television Engineers (SMPTE), February 2008
[18] ICC.1:2001-04, File format for color profiles, International Color Consortium
[19] SMPTE RP 177, Derivation of Basic Television Color Equations; Society of Motion Picture and Television Engineers (SMPTE), 1993
[20] ISO/IEC13818‐1, Information technology — Generic coding of moving pictures and associated audio information — Systems
[21] ISO/IEC14496‐15, Information technology — Coding of audio-visual objects — Advanced Video Coding (AVC) file format
[22] IETF RFC 5117, RTP Topologies, WESTERLUND, M. et al., January 2008.
ICS 35.040