international iso/iec standard 14496-12 · 2018-06-07 · electronic or mechanical, including...

Reference number ISO/IEC 14496-12:2015(E)

© ISO/IEC 2015

INTERNATIONAL STANDARD ISO/IEC 14496-12

Fifth edition 2015-12-15

Information technology — Coding of audio-visual objects — Part 12: ISO base media file format

Technologies de l'information — Codage des objets audiovisuels — Partie 12: Format ISO de base pour les fichiers médias

COPYRIGHT PROTECTED DOCUMENT

© ISO/IEC 2015

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56
CH-1211 Geneva 20
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
E-[email protected]

Published in Switzerland

Contents

1 Scope

2 Normative references

3 Terms, definitions, and abbreviated terms
3.1 Terms and definitions
3.2 Abbreviated terms

4 Object-structured File Organization
4.1 File Structure
4.2 Object Structure
4.3 File Type Box

5 Design Considerations
5.1 Usage
5.1.1 Introduction
5.1.2 Interchange
5.1.3 Content Creation
5.1.4 Preparation for streaming
5.1.5 Local presentation
5.1.6 Streamed presentation
5.2 Design principles

6 ISO Base Media File organization
6.1 Presentation structure
6.1.1 File Structure
6.1.2 Object Structure
6.1.3 Meta Data and Media Data
6.1.4 Track Identifiers
6.2 Metadata Structure (Objects)
6.2.1 Box
6.2.2 Data Types and fields
6.2.3 Box Order
6.2.4 URIs as type indicators
6.3 Brand Identification

7 Streaming Support
7.1 Handling of Streaming Protocols
7.2 Protocol ‘hint’ tracks
7.3 Hint Track Format

8 Box Structures
8.1 File Structure and general boxes
8.1.1 Media Data Box
8.1.2 Free Space Box
8.1.3 Progressive Download Information Box
8.2 Movie Structure
8.2.1 Movie Box
8.2.2 Movie Header Box
8.3 Track Structure
8.3.1 Track Box
8.3.2 Track Header Box
8.3.3 Track Reference Box
8.3.4 Track Group Box
8.4 Track Media Structure
8.4.1 Media Box
8.4.2 Media Header Box
8.4.3 Handler Reference Box
8.4.4 Media Information Box
8.4.5 Media Information Header Boxes
8.4.6 Extended language tag
8.5 Sample Tables
8.5.1 Sample Table Box
8.5.2 Sample Description Box
8.5.3 Degradation Priority Box
8.5.4 Sample Scale Box
8.6 Track Time Structures
8.6.1 Time to Sample Boxes
8.6.2 Sync Sample Box
8.6.3 Shadow Sync Sample Box
8.6.4 Independent and Disposable Samples Box
8.6.5 Edit Box
8.6.6 Edit List Box
8.7 Track Data Layout Structures
8.7.1 Data Information Box
8.7.2 Data Reference Box
8.7.3 Sample Size Boxes
8.7.4 Sample To Chunk Box
8.7.5 Chunk Offset Box
8.7.6 Padding Bits Box
8.7.7 Sub-Sample Information Box
8.7.8 Sample Auxiliary Information Sizes Box
8.7.9 Sample Auxiliary Information Offsets Box
8.8 Movie Fragments
8.8.1 Movie Extends Box
8.8.2 Movie Extends Header Box
8.8.3 Track Extends Box
8.8.4 Movie Fragment Box
8.8.5 Movie Fragment Header Box
8.8.6 Track Fragment Box
8.8.7 Track Fragment Header Box
8.8.8 Track Fragment Run Box
8.8.9 Movie Fragment Random Access Box
8.8.10 Track Fragment Random Access Box
8.8.11 Movie Fragment Random Access Offset Box
8.8.12 Track fragment decode time
8.8.13 Level Assignment Box
8.8.14 Sample Auxiliary Information in Movie Fragments
8.8.15 Track Extension Properties Box
8.8.16 Alternative Startup Sequence Properties Box
8.8.17 Metadata and user data in movie fragments
8.9 Sample Group Structures
8.9.1 Introduction
8.9.2 Sample to Group Box
8.9.3 Sample Group Description Box
8.9.4 Representation of group structures in Movie Fragments
8.10 User Data
8.10.1 User Data Box
8.10.2 Copyright Box
8.10.3 Track Selection Box
8.10.4 Track kind
8.11 Metadata Support
8.11.1 The Meta box
8.11.2 XML Boxes
8.11.3 The Item Location Box
8.11.4 Primary Item Box
8.11.5 Item Protection Box
8.11.6 Item Information Box
8.11.7 Additional Metadata Container Box
8.11.8 Metabox Relation Box
8.11.9 URL Forms for meta boxes
8.11.10 Static Metadata
8.11.11 Item Data Box
8.11.12 Item Reference Box
8.11.13 Auxiliary video metadata
8.12 Support for Protected Streams
8.12.1 Protection Scheme Information Box
8.12.2 Original Format Box
8.12.3 IPMPInfoBox
8.12.4 IPMP Control Box
8.12.5 Scheme Type Box
8.12.6 Scheme Information Box
8.13 File Delivery Format Support
8.13.1 Introduction
8.13.2 FD Item Information Box
8.13.3 File Partition Box
8.13.4 FEC Reservoir Box
8.13.5 FD Session Group Box
8.13.6 Group ID to Name Box
8.13.7 File Reservoir Box
8.14 Sub tracks
8.14.1 Introduction
8.14.2 Backward compatibility
8.14.3 Sub Track box
8.14.4 Sub Track Information box
8.14.5 Sub Track Definition box
8.14.6 Sub Track Sample Group box
8.15 Post-decoder requirements on media
8.15.1 General
8.15.2 Transformation
8.15.3 Restricted Scheme Information box
8.15.4 Scheme for stereoscopic video arrangements
8.16 Segments
8.16.1 Introduction
8.16.2 Segment Type Box
8.16.3 Segment Index Box
8.16.4 Subsegment Index Box
8.16.5 Producer Reference Time Box
8.17 Support for Incomplete Tracks
8.17.1 General
8.17.2 Transformation
8.17.3 Complete Track Information Box

9 Hint Track Formats
9.1 RTP and SRTP Hint Track Format
9.1.1 Introduction
9.1.2 Sample Description Format
9.1.3 Sample Format
9.1.4 SDP Information
9.1.5 Statistical Information
9.2 ALC/LCT and FLUTE Hint Track Format
9.2.1 Introduction
9.2.2 Design principles
9.2.3 Sample Description Format
9.2.4 Sample Format
9.3 MPEG-2 Transport Hint Track Format
9.3.1 Introduction
9.3.2 Design Principles
9.3.3 Sample Description Format
9.3.4 Sample Format
9.3.5 Protected MPEG 2 Transport Stream Hint Track
9.4 RTP, RTCP, SRTP and SRTCP Reception Hint Tracks
9.4.1 RTP Reception Hint Track
9.4.2 RTCP Reception Hint Track ................................................................................................................... 1389.4.3 SRTP Reception Hint Track .................................................................................................................... 1409.4.4 SRTCP Reception Hint Tracks ............................................................................................................... 1429.4.5 Protected RTP Reception Hint Track ................................................................................................. 1439.4.6 Recording Procedure ............................................................................................................................... 1439.4.7 Parsing Procedure .................................................................................................................................... 143

10 Sample Groups ................................................................................................................................................ 14310.1 Random Access Recovery Points .......................................................................................................... 14310.2 Rate Share Groups ...................................................................................................................................... 14410.2.1 Introduction .............................................................................................................................................. 14410.2.2 Rate Share Sample Group Entry ........................................................................................................ 14610.2.3 Relationship between tracks .............................................................................................................. 14710.2.4 Bitrate allocation .................................................................................................................................... 14710.3 Alternative Startup Sequences .............................................................................................................. 14810.3.4 Examples .................................................................................................................................................... 14910.4 Random Access Point (RAP) Sample Grouping ................................................................................ 15110.5 Temporal level sample grouping .......................................................................................................... 15210.6 Stream access point sample group ....................................................................................................... 152

11 Extensibility
11.1 Objects
11.2 Storage formats
11.3 Derived File formats

12 Media-specific definitions
12.1 Video media
12.1.1 Media handler
12.1.2 Video media header
12.1.3 Sample entry
12.1.4 Pixel Aspect Ratio and Clean Aperture
12.1.5 Colour information
12.2 Audio media
12.2.1 Media handler
12.2.2 Sound media header
12.2.3 Sample entry
12.2.4 Channel layout
12.2.5 Downmix Instructions
12.2.6 DRC Information
12.2.7 Audio stream loudness
12.3 Metadata media
12.3.1 Media handler
12.3.2 Media header
12.3.3 Sample entry
12.4 Hint media
12.4.1 Media handler
12.4.2 Hint media header
12.4.3 Sample entry
12.5 Text media
12.5.1 Media handler
12.5.2 Media header
12.5.3 Sample entry
12.6 Subtitle media
12.6.1 Media handler
12.6.2 Subtitle media header
12.6.3 Sample entry
12.7 Font media
12.7.1 Media handler
12.7.2 Media header
12.7.3 Sample entry
12.8 Transformed media

Annex A (informative) Overview and Introduction
A.1 Section Overview
A.2 Core Concepts
A.3 Physical structure of the media
A.4 Temporal structure of the media
A.5 Interleave
A.6 Composition
A.7 Random access
A.8 Fragmented movie files

Annex B (void)

Annex C (informative) Guidelines on deriving from this specification
C.1 Introduction
C.2 General Principles
C.2.1 General
C.2.2 Base layer operations
C.3 Boxes
C.4 Brand Identifiers
C.4.1 Introduction
C.4.2 Usage of the Brand
C.4.3 Introduction of a new brand
C.4.4 Player Guideline
C.4.5 Authoring Guideline
C.4.6 Example
C.5 Storage of new media types
C.6 Use of Template fields
C.7 Tracks
C.7.1 Data Location
C.7.2 Time
C.7.3 Media Types
C.7.4 Coding Types
C.7.5 Sub-sample information
C.7.6 Sample Dependency
C.7.7 Sample Groups
C.7.8 Track-level
C.7.9 Protection
C.8 Construction of fragmented movies
C.9 Meta-data
C.10 Registration
C.11 Guidelines on the use of sample groups, timed metadata tracks, and sample auxiliary information

Annex D (informative) Registration Authority
D.1 Code points to be registered
D.2 Procedure for the request of an MPEG-4 registered identifier value
D.3 Responsibilities of the Registration Authority
D.4 Contact information for the Registration Authority
D.5 Responsibilities of Parties Requesting a RID
D.6 Appeal Procedure for Denied Applications
D.7 Registration Application Form
D.7.1 Contact Information of organization requesting a RID
D.7.2 Request for a specific RID
D.7.3 Short description of RID that is in use and date system was implemented
D.7.4 Statement of an intention to apply the assigned RID
D.7.5 Date of intended implementation of the RID
D.7.6 Authorized representative
D.7.7 For official use of the Registration Authority

Annex E (normative) File format brands
E.1 Introduction
E.2 The ‘isom’ brand
E.3 The ‘avc1’ brand
E.4 The ‘iso2’ brand
E.5 The ‘mp71’ brand
E.6 The ‘iso3’ brand
E.7 The ‘iso4’ brand
E.8 The ‘iso5’ brand
E.9 The ‘iso6’ brand
E.10 The ‘iso7’ brand
E.11 The ‘iso8’ brand
E.12 The ‘iso9’ brand

Annex F (void)

Annex G (informative) URI-labelled metadata forms
G.1 UUID-labelled metadata
G.2 ISO OID-labelled metadata
G.3 SMPTE-labelled metadata

Annex H (informative) Processing of RTP streams and reception hint tracks
H.1 Introduction
H.1.1 Overview
H.1.2 Structure
H.1.3 Terms and definitions
H.2 Synchronization of RTP streams
H.3 Recording of RTP streams
H.3.1 Introduction
H.3.2 Compensation for unequal starting for position of received RTP streams
H.3.3 Recording of SDP
H.3.4 Creation of a sample within an RTP reception hint track
H.3.5 Representation of RTP timestamps
H.3.6 Recording operations to facilitate inter-stream synchronization in playback
H.3.7 Representation of reception times
H.3.8 Creation of media samples
H.3.9 Creation of hint samples referring to media samples
H.4 Playing of recorded RTP streams
H.4.1 Introduction
H.4.2 Preparation for the playback
H.4.3 Decoding of a sample within an RTP reception hint track
H.4.4 Lip synchronization
H.4.5 Random access
H.5 Re-sending recorded RTP streams
H.5.1 Introduction
H.5.2 Re-sending RTP packets
H.5.3 RTCP Processing

Annex I (normative) Stream Access Points
I.1 Introduction
I.2 SAP properties
I.2.1 General
I.2.2 SAP properties for layers
I.3 SAP types

Annex J (normative) MIME Type Registration of Segments
J.1 Introduction
J.2 Registration

Annex K: Segment Index Examples (informative)
K.1 Introduction
K.2 Examples
K.2.1 Simple one-level indexing
K.2.2 Hierarchical
K.2.3 Daisy-chain
K.2.4 Combination hierarchical and daisy-chain

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non‐governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the different types of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the WTO principles in the Technical Barriers to Trade (TBT) see the following URL: Foreword ‐ Supplementary information

The committee responsible for this document is ISO/IEC JTC 1, Information technology, SC 29, Coding of audio, picture, multimedia and hypermedia information.

This fifth edition cancels and replaces the fourth edition (ISO/IEC 14496‐12:2012), which has been technically revised. It also incorporates the Amendments ISO/IEC 14496‐12:2012/Amd 1:2013, ISO/IEC 14496‐12:2012/Amd 2:2014, ISO/IEC 14496‐12:2012/Amd 3:2015 and the Technical Corrigenda ISO/IEC 14496‐12:2012/Cor 1:2013, ISO/IEC 14496‐12:2012/Cor 2:2014 and ISO/IEC 14496‐12:2012/Cor 3:2015.

ISO/IEC 14496 consists of the following parts, under the general title Information technology — Coding of audio-visual objects:

Part 1: Systems

Part 2: Visual

Part 3: Audio

Part 4: Conformance testing

Part 5: Reference software

Part 6: Delivery Multimedia Integration Framework (DMIF)

Part 7: Optimized reference software for coding of audio-visual objects

Part 8: Carriage of ISO/IEC 14496 contents over IP networks

Part 9: Reference hardware description

Part 10: Advanced Video Coding

Part 11: Scene description and application engine

Part 12: ISO base media file format

Part 13: Intellectual Property Management and Protection (IPMP) extensions

Part 14: MP4 file format

Part 15: Carriage of NAL unit structured video in the ISO Base Media File Format

Part 16: Animation Framework eXtension (AFX)

Part 17: Streaming text format

Part 18: Font compression and streaming

Part 19: Synthesized texture stream

Part 20: Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)

Part 21: MPEG-J Graphics Framework eXtensions (GFX)

Part 22: Open Font Format

Part 23: Symbolic Music Representation

Part 24: Audio and systems interaction

Part 25: 3D Graphics Compression Model

Part 26: Audio conformance

Part 27: 3D Graphics conformance

Part 28: Composite font representation

Part 29: Web video coding

Part 30: Timed text and other visual overlays in ISO base media file format

Part 31: Video Coding for Browsers

Introduction

The ISO Base Media File Format is designed to contain timed media information for a presentation in a flexible, extensible format that facilitates interchange, management, editing, and presentation of the media. This presentation may be ‘local’ to the system containing the presentation, or may be via a network or other stream delivery mechanism.

The file structure is object‐oriented; a file can be decomposed into constituent objects very simply, and the structure of the objects inferred directly from their type.
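As an informal illustration of this decomposition, the following Python sketch walks the top-level objects (boxes) of a byte buffer. It is not part of the specification text. It assumes the box header layout defined later in this standard: a 32-bit big-endian size, a four-character type, a 64-bit size following the type when size is 1, and a box running to the end of its container when size is 0. The function name `iter_boxes` and the sample bytes are hypothetical, and the 16-byte extended type of 'uuid' boxes is left inside the payload rather than parsed.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, payload_end) for each box in data[offset:end]."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        # Compact header: 32-bit big-endian size, then four-character type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # Assumed layout: a 64-bit size follows the type field.
            if offset + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # Assumed meaning: the box extends to the end of its container.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated box; stop rather than guess
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


# Hypothetical sample: a 16-byte 'ftyp' box followed by an empty 'free' box.
sample = struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 0) + struct.pack(">I4s", 8, b"free")
for box_type, start, stop in iter_boxes(sample):
    print(box_type, start, stop)
```

Because a container's payload is itself a sequence of boxes, the same function can be called again on a payload range to descend one level, which is one way to read the statement that the structure is inferred from the type.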

The file format is designed to be independent of any particular network protocol while enabling efficient support for them in general.

The ISO Base Media File Format is a base format for media file formats.

It is intended that the ISO Base Media File Format shall be jointly maintained by WG1 and WG11. Consequently, a subdivision of work created ISO/IEC 15444‐12 and ISO/IEC 14496‐12 in order to document the ISO Base Media File Format and to facilitate the joint maintenance.

This technically identical text is published as ISO/IEC 14496‐12 for MPEG‐4, and as ISO/IEC 15444‐12 for JPEG 2000, and reference to this specification should be made accordingly. The recommendation is to reference one, for example ISO/IEC 14496‐12, and append to the reference a parenthetical comment identifying the other, for example “(technically identical to ISO/IEC 15444‐12)”.

The International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC) draw attention to the fact that it is claimed that compliance with this document may involve the use of patents.

The ISO and IEC take no position concerning the evidence, validity and scope of this patent right.

The holder of this patent right has assured the ISO and IEC that he is willing to negotiate licences under reasonable and non‐discriminatory terms and conditions with applicants throughout the world. In this respect, the statement of the holder of this patent right is registered with the ISO and IEC. Information may be obtained from the companies listed in Annex B.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights other than those identified in Annex B. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

ISO (www.iso.org/patents) and IEC (http://patents.iec.ch) maintain on‐line databases of patents relevant to their standards. Users are encouraged to consult the databases for the most up to date information concerning patents.

INTERNATIONAL STANDARD ISO/IEC 14496-12:2015(E)

Information technology — Coding of audio-visual objects —

Part 12: ISO base media file format

1 Scope

This part of ISO/IEC 14496 specifies the ISO base media file format, which is a general format forming the basis for a number of other more specific file formats. This format contains the timing, structure, and media information for timed sequences of media data, such as audio‐visual presentations.

This part of ISO/IEC 14496 is applicable to MPEG‐4, but its technical content is identical to that of ISO/IEC 15444‐12, which is applicable to JPEG 2000.

2 Normative references

The following documents, in whole or in part, are normatively referenced in this document and are indispensable for its application. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 639‐2:1998, Codes for the representation of names of languages — Part 2: Alpha-3 code

ISO/IEC 9834‐8:2005, Information technology — Open Systems Interconnection — Procedures for the operation of OSI Registration Authorities: Generation and registration of Universally Unique Identifiers (UUIDs) and their use as ASN.1 Object Identifier components

ISO/IEC 11578:1996, Information technology — Open Systems Interconnection — Remote Procedure Call (RPC)

ISO/IEC 14496‐1:2010, Information technology — Coding of audio-visual objects — Part 1: Systems

ISO/IEC 14496‐10, Information technology — Coding of audio-visual objects — Part 10: Advanced Video Coding

ISO/IEC 14496‐14, Information technology — Coding of audio-visual objects — Part 14: MP4 file format

ISO/IEC 15444‐1, Information technology — JPEG 2000 image coding system: Core coding system

ISO/IEC 15444‐3, Information technology — JPEG 2000 image coding system: Motion JPEG 2000

ISO/IEC 15938‐1, Information technology — Multimedia content description interface — Part 1: Systems

ISO/IEC 23001‐1, Information technology — MPEG systems technologies — Part 1: Binary MPEG format for XML

ISO/IEC 23002‐3, Information technology — MPEG video technologies — Part 3: Representation of auxiliary video and supplemental information

ISO/IEC 29199‐2:2012, Information technology — JPEG XR image coding system — Part 2: Image coding specification

ISO 15076‐1:2010, Image technology colour management — Architecture, profile format and data structure — Part 1: Based on ICC.1:2010

IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, FREED, N. and BORENSTEIN, N., November 1996

IETF RFC 2046, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, FREED, N. and BORENSTEIN, N., November 1996

IETF RFC 3550, RTP: A Transport Protocol for Real-Time Applications, SCHULZRINNE, H. et al., July 2003

IETF RFC 3711, The Secure Real-time Transport Protocol (SRTP), BAUGHER, M. et al., March 2004

IETF RFC 5052, Forward Error Correction (FEC) Building Block, WATSON, M. et al., August 2007

IETF RFC 5905, Network Time Protocol Version 4: Protocol and Algorithms Specification, MILLS, D. et al., June 2010

SMIL 1.0, Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, <http://www.w3.org/TR/REC‐smil/>

Rec. ITU‐R TF.460‐6, Standard-frequency and time-signal emissions (Annex I for the definition of UTC.)

ISO/IEC 23003‐4, Information technology — MPEG audio technologies — Part 4: Dynamic range control

ITU‐R, Recommendation ITU‐R BS.1770‐3, Algorithm to measure audio programme loudness and true-peak audio level, August 2012

ITU‐R, Recommendation ITU‐R BS.1771‐1, Requirements for loudness and true-peak indicating meters, January 2012

EBU R 128‐2014, Loudness normalization and permitted maximum level of audio signals, June 2014

EBU Tech 3341, Loudness Metering: EBU mode metering to supplement loudness normalization in accordance with EBU R 128

EBU Tech 3342, Loudness Range: A measure to supplement loudness normalisation in accordance with EBU R 128, Geneva, August 2011

ETSI TS 101 154 V1.11.1, Digital Video Broadcasting (DVB); Specification for the use of Video and Audio Coding in Broadcasting Applications based on the MPEG-2 Transport Stream, November 2012

ATSC Document A/85:2011, ATSC Recommended Practice: Techniques for Establishing and Maintaining Audio Loudness for Digital Television, July 2011

ATSC Doc. A/52:2012, ATSC Standard: Digital Audio Compression (AC-3, E-AC-3)

IETF RFC 5646, BCP 47, Tags for Identifying Languages, PHILLIPS, A. et al., September 2009

3 Terms, definitions, and abbreviated terms

3.1 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

3.1.1 box
object‐oriented building block defined by a unique type identifier and length

Note 1 to entry: Called ‘atom’ in some specifications, including the first definition of MP4.

3.1.2 chunk
contiguous set of samples for one track

3.1.3 container box
box whose sole purpose is to contain and group a set of related boxes

Note 1 to entry: Container boxes are normally not derived from ‘fullbox’.

3.1.4 hint track
special track which does not contain media data, but instead contains instructions for packaging one or more tracks into a streaming channel

3.1.5 hinter
tool that is run on a file containing only media, to add one or more hint tracks to the file and so facilitate streaming

3.1.6 ISO Base Media File
name of the files conforming to the file format described in this specification

3.1.7 leaf subsegment
subsegment that does not contain any indexing information that would enable its further division into subsegments

3.1.8 media data box
box which can hold the actual media data for a presentation (‘mdat’)

3.1.9 movie box
container box whose sub‐boxes define the metadata for a presentation (‘moov’)

3.1.10 movie-fragment relative addressing
signalling of offsets for media data in movie fragments that is relative to the start of those movie fragments, specifically setting the flags base‐data‐offset‐present to 0 and default‐base‐is‐moof to 1 in TrackFragmentHeaderBoxes

Note 1 to entry: Setting the default‐base‐is‐moof flag to 1 is only relevant for movie fragments that contain more than one track run (either in the same or several tracks).

3.1.11 presentation
one or more motion sequences, possibly combined with audio

3.1.12 random access point (RAP)
sample in a track that starts at the ISAU of a SAP of type 1 or 2 or 3 as defined in Annex I; informally, a sample from which, when decoding starts, the sample itself and all samples following in composition order can be correctly decoded

3.1.13 random access recovery point
sample in a track with presentation time equal to the TSAP of a SAP of type 4 as defined in Annex I; informally, a sample that can be correctly decoded after having decoded a number of samples that are before this sample in decoding order, sometimes known as gradual decoding refresh

3.1.14 sample
all the data associated with a single timestamp

Note 1 to entry: No two samples within a track can share the same time‐stamp.

Note 2 to entry: In non‐hint tracks, a sample is, for example, an individual frame of video, a series of video frames in decoding order, or a compressed section of audio in decoding order; in hint tracks, a sample defines the formation of one or more streaming packets.

3.1.15 sample description
structure which defines and describes the format of some number of samples in a track

3.1.16 sample table
packed directory for the timing and physical layout of the samples in a track

3.1.17 sync sample
sample in a track that starts at the ISAU of a SAP of type 1 or 2 as defined in Annex I; informally, a media sample that starts a new independent sequence of samples; if decoding starts at the sync sample, it and succeeding samples in decoding order can all be correctly decoded, and the resulting set of decoded samples forms the correct presentation of the media starting at the decoded sample that has the earliest composition time; a media format may provide a more precise definition of a sync sample for that format

3.1.18 segment
portion of an ISO base media file format file, consisting of either (a) a movie box, with its associated media data (if any) and other associated boxes or (b) one or more movie fragment boxes, with their associated media data, and other associated boxes

3.1.18 subsegment
time interval of a segment formed from movie fragment boxes, that is also a valid segment

3.1.19 track
timed sequence of related samples (q.v.) in an ISO base media file

Note 1 to entry: For media data, a track corresponds to a sequence of images or sampled audio; for hint tracks, a track corresponds to a streaming channel.

3.2 Abbreviated terms

For the purposes of this document, the following abbreviated terms apply.

ALC    Asynchronous Layered Coding

FD     File Delivery

FDT    File Delivery Table

FEC    Forward Error Correction

FLUTE  File Delivery over Unidirectional Transport

IANA   Internet Assigned Numbers Authority

LCT    Layered Coding Transport

MBMS   Multimedia Broadcast/Multicast Service

4 Object-structured File Organization

4.1 File Structure

Files are formed as a series of objects, called boxes in this specification. All data is contained in boxes; there is no other data within the file. This includes any initial signature required by the specific file format.

All object‐structured files conformant to this section of this specification (all Object‐Structured files) shall contain a File Type Box.

4.2 Object Structure

An object in this terminology is a box.

Boxes start with a header which gives both size and type. The header permits compact or extended size (32 or 64 bits) and compact or extended types (32 bits or full Universal Unique IDentifiers, i.e. UUIDs). The standard boxes all use compact types (32‐bit) and most boxes will use the compact (32‐bit) size. Typically only the Media Data Box(es) need the 64‐bit size.

The size is the entire size of the box, including the size and type header, fields, and all contained boxes. This facilitates general parsing of the file.

The definitions of boxes are given in the syntax description language (SDL) defined in MPEG‐4 (see reference in Clause 2). Comments in the code fragments in this specification indicate informative material.

The fields in the objects are stored with the most significant byte first, commonly known as network byte order or big‐endian format. When fields smaller than a byte are defined, or fields span a byte boundary, the bits are assigned from the most significant bits in each byte to the least significant. For example, a field of two bits followed by a field of six bits has the two bits in the high order bits of the byte.
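The byte and bit ordering described above can be illustrated with a minimal Python sketch. The helper names `read_u32_be` and `split_2_6` are hypothetical and chosen only for this illustration; they are not defined by this specification.

```python
import struct
from typing import Tuple


def read_u32_be(buf: bytes, offset: int = 0) -> int:
    """Read a 32-bit unsigned field stored most significant byte first."""
    return struct.unpack_from(">I", buf, offset)[0]


def split_2_6(byte: int) -> Tuple[int, int]:
    """Split one byte into a 2-bit field followed by a 6-bit field.

    The field declared first occupies the high-order bits of the byte.
    """
    first = (byte >> 6) & 0x03   # the two most significant bits
    second = byte & 0x3F         # the remaining six bits
    return first, second
```

For instance, under these assumptions the byte 0x81 (binary 10 000001) would carry the value 2 in the two-bit field and the value 1 in the six-bit field.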

aligned(8) class Box (unsigned int(32) boxtype,
        optional unsigned int(8)[16] extended_type) {
    unsigned int(32) size;
    unsigned int(32) type = boxtype;
    if (size==1) {
        unsigned int(64) largesize;
    } else if (size==0) {
        // box extends to end of file
    }
    if (boxtype==‘uuid’) {
        unsigned int(8)[16] usertype = extended_type;
    }
}

The semantics of these two fields are:

size is an integer that specifies the number of bytes in this box, including all its fields and contained boxes; if size is 1 then the actual size is in the field largesize; if size is 0, then this box is the last one in the file, and its contents extend to the end of the file (normally only used for a Media Data Box)

type identifies the box type; standard boxes use a compact type, which is normally four printable characters, to permit ease of identification, and is shown so in the boxes below. User extensions use an extended type; in this case, the type field is set to ‘uuid’.

Boxes with an unrecognized type shall be ignored and skipped.
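As an illustration of the header rules above, the following Python sketch walks a sequence of boxes held in memory. It is only one possible reading of the Box syntax; the function name `iter_boxes` and the `uuid:` prefix used to label extended types are assumptions made for this example. A caller that does not recognize a yielded type can simply move on to the next box, which is how the skipping rule is honoured here.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_offset, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:                      # 64-bit largesize follows the type
            if pos + 16 > end:
                raise ValueError("truncated largesize field")
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:                    # box extends to the end of the data
            size = end - pos
        box_type = raw_type.decode("latin-1")
        if box_type == "uuid":             # 16-byte extended type follows
            box_type = "uuid:" + data[pos + header:pos + header + 16].hex()
            header += 16
        if size < header or pos + size > end:
            raise ValueError(f"bad box size at offset {pos}")
        yield box_type, pos + header, pos + size
        pos += size
```

The same function can be applied recursively to the payload range of a container box, since contained boxes follow the same header layout.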

Many objects also contain a version number and flags field:

aligned(8) class FullBox(unsigned int(32) boxtype, unsigned int(8) v, bit(24) f)
        extends Box(boxtype) {
    unsigned int(8) version = v;
    bit(24) flags = f;
}

The semantics of these two fields are:

version is an integer that specifies the version of this format of the box.

flags is a map of flags

Boxes with an unrecognized version shall be ignored and skipped.
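The version and flags fields occupy the first four bytes of a FullBox payload. A minimal Python sketch of extracting them follows; the function name `parse_fullbox_header` is hypothetical.

```python
import struct
from typing import Tuple


def parse_fullbox_header(payload: bytes) -> Tuple[int, int]:
    """Return (version, flags) from the first four bytes of a FullBox payload."""
    if len(payload) < 4:
        raise ValueError("FullBox payload shorter than 4 bytes")
    word = struct.unpack_from(">I", payload)[0]
    version = word >> 24          # top 8 bits
    flags = word & 0x00FFFFFF     # low 24 bits
    return version, flags
```

A reader might compare the returned version against the versions it implements for that box type and skip the box when there is no match.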

4.3 File Type Box

4.3.1 Definition

Box Type: ‘ftyp’
Container: File
Mandatory: Yes
Quantity: Exactly one (but see below)

Files written to this version of this specification must contain a file‐type box. For compatibility with an earlier version of this specification, files may be conformant to this specification and not contain a file‐type box. Files with no file‐type box should be read as if they contained an FTYP box with Major_brand='mp41', minor_version=0, and the single compatible brand 'mp41'.

A media‐file structured to this part of this specification may be compatible with more than one detailed specification, and it is therefore not always possible to speak of a single ‘type’ or ‘brand’ for the file. This means that the utility of the filename extension and Multipurpose Internet Mail Extension (MIME) type is somewhat reduced.

This box must be placed as early as possible in the file (e.g. after any obligatory signature, but before any significant variable‐size boxes such as a Movie Box, Media Data Box, or Free Space). It identifies which specification is the ‘best use’ of the file, and a minor version of that specification; and also a set of other specifications to which the file complies. Readers implementing this format should attempt to read files that are marked as compatible with any of the specifications that the reader implements. Any incompatible change in a specification should therefore register a new ‘brand’ identifier to identify files conformant to the new specification.

The minor version is informative only. It does not appear for compatible‐brands, and must not be used to determine the conformance of a file to a standard. It may allow more precise identification of the major specification, for inspection, debugging, or improved decoding.

Files would normally be externally identified (e.g. with a file extension or mime type) in a way that identifies the ‘best use’ (major brand), or the brand that the author believes will provide the greatest compatibility.

This section of this specification does not define any brands. However, see subclause 6.3 below for brands for files conformant to the whole specification and not just this section. All file format brands defined in this specification are included in Annex E with a summary of which features they require.

4.3.2 Syntax

aligned(8) class FileTypeBox
        extends Box(‘ftyp’) {
    unsigned int(32) major_brand;
    unsigned int(32) minor_version;
    unsigned int(32) compatible_brands[]; // to end of the box
}

4.3.3 Semantics

This box identifies the specifications to which this file complies.

Each brand is a printable four‐character code, registered with ISO, that identifies a precise specification.

major_brand – is a brand identifier

minor_version – is an informative integer for the minor version of the major brand

compatible_brands – is a list, to the end of the box, of brands
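The File Type Box payload can be decoded as sketched below in Python. The names `FileType`, `parse_ftyp`, `reader_accepts` and `DEFAULT_FILE_TYPE` are hypothetical. The acceptance test consults only the compatible brands list and never the minor version; that is one reading of 4.3.1 and should be treated as an assumption.

```python
import struct
from typing import List, NamedTuple, Set


class FileType(NamedTuple):
    major_brand: str
    minor_version: int
    compatible_brands: List[str]


# Values to assume when a file carries no 'ftyp' box (see 4.3.1).
DEFAULT_FILE_TYPE = FileType("mp41", 0, ["mp41"])


def parse_ftyp(payload: bytes) -> FileType:
    """Decode the payload of an 'ftyp' box (the bytes after the box header)."""
    if len(payload) < 8 or len(payload) % 4:
        raise ValueError("malformed ftyp payload")
    major, minor = struct.unpack_from(">4sI", payload)
    brands = [payload[i:i + 4].decode("latin-1")
              for i in range(8, len(payload), 4)]
    return FileType(major.decode("latin-1"), minor, brands)


def reader_accepts(file_type: FileType, supported: Set[str]) -> bool:
    """True if any compatible brand is one that the reader implements."""
    return any(brand in supported for brand in file_type.compatible_brands)
```

A reader that finds no File Type Box could pass `DEFAULT_FILE_TYPE` to `reader_accepts` instead of a parsed value.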

5 Design Considerations

5.1 Usage

5.1.1 Introduction

The file format is intended to serve as a basis for a number of operations. In these various roles, it may be used in different ways, and different aspects of the overall design exercised.

5.1.2 Interchange

When used as an interchange format, the files would normally be self‐contained (not referencing media in other files), contain only the media data actually used in the presentation, and not contain any information related to streaming. This will result in a small, protocol‐independent, self‐contained file, which contains the core media data and the information needed to operate on it.

The following diagram gives an example of a simple interchange file, containing two streams.

[Figure: an ISO file containing a moov box (with trak (video) and trak (audio)), …other boxes, and an mdat box holding interleaved, time-ordered video and audio frames]

Figure 1 — Simple interchange file

5.1.3 Content Creation

During content creation, a number of areas of the format can be exercised to useful effect, particularly:

the ability to store each elementary stream separately (not interleaved), possibly in separate files.

the ability to work in a single presentation that contains media data and other streams (e.g. editing the audio track in the uncompressed format, to align with an already‐prepared video track).

These characteristics mean that presentations may be prepared, edits applied, and content developed and integrated without either iteratively re‐writing the presentation on disc – which would be necessary if interleave was required and unused data had to be deleted; and also without iteratively decoding and re‐encoding the data – which would be necessary if the data must be stored in an encoded state.

In the following diagram, a set of files being used in the process of content creation is shown.

[Figure: three files. A media file holding video frames, possibly un-ordered with other unused data. An ISO File holding …other boxes (inc. moov) and an mdat box with video and audio frames, possibly un-ordered with other unused data. An ISO file holding a moov box (with trak (audio) and trak (video)) and …other boxes]

Figure 2 — Content Creation File

5.1.4 Preparation for streaming

When prepared for streaming, the file must contain information to direct the streaming server in the process of sending the information. In addition, it is helpful if these instructions and the media data are interleaved so that excessive seeking can be avoided when serving the presentation. It is also important that the original media data be retained unscathed, so that the files may be verified, or re‐edited or otherwise re‐used. Finally, it is helpful if a single file can be prepared for more than one protocol, so differing servers may use it over disparate protocols.

5.1.5 Local presentation

‘Locally’ viewing a presentation (i.e. directly from the file, not over a streamed interconnect) is an important application; it is used when a presentation is distributed (e.g. on CD or DVD ROM), during the process of development, and when verifying the content on streaming servers. Such local viewing must be supported, with full random access. If the presentation is on CD or DVD ROM, interleave is important as seeking may be slow.

5.1.6 Streamed presentation

When a server operates from the file to make a stream, the resulting stream must be conformant with the specifications for the protocol(s) used, and should contain no trace of the file‐format information in the file itself. The server needs to be able to random access the presentation. It can be useful to re‐use server content (e.g. to make excerpts) by referencing the same media data from multiple presentations; it can also assist streaming if the media data can be on read‐only media (e.g. CD) and not copied, merely augmented, when prepared for streaming.

The following diagram shows a presentation prepared for streaming over a multiplexing protocol; only one hint track is required.

[Figure: an ISO file containing a moov box (with trak (video), trak (audio) and trak (hint)), …other boxes, and an mdat box holding interleaved, time-ordered video and audio frames, and hint instructions]

Figure 3 — Hinted Presentation for Streaming

5.2 Design principles

The file structure is object‐oriented; a file can be decomposed into constituent objects very simply, and the structure of the objects inferred directly from their type.

Media‐data is not ‘framed’ by the file format; the file format declarations that give the size, type and position of media data units are not physically contiguous with the media data. This makes it possible to subset the media‐data, and to use it in its natural state, without requiring it to be copied to make space for framing. The metadata is used to describe the media data by reference, not by inclusion.

Similarly the protocol information for a particular streaming protocol does not frame the media data; the protocol headers are not physically contiguous with the media data. Instead, the media data can be included by reference. This makes it possible to represent media data in its natural state, not favouring any protocol. It also makes it possible for the same set of media data to serve for local presentation, and for multiple protocols.

The protocol information is built in such a way that the streaming servers need to know only about the protocol and the way it should be sent; the protocol information abstracts knowledge of the media so that the servers are, to a large extent, media‐type agnostic. Similarly the media‐data, stored as it is in a protocol‐unaware fashion, enables the media tools to be protocol‐agnostic.

The file format does not require that a single presentation be in a single file. This enables both sub‐setting and re‐use of content. When combined with the non‐framing approach, it also makes it possible to include media data in files not formatted to this specification (e.g. ‘raw’ files containing only media data and no declarative information, or file formats already in use in the media or computer industries).

The file format is based on a common set of designs and a rich set of possible structures and usages. The same format serves all usages; translation is not required. However, when used in a particular way (e.g. for local presentation), the file may need structuring in certain ways for optimal behaviour (e.g. time‐ordering of the data). No normative structuring rules are defined by this specification, unless a restricted profile is used.

6 ISO Base Media File organization

6.1 Presentation structure

6.1.1 File Structure

A presentation may be contained in several files. One file contains the metadata for the whole presentation, and is formatted to this specification. This file may also contain all the media data, whereupon the presentation is self‐contained. The other files, if used, are not required to be formatted to this specification; they are used to contain media data, and may also contain unused media data, or other information. This specification concerns the structure of the presentation file only. The format of the media‐data files is constrained by this specification only in that the media‐data in the media files must be capable of description by the metadata defined here.

These other files may be ISO files, image files, or other formats. Only the media data itself, such as JPEG 2000 images, is stored in these other files; all timing and framing (position and size) information is in the ISO base media file, so the ancillary files are essentially free‐format.

If an ISO file contains hint tracks, the media tracks that reference the media data from which the hints were built shall remain in the file, even if the data within them is not directly referenced by the hint tracks; after deleting all hint tracks, the entire un‐hinted presentation shall remain. Note that the media tracks may, however, refer to external files for their media data.

Annex A provides an informative introduction, which may be of assistance to first‐time readers.

6.1.2 Object Structure

The file is structured as a sequence of objects; some of these objects may contain other objects. The sequence of objects in the file shall contain exactly one presentation metadata wrapper (the Movie Box). It is usually close to the beginning or end of the file, to permit its easy location. The other objects found at this level may be a File‐Type box, Free Space Boxes, Movie Fragments, Meta‐data, or Media Data Boxes.

6.1.3 Meta Data and Media Data

The metadata is contained within the metadata wrapper (the Movie Box); the media data is contained either in the same file, within Media Data Box(es), or in other files. The media data is composed of images or audio data; the media data objects, or media data files, may contain other un‐referenced information.

6.1.4 Track Identifiers

The track identifiers used in an ISO file are unique within that file; no two tracks shall use the same identifier.

The next track identifier value stored in next_track_ID in the Movie Header Box generally contains a value one greater than the largest track identifier value found in the file. This enables easy generation of a track identifier under most circumstances. However, if this value is equal to all ones (32‐bit unsigned maxint), then a search for an unused track identifier is needed for all additions.
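The allocation rule for track identifiers might be sketched as follows in Python. The function name `allocate_track_id` is hypothetical, and two points are assumptions of this example rather than statements of this subclause: identifier 0 is treated as unusable, and a stored next_track_ID that is already in use falls back to a search.

```python
from typing import Iterable, Tuple

MAX_U32 = 0xFFFFFFFF  # "all ones": next_track_ID gives no usable hint


def allocate_track_id(next_track_id: int,
                      used_ids: Iterable[int]) -> Tuple[int, int]:
    """Return (new_track_id, updated_next_track_id)."""
    used = set(used_ids)
    if next_track_id not in (0, MAX_U32) and next_track_id not in used:
        new_id = next_track_id
    else:
        # Search for the smallest unused identifier instead.
        new_id = next(i for i in range(1, MAX_U32) if i not in used)
    largest = max(used | {new_id})
    # Keep the hint one greater than the largest identifier, saturating
    # at all ones once the 32-bit range is exhausted.
    return new_id, min(largest + 1, MAX_U32)
```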

6.2 Metadata Structure (Objects)

6.2.1 Box

Type fields not defined here are reserved. Private extensions shall be achieved through the ‘uuid’ type. In addition, the following types are not and will not be used, or used only in their existing sense, in future versions of this specification, to avoid conflict with existing content using earlier pre‐standard versions of this format:

clip, crgn, matt, kmat, pnot, ctab, load, imap; these track reference types (as found in the reference_type of a Track Reference Box): tmcd, chap, sync, scpt, ssrc.

A number of boxes contain index values into sequences in other boxes. These indexes start with the value 1 (1 is the first entry in the sequence).

6.2.2 Data Types and fields

In a number of boxes in this specification, there are two variant forms: version 0 using 32‐bit fields, and version 1 using 64‐bit sizes for those same fields. In general, if a version 0 box (32‐bit field sizes) can be used, it should be; version 1 boxes should be used only when the 64‐bit field sizes they permit are required. Values for counters, offsets, times, durations etc. in this format do not ‘wrap’ to 0 when the maximum value that can be stored in their field is reached; appropriately large fields must be used for all values.

For convenience during content creation there are creation and modification times stored in the file. These can be 32‐bit or 64‐bit numbers, counting seconds since midnight, Jan. 1, 1904, which is a convenient date for leap‐year calculations. 32 bits are sufficient until approximately year 2040. These times shall be expressed in Universal Time Coordinated (UTC), and therefore may need adjustment to local time if displayed.
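The creation and modification times can be converted to calendar dates as in the following Python sketch. The names `EPOCH_1904`, `from_file_time` and `to_file_time` are hypothetical; the sketch assumes timezone-aware `datetime` values and ignores leap seconds.

```python
from datetime import datetime, timedelta, timezone

# Midnight, Jan. 1, 1904, in UTC.
EPOCH_1904 = datetime(1904, 1, 1, tzinfo=timezone.utc)


def from_file_time(seconds: int) -> datetime:
    """Convert a stored creation/modification time to an aware UTC datetime."""
    return EPOCH_1904 + timedelta(seconds=seconds)


def to_file_time(moment: datetime) -> int:
    """Convert an aware datetime to whole seconds since the 1904 epoch."""
    return int((moment - EPOCH_1904).total_seconds())
```

Evaluating `from_file_time(0xFFFFFFFF)` gives a date in 2040, which is the point beyond which a version 0 (32‐bit) time field no longer suffices.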

Fixed‐point numbers are signed or unsigned values resulting from dividing an integer by an appropriate power of 2. For example, a 30.2 fixed‐point number is formed by dividing a 32‐bit integer by 4.
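A short Python sketch of the fixed‐point convention follows. The helper names are hypothetical, and the stored integer is assumed to have already been read with the correct signedness.

```python
def fixed_to_float(raw: int, frac_bits: int) -> float:
    """Interpret an integer as fixed point with frac_bits fractional bits."""
    return raw / (1 << frac_bits)


def float_to_fixed(value: float, frac_bits: int) -> int:
    """Encode a value as a fixed-point integer, rounding to the nearest step."""
    return round(value * (1 << frac_bits))
```

With two fractional bits (the 30.2 case) the stored integer 10 represents 10/4 = 2.5; with sixteen fractional bits the stored integer 0x00010000 represents 1.0.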

Fields shown as “template” in the box descriptions are optional in the specifications that use this specification. If the field is used in another specification, that use must be conformant with its definition here, and the specification must define whether the use is optional or mandatory. Similarly, fields marked “pre‐defined” were used in an earlier version of this specification. For both kinds of fields, if a field of that kind is not used in a specification, then it should be set to the indicated default value. If the field is not used it must be copied un‐inspected when boxes are copied, and ignored on reading.

Matrix values which occur in the headers specify a transformation of video images for presentation. Not all derived specifications use matrices; if they are not used, they shall be set to the identity matrix. If a matrix is used, the point (p, q) is transformed into (p', q') using the matrix as follows:

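$$
\begin{pmatrix} p & q & 1 \end{pmatrix}
\begin{pmatrix} a & b & u \\ c & d & v \\ x & y & w \end{pmatrix}
=
\begin{pmatrix} m & n & z \end{pmatrix}
$$

$$
m = ap + cq + x; \quad n = bp + dq + y; \quad z = up + vq + w
$$

$$
p' = m/z; \quad q' = n/z
$$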

The coordinates {p,q} are on the decompressed frame, and {p’, q’} are at the rendering output. Therefore, for example, the matrix {2,0,0, 0,2,0, 0,0,1} exactly doubles the pixel dimension of an image. The co‐ordinates transformed by the matrix are not normalized in any way, and represent actual sample locations. Therefore {x,y} can, for example, be considered a translation vector for the image.

The co‐ordinate origin is located at the upper left corner, and X values increase to the right, and Y values increase downwards. {p,q} and {p’,q’} are to be taken as absolute pixel locations relative to the upper left hand corner of the original image (after scaling to the size determined by the track header's width and height) and the transformed (rendering) surface, respectively.

Each track is composed using its matrix as specified into an overall image; this is then transformed and composed according to the matrix at the movie level in the MovieHeaderBox. It is application‐dependent whether the resulting image is ‘clipped’ to eliminate pixels, which have no display, to a vertical rectangular region within a window, for example. So for example, if only one video track is displayed and it has a translation to {20,30}, and a unity matrix is in the MovieHeaderBox, an application may choose not to display the empty “L” shaped region between the image and the origin.

All the values in a matrix are stored as 16.16 fixed‐point values, except for u, v and w, which are stored as 2.30 fixed‐point values.

The values in the matrix are stored in the order {a,b,u, c,d,v, x,y,w}.
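The storage order, the two fixed‐point formats and the transformation can be combined in a small Python sketch. The names `decode_matrix` and `transform` are hypothetical, and reading each stored value as a signed 32‐bit integer is an assumption of this example.

```python
import struct
from typing import Sequence, Tuple


def decode_matrix(raw: bytes) -> Tuple[float, ...]:
    """Decode 36 bytes into (a, b, u, c, d, v, x, y, w) as floats."""
    ints = struct.unpack(">9i", raw)  # assumed signed, big-endian
    # u, v, w sit at positions 2, 5, 8 and are 2.30; the rest are 16.16.
    return tuple(n / (1 << 30) if i % 3 == 2 else n / (1 << 16)
                 for i, n in enumerate(ints))


def transform(matrix: Sequence[float], p: float, q: float) -> Tuple[float, float]:
    """Map (p, q) on the decompressed frame to (p', q') at the rendering output."""
    a, b, u, c, d, v, x, y, w = matrix
    m = a * p + c * q + x
    n = b * p + d * q + y
    z = u * p + v * q + w
    return m / z, n / z
```

For the doubling matrix {2,0,0, 0,2,0, 0,0,1} this maps the point (5, 7) to (10, 14), and a matrix that differs from the identity only in x = 20, y = 30 maps the origin to (20, 30).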

6.2.3 Box Order

An overall view of the normal encapsulation structure is provided in the following informative Table 1 — Box types, structure, and cross‐reference (Informative). In the event of a conflict between this table and the prose, the prose prevails. The order of boxes within their container is not necessarily indicated in the table.

The table shows those boxes that may occur at the top‐level in the left‐most column; indentation is used to show possible containment. Thus, for example, a Track Header Box (tkhd) is found in a Track Box (trak), which is found in a Movie Box (moov). Not all boxes need to be used in all files; the mandatory boxes are marked with an asterisk (*). See the description of the individual boxes for a discussion of what must be assumed if the optional boxes are not present.

User data objects shall be placed only in Movie or Track Boxes, and objects using an extended type may be placed in a wide variety of containers, not just the top level.

In order to improve interoperability and utility of the files, the following rules and guidelines shall be followed for the order of boxes:

1) The file type box ‘ftyp’ shall occur before any variable‐length box (e.g. movie, free space, media data). Only a fixed‐size box such as a file signature, if required, may precede it.

2) It is strongly recommended that all header boxes be placed first in their container: these boxes are the Movie Header, Track Header, Media Header, and the specific media headers inside the Media Information Box (e.g. the Video Media Header).

3) Any Movie Fragment Boxes shall be in sequence order (see subclause 8.8.5).

4) It is recommended that the boxes within the Sample Table Box be in the following order: Sample Description, Time to Sample, Sample to Chunk, Sample Size, Chunk Offset.

5) It is strongly recommended that the Track Reference Box and Edit List (if any) should precede the Media Box, and the Handler Reference Box should precede the Media Information Box, and the Data Information Box should precede the Sample Table Box.

6) It is recommended that User Data Boxes be placed last in their container, which is either the Movie Box or Track Box.

7) It is recommended that the Movie Fragment Random Access Box, if present, be last in the file.

8) It is recommended that the progressive download information box be placed as early as possible in files, for maximum utility.
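Some of the top‐level rules above lend themselves to a simple check, sketched here in Python over a list of top‐level box types in file order. The function name `check_top_level_order` is hypothetical. The set of box types treated as variable‐length, and the four‐character codes ‘free’, ‘skip’ and ‘mfra’, are assumptions taken from box definitions elsewhere in this specification. Only rules 1 and 7 and the single Movie Box requirement of 6.1.2 are covered.

```python
from typing import List, Sequence

# Box types assumed to be variable-length for the purpose of rule 1.
VARIABLE_LENGTH = {"moov", "moof", "mdat", "free", "skip"}


def check_top_level_order(types: Sequence[str]) -> List[str]:
    """Return a list of human-readable findings; empty if none were found."""
    types = list(types)
    problems = []
    if "ftyp" in types:
        early = [t for t in types[:types.index("ftyp")] if t in VARIABLE_LENGTH]
        if early:
            problems.append(f"variable-length boxes before 'ftyp': {early}")
    if types.count("moov") != 1:
        problems.append("expected exactly one 'moov' box")
    if "mfra" in types and types[-1] != "mfra":
        problems.append("'mfra' is recommended to be last in the file")
    return problems
```

A missing ‘ftyp’ is deliberately not reported, since 4.3.1 allows older files to omit it.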

Table 1 — Box types, structure, and cross-reference(Informative)

Type  Mandatory  Clause    Description
ftyp  *          4.3       file type and compatibility
pdin             8.1.3     progressive download information
moov  *          8.2.1     container for all the metadata
mvhd  *          8.2.2     movie header, overall declarations
meta             8.11.1    metadata
trak  *          8.3.1     container for an individual track or stream
tkhd  *          8.3.2     track header, overall information about the track
tref             8.3.3     track reference container
trgr             8.3.4     track grouping indication
edts             8.6.4     edit list container
elst             8.6.6     an edit list
meta             8.11.1    metadata
mdia  *          8.4       container for the media information in a track
mdhd  *          8.4.2     media header, overall information about the media
hdlr  *          8.4.3     handler, declares the media (handler) type
elng             8.4.6     extended language tag
minf  *          8.4.4     media information container
vmhd             12.1.2    video media header, overall information (video track only)
smhd             12.2.2    sound media header, overall information (sound track only)
hmhd             12.4.2    hint media header, overall information (hint track only)
sthd             12.6.2    subtitle media header, overall information (subtitle track only)
nmhd             8.4.5.2   Null media header, overall information (some tracks only)
dinf  *          8.7.1     data information box, container
dref  *          8.7.2     data reference box, declares source(s) of media data in track
stbl  *          8.5.1     sample table box, container for the time/space map
stsd  *          8.5.2     sample descriptions (codec types, initialization etc.)
stts  *          8.6.1.2   (decoding) time-to-sample
ctts             8.6.1.3   (composition) time to sample
cslg             8.6.1.4   composition to decode timeline mapping
stsc  *          8.7.4     sample-to-chunk, partial data-offset information
stsz             8.7.3.2   sample sizes (framing)
stz2             8.7.3.3   compact sample sizes (framing)
stco  *          8.7.5     chunk offset, partial data-offset information
co64             8.7.5     64-bit chunk offset
stss             8.6.2     sync sample table
stsh             8.6.3     shadow sync sample table
padb             8.7.6     sample padding bits
stdp             8.7.6     sample degradation priority
sdtp             8.6.4     independent and disposable samples
sbgp             8.9.2     sample-to-group
sgpd             8.9.3     sample group description
subs             8.7.7     sub-sample information
saiz             8.7.8     sample auxiliary information sizes
saio             8.7.9     sample auxiliary information offsets
udta             8.10.1    user-data
mvex             8.8.1     movie extends box
mehd             8.8.2     movie extends header box
trex  *          8.8.3     track extends defaults
leva             8.8.13    level assignment

moof             8.8.4     movie fragment
mfhd  *          8.8.5     movie fragment header
meta             8.11.1    metadata
traf             8.8.6     track fragment
tfhd  *          8.8.7     track fragment header
trun             8.8.8     track fragment run
sbgp             8.9.2     sample-to-group
sgpd             8.9.3     sample group description
subs             8.7.7     sub-sample information
saiz             8.7.8     sample auxiliary information sizes
saio             8.7.9     sample auxiliary information offsets
tfdt             8.8.12    track fragment decode time
meta             8.11.1    metadata

mfra             8.8.9     movie fragment random access
tfra             8.8.10    track fragment random access
mfro  *          8.8.11    movie fragment random access offset

mdat             8.1.1     media data container
free             8.1.2     free space
skip             8.1.2     free space
udta             8.10.1    user-data
cprt             8.10.2    copyright etc.
tsel             8.10.3    track selection box
strk             8.14.3    sub track box
stri             8.14.4    sub track information box
strd             8.14.5    sub track definition box

meta             8.11.1    metadata
hdlr  *          8.4.3     handler, declares the metadata (handler) type
dinf             8.7.1     data information box, container
dref             8.7.2     data reference box, declares source(s) of metadata items
iloc             8.11.3    item location
ipro             8.11.5    item protection
sinf             8.12.1    protection scheme information box
frma             8.12.2    original format box
schm             8.12.5    scheme type box
schi             8.12.6    scheme information box
iinf             8.11.6    item information
xml              8.11.2    XML container
bxml             8.11.2    binary XML container
pitm             8.11.4    primary item reference
fiin             8.13.2    file delivery item information
paen             8.13.2    partition entry
fire             8.13.7    file reservoir
fpar             8.13.3    file partition
fecr             8.13.4    FEC reservoir
segr             8.13.5    file delivery session group
gitn             8.13.6    group id to name
idat             8.11.11   item data
iref             8.11.12   item reference

meco             8.11.7    additional metadata container
mere             8.11.8    metabox relation
meta             8.11.1    metadata

styp             8.16.2    segment type
sidx             8.16.3    segment index
ssix             8.16.4    subsegment index
prft             8.16.5    producer reference time

6.2.4 URIs as type indicators

When URIs are used as a type indicator (e.g. in a sample entry or for un-timed meta-data), the URI must be absolute, not relative, and the format and meaning of the data must be defined by the URI in question. This identification may be hierarchical, in that an initial sub-string of the URI might identify the overall nature or family of the data (e.g. urn:oid: identifies that the metadata is labelled by an ISO-standard object identifier).

The URI should be, but is not required to be, de-referenceable. It may be string compared by a reader with the set of URI types it knows and recognizes. URIs provide a large non-colliding non-registered space for type identifiers.

If the URI contains a domain name (e.g. it is a URL), then it should also contain a month-date in the form mmyyyy. That date must be near the time of the definition of the extension, and it must be true that the URI was defined in a way authorized by the owner of the domain name at that date. (This avoids problems when domain names change ownership.)

6.3 Brand Identification

The definitions of the brands that apply to the file format are found in Annex E.

7 Streaming Support

7.1 Handling of Streaming Protocols

The file format supports streaming of media data over a network as well as local playback. The process of sending protocol data units is time-based, just like the display of time-based data, and is therefore suitably described by a time-based format. A file or ‘movie’ that supports streaming includes information about the data units to stream. This information is included in additional tracks of the file called “hint” tracks. Hint tracks may also be used to record a stream; these are called Reception Hint Tracks, to differentiate them from plain (or server, or transmission) hint tracks.

Transmission or server hint tracks contain instructions to assist a streaming server in the formation of packets for transmission. These instructions may contain immediate data for the server to send (e.g. header information) or reference segments of the media data. These instructions are encoded in the file in the same way that editing or presentation information is encoded in a file for local playback. Instead of editing or presentation information, information is provided which allows a server to packetize the media data in a manner suitable for streaming using a specific network transport.

The same media data is used in a file that contains hints, whether it is for local playback, or streaming over a number of different protocols. Separate ‘hint’ tracks for different protocols may be included within the same file and the media will play over all such protocols without making any additional copies of the media itself. In addition, existing media can be easily made streamable by the addition of appropriate hint tracks for specific protocols. The media data itself need not be recast or reformatted in any way.

This approach to streaming and recording is more space efficient than an approach that requires that the media information be partitioned into the actual data units that will be transmitted for a given transport and media format. Under such an approach, local playback requires either re-assembling the media from the packets, or having two copies of the media, one for local playback and one for streaming. Similarly, streaming such media over multiple protocols using this approach requires multiple copies of the media data for each transport. This is inefficient with space, unless the media data has been heavily transformed for streaming (e.g. by the application of error-correcting coding techniques, or by encryption).

Reception hint tracks may be used when one or more packet streams of data are recorded. Reception hint tracks indicate the order, reception timing, and contents of the received packets among other things.

NOTE Players may reproduce the packet stream that was received based on the reception hint tracks and process the reproduced packet stream as if it was newly received.

7.2 Protocol ‘hint’ tracks

Support for streaming is based upon the following three design parameters:

The media data is represented as a set of network-independent standard tracks, which may be played, edited, and so on, as normal;

There is a common declaration and base structure for hint tracks; this common format is protocol independent, but contains the declarations of which protocol(s) are described in the hint track(s);

There is a specific design of the hint tracks for each protocol that may be transmitted; all these designs use the same basic structure. For example, there may be designs for RTP (for the Internet) and MPEG-2 transport (for broadcast), or for new standard or vendor-specific protocols.

The resulting streams, sent by the servers under the direction of the server hint tracks or reconstructed from the reception hint tracks, need contain no trace of file-specific information. This design does not require that the file structures or declaration style be used either in the data on the wire or in the decoding station. For example, a file using ITU-T H.261 video and DVI audio, streamed under RTP, results in a packet stream that is fully compliant with the IETF specifications for packing those codings into RTP.

7.3 Hint Track Format

Hint tracks are used to describe elementary stream data in the file. Each protocol or each family of related protocols has its own hint track format. A server hint track format and a reception hint track format for the same protocol are distinguishable from the associated four-character code of the sample description entry. In other words, a different four-character code is used for a server hint track and a reception hint track of the same protocol. The syntax of the server hint track format and the reception hint track format for the same protocol should be the same or compatible so that a reception hint track can be used for re-sending of the stream provided that the potential degradations of the received streams are handled appropriately. Most protocols will need only one sample description format for each track.

Servers find their hint tracks by first finding all hint tracks, and then looking within that set for server hint tracks using their protocol (sample description format). If there are choices at this point, then the server chooses on the basis of preferred protocol or by comparing features in the hint track header or other protocol-specific information in the sample descriptions. Particularly in the absence of server hint tracks, servers may also use reception hint tracks of their protocol. However, servers should handle potential degradations of the received stream described by the used reception hint track appropriately.

Tracks having the track_in_movie flag set are candidates for playback, regardless of whether they are media tracks or reception hint tracks.

Hint tracks construct streams by pulling data out of other tracks by reference. These other tracks may be hint tracks or elementary stream tracks. The exact form of these pointers is defined by the sample format for the protocol, but in general they consist of four pieces of information: a track reference index, a sample number, an offset, and a length. Some of these may be implicit for a particular protocol. These ‘pointers’ always point to the actual source of the data. If a hint track is built ‘on top’ of another hint track, then the second hint track must have direct references to the media track(s) used by the first where data from those media tracks is placed in the stream.

All hint tracks use a common set of declarations and structures.

Hint tracks are linked to the elementary stream tracks they carry, by track references of type ‘hint’.

They use a handler-type of ‘hint’ in the Handler Reference Box.

They use a Hint Media Header Box.

They use a hint sample entry in the sample description, with a name and format unique to the protocol they represent.

Server hint tracks are usually marked as disabled for local playback, with their track header track_in_movie and track_in_preview flags set to 0.

Hint tracks may be created by an authoring tool, or may be added to an existing presentation by a hinting tool. Such a tool serves as a ‘bridge’ between the media and the protocol, since it intimately understands both. This permits authoring tools to understand the media format, but not protocols, and for servers to understand protocols (and their hint tracks) but not the details of media data.

Hint tracks do not use separate composition times; the ‘ctts’ table is not present in hint tracks. The process of hinting computes transmission times correctly as the decoding time.

NOTE 1: Servers using reception hint tracks as hints for sending of the received streams should handle the potential degradations of the received streams, such as transmission delay jitter and packet losses, gracefully and ensure that the constraints of the protocols and contained data formats are obeyed regardless of the potential degradations of the received streams.

NOTE 2: Conversion of received streams to media tracks allows existing players compliant with earlier versions of the ISO base media file format to process recorded files as long as the media formats are supported. However, most media coding standards only specify the decoding of error-free streams, and consequently it should be ensured that the content in media tracks can be correctly decoded. Players may utilize reception hint tracks for handling of degradations caused by the transmission, i.e., content that may not be correctly decoded is located only within reception hint tracks. The need for having a duplicate of the correct media samples in both a media track and a reception hint track can be avoided by including data from the media track by reference into the reception hint track.

8 Box Structures

8.1 File Structure and general boxes

8.1.1 Media Data Box

8.1.1.1 Definition

Box Type:   ‘mdat’
Container:  File
Mandatory:  No
Quantity:   Zero or more

This box contains the media data. In video tracks, this box would contain video frames. A presentation may contain zero or more Media Data Boxes. The actual media data follows the type field; its structure is described by the metadata (see particularly the sample table, subclause 8.5, and the item location box, subclause 8.11.3).

In large presentations, it may be desirable to have more data in this box than a 32-bit size would permit. In this case, the large variant of the size field, above in subclause 4.2, is used.

There may be any number of these boxes in the file (including zero, if all the media data is in other files). The metadata refers to media data by its absolute offset within the file (see subclause 8.7.5, the Chunk Offset Box); so Media Data Box headers and free space may easily be skipped, and files without any box structure may also be referenced and used.

8.1.1.2 Syntax

aligned(8) class MediaDataBox extends Box(‘mdat’) {
   bit(8) data[];
}

8.1.1.3 Semantics

data is the contained media data
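The following is a minimal, non-normative sketch of how a reader might locate Media Data Box payloads, including boxes that use the large size variant. It assumes the box header layout of subclause 4.2 (a 32-bit size followed by a four-character type, with size 1 signalling a following 64-bit size and size 0 meaning the box runs to the end of the file). The function names `read_box_header` and `mdat_payload_ranges` are illustrative and not part of this specification.

```python
import io
import struct


def read_box_header(stream):
    """Return (box_type, header_size, box_size), or None at end of stream.

    box_size is None when the size field is 0 (box extends to end of file).
    """
    start = stream.read(8)
    if len(start) < 8:
        return None
    size, box_type = struct.unpack(">I4s", start)
    header_size = 8
    if size == 1:
        # Large variant: a 64-bit size follows the type field.
        (size,) = struct.unpack(">Q", stream.read(8))
        header_size = 16
    elif size == 0:
        size = None
    return box_type.decode("latin-1"), header_size, size


def mdat_payload_ranges(data):
    """Return (offset, length) for each top-level 'mdat' payload in a bytes object."""
    stream = io.BytesIO(data)
    ranges = []
    pos = 0
    while True:
        stream.seek(pos)
        header = read_box_header(stream)
        if header is None:
            break
        box_type, header_size, size = header
        if size is None:
            size = len(data) - pos
        if size < header_size:
            raise ValueError("box size smaller than its header")
        if box_type == "mdat":
            ranges.append((pos + header_size, size - header_size))
        pos += size  # skip to the next top-level box
    return ranges
```

Because the metadata addresses media data by absolute file offset, a reader normally does not need these ranges to play a file; the sketch only shows how the headers can be stepped over.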

8.1.2 Free Space Box

8.1.2.1 Definition

Box Types:  ‘free’, ‘skip’
Container:  File or other box
Mandatory:  No
Quantity:   Zero or more

The contents of a free-space box are irrelevant and may be ignored, or the object deleted, without affecting the presentation. (Care should be exercised when deleting the object, as this may invalidate the offsets used in the sample table, unless this object is after all the media data.)

8.1.2.2 Syntax

aligned(8) class FreeSpaceBox extends Box(free_type) {
   unsigned int(8) data[];
}

8.1.2.3 Semantics

free_type may be ‘free’ or ‘skip’.

8.1.3 Progressive Download Information Box

8.1.3.1 Definition

Box Types:  ‘pdin’
Container:  File
Mandatory:  No
Quantity:   Zero or One

The Progressive download information box aids the progressive download of an ISO file. The box contains pairs of numbers (to the end of the box) specifying combinations of effective file download bitrate in units of bytes/sec and a suggested initial playback delay in units of milliseconds.

A receiving party can estimate the download rate it is experiencing, and from that obtain an upper estimate for a suitable initial delay by linear interpolation between pairs, or by extrapolation from the first or last entry.

It is recommended that the progressive download information box be placed as early as possible in files, for maximum utility.

8.1.3.2 Syntax

aligned(8) class ProgressiveDownloadInfoBox
      extends FullBox(‘pdin’, version = 0, 0) {
   for (i=0; ; i++) {   // to end of box
      unsigned int(32) rate;
      unsigned int(32) initial_delay;
   }
}

8.1.3.3 Semantics

rate is a download rate expressed in bytes/second

initial_delay is the suggested delay to use when playing the file, such that if download continues at the given rate, all data within the file will arrive in time for its use and playback should not need to stall.
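The interpolation described in 8.1.3.1 can be sketched as follows. This is an illustration only and rests on several assumptions that the text does not state: the pairs are sorted by rate before use, rates are distinct, extrapolation continues the line through the two outermost entries on the relevant side, and negative results are clamped to zero. The names `parse_pdin_pairs` and `suggested_initial_delay` are hypothetical.

```python
import struct


def parse_pdin_pairs(body):
    """Split the bytes after the FullBox version/flags field into (rate, initial_delay) pairs."""
    if len(body) % 8 != 0:
        raise ValueError("pdin body must be a whole number of 8-byte pairs")
    return [struct.unpack_from(">II", body, pos) for pos in range(0, len(body), 8)]


def suggested_initial_delay(pairs, measured_rate):
    """Estimate an initial delay (milliseconds) for a measured rate (bytes/second)."""
    points = sorted(pairs)
    if not points:
        raise ValueError("no rate/delay pairs")
    if len(points) == 1:
        return float(points[0][1])
    # Pick the bracketing segment, or the outermost segment when extrapolating.
    low, high = points[0], points[1]
    for left, right in zip(points, points[1:]):
        low, high = left, right
        if measured_rate <= right[0]:
            break
    (r0, d0), (r1, d1) = low, high
    delay = d0 + (d1 - d0) * (measured_rate - r0) / (r1 - r0)
    return max(0.0, delay)  # assumption: a negative delay is treated as zero
```

For example, with entries (100000 bytes/s, 4000 ms) and (200000 bytes/s, 2000 ms), a measured rate of 150000 bytes/s falls halfway between the two entries and yields a suggested delay of 3000 ms.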

8.2 Movie Structure

8.2.1 Movie Box

8.2.1.1 Definition

Box Type:   ‘moov’
Container:  File
Mandatory:  Yes
Quantity:   Exactly one

The metadata for a presentation is stored in the single Movie Box which occurs at the top-level of a file. Normally this box is close to the beginning or end of the file, though this is not required.

8.2.1.2 Syntax

aligned(8) class MovieBox extends Box(‘moov’){
}

8.2.2 Movie Header Box

8.2.2.1 Definition

Box Type:   ‘mvhd’
Container:  Movie Box (‘moov’)
Mandatory:  Yes
Quantity:   Exactly one

This box defines overall information which is media-independent, and relevant to the entire presentation considered as a whole.

8.2.2.2 Syntax

aligned(8) class MovieHeaderBox extends FullBox(‘mvhd’, version, 0) {
   if (version==1) {
      unsigned int(64)  creation_time;
      unsigned int(64)  modification_time;
      unsigned int(32)  timescale;
      unsigned int(64)  duration;
   } else { // version==0
      unsigned int(32)  creation_time;
      unsigned int(32)  modification_time;
      unsigned int(32)  timescale;
      unsigned int(32)  duration;
   }
   template int(32) rate = 0x00010000; // typically 1.0
   template int(16) volume = 0x0100; // typically, full volume
   const bit(16) reserved = 0;
   const unsigned int(32)[2] reserved = 0;
   template int(32)[9] matrix =
      { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
      // Unity matrix
   bit(32)[6] pre_defined = 0;
   unsigned int(32) next_track_ID;
}

8.2.2.3 Semantics

version is an integer that specifies the version of this box (0 or 1 in this specification)

creation_time is an integer that declares the creation time of the presentation (in seconds since midnight, Jan. 1, 1904, in UTC time)

modification_time is an integer that declares the most recent time the presentation was modified (in seconds since midnight, Jan. 1, 1904, in UTC time)

timescale is an integer that specifies the time-scale for the entire presentation; this is the number of time units that pass in one second. For example, a time coordinate system that measures time in sixtieths of a second has a time scale of 60.

duration is an integer that declares the length of the presentation (in the indicated timescale). This property is derived from the presentation’s tracks: the value of this field corresponds to the duration of the longest track in the presentation. If the duration cannot be determined then duration is set to all 1s.

rate is a fixed point 16.16 number that indicates the preferred rate to play the presentation; 1.0 (0x00010000) is normal forward playback

volume is a fixed point 8.8 number that indicates the preferred playback volume. 1.0 (0x0100) is full volume.

matrix provides a transformation matrix for the video; (u,v,w) are restricted here to (0,0,1), hex values (0,0,0x40000000).

next_track_ID is a non-zero integer that indicates a value to use for the track ID of the next track to be added to this presentation. Zero is not a valid track ID value. The value of next_track_ID shall be larger than the largest track-ID in use. If this value is equal to all 1s (32-bit maxint), and a new media track is to be added, then a search must be made in the file for an unused track identifier.
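As an illustration of the field layout and the fixed-point conventions above, the sketch below decodes a Movie Header Box payload. It is not a normative parser: it assumes the input is the box content after the size and type fields (so it begins with the FullBox version and flags), and the function name `parse_mvhd` and the returned dictionary keys are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# Epoch used by creation_time and modification_time.
EPOCH_1904 = datetime(1904, 1, 1, tzinfo=timezone.utc)


def parse_mvhd(payload):
    """Decode a MovieHeaderBox payload (bytes after the size and type fields)."""
    version = payload[0]  # FullBox: 8-bit version, then 24-bit flags
    if version == 1:
        times_format, unknown_duration = ">QQIQ", 0xFFFFFFFFFFFFFFFF
    elif version == 0:
        times_format, unknown_duration = ">IIII", 0xFFFFFFFF
    else:
        raise ValueError("unsupported mvhd version %d" % version)

    offset = 4
    creation, modification, timescale, duration = struct.unpack_from(
        times_format, payload, offset)
    offset += struct.calcsize(times_format)

    rate, volume = struct.unpack_from(">ih", payload, offset)
    offset += 6 + 2 + 8            # rate and volume, then 10 reserved bytes
    matrix = struct.unpack_from(">9i", payload, offset)
    offset += 36 + 24              # matrix, then pre_defined
    (next_track_id,) = struct.unpack_from(">I", payload, offset)

    if duration == unknown_duration or timescale == 0:
        seconds = None             # duration could not be determined
    else:
        seconds = duration / timescale
    return {
        "creation_time": EPOCH_1904 + timedelta(seconds=creation),
        "modification_time": EPOCH_1904 + timedelta(seconds=modification),
        "timescale": timescale,
        "duration_seconds": seconds,
        "rate": rate / 65536,      # 16.16 fixed point
        "volume": volume / 256,    # 8.8 fixed point
        "matrix": matrix,
        "next_track_id": next_track_id,
    }
```

Very large 64-bit time values can exceed the range of Python's datetime type; a production reader would need to guard against that.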

8.3 Track Structure

8.3.1 Track Box

8.3.1.1 Definition

Box Type:   ‘trak’
Container:  Movie Box (‘moov’)
Mandatory:  Yes
Quantity:   One or more

This is a container box for a single track of a presentation. A presentation consists of one or more tracks. Each track is independent of the other tracks in the presentation and carries its own temporal and spatial information. Each track will contain its associated Media Box.

Tracks are used for two purposes: (a) to contain media data (media tracks) and (b) to contain packetization information for streaming protocols (hint tracks).

There shall be at least one media track within an ISO file, and all the media tracks that contributed to the hint tracks shall remain in the file, even if the media data within them is not referenced by the hint tracks; after deleting all hint tracks, the entire un-hinted presentation shall remain.

8.3.1.2 Syntax

aligned(8) class TrackBox extends Box(‘trak’) {
}

8.3.2 Track Header Box

8.3.2.1 Definition

Box Type:   ‘tkhd’
Container:  Track Box (‘trak’)
Mandatory:  Yes
Quantity:   Exactly one

This box specifies the characteristics of a single track. Exactly one Track Header Box is contained in a track.

In the absence of an edit list, the presentation of a track starts at the beginning of the overall presentation. An empty edit is used to offset the start time of a track.

The default value of the track header flags for media tracks is 7 (track_enabled, track_in_movie, track_in_preview). If in a presentation all tracks have neither track_in_movie nor track_in_preview set, then all tracks shall be treated as if both flags were set on all tracks. Server hint tracks should have the track_in_movie and track_in_preview set to 0, so that they are ignored for local playback and preview.

Under the ‘iso3’ brand or brands that share its requirements, the width and height in the track header are measured on a notional 'square' (uniform) grid. Track video data is normalized to these dimensions (logically) before any transformation or placement caused by a layup or composition system. Track (and movie) matrices, if used, also operate in this uniformly-scaled space.

The duration field here does not include the duration of following movie fragments, if any, but only of the media in the enclosing Movie Box. The Movie Extends Header box may be used to document the duration including movie fragments, when desired and possible.

8.3.2.2 Syntax

aligned(8) class TrackHeaderBox
   extends FullBox(‘tkhd’, version, flags){
   if (version==1) {
      unsigned int(64)  creation_time;
      unsigned int(64)  modification_time;
      unsigned int(32)  track_ID;
      const unsigned int(32)  reserved = 0;
      unsigned int(64)  duration;
   } else { // version==0
      unsigned int(32)  creation_time;
      unsigned int(32)  modification_time;
      unsigned int(32)  track_ID;
      const unsigned int(32)  reserved = 0;
      unsigned int(32)  duration;
   }
   const unsigned int(32)[2] reserved = 0;
   template int(16) layer = 0;
   template int(16) alternate_group = 0;
   template int(16) volume = {if track_is_audio 0x0100 else 0};
   const unsigned int(16) reserved = 0;
   template int(32)[9] matrix=
      { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
      // unity matrix
   unsigned int(32) width;
   unsigned int(32) height;
}

8.3.2.3 Semantics

version is an integer that specifies the version of this box (0 or 1 in this specification)

flags is a 24-bit integer with flags; the following values are defined:

   Track_enabled: Indicates that the track is enabled. Flag value is 0x000001. A disabled track (the low bit is zero) is treated as if it were not present.

   Track_in_movie: Indicates that the track is used in the presentation. Flag value is 0x000002.

   Track_in_preview: Indicates that the track is used when previewing the presentation. Flag value is 0x000004.

   Track_size_is_aspect_ratio: Indicates that the width and height fields are not expressed in pixel units. The values have the same units but these units are not specified. The values are only an indication of the desired aspect ratio. If the aspect ratios of this track and other related tracks are not identical, then the respective positioning of the tracks is undefined, possibly defined by external contexts. Flag value is 0x000008.

creation_time is an integer that declares the creation time of this track (in seconds since midnight, Jan. 1, 1904, in UTC time).

modification_time is an integer that declares the most recent time the track was modified (in seconds since midnight, Jan. 1, 1904, in UTC time).

track_ID is an integer that uniquely identifies this track over the entire life-time of this presentation. Track IDs are never re-used and cannot be zero.

duration is an integer that indicates the duration of this track (in the timescale indicated in the Movie Header Box). The value of this field is equal to the sum of the durations of all of the track’s edits. If there is no edit list, then the duration is the sum of the sample durations, converted into the timescale in the Movie Header Box. If the duration of this track cannot be determined then duration is set to all 1s.

layer specifies the front-to-back ordering of video tracks; tracks with lower numbers are closer to the viewer. 0 is the normal value, and -1 would be in front of track 0, and so on.

alternate_group is an integer that specifies a group or collection of tracks. If this field is 0 there is no information on possible relations to other tracks. If this field is not 0, it should be the same for tracks that contain alternate data for one another and different for tracks belonging to different such groups. Only one track within an alternate group should be played or streamed at any one time, and must be distinguishable from other tracks in the group via attributes such as bitrate, codec, language, packet size etc. A group may have only one member.

volume is a fixed 8.8 value specifying the track's relative audio volume. Full volume is 1.0 (0x0100) and is the normal value. Its value is irrelevant for a purely visual track. Tracks may be composed by combining them according to their volume, and then using the overall Movie Header Box volume setting; or more complex audio composition (e.g. MPEG-4 BIFS) may be used.

matrix provides a transformation matrix for the video; (u,v,w) are restricted here to (0,0,1), hex (0,0,0x40000000).

width and height fixed-point 16.16 values are track-dependent as follows:

For text and subtitle tracks, they may, depending on the coding format, describe the suggested size of the rendering area. For such tracks, the value 0x0 may also be used to indicate that the data may be rendered at any size, that no preferred size has been indicated and that the actual size may be determined by the external context or by reusing the width and height of another track. For those tracks, the flag track_size_is_aspect_ratio may also be used.

For non-visual tracks (e.g. audio), they should be set to zero.

For all other tracks, they specify the track's visual presentation size. These need not be the same as the pixel dimensions of the images, which are documented in the sample description(s); all images in the sequence are scaled to this size, before any overall transformation of the track represented by the matrix. The pixel dimensions of the images are the default values.
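The flag values and the 16.16 width and height fields can be illustrated with the short sketch below. It also applies the rule from 8.3.2.1 that, when no track in a presentation has track_in_movie or track_in_preview set, all tracks are treated as if both flags were set. The constant and function names are illustrative only and are not defined by this specification.

```python
TRACK_ENABLED = 0x000001
TRACK_IN_MOVIE = 0x000002
TRACK_IN_PREVIEW = 0x000004
TRACK_SIZE_IS_ASPECT_RATIO = 0x000008


def describe_flags(flags):
    """Return the names of the track header flags set in a 24-bit flags value."""
    names = [
        (TRACK_ENABLED, "track_enabled"),
        (TRACK_IN_MOVIE, "track_in_movie"),
        (TRACK_IN_PREVIEW, "track_in_preview"),
        (TRACK_SIZE_IS_ASPECT_RATIO, "track_size_is_aspect_ratio"),
    ]
    return [name for bit, name in names if flags & bit]


def effective_track_flags(all_flags):
    """Apply the 'no track has either flag set' rule to a presentation's tracks."""
    mask = TRACK_IN_MOVIE | TRACK_IN_PREVIEW
    if all((flags & mask) == 0 for flags in all_flags):
        return [flags | mask for flags in all_flags]
    return list(all_flags)


def presentation_size(width, height):
    """Convert the 16.16 fixed-point width and height fields to floats."""
    return width / 65536, height / 65536
```

For instance, a width field of 0x02800000 and a height field of 0x01E00000 correspond to a presentation size of 640 by 480.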

8.3.3 Track Reference Box

8.3.3.1 Definition

Box Type:   ‘tref’
Container:  Track Box (‘trak’)
Mandatory:  No
Quantity:   Zero or one

This box provides a reference from the containing track to another track in the presentation. These references are typed. A ‘hint’ reference links from the containing hint track to the media data that it hints. A content description reference ‘cdsc’ links a descriptive or metadata track to the content which it describes. The ‘hind’ dependency indicates that the referenced track(s) may contain media data required for decoding of the track containing the track reference. The referenced tracks shall be hint tracks. The ‘hind’ dependency can, for example, be used for indicating the dependencies between hint tracks documenting layered IP multicast over RTP.

At most one Track Reference Box can be contained within the Track Box.

If this box is not present, the track is not referencing any other track in any way. The reference array is sized to fill the reference type box.

8.3.3.2 Syntax

aligned(8) class TrackReferenceBox extends Box(‘tref’) {
}

aligned(8) class TrackReferenceTypeBox (unsigned int(32) reference_type)
   extends Box(reference_type) {
   unsigned int(32) track_IDs[];
}

8.3.3.3 Semantics

The Track Reference Box contains track reference type boxes.

track_ID is an integer that provides a reference from the containing track to another track in the presentation. track_IDs are never re-used and cannot be equal to zero.

The reference_type shall be set to one of the following values, or a value registered or from a derived specification or registration:

‘hint’ the referenced track(s) contain the original media for this hint track.

‘cdsc’ this track describes the referenced track.

‘font’ this track uses fonts carried/defined in the referenced track.

‘hind’ this track depends on the referenced hint track, i.e., it should only be used if the referenced hint track is used.

‘vdep’ this track contains auxiliary depth video information for the referenced video track.

‘vplx’ this track contains auxiliary parallax video information for the referenced video track.

‘subt’ this track contains subtitle, timed text or overlay graphical information for the referenced track or any track in the alternate group to which the track belongs, if any.
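A reader might collect the typed references of a track roughly as sketched below. The sketch assumes the input is the content of the Track Reference Box (the bytes after its own size and type fields) and, for brevity, that the contained boxes use plain 32-bit sizes. The function name `parse_tref` is hypothetical.

```python
import struct


def parse_tref(payload):
    """Map each reference_type found in a 'tref' payload to its list of track IDs."""
    references = {}
    pos = 0
    while pos < len(payload):
        if len(payload) - pos < 8:
            raise ValueError("truncated box header")
        size, reference_type = struct.unpack_from(">I4s", payload, pos)
        # Assumption: no 64-bit (size == 1) or to-end (size == 0) boxes here.
        if size < 8 or pos + size > len(payload) or (size - 8) % 4 != 0:
            raise ValueError("malformed track reference type box")
        count = (size - 8) // 4    # the array fills the rest of the box
        track_ids = struct.unpack_from(">%dI" % count, payload, pos + 8)
        if 0 in track_ids:
            raise ValueError("track_ID cannot be zero")
        references.setdefault(reference_type.decode("latin-1"), []).extend(track_ids)
        pos += size
    return references
```

The number of entries is never stored explicitly; it is derived from the box size, which is what "sized to fill the reference type box" implies.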

8.3.4 Track Group Box

8.3.4.1 Definition

Box Type:   ‘trgr’
Container:  Track Box (‘trak’)
Mandatory:  No
Quantity:   Zero or one

This box enables indication of groups of tracks, where each group shares a particular characteristic or the tracks within a group have a particular relationship. The box contains zero or more boxes, and the particular characteristic or the relationship is indicated by the box type of the contained boxes. The contained boxes include an identifier, which can be used to conclude the tracks belonging to the same track group. The tracks that contain the same type of a contained box within the Track Group Box and have the same identifier value within these contained boxes belong to the same track group.

Track groups shall not be used to indicate dependency relationships between tracks. Instead, the Track Reference Box is used for such purposes.

8.3.4.2 Syntax

aligned(8) class TrackGroupBox extends Box('trgr') {
}

aligned(8) class TrackGroupTypeBox(unsigned int(32) track_group_type)
   extends FullBox(track_group_type, version = 0, flags = 0) {
   unsigned int(32) track_group_id;
   // the remaining data may be specified for a particular track_group_type
}

8.3.4.3 Semantics

track_group_type indicates the grouping type and shall be set to one of the following values, or a value registered, or a value from a derived specification or registration:

'msrc' indicates that this track belongs to a multi-source presentation. The tracks that have the same value of track_group_id within a Group Type Box of track_group_type 'msrc' are mapped as being originated from the same source. For example, a recording of a video telephony call may have both audio and video for both participants, and the value of track_group_id associated with the audio track and the video track of one participant differs from the value of track_group_id associated with the tracks of the other participant.

The pair of track_group_id and track_group_type identifies a track group within the file. The tracks that contain a particular track group type box having the same value of track_group_id belong to the same track group.
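The grouping rule can be sketched as a simple keyed collection. The example assumes the track group type boxes have already been parsed into (track_group_type, track_group_id) pairs per track; the input shape and the name `build_track_groups` are illustrative, not part of this specification.

```python
from collections import defaultdict


def build_track_groups(entries_by_track):
    """Group track IDs by their (track_group_type, track_group_id) pair.

    entries_by_track maps a track_ID to the list of (type, id) pairs found
    in that track's Track Group Box.
    """
    groups = defaultdict(list)
    for track_id, entries in entries_by_track.items():
        for group_type, group_id in entries:
            groups[(group_type, group_id)].append(track_id)
    return dict(groups)


# The video telephony example: tracks 1 and 2 come from one participant,
# tracks 3 and 4 from the other.
call = {
    1: [("msrc", 10)],
    2: [("msrc", 10)],
    3: [("msrc", 20)],
    4: [("msrc", 20)],
}
telephony_groups = build_track_groups(call)
```

Keying on the pair rather than on track_group_id alone reflects the statement that the identifier is only meaningful together with its track_group_type.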

8.4 Track Media Structure

8.4.1 Media Box

8.4.1.1 Definition

Box Type:   ‘mdia’
Container:  Track Box (‘trak’)
Mandatory:  Yes
Quantity:   Exactly one

The media declaration container contains all the objects that declare information about the media data within a track.

8.4.1.2 Syntax

aligned(8) class MediaBox extends Box(‘mdia’) {
}

8.4.2 Media Header Box

8.4.2.1 Definition

Box Type: ‘mdhd’
Container: Media Box (‘mdia’)
Mandatory: Yes
Quantity: Exactly one

The media header declares overall information that is media-independent, and relevant to characteristics of the media in a track.

8.4.2.2 Syntax

aligned(8) class MediaHeaderBox extends FullBox(‘mdhd’, version, 0) {
   if (version==1) {
      unsigned int(64) creation_time;
      unsigned int(64) modification_time;
      unsigned int(32) timescale;
      unsigned int(64) duration;
   } else { // version==0
      unsigned int(32) creation_time;
      unsigned int(32) modification_time;
      unsigned int(32) timescale;
      unsigned int(32) duration;
   }
   bit(1) pad = 0;
   unsigned int(5)[3] language; // ISO-639-2/T language code
   unsigned int(16) pre_defined = 0;
}

8.4.2.3 Semantics

version is an integer that specifies the version of this box (0 or 1)

creation_time is an integer that declares the creation time of the media in this track (in seconds since midnight, Jan. 1, 1904, in UTC time).

modification_time is an integer that declares the most recent time the media in this track was modified (in seconds since midnight, Jan. 1, 1904, in UTC time).

timescale is an integer that specifies the time-scale for this media; this is the number of time units that pass in one second. For example, a time coordinate system that measures time in sixtieths of a second has a time scale of 60.

duration is an integer that declares the duration of this media (in the scale of the timescale). If the duration cannot be determined then duration is set to all 1s.

language declares the language code for this media. See ISO 639-2/T for the set of three character codes. Each character is packed as the difference between its ASCII value and 0x60. Since the code is confined to being three lower-case letters, these values are strictly positive.
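A minimal sketch of reading these fields follows, assuming the caller has already stripped the box header (size and type) and passes only the bytes from the FullBox version field onwards. The names parse_mdhd_payload, pack_language and unpack_language are hypothetical helpers introduced for illustration; the sketch does not treat the "all 1s" duration specially.

```python
import struct
from datetime import datetime, timedelta, timezone

# Times in this box count seconds from midnight, Jan. 1, 1904, UTC.
MP4_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def pack_language(code):
    """Pack a three-letter lower-case ISO 639-2/T code into 15 bits."""
    if len(code) != 3 or not all("a" <= c <= "z" for c in code):
        raise ValueError("expected three lower-case ASCII letters")
    a, b, c = (ord(ch) - 0x60 for ch in code)
    return (a << 10) | (b << 5) | c


def unpack_language(packed):
    """Inverse of pack_language; the top (pad) bit is ignored."""
    return "".join(chr(((packed >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))


def parse_mdhd_payload(payload):
    """Parse the bytes that follow the size and type of an 'mdhd' box."""
    version = payload[0]  # FullBox: 8-bit version, then 24-bit flags
    if version == 1:
        fmt = ">QQIQ"  # 64-bit times and duration
    elif version == 0:
        fmt = ">IIII"  # 32-bit times and duration
    else:
        raise ValueError("unsupported mdhd version")
    creation, modification, timescale, duration = struct.unpack_from(fmt, payload, 4)
    offset = 4 + struct.calcsize(fmt)
    packed_lang, _pre_defined = struct.unpack_from(">HH", payload, offset)
    return {
        "version": version,
        "creation_time": MP4_EPOCH + timedelta(seconds=creation),
        "modification_time": MP4_EPOCH + timedelta(seconds=modification),
        "timescale": timescale,
        "duration": duration,
        "language": unpack_language(packed_lang),
    }
```

With this packing, "eng" becomes 0x15C7 and "und" becomes 0x55C4.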

8.4.3 Handler Reference Box

8.4.3.1 Definition

Box Type: ‘hdlr’
Container: Media Box (‘mdia’) or Meta Box (‘meta’)
Mandatory: Yes
Quantity: Exactly one

This box within a Media Box declares media type of the track, and thus the process by which the media-data in the track is presented. For example, a format for which the decoder delivers video would be stored in a video track, identified by being handled by a video handler. The documentation of the storage of a media format identifies the media type which that format uses.

This box when present within a Meta Box, declares the structure or format of the 'meta' box contents.

There is a general handler for metadata streams of any type; the specific format is identified by the sample entry, as for video or audio, for example.

8.4.3.2 Syntax

aligned(8) class HandlerBox extends FullBox(‘hdlr’, version = 0, 0) {
   unsigned int(32) pre_defined = 0;
   unsigned int(32) handler_type;
   const unsigned int(32)[3] reserved = 0;
   string name;
}

8.4.3.3 Semantics

version is an integer that specifies the version of this box

handler_type
   - when present in a media box, contains a value as defined in clause 12, or a value from a derived specification, or registration.
   - when present in a meta box, contains an appropriate value to indicate the format of the meta box contents. The value ‘null’ can be used in the primary meta box to indicate that it is merely being used to hold resources.

name is a null-terminated string in UTF-8 characters which gives a human-readable name for the track type (for debugging and inspection purposes).
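As an illustration of the layout above, the following sketch reads handler_type and name from the bytes that follow the box size and type. The function name parse_hdlr_payload is hypothetical, and the lenient handling of a missing terminator or invalid UTF-8 is a design choice of this sketch rather than a requirement of the format.

```python
import struct


def parse_hdlr_payload(payload):
    """Return (handler_type, name) from the bytes after the 'hdlr' size and type."""
    version = payload[0]  # FullBox: 8-bit version, then 24-bit flags
    if version != 0:
        raise ValueError("unsupported hdlr version")
    _pre_defined, handler_type = struct.unpack_from(">I4s", payload, 4)
    # The name starts after version/flags (4), pre_defined (4),
    # handler_type (4) and the three reserved 32-bit words (12).
    name_bytes = payload[24:].split(b"\x00", 1)[0]
    return (
        handler_type.decode("ascii", errors="replace"),
        name_bytes.decode("utf-8", errors="replace"),
    )


example = bytes(4) + struct.pack(">I4s12x", 0, b"vide") + b"VideoHandler\x00"
handler_type, name = parse_hdlr_payload(example)
```

Here the example payload carries the handler type 'vide' and a name chosen purely for illustration.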

8.4.4 Media Information Box

8.4.4.1 Definition

Box Type: ‘minf’
Container: Media Box (‘mdia’)
Mandatory: Yes
Quantity: Exactly one

This box contains all the objects that declare characteristic information of the media in the track.

8.4.4.2 Syntax

aligned(8) class MediaInformationBox extends Box(‘minf’) { }

8.4.5 Media Information Header Boxes

8.4.5.1 Definition

There is a different media information header for each track type (corresponding to the media handler-type); the matching header shall be present, which may be one of those defined in clause 12, or one defined in a derived specification.

The type of media header used is determined by the definition of the media type and must match the media handler.

8.4.5.2 Null Media Header Box

8.4.5.2.1 Definition

Box Types: ‘nmhd’
Container: Media Information Box (‘minf’)
Mandatory: Yes
Quantity: Exactly one specific media header shall be present

Streams for which no specific media header is identified use a null Media Header Box, as defined here.

8.4.5.2.2 Syntax

aligned(8) class NullMediaHeaderBox
   extends FullBox(‘nmhd’, version = 0, flags) {
}

8.4.5.2.3 Semantics

version - is an integer that specifies the version of this box.

flags - is a 24-bit integer with flags (currently all zero).

8.4.6 Extended language tag

8.4.6.1 Definition

Box Type: ‘elng’
Container: Media Box (‘mdia’)
Mandatory: No
Quantity: Zero or one

The extended language tag box represents media language information, based on the RFC 4646 (Best Current Practice, BCP 47) industry standard. It is an optional peer of the media header box, and must occur after the media header box.

The extended language tag can provide better language information than the language field in the Media Header, including information such as region, script, variation, and so on, as parts (or subtags).

The extended language tag box is optional, and if it is absent the media language should be used. The extended language tag overrides the media language if they are not consistent.

For best compatibility with earlier players, if an extended language tag is specified, the most compatible language code should be specified in the language field of the Media Header box (for example, "eng" if the extended language tag is "en-UK"). If there is no reasonably compatible tag, the packed form of 'und' can be used.

8.4.6.2 Syntax

aligned(8) class ExtendedLanguageBox extends FullBox(‘elng’, 0, 0) {
   string extended_language;
}

8.4.6.3 Semantics

extended_language is a NULL-terminated C string containing an RFC 4646 (BCP 47) compliant language tag string, such as "en-US", "fr-FR", or "zh-CN".

8.5 Sample Tables

8.5.1 Sample Table Box

8.5.1.1 Definition

Box Type: ‘stbl’
Container: Media Information Box (‘minf’)
Mandatory: Yes
Quantity: Exactly one

The sample table contains all the time and data indexing of the media samples in a track. Using the tables here, it is possible to locate samples in time, determine their type (e.g. I-frame or not), and determine their size, container, and offset into that container.

If the track that contains the Sample Table Box references no data, then the Sample Table Box does not need to contain any sub-boxes (this is not a very useful media track).

If the track that the Sample Table Box is contained in does reference data, then the following sub-boxes are required: Sample Description, Sample Size, Sample To Chunk, and Chunk Offset. Further, the Sample Description Box shall contain at least one entry. A Sample Description Box is required because it contains the data reference index field which indicates which Data Reference Box to use to retrieve the media samples. Without the Sample Description, it is not possible to determine where the media samples are stored. The Sync Sample Box is optional. If the Sync Sample Box is not present, all samples are sync samples.

A.7 provides a narrative description of random access using the structures defined in the Sample Table Box.

8.5.1.2 Syntax

aligned(8) class SampleTableBox extends Box(‘stbl’) { }

8.5.2 Sample Description Box

8.5.2.1 Definition

Box Types: ‘stsd’
Container: Sample Table Box (‘stbl’)
Mandatory: Yes
Quantity: Exactly one

The sample description table gives detailed information about the coding type used, and any initialization information needed for that coding.

The information stored in the sample description box after the entry-count is both track-type specific as documented here, and can also have variants within a track type (e.g. different codings may use different specific information after some common fields, even within a video track).

Which type of sample entry form is used is determined by the media handler, using a suitable form, such as one defined in clause 12, or defined in a derived specification, or registration.

Multiple descriptions may be used within a track.

Note Though the count is 32 bits, the number of items is usually much fewer, and is restricted by the fact that the reference index in the sample table is only 16 bits.

If the ‘format’ field of a SampleEntry is unrecognized, neither the sample description itself, nor the associated media samples, shall be decoded.

Note The definition of sample entries specifies boxes in a particular order, and this is usually also followed in derived specifications. For maximum compatibility, writers should construct files respecting the order both within specifications and as implied by the inheritance, whereas readers should be prepared to accept any box order.

All string fields shall be null-terminated, even if unused. “Optional” means there is at least one null byte.

Entries that identify the format by MIME type, such as a TextSubtitleSampleEntry, TextMetaDataSampleEntry, or SimpleTextSampleEntry, all of which contain a MIME type, may be used to identify the format of streams for which a MIME type applies. A MIME type applies if the contents of the string in the optional configuration box (without its null termination), followed by the contents of a set of samples, starting with a sync sample and ending at the sample immediately preceding a sync sample, are concatenated in their entirety, and the result meets the decoding requirements for documents of that MIME type. Non-sync samples should be used only if that format specifies the behaviour of ‘progressive decoding’, and then the sample times indicate when the results of such progressive decoding should be presented (according to the media type).

Note The samples in a track that is all sync samples are therefore each a valid document for that MIME type.

In some classes derived from SampleEntry, namespace and schema_location are used both to identify the XML document content and to declare “brand” or profile compatibility. Multiple namespace identifiers indicate that the track conforms to the specification represented by each of the identifiers, some of which may identify supersets of the features present. A decoder should be able to decode all the namespaces in order to be able to decode and present correctly the media associated with this sample entry.

Note Additionally, namespace identifiers may represent performance constraints, such as limits on document size, font size, drawing rate, etc., as well as syntax constraints such as features that are not permitted or ignored.

8.5.2.2 Syntax

aligned(8) abstract class SampleEntry (unsigned int(32) format)
   extends Box(format){
   const unsigned int(8)[6] reserved = 0;
   unsigned int(16) data_reference_index;
}

class BitRateBox extends Box(‘btrt’){
   unsigned int(32) bufferSizeDB;
   unsigned int(32) maxBitrate;
   unsigned int(32) avgBitrate;
}

aligned(8) class SampleDescriptionBox (unsigned int(32) handler_type)
   extends FullBox('stsd', version, 0){
   int i ;
   unsigned int(32) entry_count;
   for (i = 1 ; i <= entry_count ; i++){
      SampleEntry(); // an instance of a class derived from SampleEntry
   }
}

8.5.2.3 Semantics

version is set to zero unless the box contains an AudioSampleEntryV1, whereupon version must be 1

entry_count is an integer that gives the number of entries in the following table

SampleEntry is the appropriate sample entry.

data_reference_index is an integer that contains the index of the data reference to use to retrieve data associated with samples that use this sample description. Data references are stored in Data Reference Boxes. The index ranges from 1 to the number of data references.

bufferSizeDB gives the size of the decoding buffer for the elementary stream in bytes.

maxBitrate gives the maximum rate in bits/second over any window of one second.

avgBitrate gives the average rate in bits/second over the entire presentation.

8.5.3 Degradation Priority Box

8.5.3.1 Definition

Box Type: ‘stdp’
Container: Sample Table Box (‘stbl’).
Mandatory: No.
Quantity: Zero or one.

This box contains the degradation priority of each sample. The values are stored in the table, one for each sample. The size of the table, sample_count, is taken from the sample_count in the Sample Size Box ('stsz'). Specifications derived from this define the exact meaning and acceptable range of the priority field.

8.5.3.2 Syntax

aligned(8) class DegradationPriorityBox
   extends FullBox(‘stdp’, version = 0, 0) {
   int i;
   for (i=0; i < sample_count; i++) {
      unsigned int(16) priority;
   }
}

8.5.3.3 Semantics

version - is an integer that specifies the version of this box.

priority - is an integer specifying the degradation priority for each sample.

8.5.4 Sample Scale Box

(empty sub-clause)

8.6 Track Time Structures

8.6.1 Time to Sample Boxes

8.6.1.1 Definition

The composition times (CT) and decoding times (DT) of samples are derived from the Time to Sample Boxes, of which there are two types. The decoding time is defined in the Decoding Time to Sample Box, giving time deltas between successive decoding times. The composition times are derived in the Composition Time to Sample Box as composition time offsets from decoding time. If the composition times and decoding times are identical for every sample in the track, then only the Decoding Time to Sample Box is required; the composition time to sample box must not be present.

The time to sample boxes must give non-zero durations for all samples with the possible exception of the last one. Durations in the ‘stts’ box are strictly positive (non-zero), except for the very last entry, which may be zero. This rule derives from the rule that no two time-stamps in a stream may be the same. Great care must be taken when adding samples to a stream, that the sample that was previously last may need to have a non-zero duration established, in order to observe this rule. If the duration of the last sample is indeterminate, use an arbitrary small value and a ‘dwell’ edit.

Some coding systems may allow samples that are used only for reference and not output (e.g. a non-displayed reference frame in video). When any such non-output sample is present in a track, the following applies:

1) A non-output sample shall be given a composition time which is outside the time-range of the samples that are output.

2) An edit list shall be used to exclude the composition times of the non-output samples.

3) When the track includes a CompositionOffsetBox (‘ctts’),

   a. version 1 of the CompositionOffsetBox shall be used,

   b. the value of sample_offset shall be set equal to the most negative number possible (for 32-bit values, -2^31) for each non-output sample,

   c. the CompositionToDecodeBox (‘cslg’) should be contained in the SampleTableBox (‘stbl’) of the track, and

   d. when the CompositionToDecodeBox is present for the track, the value of leastDecodeToDisplayDelta field in the box shall be equal to the smallest composition offset in the CompositionOffsetBox excluding the sample_offset values for non-output samples.

Note Thus, leastDecodeToDisplayDelta is greater than -2^31.

In the following example, there is a sequence of I, P, and B frames, each with a decoding time delta of 10. The samples are stored as follows, with the indicated values for their decoding time deltas and composition time offsets (the actual CT and DT are given for reference). The re-ordering occurs because the predicted P frames must be decoded before the bi-directionally predicted B frames. The value of DT for a sample is always the sum of the deltas of the preceding samples. Note that the total of the decoding deltas is the duration of the media in this track.

Table 2 — Closed GOP Example

GOP                 /-------------------------------\  /-------------------------------\
Sample              I1   P4   B2   B3   P7   B5   B6   I8   P11  B9   B10  P14  B12  B13
DT                  0    10   20   30   40   50   60   70   80   90   100  110  120  130
CT                  10   40   20   30   70   50   60   80   110  90   100  140  120  130
Decode delta        10   10   10   10   10   10   10   10   10   10   10   10   10   10
Composition offset  10   30   0    0    30   0    0    10   30   0    0    30   0    0

Table 3 — Open GOP Example

GOP                 /--------------------------\  /--------------------------\
Sample              I3   B1   B2   P6   B4   B5   I9   B7   B8   P12  B10  B11
DT                  0    10   20   30   40   50   60   70   80   90   100  110
CT                  30   10   20   60   40   50   90   70   80   120  100  110
Decode delta        10   10   10   10   10   10   10   10   10   10   10   10
Composition offset  30   0    0    30   0    0    30   0    0    30   0    0

8.6.1.2 Decoding Time to Sample Box

8.6.1.2.1 Definition

Box Type: ‘stts’
Container: Sample Table Box (‘stbl’)
Mandatory: Yes
Quantity: Exactly one

This box contains a compact version of a table that allows indexing from decoding time to sample number. Other tables give sample sizes and pointers, from the sample number. Each entry in the table gives the number of consecutive samples with the same time delta, and the delta of those samples. By adding the deltas a complete time-to-sample map may be built.

The Decoding Time to Sample Box contains decode time deltas: DT(n+1) = DT(n) + STTS(n) where STTS(n) is the (uncompressed) table entry for sample n.

The sample entries are ordered by decoding time stamps; therefore the deltas are all non-negative.

The DT axis has a zero origin; DT(i) = SUM(for j=0 to i-1 of delta(j)), and the sum of all deltas gives the length of the media in the track (not mapped to the overall timescale, and not considering any edit list).

The Edit List Box provides the initial CT value if it is non-empty (non-zero).

8.6.1.2.2 Syntax

aligned(8) class TimeToSampleBox
   extends FullBox(‘stts’, version = 0, 0) {
   unsigned int(32) entry_count;
   int i;
   for (i=0; i < entry_count; i++) {
      unsigned int(32) sample_count;
      unsigned int(32) sample_delta;
   }
}

For example with Table 2, the entry would be:

Sample count   Sample-delta
14             10

8.6.1.2.3 Semantics

version - is an integer that specifies the version of this box.

entry_count - is an integer that gives the number of entries in the following table.

sample_count - is an integer that counts the number of consecutive samples that have the given duration.

sample_delta - is an integer that gives the delta of these samples in the time-scale of the media.
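The relation DT(n+1) = DT(n) + STTS(n) can be sketched as a run-length expansion. The function names decoding_times and sample_number_at are hypothetical, the entries are assumed to be already parsed into (sample_count, sample_delta) pairs, and sample numbers are 1-based as elsewhere in the sample tables.

```python
def decoding_times(stts_entries):
    """Expand (sample_count, sample_delta) runs into per-sample decoding times.

    Returns (dts_list, total_duration), both in media timescale units.
    """
    dts, current = [], 0
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            dts.append(current)
            current += sample_delta
    return dts, current


def sample_number_at(stts_entries, decode_time):
    """Return the 1-based sample whose decoding interval contains decode_time.

    Walks the compact table without expanding it; returns None when the
    time falls outside the media.
    """
    if decode_time < 0:
        return None
    first_sample, start = 1, 0
    for count, delta in stts_entries:
        span = count * delta
        if delta > 0 and decode_time < start + span:
            return first_sample + (decode_time - start) // delta
        first_sample += count
        start += span
    return None


# The single entry that describes Table 2: 14 samples, each with delta 10.
dts, duration = decoding_times([(14, 10)])
```

For the Table 2 entry this yields the DT row 0, 10, ..., 130 and a media duration of 140.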

8.6.1.3 Composition Time to Sample Box

8.6.1.3.1 Definition

Box Type: ‘ctts’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one

This box provides the offset between decoding time and composition time. In version 0 of this box the decoding time must be less than the composition time, and the offsets are expressed as unsigned numbers such that CT(n) = DT(n) + CTTS(n) where CTTS(n) is the (uncompressed) table entry for sample n. In version 1 of this box, the composition timeline and the decoding timeline are still derived from each other, but the offsets are signed. It is recommended that for the computed composition timestamps, there is exactly one with the value 0 (zero).

For either version of the box, each sample must have a unique composition timestamp value, that is, the timestamp for two samples shall never be the same.

It may be true that there is no frame to compose at time 0; the handling of this is unspecified (systems might display the first frame for longer, or a suitable fill colour).

When version 1 of this box is used, the CompositionToDecodeBox may also be present in the sample table to relate the composition and decoding timelines. When backwards-compatibility or compatibility with an unknown set of readers is desired, version 0 of this box should be used when possible. In either version of this box, but particularly under version 0, if it is desired that the media start at track time 0, and the first media sample does not have a composition time of 0, an edit list may be used to ‘shift’ the media to time 0.

The composition time to sample table is optional and must only be present if DT and CT differ for any samples.

Hint tracks do not use this box.

For example in Table 2

Sample count   Sample_offset
1              10
1              30
2              0
1              30
2              0
1              10
1              30
2              0
1              30
2              0

8.6.1.3.2 Syntax

aligned(8) class CompositionOffsetBox
   extends FullBox(‘ctts’, version, 0) {
   unsigned int(32) entry_count;
   int i;
   if (version==0) {
      for (i=0; i < entry_count; i++) {
         unsigned int(32) sample_count;
         unsigned int(32) sample_offset;
      }
   }
   else if (version == 1) {
      for (i=0; i < entry_count; i++) {
         unsigned int(32) sample_count;
         signed int(32) sample_offset;
      }
   }
}

8.6.1.3.3 Semantics

version - is an integer that specifies the version of this box.

entry_count is an integer that gives the number of entries in the following table.

sample_count is an integer that counts the number of consecutive samples that have the given offset.

sample_offset is an integer that gives the offset between CT and DT, such that CT(n) = DT(n) + CTTS(n).
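The relation CT(n) = DT(n) + CTTS(n) can be checked against Table 2 with a short sketch. The helper names expand_runs and composition_times are hypothetical, and both tables are assumed to be already parsed into (sample_count, value) pairs.

```python
def expand_runs(entries):
    """Expand (sample_count, value) runs into one value per sample."""
    return [value for count, value in entries for _ in range(count)]


def composition_times(stts_entries, ctts_entries):
    """Compute CT(n) = DT(n) + CTTS(n) for every sample, in media timescale units."""
    deltas = expand_runs(stts_entries)
    offsets = expand_runs(ctts_entries)
    if len(deltas) != len(offsets):
        raise ValueError("stts and ctts describe different sample counts")
    cts, dt = [], 0
    for delta, offset in zip(deltas, offsets):
        cts.append(dt + offset)
        dt += delta  # DT(n+1) = DT(n) + STTS(n)
    return cts


# Entries for the closed GOP example of Table 2.
TABLE2_STTS = [(14, 10)]
TABLE2_CTTS = [(1, 10), (1, 30), (2, 0), (1, 30), (2, 0),
               (1, 10), (1, 30), (2, 0), (1, 30), (2, 0)]
cts = composition_times(TABLE2_STTS, TABLE2_CTTS)
```

The result reproduces the CT row of Table 2, and every composition time is unique, as the definition requires.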

8.6.1.4 Composition to Decode Box

8.6.1.4.1 Definition

Box Type: ‘cslg’
Container: Sample Table Box (‘stbl’) or Track Extension Properties Box (‘trep’)
Mandatory: No
Quantity: Zero or one

When signed composition offsets are used, this box may be used to relate the composition and decoding timelines, and deal with some of the ambiguities that signed composition offsets introduce.

Note that all these fields apply to the entire media (not just that selected by any edits). It is recommended that any edits, explicit or implied, not select any portion of the composition timeline that does not map to a sample. For example, if the smallest composition time is 1000, then the default edit from 0 to the media duration leaves the period from 0 to 1000 associated with no media sample. Player behaviour, and what is composed in this interval, is undefined under these circumstances. It is recommended that the smallest computed CTS be zero, or match the beginning of the first edit.

The composition duration of the last sample in a track might be (often is) ambiguous or unclear; the field for composition end time can be used to clarify this ambiguity and, with the composition start time, establish a clear composition duration for the track.

When the Composition to Decode Box is included in the Sample Table Box, it documents the composition and decoding time relations of the samples in the Movie Box only, not including any subsequent movie fragments. When the Composition to Decode Box is included in the Track Extension Properties Box, it documents the composition and decoding time relations of the samples in all movie fragments following the Movie Box.

Version 1 of this box supports 64-bit timestamps and should only be used if needed (at least one value does not fit into 32 bits).

8.6.1.4.2 Syntax

class CompositionToDecodeBox extends FullBox(‘cslg’, version, 0) {
   if (version==0) {
      signed int(32) compositionToDTSShift;
      signed int(32) leastDecodeToDisplayDelta;
      signed int(32) greatestDecodeToDisplayDelta;
      signed int(32) compositionStartTime;
      signed int(32) compositionEndTime;
   } else {
      signed int(64) compositionToDTSShift;
      signed int(64) leastDecodeToDisplayDelta;
      signed int(64) greatestDecodeToDisplayDelta;
      signed int(64) compositionStartTime;
      signed int(64) compositionEndTime;
   }
}

8.6.1.4.3 Semantics

compositionToDTSShift: if this value is added to the composition times (as calculated by the CTS offsets from the DTS), then for all samples, their CTS is guaranteed to be greater than or equal to their DTS, and the buffer model implied by the indicated profile/level will be honoured; if leastDecodeToDisplayDelta is positive or zero, this field can be 0; otherwise it should be at least (- leastDecodeToDisplayDelta)

leastDecodeToDisplayDelta: the smallest composition offset in the CompositionTimeToSample box in this track

greatestDecodeToDisplayDelta: the largest composition offset in the CompositionTimeToSample box in this track

compositionStartTime: the smallest computed composition time (CTS) for any sample in the media of this track

compositionEndTime: the composition time plus the composition duration, of the sample with the largest computed composition time (CTS) in the media of this track; if this field takes the value 0, the composition end time is unknown.
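As a sketch of how a writer might derive these values, the function below takes per-sample decoding times and signed composition offsets. The name composition_to_decode_fields is hypothetical; the composition duration of the last-composed sample is passed in by the caller because it cannot be derived from the tables; and the sketch neither checks the buffer model nor excludes non-output samples.

```python
def composition_to_decode_fields(dts, offsets, last_composition_duration):
    """Derive 'cslg' field values from decoding times and composition offsets."""
    if not dts or len(dts) != len(offsets):
        raise ValueError("need one composition offset per sample")
    cts = [dt + offset for dt, offset in zip(dts, offsets)]
    least = min(offsets)
    return {
        # Smallest shift that makes every CTS >= DTS. A larger value may be
        # needed to honour the buffer model, which this sketch does not check.
        "compositionToDTSShift": max(0, -least),
        "leastDecodeToDisplayDelta": least,
        "greatestDecodeToDisplayDelta": max(offsets),
        "compositionStartTime": min(cts),
        "compositionEndTime": max(cts) + last_composition_duration,
    }


# Hypothetical variation of the first GOP of Table 3: every offset is reduced
# by 10 so that the smallest composition time becomes zero.
fields = composition_to_decode_fields(
    dts=[0, 10, 20, 30, 40, 50],
    offsets=[20, -10, -10, 20, -10, -10],
    last_composition_duration=10,
)
```

In this variation the smallest offset is -10, so a shift of 10 is the minimum that keeps every CTS at or after its DTS.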

8.6.2 Sync Sample Box

8.6.2.1 Definition

Box Type: ‘stss’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one

This box provides a compact marking of the sync samples within the stream. The table is arranged in strictly increasing order of sample number.

If the sync sample box is not present, every sample is a sync sample.

8.6.2.2 Syntax

aligned(8) class SyncSampleBox
   extends FullBox(‘stss’, version = 0, 0) {
   unsigned int(32) entry_count;
   int i;
   for (i=0; i < entry_count; i++) {
      unsigned int(32) sample_number;
   }
}

8.6.2.3 Semantics

version - is an integer that specifies the version of this box.

entry_count is an integer that gives the number of entries in the following table. If entry_count is zero, there are no sync samples within the stream and the following table is empty.

sample_number gives the numbers of the samples that are sync samples in the stream.
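Because the table is in strictly increasing order, a reader seeking to a sample can find the preceding sync sample with a binary search. This is a sketch only: the function name previous_sync_sample is hypothetical, and None stands in for an absent box.

```python
from bisect import bisect_right


def previous_sync_sample(sync_sample_numbers, target_sample):
    """Return the last sync sample at or before target_sample (1-based).

    sync_sample_numbers is the 'stss' table, or None when the box is absent,
    in which case every sample is a sync sample. Returns None when no sync
    sample exists at or before the target.
    """
    if sync_sample_numbers is None:
        return target_sample
    index = bisect_right(sync_sample_numbers, target_sample)
    return sync_sample_numbers[index - 1] if index else None
```

For a stream whose sync samples are 1 and 8, seeking to sample 5 starts decoding at sample 1, and seeking to sample 14 starts at sample 8.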

8.6.3 Shadow Sync Sample Box

8.6.3.1 Definition

Box Type: ‘stsh’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one

The shadow sync table provides an optional set of sync samples that can be used when seeking or for similar purposes. In normal forward play they are ignored.

Each entry in the Shadow Sync Table consists of a pair of sample numbers. The first entry (shadowed-sample-number) indicates the number of the sample that a shadow sync will be defined for. This should always be a non-sync sample (e.g. a frame difference). The second sample number (sync-sample-number) indicates the sample number of the sync sample (i.e. key frame) that can be used when there is a need for a sync sample at, or before, the shadowed-sample-number.

The entries in the Shadow Sync Box shall be sorted based on the shadowed-sample-number field.

The shadow sync samples are normally placed in an area of the track that is not presented during normal play (edited out by means of an edit list), though this is not a requirement. The shadow sync table can be ignored and the track will play (and seek) correctly if it is ignored (though perhaps not optimally).

The Shadow Sync Sample replaces, not augments, the sample that it shadows (i.e. the next sample sent is shadowed-sample-number+1). The shadow sync sample is treated as if it occurred at the time of the sample it shadows, having the duration of the sample it shadows.

Hinting and transmission might become more complex if a shadow sample is used also as part of normal playback, or is used more than once as a shadow. In this case the hint track might need separate shadow syncs, all of which can get their media data from the one shadow sync in the media track, to allow for the different time-stamps etc. needed in their headers.

8.6.3.2 Syntax

aligned(8) class ShadowSyncSampleBox
   extends FullBox(‘stsh’, version = 0, 0) {
   unsigned int(32) entry_count;
   int i;
   for (i=0; i < entry_count; i++) {
      unsigned int(32) shadowed_sample_number;
      unsigned int(32) sync_sample_number;
   }
}

8.6.3.3 Semantics

version - is an integer that specifies the version of this box.

entry_count - is an integer that gives the number of entries in the following table.

shadowed_sample_number - gives the number of a sample for which there is an alternative sync sample.

sync_sample_number - gives the number of the alternative sync sample.

8.6.4 Independent and Disposable Samples Box

8.6.4.1 Definition

Box Types: ‘sdtp’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one

This optional table answers three questions about sample dependency:

1) does this sample depend on others (e.g. is it an I-picture)?
2) do no other samples depend on this one?
3) does this sample contain multiple (redundant) encodings of the data at this time-instant (possibly with different dependencies)?

In the absence of this table:

1) the sync sample table (partly) answers the first question; in most video codecs, I-pictures are also sync points,
2) the dependency of other samples on this one is unknown.
3) the existence of redundant coding is unknown.

When performing ‘trick’ modes, such as fast-forward, it is possible to use the first piece of information to locate independently decodable samples. Similarly, when performing random access, it may be necessary to locate the previous sync sample or random access recovery point, and roll-forward from the sync sample or the pre-roll starting point of the random access recovery point to the desired point. While rolling forward, samples on which no others depend need not be retrieved or decoded.

The value of ‘sample_is_depended_on’ is independent of the existence of redundant codings. However, a redundant coding may have different dependencies from the primary coding; if redundant codings are available, the value of ‘sample_depends_on’ documents only the primary coding.

A leading sample (usually a picture in video) is defined relative to a reference sample, which is the immediately prior sample that is marked as “sample_depends_on” having no dependency (an I picture). A leading sample has both a composition time before the reference sample, and possibly also a decoding dependency on a sample before the reference sample. Therefore if, for example, playback and decoding were to start at the reference sample, those samples marked as leading would not be needed and might not be decodable. A leading sample itself must therefore not be marked as having no dependency.

For tracks with a handler_type that is not ‘vide’, ‘soun’, ‘hint’ or ‘auxv’, if another sample with sample_depends_on=2 or another sample tagged as a “Sync Sample” has already been processed and unless specified otherwise, a sample tagged with sample_depends_on=2, and sample_has_redundancy=1 can be discarded, and its duration added to the duration of the preceding one, to maintain the timing of subsequent samples.

The size of the table, sample_count, is taken from the sample_count in the Sample Size Box ('stsz') or Compact Sample Size Box (‘stz2’).

8.6.4.2 Syntax

aligned(8) class SampleDependencyTypeBox
   extends FullBox(‘sdtp’, version = 0, 0) {
   for (i=0; i < sample_count; i++){
      unsigned int(2) is_leading;
      unsigned int(2) sample_depends_on;
      unsigned int(2) sample_is_depended_on;
      unsigned int(2) sample_has_redundancy;
   }
}

8.6.4.3 Semantics

is_leading takes one of the following four values:
   0: the leading nature of this sample is unknown;
   1: this sample is a leading sample that has a dependency before the referenced I-picture (and is therefore not decodable);
   2: this sample is not a leading sample;
   3: this sample is a leading sample that has no dependency before the referenced I-picture (and is therefore decodable);

sample_depends_on takes one of the following four values:
   0: the dependency of this sample is unknown;
   1: this sample does depend on others (not an I picture);
   2: this sample does not depend on others (I picture);
   3: reserved

sample_is_depended_on takes one of the following four values:
   0: the dependency of other samples on this sample is unknown;
   1: other samples may depend on this one (not disposable);
   2: no other sample depends on this one (disposable);
   3: reserved

sample_has_redundancy takes one of the following four values:
   0: it is unknown whether there is redundant coding in this sample;
   1: there is redundant coding in this sample;
   2: there is no redundant coding in this sample;
   3: reserved
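Each table entry occupies one byte holding four 2-bit fields, with the first declared field in the most significant bits. The following sketch unpacks one entry; the names unpack_sdtp_entry and can_skip_when_rolling_forward are hypothetical helpers, not part of this specification.

```python
def unpack_sdtp_entry(byte):
    """Split one 'sdtp' table byte into its four 2-bit fields."""
    return {
        "is_leading": (byte >> 6) & 0x3,
        "sample_depends_on": (byte >> 4) & 0x3,
        "sample_is_depended_on": (byte >> 2) & 0x3,
        "sample_has_redundancy": byte & 0x3,
    }


def can_skip_when_rolling_forward(entry):
    """True when no other sample depends on this one (value 2, disposable)."""
    return entry["sample_is_depended_on"] == 2


i_picture = unpack_sdtp_entry(0x24)   # bits 00 10 01 00
disposable = unpack_sdtp_entry(0x18)  # bits 00 01 10 00
```

In this example 0x24 describes a sample that does not depend on others and may be depended on, while 0x18 describes a dependent sample that no other sample needs, so it can be skipped while rolling forward.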

8.6.5 Edit Box

8.6.5.1 Definition

Box Type: ‘edts’
Container: Track Box (‘trak’)
Mandatory: No
Quantity: Zero or one

An Edit Box maps the presentation time-line to the media time-line as it is stored in the file. The Edit Box is a container for the edit lists.

The Edit Box is optional. In the absence of this box, there is an implicit one-to-one mapping of these time-lines, and the presentation of a track starts at the beginning of the presentation. An empty edit is used to offset the start time of a track.

8.6.5.2 Syntax

aligned(8) class EditBox extends Box(‘edts’) { }

8.6.6 Edit List Box

8.6.6.1 Definition

Box Type: ‘elst’
Container: Edit Box (‘edts’)
Mandatory: No
Quantity: Zero or one

This box contains an explicit timeline map. Each entry defines part of the track time‐line: by mapping part of the media time‐line, or by indicating ‘empty’ time, or by defining a ‘dwell’, where a single time‐point in the media is held for a period.

NOTE Edits are not restricted to fall on sample times. This means that when entering an edit, it can be necessary to (a) back up to a sync point, and pre‐roll from there and then (b) be careful about the duration of the first sample: it might have been truncated if the edit enters it during its normal duration. If this is audio, that frame might need to be decoded, and then the final slicing done. Likewise, the duration of the last sample in an edit might need slicing.

Starting offsets for tracks (streams) are represented by an initial empty edit. For example, to play a track from its start for 30 seconds, but at 10 seconds into the presentation, we have the following edit list:

Entry‐count = 2

Segment‐duration = 10 seconds
Media‐Time = ‐1
Media‐Rate = 1

Segment‐duration = 30 seconds (could be the length of the whole track)
Media‐Time = 0 seconds
Media‐Rate = 1

A non‐empty edit may insert a portion of the media timeline that is not present in the initial movie, and is present only in subsequent movie fragments. Particularly in an empty initial movie of a fragmented movie file (when there are no media samples yet present), the segment_duration of this edit may be zero, whereupon the edit provides the offset from media composition time to movie presentation time, for the movie and subsequent movie fragments. It is recommended that such an edit be used to establish a presentation time of 0 for the first presented sample, when composition offsets are used.

For example, if the composition time of the first composed frame is 20, then the edit that maps the media time from 20 onwards to movie time 0 onwards, would read:

Entry‐count = 1

Segment‐duration = 0
Media‐Time = 20
Media‐Rate = 1

8.6.6.2 Syntax

aligned(8) class EditListBox extends FullBox(‘elst’, version, 0) {
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++) {
      if (version==1) {
         unsigned int(64) segment_duration;
         int(64) media_time;
      } else { // version==0
         unsigned int(32) segment_duration;
         int(32) media_time;
      }
      int(16) media_rate_integer;
      int(16) media_rate_fraction = 0;
   }
}

8.6.6.3 Semantics

version is an integer that specifies the version of this box (0 or 1)

entry_count is an integer that gives the number of entries in the following table

segment_duration is an integer that specifies the duration of this edit segment in units of the timescale in the Movie Header Box

media_time is an integer containing the starting time within the media of this edit segment (in media time scale units, in composition time). If this field is set to –1, it is an empty edit. The last edit in a track shall never be an empty edit. Any difference between the duration in the Movie Header Box, and the track’s duration is expressed as an implicit empty edit at the end.

media_rate specifies the relative rate at which to play the media corresponding to this edit segment. If this value is 0, then the edit is specifying a ‘dwell’: the media at media‐time is presented for the segment‐duration. Otherwise this field shall contain the value 1.
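As an informative illustration of these semantics, the sketch below maps a time on the presentation time‐line to a media time using a list of edits. It is a simplification under stated assumptions: the function name presentation_to_media_time and the tuple layout are hypothetical, media_rate is taken as the integer part only, and the zero‐duration edit used with movie fragments is not handled.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory form of one edit:
# (segment_duration in movie timescale units,
#  media_time in media timescale units, media_rate)
Edit = Tuple[int, int, int]


def presentation_to_media_time(edits: List[Edit], t_movie: int,
                               movie_timescale: int,
                               media_timescale: int) -> Optional[int]:
    """Return the media time presented at movie time t_movie, or None."""
    start = 0  # start of the current edit on the presentation time-line
    for segment_duration, media_time, media_rate in edits:
        end = start + segment_duration
        if start <= t_movie < end:
            if media_time == -1:
                return None          # empty edit: nothing is presented
            if media_rate == 0:
                return media_time    # dwell: a single media time is held
            elapsed = (t_movie - start) * media_timescale // movie_timescale
            return media_time + elapsed
        start = end
    return None  # beyond the last edit (implicit empty edit)
```

With the two‐entry edit list given in the definition above (10 seconds empty, then 30 seconds of media), a movie time of 12 seconds maps to 2 seconds of media time.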

8.7 Track Data Layout Structures

8.7.1 Data Information Box

8.7.1.1 Definition

Box Type: ‘dinf’
Container: Media Information Box (‘minf’) or Meta Box (‘meta’)
Mandatory: Yes (required within ‘minf’ box) and No (optional within ‘meta’ box)
Quantity: Exactly one

The data information box contains objects that declare the location of the media information in a track.

8.7.1.2 Syntax

aligned(8) class DataInformationBox extends Box(‘dinf’) { }

8.7.2 Data Reference Box

8.7.2.1 Definition

Box Types: ‘dref’
Container: Data Information Box (‘dinf’)
Mandatory: Yes
Quantity: Exactly one

Box Types: ‘url ’, ‘urn ’
Container: Data Reference Box (‘dref’)
Mandatory: Yes (at least one of ‘url ’ or ‘urn ’ shall be present)
Quantity: One or more

The data reference object contains a table of data references (normally URLs) that declare the location(s) of the media data used within the presentation. The data reference index in the sample description ties entries in this table to the samples in the track. A track may be split over several sources in this way.

If the flag is set indicating that the data is in the same file as this box, then no string (not even an empty one) shall be supplied in the entry field.

The entry_count in the DataReferenceBox shall be 1 or greater; each DataEntryBox within the DataReferenceBox shall be either a DataEntryUrnBox or a DataEntryUrlBox.

NOTE Though the count is 32 bits, the number of items is usually much fewer, and is restricted by the fact that the reference index in the sample table is only 16 bits.

When a file that has data entries with the flag set indicating that the media data is in the same file, is split into segments for transport, the value of this flag does not change, as the file is (logically) reassembled after the transport operation.

8.7.2.2 Syntax

aligned(8) class DataEntryUrlBox (bit(24) flags)
   extends FullBox(‘url ’, version = 0, flags) {
   string location;
}

aligned(8) class DataEntryUrnBox (bit(24) flags)
   extends FullBox(‘urn ’, version = 0, flags) {
   string name;
   string location;
}

aligned(8) class DataReferenceBox
   extends FullBox(‘dref’, version = 0, 0) {
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++) {
      DataEntryBox(entry_version, entry_flags) data_entry;
   }
}

8.7.2.3 Semantics

version is an integer that specifies the version of this box

entry_count is an integer that counts the actual entries

entry_version is an integer that specifies the version of the entry format

entry_flags is a 24‐bit integer with flags; one flag is defined (0x000001) which means that the media data is in the same file as the Movie Box containing this data reference.

data_entry is a URL or URN entry. Name is a URN, and is required in a URN entry. Location is a URL, and is required in a URL entry and optional in a URN entry, where it gives a location to find the resource with the given name. Each is a null‐terminated string using UTF‐8 characters. If the self‐contained flag is set, the URL form is used and no string is present; the box terminates with the entry‐flags field. The URL type should be of a service that delivers a file (e.g. URLs of type file, http, ftp etc.), and which services ideally also permit random access. Relative URLs are permissible and are relative to the file containing the Movie Box that contains this data reference.

8.7.3 Sample Size Boxes

8.7.3.1 Definition

Box Type: ‘stsz’, ‘stz2’
Container: Sample Table Box (‘stbl’)
Mandatory: Yes
Quantity: Exactly one variant must be present

This box contains the sample count and a table giving the size in bytes of each sample. This allows the media data itself to be unframed. The total number of samples in the media is always indicated in the sample count.

There are two variants of the sample size box. The first variant has a fixed size 32‐bit field for representing the sample sizes; it permits defining a constant size for all samples in a track. The second variant permits smaller size fields, to save space when the sizes are varying but small. One of these boxes must be present; the first version is preferred for maximum compatibility.

NOTE A sample size of zero is not prohibited in general, but it must be valid and defined for the coding system, as defined by the sample entry, that the sample belongs to.

8.7.3.2 Sample Size Box

8.7.3.2.1 Syntax

aligned(8) class SampleSizeBox extends FullBox(‘stsz’, version = 0, 0) {
   unsigned int(32) sample_size;
   unsigned int(32) sample_count;
   if (sample_size==0) {
      for (i=1; i <= sample_count; i++) {
         unsigned int(32) entry_size;
      }
   }
}

8.7.3.2.2 Semantics

version is an integer that specifies the version of this box

sample_size is an integer specifying the default sample size. If all the samples are the same size, this field contains that size value. If this field is set to 0, then the samples have different sizes, and those sizes are stored in the sample size table. If this field is not 0, it specifies the constant sample size, and no array follows.

sample_count is an integer that gives the number of samples in the track; if sample‐size is 0, then it is also the number of entries in the following table.

entry_size is an integer specifying the size of a sample, indexed by its number.

8.7.3.3 Compact Sample Size Box

8.7.3.3.1 Syntax

aligned(8) class CompactSampleSizeBox extends FullBox(‘stz2’, version = 0, 0) {
   unsigned int(24) reserved = 0;
   unsigned int(8) field_size;
   unsigned int(32) sample_count;
   for (i=1; i <= sample_count; i++) {
      unsigned int(field_size) entry_size;
   }
}

8.7.3.3.2 Semantics

version is an integer that specifies the version of this box

field_size is an integer specifying the size in bits of the entries in the following table; it shall take the value 4, 8 or 16. If the value 4 is used, then each byte contains two values: entry[i]<<4 + entry[i+1]; if the sizes do not fill an integral number of bytes, the last byte is padded with zeros.

sample_count is an integer that gives the number of entries in the following table

entry_size is an integer specifying the size of a sample, indexed by its number.
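The packing of 4‐bit, 8‐bit and 16‐bit entries can be illustrated with the informative sketch below. It assumes the table bytes that follow sample_count have already been extracted and that multi‐byte values are big‐endian; the function name decode_stz2_entries is hypothetical.

```python
from typing import List


def decode_stz2_entries(table: bytes, field_size: int, sample_count: int) -> List[int]:
    """Decode the entry_size table of a Compact Sample Size Box."""
    if field_size == 4:
        sizes = []
        for byte in table:
            sizes.append(byte >> 4)    # entry[i] sits in the high nibble
            sizes.append(byte & 0x0F)  # entry[i+1] sits in the low nibble
        if len(sizes) < sample_count:
            raise ValueError("table is shorter than sample_count")
        return sizes[:sample_count]    # drops the zero padding nibble, if any
    if field_size in (8, 16):
        width = field_size // 8
        if len(table) < width * sample_count:
            raise ValueError("table is shorter than sample_count")
        return [int.from_bytes(table[i * width:(i + 1) * width], "big")
                for i in range(sample_count)]
    raise ValueError("field_size shall be 4, 8 or 16")
```

For an odd sample_count with field_size equal to 4, the low nibble of the last byte is padding and is discarded.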

8.7.4 Sample To Chunk Box

8.7.4.1 Definition

Box Type: ‘stsc’
Container: Sample Table Box (‘stbl’)
Mandatory: Yes
Quantity: Exactly one

Samples within the media data are grouped into chunks. Chunks can be of different sizes, and the samples within a chunk can have different sizes. This table can be used to find the chunk that contains a sample, its position, and the associated sample description.

The table is compactly coded. Each entry gives the index of the first chunk of a run of chunks with the same characteristics. By subtracting one entry here from the previous one, you can compute how many chunks are in this run. You can convert this to a sample count by multiplying by the appropriate samples‐per‐chunk.

8.7.4.2 Syntax

aligned(8) class SampleToChunkBox
   extends FullBox(‘stsc’, version = 0, 0) {
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++) {
      unsigned int(32) first_chunk;
      unsigned int(32) samples_per_chunk;
      unsigned int(32) sample_description_index;
   }
}

8.7.4.3 Semantics

version is an integer that specifies the version of this box

entry_count is an integer that gives the number of entries in the following table

first_chunk is an integer that gives the index of the first chunk in this run of chunks that share the same samples‐per‐chunk and sample‐description‐index; the index of the first chunk in a track has the value 1 (the first_chunk field in the first record of this box has the value 1, identifying that the first sample maps to the first chunk).

samples_per_chunk is an integer that gives the number of samples in each of these chunks

sample_description_index is an integer that gives the index of the sample entry that describes the samples in this chunk. The index ranges from 1 to the number of sample entries in the Sample Description Box
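The run‐length coding of this table can be expanded as in the informative sketch below, which derives the number of samples in every chunk. It assumes the total number of chunks is known from the Chunk Offset Box; the name samples_per_chunk_list and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical in-memory form of one table entry:
# (first_chunk, samples_per_chunk, sample_description_index)
StscEntry = Tuple[int, int, int]


def samples_per_chunk_list(entries: List[StscEntry], chunk_count: int) -> List[int]:
    """Expand the compact table into one samples-per-chunk value per chunk."""
    result: List[int] = []
    for i, (first_chunk, samples_per_chunk, _index) in enumerate(entries):
        if i + 1 < len(entries):
            next_first = entries[i + 1][0]   # the run ends where the next begins
        else:
            next_first = chunk_count + 1     # the last run extends to the final chunk
        run_length = next_first - first_chunk
        result.extend([samples_per_chunk] * run_length)
    return result
```

Summing the returned list gives the number of samples covered by the table, which can be checked against sample_count in the sample size box.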

8.7.5 Chunk Offset Box

8.7.5.1 Definition

Box Type: ‘stco’, ‘co64’
Container: Sample Table Box (‘stbl’)
Mandatory: Yes
Quantity: Exactly one variant must be present

The chunk offset table gives the index of each chunk into the containing file. There are two variants, permitting the use of 32‐bit or 64‐bit offsets. The latter is useful when managing very large presentations. At most one of these variants will occur in any single instance of a sample table.

Offsets are file offsets, not the offset into any box within the file (e.g. Media Data Box). This permits referring to media data in files without any box structure. It does also mean that care must be taken when constructing a self‐contained ISO file with its metadata (Movie Box) at the front, as the size of the Movie Box will affect the chunk offsets to the media data.

8.7.5.2 Syntax

aligned(8) class ChunkOffsetBox
   extends FullBox(‘stco’, version = 0, 0) {
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++) {
      unsigned int(32) chunk_offset;
   }
}

aligned(8) class ChunkLargeOffsetBox
   extends FullBox(‘co64’, version = 0, 0) {
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++) {
      unsigned int(64) chunk_offset;
   }
}

8.7.5.3 Semantics

version is an integer that specifies the version of this box

entry_count is an integer that gives the number of entries in the following table

chunk_offset is a 32 or 64 bit integer that gives the offset of the start of a chunk into its containing media file.
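As an informative illustration, the sketch below combines chunk offsets with per‐chunk sample counts and per‐sample sizes to obtain the file offset of every sample. It assumes that the samples of a chunk are stored contiguously from the chunk offset onwards; the function name sample_offsets and its parameters are hypothetical.

```python
from typing import List


def sample_offsets(chunk_offsets: List[int], samples_in_chunk: List[int],
                   sample_sizes: List[int]) -> List[int]:
    """Return the file offset of each sample, in sample-number order."""
    offsets: List[int] = []
    sample_index = 0
    for chunk_offset, count in zip(chunk_offsets, samples_in_chunk):
        position = chunk_offset            # first sample starts at the chunk offset
        for _ in range(count):
            offsets.append(position)
            position += sample_sizes[sample_index]  # next sample follows directly
            sample_index += 1
    return offsets
```

Because the offsets are file offsets, inserting or resizing a box in front of the media data requires every chunk_offset to be adjusted by the same amount.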

8.7.6 Padding Bits Box

8.7.6.1 Definition

Box Type: ‘padb’
Container: Sample Table (‘stbl’)
Mandatory: No
Quantity: Zero or one

In some streams the media samples do not occupy all bits of the bytes given by the sample size, and are padded at the end to a byte boundary. In some cases, it is necessary to record externally the number of padding bits used. This table supplies that information.

8.7.6.2 Syntax

aligned(8) class PaddingBitsBox extends FullBox(‘padb’, version = 0, 0) {
   unsigned int(32) sample_count;
   int i;
   for (i=0; i < ((sample_count + 1)/2); i++) {
      bit(1) reserved = 0;
      bit(3) pad1;
      bit(1) reserved = 0;
      bit(3) pad2;
   }
}

8.7.6.3 Semantics

sample_count – counts the number of samples in the track; it should match the count in other tables

pad1 – a value from 0 to 7, indicating the number of padding bits at the end of sample (i*2)+1.

pad2 – a value from 0 to 7, indicating the number of padding bits at the end of sample (i*2)+2.
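The two 3‐bit values per byte can be unpacked as in the informative sketch below. It assumes the table bytes following sample_count have been extracted and that pad1 occupies the upper half of each byte; the function name decode_padding_bits is hypothetical.

```python
from typing import List


def decode_padding_bits(table: bytes, sample_count: int) -> List[int]:
    """Return the number of padding bits for each sample, in sample order."""
    needed = (sample_count + 1) // 2       # one byte carries two samples
    if len(table) < needed:
        raise ValueError("table is shorter than sample_count requires")
    pads: List[int] = []
    for byte in table[:needed]:
        pads.append((byte >> 4) & 0x7)     # pad1: sample (i*2)+1
        pads.append(byte & 0x7)            # pad2: sample (i*2)+2
    return pads[:sample_count]             # drop the unused pad2 of an odd count
```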

8.7.7 Sub-Sample Information Box

8.7.7.1 Definition

Box Type: ‘subs’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or more

This box, named the Sub-Sample Information box, is designed to contain sub‐sample information.

A sub‐sample is a contiguous range of bytes of a sample. The specific definition of a sub‐sample shall be supplied for a given coding system (e.g. for ISO/IEC 14496‐10, Advanced Video Coding). In the absence of such a specific definition, this box shall not be applied to samples using that coding system.

If subsample_count is 0 for any entry, then those samples have no subsample information and no array follows. The table is sparsely coded; the table identifies which samples have sub‐sample structure by recording the difference in sample‐number between each entry. The first entry in the table records the sample number of the first sample having sub‐sample information.

NOTE It is possible to combine subsample_priority and discardable such that when subsample_priority is smaller than a certain value, discardable is set to 1. However, since different systems may use different scales of priority values, keeping the two fields separate gives a clean solution for discardable sub‐samples.

When more than one Sub‐Sample Information box is present in the same container box, the value of flags shall differ in each of these Sub‐Sample Information boxes. The semantics of flags, if any, shall be supplied for a given coding system. If flags have no semantics for a given coding system, the flags shall be 0.

8.7.7.2 Syntax

aligned(8) class SubSampleInformationBox
   extends FullBox(‘subs’, version, flags) {
   unsigned int(32) entry_count;
   int i,j;
   for (i=0; i < entry_count; i++) {
      unsigned int(32) sample_delta;
      unsigned int(16) subsample_count;
      if (subsample_count > 0) {
         for (j=0; j < subsample_count; j++) {
            if(version == 1)
            {
               unsigned int(32) subsample_size;
            }
            else
            {
               unsigned int(16) subsample_size;
            }
            unsigned int(8) subsample_priority;
            unsigned int(8) discardable;
            unsigned int(32) codec_specific_parameters;
         }
      }
   }
}

8.7.7.3 Semantics

version is an integer that specifies the version of this box (0 or 1 in this specification)

entry_count is an integer that gives the number of entries in the following table.

sample_delta is an integer that specifies the sample number of the sample having sub‐sample structure. It is coded as the difference between the desired sample number, and the sample number indicated in the previous entry. If the current entry is the first entry, the value indicates the sample number of the first sample having sub‐sample information, that is, the value is the difference between the sample number and zero (0).

subsample_count is an integer that specifies the number of sub‐samples for the current sample. If there is no sub‐sample structure, then this field takes the value 0.

subsample_size is an integer that specifies the size, in bytes, of the current sub‐sample.

subsample_priority is an integer specifying the degradation priority for each sub‐sample. Higher values of subsample_priority indicate sub‐samples which are important to, and have a greater impact on, the decoded quality.

discardable equal to 0 means that the sub‐sample is required to decode the current sample, while equal to 1 means the sub‐sample is not required to decode the current sample but may be used for enhancements, e.g., the sub‐sample consists of supplemental enhancement information (SEI) messages.

codec_specific_parameters is defined by the codec in use. If no such definition is available, this field shall be set to 0.
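The sparse coding of sample numbers can be illustrated with the informative sketch below, which accumulates sample_delta values and derives byte ranges for the sub‐samples of each listed sample. It assumes, for illustration only, that sub‐samples are laid out back to back from the first byte of the sample; the actual layout is defined by the coding system. The name subsample_ranges and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form of one table entry:
# (sample_delta, [subsample_size, ...])
SubsEntry = Tuple[int, List[int]]


def subsample_ranges(entries: List[SubsEntry]) -> Dict[int, List[Tuple[int, int]]]:
    """Map each listed sample number to (start, end) byte ranges in the sample."""
    ranges: Dict[int, List[Tuple[int, int]]] = {}
    sample_number = 0                      # the first delta is relative to zero
    for sample_delta, sizes in entries:
        sample_number += sample_delta
        start = 0
        spans: List[Tuple[int, int]] = []
        for size in sizes:
            spans.append((start, start + size))  # end offset is exclusive
            start += size
        ranges[sample_number] = spans
    return ranges
```

An entry whose subsample_count is 0 yields an empty list for its sample number.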

8.7.8 Sample Auxiliary Information Sizes Box

8.7.8.1 Definition

Box Type: ‘saiz’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or More

Per‐sample sample auxiliary information may be stored anywhere in the same file as the sample data itself; for self‐contained media files, this is typically in a MediaData box or a box from a derived specification. It is stored either (a) in multiple chunks, with the number of samples per chunk, as well as the number of chunks, matching the chunking of the primary sample data or (b) in a single chunk for all the samples in a movie sample table (or a movie fragment). The Sample Auxiliary Information for all samples contained within a single chunk (or track run) is stored contiguously (similarly to sample data).

Sample Auxiliary Information, when present, is always stored in the same file as the samples to which it relates as they share the same data reference (‘dref’) structure. However, this data may be located anywhere within this file, using auxiliary information offsets (‘saio’) to indicate the location of the data.

Whether sample auxiliary information is permitted or required may be specified by the brands or the coding format in use. The format of the sample auxiliary information is determined by aux_info_type. If aux_info_type and aux_info_type_parameter are omitted then the implied value of aux_info_type is either (a) in the case of transformed content, such as protected content, the scheme_type included in the Protection Scheme Information box or otherwise (b) the sample entry type. The default value of the aux_info_type_parameter is 0. Some values of aux_info_type may be restricted to be used only with particular track types. A track may have multiple streams of sample auxiliary information of different types. The types are registered at the registration authority.

While aux_info_type determines the format of the auxiliary information, several streams of auxiliary information having the same format may be used when their value of aux_info_type_parameter differs. The semantics of aux_info_type_parameter for a particular aux_info_type value must be specified along with specifying the semantics of the particular aux_info_type value and the implied auxiliary information format.

This box provides the size of the auxiliary information for each sample. For each instance of this box, there must be a matching SampleAuxiliaryInformationOffsetsBox with the same values of aux_info_type and aux_info_type_parameter, providing the offset information for this auxiliary information.

NOTE For discussions on the use of sample auxiliary information versus other mechanisms, see Annex C.8.

8.7.8.2 Syntax

aligned(8) class SampleAuxiliaryInformationSizesBox
   extends FullBox(‘saiz’, version = 0, flags)
{
   if (flags & 1) {
      unsigned int(32) aux_info_type;
      unsigned int(32) aux_info_type_parameter;
   }
   unsigned int(8) default_sample_info_size;
   unsigned int(32) sample_count;
   if (default_sample_info_size == 0) {
      unsigned int(8) sample_info_size[ sample_count ];
   }
}

8.7.8.3 Semantics

aux_info_type is an integer that identifies the type of the sample auxiliary information. At most one occurrence of this box with the same values for aux_info_type and aux_info_type_parameter shall exist in the containing box.

aux_info_type_parameter identifies the “stream” of auxiliary information having the same value of aux_info_type and associated to the same track. The semantics of aux_info_type_parameter are determined by the value of aux_info_type.

default_sample_info_size is an integer specifying the sample auxiliary information size for the case where all the indicated samples have the same sample auxiliary information size. If the size varies then this field shall be zero.

sample_count is an integer that gives the number of samples for which a size is defined. For a Sample Auxiliary Information Sizes box appearing in the Sample Table Box this must be the same as, or less than, the sample_count within the Sample Size Box or Compact Sample Size Box. For a Sample Auxiliary Information Sizes box appearing in a Track Fragment box this must be the same as, or less than, the sum of the sample_count entries within the Track Fragment Run boxes of the Track Fragment. If this is less than the number of samples, then auxiliary information is supplied for the initial samples, and the remaining samples have no associated auxiliary information.

sample_info_size gives the size of the sample auxiliary information in bytes. This may be zero to indicate samples with no associated auxiliary information.

8.7.9 Sample Auxiliary Information Offsets Box

8.7.9.1 Definition

Box Type: ‘saio’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or More

For an introduction to sample auxiliary information, see the definition of the Sample Auxiliary Information Sizes Box.

This box provides the position information for the sample auxiliary information, in a way similar to the chunk offsets for sample data.

8.7.9.2 Syntax

aligned(8) class SampleAuxiliaryInformationOffsetsBox
   extends FullBox(‘saio’, version, flags)
{
   if (flags & 1) {
      unsigned int(32) aux_info_type;
      unsigned int(32) aux_info_type_parameter;
   }
   unsigned int(32) entry_count;
   if ( version == 0 ) {
      unsigned int(32) offset[ entry_count ];
   }
   else {
      unsigned int(64) offset[ entry_count ];
   }
}

8.7.9.3 Semantics

aux_info_type and aux_info_type_parameter are defined as in the SampleAuxiliaryInformationSizesBox

entry_count gives the number of entries in the following table. For a Sample Auxiliary Information Offsets box appearing in a Sample Table Box this must be equal to one or to the value of the entry_count field in the Chunk Offset Box or Chunk Large Offset Box. For a Sample Auxiliary Information Offsets Box appearing in a Track Fragment box, this must be equal to one or to the number of Track Fragment Run boxes in the Track Fragment Box.

offset gives the position in the file of the Sample Auxiliary Information for each Chunk or Track Fragment Run. If entry_count is one, then the Sample Auxiliary Information for all Chunks or Runs is contiguous in the file in chunk or run order. When in the Sample Table Box, the offsets are absolute. In a track fragment box, this value is relative to the base offset established by the track fragment header box (‘tfhd’) in the same track fragment (see 8.8.14).
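The interplay of the sizes and offsets boxes can be illustrated with the informative sketch below, which locates the auxiliary information of each sample. It assumes that any base offset for a track fragment has already been added to the offsets by the caller; the function name aux_info_locations and its parameters are hypothetical.

```python
from typing import List, Tuple


def aux_info_locations(offsets: List[int], samples_per_unit: List[int],
                       default_sample_info_size: int,
                       sample_info_sizes: List[int],
                       sample_count: int) -> List[Tuple[int, int]]:
    """Return (file_offset, size) of the auxiliary information per sample.

    samples_per_unit holds the sample count of each chunk or track run.
    """
    if default_sample_info_size != 0:
        sizes = [default_sample_info_size] * sample_count
    else:
        sizes = list(sample_info_sizes[:sample_count])
    locations: List[Tuple[int, int]] = []
    index = 0
    position = offsets[0]
    for unit, count in enumerate(samples_per_unit):
        if len(offsets) > 1:
            position = offsets[unit]       # one offset per chunk or run
        for _ in range(count):
            if index >= len(sizes):
                return locations           # remaining samples have no aux info
            locations.append((position, sizes[index]))
            position += sizes[index]       # contiguous within a chunk or run
            index += 1
    return locations
```

When entry_count is one, the running position simply continues across chunks or runs, which reflects the contiguous storage described above.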

8.8 Movie Fragments

8.8.1 Movie Extends Box

8.8.1.1 Definition

Box Type: ‘mvex’
Container: Movie Box (‘moov’)
Mandatory: No
Quantity: Zero or one

This box warns readers that there might be Movie Fragment Boxes in this file. To know of all samples in the tracks, these Movie Fragment Boxes must be found and scanned in order, and their information logically added to that found in the Movie Box.

There is a narrative introduction to Movie Fragments in Annex A.

8.8.1.2 Syntax

aligned(8) class MovieExtendsBox extends Box(‘mvex’){ }

8.8.2 Movie Extends Header Box

8.8.2.1 Definition

Box Type: ‘mehd’
Container: Movie Extends Box (‘mvex’)
Mandatory: No
Quantity: Zero or one

The Movie Extends Header is optional, and provides the overall duration, including fragments, of a fragmented movie. If this box is not present, the overall duration must be computed by examining each fragment.

8.8.2.2 Syntax

aligned(8) class MovieExtendsHeaderBox extends FullBox(‘mehd’, version, 0) {
   if (version==1) {
      unsigned int(64) fragment_duration;
   } else { // version==0
      unsigned int(32) fragment_duration;
   }
}

8.8.2.3 Semantics

fragment_duration is an integer that declares the length of the presentation of the whole movie including fragments (in the timescale indicated in the Movie Header Box). The value of this field corresponds to the duration of the longest track, including movie fragments. If an MP4 file is created in real‐time, such as used in live streaming, it is not likely that the fragment_duration is known in advance and this box may be omitted.

8.8.3 Track Extends Box

8.8.3.1 Definition

Box Type: ‘trex’
Container: Movie Extends Box (‘mvex’)
Mandatory: Yes
Quantity: Exactly one for each track in the Movie Box

This sets up default values used by the movie fragments. By setting defaults in this way, space and complexity can be saved in each Track Fragment Box.

The sample flags field in sample fragments (default_sample_flags here and in a Track Fragment Header Box, and sample_flags and first_sample_flags in a Track Fragment Run Box) is coded as a 32‐bit value. It has the following structure:

bit(4) reserved=0;
unsigned int(2) is_leading;
unsigned int(2) sample_depends_on;
unsigned int(2) sample_is_depended_on;
unsigned int(2) sample_has_redundancy;
bit(3) sample_padding_value;
bit(1) sample_is_non_sync_sample;
unsigned int(16) sample_degradation_priority;

The is_leading, sample_depends_on, sample_is_depended_on and sample_has_redundancy values are defined as documented in the Independent and Disposable Samples Box.

The flag sample_is_non_sync_sample provides the same information as the sync sample table [8.6.2]. When this value is set to 0 for a sample, it is the same as if the sample were not in a movie fragment and marked with an entry in the sync sample table (or, if all samples are sync samples, the sync sample table were absent).

The sample_padding_value is defined as for the padding bits table. The sample_degradation_priority is defined as for the degradation priority table.
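The 32‐bit sample flags structure can be unpacked as in the informative sketch below. It assumes the value has already been read as a big‐endian unsigned integer; the function name decode_sample_flags and the dictionary keys are hypothetical conveniences.

```python
from typing import Dict


def decode_sample_flags(flags: int) -> Dict[str, int]:
    """Split a 32-bit sample flags value into its fields (MSB first)."""
    return {
        "is_leading": (flags >> 26) & 0x3,                  # after 4 reserved bits
        "sample_depends_on": (flags >> 24) & 0x3,
        "sample_is_depended_on": (flags >> 22) & 0x3,
        "sample_has_redundancy": (flags >> 20) & 0x3,
        "sample_padding_value": (flags >> 17) & 0x7,
        "sample_is_non_sync_sample": (flags >> 16) & 0x1,
        "sample_degradation_priority": flags & 0xFFFF,
    }
```

For example, the value 0x02000000 decodes to sample_depends_on equal to 2 with sample_is_non_sync_sample equal to 0, that is, an independently decodable sync sample.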

8.8.3.2 Syntax

aligned(8) class TrackExtendsBox extends FullBox(‘trex’, 0, 0){
   unsigned int(32) track_ID;
   unsigned int(32) default_sample_description_index;
   unsigned int(32) default_sample_duration;
   unsigned int(32) default_sample_size;
   unsigned int(32) default_sample_flags;
}

8.8.3.3 Semantics

track_ID identifies the track; this shall be the track ID of a track in the Movie Box

default_ these fields set up defaults used in the track fragments.

8.8.4 Movie Fragment Box

8.8.4.1 Definition

Box Type: ‘moof’
Container: File
Mandatory: No
Quantity: Zero or more

The movie fragments extend the presentation in time. They provide the information that would previously have been in the Movie Box. The actual samples are in Media Data Boxes, as usual, if they are in the same file. The data reference index is in the sample description, so it is possible to build incremental presentations where the media data is in files other than the file containing the Movie Box.

The Movie Fragment Box is a top‐level box, (i.e. a peer to the Movie Box and Media Data boxes). It contains a Movie Fragment Header Box, and then one or more Track Fragment Boxes.

NOTE There is no requirement that any particular movie fragment extend all tracks present in the movie header, and there is no restriction on the location of the media data referred to by the movie fragments. However, derived specifications may make such restrictions.

8.8.4.2 Syntax

aligned(8) class MovieFragmentBox extends Box(‘moof’){ }

8.8.5 Movie Fragment Header Box

8.8.5.1 Definition

Box Type: ‘mfhd’
Container: Movie Fragment Box (‘moof’)
Mandatory: Yes
Quantity: Exactly one

The movie fragment header contains a sequence number, as a safety check. The sequence number usually starts at 1 and increases for each movie fragment in the file, in the order in which they occur. This allows readers to verify integrity of the sequence in environments where undesired re‐ordering might occur.

8.8.5.2 Syntax

aligned(8) class MovieFragmentHeaderBox
   extends FullBox(‘mfhd’, 0, 0){
   unsigned int(32) sequence_number;
}

8.8.5.3 Semantics

sequence_number a number associated with this fragment

8.8.6 Track Fragment Box

8.8.6.1 Definition

Box Type: ‘traf’
Container: Movie Fragment Box (‘moof’)
Mandatory: No
Quantity: Zero or more

Within the movie fragment there is a set of track fragments, zero or more per track. The track fragments in turn contain zero or more track runs, each of which document a contiguous run of samples for that track. Within these structures, many fields are optional and can be defaulted.

It is possible to add 'empty time' to a track using these structures, as well as adding samples. Empty inserts can be used in audio tracks doing silence suppression, for example.

8.8.6.2 Syntax

aligned(8) class TrackFragmentBox extends Box(‘traf’){ }

8.8.7 Track Fragment Header Box

8.8.7.1 Definition

BoxType: ‘tfhd’Container: TrackFragmentBox('traf')Mandatory:YesQuantity: Exactlyone

Eachmoviefragmentcanaddzeroormorefragmentstoeachtrack;andatrackfragmentcanaddzeroormorecontiguousrunsofsamples.Thetrackfragmentheadersetsupinformationanddefaultsusedforthoserunsofsamples.

Thebase‐data‐offset,ifexplicitlyprovided,isadataoffsetthatisidenticaltoachunkoffsetintheChunkOffset Box, i.e. applying to the complete file (e.g. starting with a file‐type box and movie box). Incircumstanceswhenthecompletefiledoesnotexistoritssizeisunknown,itmaybeimpossibletouseanexplicitbase‐data‐offset;then,offsetsneedtobeestablishedrelativetothemoviefragment.

Thefollowingflagsaredefinedinthetf_flags:

0x000001base‐data‐offset‐present: indicates the presence of the base‐data‐offset field. Thisprovidesanexplicitanchorforthedataoffsetsineachtrackrun(seebelow).Ifnotprovidedandif the default‐base‐is‐moof flag is not set, the base‐data‐offset for the first track in themovie

Page 72: INTERNATIONAL ISO/IEC STANDARD 14496-12 · 2018-06-07 · electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address

ISO/IEC 14496-12:2015(E)

58 ©ISO/IEC2015–Allrightsreserved

fragmentis thepositionofthefirstbyteoftheenclosingMovieFragmentBox,andforsecondand subsequent track fragments, the default is the end of the data defined by the precedingtrack fragment. Fragments 'inheriting' their offset in this way must all use the same data‐reference(i.e.,thedataforthesetracksmustbeinthesamefile)

0x000002 sample-description-index-present: indicates the presence of this field, which over-rides, in this fragment, the default set up in the Track Extends Box.

0x000008 default-sample-duration-present

0x000010 default-sample-size-present

0x000020 default-sample-flags-present

0x010000 duration-is-empty: this indicates that the duration provided in either default-sample-duration, or by the default-duration in the Track Extends Box, is empty, i.e. that there are no samples for this time interval. It is an error to make a presentation that has both edit lists in the Movie Box, and empty-duration fragments.

0x020000 default-base-is-moof: if base-data-offset-present is 1, this flag is ignored. If base-data-offset-present is zero, this indicates that the base-data-offset for this track fragment is the position of the first byte of the enclosing Movie Fragment Box. Support for the default-base-is-moof flag is required under the ‘iso5’ brand, and it shall not be used in brands or compatible brands earlier than iso5.

NOTE The use of the default-base-is-moof flag breaks the compatibility to earlier brands of the file format, because it sets the anchor point for offset calculation differently than earlier. Therefore, the default-base-is-moof flag cannot be set when earlier brands are included in the File Type box.
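The base-data-offset rules above can be sketched as a small resolver. This is an illustrative sketch only, not part of the specification: the function name, the dictionary layout, and the `data_end_rel` key (the position, relative to a fragment's base, where the data it defines ends) are assumptions made for the example.

```python
BASE_DATA_OFFSET_PRESENT = 0x000001
DEFAULT_BASE_IS_MOOF = 0x020000


def resolve_base_offsets(moof_start, trafs):
    """Return the base-data-offset of each track fragment in one movie fragment.

    moof_start: file position of the first byte of the enclosing 'moof'.
    trafs: list of dicts (hypothetical layout) with 'tf_flags', an optional
           'base_data_offset', and 'data_end_rel'.
    """
    bases = []
    prev_end = None  # end of the data defined by the preceding track fragment
    for traf in trafs:
        flags = traf["tf_flags"]
        if flags & BASE_DATA_OFFSET_PRESENT:
            # Explicit anchor; default-base-is-moof is ignored in this case.
            base = traf["base_data_offset"]
        elif (flags & DEFAULT_BASE_IS_MOOF) or prev_end is None:
            # Flag set, or first track fragment: anchor at the start of 'moof'.
            base = moof_start
        else:
            # Later fragments inherit the end of the preceding fragment's data.
            base = prev_end
        bases.append(base)
        prev_end = base + traf["data_end_rel"]
    return bases
```

A reader would typically compute these bases once per movie fragment before resolving the data_offset of each track run.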

8.8.7.2 Syntax

aligned(8) class TrackFragmentHeaderBox
      extends FullBox(‘tfhd’, 0, tf_flags){
   unsigned int(32) track_ID;
   // all the following are optional fields
   unsigned int(64) base_data_offset;
   unsigned int(32) sample_description_index;
   unsigned int(32) default_sample_duration;
   unsigned int(32) default_sample_size;
   unsigned int(32) default_sample_flags
}

8.8.7.3 Semantics

base_data_offset the base offset to use when calculating data offsets
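As an illustration of how the tf_flags gate the optional fields, the following sketch reads a Track Fragment Header Box. It assumes the caller has already stripped the 8-byte box header (size and type), so that `payload` starts at the FullBox version byte; the function name and the returned dictionary are hypothetical.

```python
import struct


def parse_tfhd(payload: bytes) -> dict:
    """Parse a 'tfhd' payload that starts at the FullBox version/flags word."""
    version_flags, track_id = struct.unpack_from(">II", payload, 0)
    tf_flags = version_flags & 0xFFFFFF
    out = {"track_ID": track_id, "tf_flags": tf_flags}
    pos = 8
    # Optional fields appear in syntax order, each only if its flag bit is set.
    optional = [
        (0x000001, "base_data_offset", ">Q"),
        (0x000002, "sample_description_index", ">I"),
        (0x000008, "default_sample_duration", ">I"),
        (0x000010, "default_sample_size", ">I"),
        (0x000020, "default_sample_flags", ">I"),
    ]
    for bit, name, fmt in optional:
        if tf_flags & bit:
            (out[name],) = struct.unpack_from(fmt, payload, pos)
            pos += struct.calcsize(fmt)
    return out
```

Fields whose flag is not set are simply absent from the result, so the caller can fall back to the defaults in the Track Extends Box.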

8.8.8 Track Fragment Run Box

8.8.8.1 Definition

Box Type: ‘trun’
Container: Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or more

Within the Track Fragment Box, there are zero or more Track Run Boxes. If the duration-is-empty flag is set in the tf_flags, there are no track runs. A track run documents a contiguous set of samples for a track.

The number of optional fields is determined from the number of bits set in the lower byte of the flags, and the size of a record from the bits set in the second byte of the flags. This procedure shall be followed, to allow for new fields to be defined.

If the data-offset is not present, then the data for this run starts immediately after the data of the previous run, or at the base-data-offset defined by the track fragment header if this is the first run in a track fragment. If the data-offset is present, it is relative to the base-data-offset established in the track fragment header.

The following flags are defined:

0x000001 data-offset-present.

0x000004 first-sample-flags-present; this over-rides the default flags for the first sample only. This makes it possible to record a group of frames where the first is a key and the rest are difference frames, without supplying explicit flags for every sample. If this flag and field are used, sample-flags shall not be present.

0x000100 sample-duration-present: indicates that each sample has its own duration, otherwise the default is used.

0x000200 sample-size-present: each sample has its own size, otherwise the default is used.

0x000400 sample-flags-present; each sample has its own flags, otherwise the default is used.

0x000800 sample-composition-time-offsets-present; each sample has a composition time offset (e.g. as used for I/P/B video in MPEG).

The composition offset values in the composition time-to-sample box and in the track run box may be signed or unsigned. The recommendations given in the composition time-to-sample box concerning the use of signed composition offsets also apply here.

8.8.8.2 Syntax

aligned(8) class TrackRunBox
      extends FullBox(‘trun’, version, tr_flags) {
   unsigned int(32) sample_count;
   // the following are optional fields
   signed int(32) data_offset;
   unsigned int(32) first_sample_flags;
   // all fields in the following array are optional
   {
      unsigned int(32) sample_duration;
      unsigned int(32) sample_size;
      unsigned int(32) sample_flags
      if (version == 0)
         { unsigned int(32) sample_composition_time_offset; }
      else
         { signed int(32) sample_composition_time_offset; }
   }[ sample_count ]
}

8.8.8.3 Semantics

sample_count the number of samples being added in this run; also the number of rows in the following table (the rows can be empty)

data_offset is added to the implicit or explicit data_offset established in the track fragment header.

first_sample_flags provides a set of flags for the first sample only of this run.
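A track run could be read along the following lines. This is a sketch under stated assumptions, not a reference parser: `payload` is assumed to start at the FullBox version byte (the 8-byte box header already removed), and the function name and result layout are hypothetical. The per-sample record size is derived from the bits set in the second byte of the flags, as described in the definition.

```python
import struct


def parse_trun(payload: bytes) -> dict:
    """Parse a 'trun' payload that starts at the FullBox version byte."""
    version = payload[0]
    tr_flags = int.from_bytes(payload[1:4], "big")
    (sample_count,) = struct.unpack_from(">I", payload, 4)
    pos = 8
    out = {
        "sample_count": sample_count,
        # One 32-bit field per bit set in the second byte of the flags.
        "record_size": 4 * bin((tr_flags >> 8) & 0xFF).count("1"),
        "samples": [],
    }
    if tr_flags & 0x000001:  # data-offset-present (signed)
        (out["data_offset"],) = struct.unpack_from(">i", payload, pos)
        pos += 4
    if tr_flags & 0x000004:  # first-sample-flags-present
        (out["first_sample_flags"],) = struct.unpack_from(">I", payload, pos)
        pos += 4
    # Composition offsets are unsigned in version 0, signed otherwise.
    cto_fmt = ">I" if version == 0 else ">i"
    fields = [
        (0x000100, "sample_duration", ">I"),
        (0x000200, "sample_size", ">I"),
        (0x000400, "sample_flags", ">I"),
        (0x000800, "sample_composition_time_offset", cto_fmt),
    ]
    for _ in range(sample_count):
        row = {}
        for bit, name, fmt in fields:
            if tr_flags & bit:
                (row[name],) = struct.unpack_from(fmt, payload, pos)
                pos += 4
        out["samples"].append(row)
    return out
```

Rows are empty dictionaries when no per-sample flag is set, which matches the remark that the rows of the table can be empty.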

8.8.9 Movie Fragment Random Access Box

8.8.9.1 Definition

Box Type: ‘mfra’
Container: File
Mandatory: No
Quantity: Zero or one

The Movie Fragment Random Access Box (‘mfra’) provides a table which may assist readers in finding sync samples in a file using movie fragments. It contains a track fragment random access box for each track for which information is provided (which may not be all tracks). It is usually placed at or near the end of the file; the last box within the Movie Fragment Random Access Box provides a copy of the length field from the Movie Fragment Random Access Box. Readers may attempt to find this box by examining the last 32 bits of the file, or scanning backwards from the end of the file for a Movie Fragment Random Access Offset Box and using the size information in it, to see if that locates the beginning of a Movie Fragment Random Access Box.

This box provides only a hint as to where sync samples are; the movie fragments themselves are definitive. It is recommended that readers take care in both locating and using this box as modifications to the file after it was created may render either the pointers, or the declaration of sync samples, incorrect.

8.8.9.2 Syntax

aligned(8) class MovieFragmentRandomAccessBox extends Box(‘mfra’) { }

8.8.10 Track Fragment Random Access Box

8.8.10.1 Definition

Box Type: ‘tfra’
Container: Movie Fragment Random Access Box (‘mfra’)
Mandatory: No
Quantity: Zero or one per track

Each entry contains the location and the presentation time of the sync sample. Note that not every sync sample in the track needs to be listed in the table.

The absence of this box does not mean that all the samples are sync samples. Random access information in the ‘trun’, ‘traf’ and ‘trex’ shall be set appropriately regardless of the presence of this box.

8.8.10.2 Syntax

aligned(8) class TrackFragmentRandomAccessBox
   extends FullBox(‘tfra’, version, 0) {
   unsigned int(32) track_ID;
   const unsigned int(26) reserved = 0;
   unsigned int(2) length_size_of_traf_num;
   unsigned int(2) length_size_of_trun_num;
   unsigned int(2) length_size_of_sample_num;
   unsigned int(32) number_of_entry;
   for(i=1; i <= number_of_entry; i++){
      if(version==1){
         unsigned int(64) time;
         unsigned int(64) moof_offset;
      }else{
         unsigned int(32) time;
         unsigned int(32) moof_offset;
      }
      unsigned int((length_size_of_traf_num+1) * 8) traf_number;
      unsigned int((length_size_of_trun_num+1) * 8) trun_number;
      unsigned int((length_size_of_sample_num+1) * 8) sample_number;
   }
}
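The variable-width number fields in this syntax can be read as in the sketch below. It is illustrative only: `payload` is assumed to begin at the FullBox version byte, and the function name and tuple layout are hypothetical. The three 2-bit length fields occupy the low six bits of the 32-bit word that follows track_ID.

```python
import struct


def parse_tfra(payload: bytes) -> dict:
    """Parse a 'tfra' payload that starts at the FullBox version byte."""
    version = payload[0]
    track_id, sizes, count = struct.unpack_from(">III", payload, 4)
    # traf, trun and sample number widths in bytes: each 2-bit field plus one.
    lengths = [((sizes >> shift) & 0x3) + 1 for shift in (4, 2, 0)]
    pos = 16
    entries = []
    for _ in range(count):
        fmt = ">QQ" if version == 1 else ">II"
        time, moof_offset = struct.unpack_from(fmt, payload, pos)
        pos += struct.calcsize(fmt)
        numbers = []
        for length in lengths:
            numbers.append(int.from_bytes(payload[pos:pos + length], "big"))
            pos += length
        # (time, moof_offset, traf_number, trun_number, sample_number)
        entries.append((time, moof_offset, *numbers))
    return {"track_ID": track_id, "entries": entries}
```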

8.8.10.3 Semantics

track_ID is an integer identifying the track_ID.

length_size_of_traf_num indicates the length in bytes of the traf_number field minus one.

length_size_of_trun_num indicates the length in bytes of the trun_number field minus one.

length_size_of_sample_num indicates the length in bytes of the sample_number field minus one.

number_of_entry is an integer that gives the number of the entries for this track. If this value is zero, it indicates that every sample is a sync sample and no table entry follows.

time is a 32- or 64-bit integer that indicates the presentation time of the sync sample in units defined in the ‘mdhd’ of the associated track.

moof_offset is a 32- or 64-bit integer that gives the offset of the ‘moof’ used in this entry. Offset is the byte-offset between the beginning of the file and the beginning of the ‘moof’.

traf_number indicates the ‘traf’ number that contains the sync sample. The number ranges from 1 (the first ‘traf’ is numbered 1) in each ‘moof’.

trun_number indicates the ‘trun’ number that contains the sync sample. The number ranges from 1 in each ‘traf’.

sample_number indicates the sample number of the sync sample. The number ranges from 1 in each ‘trun’.

8.8.11 Movie Fragment Random Access Offset Box

8.8.11.1 Definition

Box Type: ‘mfro’
Container: Movie Fragment Random Access Box (‘mfra’)
Mandatory: Yes
Quantity: Exactly one

The Movie Fragment Random Access Offset Box provides a copy of the length field from the enclosing Movie Fragment Random Access Box. It is placed last within that box, so that the size field is also last in the enclosing Movie Fragment Random Access Box. When the Movie Fragment Random Access Box is also last in the file this permits its easy location. The size field here must be correct. However, neither the presence of the Movie Fragment Random Access Box, nor its placement last in the file, are assured.

8.8.11.2 Syntax

aligned(8) class MovieFragmentRandomAccessOffsetBox
   extends FullBox(‘mfro’, version, 0) {
   unsigned int(32) size;
}

8.8.11.3 Semantics

size is an integer that gives the number of bytes of the enclosing ‘mfra’ box. This field is placed at the end of the enclosing box to assist readers scanning from the end of the file in finding the ‘mfra’ box.
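One way a reader might use this field is sketched below. The sketch assumes that the ‘mfra’ box is the last box in the file and uses a 32-bit box size; neither is guaranteed, so the candidate position is checked before it is trusted. The function name is hypothetical.

```python
import io
import struct


def find_mfra_offset(f):
    """Return the file offset of a trailing 'mfra' box, or None if not found."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    if file_size < 16:
        return None
    # An 'mfro' box is 16 bytes: size, type, version/flags, then the size field.
    f.seek(file_size - 16)
    mfro = f.read(16)
    box_size, box_type = struct.unpack(">I4s", mfro[:8])
    if box_size != 16 or box_type != b"mfro":
        return None
    (mfra_size,) = struct.unpack(">I", mfro[12:16])
    start = file_size - mfra_size
    # The smallest possible 'mfra' is its 8-byte header plus the 'mfro' box.
    if mfra_size < 24 or start < 0:
        return None
    f.seek(start)
    size, box_type = struct.unpack(">I4s", f.read(8))
    if box_type != b"mfra" or size != mfra_size:
        return None
    return start
```

Returning None rather than raising reflects that the box is only a hint: a reader that fails to find it can still walk the movie fragments directly.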

8.8.12 Track fragment decode time

8.8.12.1 Definition

Box Type: ‘tfdt’
Container: Track Fragment box (‘traf’)
Mandatory: No
Quantity: Zero or one

The Track Fragment Base Media Decode Time Box provides the absolute decode time, measured on the media timeline, of the first sample in decode order in the track fragment. This can be useful, for example, when performing random access in a file; it is not necessary to sum the sample durations of all preceding samples in previous fragments to find this value (where the sample durations are the deltas in the Decoding Time to Sample Box and the sample_durations in the preceding track runs).

The Track Fragment Base Media Decode Time Box, if present, shall be positioned after the Track Fragment Header Box and before the first Track Fragment Run box.

NOTE The decode timeline is a media timeline, established before any explicit or implied mapping of media time to presentation time, for example by an edit list or similar structure.

If the time expressed in the track fragment decode time (‘tfdt’) box exceeds the sum of the durations of the samples in the preceding movie and movie fragments, then the duration of the last sample preceding this track fragment is extended such that the sum now equals the time given in this box. In this way, it is possible to generate a fragment containing a sample when the time of the next sample is not yet known.

In particular, an empty track fragment (with no samples, but with a track fragment decode time box) may be used to establish the duration of the last sample.

8.8.12.2 Syntax

aligned(8) class TrackFragmentBaseMediaDecodeTimeBox
   extends FullBox(‘tfdt’, version, 0) {
   if (version==1) {
      unsigned int(64) baseMediaDecodeTime;
   } else { // version==0
      unsigned int(32) baseMediaDecodeTime;
   }
}

8.8.12.3 Semantics

version is an integer that specifies the version of this box (0 or 1 in this specification).

baseMediaDecodeTime is an integer equal to the sum of the decode durations of all earlier samples in the media, expressed in the media's timescale. It does not include the samples added in the enclosing track fragment.
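The extension rule for the last preceding sample can be illustrated with a short sketch. The function name and the plain list of durations are assumptions made for the example; a real reader would hold durations in its own sample table.

```python
def extend_last_duration(durations, base_media_decode_time):
    """Apply the 'tfdt' extension rule to the samples preceding a fragment.

    durations: decode durations of all samples before the track fragment.
    Returns a new list in which the last duration is stretched so that the
    sum equals base_media_decode_time, when that time exceeds the sum.
    """
    total = sum(durations)
    if durations and base_media_decode_time > total:
        gap = base_media_decode_time - total
        return durations[:-1] + [durations[-1] + gap]
    return list(durations)
```

For example, three samples of 1000 ticks followed by a fragment whose baseMediaDecodeTime is 3500 leave the last sample with a duration of 1500.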

8.8.13 Level Assignment Box

8.8.13.1 Definition

Box Type: ‘leva’
Container: Movie Extends Box (‘mvex’)
Mandatory: No
Quantity: Zero or one

Levels specify subsets of the file. Samples mapped to level n may depend on any samples of levels m, where m <= n, and shall not depend on any samples of levels p, where p > n. For example, levels can be specified according to temporal level (e.g., temporal_id of SVC or MVC).

Levels cannot be specified for the initial movie. When the Level Assignment box is present, it applies to all movie fragments subsequent to the initial movie.

For the context of the Level Assignment box, a fraction is defined to consist of one or more Movie Fragment boxes and the associated Media Data boxes, possibly including only an initial part of the last Media Data Box. Within a fraction, data for each level shall appear contiguously. Data for levels within a fraction shall appear in increasing order of level value. All data in a fraction shall be assigned to levels.

NOTE In the context of DASH (ISO/IEC 23009-1), each subsegment indexed within a Subsegment Index box is a fraction.

The Level Assignment box provides a mapping from features, such as scalability layers, to levels. A feature can be specified through a track, a sub-track within a track, or a sample grouping of a track.

When padding_flag is equal to 1 this indicates that a conforming fraction can be formed by concatenating any positive integer number of levels within a fraction and padding the last Media Data box by zero bytes up to the full size that is indicated in the header of the last Media Data box. For example, padding_flag can be set equal to 1 when the following conditions are true:

Each fraction contains two or more AVC, SVC, or MVC [ISO/IEC 14496-15] tracks of the same video bitstream.

The samples for each track of a fraction are contiguous and in decoding order in a Media Data box.

The samples of the first AVC, SVC, or MVC level contain extractor NAL units for including the video coding NAL units from the other levels of the same fraction.

8.8.13.2 Syntax

aligned(8) class LevelAssignmentBox extends FullBox(‘leva’, 0, 0)
{
   unsigned int(8) level_count;
   for (j=1; j <= level_count; j++) {
      unsigned int(32) track_id;
      unsigned int(1) padding_flag;
      unsigned int(7) assignment_type;
      if (assignment_type == 0) {
         unsigned int(32) grouping_type;
      }
      else if (assignment_type == 1) {
         unsigned int(32) grouping_type;
         unsigned int(32) grouping_type_parameter;
      }
      else if (assignment_type == 2) {}
         // no further syntax elements needed
      else if (assignment_type == 3) {}
         // no further syntax elements needed
      else if (assignment_type == 4) {
         unsigned int(32) sub_track_id;
      }
      // other assignment_type values are reserved
   }
}

8.8.13.3 Semantics

level_count specifies the number of levels each fraction is grouped into. level_count shall be greater than or equal to 2.

track_id for loop entry j specifies the track identifier of the track assigned to level j.

padding_flag equal to 1 indicates that a conforming fraction can be formed by concatenating any positive integer number of levels within a fraction and padding the last Media Data box by zero bytes up to the full size that is indicated in the header of the last Media Data box. The semantics of padding_flag equal to 0 are that this is not assured.

assignment_type indicates the mechanism used to specify the assignment to a level. assignment_type values greater than 4 are reserved, while the semantics for the other values are specified as follows. The sequence of assignment_types is restricted to be a set of zero or more of type 2 or 3, followed by zero or more of exactly one type.

   0: sample groups are used to specify levels, i.e., samples mapped to different sample group description indexes of a particular sample grouping lie in different levels within the identified track; other tracks are not affected and must have all their data in precisely one level;

   1: as for assignment_type 0 except assignment is by a parameterized sample group;

   2, 3: level assignment is by track (see the Subsegment Index Box for the difference in processing of these levels)

   4: the respective level contains the samples for a sub-track. The sub-tracks are specified through the Sub Track box; other tracks are not affected and must have all their data in precisely one level;

grouping_type and grouping_type_parameter, if present, specify the sample grouping used to map sample group description entries in the Sample Group Description box to levels. Level n contains the samples that are mapped to the sample group description entry having index n in the Sample Group Description box having the same values of grouping_type and grouping_type_parameter, if present, as those provided in this box.

sub_track_id specifies that the sub-track identified by sub_track_id within loop entry j is mapped to level j.
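The conditional layout of each loop entry can be illustrated with the following sketch. It assumes `payload` starts at the FullBox version byte and keeps grouping_type as raw four-character bytes; the function name and dictionary keys other than the field names are hypothetical.

```python
import struct


def parse_leva(payload: bytes) -> list:
    """Parse a 'leva' payload that starts at the FullBox version byte."""
    level_count = payload[4]
    pos = 5
    levels = []
    for _ in range(level_count):
        track_id, packed = struct.unpack_from(">IB", payload, pos)
        pos += 5
        entry = {
            "track_id": track_id,
            "padding_flag": packed >> 7,       # top bit
            "assignment_type": packed & 0x7F,  # low seven bits
        }
        kind = entry["assignment_type"]
        if kind in (0, 1):
            entry["grouping_type"] = payload[pos:pos + 4]
            pos += 4
            if kind == 1:
                (entry["grouping_type_parameter"],) = struct.unpack_from(
                    ">I", payload, pos)
                pos += 4
        elif kind == 4:
            (entry["sub_track_id"],) = struct.unpack_from(">I", payload, pos)
            pos += 4
        elif kind > 4:
            raise ValueError("reserved assignment_type %d" % kind)
        # Types 2 and 3 carry no further syntax elements.
        levels.append(entry)
    return levels
```

Entry j of the returned list (counting from 1) describes level j.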

8.8.14 Sample Auxiliary Information in Movie Fragments

When sample auxiliary information (8.7.8 and 8.7.9) is present in the Movie Fragment box, the offsets in the Sample Auxiliary Information Offsets Box are treated the same as the data_offset in the Track Fragment Run box, that is, they are relative to any base data offset established for that track fragment. If movie fragment relative addressing is used (no base data offset is provided in the track fragment header) and auxiliary information is present, then the default_base_is_moof flag must also be set in the flags of that track fragment header.

If only one offset is provided, then the Sample Auxiliary Information for all the track runs in the fragment is stored contiguously, otherwise exactly one offset must be provided for each track run.

If the field default_sample_info_size is non-zero in one of these boxes, then the size of the auxiliary information is constant for the identified samples.

In addition, if:

this box is present in the movie box,

and default_sample_info_size is non-zero in the box in the movie box,

and the sample auxiliary information sizes box is absent in a movie fragment,

then the auxiliary information has this same constant size for every sample in the movie fragment also; it is then not necessary to repeat the box in the movie fragment.

8.8.15 Track Extension Properties Box

8.8.15.1 Definition

Box Type: ‘trep’
Container: Movie Extends Box (‘mvex’)
Mandatory: No
Quantity: Zero or more. (Zero or one per track)

This box can be used to document or summarize characteristics of the track in the subsequent movie fragments. It may contain any number of child boxes.

8.8.15.2 Syntax

class TrackExtensionPropertiesBox extends FullBox(‘trep’, 0, 0) {
   unsigned int(32) track_id;
   // Any number of boxes may follow
}

8.8.15.3 Semantics

track_id indicates the track for which the track extension properties are provided in this box.

8.8.16 Alternative Startup Sequence Properties Box

8.8.16.1 Definition

Box Type: ‘assp’
Container: Track Extension Properties Box (‘trep’)
Mandatory: No
Quantity: Zero or one

This box indicates the properties of alternative startup sequence sample groups in the subsequent track fragments of the track indicated in the containing Track Extension Properties box.

Version 0 of the Alternative Startup Sequence Properties box shall be used if version 0 of the Sample to Group box is used for the alternative startup sequence sample grouping. Version 1 of the Alternative Startup Sequence Properties box shall be used if version 1 of the Sample to Group box is used for the alternative startup sequence sample grouping.

8.8.16.2 Syntax

class AlternativeStartupSequencePropertiesBox extends FullBox(‘assp’, version, 0) {
   if (version == 0) {
      signed int(32) min_initial_alt_startup_offset;
   }
   else if (version == 1) {
      unsigned int(32) num_entries;
      for (j=1; j <= num_entries; j++) {
         unsigned int(32) grouping_type_parameter;
         signed int(32) min_initial_alt_startup_offset;
      }
   }
}

8.8.16.3 Semantics

min_initial_alt_startup_offset: No value of sample_offset[1] of the referred sample group description entries of the alternative startup sequence sample grouping shall be smaller than min_initial_alt_startup_offset. In version 0 of this box, the alternative startup sequence sample grouping using version 0 of the Sample to Group box is referred to. In version 1 of this box, the alternative startup sequence sample grouping using version 1 of the Sample to Group box is referred to as further constrained by grouping_type_parameter.

num_entries indicates the number of alternative startup sequence sample groupings documented in this box.

grouping_type_parameter indicates which one of the alternative sample groupings this loop entry applies to.

8.8.17 Metadata and user data in movie fragments

When meta boxes occur in movie fragment or track fragment boxes, the following applies. The file must have been fragmented such that any meta-data needed in the movie or track fragment is formed from the union of the meta-data in the movie box and the fragment, not considering or using meta-data in any other fragment. Meta-data in a movie or track fragment is logically ‘arriving late’ but is valid for the entire track. When a file is de-fragmented, the meta-data in the movie or track fragments must be merged into the movie or track boxes, respectively. This process allows for ‘just in time’ delivery of support resources, and bandwidth management, while preserving the essentially atemporal nature of untimed meta-data. If meta-data truly changes over time, a timed meta-data track may be needed.

If, during this merge, there are either (a) meta-data items with the same item_ID or (b) user-data items with the same type, then the following applies:

a) all occurrences of the data (user-data box or meta-data item) must be ‘true’ for the entire movie including all fragments;

b) the occurrences in higher-numbered movie fragments (‘later’ occurrences) may be more accurate or ‘preferred’;

c) in particular, data in an empty initial movie atom may be only estimates or ‘not to exceed’ values, and data in a final otherwise empty movie fragment may be the ‘final’ or most accurate values.
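One possible merge policy consistent with items a) to c) is to let later occurrences replace earlier ones during de-fragmentation. The sketch below assumes that policy; the text only says later occurrences may be preferred, so this is a design choice rather than a requirement. The function name and the use of dictionaries keyed by item_ID or user-data type are hypothetical.

```python
def merge_fragment_metadata(movie_items, fragment_items_in_order):
    """Merge keyed meta-data or user-data items when de-fragmenting.

    movie_items: dict of items from the movie (or track) box.
    fragment_items_in_order: list of dicts, one per movie fragment, in
    increasing fragment order. Later occurrences of a key replace earlier
    ones (an assumed policy).
    """
    merged = dict(movie_items)
    for items in fragment_items_in_order:
        merged.update(items)
    return merged
```

With this policy an estimate in the initial movie box is replaced by the value carried in a final, otherwise empty, movie fragment.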

8.9 Sample Group Structures

8.9.1 Introduction

This clause specifies a generic mechanism for representing a partition of the samples in a track. A sample grouping is an assignment of each sample in a track to be a member of one sample group, based on a grouping criterion. A sample group in a sample grouping is not limited to being contiguous samples and may contain non-adjacent samples. As there may be more than one sample grouping for the samples in a track, each sample grouping has a type field to indicate the type of grouping. For example, a file might contain two sample groupings for the same track: one based on an assignment of sample to layers and another to sub-sequences.

Sample groupings are represented by two linked data structures: (1) a SampleToGroup box represents the assignment of samples to sample groups; (2) a SampleGroupDescription box contains a sample group entry for each sample group describing the properties of the group. There may be multiple instances of the SampleToGroup and SampleGroupDescription boxes based on different grouping criteria. These are distinguished by a type field used to indicate the type of grouping.

A grouping of a particular grouping type may use a parameter in the sample to group mapping; if so, the meaning of the parameter must be documented with the group. An example of this might be documenting the sync points in a multiplex of several video streams; the group definition might be ‘Is an I frame’, and the group parameter might be the identifier of each stream. Since the sample to group box occurs once for each stream, it is now both compact, and informs the reader about each stream separately.

One example of using these tables is to represent the assignments of samples to layers. In this case each sample group represents one layer, with an instance of the SampleToGroup box describing which layer a sample belongs to.

8.9.2 Sample to Group Box

8.9.2.1 Definition

Box Type: ‘sbgp’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or more.

This table can be used to find the group that a sample belongs to and the associated description of that sample group. The table is compactly coded with each entry giving the index of the first sample of a run of samples with the same sample group descriptor. The sample group description ID is an index that refers to a SampleGroupDescription box, which contains entries describing the characteristics of each sample group.

There may be multiple instances of this box if there is more than one sample grouping for the samples in a track. Each instance of the SampleToGroup box has a type code that distinguishes different sample groupings. There shall be at most one instance of this box with a particular grouping type in a Sample Table Box or Track Fragment Box. The associated SampleGroupDescription shall indicate the same value for the grouping type.

Version 1 of this box should only be used if a grouping type parameter is needed.

8.9.2.2 Syntax

aligned(8) class SampleToGroupBox
   extends FullBox(‘sbgp’, version, 0)
{
   unsigned int(32) grouping_type;
   if (version == 1) {
      unsigned int(32) grouping_type_parameter;
   }
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++)
   {
      unsigned int(32) sample_count;
      unsigned int(32) group_description_index;
   }
}

8.9.2.3 Semantics

version is an integer that specifies the version of this box, either 0 or 1.

grouping_type is an integer that identifies the type (i.e. criterion used to form the sample groups) of the sample grouping and links it to its sample group description table with the same value for grouping type. At most one occurrence of this box with the same value for grouping_type (and, if used, grouping_type_parameter) shall exist for a track.

grouping_type_parameter is an indication of the sub-type of the grouping

entry_count is an integer that gives the number of entries in the following table.

sample_count is an integer that gives the number of consecutive samples with the same sample group descriptor. If the sum of the sample count in this box is less than the total sample count, or there is no sample-to-group box that applies to some samples (e.g. it is absent from a track fragment), then the reader should associate the samples that have no explicit group association with the default group defined in the SampleGroupDescription box, if any, or else with no group. It is an error for the total in this box to be greater than the sample_count documented elsewhere, and the reader behaviour would then be undefined.

group_description_index is an integer that gives the index of the sample group entry which describes the samples in this group. The index ranges from 1 to the number of sample group entries in the SampleGroupDescription Box, or takes the value 0 to indicate that this sample is a member of no group of this type.
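The run-length table described above can be expanded into one group index per sample, as in this sketch. The function name and the `default_index` parameter (standing in for a default group from the SampleGroupDescription box, or 0 for no group) are assumptions made for the example; since reader behaviour is undefined when the table covers too many samples, raising an error is only one possible choice.

```python
def expand_sample_to_group(entries, total_samples, default_index=0):
    """Expand (sample_count, group_description_index) runs to per-sample indexes.

    Samples not covered by the table receive default_index.
    """
    indexes = []
    for sample_count, group_description_index in entries:
        indexes.extend([group_description_index] * sample_count)
    if len(indexes) > total_samples:
        raise ValueError("table covers more samples than the track contains")
    indexes.extend([default_index] * (total_samples - len(indexes)))
    return indexes
```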

8.9.3 Sample Group Description Box

8.9.3.1 Definition

Box Type: ‘sgpd’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or more, with one for each Sample to Group Box.

This description table gives information about the characteristics of sample groups. The descriptive information is any other information needed to define or characterize the sample group.

There may be multiple instances of this box if there is more than one sample grouping for the samples in a track. Each instance of the SampleGroupDescription box has a type code that distinguishes different sample groupings. There shall be at most one instance of this box with a particular grouping type in a Sample Table Box or Track Fragment Box. The associated SampleToGroup shall indicate the same value for the grouping type.

The information is stored in the sample group description box after the entry-count. An abstract entry type is defined and sample groupings shall define derived types to represent the description of each sample group. For video tracks, an abstract VisualSampleGroupEntry is used with similar types for audio and hint tracks.

NOTE In version 0 of the entries the base classes for sample group description entries are neither boxes nor have a size that is signaled. For this reason, use of version 0 entries is deprecated. When defining derived classes, ensure either that they have a fixed size, or that the size is explicitly indicated with a length field. An implied size (e.g. achieved by parsing the data) is not recommended as this makes scanning the array difficult.

8.9.3.2 Syntax

// Sequence Entry
abstract class SampleGroupDescriptionEntry (unsigned int(32) grouping_type)
{
}

abstract class VisualSampleGroupEntry (unsigned int(32) grouping_type)
   extends SampleGroupDescriptionEntry (grouping_type)
{
}

abstract class AudioSampleGroupEntry (unsigned int(32) grouping_type)
   extends SampleGroupDescriptionEntry (grouping_type)
{
}

abstract class HintSampleGroupEntry (unsigned int(32) grouping_type)
   extends SampleGroupDescriptionEntry (grouping_type)
{
}

abstract class SubtitleSampleGroupEntry (unsigned int(32) grouping_type)
   extends SampleGroupDescriptionEntry (grouping_type)
{
}

abstract class TextSampleGroupEntry (unsigned int(32) grouping_type)
   extends SampleGroupDescriptionEntry (grouping_type)
{
}

aligned(8) class SampleGroupDescriptionBox (unsigned int(32) handler_type)
   extends FullBox('sgpd', version, 0){
   unsigned int(32) grouping_type;
   if (version==1) { unsigned int(32) default_length; }
   if (version>=2) {
      unsigned int(32) default_sample_description_index;
   }
   unsigned int(32) entry_count;
   int i;
   for (i = 1 ; i <= entry_count ; i++){
      if (version==1) {
         if (default_length==0) {
            unsigned int(32) description_length;
         }
      }
      SampleGroupEntry (grouping_type);
         // an instance of a class derived from SampleGroupEntry
         // that is appropriate and permitted for the media type
   }
}

8.9.3.3 Semantics

version is an integer that specifies the version of this box.

grouping_type is an integer that identifies the SampleToGroup box that is associated with this sample group description. If grouping_type_parameter is not defined for a given grouping_type, then there shall be only one occurrence of this box with this grouping_type.

default_sample_description_index: specifies the index of the sample group description entry which applies to all samples in the track for which no sample to group mapping is provided through a SampleToGroup box. The default value of this field is zero (indicating that the samples are mapped to no group of this type).

entry_count is an integer that gives the number of entries in the following table.

default_length indicates the length of every group entry (if the length is constant), or zero (0) if it is variable

description_length indicates the length of an individual group entry, in the case it varies from entry to entry and default_length is therefore 0
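The interplay of default_length and description_length can be illustrated by a sketch that splits the entry array into raw byte strings without interpreting them. It follows the syntax as printed in this clause, in which lengths are signalled only for version 1; for other versions the entry sizes depend on the grouping type, so the sketch returns None for the entries. The function name and return shape are hypothetical, and `payload` is assumed to start at the FullBox version byte.

```python
import struct


def split_sgpd_entries(payload: bytes):
    """Return (grouping_type, entry_count, entries) for an 'sgpd' payload.

    entries is a list of raw entry bytes for version 1, else None.
    """
    version = payload[0]
    grouping_type = payload[4:8]
    pos = 8
    default_length = None
    if version == 1:
        (default_length,) = struct.unpack_from(">I", payload, pos)
        pos += 4
    if version >= 2:
        pos += 4  # default_sample_description_index, not used in this sketch
    (entry_count,) = struct.unpack_from(">I", payload, pos)
    pos += 4
    if version != 1:
        return grouping_type, entry_count, None
    entries = []
    for _ in range(entry_count):
        length = default_length
        if length == 0:
            # Variable-length entries carry their own description_length.
            (length,) = struct.unpack_from(">I", payload, pos)
            pos += 4
        entries.append(payload[pos:pos + length])
        pos += length
    return grouping_type, entry_count, entries
```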

8.9.4 Representation of group structures in Movie Fragments

Support for Sample Group structures within Movie fragments is provided by the use of the SampleToGroup Box with the container for this Box being the Track Fragment Box (‘traf’). The definition, syntax and semantics of this Box are as specified in subclause 8.9.2.

The SampleToGroup Box can be used to find the group that a sample in a track fragment belongs to and the associated description of that sample group. The table is compactly coded with each entry giving the index of the first sample of a run of samples with the same sample group descriptor. The sample group description ID is an index that refers to a SampleGroupDescription Box, which contains entries describing the characteristics of each sample group and present in the SampleTableBox.

There may be multiple instances of the SampleToGroup Box if there is more than one sample grouping for the samples in a track fragment. Each instance of the SampleToGroup Box has a type code that distinguishes different sample groupings. The associated SampleGroupDescription shall indicate the same value for the grouping type.

The total number of samples represented in anySampleToGroup Box in the track fragment mustmatch the total number of samples in all the track fragment runs. Each SampleToGroup Boxdocumentsadifferentgroupingofthesamesamples.

Zero or more SampleGroupDescription boxes may also be present in a Track Fragment Box. ThesedefinitionsareadditionaltothedefinitionsprovidedintheSampleTableofthetrackintheMovieBox.Group definitions within a movie fragment can also be referenced and used fromwithin that samemoviefragment.

Within the SampleToGroup box in that movie fragment, the group description indexes for groups defined within the same fragment start at 0x10001, i.e. the index value 1, with the value 1 in the top 16 bits. This means there must be fewer than 65536 group definitions for this track and grouping type in the sample table in the Movie Box.
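The index convention can be illustrated with a minimal Python sketch. The function name and the returned labels are illustrative assumptions, not part of this specification; the sketch also assumes that an index of 0 means the sample belongs to no group of this type, and it treats the value 0x10000 (fragment-local entry 0) as invalid.

```python
FRAGMENT_LOCAL_BASE = 0x10000  # value 1 in the top 16 bits


def resolve_group_description_index(index):
    """Classify a group description index read from a SampleToGroup box
    that is carried in a Track Fragment Box.

    Returns (scope, entry): scope is "none", "movie" or "fragment" and
    entry is the 1-based entry number within that scope.
    """
    if index == 0:
        return ("none", 0)  # assumed: no group of this type
    if index < FRAGMENT_LOCAL_BASE:
        return ("movie", index)  # defined in the sample table of the Movie Box
    if index == FRAGMENT_LOCAL_BASE:
        raise ValueError("fragment-local indexes start at 0x10001")
    return ("fragment", index - FRAGMENT_LOCAL_BASE)
```

For example, the index 0x10001 resolves to the first group definition carried in the same movie fragment, while the index 3 resolves to the third definition in the Movie Box.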

When changing the size of movie fragments, or removing them, these fragment-local group definitions will need to be merged into the definitions in the movie box, or into the new movie fragments, and the index numbers in the SampleToGroup box(es) adjusted accordingly. It is recommended that, in this process, identical (and hence duplicate) definitions not be made in any SampleGroupDescription box, but that duplicates be merged and the indexes adjusted accordingly.

8.10 User Data

8.10.1 User Data Box

8.10.1.1 Definition

Box Type: ‘udta’
Container: Movie Box (‘moov’), Track Box (‘trak’), Movie Fragment Box (‘moof’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or one

This box contains objects that declare user information about the containing box and its data (presentation or track).

The User Data Box is a container box for informative user-data. This user data is formatted as a set of boxes with more specific box types, which declare more precisely their content.

The handling of user-data in movie fragments is described in 8.8.17.

8.10.1.2 Syntax

aligned(8) class UserDataBox extends Box(‘udta’) { }

8.10.2 Copyright Box

8.10.2.1 Definition

Box Type: ‘cprt’
Container: User data box (‘udta’)
Mandatory: No
Quantity: Zero or more

The Copyright box contains a copyright declaration which applies to the entire presentation, when contained within the Movie Box, or, when contained in a track, to that entire track. There may be multiple copyright boxes using different language codes.

8.10.2.2 Syntax

aligned(8) class CopyrightBox
   extends FullBox(‘cprt’, version = 0, 0) {
   const bit(1) pad = 0;
   unsigned int(5)[3] language; // ISO-639-2/T language code
   string notice;
}

8.10.2.3 Semantics

language declares the language code for the following text. See ISO 639-2/T for the set of three character codes. Each character is packed as the difference between its ASCII value and 0x60. The code is confined to being three lower-case letters, so these values are strictly positive.

notice is a null-terminated string in either UTF-8 or UTF-16 characters, giving a copyright notice. If UTF-16 is used, the string shall start with the BYTE ORDER MARK (0xFEFF), to distinguish it from a UTF-8 string. This mark does not form part of the final string.
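The packing of the language code and the detection of the string encoding can be sketched in Python as follows. The function names are illustrative assumptions. The sketch further assumes that the first character occupies the most significant five bits of the 15-bit field, and it relies on Python's "utf-16" codec, which consumes the byte order mark and selects the byte order from it.

```python
def pack_language(code):
    """Pack a three-letter lower-case language code into 15 bits."""
    if len(code) != 3 or not all("a" <= c <= "z" for c in code):
        raise ValueError("expected three lower-case ASCII letters")
    value = 0
    for c in code:
        # Each character is stored as its ASCII value minus 0x60 (1..26).
        value = (value << 5) | (ord(c) - 0x60)
    return value  # the pad bit above these 15 bits stays 0


def unpack_language(value):
    """Recover the three-letter code from the packed 15-bit value."""
    return "".join(chr(((value >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))


def decode_notice(payload):
    """Decode the null-terminated notice string that follows the language."""
    if payload[:2] in (b"\xfe\xff", b"\xff\xfe"):
        text = payload.decode("utf-16")  # the codec strips the mark
    else:
        text = payload.decode("utf-8")
    return text.split("\x00", 1)[0]  # drop the terminator and anything after
```

With these definitions, the code "eng" packs to 0x15C7 and "und" packs to 0x55C4.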

8.10.3 Track Selection Box

8.10.3.1 Introduction

A typical presentation stored in a file contains one alternate group per media type: one for video, one for audio, etc. Such a file may include several video tracks, although, at any point in time, only one of them should be played or streamed. This is achieved by assigning all video tracks to the same alternate group. (See subclause 8.3.2 for the definition of alternate groups.)

All tracks in an alternate group are candidates for media selection, but it may not make sense to switch between some of those tracks during a session. One may for instance allow switching between video tracks at different bitrates and keep frame size but not allow switching between tracks of different frame size. In the same manner it may be desirable to enable selection, but not switching, between tracks of different video codecs or different audio languages.

The distinction between tracks for selection and switching is addressed by assigning tracks to switch groups in addition to alternate groups. One alternate group may contain one or more switch groups. All tracks in an alternate group are candidates for media selection, while tracks in a switch group are also available for switching during a session. Different switch groups represent different operation points, such as different frame size, high/low quality, etc.

For the case of non-scalable bitstreams, several tracks may be included in a switch group. The same also applies to non-layered scalable bitstreams, such as traditional AVC streams.

By labelling tracks with attributes it is possible to characterize them. Each track can be labelled with a list of attributes which can be used to describe tracks in a particular switch group or differentiate tracks that belong to different switch groups.

8.10.3.2 Definition

Box Type: ‘tsel’
Container: User Data Box (‘udta’)
Mandatory: No
Quantity: Zero or one

The track selection box is contained in the user data box of the track it modifies.

8.10.3.3 Syntax

aligned(8) class TrackSelectionBox
   extends FullBox(‘tsel’, version = 0, 0) {
   template int(32) switch_group = 0;
   unsigned int(32) attribute_list[]; // to end of the box
}

8.10.3.4 Semantics

switch_group is an integer that specifies a group or collection of tracks. If this field is 0 (default value) or if the Track Selection box is absent there is no information on whether the track can be used for switching during playing or streaming. If this integer is not 0 it shall be the same for tracks that can be used for switching between each other. Tracks that belong to the same switch group shall belong to the same alternate group. A switch group may have only one member.

attribute_list is a list, to the end of the box, of attributes. The attributes in this list should be used as descriptions of tracks or differentiation criteria for tracks in the same alternate or switch group. Each differentiating attribute is associated with a pointer to the field or information that distinguishes the track.

8.10.3.5 Attributes

The following attributes are descriptive:

Name                              Attribute   Description
Temporal scalability              ‘tesc’      The track can be temporally scaled.
Fine-grain SNR scalability        ‘fgsc’      The track can be scaled in terms of quality.
Coarse-grain SNR scalability      ‘cgsc’      The track can be scaled in terms of quality.
Spatial scalability               ‘spsc’      The track can be spatially scaled.
Region-of-interest scalability    ‘resc’      The track can be region-of-interest scaled.
View scalability                  ‘vwsc’      The track can be scaled in terms of number of views.

The following attributes are differentiating:

Name               Attribute   Pointer
Codec              ‘cdec’      Sample Entry (in Sample Description box of media track)
Screen size        ‘scsz’      Width and height fields of Visual Sample Entries
Max packet size    ‘mpsz’      Maxpacketsize field in RTP Hint Sample Entry
Media type         ‘mtyp’      Handlertype in Handler box (of media track)
Media language     ‘mela’      Language field in Media Header box
Bitrate            ‘bitr’      Total size of the samples in the track divided by the duration in the track header box
Frame rate         ‘frar’      Number of samples in the track divided by duration in the track header box
Number of views    ‘nvws’      Number of views in the sub track

Descriptive attributes characterize the tracks they modify, whereas differentiating attributes differentiate between tracks that belong to the same alternate or switch groups. The pointer of a differentiating attribute indicates the location of the information that differentiates the track from other tracks with the same attribute.
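A reader might separate the two attribute classes while parsing the Track Selection box, as in the following Python sketch. The function name and the shape of the returned dictionary are illustrative assumptions; the payload is taken to be the box body that follows the FullBox version and flags.

```python
import struct

# Four-character codes from the two attribute tables above.
DESCRIPTIVE = {"tesc", "fgsc", "cgsc", "spsc", "resc", "vwsc"}
DIFFERENTIATING = {"cdec", "scsz", "mpsz", "mtyp", "mela", "bitr", "frar", "nvws"}


def parse_track_selection(payload):
    """Parse switch_group and attribute_list from a 'tsel' box body."""
    if len(payload) < 4 or len(payload) % 4:
        raise ValueError("'tsel' body must be a whole number of 32-bit words")
    (switch_group,) = struct.unpack_from(">i", payload, 0)  # signed int(32)
    codes = [payload[i:i + 4].decode("latin-1") for i in range(4, len(payload), 4)]
    return {
        "switch_group": switch_group,
        "descriptive": [c for c in codes if c in DESCRIPTIVE],
        "differentiating": [c for c in codes if c in DIFFERENTIATING],
        "other": [c for c in codes if c not in DESCRIPTIVE | DIFFERENTIATING],
    }
```

Unrecognized codes are kept in a separate list rather than rejected, since the attribute list runs to the end of the box and a reader cannot know whether further attributes are defined elsewhere.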

8.10.4 Track kind

8.10.4.1 Definition

Box Type: ‘kind’
Container: User data box (‘udta’) in a track
Mandatory: No
Quantity: Zero or more

The Kind box labels a track with its role or kind.

It contains a URI, possibly followed by a value. If only a URI occurs, then the kind is defined by that URI; if a value follows, then the naming scheme for the value is identified by the URI. Both the URI and the value are null-terminated C strings.

More than one of these may occur in a track, with different contents but with appropriate semantics (e.g. two schemes that both define a kind that indicates sub-titles).

8.10.4.2 Syntax

aligned(8) class KindBox
   extends FullBox(‘kind’, version = 0, 0) {
   string schemeURI;
   string value;
}

8.10.4.3 Semantics

schemeURI is a NULL-terminated C string declaring either the identifier of the kind, if no value follows, or the identifier of the naming scheme for the following value.

value is a name from the declared scheme.
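Reading the two strings can be sketched in Python as below. The function name is an illustrative assumption, the strings are assumed to be UTF-8, and the URI used in the usage note is a made-up placeholder rather than a registered scheme.

```python
def parse_kind(payload):
    """Split a 'kind' box body (after version/flags) into (schemeURI, value).

    An absent or empty value is returned as the empty string.
    """
    parts = payload.split(b"\x00")
    scheme_uri = parts[0].decode("utf-8")
    value = parts[1].decode("utf-8") if len(parts) > 1 else ""
    return scheme_uri, value
```

For the hypothetical body `b"urn:example:role\x00subtitle\x00"` this yields the pair ("urn:example:role", "subtitle"); a body holding only the URI yields an empty value, in which case the URI itself defines the kind.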

8.11 Metadata Support

A common base structure is used to contain general metadata, called the meta box.

8.11.1 The Meta box

8.11.1.1 Definition

Box Type: ‘meta’
Container: File, Movie Box (‘moov’), Track Box (‘trak’), Additional Metadata Container Box (‘meco’), Movie Fragment Box (‘moof’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or one (in File, ‘moov’, and ‘trak’), One or more (in ‘meco’)

A meta box contains descriptive or annotative metadata. The ‘meta’ box is required to contain a ‘hdlr’ box indicating the structure or format of the ‘meta’ box contents. That metadata is located either within a box within this box (e.g. an XML box), or is located by the item identified by a primary item box.

All other contained boxes are specific to the format specified by the handler box.

The other boxes defined here may be defined as optional or mandatory for a given format. If they are used, then they must take the form specified here. These optional boxes include a data-information box, which documents other files in which metadata values (e.g. pictures) are placed, and an item location box, which documents where in those files each item is located (e.g. in the common case of multiple pictures stored in the same file). At most one meta box may occur at each of the file level, movie level, or track level, unless they are contained in an additional metadata container box (‘meco’).

If an Item Protection Box occurs, then some or all of the meta-data, including possibly the primary resource, may have been protected and be un-readable unless the protection system is taken into account.

The handling of meta-data in movie fragments is described in 8.8.17.

8.11.1.2 Syntax

aligned(8) class MetaBox (handler_type)
   extends FullBox(‘meta’, version = 0, 0) {
   HandlerBox(handler_type) theHandler;
   PrimaryItemBox primary_resource; // optional
   DataInformationBox file_locations; // optional
   ItemLocationBox item_locations; // optional
   ItemProtectionBox protections; // optional
   ItemInfoBox item_infos; // optional
   IPMPControlBox IPMP_control; // optional
   ItemReferenceBox item_refs; // optional
   ItemDataBox item_data; // optional
   Box other_boxes[]; // optional
}

8.11.1.3 Semantics

The structure or format of the metadata is declared by the handler. In the case that the primary data is identified by a primary item, and that primary item has an item information entry with an item_type, the handler type may be the same as the item_type.

8.11.2 XML Boxes

8.11.2.1 Definition

Box Type: ‘xml ’ or ‘bxml’
Container: Meta box (‘meta’)
Mandatory: No
Quantity: Zero or one

When the primary data is in XML format and it is desired that the XML be stored directly in the meta-box, one of these forms may be used. The Binary XML Box may only be used when there is a single well-defined binarization of the XML for that defined format as identified by the handler.

Within an XML box the data is in UTF-8 format unless the data starts with a byte-order-mark (BOM), which indicates that the data is in UTF-16 format.

8.11.2.2 Syntax

aligned(8) class XMLBox
   extends FullBox(‘xml ’, version = 0, 0) {
   string xml;
}

aligned(8) class BinaryXMLBox
   extends FullBox(‘bxml’, version = 0, 0) {
   unsigned int(8) data[]; // to end of box
}

8.11.3 The Item Location Box

8.11.3.1 Definition

Box Type: ‘iloc’
Container: Meta box (‘meta’)
Mandatory: No
Quantity: Zero or one

The item location box provides a directory of resources in this or other files, by locating their container, their offset within that container, and their length. Placing this in binary format enables common handling of this data, even by systems which do not understand the particular metadata system (handler) used. For example, a system might integrate all the externally referenced metadata resources into one place, re-adjusting offsets and references accordingly.

The box starts with three or four values, specifying the size in bytes of the offset field, length field, base_offset field, and, in versions 1 and 2 of this box, the extent_index fields, respectively. These values must be from the set {0, 4, 8}.

The construction_method field indicates the ‘construction method’ for the item:

i) file_offset: by the usual absolute file offsets into the file at data_reference_index; (construction_method == 0)

ii) idat_offset: by box offsets into the idat box in the same meta box; neither the data_reference_index nor extent_index fields are used; (construction_method == 1)

iii) item_offset: by item offset into the items indicated by the extent_index field, which is only used (currently) by this construction method. (construction_method == 2).

The extent_index is only used for the method item_offset; it indicates the 1-based index of the item reference with referenceType ‘iloc’ linked from this item. If index_size is 0, then the value 1 is implied; the value 0 is reserved.

Items may be stored fragmented into extents, e.g. to enable interleaving. An extent is a contiguous subset of the bytes of the resource; the resource is formed by concatenating the extents. If only one extent is used (extent_count = 1) then either or both of the offset and length may be implied:

If the offset is not identified (the field has a length of zero), then the beginning of the source (offset 0) is implied.

If the length is not specified, or specified as zero, then the entire length of the source is implied. References into the same file as this metadata, or items divided into more than one extent, should have an explicit offset and length, or use a MIME type requiring a different interpretation of the file, to avoid infinite recursion.

The size of the item is the sum of the extent lengths.

NOTE Extents may be interleaved with the chunks defined by the sample tables of tracks.

The offsets are relative to a data origin. That origin is determined as follows:

1) when the Meta box is in a Movie Fragment, and the construction_method specifies a file offset, and the data reference indicates ‘same file’, the data origin is the first byte of the enclosing Movie Fragment Box (as for the default-base-is-moof flag in the Track Fragment Header);

2) in all other cases when the construction_method specifies a file offset, the data origin is the beginning of the file identified by the data reference;

3) when the construction_method specifies offsets into the Item Data box, the data origin is the beginning of data[] in the Item Data box;

4) when the data reference specifies another item, the data origin is the first byte of the concatenated data (of all the extents) of that item;

NOTE There are offset calculations in other parts of this file format based on the beginning of a box header; in contrast, item data offsets are calculated relative to the box contents.

The data-reference index may take the value 0, indicating a reference into the same file as this metadata, or an index into the data-reference table.

Some referenced data may itself use offset/length techniques to address resources within it (e.g. an MP4 file might be ‘included’ in this way). Normally such offsets in the item itself are relative to the beginning of the containing file. The field ‘base offset’ provides an additional offset for offset calculations within that contained data. For example, if an MP4 file is included within a file formatted to this specification, then normally data-offsets within that MP4 section are relative to the beginning of file; the base offset adds to those offsets.

If an item is constructed from other items, and those source items are protected, the offset and length information apply to the source items after they have been de-protected. That is, the target item data is formed from unprotected source data.

For maximum compatibility, version 0 of this box should be used in preference to version 1 with construction_method == 0, or version 2 when possible. Similarly, version 2 of this box should only be used when support for large item_ID values (exceeding 65535) is required or expected to be required.

8.11.3.2 Syntax

aligned(8) class ItemLocationBox extends FullBox(‘iloc’, version, 0) {
   unsigned int(4) offset_size;
   unsigned int(4) length_size;
   unsigned int(4) base_offset_size;
   if ((version == 1) || (version == 2)) {
      unsigned int(4) index_size;
   } else {
      unsigned int(4) reserved;
   }
   if (version < 2) {
      unsigned int(16) item_count;
   } else if (version == 2) {
      unsigned int(32) item_count;
   }
   for (i=0; i<item_count; i++) {
      if (version < 2) {
         unsigned int(16) item_ID;
      } else if (version == 2) {
         unsigned int(32) item_ID;
      }
      if ((version == 1) || (version == 2)) {
         unsigned int(12) reserved = 0;
         unsigned int(4) construction_method;
      }
      unsigned int(16) data_reference_index;
      unsigned int(base_offset_size*8) base_offset;
      unsigned int(16) extent_count;
      for (j=0; j<extent_count; j++) {
         if (((version == 1) || (version == 2)) && (index_size > 0)) {
            unsigned int(index_size*8) extent_index;
         }
         unsigned int(offset_size*8) extent_offset;
         unsigned int(length_size*8) extent_length;
      }
   }
}

8.11.3.3 Semantics

offset_size is taken from the set {0, 4, 8} and indicates the length in bytes of the offset field.

length_size is taken from the set {0, 4, 8} and indicates the length in bytes of the length field.

base_offset_size is taken from the set {0, 4, 8} and indicates the length in bytes of the base_offset field.

index_size is taken from the set {0, 4, 8} and indicates the length in bytes of the extent_index field.

item_count counts the number of resources in the following array.

item_ID is an arbitrary integer ‘name’ for this resource which can be used to refer to it (e.g. in a URL).

construction_method is taken from the set 0 (file), 1 (idat) or 2 (item).

data_reference_index is either zero (‘this file’) or a 1-based index into the data references in the data information box.

base_offset provides a base value for offset calculations within the referenced data. If base_offset_size is 0, base_offset takes the value 0, i.e. it is unused.

extent_count provides the count of the number of extents into which the resource is fragmented; it must have the value 1 or greater.

extent_index provides an index as defined for the construction method.

extent_offset provides the absolute offset, in bytes from the data origin of the container, of this extent data. If offset_size is 0, extent_offset takes the value 0.

extent_length provides the absolute length in bytes of this metadata item extent. If length_size is 0, extent_length takes the value 0. If the value is 0, then length of the extent is the length of the entire referenced container.
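The variable field widths in the syntax above can be followed in a short Python sketch. The function names and the dictionary layout are illustrative assumptions; the payload is taken to be the box body after the FullBox version and flags, and the sketch only reads the fields without resolving data origins or construction methods.

```python
def _read_uint(buf, pos, size):
    """Read a big-endian unsigned integer of `size` bytes; size 0 yields 0."""
    if pos + size > len(buf):
        raise ValueError("truncated 'iloc' payload")
    return int.from_bytes(buf[pos:pos + size], "big"), pos + size


def parse_iloc(version, payload):
    """Parse the body of an 'iloc' box into a list of item records."""
    if version not in (0, 1, 2):
        raise ValueError("unsupported 'iloc' version")
    if len(payload) < 2:
        raise ValueError("truncated 'iloc' payload")
    offset_size, length_size = payload[0] >> 4, payload[0] & 0x0F
    base_offset_size, index_size = payload[1] >> 4, payload[1] & 0x0F
    if version == 0:
        index_size = 0  # these four bits are reserved in version 0
    for size in (offset_size, length_size, base_offset_size, index_size):
        if size not in (0, 4, 8):
            raise ValueError("field sizes must be 0, 4 or 8")
    wide = 2 if version < 2 else 4  # byte width of item_count and item_ID
    pos = 2
    item_count, pos = _read_uint(payload, pos, wide)
    items = []
    for _ in range(item_count):
        item_id, pos = _read_uint(payload, pos, wide)
        construction_method = 0  # version 0 has no such field
        if version in (1, 2):
            word, pos = _read_uint(payload, pos, 2)
            construction_method = word & 0x0F  # upper 12 bits are reserved
        data_reference_index, pos = _read_uint(payload, pos, 2)
        base_offset, pos = _read_uint(payload, pos, base_offset_size)
        extent_count, pos = _read_uint(payload, pos, 2)
        extents = []
        for _ in range(extent_count):
            extent_index = None  # None means the field is not coded
            if version in (1, 2) and index_size > 0:
                extent_index, pos = _read_uint(payload, pos, index_size)
            extent_offset, pos = _read_uint(payload, pos, offset_size)
            extent_length, pos = _read_uint(payload, pos, length_size)
            extents.append((extent_index, extent_offset, extent_length))
        items.append({
            "item_ID": item_id,
            "construction_method": construction_method,
            "data_reference_index": data_reference_index,
            "base_offset": base_offset,
            "extents": extents,
        })
    return items
```

A zero-width field consumes no bytes and reads as 0, which matches the rule that base_offset, extent_offset and extent_length take the value 0 when their size is 0.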

8.11.4 Primary Item Box

8.11.4.1 Definition

Box Type: ‘pitm’
Container: Meta box (‘meta’)
Mandatory: No
Quantity: Zero or one

For a given handler, the primary data may be one of the referenced items when it is desired that it be stored elsewhere, or divided into extents; or the primary metadata may be contained in the meta-box (e.g. in an XML box). Either this box must occur, or there must be a box within the meta-box (e.g. an XML box) containing the primary information in the format required by the identified handler.

8.11.4.2 Syntax

aligned(8) class PrimaryItemBox
   extends FullBox(‘pitm’, version, 0) {
   if (version == 0) {
      unsigned int(16) item_ID;
   } else {
      unsigned int(32) item_ID;
   }
}

8.11.4.3 Semantics

item_ID is the identifier of the primary item. Version 1 should only be used when large item_ID values (exceeding 65535) are required or expected to be required.
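The version-dependent width of item_ID can be handled as in this brief Python sketch; the function name is an illustrative assumption and the payload is the box body after the FullBox version and flags.

```python
def parse_primary_item(version, payload):
    """Return item_ID from a 'pitm' box body."""
    size = 2 if version == 0 else 4  # 16-bit in version 0, otherwise 32-bit
    if len(payload) < size:
        raise ValueError("truncated 'pitm' payload")
    return int.from_bytes(payload[:size], "big")
```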

8.11.5 Item Protection Box

8.11.5.1 Definition

Box Type: ‘ipro’
Container: Meta box (‘meta’)
Mandatory: No
Quantity: Zero or one

The item protection box provides an array of item protection information, for use by the Item Information Box.

8.11.5.2 Syntax

aligned(8) class ItemProtectionBox
   extends FullBox(‘ipro’, version = 0, 0) {
   unsigned int(16) protection_count;
   for (i=1; i<=protection_count; i++) {
      ProtectionSchemeInfoBox protection_information;
   }
}

8.11.6 Item Information Box

8.11.6.1 Definition

Box Type: ‘iinf’
Container: Meta Box (‘meta’)
Mandatory: No
Quantity: Zero or one

The Item information box provides extra information about selected items, including symbolic (‘file’) names. It may optionally occur, but if it does, it must be interpreted, as item protection or content encoding may have changed the format of the data in the item. If both content encoding and protection are indicated for an item, a reader should first un-protect the item, and then decode the item’s content encoding. If more control is needed, an IPMP sequence code may be used.

This box contains an array of entries, and each entry is formatted as a box. This array is sorted by increasing item_ID in the entry records.

Four versions of the item info entry are defined. Version 1 includes additional information to version 0 as specified by an extension type. For instance, it shall be used with extension type ‘fdel’ for items that are referenced by the file partition box (‘fpar’), which is defined for source file partitionings and applies to file delivery transmissions. Versions 2 and 3 provide an alternative structure in which metadata item types are indicated by a 32-bit (typically 4-character) registered or defined code; two of these codes are defined to indicate a MIME type or metadata typed by a URI. Version 2 supports 16-bit item_ID values, whereas version 3 supports 32-bit item_ID values.

If no extension is desired, the box may terminate without the extension_type field and the extension; if, in addition, content_encoding is not desired, that field also may be absent and the box terminate before it. If an extension is desired without an explicit content_encoding, a single null byte, signifying the empty string, must be supplied for the content_encoding, before the indication of extension_type.

If file delivery item information is needed and a version 2 or 3 ItemInfoEntry is used, then the file delivery information is stored as a separate item of type ‘fdel’ that is also linked by an item reference from the item, to the file delivery information, of type ‘fdel’. There must be exactly one such reference if file delivery information is needed.

It is possible that there are valid URI forms for MPEG-7 metadata (e.g. a schema URI with a fragment identifying a particular element), and it may be possible that these structures could be used for MPEG-7. However, there is explicit support for MPEG-7 in ISO base media file format family files, and this explicit support is preferred as it allows, among other things:

a) incremental update of the metadata (logically, I/P coding, in video terms) whereas this draft is ‘I-frame only’;

b) binarization and thus compaction;

c) the use of multiple schemas.

Therefore, the use of these structures for MPEG-7 is deprecated (and undocumented).

Information on URI forms for some metadata systems can be found in Annex G.

Version 1 of ItemInfoBox should only be used when support for a large number of itemInfoEntries (exceeding 65535) is required or expected to be required.

8.11.6.2 Syntax

aligned(8) class ItemInfoExtension(unsigned int(32) extension_type) {
}

aligned(8) class FDItemInfoExtension() extends ItemInfoExtension (‘fdel’) {
   string content_location;
   string content_MD5;
   unsigned int(64) content_length;
   unsigned int(64) transfer_length;
   unsigned int(8) entry_count;
   for (i=1; i <= entry_count; i++)
      unsigned int(32) group_id;
}

aligned(8) class ItemInfoEntry
   extends FullBox(‘infe’, version, 0) {
   if ((version == 0) || (version == 1)) {
      unsigned int(16) item_ID;
      unsigned int(16) item_protection_index;
      string item_name;
      string content_type;
      string content_encoding; //optional
   }
   if (version == 1) {
      unsigned int(32) extension_type; //optional
      ItemInfoExtension(extension_type); //optional
   }
   if (version >= 2) {
      if (version == 2) {
         unsigned int(16) item_ID;
      } else if (version == 3) {
         unsigned int(32) item_ID;
      }
      unsigned int(16) item_protection_index;
      unsigned int(32) item_type;
      string item_name;
      if (item_type == ‘mime’) {
         string content_type;
         string content_encoding; //optional
      } else if (item_type == ‘uri ’) {
         string item_uri_type;
      }
   }
}

aligned(8) class ItemInfoBox
   extends FullBox(‘iinf’, version, 0) {
   if (version == 0) {
      unsigned int(16) entry_count;
   } else {
      unsigned int(32) entry_count;
   }
   ItemInfoEntry[ entry_count ] item_infos;
}

8.11.6.3 Semantics

item_ID contains either 0 for the primary resource (e.g., the XML contained in an ‘xml ’ box) or the ID of the item for which the following information is defined.

item_protection_index contains either 0 for an unprotected item, or the one-based index into the item protection box defining the protection applied to this item (the first box in the item protection box has the index 1).

item_name is a null-terminated string in UTF-8 characters containing a symbolic name of the item (source file for file delivery transmissions).

item_type is a 32-bit value, typically 4 printable characters, that is a defined valid item type indicator, such as ‘mime’.

content_type is a null-terminated string in UTF-8 characters with the MIME type of the item. If the item is content encoded (see below), then the content type refers to the item after content decoding.

item_uri_type is a string that is an absolute URI, that is used as a type indicator.

content_encoding is an optional null-terminated string in UTF-8 characters used to indicate that the binary file is encoded and needs to be decoded before being interpreted. The values are as defined for Content-Encoding for HTTP/1.1. Some possible values are “gzip”, “compress” and “deflate”. An empty string indicates no content encoding. Note that the item is stored after the content encoding has been applied.

extension_type is a printable four-character code that identifies the extension fields of version 1 with respect to version 0 of the Item information entry.

content_location is a null-terminated string in UTF-8 characters containing the URI of the file as defined in HTTP/1.1 (RFC 2616).

content_MD5 is a null-terminated string in UTF-8 characters containing an MD5 digest of the file. See HTTP/1.1 (RFC 2616) and RFC 1864.

content_length gives the total length (in bytes) of the (un-encoded) file.

transfer_length gives the total length (in bytes) of the (encoded) file. Note that transfer length is equal to content length if no content encoding is applied (see above).

entry_count provides a count of the number of entries in the following array.

group_id indicates a file group to which the file item (source file) belongs. See 3GPP TS 26.346 for more details on file groups.
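The version 2 and 3 entry layout, including the optional trailing content_encoding, can be read as in the following Python sketch. The function names and the dictionary keys are illustrative assumptions; the payload is the ‘infe’ box body after the FullBox version and flags, and versions 0 and 1 are deliberately not handled.

```python
def _cstring(buf, pos):
    """Read a null-terminated UTF-8 string; return (text, next position)."""
    end = buf.index(b"\x00", pos)  # raises ValueError if no terminator
    return buf[pos:end].decode("utf-8"), end + 1


def parse_item_info_entry(version, payload):
    """Parse a version 2 or 3 ItemInfoEntry body."""
    if version not in (2, 3):
        raise ValueError("this sketch covers versions 2 and 3 only")
    id_size = 2 if version == 2 else 4
    if len(payload) < id_size + 6:
        raise ValueError("truncated 'infe' payload")
    entry = {"item_ID": int.from_bytes(payload[:id_size], "big")}
    pos = id_size
    entry["item_protection_index"] = int.from_bytes(payload[pos:pos + 2], "big")
    pos += 2
    item_type = payload[pos:pos + 4].decode("latin-1")
    entry["item_type"] = item_type
    pos += 4
    entry["item_name"], pos = _cstring(payload, pos)
    if item_type == "mime":
        entry["content_type"], pos = _cstring(payload, pos)
        if pos < len(payload):  # content_encoding is optional
            entry["content_encoding"], pos = _cstring(payload, pos)
    elif item_type == "uri ":
        entry["item_uri_type"], pos = _cstring(payload, pos)
    return entry
```

The presence of content_encoding is inferred from bytes remaining in the box, since the syntax marks the field optional and gives no explicit flag for it.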

8.11.7 Additional Metadata Container Box

8.11.7.1 Definition

Box Type: ‘meco’
Container: File, Movie Box (‘moov’), or Track Box (‘trak’)
Mandatory: No
Quantity: Zero or one

The additional metadata container box includes one or more meta boxes. It can be carried at the top level of the file, in the Movie Box (‘moov’), or in the Track Box (‘trak’) and shall only be present if it is accompanied by a meta box in the same container. A meta box that is not contained in the additional metadata container box is the preferred (primary) meta box. Meta boxes in the additional metadata container box complement or give alternative metadata information. The usage of multiple meta boxes may be desirable when, e.g., a single handler is not capable of processing all metadata. All meta boxes at a certain level, including the preferred one and those contained in the additional metadata container box, must have different handler types.

A meta box contained in an additional metadata container box shall contain a primary item box or the primary data box required by the handler (e.g., an XML Box). It shall not include boxes or syntax elements concerning items other than the primary item indicated by the present primary item box or XML box. URLs in a meta box contained in an additional metadata container box are relative to the context of the preferred meta box.

8.11.7.2 Syntax

aligned(8) class AdditionalMetadataContainerBox extends Box('meco') { }

8.11.8 Metabox Relation Box

8.11.8.1 Definition

Box Type: ‘mere’
Container: Additional Metadata Container Box (‘meco’)
Mandatory: No
Quantity: Zero or more

The metabox relation box indicates a relation between two meta boxes at the same level, i.e., the top level of the file, the Movie Box, or Track Box. The relation between two meta boxes is unspecified if there is no metabox relation box for those meta boxes. Meta boxes are referenced by specifying their handler types.

8.11.8.2 Syntax

aligned(8) class MetaboxRelationBox
   extends FullBox('mere', version=0, 0) {
   unsigned int(32) first_metabox_handler_type;
   unsigned int(32) second_metabox_handler_type;
   unsigned int(8) metabox_relation;
}

8.11.8.3 Semantics

first_metabox_handler_type indicates the first meta box to be related.

second_metabox_handler_type indicates the second meta box to be related.

metabox_relation indicates the relation between the two meta boxes. The following values are defined:

1 The relationship between the boxes is unknown (which is the default when this box is not present);

2 the two boxes are semantically un-related (e.g., one is presentation, the other annotation);

3 the two boxes are semantically related but complementary (e.g., two disjoint sets of meta-data expressed in two different meta-data systems);

4 the two boxes are semantically related but overlap (e.g., two sets of meta-data neither of which is a subset of the other); neither is ‘preferred’ to the other;

5 the two boxes are semantically related but the second is a proper subset or weaker version of the first; the first is preferred;

6 the two boxes are semantically related and equivalent (e.g., two essentially identical sets of meta-data expressed in two different meta-data systems).
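The relation values can be mapped to readable labels when the box is parsed, as in this Python sketch. The function name and the label strings are illustrative assumptions that paraphrase the list above; values outside 1 to 6 are reported as "reserved", which is likewise an assumption.

```python
METABOX_RELATIONS = {
    1: "unknown",
    2: "unrelated",
    3: "complementary",
    4: "overlapping",
    5: "second is subset of first",
    6: "equivalent",
}


def parse_metabox_relation(payload):
    """Parse a 'mere' box body (after version/flags).

    Returns (first handler type, second handler type, relation label).
    """
    if len(payload) < 9:
        raise ValueError("truncated 'mere' payload")
    first = payload[0:4].decode("latin-1")
    second = payload[4:8].decode("latin-1")
    relation = payload[8]  # unsigned int(8)
    return first, second, METABOX_RELATIONS.get(relation, "reserved")
```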

8.11.9 URL Forms for meta boxes

When a meta-box is used, then URLs may be used to refer to items in the meta-box, either using an absolute URL, or using a relative URL. Absolute URLs may only be used to refer to items in a file-level meta box.

When interpreting data that is in the context of a meta-box (i.e. the file for a file-level meta-box, the presentation for a movie-level meta-box, or the track for a track-level meta-box), the items in the meta-box are treated as shadowing files in the same location as that from which the container file came. This shadowing means that a reference to another file in the same location as the container file may be resolved to an item within the container file itself. Items can be addressed within the container file by appending a fragment to the URL for the container file itself. That fragment starts with the “#” character and consists of either:

a) item_ID=<n>, identifying the item by its ID (the ID may be 0 for the primary resource);

b) item_name=<item_name>, when the item information box is used.

If a fragment within the contained item must be addressed, then the initial “#” character of that fragment is replaced by “*”.

Consider the following example: <http://a.com/d/v.qrv#item_name=tree.html*branch1>. We assume that v.qrv is a file with a meta-box at the file level. First, the client strips the fragment and fetches v.qrv from a.com using HTTP. It then inspects the top-level meta box and adds the items in it, logically, to its cache of the directory “d” on a.com. It then re-forms the URL as <http://a.com/d/tree.html#branch1>. Note that the fragment has been elevated to a full file name, and the first “*” has been transformed back into a “#”. The client then either finds an item named tree.html in the meta box, or fetches tree.html from a.com, and it then finds the anchor “branch1” within tree.html. If within that html, a file was referenced using a relative URL, e.g. “flower.gif”, then the client converts this to an absolute URL using the normal rules: <http://a.com/d/flower.gif> and again it checks to see if flower.gif is a named item (and hence shadowing a separate file of this name), and then if it is not, fetches flower.gif from a.com.
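The URL handling in this example can be reproduced with Python's standard urllib.parse module. The function name and the keys of the returned dictionary are illustrative assumptions, and the sketch covers only the fragment rewriting, not fetching or the lookup of items in the meta box.

```python
from urllib.parse import urldefrag, urljoin


def resolve_item_url(url):
    """Split a container URL with an item fragment into its parts.

    For item_name references, also re-form the URL that the item shadows.
    """
    container, fragment = urldefrag(url)  # strip the fragment
    key, eq, rest = fragment.partition("=")
    if not eq or key not in ("item_ID", "item_name"):
        raise ValueError("fragment does not address an item")
    # A fragment inside the item is introduced by the first "*".
    selector, star, inner = rest.partition("*")
    result = {"container": container, "inner_fragment": inner if star else None}
    if key == "item_ID":
        result["item_ID"] = int(selector)
    else:
        result["item_name"] = selector
        rewritten = urljoin(container, selector)  # same directory as container
        if star:
            rewritten += "#" + inner  # "*" is transformed back into "#"
        result["rewritten"] = rewritten
    return result
```

Applied to the URL in the example, the container is http://a.com/d/v.qrv, the item name is tree.html, and the re-formed URL is http://a.com/d/tree.html#branch1. A relative reference such as “flower.gif” found inside that item can then be resolved against the re-formed URL with urljoin, giving http://a.com/d/flower.gif.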

8.11.10 Static Metadata

This section defines the storage of static (un-timed) metadata in the ISO file format family.

Reader support for metadata in general is optional, and therefore it is also optional for the formats defined here or elsewhere, unless made mandatory by a derived specification.

8.11.10.1 Simple textual

There is existing support for simple textual tags in the form of the user-data boxes; currently only one is defined: the copyright notice. Other metadata is permitted using this simple form if:

a) it uses a registered box-type or it uses the UUID escape (the latter is permitted today);

b) it uses a registered tag, the equivalent MPEG-7 construct must be documented as part of the registration.

8.11.10.2 Other forms

When other forms of metadata are desired, then a ‘meta’ box as defined above may be included at the appropriate level of the document. If the document is intended to be primarily a metadata document per se, then the meta box is at file level. If the metadata annotates an entire presentation, then the meta box is at the movie level; an entire stream, at the track level.

8.11.10.3 MPEG-7 metadata

MPEG-7 metadata is stored in meta boxes to this specification.

1) The handler-type is ‘mp7t’ for textual metadata in Unicode format;

2) The handler-type is ‘mp7b’ for binary metadata compressed in the BIM format. In this case, the binary XML box contains the configuration information immediately followed by the binarized XML.

3) When the format is textual, there is either another box in the metadata container ‘meta’, called ‘xml ’, which contains the textual MPEG-7 document, or there is a primary item box identifying the item containing the MPEG-7 XML.

4) When the format is binary, there is either another box in the metadata container ‘meta’, called ‘bxml’, which contains the binary MPEG-7 document, or a primary item box identifying the item containing the MPEG-7 binarized XML.

5) If an MPEG-7 box is used at the file level, then the brand ‘mp71’ should be a member of the compatible-brands list in the file-type box.

8.11.11 Item Data Box

8.11.11.1 Definition

Box Type: ‘idat’
Container: Metadata box (‘meta’)
Mandatory: No
Quantity: Zero or one

This box contains the data of metadata items that use the construction method indicating that an item’s data extents are stored within this box.

8.11.11.2 Syntax

aligned(8) class ItemDataBox extends Box('idat') {
   bit(8) data[];
}

8.11.11.3 Semantics

data is the contained metadata

8.11.12 Item Reference Box

8.11.12.1 Definition

Box Type: ‘iref’
Container: Metadata box (‘meta’)
Mandatory: No
Quantity: Zero or one

The item reference box allows the linking of one item to others via typed references. All the references for one item of a specific type are collected into a single item type reference box, whose type is the reference type, and which has a ‘from item ID’ field indicating which item is linked. The items linked to are then represented by an array of ‘to item ID’s. All these single item type reference boxes are then collected into the item reference box. The reference types defined for the track reference box defined in 8.3.3 may be used here if appropriate, or other registered reference types. Version 1 of ItemReferenceBox with SingleItemTypeReferenceBoxLarge should only be used when large from_item_ID or to_item_ID values (exceeding 65535) are required or expected to be required.

NOTE: This design makes it fairly easy to find all the references of a specific type, or from a specific item.

An item reference of type ‘font’ may be used to indicate that an item uses fonts carried/defined in the referenced item.

8.11.12.2 Syntax

aligned(8) class SingleItemTypeReferenceBox(referenceType) extends Box(referenceType) {
   unsigned int(16) from_item_ID;
   unsigned int(16) reference_count;
   for (j=0; j<reference_count; j++) {
      unsigned int(16) to_item_ID;
   }
}

aligned(8) class SingleItemTypeReferenceBoxLarge(referenceType) extends Box(referenceType) {
   unsigned int(32) from_item_ID;
   unsigned int(16) reference_count;
   for (j=0; j<reference_count; j++) {
      unsigned int(32) to_item_ID;
   }
}

aligned(8) class ItemReferenceBox extends FullBox('iref', version, 0) {
   if (version==0) {
      SingleItemTypeReferenceBox references[];
   } else if (version==1) {
      SingleItemTypeReferenceBoxLarge references[];
   }
}

8.11.12.3 Semantics

reference_type contains an indication of the type of the reference

from_item_ID contains the ID of the item that refers to other items

reference_count is the number of references

to_item_ID contains the ID of the item referred to
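
As an illustration of the syntax above, the following Python sketch reads the references out of an item reference box. It is a minimal sketch under stated assumptions, not a reference implementation: the function name parse_iref_payload is hypothetical, the input is assumed to be the bytes that follow the 8-byte box header of the ‘iref’ box, and the contained boxes are assumed to use plain 32-bit sizes.

```python
import struct


def parse_iref_payload(payload):
    """Return a list of (reference_type, from_item_ID, [to_item_ID, ...]).

    Hypothetical helper; 'payload' is assumed to start at the FullBox
    version byte, i.e. just after the 8-byte size/type header of 'iref'.
    """
    version = payload[0]              # FullBox: 1 byte version, 3 bytes flags
    if version not in (0, 1):
        raise ValueError("unknown ItemReferenceBox version")
    id_fmt = ">H" if version == 0 else ">I"   # 16-bit or 32-bit item IDs
    id_size = struct.calcsize(id_fmt)

    references = []
    pos = 4
    while pos < len(payload):
        # Each contained box starts with a 32-bit size and a 4CC type,
        # and the type of the box is the reference type.
        size, ref_type = struct.unpack_from(">I4s", payload, pos)
        if size < 8 or pos + size > len(payload):
            raise ValueError("bad single item type reference box size")
        cur = pos + 8
        (from_item_id,) = struct.unpack_from(id_fmt, payload, cur)
        cur += id_size
        (reference_count,) = struct.unpack_from(">H", payload, cur)
        cur += 2
        to_item_ids = [
            struct.unpack_from(id_fmt, payload, cur + i * id_size)[0]
            for i in range(reference_count)
        ]
        references.append((ref_type.decode("ascii"), from_item_id, to_item_ids))
        pos += size
    return references


# Version 0 example: item 1 uses fonts carried in items 5 and 6.
example = bytes(4) + struct.pack(">I4sHHHH", 16, b"font", 1, 2, 5, 6)
print(parse_iref_payload(example))
```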

8.11.13 Auxiliary video metadata

An auxiliary video track used for depth or parallax information may carry a meta-data item of type ‘auvd’ (auxiliary video descriptor); the data of that item is exactly one si_rbsp() as specified in ISO/IEC 23002-3. (Note that si_rbsp() is externally framed, and the length is supplied by the item location information in the file format). There may be more than one of these meta-data items (e.g. one for parallax info and one for depth, in the case that the same stream serves).

8.12 Support for Protected Streams

This section documents the file-format transformations which are used for protected content. These transformations can be used under several circumstances:

They must be used when the content has been transformed (e.g. by encryption) in such a way that it can no longer be decoded by the normal decoder;

They may be used when the content should only be decoded when the protection system is understood and implemented.

The transformation functions by encapsulating the original media declarations. The encapsulation changes the four-character-code of the sample entries, so that protection-unaware readers see the media stream as a new stream format.

Because the format of a sample entry varies with media-type, a different encapsulating four-character-code is used for each media type (audio, video, text etc.). They are:

Stream (Track) Type    Sample-Entry Code
Video                  encv
Audio                  enca
Text                   enct
System                 encs

The transformation of the sample description is described by the following procedure:

1) The four-character-code of the sample description is replaced with a four-character-code indicating protection encapsulation: these codes vary only by media-type. For example, ‘mp4v’ is replaced with ‘encv’ and ‘mp4a’ is replaced with ‘enca’.

2) A ProtectionSchemeInfoBox (defined below) is added to the sample description, leaving all other boxes unmodified.

3) The original sample entry type (four-character-code) is stored within the ProtectionSchemeInfoBox, in a new box called the OriginalFormatBox (defined below).
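
The three steps of this procedure can be sketched on an in-memory model of a sample entry. The sketch below is illustrative only: sample entries and boxes are modelled as plain Python dictionaries rather than serialized boxes, and the names protect_sample_entry, original_format and the dictionary keys are hypothetical.

```python
# Encapsulating sample-entry codes per media type, as listed in the table above.
ENCAPSULATION_CODES = {
    "video": "encv",
    "audio": "enca",
    "text": "enct",
    "system": "encs",
}


def protect_sample_entry(entry, media_type):
    """Return a protected copy of a sample entry (hypothetical dict model)."""
    # Step 3: remember the original four-character-code in an 'frma' box.
    frma = {"type": "frma", "data_format": entry["type"]}
    # Step 2: add a 'sinf' box, leaving all other boxes unmodified.
    sinf = {"type": "sinf", "boxes": [frma]}
    return {
        # Step 1: replace the four-character-code of the sample entry.
        "type": ENCAPSULATION_CODES[media_type],
        "boxes": list(entry["boxes"]) + [sinf],
    }


def original_format(entry):
    """Recover the original sample entry type from a protected entry."""
    for box in entry["boxes"]:
        if box["type"] == "sinf":
            for child in box["boxes"]:
                if child["type"] == "frma":
                    return child["data_format"]
    return None


clear = {"type": "mp4v", "boxes": [{"type": "esds"}]}
protected = protect_sample_entry(clear, "video")
print(protected["type"], original_format(protected))
```

A protection-unaware reader in this model sees only the unknown type ‘encv’, while an aware reader can recover ‘mp4v’ through original_format.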

There are then three methods for signalling the nature of the protection, which may be used individually or in combination.

1) When MPEG-4 systems is used, then IPMP must be used to signal that the streams are protected.

2) IPMP descriptors may also be used outside the MPEG-4 systems context using boxes containing IPMP descriptors.

3) The protection applied may also be described using the scheme type and information boxes.

When IPMP is used outside of MPEG-4 systems, then a ‘global’ IPMPControlBox may also occur within the ‘moov’ atom.

NOTE When MPEG-4 systems is used, an MPEG-4 systems terminal can effectively treat, for example, ‘encv’ with an OriginalFormat of ‘mp4v’ exactly the same as ‘mp4v’, by using the IPMP descriptors.

8.12.1 Protection Scheme Information Box

8.12.1.1 Definition

Box Types: ‘sinf’
Container: Protected Sample Entry, or Item Protection Box (‘ipro’)
Mandatory: Yes
Quantity: One or More

The Protection Scheme Information Box contains all the information required both to understand the encryption transform applied and its parameters, and also to find other information such as the kind and location of the key management system. It also documents the original (unencrypted) format of the media. The Protection Scheme Information Box is a container Box. It is mandatory in a sample entry that uses a code indicating a protected stream.

When used in a protected sample entry, this box must contain the original format box to document the original format. At least one of the following signalling methods must be used to identify the protection applied:

a) MPEG-4 systems with IPMP: no other boxes, when IPMP descriptors in MPEG-4 systems streams are used;

b) Scheme signalling: a SchemeTypeBox and SchemeInformationBox, when these are used (either both must occur, or neither).

At least one protection scheme information box must occur in a protected sample entry. When more than one occurs, they are equivalent, alternative, descriptions of the same protection. Readers should choose one to process.

8.12.1.2 Syntax

aligned(8) class ProtectionSchemeInfoBox(fmt) extends Box('sinf') {
   OriginalFormatBox(fmt) original_format;
   SchemeTypeBox scheme_type_box;   // optional
   SchemeInformationBox info;       // optional
}

8.12.2 Original Format Box

8.12.2.1 Definition

Box Types: ‘frma’
Container: Protection Scheme Information Box (‘sinf’), Restricted Scheme Information Box (‘rinf’), or Complete Track Information Box (‘cinf’)
Mandatory: Yes when used in a protected sample entry, in a restricted sample entry, or in a sample entry for an incomplete track.
Quantity: Exactly one.

The Original Format Box ‘frma’ contains the four-character-code of the original un-transformed sample description.

8.12.2.2 Syntax

aligned(8) class OriginalFormatBox(codingname) extends Box ('frma') {
   unsigned int(32) data_format = codingname;
      // format of decrypted, encoded data (in case of protection)
      // or un-transformed sample entry (in case of restriction
      // and complete track information)
}

8.12.2.3 Semantics

data_format is the four-character-code of the original un-transformed sample entry (e.g. ‘mp4v’ if the stream contains protected or restricted MPEG-4 visual material).

8.12.3 IPMPInfoBox

(empty sub-clause)

8.12.4 IPMP Control Box

(empty sub-clause)

8.12.5 Scheme Type Box

8.12.5.1 Definition

Box Types: ‘schm’
Container: Protection Scheme Information Box (‘sinf’), Restricted Scheme Information Box (‘rinf’), or SRTP Process box (‘srpp’)
Mandatory: No
Quantity: Zero or one in ‘sinf’, depending on the protection structure; Exactly one in ‘rinf’ and ‘srpp’

The Scheme Type Box (‘schm’) identifies the protection or restriction scheme.

8.12.5.2 Syntax

aligned(8) class SchemeTypeBox extends FullBox('schm', 0, flags) {
   unsigned int(32) scheme_type;      // 4CC identifying the scheme
   unsigned int(32) scheme_version;   // scheme version
   if (flags & 0x000001) {
      unsigned int(8) scheme_uri[];   // browser uri
   }
}

8.12.5.3 Semantics

scheme_type is the code defining the protection or restriction scheme.

scheme_version is the version of the scheme (used to create the content).

scheme_uri allows for the option of directing the user to a web-page if they do not have the scheme installed on their system. It is an absolute URI formed as a null-terminated string in UTF-8 characters.
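
The optional scheme_uri field is the only part of this box whose presence depends on the flags. The following Python sketch shows one way a reader might decode the box; it is a sketch under assumptions, not a normative parser. The function name parse_schm_payload is hypothetical, the input is assumed to be the bytes after the 8-byte box header, and the scheme code ‘abcd’ and the URI in the example are made up for illustration.

```python
import struct


def parse_schm_payload(payload):
    """Decode the fields of a Scheme Type Box (hypothetical helper).

    'payload' is assumed to start at the FullBox version byte.
    """
    version = payload[0]
    flags = int.from_bytes(payload[1:4], "big")
    scheme_type, scheme_version = struct.unpack_from(">4sI", payload, 4)

    scheme_uri = None
    if flags & 0x000001:
        # Null-terminated UTF-8 string running to the end of the box.
        raw = payload[12:]
        end = raw.find(b"\x00")
        if end < 0:
            raise ValueError("scheme_uri is not null-terminated")
        scheme_uri = raw[:end].decode("utf-8")

    return {
        "version": version,
        "scheme_type": scheme_type.decode("ascii"),
        "scheme_version": scheme_version,
        "scheme_uri": scheme_uri,
    }


# Made-up example: scheme 'abcd', version 1, with a URI (flags bit 0 set).
example = bytes([0, 0, 0, 1]) + b"abcd" + struct.pack(">I", 1) + b"http://example.com/\x00"
print(parse_schm_payload(example))
```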

8.12.6 Scheme Information Box

8.12.6.1 Definition

Box Types: ‘schi’
Container: Protection Scheme Information Box (‘sinf’), Restricted Scheme Information Box (‘rinf’), or SRTP Process box (‘srpp’)
Mandatory: No
Quantity: Zero or one

The Scheme Information Box is a container Box that is only interpreted by the scheme being used. Any information the encryption or restriction system needs is stored here. The content of this box is a series of boxes whose type and format are defined by the scheme declared in the Scheme Type Box.

8.12.6.2 Syntax

aligned(8) class SchemeInformationBox extends Box('schi') {
   Box scheme_specific_data[];
}

8.13 File Delivery Format Support

8.13.1 Introduction

Files intended for transmission over ALC/LCT or FLUTE are stored as items in a top-level meta box (‘meta’). The item location box (‘iloc’) specifies the actual storage location of each item within the container file as well as the file size of each item. File name, content type (MIME type), etc., of each item are provided by version 1 of the item information box (‘iinf’).

Pre-computed FEC reservoirs are stored as additional items in the meta box. If a source file is split into several source blocks, FEC reservoirs for each source block are stored as separate items. The relationship between FEC reservoirs and original source items is recorded in the partition entry box ('paen') located in the FD item information box ('fiin').

Pre-composed File reservoirs are stored as additional items in the container file. If a source file is split into several source blocks, each source block is stored as a separate item called a File reservoir. The relationship between File reservoirs and original source items is recorded in the partition entry box ('paen') located in the FD item information box ('fiin').

See subclause 9.2 for more details on the usage of the file delivery format.

8.13.2 FD Item Information Box

8.13.2.1 Definition

Box Type: ‘fiin’
Container: Meta Box (‘meta’)
Mandatory: No
Quantity: Zero or one

The FD item information box is optional, although it is mandatory for files using FD hint tracks. It provides information on the partitioning of source files and how FD hint tracks are combined into FD sessions. Each partition entry provides details on a particular file partitioning, FEC encoding and associated File and FEC reservoirs. It is possible to provide multiple entries for one source file (identified by its item ID) if alternative FEC encoding schemes or partitionings are used in the file. All partition entries are implicitly numbered and the first entry has number 1.

8.13.2.2 Syntax

aligned(8) class PartitionEntry extends Box('paen') {
   FilePartitionBox blocks_and_symbols;
   FECReservoirBox FEC_symbol_locations;    //optional
   FileReservoirBox File_symbol_locations;  //optional
}

aligned(8) class FDItemInformationBox extends FullBox('fiin', version = 0, 0) {
   unsigned int(16) entry_count;
   PartitionEntry partition_entries[ entry_count ];
   FDSessionGroupBox session_info;      //optional
   GroupIdToNameBox group_id_to_name;   //optional
}

8.13.2.3 Semantics

entry_count provides a count of the number of entries in the following array.

The semantics of the boxes are described where the boxes are documented.

8.13.3 File Partition Box

8.13.3.1 Definition

Box Type: ‘fpar’
Container: Partition Entry (‘paen’)
Mandatory: Yes
Quantity: Exactly one

The File Partition box identifies the source file and provides a partitioning of that file into source blocks and symbols. Further information about the source file, e.g., filename, content location and group IDs, is contained in the Item Information box ('iinf'), where the Item Information entry corresponding to the item ID of the source file is of version 1 and includes a File Delivery Item Information Extension ('fdel'). Version 1 of FilePartitionBox should only be used when support for large item_ID or entry_count values (exceeding 65535) is required or expected to be required.

8.13.3.2 Syntax

aligned(8) class FilePartitionBox extends FullBox('fpar', version, 0) {
   if (version == 0) {
      unsigned int(16) item_ID;
   } else {
      unsigned int(32) item_ID;
   }
   unsigned int(16) packet_payload_size;
   unsigned int(8) reserved = 0;
   unsigned int(8) FEC_encoding_ID;
   unsigned int(16) FEC_instance_ID;
   unsigned int(16) max_source_block_length;
   unsigned int(16) encoding_symbol_length;
   unsigned int(16) max_number_of_encoding_symbols;
   string scheme_specific_info;
   if (version == 0) {
      unsigned int(16) entry_count;
   } else {
      unsigned int(32) entry_count;
   }
   for (i=1; i <= entry_count; i++) {
      unsigned int(16) block_count;
      unsigned int(32) block_size;
   }
}

8.13.3.3 Semantics

item_ID references the item in the item location box ('iloc') that the file partitioning applies to.

packet_payload_size gives the target ALC/LCT or FLUTE packet payload size of the partitioning algorithm. Note that UDP packet payloads are larger, as they also contain ALC/LCT or FLUTE headers.

FEC_encoding_ID identifies the FEC encoding scheme and is subject to IANA registration (see RFC 5052). Note that i) value zero corresponds to the "Compact No-Code FEC scheme" also known as "Null-FEC" (RFC 3695); ii) value one corresponds to the “MBMS FEC” (3GPP TS 26.346); iii) for values in the range of 0 to 127, inclusive, the FEC scheme is Fully-Specified, whereas for values in the range of 128 to 255, inclusive, the FEC scheme is Under-Specified.

FEC_instance_ID provides a more specific identification of the FEC encoder being used for an Under-Specified FEC scheme. This value should be set to zero for Fully-Specified FEC schemes and shall be ignored when parsing a file with FEC_encoding_ID in the range of 0 to 127, inclusive. FEC_instance_ID is scoped by the FEC_encoding_ID. See RFC 5052 for further details.

max_source_block_length gives the maximum number of source symbols per source block.

encoding_symbol_length gives the size (in bytes) of one encoding symbol. All encoding symbols of one item have the same length, except the last symbol which may be shorter.

max_number_of_encoding_symbols gives the maximum number of encoding symbols that can be generated for a source block for those FEC schemes in which the maximum number of encoding symbols is relevant, such as FEC encoding ID 129 defined in RFC 5052. For those FEC schemes in which the maximum number of encoding symbols is not relevant, the semantics of this field is unspecified.

scheme_specific_info is a base64-encoded null-terminated string of the scheme-specific object transfer information (FEC-OTI-Scheme-Specific-Info). The definition of the information depends on the FEC encoding ID.

entry_count gives the number of entries in the list of (block_count, block_size) pairs that provides a partitioning of the source file. Starting from the beginning of the file, each entry indicates how the next segment of the file is divided into source blocks and source symbols.

block_count indicates the number of consecutive source blocks of size block_size.

block_size indicates the size of a block (in bytes). A block_size that is not a multiple of the encoding_symbol_length symbol size indicates with Compact No-Code FEC that the last source symbol includes padding that is not stored in the item. With MBMS FEC (3GPP TS 26.346) the padding may extend across multiple symbols but the size of padding should never be more than encoding_symbol_length.
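
To make the partitioning concrete, the following Python sketch expands the (block_count, block_size) pairs into one record per source block. It is an illustrative sketch that assumes the Compact No-Code FEC interpretation described above, in which only the last source symbol of a block is padded; the function name expand_partition and the record layout are hypothetical.

```python
def expand_partition(entries, encoding_symbol_length):
    """Expand (block_count, block_size) pairs into per-block records.

    Each record is (offset_in_item, block_size, source_symbols, padding),
    assuming the Compact No-Code FEC interpretation where only the last
    source symbol of a block carries padding. Hypothetical helper.
    """
    blocks = []
    offset = 0
    for block_count, block_size in entries:
        # Number of source symbols needed to cover the block (ceiling division).
        symbols = -(-block_size // encoding_symbol_length)
        # Padding implied in the last symbol; it is not stored in the item.
        padding = symbols * encoding_symbol_length - block_size
        for _ in range(block_count):
            blocks.append((offset, block_size, symbols, padding))
            offset += block_size
    return blocks


# Two blocks of 1000 bytes followed by one block of 500 bytes, 512-byte symbols.
for record in expand_partition([(2, 1000), (1, 500)], 512):
    print(record)
```

The number of records produced is the total number of source blocks, which is the value that the entry counts of the associated reservoir boxes are expected to match.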

8.13.4 FEC Reservoir Box

8.13.4.1 Definition

Box Type: ‘fecr’
Container: Partition Entry (‘paen’)
Mandatory: No
Quantity: Zero or One

The FEC reservoir box associates the source file identified in the file partition box ('fpar') with FEC reservoirs stored as additional items. It contains a list that starts with the first FEC reservoir associated with the first source block of the source file and continues sequentially through the source blocks of the source file. Version 1 of FECReservoirBox should only be used when support for large item_ID values and entry_count (exceeding 65535) is required or expected to be required.

8.13.4.2 Syntax

aligned(8) class FECReservoirBox extends FullBox('fecr', version, 0) {
   if (version == 0) {
      unsigned int(16) entry_count;
   } else {
      unsigned int(32) entry_count;
   }
   for (i=1; i <= entry_count; i++) {
      if (version == 0) {
         unsigned int(16) item_ID;
      } else {
         unsigned int(32) item_ID;
      }
      unsigned int(32) symbol_count;
   }
}

8.13.4.3 Semantics

entry_count gives the number of entries in the following list. An entry count here should match the total number of blocks in the corresponding file partition box.

item_ID indicates the location of the FEC reservoir associated with a source block.

symbol_count indicates the number of repair symbols contained in the FEC reservoir.

8.13.5 FD Session Group Box

8.13.5.1 Definition

Box Type: ‘segr’
Container: FD Information Box (‘fiin’)
Mandatory: No
Quantity: Zero or One

The FD session group box is optional, although it is mandatory for files containing more than one FD hint track. It contains a list of sessions as well as all file groups and hint tracks that belong to each session. An FD session sends simultaneously over all FD hint tracks (channels) that are listed in the FD session group box for a particular FD session.

Only one session group should be processed at any time. The first listed hint track in a session group specifies the base channel. If the server has no preference between the session groups, the default choice should be the first session group. The group IDs of all file groups containing the files referenced by the hint tracks shall be included in the list of file groups. The file group IDs can in turn be translated into file group names (using the group ID to name box) that can be included by the server in FDTs.

8.13.5.2 Syntax

aligned(8) class FDSessionGroupBox extends Box('segr') {
   unsigned int(16) num_session_groups;
   for(i=0; i < num_session_groups; i++) {
      unsigned int(8) entry_count;
      for (j=0; j < entry_count; j++) {
         unsigned int(32) group_ID;
      }
      unsigned int(16) num_channels_in_session_group;
      for(k=0; k < num_channels_in_session_group; k++) {
         unsigned int(32) hint_track_id;
      }
   }
}

8.13.5.3 Semantics

num_session_groups specifies the number of session groups.

entry_count gives the number of entries in the following list comprising all file groups that the session group complies with. The session group contains all files included in the listed file groups as specified by the item information entry of each source file. Note that the FDT for the session group should only contain those groups that are listed in this structure.

group_ID indicates a file group that the session group complies with.

num_channels_in_session_group specifies the number of channels in the session group. The value of num_channels_in_session_group shall be a positive integer.

hint_track_id specifies the track ID of the FD hint track belonging to a particular session group. Note that one FD hint track corresponds to one LCT channel.
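
The nested loops of the FD session group box can be read as in the following Python sketch. It is a minimal sketch, not a complete reader: the function name parse_segr_payload and the dictionary keys are hypothetical, and the input is assumed to be the bytes after the 8-byte box header (this box is a plain Box, so there is no version or flags field).

```python
import struct


def parse_segr_payload(payload):
    """Return one dict per session group (hypothetical helper)."""
    pos = 0
    (num_session_groups,) = struct.unpack_from(">H", payload, pos)
    pos += 2

    session_groups = []
    for _ in range(num_session_groups):
        entry_count = payload[pos]                     # unsigned int(8)
        pos += 1
        group_ids = list(struct.unpack_from(">%dI" % entry_count, payload, pos))
        pos += 4 * entry_count

        (num_channels,) = struct.unpack_from(">H", payload, pos)
        pos += 2
        if num_channels == 0:
            raise ValueError("a session group needs at least one channel")
        hint_track_ids = list(struct.unpack_from(">%dI" % num_channels, payload, pos))
        pos += 4 * num_channels

        session_groups.append({
            "group_ids": group_ids,
            "hint_track_ids": hint_track_ids,
            # The first listed hint track specifies the base channel.
            "base_channel_track": hint_track_ids[0],
        })
    return session_groups


# One session group covering file groups 10 and 11, sent over hint track 7.
example = struct.pack(">HB2IH1I", 1, 2, 10, 11, 1, 7)
print(parse_segr_payload(example))
```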

8.13.6 Group ID to Name Box

8.13.6.1 Definition

Box Type: ‘gitn’
Container: FD Information Box (‘fiin’)
Mandatory: No
Quantity: Zero or One

The Group ID to Name box associates file group names to file group IDs used in the version 1 item information entries in the item information box ('iinf').

8.13.6.2 Syntax

aligned(8) class GroupIdToNameBox extends FullBox('gitn', version = 0, 0) {
   unsigned int(16) entry_count;
   for (i=1; i <= entry_count; i++) {
      unsigned int(32) group_ID;
      string group_name;
   }
}

8.13.6.3 Semantics

entry_count gives the number of entries in the following list.

group_ID indicates a file group.

group_name is a null-terminated string in UTF-8 characters containing a file group name.

8.13.7 File Reservoir Box

8.13.7.1 Definition

Box Type: ‘fire’
Container: Partition Entry (‘paen’)
Mandatory: No
Quantity: Zero or One

The File reservoir box associates the source file identified in the file partition box ('fpar') with File reservoirs stored as additional items. It contains a list that starts with the first File reservoir associated with the first source block of the source file and continues sequentially through the source blocks of the source file. Version 1 of FileReservoirBox should only be used when support for large item_ID or entry_count values (exceeding 65535) is required or expected to be required.

8.13.7.2 Syntax

aligned(8) class FileReservoirBox extends FullBox('fire', version, 0) {
   if (version == 0) {
      unsigned int(16) entry_count;
   } else {
      unsigned int(32) entry_count;
   }
   for (i=1; i <= entry_count; i++) {
      if (version == 0) {
         unsigned int(16) item_ID;
      } else {
         unsigned int(32) item_ID;
      }
      unsigned int(32) symbol_count;
   }
}

8.13.7.3 Semantics

entry_count gives the number of entries in the following list. An entry count here should match the total number of blocks in the corresponding file partition box.

item_ID indicates the location of the File reservoir associated with a source block.

symbol_count indicates the number of source symbols contained in the File reservoir.

8.14 Sub tracks

8.14.1 Introduction

Sub tracks are used to assign parts of tracks to alternate and switch groups in the same way as (entire) tracks can be assigned to alternate and switch groups to indicate whether those tracks are alternatives to each other and whether it makes sense to switch between them during a session. Sub tracks are suitable for layered media, e.g., SVC and MVC, where media alternatives often are incommensurate with track structures. By defining alternate and switch groups at sub-track level it is possible to use existing rules for media selection and switching for such layered codecs. The over-all syntax is generic for all kinds of media and backward compatible with track-level definitions. Sub-track level alternate and switch groups use the same numbering as track level groups. The numberings are global over all tracks such that groups can be defined across track and sub-track boundaries.

In order to define sub tracks, media-specific definitions are required. Definitions for SVC and MVC are specified in the AVC file format (ISO/IEC 14496-15). Another way is to define sample groups and map them to sub tracks using the Sub Track Sample Group box defined here. The syntax can also be extended to include other media-specific definitions.

For each sub track that shall be defined a Sub Track box shall be included in the User Data box of the corresponding track. The Sub Track box contains objects that define and provide information about a sub track in the same track. The Track Selection box for this same track is already located here.

8.14.2 Backward compatibility

The default is to assign alternate and switch groups to 0 (zero) for (entire) tracks, which means that there is no information on alternate and/or switch groups for those (entire) tracks. However, file readers that are aware of sub-track definitions will be able to find sub-track information on alternate and switch groups even if the track indication is set to 0. This way it is possible to indicate that a file can be used by legacy readers by including the appropriate brand in the file type box. A file creator that requires a reader to be aware of sub-track information should not include legacy brands.

The same method of assigning sub track information can also be applied if all parts of a track except a sub track belong to the same alternate or switch group. Then the overall definitions can be made on track level as usual and specific assignments can be made at sub-track level. For sub tracks without specific assignments, track level assignments apply by default. As before, if a file creator requires a reader to be aware of sub-track information it should not include legacy brands (which would otherwise indicate that sub track information can be skipped).

8.14.3 Sub Track box

8.14.3.1 Definition

Box Type: ‘strk’
Container: User Data box (‘udta’) of the corresponding Track box (‘trak’)
Mandatory: No
Quantity: Zero or more

This box contains objects that define and provide information about a sub track in the present track.

8.14.3.2 Syntax

aligned(8) class SubTrack extends Box(‘strk’) { }

8.14.4 Sub Track Information box

8.14.4.1 Definition

Box Type: ‘stri’
Container: Sub Track box (‘strk’)
Mandatory: Yes
Quantity: One

8.14.4.2 Syntax

aligned(8) class SubTrackInformation extends FullBox('stri', version = 0, 0) {
   template int(16) switch_group = 0;
   template int(16) alternate_group = 0;
   template unsigned int(32) sub_track_ID = 0;
   unsigned int(32) attribute_list[];   // to the end of the box
}

8.14.4.3 Semantics

switch_group is an integer that specifies a group or collection of tracks and/or sub tracks. If this field is 0 (default value), then there is no information on whether the sub track can be used for switching during playing or streaming. If this integer is not 0 it shall be the same for tracks and/or sub tracks that can be used for switching between each other. Tracks that belong to the same switch group shall belong to the same alternate group. A switch group may have only one member.

alternate_group is an integer that specifies a group or collection of tracks and/or sub tracks. If this field is 0 (default value), then there is no information on possible relations to other tracks/sub-tracks. If this field is not 0, it should be the same for tracks/sub-tracks that contain alternate data for one another and different for tracks/sub-tracks belonging to different such groups. Only one track/sub-track within an alternate group should be played or streamed at any one time.

sub_track_ID is an integer. A non-zero value uniquely identifies the sub track locally within the track. A zero value (default) means that sub track ID is not assigned.

attribute_list is a list, to the end of the box, of attributes. The attributes in this list should be used as descriptions of sub tracks or differentiating criteria for tracks and sub tracks in the same alternate or switch group.

The following attributes are descriptive:

Name                            Attribute   Description
Temporal scalability            ‘tesc’      The sub-track can be temporally scaled.
Fine-grain SNR scalability      ‘fgsc’      The sub-track can be scaled in terms of quality.
Coarse-grain SNR scalability    ‘cgsc’      The sub-track can be scaled in terms of quality.
Spatial scalability             ‘spsc’      The sub-track can be spatially scaled.
Region-of-interest scalability  ‘resc’      The sub-track can be region-of-interest scaled.
View scalability                ‘vwsc’      The sub-track can be scaled in terms of number of views.

The following attributes are differentiating:

Name             Attribute   Pointer
Bitrate          ‘bitr’      Total size of the samples in the track divided by the duration in the track header box
Frame rate       ‘frar’      Number of samples in the track divided by duration in the track header box
Number of views  ‘nvws’      Number of views in the sub track
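
A reader might decode the Sub Track Information box and sort its attributes into the two categories above as in the following Python sketch. This is an illustration under assumptions rather than a reference parser: the function name parse_stri_payload and the dictionary keys are hypothetical, and the input is assumed to be the bytes after the 8-byte box header.

```python
import struct

# Attribute codes from the two tables above.
DESCRIPTIVE = {"tesc", "fgsc", "cgsc", "spsc", "resc", "vwsc"}
DIFFERENTIATING = {"bitr", "frar", "nvws"}


def parse_stri_payload(payload):
    """Decode a Sub Track Information box (hypothetical helper).

    'payload' is assumed to start at the FullBox version byte.
    """
    # 4 bytes version/flags + 2 + 2 + 4 bytes of fixed fields, then 4CCs.
    if len(payload) < 12 or (len(payload) - 12) % 4 != 0:
        raise ValueError("unexpected 'stri' payload length")
    switch_group, alternate_group, sub_track_id = struct.unpack_from(">hhI", payload, 4)

    # attribute_list runs to the end of the box, one 4CC per entry.
    attributes = [payload[i:i + 4].decode("ascii")
                  for i in range(12, len(payload), 4)]
    return {
        "switch_group": switch_group,
        "alternate_group": alternate_group,
        "sub_track_ID": sub_track_id,
        "descriptive": [a for a in attributes if a in DESCRIPTIVE],
        "differentiating": [a for a in attributes if a in DIFFERENTIATING],
        "other": [a for a in attributes
                  if a not in DESCRIPTIVE and a not in DIFFERENTIATING],
    }


example = bytes(4) + struct.pack(">hhI", 1, 2, 3) + b"tescbitr"
print(parse_stri_payload(example))
```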

8.14.5 Sub Track Definition box

8.14.5.1 Definition

Box Type: ‘strd’
Container: Sub Track box (‘strk’)
Mandatory: Yes
Quantity: One

This box contains objects that provide a definition of the sub track.

8.14.5.2 Syntax

aligned(8) class SubTrackDefinition extends Box(‘strd’) { }

8.14.6 Sub Track Sample Group box

8.14.6.1 Definition

Box Type: ‘stsg’
Container: Sub Track Definition box (‘strd’)
Mandatory: No
Quantity: Zero or more

This box defines a sub track as one or more sample groups by referring to the corresponding sample group descriptions describing the samples of each group.

8.14.6.2 Syntax

aligned(8) class SubTrackSampleGroupBox extends FullBox('stsg', 0, 0) {
   unsigned int(32) grouping_type;
   unsigned int(16) item_count;
   for (i = 0; i < item_count; i++)
      unsigned int(32) group_description_index;
}

8.14.6.3 Semantics

grouping_type is an integer that identifies the sample grouping. The value shall be the same as in the corresponding Sample to Group and Sample Group Description boxes.

item_count counts the number of sample groups listed in this box.

group_description_index is an integer that gives the index of the sample group entry which describes the samples in the group.

8.15 Post-decoder requirements on media

8.15.1 General

In order to handle situations where the file author requires certain actions on the player or renderer, this Subclause specifies a mechanism that enables players to simply inspect a file to find out such requirements for rendering a bitstream and stops legacy players from decoding and rendering files that require further processing. The mechanism applies to any type of video codec. In particular it applies to AVC and for this case specific signalling is defined in the AVC file format (ISO/IEC 14496-15) that allows a file author to list occurring SEI message IDs and distinguish between required and non-required actions for the rendering process.

The mechanism is similar to the content protection transformation where sample entries are hidden behind generic sample entries, ‘encv’, ‘enca’, etc., indicating encrypted or encapsulated media. The analogous mechanism for restricted video uses a transformation with the generic sample entry ‘resv’. The method may be applied when the content should only be decoded by players that present it correctly.

8.15.2 Transformation

The method is applied as follows:

1) The four-character-code of the sample entry is replaced by a new sample entry code ‘resv’ meaning restricted video.

2) A Restricted Scheme Info box is added to the sample description, leaving all other boxes unmodified.

3) The original sample entry type is stored within an Original Format box contained in the Restricted Scheme Info box.

A RestrictedSchemeInfoBox is formatted exactly the same as a ProtectionSchemeInfoBox, except that it uses the identifier ‘rinf’ instead of ‘sinf’ (see below).

The original sample entry type is contained in the Original Format box located in the Restricted Scheme Info box (in an identical way to the Protection Scheme Info box for encrypted media).

The exact nature of the restriction is defined in the SchemeTypeBox, and the data needed for that scheme is stored in the SchemeInformationBox, again, analogously to protection information.

Note that restriction and protection can be applied at the same time. The order of the transformations follows from the four-character code of the sample entry. For instance, if the sample entry type is ‘resv’, undoing the above transformation may result in a sample entry type ‘encv’, indicating that the media is protected.

Note that if the file author only wants to provide advisory information without stopping legacy players from playing the file, the Restricted Scheme Info box may be placed inside the sample entry without applying the four-character-code transformation. In this case it is not necessary to include an Original Format box.
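The order in which stacked transformations are undone can be sketched as follows. This is a minimal illustration only: the dictionary layout for a parsed sample entry (keys "type", "rinf", "sinf", "original_format", "scheme_type") is a hypothetical in-memory model, and the protection scheme code ‘cenc’ used in the usage note is an example value that is not defined in this Subclause.

```python
# Which scheme-information box belongs to which generic sample entry code.
INFO_BOX_FOR_WRAPPER = {"resv": "rinf", "encv": "sinf", "enca": "sinf"}


def unwrap_sample_entry(entry: dict):
    """Return (original codec code, list of (wrapper code, scheme type)) for an entry."""
    entry_type = entry["type"]
    layers = []
    seen = set()
    while entry_type in INFO_BOX_FOR_WRAPPER:
        if entry_type in seen:
            # Guard against malformed input that names the same wrapper twice.
            raise ValueError(f"wrapper {entry_type!r} applied twice")
        seen.add(entry_type)
        info = entry[INFO_BOX_FOR_WRAPPER[entry_type]]
        layers.append((entry_type, info["scheme_type"]))
        # The Original Format box names the code that was replaced.
        entry_type = info["original_format"]
    return entry_type, layers
```

For an entry of type ‘resv’ whose ‘rinf’ names ‘encv’ as the original format and whose ‘sinf’ names ‘avc1’, the function returns ‘avc1’ with the layers listed outermost first. An entry that carries an advisory ‘rinf’ without the code transformation is returned unchanged with no layers.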

8.15.3 Restricted Scheme Information box

8.15.3.1 Definition

Box Types: ‘rinf’
Container: Restricted Sample Entry or Sample Entry
Mandatory: Yes
Quantity: Exactly one

The Restricted Scheme Information Box contains all the information required both to understand the restriction scheme applied and its parameters. It also documents the original (un-transformed) sample entry type of the media. The Restricted Scheme Information Box is a container Box. It is mandatory in a sample entry that uses a code indicating a restricted stream, i.e., ‘resv’.

When used in a restricted sample entry, this box must contain the original format box to document the original sample entry type and a Scheme type box. A Scheme Information box may be required depending on the restriction scheme.

8.15.3.2 Syntax

aligned(8) class RestrictedSchemeInfoBox(fmt) extends Box('rinf') {
   OriginalFormatBox(fmt) original_format;
   SchemeTypeBox scheme_type_box;
   SchemeInformationBox info; // optional
}

8.15.4 Scheme for stereoscopic video arrangements

8.15.4.1 General

When stereo-coded video frames are decoded, the decoded frames either contain a representation of two spatially packed constituent frames that form a stereo pair (frame packing) or only one view of a stereo pair (left and right views in different tracks). Restrictions due to stereo-coded video are contained in the Stereo Video box.

The SchemeType ‘stvi’ (stereoscopic video) is used.

8.15.4.2 Stereo video box

8.15.4.2.1 Definition

Box Type: ‘stvi’
Container: Scheme Information box (‘schi’)
Mandatory: Yes (when the SchemeType is ‘stvi’)
Quantity: One

The Stereo Video box is used to indicate that decoded frames either contain a representation of two spatially packed constituent frames that form a stereo pair or contain one of two views of a stereo pair. The Stereo Video box shall be present when the SchemeType is ‘stvi’.

8.15.4.2.2 Syntax

aligned(8) class StereoVideoBox extends FullBox(‘stvi’, version = 0, 0) {
   template unsigned int(30) reserved = 0;
   unsigned int(2) single_view_allowed;
   unsigned int(32) stereo_scheme;
   unsigned int(32) length;
   unsigned int(8)[length] stereo_indication_type;
   Box[] any_box; // optional
}

8.15.4.2.3 Semantics

single_view_allowed is an integer. A zero value indicates that the content may only be displayed on stereoscopic displays. When (single_view_allowed & 1) is equal to 1, it is allowed to display the right view on a monoscopic single-view display. When (single_view_allowed & 2) is equal to 2, it is allowed to display the left view on a monoscopic single-view display.

stereo_scheme is an integer that indicates the stereo arrangement scheme used and the stereo indication type according to the used scheme. The following values for stereo_scheme are specified:

1: the frame packing scheme as specified by the Frame packing arrangement Supplemental Enhancement Information message of ISO/IEC 14496-10 [ISO/IEC 14496-10]

2: the arrangement type scheme as specified in Annex L of ISO/IEC 13818-2 [ISO/IEC 13818-2:2000/Amd.4]

3: the stereo scheme as specified in ISO/IEC 23000-11 for both frame/service compatible and 2D/3D mixed services.

Other values of stereo_scheme are reserved.

length indicates the number of bytes for the stereo_indication_type field.

stereo_indication_type indicates the stereo arrangement type according to the used stereo indication scheme. The syntax and semantics of stereo_indication_type depend on the value of stereo_scheme. The syntax and semantics for stereo_indication_type for the following values of stereo_scheme are specified as follows:

stereo_scheme equal to 1: The value of length shall be 4 and stereo_indication_type shall be unsigned int(32) which contains the frame_packing_arrangement_type value from Table D-8 of ISO/IEC 14496-10 [ISO/IEC 14496-10] (‘Definition of frame_packing_arrangement_type’).

stereo_scheme equal to 2: The value of length shall be 4 and stereo_indication_type shall be unsigned int(32) which contains the type value from Table L-1 of ISO/IEC 13818-2 [ISO/IEC 13818-2:2000/Amd.4] (‘Definition of arrangement_type’).

stereo_scheme equal to 3: The value of length shall be 2 and stereo_indication_type shall contain two syntax elements of unsigned int(8). The first syntax element shall contain the stereoscopic composition type from Table 4 of ISO/IEC 23000-11:2009. The least significant bit of the second syntax element shall contain the value of is_left_first as specified in 8.4.3 of ISO/IEC 23000-11:2009, while the other bits are reserved and shall be set to 0.
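The field layout and the per-scheme rules above can be illustrated with the following Python sketch. It assumes that `payload` starts at the FullBox version byte and ignores any optional trailing boxes; the function name and result keys are hypothetical. The numeric type values themselves are defined in the referenced specifications and are not interpreted here.

```python
import struct


def parse_stvi_payload(payload: bytes) -> dict:
    """Parse the fixed fields of an 'stvi' box body (starting at the version byte)."""
    # 30 reserved bits + 2-bit single_view_allowed, then stereo_scheme and length.
    word, stereo_scheme, length = struct.unpack_from(">III", payload, 4)
    single_view_allowed = word & 0x3
    indication = payload[16:16 + length]
    if len(indication) != length:
        raise ValueError("stereo_indication_type is truncated")

    if stereo_scheme in (1, 2):
        if length != 4:
            raise ValueError("length shall be 4 for stereo_scheme 1 and 2")
        decoded = {"type_value": int.from_bytes(indication, "big")}
    elif stereo_scheme == 3:
        if length != 2:
            raise ValueError("length shall be 2 for stereo_scheme 3")
        decoded = {
            "composition_type": indication[0],
            "is_left_first": bool(indication[1] & 1),  # other bits are reserved
        }
    else:
        decoded = None  # reserved scheme: leave the bytes uninterpreted

    return {
        "right_view_mono_ok": bool(single_view_allowed & 1),
        "left_view_mono_ok": bool(single_view_allowed & 2),
        "stereo_scheme": stereo_scheme,
        "indication": decoded,
    }
```

A player limited to a monoscopic display could use the two boolean results to decide whether either view may be shown on its own.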

The following applies when the Stereo Video box is used:

In the Track Header box

- width and height specify the visual presentation size of a single view after unpacking.

In the Sample Description box

- frame_count shall be 1, because the decoder physically outputs a single frame. In other words, the constituent frames included within a frame-packed picture are not documented by frame_count.

- width and height document the pixel counts of a frame-packed picture (and not the pixel counts of a single view within a frame-packed picture).

- the Pixel Aspect Ratio box documents the pixel aspect ratio of each view when the view is displayed on a monoscopic single-view display. For example, in many spatial frame packing arrangements, the Pixel Aspect Ratio box therefore indicates 2:1 or 1:2 pixel aspect ratio, as the spatial resolution of one view of frame-packed video is typically halved along one coordinate axis compared to that of the single-view video of the same format.

8.16 Segments

8.16.1 Introduction

Media presentations may be divided into segments for delivery. For example, it is possible (e.g. in HTTP streaming) to form files that contain a segment, or concatenated segments, which would not necessarily form ISO base media file format compliant files (e.g. they do not contain a movie box).

This Subclause defines specific boxes that may be used in such segments.

8.16.2 Segment Type Box

Box Type: ‘styp’
Container: File
Mandatory: No
Quantity: Zero or more

If segments are stored in separate files (e.g. on a standard HTTP server) it is recommended that these ‘segment files’ contain a segment-type box, which must be first if present, to enable identification of those files, and declaration of the specifications with which they are compliant.

A segment type has the same format as an 'ftyp' box [4.3], except that it takes the box type 'styp'. The brands within it may include the same brands that were included in the 'ftyp' box that preceded the ‘moov’ box, and may also include additional brands to indicate the compatibility of this segment with various specification(s).

Valid segment type boxes shall be the first box in a segment. Segment type boxes may be removed if segments are concatenated (e.g. to form a full file), but this is not required. Segment type boxes that are not first in their files may be ignored.
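The rule that only a leading segment type box is significant can be sketched as below. The sketch walks top-level boxes using the general box header (32-bit size, four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning the box runs to the end of the file) and reads the ‘styp’ body in the same way as an 'ftyp' body. The function names are hypothetical, and the brand values in the usage note are examples only.

```python
import struct


def iter_top_level_boxes(data: bytes):
    """Yield (type, offset, size, header_size) for each top-level box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:            # 64-bit size follows the type field
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:          # box extends to the end of the file
            size = len(data) - offset
        if size < header:
            raise ValueError("invalid box size")
        yield box_type.decode("latin-1"), offset, size, header
        offset += size


def segment_type(data: bytes):
    """Return (major_brand, minor_version, compatible_brands) or None."""
    first = next(iter_top_level_boxes(data), None)
    if first is None or first[0] != "styp":
        return None              # a 'styp' that is not first may be ignored
    _, offset, size, header = first
    body = data[offset + header:offset + size]
    major = body[0:4].decode("latin-1")
    (minor,) = struct.unpack_from(">I", body, 4)
    brands = [body[i:i + 4].decode("latin-1") for i in range(8, len(body) - 3, 4)]
    return major, minor, brands
```

For a segment file that begins with a ‘styp’ box carrying the brands ‘msdh’ and ‘msix’, `segment_type` returns those brands; for a file whose first box is ‘moof’ it returns None.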

8.16.3 Segment Index Box

8.16.3.1 Definition

Box Type: ‘sidx’
Container: File
Mandatory: No
Quantity: Zero or more

The Segment Index box ('sidx') provides a compact index of one media stream within the media segment to which it applies. It is designed so that it can be used not only with media formats based on this specification (i.e. segments containing sample tables or movie fragments), but also other media formats (for example, MPEG-2 Transport Streams [ISO/IEC 13818-1]). For this reason, the formal description of the box given here is deliberately generic, and then at the end of this Subclause the specific definitions for segments using movie fragments are given.

Each Segment Index box documents how a (sub)segment is divided into one or more subsegments (which may themselves be further subdivided using Segment Index boxes).

A subsegment is defined as a time interval of the containing (sub)segment, and corresponds to a single range of bytes of the containing (sub)segment. The durations of all the subsegments sum to the duration of the containing (sub)segment.

Each entry in the Segment Index box contains a reference type that indicates whether the reference points directly to the media bytes of a referenced leaf subsegment, or to a Segment Index box that describes how the referenced subsegment is further subdivided; as a result, the segment may be indexed in a ‘hierarchical’ or ‘daisy-chain’ or other form by documenting time and byte offset information for other Segment Index boxes applying to portions of the same (sub)segment.

Each Segment Index box provides information about a single media stream of the Segment, referred to as the reference stream. If provided, the first Segment Index box in a segment, for a given media stream, shall document the entirety of that media stream in the segment, and shall precede any other Segment Index box in the segment for the same media stream.

If a segment index is present for at least one media stream but not all media streams in the segment, then normally a media stream in which not every access unit is independently coded, such as video, is selected to be indexed. For any media stream for which no segment index is present, referred to as non-indexed stream, the media stream associated with the first Segment Index box in the segment serves as a reference stream in a sense that it also describes the subsegments for any non-indexed media stream.

NOTE 1 Further restrictions may be specified in derived specifications.

Segment Index boxes may be inline in the same file as the indexed media or, in some cases, in a separate file containing only indexing information.

A Segment Index box contains a sequence of references to subsegments of the (sub)segment documented by the box. The referenced subsegments are contiguous in presentation time. Similarly, the bytes referred to by a Segment Index box are always contiguous in both the media file, and the separate index segment, or in the single file if indexes are placed within the media file. The referenced size gives the count of the number of bytes in the material referenced.

NOTE 2 A media segment may be indexed by more than one “top-level” Segment Index box that are independent of each other, each of which indexes one media stream within the media segment. In segments containing multiple media streams the referenced bytes may contain media from multiple streams, even though the Segment Index box provides timing information for only one media stream.

In the file containing the Segment Index box, the anchor point for a Segment Index box is the first byte after that box. If there are two files, the anchor point in the media file is the beginning of the top-level segment (i.e. the beginning of the segment file if each segment is stored in a separate file). The material in the file containing media (which may also be the file that contains the segment index boxes) starts at the indicated offset from the anchor point. If there are two files, the material in the index file starts at the anchor point, i.e. immediately following the Segment Index box.
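For the single-file case, where the Segment Index box and the media are in the same file, the anchor point arithmetic can be sketched as follows. The function and parameter names are hypothetical; `sidx_offset` and `sidx_size` are assumed to be the file position and total size of the Segment Index box, and `referenced_sizes` the byte counts of its references in order. The separate index file case is not covered.

```python
def subsegment_byte_ranges(sidx_offset, sidx_size, first_offset, referenced_sizes):
    """Return inclusive (first_byte, last_byte) file ranges for each reference."""
    anchor = sidx_offset + sidx_size      # first byte after the Segment Index box
    start = anchor + first_offset         # first byte of the indexed material
    ranges = []
    for size in referenced_sizes:
        # Referenced bytes are contiguous, so each range starts where the last ended.
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

The inclusive form matches the way byte ranges are usually written in HTTP range requests, which is the typical use of such an index.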

Within the two constraints (a) that, in time, the subsegments are contiguous, that is, each entry in the loop is consecutive from the immediately preceding one and (b) within a given file (integrated file, media file, or index side file) the referenced bytes are contiguous, there are a number of possibilities, including:

1) a reference to a segment index box may include, in its byte count, immediately following Segment Index boxes that document subsegments;

2) in an integrated file, using the first_offset field, it is possible to separate Segment Index boxes from the media that they refer to;

3) in an integrated file, it is possible to locate Segment Index boxes for subsegments close to the media they index;

4) when a separate file containing Segment Indexes is used, it is possible for the loop entries to be of ‘mixed type’, some to Segment Index boxes in the index segment, some to media subsegments in the media file.

NOTE 3 Profiles may be used to restrict the placement of segment indexes, or the overall complexity of the indexing.

The Segment Index box documents the presence of Stream Access Points (SAPs), as specified in Annex I, in the referenced subsegments. The annex specifies characteristics of SAPs, such as ISAU, ISAP and TSAP, as well as SAP types, which are all used in the semantics below. A subsegment starts with a SAP when the subsegment contains a SAP, and for the first SAP, ISAU is the index of the first access unit that follows ISAP, and ISAP is contained in the subsegment.

For segments based on this specification (i.e. based on movie sample tables or movie fragments):

- an access unit is a sample;

- a subsegment is a self-contained set of one or more consecutive movie fragments; a self-contained set contains one or more Movie Fragment boxes with the corresponding Media Data box(es), and a Media Data Box containing data referenced by a Movie Fragment Box must follow that Movie Fragment box and precede the next Movie Fragment box containing information about the same track;

- Segment Index boxes shall be placed before subsegment material they document, that is, before any Movie Fragment (‘moof’) box of the documented material of the subsegment;

- streams are tracks in the file format, and stream IDs are track IDs;

- a subsegment contains a stream access point if a track fragment within the subsegment for the track with track_ID equal to reference_ID contains a stream access point;

- initialisation data for SAPs consists of the movie box;

- presentation times are in the movie timeline, that is they are composition times after the application of any edit list for the track;

- the ISAP is a position exactly pointing to the start of a top-level box, such as a movie fragment box 'moof';

- a SAP of type 1 or type 2 is indicated as a sync sample, or by sample_is_non_sync_sample equal to 0 in the movie fragment;

- a SAP of type 3 is marked as a member of a sample group of type ‘rap ’;

- a SAP of type 4 is marked as a member of a sample group of type ‘roll’ where the value of the roll_distance field is greater than 0.

NOTE 4 For SAPs of type 5 and 6, no specific signalling in the ISO base media file format is supported.

8.16.3.2 Syntax

aligned(8) class SegmentIndexBox extends FullBox(‘sidx’, version, 0) {
   unsigned int(32) reference_ID;
   unsigned int(32) timescale;
   if (version==0) {
      unsigned int(32) earliest_presentation_time;
      unsigned int(32) first_offset;
   }
   else {
      unsigned int(64) earliest_presentation_time;
      unsigned int(64) first_offset;
   }
   unsigned int(16) reserved = 0;
   unsigned int(16) reference_count;
   for(i=1; i <= reference_count; i++)
   {
      bit (1) reference_type;
      unsigned int(31) referenced_size;
      unsigned int(32) subsegment_duration;
      bit(1) starts_with_SAP;
      unsigned int(3) SAP_type;
      unsigned int(28) SAP_delta_time;
   }
}

8.16.3.3 Semantics

reference_ID provides the stream ID for the reference stream; if this Segment Index box is referenced from a “parent” Segment Index box, the value of reference_ID shall be the same as the value of reference_ID of the “parent” Segment Index box;

timescale provides the timescale, in ticks per second, for the time and duration fields within this box; it is recommended that this match the timescale of the reference stream or track; for files based on this specification, that is the timescale field of the Media Header Box of the track;

earliest_presentation_time is the earliest presentation time of any content in the reference stream in the first subsegment, in the timescale indicated in the timescale field; the earliest presentation time is derived from media in access units, or parts of access units, that are not omitted by an edit list (if any);

first_offset is the distance in bytes, in the file containing media, from the anchor point, to the first byte of the indexed material;

reference_count provides the number of referenced items;

reference_type: when set to 1 indicates that the reference is to a segment index (‘sidx’) box; otherwise the reference is to media content (e.g., in the case of files based on this specification, to a movie fragment box); if a separate index segment is used, then entries with reference type 1 are in the index segment, and entries with reference type 0 are in the media file;

referenced_size: the distance in bytes from the first byte of the referenced item to the first byte of the next referenced item, or in the case of the last entry, the end of the referenced material;

subsegment_duration: when the reference is to Segment Index box, this field carries the sum of the subsegment_duration fields in that box; when the reference is to a subsegment, this field carries the difference between the earliest presentation time of any access unit of the reference stream in the next subsegment (or the first subsegment of the next segment, if this is the last subsegment of the segment, or the end presentation time of the reference stream if this is the last subsegment of the stream) and the earliest presentation time of any access unit of the reference stream in the referenced subsegment; the duration is in the same units as earliest_presentation_time;

starts_with_SAP indicates whether the referenced subsegments start with a SAP. For the detailed semantics of this field in combination with other fields, see the table below.

SAP_type indicates a SAP type as specified in Annex I, or the value 0. Other type values are reserved. For the detailed semantics of this field in combination with other fields, see the table below.

SAP_delta_time: indicates TSAP of the first SAP, in decoding order, in the referenced subsegment for the reference stream. If the referenced subsegments do not contain a SAP, SAP_delta_time is reserved with the value 0; otherwise SAP_delta_time is the difference between the earliest presentation time of the subsegment, and the TSAP (note that this difference may be zero, in the case that the subsegment starts with a SAP).

Table 4 — Semantics of SAP and reference type combinations

starts_with_SAP  SAP_type           reference_type  Meaning
0                0                  0 or 1          No information of SAPs is provided.
0                1 to 6, inclusive  0 (media)       The subsegment contains (but may not start with) a SAP of the given SAP_type and the first SAP of the given SAP_type corresponds to SAP_delta_time.
0                1 to 6, inclusive  1 (index)       All the referenced subsegments contain a SAP of at most the given SAP_type and none of these SAPs is of an unknown type.
1                0                  0 (media)       The subsegment starts with a SAP of an unknown type.
1                0                  1 (index)       All the referenced subsegments start with a SAP which may be of an unknown type.
1                1 to 6, inclusive  0 (media)       The referenced subsegment starts with a SAP of the given SAP_type.
1                1 to 6, inclusive  1 (index)       All the referenced subsegments start with a SAP of at most the given SAP_type and none of these SAPs is of an unknown type.
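The bit packing of the Segment Index box, including the three fields that Table 4 combines, can be illustrated with the Python sketch below. It assumes that `payload` starts at the FullBox version byte; the function name and the dictionary layout are hypothetical.

```python
import struct


def parse_sidx_payload(payload: bytes) -> dict:
    """Parse the body of a 'sidx' box (assumed to start at the FullBox version byte)."""
    version = payload[0]
    reference_id, timescale = struct.unpack_from(">II", payload, 4)
    pos = 12
    if version == 0:
        earliest, first_offset = struct.unpack_from(">II", payload, pos)
        pos += 8
    else:
        earliest, first_offset = struct.unpack_from(">QQ", payload, pos)
        pos += 16
    _reserved, reference_count = struct.unpack_from(">HH", payload, pos)
    pos += 4

    references = []
    for _ in range(reference_count):
        word1, duration, word2 = struct.unpack_from(">III", payload, pos)
        pos += 12
        references.append({
            "reference_type": word1 >> 31,            # 1 = 'sidx', 0 = media
            "referenced_size": word1 & 0x7FFFFFFF,    # low 31 bits
            "subsegment_duration": duration,          # in timescale ticks
            "starts_with_SAP": word2 >> 31,
            "SAP_type": (word2 >> 28) & 0x7,
            "SAP_delta_time": word2 & 0x0FFFFFFF,     # low 28 bits
        })
    return {
        "reference_ID": reference_id,
        "timescale": timescale,
        "earliest_presentation_time": earliest,
        "first_offset": first_offset,
        "references": references,
    }
```

Dividing subsegment_duration by timescale gives a duration in seconds; for example, 180000 ticks at a timescale of 90000 is 2 seconds.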

8.16.4 Subsegment Index Box

8.16.4.1 Definition

Box Type: ‘ssix’
Container: File
Mandatory: No
Quantity: Zero or more

The Subsegment Index box ('ssix') provides a mapping from levels (as specified by the Level Assignment box) to byte ranges of the indexed subsegment. In other words, this box provides a compact index for how the data in a subsegment is ordered according to levels into partial subsegments. It enables a client to easily access data for partial subsegments by downloading ranges of data in the subsegment.

Each byte in the subsegment shall be explicitly assigned to a level, and hence the range count must be 2 or greater. If the range is not associated with any information in the level assignment, then any level that is not included in the level assignment may be used.

There shall be 0 or 1 Subsegment Index boxes per each Segment Index box that indexes only leaf subsegments, i.e. that only indexes subsegments but no segment indexes. A Subsegment Index box, if any, shall be the next box after the associated Segment Index box. A Subsegment Index box documents the subsegments that are indicated in the immediately preceding Segment Index box.

In general, the media data constructed from the byte ranges is incomplete, i.e. it does not conform to the media format of the entire subsegment.

For leaf subsegments based on this specification (i.e. based on movie sample tables and movie fragments):

- Each level shall be assigned to exactly one partial subsegment, i.e. byte ranges for one level shall be contiguous.

- Levels of partial subsegments shall be assigned by increasing numbers within a subsegment, i.e., samples of a partial subsegment may depend on any samples of preceding partial subsegments in the same subsegment, but not the other way around. For example, each partial subsegment contains samples having an identical temporal level and partial subsegments appear in increasing temporal level order within the subsegment.

- When a partial subsegment is accessed in this way, for any assignment_type other than 3, the final Media Data box may be incomplete, that is, less data is accessed than the length indication of the Media Data Box indicates is present. The length of the Media Data box may need adjusting, or padding used. The padding_flag in the Level Assignment Box indicates whether this missing data can be replaced by zeros. If not, the sample data for samples assigned to levels that are not accessed is not present, and care should be taken not to attempt to process such samples.

- The data ranges corresponding to partial subsegments include both Movie Fragment boxes and Media Data boxes. The first partial subsegment, i.e. the lowest level, will correspond to a Movie Fragment box as well as (parts of) Media Data box(es), whereas subsequent partial subsegments (higher levels) may correspond to (parts of) Media Data box(es) only.

NOTE assignment_type equal to 0 (specified in the Level Assignment box ‘leva’) can be used, for example, together with the temporal level sample grouping (‘tele’) when frames of a video bitstream are temporally ordered within subsegments; assignment_type equal to 2 can be used, for example, when each view of a multiview video bitstream is contained in a separate track and the track fragments for all the views are contained in a single movie fragment. assignment_type equal to 3 may be used, for example, when audio and video movie fragments (including the respective Media Data boxes) are interleaved. The first level can be specified to contain the audio movie fragments (including the respective Media Data boxes), whereas the second level can be specified to contain both audio and video movie fragments (including all Media Data boxes).

8.16.4.2 Syntax

aligned(8) class SubsegmentIndexBox extends FullBox(‘ssix’, 0, 0) {
   unsigned int(32) subsegment_count;
   for( i=1; i <= subsegment_count; i++)
   {
      unsigned int(32) range_count;
      for ( j=1; j <= range_count; j++) {
         unsigned int(8) level;
         unsigned int(24) range_size;
      }
   }
}

8.16.4.3 Semantics

subsegment_count is a positive integer specifying the number of subsegments for which partial subsegment information is specified in this box. subsegment_count shall be equal to reference_count (i.e., the number of movie fragment references) in the immediately preceding Segment Index box.

range_count specifies the number of partial subsegment levels into which the media data is grouped. This value shall be greater than or equal to 2.

range_size indicates the size of the partial subsegment.

level specifies the level to which this partial subsegment is assigned.
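The level and range_size pairs can be turned into byte offsets as sketched below. The sketch assumes that `payload` starts at the FullBox version byte and that the ranges of a subsegment are laid out consecutively from the first byte of that subsegment, which is this sketch's reading of the requirement that every byte be assigned to a level. The function names are hypothetical.

```python
import struct


def parse_ssix_payload(payload: bytes) -> list:
    """Return, per subsegment, a list of (level, offset_in_subsegment, size)."""
    (subsegment_count,) = struct.unpack_from(">I", payload, 4)
    pos = 8
    subsegments = []
    for _sub in range(subsegment_count):
        (range_count,) = struct.unpack_from(">I", payload, pos)
        pos += 4
        ranges = []
        offset = 0
        for _rng in range(range_count):
            (word,) = struct.unpack_from(">I", payload, pos)
            pos += 4
            level = word >> 24           # top 8 bits
            size = word & 0xFFFFFF       # low 24 bits
            ranges.append((level, offset, size))
            offset += size
        subsegments.append(ranges)
    return subsegments


def bytes_up_to_level(ranges, max_level: int) -> int:
    """Total number of bytes assigned to levels not above max_level."""
    return sum(size for level, _offset, size in ranges if level <= max_level)
```

Because levels appear in increasing order within a subsegment, the value returned by `bytes_up_to_level` is also the length of the leading byte range a client would download to obtain those levels.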

8.16.5 Producer Reference Time Box

8.16.5.1 Definition

Box Type: ‘prft’
Container: File
Mandatory: No
Quantity: Zero or more

The producer reference time box supplies relative wall-clock times at which movie fragments, or files containing movie fragments (such as segments) were produced. When these files are both produced and consumed in real time, this can provide clients with information to enable consumption and production to proceed at equivalent rates, thus avoiding possible buffer overflow or underflow.

This box is related to the next movie fragment box that follows it in bitstream order. It must follow any segment type or segment index box (if any) in the segment, and occur before the following movie fragment box (to which it refers). If a segment file contains any producer reference time boxes, then the first of them shall occur before the first movie fragment box in that segment.

The box contains a time value measured on a clock which increments at the same rate as a UTC-synchronized NTP [RFC 5905] clock, using NTP format. This is associated with a media time for one of the tracks in the movie fragment. That media time should be in the range of times in that track in the associated movie fragment.

Producer reference times should be associated with at most one track.

8.16.5.2 Syntax

aligned(8) class ProducerReferenceTimeBox extends FullBox(‘prft’, version, 0) {
   unsigned int(32) reference_track_ID;
   unsigned int(64) ntp_timestamp;
   if (version==0) {
      unsigned int(32) media_time;
   } else {
      unsigned int(64) media_time;
   }
}

8.16.5.3 Semantics

reference_track_ID provides the track_ID for the reference track.

ntp_timestamp indicates a UTC time in NTP format corresponding to decoding_time.

media_time corresponds to the same time as ntp_timestamp, but in the time units used for the reference track, and is measured on this media clock as the media is produced.

NOTE In most cases this timestamp will not be equal to the timestamp of the first sample of the adjacent segment of the reference track, but it is recommended it be in the range of the segment containing this producer reference time box.
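A reader might decode the box and convert the timestamp as in the following sketch. It assumes that `payload` starts at the FullBox version byte, and it treats ntp_timestamp as the 64-bit NTP format of RFC 5905: 32 bits of seconds counted from 1900-01-01 followed by 32 bits of binary fraction. The function name, the result keys and the added `unix_time` value are hypothetical conveniences, not fields of the box.

```python
import struct

# Seconds between the NTP era start (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_EPOCH_DELTA = 2208988800


def parse_prft_payload(payload: bytes) -> dict:
    """Parse the body of a 'prft' box (assumed to start at the FullBox version byte)."""
    version = payload[0]
    reference_track_id, ntp_timestamp = struct.unpack_from(">IQ", payload, 4)
    # media_time is 32 bits wide in version 0 and 64 bits wide otherwise.
    fmt = ">I" if version == 0 else ">Q"
    (media_time,) = struct.unpack_from(fmt, payload, 16)

    seconds = ntp_timestamp >> 32
    fraction = ntp_timestamp & 0xFFFFFFFF
    unix_time = seconds - NTP_UNIX_EPOCH_DELTA + fraction / 2 ** 32
    return {
        "reference_track_ID": reference_track_id,
        "ntp_timestamp": ntp_timestamp,
        "media_time": media_time,
        "unix_time": unix_time,
    }
```

To express media_time in seconds, divide it by the timescale of the reference track, which is carried elsewhere in the file and is not part of this box.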

8.17 Support for Incomplete Tracks

8.17.1 General

This Subclause documents the sample entry formats for tracks that are incomplete. Incomplete tracks may contain samples that are marked empty or not received using the sample format.

Incomplete tracks may result, for example, when subsegments are received partially according to level assignments and padding_flag in the Level Assignment box indicates that the data in a Media Data box that is not received can be replaced by zeros. Consequently, sample data assigned to non-accessed levels is not present, and care should be taken not to attempt to process such samples. However, in partially received subsegments some tracks might remain complete in content while other tracks might be incomplete and only contain data that is included by reference into the complete tracks.

This Subclause specifies support for sample entry formats for incomplete tracks. With this support, readers can detect incomplete tracks from their sample entries and avoid processing such tracks or take the possibility of empty or not received samples into account when processing such tracks.

The support for incomplete tracks is similar to the content protection transformation where sample entries are hidden behind generic sample entries, such as ‘encv’ and ‘enca’. Because the format of a sample entry varies with media-type, a different encapsulating four-character-code is used for incomplete tracks of each media type (audio, video, text etc.). They are:

Stream (Track) Type  Sample-Entry Code
Video                icpv
Audio                icpa
Text                 icpt
System               icps
Hint                 icph
Timed Metadata       icpm

Sample data of incomplete tracks may be included into samples of other tracks by reference, and hence an incomplete track should not be removed as long as any track reference points to it.

NOTE The choice of level by the original recording client may vary over time, and at times represent the complete track. The level is not indicated here, and it is not required that the sample entry change from ‘incomplete’ to ‘complete’ when all levels were, in fact, received, for a period. Note also that the ‘original format’ may have indicated encryption, if partial reception and decryption works for that encryption format.

8.17.2 Transformation

The sample entry for a track that becomes incomplete, e.g. through partial reception, should be modified as follows:

1) The four-character-code of the sample entry, e.g. ‘avc1’, is replaced by a new sample entry code ‘icpv’ meaning an incomplete track.

2) A Complete Track Information box is added to the sample description, leaving all other boxes unmodified.

3) The original sample entry type, e.g. ‘avc1’, is stored within an Original Format box contained in the Complete Track Information box.

After transformation, an example AVC sample entry might look like:

class IncompleteAVCSampleEntry() extends VisualSampleEntry (‘icpv’) {
   CompleteTrackInfoBox();
   AVCConfigurationBox config;
   MPEG4BitRateBox (); // optional
   MPEG4ExtensionDescriptorsBox (); // optional
}
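The three transformation steps can also be sketched on a simplified in-memory model. The dictionary representation of a sample entry, the media type keys and the function name below are hypothetical; only the four-character codes come from the table in 8.17.1.

```python
# Encapsulating sample entry codes per media type, from the table in 8.17.1.
INCOMPLETE_CODES = {
    "video": "icpv",
    "audio": "icpa",
    "text": "icpt",
    "system": "icps",
    "hint": "icph",
    "timed_metadata": "icpm",
}


def mark_incomplete(entry: dict, media_type: str) -> dict:
    """Return a transformed copy of a sample entry for an incomplete track."""
    transformed = dict(entry)                       # other boxes stay unmodified
    transformed["type"] = INCOMPLETE_CODES[media_type]          # step 1
    transformed["cinf"] = {"original_format": entry["type"]}    # steps 2 and 3
    return transformed
```

Applied to an entry of type ‘avc1’ with media type "video", the result has type ‘icpv’ and a ‘cinf’ member recording ‘avc1’ as the original format, while the input entry is left untouched.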

8.17.3 Complete Track Information Box

8.17.3.1 Definition

Box Types: ‘cinf’
Container: Sample Entry for an Incomplete Track
Mandatory: Yes
Quantity: Exactly one

The Complete Track Information Box contains, within the Original Format Box, the sample entry format of the complete track that was transformed to the present incomplete track. It may contain optional boxes for example including information required to process samples of the present incomplete track. The Complete Track Information Box is a container box. It is mandatory in a sample entry that uses a code indicating an incomplete track.

8.17.3.2 Syntax

aligned(8) class CompleteTrackInfoBox(fmt) extends Box('cinf') { OriginalFormatBox(fmt) original_format; }

9 Hint Track Formats

9.1 RTP and SRTP Hint Track Format

9.1.1 Introduction

RTP is the real-time transport protocol defined by the IETF (RFC 3550 and 3551) and is currently defined to be able to carry a limited set of media types (principally audio and video) and codings. The packing of MPEG-4 elementary streams into RTP is under discussion in both bodies. However, it is clear that the way the media is packetized does not differ in kind from the existing techniques used for other codecs in RTP, and supported by this scheme.

In standard RTP, each media stream is sent as a separate RTP stream; multiplexing is achieved by using IP’s port-level multiplexing, not by interleaving the data from multiple streams into a single RTP session. However, if MPEG is used, it may be necessary to multiplex several media tracks into one RTP track (e.g. when using MPEG-2 transport in RTP, or FlexMux). Each hint track is therefore tied to a set of media tracks by track references. The hint tracks extract data from their media tracks by indexing through this table. Hint track references to media tracks have the reference type ‘hint’.

This design decides the packet size at the time the server hint track is created; therefore, in the declarations for the hint track, we indicate the chosen packet size. This is in the sample-description. Note that it is valid for there to be several RTP hint tracks for each media track, with different packet size choices. Similarly the time-scale for the RTP clock is provided. The timescale of the server hint track is usually chosen to match the timescale of the media tracks, or a suitable value is picked for the server. In some cases, the RTP timescale is different (e.g. 90 kHz for some MPEG payloads), and this permits that variation. Session description (SAP/SDP) information is stored in user-data boxes in the track.


RTP hint tracks do not use the composition time offset table (‘ctts’). Instead, the hinting process for server hint tracks establishes the correct transmission order and time-stamps, perhaps using the transmission time offset to set transmission times.

Hinted content may require the use of SRTP for streaming by using the hint track format for SRTP, defined here. SRTP hint tracks are formatted identically to RTP hint tracks, except that:

1) the sample entry name is changed from ‘rtp ’ to ‘srtp’ to indicate to the server that SRTP is required;

2) an extra box is added to the sample entry which can be used to instruct the server in the nature of the on-the-fly encryption and integrity protection that must be applied.

9.1.2 Sample Description Format

RTP server hint tracks are hint tracks (media handler ‘hint’), with an entry-format in the sample description of ‘rtp ’:

class RtpHintSampleEntry() extends SampleEntry ('rtp ') {
    uint(16) hinttrackversion = 1;
    uint(16) highestcompatibleversion = 1;
    uint(32) maxpacketsize;
    box additionaldata[];
}

The hinttrackversion is currently 1; the highest compatible version field specifies the oldest version with which this track is backward-compatible.

The maxpacketsize indicates the size of the largest packet that this track will generate.

The additionaldata is a set of boxes, from the following.

class timescaleentry() extends Box('tims') {
    uint(32) timescale;
}
class timeoffset() extends Box('tsro') {
    int(32) offset;
}
class sequenceoffset extends Box('snro') {
    int(32) offset;
}

The timescale entry is required. The other two are optional. The offsets over-ride the default server behaviour, which is to choose a random offset. A value of 0, therefore, will cause the server to apply no offset to the timestamp or sequence number respectively.
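As an illustration of how a reader might walk this structure, the following Python sketch parses the body of an ‘rtp ’ sample entry and its additionaldata boxes. The function names are hypothetical, the input is assumed to be the bytes following the 8-byte box header, and only 32-bit box sizes are handled.

```python
import struct

def iter_boxes(buf: bytes):
    """Yield (type, payload) for consecutive plain boxes with 32-bit sizes."""
    pos = 0
    while pos + 8 <= len(buf):
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        if size < 8 or pos + size > len(buf):
            raise ValueError("malformed box")
        yield box_type, buf[pos + 8:pos + size]
        pos += size

def parse_rtp_hint_entry(body: bytes) -> dict:
    """Parse an 'rtp ' sample entry body (the bytes after the box header)."""
    # SampleEntry prefix: 6 reserved bytes and data_reference_index,
    # followed by the three RtpHintSampleEntry fields (16 bytes in total).
    _dref, version, compat, max_packet = struct.unpack_from(">6xHHHI", body, 0)
    entry = {"hinttrackversion": version,
             "highestcompatibleversion": compat,
             "maxpacketsize": max_packet,
             "timescale": None, "time_offset": None, "sequence_offset": None}
    for box_type, payload in iter_boxes(body[16:]):
        if box_type == b"tims":
            entry["timescale"] = struct.unpack(">I", payload)[0]
        elif box_type == b"tsro":
            entry["time_offset"] = struct.unpack(">i", payload)[0]
        elif box_type == b"snro":
            entry["sequence_offset"] = struct.unpack(">i", payload)[0]
    if entry["timescale"] is None:
        raise ValueError("the 'tims' box is required")
    return entry

# Made-up entry: 1450-byte packets, 90 kHz RTP clock, sequence offset forced to 0.
body = (bytes(6) + struct.pack(">HHHI", 1, 1, 1, 1450)
        + struct.pack(">I4sI", 12, b"tims", 90000)
        + struct.pack(">I4si", 12, b"snro", 0))
entry = parse_rtp_hint_entry(body)
```

Keeping an absent offset as `None` rather than 0 preserves the distinction the text draws: a missing box means the server chooses a random offset, while an explicit 0 means no offset is applied.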

An SRTP Hint Sample Entry is used when SRTP processing is required.

class SrtpHintSampleEntry() extends SampleEntry ('srtp') {
    uint(16) hinttrackversion = 1;
    uint(16) highestcompatibleversion = 1;
    uint(32) maxpacketsize;
    box additionaldata[];
}


Fields and boxes are defined as for the RtpHintSampleEntry (‘rtp ’) of the ISO Base Media File Format. However, an SRTPProcessBox shall be included in an SrtpHintSampleEntry as one of the additionaldata boxes.

9.1.2.1 SRTP Process box ‘srpp’

Box Type: ‘srpp’
Container: SrtpHintSampleEntry
Mandatory: Yes
Quantity: Exactly one

The SRTP Process Box may instruct the server as to which SRTP algorithms should be applied.

aligned(8) class SRTPProcessBox extends FullBox('srpp', version, 0) {
    unsigned int(32) encryption_algorithm_rtp;
    unsigned int(32) encryption_algorithm_rtcp;
    unsigned int(32) integrity_algorithm_rtp;
    unsigned int(32) integrity_algorithm_rtcp;
    SchemeTypeBox scheme_type_box;
    SchemeInformationBox info;
}

The SchemeTypeBox and SchemeInformationBox have the syntax defined above for protected media tracks. They serve to provide the parameters required for applying SRTP. The SchemeTypeBox is used to indicate the necessary key-management and security policy for the stream in extension to the defined algorithmic pointers provided by the SRTPProcessBox. The key-management functionality is also used to establish all the necessary SRTP parameters as listed in section 8.2 of the SRTP specification. The exact definition of protection schemes is out of the scope of the file format.

The algorithms for encryption and integrity protection are defined by SRTP. The following format identifiers are defined here. An entry of four spaces ($20$20$20$20) may be used to indicate that the choice of algorithm for either encryption or integrity protection is decided by a process outside the file format.

Format          Algorithm

$20$20$20$20    The choice of algorithm for either encryption or integrity protection is decided by a process outside the file format.

ACM1            Encryption using AES in Counter Mode with 128-bit key, as defined in Section 4.1.1 of the SRTP specification.

AF81            Encryption using AES in F8-mode with 128-bit key, as defined in Section 4.1.2 of the SRTP specification.

ENUL            Encryption using the NULL-algorithm as defined in Section 4.1.3 of the SRTP specification.

SHM2            Integrity protection using HMAC-SHA-1 with 160-bit key, as defined in Section 4.2.1 of the SRTP specification.

ANUL            Integrity protection not applied to RTP (but still applied to RTCP). Note: this is valid only for integrity_algorithm_rtp.
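A server might map these identifiers to its own configuration roughly as in the following Python sketch. The dictionary contents restate the table; the function names and the returned descriptions are hypothetical, and the parser covers only the four algorithm fields that follow the FullBox version and flags, not the scheme boxes.

```python
import struct

# Format identifiers restated from the table; four spaces defer the choice.
ENCRYPTION = {b"ACM1": "AES Counter Mode, 128-bit key",
              b"AF81": "AES F8-mode, 128-bit key",
              b"ENUL": "NULL encryption"}
INTEGRITY = {b"SHM2": "HMAC-SHA-1, 160-bit key",
             b"ANUL": "no integrity protection on RTP"}
OUTSIDE = b"    "

def describe(code: bytes, table: dict) -> str:
    """Translate one four-character identifier into a readable description."""
    if code == OUTSIDE:
        return "decided outside the file format"
    return table.get(code, "unknown identifier")

def parse_srpp_algorithms(payload: bytes) -> dict:
    """Read the four algorithm fields from an 'srpp' box payload."""
    # The first 32 bits are the FullBox version (8 bits) and flags (24 bits).
    _version_flags, enc_rtp, enc_rtcp, int_rtp, int_rtcp = struct.unpack_from(
        ">I4s4s4s4s", payload, 0)
    if int_rtcp == b"ANUL":
        raise ValueError("ANUL is valid only for integrity_algorithm_rtp")
    return {"encryption_algorithm_rtp": describe(enc_rtp, ENCRYPTION),
            "encryption_algorithm_rtcp": describe(enc_rtcp, ENCRYPTION),
            "integrity_algorithm_rtp": describe(int_rtp, INTEGRITY),
            "integrity_algorithm_rtcp": describe(int_rtcp, INTEGRITY)}

# Made-up payload: AES-CM encryption, RTP integrity off, RTCP choice deferred.
payload = struct.pack(">I4s4s4s4s", 0, b"ACM1", b"ACM1", b"ANUL", b"    ")
algorithms = parse_srpp_algorithms(payload)
```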


9.1.3 Sample Format

Each sample in a server hint track will generate one or more RTP packets, whose RTP timestamp is the same as the hint sample time. Therefore, all the packets made by one sample have the same timestamp. However, provision is made to ask the server to ‘warp’ the actual transmission times, for data-rate smoothing, for example.

Each sample contains two areas: the instructions to compose the packets, and any extra data needed when sending those packets (e.g. an encrypted version of the media data). Note that the size of the sample is known from the sample size table.

aligned(8) class RTPsample {
    unsigned int(16) packetcount;
    unsigned int(16) reserved;
    RTPpacket packets[packetcount];
    byte extradata[];
}

9.1.3.1 Packet Entry format

Each packet in the packet entry table has the following structure:

aligned(8) class RTPpacket {
    int(32) relative_time;
    // the next fields form initialization for the RTP
    // header (16 bits), and the bit positions correspond
    bit(2) RTP_version;
    bit(1) P_bit;
    bit(1) X_bit;
    bit(4) CSRC_count;
    bit(1) M_bit;
    bit(7) payload_type;
    unsigned int(16) RTPsequenceseed;
    unsigned int(13) reserved = 0;
    unsigned int(1) extra_flag;
    unsigned int(1) bframe_flag;
    unsigned int(1) repeat_flag;
    unsigned int(16) entrycount;
    if (extra_flag) {
        uint(32) extra_information_length;
        box extra_data_tlv[];
    }
    dataentry constructors[entrycount];
}

The semantics of the fields for RTP server hint tracks are specified below. RTP reception hint tracks use the same packet structure. The semantics of the fields when the packet structure is used in an RTP reception hint track are specified in subclause 9.4.1.4.

In server hint tracks, the relative_time field ‘warps’ the actual transmission time away from the sample time. This allows traffic smoothing.

The following 2 bytes exactly overlay the RTP header; they assist the server in making the RTP header (the server fills in the remaining fields). Within these 2 bytes, the fields RTP_version and CSRC_count are reserved in server (transmission) hint tracks and the server fills in these fields.


The sequence seed is the basis for the RTP sequence number. If a hint track causes multiple copies of the same RTP packet to be sent, then the seed value would be the same for them all. The server normally adds a random offset to this value (but see above, under ‘sequenceoffset’).

extra_flag equal to 1 indicates that there is extra information before the constructors, in the form of type-length-value sets.

extra_information_length indicates the length in bytes of all extra information before the constructors, which includes the four bytes of the extra_information_length field. The subsequent boxes before the constructors, referred to as the TLV boxes, are aligned on 32-bit boundaries. The box size of any TLV box indicates the actual bytes used, not the length required for padding to 32-bit boundaries. The value of extra_information_length includes the required padding for 32-bit boundaries.

The rtpoffsetTLV (‘rtpo’) gives a 32-bit signed integer offset to the actual RTP time-stamp to place in the packet. This enables packets to be placed in the hint track in decoding order, but have their presentation time-stamp in the transmitted packet be in a different order. This is necessary for some MPEG payloads.

The bframe_flag indicates a disposable ‘b-frame’. The repeat_flag indicates a ‘repeat packet’, one that is sent as a duplicate of a previous packet. Servers may wish to optimize handling of these packets.
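The packet entry layout and the TLV padding rule can be illustrated with a small Python parser. This is a sketch rather than a reference implementation: `parse_rtp_packet` is a hypothetical name, constructors are returned as raw 16-byte slices, and alignment is assumed to be measured from the first TLV box, which itself starts 16 bytes into the packet entry.

```python
import struct

def parse_rtp_packet(buf: bytes, pos: int = 0):
    """Parse one RTPpacket entry; return (fields, position after the entry)."""
    relative_time, hdr, seed, flags, entrycount = struct.unpack_from(
        ">iHHHH", buf, pos)
    pos += 12
    fields = {
        "relative_time": relative_time,
        # The 16 header bits overlay the first two bytes of the RTP header.
        "P_bit": (hdr >> 13) & 1,
        "X_bit": (hdr >> 12) & 1,
        "M_bit": (hdr >> 7) & 1,
        "payload_type": hdr & 0x7F,
        "RTPsequenceseed": seed,
        # 13 reserved bits, then extra_flag, bframe_flag, repeat_flag.
        "extra_flag": (flags >> 2) & 1,
        "bframe_flag": (flags >> 1) & 1,
        "repeat_flag": flags & 1,
        "rtpoffset": None,
    }
    if fields["extra_flag"]:
        (extra_len,) = struct.unpack_from(">I", buf, pos)
        end = pos + extra_len        # the length counts its own four bytes
        tlv = pos + 4
        while tlv + 8 <= end:
            size, box_type = struct.unpack_from(">I4s", buf, tlv)
            if size < 8:
                raise ValueError("malformed TLV box")
            if box_type == b"rtpo":
                (fields["rtpoffset"],) = struct.unpack_from(">i", buf, tlv + 8)
            tlv += (size + 3) & ~3   # skip padding up to the next 32-bit boundary
        pos = end
    fields["constructors"] = [buf[pos + 16 * i:pos + 16 * (i + 1)]
                              for i in range(entrycount)]
    return fields, pos + 16 * entrycount

# Made-up packet: sent 20 ticks early, marker bit set, payload type 96,
# flagged as a repeat packet, one 'rtpo' TLV, one all-zero constructor.
packet = (struct.pack(">iHHHH", -20, (1 << 7) | 96, 7, 0b101, 1)
          + struct.pack(">I", 16)
          + struct.pack(">I4si", 12, b"rtpo", 3000)
          + bytes(16))
fields, next_pos = parse_rtp_packet(packet)
```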

9.1.3.2 Constructor format

There are various forms of the constructor. Each constructor is 16 bytes, to make iteration easier. The first byte is a union discriminator:

aligned(8) class RTPconstructor(type) {
    unsigned int(8) constructor_type = type;
}
aligned(8) class RTPnoopconstructor extends RTPconstructor(0) {
    uint(8) pad[15];
}
aligned(8) class RTPimmediateconstructor extends RTPconstructor(1) {
    unsigned int(8) count;
    unsigned int(8) data[count];
    unsigned int(8) pad[14 - count];
}
aligned(8) class RTPsampleconstructor extends RTPconstructor(2) {
    signed int(8) trackrefindex;
    unsigned int(16) length;
    unsigned int(32) samplenumber;
    unsigned int(32) sampleoffset;
    unsigned int(16) bytesperblock = 1;
    unsigned int(16) samplesperblock = 1;
}
aligned(8) class RTPsampledescriptionconstructor extends RTPconstructor(3) {
    signed int(8) trackrefindex;
    unsigned int(16) length;
    unsigned int(32) sampledescriptionindex;
    unsigned int(32) sampledescriptionoffset;
    unsigned int(32) reserved;
}

The immediate mode permits the insertion of payload-specific headers (e.g. the RTP H.261 header). For hint tracks where the media is sent ‘in the clear’, the sample entry then specifies the bytes to copy from the media track, by giving the sample number, data offset, and length to copy. The track reference may index into the table of track references (a strictly positive value), name the hint track itself (-1), or the only associated media track (0). (The value zero is therefore equivalent to the value 1.)

The bytesperblock and samplesperblock concern compressed audio, using a scheme prior to MP4, in which the audio framing was not evident in the file. These fields have the fixed values of 1 for MP4 files.

The sampledescription mode allows sending of sample descriptions (which would contain elementary stream descriptors), by reference, as part of an RTP packet. The index is the index of a SampleEntry in a Sample Description Box, and the offset is relative to the beginning of that SampleEntry.

For complex cases (e.g. encryption or forward error correction), the transformed data would be placed into the hint samples, in the extradata field, and then sample mode referencing the hint track itself would be used.

Notice that there is no requirement that successive packets transmit successive bytes from the media stream. For example, to conform with RTP-standard packing of H.261, it is sometimes required that a byte be sent at the end of one packet and also at the beginning of the next (when a macroblock boundary falls within a byte).
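Because every constructor occupies exactly 16 bytes, a decoder can dispatch on the first byte. The Python sketch below shows one possible shape for this; the function names and dictionary keys are hypothetical, and the track reference handling follows the rule that 0 is equivalent to 1 and -1 names the hint track itself.

```python
import struct

def resolve_trackref(index: int) -> str:
    """Describe where a sample or sample description constructor takes its data."""
    if index == -1:
        return "hint track itself"
    return f"track reference {max(index, 1)}"   # 0 is equivalent to 1

def parse_rtp_constructor(entry: bytes) -> dict:
    """Decode one 16-byte RTP constructor using the leading discriminator."""
    if len(entry) != 16:
        raise ValueError("constructors are exactly 16 bytes")
    kind = entry[0]
    if kind == 0:
        return {"type": "noop"}
    if kind == 1:
        count = entry[1]
        if count > 14:
            raise ValueError("immediate data is limited to 14 bytes")
        return {"type": "immediate", "data": entry[2:2 + count]}
    if kind == 2:
        ref, length, number, offset, _bpb, _spb = struct.unpack(
            ">bHIIHH", entry[1:])
        return {"type": "sample", "source": resolve_trackref(ref),
                "length": length, "samplenumber": number,
                "sampleoffset": offset}
    if kind == 3:
        ref, length, index, offset, _reserved = struct.unpack(
            ">bHIII", entry[1:])
        return {"type": "sampledescription", "source": resolve_trackref(ref),
                "length": length, "sampledescriptionindex": index,
                "sampledescriptionoffset": offset}
    raise ValueError(f"unknown constructor type {kind}")

# Made-up constructors: a 4-byte payload header, then 100 bytes taken from
# offset 32 of sample 5 in the hint track's own extradata.
immediate = parse_rtp_constructor(bytes([1, 4]) + b"\x00\x01\x02\x03" + bytes(10))
sample = parse_rtp_constructor(struct.pack(">BbHIIHH", 2, -1, 100, 5, 32, 1, 1))
```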

9.1.4 SDP Information

Streaming servers using RTSP and SDP usually use SDP as the description format; and there are necessary relationships between the SDP information, and the RTP streams, such as the mapping of payload IDs to MIME names. Provision is therefore made for the hinter to leave fragments of SDP information in the file, to assist the server in forming a full SDP description. Note that there are required SDP entries, which the server should also generate. The information here is only partial.

SDP information is formatted as a set of boxes within user-data boxes, at both the movie and the track level. The text in the movie-level SDP box should be placed before any media-specific lines (before the first ‘m=’ in the SDP file).

9.1.4.1 Movie SDP information

At the movie level, within the user-data (‘udta’) box, a hint information container box may occur:


aligned(8) class moviehintinformation extends box('hnti') {
}
aligned(8) class rtpmoviehintinformation extends box('rtp ') {
    uint(32) descriptionformat = 'sdp ';
    char sdptext[];
}

The hint information box may contain information for multiple protocols; only RTP is defined here. The RTP box may contain information for various description formats; only SDP is defined here. The sdptext is correctly formatted as a series of lines, each terminated by <crlf>, as required by SDP.

9.1.4.2 Track SDP Information

At the track level, the structure is similar; however, we already know that this track is an RTP hint track, from the sample description. Therefore the child box merely specifies the description format.

aligned(8) class trackhintinformation extends box('hnti') {
}
aligned(8) class rtptracksdphintinformation extends box('sdp ') {
    char sdptext[];
}

The sdptext is correctly formatted as a series of lines, each terminated by <crlf>, as required by SDP.
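The difference between the movie-level and track-level nesting can be seen in a short Python sketch that serializes both. The helper names are hypothetical, the SDP lines are made-up examples, and the boxes would still need to be placed inside the appropriate ‘udta’ box by the caller.

```python
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a plain box: 32-bit size, four-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def sdp_text(lines) -> bytes:
    """Join SDP lines, terminating each one with CRLF."""
    return "".join(line + "\r\n" for line in lines).encode("utf-8")

def movie_hint_info(lines) -> bytes:
    # Movie level: 'hnti' holds an 'rtp ' box whose payload starts with the
    # description format 'sdp ' and continues with the SDP text.
    return make_box(b"hnti", make_box(b"rtp ", b"sdp " + sdp_text(lines)))

def track_hint_info(lines) -> bytes:
    # Track level: 'hnti' holds an 'sdp ' box carrying the SDP text only.
    return make_box(b"hnti", make_box(b"sdp ", sdp_text(lines)))

# Session-level text belongs in the movie box, media-level text in the track box.
movie_box = movie_hint_info(["s=Example session"])
track_box = track_hint_info(["m=video 0 RTP/AVP 96",
                             "a=rtpmap:96 H264/90000"])
```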

9.1.5 Statistical Information

In addition to the statistics in the hint media header, the hinter may place extra data in a hint statistics box, in the track user-data box. This is a container box with a variety of sub-boxes that it may contain.

aligned(8) class hintstatisticsbox extends box('hinf') {
}

aligned(8) class hintBytesSent extends box('trpy') {
    uint(64) bytessent; }   // total bytes sent, including 12-byte RTP headers
aligned(8) class hintPacketsSent extends box('nump') {
    uint(64) packetssent; } // total packets sent
aligned(8) class hintBytesSent extends box('tpyl') {
    uint(64) bytessent; }   // total bytes sent, not including RTP headers

aligned(8) class hintBytesSent extends box('totl') {
    uint(32) bytessent; }   // total bytes sent, including 12-byte RTP headers
aligned(8) class hintPacketsSent extends box('npck') {
    uint(32) packetssent; } // total packets sent
aligned(8) class hintBytesSent extends box('tpay') {
    uint(32) bytessent; }   // total bytes sent, not including RTP headers

aligned(8) class hintmaxrate extends box('maxr') { // maximum data rate
    uint(32) period;        // in milliseconds
    uint(32) bytes; }       // max bytes sent in any period 'period' long
                            // including RTP headers

aligned(8) class hintmediaBytesSent extends box('dmed') {
    uint(64) bytessent; }   // total bytes sent from media tracks
aligned(8) class hintimmediateBytesSent extends box('dimm') {
    uint(64) bytessent; }   // total bytes sent immediate mode
aligned(8) class hintrepeatedBytesSent extends box('drep') {
    uint(64) bytessent; }   // total bytes in repeated packets

aligned(8) class hintminrelativetime extends box('tmin') {
    int(32) time; }         // smallest relative transmission time, milliseconds
aligned(8) class hintmaxrelativetime extends box('tmax') {
    int(32) time; }         // largest relative transmission time, milliseconds


aligned(8) class hintlargestpacket extends box('pmax') {
    uint(32) bytes; }       // largest packet sent, including RTP header
aligned(8) class hintlongestpacket extends box('dmax') {
    uint(32) time; }        // longest packet duration, milliseconds

aligned(8) class hintpayloadID extends box('payt') {
    uint(32) payloadID;     // payload ID used in RTP packets
    uint(8) count;
    char rtpmap_string[count]; }

NOTE Not all these sub-boxes may be present, and there may be multiple ‘maxr’ boxes, covering different periods.

9.2 ALC/LCT and FLUTE Hint Track Format

9.2.1 Introduction

The file format supports multicast/broadcast delivery of files with FEC protection. Files to be delivered are stored as items in a container file (defined by the file format) and the meta box is amended with information on how the files are partitioned into source symbols. For each source block of a FEC encoding, additional parity symbols can be pre-computed and stored as FEC reservoir items. The partitioning depends on the FEC scheme, the target packet size, and the desired FEC overhead. Pre-composed source symbols can be stored as File reservoir items to minimize duplicate information in the container file, especially with MBMS-FEC. The actual transmission is governed by hint tracks that contain server instructions that facilitate the encapsulation of source and FEC symbols into packets.

FD hint tracks have been designed for the ALC/LCT (Asynchronous Layered Coding/Layered Coding Transport) and FLUTE (File Delivery over Unidirectional Transport) protocols. LCT provides transport level support for reliable content delivery and stream delivery protocols. ALC is a protocol instantiation of the LCT building block, and it serves as a base protocol for massively scalable reliable multicast distribution of arbitrary binary objects. FLUTE builds on top of ALC/LCT and defines a protocol for unidirectional delivery of files.

FLUTE defines a File Delivery Table (FDT), which carries metadata associated with the files delivered in the ALC/LCT session, and provides mechanisms for in-band delivery and updates of FDT. In contrast, ALC/LCT relies on other means for out-of-band delivery of file metadata, e.g., an electronic service guide that is normally delivered to clients well in advance of the ALC/LCT session combined with update fragments that can be sent during the ALC/LCT session.

File partitionings and FEC reservoirs can be used independently of FD hint tracks and vice versa. The former aid the design of hint tracks and allow alternative hint tracks, e.g., with different FEC overheads, to re-use the same FEC symbols. They also provide means to access source symbols and additional FEC symbols independently for post-delivery repair, which may be performed over ALC/LCT or FLUTE or out-of-band via another protocol. In order to reduce complexity when a server follows hint track instructions, hint tracks refer directly to data ranges of items or data copied into hint samples.

It is recommended that a server sends a different set of FEC symbols for each retransmission of a file.

The syntax for using the meta box as a container file for source files is defined in 8.10.4, partitions, file and FEC reservoirs are defined in 8.13, while the syntax for FD hint tracks is defined in 9.2.


9.2.2 Design principles

The support for file delivery is designed to optimize the server transmission process by enabling ALC/LCT or FLUTE servers to follow simple instructions. It is enough to follow one pre-defined sequence of instructions per channel in order to transmit one session. The file format enables storage of pre-computed source blocks and symbol partitionings, i.e., files may be partitioned into symbols which fit an intended packet size, and a certain amount of FEC symbols may be pre-computed, which can also be used for post-session repair. The file format also allows storage of alternative ALC/LCT or FLUTE transmission session instructions that may lead to equivalent end results. Such alternatives may be intended for different channel conditions because of higher FEC protection or even by using different error correction schemes. Alternative sessions can refer to a common set of symbols. The hint tracks are flexible and can be used to compose FDT fragments and interleaving of such fragments within the actual object transmission. Several hint tracks can be combined into one or more sessions involving simultaneous transmission over multiple channels.

It is important to distinguish between the definition of sessions for transmission and the scheduling of such sessions. ALC/LCT and FLUTE server files only address optimization of the server transmission process. In order to ensure maximal usage and flexibility of such pre-defined sessions, all details regarding scheduling, addresses, etc. are kept outside the definition of the file format. External scheduling applications decide such details, which are not important for optimizing transmission sessions per se. In particular, the following information is out-of-scope of the file format: time scheduling, target addresses and ports, source addresses and ports, and so-called Transmission Session Identifiers (TSI).

The sample numbers associated with the samples of a file delivery hint track provide a numbered sequence. Hint track sample times provide send times of ALC/LCT or FLUTE packets for a default bitrate. Depending on the actual transmission bitrate, an ALC/LCT or FLUTE server may apply linear time scaling. Sample times may simplify the scheduling process, but it is up to the server to send ALC/LCT or FLUTE packets in a timely manner.

A schematic picture of a file containing three alternative hint tracks with different FEC overhead for a source file is provided in Figure 6. In this example, each source block consists of only one sub-block.


[Figure: storage format of a single file. The file item holds the source symbols Src Sym [0-5119] and Src Sym [5120-10240]. The FEC reservoir items hold the FEC for Src Block #1 and the FEC for Src Block #2. Track #1 (10 % FEC) references both source symbol ranges with FEC Sym #1 [0-511] and FEC Sym #2 [0-511]; track #2 (~12 % FEC) references them with FEC Sym #1 [0-614] and FEC Sym #2 [0-614]; track #3 (14 % FEC) references them with FEC Sym #1 [0-716] and FEC Sym #2 [0-716].]

Figure 4 — Different FEC overheads of a source file provided by alternative hint tracks.

The source file in the above figure is partitioned into 2 source blocks containing symbols of a fixed size. FEC redundancy symbols are calculated for both source blocks and stored as FEC reservoir items. As the hint tracks reference the same items in the file there is no duplication of information. The original source symbols and FEC reservoirs can also be used by repair servers that don’t use hint tracks.

9.2.3 Sample Description Format

9.2.3.1 Definition

FD hint tracks are tracks with handler_type ‘hint’ and with the entry-format ‘fdp ’ in the sample description box. The FD hint sample entry is contained in the sample description box (‘stsd’).

9.2.3.2 Syntax

class FDHintSampleEntry() extends SampleEntry ('fdp ') {
    unsigned int(16) hinttrackversion = 1;
    unsigned int(16) highestcompatibleversion = 1;
    unsigned int(16) partition_entry_ID;
    unsigned int(16) FEC_overhead;
    Box additionaldata[]; //optional
}


9.2.3.3 Semantics

partition_entry_ID indicates the partition entry in the FD item information box. A zero value indicates that no partition entry is associated with this sample entry, e.g., for FDT. If the corresponding FD hint track contains only overhead data this value should indicate the partition entry whose overhead data is in question.

FEC_overhead is a fixed 8.8 value indicating the percentage protection overhead used by the hint sample(s). The intention of providing this value is to provide characteristics to help a server select a session group (and corresponding FD hint tracks). If the corresponding FD hint track contains only overhead data this value should indicate the protection overhead achieved by using all FD hint tracks in a session group up to the FD hint track in question.
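A fixed 8.8 value stores the integer part in the upper 8 bits and the fraction in the lower 8 bits, so the stored 16-bit integer is the percentage multiplied by 256. The conversion can be sketched in Python as follows; the function names are hypothetical.

```python
def decode_fec_overhead(raw: int) -> float:
    """Interpret a 16-bit fixed 8.8 field as a percentage."""
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("FEC_overhead is a 16-bit field")
    return raw / 256.0

def encode_fec_overhead(percent: float) -> int:
    """Round a percentage to the nearest representable fixed 8.8 value."""
    raw = round(percent * 256)
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("percentage not representable in 8.8 format")
    return raw

# 0x0C80 has integer part 12 and fraction 128/256, i.e. 12.5 percent.
twelve_and_a_half = decode_fec_overhead(0x0C80)
ten_percent_raw = encode_fec_overhead(10.0)
```

The resolution of the format is 1/256 of a percentage point, and the largest representable overhead is just under 256 percent.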

The hinttrackversion and highestcompatibleversion fields have the same interpretation as in the RTP hint sample entry described in 9.1.2. As additional data a time scale entry box may be provided. If not provided, there is no indication given on timing of packets.

File entries needed for an FDT or an electronic service guide can be created by observing all sample entries of a hint track and the corresponding item information boxes of the items referenced by the above partition entry IDs. No sample entries shall be included in the hint track if they are not referenced by any sample.

9.2.4 Sample Format

9.2.4.1 Sample Container

Each FD sample in the hint track will generate one or more FD packets.

Each sample contains two areas: the instructions to compose the packets, and any extra data needed when sending those packets (e.g., encoding symbols that are copied into the sample instead of residing in items for source files or FEC). Note that the size of the sample is known from the sample size table.

aligned(8) class FDsample extends Box('fdsa') {
    FDPacketBox packetbox[];
    ExtraDataBox extradata; //optional
}

Sample numbers of FD samples define the order they shall be processed by the server. Likewise, FD packet boxes in each FD sample should appear in the order they shall be processed. If the time scale entry box is present in the FD hint sample entry, then sample times are defined and provide relative send times of packets for a default bitrate. Depending on the actual transmission bitrate, a server may apply linear time scaling. Sample times may simplify the scheduling process, but it is up to the server to send packets in a timely manner.

9.2.4.2 Packet Entry Format

Each packet in the FD sample has the following structure (References: RFC 3926, 3450, 3451):


aligned(8) class FDpacketBox extends Box('fdpa') {
    LCTheaderTemplate LCT_header_info;
    unsigned int(16) entrycount1;
    LCTheaderExtension header_extension_constructors[ entrycount1 ];
    unsigned int(16) entrycount2;
    dataentry packet_constructors[ entrycount2 ];
}

The LCT header info contains LCT header templates for the current FD packet. Header extension constructors are structures which are used for constructing the LCT header extensions. Packet constructors are used for constructing the FEC payload ID and the source symbols in an FD packet.

9.2.4.3 LCT Header Template Format

The LCT header template is defined as follows:

aligned(8) class LCTheaderTemplate {
    unsigned int(1) sender_current_time_present;
    unsigned int(1) expected_residual_time_present;
    unsigned int(1) session_close_bit;
    unsigned int(1) object_close_bit;
    unsigned int(4) reserved;
    unsigned int(16) transport_object_identifier;
}

It can be used by a server to form an LCT header for a packet. Note that some parts of the header depend on the server policy and are not included in the template. Some field lengths also depend on the LCT header bits assigned by the server. The server may also need to change the value of the Transport Object Identifier (TOI).
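The template occupies three bytes: one byte of flags (four flag bits followed by four reserved bits) and a 16-bit TOI. A minimal Python sketch of reading it, with a hypothetical function name and made-up sample values, could look like this:

```python
import struct

def parse_lct_header_template(buf: bytes, pos: int = 0) -> dict:
    """Decode the 3-byte LCT header template starting at pos."""
    flags, toi = struct.unpack_from(">BH", buf, pos)
    return {
        # The four flags occupy the high-order bits, in declaration order.
        "sender_current_time_present": (flags >> 7) & 1,
        "expected_residual_time_present": (flags >> 6) & 1,
        "session_close_bit": (flags >> 5) & 1,
        "object_close_bit": (flags >> 4) & 1,
        "transport_object_identifier": toi,
    }

# Made-up template: last packet of object 42, session stays open.
template = parse_lct_header_template(struct.pack(">BH", 0b00010000, 42))
```

A server would still have to supply the fields that depend on its own policy before emitting a real LCT header, as noted above.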

9.2.4.4 LCT Header Extension Constructor Format

The LCT header extension constructor format is defined as follows:

aligned(8) class LCTheaderextension {
    unsigned int(8) header_extension_type;
    if (header_extension_type > 127) {
        unsigned int(8) content[3];
    }
    else {
        unsigned int(8) length;
        if (length > 0) {
            unsigned int(8) content[(length*4) - 2];
        }
    }
}

A positive value of the length field specifies the length of the constructor content in multiples of 32-bit words. A zero value means that the header is generated by the server.

The usage and rules for LCT header extensions are defined in RFC 3451 (LCT RFC). The header_extension_type contains the LCT Header Extension Type (HET) value.

HET values between 0 and 127 are used for variable-length (multiple 32-bit word) extensions. HET values between 128 and 255 are used for fixed length (one 32-bit word) extensions. If the header_extension_type is smaller than 128, then the length field corresponds to the LCT Header Extension Length (HEL) as defined in RFC 3451. The content field always corresponds to the Header Extension Content (HEC).

NOTE A server can identify packets including FDT by observing whether EXT_FDT (header_extension_type == 192) is present.
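The branching on the HET value can be sketched in Python as shown below. The function name and the result keys are hypothetical, and the byte values in the example are made up; the sketch only mirrors the syntax above and does not interpret the extension content.

```python
def parse_lct_header_extension(buf: bytes, pos: int = 0):
    """Decode one header extension constructor; return (extension, next position)."""
    het = buf[pos]
    if het > 127:
        # Fixed length: HET plus 3 content bytes form one 32-bit word.
        length = None
        end = pos + 4
        content = buf[pos + 1:end]
    else:
        # Variable length: HEL counts 32-bit words, including HET and HEL.
        length = buf[pos + 1]
        end = pos + 2 + (length * 4 - 2 if length > 0 else 0)
        content = buf[pos + 2:end]
    if end > len(buf):
        raise ValueError("truncated header extension constructor")
    extension = {
        "header_extension_type": het,
        "length": length,
        "content": content,
        "server_generated": length == 0,   # zero length: server builds the header
        "is_fdt": het == 192,              # EXT_FDT marks packets carrying FDT
    }
    return extension, end

# Three made-up constructors back to back: EXT_FDT (fixed), a two-word
# variable-length extension, and one left for the server to generate.
data = bytes([192, 0x10, 0x00, 0x01]) + bytes([64, 2]) + bytes(6) + bytes([1, 0])
extensions, position = [], 0
while position < len(data):
    extension, position = parse_lct_header_extension(data, position)
    extensions.append(extension)
```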

9.2.4.5 Packet Constructor Format

There are various forms of the constructor. Each constructor is 16 bytes in order to make iteration easier. The first byte is a union discriminator. The packet constructors are used to include FEC payload ID as well as source and parity symbols in an FD packet.

aligned(8) class FDconstructor(type) {
    unsigned int(8) constructor_type = type;
}
aligned(8) class FDnoopconstructor extends FDconstructor(0) {
    unsigned int(8) pad[15];
}
aligned(8) class FDimmediateconstructor extends FDconstructor(1) {
    unsigned int(8) count;
    unsigned int(8) data[count];
    unsigned int(8) pad[14 - count];
}
aligned(8) class FDsampleconstructor extends FDconstructor(2) {
    signed int(8) trackrefindex;
    unsigned int(16) length;
    unsigned int(32) samplenumber;
    unsigned int(32) sampleoffset;
    unsigned int(16) bytesperblock = 1;
    unsigned int(16) samplesperblock = 1;
}
aligned(8) class FDitemconstructor extends FDconstructor(3) {
    unsigned int(16) item_ID;
    unsigned int(16) extent_index;
    unsigned int(64) data_offset; //offset in byte within extent
    unsigned int(24) data_length; //non-zero length in byte within extent or
                                  //if (data_length==0) rest of extent
}

aligned(8) class FDitemconstructorLarge extends FDconstructor(5) {
    unsigned int(32) item_ID;
    unsigned int(32) extent_index;
    unsigned int(64) data_offset; //offset in byte within extent
    unsigned int(24) data_length; //non-zero length in byte within extent or
                                  //if (data_length==0) rest of extent
}

aligned(8) class FDxmlboxconstructor extends FDconstructor(4) {
    unsigned int(64) data_offset; //offset in byte within XMLBox or BinaryXMLBox
    unsigned int(32) data_length;
    unsigned int(24) reserved;
}


9.2.4.6 Extra Data Box

EachsampleofanFDhinttrackmayincludeextradatastoredinanextradatabox:

aligned(8) class ExtraDataBox extends Box(‘extr’) { FECInformationBox feci; bit(8) extradata[]; }

9.2.4.7 FEC Information Box

9.2.4.7.1 Definition

BoxType: ‘feci’Container: ExtraDataBox(‘extr’)Mandatory: NoQuantity: ZeroorOne

TheFECInformationboxstoresFECencodingID,FECinstanceIDandFECpayloadIDwhichareneededwhensendinganFDpacket.

9.2.4.7.2 Syntax

aligned(8) class FECInformationBox extends Box('feci') { unsigned int(8) FEC_encoding_ID; unsigned int(16) FEC_instance_ID; unsigned int(16) source_block_number; unsigned int(16) encoding_symbol_ID; }

9.2.4.7.3 Semantics

FEC_encoding_ID identifies the FEC encoding scheme and is subject to IANA registration (see RFC 5052), in which (i) value zero corresponds to the "Compact No-Code FEC scheme" also known as "Null-FEC" (RFC 3695); (ii) value one corresponds to the “MBMS FEC” (3GPP TS 26.346); (iii) for values in the range of 0 to 127, inclusive, the FEC scheme is Fully-Specified, whereas for values in the range of 128 to 255, inclusive, the FEC scheme is Under-Specified.

FEC_instance_ID provides a more specific identification of the FEC encoder being used for an Under-Specified FEC scheme. This value should be set to zero for Fully-Specified FEC schemes and shall be ignored when parsing a file with FEC_encoding_ID in the range of 0 to 127, inclusive. FEC_instance_ID is scoped by the FEC_encoding_ID. See RFC 5052 for further details.

source_block_number identifies from which source block of the object the encoding symbol(s) in the FD packet are generated.

encoding_symbol_ID identifies which specific encoding symbol(s) generated from the source block are carried in the FD packet.
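
The following informative sketch shows one way a reader might decode the seven payload bytes of a 'feci' box and apply the Fully-Specified / Under-Specified split described above. It assumes the box header (size and type) has already been stripped and that fields are big-endian, as elsewhere in this file format; the names FecInfo, parse_feci_payload and is_fully_specified are hypothetical and not part of this specification.

```python
import struct
from typing import NamedTuple


class FecInfo(NamedTuple):
    """Decoded fields of a 'feci' payload (hypothetical helper type)."""
    fec_encoding_id: int
    fec_instance_id: int
    source_block_number: int
    encoding_symbol_id: int


def parse_feci_payload(payload: bytes) -> FecInfo:
    """Decode the 8+16+16+16 bit fields that follow the box header."""
    if len(payload) < 7:
        raise ValueError("feci payload needs at least 7 bytes")
    # ">" selects big-endian with no padding, so BHHH is exactly 7 bytes.
    return FecInfo(*struct.unpack(">BHHH", payload[:7]))


def is_fully_specified(fec_encoding_id: int) -> bool:
    """0..127 is Fully-Specified, 128..255 is Under-Specified."""
    if not 0 <= fec_encoding_id <= 255:
        raise ValueError("FEC_encoding_ID is an 8-bit value")
    return fec_encoding_id <= 127
```

A parser would ignore fec_instance_id whenever is_fully_specified returns True, in line with the semantics of FEC_instance_ID.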

9.3 MPEG-2 Transport Hint Track Format

9.3.1 Introduction

MPEG-2 TS (Transport Stream) is a stream multiplex which can carry one or more programs, consisting of audio, video and other media. The file format supports the storage of MPEG-2 TS in a hint track. An MPEG-2 TS hint track can be used for both storage of received TS packets (as a reception hint track), and as a server hint track used for the generation of an MPEG-2 TS.

The MPEG-2 TS hint track definition supports so-called “precomputed hints”. Precomputed hints make no use of including data by reference from other tracks, but rather MPEG-2 TS packets are stored as such. This allows reusing the MPEG-2 TS packets stored in a separate file. Furthermore, precomputed hints facilitate simple recording operation.

In addition to precomputed hint samples, it is possible to include media data by reference to media tracks into hint samples. Conversion of a received transport stream to media tracks would allow existing players compliant with earlier versions of the ISO base media file format to process recorded files as long as the media formats are also supported. Storing the original transport headers retains valuable information for error concealment and the reconstruction of the original transport stream.

9.3.2 Design Principles

The design principles of the MPEG-2 TS Hint Track Format are as follows.

A sequence of samples in an MPEG-2 TS Hint Track is a set of precomputed and constructed MPEG-2 TS packets. Precomputed packets are TS packets which are stored unchanged in the case of reception or will be sent as is. This is especially important where data cannot be de-multiplexed and elementary streams cannot be created, e.g. when the transport stream is encrypted and is not allowed to be stored decrypted. Therefore, it is necessary to be able to store the MPEG-2 TS as such in a hint track. Constructed packets use the same approach as RTP hint tracks, i.e., the sample contains instructions for a streaming server to construct the packet. The actual media data is contained in other tracks. A track reference of type ‘hint’ is used.

9.3.2.1 Reusing existing Transport Streams

It was desired to reuse existing TS instances and therefore an additional mechanism exists to cover a wide variety of existing TS recordings. These recordings may consist not only of TS packets but have preceding or trailing data with each TS packet. A specific case for preceding data is a 4-byte timestamp in front of each TS packet to remove the jitter of a transmission system. A specific case for trailing data is the addition of FEC when a TS packet is transmitted over an error-prone channel.

9.3.2.2 Timing

MPEG-2 TS defines a single clock for each program, running at 27 MHz, whose sampled value is transported as PCRs in the TS for clock recovery. The timescale of MPEG-2 TS Hint Tracks is recommended to be 90000, or an integer division or multiple thereof.

The decoding time of a sample in an MPEG-2 TS Hint Track is the reception/transmission time of the first bit of that packet or packet group, which is recommended to be derived from the PCR timestamps of the TS, since if the PCR times are used, piece-wise linearity can be assumed and the ‘stts’ table compacts sensibly. The optional ‘tsti’ box in the sample description can be used to signal whether reception timing with or without clock recovery was used when the hint track is a reception hint track. In the case of a server hint track PCR timing is assumed.

NOTE: When there are multiple packets in a sample, they cannot be given independent transmission time offsets.
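
As an informative illustration of the recommended timescale, the sketch below rescales a 27 MHz PCR value to hint-track time units. It relies on the PCR layout of ISO/IEC 13818-1 (a 33-bit base counted at 90 kHz and a 9-bit extension counting 0 to 299 at 27 MHz), which is not restated in this subclause; the function names are hypothetical, and rounding down is an arbitrary choice of this sketch.

```python
PCR_CLOCK_HZ = 27_000_000  # system clock frequency of an MPEG-2 TS program


def pcr_value(base: int, extension: int) -> int:
    """Combine the PCR base and extension into a 27 MHz tick count."""
    if not 0 <= base < (1 << 33):
        raise ValueError("PCR base is a 33-bit value")
    if not 0 <= extension < 300:
        raise ValueError("PCR extension counts 0..299")
    return base * 300 + extension


def pcr_to_timescale(pcr_27mhz: int, timescale: int = 90_000) -> int:
    """Rescale 27 MHz ticks to media timescale units, rounding down."""
    return pcr_27mhz * timescale // PCR_CLOCK_HZ
```

With a timescale of 90000 the result equals the PCR base, which is one reason that value (or an integer division or multiple of it) keeps sample times exact.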

9.3.2.3 Packet Grouping

The sample format for MPEG-2 Transport Stream Hint Tracks allows multiple TS packets in one sample. Specific applications, such as some IPTV applications, convey TS packets in an RTP stream. Only one reception timestamp can be derived for all TS packets carried in one RTP packet. Another application for storing multiple TS packets in a sample is SPTSs, where a sample contains all the TS packets for a GoP. In this case every sample is a random access point.

Note that random-access to every TS packet is not possible by the means of the file format if multiple TS packets per sample are used.

In the case of an MPTS only one packet per sample should be used. This facilitates the use of the sample group mechanism on a per-packet basis.

9.3.2.4 Random-access points

A sync sample is a point at which processing of a track may begin without error. Both MPTS and SPTS are supported by MPEG-2 TS Hint Tracks, however a random access point, marked as a sync sample, is normally only defined for SPTS, where it specifies the beginning of a packet that contains the first byte of an independently decodable media access unit (e.g. MPEG-2 video I-frames or MPEG-4 AVC IDR pictures) of a stream that uses differential coding. For MPTS, the sync sample table would normally be present but empty, indicating that there is no point in the track at which processing of the entire track may begin without error. It is recommended that the PSI/SI be in the Sample Description so that true random-access with just the media data is possible.

Note that in the case of an MPTS, the sync sample table is present but empty (which means essentially that no sample is a sync sample).

Note also that in case of an SPTS, samples including multiple TS packets should have a sync point (e.g. GoP boundary) at the start of a sample. The sync sample table then marks the samples that are sync points (e.g. the start of GoPs); if the sync sample table is absent, all the samples are sync points. If the sync sample table is present but empty, the sync sample positions are unknown and may not be at the start of samples.

NOTE: An application searching for a key frame can start reading at that location, but in general it also has to read further MPEG-2 TS packets (regarding the file format these are subsequent samples) so that the decoder can decode a complete frame.

9.3.2.5 Application as a Reception Hint Track

Reception hint tracks may be used when one or more packet streams of data are recorded. They indicate the order, reception timing, and contents of the received packets among other things.

NOTE 1: Players may reproduce the packet stream that was received based on the reception hint tracks and process the reproduced packet stream as if it was newly received.

Reception hint tracks have the same structure as hint tracks for servers.

The format of the reception hint samples is indicated by the sample description for the reception hint track. Each protocol has its own reception hint sample format and name.

NOTE 2: Servers using reception hint tracks as hints for sending of the received streams should handle the potential degradations of the received streams, such as transmission delay jitter and packet losses, gracefully and ensure that the constraints of the protocols and contained data formats are obeyed regardless of the potential degradations of the received streams.

NOTE 3: As with server hint tracks, the sample formats of reception hint tracks may enable construction of packets by pulling data out of other tracks by reference. These other tracks may be hint tracks or media tracks. The exact form of these pointers is defined by the sample format for the protocol, but in general they consist of four pieces of information: a track reference index, a sample number, an offset, and a length. Some of these may be implicit for a particular protocol. These 'pointers' always point to the actual source of the data, i.e., indirect data referencing is disallowed. If a hint track is built 'on top' of another hint track, then the second hint track must have direct references to the media track(s) used by the first where data from those media tracks is placed in the stream.

If received data is extracted to media tracks, the de-hinting process must ensure that the media streams are valid, i.e. the streams must be error-free (which requires e.g. error concealment).

A sample with a size of zero is permitted in reception hint tracks, and such samples may be ignored.

9.3.3 Sample Description Format

9.3.3.1 Introduction

The sample description for an MPEG2-TS reception hint track contains all static metadata that describe the stream or a portion thereof, especially the PSI/SI tables. MPEG-2 TS reception hint tracks use an entry-format in the sample description of 'rm2t' (which indicates MPEG-2 Transport Stream). The entry-format for MPEG2-TS server hint tracks is 'sm2t'.

The static metadata documents e.g. PSI/SI tables. The presence of static metadata is optional. When present, the static metadata shall be valid for the MPEG2-TS packets it describes. Consequently, if a piece of static metadata changes in the stream, a new sample entry is needed for the first sample at or after the change. If static metadata is not present in the sample entry, structures, such as PSI/SI tables, stored in the MPEG2-TS packets are valid and the stream must be scanned in order to find out which values of static metadata are valid for a particular sample.

9.3.3.2 Syntax

class MPEG2TSReceptionSampleEntry extends MPEG2TSSampleEntry('rm2t') {
}

class MPEG2TSServerSampleEntry extends MPEG2TSSampleEntry('sm2t') {
}

class MPEG2TSSampleEntry(name) extends HintSampleEntry(name) {
   uint(16) hinttrackversion = 1;
   uint(16) highestcompatibleversion = 1;
   uint(8) precedingbyteslen;
   uint(8) trailingbyteslen;
   uint(1) precomputed_only_flag;
   uint(7) reserved;
   box additionaldata[];
}
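
The fixed part of the sample entry above occupies seven bytes. The informative sketch below decodes them, assuming the caller has already skipped the box header and the fields inherited from HintSampleEntry, so that the buffer starts at hinttrackversion. The type and function names are hypothetical, and the boxes in additionaldata are returned undecoded.

```python
import struct
from dataclasses import dataclass


@dataclass
class TsSampleEntryFields:
    """Fields declared by MPEG2TSSampleEntry itself (hypothetical type)."""
    hinttrackversion: int
    highestcompatibleversion: int
    precedingbyteslen: int
    trailingbyteslen: int
    precomputed_only: bool
    additionaldata: bytes  # raw bytes of the trailing boxes, not parsed here


def parse_ts_sample_entry_fields(body: bytes) -> TsSampleEntryFields:
    """Decode the fixed fields; body must start at hinttrackversion."""
    if len(body) < 7:
        raise ValueError("sample entry body is shorter than 7 bytes")
    version, compat, pre, post, flags = struct.unpack(">HHBBB", body[:7])
    # precomputed_only_flag is the most significant bit of the last byte;
    # the remaining seven bits are reserved.
    return TsSampleEntryFields(version, compat, pre, post,
                               bool(flags & 0x80), body[7:])
```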

9.3.3.3 Semantics

hinttrackversion is currently 1; the highestcompatibleversion field specifies the oldest version with which this track is backward-compatible.

precedingbyteslen indicates the number of bytes that are preceding each MPEG2-TS packet (which may e.g. be a time-code from an external recording device).

trailingbyteslen indicates the number of bytes that are at the end of each MPEG2-TS packet (which may e.g. contain checksums or other data that was added by a recording device).

precomputed_only_flag, if set to 1, indicates that the associated samples are purely precomputed.

additionaldata is a set of boxes. This set can contain boxes that describe one common version of the PSI/SI tables by means of the 'tPAT' box or the 'tPMT' box or other data, e.g. boxes that are only valid for a sample (which contains multiple packets) and describe the initial conditions of the STC or boxes that define the content of the preceding or trailing data. There shall be at most one of each of PATBox, TSTimingBox, InitialSampleTimeBox present within additionaldata.

The following optional boxes for additionaldata are defined:

aligned(8) class PATBox() extends Box(‘tPAT’) {
   uint(3) reserved;
   uint(13) PID;
   uint(8) sectiondata[];
}

aligned(8) class PMTBox() extends Box(‘tPMT’) {
   uint(3) reserved;
   uint(13) PID;
   uint(8) sectiondata[];
}

aligned(8) class ODBox () extends Box (‘tOD ’) {
   uint(3) reserved;
   uint(13) PID;
   uint(8) sectiondata[];
}

aligned(8) class TSTimingBox() extends Box(‘tsti’) {
   uint(1) timing_derivation_method;
   uint(2) reserved;
   uint(13) PID;
}

aligned(8) class InitialSampleTimeBox() extends Box(‘istm’) {
   uint(32) initialsampletime;
   uint(32) reserved;
}
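
The boxes above share a 16-bit leading word in which the PID occupies the low 13 bits. The informative sketch below decodes their payloads, assuming the box header has been removed and big-endian byte order; all function names are hypothetical.

```python
import struct
from typing import Tuple


def parse_pid_section_box(payload: bytes) -> Tuple[int, bytes]:
    """Payload of 'tPAT', 'tPMT' or 'tOD ': returns (PID, sectiondata)."""
    if len(payload) < 2:
        raise ValueError("payload is shorter than 2 bytes")
    (word,) = struct.unpack(">H", payload[:2])
    # The top 3 bits are reserved; sectiondata runs to the end of the box.
    return word & 0x1FFF, payload[2:]


def parse_tsti(payload: bytes) -> Tuple[int, int]:
    """Payload of 'tsti': returns (timing_derivation_method, PID)."""
    if len(payload) < 2:
        raise ValueError("payload is shorter than 2 bytes")
    (word,) = struct.unpack(">H", payload[:2])
    # Bit 15 is the method flag, bits 14..13 are reserved.
    return word >> 15, word & 0x1FFF


def parse_istm(payload: bytes) -> int:
    """Payload of 'istm': returns initialsampletime."""
    if len(payload) < 8:
        raise ValueError("payload is shorter than 8 bytes")
    initialsampletime, _reserved = struct.unpack(">II", payload[:8])
    return initialsampletime
```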

The 'tPAT' box contains the section data of the PAT and each 'tPMT' box contains the section data of one of the PMTs.

In the case of an SPTS, it is strongly recommended that the 'tPMT' box is present in the additionaldata. If the PMT is not present in the sample data, then it shall be present in the additionaldata. If the 'tPMT' box is present, it shall be the PMT for the program contained in the sample data (although the recorded stream may contain other programs and be an MPTS).

PID is the PID of the MPEG2-TS packets from which the data was extracted. In the case of the 'tPAT' box this value is always 0.

sectiondata extends to the end of the box and is the complete MPEG2-TS table, containing the concatenated sections, of an identical version number.

initialsampletime specifies the initial value of the sample times in case the sample times do not start from 0. Unlike media tracks, MPEG-2 TS hint tracks usually have sample times not starting from 0, e.g., PCR times and reception times. Since ‘stts’ only stores the delta between sample times, this field is required for reconstructing the original sample times:

OriginalSampleTime(n) = initialsampletime + STTS(n)

In case PCR times are used for sample times, the reconstructed sample time can be used to initialize the STC when the sample is randomly accessed. Note that this field may need to be updated after editing.

timing_derivation_method is a flag which specifies the method which was used to set the sample time for a given PID. The values for timing_derivation_method are as follows:

0x0 reception time: the sample timing is derived from the reception time. It is not guaranteed that the STC was recovered for derivation of the reception time.

0x1 piecewise linearity between PCRs: the sample time is derived from a reconstructed STC for this program. Piecewise linearity between adjacent PCRs is assumed and all TS packets in the samples have a constant duration in this range.
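
The relation OriginalSampleTime(n) = initialsampletime + STTS(n) given for initialsampletime can be sketched as follows. This informative example assumes that STTS(n) denotes the decoding time obtained by accumulating the sample_delta values of the ‘stts’ entries, with the first sample at time 0; the function name and the (sample_count, sample_delta) tuple representation are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def original_sample_times(
    initialsampletime: int,
    stts_entries: Iterable[Tuple[int, int]],
) -> Iterator[int]:
    """Yield the original time of each sample, in track timescale units.

    stts_entries holds (sample_count, sample_delta) pairs in table order.
    """
    time = initialsampletime  # the first sample has STTS(1) == 0
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            yield time
            time += sample_delta
```

For example, an initial time of 1000 with entries (3, 10) and (2, 5) yields 1000, 1010, 1020, 1030 and 1035.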

9.3.4 Sample Format

Each sample of an MPEG-2 TS Hint track consists of a set of

pre-computed packets: one or more MPEG-2 TS packets with the associated headers and trailers

constructed packets: instructions to compose one or more MPEG2-TS packets with the associated headers and trailers by pointing to data of another track.

Note that each MPEG-2 TS packet in the sample may be preceded with a preheader (precedingbytes), or followed by a posttrailer (trailingbytes), as detailed in the Sample Description Format. The size of the preheader and the posttrailer are specified by precedingbyteslen and trailingbyteslen, respectively, in the sample description to allow compact sample tables with fewer chunks.

It is possible for a mixture of precomputed and constructed samples to occur in the same track. If padding of the transport stream packet is required, this can be accomplished with the adaptation_field or explicitly by using the MPEG2TSImmediateConstructor as appropriate.

NOTE 1: The number of MPEG-2 TS packets in the sample can be derived from the sample size table directly if the sample consists of pre-computed packets only, which can be concluded if the precomputed_only_flag in the sample entry is set. The number of MPEG-2 TS packets in the sample may be variable or restricted, e.g. extensions of this file format may define a sample to contain exactly one packet.

NOTE 2: It is possible to compact common sequences of bytes in transport packets by including those bytes in one or more packets directly, for example in their precedingbytes or trailingbytes section, and then using the MPEG2TSSampleConstructor in other places to refer to them; this is especially relevant for runs of 0xFF bytes.
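
The derivation mentioned in NOTE 1 can be sketched as below. This informative example assumes that every packet in the sample is stored in full (sync byte 0x47 followed by 187 bytes), i.e. that no removed-null-packet markers (sync_byte 0xFF) are present; the function name is hypothetical.

```python
TS_PACKET_SIZE = 188  # sync byte plus 187 further bytes


def precomputed_packet_count(sample_size: int,
                             precedingbyteslen: int,
                             trailingbyteslen: int) -> int:
    """Number of TS packets in a sample made of precomputed packets only."""
    stride = precedingbyteslen + TS_PACKET_SIZE + trailingbyteslen
    count, remainder = divmod(sample_size, stride)
    if remainder:
        raise ValueError("sample size is not a whole number of packets")
    return count
```

A recording with a 4-byte preheader and no trailer therefore has a stride of 192 bytes per packet.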

9.3.4.1 Syntax

// Constructor format
aligned(8) abstract class MPEG2TSConstructor (uint(8) type) {
   uint(8) constructor_type = type;
}

aligned(8) class MPEG2TSImmediateConstructor extends MPEG2TSConstructor(1) {
   uint(8) immediatedatalen;
   uint(8) data[immediatedatalen];
}

aligned(8) class MPEG2TSSampleConstructor extends MPEG2TSConstructor(2) {
   uint(8) sampledatalen;
   uint(16) trackrefindex;
   uint(32) samplenumber;
   uint(32) sampleoffset;
}

// Packet format
aligned(8) class MPEG2TSPacketRepresentation {
   uint(8) precedingbytes[precedingbyteslen];
   uint(8) sync_byte;
   if (sync_byte == 0x47) {
      uint(8) packet[187];
   } else if (sync_byte == 0x00 || sync_byte == 0x01) {
      uint(8) headerdatalen;
      uint(4) reserved;
      uint(4) num_constructors;
      bit(1) transport_error_indicator;
      bit(1) payload_unit_start_indicator;
      bit(1) transport_priority;
      bit(13) PID;
      bit(2) transport_scrambling_control;
      bit(2) adaptation_field_control;
      bit(4) continuity_counter;
      if (sync_byte == 0x00 &&
          (adaptation_field_control == '10' || adaptation_field_control == '11')) {
         uint(8) adaptation_field[headerdatalen-3];
      }
      MPEG2TSConstructor constructors[num_constructors];
   } else if (sync_byte == 0xFF) {
      // implicit null packet that has been removed
   }
   uint(8) trailingbytes[trailingbyteslen];
}

// Sample format
aligned(8) class MPEG2TSSample {
   MPEG2TSPacketRepresentation sample[];
}

9.3.4.2 Semantics

precedingbytes contains any extra data preceding the packet, typically provided by the recording device. For example, this may include a timestamp.

sync_byte: if this value is 0x47, then the packet representation contains a transport stream packet (a precomputed reception hint track sample), with the remaining bytes following in the field packet. The values 0x00 and 0x01 are used for constructed packet representation(s). If MPEG2TSSampleConstructor is used to construct packet representation(s), it points to a track indexed by trackrefindex in the track reference box with reference type 'hint'. If this value is 0xFF, it implies that a null packet has been removed at this position. All other values are currently reserved.

trackrefindex indexes in the track reference box with reference type 'hint' to indicate with which media track the current sample is associated. The samplenumber and sampleoffset fields in the MPEG2TSSampleConstructor point into this media track. The trackrefindex starts from value 1. The value 0 is reserved for future use.

packet: The MPEG-2 TS packet, apart from the sync byte (0x47).

The MPEG2TSConstructor array is a collection of one or more constructor entries, to allow for multiple access units in one transport stream packet. An MPEG2TSImmediateConstructor can contain, amongst others, the PES header. An MPEG2TSSampleConstructor references data in the associated media track. The sum of headerdatalen and the datalen fields of all constructors of an MPEG2TSPacket must be equal to the length of the transport stream packet being constructed, minus 1 byte, which is 187.

trailingbytes contains any extra data following the packet. For example, this may include a checksum.

samplenumber indicates the sample within the referred track contained in the packet and sampleoffset indicates the starting byte position of the referred media sample contained in the packet of which sampledatalen bytes are included. sampleoffset starts from value 0.

immediatedatalen indicates the number of bytes within the field data that are included in the sample rather than data being included into the sample by reference to a media track.

headerdatalen indicates the length of the TS packet header (without the sync byte) in bytes. This field has the value 3 if the adaptation_field is not present or the value (adaptation_field_length+3), where adaptation_field_length is the first octet of the structure adaptation_field as defined in ISO/IEC 13818-1.

Neither the format of precedingbytes nor that of trailingbytes is defined by this specification.

The remaining fields (transport_error_indicator, payload_unit_start_indicator, transport_priority, PID, transport_scrambling_control, adaptation_field_control, continuity_counter, adaptation_field) of the sample structure contain a copy of the packet header of the TS packet, as defined in ISO/IEC 13818-1.
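
The length rule for constructed packets (headerdatalen plus the datalen fields of all constructors equals 187) can be checked with a sketch such as the following. It is informative only: it assumes the buffer starts at the first entry of the constructors array and uses the constructor layouts of 9.3.4.1; the function names are hypothetical.

```python
import struct
from typing import List

TS_PACKET_SIZE = 188


def constructor_datalens(buf: bytes, num_constructors: int) -> List[int]:
    """Walk the constructor array and collect each datalen field."""
    lens = []
    pos = 0
    for _ in range(num_constructors):
        ctype, datalen = struct.unpack_from(">BB", buf, pos)
        if ctype == 1:
            # Immediate constructor: datalen bytes of data follow inline.
            pos += 2 + datalen
        elif ctype == 2:
            # Sample constructor: trackrefindex(16), samplenumber(32),
            # sampleoffset(32) follow the two bytes already read.
            pos += 2 + 2 + 4 + 4
        else:
            raise ValueError(f"unknown constructor_type {ctype}")
        if pos > len(buf):
            raise ValueError("constructor array is truncated")
        lens.append(datalen)
    return lens


def constructed_length_ok(headerdatalen: int, datalens: List[int]) -> bool:
    """True if the constructed packet is 187 bytes long without sync byte."""
    return headerdatalen + sum(datalens) == TS_PACKET_SIZE - 1
```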

9.3.5 Protected MPEG 2 Transport Stream Hint Track

9.3.5.1 Introduction

This Subclause defines a mechanism for marking media streams as protected. This works by changing the four character code of the SampleEntry, and appending boxes containing both details of the protection mechanism and the original four character code. However, in this case the track is not protected; it is an ‘in the clear’ hint track which contains protected data. This Subclause describes how hint tracks should be marked as carrying protected data, using a similar mechanism, and utilizing the same boxes.

9.3.5.2 Syntax

class ProtectedMPEG2TransportStreamSampleEntry
   extends MPEG2TransportStreamSampleEntry(‘pm2t’) {
   ProtectionSchemeInfoBox SchemeInformation;
}

9.3.5.3 Semantics

The SchemeInformation (‘sinf’) box (defined in 0) shall contain details of the protection scheme applied. This shall include the OriginalFormatBox which shall contain the original sample entry type of the MPEG-2 Transport Stream Sample Entry box.

9.4 RTP, RTCP, SRTP and SRTCP Reception Hint Tracks

9.4.1 RTP Reception Hint Track

9.4.1.1 Introduction

This Subclause specifies the reception hint track format for the real-time transport protocol (RTP), as defined in IETF RFC 3550.

RTP is used for real-time media transport over the Internet Protocol. Each RTP stream carries one media type, and one RTP reception hint track carries one RTP stream. Hence, recording of an audio-visual program results in at least two RTP reception hint tracks.

The design of the RTP reception hint track format follows as much as possible the design of the RTP server hint track format. This design should ensure that RTP packet transmission operates very similarly regardless of whether it is based on RTP reception hint tracks or RTP server hint tracks. Furthermore, the number of new data structures in the file format was consequently kept as small as possible.

The format of the RTP reception hint tracks allows storing of the packet payloads in the hint samples, or converting the RTP packet payloads to media samples and including them by reference to the hint samples, or combining both approaches. As noted earlier, conversion of received streams to media tracks allows existing players compliant with earlier versions of the ISO base media file format to process recorded files as long as the media formats are also supported. Storing the original RTP headers retains valuable information for error concealment and the reconstruction of the original RTP stream. It is noted that the conversion of packet payloads to media samples may happen "off-line" after recording of the streams in precomputed RTP reception hint tracks has been completed.

9.4.1.2 Sample Description Format

The entry-format in the sample description for the RTP reception hint tracks is 'rrtp'. The syntax of the sample entry is the same as for RTP server hint tracks having the entry-format 'rtp '.

class ReceivedRtpHintSampleEntry() extends SampleEntry (‘rrtp’) {
   uint(16) hinttrackversion = 1;
   uint(16) highestcompatibleversion = 1;
   uint(32) maxpacketsize;
   box additionaldata[];
}

The entry-format identifier in the sample description of the RTP reception hint track is different from the entry-format in the sample description of the RTP server hint track, in order to avoid using an RTP reception hint track that contains errors as a valid server hint track.

The additionaldata set of boxes may include the timescale entry ('tims') and time offset ('tsro') boxes. Moreover, the additionaldata may contain a timestamp synchrony box.

The timescale entry box (‘tims’) shall be present and the value of timescale shall be set to match the clock frequency of the RTP timestamps of the stream captured in the reception hint track.

The time offset box (‘tsro’) may be present. If the time offset box is not present, the value of the field offset is inferred to be equal to 0. The value of the field offset is used for the derivation of the RTP timestamp, as specified in 9.4.1.4.

RTP timestamps typically do not start from zero, especially if an RTP receiver 'tunes' into a stream. The time offset box should therefore be present in RTP reception hint tracks and the value of offset in the time offset box should be set equal to the first RTP timestamp of the RTP stream in reception order.

Zero or one timestampsynchrony boxes may be present in the additionaldata of the sample entry for an RTP reception hint track. If a timestampsynchrony box is not present, the value of timestamp_sync is inferred to be equal to 0.

class timestampsynchrony() extends Box(‘tssy’) {
   unsigned int(6) reserved;
   unsigned int(2) timestamp_sync;
}

timestamp_sync equal to 0 indicates that the RTP timestamps of the present RTP reception hint track derived from the Formula in 9.4.1.4 may or may not be synchronized with RTP timestamps of other RTP reception hint tracks.

timestamp_sync equal to 1 indicates that the RTP timestamps of the present RTP reception hint track derived from the Formula in 9.4.1.4 reflect the received RTP timestamps exactly (without corrected synchronization to any other RTP reception hint track).

timestamp_sync equal to 2 indicates that RTP timestamps of the present RTP reception hint track derived from the Formula in 9.4.1.4 are synchronized with RTP timestamps of other RTP reception hint tracks.

When timestamp_sync is equal to 0 or 1, a player should correct the inter-stream synchronization using stored RTCP sender reports. When timestamp_sync is equal to 2, the media contained in the RTP reception hint tracks can be played out synchronously according to the reconstructed RTP timestamps without synchronization correction using RTCP Sender Reports. If it is expected that the RTP reception hint track will be used for re-sending the recorded RTP stream, it is recommended that timestamp_sync be set equal to 0 or 1, because the stored RTCP sender reports can be reused.

timestamp_sync equal to 3 is reserved.

The value of timestamp_sync shall be identical for all RTP reception hint tracks present in a file.
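
As an informative sketch of how a player might act on timestamp_sync, the example below extracts the 2-bit field from a ‘tssy’ payload (box header assumed already removed) and reports whether inter-stream synchronization still has to be corrected from stored RTCP sender reports. Treating the reserved value 3 as an error is a choice of this sketch, and both function names are hypothetical.

```python
def parse_tssy(payload: bytes) -> int:
    """Return timestamp_sync from the single payload byte of 'tssy'."""
    if len(payload) < 1:
        raise ValueError("tssy payload is empty")
    value = payload[0] & 0x03  # the upper six bits are reserved
    if value == 3:
        raise ValueError("timestamp_sync value 3 is reserved")
    return value


def needs_rtcp_sync_correction(timestamp_sync: int) -> bool:
    """True for values 0 and 1, where stored sender reports should be used."""
    if timestamp_sync not in (0, 1, 2):
        raise ValueError("unsupported timestamp_sync value")
    return timestamp_sync in (0, 1)
```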

When RTCP is also stored, using an RTCP hint track, the timestamp relationship between the RTP and RTCP hint tracks can only be maintained if the RTP timestamps are anchored by using a set time offset (‘tsro’) in the RTP track, and hence the time offset is mandatory if RTCP is stored in an RTCP hint track.

Zero or one ReceivedSsrcBox identified with the four-character code ‘rssr’ shall be present in the additionaldata of a sample descriptor entry of an RTP reception hint track:

class ReceivedSsrcBox extends Box(‘rssr’) {
   unsigned int(32) SSRC;
}

The SSRC value must equal the SSRC value in the header of all recorded SRTP packets described by the sample description.

9.4.1.3 Sample Format

The sample format of RTP reception hint tracks is identical to the syntax of the sample format of the RTP server hint tracks. Each sample in the reception hint track represents one or more received RTP packets. If media frames are not both fragmented and interleaved in an RTP stream, it is recommended that each sample represents all received RTP packets that have the same RTP timestamp, i.e., consecutive packets in RTP sequence number order with a common RTP timestamp.

Each RTP reception hint sample contains two areas: the instructions to compose the packet, and any extra data needed for composing the packet, such as a copy of the packet payload. Note that the size of the sample is known from the sample size table.

Since the reception time for the packets may vary, this variation can be signalled for each packet as specified subsequently.

A sample with a size of zero is permitted in reception hint tracks, and such samples may be ignored.

9.4.1.4 Packet Entry Format

Each packet in the packet entry table has the same structure as for server (transmission) hint tracks, in 9.1.3.1.

Where i is the sample number of a sample, the sum of the sample time DT(i) as specified in 8.6.1.2 and relative_time indicates the reception time of the packet. The clock source for the reception time is undefined and may be, for instance, the wall clock of the receiver. If the range of reception times of a reception hint track overlaps entirely or partly with the range of reception times of another reception hint track, the clock sources for these hint tracks shall be the same.

It is recommended that receivers use a constant value for sample_delta in the decoding time to sample box ('stts') as much as reasonable and smooth out packet scheduling and end-to-end delay variation by setting relative_time adaptively in stored reception hint samples. This arrangement of setting the values of sample_delta and relative_time can facilitate a compact decoding time to sample box. In this case timestamp_sync is set to 1, the sample durations are mostly constant, and the time offset (‘tsro’) is stored in the sample entry.

The values of RTP_version, P_bit, X_bit, CSRC_count, M_bit, payload_type, and RTPsequenceseed shall be set equal to the V, P, X, CC, M, PT and sequence number fields of the RTP packet captured in the sample.

The fields bframe_flag and repeat_flag are reserved in reception hint tracks and must be zero.

The semantics of extra_flag and extra_information_length are identical to those specified for the RTP server hint tracks.

The following TLV boxes are specified: rtphdrextTLV, rtpoffsetTLV, receivedCSRC.

If the X_bit is set a single rtphdrextTLV box shall be present for storing the received RTP Header Extension.

aligned(8) class rtphdrextTLV extends Box(‘rtpx’) {
   unsigned int(8) data[];
}

data is the raw RTP Header Extension which is application-specific.

The syntax of the rtpoffsetTLV box is specified in 9.1.3.1.

offset indicates a 32-bit signed integer offset to the RTP timestamp of the received RTP packet. Let i be the sample number of a sample, DT(i) be equal to DT as specified in 8.6.1.2 for sample number i, tsro.offset be the value of offset in the 'tsro' box of the referred reception hint sample entry, and % be the modulo operation (written mod in the Formula). The value of offset shall be such that the following Formula is true:

$$\mathrm{RTPtimestamp} = \bigl(DT(i) + \mathrm{tsro.offset} + \mathrm{offset}\bigr) \bmod 2^{32}$$

Formula (1): RTP timestamp calculation
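
Formula (1) and its inverse can be sketched as follows. The example is informative: it assumes that the RTP timestamp is the sum of DT(i), tsro.offset and offset reduced modulo 2^32, which is consistent with the relation stated in NOTE 1 of this subclause, and it folds the computed offset into the signed 32-bit range. The function names are hypothetical.

```python
MOD32 = 1 << 32


def rtp_timestamp(dt: int, tsro_offset: int, offset: int) -> int:
    """RTP timestamp of a packet from sample time and the two offsets."""
    return (dt + tsro_offset + offset) % MOD32


def rtpoffset_for(rtp_ts: int, dt: int, tsro_offset: int) -> int:
    """Signed 32-bit offset that makes the formula hold for rtp_ts."""
    raw = (rtp_ts - dt - tsro_offset) % MOD32
    # Map the upper half of the unsigned range onto negative values.
    return raw - MOD32 if raw >= (1 << 31) else raw
```

A recorder that sets DT(i) so that the offset is zero for most packets only needs to store an rtpoffsetTLV box for packets whose RTP timestamp deviates from the sample time.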

NOTE 1: When each reception hint sample represents all received RTP packets that have the same RTP timestamp, the value of sample_delta in the decoding time to sample box can be set to match the RTP timestamp. In other words, DT(i), as specified above, can be set equal to (the RTP timestamp - tsro.offset - offset) (assuming that the resulting value would be greater than or equal to 0). This is recommended.

NOTE 2: RTP timestamps do not necessarily increase as a function of RTP sequence number in all RTP streams, i.e., transmission order and playback order of packets may not be identical. For example, many video coding schemes allow bi-prediction from previous and succeeding pictures in playback order. As samples appear in tracks in their decoding order, i.e., in reception order in case of RTP reception hint tracks, offset in the rtpoffsetTLV box can be used to warp the RTP timestamp away from the sample time DT(i).

For the purpose of edits in Edit List Boxes, the composition time of a received RTP packet is inferred to be the sum of the sample time DT(i) and offset as specified above.

If the value of CSRC_count is not equal to zero, a receivedCSRC box may be present for storing the received CSRC header fields for each RTP packet. The receivedCSRC box is identified with the four-character code ‘rcsr’.

aligned(8) class receivedCSRC extends Box('rcsr') {
   unsigned int(32) CSRC[];   //to end of the box
}

The number of entries in CSRC[] equals the CC value of received SRTP packets. The nth entry of CSRC[] shall equal the nth CSRC value of the RTP packet header.

9.4.1.5 SDP information

Both movie and track SDP information may be present, as specified in 9.1.4.

9.4.2 RTCP Reception Hint Track

9.4.2.1 Introduction

This Subclause specifies the reception hint track format for the real-time control protocol (RTCP), defined in IETF RFC 3550.

RTCP is used for real-time transport of control information for an RTP session over the Internet Protocol. During streaming, each RTP stream typically has an accompanying RTCP stream that carries control information for the RTP stream. One RTCP reception hint track carries one RTCP stream and is associated to the corresponding RTP reception hint track through a track reference.

The format of the RTCP reception hint tracks allows the storage of RTCP Sender Reports in the hint samples.

The RTCP Sender Reports are of particular interest for stream recording, because they reflect the current status of the server, e.g., the relationship of the media timing (RTP timestamp of audio/video packets) to the server time (absolute time in NTP format). Knowledge of this relationship is also necessary for playback of recorded RTP reception hint tracks to be able to detect and correct clock drift and jitter.

The timestamp synchrony box as specified in 9.4.1.2 makes it possible to correct clock drift and jitter before playing a file, and therefore recording of RTCP streams is optional when timestamp_sync is equal to 2.

There is no server hint track equivalent for the RTCP reception hint track, since RTCP messages are generated on-the-fly during transmission.

9.4.2.2 General

There shall be zero or one RTCP reception hint track for each RTP reception hint track. An RTCP reception hint track shall contain a track reference box including a reference of type 'cdsc' to the associated RTP reception hint track.

When i is the sample number of a sample, the sample time DT(i) as specified in 8.6.1.2 indicates the reception time of the packet. The clock source for the reception time shall be the same as for the associated RTP reception hint track. The value of timescale in the Media Header Box of an RTCP reception hint track shall be equal to the value of timescale in the media header box of the associated RTP reception hint track.

9.4.2.3 Sample Description Format

The entry-format in the sample description for the RTCP reception hint tracks is 'rtcp'. It is otherwise identical in structure to the sample entry format for RTP. There are no defined boxes for the additionaldata field.

9.4.2.4 Sample Format

9.4.2.4.1 Introduction

Each sample in the reception hint track represents one or more received RTCP packets. Each sample contains two areas: the raw RTCP packets and any extra data needed. Note that the size of the sample is known from the sample size table, and that the size of an RTCP packet is indicated within the packet itself (as documented in RFC 3550), as a count one less than the number of 32-bit words in that packet.

9.4.2.4.2 Syntax

aligned(8) class receivedRTCPpacket {
   unsigned int(8) data[];
}

aligned(8) class receivedRTCPsample {
   unsigned int(16) packetcount;
   unsigned int(16) reserved;
   receivedRTCPpacket packets[packetcount];
}

9.4.2.4.3 Semantics

data contains a raw RTCP packet including the RTCP report header, the 20-byte sender information block and any number of report blocks. Note that the size of each RTCP packet is known by parsing the 16-bit length field of the RTCP header.

packetcount indicates the number of received RTCP packets contained in the sample.

packets contains the received RTCP packets.
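
The sample layout above can be walked with a few lines of code. The following is a minimal sketch in Python, not part of this specification: the function name split_rtcp_sample is hypothetical, and it assumes that each receivedRTCPpacket holds exactly one RTCP packet whose size follows from the 16-bit length field of RFC 3550 (a count one less than the number of 32-bit words).

```python
import struct


def split_rtcp_sample(sample):
    """Split one receivedRTCPsample (bytes) into its raw RTCP packets.

    Hypothetical helper: assumes each packet is delimited by the 16-bit
    length field in its own RTCP header (RFC 3550).
    """
    # packetcount (16 bits) followed by reserved (16 bits), big-endian.
    packetcount, _reserved = struct.unpack_from(">HH", sample, 0)
    pos = 4
    packets = []
    for _ in range(packetcount):
        # RTCP header: V/P/count (8 bits), packet type (8 bits), length (16 bits).
        (length_minus_one,) = struct.unpack_from(">H", sample, pos + 2)
        size = (length_minus_one + 1) * 4  # length counts 32-bit words minus one
        if pos + size > len(sample):
            raise ValueError("RTCP packet runs past the end of the sample")
        packets.append(sample[pos:pos + size])
        pos += size
    return packets
```

For a sender report without report blocks (4-byte header, 4-byte SSRC, 20-byte sender information), the length field is 6 and the packet occupies 28 bytes.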

9.4.3 SRTP Reception Hint Track

9.4.3.1 Introduction

This Subclause specifies the reception hint track formats for the secure real-time transport protocol (SRTP), as defined in IETF RFC 3711.

SRTP is a secure extension of the real-time media transport (RTP) over the Internet Protocol. Each SRTP stream carries one media type, and one SRTP reception hint track carries one SRTP stream. Hence, recording of an audio-visual program results in at least two SRTP reception hint tracks.

The design of the SRTP reception hint track format follows the design of RTP reception hint tracks and reuses most of the framework provided by RTP reception hint tracks. The major difference between RTP and SRTP reception hint tracks is that the actual media payload is stored in an encrypted form for SRTP reception hint tracks, whereas it is unencrypted for RTP reception hint tracks. SRTP reception hint tracks provide additional boxes to store information necessary to decrypt encrypted content on playback. Additionally, all header fields of the SRTP packet header shall be stored with the payload, as this information is necessary to check the integrity of the received data. SRTP reception hint tracks are commonly used together with SRTCP reception hint tracks.

SRTP reception hint tracks may, for example, be used to store protected mobile TV content.

9.4.3.2 Sample Description Format

9.4.3.2.1 Sample Description Entry

The sample description format for SRTP reception hint tracks is identical to that for RTP reception hint tracks with the exception that the sample entry name is changed from 'rrtp' to 'rsrp' and that it may contain additional boxes:

class ReceivedSrtpHintSampleEntry() extends SampleEntry ('rsrp') {
   uint(16) hinttrackversion = 1;
   uint(16) highestcompatibleversion = 1;
   uint(32) maxpacketsize;
   box additionaldata[];
}

Fields and boxes are identical to those of the ReceivedRtpHintSampleEntry ('rrtp'). The additionaldata[] of each sample description entry of a SRTP Reception Hint Track shall contain exactly one ReceivedSsrcBox ('rssr').

Additionally, the additionaldata[] may contain the Received Cryptographic Context ID box and the Rollover Counter box defined below. Furthermore, a SRTP Process Box shall also be included as one of the additionaldata boxes. As the content is stored encrypted, the integrity and the encryption algorithm fields in the SRTP Process box specify the algorithm that was applied to the received stream. An entry of four spaces ($20$20$20$20) may be used to indicate that the algorithm is defined by means outside the scope of this document.

9.4.3.2.2 Received Cryptographic Context ID Box

Zero or one ReceivedCryptoContextIdBox, identified with the four-character code 'ccid', may be present in the additionaldata of a sample descriptor entry of an SRTP reception hint track. Information to recover the cryptographic context for the received SRTP stream may be stored here.

aligned(8) class ReceivedCryptoContextIdBox extends Box ('ccid') {
   unsigned int(16) destPort;
   unsigned int(8) ip_version;
   switch (ip_version) {
      case 4: // IPv4
         unsigned int(32) destIP;
         break;
      case 6: // IPv6
         unsigned int(64) destIP;
         break;
   }
}

The destPort and destIP parameters contain the port number and the IP address (as present in the received IPv4 or IPv6 packets), respectively, of the SRTP session via which the recorded SRTP packets were received. ip_version contains either 4 or 6 representing IPv4 or IPv6, respectively.
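
As an illustration of the field layout, the following Python sketch reads the payload of a 'ccid' box, i.e., the bytes that follow the box header. The function name parse_ccid_payload and the returned dictionary keys are hypothetical. The sketch follows the syntax exactly as written above, including the 64-bit destIP field in the IPv6 case.

```python
import struct


def parse_ccid_payload(payload):
    """Parse the body of a ReceivedCryptoContextIdBox (box header excluded).

    Hypothetical helper that mirrors the syntax above field by field.
    """
    # destPort (16 bits) and ip_version (8 bits), big-endian, no padding.
    dest_port, ip_version = struct.unpack_from(">HB", payload, 0)
    if ip_version == 4:
        (dest_ip,) = struct.unpack_from(">I", payload, 3)  # 32-bit destIP
    elif ip_version == 6:
        # The syntax above declares a 64-bit field for this case.
        (dest_ip,) = struct.unpack_from(">Q", payload, 3)
    else:
        raise ValueError("ip_version must be 4 or 6, got %d" % ip_version)
    return {"destPort": dest_port, "ip_version": ip_version, "destIP": dest_ip}
```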

9.4.3.2.3 Rollover Counter Box

Zero or one RolloverCounterBox, identified with the four-character code 'sroc', may be present in the additionaldata of a sample descriptor entry of an SRTP reception hint track. Typically, the rollover counter value changes every 65536 SRTP packets.

aligned(8) class RolloverCounterBox extends Box ('sroc') {
   unsigned int(32) rollover_counter;
}

The rollover_counter is a non-zero integer that gives the value of the ROC field for all associated received SRTP packets.

NOTE: The rollover counter (ROC) is an element of the cryptographic context of a SRTP stream and depends on the absolute position of a packet in an RTP stream. Knowledge of the ROC value is necessary in order to decrypt a received SRTP packet. It is optional to use the rollover counter box, as RFC 4771 defines an optional mechanism to signal the ROC value explicitly in the authentication tag of a SRTP packet.
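
To show why the ROC changes every 65536 packets, the sketch below computes the SRTP packet index as defined in RFC 3711 (index = 2^16 × ROC + SEQ), where SEQ is the 16-bit RTP sequence number. The function name srtp_packet_index is hypothetical and the code is an illustration only, not a decryption procedure.

```python
def srtp_packet_index(roc, seq):
    """Combine a 32-bit rollover counter and a 16-bit RTP sequence number
    into the 48-bit SRTP packet index (RFC 3711: i = 2^16 * ROC + SEQ)."""
    if not 0 <= seq <= 0xFFFF:
        raise ValueError("seq must fit in 16 bits")
    if not 0 <= roc <= 0xFFFFFFFF:
        raise ValueError("roc must fit in 32 bits")
    return (roc << 16) | seq
```

When the sequence number wraps from 65535 to 0, the ROC increases by one so that the index keeps growing.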

9.4.3.3 Sample and Packet Entry Format

Both the sample format and the packetEntry format for SRTP reception hint tracks are identical to those of RTP reception hint tracks, defined in 9.4.1.3 and 9.4.1.4. The packet payload is stored as received in the SRTP packets, i.e., all information received in the SRTP packet excluding the header or, in other words, the encrypted payload together with the key identifier (MKI) and the authentication tag.

If the value of CSRC_count is not equal to zero for a received SRTP packet, the extra_data_tlv corresponding to this received SRTP packet shall contain exactly one receivedCSRC box ('rcsr').

9.4.4 SRTCP Reception Hint Tracks

9.4.4.1 Introduction

This Subclause specifies the reception hint track format for the secure real-time control protocol (SRTCP), defined in IETF RFC 3711.

SRTCP is used for real-time transport of control information for a SRTP session over the Internet Protocol. SRTCP takes for SRTP the role that RTCP takes for RTP, cf. 9.4.2. During streaming, each SRTP stream typically has an accompanying SRTCP stream that carries control information for the SRTP stream. One SRTCP reception hint track carries one SRTCP stream and is associated to the corresponding SRTP reception hint track through a track reference.

The format of the SRTCP reception hint tracks allows the storage of SRTCP Packets in the hint samples, e.g., of SRTCP Sender Reports.

The SRTCP Sender Reports are of particular interest for stream recording, because they reflect the current status of the server, e.g., the relationship of the media timing (SRTP timestamp of audio/video packets) to the server time (absolute time in NTP format). Knowledge of this relationship is also necessary for playback of recorded SRTP reception hint tracks in order to be able to detect and correct clock drift and jitter.

The timestamp synchrony box as specified in 9.4.1.2 makes it possible to correct clock drift and jitter before playing a file, and therefore recording of SRTCP streams is optional.

There is no server hint track equivalent for the SRTCP reception hint track, since SRTCP messages are generated on-the-fly during transmission.

9.4.4.2 General

There shall be zero or one SRTCP reception hint track for each SRTP reception hint track. An SRTCP reception hint track shall contain a track reference box including a reference of type 'cdsc' to the associated SRTP reception hint track.

When i is the sample number of a sample, the sample time DT(i) as specified in 8.6.1.2 indicates the reception time of the packet. The clock source for the reception time shall be the same as for the associated SRTP reception hint track. The value of timescale in the Media Header Box of an SRTCP reception hint track shall be equal to the value of timescale in the media header box of the associated SRTP reception hint track.

9.4.4.3 Sample Description Format

The entry-format in the sample description for the SRTCP reception hint tracks is 'stcp'. It is otherwise identical in structure to the sample entry format for RTCP. The encryption and authentication method of the SRTCP hint tracks are defined by the respective entries in SRTP Process box of the corresponding SRTP hint track.

NOTE: An equivalent to the ROC boxes defined for SRTP is not necessary for SRTCP, as the SRTCP packet contains an explicitly signalled initialization vector.

9.4.4.4 Sample Format

Sample format is the sample format for RTCP reception hint tracks as defined in 9.4.2.4.

9.4.5 Protected RTP Reception Hint Track

9.4.5.1 Introduction

This specification defines a mechanism for marking media streams as protected. This works by changing the four character code of the SampleEntry, and appending boxes containing both details of the protection mechanism and the original four character code. However, in this case the track is not protected; it is an 'in the clear' hint track which contains protected data. This Subclause describes how reception hint tracks should be marked as carrying protected data, using a similar mechanism, and utilizing the same boxes.

9.4.5.2 Syntax

Class ProtectedRtpReceptionHintSampleEntry extends RtpReceptionHintSampleEntry ('prtp') {
   ProtectionSchemeInfoBox SchemeInformation;
}

9.4.5.3 Semantics

The SchemeInformation ('sinf') box shall contain details of the protection scheme applied. This shall include the OriginalFormatBox which shall contain the four character code 'rrtp' (the four character code of the original RTPReceptionHintSampleEntry box).

9.4.6 Recording Procedure

See Annex H.

9.4.7 Parsing Procedure

See Annex H.

10 Sample Groups

10.1 Random Access Recovery Points

10.1.1.1 Definition

In some coding systems it is possible to random access into a stream and achieve correct decoding after having decoded a number of samples. This is known as gradual decoding refresh. For example, in video, the encoder might encode intra-coded macroblocks in the stream, such that it knows that within a certain period the entire picture consists of pixels that are only dependent on intra-coded macroblocks supplied during that period.

Samples for which such gradual refresh is possible are marked by being a member of one of these groups. The definition of the groups allows the marking to occur at either the beginning of the period or the end. However, when used with a particular media type, the usage of these groups may be restricted to marking only one end (i.e. restricted to only positive or negative roll values). A roll-group is defined as that group of samples having the same roll distance.

The roll groups have the following semantics.

A VisualRollRecoveryEntry documents samples that enable entry points into streams that are alternatives to sync samples.

An AudioRollRecoveryEntry documents the pre-roll distance required in audio streams in which every sample can be independently decoded, but the decoder output is only assured to be correct after pre-rolling by the indicated number of samples.

An AudioPreRollEntry is used with audio streams in which not every sample is a sync sample; decoding can only start at a sync sample, but decoder output is only assured to be correct after pre-rolling by the indicated number of samples. This means that to achieve correct output when performing random access, first it is necessary to back up by the indicated pre-roll distance, and then (to enable decoding to start) find the nearest sync sample at, or preceding, that position.

10.1.1.2 Syntax

class VisualRollRecoveryEntry() extends VisualSampleGroupEntry ('roll') {
   signed int(16) roll_distance;
}

class AudioRollRecoveryEntry() extends AudioSampleGroupEntry ('roll') {
   signed int(16) roll_distance;
}

class AudioPreRollEntry() extends AudioSampleGroupEntry ('prol') {
   signed int(16) roll_distance;
}

10.1.1.3 Semantics

roll_distance is a signed integer that gives the number of samples that must be decoded in order for a sample to be decoded correctly. A positive value indicates the number of samples after the sample that is a group member that must be decoded such that at the last of these recovery is complete, i.e. the last sample is correct. A negative value indicates the number of samples before the sample that is a group member that must be decoded in order for recovery to be complete at the marked sample. The value zero must not be used; the sync sample table documents random access points for which no recovery roll is needed.
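
The random access rule described for AudioPreRollEntry (back up by the pre-roll distance, then find the nearest sync sample at or before that position) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function name is hypothetical, sample numbers are 1-based, sync_samples is a sorted list of sync sample numbers such as could be read from the sync sample table, and the magnitude of roll_distance is taken as the number of samples to back up.

```python
from bisect import bisect_right


def decode_start_for_preroll(target, roll_distance, sync_samples):
    """Return the sample number at which decoding should start so that
    output is correct at sample `target`.

    Assumptions (not from the specification text): 1-based sample numbers,
    `sync_samples` sorted ascending, |roll_distance| samples of pre-roll.
    """
    # Step 1: back up by the indicated pre-roll distance (not before sample 1).
    backed_up = max(1, target - abs(roll_distance))
    # Step 2: find the nearest sync sample at, or preceding, that position.
    idx = bisect_right(sync_samples, backed_up)
    if idx == 0:
        raise ValueError("no sync sample at or before the pre-roll position")
    return sync_samples[idx - 1]
```

With sync samples at 1, 10 and 20, a seek to sample 21 with a pre-roll of 3 backs up to sample 18 and therefore starts decoding at sync sample 10, not 20.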

10.2 Rate Share Groups

10.2.1 Introduction

Rate share instructions are used by players and streaming servers to help allocate bitrates dynamically when several streams share a common bandwidth resource. The instructions are stored in the file as sample group entries and apply when scalable or alternative media streams at different bitrates are combined with other scalable or alternative tracks. The instructions are time-dependent as samples in a track may be associated with different sample group entries. In the simplest case, only one target rate share value is specified per media and time range as illustrated in Figure 5.

[Figure 5: plot of A/V rate share (%) against time for audio and video, with a period labelled "Higher audio rate required".]

Figure 5 — Audio/Video rate share as function of time

In order to accommodate for rate share values that vary with the available bitrate, it is possible to specify more than one operation range. One may for instance indicate that audio requires a higher percentage (than video) at low available bitrates. Technically this is done by specifying two operation points as shown in Figure 6.

[Figure 6: plot of audio rate share (%) against available bitrate, with operation points OP 1 and OP 2 and regions labelled "Higher audio rate required" and "Lower audio rate required".]

Figure 6 — Audio rate share as function of available bitrate

Operation points are defined in terms of total available bandwidth. For more complex situations it is possible to specify more operation points.

In addition to target rate share values, it is also possible to specify maximum and minimum bitrates for a certain media, as well as discard priority.

10.2.2 Rate Share Sample Group Entry

10.2.2.1 Definition

Each sample of a track may be associated to (zero or) one of a number of sample group descriptions, each of which defines a record of rate-share information. Typically the same rate-share information applies to many consecutive samples and it may therefore be enough to define two or three sample group descriptions that can be used at different time intervals.

The grouping type 'rash' (short for rate share) is defined as the grouping criterion for rate share information. Zero or one sample-to-group box ('sbgp') for the grouping type 'rash' can be contained in the sample table box ('stbl') of a track. It shall reside in a hint track, if a hint track is used, otherwise in a media track.

Target rate share may be specified for several operation points that are defined in terms of the total available bitrate, i.e., the bitrate that should be shared. If only one operation point is defined, the target rate share applies to all available bitrates. If several operation points are defined, then each operation point specifies a target rate share. Target rate share values specified for the first and the last operation points also specify the target rate share values at lower and higher available bitrates, respectively. The target rate share between two operation points is specified to be in the range between the target rate shares of those operation points. One possibility is to estimate with linear interpolation.
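
The linear interpolation mentioned above as one possibility can be sketched as follows. This Python sketch is an illustration only: the function name and the representation of operation points as a sorted list of (available_bitrate, target_rate_share) pairs are assumptions, and other estimates within the permitted range would be equally valid.

```python
def target_rate_share_at(points, available_bitrate):
    """Estimate the target rate share (in percent) at `available_bitrate`.

    `points` is a hypothetical list of (available_bitrate_kbps, share)
    pairs with strictly increasing bitrates.
    """
    if not points:
        raise ValueError("at least one operation point is required")
    # Below the first / above the last operation point, its value applies.
    if available_bitrate <= points[0][0]:
        return float(points[0][1])
    if available_bitrate >= points[-1][0]:
        return float(points[-1][1])
    # Between two operation points: linear interpolation (one possibility).
    for (b0, s0), (b1, s1) in zip(points, points[1:]):
        if b0 <= available_bitrate <= b1:
            frac = (available_bitrate - b0) / (b1 - b0)
            return s0 + frac * (s1 - s0)
    raise ValueError("operation points are not sorted by bitrate")
```

For example, with operation points (100 kbit/s, 60 %) and (500 kbit/s, 20 %), the estimate at 300 kbit/s is 40 %.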

10.2.2.2 Syntax

class RateShareEntry() extends SampleGroupDescriptionEntry('rash') {
   unsigned int(16) operation_point_count;
   if (operation_point_count == 1) {
      unsigned int(16) target_rate_share;
   }
   else {
      for (i=0; i < operation_point_count; i++) {
         unsigned int(32) available_bitrate;
         unsigned int(16) target_rate_share;
      }
   }
   unsigned int(32) maximum_bitrate;
   unsigned int(32) minimum_bitrate;
   unsigned int(8) discard_priority;
}

10.2.2.3 Semantics

operation_point_count is a non-zero integer that gives the number of operation points.

available_bitrate is a positive integer that defines an operation point (in kilobits per second). It is the total available bitrate that can be allocated in shares to tracks. Each entry shall be greater than the previous entry.

target_rate_share is an integer. A non-zero value indicates the percentage of available bandwidth that should be allocated to the media for each operation point. The value of the first (last) operation point applies to lower (higher) available bitrates than the operation point itself. The target rate share between operation points is bounded by the target rate shares of the corresponding operation points. A zero value indicates that no information on the preferred rate share percentage is provided.

maximum_bitrate is an integer. A nonzero value indicates (in kilobits per second) an upper threshold for which bandwidth should be allocated to the media. A higher bitrate than maximum bitrate should only be allocated if all other media in the session has fulfilled their quotas for target rate-share and maximum bitrate, respectively. A zero value indicates that no information on maximum bitrate is provided.

minimum_bitrate is an integer. A nonzero value indicates (in kilobits per second) a lower threshold for which bandwidth should be allocated to the media. If the allocated bandwidth would correspond to a smaller value, then no bitrate should be allocated. Instead preference should be given to other media in the session or alternate encodings of the same media. Zero minimum bitrate indicates that no information on minimum bitrate is provided.

discard_priority is an integer indicating the priority of the track when tracks are discarded to meet the constraints set by target rate share, maximum bitrate and minimum bitrate. Tracks are discarded in discard priority order and the track that has the highest discard priority value is discarded first.
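
As an illustration of the RateShareEntry layout, the following Python sketch decodes the fields in the order given by the syntax. The function name and the dictionary keys are hypothetical; the input is assumed to be the entry payload only, in big-endian byte order. In the single-operation-point case no available_bitrate is stored, which the sketch represents as None.

```python
import struct


def parse_rate_share_entry(data):
    """Decode a RateShareEntry payload into a dictionary (hypothetical helper)."""
    (count,) = struct.unpack_from(">H", data, 0)
    pos = 2
    points = []
    if count == 1:
        # A single operation point carries only target_rate_share.
        (share,) = struct.unpack_from(">H", data, pos)
        pos += 2
        points.append((None, share))
    else:
        for _ in range(count):
            # available_bitrate (32 bits) then target_rate_share (16 bits).
            bitrate, share = struct.unpack_from(">IH", data, pos)
            pos += 6
            points.append((bitrate, share))
    maximum, minimum, discard = struct.unpack_from(">IIB", data, pos)
    return {
        "operation_points": points,
        "maximum_bitrate": maximum,
        "minimum_bitrate": minimum,
        "discard_priority": discard,
    }
```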

10.2.3 Relationship between tracks

The purpose of defining rate share information is to aid a server or player extracting data from a track in combination with other tracks. Note that a server/player streams/plays tracks simultaneously if they belong to different alternate groups and can switch between tracks that belong to the same switch group within an alternate group. By default, all tracks are served/played simultaneously if no alternate groups are defined.

Rate share information should be provided for each track. A track that does not include rate share information has one operation point and can be treated as a constant-bitrate track with discard priority 128. Target rate share, minimum and maximum bitrates do not apply in this case.

Tracks that are alternates to each other shall (at each instance of time) define the same number of operation points at the same set of total available bitrates and have the same discard priorities. Note that the number and definition of operation points may depend on time. Alternate tracks may have different target rate shares, minimum and maximum bitrates.

10.2.4 Bitrate allocation

Rate share information on maximum bitrate, minimum bitrate, and target rate share can be combined for a track. If this is the case, the target rate share shall be applied to find an allocated bitrate before the impact of the maximum and minimum bitrates is considered.

When allocating bandwidth to several tracks, the following considerations apply:

1. In the case all tracks have explicit target rate share values and they don't sum up to 100 percent, treat them as weights, i.e., normalize them.

2. The total allocation shall not exceed total available bitrate.

3. In a choice between alternate tracks, the chosen track should be the track that causes the alternate group to have an allocation most closely in accord with its target rate share, or the track that desires the highest bitrate that can be allocated without discarding other tracks (see below).

4. Tracks must have an allocation between their minimum and maximum bitrates, or be discarded.

5. Tracks should have an allocation in accord with their target rate shares, but this may be distorted to allow some tracks to achieve their minima, or in case some have reached their maxima.

6. If an allocation cannot be done including a track from every alternate group, then tracks should be discarded in discard priority order.

7. The allocation must be re-calculated whenever the operating set for an active track (one that has been selected from an alternate group) changes or the available bitrate changes.
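
A much simplified reading of considerations 1, 2, 4 and 6 can be sketched as follows. This Python sketch is not a normative algorithm: the function name and the track dictionaries are hypothetical, every track is assumed to carry a non-zero target rate share, alternate groups are ignored, and bandwidth left over after a track is capped at its maximum is not redistributed.

```python
def allocate_bitrates(tracks, available_kbps):
    """Return {track name: allocated kbit/s} for the tracks that are kept.

    Each track is a dict with the hypothetical keys: name,
    target_rate_share, maximum_bitrate, minimum_bitrate, discard_priority.
    """
    active = list(tracks)
    while active:
        total_share = sum(t["target_rate_share"] for t in active)
        if total_share <= 0:
            raise ValueError("sketch requires non-zero target rate shares")
        allocation = {}
        feasible = True
        for t in active:
            # Treat the shares as weights (normalization, consideration 1).
            rate = available_kbps * t["target_rate_share"] / total_share
            # A zero maximum or minimum means "no information provided".
            if t["maximum_bitrate"]:
                rate = min(rate, t["maximum_bitrate"])
            if t["minimum_bitrate"] and rate < t["minimum_bitrate"]:
                feasible = False
            allocation[t["name"]] = rate
        if feasible:
            return allocation
        # Discard the track with the highest discard priority value first.
        active.remove(max(active, key=lambda t: t["discard_priority"]))
    return {}
```

Because the shares are normalized and capping only lowers a rate, the sum of the returned allocations never exceeds the available bitrate.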

10.3 Alternative Startup Sequences

10.3.1 Definition

An alternative startup sequence contains a subset of samples of a track within a certain period starting from a sync sample or a sample marked by 'rap ' sample grouping, which are collectively referred to as the initial sample below. By decoding this subset of samples, the rendering of the samples can be started earlier than in the case when all samples are decoded.

An 'alst' sample group description entry indicates the number of samples in any of the respective alternative startup sequences, after which all samples should be processed.

Either version 0 or version 1 of the Sample to Group Box may be used with the alternative startup sequence sample grouping. If version 1 of the Sample to Group Box is used, grouping_type_parameter has no defined semantics but the same algorithm to derive alternative startup sequences should be used consistently for a particular value of grouping_type_parameter.

A player utilizing alternative startup sequences could operate as follows. First, an initial sync sample from which to start decoding is identified by using the Sync Sample Box, the sample_is_non_sync_sample flag for samples enclosed in track fragments, or the 'rap ' sample grouping. Then, if the initial sync sample is associated to a sample group description entry of type 'alst' where roll_count is greater than 0, the player can use the alternative startup sequence. The player then decodes only those samples that are mapped to the alternative startup sequence until the number of samples that have been decoded is equal to roll_count. After that, all samples are decoded.

10.3.2 Syntax

class AlternativeStartupEntry() extends VisualSampleGroupEntry ('alst') {
   unsigned int(16) roll_count;
   unsigned int(16) first_output_sample;
   for (i=1; i <= roll_count; i++)
      unsigned int(32) sample_offset[i];
   j=1;
   do { // optional, until the end of the structure
      unsigned int(16) num_output_samples[j];
      unsigned int(16) num_total_samples[j];
      j++;
   }
}

10.3.3 Semantics

roll_count indicates the number of samples in the alternative startup sequence. If roll_count is equal to 0, the associated sample does not belong to any alternative startup sequence and the semantics of first_output_sample are unspecified. The number of samples mapped to this sample group entry per one alternative startup sequence shall be equal to roll_count.

first_output_sample indicates the index of the first sample intended for output among the samples in the alternative startup sequence. The index of the sync initial sample starting the alternative startup sequence is 1, and the index is incremented by 1, in decoding order, per each sample in the alternative startup sequence.

sample_offset[i] indicates the decoding time delta of the i-th sample in the alternative startup sequence relative to the regular decoding time of the sample derived from the Decoding Time to Sample Box or the Track Fragment Header Box. The sync initial sample starting the alternative startup sequence is its first sample.

num_output_samples[j] and num_total_samples[j] indicate the sample output rate within the alternative startup sequence. The alternative startup sequence is divided into k consecutive pieces, where each piece has a constant sample output rate which is unequal to that of the adjacent pieces. The first piece starts from the sample indicated by first_output_sample. num_output_samples[j] indicates the number of the output samples of the j-th piece of the alternative startup sequence. num_total_samples[j] indicates the total number of samples, including those that are not in the alternative startup sequence, from the first sample in the j-th piece that is output to the earlier one (in composition order) of the sample that ends the alternative startup sequence and the sample that immediately precedes the first output sample of the (j+1)th piece.
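
The AlternativeStartupEntry fields can be decoded along the following lines. This Python sketch is illustrative: the function name and dictionary keys are hypothetical, the input is assumed to be exactly the bytes of one entry (so that "until the end of the structure" can be detected from the length), and values are read in big-endian byte order.

```python
import struct


def parse_alternative_startup_entry(data):
    """Decode one AlternativeStartupEntry payload (hypothetical helper)."""
    roll_count, first_output_sample = struct.unpack_from(">HH", data, 0)
    pos = 4
    # roll_count 32-bit sample_offset values follow the two 16-bit fields.
    sample_offset = list(struct.unpack_from(">%dI" % roll_count, data, pos))
    pos += 4 * roll_count
    # Optional (num_output_samples, num_total_samples) pairs until the end.
    output_rates = []
    while pos + 4 <= len(data):
        output_rates.append(struct.unpack_from(">HH", data, pos))
        pos += 4
    return {
        "roll_count": roll_count,
        "first_output_sample": first_output_sample,
        "sample_offset": sample_offset,
        "output_rates": output_rates,
    }
```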

10.3.4 Examples

Hierarchical temporal scalability (e.g., in AVC and SVC) improves compression efficiency but increases the decoding delay due to reordering of the decoded pictures from the (de)coding order to output order. Deep temporal hierarchies have been demonstrated to be useful in terms of compression efficiency in some studies. When the temporal hierarchy is deep and the operation speed of the decoder is limited (to no faster than real-time processing), the initial delay from the start of the decoding to the start of rendering is substantial and may affect the end-user experience negatively.

Figure 7 illustrates a typical hierarchically scalable bitstream with five temporal levels. Figure 7a shows the example sequence in output order. Values enclosed in boxes indicate the frame_num value of the picture. Values in italics indicate a non-reference picture while the other pictures are reference pictures. Figure 7b shows the example sequence in decoding order. Figure 7c shows the example sequence in output order when assuming that the output timeline coincides with that of the decoding timeline and the decoding of one picture lasts one picture interval. It can be seen that playback of the stream starts five picture intervals later than the decoding of the stream started. If the pictures were sampled at 25 Hz, the picture interval is 40 msec, and the playback is delayed by 0.2 sec.

Figure 7 — Decoded picture buffering delay of an example sequence with five temporal levels

Thanks to the temporal hierarchy, it is possible to decode only a subset of the pictures at the beginning of the sequence. Consequently, rendering can be started faster but the displayed picture rate is lower at the beginning. In other words, a player can make a trade-off between the duration of the initial startup delay and the initial displayed picture rate. Figure 8 and Figure 9 show two examples of alternative startup sequences where a subset of the bitstream of Figure 7 is decoded.

The samples selected for decoding and the decoder output are presented in Figure 8a and Figure 8b, respectively. The reference picture having frame_num equal to 4 and the non-reference pictures having frame_num equal to 5 are not decoded. In this example, the rendering of pictures starts four picture intervals earlier than in Figure 7. When the picture rate is 25 Hz, the saving in startup delay is 160 msec. The saving in the startup delay comes with the disadvantage of a lower displayed picture rate at the beginning of the bitstream.

Figure 8 — An example of an alternative startup sequence

In the example of Figure 9, another way of selecting the pictures for decoding is presented. The decoding of the pictures that depend on the picture with frame_num equal to 3 is omitted and the decoding of non-reference pictures within the second half of the first group of pictures is omitted too. The decoded picture resulting from the sample with frame_num equal to 2 is the first one that is output. As a result, the output picture rate of the first group of pictures is half of normal picture rate, but the display process starts two frame intervals (80 msec in 25 Hz picture rate) earlier than in the conventional solution illustrated in Figure 7.
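
The delay figures quoted in these examples follow directly from the picture interval. The short Python sketch below (the function name is hypothetical) reproduces them for a 25 Hz picture rate: five picture intervals for the conventional case of Figure 7, a saving of four intervals for Figure 8 and of two intervals for Figure 9.

```python
def startup_delay_ms(picture_intervals, picture_rate_hz):
    """Duration in milliseconds of a number of picture intervals."""
    return picture_intervals * 1000.0 / picture_rate_hz


conventional_delay = startup_delay_ms(5, 25)  # Figure 7: 200 ms (0.2 sec)
saving_figure_8 = startup_delay_ms(4, 25)     # Figure 8: 160 ms earlier
saving_figure_9 = startup_delay_ms(2, 25)     # Figure 9: 80 ms earlier
```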

Figure 9 — Another example of an alternative startup sequence

10.4 Random Access Point (RAP) Sample Grouping

10.4.1 Definition

A sync sample is specified to be a random access point after which all samples in decoding order can be correctly decoded. However, it may be possible to encode an "open" random access point, after which all samples in output order can be correctly decoded, but some samples following the random access point in decoding order and preceding the random access point in output order need not be correctly decodable. For example, an intra picture starting an open group of pictures can be followed in decoding order by (bi-)predicted pictures that however precede the intra picture in output order; though they possibly cannot be correctly decoded if the decoding starts from the intra picture, they are not needed.

Such "open" random-access samples can be marked by being a member of this group. Samples marked by this group must be random access points, and may also be sync points (i.e. it is not required that samples marked by the sync sample table be excluded).

10.4.2 Syntax

class VisualRandomAccessEntry() extends VisualSampleGroupEntry ('rap ') {
   unsigned int(1) num_leading_samples_known;
   unsigned int(7) num_leading_samples;
}

10.4.3 Semantics

num_leading_samples_known equal to 1 indicates that the number of leading samples is known for each sample in this group, and the number is specified by num_leading_samples. A leading sample is such a sample associated with an "open" random access point (RAP). It precedes the RAP in presentation order and immediately follows the RAP or another leading sample in decoding order, and when decoding starts from the RAP, the sample cannot be correctly decoded.

num_leading_samples specifies the number of leading samples for each sample in this group. When num_leading_samples_known is equal to 0, this field should be ignored.
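
Both fields of the VisualRandomAccessEntry share a single byte, with the 1-bit flag in the most significant bit. A minimal Python sketch of the unpacking follows; the function name is hypothetical, and returning None when the count is not known is a design choice of this sketch that reflects the rule that the field should then be ignored.

```python
def parse_visual_random_access_entry(data):
    """Return (known, num_leading_samples) from a 1-byte 'rap ' entry.

    num_leading_samples is returned as None when the known flag is 0.
    """
    byte = data[0]
    known = bool((byte >> 7) & 1)      # top bit: num_leading_samples_known
    num_leading_samples = byte & 0x7F  # low 7 bits: num_leading_samples
    return known, (num_leading_samples if known else None)
```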

10.5 Temporal level sample grouping

10.5.1 Definition

Many video codecs support temporal scalability where it is possible to extract one or more subsets of frames that can be independently decoded. A simple case is the extraction of I frames for a bitstream with a regular I-frame interval, e.g., IPPPIPPP…, where every 4th picture is an I frame. Also subsets of these I frames can be extracted for even lower frame rates. More elaborate situations with several temporal levels can be constructed using hierarchical B or P frames.

The Temporal Level sample grouping ('tele') provides a codec-independent sample grouping that can be used to group samples (access units) in a track (and potential track fragments) according to temporal level, where samples of one temporal level have no coding dependencies on samples of higher temporal levels. The temporal level equals the sample group description index (taking values 1, 2, 3, etc). The bitstream containing only the access units from the first temporal level to a higher temporal level remains conforming to the coding standard.

A grouping according to temporal level facilitates easy extraction of temporal subsequences, for instance using the Subsegment Indexing box in 0.

10.5.2 Syntax

class TemporalLevelEntry() extends VisualSampleGroupEntry('tele') {
   bit(1) level_independently_decodable;
   bit(7) reserved=0;
}

10.5.3 Semantics

The temporal level of samples in a sample group equals the sample group description index.

level_independently_decodable is a flag. 1 indicates that all samples of this level have no coding dependencies on samples of other levels. 0 indicates that no information is provided.
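The extraction of a temporal subsequence can be sketched as follows. The input is assumed to be a list holding, for each sample, its 'tele' sample group description index (already resolved from the sample-to-group mapping); the function name is hypothetical. Treating an index of 0 (sample not in any group) as an error is a choice made for this sketch, not a rule stated here.

```python
from typing import List, Sequence


def samples_up_to_level(sample_levels: Sequence[int], max_level: int) -> List[int]:
    """Return indices of samples whose temporal level is 1..max_level.

    sample_levels[i] is the 'tele' group description index of sample i.
    """
    if max_level < 1:
        raise ValueError("temporal levels start at 1")
    kept = []
    for index, level in enumerate(sample_levels):
        if level < 1:
            # Sketch assumption: every sample must belong to a temporal level.
            raise ValueError(f"sample {index} has no temporal level")
        if level <= max_level:
            kept.append(index)
    return kept
```

Because lower levels never depend on higher ones, the kept samples form a subsequence that remains decodable.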

10.6 Stream access point sample group

10.6.1 Definition

A stream access point, as defined in Annex I, enables random access into a container of media stream(s). The SAP sample grouping identifies samples (the first byte of which is the position ISAU for a SAP as specified in Annex I) as being of the indicated SAP type.

The syntax and semantics of grouping_type_parameter are specified as follows.

{
   unsigned int(28) target_layers;
   unsigned int(4) layer_id_method_idc;
}

target_layers specifies the target layers for the indicated SAPs according to Annex I. The semantics of target_layers depends on the value of layer_id_method_idc. When layer_id_method_idc is equal to 0, target_layers is reserved.


layer_id_method_idc specifies the semantics of target_layers. layer_id_method_idc equal to 0 specifies that the target layers consist of all the layers represented by the track. layer_id_method_idc not equal to 0 is specified by derived media format specifications.
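A minimal sketch of splitting the 32-bit grouping_type_parameter into its two fields, assuming the first-declared field occupies the most significant bits. The function name is hypothetical.

```python
from typing import Tuple


def split_sap_grouping_type_parameter(value: int) -> Tuple[int, int]:
    """Return (target_layers, layer_id_method_idc) from a 32-bit value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("grouping_type_parameter must fit in 32 bits")
    target_layers = value >> 4          # upper 28 bits
    layer_id_method_idc = value & 0xF   # lower 4 bits
    return target_layers, layer_id_method_idc
```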

10.6.2 Syntax

class SAPEntry() extends SampleGroupDescriptionEntry('sap ') {
   unsigned int(1) dependent_flag;
   unsigned int(3) reserved;
   unsigned int(4) SAP_type;
}

10.6.3 Semantics

reserved shall be equal to 0. Parsers shall allow and ignore all values of reserved.

dependent_flag shall be 0 for non-layered media. dependent_flag equal to 1 specifies that the reference layers, if any, for predicting the target layers may have to be decoded for accessing a sample of this sample group. dependent_flag equal to 0 specifies that the reference layers, if any, for predicting the target layers need not be decoded for accessing any SAP of this sample group.

SAP_type values equal to 0 and 7 are reserved; SAP_type values in the range of 1 to 6, inclusive, specify the SAP type, as specified in Annex I, of the associated samples (for which the first byte of a sample in this group is the position ISAU).
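The SAPEntry byte can be unpacked as in the sketch below. The reserved bits are read and ignored, as required of parsers. Treating every value outside 1 to 6 as reserved (including 8 to 15, which the text does not mention) is an assumption of this sketch; the function name is hypothetical.

```python
def parse_sap_entry(payload: bytes) -> dict:
    """Unpack the single byte of a SAPEntry (hypothetical helper)."""
    if len(payload) < 1:
        raise ValueError("'sap ' entry payload needs at least 1 byte")
    first = payload[0]
    sap_type = first & 0x0F
    return {
        "dependent_flag": first >> 7,   # top bit
        # bits 6..4 are reserved: allowed and ignored
        "sap_type": sap_type,
        "sap_type_reserved": not 1 <= sap_type <= 6,
    }
```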

11 Extensibility

11.1 Objects

The normative objects defined in this specification are identified by a 32-bit value, which is normally a set of four printable characters from the ISO 8859-1 character set.

To permit user extension of the format, to store new object types, and to permit the inter-operation of the files formatted to this specification with certain distributed computing environments, there are a type mapping and a type extension mechanism that together form a pair.

Commonly used in distributed computing are UUIDs (universal unique identifiers), which are 16 bytes. Any normative type specified here can be mapped directly into the UUID space by composing the four byte type value with the twelve byte ISO reserved value, 0xXXXXXXXX-0011-0010-8000-00AA00389B71. The four character code replaces the XXXXXXXX in the preceding number. These types are identified to ISO as the object types used in this specification.

User objects use the escape type ‘uuid’. They are documented above in subclause 6.2. After the size and type fields, there is a full 16-byte UUID.

Systems which wish to treat every object as having a UUID could employ the following algorithm:

size := read_uint32();
type := read_uint32();
if (type == 'uuid')
   then uuid := read_uuid()
   else uuid := form_uuid(type, ISO_12_bytes);


Similarly when linearizing a set of objects into files formatted to this specification, the following is applied:

write_uint32( object_size(object) );
uuid := object_uuid_type(object);
if (is_ISO_uuid(uuid) )
   write_uint32( ISO_type_of(uuid) )
else {
   write_uint32('uuid');
   write_uuid(uuid);
}

A file containing boxes from this specification that have been written using the ‘uuid’ escape and the full UUID is not compliant; systems are not required to recognize standard boxes written using the ‘uuid’ and an ISO UUID.
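The mapping between four-character codes and the ISO-reserved UUID space can be sketched with Python's standard uuid module. The helper names mirror the pseudocode above but are otherwise hypothetical; the twelve-byte suffix is taken from the value quoted in this subclause.

```python
import uuid
from typing import Optional

# Twelve ISO-reserved bytes that follow the four-character code.
ISO_12_BYTES = bytes.fromhex("00110010800000AA00389B71")


def form_uuid(fourcc: bytes) -> uuid.UUID:
    """Map a four-byte box type into the UUID space."""
    if len(fourcc) != 4:
        raise ValueError("box type must be exactly 4 bytes")
    return uuid.UUID(bytes=fourcc + ISO_12_BYTES)


def iso_type_of(value: uuid.UUID) -> Optional[bytes]:
    """Return the four-byte type if value lies in the ISO space, else None."""
    raw = value.bytes
    return raw[:4] if raw[4:] == ISO_12_BYTES else None
```

When writing a file, a writer following the second algorithm would emit the plain four-byte type whenever iso_type_of returns a value, and fall back to the ‘uuid’ escape otherwise.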

11.2 Storage formats

The main file containing the metadata may use other files to contain media-data. These other files may contain header declarations from a variety of standards, including this one.

If such a secondary file has a metadata declaration set in it, that metadata is not part of the overall presentation. This allows small presentation files to be aggregated into a larger overall presentation by building new metadata and referencing the media-data, rather than copying it.

The references into these other files need not use all the data in those files; in this way, a subset of the media-data may be used, or unwanted headers ignored.

11.3 Derived File formats

This specification may be used as the basis for a specific file format for a restricted purpose: for example, the MP4 file format for MPEG-4 and the Motion JPEG 2000 file format are both derived from it. When a derived specification is written, the following must be specified:

The name of the new format, and its brand and compatibility types for the File Type Box. Generally a new file extension will be used, a new MIME type, and Macintosh file type also, though the definition and registration of these are outside the scope of this specification.

Any template fields used must be explicitly declared; their use must be conformant with the specification here.

The exact ‘codingname’ and ‘protocol’ identifiers as used in the Sample Description must be defined. The format of the samples that these code-points identify must also be defined. However, it may be preferable to fit the new coding systems into an existing framework (e.g. the MPEG-4 systems framework), than to define new coding points at this level. For example, a new audio format could use a new codingname, or could use ‘mp4a’ and register new identifiers within the MPEG-4 audio framework.

New boxes may be defined, though this is discouraged.

If the derived specification needs a new track type other than those defined here or registered, then a new handler-type must be registered. The media header required for this track must be identified. If it is a new box, it must be defined and its box type registered. In general, it is expected that most systems can use existing track types.

Any new track reference types should be registered and defined.

As defined above, the Sample Description format may be extended with optional or required boxes. The usual syntax for doing this would be to define a new box with a specific name, extending (for example) Visual Sample Entry, and containing new boxes.

12 Media-specific definitions

12.1 Video media

12.1.1 Media handler

Video media uses the ‘vide’ handler type in the handler box of the media box, as defined in 8.4.3.

Auxiliary video media uses the ‘auxv’ handler type in the handler box of the media box, as defined in 8.4.3.

An auxiliary video track is coded the same as a video track, but uses this different handler type, and is not intended to be visually displayed (e.g. it contains depth information, or other monochrome or color two-dimensional information). Auxiliary video tracks are usually linked to a video track by an appropriate track reference.

12.1.2 Video media header

12.1.2.1 Definition

Box Types: ‘vmhd’
Container: Media Information Box (‘minf’)
Mandatory: Yes
Quantity:  Exactly one

Video tracks use the VideoMediaHeaderbox in the media information box as defined in 8.4.5. The video media header contains general presentation information, independent of the coding, for video media. Note that the flags field has the value 1.

12.1.2.2 Syntax

aligned(8) class VideoMediaHeaderBox
   extends FullBox('vmhd', version = 0, 1) {
   template unsigned int(16) graphicsmode = 0;   // copy, see below
   template unsigned int(16)[3] opcolor = {0, 0, 0};
}

12.1.2.3 Semantics

version is an integer that specifies the version of this box
graphicsmode specifies a composition mode for this video track, from the following enumerated set, which may be extended by derived specifications:
   copy = 0 copy over the existing image
opcolor is a set of 3 colour values (red, green, blue) available for use by graphics modes
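As a rough illustration, the body of a ‘vmhd’ box (the bytes after the size and type fields) could be read as below. The sketch assumes the usual FullBox layout of an 8-bit version followed by 24 bits of flags; the function name is hypothetical.

```python
import struct


def parse_vmhd_payload(payload: bytes) -> dict:
    """Read version, flags, graphicsmode and opcolor from a 'vmhd' body."""
    if len(payload) < 12:
        raise ValueError("'vmhd' payload needs 12 bytes")
    # One 32-bit word (version + flags) followed by four 16-bit fields.
    version_flags, graphicsmode, red, green, blue = struct.unpack(
        ">I4H", payload[:12]
    )
    return {
        "version": version_flags >> 24,
        "flags": version_flags & 0xFFFFFF,   # expected to be 1 for this box
        "graphicsmode": graphicsmode,        # 0 means copy
        "opcolor": (red, green, blue),
    }
```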


12.1.3 Sample entry

12.1.3.1 Definition

Video tracks use VisualSampleEntry.

In video tracks, the frame_count field must be 1 unless the specification for the media format explicitly documents this template field and permits larger values. That specification must document both how the individual frames of video are found (their size information) and their timing established. That timing might be as simple as dividing the sample duration by the frame count to establish the frame duration.

The width and height in the video sample entry document the pixel counts that the codec will deliver; this enables the allocation of buffers. Since these are counts they do not take into account pixel aspect ratio.

12.1.3.2 Syntax

class VisualSampleEntry(codingname) extends SampleEntry (codingname){
   unsigned int(16) pre_defined = 0;
   const unsigned int(16) reserved = 0;
   unsigned int(32)[3] pre_defined = 0;
   unsigned int(16) width;
   unsigned int(16) height;
   template unsigned int(32) horizresolution = 0x00480000;   // 72 dpi
   template unsigned int(32) vertresolution = 0x00480000;    // 72 dpi
   const unsigned int(32) reserved = 0;
   template unsigned int(16) frame_count = 1;
   string[32] compressorname;
   template unsigned int(16) depth = 0x0018;
   int(16) pre_defined = -1;
   // other boxes from derived specifications
   CleanApertureBox clap;       // optional
   PixelAspectRatioBox pasp;    // optional
}

12.1.3.3 Semantics

resolution fields give the resolution of the image in pixels-per-inch, as a fixed 16.16 number
frame_count indicates how many frames of compressed video are stored in each sample. The default is 1, for one frame per sample; it may be more than 1 for multiple frames per sample
Compressorname is a name, for informative purposes. It is formatted in a fixed 32-byte field, with the first byte set to the number of bytes to be displayed, followed by that number of bytes of displayable data, and then padding to complete 32 bytes total (including the size byte). The field may be set to 0.
depth takes one of the following values
   0x0018 – images are in colour with no alpha
width and height are the maximum visual width and height of the stream described by this sample description, in pixels
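Two of the fields above have encodings that are easy to misread: the 16.16 fixed-point resolution and the length-prefixed compressorname. The sketch below decodes both. The function names are hypothetical, and decoding the name as UTF-8 is an assumption of this example, since the text here does not state a character encoding.

```python
def fixed_16_16_to_float(raw: int) -> float:
    """Convert an unsigned 16.16 fixed-point value to a float."""
    return raw / 65536.0


def decode_compressorname(field: bytes) -> str:
    """Decode the 32-byte compressorname field (length byte + data + padding)."""
    if len(field) != 32:
        raise ValueError("compressorname field must be exactly 32 bytes")
    length = min(field[0], 31)   # at most 31 bytes can follow the size byte
    # Assumption for this sketch: the displayable bytes are UTF-8 text.
    return field[1:1 + length].decode("utf-8", errors="replace")
```

For instance, the default resolution 0x00480000 decodes to 72.0, matching the “72 dpi” comment in the syntax.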

12.1.4 Pixel Aspect Ratio and Clean Aperture

12.1.4.1 Definition

The pixel aspect ratio and clean aperture of the video may be specified using the ‘pasp’ and ‘clap’ sample entry boxes, respectively. These are both optional; if present, they over-ride the declarations (if any) in structures specific to the video codec, which structures should be examined if these boxes are absent. For maximum compatibility, these boxes should follow, not precede, any boxes defined in or required by derived specifications.

In the PixelAspectRatioBox, hSpacing and vSpacing have the same units, but those units are unspecified: only the ratio matters. hSpacing and vSpacing may or may not be in reduced terms, and they may reduce to 1/1. Both of them must be positive.

They are defined as the aspect ratio of a pixel, in arbitrary units. If a pixel appears H wide and V tall, then hSpacing/vSpacing is equal to H/V. This means that a square on the display that is n pixels tall needs to be n*vSpacing/hSpacing pixels wide to appear square.
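The relationship n*vSpacing/hSpacing can be checked with exact rational arithmetic, as in this small sketch (the function name is hypothetical):

```python
from fractions import Fraction


def pixels_wide_for_square(n_tall: int, h_spacing: int, v_spacing: int) -> Fraction:
    """Width, in source pixels, of a displayed square that is n_tall pixels tall."""
    if h_spacing <= 0 or v_spacing <= 0:
        raise ValueError("hSpacing and vSpacing must both be positive")
    return Fraction(n_tall * v_spacing, h_spacing)
```

Using Fraction makes it visible that only the ratio matters: 10/11 and 20/22 give identical results.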

NOTE When adjusting pixel aspect ratio, normally, the horizontal dimension of the video is scaled, if needed (i.e. if the final display system has a different pixel aspect ratio from the video source).

NOTE It is recommended that the original pixels, and the composed transform, be carried through the pipeline as far as possible. If the transformation resulting from ‘correcting’ pixel aspect ratio to a square grid, normalizing to the track dimensions, composition or placement (e.g. track and/or movie matrix), and normalizing to the display characteristics, is a unity matrix, then no re-sampling need be done. In particular, video should not be re-sampled more than once in the process of rendering, if at all possible.

There are notionally four values in the CleanApertureBox. These parameters are represented as a fraction N/D. The fraction may or may not be in reduced terms. We refer to the pair of parameters fooN and fooD as foo. For horizOff and vertOff, D must be positive and N may be positive or negative. For cleanApertureWidth and cleanApertureHeight, both N and D must be positive.

NOTE These are fractional numbers for several reasons. First, in some systems the exact width after pixel aspect ratio correction is integral, not the pixel count before that correction. Second, if video is resized in the full aperture, the exact expression for the clean aperture may not be integral. Finally, because this is represented using centre and offset, a division by two is needed, and so half-values can occur.

Consider the pixel dimensions as defined by the VisualSampleEntry width and height. If the picture centre of the image is at pcX and pcY, then horizOff and vertOff are defined as follows:

   pcX = horizOff + (width - 1)/2
   pcY = vertOff + (height - 1)/2;

Typically, horizOff and vertOff are zero, so the image is centred about the picture centre.

The leftmost/rightmost pixel and the topmost/bottommost line of the clean aperture fall at:

   pcX ± (cleanApertureWidth - 1)/2
   pcY ± (cleanApertureHeight - 1)/2;
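A worked sketch of these formulas, using exact fractions so that half-values are preserved. The function name and argument names are hypothetical; each fractional argument stands for the corresponding N/D pair.

```python
from fractions import Fraction


def clean_aperture_edges(width, height, clap_width, clap_height,
                         horiz_off=0, vert_off=0):
    """Return (left, right, top, bottom) positions of the clean aperture."""
    pc_x = Fraction(horiz_off) + Fraction(width - 1, 2)
    pc_y = Fraction(vert_off) + Fraction(height - 1, 2)
    half_w = (Fraction(clap_width) - 1) / 2
    half_h = (Fraction(clap_height) - 1) / 2
    return (pc_x - half_w, pc_x + half_w, pc_y - half_h, pc_y + half_h)
```

For a 1920x1080 sample entry with a centred 1888x1062 clean aperture, this gives columns 16 to 1903 and lines 9 to 1070, which span exactly 1888 and 1062 pixels.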

12.1.4.2 Syntax

class PixelAspectRatioBox extends Box('pasp'){
   unsigned int(32) hSpacing;
   unsigned int(32) vSpacing;
}


class CleanApertureBox extends Box('clap'){
   unsigned int(32) cleanApertureWidthN;
   unsigned int(32) cleanApertureWidthD;
   unsigned int(32) cleanApertureHeightN;
   unsigned int(32) cleanApertureHeightD;
   unsigned int(32) horizOffN;
   unsigned int(32) horizOffD;
   unsigned int(32) vertOffN;
   unsigned int(32) vertOffD;
}

12.1.4.3 Semantics

hSpacing, vSpacing: define the relative width and height of a pixel;
cleanApertureWidthN, cleanApertureWidthD: a fractional number which defines the exact clean aperture width, in counted pixels, of the video image
cleanApertureHeightN, cleanApertureHeightD: a fractional number which defines the exact clean aperture height, in counted pixels, of the video image
horizOffN, horizOffD: a fractional number which defines the horizontal offset of clean aperture centre minus (width-1)/2. Typically 0.
vertOffN, vertOffD: a fractional number which defines the vertical offset of clean aperture centre minus (height-1)/2. Typically 0.

12.1.5 Colour information

12.1.5.1 Definition

Colour information may be supplied in one or more ColourInformationBoxes placed in a VisualSampleEntry. These should be placed in order in the sample entry starting with the most accurate (and potentially the most difficult to process), in progression to the least. These are advisory and concern rendering and colour conversion, and there is no normative behaviour associated with them; a reader may choose to use the most suitable. A ColourInformationBox with an unknown colour type may be ignored.

If used, an ICC profile may be a restricted one, under the code ‘rICC’, which permits simpler processing. That profile shall be of either the Monochrome or Three-Component Matrix-Based class of input profiles, as defined by ISO 15076-1. If the profile is of another class, then the ‘prof’ indicator must be used.

If colour information is supplied in both this box, and also in the video bitstream, this box takes precedence, and over-rides the information in the bitstream.

NOTE When an ICC profile is specified, SMPTE RP 177 “Derivation of Basic Television Color Equations” may be of assistance if there is a need to form the Y'CbCr to R'G'B' conversion matrix for the color primaries described by the ICC profile.


12.1.5.2 Syntax

class ColourInformationBox extends Box('colr'){
   unsigned int(32) colour_type;
   if (colour_type == 'nclx')   /* on-screen colours */
   {
      unsigned int(16) colour_primaries;
      unsigned int(16) transfer_characteristics;
      unsigned int(16) matrix_coefficients;
      unsigned int(1)  full_range_flag;
      unsigned int(7)  reserved = 0;
   }
   else if (colour_type == 'rICC')
   {
      ICC_profile;   // restricted ICC profile
   }
   else if (colour_type == 'prof')
   {
      ICC_profile;   // unrestricted ICC profile
   }
}

12.1.5.3 Semantics

colour_type: an indication of the type of colour information supplied. For colour_type ‘nclx’: these fields are exactly the four bytes defined for PTM_COLOR_INFO( ) in A.7.2 of ISO/IEC 29199-2 but note that the full range flag is here in a different bit position

ICC_profile: an ICC profile as defined in ISO 15076-1 or ICC.1:2010 is supplied.
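A reader might dispatch on colour_type roughly as follows. This is a sketch under the assumption that the payload starts immediately after the box size and type fields; the function name is hypothetical, and the ICC profile bytes are returned unparsed.

```python
import struct


def parse_colr_payload(payload: bytes) -> dict:
    """Interpret the body of a 'colr' box according to its colour_type."""
    if len(payload) < 4:
        raise ValueError("'colr' payload needs a 4-byte colour_type")
    colour_type = payload[:4].decode("latin-1")
    if colour_type == "nclx":
        primaries, transfer, matrix, last = struct.unpack(">HHHB", payload[4:11])
        return {
            "colour_type": colour_type,
            "colour_primaries": primaries,
            "transfer_characteristics": transfer,
            "matrix_coefficients": matrix,
            "full_range_flag": last >> 7,   # top bit; low 7 bits are reserved
        }
    if colour_type in ("rICC", "prof"):
        return {"colour_type": colour_type, "icc_profile": payload[4:]}
    # Unknown colour types may be ignored by a reader.
    return {"colour_type": colour_type}
```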

12.2 Audio media

12.2.1 Media handler

Audio media uses the ‘soun’ handler type in the handler box of the media box, as defined in 8.4.3.

12.2.2 Sound media header

12.2.2.1 Definition

Box Types: ‘smhd’
Container: Media Information Box (‘minf’)
Mandatory: Yes
Quantity:  Exactly one specific media header shall be present

Audio tracks use the SoundMediaHeaderbox in the media information box as defined in 8.4.5. The sound media header contains general presentation information, independent of the coding, for audio media. This header is used for all tracks containing audio.

12.2.2.2 Syntax

aligned(8) class SoundMediaHeaderBox
   extends FullBox('smhd', version = 0, 0) {
   template int(16) balance = 0;
   const unsigned int(16) reserved = 0;
}

12.2.2.3 Semantics

version is an integer that specifies the version of this box


balance is a fixed-point 8.8 number that places mono audio tracks in a stereo space; 0 is centre (the normal value); full left is -1.0 and full right is 1.0.
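The signed 8.8 fixed-point balance value can be decoded as in this short sketch (the function name is hypothetical):

```python
import struct


def decode_balance(raw: bytes) -> float:
    """Decode the 2-byte big-endian signed 8.8 balance field."""
    if len(raw) != 2:
        raise ValueError("balance field must be exactly 2 bytes")
    (value,) = struct.unpack(">h", raw)   # signed 16-bit integer
    return value / 256.0                  # 8 fractional bits
```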

12.2.3 Sample entry

12.2.3.1 Definition

Audio tracks use AudioSampleEntry or AudioSampleEntryV1.

The samplerate, samplesize and channelcount fields document the default audio output playback format for this media. The timescale for an audio track should be chosen to match the sampling rate, or be an integer multiple of it, to enable sample-accurate timing. When channelcount is a value greater than zero, it indicates the intended number of loudspeaker channels in the audio stream. A channelcount of 1 indicates mono audio, and 2 indicates stereo (left/right). When values greater than 2 are used, the codec configuration should identify the channel assignment.

When it is desired to indicate an audio sampling rate greater than the value that can be represented in the samplerate field, the following may be used:

an AudioSampleEntryV1 is used, which requires that the enclosing SampleDescriptionBox also take the version 1;

a SamplingRate box may be present only in an AudioSampleEntryV1, and when present, it over-rides the samplerate field and documents the actual sampling rate;

when the SamplingRate box is present, the media timescale should be the same as the sampling rate, or an integer division or multiple of it;

the samplerate field in the sample entry should contain a value left-shifted 16 bits (as for AudioSampleEntry) that matches the media timescale, or be an integer division or multiple of it.

An AudioSampleEntryV1 should only be used when needed; otherwise, for maximum compatibility, an AudioSampleEntry should be used. An AudioSampleEntryV1 must not occur in a SampleDescriptionBox with version set to 0.

The audio output format (samplerate, samplesize and channelcount fields) in the sample entry should be considered definitive only for codecs that do not record their own output configuration. If the audio codec has definitive information about the output format, it shall be taken as definitive; in this case the samplerate, samplesize and channelcount fields in the sample entry may be ignored, though sensible values should be chosen (for example, the highest possible sampling rate).


12.2.3.2 Syntax

// Audio Sequences

class AudioSampleEntry(codingname) extends SampleEntry (codingname){
   const unsigned int(32)[2] reserved = 0;
   template unsigned int(16) channelcount = 2;
   template unsigned int(16) samplesize = 16;
   unsigned int(16) pre_defined = 0;
   const unsigned int(16) reserved = 0 ;
   template unsigned int(32) samplerate = { default samplerate of media}<<16;
   ChannelLayout();
   // we permit any number of DownMix or DRC boxes:
   DownMixInstructions() [];
   DRCCoefficientsBasic() [];
   DRCInstructionsBasic() [];
   DRCCoefficientsUniDRC() [];
   DRCInstructionsUniDRC() [];
   Box ();   // further boxes as needed
}

aligned(8) class SamplingRateBox extends FullBox('srat') {
   unsigned int(32) sampling_rate;
}

class AudioSampleEntryV1(codingname) extends SampleEntry (codingname){
   unsigned int(16) entry_version;   // must be 1,
                                     // and must be in an stsd with version ==1
   const unsigned int(16)[3] reserved = 0;
   template unsigned int(16) channelcount;   // must be correct
   template unsigned int(16) samplesize = 16;
   unsigned int(16) pre_defined = 0;
   const unsigned int(16) reserved = 0 ;
   template unsigned int(32) samplerate = 1<<16;
   // optional boxes follow
   SamplingRateBox();
   ChannelLayout();
   // we permit any number of DownMix or DRC boxes:
   DownMixInstructions() [];
   DRCCoefficientsBasic() [];
   DRCInstructionsBasic() [];
   DRCCoefficientsUniDRC() [];
   DRCInstructionsUniDRC() [];
   Box ();   // further boxes as needed
}

12.2.3.3 Semantics

ChannelCount is the number of channels such as 1 (mono) or 2 (stereo)
SampleSize is in bits, and takes the default value of 16
SampleRate when a SamplingRateBox is absent is the sampling rate; when a SamplingRateBox is present, is a suitable integer multiple or division of the actual sampling rate. This 32-bit field is expressed as a 16.16 fixed-point number (hi.lo)
sampling_rate is the actual sampling rate of the audio media, expressed as a 32-bit integer
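The rule for choosing between the 16.16 samplerate field and the SamplingRateBox can be sketched as below. Because the integer part of a 16.16 number is only 16 bits wide, rates above 65535 Hz (for example 96000 Hz) cannot be carried in samplerate directly, which is the case the SamplingRateBox addresses. The function and parameter names are hypothetical.

```python
from typing import Optional


def effective_sampling_rate(samplerate_field: int,
                            srat_sampling_rate: Optional[int] = None) -> float:
    """Return the sampling rate in Hz that a reader should use.

    samplerate_field is the raw 32-bit 16.16 value from the sample entry;
    srat_sampling_rate is the value from a SamplingRateBox, if one is present.
    """
    if srat_sampling_rate is not None:
        # The SamplingRateBox over-rides the samplerate field.
        return float(srat_sampling_rate)
    return samplerate_field / 65536.0
```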


12.2.4 Channel layout

12.2.4.1 Definition

Box Types: ‘chnl’
Container: Audio sample entry
Mandatory: No
Quantity:  Zero or one

This box may appear in an audio sample entry to document the assignment of channels in the audio stream.

The channelcount field in the AudioSampleEntry must be correct; an AudioSampleEntryV1 is therefore required to signal values other than 2. The channel layout can be all or part of a standard layout (from an enumerated list), or a custom layout (which also allows a track to contribute part of an overall layout).

A stream may contain channels, objects, neither, or both. A stream that is neither channel nor object structured can implicitly be rendered in a variety of ways.

12.2.4.2 Syntax

aligned(8) class ChannelLayout extends FullBox('chnl') {
   unsigned int(8) stream_structure;
   if (stream_structure & channelStructured) {   // 1
      unsigned int(8) definedLayout;
      if (definedLayout==0) {
         for (i = 1 ; i <= channelCount ; i++) {
            // channelCount comes from the sample entry
            unsigned int(8) speaker_position;
            if (speaker_position == 126) {   // explicit position
               signed int (16) azimuth;
               signed int (8) elevation;
            }
         }
      } else {
         unsigned int(64) omittedChannelsMap;
            // a '1' bit indicates 'not in this track'
      }
   }
   if (stream_structure & objectStructured) {   // 2
      unsigned int(8) object_count;
   }
}

12.2.4.3 Semantics

stream_structure is a field of flags that define whether the stream has channel or object structure (or both, or neither); the following flags are defined, all other values are reserved:
   1 the stream carries channels
   2 the stream carries objects
definedLayout is a ChannelConfiguration from ISO/IEC 23001-8;
speaker_position is an OutputChannelPosition from ISO/IEC 23001-8. If an explicit position is used, then the azimuth and elevation are as defined for speakers in ISO/IEC 23001-8.
azimuth is a signed value in degrees, as defined for LoudspeakerAzimuth in ISO/IEC 23001-8
elevation is a signed value, in degrees, as defined for LoudspeakerElevation in ISO/IEC 23001-8


omittedChannelsMap is a bit-map of omitted channels; the bits in the channel map are numbered from least-significant to most-significant, and correspond in that ordering with the order of the channels for the configuration as documented in ISO/IEC 23001-8 ChannelConfiguration. 1-bits in the channel map mean that a channel is absent. A zero value of the map therefore always means that the given standard layout is fully present.
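Reading the bit-map can be sketched as follows. The number of channels in the chosen standard layout is assumed to be known to the caller (it comes from the ChannelConfiguration definition, which is outside this document); the function name is hypothetical.

```python
from typing import List


def present_channels(omitted_channels_map: int, layout_channel_count: int) -> List[int]:
    """Return zero-based positions, in layout order, of channels in this track."""
    if not 0 <= layout_channel_count <= 64:
        raise ValueError("the map can describe at most 64 channels")
    return [
        position
        for position in range(layout_channel_count)
        # Bit 0 (least significant) corresponds to the first channel; 1 = absent.
        if not (omitted_channels_map >> position) & 1
    ]
```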

12.2.5 Downmix Instructions

12.2.5.1 Definition

Box Types: ‘dmix’
Container: Audio sample entry
Mandatory: No
Quantity:  Zero or more

The downmix can be controlled by the production facility if necessary. For instance, some content may require more attenuation of the surround channels before downmixing to maintain intelligibility.

The downmix support is designed so that any downmix (e.g. from 7.1 to quad as well as to stereo) can be described.

It is possible to declare the loudness characteristics of the signal after downmix, and after DRC and downmix.

If targetChannelCount*baseChannelCount is odd, the box is padded with 4 bits set to 0xF. The targetChannelCount must be consistent with the targetLayout (if given), and must be less than or equal to the channelcount.

Each downmix is uniquely identified by an ID.

12.2.5.2 Syntax

aligned(8) class DownMixInstructions extends FullBox('dmix') {
   unsigned int(8) targetLayout;
   unsigned int(1) reserved = 0;
   unsigned int(7) targetChannelCount;
   bit(1) in_stream;
   unsigned int(7) downmix_ID;
   if (in_stream==0) {
      // downmix coefficients are out of stream and supplied here
      int i, j;
      for (i = 1 ; i <= targetChannelCount; i++){
         for (j=1; j <= baseChannelCount; j++) {
            bit(4) bs_downmix_coefficient;
         }
      }
   }
}

12.2.5.3 Semantics

targetLayout is a ChannelConfiguration from ISO/IEC 23001-8 and defines the resulting layout after downmix

targetChannelCount is the count of channels in the resulting stream, and must correspond with the target layout


downmix_ID is an arbitrary value that identifies this downmix, and must be unique among the DownMixInstructions in a given sample entry; there are two reserved values, 0 and 0x7F, which must not be used

in_stream has a value of 1 when the downmix coefficients are in the stream. Otherwise, it is zero.

bs_downmix_coefficient is encoded as defined in the following tables:

Value        Hex Encoding (4 bits)
 0.00 dB     0x0
-0.50 dB     0x1
-1.00 dB     0x2
-1.50 dB     0x3
-2.00 dB     0x4
-2.50 dB     0x5
-3.00 dB     0x6
-3.50 dB     0x7
-4.00 dB     0x8
-4.50 dB     0x9
-5.00 dB     0xA
-5.50 dB     0xB
-6.00 dB     0xC
-7.50 dB     0xD
-9.00 dB     0xE
-∞ dB        0xF

Table 5: Downmix Coefficient Encoding for non-LFE channels

Value        Hex Encoding (4 bits)
 10.00 dB    0x0
  6.00 dB    0x1
  4.5 dB     0x2
  3.00 dB    0x3
  1.50 dB    0x4
  0.00 dB    0x5
 -1.50 dB    0x6
 -3.00 dB    0x7
 -4.50 dB    0x8
 -6.00 dB    0x9
-10.00 dB    0xA
-15.00 dB    0xB
-20.00 dB    0xC
-30.00 dB    0xD
-40.00 dB    0xE
-∞ dB        0xF

Table 6: Downmix Coefficient Encoding for LFE channel
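The two tables, together with the 4-bit packing and the padding rule in 12.2.5.1, can be exercised with the sketch below. Converting dB to a linear factor with 10^(dB/20) is an assumption of this example (an amplitude convention), as are the helper names; deciding which base channel is the LFE is left to the caller.

```python
import math
from typing import List

NON_LFE_DB = [0.0, -0.5, -1.0, -1.5, -2.0, -2.5, -3.0, -3.5,
              -4.0, -4.5, -5.0, -5.5, -6.0, -7.5, -9.0, -math.inf]
LFE_DB = [10.0, 6.0, 4.5, 3.0, 1.5, 0.0, -1.5, -3.0,
          -4.5, -6.0, -10.0, -15.0, -20.0, -30.0, -40.0, -math.inf]


def coefficient_gain(code: int, is_lfe: bool = False) -> float:
    """Map a 4-bit code to a linear gain (assumed amplitude convention)."""
    db = (LFE_DB if is_lfe else NON_LFE_DB)[code & 0xF]
    return 0.0 if db == -math.inf else 10.0 ** (db / 20.0)


def unpack_coefficients(data: bytes, target_count: int,
                        base_count: int) -> List[List[int]]:
    """Split packed nibbles into target_count rows of base_count codes."""
    total = target_count * base_count
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0xF))   # high nibble first
    if len(nibbles) < total:
        raise ValueError("not enough coefficient data")
    flat = nibbles[:total]   # drops the 0xF padding nibble when total is odd
    return [flat[row * base_count:(row + 1) * base_count]
            for row in range(target_count)]
```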


12.2.6 DRC Information

A DRC is used in the encoder to generate gain values using one of the pre-defined DRC characteristics as defined in ISO/IEC 23001-8; the coefficients are placed either in-stream or in an associated meta-data track.

For some content, such as some multi-channel content, it may be advantageous to use different DRC characteristics in different channels. For instance, if speech is exclusively present in the center channel, this feature can be very useful. It is supported by the assignment of DRC characteristics to audio channels.

It is possible to declare the loudness characteristics of the signal after DRC.

DRC support includes supporting in-stream DRC coefficients, and a separate track carrying them; the latter is particularly useful for legacy coding systems (including uncompressed audio) that have no provision for in-stream coefficients.

In the ISO base media file format, the audio content may be carried in multiple tracks where a base track contains the DRC metadata for all tracks. The additional tracks are referenced by the base track using a track reference of type ‘adda’ (additional audio). The channels processed by the DRC are all the channels in the base track, plus all the channels in track(s) referenced, in the order of the references. The DRC channel groups apply to all those channels (even if they are channels in a track that is disabled or not currently being played).

The boxes DRCCoefficientsBasic, DRCCoefficientsUniDRC, DRCInstructionsBasic, and DRCInstructionsUniDRC may occur in an AudioSampleEntry and are defined in ISO/IEC 23003-4.

12.2.7 Audio stream loudness

12.2.7.1 Introduction

Box Types: ‘ludt’
Container: Track user-data (‘udta’)
Mandatory: No
Quantity:  Zero or more

Loudness declarations are placed in user-data boxes, to enable their presence and update in movie fragments. In particular, in live scenarios, user-data in the initial movie atom may be a ‘promise not to exceed’ or ‘best guess’, and then user-data updates give better (but still generally valid) values. Thus, for example, a loudness range in this user data that is associated with a particular set of DRC instructions constitutes a ‘promise’ rather than a measurement, under these circumstances.

Several metadata values are available that describe aspects of the dynamic range. The size of the dynamic range can be useful in adjusting the DRC characteristic, e.g. the DRC is less aggressive if the dynamic range is small or the DRC can even be turned off.

True Peak and maximum loudness values can be useful for estimating the headroom, for instance when loudness normalization results in a positive gain [dB] or when headroom is needed to avoid clipping of the downmix. The DRC characteristic can then be adjusted to approach a headroom target. The peak level of the associated content is represented here in a coding-independent way.

The audio sound pressure level that the content was mixed to can also be documented. (If audio is listened to at a level other than the mixing level, this can affect the perceived tonal balance.)

The following measures may also be used:

Maximum of the Loudness Range derived from EBU-Tech 3342

Maximum Momentary Loudness derived from ITU-R BS.1771-1 or EBU-Tech 3341

Maximum Short-Term Loudness derived from ITU-R BS.1771-1 or EBU-Tech 3341

Short-Term Loudness defined in ITU-R BS.1771-1 or EBU-Tech 3341

Under some circumstances it can be desirable to indicate the loudness characteristics of an album, in each song that the album contains. A separate box can be specified for that purpose. The TrackLoudnessInfo and AlbumLoudnessInfo provide loudness information for the song, and for the entire album which contains the song, respectively.

The program loudness is measured using ITU-R BS.1770-3 over the associated content; the ‘anchor loudness’ is the loudness of the anchor content, where what that content is, is determined by the content author; one suitable value (especially for content for which the main content is speech) is ‘dialog normal level’ or DialNorm as defined in ATSC Doc. A/52:2012. ISO/IEC 23003-4 specifies the measurement systems, measurement methods and the coding of all loudness and peak-related values.

12.2.7.2 Syntax

aligned(8) class LoudnessBaseBox extends FullBox(loudnessType) {
   unsigned int(3) reserved = 0;
   unsigned int(7) downmix_ID;   // matching downmix
   unsigned int(6) DRC_set_ID;   // to match a DRC box
   signed int(12)  bs_sample_peak_level;
   signed int(12)  bs_true_peak_level;
   unsigned int(4) measurement_system_for_TP;
   unsigned int(4) reliability_for_TP;
   unsigned int(8) measurement_count;
   int i;
   for (i = 1 ; i <= measurement_count; i++){
      unsigned int(8) method_definition;
      unsigned int(8) method_value;
      unsigned int(4) measurement_system;
      unsigned int(4) reliability;
   }
}

aligned(8) class TrackLoudnessInfo extends LoudnessBaseBox(‘tlou’) { }

aligned(8) class AlbumLoudnessInfo extends LoudnessBaseBox (‘alou’) { }

aligned(8) class LoudnessBox extends Box(‘ludt’) {
   loudness      TrackLoudnessInfo[];   // a set of one or more loudness boxes
   albumLoudness AlbumLoudnessInfo[];   // if applicable
}


12.2.7.3 Semantics

downmix_ID when zero, declares the loudness characteristics of the layout without downmix. If non-zero, this box declares the loudness after applying the downmix with the matching downmix_ID, and must match a value in exactly one box in the sample entry of this track

DRC_set_ID when zero, declares the characteristics without applying a DRC. If non-zero, this box declares the loudness after applying the DRC with the matching DRC_set_ID, and must match a value in exactly one box in the sample entry of this track

bs_sample_peak_level takes a value for the sample peak level as defined in ISO/IEC 23003-4; all other values are reserved

bs_true_peak_level takes a value for the true peak level as defined in ISO/IEC 23003-4; all other values are reserved

measurement_system_for_TP takes an index for the measurement system as defined in ISO/IEC 23003-4; all other values are reserved

method_definition takes an index for the measurement method as defined in ISO/IEC 23003-4; all others are reserved

measurement_system takes an index for the measurement system as defined in ISO/IEC 23003-4; all others are reserved

reliability and reliability_for_TP each take one of the following values (all other values are reserved):
   0: Reliability is unknown
   1: Value is reported/imported but unverified
   2: Value is a ‘not to exceed’ ceiling
   3: Value is measured and accurate
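The bit layout of LoudnessBaseBox can be illustrated with a short decoder. This is a minimal sketch, not a reference implementation: it assumes the caller has already stripped the box header and the FullBox version/flags word, the function names are hypothetical, and the values are returned raw because their interpretation is defined in ISO/IEC 23003-4.

```python
import struct


def _sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def parse_loudness_base(payload: bytes) -> dict:
    """Decode the LoudnessBaseBox fields that follow the FullBox header."""
    if len(payload) < 7:
        raise ValueError("payload too short for LoudnessBaseBox")
    # 16 bits: reserved(3) | downmix_ID(7) | DRC_set_ID(6)
    (head,) = struct.unpack_from(">H", payload, 0)
    # 24 bits: bs_sample_peak_level(12) | bs_true_peak_level(12), both signed
    peaks = int.from_bytes(payload[2:5], "big")
    count = payload[6]
    if len(payload) < 7 + 3 * count:
        raise ValueError("payload truncated inside the measurement loop")
    measurements = []
    for i in range(count):
        # Each loop entry is 3 bytes: 8 + 8 + 4 + 4 bits.
        method_definition, method_value, packed = payload[7 + 3 * i : 10 + 3 * i]
        measurements.append({
            "method_definition": method_definition,
            "method_value": method_value,
            "measurement_system": packed >> 4,
            "reliability": packed & 0x0F,
        })
    return {
        "downmix_ID": (head >> 6) & 0x7F,
        "DRC_set_ID": head & 0x3F,
        "bs_sample_peak_level": _sign_extend(peaks >> 12, 12),
        "bs_true_peak_level": _sign_extend(peaks & 0xFFF, 12),
        "measurement_system_for_TP": payload[5] >> 4,
        "reliability_for_TP": payload[5] & 0x0F,
        "measurements": measurements,
    }
```

The fields are packed without padding, so the fixed part occupies exactly seven bytes and each measurement three more.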

12.3 Metadata media

12.3.1 Media handler

Timed metadata media uses the ‘meta’ handler type in the handler box of the media box, as defined in 8.4.3.

NOTE MPEG-7 streams, which are a specific kind of metadata stream, have their own handler declared, documented in the MP4 file format [ISO/IEC 14496-14].

NOTE Metadata tracks are linked to the track they describe using a track-reference of type ‘cdsc’.

12.3.2 Media header

Metadata tracks use a null media header (‘nmhd’), as defined in subclause 8.4.5.2.

12.3.3 Sample entry

12.3.3.1 Definition

Timed metadata tracks use MetaDataSampleEntry.

An optional BitRateBox may be present at the end of any MetaDataSampleEntry to signal the bit rate information of a stream. This can be used for buffer configuration. In case of XML metadata it can be used to choose the appropriate memory representation format (DOM, STX).

An optional bit rate box may be used in the URIMetaSampleEntry, as usual.


The URIMetaSampleEntry entry contains, in a box, the URI defining the form of the metadata, and optional initialization data. The format of both the samples and of the initialization data is defined by all or part of the URI form.

It may be the case that the URI identifies a format of metadata that allows there to be more than one ‘stated fact’ within each sample. However, all metadata samples in this format are effectively ‘I frames’, defining the entire set of metadata for the time interval they cover. This means that the complete set of metadata at any instant, for a given track, is contained in (a) the time-aligned samples of the track(s) (if any) describing that track, plus (b) the track metadata (if any), the movie metadata (if any) and the file metadata (if any).

If incrementally-changed metadata is needed, the MPEG-7 framework provides that capability.

Information on URI forms for some metadata systems can be found in Annex G.

12.3.3.2 Syntax

class MetaDataSampleEntry(codingname) extends SampleEntry (codingname) {
   Box[] other_boxes;   // optional
}

class XMLMetaDataSampleEntry() extends MetaDataSampleEntry (‘metx’) {
   string content_encoding;   // optional
   string namespace;
   string schema_location;    // optional
   BitRateBox ();             // optional
}

class TextConfigBox() extends FullBox (‘txtC’, 0, 0) {
   string text_config;
}

class TextMetaDataSampleEntry() extends MetaDataSampleEntry (‘mett’) {
   string content_encoding;   // optional
   string mime_format;
   BitRateBox ();             // optional
   TextConfigBox ();          // optional
}

aligned(8) class URIBox extends FullBox(‘uri ’, version = 0, 0) {
   string theURI;
}

aligned(8) class URIInitBox extends FullBox(‘uriI’, version = 0, 0) {
   unsigned int(8) uri_initialization_data[];
}

class URIMetaSampleEntry() extends MetaDataSampleEntry (‘urim’) {
   URIBox the_label;
   URIInitBox init;   // optional
   BitRateBox ();     // optional
}

12.3.3.3 Semantics

content_encoding is a null-terminated string in UTF-8 characters, and provides a MIME type which identifies the content encoding of the timed metadata. It is defined in the same way as for an ItemInfoEntry in this specification. If not present (an empty string is supplied) the timed metadata is not encoded. An example for this field is ‘application/zip’. Note that no MIME types for BiM [ISO/IEC 23001-1] and TeM [ISO/IEC 15938-1] currently exist. Thus the experimental MIME types ‘application/x-BiM’ and ‘text/x-TeM’ shall be used to identify these encoding mechanisms.

namespace is a null-terminated field consisting of a space-separated list, in UTF-8 characters, of one or more XML namespaces to which the sample documents conform. When used for metadata, this is needed for identifying its type, e.g. gBSD or AQoS [MPEG-21-7], and for decoding using XML-aware encoding mechanisms such as BiM.

schema_location is an optional null-terminated field consisting of a space-separated list, in UTF-8 characters, of zero or more URLs for XML schema(s) to which the sample document conforms. If there is one namespace and one schema, then this field shall be the URL of the one schema. If there is more than one namespace, then the syntax of this field shall adhere to that for the xsi:schemaLocation attribute as defined by [XML]. When used for metadata, this is needed for decoding of the timed metadata by XML-aware encoding mechanisms such as BiM.

mime_format provides a MIME type, in null-terminated UTF-8 characters, which identifies the content format of the samples. Examples for this field include ‘text/html’ and ‘text/plain’.

text_config provides the initial text of each document, in null-terminated UTF-8 characters, which is prepended before the contents of each sync sample.

theURI is a URI formatted according to the rules in 6.2.4;

uri_initialization_data is opaque data whose form is defined in the documentation of the URI form.
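The string fields of XMLMetaDataSampleEntry are consecutive null-terminated UTF-8 strings, which a reader might split as sketched below. This is an illustrative sketch only: the function names are hypothetical, it assumes the caller has already skipped the box header and the common SampleEntry fields, and it assumes an omitted optional string is stored as an empty string (a single null byte), following the wording for content_encoding.

```python
def read_cstring(buf: bytes, pos: int):
    """Return (decoded string, position after its terminating null)."""
    end = buf.index(b"\x00", pos)  # raises ValueError if no terminator exists
    return buf[pos:end].decode("utf-8"), end + 1


def parse_metx_strings(body: bytes) -> dict:
    """Split the three string fields of a 'metx' entry from the data after them."""
    content_encoding, pos = read_cstring(body, 0)
    namespace, pos = read_cstring(body, pos)
    schema_location, pos = read_cstring(body, pos)
    return {
        # An empty string means the samples are not encoded.
        "content_encoding": content_encoding or None,
        # Both lists are space-separated.
        "namespaces": namespace.split(),
        "schema_locations": schema_location.split(),
        # Anything left over would be optional child boxes such as BitRateBox.
        "remaining": body[pos:],
    }
```

The same helper could be reused for the mime_format and text_config fields of the text-based entries, since they use the same string encoding.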

12.4 Hint media

12.4.1 Media handler

Hint media uses the ‘hint’ handler type in the handler box of the media box, as defined in 8.4.3.

12.4.2 Hint media header

12.4.2.1 Hint Media Header Box

Box Types: ‘hmhd’
Container: Media Information Box (‘minf’)
Mandatory: Yes
Quantity: Exactly one specific media header shall be present

Hint tracks use the HintMediaHeaderbox in the media information box, as defined in 8.4.5. The hint media header contains general information, independent of the protocol, for hint tracks. (A PDU is a Protocol Data Unit.)

12.4.2.2 Syntax

aligned(8) class HintMediaHeaderBox extends FullBox(‘hmhd’, version = 0, 0) {
   unsigned int(16) maxPDUsize;
   unsigned int(16) avgPDUsize;
   unsigned int(32) maxbitrate;
   unsigned int(32) avgbitrate;
   unsigned int(32) reserved = 0;
}

12.4.2.3 Semantics

version is an integer that specifies the version of this box

maxPDUsize gives the size in bytes of the largest PDU in this (hint) stream

avgPDUsize gives the average size of a PDU over the entire presentation

maxbitrate gives the maximum rate in bits/second over any window of one second

avgbitrate gives the average rate in bits/second over the entire presentation
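Because every field of the hint media header is a fixed-width big-endian integer, the payload can be read with a single unpack call. The sketch below assumes the box header and the FullBox version/flags word have already been consumed; the function name is hypothetical.

```python
import struct


def parse_hmhd(payload: bytes) -> dict:
    """Decode the 16 bytes that follow the FullBox header of an 'hmhd' box."""
    if len(payload) < 16:
        raise ValueError("payload too short for HintMediaHeaderBox")
    # Two 16-bit sizes, two 32-bit rates, one 32-bit reserved field.
    max_pdu, avg_pdu, max_rate, avg_rate, _reserved = struct.unpack(
        ">HHIII", payload[:16]
    )
    return {
        "maxPDUsize": max_pdu,    # bytes
        "avgPDUsize": avg_pdu,    # bytes
        "maxbitrate": max_rate,   # bits/second over any one-second window
        "avgbitrate": avg_rate,   # bits/second over the whole presentation
    }
```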

12.4.3 Sample entry

12.4.3.1 Definition

Hint tracks use an entry format specific to their protocol, with an appropriate name.

For hint tracks, the sample description contains appropriate declarative data for the streaming protocol being used, and the format of the hint track. The definition of the sample description is specific to the protocol.

The ‘protocol’ and ‘codingname’ fields are registered identifiers that uniquely identify the streaming protocol or compression format decoder to be used. A given protocol or codingname may have optional or required extensions to the sample description (e.g. codec initialization parameters). All such extensions shall be within boxes; these boxes occur after the required fields. Unrecognized boxes shall be ignored.

12.4.3.2 Syntax

class HintSampleEntry() extends SampleEntry (protocol) {
   unsigned int(8) data [];
}

12.5 Text media

12.5.1 Media handler

The timed text media type indicates that the associated decoder will process only text data. Timed text media uses the ‘text’ handler type in the handler box of the media box, as defined in 8.4.3.

12.5.2 Media header

Timed text tracks use a null media header (‘nmhd’), as defined in subclause 8.4.5.2.

12.5.3 Sample entry

12.5.3.1 Definition

Timed text tracks use PlainTextSampleEntry.

12.5.3.2 Syntax

class PlainTextSampleEntry(codingname) extends SampleEntry (codingname) { }

class SimpleTextSampleEntry(codingname) extends PlainTextSampleEntry (‘stxt’) {
   string content_encoding;   // optional
   string mime_format;
   BitRateBox ();             // optional
   TextConfigBox ();          // optional
}

12.5.3.3 Semantics

content_encoding is a null-terminated string in UTF-8 characters, and provides a MIME type which identifies the content encoding of the timed text. It is defined in the same way as for an ItemInfoEntry in this specification. If not present (an empty string is supplied) the timed text is not encoded. An example for this field is ‘application/zip’.

mime_format provides a MIME type, in null-terminated UTF-8 characters, which identifies the content format of the samples. Examples for this field include ‘text/html’ and ‘text/plain’.

12.6 Subtitle media

12.6.1 Media handler

The subtitle media type indicates that the associated decoder will process text data and possibly images. Subtitle media uses the ‘subt’ handler type in the handler box of the media box, as defined in 8.4.3.

12.6.2 Subtitle media header

12.6.2.1 Definition

Subtitle tracks use the SubtitleMediaHeaderbox in the media information box, as defined in 8.4.5. The subtitle media header contains general presentation information, independent of the coding, for subtitle media. This header is used for all tracks containing subtitles.

12.6.2.2 Syntax

aligned(8) class SubtitleMediaHeaderBox extends FullBox (‘sthd’, version = 0, flags = 0){ }

12.6.2.3 Semantics

version is an integer that specifies the version of this box.

flags is a 24-bit integer with flags (currently all zero).

12.6.3 Sample entry

12.6.3.1 Definition

Subtitle tracks use SubtitleSampleEntry.

12.6.3.2 Syntax

class SubtitleSampleEntry(codingname) extends SampleEntry (codingname) { }

class XMLSubtitleSampleEntry() extends SubtitleSampleEntry (‘stpp’) {
   string namespace;
   string schema_location;        // optional
   string auxiliary_mime_types;   // optional, required if auxiliary resources are present
   BitRateBox ();                 // optional
}

class TextSubtitleSampleEntry() extends SubtitleSampleEntry (‘sbtt’) {
   string content_encoding;   // optional
   string mime_format;
   BitRateBox ();             // optional
   TextConfigBox ();          // optional
}


12.6.3.3 Semantics

content_encoding is a null-terminated string in UTF-8 characters, and provides a MIME type which identifies the content encoding of the subtitles. It is defined in the same way as for an ItemInfoEntry in this specification. If not present (an empty string is supplied) the subtitle samples are not encoded. An example for this field is ‘application/zip’.

namespace is a null-terminated field consisting of a space-separated list, in UTF-8 characters, of one or more XML namespaces to which the sample documents conform. When used for metadata, this is needed for identifying its type, e.g. gBSD or AQoS [MPEG-21-7], and for decoding using XML-aware encoding mechanisms such as BiM.

schema_location is an optional null-terminated field consisting of a space-separated list, in UTF-8 characters, of zero or more URLs for XML schema(s) to which the sample document conforms. If there is one namespace and one schema, then this field shall be the URL of the one schema. If there is more than one namespace, then the syntax of this field shall adhere to that for the xsi:schemaLocation attribute as defined by [XML]. When used for metadata, this is needed for decoding of the timed metadata by XML-aware encoding mechanisms such as BiM.

mime_format provides a MIME type, in null-terminated UTF-8 characters, which identifies the content format of the samples. Examples for this field include ‘text/html’ and ‘text/plain’.

auxiliary_mime_types indicates the media type of all auxiliary resources, such as images and fonts, if present, stored as subtitle subsamples. If there is more than one mime_type, then this field shall be a space-separated list. This field is null-terminated in UTF-8 characters.

12.7 Font media

12.7.1 Media handler

Font media uses the ‘fdsm’ handler type in the handler box of the media box, as defined in 8.4.3.

12.7.2 Media header

Font tracks use a Null Media Header.

12.7.3 Sample entry

12.7.3.1 Definition

Font streams use a FontSampleEntry.

12.7.3.2 Syntax

class FontSampleEntry(codingname) extends SampleEntry (codingname){
   //other boxes from derived specifications
   BitRateBox ();   // optional
}

12.8 Transformed media

Protected media is described in 8.12.

Incomplete media is described in 8.17.

Restricted media is described in 8.15.


Annex A (informative)

Overview and Introduction

A.1 Section Overview

This section provides an introduction to the file format that may assist readers in understanding the overall concepts underlying the file format. It forms an informative annex to this specification.

A.2 Core Concepts

In the file format, the overall presentation is called a movie. It is logically divided into tracks; each track represents a timed sequence of media (frames of video, for example). Within each track, each timed unit is called a sample; this might be a frame of video or audio. Samples are implicitly numbered in sequence. Note that a frame of audio may decompress into a sequence of audio samples (in the sense this word is used in audio); in general, this specification uses the word sample to mean a timed frame or unit of data. Each track has one or more sample descriptions; each sample in the track is tied to a description by reference. The description defines how the sample may be decoded (e.g. it identifies the compression algorithm used).

Unlike many other multi-media file formats, this format, with its ancestors, separates several concepts that are often linked. Understanding this separation is key to understanding the file format. In particular:

- The physical structure of the file is not tied to the physical structures of the media itself. For example, many file formats ‘frame’ the media data, putting headers or other data immediately before or after each frame of video; this file format does not do this.
- Neither the physical structure of the file, nor the layout of the media, is tied to the time ordering of the media. Frames of video need not be laid down in the file in time order (though they may be).

This means that there are file structures that describe the placement and timing of the media; these file structures permit, but do not require, time-ordered files.

All the data within a conforming file is encapsulated in boxes (called atoms in predecessors of this file format). There is no data outside the box structure. All the metadata, including that defining the placement and timing of the media, is contained in structured boxes. This specification defines the boxes. The media data (frames of video, for example) is referred to by this metadata. The media data may be in the same file (contained in one or more boxes), or can be in other files; the metadata permits referring to other files by means of URLs. The placement of the media data within these secondary files is entirely described by the metadata in the primary file. They need not be formatted to this specification, though they may be; it is possible that there are no boxes, for example, in these secondary media files.


Tracks can be of various kinds. Three are important here. Video tracks contain samples that are visual; audio tracks contain audio media. Hint tracks are rather different; they contain instructions for a streaming server in how to form packets for a streaming protocol, from the media tracks in a file. Hint tracks can be ignored when a file is read for local playback; they are only relevant to streaming.

A.3 Physical structure of the media

The boxes that define the layout of the media data are found in the sample table. These include the data reference, the sample size table, the sample to chunk table, and the chunk offset table. Between them, these tables allow each sample in a track to be located and its size to be known.

The data references permit locating media within secondary media files. This allows a composition to be built from a ‘library’ of media in separate files, without actually copying the media into a single file. This greatly facilitates editing, for example.

The tables are compacted to save space. In addition, it is expected that the interleave will not be sample by sample, but that several samples for a single track will occur together, then a set of samples for another track, and so on. These sets of contiguous samples for one track are called chunks. Each chunk has an offset into its containing file (from the beginning of the file). Within the chunk, the samples are contiguously stored. Therefore, if a chunk contains two samples, the position of the second may be found by adding the size of the first to the offset for the chunk. The chunk offset table provides the offsets; the sample to chunk table provides the mapping from sample number to chunk number.
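The offset arithmetic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the normative algorithm: the tables are assumed to be already decoded into plain Python lists, the sample to chunk table is assumed to be a run-length list of (first_chunk, samples_per_chunk) pairs with 1-based chunk numbers, and the function name is hypothetical.

```python
def locate_sample(sample_number, stsc, chunk_offsets, sample_sizes):
    """Return (chunk number, file offset, size) for a 1-based sample number."""
    if sample_number < 1 or sample_number > len(sample_sizes):
        raise IndexError("no such sample")
    first_sample = 1   # number of the first sample in the current chunk
    run = 0            # index of the sample-to-chunk run covering the chunk
    for chunk in range(1, len(chunk_offsets) + 1):
        # Advance to the last run whose first_chunk is not past this chunk.
        while run + 1 < len(stsc) and stsc[run + 1][0] <= chunk:
            run += 1
        per_chunk = stsc[run][1]
        if sample_number < first_sample + per_chunk:
            # Samples are contiguous inside a chunk: add the sizes of the
            # samples that precede the wanted one within the same chunk.
            skipped = sum(sample_sizes[first_sample - 1 : sample_number - 1])
            return (chunk,
                    chunk_offsets[chunk - 1] + skipped,
                    sample_sizes[sample_number - 1])
        first_sample += per_chunk
    raise IndexError("sample is not covered by the chunk tables")
```

For instance, with two samples of sizes 10 and 20 in a chunk at offset 100, the second sample is found at offset 110, matching the two-sample case described in the text.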

Note that in between the chunks (but not within them) there may be ‘dead space’, un-referenced by the media data. Thus, during editing, if some media data is not needed, it can simply be left unreferenced; the data need not be copied to remove it. Likewise, if the media data is in a secondary file formatted to a ‘foreign’ file format, headers or other structures imposed by that foreign format can simply be skipped.

A.4 Temporal structure of the media

Timing in the file can be understood by means of a number of structures. The movie, and each track, has a timescale. This defines a time axis which has a number of ticks per second. By suitable choice of this number, exact timing can be achieved. Typically, this is the sampling rate of the audio, for an audio track. For video, a suitable scale should be chosen. For example, a media TimeScale of 30000 and media sample durations of 1001 exactly define NTSC video (often, but incorrectly, referred to as 29.97) and provide 19.9 hours of time in 32 bits.
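The NTSC figures can be checked with exact rational arithmetic. The sketch below is only a worked calculation; the observation that the 19.9 hour figure corresponds to a signed 32-bit tick count is an inference from the arithmetic, not a statement taken from the text.

```python
from fractions import Fraction

timescale = 30000        # ticks per second
sample_duration = 1001   # ticks per video frame

# Exact frame rate: 30000/1001, which is close to but not equal to 29.97.
frame_rate = Fraction(timescale, sample_duration)

# Longest representable time, in hours, for signed and unsigned 32-bit counts.
hours_signed_32 = Fraction(2**31 - 1, timescale) / 3600
hours_unsigned_32 = Fraction(2**32 - 1, timescale) / 3600

print(float(frame_rate))
print(round(float(hours_signed_32), 1))
print(round(float(hours_unsigned_32), 1))
```

Keeping the rate as a fraction avoids the rounding drift that accumulates when 29.97 is used as a decimal approximation.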

The time structure of a track may be affected by an edit list. These provide two key capabilities: the movement (and possible re-use) of portions of the time-line of a track, in the overall movie, and also the insertion of ‘blank’ time, known as empty edits. Note in particular that if a track does not start at the beginning of a presentation, an initial empty edit is needed.

The overall duration of each track is defined in headers; this provides a useful summary of the track. Each sample has a defined duration. The exact presentation time (its time-stamp) of a sample is defined by summing the durations of the preceding samples.
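The summation that yields each time-stamp can be written as a running total. This is a simplified sketch: it assumes the per-sample durations have already been expanded into a list, it ignores edit lists and any composition offsets, and the function name is hypothetical.

```python
def sample_timestamps(durations):
    """Return the time-stamp of each sample, in ticks of the track timescale."""
    timestamps = []
    elapsed = 0
    for duration in durations:
        # A sample starts when all preceding samples have finished.
        timestamps.append(elapsed)
        elapsed += duration
    return timestamps
```

Dividing a time-stamp by the timescale converts it to seconds; with a timescale of 30000 and durations of 1001, the third sample starts at tick 2002.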


A.5 Interleave

The temporal and physical structures of the file may be aligned. This means that the media data has its physical order within its container in time order, as used. In addition, if the media data for multiple tracks is contained in the same file, this media data would be interleaved. Typically, in order to simplify the reading of the media data for one track, and to keep the tables compact, this interleave is done at a suitable time interval (e.g. 1 second), rather than sample by sample. This keeps the number of chunks down, and thus the chunk offset table small.

A.6 Composition

If multiple audio tracks are contained in the same file, they are implicitly mixed for playback. This mixing is affected by the overall track volume, and the left/right balance.

Likewise, video tracks are composed, by following their layer number (from back to front), and their composition mode. In addition, each track may be transformed by means of a matrix, and also the overall movie transformed by matrix. This permits both simple operations (e.g. pixel doubling, correction of 90º rotation) as well as more complex operations (shearing, arbitrary rotation, for example).

Derived specifications may over-ride this default composition of audio and video with more powerful systems (e.g. MPEG-4 BIFS).

A.7 Random access

This section describes how to seek. Seeking is accomplished primarily by using the child boxes contained in the sample table box. If an edit list is present, it must also be consulted.

If you want to seek a given track to a time T, where T is in the time scale of the movie header box, you could perform the following operations:

1) If the track contains an edit list, determine which edit contains the time T by iterating over the edits. The start time of the edit in the movie time scale must then be subtracted from the time T to generate T', the duration into the edit in the movie time scale. T' is next converted to the time scale of the track's media to generate T''. Finally, the time in the media scale to use is calculated by adding the media start time of the edit to T''.

2) The time-to-sample box for a track indicates what times are associated with which sample for that track. Use this box to find the first sample prior to the given time.

3) The sample that was located in step 2 may not be a sync sample. The sync sample table indicates which samples are in fact random access points. Using this table, you can locate which is the first sync sample prior to the specified time. The absence of the sync sample table indicates that all samples are synchronization points, and makes this problem easy. Having consulted the sync sample table, you probably wish to seek to whichever resultant sample is closest to, but prior to, the sample found in step 2.

4) At this point you know the sample that will be used for random access. Use the sample-to-chunk table to determine in which chunk this sample is located.

5) Knowing which chunk contained the sample in question, use the chunk offset box to figure out where that chunk begins.

6) Starting from this offset, you can use the information contained in the sample-to-chunk box and the sample size box to figure out where within this chunk the sample in question is located. This is the desired information.
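The first three steps above, which turn a movie time into a sync sample number, can be sketched as follows. This is a minimal sketch under stated assumptions: the tables are assumed to be already decoded into Python lists, each edit is a (segment_duration, media_time) pair with a media_time of -1 marking an empty edit, the media rate of an edit is ignored, and all function names are hypothetical. Steps 4 to 6 are pure table lookups and are omitted here.

```python
import bisect


def movie_to_media_time(t, edits, movie_timescale, media_timescale):
    """Step 1: map movie time t through the edit list (None in an empty edit)."""
    edit_start = 0
    for segment_duration, media_time in edits:
        if edit_start <= t < edit_start + segment_duration:
            if media_time == -1:       # empty edit: no media is presented
                return None
            t_prime = t - edit_start                                 # T'
            t_dprime = t_prime * media_timescale // movie_timescale  # T''
            return media_time + t_dprime
        edit_start += segment_duration
    return None                        # t lies beyond the last edit


def sample_at(media_time, stts):
    """Step 2: 1-based number of the sample whose interval holds media_time."""
    sample, start = 1, 0
    for sample_count, sample_delta in stts:
        span = sample_count * sample_delta
        if sample_delta > 0 and media_time < start + span:
            return sample + (media_time - start) // sample_delta
        sample, start = sample + sample_count, start + span
    return sample - 1                  # past the end: clamp to the last sample


def sync_sample_before(sample, stss):
    """Step 3: nearest sync sample at or before `sample`.

    stss is a sorted list of sync sample numbers, or None when the sync
    sample table is absent and every sample is a sync sample.
    """
    if stss is None:
        return sample
    i = bisect.bisect_right(stss, sample)
    return stss[i - 1] if i else None
```

Integer division is used for the timescale conversion so that the result rounds down, which keeps the located sample at or before the requested time.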

A.8 Fragmented movie files

This section introduces a technique that may be used in ISO files, where the construction of a single Movie Box in a movie is burdensome. This can arise in at least the following cases:

- Recording. At the moment, if a recording application crashes, runs out of disk, or some other incident happens, after it has written a lot of media to disk but before it writes the Movie Box, the recorded data is unusable. This occurs because the file format insists that all metadata (the Movie Box) be written in one contiguous area of the file.
- Recording. On embedded devices, particularly still cameras, there is not the RAM to buffer a Movie Box for the size of the storage available, and re-computing it when the movie is closed is too slow. The same risk of crashing applies, as well.
- HTTP fast-start. If the movie is of reasonable size (in terms of the Movie Box, if not time), the Movie Box can take an uncomfortable period to download before fast-start happens.

The basic 'shape' of the movie is set in the initial Movie Box: the number of tracks, the available sample descriptions, width, height, composition, and so on. However, the Movie Box does not contain the information for the full duration of the movie; in particular, it may have few or no samples in its tracks.

To this minimal or empty movie, extra samples are added, in structures called movie fragments.

The basic design philosophy is the same as in the Movie Box; data is not 'framed'. However, the design is such that it can be treated as a 'framing' design if that is needed. The structures map readily to the Movie Box, so a fragmented presentation can be rewritten as a single Movie Box.

The approach is that defaults are set for each sample, both globally (once per track) and within each fragment. Only those fragments that have non-default values need include those values. This makes the common case (regular, repeating structures) compact, without disabling the incremental building of movies that have variations.

The regular Movie Box sets up the structure of the movie. It may occur anywhere in the file, though it is best for readers if it precedes the fragments. (This is not a rule, as trivial changes to the Movie Box that force it to the end of the file would then be impossible.) This Movie Box:

- must represent a valid movie in its own right (though the tracks may have no samples at all);
- has a box in it to indicate that fragments should be found and used;
- is used to contain the complete edit list (if any).

Note that software that doesn't understand fragments will play just this initial movie. Software that does understand fragments and gets a non-fragmented movie won't scan for fragments, as the fragment indication box won't be found.


Annex B (void)


Annex C (informative)

Guidelines on deriving from this specification

C.1 Introduction

This Annex provides informative text to explain how to derive a specific file format from the ISO Base Media File Format.

ISO/IEC 14496-12 | ISO/IEC 15444-12 ISO Base Media Format defines the basic structure of the file format. Media-specific and user-defined extensions can be provided in other specifications that are derived from the ISO Base Media File Format.

C.2 General Principles

C.2.1 General

A number of existing file formats use the ISO Base Media File Format, not least the MPEG-4 MP4 File Format (ISO/IEC 14496-14), and the Motion JPEG 2000 MJ2 File Format (ISO/IEC 15444-3). When considering a new specification derived from the ISO Base Media File format, all the existing specifications should be used both as examples and a source of definitions and technology. Check with the registration authority to find what might already exist, and what specifications exist.

In particular, if an existing specification already covers how a particular media type is stored in the file format (e.g. MPEG-4 video in MP4), that definition should be used and a new one should not be invented. In this way specifications which share technology will also share the definition of how that technology is represented.

Be as permissive as possible with respect to the presence of other information in the file; indicate that unrecognized boxes and media may be ignored (not “should be ignored”). This permits the creation of hybrid files, drawing from more than one specification, and the creation of multi-format players, capable of handling more than one specification.

When layering on this specification, it's worth observing that there are some characteristics that are intentionally ‘parameters’ to the lower (Part 12) specification, that need to be specified. Equally, there are some characteristics of the Part 12 file format specification that are internal and should rarely be discussed by other specifications. Of course, there are some characteristics in a grey area in between.

Derived specifications are ideally written solely in terms of the parameters of the Part 12 file format; what a sample is, what its timestamps mean, and so on. Mentioning specific existing boxes in a derived specification may often turn out to be an error, except in limited cases (e.g. adding a user-data box, or an extension box).


C.2.2 Base layer operations

It should be possible to perform some operations on a Part 12 file without knowing anything about any potential derived specifications. These operations might include the obvious reading tracks, finding the data and timing for samples, and their sample description and track type, and so on. This might be done, for example, by a file-format inspector or general library like the reference software.

Less obvious are a class of manipulations of the files:

a) re-interleaving the data; making the media data in time order, with the samples for various tracks grouped into chunks of a sensible size, with the chunks interleaved;
b) making files that use data references self-contained, by copying the data from external files into the new file;
c) removing free space atoms and compacting the atom structure;
d) removing data from ‘mdat’ atoms that appears to be un-referenced by tracks or meta-data atoms;
e) removing sample entries that have no associated samples;
f) removing sample groups that have no associated samples;
g) extracting some tracks and making a new file with just those (e.g. an audio track from an audio/video presentation);
h) inserting, or removing, movie fragments, or re-fragmenting a movie.

This list is not exhaustive, of course.

C.3 Boxes

You can add boxes to the file format, but be careful about how they interact with other boxes. In particular, if they ‘cross-link’ into existing boxes, you might not be able to mark such files as compliant with Part 12.

You must register all new boxes, except those using the ‘uuid’ type. Likewise, you should register codec (sample entry) names, brands, track reference types, handlers (media types), group types, and protection scheme types. It really is a bad idea to use one of these without registration, as collisions may occur, or someone else may register the same identifier with a different meaning.

You should not write a box using the ‘UUID escape’ (the reserved ISO UUID pattern 0xXXXXXXXX-0011-0010-8000-00AA00389B71, where the four-character code replaces the X’s) if a simple four-character-code can be used, and ideally you shouldn’t design to use a UUID box; it’s better to place your data in known ‘expansion points’ of the file format if at all possible, or register a new box type if really needed.
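The reserved pattern can be made concrete with a short sketch that builds the UUID for a four-character code and recognizes it again. It is an illustration only; the function names are hypothetical and the standard-library uuid module is used purely as a container for the 16 bytes.

```python
import uuid

# The twelve fixed bytes of the reserved ISO pattern that follow the code.
ISO_UUID_TAIL = bytes.fromhex("00110010800000AA00389B71")


def fourcc_to_uuid(fourcc: str) -> uuid.UUID:
    """Place a four-character code into the X positions of the ISO pattern."""
    code = fourcc.encode("ascii")
    if len(code) != 4:
        raise ValueError("a four-character code must be exactly 4 bytes")
    return uuid.UUID(bytes=code + ISO_UUID_TAIL)


def uuid_to_fourcc(value: uuid.UUID):
    """Return the four-character code if value uses the ISO pattern, else None."""
    raw = value.bytes
    if raw[4:] != ISO_UUID_TAIL:
        return None
    return raw[:4].decode("latin-1")
```

A reader could use the second function to spot boxes written with the escape and treat them like their plain four-character equivalents.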

Don’t forget that all data in ISO files must be, or be contained in, boxes. You can introduce a signature, but it must ‘look like’ a box.

Do not require that any existing or new boxes you define be in a particular position, if at all possible. For example, the existing JPEG 2000 specifications require a signature box and that it be first in the file. If another specification also defines a signature box and also requires that it be first, then a file conformant to both specifications cannot be constructed.

It must be possible to ‘walk’ the top-level of a file by finding box lengths. Don’t forget that ‘implied length’ is permitted at file level.
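Walking the top level can be sketched as a loop over box headers. This sketch assumes the box header layout of this file format (a 32-bit size followed by a four-character type, where a size of 1 means a 64-bit size follows and a size of 0 means the box runs to the end of the file, which is the ‘implied length’ case). The function name is hypothetical, and the extended type of ‘uuid’ boxes is left as part of the payload.

```python
import struct


def walk_top_level(data: bytes):
    """Yield (type, payload offset, payload size) for each top-level box."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit largesize follows the type field.
            if pos + 16 > len(data):
                raise ValueError("truncated 64-bit box header")
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # Implied length: the box extends to the end of the file.
            size = len(data) - pos
        if size < header or pos + size > len(data):
            raise ValueError("box length runs outside the file")
        yield box_type.decode("latin-1"), pos + header, size - header
        pos += size
```

Because only lengths are inspected, the walk works even when the reader does not recognize a box type, which is what allows unrecognized boxes to be skipped.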


Unless absolutely unavoidable, boxes should contain either data (e.g. in fields), or other boxes, but not both. All boxes containing data should be a full box to allow later changes to syntax and semantics. Boxes containing other boxes are known as container boxes, and are normally a plain (non-full) box, since their semantics will never change if they are documented to contain only boxes.

C.4 Brand Identifiers

C.4.1 Introduction

This section covers the use of brand identifiers in the file-type box, including:
- Introduction of a new brand.
- Player’s behaviour depending on the brand.
- Setting of the brand on the creation of the ISO Base Media file.

Brands identify a specification and make a simple set of statements:
a) the file conforms to all requirements of the identified specification;
b) the file contains nothing contrary to the identified specification;
c) a reader implementing potentially that single specification may read, interpret, and possibly present the file, ignoring data it does not recognize.

Specifications should therefore say (if they need a brand) “the brand that identifies files conformant to this specification is XXXX”, and register the brand.

C.4.2 Usage of the Brand

In order to identify the specifications with which the file complies, brands are used as identifiers in the file format. These brands are set in the File Type Box.
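For orientation, a reader might extract the brands from the File Type Box roughly as follows. This is a sketch only: it assumes the caller has already located the ‘ftyp’ box and passes its payload (the bytes after the box header), which consists of a four-byte major_brand, a 32-bit minor_version, and then compatible_brands filling the rest of the box. The function name is hypothetical.

```python
import struct


def parse_ftyp(payload: bytes):
    """Return (major_brand, minor_version, compatible_brands) from an ftyp payload."""
    if len(payload) < 8 or len(payload) % 4 != 0:
        raise ValueError("ftyp payload must be 8 bytes plus a whole number of brands")
    major, minor = struct.unpack_from(">4sI", payload, 0)
    compatible = [
        payload[i:i + 4].decode("latin-1") for i in range(8, len(payload), 4)
    ]
    return major.decode("latin-1"), minor, compatible
```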

For example, a brand might indicate: (1) the codecs that may be present in the file, (2) how the data of each codec is stored, (3) constraints and extensions that are applied to the file.

New brands may be registered if it is necessary to make a new specification that is not fully conformant to the existing standards. For example, 3GPP allows using AMR and H.263 in the file format. Since these codecs were not supported in any standards at that time, 3GPP specified the usage of the Sample Entry and template fields in the ISO Base Media Format as well as defining new boxes to which these codecs refer. As the file format comes to be used more widely, it is expected that more brands will be needed.

Brands are not additive; they stand alone. You cannot say: “this brand indicates that support for Y is also required” because the ‘also’ has no referent.

Systems that re-write files should remove brands that they do not recognize, as they do not know whether the file still conforms to that brand’s requirements (e.g. re-interleaving a file may take it out of conformance with a specification that requires a certain style of interleaving).

Note that the major brand usually implies the file extension, which in turn implies the MIME type. But these are not rules. In addition, when serving under a MIME type do not forget that MIME types can take parameters, and the list of compatible brands would often be useful to the receiving system.

C.4.3 Introduction of a new brand

A new brand can be defined if conformance to a new specification must be indicated. This generally means that for the definition of a new brand at least one of the following conditions should be satisfied:

1. Use of a codec that is not supported in any existing brands.

2. Use of more than one codec in a combination that is not supported in any existing brands. In addition, the playback of the file is allowed only when decoding of all the media in the file is supported by the player.

3. Use of constraints and/or extensions (Boxes, template fields, etc.) that are user-specific.

However, the file format contains both a major_brand field and a compatible_brands array. These fields are owned by the file author and the Part 12 specification. Do not write a specification that talks about these fields, merely about brands and what they mean. In particular, do not claim the major_brand field (“files conformant to this specification must set the major_brand to XXXX”) as a file could never be conformant to two such specifications written that way, and you also block someone else from deriving a specification from yours. However, brands that are only permitted as compatible brands may be defined.

Brands can be used as a tracer, however. It’s perfectly legal to have a brand which has no requirements, and is placed in a file as an ‘I was there’ point (or strictly “this brand requires that the file was last written by ZZZZ”).

C.4.4 Player Guideline

If more than one brand is present in the list of the compatible_brands, and one or more brands are supported by the player, the player shall play those aspects of the file that comply with those specifications. In this case, the player may not be able to decode unsupported media.

C.4.5 Authoring Guideline

If the author wants to create a file that complies with more than one specification, the following considerations apply:

1. There must be nothing contrary to the specification identified by a brand within the file. For example, if a specification requires that files be self-contained, then the brand indication of that specification must not be used on non-self-contained files.

2. If the author is satisfied that a player compliant with only one of the specifications plays only the media compliant with that specification, then that brand may be indicated.

3. If the author requires that the media from more than one specification be played, then a new brand would be needed as this represents a new conformance requirement for the player.

C.4.6 Example

In this section, we take an example case in which a new brand can be defined.

First of all, we explain the two currently existing brands. If the brand ‘3gp5’ is in the list of the compatible_brands, it indicates that the file contains the media defined in 3GPP TS 26.234 (Release 5) in the way specified by the standard. For example, a file of the ‘3gp5’ brand may contain H.263. Likewise, if the brand ‘mp42’ is in the list of the compatible_brands, it indicates that the file contains the media defined in ISO/IEC 14496-14 in the specified way. For example, a file of the ‘mp42’ brand may contain MP3. However, MP3 is not supported in the ‘3gp5’ brand.

Suppose that the file contains H.263 and MP3, and has ‘3gp5’ and ‘mp42’ as the compatible_brands. If the player complies only with ‘3gp5’ and does not support MP3, the recommended behaviour of the player is to play only H.263. If the content’s author does not want such behaviour, a new brand is defined to indicate that both H.263 and MP3 are supported in the file. Specifying the newly defined brand in the list of the compatible_brands prevents the above behaviour, and the file is played only when the player supports both H.263 and MP3.
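The player side of this example can be sketched as a simple intersection of brand lists. This is an illustration under assumptions, not a prescribed algorithm: the function name is hypothetical, ‘XXXX’ stands in for the newly defined brand, and the second case assumes the author lists only that new brand.

```python
def playable_brands(compatible_brands, supported_brands):
    """Return the brands in the file that this player implements, in file order."""
    return [brand for brand in compatible_brands if brand in supported_brands]


# A player that implements only '3gp5' still finds a brand it may play under,
# so it plays the H.263 it understands and ignores the MP3.
mixed_file = ["3gp5", "mp42"]

# A file labelled only with the new brand gives that player nothing to match,
# so it does not attempt partial playback.
strict_file = ["XXXX"]
```

With these inputs, `playable_brands(mixed_file, {"3gp5"})` yields one brand and `playable_brands(strict_file, {"3gp5"})` yields none.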

C.5 Storage of new media types

There are two choices in the definition of how a new media type should be stored.

First, if MPEG-4 systems constructs are desired or acceptable, then:
a) a new ObjectTypeIndication should be requested and used;
b) the decoder specific information for this codec should be defined as an MPEG-4 descriptor;
c) the access unit format should be defined for this media.

The media then uses the MPEG-4 code-points in the file format; for example, a new video codec would use a sample entry of type ‘mp4v’.

If the MPEG-4 systems layer is not suitable or otherwise not desired, then:
a) a new sample entry four-character code should be requested and used;
b) any additional information needed by the decoder should be defined as boxes to be stored within the sample entry;
c) the file-format sample format should be defined for this media.

Note that in the second case, the registration authority will also allocate an object type indication for use in MPEG-4 systems.

C.6 Use of Template fields

Template fields are defined in the file format. If any are used in a derived specification, the use must be compatible with the base definition, and that use must be explicitly documented.

C.7 Tracks

C.7.1 Data Location

A track is a timed sequence of samples; each sample is defined by its data (the bytes it contains), their length and location. The length and data of a sample are external parameters to the file format; the location of the bytes is not.

The exact way that the data is stored is internal to the Part 12 file format. When defining what a sample in your format is, you should define the length and the data of a sample.

You should not mention the following boxes, however, as the way that they are structured is open to change, and the information that they store may be stored in other ways (e.g. sample size information may be in an stsz box, an stz2 box, or a movie fragment):

sample size (stsz), compact sample size (stz2)

Samples are, in fact, stored in contiguous runs of samples for one track; these runs are called chunks, and it is chunks from different tracks that are interleaved. But files may be re-interleaved or re-chunked; the following boxes are about how chunking is done:

chunk offsets (stco or co64), sample-to-chunk (stsc)

Most critically, locating data in a Part 12 file must be done through these boxes (or their equivalent in movie fragments). The media data box (‘mdat’) is merely one possible location, and looked at by itself, it can only be considered an un-ordered bag of un-identifiable bits. There is no assurance that the desirable material in a media-data box is the only data in the box or in any particular order, and, especially if data references are used, there is no assurance that any particular sample is even in a media-data box at all. Mentioning the media-data (‘mdat’) box in a derived specification is almost certainly a mistake, and attempting to define (or assume) its structure is usurping the Part 12 specification, and is an error.
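To make the point concrete, here is a minimal sketch of how a reader (not a derived specification) resolves sample locations for a non-fragmented track from already-parsed tables. The inputs are assumptions of this example: `chunk_offsets` holds the values from stco or co64, `stsc_entries` holds (first_chunk, samples_per_chunk) pairs with 1-based chunk numbers, and `sample_sizes` holds one size per sample. The sample description index and data references are ignored for brevity, so the offsets are relative to whatever file the data reference names.

```python
def sample_offsets(chunk_offsets, stsc_entries, sample_sizes):
    """Return the byte offset of every sample, in sample order."""
    offsets = []
    sample = 0
    for i, (first_chunk, per_chunk) in enumerate(stsc_entries):
        # An stsc entry applies up to the chunk before the next entry's first_chunk.
        if i + 1 < len(stsc_entries):
            last_chunk = stsc_entries[i + 1][0] - 1
        else:
            last_chunk = len(chunk_offsets)
        for chunk in range(first_chunk, last_chunk + 1):
            pos = chunk_offsets[chunk - 1]
            for _ in range(per_chunk):
                if sample == len(sample_sizes):
                    return offsets
                offsets.append(pos)
                pos += sample_sizes[sample]
                sample += 1
    return offsets
```

Nothing in this calculation looks inside the ‘mdat’ box; the tables alone say where each sample is, which is why re-chunking a file changes the tables but not the samples.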

It is perfectly permissible to require a certain style, duration, or size of interleaving in an integration specification (“this specification requires that the file be self-contained, and that the media-data be in decoding time order, interleaved on a granularity of no greater than one second”).

C.7.2 Time

Similarly, samples are parameterized in time in the file format by their decoding timestamp, and optionally by their composition timestamp. You should define what these mean for your media. However, the way that these are stored is again internal to the Part 12 file format.

You should not mention the following boxes, however, as the way that they are structured is open to change, and the information that they store may be stored in other ways:

time-to-sample box (stts), composition offsets (ctts)
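For readers of files, the two timestamps can be recovered from these tables along the following lines. This is a sketch assuming already-parsed, run-length-coded entries: `stts_entries` as (sample_count, sample_delta) pairs and `ctts_entries` as (sample_count, sample_offset) pairs, all in media timescale units. The function names are hypothetical.

```python
def decode_times(stts_entries):
    """Expand time-to-sample runs into one decoding timestamp per sample."""
    times = []
    t = 0
    for count, delta in stts_entries:
        for _ in range(count):
            times.append(t)
            t += delta
    return times


def composition_times(stts_entries, ctts_entries=None):
    """Add composition offsets to decoding times; with no offsets the two coincide."""
    dts = decode_times(stts_entries)
    if not ctts_entries:
        return dts
    offsets = []
    for count, offset in ctts_entries:
        offsets.extend([offset] * count)
    return [d + o for d, o in zip(dts, offsets)]
```

A derived specification only needs to say what the decoding and composition times mean for its media; how they are run-length coded, as above, remains a matter for the file format.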

Likewise, the time-structure effect of edits should be preserved by the file format, but a Part 12 file simplifier may, for example, merge two adjacent edits that in fact belong together (e.g. two empty edits, or an edit that selects time A-B followed by one that selects B-C).
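A simplifier of this kind could be sketched as follows. The `Edit` type and function names are hypothetical, and several assumptions are made: segment durations are in the movie timescale, media times are in the media timescale, a media time of -1 marks an empty edit, and only edits played at normal rate are merged.

```python
from typing import NamedTuple


class Edit(NamedTuple):
    segment_duration: int  # in movie timescale units
    media_time: int        # in media timescale units; -1 marks an empty edit
    media_rate: int = 1


def _mergeable(a, b, movie_timescale, media_timescale):
    if a.media_time == -1 and b.media_time == -1:
        return True  # two empty edits in a row
    if a.media_time == -1 or b.media_time == -1:
        return False
    if a.media_rate != 1 or b.media_rate != 1:
        return False
    # b must start exactly where a ends; compare without rounding by cross-multiplying.
    return (b.media_time - a.media_time) * movie_timescale == a.segment_duration * media_timescale


def merge_adjacent_edits(edits, movie_timescale, media_timescale):
    """Merge neighbouring edits that together select one contiguous span."""
    merged = []
    for edit in edits:
        if merged and _mergeable(merged[-1], edit, movie_timescale, media_timescale):
            prev = merged[-1]
            merged[-1] = prev._replace(
                segment_duration=prev.segment_duration + edit.segment_duration
            )
        else:
            merged.append(edit)
    return merged
```

The presented timeline is unchanged by the merge, which is the property the paragraph above asks a simplifier to preserve.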

C.7.3 Media Types

There are a number of media types in the Part 12 specification: video, audio, meta-data, and so on. These are represented by track handler types and by media-specific media headers. It is possible to register new media handlers, but this is rarely required. It might be needed, for example, if a track type were needed for, say, laboratory instrument traces, or for a ‘timed aroma’ track. The registration authority should also be checked; the needed handler might be already defined in another derived specification.

C.7.4 Coding Types

The name of a sample entry identifies the coding format used. This is one of the principal ways that the Part 12 specification is parameterized; AVC (MPEG-4 Part 10) uses ‘avc1’, for example, as a sample entry type. Defining this name for a codec, and registering it, and then defining what extra boxes are in a sample entry for this codec, are primary ways that the Part 12 format is used. You should define these for your coding system. Note that technically the coding type is ‘scoped’ by the media type (though we try not to define the same four-character code as two different codecs in two media types, such as video and audio, in order to avoid confusion).

C.7.5 Sub-sample information

The Part 12 specification can carry information about ‘sub-sample’ boundaries for each sample. However, the definition of what a sub-sample is, is specific to a coding system. You might wish to define it when defining how a coding system is stored.

C.7.6 Sample Dependency

The Part 12 format allows you to identify some of the decoding dependency information for a coding system. In particular, you should identify what constitutes a valid ‘sync’ or random access point (points from which decoding may be started). They can be marked in the file format (in the sync sample table, or by flags in movie fragments). How sync samples are marked should be of less concern.

Similarly, it is possible to indicate which samples:
a) depend on others, or can be decoded independently;
b) are depended on by others, or can be discarded without affecting decoding;
c) contain multiple encodings of the same information, possibly with different dependencies (are redundantly coded).

For most coding systems the meanings of these are self-evident and do not need spelling out; however, they may need explicit statement for some coding systems.

C.7.7 Sample Groups

Sample groups provide another way to describe samples and their characteristics. To use sample groups, you can define a group type, and then how a group is defined (the group description). The file format can then map a given sample to a single definition of a group of any given type. Defining new grouping types and the way that they are parameterized is an important way to parameterize the file format.

C.7.8 Track-level

Tracks can be associated with each other in the file format, in two important ways. Track references are a typed link indicating a reference or dependency of one track to or on another (e.g. a meta-data track that describes a media track has a dependency on that media track, as it makes no sense without it). New track reference types can be registered and used in derived specifications.

Similarly tracks may be grouped into sets of alternatives, where the reader is expected to be able to pick one that suits it (e.g. on the basis of supported codecs, bit-rates, screen sizes, and so on). 3GPP 26.234 has taken this concept and included user-data (a permitted extension) to give a hint as to why a track is a member of a group (‘I contain a different codec’).

Lastly, tracks may be enabled or disabled in the file format. Disabled tracks might be used, for example, for optional features (e.g. closed captions).

C.7.9 Protection

Similarly to the parameterization of coding schemes by using the sample entry type, and extra boxes in the sample entry, the Part 12 format allows protection to be applied to tracks, parameterized by the scheme type and the contents of the scheme information box. The scheme information box is ‘owned’ by the scheme type, to the extent that contained boxes there do not need to be registered, as they are already scoped by the scheme type.

Protection can be subtle; many encryption systems, for example, ‘chain’ together. It’s tempting to encrypt ‘the contents of the mdat box’, but that is very badly non-resilient to minor changes to the file. It’s also tempting to protect chunks; they do seem to represent contiguous runs of media data for one track. But again, re-chunking the file may break the ability to de-protect.

Instead, consider modifying the sample, or introducing time-parallel meta-data, or using sample groups, to introduce enough context to enable both file-based manipulation and decryption. Time-parallel meta-data would be in a track, and a track reference should be used to indicate that the protected data depends on the parallel encryption-context track.

C.8 Construction of fragmented movies

When constructing a fragmented file for playback, there are some recommendations for structuring the content which would optimize playback and random access. The recommendations are as follows:

The file should consist of boxes in the following order:
- 'ftyp'
- 'moov'
- pair of 'moof' and 'mdat' (arbitrary number)
- 'mfra'

A 'moof' box consists of at most one 'traf' for each media. When the file contains a single video track and a single audio track, the 'moof' will contain two 'traf', one for the video and one for the audio.

For video, random accessible samples are stored as the first sample of each 'traf'. In the case of gradual decoder refresh, a random accessible sample and the corresponding recovery point are stored in the same movie fragment. For audio, samples having the closest presentation time for every video random accessible sample are stored as the first sample of each 'traf'. Hence, the first samples of each media in the 'moof' have approximately equal presentation times.

First (random accessible) samples are recorded in the 'mfra' for both video and audio.

All samples in ‘mdat’ are interleaved with an appropriate interleave depth.

The offset and the initial presentation time of every 'moof' are given in the 'mfra' for both audio and video.

The player will load the 'moov' and 'mfra' initially, and hold them in memory during playback. When random access is needed, the player will search 'mfra' in order to find the random access point having the closest presentation time for the indicated time.

Since the first sample in the 'moof' is random accessible, the player can directly jump in on the random access point. The player can read the 'moof' of the random access point from the beginning. The subsequent 'mdat' starts from the random accessible sample. As such, a two-step seeking would not be necessary for random access.

Note that an ‘mfra’ box is optional, and might never occur in a given file.
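The search described above can be sketched as a binary search over the entries a player has loaded for one track. The entry layout is an assumption of this example: a list of (presentation_time, moof_offset) pairs sorted by time, as might be collected from a ‘tfra’ box. The text asks for the ‘closest’ point; this sketch makes the design choice of taking the last point at or before the requested time so that playback never starts after the target, and returns None when no entries exist (for example, when the file has no ‘mfra’).

```python
import bisect


def find_random_access_point(entries, target_time):
    """Return the (time, moof_offset) entry at or before target_time, if any."""
    if not entries:
        return None
    times = [time for time, _ in entries]
    index = bisect.bisect_right(times, target_time) - 1
    # A target before the first entry falls back to the first random access point.
    return entries[max(index, 0)]
```

Because the recommended layout puts the random accessible sample first in its 'moof', the returned offset is the only seek the player needs.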

C.9 Meta-data

Much of what is said above about tracks and their data applies to meta-data items, except that, of course, meta-data items have no time structure. In particular, the division of items into extents, allowing them to be interleaved, is again a property of the file format. It would be a mistake to design some new support based on extent structure.

C.10 Registration

Register! If in doubt, contact the registration authority at http://www.mp4ra.org. Registration is free, and so is the advice and help you will get. Not registering means that your use may conflict with someone else, and your use is also un-traceable and therefore effectively undocumented. The RA is aware of many brands (at least) being cheerfully invented and used, but not registered. These people are ‘flying dangerously’; don’t join them.

C.11 Guidelines on the use of sample groups, timed metadata tracks, and sample auxiliary information

The ISO Base Media File Format contains three mechanisms for timed metadata that can be associated with particular samples: sample groups, timed metadata tracks, and sample auxiliary information. Derived specifications may provide similar functionality with one or more of these three mechanisms. This Clause provides guidelines for derived specifications to choose between the three mechanisms.

Sample groups and timed metadata are less tightly coupled to the media data and are typically ‘descriptive’, whereas sample auxiliary information might be required for decoding.

Sample auxiliary information is only intended for use where the information is directly related to the sample on a one-to-one basis, and is required for the media sample processing and presentation. For general content, the existing solution of additional tracks should be used. Sample auxiliary information and sample media data are both addressed using byte pointers and size information, and so when the same bytes form the data for more than one sample it may be possible to share that data by re-using the same byte pointer.

Sample groups may be useful in the following cases.

- When several samples share the same metadata values, it is space-efficient to specify the metadata in a Sample Group Description box and the association of samples to metadata in Sample to Group box(es).

- As the sample group information is stored in the Movie box and Movie Fragment box(es), it provides an index to the data in the Media Data boxes. No data from the Media Data boxes needs to be fetched, which may therefore reduce disk accesses when compared to timed metadata tracks and sample auxiliary information.

Timed metadata tracks may be useful in the following cases.

- The same timed metadata track may be associated with more than one track. In other words, a timed metadata track may be more independent of the content of the associated tracks than sample groups and sample auxiliary information.

- It may be easier to append a timed metadata track to a file than to append sample auxiliary information or sample groups, because sample auxiliary information and Sample to Group boxes have to reside in the same Track Fragment box as the associated samples, whereas timed metadata may reside in its own Movie Fragment box(es). For example, it may be easier to provide an additional subtitle track as timed metadata than to use sample auxiliary information.

- The duration of timed metadata samples need not match the duration of associated media or hint samples. In cases where the duration of timed metadata samples spans multiple associated media or hint samples, timed metadata tracks may be more space-efficient than sample auxiliary information.

Sample auxiliary information may be useful in the following cases.

- The data associated with samples changes frequently enough that specifying sample groups may not be justified from a storage-space point of view.

- The amount of data associated with samples is so large that its carriage within the Movie box or Movie Fragment box (as required by sample grouping) would cause disadvantages. For example, in progressive downloading, it may be beneficial to make the size of the Movie box small in order to keep the initial buffering time small.

- When each sample is associated with metadata, sample auxiliary information provides a more straightforward association of the auxiliary information to samples when compared to the same functionality with timed metadata tracks, which typically requires resolving sample decoding time to establish the association between timed metadata samples and media/hint samples.

Annex D (informative)

Registration Authority

D.1 Code points to be registered

The code-points within the file format are all 32-bit fields, normally four printable characters (commonly known as four-character-codes or 4CCs). An object type identifier is an 8-bit integer.
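The equivalence between a four-character code and its 32-bit field can be shown with a short Python sketch. The function names are hypothetical; big-endian byte order is used, as for all integer fields in the file format, and ‘printable’ is taken here to mean the ASCII range 0x20 to 0x7E, which is an assumption of this example.

```python
def fourcc_to_int(code: str) -> int:
    """Pack a four-character code into its 32-bit big-endian value."""
    raw = code.encode("latin-1")
    if len(raw) != 4:
        raise ValueError("a four-character code is exactly four bytes")
    return int.from_bytes(raw, "big")


def int_to_fourcc(value: int) -> str:
    """Unpack a 32-bit value into its four characters."""
    return value.to_bytes(4, "big").decode("latin-1")


def is_printable_fourcc(value: int) -> bool:
    """True if all four bytes fall in the printable ASCII range."""
    return all(0x20 <= byte <= 0x7E for byte in value.to_bytes(4, "big"))
```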

The code-points that may be registered are:

1) File format box identifiers. Note that in some specifications boxes were known as atoms. Note that the introduction of new atom types is discouraged; in general other extensibility features of the file format should be used if possible.

2) File format track type identifiers. A pair of identifiers is usually used here, to identify the track type (audio, video, etc.) and, if required, a media-specific header atom (video media header, etc.). It is expected that the need for new track types is rare, however; most media should fall into existing types (e.g. video codecs should use video tracks, hint protocols use hint tracks, and so on).

3) File format sample description and sample format identifiers (also known as codec names). This includes audio and video codecs, and also protocol identifiers for hint tracks. Any registration of a new sample format will automatically be issued an object-type identifier also (see below), thus making the identification of the carriage of this format within the MPEG-4 systems object descriptor framework possible.

4) File format track reference identifiers. Dependencies between tracks are typed in the file format (for example, hint tracks depend on the media tracks they hint, using a track dependency of type ‘hint’).

5) This specification includes a ‘file type’ atom which includes a list of ‘brands’ which identify which specifications the file is conformant to. Bodies defining standards based on the structural definition of this file format would normally use a new brand to identify files conformant to their specification. Any registration of a new brand must specify the precise specification which the brand identifies.

6) Within the MPEG-4 object descriptor framework, the object type value is used to identify the format of the streams. An object type identifier may be requested independently of the file format identifiers above.

7) Sample groups associate typed information with groups of samples. The grouping type may be registered.

8) Both media and metadata can be protected and the protection scheme used identified with a registered protection scheme type.

These code-points are referred to in the rest of this annex as registered identifiers, abbreviated as RIDs.

D.2 Procedure for the request of an MPEG-4 registered identifier value

Requesters of an MPEG-4 code-point value, as detailed above, to identify a private data format shall apply to the Registration Authority. Registration forms shall be available from the Registration Authority. The requester shall provide the information specified in D.4. Companies and organizations are eligible to apply.

D.3 Responsibilities of the Registration Authority

The primary responsibilities of the Registration Authority administrating the registration of the private data format identifiers are outlined in this annex; certain other responsibilities may be found in the JTC 1 Directives. The Registration Authority shall:

a) implement a registration procedure for application for a unique RID in accordance with the JTC 1 Directives;

b) receive and process the applications for allocation of an identifier from application providers;

c) ascertain which applications received are in accordance with this registration procedure, and inform the requester within 30 days of receipt of the application of their assigned RID;

d) inform application providers whose request is denied in writing within 30 days of receipt of the application, and consider resubmissions of the application in a timely manner;

e) maintain an accurate register of the allocated identifiers. Revisions to format specifications shall be accepted and maintained by the Registration Authority;

f) make the contents of this register available upon request to National Bodies of JTC 1 that are members of ISO or IEC, to liaison organizations of ISO or IEC and to any interested party;

g) maintain a data base of RID request forms, granted and denied. Parties seeking technical information on the format of private data which has a RID shall have access to such information which is part of the data base maintained by the Registration Authority;

h) report its activities annually to JTC 1, the ITTF, and the SC 29 Secretariat, or their respective designees; and

i) accommodate the use of existing RIDs whenever possible.

D.4 Contact information for the Registration Authority

Apple Computer Inc.
One Infinite Loop, M/S 301-4B
Cupertino, California 95014 USA
E-mail: [email protected]
http://www.mp4ra.org/

D.5 Responsibilities of Parties Requesting a RID

The party requesting a format identifier shall:

a) apply using the Form and procedures supplied by the Registration Authority;

b) include a description of the purpose of the registered identifier, and the required technical details as specified in the application form;

c) provide contact information describing how a complete description can be obtained on a non-discriminatory basis;

d) agree to institute the intended use of the granted RID within a reasonable time frame; and

e) maintain a permanent record of the application form and the notification received from the Registration Authority of a granted RID.

D.6 Appeal Procedure for Denied Applications

The Registration Management Group (RMG) is formed to have jurisdiction over appeals of denied requests for a RID. The RMG shall have a membership which is nominated by P- and L-members of the ISO technical committee responsible for ISO/IEC 14496. It shall have a convenor and secretariat nominated from its members. The Registration Authority is entitled to nominate one non-voting observing member.

The responsibilities of the RMG shall be:

a) to review and act on all appeals within a reasonable time frame;

b) to inform, in writing, organizations which make an appeal for reconsideration of their petition of the RMG’s disposition of the matter;

c) to review the annual report of the Registration Authority’s summary of activities; and

d) to supply Member Bodies of ISO and National Committees of IEC with information concerning the scope of operation of the Registration Authority.

D.7 Registration Application Form

D.7.1 Contact Information of organization requesting a RID

Organization Name:

Address:

Telephone:

Fax:

E-mail:

Telex:

D.7.2 Request for a specific RID

NOTE: If the system has already been implemented and is in use, fill in this item and item D.7.3 and skip to D.7.5; otherwise leave this space blank and skip to D.7.3.

D.7.3 Short description of RID that is in use and date system was implemented

D.7.4 Statement of an intention to apply the assigned RID

D.7.5 Date of intended implementation of the RID

D.7.6 Authorized representative

Name:

Title:

Address:

Email:

Signature__________________________________

D.7.7 For official use of the Registration Authority

Attachment 1    Attachment of technical details of the registered data format.

Attachment 2    Attachment of notification of appeal procedure for rejected applications.

Registration Rejected _____

Reason for rejection of the application:

Registration Granted    Registration Value ____________________

Annex E (normative)

File format brands

E.1 Introduction

The presence of a brand in the compatible_brands list of the ftyp box is a claim and a permission. It is a claim that the file conforms to all the requirements of that brand, and a permission to a reader implementing potentially only that brand to read the file.

In general, readers are required to implement all features documented for a brand unless one of the following applies:

a) the media they are using does not use or require a feature: for example, I-frame video does not need a sync sample table, and if composition re-ordering is not used, then no composition time offset table is needed; similarly, if content protection is not needed, then support for the structures of content protection is not required;

b) another specification with which the file is conformant forbids the use of a feature (for example, some derived specifications explicitly forbid use of movie fragments);

c) the context in which the product operates means that some structures are not relevant; for example, hint track structures are only relevant to products preparing content for, or performing, file delivery (such as streaming) for the protocol in the hint track.

The following sections list the brands defined in this specification; no inheritance is implied by the section order; when inheritance occurs, it is specifically stated. Other brands may be defined in other specifications. Note that if one brand is a subset of another (e.g., ‘isom’ requirements are a subset of the ‘iso2’ requirements) then:

a) files labelled as compatible with the subset can always be labelled as also compatible with the superset; a file compatible with ‘isom’ can always be labelled as compatible with ‘iso2’;

b) products supporting the superset automatically can support the subset; a product that supports ‘iso2’ also necessarily supports ‘isom’.

No brands defined here require support for any particular media type (e.g., video, audio, meta-data) or media encoding (e.g., a particular codec), or structures supporting a specific media type (e.g., Visual Sample Entries or the boxes contained in a specific kind of sample entry).

More specific identifiers can be used to identify precise versions of specifications providing more detail. These brands should not be used as the major brand; this base file format should be derived into another specification to be used. There is therefore no defined normal file extension, or MIME type assigned to these brands, nor definition of the minor version when one of these brands is the major brand.

E.2 The ‘isom’ brand

The type ‘isom’ (ISO Base Media file) is defined in this section of this specification, as identifying files that conform to the first version of ISO Base Media File Format.

Support for the following structural boxes is required:

moov    container for all the meta-data
mvhd    movie header, overall declarations
trak    container for an individual track or stream
tkhd    track header, overall information about the track
tref    track reference container
edts    edit list container
elst    an edit list
mdia    container for the media information in a track
mdhd    media header, overall information about the media
hdlr    handler, at this level, the media (handler) type
minf    media information container
vmhd    video media header, overall information (video track only)
smhd    sound media header, overall information (sound track only)
hmhd    hint media header, overall information (hint track only)
<mpeg>  mpeg stream headers
dinf    data information atom, container
dref    data reference atom, declares source(s) of media in track
stbl    sample table atom, container for the time/space map
stts    (decoding) time-to-sample
ctts    composition time-to-sample table
stss    sync (key, I-frame) sample map
stsd    sample descriptions (codec types, initialization etc.)
stsz    sample sizes (framing)
stsc    sample-to-chunk, partial data-offset information
stco    chunk offset, partial data-offset information
co64    64-bit chunk offset
stsh    shadow sync
stdp    degradation priority
mdat    Media data container
free    free space
skip    free space
udta    user-data, copyright etc.
ftyp    file type and compatibility
stz2    compact sample sizes (framing)
padb    sample padding bits
mvex    movie extends box
mehd    movie extends header box
trex    track extends defaults
moof    movie fragment
mfhd    movie fragment header
traf    track fragment
tfhd    track fragment header
trun    track fragment run
mfra    movie fragment random access
tfra    track fragment random access
mfro    movie fragment random access offset

Hint tracks must be recognized, and in hint tracks, RTP protocol hint tracks.

Note that some requirements of the Track Header Box do not apply to this brand; see sub-clause 8.3.2.1.

Support for only version 0 of the ‘ctts’ box is required here; version 1 support is not required.

Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.

NOTE The default-base-is-moof flag (8.8.7.1) cannot be set where a file is marked with this brand.

E.3 The ‘avc1’ brand

The brand ‘avc1’ shall be used to indicate that the file is conformant with the ‘AVC Extensions’ in subclauses 8.6.4 and 8.9. If used without other brands, this implies that support for those extensions is required. The use of ‘avc1’ as a major-brand may be permitted by specifications; in that case, that specification defines the file extension and required behaviour.

The ‘avc1’ brand requires support for the ‘isom’ brand. In addition, support of the following boxes is required:

sdtp    independent and disposable samples
sbgp    sample-to-group
sgpd    sample group description

Within the sample groups, support for roll groups (grouping type ‘roll’) is required.

NOTE The default-base-is-moof flag (8.8.7.1) cannot be set where a file is marked with this brand.

Note that some requirements of the Track Header Box do not apply to this brand; see sub-clause 8.3.2.1.

Support for only version 0 of the ‘ctts’ box is required here; version 1 support is not required.

Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.

Support of Sample Group Description boxes in movie fragments is not required.

E.4 The ‘iso2’ brand

The brand ‘iso2’ shall be used to indicate compatibility with the second version of the ISO Base Media File Format; it may be used in addition to or instead of the ‘isom’ brand and the same usage rules apply. If used without the brand ‘isom’ identifying the first version of this specification, it indicates that support for some or all of the technology in subclauses 8.6.4, 8.8.15, 8.11.1 through 8.11.7, 8.11.10, 0, or the SRTP support in subclause 9.1, is required.

The ‘iso2’ brand requires support for all features of the ‘avc1’ brand.

In addition, support for the following boxes is required:

pdin    progressive download information
subs    sub-sample information
meta    metadata
iloc    item location
ipro    item protection
sinf    protection scheme information box
frma    original format box
schm    scheme type box
schi    scheme information box
iinf    item information (version field set to 0)
xml     XML container
bxml    binary XML container
pitm    primary item reference

In the context of RTP hint tracks, SRTP hint tracks must now be recognized. Content protection and generalized meta-data boxes support is required.

Only support for version 0 of the item information box, and version 0 of the item location box, is required.

Note that some requirements of the Track Header Box do not apply to this brand; see sub-clause 8.3.2.1.

Support for only version 0 of the ‘ctts’ box is required here; version 1 support is not required.

Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.

Support for Sample Group Description boxes in movie fragments is not required.

NOTE The default-base-is-moof flag (8.8.7.1) cannot be set where a file is marked with this brand.

Support for only 16-bit item_ID and item_count values in ‘meta’ box is required here; 32-bit item_ID and item_count values in ‘meta’ box is not required.

Support for ‘meta’ box in movie fragments is not required.

Support for only ‘subs’ box per track is required here.

E.5 The ‘mp71’ brand

If a Meta-box with an MPEG-7 handler type is used at the file level, then the brand ‘mp71’ should be a member of the compatible-brands list in the file-type box.

E.6 The ‘iso3’ brand

The brand ‘iso3’ requires support for all features of the ‘iso2’ brand.

In addition, support for the following is required:

fiin    file delivery item information
paen    partition entry
fpar    file partition
fecr    FEC reservoir
segr    file delivery session group
gitn    group id to name
meco    additional metadata container
mere    metabox relation

Support for version 0 and version 1 of the item information box is required. Within the sample groups, support for rate share information (grouping type ‘rash’) is required. File delivery hint tracks (sample entry ‘fdp ’) must be recognized.

Support for only version 0 of the ‘ctts’ box is required here; version 1 support is not required.

Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.

Support for Sample Group Description boxes in movie fragments is not required.

Only support for version 0 of the item location box is required.

NOTE The default-base-is-moof flag (8.8.7.1) cannot be set where a file is marked with this brand.

Support for only 16-bit item_ID and item_count values in ‘meta’ box is required here; 32-bit item_ID and item_count values in ‘meta’ box is not required.

Support for ‘meta’ box in movie fragments is not required.

Support for only ‘subs’ box per track is required here.

E.7 The ‘iso4’ brand

The brand ‘iso4’ requires support for all features of the ‘iso3’ brand.

Support for version 1 of the composition offset (‘ctts’ and ‘iloc’) boxes is required under this brand.

Support for version 1 of the item location box, version 2 of the item info box, and the new item data (‘idat’) and item reference (‘iref’) boxes is required.

In addition, support for the following is required:

trgr    track grouping indication
cslg    composition to decode timeline mapping
idat    item data
iref    item reference

Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.

Support for Sample Group Description boxes in movie fragments is not required.

NOTE The default-base-is-moof flag (8.8.7.1) cannot be set where a file is marked with this brand.

Support for only 16-bit item_ID and item_count values in ‘meta’ box is required here; 32-bit item_ID and item_count values in ‘meta’ box is not required.

Support for ‘meta’ box in movie fragments is not required.

Support for only ‘subs’ box per track is required here.

Support for only 32-bit values in ‘cslg’ box is required here; 64-bit values in ‘cslg’ box is not required.

E.8 The ‘iso5’ brand

The brand ‘iso5’ requires support for all features of the ‘iso4’ brand.

Support for the default-base-is-moof flag is required under this brand.

Processing of restricted sample entries (i.e. ‘resv’) is required under this brand.

Support for only version 0 of the ‘trun’ box is required here; version 1 support is not required.

Support for Sample Group Description boxes in movie fragments is not required.

Support for only 16-bit item_ID and item_count values in ‘meta’ box is required here; 32-bit item_ID and item_count values in ‘meta’ box is not required.

Support for ‘meta’ box in movie fragments is not required.

Support for only ‘subs’ box per track is required here.

Support for only 32-bit values in ‘cslg’ box is required here; 64-bit values in ‘cslg’ box is not required.

E.9 The ‘iso6’ brand

The brand ‘iso6’ requires support for all features of the ‘iso5’ brand.

Support for the following boxes is required under this brand:

saiz    sample auxiliary information sizes
saio    sample auxiliary information offsets
tfdt    track fragment decode time
styp    segment type
sidx    segment index
ssix    subsegment index
prft    producer reference time

Support for the following is required under this brand:

Sample Group Description boxes in movie fragments;

Signed composition offsets in track run boxes (i.e. version 1 of track run boxes);

Within the sample groups, support for random access point information (grouping type ‘rap ’) is required.

Support for only 16-bit item_ID and item_count values in ‘meta’ box is required here; 32-bit item_ID and item_count values in ‘meta’ box is not required.

Support for ‘meta’ box in movie fragments is not required.

Support for only ‘subs’ box per track is required here.

Support for only 32-bit values in ‘cslg’ box is required here; 64-bit values in ‘cslg’ box is not required.

E.10 The ‘iso7’ brand

The brand ‘iso7’ requires support for all features of the ‘iso6’ brand.

Support for the following boxes is required under this brand:

trep    track extension properties
assp    alternative startup sequence properties

Support for the following is required under this brand:

Support for 32-bit item_ID and item_count values in ‘meta’ box

Recognizing incomplete tracks.

Support for ‘meta’ box in movie fragments is not required.

Support for only ‘subs’ box per track is required here.

Support for only 32-bit values in ‘cslg’ box is required here; 64-bit values in ‘cslg’ box is not required.

E.11 The ‘iso8’ brand

The brand ‘iso8’ requires support for all features of the ‘iso7’ brand.

Support for the following boxes is required under this brand:

sthd    subtitle media header, overall information (subtitle track only)

Support for the following is required under this brand:

Support for ‘meta’ box in movie fragments

Support for one or more ‘subs’ box per track

Support for only 32-bit values in ‘cslg’ box is required here; 64-bit values in ‘cslg’ box is not required.

E.12 The ‘iso9’ brand

The brand ‘iso9’ requires support for all features of the ‘iso8’ brand.

Support for the following boxes is required under this brand:

elng    extended language tag

Support for the following is required under this brand:

Support for 64-bit values in ‘cslg’ box;
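
The "requires support for all features of" statements in E.3 to E.12 form a single chain, which can be sketched in Python as follows. The table below only restates those statements. The function names implied_brands and reader_can_play are hypothetical, and the sketch ignores the per-brand exceptions listed in each clause (for example, box versions that are explicitly not required), so it is a rough compatibility check rather than a conformance test.

```python
# Direct "requires support for all features of" relations from E.3 to E.12.
REQUIRES = {
    "avc1": "isom",
    "iso2": "avc1",
    "iso3": "iso2",
    "iso4": "iso3",
    "iso5": "iso4",
    "iso6": "iso5",
    "iso7": "iso6",
    "iso8": "iso7",
    "iso9": "iso8",
}

def implied_brands(brand):
    """Return `brand` followed by every brand whose features it requires."""
    chain = [brand]
    while chain[-1] in REQUIRES:
        chain.append(REQUIRES[chain[-1]])
    return chain

def reader_can_play(supported, compatible_brands):
    """True if a reader implementing `supported` covers any listed brand."""
    covered = set(implied_brands(supported))
    return any(b in covered for b in compatible_brands)

chain_iso2 = implied_brands("iso2")
```

A reader that implements ‘iso4’ can therefore accept a file listing ‘iso6’ and ‘iso2’ as compatible brands (through ‘iso2’), whereas a reader that implements only ‘isom’ cannot accept a file listing only ‘iso5’.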

Annex F (void)

Annex G (informative)

URI-labelled metadata forms

G.1 UUID-labelled metadata

The format of the URI for UUID-labelled metadata is defined in IETF RFC 4122: A Universally Unique IDentifier (UUID) URN Namespace (July 2005).
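
As a small illustration of that URI form, the Python standard library can produce and parse UUID URNs. The helper name urn_to_uuid is hypothetical, and the UUID value is only a sample; it does not identify any registered metadata form.

```python
import uuid

def urn_to_uuid(urn: str) -> uuid.UUID:
    """Parse a 'urn:uuid:' URI into a UUID object."""
    prefix = "urn:uuid:"
    if not urn.lower().startswith(prefix):
        raise ValueError("not a UUID URN")
    return uuid.UUID(urn[len(prefix):])

# Sample value for illustration only.
label = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
label_urn = label.urn              # URN string form of the UUID
round_trip = urn_to_uuid(label_urn)
```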

There are no general statements about the form of the primary metadata, the initialization data for temporal metadata, or the temporal metadata itself. The form of all of these depends on the precise UUID and its definition.

Note that UUIDs cannot easily be traced to their point of origin, and so they may be unsuitable if it is desired that recipients of metadata be able to find, if needed, the associated documentation.

If traceability is needed, then a standardized metadata framework, such as MPEG-7, or a registered framework, such as SMPTE, or a de-referencable URL should be used.

G.2 ISO OID-labelled metadata

The format of the URI for OID-labelled metadata is defined in RFC 3061: A URN Namespace of Object Identifiers (February 2001).

There are no general statements about the form of the primary metadata, the initialization data for temporal metadata, or the temporal metadata itself. The form of all of these depends on the precise object identifier and its definition.

A number of more specific labelling systems can also be expressed as object identifiers. The more specific UUID form should be used.

Object identifiers starting {joint-iso-itu(2) uuid(25)} (i.e. starting urn:oid:2.25) should not be used; UUID URIs should be used directly.

Object identifiers starting {iso(1) identified-organizations(3) SMPTE(52) metadata-dictionary(1)} (i.e. urn:oid:1.3.52.1) should not be used, nor should any other OID being used as a label according to SMPTE 298M or 336M; the more specific SMPTE URI form should be used.
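
A writer could screen OID URNs against the two discouraged arcs named above before using them as labels. The following Python sketch shows one way to do so; the function name check_oid_urn and the advice strings are hypothetical. It covers only the two arcs given explicitly and does not detect other OIDs used as labels according to SMPTE 298M or 336M.

```python
# Arcs that this clause says should not be used, with the suggested alternative.
DISCOURAGED_OID_ARCS = {
    "2.25": "use a UUID URI directly",
    "1.3.52.1": "use the SMPTE URI form",
}

def check_oid_urn(urn: str):
    """Return advice if the OID URN lies under a discouraged arc, else None."""
    prefix = "urn:oid:"
    if not urn.lower().startswith(prefix):
        raise ValueError("not an OID URN")
    arcs = urn[len(prefix):].split(".")
    for root, advice in DISCOURAGED_OID_ARCS.items():
        root_arcs = root.split(".")
        # Compare whole arcs so that, e.g., 2.250 is not mistaken for 2.25.
        if arcs[:len(root_arcs)] == root_arcs:
            return advice
    return None
```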

Object Identifiers are registered to specific organizations, and so it may be possible to identify the organization owning a particular identifier. However, some sections of the object identifier tree are delegated to unregistered uses (such as UUIDs, as noted above), and traceability is then lost.

If traceability is needed, then a standardized metadata framework, such as MPEG-7, or a registered framework, such as SMPTE, or a de-referencable URL should be used.

G.3 SMPTE-labelled metadata

The format of the URI for SMPTE-labelled metadata is in RFC 5119: A Uniform Resource Name (URN) Namespace for the Society of Motion Picture and Television Engineers (SMPTE).

The primary metadata is exactly the value (V) part of a KLV (key, length, value) triplet as defined in SMPTE 336M, with the key being the label given in the URN, and the length (L) being derived from the item length.

Similarly, each temporal metadata sample is the value (V) part of a KLV, where the key is the URN label given in the matching sample entry, and the length (L) is derived from the sample size (as given in the sample size or compact sample size tables).
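
To make the relationship between the stored value and the full KLV triplet concrete, the following Python sketch rebuilds a KLV from a key and a stored value. It assumes that the key is a 16-byte SMPTE label and that the length is BER-encoded (short form below 128, long form otherwise), which is the common KLV convention; consult SMPTE 336M for the normative encoding. The function names and the sample key bytes are hypothetical.

```python
def ber_length(n: int) -> bytes:
    """Encode a length in BER short form (< 128) or long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def rebuild_klv(key: bytes, value: bytes) -> bytes:
    """Rebuild K + L + V from a label and an item or sample payload."""
    if len(key) != 16:
        raise ValueError("expected a 16-byte label as the key (assumption)")
    # The length is not stored with the value; it is derived from its size.
    return key + ber_length(len(value)) + value

# Placeholder key for illustration; not a registered SMPTE label.
sample_key = bytes.fromhex("060e2b34" + "00" * 12)
klv = rebuild_klv(sample_key, b"abc")
```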

The initialization data may be present. It contains the key (K) and value (V) of a KLV that provides an initialization context for the KLVs formed from the samples, with the length (L) being derived from the Data Box size. The first 16 bytes are a SMPTE label of the initialization data, stored as defined in SMPTE 336M, followed by the data.

The typical value of these bytes, as defined in SMPTE 377M, is ‘primer pack’ (in hexadecimal): 060E2B34 02050101 0D010201 01050100. If the label of the initialization data does not, in fact, identify a structure giving context information (such as a primer pack), the behaviour is undefined. This enables each sample to be a local set. The rules for the construction of local sets, as defined in SMPTE 377M, must be followed.

SMPTE 377M uses locators to locate other resources outside the metadata itself. For static metadata, these should use the item location box in the meta-box. For temporal metadata, external pointers may be used directly.

The initialization data may be absent, and the label then identifies a specific metadata item (e.g. a geographic locator) not needing a context.

Annex H (informative)

Processing of RTP streams and reception hint tracks

H.1 Introduction

H.1.1 Overview

This Annex provides recommendations for recording of RTP streams and the use of recorded RTP streams for playback and re-sending.

H.1.2 Structure

This Annex is organized as follows:

- H.2 introduces the potential sources why the playback of RTP streams might become unsynchronized and provides an overview how proper synchronization is facilitated in recording and playback. It precedes the other Clauses, because both the recording unit and the player have to take actions to achieve proper synchronization.

- H.3 provides recommendations for storing RTP streams.

- H.4 provides recommendations how to play files containing recorded RTP streams.

- H.5 provides recommendations for re-sending received RTP streams stored in files as described in H.3.

H.1.3 Terms and definitions

For the purposes of this annex, the following terms and definitions apply.

H.1.3.1 player
entity that parses a file, decodes at least a subset of the tracks in the file, and renders the decoded tracks

H.1.3.2 recording unit
entity that receives one or more packet streams of encapsulated and compressed media and stores the received media into a file

H.1.3.3 re-sending unit
entity that parses a file containing media that originates from one or more received packet streams of encapsulated and compressed media and transmits at least a subset of the media stored in the file

H.2 Synchronization of RTP streams

There are several potential sources of unsynchronized playback for received RTP streams. When RTP streams are recorded as RTP reception hint tracks, the necessary information for guaranteeing synchronized playback is also recorded. When RTP streams are recorded as media tracks, the synchronization of the playback of the media tracks has to be guaranteed by creating the composition times of the media samples appropriately. The following list describes the sources of unsynchronized playback for received RTP streams, summarizes the recommended synchronization means, and points to the relevant Clauses for further information.

1. The RTP timestamp of the first packet of the stream has a random offset. Hence, the RTP timestamps of two streams are shifted by the difference of their initial random offsets even if the potentially different clock rate of the RTP timestamps of the different streams were compensated. The random offset should be reflected in the value of the offset field of the 'tsro' box of the referred reception hint sample entry as described in H.3.5.

2. The first received and recorded packet of the different streams may not have an identical playback time as discussed in H.3.2. The unequal start time of the different recorded streams is compensated by parsing one or more RTCP Sender Reports to derive the playback time as the wallclock time of the sender and creating an initial offset of the playback using the Edit List box as described in H.3.2. The Edit List box is interpreted by the player as described in 0.

3. There is no guarantee that the clock for producing the RTP timestamps of a certain RTP stream runs at the same pace as the wallclock time of the sender, which is used to create the RTCP Sender Reports. For example, the RTP timestamps may be generated on the basis of a constant sampling frequency, e.g. 44.1 kHz for audio, and hence governed by the clock rate of the audio capturing hardware. However, the RTCP Sender Reports may be generated according to the system clock running at a different pace than the clock of the audio capturing hardware. Moreover, the clock used to generate RTP timestamps for audio might run at a different pace than the clock used to generate RTP timestamps for video (when both are normalized to the same clock tick frequency).

A similar problem in the player arises if the clock pacing the output of a decoded stream runs at a different pace than the wallclock of the player or the clocks pacing the rendering of different decoded streams are not synchronized.

The recommended approach for all these potential problems of clocks running at a different pace is to use RTCP Sender Reports to align the RTP timestamps of different streams onto the same wallclock timeline, which is used for inter-stream synchronization. This alignment can be done while recording the streams by modifying the representation of the recorded RTP timestamps or while playing the recorded streams by using the recorded RTCP Sender Reports as described in H.3.6. Moreover, it is recommended to pace the playback according to the audio playout rate as described in 0.

4. The wallclock of the sender may run at a different pace than the wallclock of the player.

It is recommended to play a recorded program at the pace of the wallclock of the player and to use the audio playout clock as the wallclock of the player. Consequently, the audio timescale does not typically have to be modified. Even if the wallclock of the player ran at a different pace than the wallclock of the sender, it is typically unnoticeable.

Pacing of the output of decoded media samples is described in 0.
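
The alignment of RTP timestamps onto the sender wallclock timeline can be sketched as follows. This Python example uses a single RTCP Sender Report per stream, that is, one pair of RTP timestamp and wallclock time, and extrapolates linearly at the nominal clock rate. The function name, the parameter names and the numeric values are hypothetical. A real implementation would typically use several Sender Reports to follow clock drift, which this sketch does not attempt.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_wallclock, clock_rate):
    """Map an RTP timestamp to sender wallclock seconds via one Sender Report."""
    # Signed 32-bit difference, so that timestamps earlier than the report
    # and wrap-around of the 32-bit RTP timestamp are both handled.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_wallclock + diff / clock_rate

# Audio at 44.1 kHz: report pairs RTP timestamp 1000000 with wallclock 100.0 s.
audio_time = rtp_to_wallclock(1000000 + 44100, 1000000, 100.0, 44100)

# Video at 90 kHz: the report's RTP timestamp is close to the 32-bit limit,
# so a packet sent 0.5 s later carries a wrapped timestamp.
video_rtp = (4294960000 + 45000) % (1 << 32)
video_time = rtp_to_wallclock(video_rtp, 4294960000, 100.5, 90000)
```

Both samples land at the same wallclock instant in this example, even though their RTP timestamps have unrelated offsets and clock rates.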

H.3 Recording of RTP streams

H.3.1 Introduction

Recording of RTP streams can result in three basic file structures.

1. A file containing only RTP reception hint tracks. No media tracks are included. This file structure enables efficient processing of packet losses, but only players capable of parsing RTP reception hint tracks can play the file.

2. A file containing only media tracks. No RTP reception hint tracks are included. This file structure allows existing players compliant with the earlier versions of the ISO base media file format to process recorded files as long as the media formats are also supported. However, sophisticated processing of transmission errors is not possible due to reasons explained in subsequent clauses.

3. A file containing both RTP reception hint tracks and media tracks. This file structure has both the benefits mentioned above and should be used for as good interoperability as possible with other file formats derived from the ISO base media file format.

If an RTP stream being recorded is protected, a protected RTP reception hint track is used instead of an RTP reception hint track, while the operation of the recording unit remains unchanged otherwise. At the time of playback, the data included in the protected RTP reception hint track is unprotected first and then processed similarly to a conventional unprotected RTP stream. Alternatively, the RTP stream may be unprotected before storing it as an RTP reception hint track, but then care has to be taken that the rights to use the content in the protected RTP stream are obeyed.

Some of the recording operations are common for all the three file structures, while others differ. Table H.1 indicates which recording operations are required for the basic file structures.

Table H.1

Recording operation | File containing only RTP reception hint tracks | File containing only media tracks | File containing both RTP reception hint tracks and media tracks
Compensation for unequal starting position of received RTP streams (H.3.2) | no, when RTCP reception hint tracks are stored; yes, otherwise | yes | no, when RTCP reception hint tracks are stored; yes, otherwise
Recording of SDP (H.3.3) | yes | no | yes, for RTP reception hint tracks only
Creation of a sample within an RTP reception hint track (H.3.4) | yes | no | yes, for RTP reception hint tracks only
Representation of RTP timestamps (H.3.5) | yes | no | yes, for RTP reception hint tracks only
Recording operations to facilitate inter-stream synchronization in playback (H.3.6) | yes | yes, the composition times of media tracks should be compensated as described in H.3.6.3 | yes
Representation of reception times (H.3.7) | yes | no | yes, for RTP reception hint tracks only
Creation of media samples (H.3.8) | no | yes | yes, for media tracks only
Creation of hint samples referring to media samples (H.3.9) | no | no | yes

Some implementations may record first to RTP reception hint tracks only and create a file with a combination of media tracks and RTP reception hint tracks off-line.

H.3.2 Compensation for unequal starting position of received RTP streams

When the recording of RTP streams is started, it can happen that the presentation time of the first media sample in one RTP stream is not equal to the presentation time of the first media sample in another RTP stream at least due to the following reasons:

- The sampling frequency of audio and video typically differ.

- Audio and video streams may not be perfectly interleaved in terms of presentation times in transmission order.

If RTCP reception hint tracks are stored, the compensation for unequal starting position of received RTP streams should be done at playback time and no Edit List box concerning RTP reception hint tracks should be created. If RTCP reception hint tracks are not stored or if media tracks are stored, it is essential that the recording unit indicates the relative initial delay of the streams in order to synchronize audio and video correctly at the beginning of the playback of the streams as described subsequently in this Clause. The recording unit should perform the following operations.

1. An RTCP Sender Report indicates which RTP timestamp corresponds to the wallclock time of the time instant the report was sent. At least the first RTCP Sender Report for each RTP stream should be parsed in order to establish an equivalence of an RTP timestamp of each RTP stream and a wallclock time of the sender. The wallclock timestamp of the earliest received RTP packet, in presentation order, is derived for each RTP stream by simple linear extrapolation.

2. The smallest wallclock timestamp derived above among all the received RTP streams is mapped to presentation timestamp zero in the movie timeline, i.e., is presented immediately at the beginning of the playback of the recorded file. The movie timeline is the master timeline for the playback of the file.

3. The media timeline for each track starts from 0. In order to shift the media timeline to a correct starting position in the movie timeline, an Edit box and an Edit List box are created for each of the other RTP tracks (which do not contain a packet having the earliest wallclock timestamp) as follows:

The Edit List box contains two entries:

a) The first entry is an empty edit (indicated by media_time equal to -1), and its duration (segment_duration) is equal to the difference of the presentation times of the earliest media sample among all the RTP streams and the earliest media sample of the track. Figure H.1 presents an example of how the segment_duration of the first entry in an Edit List box is derived.

b) The value of media_time of the second entry is equal to the composition time of the earliest sample in presentation order, and the value of segment_duration of the second entry spans over the entire track. As the actual duration of the track might not be known at the time of creating the Edit List box, it is recommended to set the segment_duration equal to the maximum possible value (either the maximum 32-bit unsigned integer or the maximum 64-bit unsigned integer, depending on which version of the box is used).

The value of media_rate_integer is equal to 1 in both the entries of the Edit List box.
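
The three operations above can be sketched in Python as follows. The sketch takes, for each track, the sender wallclock time of its earliest sample (already derived from the RTCP Sender Reports) and produces the two Edit List entries for every track that does not start first. The function name, the dictionary-based inputs and the track names are hypothetical. The sketch also assumes that segment_duration is expressed in the movie timescale and media_time in the media timescale of the track.

```python
EMPTY_EDIT = -1                         # media_time value marking an empty edit
MAX_DURATION_V0 = 0xFFFFFFFF            # version 0: 32-bit segment_duration
MAX_DURATION_V1 = 0xFFFFFFFFFFFFFFFF    # version 1: 64-bit segment_duration

def build_edit_lists(first_wallclock, first_composition_time,
                     movie_timescale, version=0):
    """Return {track: [(segment_duration, media_time, media_rate_integer), ...]}.

    first_wallclock: track -> sender wallclock time (seconds) of the earliest
        sample in presentation order.
    first_composition_time: track -> composition time of that sample, in the
        media timescale of the track.
    """
    max_duration = MAX_DURATION_V1 if version == 1 else MAX_DURATION_V0
    origin = min(first_wallclock.values())   # mapped to movie time zero
    edit_lists = {}
    for track, wallclock in first_wallclock.items():
        delay = round((wallclock - origin) * movie_timescale)
        if delay == 0:
            continue  # a track starting at movie time zero needs no shift
        edit_lists[track] = [
            (delay, EMPTY_EDIT, 1),                             # empty edit
            (max_duration, first_composition_time[track], 1),   # whole track
        ]
    return edit_lists

# Hypothetical recording: video starts 120 ms after audio.
edits = build_edit_lists({"audio": 10.000, "video": 10.120},
                         {"audio": 0, "video": 0},
                         movie_timescale=1000)
```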

[Figure H.1 image not reproduced; recoverable labels: 1st audio sample, 1st video sample]

Figure H.1 — An example of an Edit List box to compensate the unequal starting of the received RTP streams, segment_duration is copied to the first entry of the Edit List box

Some recording units may detect packets from which decoding can be started, such as IDR pictures of H.264/AVC streams, which are here referred to as random access points. If a stream contains a packet having the earliest wallclock timestamp among all the received streams and the same stream contains packets preceding, in decoding order, the first random access point of the stream, it is recommended not to store the packets preceding the first random access point of the stream and not to consider them when determining the earliest wallclock timestamp among all the received streams.

H.3.3 Recording of SDP

The SDP should be stored as follows. Session-level SDP, i.e., all lines before the first media-specific line (“m=” line), should be stored as Movie SDP information within the User Data box, as specified in 9.1.4.1. Each media-level section within the SDP description starts with an 'm=' line and continues to the next media-level section or the end of the whole session description. Each media-level section should be stored as Track SDP information within the User Data box of the corresponding RTP reception hint track.
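
The split between session-level and media-level SDP described above can be sketched in Python as follows. The function name split_sdp and the sample session description are hypothetical. Writing the resulting text into the Movie SDP and Track SDP information is outside the scope of the sketch.

```python
def split_sdp(sdp: str):
    """Split an SDP description into session-level lines and media sections."""
    session_lines, media_sections = [], []
    for line in sdp.splitlines():
        if not line.strip():
            continue
        if line.startswith("m="):
            media_sections.append([line])        # a new media-level section
        elif media_sections:
            media_sections[-1].append(line)      # belongs to the current section
        else:
            session_lines.append(line)           # before the first "m=" line
    return session_lines, media_sections

# Sample session description for illustration only.
example_sdp = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.1",
    "s=Example",
    "t=0 0",
    "m=audio 49170 RTP/AVP 97",
    "a=rtpmap:97 mpeg4-generic/44100",
    "m=video 51372 RTP/AVP 96",
    "a=rtpmap:96 H264/90000",
])
session, media = split_sdp(example_sdp)
```

Here session would go to the Movie SDP information, and each entry of media to the Track SDP information of the matching RTP reception hint track.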

H.3.4 Creation of a sample within an RTP reception hint track

It is recommended that each sample represents all received RTP packets that have the same RTP timestamp, i.e., consecutive packets in RTP sequence number order with a common RTP timestamp. The RTPsample structure is set to contain one RTPpacket structure per each received RTP packet having the same RTP timestamp. Each RTPpacket is recommended to contain one packet constructor of type 2 (RTPsampleconstructor). An RTPsampleconstructor copies a particular byte range, indicated by the sampleoffset and length fields of the constructor, of a particular sample, indicated by the samplenumber field of the constructor, by reference into the packet payload being constructed. The payload of each received RTP packet having the same RTP timestamp is copied to the extradata section of the sample. The track reference of each constructor is set to point to the hint track itself, i.e., is set equal to -1, and sampleoffset and length are set to match to the location and size of the packet payload within the sample.
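
The grouping rule above can be sketched in Python as follows. Packets are given as (sequence number, RTP timestamp, payload) tuples that are already in RTP sequence number order, and each run of packets sharing a timestamp becomes one sample. The dictionary layout and the function name are hypothetical simplifications. In particular, sampleoffset is computed here relative to the start of the payload area rather than the start of the stored sample, and the samplenumber field is omitted.

```python
from itertools import groupby

def packets_to_samples(packets):
    """Group (sequence_number, rtp_timestamp, payload) tuples into samples."""
    samples = []
    # groupby only merges consecutive packets with a common RTP timestamp.
    for timestamp, group in groupby(packets, key=lambda p: p[1]):
        extradata = b""
        constructors = []
        for _seq, _ts, payload in group:
            constructors.append({
                "type": 2,                        # RTPsampleconstructor
                "trackref": -1,                   # refers to the hint track itself
                "sampleoffset": len(extradata),   # simplified: payload-area offset
                "length": len(payload),
            })
            extradata += payload
        samples.append({"rtp_timestamp": timestamp,
                        "constructors": constructors,
                        "extradata": extradata})
    return samples

received = [(1, 1000, b"aa"), (2, 1000, b"bbb"), (3, 4000, b"c")]
hint_samples = packets_to_samples(received)
```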

Figure H.2 presents a pseudo-code example of an RTP reception hint sample, which contains two RTP packets.

Figure H.2 — An example of a RTP reception hint sample containing two packets (their header and payload)

The use of an error occurrence indexing event to indicate an RTP packet loss is not recommended, because the RTPsequenceseed field can be used for detecting packet losses without any increase in the storage space. Furthermore, the minimum unit the error occurrence event can refer to is a sample (in an RTP reception hint track). Since a sample can contain many packets, it is ambiguous which ones of these packets the error occurrence indexing event concerns.

H.3.5 Representation of RTP timestamps

RTP timestamps are represented in an RTP reception hint track by a sum of three values, one of which is the decoding time DT in the media timeline of the track. The decoding time is run-length coded into the Decoding Time to Sample box and additionally to one or more Track Fragment Run boxes, if a sample resides in a movie fragment. The Decoding Time to Sample box includes a number of sample_count and sample_delta pairs, where sample_delta is the decoding time increment (i.e., the sample duration in terms of decoding time) for each sample in a set of consecutive samples, the number of which equals sample_count. The Track Fragment Run box indicates one pair of sample_count and sample_duration, where sample_duration is the decoding time increment (i.e., the sample duration) for each sample in a set of consecutive samples, the number of which equals sample_count. Each Track Fragment box can contain a number of Track Fragment Run boxes. The decoding time DT(i) for sample number i is derived by summing up the sample durations of all the samples preceding sample i from the Decoding Time to Sample box and, if needed, the Track Fragment Run boxes referring to any sample preceding sample i.

The RTP timestamp for sample i, RTPTS(i), is represented by a sum of three values specified as follows:

RTPTS(i) = (DT(i) + tsro.offset + offset) mod 2^32 (H.1)

where tsro.offset is the value of offset in the 'tsro' box of the referred reception hint sample entry and offset is the value included in the rtpoffsetTLV box in the RTPpacket structure, and mod is the modulo operation.
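As an illustration only, Formula (H.1) can be written as a short Python function. The function and parameter names are hypothetical; the modulus is 2 to the power of 32, because RTP timestamps are 32-bit unsigned values.

```python
def rtp_timestamp(dt, tsro_offset, packet_offset=0):
    """Formula (H.1): RTPTS(i) = (DT(i) + tsro.offset + offset) mod 2^32.

    dt: decoding time DT(i) of the sample.
    tsro_offset: offset from the 'tsro' box of the sample entry.
    packet_offset: offset from the rtpoffsetTLV box (may be negative);
        0 is used when the box is absent or its offset is 0.
    """
    # Python's % always yields a non-negative result for a positive
    # modulus, so negative intermediate sums wrap around correctly.
    return (dt + tsro_offset + packet_offset) % (1 << 32)
```

With dt = 3000, tsro_offset = 0xFFFFFF00, and packet_offset = -100, the sum exceeds the 32-bit range and wraps to 2644.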

A 'tsro' box should be present in RTP reception hint sample entries. The value of offset in any 'tsro' box of a track should be equal to the RTP timestamp of the first packet of the respective stream in RTP sequence number order.

Provided that no wrap-around of the RTP timestamp values over the maximum 32-bit unsigned integer happened between sample i-1 and i, the difference between consecutive unequal RTP timestamps, in RTP sequence number order, is

RTPTS_DIFF(i) = RTPTS(i) – RTPTS(i – 1) for any i > 1 (H.2)

RTPTS_DIFF(i) remains unchanged when the frame rate is constant, the number of frames in any packet is constant, and the transmission order is the same as the presentation order. These constraints are typically met by audio streams and temporally non-scalable video streams. If RTPTS_DIFF(i) is a constant denoted as RTPTS_DIFF, the following is recommended. The value of sample_delta in the Decoding Time to Sample box and, if movie fragments are used, the value of sample_duration in the Track Fragment Run box or boxes are set to RTPTS_DIFF, which results in compact Decoding Time to Sample and Track Fragment Run boxes. The rtpoffsetTLV box should not be used within the RTP reception hint samples, if RTCP reception hint tracks are used (see H.3.6). Otherwise (if RTCP reception hint tracks are not used), offset in the rtpoffsetTLV box should be set to 0.
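A recording unit could test for a constant RTPTS_DIFF along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, the input is the list of RTP timestamps in RTP sequence number order, and differences are taken modulo 2^32 so that a single wrap-around does not break the check (Formula (H.2) itself assumes no wrap-around).

```python
def constant_rtpts_diff(rtp_timestamps):
    """Return RTPTS_DIFF if consecutive unequal RTP timestamps always
    differ by the same amount, otherwise None."""
    # Several packets of one frame share a timestamp; keep one of each run.
    unequal = [ts for i, ts in enumerate(rtp_timestamps)
               if i == 0 or ts != rtp_timestamps[i - 1]]
    diffs = {(b - a) % (1 << 32) for a, b in zip(unequal, unequal[1:])}
    return diffs.pop() if len(diffs) == 1 else None
```

When a value is returned, it can be used directly as sample_delta (and sample_duration); when None is returned, one of the other strategies described in this subclause applies.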

When temporal scalability is used in a video stream, the transmission order and the playback order of packets are not identical, RTP timestamps do not increase as a function of RTP sequence number, and RTPTS_DIFF(i) is not constant. However, RTP timestamps typically have a constant behaviour in periods determined by the GOP_size, which is one plus the number of pictures between two consecutive pictures in the lowest temporal level in RTP sequence number order. For example, if two non-reference pictures are coded for each pair of reference pictures as illustrated in Figure H.3, GOP_size is equal to 3. Figure H.4 presents an example of a hierarchically temporally scalable bitstream with GOP_size equal to 4.

Figure H.3 — An example of a temporally scalable bitstream with GOP_size equal to 3

(RTP sequence numbers (SN) are normalized to start from 0, and one packet per frame is assumed. RTP timestamps (TS) are normalized to start from 0 and indicated as clock ticks lasting one frame interval. Inter prediction arrows are indicated for the first GOP only, while pictures in other GOPs are predicted similarly.)

Figure H.4 — An example of a hierarchically temporally scalable bitstream with GOP_size equal to 4

(RTP sequence numbers (SN) are normalized to start from 0, and one packet per frame is assumed. RTP timestamps (TS) are normalized to start from 0 and indicated as clock ticks lasting one frame interval.)

The RTP timestamp increment caused by one GOP is derived as follows, when no wrap-around of the RTP timestamp values over the maximum 32-bit unsigned integer happened between sample i and i + GOP_size, inclusive:

RTPTS_GOP_DIFF(i) = RTPTS(i + GOP_size) – RTPTS(i) (H.3)

If RTPTS_GOP_DIFF(i) is a constant equal to RTPTS_GOP_DIFF, when no sample i, i+1, …, i+GOP_size is a picture starting a so-called closed group of pictures, such as an IDR picture of H.264/AVC streams, the following is recommended. The value of sample_delta in the Decoding Time to Sample box and, if movie fragments are used, the value of sample_duration in the Track Fragment Run box or boxes are set to RTPTS_GOP_DIFF / GOP_size. The rtpoffsetTLV box should not be used for pictures in the lowest temporal level, if RTCP reception hint tracks are used (see H.3.6). Otherwise (if RTCP reception hint tracks are not used), offset in the rtpoffsetTLV box should be set to 0. The value of offset in the rtpoffsetTLV box should be set for pictures in other temporal levels such that Formula (H.1) is fulfilled. Figure H.5 indicates how the decoding time and offset are set for the hierarchically temporally scalable video bitstream presented in Figure H.4.

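(Figure: IDR, P, and B pictures drawn at temporal levels 0, 1, and 2, each annotated with its DT, its RTP TS (× clock tick of one frame interval), and its offset. Listed in order of increasing DT, the annotated values are:

DT: 0 1 2 3 4 5 6 7 8 ...

RTP TS: 0 4 2 1 3 8 6 5 7 ...

offset: 0 3 0 -2 -1 3 0 -2 -1 ...)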

Figure H.5 — An example of setting the decoding time (DT) and the value of offset in the rtpoffsetTLV box of a hierarchically temporally scalable bitstream with GOP_size equal to 4.

(In this example, the decoding time increment between samples is set equal to RTPTS_GOP_DIFF / GOP_size to have a compact encoding of decoding times. The value of offset in the rtpoffsetTLV box is adjusted for each sample to store a representation of the RTP timestamp. For this illustration, RTP timestamps and decoding times are normalized to start from 0 and indicated as clock ticks lasting one frame interval.)
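The way decoding times and offsets are chosen for a temporally scalable stream can be sketched in Python. The sketch assumes normalized RTP timestamps (so that tsro.offset is 0) given in RTP sequence number order, and a hypothetical parameter i naming a sample for which no sample in i … i + GOP_size starts a closed group of pictures. The transmission order used in the usage note (0, 4, 2, 1, 3, 8, 6, 5, 7) is likewise an assumption, chosen to be typical of a hierarchical structure with GOP_size equal to 4.

```python
def gop_timing(rtp_ts, gop_size, i=1):
    """Derive sample_delta, decoding times, and rtpoffsetTLV offsets.

    rtp_ts: normalized RTP timestamps in RTP sequence number order.
    gop_size: GOP_size as defined in this subclause.
    i: index of a sample not affected by a closed-GOP start (assumption).
    """
    # Formula (H.3): RTPTS_GOP_DIFF(i) = RTPTS(i + GOP_size) - RTPTS(i)
    gop_diff = rtp_ts[i + gop_size] - rtp_ts[i]
    delta = gop_diff // gop_size  # value for sample_delta / sample_duration
    dts = [n * delta for n in range(len(rtp_ts))]
    # With tsro.offset equal to 0, Formula (H.1) gives
    # offset = RTPTS(i) - DT(i) for every sample.
    offsets = [ts - dt for ts, dt in zip(rtp_ts, dts)]
    return delta, dts, offsets
```

Calling gop_timing([0, 4, 2, 1, 3, 8, 6, 5, 7], 4) yields a delta of 1, decoding times 0 to 8, and offsets 0, 3, 0, -2, -1, 3, 0, -2, -1. Pictures of the lowest temporal level after the first one receive the same non-zero offset, while all decoding times remain equally spaced.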

If no linear and periodical behaviour of RTP timestamps is detected from the received packets, and no two received packets of different samples have the same reception time, it is recommended to set the value of sample_delta in the Decoding Time to Sample box and, if movie fragments are used, the value of sample_duration in the Track Fragment Run box or boxes to represent the reception time of the first packet of the sample. That is, the derived decoding time DT(i) should be equal to the reception time of the first packet of the sample minus the reception time of the first packet of the first received sample of the stream.
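A minimal sketch of this fallback follows, assuming the reception time of the first packet of each sample is available in units of the track timescale. The function name is hypothetical.

```python
def timing_from_reception(first_packet_times):
    """Derive decoding times and sample deltas from reception times.

    first_packet_times: reception time of the first packet of each
        sample, in sample order (assumed strictly increasing).
    """
    base = first_packet_times[0]
    # DT(i) = reception time of sample i minus that of the first sample.
    dts = [t - base for t in first_packet_times]
    # The delta of sample i is DT(i+1) - DT(i). The last sample has no
    # successor, so its duration is left for the caller to choose.
    deltas = [b - a for a, b in zip(dts, dts[1:])]
    return dts, deltas
```

Because the deltas generally differ from sample to sample, the resulting boxes are less compact than in the constant-difference cases above.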

It is noted that composition timestamps are not explicitly indicated in the file for samples in any hint tracks. Consequently, for RTP reception hint tracks, the composition timestamps are inferred from the information related to the RTP timestamps indicated in the stored packet stream. For an RTP reception hint track that is not associated with an RTCP reception hint track, the composition time of a received RTP packet is inferred to be the sum of the sample time DT(i) and the value of the offset field in the rtpoffsetTLV box including the sample. For an RTP reception hint track that is associated with an RTCP reception hint track, the composition time is inferred as follows. Let the received RTP packet having the earliest RTP timestamp within the same track have composition time equal to 0. Any remaining RTP packet has a composition time equal to the RTP timestamp difference of the present RTP packet and the earliest RTP packet in presentation order with clock drift correction similar to H.3.6.3. The composition time refers to the media timeline of the track.

H.3.6 Recording operations to facilitate inter-stream synchronization in playback

H.3.6.1 General

Lip synchronization, i.e., correct synchronization between recorded RTP streams, during playback can be facilitated at least with the following two means:

1. An RTCP reception hint track is generated for each RTP reception hint track. The potential clock drift between the RTP timestamp clocks of different streams is corrected at the time when the file is parsed and the media streams included in the file are decoded and played. The clock drift correction is done in the same way as it would be done for RTP streams that are received and played simultaneously. This mode of operation is straightforward for the recording units. However, accessing a file from an exact playback position might be more cumbersome, because it requires compensation of the clock drift of all the recorded streams at the time of the access.

2. The potential clock drift between recorded RTP streams is corrected by modifying the RTP timestamps of one or more recorded streams. This mode of operation requires processing of RTCP Sender Reports at the time of recording and is hence more tedious for the recording units than creation of RTCP reception hint tracks. However, the operation of the player is straightforward.

Recording units should use the timestamp synchrony box [9.4.1.2] to indicate which lip synchronization approach has been used. The timestamp synchrony box includes the timestamp_sync field. timestamp_sync equal to 1 indicates that players should use RTCP reception hint tracks for lip synchronization. timestamp_sync equal to 2 indicates that players should use composition timestamps for lip synchronization.

Some implementations may create RTCP reception hint tracks first during the real-time recording operation and then compensate the clock drift by modifying RTP timestamps as an off-line post-processing step.

The following clauses provide more details about both approaches.

H.3.6.2 Facilitating lip synchronization based on RTCP Sender Reports

A recording unit stores all RTCP Sender Reports for a particular RTP stream as samples in the respective RTCP reception hint track.

H.3.6.3 Compensating clock drift in timestamps

It is not recommended to modify the RTP timestamps of the recorded audio streams. Such a modification would cause an audio timescale modification in the player, which is a non-trivial operation.

The recorded representation of the RTP timestamps of the video and other non-audio streams should be modified using the following procedure.

1. First, the wallclock timestamp a of a video frame is derived from the RTP timestamp corresponding to the video frame as a sum of the wallclock timestamp of the previous video frame and the difference of the RTP timestamps of the current and previous video frames in the units of the wallclock timeline.

2. Second, the playback time b for the video frame on the wallclock timeline is derived based on the RTCP Sender Reports. If no RTCP Sender Report that exactly indicates the wallclock time for the video frame is available, the wallclock time can be extrapolated assuming that the rate at which the RTP timestamp clock and the sender wallclock in RTCP Sender Reports deviate stays unchanged.

3. Third, based on the RTCP Sender Reports for audio, the audio RTP timestamp that is played simultaneously with the video frame at time b of the wallclock timeline is derived. There need not be an audio frame having exactly the derived audio RTP timestamp. The wallclock timestamp c of an audio sample is calculated from the derived audio RTP timestamp as a sum of the wallclock timestamp of the preceding audio frame and the difference between the derived audio RTP timestamp and the RTP timestamp of the preceding audio frame.

The difference between a and c, if any, should be compensated in the fields that represent the video RTP timestamp in the file. In practice, the easiest way might be to add the difference to the offset field in the rtpoffsetTLV box, which is illustrated in Figure H.6. The other option, rewriting the Decoding Time to Sample box and the Track Fragment Run boxes (if any), might be more cumbersome to implement, because of the particular way of coding the sample times by a combination of sample counts and durations, and might require more storage space too.

Figure H.6 — An example of correcting the lip synchronization in the RTP timestamp representation
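The extrapolation mentioned in step 2 of the procedure in H.3.6.3 can be sketched as follows. This is only an illustration under stated assumptions: each RTCP Sender Report is reduced to a hypothetical (wallclock seconds, RTP timestamp) pair, no RTP timestamp wrap-around occurs between the values involved, and the observed ratio between the two clocks is taken to stay unchanged.

```python
def wallclock_for_rtp(rtp_ts, sr_prev, sr_last):
    """Extrapolate the sender wallclock time (seconds) for rtp_ts.

    sr_prev, sr_last: (wallclock_seconds, rtp_timestamp) pairs taken
        from two RTCP Sender Reports of the same stream.
    """
    (w0, r0), (w1, r1) = sr_prev, sr_last
    # Observed wallclock seconds per RTP tick between the two reports;
    # this includes any deviation from the nominal RTP clock rate.
    seconds_per_tick = (w1 - w0) / (r1 - r0)
    return w1 + (rtp_ts - r1) * seconds_per_tick
```

The same mapping, applied to the audio Sender Reports and inverted, gives the audio RTP timestamp that corresponds to wallclock time b in step 3.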

H.3.7 Representation of reception times

As specified in 9.4.1.4, the reception time of a packet is indicated by the sum of the decoding time of the sample containing the packet and the value of relative_time of the RTPpacket structure of the packet.

The reception time of the earliest received RTP packet should be zero, and the reception times of all subsequent packets should be relative to the reception time of the earliest received RTP packet.

The clock source for the reception time is undefined and may be, for instance, the wallclock of the receiver. If the range of reception times of a reception hint track overlaps entirely or partly with the range of reception times of another reception hint track, the clock sources for these hint tracks shall be the same.

The reception time of a packet should correspond to the time instant when the protocol stack layer underneath RTP, typically UDP, outputs the packet.

H.3.8 Creation of media samples

Media samples are created from the received RTP packets as instructed by the relevant RTP payload specification and RTP itself. However, most media coding standards only specify the decoding of error-free streams and consequently it should be ensured that the content in media tracks can be correctly decoded by any standard-compliant media decoder. Handling of transmission errors therefore requires two steps: detection of transmission errors and inference of samples that can be decoded correctly. These steps are described in the subsequent paragraphs.

Lost RTP packets can be detected from a gap in RTP sequence number values. RTP packets containing bit errors are usually not forwarded to the application as their UDP checksum fails and packets are discarded in the protocol stack of the receiver. Consequently, bit-erroneous packets are usually treated as packet losses in the receiver.
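Gap detection can be sketched as below. The sketch assumes that the sequence numbers are supplied in transmission order without reordering, and it relies on RTP sequence numbers being 16-bit values that wrap around. The function name is hypothetical.

```python
def lost_sequence_numbers(received):
    """List the RTP sequence numbers missing from an in-order stream.

    received: 16-bit RTP sequence numbers in the order of transmission
        (reordered packets are assumed to have been sorted beforehand).
    """
    lost = []
    for prev, cur in zip(received, received[1:]):
        # Modulo arithmetic handles the wrap from 65535 to 0.
        gap = (cur - prev) % 65536
        lost.extend((prev + k) % 65536 for k in range(1, gap))
    return lost
```

For the sequence 65533, 65534, 1, 2 the function reports 65535 and 0 as lost. A real receiver would additionally bound the gap size to tell a long loss burst from a late, reordered packet.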

The inference of media samples that can be correctly decoded depends on the media coding format and is therefore not described here in detail. Generally, inter-sample prediction is weak or non-existent in audio coding formats, whereas most video coding formats utilize inter prediction heavily. Consequently, a lost sample in many audio formats can often be replaced by a silent or error-concealed audio sample. It should be analyzed whether a loss of a video packet concerned a non-reference picture or a reference picture, or, more generally, in which level of the temporal scalability hierarchy the loss occurred. It should then be concluded which pictures may not be correctly decodable. For example, a loss of a non-reference picture does not affect the decoding of any other pictures, whereas a loss of a reference picture in the base temporal level typically affects all pictures until the next picture for random access, such as an IDR picture in H.264/AVC. Video tracks must not contain any samples dependent on any lost video sample.

H.3.9 Creation of hint samples referring to media samples

Media samples are created from the received RTP packets as explained in H.3.8. RTP reception hint tracks are created as explained in H.3.4, but the contents of the RTPpacket structure depend on the existence of the corresponding media sample as follows.

If the packet payload of the received RTP packet is represented in a media track, the track reference of the relevant packet constructors is set to point to the media track, and the constructors include the packet payload by reference. It is not recommended to have a copy of the packet payload in the extra data section of the received RTP sample, in order to save storage space and make file editing operations easier to implement.

If the packet payload of the received RTP packet is not represented in a media track, the instance of the RTPpacket structure is created as explained in H.3.4.

H.4 Playing of recorded RTP streams

H.4.1 Introduction

This Clause describes operations required for playback of a file containing recorded RTP streams. It is organized as follows:

- Before RTP streams can be played, the contents of the files should be analyzed. Particularly, alternative tracks representing the same media stream should be identified and one of these tracks should be selected for decoding and playback. The coding format should be detected in order to conclude upfront that it can be decoded by the player. These preparation operations are described in more detail in H.4.2.

- If an RTP reception hint track is being processed, there are a few things to be taken into account as described in H.4.3. For example, packet losses should be detected and handled appropriately.

- The synchronization of the decoded media samples should be handled properly as described in H.4.4.

- If the RTP streams stored in a file are accessed from a position other than the beginning of the streams, proper inter-stream synchronization and decoder initialization are needed as described in H.4.5.

H.4.2 Preparation for the playback

In the preparation phase for playback, the player selects which tracks are played. The basic track structure of the file is parsed first. The tracks are grouped according to which alternate group they belong to. Tracks that belong to the same alternate group are indicated by the same value of alternate_group in the track header box. One track from each alternate group is selected for playback as follows.

If there is an RTP reception hint track in the alternate group, it is preferred for playback, because it contains an entire representation of the received RTP stream, unlike media tracks derived from the received RTP streams, which might use only such a subset of the received RTP packets that can be decoded by any standard-compliant decoder without capability for handling packet losses.

The compatibility of the player with the selected track should be ensured. For example, it should be examined whether the codec, the profile, and the level used in the track are such that the player is able to support them.

The codec, profile, and level used for the coded bitstream in an RTP reception hint track can be concluded from the SDP description of the RTP stream. The SDP descriptions are stored in the movie-level index track. If SDP is unchanged throughout the file, it may be additionally stored as Movie SDP information and Track SDP information within User Data boxes. If Track SDP information is present, it may be parsed to find out the codec, profile, and level used for the bitstream contained in the RTP reception hint track. If Movie SDP information or Track SDP information is not present, the movie-level index track is traversed to find and parse each SDP index and, consequently, the codec, profile, and level used for the bitstream contained in the RTP reception hint track.

If no RTP reception hint track exists in an alternate group, the sample entry or sample entries of the media tracks in the alternate group should be examined to find out which ones of them the player is able to support.
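The selection logic of this subclause can be sketched in Python. The track records and their keys ('id', 'alternate_group', 'is_rtp_reception_hint', 'codec') are hypothetical stand-ins for information a player would extract from the track header box, the sample entries, and the SDP descriptions. The sketch additionally treats alternate_group equal to 0 as "not a member of any alternate group", following the semantics of the track header box.

```python
from collections import defaultdict


def select_tracks(tracks, supported_codecs):
    """Pick one playable track per alternate group, preferring an RTP
    reception hint track when the player supports its codec."""
    groups = defaultdict(list)
    for track in tracks:
        group = track["alternate_group"]
        # alternate_group 0 carries no grouping information, so such a
        # track forms a group of its own.
        key = ("group", group) if group != 0 else ("single", track["id"])
        groups[key].append(track)

    selected = []
    for members in groups.values():
        playable = [t for t in members if t["codec"] in supported_codecs]
        # False sorts before True, so reception hint tracks come first;
        # the sort is stable, preserving file order otherwise.
        playable.sort(key=lambda t: not t["is_rtp_reception_hint"])
        if playable:
            selected.append(playable[0]["id"])
    return selected
```

A real player would compare profile and level as well as the codec, and might fall back to a media track when the hint track cannot be processed.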

H.4.3 Decoding of a sample within an RTP reception hint track

The original RTP packets may be reconstructed from an RTP reception hint sample by creating the RTP packet header from the RTPpacket structures and by resolving the constructors of the RTPpacket structures. Hence, one approach for file players to process RTP reception hint tracks is to re-create the packet stream that was received and process the re-created packet stream as if it was newly received.

The relative_time field included in the RTPpacket structure may be used to schedule the insertion of the packet into the buffer for the RTP receiver. However, it may be more advisable to modify the decoding process of recorded RTP streams in such a manner that the decoder output buffers are kept as full as possible in order to avoid interruptions or jerky playback caused by late packets or occasional problems in real-time decoding in systems running other processes in addition to the player.

Packet losses should be detected from gaps in the RTP sequence number. The reaction to packet losses depends on the particular media decoder implementation and may also depend on user preferences.

H.4.4 Lip synchronization

The following steps are required for achieving correct synchronization between streams:

1. Inter-track synchronization at the start of the playback.

The starting position of the media timeline of a track may be shifted in the movie timeline of the file as described in the following two paragraphs.

For a media track and an RTP reception hint track that is not associated with an RTCP reception hint track, an Edit List box should be used to shift the starting position of the media timeline within the movie timeline as described in H.3.2. The media timelines of the tracks selected for playback are mapped to the movie timeline by parsing the Edit List boxes of the tracks, if present. The playback of each media track and each RTP reception hint track that is not associated with an RTCP reception hint track starts at the movie timeline position indicated in the Edit List box of the track or from the beginning of the movie timeline, if no Edit List box exists for the track.

For RTP reception hint tracks that are associated with respective RTCP reception hint tracks, the shifting of the starting position of the media timeline within the movie timeline is inferred as follows. The media timeline of the RTP reception hint track containing the earliest RTP packet (in presentation time on the sender wallclock timeline) among all RTP reception hint tracks is not shifted within the movie timeline (i.e., starts at time 0 on the movie timeline). The starting time of the media timeline of any other RTP reception hint track is equal to the timestamp difference of the earliest RTP packets of the present track and the track containing the earliest RTP packet among all RTP reception hint tracks.

2. Reconstruction of RTP timestamps and composition times on the media timeline (H.3.5).

3. Correction of RTP timestamps and composition times based on RTCP Sender Reports, if RTCP reception hint tracks are used.

The correction is done similarly to what is described in H.3.6.3. However, instead of adding the difference between times a and c into the representation of the RTP timestamps in the file, the difference is added during the playback to the presentation times of the video frames on the movie timeline.

4. Pacing the output of the decoded media samples.

It is recommended to play a recorded program at the pace of the wallclock of the player and to use the audio playout clock as the wallclock of the player. The audio playback is arranged to be continuous at the native sampling frequency of the audio signal. A presentation clock of the player runs at the pace of the audio playback, i.e., its value is always equal to (the number of the most recent uncompressed audio sample that was played out) × (sampling frequency of the audio signal). The playback of the video track (and potential other continuous media tracks) is synchronized to the presentation clock of the player. In other words, when the presentation clock of the player meets the composition time of a video sample on the movie timeline, the video sample is played out.

Only if a file is being simultaneously recorded and played back and the receiver wallclock runs faster than the sender wallclock, pacing the playback according to the rate of the receiver wallclock might not be recommended, and synchronizing the rate of the receiver wallclock to the rate of the sender wallclock may be done as follows.

The pace of the sender clock is recovered by creating a relationship between the reception times (according to the receiver clock) and the respective wallclock timestamps of the sender, which are reconstructed from RTCP Sender Reports. It is recommended to use the audio playout clock as the receiver clock. As the delay in the network and in the receiver may be varying, the relation between the reception times and the respective timestamps of the sender should be averaged over a large number of received packets. A timescale multiplication factor is concluded as a result of the averaging of the relation between the reception times and the respective timestamps of the sender.

A presentation time on a timeline of the receiver clock is derived for each sample. If RTCP reception hint tracks are in use, the presentation time is the composition time of the sample on the movie timeline, also including clock drift correction as described in step 3 above. If RTCP reception hint tracks are not in use, the presentation time is directly the composition time of the sample on the movie timeline. Then, for playback purposes only, the presentation times of the samples in all tracks being played should be multiplied by the timescale multiplication factor.

Time stretching of the signal should be done accordingly. Samples are played out at their presentation times.

In practice, the timescale multiplication factor and the mapping from the RTP timeline to the wallclock of the sender (step 3 above) may be implemented as a single operation.
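The text does not prescribe a particular averaging method for the timescale multiplication factor described in step 4. One possibility, shown here purely as an assumption, is a least-squares fit of reception times against sender wallclock timestamps; the slope of the fitted line then states how many receiver clock units elapse per sender clock unit. The function name is hypothetical.

```python
def timescale_factor(sender_times, reception_times):
    """Estimate receiver-clock units per sender-clock unit.

    sender_times: sender wallclock timestamps reconstructed from RTCP
        Sender Reports, one per observed packet.
    reception_times: reception times of the same packets according to
        the receiver clock.
    """
    n = len(sender_times)
    mean_s = sum(sender_times) / n
    mean_r = sum(reception_times) / n
    # Least-squares slope; averaging over many packets smooths out the
    # varying network and receiver delay.
    covariance = sum((s - mean_s) * (r - mean_r)
                     for s, r in zip(sender_times, reception_times))
    variance = sum((s - mean_s) ** 2 for s in sender_times)
    return covariance / variance
```

Multiplying each presentation time by the returned factor places it on the receiver clock timeline. A factor slightly above 1 means that the receiver clock runs faster than the sender clock.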

H.4.5 Random access

Random access refers to a non-linear access to the media streams represented in the file. In other words, in a random access operation the file is accessed from another sample than that which was previously played or the file is initially accessed from a position that is not the beginning of the movie timeline.

It is recommended to provide the random access functionality to the user relative to the movie timeline of the file rather than any other timelines, such as the sender wallclock timeline. By using the movie timeline as the basis, the number of steps for a random access operation is kept low.

First, it is derived which media frames are at a desired random access position (or closest to it, if there are none exactly at the desired random access position). In the case of media tracks, RTP reception hint tracks for audio, and any RTP reception hint tracks having the timestamp_sync field equal to 2 (indicating pre-compensated lip synchronization), the media frame closest to the desired random access position can be directly derived based on the composition timestamps (on the media timeline) shifted by the initial starting position indicated in the Edit List box, if any. In the case of non-audio RTP reception hint tracks having the timestamp_sync field equal to 1 (indicating the use of RTCP reception hint tracks), the presentation times of samples should be derived as described in H.4.4, until the closest presentation time to the desired random access position is found.

Second, decoding of many media bitstreams can be started only from frames of a particular type, such as an IDR picture of H.264/AVC. Player implementations may therefore have different approaches, including the following:

1. Discover the closest frame at or preceding the desired random access position from which decoding can be started, start decoding from that frame, and start rendering only from the desired random access point. This approach may imply some processing delay before the rendering is started.

2. Start decoding and rendering at or after the desired random access point using the earliest frame from which decoding can be started. Typically, audio playback would start earlier than video playback, but the processing delay before the rendering is started is smaller than in the previous option.
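Both approaches amount to a search over the presentation times of the frames from which decoding can start. A minimal sketch, assuming these times are available as a sorted list on the movie timeline (the function name and input form are hypothetical):

```python
from bisect import bisect_left, bisect_right


def decode_start_candidates(sync_times, target):
    """Return (preceding, following) decoding start points for target.

    sync_times: sorted presentation times of frames from which decoding
        can be started (e.g. IDR pictures), on the movie timeline.
    target: desired random access position on the movie timeline.
    """
    # Approach 1: closest start point at or before the target.
    k = bisect_right(sync_times, target)
    preceding = sync_times[k - 1] if k > 0 else None
    # Approach 2: earliest start point at or after the target.
    j = bisect_left(sync_times, target)
    following = sync_times[j] if j < len(sync_times) else None
    return preceding, following
```

With start points at 0, 48, and 96 and a target of 60, approach 1 decodes from 48 and renders from 60, whereas approach 2 starts both decoding and rendering at 96.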

H.5 Re-sending recorded RTP streams

H.5.1 Introduction

It may be a desirable operation to re-send the RTP streams that have been recorded earlier to a file. For example, if RTP streams are received through a broadcast or streaming service and recorded into a file, it may be desirable to re-send them from one device to another device in a home environment using a WLAN connection. This Clause provides recommendations for re-sending of recorded RTP streams.

A communication system based on RTP includes a source endpoint (a.k.a. a sender) and a destination endpoint (a.k.a. a receiver) and may contain one or more mixers and translators. The sender and the receiver are the endpoints of the RTP and RTCP sessions. The behaviour of RTP translators and mixers is specified in RFC 3550 and clarified in RFC 5117. In general, the recording unit receiving RTP streams and storing them into a file acts as a destination endpoint, and a re-sending unit reading stored RTP streams from a file and sending them acts as a source. Typically, the payloads of the re-sent RTP stream are not modified, which makes a combination of a recording unit and a re-sending unit act similarly to a transport translator as described in RFC 5117. However, the essential characteristic of a translator is that receivers cannot detect its presence. Consequently, a combination of a recording unit and a re-sending unit cannot act as a transport translator, unless re-sending happens simultaneously with the recording of the original streams. As this case is considered rare, the discussion in this Clause regards a recording unit as a destination terminating the original RTP and RTCP sessions and a re-sending unit as a source of new RTP and RTCP sessions.

This Clause is organized as follows:

- H.5.2 includes recommendations on how to compose RTP packets from RTP reception hint tracks and how to schedule the transmission of the RTP packets.

- H.5.3 discusses how RTCP packets should be generated and how received RTCP packets should be processed.

H.5.2 Re-sending RTP packets

The packets are recommended to be constructed and transmitted as follows.

The packet payloads are recommended to be constructed according to the constructors stored in the reception hint track, i.e., the packet payloads are recommended to be identical to those received, unless a different packet size is crucial for the network to which the packets are re-sent.

- The values of the header fields for the RTP packets created as suggested by an RTP reception hint track should be kept the same as in the respective RTPpacket structure except for the following cases:

- The initial RTP timestamp offset and the RTP sequence number offset should be selected randomly regardless of the values stored in the offset field of the 'tsro' box of the referred reception hint sample entry or the values of the RTPsequenceseed field of the RTPpacket structure of any of the packets of the respective RTP reception hint track.

- The value of the RTP timestamp field should be a sum of the random initial offset, the value of offset in the RTPpacket structure, and the decoding time of the respective RTP sample. If the sum exceeds the maximum unsigned 32-bit integer, it should be wrapped over.

- The relative increments of the RTP sequence number should be the same as those recorded in the values of the RTPsequenceseed fields. Consequently, if there was a packet loss in the stream that was recorded, the stream that is re-sent also has a respective gap in the RTP sequence number, and the receiver is able to deduce a packet loss.

- The value of the CSRC count field should always be zero, because no contributing sources of the previous RTP session that was recorded are actively modifying the streams for the RTP session for the stream being re-sent. The source identifier space (for both SSRC and CSRC) is session specific. Consequently, the CSRC list of the RTP header should be empty regardless of the potentially stored CSRC values for the received streams, which are included in the receivedCSRC TLV box in the RTPpacket structure.

- The value of the payload type field may be dynamically selected depending on the signalling scheme in use.

- The value of the SSRC field should be randomly selected and potential collisions should be handled as specified in RFC 3550. The SSRC value of a received stream may be stored in the ReceivedSsrcBox of the referred reception hint sample entry but it should be ignored when the stream is re-sent.

- The recorded RTP header extensions, stored in rtphdrextTLV in the RTPpacket structure, if any, should be re-sent only if the re-sending unit can verify that they are valid for the re-sent stream. If the re-sending unit is not able to parse the semantics of the recorded RTP header extensions, they should not be re-sent.
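The header field rules listed above can be summarized in a short Python sketch. The class, method, and dictionary key names are hypothetical, and the sketch covers only the timestamp, sequence number, SSRC, and CSRC count fields; SSRC collision handling, payload type selection, and header extensions are left out.

```python
import random


class ResendHeaderBuilder:
    """Derives header fields for re-sent packets of one recorded stream."""

    def __init__(self, first_recorded_seq, rng=random):
        # Fresh random offsets and SSRC for the new RTP session; the
        # recorded 'tsro' offset and SSRC are deliberately not reused.
        self.ts_offset = rng.getrandbits(32)
        self.seq_offset = rng.getrandbits(16)
        self.ssrc = rng.getrandbits(32)
        self.first_recorded_seq = first_recorded_seq

    def header(self, sample_dt, packet_offset, recorded_seq):
        # Preserve the recorded sequence number increments so that gaps
        # caused by losses during recording remain visible.
        increment = (recorded_seq - self.first_recorded_seq) % 65536
        return {
            # Random initial offset + offset + decoding time, wrapped
            # over at the maximum unsigned 32-bit integer.
            "timestamp": (self.ts_offset + packet_offset + sample_dt) % (1 << 32),
            "sequence_number": (self.seq_offset + increment) % 65536,
            "ssrc": self.ssrc,
            "csrc_count": 0,  # the CSRC list is always empty when re-sending
        }
```

Two packets recorded with sequence numbers 100 and 102 are re-sent with sequence numbers that also differ by 2, so the receiver of the re-sent stream can still deduce that one packet was lost.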

The reception time of a packet, represented by the sum of the decoding time of the RTP reception hint sample containing the packet and the value of the relative_time of the RTPpacket structure, equals the transmission time of the packet with a skew caused by the transmission delay and the processing delay in the protocol stack of the receiver. The skew of adjacent packets might not be equal due to transmission delay jitter and varying processing delay. Moreover, the protocol stack used when receiving the stream might differ from the protocol stack used for re-sending the stream. Due to these reasons, the reception times are often not applicable as such to pace the transmission of the re-sent packets. In all cases, the re-sending unit should verify that the re-sent packet stream complies with the buffering model in use, if any. If the re-sending unit can conclude that the network environments and protocol stacks used when receiving the stream and when re-sending the recorded stream are similar, reception times may be used as a basis for scheduling the packet transmission. The re-sending unit should make an effort to remove or conceal the transmission delay jitter in the recorded stream. If the re-sending unit is unable to conclude that the network environments and protocol stacks used when receiving the stream and when re-sending the recorded stream are similar or is uncertain which kind of packet scheduling is appropriate, it may use the decoding time as the basis for scheduling.

H.5.3 RTCP Processing

RTCP Sender Reports and other RTCP messages are regenerated following the constraints specified in RFC 3550 rather than directly using the RTCP messages recorded in RTCP reception hint tracks, if any.

An RTCP Sender Report contains the wallclock time when the report was sent and the RTP timestamp corresponding to the same time as the indicated wallclock time. The RTP timestamp for an RTCP Sender Report is generated as follows. A presentation time on a timeline of a reference clock is derived for the sample corresponding to the indicated wallclock time in the RTCP Sender Report. The reference clock may be the wallclock of the re-sending unit initialized to 0 at the beginning of the session. The sample corresponding to the indicated wallclock time might not exist in the corresponding RTP reception hint track, because the sampling instants of the samples in the RTP reception hint tracks might not match the transmission instants of the RTCP Sender Reports. However, as instructed by RFC 3550, the RTP timestamp is derived as if there were a sample in the RTP stream corresponding to the indicated wallclock time. The RTP timestamp for an RTCP Sender Report should be linearly interpolated from the RTP timestamps of the samples immediately preceding and following the wallclock time indicated in the RTCP Sender Report. In order to identify the samples immediately preceding and following the wallclock time indicated in the RTCP Sender Report, presentation times on the timeline of the reference clock should be derived until the closest samples are discovered. If RTCP reception hint tracks are present for the RTP reception hint track being re-sent, the presentation time is the composition time of the sample on the movie timeline, also including clock drift correction as described in step 3 of 0. If RTCP reception hint tracks are not present, the presentation time is directly the composition time of the sample on the movie timeline.
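The linear interpolation described above might look like the following sketch. The function name and the `(presentation_time, rtp_timestamp)` tuple layout are assumptions made for illustration. The sketch also assumes that the RTP timestamp increases between the two neighbouring samples and wraps at 32 bits.

```python
RTP_TS_MODULO = 2 ** 32  # RTP timestamps are 32-bit values


def interpolate_rtp_timestamp(preceding, following, report_time):
    """Linearly interpolate an RTP timestamp for an RTCP Sender Report.

    preceding / following: (presentation_time, rtp_timestamp) of the samples
    immediately before and after report_time on the reference clock timeline.
    """
    t0, ts0 = preceding
    t1, ts1 = following
    if not t0 <= report_time <= t1:
        raise ValueError("report_time must lie between the two samples")
    if t1 == t0:
        return ts0 % RTP_TS_MODULO
    # Modular difference handles wrap-around of the 32-bit timestamp.
    delta = (ts1 - ts0) % RTP_TS_MODULO
    fraction = (report_time - t0) / (t1 - t0)
    return (ts0 + round(delta * fraction)) % RTP_TS_MODULO


midpoint = interpolate_rtp_timestamp((1.0, 90000), (2.0, 180000), 1.5)
wrapped = interpolate_rtp_timestamp((0.0, 2 ** 32 - 1000), (1.0, 1000), 0.5)
```

Here `midpoint` lands halfway between the two recorded timestamps, and `wrapped` shows the same computation across a timestamp wrap-around.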

When handling the received RTCP Receiver Reports, it should be noted that the reported cumulative number of packets lost also includes the unsent packets that were never originally received and correspond to the gaps in the RTP sequence number in the RTP reception hint tracks. Any congestion management, retransmission, or other packet loss resilience method should take this into account.
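One way to take the recorded gaps into account is sketched below. It is only an illustration under stated assumptions: the function names are hypothetical, the recorded sequence numbers are assumed to be listed in increasing (16-bit wrapping) order, and the Receiver Report is assumed to cover the whole list.

```python
SEQ_MODULO = 2 ** 16  # RTP sequence numbers are 16-bit values


def count_recording_gaps(recorded_seq_numbers):
    """Count packets missing from the recording, i.e. never received originally."""
    missing = 0
    for prev, cur in zip(recorded_seq_numbers, recorded_seq_numbers[1:]):
        step = (cur - prev) % SEQ_MODULO
        missing += max(step - 1, 0)  # a step of 1 means no gap
    return missing


def loss_on_resend_path(reported_cumulative_lost, recording_gaps):
    """Estimate loss on the re-sending path by discounting the recorded gaps."""
    return max(reported_cumulative_lost - recording_gaps, 0)


gaps = count_recording_gaps([65534, 65535, 1, 2, 5])
adjusted = loss_on_resend_path(10, gaps)
```

In this example sequence numbers 0, 3 and 4 are absent from the recording, so three of the ten reported losses are attributed to the original reception rather than to the re-sent stream.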


Annex I (normative)

Stream Access Points

I.1 Introduction

This Annex defines a Stream Access Point (SAP) and specifies six types of SAPs.

A Stream Access Point (SAP) enables random access into a container of media stream(s). A container may contain more than one media stream, each being an encoded version of continuous media of certain media type. A SAP is a position in a container enabling playback of an identified media stream to be started using only (a) the information contained in the container starting from that position onwards, and (b) possible initialisation data from other part(s) of the container, or externally available. Derived specifications should specify if initialisation data is needed to access the container at a SAP, and how the initialisation data can be accessed.

A SAP for layered media may apply to all the layers, a particular set of layers, or only a single layer in a media stream. When a SAP applies to a set of layers that use inter prediction from a layer that is not a member of the set, there may be an indication if the SAP requires the correct decoding of the reference layer.

When SAPs are used with layered media, derived specifications should specify or provide means to indicate which layers SAPs apply to and whether SAPs require correct decoding of the reference layer.

I.2 SAP properties

I.2.1 General

For each SAP the properties, ISAP, TSAP, ISAU, TDEC, TEPT, and TPTF are identified and defined as:

TSAP is the earliest presentation time of any access unit of the media stream such that all access units of the media stream with presentation time greater than or equal to TSAP can be correctly decoded using data in the Bitstream starting at ISAP and no data before ISAP.

ISAP is the greatest position in the Bitstream such that all access units of the media stream with presentation time greater than or equal to TSAP can be correctly decoded using Bitstream data starting at ISAP and no data before ISAP.

ISAU is the starting position in the Bitstream of the latest access unit in decoding order within the media stream such that all access units of the media stream with presentation time greater than or equal to TSAP can be correctly decoded using this latest access unit and access units following in decoding order and no access units earlier in decoding order.

NOTE ISAU is always greater than or equal to ISAP.


TDEC is the earliest presentation time of any access unit of the media stream that can be correctly decoded using data in the Bitstream starting at ISAU and no data before ISAU.

TEPT is the earliest presentation time of any access unit of the media stream starting at ISAU in the Bitstream.

TPTF is the presentation time of the first access unit of the media stream in decoding order in the Bitstream starting at ISAU.

For the purposes of these definitions, the SAP is the access unit that is described as located at ISAU and/or ISAP.

NOTE The distinction between ISAU and ISAP is only needed to distinguish between referring directly to the access unit, and referring to its containing structure.
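As an informal illustration of the time-valued properties, the sketch below derives TEPT, TPTF, TDEC and TSAP from a list of access units given in decoding order starting at ISAU. The `AccessUnit` type and the `decodable` flag are hypothetical: the flag is assumed to be already known (whether the unit decodes correctly when decoding starts at ISAU), and at least one unit is assumed to be decodable.

```python
from dataclasses import dataclass


@dataclass
class AccessUnit:
    presentation_time: int
    decodable: bool  # assumed known: decodes correctly when starting at ISAU


def sap_times(units):
    """Return (TEPT, TPTF, TDEC, TSAP) for units in decoding order from ISAU."""
    tptf = units[0].presentation_time
    tept = min(u.presentation_time for u in units)
    tdec = min(u.presentation_time for u in units if u.decodable)
    # Walk backwards in presentation order: TSAP is the earliest time reached
    # before meeting an access unit that cannot be correctly decoded.
    tsap = None
    for u in sorted(units, key=lambda u: u.presentation_time, reverse=True):
        if not u.decodable:
            break
        tsap = u.presentation_time
    return tept, tptf, tdec, tsap


# Leading pictures (times 0 and 1) follow the first unit in decoding order
# but cannot be decoded when starting here.
open_gop = [AccessUnit(2, True), AccessUnit(0, False), AccessUnit(1, False),
            AccessUnit(5, True), AccessUnit(3, True), AccessUnit(4, True)]
open_gop_times = sap_times(open_gop)

# All units decodable, but the first unit in decoding order is not the first
# in presentation order.
reordered = [AccessUnit(2, True), AccessUnit(0, True), AccessUnit(1, True)]
reordered_times = sap_times(reordered)
```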

I.2.2 SAP properties for layers

The following properties apply to layered media streams for which SAPs are indicated for one or more layers, referred to as the target layers. In the following properties, an access-unit partition refers to a unit that contains the coded data of a single time instance for the target layers, and a media stream partition refers to a sequence of access-unit partitions of the target layers in decoding order.

When the target layers cover all the layers of a media stream, the following properties are equivalent to those in I.2.1.

For each SAP the properties, ISAP, TSAP, ISAU, TDEC, TEPT, and TPTF are identified and defined as:

TSAP is the earliest presentation time of any access-unit partition of the target layers such that all access-unit partitions of the target layers with presentation time greater than or equal to TSAP can be correctly decoded using data in the media stream partition starting at ISAP and no data before ISAP.

ISAP is the greatest position in the container of the media stream partition such that all access-unit partitions of the target layers with presentation time greater than or equal to TSAP can be correctly decoded using data of the media stream partition starting at ISAP and no data before ISAP.

ISAU is the starting position, in the media stream partition, of the latest access-unit partition in decoding order such that all access-unit partitions of the target layers with presentation time greater than or equal to TSAP can be correctly decoded using this latest access-unit partition and access-unit partitions following in decoding order and no access-unit partition earlier in decoding order.

NOTE ISAU is always greater than or equal to ISAP.

TDEC is the earliest presentation time of any access-unit partition of the target layers that can be correctly decoded using data in the media stream partition starting at ISAU and no data before ISAU.

TEPT is the earliest presentation time of any access-unit partition of the target layers starting at ISAU in the media stream partition.


TPTF is the presentation time of the first access-unit partition of the target layers in decoding order in the media stream partition starting at ISAU.

I.3 SAP types

Six types of SAPs are defined with properties as follows:

Type 1: TEPT = TDEC = TSAP = TPTF

Type 2: TEPT = TDEC = TSAP < TPTF

Type 3: TEPT < TDEC = TSAP <= TPTF

Type 4: TEPT <= TPTF < TDEC = TSAP

Type 5: TEPT = TDEC < TSAP

Type 6: TEPT < TDEC < TSAP

NOTE The type of SAP is dependent only on which Access Units are correctly decodable and their arrangement in presentation order. The types informally correspond with some common terms:

Type 1 corresponds to what is known in some coding schemes as a "Closed GoP random access point" (in which all access units, in decoding order, starting from ISAP can be correctly decoded, resulting in a continuous time sequence of correctly decoded access units with no gaps) and in addition the first access unit in decoding order is also the first access unit in presentation order.

Type 2 corresponds to what is known in some coding schemes as a "Closed GoP random access point", for which the first access unit in decoding order in the media stream starting from ISAU is not the first access unit in presentation order.

Type 3 corresponds to what is known in some coding schemes as an "Open GoP random access point", in which there are some access units in decoding order following ISAU that cannot be correctly decoded and have presentation times less than TSAP.

Type 4 corresponds to what is known in some coding schemes as a "Gradual Decoding Refresh (GDR) random access point", in which there are some access units in decoding order starting from and following ISAU that cannot be correctly decoded and have presentation times less than TSAP.

Type 5 corresponds to the case for which there is at least one access unit in decoding order starting from ISAP that cannot be correctly decoded and has presentation time greater than TDEC and where TDEC is the earliest presentation time of any access unit starting from ISAU.

Type 6 corresponds to the case for which there is at least one access unit in decoding order starting from ISAP that cannot be correctly decoded and has presentation time greater than TDEC and where TDEC is not the earliest presentation time of any access unit starting from ISAU.
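The six relations above translate directly into a classifier. The sketch below is illustrative only. The function name `classify_sap` is hypothetical, and it returns `None` when the four values match none of the defined types.

```python
def classify_sap(tept, tdec, tsap, tptf):
    """Return the SAP type (1 to 6) implied by the four time values, or None."""
    if tept == tdec == tsap == tptf:
        return 1
    if tept == tdec == tsap < tptf:
        return 2
    if tept < tdec == tsap <= tptf:
        return 3
    if tept <= tptf < tdec == tsap:
        return 4
    if tept == tdec < tsap:
        return 5
    if tept < tdec < tsap:
        return 6
    return None


examples = {
    "type1": classify_sap(tept=0, tdec=0, tsap=0, tptf=0),
    "type2": classify_sap(tept=0, tdec=0, tsap=0, tptf=2),
    "type3": classify_sap(tept=0, tdec=2, tsap=2, tptf=2),
    "type4": classify_sap(tept=0, tdec=3, tsap=3, tptf=1),
    "type5": classify_sap(tept=0, tdec=0, tsap=4, tptf=0),
    "type6": classify_sap(tept=0, tdec=1, tsap=4, tptf=0),
}
```

The six conditions are mutually exclusive, so the order of the tests does not affect the result.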


Annex J (normative)

MIME Type Registration of Segments

J.1 Introduction

This Annex provides the formal MIME registration of media segments formatted according to 8.16.

J.2 Registration

MIME media type name: video

MIME subtype name: iso.segment

Required parameters: none

Optional parameters: as specified by RFC 6381 and its successors

Encoding considerations: as for video/mp4

Security considerations: See section 5 of RFC 4337.

Interoperability considerations: A number of interoperating implementations exist within the ISO/IEC 14496 community, and that community has reference software for reading and writing the file format.

Published specification: ISO/IEC 14496-12:2012 (expected)

Applications: Multimedia

Additional information:

Magic number(s): none

File extension(s): m4s

Macintosh File Type Code(s): None

Person to contact for info: David Singer, [email protected]

Intended usage: Common

Author/Change controller: David Singer, ISO/IEC 14496 file format chair
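As a small illustration of how an application might apply this registration, the sketch below maps the `m4s` file extension to `video/iso.segment` using Python's standard `mimetypes` module. The file name is a made-up example, and a private `MimeTypes` instance is used so that the process-wide table is left untouched.

```python
import mimetypes

# Private type database, so the module-level defaults are not modified.
segment_types = mimetypes.MimeTypes()
segment_types.add_type("video/iso.segment", ".m4s")

# guess_type returns a (type, encoding) pair.
content_type = segment_types.guess_type("segment_0001.m4s")
```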


Annex K (informative)

Segment Index Examples

K.1 Introduction

This annex gives some examples of the use of the segment index box, and what values are inserted in it when it is used in various different 'styles' or configurations.

In the following examples, the size of the i-th 'sidx' box is defined as $S_{i,index}$, the size of the i-th subsegment, e.g. the i-th 'moof' and 'mdat' boxes, is defined as $S_{i,media}$, the duration of the i-th subsegment is defined as $D_i$, the number of the last subsegment is defined as $N$, and the duration of the segment is defined as $D_{segment}$.

K.2 Examples

K.2.1 Simple one-level indexing

This example shows a simple segment index (Figure K.1). All entries of the top level sidx point to media content (segments comprising one or more movie fragments), i.e. reference_type is equal to 0. The values of referenced_size and subsegment_duration of each entry are calculated as in Table K.1.

Figure K.1: Simple Segment Index

| sidx entries | referenced_size | subsegment_duration |
|---|---|---|
| e0 | $S_i$ | $D_i$ |
| e1 | $S_{i+1}$ | $D_{i+1}$ |

Table K.1: Simple Segment Index
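A one-level index of this kind can be sketched as follows. The dictionary layout and the function names are hypothetical, and `locate` assumes that byte offsets are counted from the first byte of the first referenced subsegment and that times are counted from the start of the indexed material.

```python
def simple_sidx_entries(subsegments):
    """Build one entry per subsegment from (size_in_bytes, duration) pairs."""
    return [
        {"reference_type": 0, "referenced_size": size, "subsegment_duration": duration}
        for size, duration in subsegments
    ]


def locate(entries, time):
    """Return (entry index, byte offset) of the subsegment containing `time`."""
    offset = 0
    start = 0
    for index, entry in enumerate(entries):
        if time < start + entry["subsegment_duration"]:
            return index, offset
        offset += entry["referenced_size"]
        start += entry["subsegment_duration"]
    raise ValueError("time is beyond the indexed duration")


entries = simple_sidx_entries([(1000, 90000), (1500, 90000)])
first = locate(entries, 0)
second = locate(entries, 100000)
```

The lookup shows why both values are stored per entry: durations select the subsegment, and the running sum of sizes gives the position to read from.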

K.2.2 Hierarchical

This example shows a hierarchical segment index (Figure K.2). All entries of the top level sidx point to another 'sidx' box, i.e. reference_type is equal to 1, and all entries of the second level sidx point to media content, i.e. reference_type is equal to 0. The values of referenced_size and subsegment_duration of each entry are calculated as in Table K.2.

Figure K.2: Hierarchical segment index

| sidx# | entries | referenced_size | subsegment_duration |
|---|---|---|---|
| i-th | e0 | $S_{i+1,index} + S_{j,media} + S_{j+1,media}$ | $D_j + D_{j+1}$ |
| | e1 | $S_{i+2,index} + S_{j+2,media} + S_{j+3,media}$ | $D_{j+2} + D_{j+3}$ |
| (i+1)th | e0 | $S_{j,media}$ | $D_j$ |
| | e1 | $S_{j+1,media}$ | $D_{j+1}$ |
| (i+2)th | e0 | $S_{j+2,media}$ | $D_{j+2}$ |
| | e1 | $S_{j+3,media}$ | $D_{j+3}$ |

Table K.2: Hierarchical segment index
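The hierarchical values can be computed as in the following sketch. The function name, the dictionary layout and the byte counts are illustrative assumptions. Each top-level entry sums the size of the referenced second-level 'sidx' box and the media it indexes, and sums the corresponding durations.

```python
def hierarchical_index(groups, child_index_sizes):
    """Build top-level and second-level entries.

    groups: one list of (media_size, duration) pairs per second-level sidx.
    child_index_sizes: size in bytes of each second-level sidx box.
    """
    top_level, second_level = [], []
    for index_size, group in zip(child_index_sizes, groups):
        second_level.append([
            {"reference_type": 0, "referenced_size": size, "subsegment_duration": dur}
            for size, dur in group
        ])
        top_level.append({
            "reference_type": 1,
            # size of the referenced sidx box plus all media it indexes
            "referenced_size": index_size + sum(size for size, _ in group),
            "subsegment_duration": sum(dur for _, dur in group),
        })
    return top_level, second_level


top, children = hierarchical_index(
    groups=[[(1000, 10), (1200, 10)], [(900, 10), (1100, 10)]],
    child_index_sizes=[56, 56],  # illustrative byte counts
)
```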

K.2.3 Daisy-chain

This example shows a daisy-chained segment index (Figure K.3). Each 'sidx' box has two entries: the first entry points to media content, i.e. reference_type is equal to 0, and the second (the last) entry points to the next 'sidx' box, i.e. reference_type is equal to 1. The values of referenced_size and subsegment_duration of each entry are calculated as in Table K.3.

Figure K.3: Daisy-chained segment index

| sidx# | entries | referenced_size | subsegment_duration |
|---|---|---|---|
| i-th | e0 | $S_{i,media}$ | $D_i$ |
| | e1 | $S_{i+1,index}$ | $\sum_{j=i+1}^{N} D_j = D_{segment} - \sum_{j=0}^{i} D_j$ |
| (i+1)th | e0 | $S_{i+1,media}$ | $D_{i+1}$ |
| | e1 | $S_{i+2,index}$ | $\sum_{j=i+2}^{N} D_j = D_{segment} - \sum_{j=0}^{i+1} D_j$ |

Table K.3: Daisy-chained segment index
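A daisy chain of this shape can be sketched as below. The names are hypothetical. The sketch also assumes that the last 'sidx' box carries only its media entry, since there is no further box to chain to. The table itself does not show that case.

```python
def daisy_chain(subsegments, index_sizes):
    """Build the entries of each chained sidx box.

    subsegments: (media_size, duration) per subsegment, in order.
    index_sizes: size in bytes of each sidx box, same order.
    """
    segment_duration = sum(duration for _, duration in subsegments)
    last = len(subsegments) - 1
    boxes = []
    elapsed = 0
    for i, (size, duration) in enumerate(subsegments):
        elapsed += duration
        entries = [{"reference_type": 0, "referenced_size": size,
                    "subsegment_duration": duration}]
        if i < last:
            entries.append({
                "reference_type": 1,
                "referenced_size": index_sizes[i + 1],
                # remaining duration: segment duration minus everything so far
                "subsegment_duration": segment_duration - elapsed,
            })
        boxes.append(entries)
    return boxes


boxes = daisy_chain([(1000, 10), (1100, 12), (900, 8)], index_sizes=[56, 56, 44])
```

Computing the remaining duration as the segment duration minus the elapsed durations mirrors the right-hand side of the sums in the table.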

K.2.4 Combination hierarchical and daisy-chain

This example shows a hierarchical and daisy-chained segment index (Figure K.4), which is a combination of K.2.2 and K.2.3. The values of referenced_size and subsegment_duration of each entry are calculated as in Table K.4.

Figure K.4: Combined segment index

| sidx# | entries | referenced_size | subsegment_duration |
|---|---|---|---|
| i-th | e0 | $S_{i+1,index} + S_{j,media} + S_{j+1,media}$ | $D_j + D_{j+1}$ |
| | e1 | $S_{i+2,index} + S_{j+2,media} + S_{j+3,media}$ | $D_{j+2} + D_{j+3}$ |
| | e2 | $S_{i+3,index} + S_{j+4,media}$ | $\sum_{k=j+4}^{N} D_k = D_{segment} - \sum_{k=0}^{j+3} D_k$ |
| (i+1)th | e0 | $S_{j,media}$ | $D_j$ |
| | e1 | $S_{j+1,media}$ | $D_{j+1}$ |
| (i+2)th | e0 | $S_{j+2,media}$ | $D_{j+2}$ |
| | e1 | $S_{j+3,media}$ | $D_{j+3}$ |
| (i+3)th | e0 | $S_{j+4,media}$ | $D_{j+4}$ |
| | e1 | $S_{i+4,index}$ | $\sum_{k=j+5}^{N} D_k = D_{segment} - \sum_{k=0}^{j+4} D_k$ |

Table K.4: Combined segment index


Bibliography

[1] The QuickTime file format specification, in PDF: <http://developer.apple.com/documentation/QuickTime/QTFF/qtff.pdf>

[2] 3GPP TS 26.244, 3GPP file format (3GP)

[3] 3GPP TS 26.346, Multimedia Broadcast/Multicast Service (MBMS); Protocols and codecs

[4] OMA BCAST_Distribution-V1_0: File and Stream Distribution for Mobile Broadcast Services

[5] IETF RFC 3926, FLUTE - File Delivery over Unidirectional Transport, October 2004

[6] IETF RFC 3450, Asynchronous Layered Coding (ALC) Protocol Instantiation, December 2002

[7] IETF RFC 3451, Layered Coding Transport (LCT) Building Block, December 2002

[8] IETF RFC 3452, Forward Error Correction (FEC) Building Block, December 2002

[9] IETF RFC 3695, Compact Forward Error Correction (FEC) Schemes, February 2004

[10] IETF RFC 1864, The Content-MD5 Header Field, October 1995

[11] IETF RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1, June 1999

[12] IETF RFC 3061, A URN Namespace of Object Identifiers, February 2001

[13] IETF RFC 3550, RTP: A Transport Protocol for Real-Time Applications, July 2003

[14] IETF RFC 3551, RTP Profile for Audio and Video Conferences with Minimal Control, July 2003

[15] IETF RFC 4122, A Universally Unique IDentifier (UUID) URN Namespace, July 2005

[16] IETF RFC 4771, Integrity Transform Carrying Roll-Over Counter for the Secure Real-time Transport Protocol (SRTP), January 2007

[17] IETF RFC 5119, A Uniform Resource Name (URN) Namespace for the Society of Motion Picture and Television Engineers (SMPTE), February 2008

[18] ICC.1:2001-04, File format for color profiles, International Color Consortium

[19] SMPTE RP 177, Derivation of Basic Television Color Equations; Society of Motion Picture and Television Engineers (SMPTE), 1993

[20] ISO/IEC 13818-1, Information technology -- Generic coding of moving pictures and associated audio information -- Systems


[21] ISO/IEC 14496-15, Information technology -- Coding of audio-visual objects -- Advanced Video Coding (AVC) file format

[22] IETF RFC 5117, RTP Topologies, WESTERLUND, M. et al., January 2008


ICS 35.040

Price based on 233 pages
