
Post on 27-Dec-2015


MPEG-4 Systems and DMIF

Doug Young Suh, Ph.D.
Kyung Hee University

Suh@khu.ac.kr

Seminar on Promising Core Component Technologies for the 21st Century

Outline

• Overview
• ISO/IEC 14496-1 MPEG-4 Systems
• ISO/IEC 14496-6 DMIF

Overview

• 14496-1 : MPEG-4 Systems
• 14496-2 : MPEG-4 Video
• 14496-3 : MPEG-4 Audio
• MPEG-4 Systems : interactive audio-visual scene

[Figure: Interactive VOD based on MPEG-4. Server and client each have DMIF call setup and control. Components shown: authoring tool, MP4 file, video and audio encoders, BIFS encoder, video and audio ESs, SL, FlexMux, DMIF TransMux over RTP/UDP/IP, video decoders, and BIFS composition.]

Concept 1 : Layered Model

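[Figure: layered-communication analogy. A Korean philosopher and a Nepalese philosopher converse through interpreters (Korean/English and Nepali/English), who in turn communicate over the Korean and Nepalese telecom networks.]

• Peer protocols : philosopher protocol, interpreter protocol, communication protocol
• Interfaces between layers : philosopher/interpreter interface, interpreter/telecom interface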

Concept 2: Object-oriented

• Encapsulation : data, method
• Inheritance
• Not object-based

[Figure: class diagram. Human (Name, Age, Call()) is the base class; Customer (Balance, Register()) and Employee (Salary, Fire()) inherit from it.]
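
The Human/Customer/Employee example can be written out as a minimal Python sketch. The attribute names, constructor arguments, and method bodies are illustrative assumptions; the slide only names the members. The same "extends" relationship reappears later in the descriptor syntax, where every descriptor extends BaseDescriptor.

```python
class Human:
    """Base class: bundles data (name, age) with a method (call)."""

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def call(self):
        return f"Calling {self.name}"


class Customer(Human):
    """Inherits name, age, and call(); adds balance and register()."""

    def __init__(self, name, age, balance=0):
        super().__init__(name, age)
        self.balance = balance
        self.registered = False  # hypothetical state flag for the sketch

    def register(self):
        self.registered = True


class Employee(Human):
    """Inherits name, age, and call(); adds salary and fire()."""

    def __init__(self, name, age, salary):
        super().__init__(name, age)
        self.salary = salary
        self.employed = True  # hypothetical state flag for the sketch

    def fire(self):
        self.employed = False
```

Customer and Employee reuse call() without redefining it, which is the point of inheritance in this slide.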

[Figure: The ISO/IEC 14496 terminal architecture, read from the transmission/storage medium up to the display.]

• Delivery Layer (14496-6 DMIF) : multiplexed streams from the transmission/storage medium, e.g. (RTP)/UDP/IP, (PES)/MPEG-2 TS, AAL2/ATM, H223/PSTN, DAB Mux; FlexMux
• DMIF Application Interface
• Sync Layer (14496-1 Systems) : SL-packetized streams in, elementary streams out
• Elementary Stream Interface
• Compression Layer (14496-2 video, 14496-3 audio) : AV object data, scene description information, object descriptors, upstream information
• Composition and rendering of the interactive audiovisual scene
• Display and user interaction

Tools in Systems

• Terminal model with time and buffer management

• BIFS (Binary Format for Scenes)
• OD (Object Descriptor)
• Interface to IPMP systems
• SL (Sync Layer)
• FlexMux
• MPEG-Java : an application engine

14496-1 Terminology

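• A scene is composed of one or more objects.

Example: in a weather-forecast scene there is a person (object 1) and a background (object 2), and sound (object 3) is played.

• ES : compressed media data, usually in one-to-one correspondence with an object
• AU : usually one VOP for video, or one frame (e.g. 10 ms) for audio
• CU : the smallest unit that can be handled independently after decoding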

Systems Decoder Model

[Figure: Systems decoder model. The DMIF Application Interface (which encapsulates the demultiplexer) feeds decoding buffers DB1 ... DBn. Across the Elementary Stream Interface, decoders 1 ... n write into composition memories CB1 ... CBn, which the compositor reads.]

Systems Buffer Model

• DB : absorbs bitrate variation and network jitter
• CM : used for prediction (P-, B-VOP); absorbs differences in CU decoding time
• The DB and CM sizes determine the initial delay
• CM should be minimized (especially on PDAs)

Time Model

• Why it is needed
- Lip synchronization : CTS, DTS
- Clock recovery : e.g. broadcast, IMT-2000

• Assumptions
- At its DTS, an AU is decoded instantaneously and removed from the DB, and the decoded CU is stored in the CM
- A CU is composed between its own CTS and the next CTS (a CU must remain in the CM at least until the CTS of the next CU)

DTS and CTS

[Figure: DTS and CTS timeline. AU0 and AU1 arrive in the decoding buffer at Arrival(AU0) and Arrival(AU1) and leave it at DTS(AU0) and DTS(AU1). The resulting CU0 and CU1 are available for composition in the composition memory from CTS(CU0) and CTS(CU1).]

Time Base

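• STB in the decoder system
• OTB for media source systems
- Video : 60 times in a second
- Audio : 44100 times in a second

• Mapping OTB to STB

$$ t_{SCT} = \frac{\Delta t_{STB}}{\Delta t_{OTB}} \, t_{OCT} - \frac{\Delta t_{STB}}{\Delta t_{OTB}} \, t_{OTB\text{-}START} + t_{STB\text{-}START} $$

Here $t_{OCT}$ is a composition time in OTB units, $t_{SCT}$ is the same instant in STB units, $t_{OTB\text{-}START}$ and $t_{STB\text{-}START}$ are the OTB and STB values at the first clock reference, and $\Delta t_{STB} / \Delta t_{OTB}$ is the ratio of STB to OTB intervals between two clock references.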
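
The OTB-to-STB mapping is a linear rescaling: subtract the OTB start value, scale by the ratio of STB to OTB intervals, and add the STB start value. A minimal sketch, assuming all quantities are plain numbers; the function and parameter names are illustrative, not taken from the standard:

```python
def otb_to_stb(t_oct, t_otb_start, t_stb_start, delta_stb, delta_otb):
    """Map a composition time expressed on the object time base (OTB)
    to the terminal's system time base (STB).

    delta_stb / delta_otb is the measured ratio between the two clocks
    over the same interval (e.g. between two clock references).
    """
    rate = delta_stb / delta_otb
    return rate * (t_oct - t_otb_start) + t_stb_start
```

For example, with an OTB ticking 1000 times per STB second, a composition time 500 ticks after the OTB start maps to 0.5 s after the STB start.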

MP4 File

• Self-contained, cf. *.asf of MS Media Player
• Includes IOD, OD, BIFS, ES

[Figure: MP4 file layout. The moov atom holds the IOD, the trak atoms (BIFS, OD, video, audio), and other atoms; the mdat atom holds the interleaved, time-ordered BIFS, OD, video, and audio access units.]

OD Framework

• Basic syntax

abstract aligned(8) expandable(2^28-1) class BaseDescriptor : bit(8) tag=0 {
  // empty. To be filled by classes extending this class.
}

abstract aligned(8) expandable(2^28-1) class BaseCommand : bit(8) tag=0 {
  // empty. To be filled by classes extending this class.
}

• IPMP : IPMP OD, IPMP ES
• Command : OD stream, OD as an ES (convey, update, and remove ODs)
• Descriptor : OD components (Object, IOD, ES, Decoder, QoS)

OD Stream

• Command delivery (convey, update, and remove)

• Examples

class ObjectDescriptorUpdate extends BaseCommand : bit(8) tag=ObjectDescrUpdateTag {
  ObjectDescriptorBase OD[1 .. 255];
}

class ObjectDescriptorRemove extends BaseCommand : bit(8) tag=ObjectDescrRemoveTag {
  bit(10) objectDescriptorId[(sizeOfInstance*8)/10];
}

class ES_DescriptorRemove extends BaseCommand : bit(8) tag=ES_DescrRemoveTag {
  bit(10) objectDescriptorId;
  aligned (8) bit(16) ES_ID[1..255];
}

class IPMP_DescriptorRemove extends BaseCommand : bit(8) tag=IPMP_DescrRemoveTag {
  bit(8) IPMP_DescriptorID[1..255];
}
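
In ObjectDescriptorRemove the payload is a run of 10-bit identifiers, and the count is (sizeOfInstance*8)/10, so any leftover bits at the end are padding. A minimal sketch of that packing, assuming the payload (the bytes after the tag and size fields) is already isolated; both helper names are hypothetical:

```python
def pack_od_ids(ids):
    """Pack 10-bit object descriptor IDs MSB-first, zero-padded to a byte boundary."""
    value = 0
    for od_id in ids:
        if not 0 <= od_id < 1024:
            raise ValueError("objectDescriptorId must fit in 10 bits")
        value = (value << 10) | od_id
    nbits = 10 * len(ids)
    pad = (-nbits) % 8
    return (value << pad).to_bytes((nbits + pad) // 8, "big")


def unpack_od_ids(payload):
    """Recover the IDs: count = (sizeOfInstance * 8) // 10, padding ignored."""
    total_bits = len(payload) * 8
    count = total_bits // 10
    bits = int.from_bytes(payload, "big")
    return [(bits >> (total_bits - 10 * (i + 1))) & 0x3FF for i in range(count)]
```

Three IDs occupy 30 bits, so they travel in 4 bytes with 2 padding bits; the integer division recovers the count of 3.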

[Figure: Object descriptors linking scene description to elementary streams. The initialObjectDescriptor carries ES_Descriptors whose ES_IDs point to the scene description stream and the object descriptor stream. Nodes in the scene description (e.g. MovieTexture, AudioSource), delivered by a BIFS Command (Replace Scene), refer to ObjectDescriptorIDs. Each ObjectDescriptor, delivered in an ObjectDescriptorUpdate, carries ES_Descriptors whose ES_IDs identify the media streams: a visual stream (e.g. base layer), a visual stream (e.g. temporal enhancement), and an audio stream.]

OD Component 1: IOD

// An OD that holds the ES_Descriptors for BIFS and for the per-media ODs
// Needed for call setup
class InitialObjectDescriptor extends BaseDescriptor : bit(8) tag=InitialObjectDescrTag {
  bit(10) ObjectDescriptorID;
  bit(1) URL_Flag;
  bit(1) includeInlineProfileLevelFlag;
  const bit(4) reserved=0b1111;
  if (URL_Flag) {
    bit(8) URLlength;
    bit(8) URLstring[URLlength];
  } else {
    bit(8) ODProfileLevelIndication;
    bit(8) sceneProfileLevelIndication;
    bit(8) audioProfileLevelIndication;
    bit(8) visualProfileLevelIndication; // e.g. Simple, Simple Scalable, Core, Main, etc.
    bit(8) graphicsProfileLevelIndication;
    ES_Descriptor ESD[1 .. 255]; // at least one required
    OCI_Descriptor ociDescr[0 .. 255]; // optional
    IPMP_DescriptorPointer ipmpDescrPtr[0 .. 255];
  }
  ExtensionDescriptor extDescr[0 .. 255];
}

OD Component 2 : OD

class ObjectDescriptor extends BaseDescriptor : bit(8) tag=ObjectDescrTag {

bit(10) ObjectDescriptorID;

bit(1) URL_Flag;

const bit(5) reserved=0b1111.1;

if (URL_Flag) {

bit(8) URLlength;

bit(8) URLstring[URLlength]; //point to another OD

} else {

ES_Descriptor esDescr[1 .. 255];

// an array of ES_Descriptors; at least one is required

OCI_Descriptor ociDescr[0 .. 255];

IPMP_DescriptorPointer ipmpDescrPtr[0 .. 255];

}

ExtensionDescriptor extDescr[0 .. 255];

}
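
The first three fields of ObjectDescriptor (10-bit ObjectDescriptorID, 1-bit URL_Flag, 5 reserved bits) fill exactly two bytes. A minimal sketch of reading them, assuming the two bytes following the descriptor's tag and size fields are passed in; the function name and dictionary keys are illustrative:

```python
def parse_od_header(data):
    """Split the first 16 bits of an ObjectDescriptor body into its fields."""
    if len(data) < 2:
        raise ValueError("need at least 2 bytes")
    word = int.from_bytes(data[:2], "big")
    return {
        "object_descriptor_id": word >> 6,   # top 10 bits
        "url_flag": bool((word >> 5) & 1),   # next 1 bit
        "reserved": word & 0x1F,             # low 5 bits, all ones per the syntax
    }
```

For instance, the bytes 0x01 0x5F decode to ObjectDescriptorID 5 with URL_Flag cleared, so an array of ES_Descriptors follows rather than a URL.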

OD Component 3 : ES_Descriptor

class ES_Descriptor extends BaseDescriptor : bit(8) tag=ES_DescrTag {
  bit(16) ES_ID;
  bit(1) streamDependenceFlag;
  bit(1) URL_Flag;
  const bit(1) reserved=1;
  bit(5) streamPriority;
  if (streamDependenceFlag)
    bit(16) dependsOn_ES_ID;
  if (URL_Flag) {
    bit(8) URLlength;
    bit(8) URLstring[URLlength];
  }
  DecoderConfigDescriptor decConfigDescr;
  SLConfigDescriptor slConfigDescr;
  QoS_Descriptor qosDescr[0 .. 1]; // at most one, if present
  IPMPDescriptor ipmpDescrPtr[0 .. 1];
  // ... (remainder omitted) ...
  ExtensionDescriptor extDescr[0 .. 255];
}

OD Component 4 : DecoderConfigDescriptor

class DecoderConfigDescriptor extends BaseDescriptor : bit(8) tag=DecoderConfigDescrTag {

bit(8) objectTypeIndication; // MPEG-1,-2 video, audio, etc.

bit(6) streamType;

bit(1) upStream;

const bit(1) reserved=1;

bit(24) bufferSizeDB;

bit(32) maxBitrate;

bit(32) avgBitrate;

DecoderSpecificInfo decSpecificInfo[0 .. 1];

}
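
The fixed part of DecoderConfigDescriptor adds up to 13 bytes (8 + 6 + 1 + 1 + 24 + 32 + 32 bits). A minimal sketch of decoding it, assuming the descriptor body (after tag and size) is passed in and ignoring the optional DecoderSpecificInfo; the function name and keys are illustrative:

```python
import struct


def parse_decoder_config(body):
    """Decode the 13 fixed bytes of a DecoderConfigDescriptor body."""
    if len(body) < 13:
        raise ValueError("DecoderConfigDescriptor body needs at least 13 bytes")
    flags = body[1]
    max_bitrate, avg_bitrate = struct.unpack(">II", body[5:13])
    return {
        "object_type_indication": body[0],
        "stream_type": flags >> 2,           # top 6 bits
        "up_stream": bool((flags >> 1) & 1), # next bit; lowest bit is reserved
        "buffer_size_db": int.from_bytes(body[2:5], "big"),
        "max_bitrate": max_bitrate,
        "avg_bitrate": avg_bitrate,
    }
```

bufferSizeDB is the decoding buffer size that the Systems Buffer Model relies on, so a terminal can check it against available memory before accepting the stream.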

Other OD Components

• QoS_Descriptor : delay, loss, AU_Size, etc.
• DecoderSpecificInfo
• SLConfigDescriptor
• ContentIdentificationDescriptor

BIFS

• Binary information needed to combine, reconstruct, and present audio-visual data at the client side (not at the server side)
• Spatio-temporal location/scale/orientation of audio-visual objects
• Largely based on VRML (ISO/IEC 14772-1)
• BIFS_ES, BIFS AU (BIFS-Command, BIFS-Anim), BIFS SL, BIFS time base, BIFS decoder
• For interactivity, SENSOR node

[Figure: Object-based multimedia scene. Audiovisual objects (3D objects, a 2D background, a voice, a sprite, and an audiovisual presentation) are placed in a scene coordinate system (x, y, z). A hypothetical viewer sees them projected onto a video compositor plane, with an audio compositor for sound. Multiplexed downstream control/data feeds the scene, user events return as multiplexed upstream control/data, and the terminal provides speaker, display, and user input.]

Logical structure of the scene

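[Figure: scene tree. The scene node has the children person, 2D background, furniture, and audiovisual presentation; person contains voice and sprite; furniture contains globe and desk.]

• A graph with links and nodes (refer to graph theory)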

[Figure: playback timelines for a time-dependent node, controlled by the parameters loop, duration, startTime, and stopTime. Five cases are drawn along the time axis, with marks at startTime, stopTime, startTime+duration, and startTime+2*duration.]

1. Play once
2. Stop during play (loop=FALSE, startTime<stopTime<startTime+duration)
3. Repeat continuously (loop=TRUE, stopTime<=startTime)
4. Set stopTime
5. Set loop = FALSE
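
How loop, duration, startTime, and stopTime interact can be sketched as a single predicate. This is a simplified reading of the VRML-style rules for time-dependent nodes, assuming the four fields stay fixed while the node plays (cases where a field is changed mid-play are not modeled); the function name is illustrative:

```python
def is_active(now, start_time, stop_time, duration, loop):
    """Return True if the node is playing at time `now`."""
    if now < start_time:
        return False                      # not started yet
    if stop_time > start_time and now >= stop_time:
        return False                      # an effective stopTime has passed
    if not loop and now >= start_time + duration:
        return False                      # single cycle has finished
    return True                           # playing (possibly looping)
```

A stopTime that is not later than startTime is ignored, which is why loop=TRUE with stopTime<=startTime repeats indefinitely.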

BIFS-Command

• Modify properties of the scene graph, its nodes, and behaviors
• Applied to conditional nodes

1. ReplaceEntireScene(new_scene_graph) // random access point
2. Insertion(nodeID,event,ROUTE)
3. Deletion(nodeID,event,ROUTE)
4. Replace(nodeID,event,ROUTE)

BIFS-Anim

• Updates certain fields of nodes in the scene graph

• meshes, 2D/3D positions, rotations, scale factors, and color attributes

• Separate ESs for BIFS-Command (CommandFrames) and BIFS-Anim (AnimationFrames)

CompositeTexture2D example (projected on a 3D cube)

CompositeTexture2D{

eventIn MFNode addChildren

eventIn MFNode removeChildren

exposedField MFNode children

exposedField SFInt32 pixelWidth

exposedField SFInt32 pixelHeight

exposedField SFNode background

exposedField SFInt32 viewport

}

Sync layer (SL)

• Defines a syntax for the packetization of each ES into AUs or parts of AUs

• SPS (SL packet stream) : the sequence of SL packets from one ES

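[Figure: The sync layer sits between the DMIF Application Interface and the Elementary Stream Interface. SL-packetized streams enter from the DMIF Application Interface side, and each SL instance delivers one elementary stream at the Elementary Stream Interface.]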

class SLConfigDescriptor extends BaseDescriptor : bit(8) tag=SLConfigDescrTag {
  bit(8) predefined;
  if (predefined==0) {
    bit(1) useAccessUnitStartFlag;
    bit(1) useAccessUnitEndFlag;
    bit(1) useRandomAccessPointFlag;
    bit(1) hasRandomAccessUnitsOnlyFlag;
    bit(1) usePaddingFlag;
    bit(1) useTimeStampsFlag;
    bit(1) useIdleFlag;
    bit(1) durationFlag;
    bit(32) timeStampResolution;
    bit(32) OCRResolution;
    bit(8) timeStampLength; // must be <= 64
    bit(8) OCRLength; // must be <= 64
    bit(8) AU_Length; // must be <= 32
    bit(8) instantBitrateLength;
    bit(4) degradationPriorityLength;
    bit(5) AU_seqNumLength; // must be <= 16
    bit(5) packetSeqNumLength; // must be <= 16
    bit(2) reserved=0b11;
  }
  if (durationFlag) {
    bit(32) timeScale;
    bit(16) accessUnitDuration;
    bit(16) compositionUnitDuration;
  }
  if (!useTimeStampsFlag) {
    bit(timeStampLength) startDecodingTimeStamp;
    bit(timeStampLength) startCompositionTimeStamp;
  }
}

SLConfigDescriptor in ES_Descriptor
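
The timeStampResolution and timeStampLength fields together fix how a raw DTS or CTS value is read: the resolution gives ticks per second, and the length bounds the counter, so time stamps wrap after 2^timeStampLength ticks. A small sketch with hypothetical helper names:

```python
def timestamp_to_seconds(ticks, time_stamp_resolution):
    """Convert a raw time stamp (in ticks) to seconds."""
    return ticks / time_stamp_resolution


def wrap_period_seconds(time_stamp_length, time_stamp_resolution):
    """Seconds until a time stamp of the given bit length wraps around."""
    return (1 << time_stamp_length) / time_stamp_resolution
```

With a resolution of 1000 ticks per second and 16-bit time stamps, the counter wraps roughly every 65.5 seconds, so a receiver has to track wraparounds for longer presentations.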

SL Packet Header

• packetSequenceNumber
• degradationPriority
• objectClockReference
• decodingTimeStamp
• compositionTimeStamp
• accessUnitLength
• instantBitrate

MPEG-Java

• Flexible programmatic control system (not parametric)
• Capability for graceful degradation under limited or time-varying resources
• Capability to respond to user interaction and provide enhanced multimedia functionality

MPEG-J System

• Combines MPEG media and safe executable code (Java code)
• Components of MPEG-4 player
- Execution and presentation resources
- Decoders
- Network resources
- Scene graph
• Downloadable decoder????

MPEG-J enabled MPEG-4 System

[Figure: MPEG-J enabled MPEG-4 system. The Version 1 player comprises DMIF, DEMUX, decoding buffers 1..n, media decoders 1..n, composition buffers 1..n, the BIFS decoder, the scene graph, and the compositor and renderer. The MPEG-J application, with its buffer, class loader, and I/O devices, reaches the player through the network manager (NW API), scene graph manager (SG API), resource manager (RM API), and MD API. The legend distinguishes interface, control, data, channel, and back channel.]

FlexMux (optional)

• Multiplexing or separate channels?
- Multiplexing : circuit switching
- Separate channels : packet switching

• Multiplexing : low overhead
- The RTP/UDP/IP header (40 bytes) is large compared to an audio packet payload (20 bytes)
- Simpler than MPEG-2 TS
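
The overhead argument can be checked with simple arithmetic. The sketch below compares sending each small payload in its own RTP/UDP/IP packet with carrying several payloads in one packet behind short FlexMux headers. The 2-byte per-PDU header (one byte index, one byte length) is an assumption based on the simple-mode layout, and the function names are illustrative:

```python
RTP_UDP_IP_HEADER = 40      # bytes, as quoted on the slide
FLEXMUX_SIMPLE_HEADER = 2   # bytes: index + length (assumed simple-mode layout)


def overhead_separate(n_streams, payload):
    """Header fraction when every payload travels in its own packet."""
    headers = n_streams * RTP_UDP_IP_HEADER
    return headers / (headers + n_streams * payload)


def overhead_flexmux(n_streams, payload):
    """Header fraction when all payloads share one packet via FlexMux."""
    headers = RTP_UDP_IP_HEADER + n_streams * FLEXMUX_SIMPLE_HEADER
    return headers / (headers + n_streams * payload)
```

With four 20-byte payloads, separate packets spend two thirds of the bytes on headers, while the multiplexed packet spends 48 of 128 bytes (37.5%).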

[Figure: FlexMux-PDU formats.]

• Simple Mode : header (index, length) followed by a payload holding one SL-PDU
• MuxCode Mode : header (index, length, version) followed by a payload holding several SL-PDUs, each with its own header (H) and payload
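
A simple-mode FlexMux-PDU is just an index, a length, and one SL-PDU, so packing and unpacking take a few lines. This is a sketch under the assumption that index and length are one byte each; it treats every index as a simple-mode channel and does not model MuxCode mode or any reserved index values. Function names are hypothetical:

```python
def pack_flexmux_simple(index, sl_pdu):
    """Prefix one SL-PDU with a 1-byte index and a 1-byte length."""
    if not 0 <= index <= 255:
        raise ValueError("index must fit in one byte")
    if len(sl_pdu) > 255:
        raise ValueError("SL-PDU too long for a 1-byte length field")
    return bytes([index, len(sl_pdu)]) + sl_pdu


def unpack_flexmux_simple(stream):
    """Split a byte stream of simple-mode PDUs into (index, SL-PDU) pairs."""
    pdus, pos = [], 0
    while pos < len(stream):
        if pos + 2 > len(stream):
            raise ValueError("truncated FlexMux header")
        index, length = stream[pos], stream[pos + 1]
        payload = stream[pos + 2:pos + 2 + length]
        if len(payload) != length:
            raise ValueError("truncated FlexMux payload")
        pdus.append((index, payload))
        pos += 2 + length
    return pdus
```

The index tells the receiver which SL-packetized stream each payload belongs to, which is all the demultiplexer needs.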

MP4 File format

• (Normally) self-contained file, cf. *.asf
• Protocol-unaware, media-unaware

[Figure: MP4 file layout with hints. The moov atom holds the IOD, the trak atoms (BIFS, OD, video, audio, hint), and other atoms; the mdat atom holds the interleaved, time-ordered BIFS, OD, video, and audio access units, and hint instructions.]

MP4 File Usage

• Interchange
• Content creation : authoring
• Preparation for streaming : interleaving
• Local presentation : CD, DVD-ROM
• Streamed presentation (not yet, in IM1)

MP4 Terminology

• atom : ‘object’ in the sense of the object-oriented concept, e.g. ‘iods’ OD atom, ‘moov’ movie atom, ‘mdat’ media data atom, etc.
• trak : ES + [hint trak], e.g. video trak, audio trak
• hint trak : packetization information
• Container : file, ‘moov’, ‘mvhd’, ‘mdhd’
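
Atoms share one framing: a 32-bit big-endian size followed by a four-character type, with the payload (possibly more atoms) after that. A minimal sketch of walking that structure in memory, assuming the usual conventions that size 1 means a 64-bit size follows and size 0 means the atom runs to the end of its container; `make_atom` and `iter_atoms` are hypothetical helper names:

```python
import struct


def make_atom(kind, payload=b""):
    """Build an atom: 32-bit size (header included) + 4-byte type + payload."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


def iter_atoms(data, offset=0, end=None):
    """Yield (type, payload_start, payload_end) for each atom in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, kind = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:                      # 64-bit size follows the type
            if offset + 16 > end:
                raise ValueError("truncated 64-bit atom size")
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:                    # atom extends to end of container
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("invalid atom size")
        yield kind.decode("latin-1"), offset + header, offset + size
        offset += size
```

Calling iter_atoms again on the payload range of a container atom such as ‘moov’ lists its children, for example the trak atoms.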

Hint track

• Bridge between MPEG-4 and a protocol
• Each TransMux has its own hint track format (ES over TransMuxes)

aligned(8) class HintMediaHeaderAtom extends FullAtom(‘hmhd’, version = 0, 0) {
  unsigned int(16) maxPDUsize;
  unsigned int(16) avgPDUsize;
  unsigned int(32) maxbitrate;
  unsigned int(32) avgbitrate;
  unsigned int(32) slidingavgbitrate;
}

DMIF

• Compression Layer : media aware, delivery unaware (ISO/IEC 14496-2 Visual, ISO/IEC 14496-3 Audio)
- Elementary Stream Interface (ESI)
• Sync Layer : media unaware, delivery unaware (ISO/IEC 14496-1 Systems)
- DMIF Application Interface (DAI)
• Delivery Layer : media unaware, delivery aware (ISO/IEC 14496-6 DMIF)

DMIF Usage

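[Figure: DMIF usage scenarios. The originating application talks through the DAI and a DMIF filter to one of three originating DMIF instances: for broadcast, for local files, or for a remote service. The broadcast and local-file instances reach a target DMIF and target application within the client (fed by a broadcast source or local storage). The remote-service instance uses signalling mapping (Sigmap) and the DNI to cross an interactive network to a target DMIF and target application on the server.]

• Flows between independent systems (normative)
• Flows internal to a single system (either informative or out of DMIF scope)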

DMIF Terminology

• Service : DMIF provides a service to an application (or user).
• Service session : a local association between a DMIF instance and a service
• Network session : an association between two DMIF peers
• Channel : the path over which a DMIF user sends or receives data

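[Figure: Two DMIF users, each attached to its local DMIF instance through a service and a service session. uu data passes between the DMIF users and dd data between the DMIF instances, which are joined by a network session carrying TransMux channels over the network.]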

DMIF Terminology

• Network service primitives, exchanged between two users across the network

1. Request
2. Indication
3. Response
4. Confirm
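
The four primitives form one confirmed exchange: the initiating user issues a request, the network delivers it to the peer as an indication, the peer answers with a response, and the network hands that back to the initiator as a confirm. A toy sketch of that ordering, with a hypothetical function name and no real networking:

```python
def confirmed_service(request, responder):
    """Trace one request/indication/response/confirm exchange.

    `responder` stands in for the peer user: it receives the indication
    and returns the response.
    """
    trace = [("request", request)]          # 1. issued by the initiating user
    trace.append(("indication", request))   # 2. delivered to the peer user
    response = responder(request)
    trace.append(("response", response))    # 3. issued by the peer user
    trace.append(("confirm", response))     # 4. delivered to the initiating user
    return trace
```

The DAI and DNI primitives follow this shape: the IN parameters travel with the request and indication, the OUT parameters with the response and confirm.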

DMIF-Application Interface

• Service primitives, e.g. DA_ServiceAttach(IN: URL, uuDataInBuffer, uuDataInLen; OUT: response, serviceSessionId, uuDataOutBuffer, uuDataOutLen)
• Channel primitives, e.g. DA_ChannelDelete(IN: loop(channelHandle, reason); OUT: loop(response))
• Data primitives, e.g. DA_Data(IN: channelHandle, streamDataBuffer, streamDataLen)

DMIF Network Interface

• Session primitives : setup and release
- DN_SessionSetup(), DN_SessionRelease()
• Service primitives : attach and detach
- DN_ServiceAttach(), DN_ServiceDetach()
• TransMux primitives : setup, release, and config
- DN_TransMuxSetup(), DN_TransMuxRelease(), DN_TransMuxConfig()
• Channel primitives : add and delete

[Figure: Service attach sequence between the origin DMIF terminal (client) and the target DMIF terminal (server). On each side the application sits above the DAI and the DMIF layer; the two DMIF layers communicate through DNI + network + DNI.]

1. The application initiates the service : DA_ServiceAttach(IN: DMIF_URL, uuData)
2. Determine whether a new network session is needed
3. DN_SessionSetup(IN: nsId, CalledAddr, CallingAddr, CapDescr) (OUT: rsp, CapDescr)
4. Attach to the service : DN_ServiceAttach(IN: nsId, serviceId, serviceName, ddData)
5. Connect to the application running the service : DA_ServiceAttach(IN: ssId, serviceName, uuData)
6. The application running the service replies : (OUT: rsp, uuData)
7. DN_ServiceAttach returns : (OUT: rsp, ddData)
8. DA_ServiceAttach returns to the originating application : (OUT: rsp, ssId, uuData)

Conclusion

• Semantics and syntax
• General or specific applications
• Multimedia over [mobile] Internet
- ATM => All IP
- QoS issues (time-varying and limited)
• Start implementation with IM1 : mpeg4.nist.gov/IM1

Future Works

• Downloadable decoder, cf. SDR (software defined radio)
• All IP (<= all ATM)
• QoS control : transport layer => network layer, IETF (RSVP, diffServ, intServ, MPLS)

Abbreviations

• AU access unit
• AV audio-visual
• BIFS binary format for scenes
• CM composition memory
• CTS composition time stamp
• CU composition unit
• DAI DMIF-application interface
• DB decoding buffer
• DNI DMIF-network interface
• DTS decoding time stamp
• ES elementary stream
• ESI elementary stream interface
• ESID elementary stream identifier
• IPMP intellectual property management and protection
• OCI object content information
• OCR object clock reference
• OD object descriptor
• OTB object time base
• PLL phase locked loop
• QoS quality of service
• SDM systems decoder model
• SL synchronization layer
• SPS SL-packetized stream
• STB system time base
• URL universal resource locator
• VOP video object plane
• VRML virtual reality modeling language
