philipp merkle, aljoscha smolic karsten müller, thomas wiegand csvt 2007
Efficient Prediction Structure for Multi-view Video Coding
Philipp Merkle, Aljoscha Smolic, Karsten Müller, Thomas Wiegand
CSVT 2007
Outline
Multi-view video coding (MVC) introduction
Requirements and test conditions for MVC
Prediction structures
Experimental results
Conclusion
MVC Introduction
MVC: Multi-view Video Coding
Multi-view video (MVV): a system that uses multiple camera views of the same scene.
Usage: 3DTV, free viewpoint video (FVV), etc.
Requirements for MVC
Temporal random access
View random access
Scalability
Backward compatibility
Quality consistency
Parallel processing
Temporal and inter-view correlation
[Figure: temporal, inter-view, and temporal/inter-view mixed prediction modes]
Temporal and inter-view correlation analysis
An H.264/AVC encoder was used with the following settings:
Motion compensation block size of 16×16
Search range of ±32 pixels
Lagrange parameter (λ) of 29.5
ΔJ denotes the decrease of the average Lagrangian cost J in comparison to temporal prediction only.
Simply including temporal and inter-view prediction modes
Temporal and inter-view correlation analysis (cont’d)
Lagrangian cost function:
$$J = D + \lambda R \tag{1}$$
D denotes the distortion. R denotes the number of bits needed to transmit all components of the motion vector.
For each block $S_i$ in a picture, the algorithm chooses the MV $m_i$ within a search range $M$ that minimizes $J$:
$$m_i = \arg\min_{m \in M} \{ D(S_i, m) + \lambda R(S_i, m) \} \tag{2}$$
The distortion in the subject macroblock $B$ is calculated by:
$$D(S_i, m) = \sum_{(x, y) \in B} \left| s(x, y, t) - s(x - m_x, y - m_y, t - m_t) \right|^2 \tag{3}$$
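The search described by (1) to (3) can be sketched in a few lines of Python. This is an illustration only, not the encoder used in the experiments: the rate model `mv_bits` is a hypothetical stand-in for the real motion vector bit count, pictures are plain lists of rows, and the reference index m_t is modeled by passing each candidate reference picture (temporal or inter-view) separately to the hypothetical helper `best_mode`.

```python
# Illustrative sketch: full-search block matching that minimizes
# J = D + lambda * R for one block. Names and the rate model are assumptions.

def ssd(cur, ref, bx, by, mx, my, size):
    """Squared-error distortion D between the block at (bx, by) in `cur`
    and the block displaced by (mx, my) in `ref`, as in equation (3)."""
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            diff = cur[y][x] - ref[y - my][x - mx]
            total += diff * diff
    return total


def mv_bits(mx, my):
    """Hypothetical rate model R: bits grow with the MV magnitude."""
    return 2 + 2 * (abs(mx).bit_length() + abs(my).bit_length())


def best_motion_vector(cur, ref, bx, by, size, search_range, lam):
    """Return (J, (mx, my)) minimizing D + lam * R over the search window."""
    height, width = len(ref), len(ref[0])
    best = None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block leaves the picture.
            if not (0 <= bx - mx and bx - mx + size <= width):
                continue
            if not (0 <= by - my and by - my + size <= height):
                continue
            cost = ssd(cur, ref, bx, by, mx, my, size) + lam * mv_bits(mx, my)
            if best is None or cost < best[0]:
                best = (cost, (mx, my))
    return best


def best_mode(cur, refs, bx, by, size, search_range, lam):
    """Pick the reference picture (e.g. 'temporal' or 'inter-view')
    whose best motion vector gives the lowest Lagrangian cost J."""
    results = {
        name: best_motion_vector(cur, ref, bx, by, size, search_range, lam)
        for name, ref in refs.items()
    }
    name = min(results, key=lambda n: results[n][0])
    return name, results[name]
```

Calling `best_mode(cur, {"temporal": prev_picture, "inter-view": neighbor_view}, ...)` for every block and counting how often each reference wins gives the kind of per-mode statistics the correlation analysis reports.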
Test data and test conditions
1D camera arrangement: Ballroom, Exit, Rena, Race1, Uli (line); Breakdancers (arched)
2D camera arrangement: Flamenco2 (cross), AkkoKayo (array)
5 to 16 camera views
Targets high-quality TV-type video (640×480 or 1024×768) rather than limited-channel communication-type video.
Knowledge – hierarchical B picture, QP cascading
Hierarchical B picture, key picture, non-key picture:
[Figure: hierarchical B picture structure between two key pictures]
QP cascading [1]:
$$QP_k = QP_{k-1} + (k = 1\ ?\ 4 : 1)$$
[1] “Analysis of hierarchical B pictures and MCTF”, IEEE International Conference on Multimedia and Expo (ICME 2006), Toronto, Ontario, Canada, July 2006
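As a rough illustration of QP cascading, the sketch below applies the rule QP_k = QP_{k-1} + (k = 1 ? 4 : 1), where k is the temporal level and level 0 is the key picture. The function names are hypothetical, and `temporal_level` assumes a dyadic hierarchy with a power-of-two GOP length, which is a simplification and not necessarily the GOP configuration used in the experiments.

```python
def cascaded_qp(key_qp, level):
    """QP for temporal level `level`; level 0 is the key picture.

    Applies QP_k = QP_{k-1} + (4 if k == 1 else 1) once per level.
    """
    qp = key_qp
    for k in range(1, level + 1):
        qp += 4 if k == 1 else 1
    return qp


def temporal_level(index, gop_length):
    """Temporal level of picture `index` in a dyadic hierarchical-B GOP.

    Assumes gop_length is a power of two. Pictures whose index is a
    multiple of gop_length are key pictures (level 0).
    """
    pos = index % gop_length
    if pos == 0:
        return 0
    level = gop_length.bit_length() - 1  # log2(gop_length)
    while pos % 2 == 0:
        pos //= 2
        level -= 1
    return level


# QP per picture for one GOP of length 8 with a key-picture QP of 26.
qps = [cascaded_qp(26, temporal_level(i, 8)) for i in range(9)]
print(qps)
```

The key pictures at both ends of the GOP keep the base QP, the middle B picture gets the largest single step (+4), and each deeper level adds 1.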
Knowledge – DPB size
Decoded Picture Buffer (DPB) size is increased to [2]:
$$2 \cdot \mathit{GOP\_length} + \mathit{number\_of\_views}$$
Memory-efficient reordering of multi-view input for compression
[2] “Efficient Compression of Multi-view Video Exploiting Inter-view Dependencies Based on H.264/AVC”, IEEE International Conference on Multimedia and Expo (ICME 2006), Toronto, Ontario, Canada, July 2006
Two tasks
1. To adapt the multi-view prediction schemes to the specific camera arrangements of the test data sets.
2. To adapt the prediction structures to the random access specification.
Prediction structure
Simulcast coding structure
To allow synchronization and random access, all key pictures are coded in intra mode.
Prediction structure (cont’d)
Alternative structures of inter-view prediction for key pictures
[Figure: KS_IPP, KS_PIP, and KS_IBP structures for a linear camera arrangement and a 2D camera array]
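One way to picture the three key-picture schemes for a linear camera row is the sketch below. It assumes the scheme names describe the key-picture types along the row (I first for KS_IPP, I in the center for KS_PIP, alternating B and P after an initial I for KS_IBP). That reading, the function name, and the handling of the last view are assumptions for illustration, not definitions taken from the slides.

```python
def key_picture_types(scheme, n_views):
    """Key-picture type per view for a linear camera arrangement.

    Assumed interpretation of the scheme names; hypothetical helper.
    """
    if n_views < 1:
        raise ValueError("n_views must be at least 1")
    if scheme == "KS_IPP":
        # First view intra-coded, every other view predicted from a neighbor.
        return ["I"] + ["P"] * (n_views - 1)
    if scheme == "KS_PIP":
        # Intra-coded view in the center, P pictures toward both ends.
        types = ["P"] * n_views
        types[n_views // 2] = "I"
        return types
    if scheme == "KS_IBP":
        types = ["I"]
        for v in range(1, n_views):
            # A B picture needs a coded neighbor on each side, so the
            # last view falls back to P when it would otherwise be B.
            is_b = v % 2 == 1 and v + 1 < n_views
            types.append("B" if is_b else "P")
        return types
    raise ValueError("unknown scheme: " + scheme)


for name in ("KS_IPP", "KS_PIP", "KS_IBP"):
    print(name, " ".join(key_picture_types(name, 8)))
```

Under this reading, only one view per time instance needs an intra-coded key picture, in contrast to simulcast, where every view's key picture is intra-coded.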
Experimental results – objective evaluation
Ballroom test result
Average coding gains compared with anchor coding
Experimental results – subjective evaluation
Different bit-rates were selected for the different data sets.
Ballroom test result
Race1 test result
Experimental results – subjective evaluation
AS_IBP outperforms the anchors significantly.
The gain decreases slightly with higher bit-rates.
Average results over all test sequences
Influence of camera density
Uses the Rena sequence, which consists of 16 linearly arranged cameras with a 5 cm distance between adjacent cameras.
Repeated for each shifted set of 9 adjacent cameras.
The structures are applied to every time instance of the MVV sequence without temporal prediction.
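The shifted camera sets can be enumerated as a sliding window over the camera row. The sketch below is a minimal illustration with hypothetical function names; it only lists which cameras fall into each set and the width each set spans, and does not reproduce the coding experiment itself.

```python
def shifted_camera_sets(n_cameras=16, set_size=9):
    """All sets of `set_size` adjacent cameras, shifted one camera at a time."""
    return [
        list(range(start, start + set_size))
        for start in range(n_cameras - set_size + 1)
    ]


def baseline_cm(set_size, spacing_cm):
    """Distance between the outermost cameras of one set."""
    return (set_size - 1) * spacing_cm


sets = shifted_camera_sets()
print(len(sets), "sets, each spanning", baseline_cm(9, 5), "cm")
for cameras in sets:
    print(cameras)
```

With 16 cameras and sets of 9, this yields 8 overlapping sets, so results can be averaged over all of them.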
Results of experiments on camera density
Coding gain increases with decreasing camera distance and decreasing reconstruction quality.
Results of experiments on camera density (cont’d)
Results of average per-camera rate relative to the one-camera case (→)
A larger QP value leads to a larger coding gain
Conclusion
Resulting multi-view prediction: achieving significant coding gains and being highly flexible.
Parallel processing is supported by the presented sequential processing approach.
Problems:
Large disparities between the different views of multi-view video sequences
Illumination and color inconsistencies across views