
Li-Wei Kang ( 康立威 )

Institute of Information Science, Academia Sinica, Taipei, Taiwan

Postdoctoral Research Fellow

Feb. 22, 2008

Distributed Video Coding for Wireless Visual Sensor Networks

Distributed Video Coding for Wireless Visual Sensor Networks, Feb. 22, 2008, at CSIE/NDHU

Outline

• Introduction

• Distributed Source Coding (DSC)

• Distributed Video Coding (DVC)

• DVC for Wireless Visual Sensor Networks (WVSN)

• Concluding Remarks

• References


Introduction

• Conventional video coding
  - MPEG-1/2/4, H.261, H.263, H.26L, H.264/AVC
  - Interframe predictive coding
  - Encoder is 5-10 times more complex than the decoder
  - Suitable for video down-link

[Figure: interframe encoder and interframe decoder. Xi is encoded and decoded to Xi’ using the previously reconstructed frame X’i-1.]

[Girod, 2002]


Conventional Video Coding

[Aramvith]



[Lin, NTHU, 2007]


Transformation and Quantization

[Lin, NTHU, 2007]


Interframe Predictive Video Coding

[Lin, NTHU, 2007]


Motion Estimation

[Lin, NTHU, 2007]


Motion Compensated Prediction

[Lin, NTHU, 2007]


Applications of Conventional Video Coding

[Pereira, 2007]


Introduction

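[Figure: intraframe encoder and interframe decoder. Xi is encoded on its own; the decoder reconstructs Xi’ using side information from the previously decoded frame Xi-1’.]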

• Problem: low-complexity video encoding for resource-limited video devices

• DSC approach: Wyner-Ziv video coding with low-complexity intraframe encoding and possibly high-complexity interframe decoding with side information only available at decoder

[Girod, 2002]


Applications of Low-Complexity Video Coding

• Wireless video cameras
• Wireless low-power surveillance
• Mobile document scanner
• Video conferencing with mobile devices
• Mobile video mail
• Disposable video cameras
• Wireless Visual Sensor Networks
• Networked camcorders
• Distributed video streaming
• Multiview video entertainment
• Wireless capsule endoscopy

[Pereira, 2007]



[Pereira, 2007]


Wireless Visual Sensor Networks

[Akyildiz, 2007, and Pereira, 2007]


Wireless Visual Sensor Networks

[Akyildiz, 2002]


Introduction

• Requirements of wireless visual sensor networks
  - low-complexity video encoder
  - high compression efficiency

• Current approaches
  - distributed video coding (DVC) based on distributed source coding (DSC)
  - collaborative image coding and transmission
  - hybrid approach (proposed approach)


Distributed Source Coding (DSC)

• Lossless DSC: Slepian and Wolf, 1973
• Lossy DSC: Wyner and Ziv, 1976
• Distributed video coding (DVC) based on DSC
  - Girod, Stanford University, 2002~
    B. Girod, A. M. Aaron, S. Rane, and D. Rebollo-Monedero, “Distributed video coding,” Proceedings of the IEEE, vol. 93, no. 1, pp. 71-83, Jan. 2005.
    Special session on Distributed video coding, 2005 IEEE International Conference on Image Processing (ICIP2005), Italy, Sept. 2005
  - Ramchandran, Berkeley, 2002~
    R. Puri, A. Majumdar, and K. Ramchandran, “PRISM: a video coding paradigm with motion estimation at the decoder,” IEEE Trans. on Image Processing, vol. 16, no. 10, pp. 2436-2448, Oct. 2007.
    R. Puri, A. Majumdar, P. Ishwar, and K. Ramchandran, “Distributed video coding in wireless sensor networks,” IEEE Signal Processing Magazine, vol. 23, no. 4, pp. 94-106, July 2006.


Distributed Source Coding

• DISCOVER (Distributed Coding for Video Services), 2005~
  - Six major European universities: UPC, IST, EPFL, UH, INRIA, UNIBS
  - F. Pereira, L. Torres, C. Guillemot, T. Ebrahimi, R. Leonardi, and S. Klomp, “Distributed video coding: selecting the most promising application scenarios,” to appear in Signal Processing: Image Communication.
  - C. Guillemot, F. Pereira, L. Torres, T. Ebrahimi, R. Leonardi, and J. Ostermann, “Distributed monoview and multiview video coding: basics, problems and recent advances,” IEEE Signal Processing Magazine, special issue on signal processing for multiterminal communication systems, vol. 24, no. 5, pp. 67-76, Sept. 2007.
  - M. Maitre, C. Guillemot, and L. Morin, “3-D model-based frame interpolation for distributed video coding of static scenes,” IEEE Trans. on Image Processing, vol. 16, no. 5, pp. 1246-1257, May 2007.
  - Special session on Distributed source coding, 2007 IEEE International Conference on Image Processing (ICIP2007), USA, Sept. 2007
  - DISCOVER Workshop on Recent Advances in Distributed Video Coding, Lisbon, Portugal, Nov. 2007
  - http://www.discoverdvc.org/



• X, Y in S = {000, 001, 010, 011, 100, 101, 110, 111}
• H(X) = H(Y) = 3 bits
• If d(X, Y) ≤ 1, where d is the Hamming distance, the rate for X can be reduced from H(X) = 3 bits to H(X|Y) = 2 bits
• For example, if Y = 000 and d(X, Y) ≤ 1, then X in {000, 001, 010, 100} => H(X|Y) = 2
• A possible solution:
  - S is divided into four disjoint sets, each holding two words at Hamming distance 3: {000, 111}, {100, 011}, {010, 101}, {001, 110}
  - At the encoder, if X = 100, only the 2-bit index of the set {100, 011} is sent
  - At the decoder, X = 100 is correctly decoded from Y = 000 and the correlation between X and Y, d(X, Y) ≤ 1, because 100 is the only member of {100, 011} within distance 1 of Y

• X: source data to be encoded, Y: the side information of X
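The 3-bit example above can be checked with a short sketch. It assumes d(X, Y) is the Hamming distance and uses the four sets listed on the slide; the function names `encode` and `decode` are illustrative, not from the talk.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

# The four disjoint sets from the slide; each holds two words at distance 3.
COSETS = [("000", "111"), ("100", "011"), ("010", "101"), ("001", "110")]

def encode(x: str) -> int:
    """Return the 2-bit index of the set containing x (all that is sent)."""
    for index, coset in enumerate(COSETS):
        if x in coset:
            return index
    raise ValueError(f"{x!r} is not a 3-bit word")

def decode(index: int, y: str) -> str:
    """Pick the member of the indexed set closest to the side information y."""
    return min(COSETS[index], key=lambda word: hamming(word, y))

# Exhaustive check: decoding succeeds for every pair with d(X, Y) <= 1.
words = ["".join(bits) for bits in product("01", repeat=3)]
all_ok = all(
    decode(encode(x), y) == x
    for x in words
    for y in words
    if hamming(x, y) <= 1
)
print(encode("100"), decode(encode("100"), "000"), all_ok)
```

The encoder never sees Y; it only relies on the promise that Y differs from X in at most one bit.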



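Slepian-Wolf Theorem, 1973

[Figure: X and Y are statistically dependent. They are encoded by separate encoders at rates R_X and R_Y and decoded by a joint decoder that outputs (X, Y).]

$$R_X \ge H(X|Y), \qquad R_Y \ge H(Y|X), \qquad R_X + R_Y \ge H(X, Y)$$

Wyner-Ziv Theorem, 1976

[Figure: X is encoded at rate R^{WZ}_{X|Y}(d); the decoder outputs a reconstruction of X using the side information Y, which is statistically dependent on X.]

$$R^{WZ}_{X|Y}(d) \ge R_{X|Y}(d)$$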

[Girod, 2002]



Slepian-Wolf Theorem, 1973

[Figure: achievable rate region, with axes R_X [bits] and R_Y [bits] marked at H(X|Y), H(X), H(Y|X), and H(Y). Separate encoding and decoding of X and Y needs R_X ≥ H(X) and R_Y ≥ H(Y); separate encoding and joint decoding of X and Y extends the region down to the line R_X + R_Y = H(X, Y).]

[Girod, 2002]
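As a numeric illustration of the Slepian-Wolf bounds, the sketch below computes the entropies for an assumed joint distribution: X uniform over the eight 3-bit words and Y uniform over the four words within Hamming distance 1 of X. This distribution is an assumption chosen to match the earlier 3-bit example, not something stated on the slides.

```python
from itertools import product
from math import log2

def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

words = ["".join(bits) for bits in product("01", repeat=3)]

# Assumed joint pmf: 8 choices of X times 4 choices of Y, all equally likely.
joint = {(x, y): 1 / 32 for x in words for y in words if hamming(x, y) <= 1}

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(pmf, axis):
    """Marginalize a joint pmf over pairs onto one coordinate (0 or 1)."""
    out = {}
    for pair, p in pmf.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

h_xy = entropy(joint)
h_x = entropy(marginal(joint, 0))
h_y = entropy(marginal(joint, 1))
h_x_given_y = h_xy - h_y  # chain rule: H(X|Y) = H(X,Y) - H(Y)
h_y_given_x = h_xy - h_x

print(h_x, h_y, h_xy, h_x_given_y, h_y_given_x)
```

Under this assumption, separate decoding needs 3 + 3 bits, while joint decoding needs only H(X, Y) bits in total.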



[Figure: predictive interframe encoder and predictive interframe decoder; X is coded to X’ with side information Y.]

[Girod, 2006]


Distributed Video Coding based on Wyner-Ziv Theorem

[Figure: Wyner-Ziv intraframe encoder and Wyner-Ziv interframe decoder, shown alongside a “Motion JPEG” encoder and “Motion JPEG” decoder; X is coded to X’, with side information Y at the decoder.]

[Girod, 2006]


Wyner-Ziv Video Coding

• K: key frame, conventional intraframe encoding

• X: Wyner-Ziv frame, Wyner-Ziv video encoding

• The corresponding side information Y of X is generated at decoder based on interpolation of the previous decoded frames

[Girod, 2003]
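A minimal sketch of side information generation follows. It assumes the simplest possible interpolation, a pixel-wise average of two decoded neighboring frames with no motion compensation; practical codecs typically use motion-compensated interpolation instead. Frames are plain lists of rows, and the function name is illustrative.

```python
def interpolate_side_information(prev_frame, next_frame):
    """Estimate the frame between two decoded frames by pixel-wise averaging.

    Assumption: no motion compensation; this is only a stand-in for the
    interpolation step that produces the side information Y.
    """
    if len(prev_frame) != len(next_frame):
        raise ValueError("frames must have the same number of rows")
    return [
        [(a + b + 1) // 2 for a, b in zip(row_prev, row_next)]  # rounded mean
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]

decoded_prev = [[10, 20], [30, 40]]
decoded_next = [[20, 20], [31, 50]]
side_info = interpolate_side_information(decoded_prev, decoded_next)
print(side_info)
```

The better Y approximates X, the fewer parity bits the decoder has to request.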


Side Information Generation

[Ebrahimi, 2006][Guo, 2006]



[Figure: (a) the original frame (X); (b) the corresponding side information (Y) generated at the decoder.]

[Girod, 2003]



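[Figure: Wyner-Ziv codec. Wyner-Ziv encoder: quantizer followed by channel encoder. Wyner-Ziv decoder: channel decoder followed by minimum-distortion reconstruction, both using the side information Y, which is related to X through a “correlation channel”.]

[Figure: practical realization. Wyner-Ziv encoder: scalar quantizer and turbo encoder. Wyner-Ziv decoder: turbo decoder and reconstruction of X’ using Y. The turbo encoder and turbo decoder form the Slepian-Wolf codec.]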

[Girod, 2002]


Pixel-domain Wyner-Ziv Video Coding

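[Figure: pixel-domain Wyner-Ziv codec.
Intraframe encoder: Wyner-Ziv frames X → scalar quantizer → turbo encoder → buffer. Key frames K → conventional intraframe encoding.
Interframe decoder: conventional intraframe decoding gives K’; interpolation/extrapolation of decoded frames gives the side information Y; the turbo decoder uses Y and sends “request bits” back to the buffer; reconstruction gives X’.]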

[Girod, 2003]


Scalar Quantization

• Scalar quantization in pixel domain

[Figure: (a) the original frame; (b) the corresponding 16-gray-level quantized frame.]

[Girod, 2003]
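The pixel-domain quantization and reconstruction steps can be sketched as follows. The 16-level uniform quantizer for 8-bit pixels follows the slide; the reconstruction rule (clamping the side information into the decoded quantization bin) is a commonly used simplification that is assumed here and is not spelled out in the talk.

```python
LEVELS = 16            # 16 gray levels, as on the slide
STEP = 256 // LEVELS   # bin width for 8-bit pixels (assumed uniform quantizer)

def quantize(pixel: int) -> int:
    """Map an 8-bit pixel value to its quantization index (0..15)."""
    return pixel // STEP

def reconstruct(index: int, side_info: int) -> int:
    """Assumed reconstruction rule: keep the side-information value if it
    lies in the decoded bin, otherwise move it to the nearest bin edge."""
    low = index * STEP
    high = low + STEP - 1
    return min(max(side_info, low), high)

q = quantize(100)  # the pixel 100 falls in bin [96, 111]
print(q, reconstruct(q, 103), reconstruct(q, 90), reconstruct(q, 200))
```

With this rule the reconstruction error is bounded by the bin width even when the side information is poor.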


Turbo Encoder

[Figure: turbo encoder. The L input bits X feed two systematic convolutional encoders of rate (n-1)/n, the second one through an interleaver of length L. The L systematic bits of each encoder are discarded, and each encoder outputs L/(n-1) parity bits (X_P^1 and X_P^2), for 2L/(n-1) output bits in total.]

• For each input block of n - 1 bits, the turbo encoder produces codewords of length n composed of the actual input bits and one parity bit

[Girod, 2002]


Turbo Decoder


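[Figure: turbo decoder. Two SISO decoders exchange extrinsic and a priori probabilities (P_extrinsic, P_a priori) through an interleaver and a deinterleaver of length L. Each SISO decoder receives channel probabilities (P_channel) calculated from its L/(n-1) received parity bits (X_P^1 or X_P^2) and from the side information Y through P(x|y). The a posteriori probabilities (P_a posteriori) feed the decision block, which outputs the L bits of X.]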

[Girod, 2002]


Simulation Results

[Figure panels: side information; after Wyner-Ziv decoding with 16-level quantization]

[Girod, 2003]


Simulation Results

[Girod, 2003]


Transform-domain Wyner-Ziv Video Coding

[Figure: transform-domain Wyner-Ziv codec.
Intraframe encoder: WZ frames W → DCT → for each transform band k, X_k → 2^{M_k}-level quantizer → q_k → extract bit-planes (bit-plane 1, bit-plane 2, …, bit-plane M_k) → turbo encoder → buffer. Key frames K → conventional intraframe coding.
Interframe decoder: conventional intraframe decoding gives K’; the side information Y is transformed by a DCT into Y_k; the turbo decoder uses Y_k, sends “request bits” to the buffer, and outputs q_k’; reconstruction gives X_k’; IDCT gives the decoded WZ frames W’.]

[Girod, 2004]


• Each coefficient band is quantized using a scalar quantizer with $2^{M_k}$ levels, where $2^{M_k} \in \{1, 2, 4, \ldots, 256\}$.

[Figure: WZ frame W → 4x4 DCT → X_k → $2^{M_k}$-level quantizer → q_k, for each transform band k]

• The combination of quantizers determines the bit allocation across bands.

• M_k = number of bit planes for the kth coefficient band

[Figure: sample quantizers; values represent the number of quantization levels for each coefficient band]

[Girod, 2004]



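[Figure: q_k → extract bit-planes (bit-plane 1, bit-plane 2, …, bit-plane M_k) → turbo encoder → buffer → turbo decoder → q_k’. The turbo decoder uses the side information Y_k and sends “request bits” back to the buffer.]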

• Bit planes of coefficients are encoded independently but decoded successively

• Rate-compatible punctured turbo code (RCPT)
  - Flexibility for varying statistics
  - Bit rate controlled by decoder through feedback channel

• Turbo decoder can perform joint source channel decoding

[Girod, 2004]
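Bit-plane extraction for one coefficient band can be sketched as below. The most-significant-plane-first ordering and the helper names are assumptions for illustration; the slides only state that the M_k bit planes are encoded independently and decoded successively.

```python
def extract_bit_planes(indices, num_planes):
    """Split quantization indices into bit planes, most significant first.

    indices: quantized coefficients q_k of one band, each in [0, 2**num_planes).
    Returns num_planes lists, each holding one bit per coefficient.
    """
    return [
        [(q >> shift) & 1 for q in indices]
        for shift in range(num_planes - 1, -1, -1)
    ]

def merge_bit_planes(planes):
    """Rebuild the indices from decoded bit planes (inverse of the above)."""
    indices = [0] * len(planes[0])
    for plane in planes:
        indices = [(q << 1) | bit for q, bit in zip(indices, plane)]
    return indices

band = [5, 0, 7, 2]                    # example indices with M_k = 3
planes = extract_bit_planes(band, 3)
print(planes, merge_bit_planes(planes))
```

Each plane would be passed to the turbo encoder on its own, while the decoder can use already decoded planes when decoding the next one.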


Simulation Results

[Figure panels: side information; Wyner-Ziv coding, 370 kbps]

[Girod, 2004]


Simulation Results

H263 Intraframe Coding 330 kbps, 32.9 dB

Wyner-Ziv Coding 274 kbps, 39.0 dB

[Girod, 2004]


Simulation Results

H263 interframe coding 145 kbps, 40.4 dB

Wyner-Ziv Coding 156 kbps, 37.5 dB

[Girod, 2004]


Simulation Results

[Girod, 2004]

3 dB

8 dB


DISCOVER DVC Codec

• Based on the feedback channel solution from Stanford Univ.

• Based on a split between Wyner-Ziv (WZ) and key frames

• Key frames used with a regular (GOP size) or dynamic periodicity

• Key frames coded with H.264/AVC Intraframe encoding

[Pereira, 2007]


Simulation Results

[Pereira, 2007]


DVC for Wireless Visual Sensor Networks (WVSN)

[Figure: WVSN architecture showing the sensor field, visual sensor nodes (VSN), an aggregation and forwarding node (AFN), wireless links, the Internet or satellite, and the remote control unit (RCU).]


Conventional Multiview Video Coding

[Figure: multiview video coding structure combining inter-view and temporal prediction]

[Kubota, 2007]


Global Motion Estimation

[Lin, NTHU, 2007] [Ebrahimi, 2007]


Multiview Distributed Video Coding

[Ebrahimi, 2006]


Multiview Distributed Video Coding

Temporal side information

Inter-view side information

[Ebrahimi, 2007]


Simulation Results

[Ebrahimi, 2007]


Collaborative Image Coding and Transmission

[1] M. Wu and C. W. Chen, “Collaborative image coding and transmission over Wireless Sensor Networks,” EURASIP Journal on Advances in Signal Processing, special issue on Visual Sensor Networks, 2007.

[2] K. Y. Chow, K. S. Lui, and E. Y. Lam, “Efficient on-demand image transmission in visual sensor networks,” EURASIP Journal on Advances in Signal Processing, special issue on Visual Sensor Networks, 2007.


Proposed Multiview DVC

• The proposed low-complexity video codec is based on two ideas:
  - motion estimation is shifted to the decoder
  - low-complexity image matching is performed at the encoder based on image warping and robust media hashing

• L. W. Kang and C. S. Lu, “Low-complexity power-scalable multi-view distributed video encoder,” in Proc. of 2007 Picture Coding Symposium, Lisbon, Portugal, Nov. 2007.

• L. W. Kang and C. S. Lu, “Multi-view distributed video coding with low-complexity inter-sensor communication over wireless video sensor networks,” in Proc. of 2007 IEEE Int. Conf. on Image Processing, special session on Distributed source coding II: Distributed video and image coding and their applications, San Antonio, TX, USA, Sept. 2007, vol. 3, pp. 13-16 (invited paper).

• L. W. Kang and C. S. Lu, “Low-complexity Wyner-Ziv video coding based on robust media hashing,” in Proc. of IEEE Int. Workshop on Multimedia Signal Processing, Victoria, BC, Canada, Oct. 2006, pp. 267-272.

P.S. Co-author: Prof. Chun-Shien Lu (呂俊賢), Associate Research Fellow, Institute of Information Science, Academia Sinica


Robust Media Hashing

• A compact representation for a frame



Structural digital signature (SDS)

[Figure: the wavelet decomposition for a frame, and a parent node p with its four child nodes c1, c2, c3, c4.]

Only the parent-child pair with the maximum magnitude difference (Diff) among the four pairs in a “parent-four children” group is selected:

$$\mathrm{Diff} = \max_{1 \le k \le 4} \mathrm{Diff}_k = \max_{1 \le k \le 4} \big|\, |p| - |c_k| \,\big|$$

C. S. Lu and H. Y. M. Liao, “Structural digital signature for image authentication: an incidental distortion resistant scheme,” IEEE Trans. on Multimedia, vol. 5, no. 2, pp. 161-173, June 2003.



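• Labeling an SDS
  - each parent-four children group is represented by one symbol sym(p, c), where (p, c) is the pair with the maximum magnitude difference
  - the signature symbol sym(p, c) of a parent-child pair (p, c) is defined as follows:

$$\mathrm{sym}(p, c) = \begin{cases} +1, & \text{if } |p| \ge |c| \text{ and } p \ge 0, \\ -1, & \text{if } |p| \ge |c| \text{ and } p < 0, \\ +2, & \text{if } |p| < |c| \text{ and } c \ge 0, \\ -2, & \text{if } |p| < |c| \text{ and } c < 0. \end{cases}$$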
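A sketch of SDS labeling for one parent-four children group is given below. It assumes the symbol rule from the structural digital signature scheme: +1 or -1 when |p| ≥ |c| (sign taken from p), and +2 or -2 when |p| < |c| (sign taken from c), where c is the child with the largest magnitude difference from the parent. The function name and the sample coefficient values are hypothetical.

```python
def sds_symbol(parent, children):
    """Return the SDS symbol for a parent wavelet coefficient and its four
    children, under the assumed labeling rule described above."""
    # Select the child whose magnitude differs most from the parent's.
    child = max(children, key=lambda c: abs(abs(parent) - abs(c)))
    if abs(parent) >= abs(child):
        return 1 if parent >= 0 else -1
    return 2 if child >= 0 else -2

# Hypothetical coefficient values, one group per call.
symbols = [
    sds_symbol(10, [1, -3, 4, 9]),
    sds_symbol(-8, [1, 2, 3, 4]),
    sds_symbol(1, [0, 6, 2, 1]),
    sds_symbol(-2, [1, -9, 3, 2]),
]
print(symbols)
```

Because the symbol depends only on the relative magnitudes and one sign, it tends to survive mild distortions of the frame, which is what makes it usable as a robust hash.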


Proposed Single-view DVC

An illustrated example for encoding with GOP = 4

L. W. Kang and C. S. Lu, “Low-complexity Wyner-Ziv video coding based on robust media hashing,” in Proc. of 2006 IEEE Int. Workshop on Multimedia Signal Processing, Victoria, BC, Canada, Oct. 2006, pp. 267-272 (MMSP2006).

• Frames: F52 (key frame), F53 (non-key frame), F54 (non-key frame), F55 (non-key frame), F56 (key frame)
• SDS extraction is applied to every frame
• SDS comparison and non-key bits generation:
  - F53: S53: +1, 0, +1, +2, -1, … is compared with S52 (= SR53): +1, 0, +1, +2, -2, … => non-key bits for F53
  - F55: S55: -1, 0, +1, +2, -1, … is compared with S56 (= SR55): -1, 0, +1, +2, -2, … => non-key bits for F55
  - F54: S54: +1, 0, +1, -2, -1, … is compared with SR54: +1, 0, +1, -2, +1, …, the SDS extracted after simple interpolation => non-key frame bits for F54

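The SDS comparison in this example can be sketched as follows. The sketch assumes that the comparison looks for positions where the SDS of a non-key frame differs from its reference SDS; how those differences are turned into the actual non-key bits is not specified on the slide, so `differing_symbols` is only a hypothetical helper. Only the first five symbols shown on the slide are used.

```python
def differing_symbols(sds, reference_sds):
    """Return (position, symbol) for every position where the frame's SDS
    differs from the reference SDS (assumed form of the comparison)."""
    return [
        (i, s)
        for i, (s, r) in enumerate(zip(sds, reference_sds))
        if s != r
    ]

s53, sr53 = [+1, 0, +1, +2, -1], [+1, 0, +1, +2, -2]   # SR53 = S52
s54, sr54 = [+1, 0, +1, -2, -1], [+1, 0, +1, -2, +1]   # SR54 from interpolation
s55, sr55 = [-1, 0, +1, +2, -1], [-1, 0, +1, +2, -2]   # SR55 = S56

diff_f53 = differing_symbols(s53, sr53)
diff_f54 = differing_symbols(s54, sr54)
diff_f55 = differing_symbols(s55, sr55)
print(diff_f53, diff_f54, diff_f55)
```

In all three cases only the fifth symbol differs, which suggests how little information may need to be sent when consecutive frames are similar.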


Proposed Multiview DVC

• Consider several adjacent VSNs observing the same target scene in a WVSN

• For each VSN, Vs, an input video sequence is divided into several GOPs, in which a GOP consists of a key frame, Ks,t, followed by several non-key frames, Ws,t

[Figure: VSNs V0, V1, and V2 observing the target scene]

A simple example of the GOP structure for a WVSN with Nsensor = 3, where GOP_S0 = 1, GOP_S1 = 4, and GOP_S2 = 2:

VSN / Time instant | t    | t + 1  | t + 2  | t + 3  | t + 4  | •••
V0                 | K0,t | K0,t+1 | K0,t+2 | K0,t+3 | K0,t+4 | •••
V1                 | K1,t | W1,t+1 | W1,t+2 | W1,t+3 | K1,t+4 | •••
V2                 | K2,t | W2,t+1 | K2,t+2 | W2,t+3 | K2,t+4 | •••
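The GOP structure in this example can be reproduced with a few lines. The sketch assumes that every GOP starts with its key frame at time offsets that are multiples of the GOP size, which matches the table for GOP sizes 1, 4, and 2; the function name is illustrative.

```python
def frame_types(gop_size, num_frames):
    """Return 'K' (key) or 'W' (non-key) for time offsets 0..num_frames-1,
    assuming each GOP starts with its key frame."""
    return ["K" if t % gop_size == 0 else "W" for t in range(num_frames)]

gop_sizes = {"V0": 1, "V1": 4, "V2": 2}   # GOP sizes from the example
schedule = {vsn: frame_types(size, 5) for vsn, size in gop_sizes.items()}

for vsn, types in schedule.items():
    print(vsn, " ".join(types))
```

A VSN with a larger GOP size sends fewer intra-coded key frames and relies more on the decoder.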


Key Frame Encoding

• Key frames
  - each key frame is first encoded using the H.264/AVC intra-frame encoder
  - the global motion estimation between the key frames from adjacent VSNs is performed at the decoder (RCU)
  - the estimated motion parameters between each pair of key frames from adjacent VSNs are sent back to the corresponding VSNs via the feedback channel


Global Motion Estimation between the Key Frames from Adjacent VSNs

[Figure: an example of a WVSN. VSNs Vi, Vj, and Vk observe the target scene; the key frames Ki,t and Kj,t reach the decoder at the RCU through the AFN and the Internet; the global motion parameters return over the feedback channel.]

Decoder at the RCU: perform global motion estimation between decoded Ki,t and Kj,t, and send back the estimated motion parameters via the feedback channel.


Key Frame Encoding

[Figure: encoding of the key frame K1,48 at V1, with V0, V1, and Vk observing the target scene. The decoded key frame K’0,48 from V0 is warped (Ќ0,48) and compared with K1,48:]

(a) Co-located block MSE calculation and comparison
(b) Block-based SDS extraction and comparison
(c) Significant wavelet coefficients extraction

Significant wavelet coefficients for K1,48 → quantization and entropy encoding → compressed bitstream for K1,48


Non-key Frame Encoding

• Based on hash comparisons
• Block coding mode selection (Intra, Inter, or Skip)
  - for each frame, all the blocks are sorted in increasing order of their PSNR values (calculated with their co-located blocks in the reference frame from the same VSN)

B(1), B(2), •••, B(i) | T1 | B(i+1), B(i+2), •••, B(j) | T2 | B(j+1), •••, B(k)

PSNR(1) ≤ PSNR(2) ≤ ••• ≤ PSNR(i+1) ≤ ••• ≤ PSNR(k)

  - B(1) to B(i): blocks with Intra mode (H.264/AVC intra-frame encoding)
  - B(i+1) to B(j): blocks with Inter mode (SDS extraction and comparison)
  - B(j+1) to B(k): blocks with Skip mode
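The block mode selection can be sketched as below. The sketch assumes that T1 and T2 are PSNR thresholds with T1 < T2, so that the blocks with the lowest PSNR (the most changed) get Intra mode and those with the highest PSNR get Skip mode; the slide does not say whether the thresholds apply to PSNR values or to positions in the sorted list, and the block names and values are hypothetical.

```python
def select_modes(block_psnrs, t1, t2):
    """Assign a coding mode to each block from its PSNR against the
    co-located block in the reference frame (assumed threshold rule).

    block_psnrs: {block_id: psnr_in_dB}; t1 < t2 are PSNR thresholds.
    Returns a list of (block_id, mode) sorted by increasing PSNR.
    """
    if t1 >= t2:
        raise ValueError("expected t1 < t2")
    result = []
    for block_id, psnr in sorted(block_psnrs.items(), key=lambda kv: kv[1]):
        if psnr < t1:
            mode = "intra"   # large change: H.264/AVC intra-frame encoding
        elif psnr < t2:
            mode = "inter"   # moderate change: SDS extraction and comparison
        else:
            mode = "skip"    # nearly unchanged: nothing is sent
        result.append((block_id, mode))
    return result

psnrs = {"b0": 22.0, "b1": 31.5, "b2": 45.0, "b3": 28.0}
modes = select_modes(psnrs, t1=25.0, t2=40.0)
print(modes)
```

Raising T1 or lowering T2 shifts blocks toward Intra or Skip mode, which trades encoder effort and bitrate against quality.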


Non-key Frame Encoding for Blocks with Inter Mode

[Figure: encoding of the inter-mode blocks of W1,45 at V1, with V0 and V1 observing the target scene.]

Using the reference frame R1,45 = K1,44 from the same VSN:
(a) Co-located blocks comparison
(b) Block-based SDS extraction and comparison
(c) Initial significant symbols extraction

Using the SDS for K’0,45, obtained from K0,45 of V0 after warping, and the initial significant symbols for W1,45:
(d) Block-based SDS extraction and comparison
(e) True significant symbols extraction

Significant wavelet coefficients for W1,45 → quantization and entropy encoding → compressed bitstream for the blocks with inter mode in W1,45


Simulation Results

[Figure: rate-distortion curves, PSNR (dB, 28 to 42) versus bitrate (kbps, 0 to 1000), for H.264 Inter (GOP = ∞), Proposed (GOP = 4), Multi (GOP = 4), Single (GOP = 4), and H.264 Intra (GOP = 1).]


Concluding Remarks

• Low-complexity video coding has become a very active research topic
• Distributed video coding (DVC) based on distributed source coding (DSC) has become a new paradigm for low-complexity video coding
• Further research
  - side information generation
  - transformation and quantization
  - channel coding
  - rate control
  - other DSC-related applications: multimedia authentication, biometrics security, layered video coding, error resilience for standard video coding
  - other low-complexity video coding architectures


References

[1] F. Pereira, L. Torres, C. Guillemot, T. Ebrahimi, R. Leonardi, and S. Klomp, “Distributed video coding: selecting the most promising application scenarios,” to appear in Signal Processing: Image Communication.

[2] C. Guillemot, F. Pereira, L. Torres, T. Ebrahimi, R. Leonardi, and J. Ostermann, “Distributed monoview and multiview video coding: basics, problems and recent advances,” IEEE Signal Processing Magazine, vol. 24, no. 5, pp. 67-76, Sept. 2007.

[3] M. Maitre, C. Guillemot, and L. Morin, “3-D model-based frame interpolation for distributed video coding of static scenes,” IEEE Trans. on Image Processing, vol. 16, no. 5, pp. 1246-1257, May 2007.

[4] R. Puri, A. Majumdar, and K. Ramchandran, “PRISM: a video coding paradigm with motion estimation at the decoder,” IEEE Trans. on Image Processing, vol. 16, no. 10, pp. 2436-2448, Oct. 2007.

[5] R. Puri, A. Majumdar, P. Ishwar, and K. Ramchandran, “Distributed video coding in wireless sensor networks,” IEEE Signal Processing Magazine, vol. 23, no. 4, pp. 94-106, July 2006.

[6] B. Girod, A. M. Aaron, S. Rane, and D. Rebollo-Monedero, “Distributed video coding,” Proceedings of the IEEE, vol. 93, no. 1, pp. 71-83, Jan. 2005.

[7] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, “The DISCOVER codec: architecture, techniques and evaluation,” in Proc. of 2007 Picture Coding Symposium, Lisbon, Portugal, Nov. 2007.


Our Preliminary Publications

[1] L. W. Kang and C. S. Lu, “Low-complexity power-scalable multi-view distributed video encoder,” in Proc. of Picture Coding Symposium, Lisbon, Portugal, Nov. 2007 (PCS2007).

[2] L. W. Kang and C. S. Lu, “Multi-view distributed video coding with low-complexity inter-sensor communication over wireless video sensor networks,” in Proc. of IEEE Int. Conf. on Image Processing, special session on Distributed Source Coding II: Distributed Image and Video Coding and Their Applications, San Antonio, TX, USA, Sept. 2007, vol. 3, pp. 13-16 (ICIP2007, invited paper).

[3] L. W. Kang and C. S. Lu, “Low-complexity Wyner-Ziv video coding based on robust media hashing,” in Proc. of IEEE Int. Workshop on Multimedia Signal Processing, Victoria, BC, Canada, Oct. 2006, pp. 267-272 (MMSP2006).

[4] L. W. Kang and C. S. Lu, “Wyner-Ziv video coding with coding mode-aided motion compensation,” in Proc. of IEEE Int. Conf. on Image Processing, Atlanta, GA, USA, Oct. 2006, pp. 237-240 (ICIP2006).
