IEEE SIGNAL PROCESSING LETTERS, VOL. 19, NO. 1, JANUARY 2012 35

Video Steganalysis Exploiting Motion Vector Reversion-Based Features

Yun Cao, Xianfeng Zhao, and Dengguo Feng

Abstract: Unlike traditional image or video steganography in the spatial/transform domain, motion vector (MV)-based methods target the internal dynamics of video compression and embed messages while performing motion estimation. However, we have noticed that some existing methods adopt nonoptimal selection rules and modify MVs in somewhat arbitrary manners that substantially violate the encoding principles. Aiming at these weaknesses, we design a calibration-based approach and propose MV reversion-based features for steganalysis. Experimental results demonstrate that the proposed features are very sensitive to the tendency of MV reversion during calibration and can be used to effectively detect some typical MV-based steganography even at low embedding rates.

Index Terms: Calibration, motion vector, MPEG, steganalysis, video.

I. INTRODUCTION

In recent years, networked multimedia applications have been significantly facilitated by high-performance networking and compression technologies. Steganography using compressed video streams can easily achieve a large capacity even with low embedding rates. Moreover, covert communications via internet television, video telephony, or video conferencing are unlikely to arouse suspicion.

The proposed steganalytic approach targets video steganography that makes use of MVs. We focus on these methods for the following two reasons. First, since the MV values are leveraged as the information carrier, the statistical characteristics of the spatial/frequency coefficients are only indirectly affected. Second, because the motion compensation technique is adopted by most advanced compression standards and the MVs are losslessly coded, little degradation of the reconstructed visual quality is introduced [2]. These advantages make MV-based steganography less detectable than methods that utilize spatial/frequency coefficients directly.

Typical MV-based steganographic methods share some features in common, i.e., they first select a subset of MVs following a predefined selection rule (SR), then make certain modifications to them for data hiding. Xu et al. [3] suggested embedding message bits in the MV magnitudes; the LSBs of the MVs' horizontal or vertical components are used for embedding. Fang and Chang [4] designed a method utilizing the MVs' phase angles. These two schemes select candidate MVs (CMVs) according to their magnitudes, on the assumption that modifications applied to MVs with larger magnitudes introduce less distortion. Later, however, Aly pointed out that the magnitude-based SR cannot ensure minimum prediction errors [2]. He therefore designed a new selection rule by which MVs associated with large prediction errors are chosen, and message bits are hidden in the LSBs of both their horizontal and vertical components.

In order to further enhance the steganographic security, in our latest work, we suggested adopting nonshared SRs and minimizing the embedding impacts by perturbed motion estimation [5].

Manuscript received June 16, 2011; revised November 02, 2011; accepted November 07, 2011. Date of publication November 15, 2011; date of current version November 28, 2011. This work was supported by the Natural Science Foundation of Beijing under Grant 4112063 and the Natural Science Foundation of China under Grant 61170281. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Dimitrios Tzovaras.

Y. Cao is with the State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China, and also with the Graduate University of the Chinese Academy of Sciences, Beijing 100049, China (e-mail: [email protected]).

X. Zhao and D. Feng are with the State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/LSP.2011.2176116

Compared to the efforts devoted to image steganalysis, video steganalysis remains largely unexplored. Most current steganalyzers (e.g., [6]-[9]) model the video data as successive still images and the embedding process as adding independent zero-mean Gaussian noise. The reliability of this model is likely to deteriorate when MV values are used for data hiding. Since MVs are leveraged, detectable MV statistical changes should be exploited when designing specific steganalysis. To the best of our knowledge, the only achievement in this direction was made by Zhang et al. [13]. Their steganalytic features are directly drawn from certain MV statistics, but are not very effective at low embedding strengths, as tested in our experiments. In this letter, we design a calibration-based approach to perform dynamic steganalysis. As will be demonstrated later, if we decompress a stego video to the spatial domain and compress it again with no embedding involved, the altered MVs are inclined to revert to their prior values. A strong tendency of MV reversion would signal the existence of hidden messages. Therefore, calibration is done by recompression, and MV reversion-based features are derived from the differences between the original and the calibrated videos.

The rest of the letter is organized as follows: In the next section, we explain the basic concepts of the motion-compensated prediction and the phenomenon of MV reversion. In Section III, we give details on feature definition and describe the implementation of the used steganalyzer. In Section IV, comparative experiments are conducted to show the performance of the proposed steganalytic features. Finally, concluding remarks are given in Section V with some future research directions.

II. THE PHENOMENON OF MV REVERSION

A. Motion-Compensated Prediction

Motion-compensated prediction is an integral part of video compression, and its basic idea is to predict the frame to be coded using one or more prior coded frames. This is possible in practice because video data is essentially a series of highly correlated still images, and the temporal redundancy can be greatly reduced by inter-frame coding. State-of-the-art video coding standards try to remove the temporal redundancy via block-based motion estimation applied to macroblocks (MBs) of pixels. A generic structure of the inter-MB coding is depicted in Fig. 1.

1070-9908/$26.00 © 2011 IEEE

Fig. 1. Generic structure of the inter-MB coding.

To encode the current MB $B$, the encoder uses one prior coded frame as the reference and searches for $B$'s approximation within it. To measure the prediction error between $B$ and one candidate MB $R$, the sum of absolute differences (SAD) is commonly used

$$\mathrm{SAD}(B, R) = \sum_{i,j} \left| b_{i,j} - r_{i,j} \right| \tag{1}$$

where $b_{i,j}$ and $r_{i,j}$ are the luminance values of $B$ and $R$. As a result, the MB with minimum SAD is taken as $B$'s best prediction and denoted by $P$. Consequently, only the MV $V$ representing the spatial displacement offset and the differential signal block $D = B - P$ need to be further coded and transmitted.
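The SAD-based search described above can be illustrated with a minimal full-search sketch. It is not the Xvid implementation: the function names, the list-of-lists frame representation, the square block shape, and the exhaustive search window are all assumptions made for illustration, and real encoders use faster search patterns and sub-pixel refinement.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized luminance blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_motion_vector(current, reference, top, left, size, search_range):
    """Exhaustively search the reference frame for the minimum-SAD candidate.

    Returns ((dx, dy), sad_value), where (dx, dy) is the displacement of the
    best candidate relative to the position of the current block.
    Hypothetical helper: integer-pixel search only.
    """
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(reference, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    if best is None:
        raise ValueError("no candidate block fits inside the reference frame")
    return best[1], best[0]


# Toy example: the current frame is the reference shifted right by one pixel.
reference = [[y * 8 + x for x in range(8)] for y in range(8)]
current = [[0] + row[:-1] for row in reference]
mv, cost = best_motion_vector(current, reference, top=2, left=2, size=4, search_range=2)
```

In this toy case the best candidate lies one pixel to the left in the reference frame, so the search returns the displacement `(-1, 0)` with a SAD of zero.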

B. MV Reversion While Recompression

Calibration [1] is well known as an image steganalytic concept which estimates the macroscopic properties of the cover from the stego image. A typical calibration-based steganalyzer reconstructs an estimation of the cover from the stego object and draws features based on the difference between the two.

Since our target steganographic methods are MV-based, we are interested in certain statistical characteristics of MVs. For compressed videos, calibration can be done by decompressing the videos to the spatial domain and compressing them again with no message embedded. As will be demonstrated in detail, the MVs altered in the first compression have the inclination to revert to their prior values. Therefore, the MVs of the calibrated videos have most macroscopic features similar to those of the clean videos.

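We focus on one inter-MB $B$ whose MV $V$ has been changed for embedding, and take a close look at what happens during calibration. Here $P$ is $B$'s best prediction, associated with $V$, and $D = B - P$. Suppose that $V$ was modified to $V'$ and the differential signal $D'$ was calculated based on the MB $P'$ associated with $V'$, i.e., $D' = B - P'$. Subsequently, $D'$ instead of $D$ underwent DCT transformation, quantization, and entropy coding before transmission, as depicted in Fig. 1.

The first step of calibration is decompression, during which $B$ is retrieved as $\hat{B} = P' + \hat{D}'$, where $\hat{D}'$ is $D'$'s reconstruction.

In the second step, recompression is performed without embedding. As in the first compression, when applying motion estimation to $\hat{B}$,¹ $\mathrm{SAD}(\hat{B}, P)$ and $\mathrm{SAD}(\hat{B}, P')$ will be calculated for comparison as

$$\mathrm{SAD}(\hat{B}, P) = \sum_{i,j} \left| \hat{b}_{i,j} - p_{i,j} \right| = \sum_{i,j} \left| (b_{i,j} - p_{i,j}) + (\hat{d}'_{i,j} - d'_{i,j}) \right| \tag{2}$$

and similarly

$$\mathrm{SAD}(\hat{B}, P') = \sum_{i,j} \left| \hat{b}_{i,j} - p'_{i,j} \right| = \sum_{i,j} \left| (b_{i,j} - p'_{i,j}) + (\hat{d}'_{i,j} - d'_{i,j}) \right| \tag{3}$$

Lemma 1 tells us that the variable $\hat{d}'_{i,j} - d'_{i,j}$ has zero mean, thus the expectations of $\mathrm{SAD}(\hat{B}, P)$ and $\mathrm{SAD}(\hat{B}, P')$ can be estimated as

$$E\big(\mathrm{SAD}(\hat{B}, P)\big) \approx \sum_{i,j} \left| b_{i,j} - p_{i,j} \right| = \mathrm{SAD}(B, P) \tag{4}$$

and similarly

$$E\big(\mathrm{SAD}(\hat{B}, P')\big) \approx \sum_{i,j} \left| b_{i,j} - p'_{i,j} \right| = \mathrm{SAD}(B, P') \tag{5}$$

Since during the first compression the inequality $\mathrm{SAD}(B, P) \le \mathrm{SAD}(B, P')$ holds, we have $E\big(\mathrm{SAD}(\hat{B}, P)\big) \le E\big(\mathrm{SAD}(\hat{B}, P')\big)$.

Then a conclusion can be drawn that, for an inter-MB whose MV has been changed for data hiding, its MV has an inclination to revert during recompression.

¹Actually, $B$'s reference frame has to be coded again before being used as $\hat{B}$'s reference frame. Because it is a compressed frame, a second compression under the same settings won't introduce much distortion compared to the first compression. Here $B$'s reference frame is used as $\hat{B}$'s reference frame as a close approximation.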

Fig. 2. Proportion differences between stego (50 bpf) and nonstego videos.

Lemma 1: For a coefficient $d$ of the differential signal, the difference between the original value and its reconstruction $\hat{d}$ has zero mean, i.e., $E(\hat{d} - d) = 0$.

Proof: In the first compression, the differential signal has to be DCT coded and quantized. Bellifemine et al. [10] pointed out that, if the motion compensation technique is used, the 2D-DCT coefficients of the differential signal tend to be less correlated. Thus the distribution of a DCT coefficient $c$ of the differential signal can be well modeled with the Laplacian probability density function [11] as

$$p(c) = \frac{\lambda}{2} e^{-\lambda |c|} \tag{6}$$

Since the commonly used quantizer divides the sample by an integer $Q$ and rounds to the nearest integer, the probability that a sample will be quantized to $kQ$ is simply the probability that the sample is between $(k - \tfrac{1}{2})Q$ and $(k + \tfrac{1}{2})Q$, calculated as

$$P(\hat{c} = kQ) = \int_{(k - 1/2)Q}^{(k + 1/2)Q} p(c)\, dc \tag{7}$$

Then the expectation of the difference introduced by quantization is

$$E(\hat{c} - c) = \sum_{k = -\infty}^{\infty} \int_{(k - 1/2)Q}^{(k + 1/2)Q} (kQ - c)\, p(c)\, dc = 0 \tag{8}$$

because $p(c)$ is symmetric about zero, so the terms for $k$ and $-k$ cancel; i.e., $E(\hat{c}) = E(c)$. Because $d$ and $\hat{d}$ are linear combinations of the coefficients $c$ and $\hat{c}$ respectively, there is $E(\hat{d}) = E(d)$, i.e., $E(\hat{d} - d) = 0$.
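The zero-mean claim of Lemma 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: a Laplacian density with an arbitrarily chosen rate parameter, a uniform round-to-nearest quantizer with step `q`, and a midpoint-rule integration whose range and step count are picked for convenience. All function names are hypothetical.

```python
import math


def laplacian_pdf(x, lam):
    """Zero-mean Laplacian density with rate parameter lam."""
    return 0.5 * lam * math.exp(-lam * abs(x))


def quantize(x, q):
    """Divide by the step q, round to the nearest integer, and rescale."""
    return q * round(x / q)


def expected_quantization_error(lam, q, half_width=60.0, steps=240000):
    """Midpoint-rule estimate of E[quantize(x) - x] for Laplacian-distributed x.

    The integration range [-half_width, half_width] is symmetric about zero
    and truncates only a negligible part of the tails for the values used here.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += (quantize(x, q) - x) * laplacian_pdf(x, lam) * h
    return total


def expected_absolute_error(lam, q, half_width=60.0, steps=240000):
    """Same integration, but for E[|quantize(x) - x|], which is not zero."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += abs(quantize(x, q) - x) * laplacian_pdf(x, lam) * h
    return total


mean_error = expected_quantization_error(lam=0.5, q=4)
mean_abs_error = expected_absolute_error(lam=0.5, q=4)
```

Under these assumptions `mean_error` vanishes up to floating-point noise while `mean_abs_error` stays clearly positive: quantization does distort individual coefficients, but the distortion is unbiased, which is all the lemma requires.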

Now if one specific MV takes a different value after calibration, we call this reversion a shift. The shift distance is calculated from the MV's values before and after calibration. Given a compressed video with hidden messages, according to the above analysis, the modified MVs are likely to have nonzero shifts after recompression. So compared to the corresponding nonstego video, the stego one is expected to have a lower proportion of zero-shift MVs and higher proportions of nonzero-shift MVs. Fig. 2 shows the proportion differences between stego and nonstego videos caused by different embedding methods. The embedding methods and the test videos used are described in Section IV-A.
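The proportions of zero-shift and nonzero-shift MVs discussed above can be computed with a short sketch like the following. Several details are assumptions rather than the letter's definitions: MVs are represented as `(horizontal, vertical)` integer pairs, the shift distance is taken as the Euclidean distance rounded to the nearest integer, and all distances at or above `max_shift` are pooled into the last bin. The function names are hypothetical.

```python
import math
from collections import Counter


def shift_distance(mv_before, mv_after):
    """Assumed shift distance: rounded Euclidean distance between two MVs."""
    dx = mv_after[0] - mv_before[0]
    dy = mv_after[1] - mv_before[1]
    return round(math.hypot(dx, dy))


def shift_proportions(mvs_before, mvs_after, max_shift):
    """Proportion of MVs at each shift distance 0..max_shift.

    mvs_before holds the MVs read from the video under test, mvs_after the
    MVs of the same macroblocks after recompression (calibration).
    Distances larger than max_shift are counted in the last bin.
    """
    if len(mvs_before) != len(mvs_after) or not mvs_before:
        raise ValueError("need two non-empty MV lists of equal length")
    counts = Counter(
        min(shift_distance(a, b), max_shift)
        for a, b in zip(mvs_before, mvs_after)
    )
    total = len(mvs_before)
    return [counts[k] / total for k in range(max_shift + 1)]


# Toy example: two MVs are unchanged, one shifts by 1, one shifts by 3.
before = [(0, 0), (2, 1), (3, 3), (1, 0)]
after = [(0, 0), (2, 1), (4, 3), (1, 3)]
proportions = shift_proportions(before, after, max_shift=2)
```

For the toy input the result is `[0.5, 0.25, 0.25]`. A stego video would be expected to show a smaller first entry and larger later entries than its nonstego counterpart.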

III. PROPOSED STEGANALYZER

A. MV Reversion-Based Features

Based on the fact that the modified MVs have the inclination to revert during recompression, we define a differential operator that maps an inter-MB to a two-element tuple

(9)

in which the MB's prediction errors before and after recompression appear. The first element of the tuple measures the MV shift distance, whereas the second measures the change in prediction error. Generally speaking, larger values in the tuple indicate a larger probability that the MB's MV has once been modified. Given a group of compressed inter-frames and the inter-MBs they contain, three types of features are defined as follows, each indexed by the MV shift distance up to a fixed upper bound.

1) Features of Type 1: These features estimate the probabilities of MV shift distances, defined as

(10)

where $|\cdot|$ denotes the cardinality of a set.

2) Features of Type 2: These features are the proportions of prediction error change that correspond to given shift distances, defined as

(11)

3) Features of Type 3: These features are derived from the type 2 features by taking the MV shift distances into account, and are defined as

(12)

B. Steganalyzer Designing

We choose five features from each of the three types and form a 15-d feature vector for training and classification. To be more specific, the first five features are of type 1:

(13)

The second and the last five features are of type 2 and type 3, respectively, and are processed analogously. The classifier is implemented using Chang and Lin's support vector machine (SVM) library [12] with the polynomial kernel.

IV. EXPERIMENTS

A. Experimental Setup

1) Test Sequences: A database of 22 CIF video sequences in the 4:2:0 YUV format is used for the experiments. Since the sequences vary in length from 90 to 2000 frames and most of them have 300 frames, each sequence is divided into nonoverlapping 75-frame subsequences, and the total number of subsequences is 111.

2) Steganographic Methods: Our experiments focus on attacking four MV-based steganographic methods, i.e., Aly's [2], Xu et al.'s [3], Fang and Chang's [4], and our recently proposed [5] methods, which are referred to as Tar1, Tar2, Tar3, and Tar4, respectively. These four targets are implemented using the well-known MPEG-4 video codec Xvid [14]. As the message bits are embedded into MVs, the embedding strength is measured by the average number of embedded bits per inter-frame (bpf). In our work, the considered embedding strength is 50 bpf.

Fig. 3. ROC curves of steganalyzers using Zhang's and our proposed features.

TABLE I
PERFORMANCE COMPARISON BETWEEN ZHANG'S (S1) AND OUR PROPOSED FEATURES (S2) USING DIFFERENT SLIDING WINDOW SIZES (WS) (IN THE UNIT OF %)

3) Training and Classification: In our experiments, 15 YUV sequences consisting of 77 subsequences are randomly selected for training purposes, and the remaining seven sequences are left for testing. All subsequences are compressed by the Xvid encoder with standard settings to produce the class of clean videos. On the other hand, for a given steganographic method, all subsequences are compressed with random messages embedded to create the class of stego videos.

We use a fixed-size sliding window to scan each subsequence without overlapping, and the steganalytic features representing the clean or stego class are extracted from the frames within the window. It can be expected that, as the window size increases, more stable statistical features can be obtained, whereas the resolution of the steganalyzer will decrease.
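The nonoverlapping scan can be sketched as a simple index computation. The helper below is hypothetical, and it assumes that frames left over at the end of a sequence (fewer than one full window) are dropped; the letter does not state how such remainders are handled. The same helper also illustrates cutting a sequence into 75-frame subsequences.

```python
def window_ranges(num_frames, window_size):
    """Return (start, end) frame index pairs of nonoverlapping windows.

    end is exclusive. Trailing frames that do not fill a complete window
    are dropped (an assumption made for this sketch).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        (start, start + window_size)
        for start in range(0, num_frames - window_size + 1, window_size)
    ]


# A 300-frame sequence split into 75-frame subsequences.
subsequences = window_ranges(300, 75)

# One 75-frame subsequence scanned with an 8-frame sliding window.
windows = window_ranges(75, 8)
```

With these inputs, a 300-frame sequence yields 4 subsequences, and an 8-frame window yields 9 feature vectors per 75-frame subsequence, with the last 3 frames unused. A larger window gives fewer but statistically more stable feature vectors per subsequence, which is the trade-off noted above.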

B. Performance Results

Besides our proposed features, C. Zhang et al.'s [13] steganalytic features are also used for comparison. The true negative (TN) rates and true positive (TP) rates are computed by counting the number of detections after a complete scan over each subsequence in the test set. The performances of the steganalyzers with different sliding window sizes are tested, and the corresponding results are recorded in Table I. As an example, the receiver operating characteristic (ROC) curves of the steganalyzers using an 8-frame sliding window are plotted in Fig. 3.
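As a reminder of how the two reported rates are obtained from per-window decisions, here is a minimal sketch. The label convention (1 for stego, 0 for clean) and the function name are assumptions for illustration, and the labels in the example are made up rather than taken from the experiments.

```python
def detection_rates(true_labels, predicted_labels):
    """Return (tp_rate, tn_rate) for binary labels, 1 = stego and 0 = clean.

    tp_rate is the fraction of stego windows classified as stego;
    tn_rate is the fraction of clean windows classified as clean.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    pairs = list(zip(true_labels, predicted_labels))
    positives = sum(1 for t, _ in pairs if t == 1)
    negatives = sum(1 for t, _ in pairs if t == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one window of each class")
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    return true_pos / positives, true_neg / negatives


# Made-up decisions for four stego windows and five clean windows.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0]
decisions = [1, 1, 1, 0, 0, 0, 0, 0, 1]
tp_rate, tn_rate = detection_rates(truth, decisions)
```

Here three of the four stego windows and four of the five clean windows are classified correctly, giving a TP rate of 0.75 and a TN rate of 0.8.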

In general, our proposed features outperform C. Zhang's by a significant margin. It is also observed that there is an evident drop in detection accuracy when testing Tar4. For Tar1-Tar3, we have pointed out in [5] that their SRs cannot guarantee minimum distortion, and the selected CMVs are modified in nonoptimal manners (e.g., LSB replacement) that substantially violate the encoding principles. As illustrated in Fig. 2(a)-(c), the modified MVs are more likely to revert after recompression. As for Tar4, since both the SR and the embedding process have been optimized in the hope that the introduced perturbations will be confused with normal motion estimation deviations, the tendency of MV reversion is suppressed, as can be seen from Fig. 2(d).

V. CONCLUSION AND FUTURE WORK

In this letter, we have presented a calibration-based steganalytic scheme against MV-based steganography. We have shown with both mathematical analysis and experiments that perturbation of regular motion estimation causes MV reversion during recompression. The proposed features, which measure the tendency of MV reversion, can be used to effectively detect some typical MV-based steganography even at a low embedding strength.

However, since our proposed features are sensitive to the tendency of MV reversion, if some optimized measures are adopted to weaken this embedding effect, the detection performance is likely to drop. In our future work, we would like to investigate how to improve the adaptability of the proposed features. Possible approaches include using higher-order features and adopting certain feature selection/fusion techniques.

REFERENCES

[1] J. Fridrich, "Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes," in Proc. IH'04, Lecture Notes in Computer Science, 2004, vol. 3200/2005, pp. 67-81.

[2] H. Aly, "Data hiding in motion vectors of compressed video based on their associated prediction error," IEEE Trans. Inf. Forensics Secur., vol. 6, no. 1, pp. 14-18, 2011.

[3] C. Xu, X. Ping, and T. Zhang, "Steganography in compressed video stream," in Proc. ICICIC'06, 2006, pp. 269-272.

[4] D. Fang and L. Chang, "Data hiding for digital video with phase of motion vector," in Proc. Int. Symposium on Circuit and Systems (ISCAS), 2006, pp. 1422-1425.

[5] Y. Cao, X. Zhao, D. Feng, and R. Sheng, "Video steganography with perturbed motion estimation," in Proc. IH'11, Lecture Notes in Computer Science, 2011, vol. 6958, pp. 193-207.

[6] U. Budhia, D. Kundur, and T. Zourntos, "Digital video steganalysis exploiting statistical visibility in the temporal domain," IEEE Trans. Inf. Forensics Secur., vol. 1, pp. 43-55, 2006.

[7] J. S. Jainsky, D. Kundur, and D. R. Halverson, "Towards digital video steganalysis using asymptotic memoryless detection," in Proc. MM&Sec'07, 2007, pp. 161-168.

[8] C. Zhang, Y. Su, and C. Zhang, "Video steganalysis based on aliasing detection," Electron. Lett., vol. 44, no. 13, pp. 801-803, 2008.

[9] V. Pankajakshan, G. Doerr, and P. K. Bora, "Detection of motion-incoherent components in video streams," IEEE Trans. Inf. Forensics Secur., vol. 4, no. 1, pp. 49-58, 2009.

[10] F. Bellifemine, A. Capellino, A. Chimienti, R. Picco, and R. Ponti, "Statistical analysis of the 2D-DCT coefficients of the differential signal for images," Signal Process.: Image Commun., vol. 4, no. 6, pp. 477-488, 1992.

[11] M. J. Gormish and J. T. Gill, "Computation-rate-distortion in transform coders for image compression," SPIE Vis. Commun. Image Process., pp. 146-152, 1993.

[12] C. Chang and C. Lin, LIBSVM: A Library for Support Vector Machines, 2001 [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm

[13] C. Zhang, Y. Su, and C. Zhang, "A new video steganalysis algorithm against motion vector steganography," in Proc. WiCOM'08, 2008, pp. 1-4.

[14] Xvid Codec 1.1.3, 2009 [Online]. Available: http://www.xvid.org/
