Robust Speech Watermarking Procedure in the Time-Frequency Domain



Hindawi Publishing Corporation, EURASIP Journal on Advances in Signal Processing, Volume 2008, Article ID 519206, 9 pages, doi:10.1155/2008/519206

    Research Article

Robust Speech Watermarking Procedure in the Time-Frequency Domain

    Srdjan Stankovic, Irena Orovic, and Nikola Zaric

    Electrical Engineering Department, University of Montenegro, 81000 Podgorica, Montenegro

    Correspondence should be addressed to Irena Orovic, [email protected]

    Received 18 January 2008; Accepted 16 April 2008

    Recommended by Gloria Menegaz

An approach to speech watermarking based on time-frequency signal analysis is proposed. As a time-frequency representation suitable for speech analysis, the S-method is used. The time-frequency characteristics of the watermark are modeled by using speech components in the selected region. The modeling procedure is based on the concept of time-varying filtering. A detector form that includes cross-terms in the Wigner distribution is proposed. Theoretical considerations are illustrated by examples. The efficiency of the proposed procedure has been tested for several signals and under various attacks.

Copyright © 2008 Srdjan Stankovic et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

    1. INTRODUCTION

Digital watermarking has been developed as an effective solution for multimedia data protection. Watermarking usually assumes embedding of a secret signal that should be robust and imperceptible within the host data. Also, reliable watermark detection must be provided. A number of the proposed watermarking techniques refer to speech and audio signals [1]. Some of them are based on the spread-spectrum method [2-4], while others are related to the time-scale method [5, 6], or fragile content features combined with robust watermarking [7].

The existing watermarking techniques are mainly based on either the time or the frequency domain. However, in both cases, the time-frequency characteristics of the watermark do not correspond to the time-frequency characteristics of the speech signal. This may cause watermark audibility, because the watermark will be present in time-frequency regions where speech components do not exist. In this paper, a time-frequency-based approach for speech watermarking is proposed. The watermark in the time-frequency domain is modeled to follow specific speech components in the selected time-frequency regions. Additionally, in order to provide its imperceptibility, the energy of the watermark is adjusted to the energy of the speech components. In image watermarking, an approach based on the two-dimensional space/spatial-frequency distribution has already been proposed in [8]. However, it is not appropriate in the case of speech signals.

Among all time-frequency representations, the spectrogram is the simplest one. However, it has a low time-frequency resolution. On the other hand, the Wigner distribution, as one of the commonly used representations, produces a large amount of cross-terms in the case of multicomponent signals. Thus, the S-method, as a cross-terms-free time-frequency representation, can be used for speech analysis. The watermark is created by modeling the time-frequency characteristics of a pseudorandom sequence according to certain time-frequency speech components. The main problem in these applications is the inversion of the time-frequency distributions. A procedure based on time-varying filtering has been proposed in [9]. The Wigner distribution has been used to create a time-varying filter that identifies the support of a monocomponent chirp signal. However, it cannot be used in the case of multicomponent speech signals. Also, some interesting approaches to extracting signal components from the time-frequency plane have been proposed in [10, 11].

In this work, the time-varying filtering, based on the cross-terms-free time-frequency representation, is adapted for speech signals and the watermarking purpose. Namely, this concept is used to identify the support of certain speech components in the time-frequency domain and to model the


watermark according to these components. The basic idea of this approach has been introduced in [12]. The time-varying filtering is also used to overcome the problem of inverse mapping from the time-frequency domain. Additionally, a reliable procedure for blind watermark detection is provided by modifying the correlation detector in the time-frequency domain. It is based on the Wigner distribution, because the presence of cross-terms improves detection results [13]. Therefore, the main advantage of the proposed method is in providing efficient watermark detection with low probabilities of error under a set of strong attacks. The payload provided by this procedure is suitable for various applications [1].

The paper is organized as follows. Time-frequency representations and the concept of time-varying filtering are presented in Section 2. A proposal for watermark embedding and detection is given in Section 3. The evaluation of the proposed procedure is performed by various examples and tests in Section 4. Concluding remarks are given in Section 5.

    2. THEORETICAL BACKGROUND

Time-frequency representations of speech signals and the concept of time-varying filtering will be considered in this section.

    2.1. Time-frequency representation of speech signals

Time-frequency representations have been used for speech signal analysis. The Wigner distribution, as one of the commonly used time-frequency representations, in its pseudo-form is defined as

WD(n, k) = 2 \sum_{m=-N/2}^{N/2} w(m)\, w^{*}(-m)\, f(n+m)\, f^{*}(n-m)\, e^{-j4\pi mk/N},   (1)

where f represents the signal (the asterisk denotes complex conjugation), w is the window function, N is the window length, while n and k are the discrete time and frequency variables, respectively. However, if we represent a multicomponent signal (such as speech) as a sum of M components f_i(n), that is, f(n) = \sum_{i=1}^{M} f_i(n), its Wigner distribution produces a large amount of cross-terms:

WD_f(n, k) = \sum_{i=1}^{M} WD_{f_i}(n, k) + 2\,\mathrm{Real}\Big\{ \sum_{i=1}^{M} \sum_{j>i} WD_{f_{ij}}(n, k) \Big\},   (2)

where WD_{f_i}(n, k) are the autoterms, while WD_{f_{ij}}(n, k), for i \neq j, represent the cross-terms. In order to preserve the autoterms concentration as in the Wigner distribution, and to reduce the presence of cross-terms, the S-method (SM) has been introduced [14]:

SM(n, k) = \sum_{l=-L}^{L} P(l)\, STFT(n, k+l)\, STFT^{*}(n, k-l),   (3)

where P(l) is a finite frequency-domain window of length 2L + 1, while STFT is the short-time Fourier transform defined as STFT(n, k) = \sum_{m=-N/2}^{N/2} w(m)\, f(n+m)\, e^{-j2\pi mk/N}, with window function w(m). Thus, the SM of a multicomponent signal, whose components do not overlap in the time-frequency plane, represents the cross-terms-free Wigner distribution of the individual signal components. By taking the rectangular window P(l), the discrete form of the SM can be written as

SM(n, k) = \left| STFT(n, k) \right|^{2} + 2\,\mathrm{Real}\Big\{ \sum_{l=1}^{L} STFT(n, k+l)\, STFT^{*}(n, k-l) \Big\}.   (4)

Note that the terms in the summation improve the quality of the spectrogram (squared module of the short-time Fourier transform) toward the quality of the Wigner distribution.

The window P(l) should be wide enough to enable the complete summation over the autoterms. At the same time, to remove the cross-terms, it should be narrower than the distance between the autoterms. The convergence within P(l) is very fast, so that high autoterms concentration is obtained with only a few summation terms. Thus, in many applications L < 5 can be used [14]. Unlike the Wigner distribution, oversampling in the time domain is not necessary, since the aliasing components will be removed in the same way as the cross-terms. More details about the S-method can be found in [14, 15].
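The discrete S-method of (4) is straightforward to compute from an STFT matrix. The following is a minimal NumPy sketch, not the authors' implementation; the rectangular analysis window, the hop size of one sample, and the circular handling of the frequency edges are assumptions made here for brevity.

```python
import numpy as np

def stft(x, N):
    """Hop-1 STFT with a rectangular window of length N.
    Rows index time n, columns index frequency k."""
    frames = np.stack([x[n:n + N] for n in range(len(x) - N + 1)])
    return np.fft.fft(frames, axis=1)

def s_method(S, L):
    """Discrete S-method, Eq. (4): the spectrogram plus twice the real part
    of STFT values shifted by +l and -l in frequency."""
    SM = np.abs(S) ** 2
    for l in range(1, L + 1):
        # np.roll(S, -l, axis=1)[:, k] == S[:, k + l]; the frequency axis wraps.
        SM += 2 * np.real(np.roll(S, -l, axis=1) * np.conj(np.roll(S, l, axis=1)))
    return SM
```

For L = 0 the S-method reduces to the spectrogram, and increasing L moves it toward the Wigner distribution, as described above.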

Compared to other quadratic time-frequency distributions, the S-method provides a significant saving in computation time. The number of complex multiplications for the S-method is N(3 + L)/2, while the number of complex additions is N(6 + L)/2 [14] (N is the number of samples within the window w(m)). In the case of the Wigner distribution, these numbers are significantly larger: N(4 + log2 N)/2 for complex multiplications and N log2 2N for complex additions. It is important to note that the S-method allows a simple and efficient hardware realization, which has already been done [16, 17].
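The counts above are plain arithmetic over the formulas quoted from [14]; the short sketch below tabulates them per time instant, assuming N is a power of two.

```python
import math

def sm_ops(N, L):
    """S-method cost per time instant: N(3+L)/2 complex multiplications
    and N(6+L)/2 complex additions."""
    return N * (3 + L) // 2, N * (6 + L) // 2

def wd_ops(N):
    """Wigner distribution cost per time instant: N(4 + log2 N)/2 complex
    multiplications and N*log2(2N) complex additions."""
    lg = int(math.log2(N))
    return N * (4 + lg) // 2, N * (lg + 1)
```

For N = 256 and L = 5, the S-method needs 1024 multiplications and 1408 additions versus 1536 and 2304 for the Wigner distribution.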

    2.2. Time-varying filtering

Time-varying filtering is used in order to obtain a watermark with specific time-frequency properties, as well as to provide the inverse transform from the time-frequency domain. In the sequel, the general concept of time-varying filtering is presented.

For a given signal x, the pseudo-form of time-varying filtering, suitable for numerical realizations, has been defined as [18]

Hx(t) = \int h\Big(t + \frac{\tau}{2},\, t - \frac{\tau}{2}\Big)\, w(\tau)\, x(t + \tau)\, d\tau,   (5)

where w is a lag window, \tau is the lag coordinate, while h represents the impulse response of the time-varying filter. The time-varying transfer function, that is, the support function, has been


defined as the Weyl symbol mapping of the impulse response into the time-frequency domain [18]:

L_H(t, \omega) = \int h\Big(t + \frac{\tau}{2},\, t - \frac{\tau}{2}\Big)\, e^{-j\omega\tau}\, d\tau,   (6)

where t and \omega are the time and frequency variables, respectively.

Thus, by using the support function (6), the filter output can be obtained as [18]

Hx(t) = \frac{1}{2\pi} \int L_H(t, \omega)\, STFT_x(t, \omega)\, d\omega.   (7)

The discrete form of the above relation can be written as

Hx(n) = \frac{1}{N} \sum_{k=-N/2}^{N/2} L_H(n, k)\, STFT_x(n, k),   (8)

where STFT_x is the STFT of the input signal x, while N is the length of the window w(m). According to (8), by using the STFT of a pseudorandom sequence and a suitable support function, a watermark with specific time-frequency characteristics will be obtained [12]. The support function will be defined in the form of a time-frequency mask that corresponds to certain speech components.

3. WATERMARKING PROCEDURE USING TIME-FREQUENCY REPRESENTATION

A method for time-frequency-based speech watermarking is proposed in this section. The watermark is embedded in the components of a voiced speech part. It is modeled to follow the time-frequency characteristics of significant speech formants. Furthermore, the procedure for watermark detection in the time-frequency domain is proposed.

    3.1. Watermark sequence generation

In order to select the speech components for watermarking, the region D in the time-frequency plane, that is, D = \{(t, \omega) : t \in (t_1, t_2),\ \omega \in (\omega_1, \omega_2)\}, is considered (see Figure 1). The time instances t_1 and t_2 correspond to the start and the end of a voiced speech part. The voice activity detector, that is, the word end-points detector [19-21], is used to select the voiced part of the speech signal. The strongest formants are selected within the frequency interval (\omega_1, \omega_2).

The time-frequency characteristics of the watermark within the region D can be modeled by using the support function defined as

L_M(t, \omega) = \begin{cases} 1, & \text{for } (t, \omega) \in D, \\ 0, & \text{for } (t, \omega) \notin D. \end{cases}   (9)

Thus, the support function L_M will be used to create a watermark with specific time-frequency characteristics. In order to use only the strongest formant components, an energy floor \xi is introduced. Thus, the function L_M can be modified as

L_M(t, \omega) = \begin{cases} 1, & \text{for } (t, \omega) \in D \text{ and } SM_x(t, \omega) > \xi, \\ 0, & \text{for } (t, \omega) \notin D \text{ or } SM_x(t, \omega) \le \xi, \end{cases}   (10)

Figure 1: Illustration of the region D (time 0-1200 ms, frequency 0-4 kHz).

where SM_x(t, \omega) represents the SM of the speech signal. Since the energy floor is used to avoid watermarking of weak components, an appropriate expression for \xi is given by \xi = \lambda \cdot 10\log_{10}(\max(SM_x(t, \omega))), where \max(SM_x(t, \omega)) is the maximal value of the signal's S-method in the region D, while \lambda is a parameter with values between 0 and 1. A higher \lambda means that stronger components are taken. It is assumed that the significant components within the region are approximately of the same strength. This means that only a few closest formants should be considered within the region D. Therefore, if different time-frequency regions are used for watermarking, each energy floor should be adapted to the strength of the maximal component within the considered region. It is important to note that, generally, the value \xi is not necessary for the detection procedure, as will be explained later.

The pseudorandom sequence p is the input of the time-varying filter. According to (8), the watermark is obtained as

w_{key}(n) = \frac{1}{N} \sum_{k=-N/2}^{N/2} L_M(n, k)\, STFT_p(n, k),   (11)

where STFT_p(n, k) is the discrete STFT of the sequence p. Since the watermark is modeled by using the function L_M, it will be present only within the specified region where the strong signal components exist.

Finally, the watermark embedding is done according to

x_w(n) = x(n) + w_{key}(n).   (12)

    3.2. Watermark detection

The watermark detection is performed in the time-frequency domain by using the correlation detector. The time instances t_1 and t_2 are determined by using the voice activity detector. It is not necessary that the detector contains the information about the frequency range (\omega_1, \omega_2) of the region D. Namely, the correlation can be performed along the entire frequency range of the signal, but it is only effective within (\omega_1, \omega_2) (region D), where the watermark components exist. Moreover, the information about the range (\omega_1, \omega_2) can be extracted from the watermark time-frequency representation.


The detector response must satisfy the following:

\sum_{(t, \omega) \in D} STFT_{x_w}(t, \omega)\, STFT^{*}_{w_{key}}(t, \omega) > T,   (13)

where STFT_{x_w}(t, \omega) and STFT_{w_{key}}(t, \omega) represent the short-time Fourier transforms of the watermarked signal and of the watermark, respectively, while T is a threshold. The detector response for any wrong trial (a sequence created in the same manner as the watermark) should not be greater than the threshold value.

The support function L_M and the energy floor are not required in the detection procedure. The function L_M can be extracted from the watermark and used to model other sequences that will act as wrong trials, or it simply does not have to be used. Namely, the detection can be performed even by using the STFT of the nonmodeled pseudorandom sequence p (used to create the watermark). The watermark is included in the sequence p, and the correlation will take effect only at the time-frequency positions of the watermark. The remaining parts of the sequence p have the same influence on the detection as in the case of wrong trials.

A significant improvement of the watermark detection is obtained if the cross-terms in the time-frequency plane are included. Namely, for the calculation of the SM in the detection stage, a large window length L can be chosen. For a window length greater than the distance between the autoterms, cross-terms appear:

\sum_{\substack{i, j = 1 \\ j > i}}^{M}\ \sum_{l = L_{\min} + 1}^{N/2} \mathrm{Real}\big\{ STFT_i(n, k + l)\, STFT_j^{*}(n, k - l) \big\} \neq 0,   (14)

where L_min is the minimal distance between the autoterms. Thus, by increasing L in (4), the SM approaches the Wigner distribution (for L = N/2 the Wigner distribution is obtained). An interesting approach to signal detection, based on the Wigner distribution, is proposed in [13], where the presence of cross-terms increases the number of components used in detection. Namely, apart from the autoterms, the watermark is included in the cross-terms as well. Therefore, by using the time-frequency domain with the cross-terms included, the watermark detection can be significantly relaxed and improved, since the watermark is spread over a large number of components within the considered region. If the cross-terms are considered, the correlation detector in the time-frequency domain can be written as

Det = \sum_{i=1}^{N} SM^{i}_{w_{key}}\, SM^{i}_{x_w} + \sum_{\substack{i, j = 1 \\ i \neq j}}^{N} SM^{i,j}_{w_{key}}\, SM^{i,j}_{x_w},   (15)

where the first summation includes the autoterms, while the second one includes the cross-terms.
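A sketch of the detector (15): computing the S-method with a large L leaves the cross-terms in the plane, so the two sums of (15) collapse into a single correlation of the two S-method planes. This is an illustrative reading of the equation, not the authors' code, and the hop-1 STFT layout is assumed.

```python
import numpy as np

def s_method(S, L):
    # Discrete S-method, Eq. (4); with a large L the cross-terms are kept.
    SM = np.abs(S) ** 2
    for l in range(1, L + 1):
        SM += 2 * np.real(np.roll(S, -l, axis=1) * np.conj(np.roll(S, l, axis=1)))
    return SM

def detector(S_xw, S_wkey, L=32):
    """Eq. (15): correlate the S-method planes of the watermarked signal
    and of the candidate watermark (auto- and cross-terms together)."""
    return float(np.sum(s_method(S_wkey, L) * s_method(S_xw, L)))
```

The response is then compared against the threshold T; wrong trials, built the same way as the watermark, should stay below it.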

Since the cross-terms contribute to the watermark detection, they should be included in other existing detector structures. For example, the locally optimal detector based on the generalized Gaussian distribution of the watermarked coefficients, in the presence of cross-terms in the time-frequency domain, can be written as

Det = \sum_{i=1}^{N} SM^{i}_{w_{key}}\, \mathrm{sgn}\big(SM^{i}_{x_w}\big)\, \big|SM^{i}_{x_w}\big|^{\alpha - 1} + \sum_{\substack{i, j = 1 \\ i \neq j}}^{N} SM^{i,j}_{w_{key}}\, \mathrm{sgn}\big(SM^{i,j}_{x_w}\big)\, \big|SM^{i,j}_{x_w}\big|^{\alpha - 1},   (16)

where \alpha is the shape parameter of the generalized Gaussian distribution.

The performance of the proposed detector is tested by using the following measure of detection quality [22, 23]:

R = \frac{\overline{D}_{wr} - \overline{D}_{ww}}{\sqrt{\sigma^{2}_{wr} + \sigma^{2}_{ww}}},   (17)

where \overline{D} and \sigma^{2} represent the mean value and the variance of the detector responses, respectively, while the indexes wr and ww indicate the right and wrong keys (trials).

The watermarking procedure has been done for different right keys (watermarks). For each of the right keys, a certain number of wrong trials is generated in the same manner as the right keys.

The probability of error P_err is calculated by using

P_{err} = p_{D_{ww}} \int_{T}^{\infty} P_{D_{ww}}(x)\, dx + p_{D_{wr}} \int_{-\infty}^{T} P_{D_{wr}}(x)\, dx,   (18)

where the indexes wr and ww have the same meaning as in the previous relation, T is a threshold, while equal priors p_{D_{ww}} = p_{D_{wr}} = 1/2 are assumed. By considering a normal distribution for P_{D_{ww}} and P_{D_{wr}} and \sigma^{2}_{wr} = \sigma^{2}_{ww}, the minimization of P_err leads to the following relation:

P_{err} = \frac{1}{4}\,\mathrm{erfc}\Big(\frac{R}{2}\Big) - \frac{1}{4}\,\mathrm{erfc}\Big(-\frac{R}{2}\Big) + \frac{1}{2} = \frac{1}{2}\,\mathrm{erfc}\Big(\frac{R}{2}\Big).   (19)

By increasing the value of R, the probability of error decreases. For example, P_err(R = 2) = 0.0896, P_err(R = 3) = 0.0169, while P_err(R = 4) = 0.0023.
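Equations (17) and (19) amount to two lines of code; `erfc` is the standard complementary error function from Python's math module, and the function names here are ours.

```python
import math
from statistics import mean, pvariance

def detection_quality(D_wr, D_ww):
    """Eq. (17): separation of right-key and wrong-key detector responses."""
    return (mean(D_wr) - mean(D_ww)) / math.sqrt(pvariance(D_wr) + pvariance(D_ww))

def p_err(R):
    """Eq. (19): minimal error probability under the Gaussian assumption."""
    return 0.5 * math.erfc(R / 2)
```

This reproduces the quoted values P_err(R = 3) ≈ 0.0169 and P_err(R = 4) ≈ 0.0023, and for R = 7.5 (the no-attack case of Example 3) it gives an error probability of the order 10^-8.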

    4. EXAMPLES

The efficiency of the proposed procedure is demonstrated on several examples, where signals with various maximal frequencies and signal-to-noise ratios (SNRs) are used. The successful detection in the time-frequency domain is performed in the case without attacks as well as with a set of strong attacks.

Example 1. The speech signal with f_max = 4 kHz is considered. This maximal frequency is used to provide an appropriate illustration of the proposed method. The STFT was calculated by using a rectangular window with 256 samples for the time-varying filtering. Zero padding up to 1024 samples was carried out, and the parameter L = 5 was used in the SM calculation. The region D (Figure 2(a)) is selected to cover the first three low-frequency formants of the voiced speech part. The corresponding support function L_M (Figure 2(b)) is created by using the value \xi with parameter \lambda = 0.7.


Figure 2: (a) Region D of the analyzed speech signal, (b) support function (time 450-700 ms, frequency 0-0.8 kHz).

The selection of the voiced speech part is done by using the word end-points detector based on the combined Teager energy and energy-entropy features [20, 21] (nonoverlapping speech frames of length 8 milliseconds are used). The original and watermarked signals are given in Figure 3(a).

The obtained SNR is higher than 20 dB, which fulfills the constraint of watermark imperceptibility [24]. The watermark imperceptibility has also been proven by using the ABX listening test, where A, B, and X are the original, the watermarked, and either the original or the watermarked signal, respectively. The listener listens to A and B. Then, the listener listens to X and decides whether X is A or B. Since A, B, and X are a few seconds long, the entire signals are listened to, not only isolated segments. Three female and seven male listeners with normal hearing participated in the listening test. The test was performed a few times, and from the obtained statistics it was concluded that the listeners cannot positively distinguish between the watermarked and original signals.

In order to illustrate the efficiency of the proposed detector form, an isolated watermarked speech part is considered. However, the method is not limited to this particular speech part; depending on the required data payload, various voiced speech parts can be used to embed and detect the watermark. The detection is performed by using 100 trials with wrong keys. The responses of the standard correlation detector for the STFT coefficients are given in Figure 3(b), while the responses of the detector defined by (15) are shown in Figures 3(c) and 3(d) (for window lengths L = 10 and L = 32, resp.). The detector response for the right key is normalized to the value 1, while the responses for the wrong keys are proportionally presented.

Observe that for the same right key and the same set of wrong trials, an improvement of the detection results is achieved by increasing the parameter L (see Figure 3). Thus, it is obvious that the detector performance increases with the number of cross-terms. In the following experiments, L = 32 has been used to provide reliable detection. Further increasing of L does not improve the results significantly. Note that a window of width N + 1 (for L = N/2), like in the Wigner distribution, can cause the presence of cross-terms that do not contain the watermark, since they could result from two nonwatermarked autoterms. These cross-terms are not desirable in the watermark detection procedure.

Additionally, we have performed experiments with a few other speech signals. For each signal, the low-frequency formants are used, and the watermark has been embedded with approximately the same SNR (around 24 dB). The detection is performed by using (15) with L = 32. We present the results for three of them in Figure 4. Note that the obtained results are very similar to the ones in Figure 3(d). Thus, the detection performance is insensitive to different signals tested under the same conditions.

Example 2. In the previous example, the low-frequency formants have been considered. However, different frequency regions can be used. Thus, the procedure is also tested for a watermark modeled according to the middle-frequency formants. The detection results are given in Figure 5(a) (f_max = 4 kHz and L = 32). The ratio between the detector responses for the right key and the wrong trials is lower than in the previous example with low-frequency formants, but still satisfactory. The obtained SNR is 28 dB. In addition, the middle-frequency formants of a signal with f_max = 11.025 kHz have been considered. The results of the watermark detection are given in Figure 5(b) (L = 32 and SNR = 32 dB). The extended frequency range enables more space for watermarking. Thus, it allows embedding a watermark with lower strength, providing a higher SNR.

Example 3 (evaluation of detection efficiency and robustness to attacks). In order to evaluate the efficiency of the proposed procedure by using the measure of detection quality defined by (17), we repeated the procedure for 50 trials (for 50 right keys, i.e., watermarks). They are modeled corresponding to the low-frequency formants. For each of the right keys, 60 wrong keys (trials) are generated in the same manner as the right keys. The average


Figure 3: (a) Original and watermarked signals, (b) detection results for STFT coefficients, (c) detection results for SM coefficients and L = 10, (d) detection results for SM coefficients and L = 32 (SNR = 24 dB).

SNR is around 27 dB. The watermark imperceptibility has been proven by using the ABX listening test as in the first example. Again, the watermarked signal is perceptually similar to the original one. The detection is performed by using the correlation detector that includes cross-terms in the time-frequency domain (L = 32). The responses of the proposed detector for the right and wrong keys are shown in Figure 6. The threshold is set as T = (\overline{D}_{wr} + \overline{D}_{ww})/2, where \overline{D}_{wr} and \overline{D}_{ww} represent the mean values of the detector responses for the right keys (watermarks) and the wrong trials, respectively. The calculated measure of detection quality is R = 7.5, which means that the probability of detection error is equal to 5 \times 10^{-8}. The obtained probabilities of error for the other signals (tested in Example 1) are of the order of 10^{-8} as well.

In the sequel, the procedure is tested under various attacks, such as MP3 compression at different bit rates, time scaling,


Figure 4: Detection results for three out of all tested signals.

Figure 5: Detection results for a watermark modeled to follow the middle-frequency formants: (a) f_max = 4 kHz, (b) f_max = 11.025 kHz.

pitch scaling, echo, amplitude normalization, and so forth. The results of the detection, in terms of the quality measure R and the corresponding probabilities of detection error P_err, are given in Table 1. Most of the attacks are realized by using CoolEditPro v2.0, while the rest of the processing is done in Matlab 7.

Note that many of the considered attacks are strong, and they introduce a significant signal distortion. For example, in the existing audio watermarking procedures, the usually applied time scaling is up to 4%, wow and flutter up to 0.5% or 0.7%, and echo 50 milliseconds or 100 milliseconds [4, 25]. We have applied stronger attacks to show that, even in this case, the proposed method provides high robustness with very low probabilities of detection error (see Table 1). Note that these results were obtained with a higher watermark bit rate (more details will be provided in the


Figure 6: The responses of the proposed detector for 50 right keys and 3000 wrong trials.

Table 1: Measures of detection quality and probabilities of error for different attacks.

    Attack                                                        R      Perr
    No attack                                                     7.5    ~10^-8
    MP3 (constant bit rate: 8 kbps)                               6.92   ~10^-7
    MP3 (variable bit rate: 75-120 kbps)                          6.8    ~10^-7
    MP3 (variable bit rate: 40-50 kbps)                           6.23   ~10^-6
    Delay: mono light echo (180 ms, mixing 20%)                   6.9    ~10^-7
    Echo (200 ms)                                                 6.8    ~10^-7
    Time stretch (15%)                                            6.2    ~10^-6
    Wow (delay 20%)                                               6.3    ~10^-6
    Bright flutter (depth 10, sweeping rate 5 Hz)                 6.8    ~10^-7
    Deep flutter (central freq. 1000 Hz, sweeping rate 5 Hz,
      sinusoidal mode, low-pass filter)                           6.82   ~10^-7
    Amplitude: normalize (100%)                                   6.95   ~10^-7
    Wow (delay 10%) and bright flutter                            6.72   ~10^-6
    Pitch scaling (5%)                                            5.6    ~10^-5
    Additive Gaussian noise (SNR = 35 dB)                         6.9    ~10^-7

next subsection). The time-scale modification (TSM) is one of the challenging attacks in audio watermarking that has especially been considered in the recent literature [24]. Very few algorithms can resist these desynchronization attacks [24]. Here, we have applied TSM (time stretch) up to 15% by using the software tool CoolEditPro v2.0. However, the low probability of detection error is still maintained. Only in the case of pitch scaling was the obtained measure of detection quality lower (see Table 1), but still satisfactory.

Apart from the very low probabilities of detection error, an additional advantage of the proposed detection is in providing more flexibility with respect to the desynchronization between the frequencies of the watermark sequence embedded in the signal and the watermark sequence used for detection. The correlation effects are enhanced since the detection is performed within the whole time-frequency region, covered by a large number of cross-terms apart from the autoterms.

In the sequel, the achieved payload and some related applications are given.

    4.1. Data payload

In this example, we have used a single voiced part to embed a pseudorandom sequence that represents one bit of information. The approximate length of the watermark, obtained as a modeled pseudorandom sequence, is 1000 samples (125 milliseconds for a signal sampled at 8000 Hz). The data payload varies between 4 bps and 8 bps, depending on the duration of the voiced speech regions. In the case of a speech signal sampled at 44100 Hz, the achievable data payload is 22 bps. In this way, we have provided a required compromise between data payload and robustness. Thus, the proposed algorithm can be efficiently used for copyright and ownership protection, and copy and access control [1].

Note that the data payload can be increased by using shorter sequences. If we consider a watermark sequence with 500 samples (corresponding to 62.5 milliseconds of a signal sampled at 8000 Hz), the data payload is doubled (up to 16 bps). However, the probability of detection error increases to 10^{-4}. On the other hand, the probability of detection error can decrease even below 10^{-8} by considering lower watermark bit rates.
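The payload arithmetic above is easy to check; the numbers below restate the 8000 Hz case from the text (one bit per modeled watermark sequence).

```python
fs = 8000                       # sampling rate (Hz)
bit_len = 1000                  # watermark sequence length in samples

# One bit per 1000-sample sequence:
assert bit_len / fs == 0.125    # 125 ms per bit -> at most 8 bps
assert fs / bit_len == 8.0

# Halving the sequence to 500 samples doubles the upper bound:
assert 500 / fs == 0.0625       # 62.5 ms per bit -> up to 16 bps
assert fs / 500 == 16.0
```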

    5. CONCLUSION

An efficient approach to watermarking of speech signals in the time-frequency domain is presented. It is based on the cross-terms-free S-method and the time-varying filtering used for watermark modeling. The watermark imperceptibility is provided by adjusting the location and the strength of the watermark to the selected speech components within the time-frequency region. Also, an efficient watermark detection based on the use of cross-terms in the time-frequency domain is provided. The number of cross-terms employed in the detection procedure is controlled by the window length used in the calculation of the S-method. The experimental results demonstrate that the procedure assures convenient and reliable watermark detection, providing a low probability of error. The successful watermark detection has been demonstrated in the case of various attacks.


    ACKNOWLEDGMENT

This work is supported by the Ministry of Education and Science of Montenegro.

    REFERENCES

    [1] N. Cvejic, "Algorithms for audio watermarking and steganography," Academic Dissertation, University of Oulu, Oulu, Finland, 2004.

    [2] H. J. Kim, "Audio watermarking techniques," in Proceedings of the Pacific Rim Workshop on Digital Steganography, Kyushu Institute of Technology, Kitakyushu, Japan, July 2003.

    [3] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Transactions on Image Processing, vol. 6, no. 12, pp. 1673–1687, 1997.

    [4] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.

    [5] C.-P. Wu, P.-C. Su, and C.-C. Jay Kuo, "Robust and efficient digital audio watermarking using audio content analysis," in Security and Watermarking of Multimedia Contents II, vol. 3971 of Proceedings of SPIE, pp. 382–392, San Jose, Calif, USA, January 2000.

    [6] M. F. Mansour and A. H. Tewfik, "Audio watermarking by time-scale modification," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 3, pp. 1353–1356, Salt Lake City, Utah, USA, May 2001.

    [7] M. Steinebach and J. Dittmann, "Watermarking-based digital audio data authentication," EURASIP Journal on Applied Signal Processing, vol. 2003, no. 10, pp. 1001–1015, 2003.

    [8] S. Stankovic, I. Djurovic, and I. Pitas, "Watermarking in the space/spatial-frequency domain using two-dimensional Radon-Wigner distribution," IEEE Transactions on Image Processing, vol. 10, no. 4, pp. 650–658, 2001.

    [9] S. Kay and G. Boudreaux-Bartels, "On the optimality of the Wigner distribution for detection," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '85), vol. 10, pp. 1017–1020, Tampa, Fla, USA, April 1985.

    [10] B. Barkat and K. Abed-Meraim, "Algorithms for blind components separation and extraction from the time-frequency distribution of their mixture," EURASIP Journal on Applied Signal Processing, vol. 2004, no. 13, pp. 2025–2033, 2004.

    [11] C. Ioana, A. Jarrot, A. Quinquis, and S. Krishnan, "A watermarking method for speech signals based on the time-warping signal processing concept," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol. 2, pp. 201–204, Honolulu, Hawaii, USA, April 2007.

    [12] S. Stankovic, I. Orovic, N. Zaric, and C. Ioana, "An approach to digital watermarking of speech signals in the time-frequency domain," in Proceedings of the 48th International Symposium ELMAR Focused on Multimedia Signal Processing and Communications, pp. 127–130, Zadar, Croatia, June 2006.

    [13] B. Boashash, "Time-frequency signal analysis," in Advances in Spectrum Analysis and Array Processing, S. Haykin, Ed., chapter 9, pp. 418–517, Prentice Hall, Englewood Cliffs, NJ, USA, 1991.

    [14] L. Stankovic, "A method for time-frequency signal analysis," IEEE Transactions on Signal Processing, vol. 42, no. 1, pp. 225–229, 1994.

    [15] I. Djurovic and L. Stankovic, "A virtual instrument for time-frequency analysis," IEEE Transactions on Instrumentation and Measurement, vol. 48, no. 6, pp. 1086–1092, 1999.

    [16] D. Petranovic, S. Stankovic, and L. Stankovic, "Special purpose hardware for time-frequency analysis," Electronics Letters, vol. 33, no. 6, pp. 464–466, 1997.

    [17] S. Stankovic, L. Stankovic, V. Ivanovic, and R. Stojanovic, "An architecture for the VLSI design of systems for time-frequency analysis and time-varying filtering," Annales des Telecommunications, vol. 57, no. 9-10, pp. 974–995, 2002.

    [18] S. Stankovic, "About time-variant filtering of speech signals with time-frequency distributions for hands-free telephone systems," Signal Processing, vol. 80, no. 9, pp. 1777–1785, 2000.

    [19] Q. Li, J. Zheng, A. Tsai, and Q. Zhou, "Robust endpoint detection and energy normalization for real-time speech and speaker recognition," IEEE Transactions on Speech and Audio Processing, vol. 10, no. 3, pp. 146–157, 2002.

    [20] G. S. Ying, C. D. Mitchell, and L. H. Jamieson, "Endpoint detection of isolated utterances based on a modified Teager energy measurement," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), vol. 2, pp. 732–735, Minneapolis, Minn, USA, April 1993.

    [21] L. Gu and S. Zahorian, "A new robust algorithm for isolated word endpoint detection," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), vol. 4, pp. 4161–4164, Orlando, Fla, USA, May 2002.

    [22] S. Stankovic, I. Djurovic, R. Herpers, and L. Stankovic, "An approach to optimal watermark detection," AEU International Journal of Electronics and Communications, vol. 57, no. 5, pp. 355–357, 2003.

    [23] B.-S. Ko, R. Nishimura, and Y. Suzuki, "Time-spread echo method for digital audio watermarking," IEEE Transactions on Multimedia, vol. 7, no. 2, pp. 212–221, 2005.

    [24] S. Xiang and J. Huang, "Histogram-based audio watermarking against time-scale modification and cropping attacks," IEEE Transactions on Multimedia, vol. 9, no. 7, pp. 1357–1372, 2007.

    [25] N. Cvejic and T. Seppanen, "Improving audio watermarking scheme using psychoacoustic watermark filtering," in Proceedings of the 1st IEEE International Symposium on Signal Processing and Information Technology, pp. 169–172, Cairo, Egypt, December 2001.
