

Research Article
A Novel AMR-WB Speech Steganography Based on Diameter-Neighbor Codebook Partition

Junhui He,1 Junxi Chen,1 Shichang Xiao,1 Xiaoyu Huang,2 and Shaohua Tang1

1 School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
2 School of Economics and Commerce, South China University of Technology, Guangzhou 510006, China

Correspondence should be addressed to Junhui He; hejh@scut.edu.cn

Received 28 September 2017; Accepted 26 December 2017; Published 13 February 2018

Academic Editor: Rémi Cogranne

Copyright © 2018 Junhui He et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Steganography is a means of covert communication without revealing the occurrence and the real purpose of communication. The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. In this paper, a novel AMR-WB speech steganography is proposed based on a diameter-neighbor codebook partition algorithm. Different embedding capacities may be achieved by adjusting the iterative parameter during codebook division. The experimental results show that the presented AMR-WB steganography provides higher and more flexible embedding capacity than the state-of-the-art methods without inducing perceptible distortion. With 48 iterations of cluster merging, twice the embedding capacity of the complementary-neighbor-vertices-based embedding method may be obtained with a decrease of only around 2% in speech quality and much the same undetectability. Moreover, both the quality of the stego speech and the security against statistical steganalysis are better than those of the recent speech steganography based on neighbor-index-division codebook partition.

1. Introduction

With the rapid development of the Internet and the growing popularity of instant messaging applications, people are increasingly using audio-based communication. How to avoid interception and secure communication has become one of the most important research problems. Encryption is a conventional method of protecting communication; however, the transmission of ciphered content may easily arouse attackers' suspicion. In recent years, steganography has been presented as an effective means of covert communication. Audio steganography can transfer important messages secretly by embedding them into cover audio files with the use of information hiding techniques [1]. Data hiding in audio is especially challenging because the human auditory system operates over a wider dynamic range than the human visual system.

Many works on audio steganography have already been reported. Gruhl et al. [2] proposed an audio steganographic method of echo hiding by the introduction of synthetic resonances in the form of closely spaced echoes. Gopalan [3] presented a method of embedding a covert audio message into a cover utterance by altering one bit in each of the cover utterance samples. Gopalan et al. [4] provided two methods of secret message embedding by modifying the phase or amplitude of perceptually masked or significant regions of a host. A direct-sequence spread-spectrum watermarking method with strong robustness against common audio editing procedures was proposed in [5]. Many audio steganographic applications, including Steghide and Hide4PGP, can be freely downloaded from the Internet. However, most of these methods are not resilient to AMR-WB speech compression.

Based on segmental SNR analysis of modifications to the encoded bits in a frame, Liu et al. [6] selected the perceptually least important bits to embed secret messages in G.729 speech. In [7], a simple and effective steganographic approach, which may be applied to 5.3 kbps G.723.1 speech, was presented based on analyzing the redundancy of code parameters, and an augmented identity matrix was utilized to lower the distortion of the cover speech. Similarly, by calculating the speech quality sensitivity of each of the 244 encoded bits using the perceptual evaluation of speech quality (PESQ) criterion, a data hiding approach for embedding data in the enhanced full rate (EFR) compressed speech bitstream was proposed in [8]. In addition, Nishimura [9] proposed three methods of hiding data in the pitch delay data of AMR speech.

Hindawi, Security and Communication Networks, Volume 2018, Article ID 7080673, 11 pages, https://doi.org/10.1155/2018/7080673

Based on the complementary neighbor vertices (CNV) codebook partition algorithm, Xiao et al. [10] presented an approach to information hiding in compressed speech with the use of quantization index modulation (QIM) [11]. Huang et al. [12] proposed a steganographic algorithm for embedding data in different speech encoding parameters of the inactive frames, the embedding capacity of which is bounded by the number of inactive frames in the cover speech. In [13], Huang et al. also presented a method for steganography in low bit-rate VoIP streams based on pitch period prediction. It can achieve high stego speech quality and resist statistical steganalysis, but the embedding rate is still low (only about 133.3 bps). An adaptive suboptimal pulse combination constrained (ASOPCC) method was presented in [14] to embed data into the compressed speech signal of the AMR-WB codec. However, most of the PESQ scores in different coding modes are not high. In [15], a key-based codebook partition strategy, which dynamically determines the adopted division scheme, was designed to improve the security of QIM steganography in speech bitstreams. Although the stego speech quality is guaranteed to be good, the embedding capacity is very limited and not adjustable. Liu et al. [16] proposed a neighbor-index-division (NID) codebook division algorithm for G.723.1 speech. Differing from the existing CNV method, NID divides neighbor-indexed codewords into separate subcodebooks according to a suitable stego coding strategy. The embedding capacity is improved by using multiple division and a multi-ary coding strategy.

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may therefore be a good candidate cover medium for audio steganography, and we focus on AMR-WB speech steganography in this paper. Firstly, a new diameter-neighbor (DN) codebook partition algorithm for AMR-WB speech is proposed. Based on DN codebook division, we develop a novel AMR-WB speech steganography capable of providing flexible embedding capacity by varying the iterative parameter $N_i$. For example, when $N_i = 48$, twice the embedding capacity of the CNV-based method may be obtained with a decrease of only about 2% in speech quality and much the same undetectability. Moreover, both the quality of the stego speech and the security against statistical steganalysis [17, 18] are better than those of the recent NID-based speech steganography.

The remainder of this paper is organized as follows. In Section 2, the related work is briefly introduced. In Section 3, the proposed DN codebook partition algorithm and the novel AMR-WB speech steganography are described in detail. The experimental results and analysis are provided in Section 4. Finally, conclusions are presented.

2. Related Work

In this section, a technical overview of the AMR-WB codec is first presented. Then two related codebook partition algorithms, CNV [10] and NID [16], are briefly reviewed.

2.1. AMR-WB Codec. The AMR-WB speech codec was standardized by 3GPP (3rd Generation Partnership Project) and adopted as the standard G.722.2 by ITU-T in 2002 [19]. It is a multirate wideband speech codec applied in modern mobile communication systems to remarkably improve speech quality. The AMR-WB codec operates at a multitude of bit rates ranging from 6.6 kbit/s to 23.85 kbit/s.

The input audio signal, sampled at 16 kHz, is separated into 20 ms frames. Linear prediction analysis (LPA) is performed on every frame, and the LP coefficients are converted to immittance spectral pair (ISP) coefficients. The ISP coefficients are then converted to the frequency domain (ISF) for quantization. Except for mode 0 (6.6 kbit/s), the ISF coefficients are quantized using two-stage vector quantization with split-by-2 in the first stage and split-by-5 in the second stage. Both the second and the third codebooks in the second stage have 128 codewords, and the ISF indices of the codewords in these codebooks may be employed to embed the secret message.

In the decoder, the transmitted indices are first parsed from the received bitstream and then decoded to obtain the code parameters for each transmitted frame, such as the ISP vector, the 4 fractional pitch lags, the 4 LTP filtering parameters, the 4 innovative code vectors, and the 4 sets of vector quantized pitch and innovative gains. For a more detailed description, one should refer to [19]. From the received ISF indices, which may have been modified because of secret message embedding, the receiver can recover the embedded secret message.

2.2. Complementary Neighbor Vertices. CNV is a codebook partition algorithm proposed in [10], in which each codeword in a codebook is viewed as a vertex in the multidimensional space. The relationship between two codewords $X$ and $Y$ is described as an edge connecting the two codewords' vertices, and the weight of an edge is defined as the Euclidean distance $D(X, Y)$ between the two codewords. A small value of $D(X, Y)$ indicates that $X$ and $Y$ closely resemble each other. The vertex nearest to $X$ is referred to as $X$'s neighbor vertex, which is denoted by $N(X)$. The vertex set $V$ together with the edge set $E$ forms a graph $G(V, E)$ in the multidimensional space.

The codebook partition is realized by the construction of the graph $G(V, E)$ and vertex labelling. First, each vertex $X$ in $G(V, E)$ is connected with its neighbor vertex $N(X)$ by an edge. Thus the graph $G(V, E)$ is divided into several isolated subgraphs, each of which may be proved to be acyclic and 2-colorable. Second, every vertex and its neighbor vertex in a subgraph are labelled oppositely using "0" or "1". Third, all of the vertices with the same label are collected into a subcodebook; hence two subcodebooks are obtained.
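The three partition steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of [10]: the function name `cnv_labels`, the toy two-dimensional codebook, and the breadth-first colouring are assumptions made here, and the sketch presumes that all pairwise distances are distinct so that every subgraph is indeed 2-colorable.

```python
import math
from collections import deque

def cnv_labels(codebook):
    """Label each codeword 0 or 1 so that a codeword and its nearest
    neighbour N(X) always receive opposite labels (illustrative sketch)."""
    n = len(codebook)
    # Step 1: nearest neighbour N(X) of every codeword (Euclidean distance).
    nearest = [
        min((j for j in range(n) if j != i),
            key=lambda j: math.dist(codebook[i], codebook[j]))
        for i in range(n)
    ]
    # Undirected edges X -- N(X) split the graph into isolated subgraphs.
    adj = [set() for _ in range(n)]
    for i, j in enumerate(nearest):
        adj[i].add(j)
        adj[j].add(i)
    # Step 2: breadth-first 2-colouring of each subgraph.
    labels = [None] * n
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if labels[v] is None:
                    labels[v] = 1 - labels[u]
                    queue.append(v)
    return nearest, labels

# Hypothetical toy codebook (real ISF codewords have more dimensions).
codebook = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.5), (9.0, 1.0)]
nearest, labels = cnv_labels(codebook)
print(nearest)  # [1, 0, 3, 2, 2]
print(labels)   # [0, 1, 0, 1, 1]
# Step 3: the two subcodebooks are the index sets with label 0 and label 1.
```

Collecting the indices by label gives the two subcodebooks; the flipping coefficients mentioned in the text are not modelled here.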

Based on the generated subgraphs and the label assigned to each codeword in them, CNV-based steganography applies the QIM concept to embed the secret message. More specifically, when the label of the codeword $X$, which is associated with the cover quantization index $I_a$, agrees with the secret message bit, $I_a$ remains unchanged; otherwise, it is replaced with the quantization index of the neighbor codeword $N(X)$, which belongs to the opposite subcodebook.

Figure 1: Diagram of the proposed method. (Sender: the AMR-WB speech is parsed into ISF indices, the codebooks are partitioned into a cluster set, the secret message is embedded, and the indices are updated to produce the stego AMR-WB speech. The stego speech is sent over a public channel. Receiver: the stego ISF indices are parsed to extract the secret message, and the AMR-WB decoder produces the decoded speech.)

The key characteristic of CNV-based steganography is that the distortion is bounded even in the worst case. However, the embedding capacity is limited, as analyzed experimentally in Section 4. Moreover, the number of possible combinations of flipping coefficients, which determine whether the labels in a subgraph will be flipped, is large. Extra information about the flipping process must be transmitted to the receiver, and thus the effective embedding capacity may be decreased further.

2.3. Neighbor Index Division. NID assumes that codewords with neighboring indices (i.e., neighboring positions) in a codebook are close together. Hence, the codewords in a codebook can be easily separated into subcodebooks according to their indices instead of the Euclidean distance. Specifically, an appropriate integer $k$ is selected according to the demand for embedding capacity, and the $i$th codeword is labelled with the digit $(i - 1) \bmod k$. Then all the codewords with the same label are collected into a subcodebook, which yields $k$ different subcodebooks.

In order to make full use of the embedding capacity, the binary secret message should be transformed into $k$-ary digits denoted by $m$ ($m \in \{0, 1, \ldots, k - 1\}$). When the codeword related to the cover quantization index belongs to a subcodebook whose label differs from the $k$-ary digit $m$ to be embedded, this index is substituted with that of the closest codeword in the subcodebook labelled $m$.
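A minimal Python sketch of the NID labelling and substitution rule follows. It is an illustration under stated assumptions rather than the code of [16]: the function names and the one-dimensional toy codebook are invented here, and indices are 0-based, so the label $(i - 1) \bmod k$ of the $i$th codeword becomes `position % k`.

```python
import math

def nid_label(position, k):
    # 0-based position p corresponds to the (p + 1)th codeword,
    # whose label is ((p + 1) - 1) mod k = p mod k.
    return position % k

def nid_embed(codebook, cover_index, digit, k):
    """Return the stego index that carries the k-ary digit `digit`."""
    if nid_label(cover_index, k) == digit:
        return cover_index  # label already matches, keep the index
    # Otherwise pick the closest codeword inside subcodebook `digit`.
    candidates = range(digit, len(codebook), k)
    return min(candidates,
               key=lambda j: math.dist(codebook[cover_index], codebook[j]))

def nid_extract(stego_index, k):
    return nid_label(stego_index, k)

# Hypothetical toy codebook with k = 3 subcodebooks.
codebook = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
stego = nid_embed(codebook, cover_index=4, digit=0, k=3)
print(stego, nid_extract(stego, 3))  # 3 0
```

In the toy run the cover index 4 (label 1) is moved to index 3, the closest codeword with label 0; the induced distortion is the distance between those two codewords.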

NID-based steganography is an information hiding method based on neighbor-index codebook partition, the embedding capacity of which may be controlled by the number of subcodebooks $k$. However, as illustrated in [16], only about 34% of the pairs of neighbor-index codewords happened to be pairs of neighbor-vertex codewords, and the mean distance between neighbor-index codewords is apparently larger than that between neighbor-vertex codewords. Therefore, the amount of distortion induced by NID-based steganography may be somewhat large, which is confirmed by the experimental results provided in Section 4.

3. Proposed Method

The diagram of the proposed method is shown in Figure 1. Based on the DN partition of the codebooks described in Section 2.1, a secret message can be embedded into an AMR-WB speech file. After the stego AMR-WB speech file is received, the embedded secret message can be extracted without errors. At the same time, the decoded speech is obtained without perceptible distortion. In the following, the diameter-neighbor (DN) codebook partition algorithm is first introduced. Then the embedding and extraction procedures of our proposed method are described.

3.1. Codebook Partition. A codebook may be viewed as a list of isolated code vectors (i.e., codewords) in the multidimensional space. The codebook partition algorithm used for audio steganography divides the codebook into several clusters, in each of which the codewords can be replaced with each other without causing perceptible distortion.

Let $B$ denote the original codebook with $N_b$ codewords and $C$ denote a cluster with $N_c$ codewords $W_t$ ($t = 1, 2, \ldots, N_c$). The centroid $G$ of a cluster $C$ is defined as follows:

$$G(i) = \frac{1}{N_c} \sum_{t=1}^{N_c} W_t(i), \tag{1}$$

where $G(i)$ and $W_t(i)$ are the $i$th components of $G$ and $W_t$, respectively.

The centroid $G$ (average code vector) is used to represent the corresponding cluster $C$; hence the cluster $C$ may also be considered as a vector in the multidimensional codebook space. In order to describe the similarity between two clusters $C_1$ and $C_2$, the Euclidean distance between them is defined as follows:

$$D(C_1, C_2) = \sqrt{\sum_{i=1}^{n} \left(G_1(i) - G_2(i)\right)^2}, \tag{2}$$

where $G_1$ and $G_2$ are the corresponding geometric center points of the two clusters $C_1$ and $C_2$, $n$ is the dimension of a codeword, and $G_1(i)$ and $G_2(i)$ are the $i$th components of $G_1$ and $G_2$, respectively.

Let $S$ denote a cluster set. The diameter of $S$ is defined as the maximal Euclidean distance $D_m$ over all cluster pairs in the cluster set $S$, that is,

$$D(C_p, C_q) \le D_m, \quad \forall p, q = 1, 2, \ldots, |S|, \tag{3}$$

where $|S|$ is the number of clusters within the cluster set $S$. The cluster pair with the maximal Euclidean distance $D_m$, called the diameter cluster pair, is denoted by $(C_{d1}, C_{d2})$. The neighbor of a cluster $C$ in $S$ is represented by $N(C, S)$; then we have

$$D(C, N(C, S)) \le D(C, C_p), \quad \forall p = 1, 2, \ldots, |S|. \tag{4}$$

Figure 2: Diagram of our proposed codebook partition.

Figure 2 illustrates the proposed DN codebook partition algorithm, and its detailed procedure is given in Algorithm 1. The original codebook $B$ is divided into $|S|$ clusters by iteratively merging the diameter cluster pair with their respective neighbors. An iteration parameter $N_i$ is applied to obtain flexible embedding capacity by controlling the merging procedure. The relationship between $N_i$ and the embedding capacity is discussed in Section 4.3.

Figure 3 is provided as an example to illustrate the proposed codebook partition algorithm. A white circle denotes a codeword. A shaded oval denotes a codeword and its neighbor in $S$ being processed, while an unshaded oval represents a cluster in $S'$ that has been formed. The "0", "1", "00", "01", "10", or "11" in a circle is the label of a codeword in the cluster. A cross "×" marks the centroid of the cluster it belongs to, and a line represents the diameter of a cluster set. The first to third merging iterations are shown in Figures 3(a)–3(c), respectively. The fourth merging iteration comprises Figures 3(d) and 3(e), and Figure 3(f) demonstrates the labelling of the codewords.

3.2. Embedding Procedure. In our proposed method, the ISF indices corresponding to the codewords in the codebook are first obtained by parsing the host AMR-WB speech. Then the ISF indices are employed to embed the secret message based on the codebook partition. Generally, the codewords in the same cluster as the codeword referred to by $I_a$ are considered to be replaceable with each other. According to the secret message to be embedded, $I_a$ may be substituted by the index of one of the other codewords within the same cluster. The number of secret message bits that can be embedded depends on the size of the specific cluster. The embedding procedure is as follows.

Step 1. Search the cluster set $S$ for the cluster $C$ that contains the codeword referred to by the ISF index $I_a$.

Step 2. If there are $N$ codewords in $C$, the number of secret bits that can be embedded into $I_a$ is calculated as $n = \lfloor \log_2 N \rfloor$.

Step 3. Read $n$ not-yet-embedded bits, denoted by $m$, from the secret message. $I_a$ is replaced with $I_b$, which indexes the codeword with the same label as $m$.

Step 4. Repeat Steps 1–3 until all the secret bits are embedded.

Figure 3: An example of our proposed codebook partition. Panels: (a) 1st iteration ($N_i = 4$); (b) 2nd iteration ($N_i = 3$); (c) 3rd iteration ($N_i = 2$); (d) 4th iteration ($S = S'$, $S'$.clear()); (e) 4th iteration ($N_i = 1$); (f) labelling.

Input: Codebook B, iterative parameter N_i
Output: Cluster set S

    /* S' is a helper cluster set */
    S'.clear()
    S.clear()
    /* Each codeword is taken as an initial cluster */
    for i = 0; i < N_b; ++i do
        S.push(C_i)
    end
    /* Iterative merging */
    while N_i > 0 do
        if S is empty then
            S = S'
            S'.clear()
        end
        (C_d1, C_d2) = argmax over i, j in {1, 2, ..., |S|} of D(C_i, C_j)
        /* Neighbors are determined before anything is removed from S */
        N_1 = N(C_d1, S)
        N_2 = N(C_d2, S)
        Temp1 = C_d1 ∪ N_1
        Temp2 = C_d2 ∪ N_2
        S'.push(Temp1)
        S'.push(Temp2)
        S.remove(C_d1)
        S.remove(C_d2)
        S.remove(N_1)
        S.remove(N_2)
        N_i = N_i − 1
    end
    /* Put the remaining clusters in S' into S */
    for iter = S'.begin(); iter < S'.end(); ++iter do
        S.push(*iter)
    end
    return S

Algorithm 1: DN-based codebook partition algorithm.
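Algorithm 1 can be turned into a short runnable sketch. The Python below is one possible reading, not the authors' implementation: the function names, the representation of a cluster as a list of codeword indices, and the toy codebook are assumptions. Two further assumptions are made where the pseudocode is silent: the two neighbors are forced to be distinct clusters different from the diameter pair, and merging stops when fewer than four clusters remain in $S$.

```python
import math
from itertools import combinations

def centroid(cluster, codebook):
    # Equation (1): component-wise mean of the codewords in the cluster.
    dim = len(codebook[0])
    return tuple(sum(codebook[w][i] for w in cluster) / len(cluster)
                 for i in range(dim))

def dn_partition(codebook, n_iter):
    """Sketch of Algorithm 1; clusters are lists of codeword indices."""
    def dist(c1, c2):
        # Equation (2): Euclidean distance between cluster centroids.
        return math.dist(centroid(c1, codebook), centroid(c2, codebook))

    s = [[w] for w in range(len(codebook))]  # each codeword is a cluster
    s_prime = []                             # helper cluster set S'
    while n_iter > 0:
        if not s:
            s, s_prime = s_prime, []
        if len(s) < 4:
            break  # assumption: a merge needs four distinct clusters
        # Diameter cluster pair: the two clusters that are farthest apart.
        cd1, cd2 = max(combinations(s, 2), key=lambda p: dist(*p))
        rest = [c for c in s if c is not cd1 and c is not cd2]
        n1 = min(rest, key=lambda c: dist(cd1, c))  # neighbor of C_d1
        rest = [c for c in rest if c is not n1]
        n2 = min(rest, key=lambda c: dist(cd2, c))  # neighbor of C_d2
        s = [c for c in rest if c is not n2]
        s_prime.append(cd1 + n1)
        s_prime.append(cd2 + n2)
        n_iter -= 1
    return s + s_prime  # remaining clusters of S' are put into S

# Hypothetical toy codebook: four well-separated pairs of 2-D codewords.
codebook = [(0, 0), (0, 1), (10, 0), (10, 1),
            (0, 10), (0, 11), (10, 10), (10, 11)]
print(sorted(sorted(c) for c in dn_partition(codebook, 2)))
```

With two iterations the eight toy codewords end up in four clusters of two close codewords each; a third iteration would merge these into two clusters of four.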

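Figure 4: Embedding two bits into one cover ISF index. (The cluster set $S$ contains Cluster 1, whose four codewords $W_a$, $W_b$, $W_c$, and $W_d$ carry two-bit labels, and Cluster 2, whose two codewords carry one-bit labels. A search-and-replace step with the secret bits "01" turns the ISF index $I_a$ into the stego ISF index $I_b$.)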

Figure 4 is an example of embedding two secret bits into one cover ISF index. Let us assume that the cluster set $S$ contains two clusters and that the codeword indexed by $I_x$ is $W_x$; for example, $I_b$ indexes the codeword $W_b$. Hence, the ISF index $I_a$ shown in Figure 4 is replaced with $I_b$, which indexes the codeword $W_b$ with the same label as the secret bits "01".

3.3. Extracting Procedure. When the stego AMR-WB speech is transferred to the intended receiver, the stego indices may be obtained by parsing the AMR-WB speech stream and used to extract the embedded secret message. The procedure for extracting the message from the stego index $I_b$ is given below.

Step 1. Search the cluster set $S$, which is the same as that employed in the embedding procedure, for the cluster $C$ that contains the codeword $W_b$ referred to by the ISF index $I_b$.

Step 2. If there are $N$ codewords in total in $C$, the number of secret bits carried by $I_b$ is computed as $n = \lfloor \log_2 N \rfloor$.

Step 3. Read the label of $W_b$ as the extracted $n$ bits, which are appended to the secret message bit sequence.

Step 4. Repeat Steps 1–3 until all the secret bits are recovered.

Figure 5: Extracting two bits from one stego ISF index. (The stego ISF index $I_b$ is looked up in the cluster set $S$ by a search-and-read step; the label "01" of $W_b$ in Cluster 1 is read out as the extracted bits "01".)

Figure 5 is the corresponding example of extracting two secret bits from the stego index $I_b$ generated by the previous embedding instance shown in Figure 4. It can easily be seen that the extracted secret bits are identical to the embedded secret bits.
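The embedding and extraction steps of Sections 3.2 and 3.3 can be sketched together as a round trip. The sketch below is illustrative only. It assumes that the label of a codeword is simply its position inside its cluster written as an $n$-bit binary number, that the receiver knows the message length, and that a final short chunk is padded with zeros; the index values 17, 42, 88, 63, 5, and 9 are hypothetical.

```python
def bits_per_index(cluster):
    # n = floor(log2(N)); bit_length avoids floating-point rounding.
    return len(cluster).bit_length() - 1

def build_lookup(cluster_set):
    # Maps an ISF index to (its cluster, its position in that cluster).
    return {idx: (cluster, pos)
            for cluster in cluster_set
            for pos, idx in enumerate(cluster)}

def embed(cover_indices, bits, cluster_set):
    lookup = build_lookup(cluster_set)
    stego, used = [], 0
    for ia in cover_indices:
        cluster, _ = lookup[ia]              # Step 1: find the cluster
        n = bits_per_index(cluster)          # Step 2: capacity of I_a
        if n == 0 or used >= len(bits):
            stego.append(ia)                 # nothing to embed here
            continue
        chunk = bits[used:used + n].ljust(n, "0")
        stego.append(cluster[int(chunk, 2)])  # Step 3: I_a -> I_b
        used += n
    return stego

def extract(stego_indices, cluster_set, total_bits):
    lookup = build_lookup(cluster_set)
    out = ""
    for ib in stego_indices:
        if len(out) >= total_bits:
            break
        cluster, pos = lookup[ib]
        n = bits_per_index(cluster)
        if n:
            out += format(pos, f"0{n}b")     # label of W_b as n bits
    return out[:total_bits]

# Hypothetical cluster set shaped like Figure 4: one 4-codeword cluster
# (labels 00, 01, 10, 11 by position) and one 2-codeword cluster.
S = [[17, 42, 88, 63], [5, 9]]
stego = embed([17, 5, 63], "01110", S)
print(stego)                 # [42, 9, 88]
print(extract(stego, S, 5))  # 01110
```

The first cover index 17 is replaced by 42 because 42 sits at position 1, that is, label "01", which mirrors the replacement of $I_a$ by $I_b$ in Figure 4.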

4. Experimental Results and Analysis

In order to demonstrate the performance of the proposed method, the perceptual quality of the stego AMR-WB speech with the secret message embedded using our method is computed and compared to that of the stego AMR-WB speech generated with CNV and NID steganography. Moreover, the flexibility of embedding capacity and the security against statistical detection are analyzed in detail.

4.1. Audio Database. The TIMIT acoustic-phonetic continuous speech corpus (https://catalog.ldc.upenn.edu/LDC93S1) is an audio database that contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences; all audio sentences are sampled at 16 kHz. In our experiments, 1000 audio sentences are randomly chosen from the TIMIT database. The average, maximum, and minimum lengths of the chosen audio sentences are 3.47 s, 3.96 s, and 3.12 s, respectively. All audio files are converted into AMR-WB format using the standard codec.

4.2. Speech Quality Evaluation. The perceptual evaluation of speech quality (PESQ) described in ITU-T Recommendation P.862 [20] may be employed to evaluate speech quality. Moreover, according to ITU-T P.862.2 [21], the raw PESQ score can be converted to the mean opinion score-listening quality objective (MOS-LQO), which is more suitable for evaluating wideband speech. Hence, MOS-LQO is applied in our experiments. The normal range of the MOS-LQO score is 1.017 to 4.549. The higher the score, the better the quality.

Figure 6 shows the MOS-LQO scores of the 1000 cover AMR-WB speech samples in 23.85 kbit/s mode and the corresponding stego AMR-WB speech samples using three different codebook partition algorithms. Three progressive embedding rates, namely 100 bps, 200 bps, and 300 bps, are employed in our experiments. The indices of the speech samples are sorted according to the MOS-LQO scores of our proposed method. It can be seen from Figure 6 that the overall scores of the stego AMR-WB speech samples generated with our method are higher than those of the NID-based stego AMR-WB speech samples, especially when the embedding rates are 200 bps and 300 bps. The MOS-LQO scores of the CNV-based stego AMR-WB speech samples are only slightly higher than ours when the embedding rate is 100 bps, which means there is no obvious discrepancy in speech quality between them. Besides, when a high embedding rate (200 bps or 300 bps) is used, the decrease in the MOS-LQO scores of our stego AMR-WB speech samples is significantly smaller than that of NID-based steganography.

Figure 6: Comparisons of MOS-LQO values for 1000 samples between the standard AMR-WB codec, CNV-based steganography, NID-based steganography, and the proposed DN-based steganography. Each panel plots the MOS-LQO score against the sample index: (a) the embedding rate is 100 bps (Standard, CNV, NID, Ours); (b) the embedding rate is 200 bps (Standard, NID, Ours); (c) the embedding rate is 300 bps (Standard, NID, Ours).

Moreover, the average MOS-LQO scores of the cover AMR-WB speech samples and the stego AMR-WB speech samples produced with the three codebook partition algorithms (CNV, NID, and DN), covering four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) and three embedding rates (100 bps, 200 bps, and 300 bps), are given in Table 1. For the embedding rates of 200 bps and 300 bps, only the MOS-LQO scores of the NID-based and DN-based steganographic methods are given, because the embedding capacity of CNV-based steganography cannot exceed 100 bps.

Table 1: MOS-LQO scores of the standard codec, CNV-based, NID-based, and our proposed steganography in four different rate modes and at three embedding rates. Values in parentheses are the percentage changes relative to the standard codec.

| Embedding rate | Method | 12.65 kbit/s | 15.85 kbit/s | 19.85 kbit/s | 23.85 kbit/s |
| none | Standard | 2.929 | 3.073 | 3.199 | 3.269 |
| 100 bps | CNV | 2.871 (−2.0%) | 3.021 (−1.7%) | 3.153 (−1.4%) | 3.225 (−1.3%) |
| 100 bps | NID | 2.750 (−6.1%) | 2.895 (−5.8%) | 3.020 (−5.6%) | 3.091 (−5.4%) |
| 100 bps | Ours | 2.864 (−2.2%) | 3.010 (−2.0%) | 3.139 (−1.9%) | 3.216 (−1.6%) |
| 200 bps | CNV | n/a | n/a | n/a | n/a |
| 200 bps | NID | 2.601 (−11.2%) | 2.736 (−11.0%) | 2.875 (−10.7%) | 2.921 (−10.6%) |
| 200 bps | Ours | 2.807 (−4.2%) | 2.955 (−3.8%) | 3.084 (−3.6%) | 3.164 (−3.2%) |
| 300 bps | CNV | n/a | n/a | n/a | n/a |
| 300 bps | NID | 2.284 (−22.0%) | 2.386 (−22.3%) | 2.475 (−22.6%) | 2.533 (−22.5%) |
| 300 bps | Ours | 2.699 (−7.9%) | 2.841 (−7.5%) | 2.971 (−7.1%) | 3.046 (−6.8%) |

When the embedding rate is 100 bps which is almostthe limit of CNV steganography we can see from Table 1that the mean MOS-LQO scores of our proposed methodare only about 03 worse than CNV-based steganographyThe slight decrease may be almost imperceptible by humanauditory system (HAS) And there are significant increases ofapproximately 38 in the meanMOS-LQO scores when ourpresented method is compared to NID-based steganographyAnd it can be observed that when the embedding rates are200 bps and 300 bps the scores of our approach are improvedby about 7 and 15 correspondingly in contrast to those ofNID-based steganography

Furthermore we can also see that the experimentalresults of four rate modes are analogous The decrease ofspeech quality caused by NID-based steganography is morethan twice that caused by DN-based steganography And theproposedmethod can obtain twice the embedding capacity ofCNV-based steganography by sacrificing less than 2 speechquality in average In addition only a slight decline in speechquality is observed when 300 bps embedding rate is used inthe proposed DN-based method while 200 bps is employedin NID-based method
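The percentages quoted in this discussion can be re-derived from the MOS-LQO scores themselves. The following is a minimal sketch, assuming the Table 1 entries are read as MOS-LQO values with three decimals and that the percentages are relative changes with respect to the standard codec; the function name is illustrative only.

```python
def relative_change_percent(reference, score):
    """Relative change of `score` with respect to `reference`, in percent."""
    return (score - reference) / reference * 100.0


# 12.65 kbit/s mode, scores read from Table 1 (assumed three-decimal values).
standard = 2.929
scores = {
    "CNV @ 100 bps": 2.871,
    "NID @ 100 bps": 2.750,
    "Ours @ 100 bps": 2.864,
    "Ours @ 200 bps": 2.807,
}

for name, score in scores.items():
    print(name, round(relative_change_percent(standard, score), 1))
```

Under this reading, the gap between two methods is simply the difference of their relative changes, which is how figures such as "about 0.3%" between CNV and the proposed method at 100 bps can be checked.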

4.3. Flexible Embedding Capacity. Compared to CNV-based steganography, flexible embedding capacity may be obtained to satisfy different practical demands with our proposed method. The steganographic capacity can be adjusted by changing the iteration parameter $N_i$. For different values of $N_i$, for example, $N_i = 32, 33, \ldots, 54$, the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.

From Figure 7, we can observe that the embedding rate significantly increases with $N_i$, while the MOS-LQO score goes down only slightly. For NID-based steganography, however, the MOS-LQO score rapidly declines as the embedding rate increases. Therefore, the proposed DN-based steganography can achieve higher embedding capacity with only a slight decrease in speech quality. For example, when $N_i = 48$, the size of each cluster in $S$ is equal to 4, and we can embed 4 bits per frame (2 bits in each of the two codebooks used); that is, with 20 ms frames the embedding rate is 200 bps, whereas the CNV algorithm can embed at most 2 bits per frame (100 bps).
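The arithmetic behind these rates can be sketched as follows. This is an illustration only: it assumes that every cluster has the same size, that one ISF index per frame is available in each of the two second-stage codebooks mentioned in Section 2.1, and that frames are 20 ms long. The constant and function names are hypothetical.

```python
FRAME_MS = 20        # AMR-WB frame length stated in Section 2.1
NUM_CODEBOOKS = 2    # assumption: second and third second-stage codebooks carry the payload


def bits_per_frame(cluster_size, num_codebooks=NUM_CODEBOOKS):
    """Bits per frame when every cluster holds `cluster_size` codewords."""
    bits_per_index = cluster_size.bit_length() - 1   # floor(log2(cluster_size))
    return num_codebooks * bits_per_index


def embedding_rate_bps(cluster_size, frame_ms=FRAME_MS):
    """Embedding rate in bits per second for uniformly sized clusters."""
    frames_per_second = 1000 // frame_ms
    return bits_per_frame(cluster_size) * frames_per_second


for size in (2, 4, 8):
    print(size, bits_per_frame(size), embedding_rate_bps(size))
```

With clusters of size 4 this gives 4 bits per frame and 200 bps, matching the example above; clusters of size 2 give the 100 bps figure quoted for CNV.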

4.4. Resistibility of Statistical Steganalysis. Speech steganography aims to hide secret messages in cover speech without arousing suspicion. It is very important for a steganographic method to resist statistical steganalysis, which is the technique of detecting the presence of hidden messages. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18], the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are


Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography. (a) Our proposed steganography (x-axis: times of cluster merging; y-axes: embedding rate (bps) and MOS-LQO score). (b) NID-based steganography (x-axis: number of sub-codebooks; y-axes: embedding rate (bps) and MOS-LQO score).

Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in 23.85 kbit/s mode.

                        Training rate 0.4                   Training rate 0.5                   Training rate 0.6
Rate     Method   Markov  MFCC  SS-QCCN  RS-QCCN     Markov  MFCC  SS-QCCN  RS-QCCN     Markov  MFCC  SS-QCCN  RS-QCCN
100 bps  CNV      49.8    49.8  43.7     49.0        50.1    50.2  44.0     49.2        50.0    50.5  41.9     50.0
100 bps  NID      51.0    60.1  42.2     50.0        50.1    60.9  42.9     48.7        52.1    59.8  41.8     49.4
100 bps  Ours     50.0    50.0  44.0     49.4        50.3    49.3  40.3     49.4        49.1    48.6  41.8     43.3
200 bps  CNV      -       -     -        -           -       -     -        -           -       -     -        -
200 bps  NID      53.5    74.5  46.9     50.0        53.3    76.2  47.6     50.0        53.6    75.8  44.4     50.1
200 bps  Ours     51.0    48.3  45.2     50.0        49.8    48.7  42.2     50.0        50.5    48.6  45.0     50.0
300 bps  CNV      -       -     -        -           -       -     -        -           -       -     -        -
300 bps  NID      54.8    74.6  49.3     50.0        56.3    77.2  50.0     50.0        55.4    78.3  50.5     50.6
300 bps  Ours     52.4    49.7  47.9     50.0        52.8    60.9  48.2     50.0        53.8    50.1  46.6     50.0

utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of hidden messages in given audio.

In our experiments, the sentences chosen from the "TIMIT" database, as stated in Section 4.1, are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. Then secret messages are embedded into each cover AMR-WB speech with different embedding rates, that is, 100 bps, 200 bps, and 300 bps, by CNV-based, NID-based, and DN-based steganography. The 200 bps and 300 bps rates are omitted for CNV-based steganography because of its limited embedding capacity. Seven stegospeech sets are thus generated: one for the CNV-based steganographic method and three each for NID-based and DN-based steganography. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.

In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, that is, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stegospeech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and a radial basis function (RBF) kernel and a grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set be the negatives and those in the stego speech set be the positives. Hence, the accuracy may be defined as follows:

$$\mathrm{Accuracy} = \frac{1}{2} \times \left( \frac{TP}{TP + FN} + \frac{TN}{FP + TN} \right), \quad (5)$$

where TP are true positives, TN are true negatives, FN are false negatives, and FP are false positives.
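The accuracy in (5) is the mean of the true positive rate and the true negative rate. A minimal sketch of the computation is given below; the function name and the example counts are illustrative and are not taken from the experiments.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Accuracy as in (5): the mean of the true positive and true negative rates."""
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (fp + tn)
    return 0.5 * (true_positive_rate + true_negative_rate)


# A detector that guesses at random on balanced data scores 0.5 (that is, 50%).
print(balanced_accuracy(tp=30, fn=30, tn=30, fp=30))
```

An accuracy close to 0.5 therefore means that the steganalyzer does no better than guessing.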

The steganalytic results are given in Table 2. It can be seen that when the embedding rate is 100 bps, the accuracy of detecting both CNV-based and DN-based methods is almost the same, around 50%, while that of detecting


Figure 8: The correlation index (y-axis) of 1000 AMR-WB speeches for each edge (x-axis), where the interframe edge "ii" connects two vertices $V_i[k]$ and $V_i[k + 1]$ in two neighboring frames and the intraframe edge "ij′" connects two vertices $V_i[k]$ and $V_j[k]$ in the same frame.

NID-based steganography increases to 60% when the MFCC-based steganalytic method is applied. Moreover, there is an apparent increase in the accuracy of detecting the NID-based hiding method as the embedding rate increases to 200 bps or 300 bps when Liu et al.'s methods (i.e., Markov and MFCC-based steganalytic methods) are applied. In contrast, the accuracy of steganalyzing our proposed method, DN-based steganography, stays at roughly the same level of 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even at higher embedding rates.

According to the definition of the correlation index given in [18], the experimental results of the correlation indices of 1000 AMR-WB speeches, which are randomly selected from "TIMIT", are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely, SS-QCCN and RS-QCCN, can be constructed, as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography. The steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN stays at about 50% or below for all of the AMR-WB stegospeeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that merely the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while none of them are utilized in Li et al.'s steganalytic method except for the edge "33" in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing the edges "22", "33", and "23′") targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].

In order to visualize the detection performance, some receiver operating characteristic (ROC) curves of steganalyzing CNV-based steganography with a 100 bps embedding rate and NID-based and DN-based steganography with 100 bps, 200 bps, and 300 bps embedding rates are

Figure 9: Two AMR-WB strong correlation network models over the vertices $V_1[k], \ldots, V_5[k]$ and $V_1[k + 1], \ldots, V_5[k + 1]$ of neighboring frames. (a) SS-QCCN. (b) RS-QCCN.

provided in Figure 10. (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of the embedding capacity.) It shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of hidden messages embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.

5. Conclusion

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may be a good candidate for cover medium in speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed. The experimental results demonstrate the effectiveness of our proposed method. The main contributions of this paper are as follows:

(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality and better


Figure 10: ROC curves (true positive rate versus false positive rate) for steganalysis of CNV-based, NID-based, and our proposed steganography (50% training rate). (a) Markov (TIMIT, 100 bps). (b) MFCC (TIMIT, 100 bps). (c) Markov (TIMIT, 200 bps). (d) MFCC (TIMIT, 200 bps). (e) Markov (TIMIT, 300 bps). (f) MFCC (TIMIT, 300 bps).

performance against statistical steganalysis than the NID-based method.

(2) Flexible embedding capacity may be easily achieved with different iterations of cluster merging. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.

References

[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.

[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.

[3] K. Gopalan, "Audio steganography using bit modification," in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.

[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, "Audio steganography by amplitude or phase modification," in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.

[5] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.

[6] L. Liu, M. Li, Q. Li, and Y. Liang, "Perceptually transparent information hiding in G.729 bitstream," in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.

[7] T. Xu and Z. Yang, "Simple and effective speech steganography in G.723.1 low-rate codes," in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.

[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, "MELPe coded speech hiding on enhanced full rate compressed domain," in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.

[9] A. Nishimura, "Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec," in Proceedings of the IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.

[10] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.

[11] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[12] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.

[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.

[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, "A new scheme for covert communication via 3G encoded speech," Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.

[15] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.

[16] J. Liu, H. Tian, J. Lu, and Y. Chen, "Neighbor-index-division steganography based on QIM method for G.723.1 speech streams," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.

[17] Q. Liu, A. H. Sung, and M. Qiao, "Derivative-based audio steganalysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.

[18] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.

[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.

[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.

[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.

[22] C. Chang and C. Lin, "LIBSVM: a Library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.


data hiding approach to embedding data in enhanced full rate (EFR) compressed speech bitstream is proposed in [8]. In addition, Nishimura [9] proposed three methods of hiding data in the pitch delay data of the AMR speech.

Based on the complementary neighbor vertices (CNV) codebook partition algorithm, Xiao et al. [10] presented an approach to information hiding in compressed speech with the use of quantization index modulation (QIM) [11]. Huang et al. [12] proposed a steganographic algorithm for embedding data in different speech encoding parameters of the inactive frames, the embedding capacity of which is bounded by the number of inactive frames in the cover speech. In [13], Huang et al. also presented a method for steganography in low bit-rate VoIP streams based on pitch period prediction. It can achieve high quality of stegospeech and prevent statistical steganalysis, but the embedding rate is still low (only about 133.3 bps). An adaptive suboptimal pulse combination constrained (ASOPCC) method was presented in [14] to embed data into the compressed speech signal of the AMR-WB codec. However, most of the PESQ scores in different coding modes are not high. In [15], a key-based codebook partition strategy, which dynamically determines the adopted division scheme, was designed to improve the security of QIM steganography in speech bitstreams. Although the stegospeech quality is guaranteed to be good, the embedding capacity is very limited and not adjustable. Liu et al. [16] proposed a neighbor-index-division (NID) codebook division algorithm for G.723.1 speech. Differing from the existing CNV method, NID divides neighbor-indexed codewords into separate subcodebooks according to a suitable stegocoding strategy. The embedding capacity is improved by using multiple division and a multi-ary coding strategy.

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may be a good candidate for cover medium in audio steganography. Therefore, we focus on AMR-WB speech steganography in this paper. Firstly, a new diameter-neighbor (DN) codebook partition algorithm for AMR-WB speech is proposed. Based on the DN codebook division, we develop a novel AMR-WB speech steganography capable of providing flexible embedding capacity with different values of the iterative parameter $N_i$. For example, when $N_i = 48$, twice the embedding capacity of the CNV-based method may be obtained with a decrease of only about 2% in speech quality and much the same undetectability. Moreover, both the quality of stego speech and the security against statistical steganalysis [17, 18] are better than those of the recent NID-based speech steganography.

The remainder of this paper is organized as follows. In Section 2, the related work is briefly introduced. In Section 3, the proposed DN codebook partition algorithm and the novel AMR-WB speech steganography are described in detail. The experimental results and analysis are provided in Section 4. Finally, conclusions are presented in Section 5.

2. Related Work

In this section, a technical overview of the AMR-WB codec is first presented. Then two related codebook partition algorithms, CNV [10] and NID [16], are briefly reviewed.

2.1. AMR-WB Codec. The AMR-WB speech codec is standardized by 3GPP (3rd Generation Partnership Project) and was adopted as the standard G.722.2 by ITU-T in 2002 [19]. It is a multirate wideband speech codec applied in modern mobile communication systems to remarkably improve the speech quality. The AMR-WB codec operates at a multitude of bit rates ranging from 6.6 kbit/s to 23.85 kbit/s.

The input audio signal, sampled at 16 kHz, is separated into 20 ms long frames. A linear prediction analysis (LPA) is performed on every frame, and the LP coefficients are converted to immittance spectrum pairs (ISP) coefficients. The ISP coefficients are then converted to the frequency domain (ISF) for quantization. Except for mode 0 (6.6 kbit/s), the ISF coefficients are quantized using two-stage vector quantization with split-by-2 in the first stage and split-by-5 in the second stage. Both the second and the third codebooks in the second stage have 128 codewords, and the ISF indices of the codewords in these codebooks may be employed to embed secret messages.

In the decoder, the transmitted indices are first parsed from the received bitstream and then decoded to obtain the code parameters for each transmitted frame, such as the ISP vector, the 4 fractional pitch lags, the 4 LTP filtering parameters, the 4 innovative code vectors, and the 4 sets of vector quantized pitch and innovative gains. For a more detailed description, one should refer to [19]. From the received ISF indices, which may have been modified because of secret message embedding, the receiver can recover the embedded secret message.

2.2. Complementary Neighbor Vertices. CNV is a type of codebook partition algorithm proposed in [10], in which each codeword in a codebook is viewed as a vertex in the multidimensional space. The relationship between two codewords $X$ and $Y$ is described as an edge connecting the two codewords' vertices, and the weight of an edge is defined as the Euclidean distance $D(X, Y)$ between the two codewords $X$ and $Y$. A small value of $D(X, Y)$ indicates that $X$ and $Y$ bear a close resemblance to each other. The vertex nearest to $X$ is referred to as $X$'s neighbor vertex, which is denoted by $N(X)$. The vertex set $V$ together with the edge set $E$ forms a graph $G(V, E)$ in a multidimensional space.

The codebook partition is realized by the construction of the graph $G(V, E)$ and vertex labelling. First, each vertex $X$ in $G(V, E)$ is connected with its neighbor vertex $N(X)$ using an edge. Thus the graph $G(V, E)$ is divided into several isolated subgraphs, each of which may be proved to be acyclic and 2-colorable. Second, every vertex and its neighbor vertex in a subgraph are labelled oppositely using "0" or "1". Third, all of the vertices with the same label are collected into a subcodebook; hence two subcodebooks are obtained.

Based on the generated subgraphs and the label assigned to each codeword in them, CNV-based steganography applies the QIM concept to embed the secret message. More specifically, when the label of the codeword $X$, which is associated with the cover quantization index $I_a$, agrees with the secret message, $I_a$ remains unchanged; or else it should be replaced


Figure 1: Diagram of the proposed method (sender: index parse, codebook partition into a cluster set, embed, index update; public channel; receiver: index parse, extract, AMR-WB decoder).

with the quantization index of the neighbor codeword $N(X)$, which belongs to the opposite subcodebook.
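The CNV partition and the one-bit QIM embedding described above can be sketched as follows. This is a simplified illustration rather than the implementation of [10]: it assumes that nearest neighbors are unique (no distance ties, so that the neighbor graph is 2-colorable), it ignores the flipping coefficients, and all function names are hypothetical.

```python
import math
from collections import deque


def cnv_partition(codebook):
    """Return (nearest, labels): each codeword's neighbor vertex and its 0/1 label."""
    n = len(codebook)
    # Connect every vertex with its nearest other vertex.
    nearest = [
        min((j for j in range(n) if j != i),
            key=lambda j: math.dist(codebook[i], codebook[j]))
        for i in range(n)
    ]
    adjacency = [set() for _ in range(n)]
    for i, j in enumerate(nearest):
        adjacency[i].add(j)
        adjacency[j].add(i)
    # 2-color every isolated subgraph by breadth-first search.
    labels = [None] * n
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if labels[v] is None:
                    labels[v] = 1 - labels[u]
                    queue.append(v)
    return nearest, labels


def cnv_embed(index, bit, nearest, labels):
    """Keep the index if its label equals the secret bit, else use its neighbor."""
    return index if labels[index] == bit else nearest[index]


def cnv_extract(index, labels):
    """The receiver reads the secret bit from the label of the received index."""
    return labels[index]
```

Because a codeword and its neighbor vertex always carry opposite labels, the replacement in `cnv_embed` moves at most to the nearest codeword, which is why the distortion stays bounded.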

The key characteristic of CNV-based steganography is that the distortion is bounded even in the worst case. However, the embedding capacity is limited, which is analyzed experimentally in Section 4. Moreover, the number of possible combinations of flipping coefficients, which determine whether the labels in a subgraph will be flipped, is large. Extra information about the flipping process must be transmitted to the receiver, and thus the effective embedding capacity may be decreased further.

2.3. Neighbor Index Division. NID assumes that codewords with neighboring indices (i.e., neighboring positions) in a codebook are close together. Hence, the codewords in a codebook can be easily separated into subcodebooks according to their indices instead of the Euclidean distance. Specifically, select an appropriate integer $k$ according to the demand for embedding capacity and label the $i$th codeword with the digit $(i - 1) \bmod k$. Then collect all the codewords with the same label into a subcodebook to obtain $k$ different subcodebooks.

In order to make full use of the embedding capacity, the binary secret message should be transformed into $k$-ary digits, denoted by $m$ ($m \in \{0, 1, \ldots, k - 1\}$). When the codeword related to the cover quantization index belongs to a subcodebook whose label differs from the $k$-ary digit $m$ to be embedded, this index should be substituted with that of the closest codeword in the corresponding subcodebook $m$.
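A minimal sketch of the NID labelling and substitution rule is given below. It is an illustration under stated assumptions, not the implementation of [16]: indices are 0-based here, so the label `index % k` corresponds to $(i - 1) \bmod k$ for the 1-based $i$ used in the text, and the function names are hypothetical.

```python
import math


def nid_label(index, k):
    """Label of a codeword; 0-based `index`, i.e. (i - 1) mod k for 1-based i."""
    return index % k


def nid_embed(codebook, cover_index, m, k):
    """Embed the k-ary digit m into one quantization index."""
    if nid_label(cover_index, k) == m:
        return cover_index
    # Otherwise substitute the closest codeword of subcodebook m.
    candidates = [j for j in range(len(codebook)) if nid_label(j, k) == m]
    return min(candidates,
               key=lambda j: math.dist(codebook[cover_index], codebook[j]))


def nid_extract(stego_index, k):
    """The receiver recovers the k-ary digit from the index alone."""
    return nid_label(stego_index, k)
```

Since the label depends only on the index, the receiver needs to know nothing beyond $k$, but the substituted codeword may lie far from the original one when neighboring indices are not neighboring vectors.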

NID-based steganography is an information hiding method based on neighbor-index codebook partition, of which the embedding capacity may be controlled by the number of subcodebooks $k$. However, as illustrated in [16], only about 34% of the pairs of neighbor-index codewords happened to be pairs of neighbor-vertex codewords, and the mean distance between neighbor-index codewords is apparently larger than that of neighbor-vertex codewords. Therefore, the amount of distortion induced by NID-based steganography may be relatively large, which is confirmed by the experimental results provided in Section 4.

3. Proposed Method

The diagram of the proposed method is shown in Figure 1. Based on the DN partition of the codebooks described in Section 2.1, a secret message can be embedded into an AMR-WB speech file. After the stego AMR-WB speech file is received, the embedded secret message can be extracted without errors. At the same time, the decoded speech without perceptible distortion will also be obtained. In the following, the diameter-neighbor (DN) codebook partition algorithm is first introduced. Then the embedding and extraction procedures of our proposed method are described.

3.1. Codebook Partition. A codebook may be viewed as a list of isolated code vectors (i.e., codewords) in the multidimensional space. The codebook partition algorithm used for audio steganography divides the codebook into several clusters, in each of which the codewords can be replaced with each other without causing perceptible distortion.

Let $B$ denote the original codebook with $N_b$ codewords and $C$ denote a cluster with $N_c$ codewords $W_t$ ($t = 1, 2, \ldots, N_c$). The centroid $G$ of a cluster $C$ is defined as follows:

$$G(i) = \frac{1}{N_c} \sum_{t=1}^{N_c} W_t(i), \quad (1)$$

where $G(i)$ and $W_t(i)$ are the $i$th components of $G$ and $W_t$, respectively.

The centroid $G$ (average code vector) is used to represent the corresponding cluster $C$; hence the cluster $C$ may also be considered as a vector in the multidimensional codebook space. In order to describe the similarity between two clusters $C_1$ and $C_2$, the Euclidean distance between them is defined as follows:

$$D(C_1, C_2) = \sqrt{\sum_{i=1}^{n} \left( G_1(i) - G_2(i) \right)^2}, \quad (2)$$

where $G_1$ and $G_2$ are the centroids of the two clusters $C_1$ and $C_2$, $n$ is the dimension of a codeword, and $G_1(i)$ and $G_2(i)$ are the $i$th components of $G_1$ and $G_2$, respectively.

Let $S$ denote a cluster set. The diameter of $S$ is defined as the maximal Euclidean distance $D_m$ over all cluster pairs in the cluster set $S$, that is,

$$D(C_p, C_q) \le D_m, \quad \forall p, q = 1, 2, \ldots, |S|, \quad (3)$$


Figure 2: Diagram of our proposed codebook partition. Flowchart: starting from codebook $B$, initialize a cluster set $S$ by taking each codeword as an independent cluster, and an empty cluster set $S'$; while $N_i > 0$: if $S$ is empty, put the clusters in $S'$ into $S$ to make $S'$ empty; search for the diameter cluster pair $(C_{d1}, C_{d2})$ in $S$; merge $C_{d1}$ and $C_{d2}$ with their neighbors, respectively, into two new clusters Temp1 and Temp2; remove $C_{d1}$, $C_{d2}$, and their neighbors from $S$ and put Temp1 and Temp2 into $S'$; set $N_i = N_i - 1$. Finally, put the remaining clusters in $S'$ into $S$ and output the cluster set $S$.

where $|S|$ is the number of clusters within the cluster set $S$. The cluster pair with the maximal Euclidean distance $D_m$, called the diameter cluster pair, is denoted by $(C_{d1}, C_{d2})$. The neighbor of a cluster $C$ in $S$ is represented by $N(C, S)$; then we have

$$D(C, N(C, S)) \le D(C, C_p), \quad \forall p = 1, 2, \ldots, |S|. \quad (4)$$
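The definitions in (1)–(4) can be illustrated with a few lines of code. The sketch below assumes that a cluster is stored as a list of codeword tuples and that the neighbor of a cluster is searched among the other clusters of the set (a cluster is not its own neighbor); the function names are illustrative.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Component-wise mean of the codewords in a cluster, as in (1)."""
    size = len(cluster)
    dim = len(cluster[0])
    return tuple(sum(w[i] for w in cluster) / size for i in range(dim))


def cluster_distance(c1, c2):
    """Euclidean distance between the centroids of two clusters, as in (2)."""
    return math.dist(centroid(c1), centroid(c2))


def diameter_pair(cluster_set):
    """The pair of clusters with the maximal distance D_m, as in (3)."""
    return max(combinations(cluster_set, 2),
               key=lambda pair: cluster_distance(pair[0], pair[1]))


def neighbor(cluster, cluster_set):
    """The nearest other cluster N(C, S), as in (4)."""
    return min((other for other in cluster_set if other is not cluster),
               key=lambda other: cluster_distance(cluster, other))


# Toy cluster set with two-dimensional codewords (illustrative values only).
S = [[(0, 0), (0, 2)], [(1, 1)], [(10, 0)], [(9, 1), (9, 3)]]
c_d1, c_d2 = diameter_pair(S)
print(c_d1, c_d2, neighbor(c_d1, S), neighbor(c_d2, S))
```

In this toy set the diameter cluster pair consists of the first and third clusters, and each of them has a different nearest neighbor, so one merging step would produce two new clusters.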

Figure 2 illustrates the diagram of the proposed DN codebook partition algorithm, and its detailed procedure is given in Algorithm 1. The original codebook $B$ will be divided into $|S|$ clusters by iteratively merging the diameter cluster pair with their respective neighbors. An iteration parameter $N_i$ is applied to obtain flexible embedding capacity through controlling the merging procedure. The relationship between $N_i$ and the embedding capacity will be discussed in Section 4.3.

Figure 3 is provided as an example to illustrate the proposed codebook partition algorithm. A white circle denotes a codeword. A shaded oval denotes a codeword and its neighbor in $S$ being processed, while an unshaded oval represents a cluster in $S'$ that has been formed. The "0", "1", "00", "01", "10", or "11" in a circle is the label of a codeword in the cluster. A cross "×" marks the centroid of the cluster it belongs to, and a line represents the diameter of a cluster set. The first to third merging iterations are shown in Figures 3(a)–3(c), respectively. The fourth merging iteration is comprised of Figures 3(d) and 3(e), and Figure 3(f) demonstrates the labelling of the codewords.

3.2. Embedding Procedure. In our proposed method, the ISF indices corresponding to the codewords in the codebook are first obtained by parsing the host AMR-WB speech. Then the ISF indices are employed to embed the secret message based on the codebook partition. Generally, the codewords that lie in the same cluster as the codeword referred to by the ISF index $I_a$ are considered to be replaceable with each other. According to the secret message to be embedded, $I_a$ may be substituted by one of the other codewords' indices within the same cluster. The number of secret message bits that can be embedded depends on the size of the specific cluster. The embedding procedure is given in the following.

Step 1. Search the cluster set $S$ for the cluster $C$ that contains the codeword referred to by the ISF index $I_a$.

Step 2. If there are $N$ codewords in $C$, the number of secret bits that can be embedded into $I_a$ is calculated as $n = \lfloor \log_2 N \rfloor$.

Step 3. Read $n$ not-yet-embedded bits, denoted by $m$, from the secret message. $I_a$ is replaced with $I_b$, which indexes the codeword with the same label as $m$.

Step 4. Repeat Steps 1–3 until all the secret bits are embedded.
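The four embedding steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: a cluster is modelled as a list of codeword indices, the label of a codeword is assumed to be its position in that list written with $n$ bits, and the function name `embed` and the toy cluster set are hypothetical.

```python
def embed(cover_indices, bits, clusters):
    """Embed a bit string into a sequence of cover ISF indices.

    clusters: list of clusters, each a list of codeword indices.
    Assumption: the label of a codeword is its position in its cluster.
    Returns the stego indices and the number of bits actually embedded.
    """
    # Step 1 helper: map every codeword index to the cluster that contains it.
    owner = {idx: cluster for cluster in clusters for idx in cluster}
    stego, pos = [], 0
    for ia in cover_indices:
        cluster = owner[ia]
        n = len(cluster).bit_length() - 1      # Step 2: n = floor(log2(N))
        if n == 0 or pos + n > len(bits):
            stego.append(ia)                   # nothing (more) to embed here
            continue
        m = bits[pos:pos + n]                  # Step 3: next n secret bits
        stego.append(cluster[int(m, 2)])       # codeword whose label equals m
        pos += n
    return stego, pos


# Toy cluster set: one cluster of four codewords and one cluster of two.
clusters = [[10, 11, 12, 13], [20, 21]]
print(embed([10, 20, 13], "01110", clusters))  # ([11, 21, 12], 5)
```

In this sketch, indices that remain after the message is exhausted are left unchanged, so the receiver is assumed to know the message length.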


Figure 3: An example of our proposed codebook partition. Panels: (a) 1st iteration ($N_i = 4$); (b) 2nd iteration ($N_i = 3$); (c) 3rd iteration ($N_i = 2$); (d) 4th iteration ($S = S'$, $S'$.clear()); (e) 4th iteration ($N_i = 1$); (f) labelling.

Input: codebook B, iteration parameter N_i
Output: cluster set S

/* S' is a helper cluster set */
S'.clear()
S.clear()
/* Each codeword is taken as an initial cluster */
for i = 0; i < N_b; ++i do
    S.push(C_i)
end
/* Iterative merging */
while N_i > 0 do
    if S is empty then
        S = S'
        S'.clear()
    end
    (C_d1, C_d2) = argmax over i, j in {1, 2, ..., |S|} of D(C_i, C_j)
    Temp1 = C_d1 ∪ N(C_d1, S)
    Temp2 = C_d2 ∪ N(C_d2, S)
    S'.push(Temp1)
    S'.push(Temp2)
    S.remove(C_d1)
    S.remove(C_d2)
    S.remove(N(C_d1, S))
    S.remove(N(C_d2, S))
    N_i = N_i - 1
end
/* Put the remaining clusters in S' into S */
for iter = S'.begin(); iter < S'.end(); ++iter do
    S.push(*iter)
end
return S

Algorithm 1: DN-based codebook partition algorithm.
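Algorithm 1 can be turned into a small runnable sketch. The Python version below is a hedged illustration rather than the authors' code: clusters are lists of codeword indices, and all function names are hypothetical. Two details are assumptions that Algorithm 1 does not spell out. First, each neighbor is searched after the diameter pair has been removed from $S$, so the two neighbors are always distinct. Second, merging stops early if fewer than four clusters remain.

```python
import math
from itertools import combinations


def centroid(cluster, codebook):
    """Component-wise mean of the codewords whose indices are in cluster."""
    members = [codebook[i] for i in cluster]
    dim = len(members[0])
    return [sum(w[d] for w in members) / len(members) for d in range(dim)]


def distance(c1, c2, codebook):
    """Euclidean distance between the centroids of two clusters."""
    g1, g2 = centroid(c1, codebook), centroid(c2, codebook)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))


def dn_partition(codebook, n_iter):
    """Diameter-neighbor partition; returns a list of clusters of indices."""
    S = [[i] for i in range(len(codebook))]   # each codeword is a cluster
    S_prime = []                              # helper cluster set S'
    while n_iter > 0:
        if not S:                             # S exhausted: continue with S'
            S, S_prime = S_prime, []
        if len(S) < 4:                        # assumption: cannot merge further
            break
        # Diameter cluster pair: the two clusters that are farthest apart.
        cd1, cd2 = max(combinations(S, 2),
                       key=lambda pair: distance(pair[0], pair[1], codebook))
        S.remove(cd1)
        S.remove(cd2)
        # Neighbor of each diameter cluster among the remaining clusters.
        n1 = min(S, key=lambda c: distance(cd1, c, codebook))
        S.remove(n1)
        n2 = min(S, key=lambda c: distance(cd2, c, codebook))
        S.remove(n2)
        S_prime.append(cd1 + n1)
        S_prime.append(cd2 + n2)
        n_iter -= 1
    return S + S_prime                        # remaining clusters of both sets


# Toy one-dimensional codebook with two well-separated groups.
codebook = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (13.0,)]
print(sorted(sorted(c) for c in dn_partition(codebook, 2)))
# [[0, 1], [2, 3], [4, 5], [6, 7]]
print(sorted(sorted(c) for c in dn_partition(codebook, 3)))
# [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With eight codewords, two iterations pair up all codewords and a third iteration merges the pairs into two clusters of four, which mirrors the way larger values of $N_i$ produce larger clusters.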

Figure 4: Embedding two bits into one cover ISF index.

Figure 4 gives an example of embedding two secret bits into one cover ISF index. Assume that the cluster set $S$ contains two clusters and that the codeword indexed by $I_x$ is $W_x$; for example, $I_b$ indexes the codeword $W_b$. The ISF index $I_a$ shown in Figure 4 is then replaced with $I_b$, which indexes the codeword $W_b$ whose label equals the secret bits "01".

3.3. Extracting Procedure. When the stego AMR-WB speech is transferred to the intended receiver, the stego indices are obtained by parsing the AMR-WB speech stream and are used to extract the embedded secret message. The procedure for extracting the message from a stego index $I_b$ is given below.

Step 1. Search the cluster set $S$, which is the same as that employed in the embedding procedure, for the cluster $C$ that contains the codeword $W_b$ referred to by the ISF index $I_b$.

Step 2. If there are $N$ codewords in total in $C$, the number of secret bits carried by $I_b$ is computed as $n = \lfloor \log_2 N \rfloor$.


Figure 5: Extracting two bits from one stego ISF index.

Step 3. Read the label of $W_b$ as the extracted $n$ bits, which are appended to the secret message bit sequence.

Step 4. Repeat Steps 1–3 until all the secret bits are recovered.

Figure 5 shows the corresponding example of extracting two secret bits from the stego index $I_b$ generated by the embedding instance in Figure 4. The extracted secret bits are identical to the embedded secret bits.
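The extraction steps admit an equally short sketch. As before, this is a hedged illustration: the label of a codeword is assumed to be its position in its cluster, the receiver is assumed to know the message length `n_bits`, and the name `extract` and the toy cluster set are hypothetical.

```python
def extract(stego_indices, clusters, n_bits):
    """Recover n_bits secret bits from a sequence of stego ISF indices.

    clusters must be the same cluster set that was used for embedding.
    Assumption: the label of a codeword is its position in its cluster.
    """
    owner = {idx: cluster for cluster in clusters for idx in cluster}
    bits = ""
    for ib in stego_indices:
        if len(bits) >= n_bits:                # all secret bits recovered
            break
        cluster = owner[ib]                    # Step 1: cluster containing W_b
        n = len(cluster).bit_length() - 1      # Step 2: n = floor(log2(N))
        if n > 0:
            # Step 3: the label of W_b, written with n bits.
            bits += format(cluster.index(ib), "0{}b".format(n))
    return bits[:n_bits]


# Toy cluster set: one cluster of four codewords and one cluster of two.
clusters = [[10, 11, 12, 13], [20, 21]]
print(extract([11, 21, 12], clusters, 5))      # 01110
```

Index 11 sits at position 1 of a four-codeword cluster and therefore yields the two bits "01", the same situation as the single-index example in Figure 5.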

4. Experimental Results and Analysis

To demonstrate the performance of the proposed method, the perceptual quality of the stego AMR-WB speech generated with our method is computed and compared with that of the stego AMR-WB speech generated with CNV and NID steganography. Moreover, the flexibility of the embedding capacity and the security against statistical detection are analyzed in detail.

4.1. Audio Database. The TIMIT acoustic-phonetic continuous speech corpus (https://catalog.ldc.upenn.edu/ldc93s1) is an audio database that contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences; all audio sentences are sampled at 16 kHz. In our experiments, 1000 audio sentences are randomly chosen from the TIMIT database. The average, maximum, and minimum lengths of the chosen audio sentences are 3.47 s, 3.96 s, and 3.12 s, respectively. All audio files are converted into AMR-WB format using the standard codec.

4.2. Speech Quality Evaluation. The perceptual evaluation of speech quality (PESQ) described in ITU-T Recommendation P.862 [20] may be employed to evaluate speech quality. Moreover, according to ITU-T P.862.2 [21], the raw PESQ score can be converted to the mean opinion score-listening quality objective (MOS-LQO), which is more suitable for evaluating wideband speech. Hence, MOS-LQO is applied in our experiments. The normal range of the MOS-LQO score is 1.017 to 4.549; the higher the score, the better the quality.

Figure 6 shows the MOS-LQO scores of the 1000 cover AMR-WB speeches in 23.85 kbit/s mode and of the corresponding stego AMR-WB speeches produced with three different codebook partition algorithms. Three progressive embedding rates, that is, 100 bps, 200 bps, and 300 bps, are employed in our experiments. The speech samples are sorted according to the MOS-LQO scores of our proposed method. It can be seen from Figure 6 that the overall scores of the stego AMR-WB speeches generated with our method are higher than those of the NID-based stego AMR-WB speeches, especially when the embedding rate is 200 bps or 300 bps. The MOS-LQO scores of the CNV-based stego AMR-WB speeches are only slightly higher than ours when the embedding rate is 100 bps, which means that there is no obvious discrepancy in speech quality between the two. Besides, when a high embedding rate (200 bps or 300 bps) is used, the decrease in the MOS-LQO scores of our stego AMR-WB speeches is significantly smaller than that of NID-based steganography.

Figure 6: Comparisons of MOS-LQO values for 1000 samples between the standard AMR-WB codec, CNV-based steganography, NID-based steganography, and the proposed DN-based steganography. Panels: (a) the embedding rate is 100 bps (Standard, CNV, NID, Ours); (b) the embedding rate is 200 bps (Standard, NID, Ours); (c) the embedding rate is 300 bps (Standard, NID, Ours). Horizontal axis: sample index; vertical axis: MOS-LQO score.

Moreover, the average MOS-LQO scores of the cover AMR-WB speeches and of the stego AMR-WB speeches produced with the three codebook partition algorithms (CNV, NID, and DN) are given in Table 1 for four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) and three embedding rates (100 bps, 200 bps, and 300 bps). For the embedding rates of 200 bps and 300 bps, only the MOS-LQO scores of the NID-based and DN-based methods are given in Table 1 because the embedding capacity of CNV-based steganography may not be larger than 100 bps.

Table 1: MOS-LQO scores of the standard codec, CNV-based, NID-based, and our proposed steganography in four different rate modes and three embedding rates. Values in parentheses are the relative change with respect to the standard codec.

Embedding rate  Method    12.65 kbit/s     15.85 kbit/s     19.85 kbit/s     23.85 kbit/s
-               Standard  2.929            3.073            3.199            3.269
100 bps         CNV       2.871 (−2.0%)    3.021 (−1.7%)    3.153 (−1.4%)    3.225 (−1.3%)
100 bps         NID       2.750 (−6.1%)    2.895 (−5.8%)    3.020 (−5.6%)    3.091 (−5.4%)
100 bps         Ours      2.864 (−2.2%)    3.010 (−2.0%)    3.139 (−1.9%)    3.216 (−1.6%)
200 bps         CNV       n/a              n/a              n/a              n/a
200 bps         NID       2.601 (−11.2%)   2.736 (−11.0%)   2.875 (−10.7%)   2.921 (−10.6%)
200 bps         Ours      2.807 (−4.2%)    2.955 (−3.8%)    3.084 (−3.6%)    3.164 (−3.2%)
300 bps         CNV       n/a              n/a              n/a              n/a
300 bps         NID       2.284 (−22.0%)   2.386 (−22.3%)   2.475 (−22.6%)   2.533 (−22.5%)
300 bps         Ours      2.699 (−7.9%)    2.841 (−7.5%)    2.971 (−7.1%)    3.046 (−6.8%)

When the embedding rate is 100 bps, which is almost the limit of CNV steganography, Table 1 shows that the mean MOS-LQO scores of our proposed method are only about 0.3% worse than those of CNV-based steganography. Such a slight decrease may be almost imperceptible to the human auditory system (HAS). Compared with NID-based steganography, our method improves the mean MOS-LQO scores by approximately 3.8%. When the embedding rates are 200 bps and 300 bps, the scores of our approach are improved by about 7% and 15%, respectively, relative to those of NID-based steganography.

Furthermore, the experimental results of the four rate modes are similar. The decrease in speech quality caused by NID-based steganography is more than twice that caused by DN-based steganography. The proposed method obtains twice the embedding capacity of CNV-based steganography by sacrificing around 2% of speech quality on average. In addition, according to Table 1, the proposed DN-based method at 300 bps still yields higher MOS-LQO scores than the NID-based method at 200 bps.
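The percentages quoted from Table 1 are relative changes with respect to the standard codec. A minimal sketch of that arithmetic is given below, using the 23.85 kbit/s column of Table 1 with the decimal points restored; the helper name `relative_change` is hypothetical.

```python
def relative_change(stego_score, cover_score):
    """Relative change of a stego MOS-LQO score, in percent of the cover score."""
    return 100.0 * (stego_score - cover_score) / cover_score


cover = 3.269  # standard codec, 23.85 kbit/s mode (Table 1)
scores_100bps = {"CNV": 3.225, "NID": 3.091, "DN": 3.216}

for method, score in scores_100bps.items():
    print(method, round(relative_change(score, cover), 1))
# CNV -1.3
# NID -5.4
# DN -1.6
```

The gap between DN and NID at 100 bps, 5.4 − 1.6 = 3.8 percentage points, is consistent with the improvement of approximately 3.8% reported in the text.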

4.3. Flexible Embedding Capacity. Compared with CNV-based steganography, our proposed method offers flexible embedding capacity to satisfy different practical demands. The steganographic capacity can be adjusted by changing the iteration parameter $N_i$. For different values of $N_i$ (e.g., $N_i = 32, 33, \ldots, 54$), the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.

From Figure 7, we can observe that the embedding rate increases significantly with $N_i$ while the MOS-LQO score goes down only slightly. For NID-based steganography, however, the MOS-LQO score declines rapidly as the embedding rate increases. Therefore, the proposed DN-based steganography can achieve a higher embedding capacity with only a slight decrease in speech quality. For example, when $N_i = 48$, the size of each cluster in $S$ is equal to 4, and we can embed 4 bits per frame, that is, the embedding rate is 200 bps; in contrast, the CNV algorithm can embed at most 2 bits per frame (100 bps).
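The arithmetic behind the $N_i = 48$ example can be made explicit. The sketch below rests on several assumptions that are not all stated in this section: AMR-WB frames last 20 ms (50 frames per second), two ISF indices per frame carry secret bits (the second and third second-stage codebooks mentioned in Section 4.4), each of these codebooks holds 128 codewords, and every merging iteration consumes four clusters and produces two. The function names are hypothetical.

```python
def embedding_rate_bps(cluster_size, indices_per_frame=2, frames_per_second=50):
    """Embedding rate when every cluster has the same size."""
    bits_per_index = cluster_size.bit_length() - 1   # floor(log2(size))
    return bits_per_index * indices_per_frame * frames_per_second


def merges_for_uniform_clusters(codebook_size, cluster_size):
    """Merging iterations needed until all clusters reach cluster_size."""
    total, clusters, size = 0, codebook_size, 1
    while size < cluster_size:
        total += clusters // 4    # one iteration turns 4 clusters into 2
        clusters //= 2
        size *= 2
    return total


print(merges_for_uniform_clusters(128, 4))   # 48
print(embedding_rate_bps(4))                 # 200
print(embedding_rate_bps(2))                 # 100
```

Under these assumptions, 48 iterations on a 128-codeword codebook give clusters of size 4 and hence 2 bits per index, 4 bits per frame, and 200 bps, which matches the figures quoted above.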

Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography. Panels: (a) our proposed steganography (horizontal axis: times of cluster merging); (b) NID-based steganography (horizontal axis: number of sub-codebooks). Vertical axes: embedding rate (bps) and MOS-LQO score.

4.4. Resistibility of Statistical Steganalysis. Speech steganography aims to hide a secret message in cover speech without arousing suspicion. It is therefore very important for a steganographic method to resist statistical steganalysis, the technique of detecting the presence of a hidden message. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18] the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of a hidden message in given audio.

Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in 23.85 kbit/s mode. Columns are grouped by training rate (0.4, 0.5, and 0.6); within each group the steganalytic methods are Markov, MFCC, SS-QCCN, and RS-QCCN, in that order.

Embedding rate  Method  Training rate 0.4        Training rate 0.5        Training rate 0.6
100 bps         CNV     49.8  49.8  43.7  49.0   50.1  50.2  44.0  49.2   50.0  50.5  41.9  50.0
100 bps         NID     51.0  60.1  42.2  50.0   50.1  60.9  42.9  48.7   52.1  59.8  41.8  49.4
100 bps         Ours    50.0  50.0  44.0  49.4   50.3  49.3  40.3  49.4   49.1  48.6  41.8  43.3
200 bps         CNV     n/a                      n/a                      n/a
200 bps         NID     53.5  74.5  46.9  50.0   53.3  76.2  47.6  50.0   53.6  75.8  44.4  50.1
200 bps         Ours    51.0  48.3  45.2  50.0   49.8  48.7  42.2  50.0   50.5  48.6  45.0  50.0
300 bps         CNV     n/a                      n/a                      n/a
300 bps         NID     54.8  74.6  49.3  50.0   56.3  77.2  50.0  50.0   55.4  78.3  50.5  50.6
300 bps         Ours    52.4  49.7  47.9  50.0   52.8  60.9  48.2  50.0   53.8  50.1  46.6  50.0

In our experiments, the sentences chosen from the TIMIT database as stated in Section 4.1 are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. A secret message is then embedded into each cover AMR-WB speech at different embedding rates, that is, 100 bps, 200 bps, and 300 bps, by CNV-based, NID-based, and DN-based steganography. The 200 bps and 300 bps rates are omitted for CNV-based steganography because of its limited embedding capacity. Seven stego speech sets are thus generated: one for the CNV-based method and three each for the NID-based and DN-based methods. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.

In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, that is, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stego speech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and a radial basis function (RBF) kernel and the grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set be negatives and those in the stego speech set be positives. The accuracy may then be defined as follows:

$$\text{Accuracy} = \frac{1}{2} \times \left( \frac{TP}{TP + FN} + \frac{TN}{FP + TN} \right), \tag{5}$$

where TP are true positives, TN are true negatives, FN are false negatives, and FP are false positives.
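Equation (5) is the mean of the true positive rate and the true negative rate. A minimal sketch follows; the function name `balanced_accuracy` and the counts in the example are hypothetical and are not results from the paper.

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Accuracy as defined in Eq. (5): mean of TPR and TNR."""
    true_positive_rate = tp / (tp + fn)   # stego samples detected as stego
    true_negative_rate = tn / (fp + tn)   # cover samples recognised as cover
    return 0.5 * (true_positive_rate + true_negative_rate)


# Hypothetical test set with 600 stego (positive) and 600 cover (negative) samples.
acc = balanced_accuracy(tp=300, tn=330, fp=270, fn=300)
print(round(acc, 3))   # 0.525
```

A detector that guesses at random has an expected accuracy of 0.5 under this definition, which is why values near 50% in Table 2 indicate that the steganography is hard to detect.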

The steganalytic results are given in Table 2. When the embedding rate is 100 bps, the accuracy of detecting the CNV-based and DN-based methods is almost the same, about 50%, while that of detecting NID-based steganography increases to about 60% when the MFCC-based steganalytic method is applied. Moreover, the accuracy of detecting the NID-based hiding method increases noticeably when the embedding rate rises to 200 bps or 300 bps and Liu et al.'s methods (i.e., the Markov-based and MFCC-based steganalytic methods) are applied. In contrast, the accuracy of steganalyzing our proposed DN-based steganography stays at the same level of about 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even at higher embedding rates.

Figure 8: The correlation index of 1000 AMR-WB speeches, where the interframe edge $ii$ connects two vertices $V_i[k]$ and $V_i[k + 1]$ in two neighboring frames and the intraframe edge $ij'$ connects two vertices $V_i[k]$ and $V_j[k]$ in the same frame. Horizontal axis: edge; vertical axis: correlation index.

According to the definition of the correlation index given in [18], the correlation indices of 1000 AMR-WB speeches randomly selected from TIMIT are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely SS-QCCN and RS-QCCN, can be constructed as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography, and the steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN is about 50% or lower for all of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that only the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while neither of them is utilized in Li et al.'s steganalytic method except for the edge "33" in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing the edges "22", "33", and "23′") targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis, according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].

Figure 9: Two AMR-WB strong correlation network models, each drawn over the vertices $V_1[k], \ldots, V_5[k]$ and $V_1[k + 1], \ldots, V_5[k + 1]$ of two neighboring frames. Panels: (a) SS-QCCN; (b) RS-QCCN.

To visualize the detection performance, receiver operating characteristic (ROC) curves for steganalyzing CNV-based steganography at the 100 bps embedding rate and NID-based and DN-based steganography at the 100 bps, 200 bps, and 300 bps embedding rates are provided in Figure 10. (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of the embedding capacity.) Figure 10 shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of a hidden message embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.

5. Conclusion

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may therefore be a good candidate for the cover medium in speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed, and the experimental results demonstrate its effectiveness. The main contributions of this paper are as follows.

(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality and better performance against statistical steganalysis than the NID-based method.

(2) Flexible embedding capacity may be easily achieved with different numbers of cluster-merging iterations. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.

Figure 10: ROC curves for steganalysis of CNV-based, NID-based, and our proposed steganography (50% training rate). Panels: (a) Markov (TIMIT, 100 bps; DN, CNV, NID); (b) MFCC (TIMIT, 100 bps; DN, CNV, NID); (c) Markov (TIMIT, 200 bps; NID, DN); (d) MFCC (TIMIT, 200 bps; NID, DN); (e) Markov (TIMIT, 300 bps; NID, DN); (f) MFCC (TIMIT, 300 bps; NID, DN). Horizontal axis: false positive rate; vertical axis: true positive rate.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.

References

[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.

[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.

[3] K. Gopalan, "Audio steganography using bit modification," in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.

[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, "Audio steganography by amplitude or phase modification," in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.

[5] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.

[6] L. Liu, M. Li, Q. Li, and Y. Liang, "Perceptually transparent information hiding in G.729 bitstream," in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.

[7] T. Xu and Z. Yang, "Simple and effective speech steganography in G.723.1 low-rate codes," in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.

[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, "MELPe coded speech hiding on enhanced full rate compressed domain," in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.

[9] A. Nishimura, "Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec," in Proceedings of the IIH-MSP 2009: 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.

[10] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.

[11] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[12] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.

[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.

[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, "A new scheme for covert communication via 3G encoded speech," Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.

[15] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.

[16] J. Liu, H. Tian, J. Lu, and Y. Chen, "Neighbor-index-division steganography based on QIM method for G.723.1 speech streams," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.

[17] Q. Liu, A. H. Sung, and M. Qiao, "Derivative-based audio steganalysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.

[18] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.

[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.

[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.

[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.

[22] C. Chang and C. Lin, "LIBSVM: a Library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.



Figure 1: Diagram of the proposed method.

with the quantization index of the neighbor codeword $N(X)$, which belongs to the opposite subcodebook.

The key characteristic of CNV-based steganography is that the distortion is bounded even in the worst case. However, the embedding capacity is limited, as analyzed experimentally in Section 4. Moreover, the number of possible combinations of flipping coefficients, which determine whether the labels in a subgraph will be flipped, is large. Extra information about the flipping process must be transmitted to the receiver, and thus the effective embedding capacity may decrease further.

2.3. Neighbor Index Division. NID assumes that codewords with neighboring indices (i.e., neighboring positions) in a codebook are close together. Hence, the codewords in a codebook can easily be separated into subcodebooks according to their indices instead of the Euclidean distance. Specifically, an appropriate integer $k$ is selected according to the demand for embedding capacity, and the $i$th codeword is labelled with the digit $(i - 1) \bmod k$. All the codewords with the same label are then collected into a subcodebook, which yields $k$ different subcodebooks.

To make full use of the embedding capacity, the binary secret message should be transformed into $k$-ary digits, denoted by $m$ ($m \in \{0, 1, \ldots, k - 1\}$). When the codeword related to the cover quantization index belongs to a subcodebook whose label differs from the $k$-ary digit $m$ to be embedded, this index should be substituted with that of the closest codeword in the corresponding subcodebook $m$.
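The NID labelling and substitution rule described above can be sketched as follows. This is an illustrative reading of the description, not the reference implementation of [16]: positions are 0-based here, so the label of position $j$ is $j \bmod k$, which equals $(i - 1) \bmod k$ for the 1-based index $i = j + 1$. The function names and the toy codebook are hypothetical.

```python
import math


def nid_labels(num_codewords, k):
    """Label of each codeword by its 0-based position j: j mod k."""
    return [j % k for j in range(num_codewords)]


def nid_embed(cover_index, digit, codebook, k):
    """Index of the codeword that carries the k-ary digit for this cover index."""
    if cover_index % k == digit:
        return cover_index          # cover codeword already has the right label
    cover = codebook[cover_index]
    # Candidates: all codewords of the subcodebook labelled with the digit.
    candidates = [j for j in range(len(codebook)) if j % k == digit]
    # Pick the candidate closest to the cover codeword (Euclidean distance).
    return min(
        candidates,
        key=lambda j: math.sqrt(sum((a - b) ** 2 for a, b in zip(cover, codebook[j]))),
    )


codebook = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
print(nid_labels(len(codebook), 3))     # [0, 1, 2, 0, 1, 2]
print(nid_embed(4, 0, codebook, 3))     # 3
print(nid_embed(4, 1, codebook, 3))     # 4
```

Extraction under this scheme only needs the label of the received index, that is, the received position modulo $k$.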

NID-based steganography is an information hiding method based on neighbor-index codebook partition, whose embedding capacity may be controlled by the number of subcodebooks $k$. However, as illustrated in [16], only about 34% of the pairs of neighbor-index codewords happen to be pairs of neighbor-vertex codewords, and the mean distance between neighbor-index codewords is clearly larger than that between neighbor-vertex codewords. Therefore, the distortion induced by NID-based steganography may be relatively large, which is confirmed by the experimental results provided in Section 4.

3. Proposed Method

The diagram of the proposed method is shown in Figure 1. Based on the DN partition of the codebooks described in Section 2.1, a secret message can be embedded into an AMR-WB speech file. After the stego AMR-WB speech file is received, the embedded secret message can be extracted without errors. At the same time, the decoded speech is obtained without perceptible distortion. In the following, the diameter-neighbor (DN) codebook partition algorithm is first introduced. Then, the embedding and extraction procedures of our proposed method are described.

3.1. Codebook Partition. A codebook may be viewed as a list of isolated code vectors (i.e., codewords) in a multidimensional space. A codebook partition algorithm for audio steganography divides the codebook into several clusters, in each of which the codewords can be replaced with each other without causing perceptible distortion.

Let $B$ denote the original codebook with $N_b$ codewords, and let $C$ denote a cluster with $N_c$ codewords $W_t$ ($t = 1, 2, \ldots, N_c$). The centroid $G$ of a cluster $C$ is defined as follows:

$$G(i) = \frac{1}{N_c} \sum_{t=1}^{N_c} W_t(i), \tag{1}$$

where $G(i)$ and $W_t(i)$ are the $i$th components of $G$ and $W_t$, respectively.

The centroid $G$ (the average code vector) is used to represent the corresponding cluster $C$; hence, the cluster $C$ may also be considered as a vector in the multidimensional codebook space. To describe the similarity between two clusters $C_1$ and $C_2$, the Euclidean distance between them is defined as follows:

$$D(C_1, C_2) = \sqrt{\sum_{i=1}^{n} \left( G_1(i) - G_2(i) \right)^2}, \tag{2}$$

where $G_1$ and $G_2$ are the geometric center points (centroids) of the two clusters $C_1$ and $C_2$, $n$ is the dimension of a codeword, and $G_1(i)$ and $G_2(i)$ are the $i$th components of $G_1$ and $G_2$, respectively.
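Equations (1) and (2) translate directly into code. The following Python sketch is only an illustration: a cluster is modelled as a list of codewords, each codeword being a tuple of $n$ floats, and the function names are hypothetical.

```python
import math


def centroid(cluster):
    """Eq. (1): component-wise mean of the codewords in a cluster."""
    n_c = len(cluster)
    dim = len(cluster[0])
    return [sum(w[i] for w in cluster) / n_c for i in range(dim)]


def cluster_distance(c1, c2):
    """Eq. (2): Euclidean distance between the centroids of two clusters."""
    g1, g2 = centroid(c1), centroid(c2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))


c1 = [(0.0, 0.0), (2.0, 0.0)]       # centroid (1, 0)
c2 = [(4.0, 4.0)]                   # centroid (4, 4)
print(centroid(c1))                 # [1.0, 0.0]
print(cluster_distance(c1, c2))     # 5.0
```

Because a cluster is represented by its centroid alone, the distance between two clusters does not depend on how widely the codewords are spread inside each cluster.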

Let $S$ denote a cluster set. The diameter of $S$ is defined as the maximal Euclidean distance $D_m$ over all cluster pairs in the cluster set $S$, that is,

$$D(C_p, C_q) \le D_m, \quad \forall p, q = 1, 2, \ldots, |S|, \tag{3}$$

where $|S|$ is the number of clusters within the cluster set $S$. The cluster pair with the maximal Euclidean distance $D_m$, called the diameter cluster pair, is denoted by $(C_{d1}, C_{d2})$. The neighbor of a cluster $C$ in $S$ is represented by $N(C, S)$; it is the cluster in $S$, other than $C$ itself, that is closest to $C$, so that

$$D(C, N(C, S)) \le D(C, C_p), \quad \forall p = 1, 2, \ldots, |S| \text{ with } C_p \ne C. \tag{4}$$

Figure 2: Diagram of our proposed codebook partition. The flowchart takes the codebook $B$ as input and initializes a cluster set $S$, in which each codeword is an independent cluster, and an empty cluster set $S'$. While $N_i > 0$: if $S$ is empty, the clusters in $S'$ are put into $S$ so that $S'$ becomes empty; the diameter cluster pair $(C_{d1}, C_{d2})$ is searched for in $S$; $C_{d1}$ and $C_{d2}$ are merged with their respective neighbors into two new clusters Temp1 and Temp2; $C_{d1}$, $C_{d2}$, and their neighbors are removed from $S$, and Temp1 and Temp2 are put into $S'$; then $N_i = N_i - 1$. Finally, the remaining clusters in $S'$ are put into $S$, which is the output cluster set.
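The two search operations defined by the diameter and neighbor relations can be sketched as follows. The sketch is hedged: clusters are lists of codewords (tuples of floats), results are returned as positions within the list `S`, and the function names are hypothetical. The neighbor search excludes the cluster itself, since otherwise every cluster would be its own neighbor at distance zero.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Component-wise mean of the codewords in a cluster."""
    dim = len(cluster[0])
    return [sum(w[i] for w in cluster) / len(cluster) for i in range(dim)]


def distance(c1, c2):
    """Euclidean distance between the centroids of two clusters."""
    g1, g2 = centroid(c1), centroid(c2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))


def diameter_pair(S):
    """Positions (p, q) of the two clusters in S that are farthest apart."""
    return max(combinations(range(len(S)), 2),
               key=lambda pq: distance(S[pq[0]], S[pq[1]]))


def neighbor(p, S):
    """Position of the cluster in S closest to S[p], excluding S[p] itself."""
    return min((q for q in range(len(S)) if q != p),
               key=lambda q: distance(S[p], S[q]))


# Four singleton clusters on a line.
S = [[(0.0,)], [(1.0,)], [(5.0,)], [(10.0,)]]
print(diameter_pair(S))   # (0, 3)
print(neighbor(0, S))     # 1
print(neighbor(3, S))     # 2
```

The exhaustive pair search costs $O(|S|^2)$ distance evaluations per call, which is acceptable for codebooks of a few hundred codewords because the partition is computed once, offline.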

Figure 2 illustrates the diagram of the proposed DNcodebook partition algorithm And its detailed procedureis given in Algorithm 1 The original codebook 119861 will bedivided into |119878| clusters by iteratively merging the diametercluster pair with their respective neighbors An iterationparameter119873119894 is applied to obtain flexible embedding capacitythrough controlling the merging procedure The relationshipbetween119873119894 and the embedding capacity will be discussed inSection 43

Figure 3 is provided as an example to illustrate the proposed codebook partition algorithm. A white circle denotes a codeword. A shaded oval denotes a codeword and its neighbor in $S$ being processed, while an unshaded oval represents a cluster in $S'$ that has been formed. The "0", "1", "00", "01", "10", or "11" in a circle is the label of a codeword in the cluster. A cross "×" marks the centroid of the cluster it belongs to, and a line represents the diameter of a cluster set. The first to third merging iterations are shown in Figures 3(a)–3(c), respectively. The fourth merging iteration comprises Figures 3(d) and 3(e), and Figure 3(f) demonstrates the labelling of the codewords.

3.2. Embedding Procedure. In our proposed method, the ISF indices corresponding to the codewords in the codebook are first obtained by parsing the host AMR-WB speech. Then the ISF indices are employed to embed the secret message based on the codebook partition. Generally, the codewords in the same cluster as the codeword referred to by $I_a$ are considered to be replaceable with each other. According to the secret message to be embedded, $I_a$ may be substituted by the index of one of the other codewords within the same cluster. The number of secret message bits that can be embedded depends on the size of the specific cluster. The embedding procedure is as follows.

Step 1. Search the cluster set $S$ for the cluster $C$ that contains the codeword referred to by the ISF index $I_a$.

Step 2. If there are $N$ codewords in $C$, the number of secret bits that can be embedded into $I_a$ is calculated as $n = \lfloor \log_2 N \rfloor$.

Step 3. Read $n$ not-yet-embedded bits, denoted by $m$, from the secret message. $I_a$ is replaced with $I_b$, which indexes the codeword with the same label as $m$.

Step 4. Repeat Steps 1–3 until all the secret bits are embedded.


Figure 3: An example of our proposed codebook partition. Panels: (a) 1st iteration ($N_i = 4$); (b) 2nd iteration ($N_i = 3$); (c) 3rd iteration ($N_i = 2$); (d) 4th iteration ($S = S'$; $S'$.clear()); (e) 4th iteration ($N_i = 1$); (f) labelling.

Input: Codebook $B$, iterative parameter $N_i$
Output: Cluster set $S$
/* $S'$ is a helper cluster set */
$S'$.clear()
$S$.clear()
/* Each codeword is taken as an initial cluster; $N_c$ is the number of codewords */
for $i = 0$; $i < N_c$; ++$i$ do
    $S$.push($C_i$)
end
/* Iterative merging */
while $N_i > 0$ do
    if $S$ is empty then
        $S = S'$
        $S'$.clear()
    end
    $(C_{d1}, C_{d2}) = \arg\max_{i, j \in \{1, 2, \ldots, |S|\}} D(C_i, C_j)$
    $Temp_1 = C_{d1} \cup N(C_{d1}, S)$
    $Temp_2 = C_{d2} \cup N(C_{d2}, S)$
    $S'$.push($Temp_1$)
    $S'$.push($Temp_2$)
    $S$.remove($C_{d1}$)
    $S$.remove($C_{d2}$)
    $S$.remove($N(C_{d1}, S)$)
    $S$.remove($N(C_{d2}, S)$)
    $N_i = N_i - 1$
end
/* Put the remaining clusters in $S'$ into $S$ */
for iter = $S'$.begin(); iter < $S'$.end(); ++iter do
    $S$.push(*iter)
end
return $S$

Algorithm 1: DN-based codebook partition algorithm.
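A minimal Python sketch of the merging loop in Algorithm 1 is given below; it is an illustration under stated assumptions, not the authors' implementation. Clusters are represented as tuples of codeword indices. Two details that the pseudocode leaves open are resolved here by assumption: the neighbor of $C_{d2}$ is searched only after $C_{d1}$ and its neighbor have been removed (so the four merged clusters are always distinct), and merging stops early if fewer than four clusters remain in $S$. The grid-shaped codebook and the name `dn_partition` are made up for the demonstration.

```python
import math
from itertools import combinations


def centroid(cluster, codebook):
    """Geometric center of a cluster given as a tuple of codeword indices."""
    dim = len(codebook[0])
    return [sum(codebook[k][i] for k in cluster) / len(cluster) for i in range(dim)]


def distance(c1, c2, codebook):
    """Euclidean distance between the centers of two clusters."""
    return math.dist(centroid(c1, codebook), centroid(c2, codebook))


def nearest(cluster, pool, codebook):
    """Neighbor of `cluster` among the clusters still in `pool`."""
    return min(pool, key=lambda other: distance(cluster, other, codebook))


def dn_partition(codebook, n_iter):
    """Diameter-neighbor partition: returns a list of clusters (tuples of indices)."""
    s = [(k,) for k in range(len(codebook))]  # each codeword starts as its own cluster
    s_helper = []                             # S' in Algorithm 1
    while n_iter > 0:
        if not s:                             # one pass finished: reuse merged clusters
            s, s_helper = s_helper, []
        if len(s) < 4:                        # assumption: a full merge step needs 4 clusters
            break
        cd1, cd2 = max(combinations(s, 2),
                       key=lambda pair: distance(pair[0], pair[1], codebook))
        s.remove(cd1)
        s.remove(cd2)
        n1 = nearest(cd1, s, codebook)
        s.remove(n1)
        n2 = nearest(cd2, s, codebook)        # assumption: searched after n1 is removed
        s.remove(n2)
        s_helper.append(cd1 + n1)
        s_helper.append(cd2 + n2)
        n_iter -= 1
    return s + s_helper                       # unmerged clusters plus merged ones


# Toy 128-entry, 2-D codebook on a 16 x 8 grid (made up for illustration).
codebook = [(float(k % 16), float(k // 16)) for k in range(128)]
clusters = dn_partition(codebook, 48)
print(len(clusters), {len(c) for c in clusters})  # 32 {4}
```

With 128 codewords, the first 32 iterations pair up every codeword (64 clusters of size 2) and the next 16 iterations pair up those clusters, so 48 iterations leave 32 clusters of size 4, each able to carry 2 bits.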

Figure 4: Embedding two bits into one cover ISF index.

Figure 4 is an example of embedding two secret bits into one cover ISF index. Let us assume that the cluster set $S$ contains two clusters and that the codeword indexed by $I_x$ is $W_x$; for example, $I_b$ indexes the codeword $W_b$. Hence, the ISF index $I_a$ shown in Figure 4 will be replaced with $I_b$, which indexes the codeword $W_b$ with the same label as the secret bits "01".

3.3. Extracting Procedure. When the stego AMR-WB speech is transferred to the intended receiver, the stego indices may be obtained by parsing the AMR-WB speech stream and used to extract the embedded secret message. The message extraction procedure for the stego index $I_b$ is given below.

Step 1. Search the cluster set $S$, which is the same as that employed in the embedding procedure, for the cluster $C$ that contains the codeword $W_b$ referred to by the ISF index $I_b$.

Step 2. If there are $N$ codewords in total in $C$, the number of secret bits carried by $I_b$ is computed as $n = \lfloor \log_2 N \rfloor$.

Step 3. Read the label of $W_b$ as the extracted $n$ bits, which are appended to the secret message bit sequence.

Step 4. Repeat Steps 1–3 until all the secret bits are recovered.

Figure 5: Extracting two bits from one stego ISF index.

Figure 5 is the corresponding example of extracting two secret bits from the stego index $I_b$ generated by the previous embedding instance shown in Figure 4. It can be easily seen that the extracted secret bits are identical to the embedded secret bits.
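The embedding and extraction steps can be sketched together as a round trip in Python. This is only an illustration: the paper does not spell out how labels are assigned inside a cluster, so the sketch assumes that a codeword's label is its position within the cluster written with $n$ bits. The index values, the function names, and the zero-padding of a final partial chunk are likewise assumptions.

```python
def capacity(cluster):
    """Bits one index in this cluster can carry: floor(log2 N)."""
    return len(cluster).bit_length() - 1


def locate(index, clusters):
    """Step 1: find the cluster holding `index` and the index's position in it."""
    for cluster in clusters:
        if index in cluster:
            return cluster, cluster.index(index)
    raise ValueError(f"index {index} is not in any cluster")


def embed(cover_indices, bits, clusters):
    """Replace each cover index by the same-cluster index labelled with the next bits."""
    stego, used = [], 0
    for index in cover_indices:
        cluster, _ = locate(index, clusters)
        n = capacity(cluster)
        if n == 0 or used >= len(bits):          # nothing to hide in this index
            stego.append(index)
            continue
        chunk = bits[used:used + n].ljust(n, "0")  # assumption: pad the last chunk
        stego.append(cluster[int(chunk, 2)])       # label = position in the cluster
        used += n
    return stego, min(used, len(bits))


def extract(stego_indices, n_bits, clusters):
    """Read back the labels of the stego indices until n_bits are recovered."""
    out = ""
    for index in stego_indices:
        if len(out) >= n_bits:
            break
        cluster, position = locate(index, clusters)
        n = capacity(cluster)
        if n:
            out += format(position, f"0{n}b")
    return out[:n_bits]


# Hypothetical cluster set: one cluster of four indices, one cluster of two.
clusters = [(10, 11, 12, 13), (20, 21)]
cover = [10, 20, 13, 21]
secret = "011100"

stego, embedded = embed(cover, secret, clusters)
print(stego, embedded)                      # [11, 21, 12, 20] 6
print(extract(stego, embedded, clusters))   # 011100
```

The first cover index mirrors Figures 4 and 5: an index from the four-codeword cluster carries the two bits "01" and is replaced by the index in position 1 of that cluster, from which the receiver reads "01" back.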

4. Experimental Results and Analysis

In order to demonstrate the performance of the proposed method, the perceptual quality of the stego AMR-WB speech with the secret message embedded using our method is computed and compared with that of the stego AMR-WB speech generated with CNV and NID steganography. Moreover, the flexibility of embedding capacity and the security regarding statistical detection are analyzed in detail.

4.1. Audio Database. The TIMIT acoustic-phonetic continuous speech corpus (https://catalog.ldc.upenn.edu/ldc93s1) is an audio database that contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. All audio sentences are sampled at 16 kHz. In our experiments, 1000 audio sentences are randomly chosen from the TIMIT database. The average, maximum, and minimum lengths of the chosen audio sentences are 3.47 s, 3.96 s, and 3.12 s, respectively. All audio files are converted into AMR-WB format using the standard codec.

4.2. Speech Quality Evaluation. The perceptual evaluation of speech quality (PESQ) described in ITU-T Recommendation P.862 [20] may be employed to evaluate speech quality. Moreover, according to ITU-T P.862.2 [21], the raw PESQ score can be converted to the mean opinion score-listening quality objective (MOS-LQO), which is more suitable for evaluating wideband speech. Hence, MOS-LQO is applied in our experiments. The normal range of the MOS-LQO score is 1.017 to 4.549. The higher the score, the better the quality.

Figure 6 shows the MOS-LQO scores of the 1000 cover AMR-WB speeches in 23.85 kbit/s mode and the corresponding stego AMR-WB speeches using three different codebook partition algorithms. Three progressive embedding rates, that is, 100 bps, 200 bps, and 300 bps, are employed in our experiments. The indices of speech samples are sorted according to the MOS-LQO scores of our proposed method. It can be seen from Figure 6 that the overall scores of the stego AMR-WB speeches generated with our method are higher than those of the NID-based stego AMR-WB speeches, especially when the embedding rates are 200 bps and 300 bps. The MOS-LQO scores of the CNV-based stego AMR-WB speeches are only slightly higher than ours when the embedding rate is 100 bps, which means there are no obvious discrepancies in speech quality between them. Besides, when a high embedding rate, that is, 200 bps or 300 bps, is used, the decrease in MOS-LQO scores of our stego AMR-WB speeches is significantly smaller than that of NID-based steganography.

Figure 6: Comparisons of MOS-LQO values for 1000 samples between the standard AMR-WB codec, CNV-based steganography, NID-based steganography, and the proposed DN-based steganography. Panels (x-axis: sample index; y-axis: MOS-LQO score): (a) the embedding rate is 100 bps; (b) the embedding rate is 200 bps; (c) the embedding rate is 300 bps.

Moreover, the average MOS-LQO scores of the cover AMR-WB speeches and the stego AMR-WB speeches with three different codebook partition algorithms, that is, CNV, NID, and DN, in four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) and at three embedding rates (100 bps, 200 bps, and 300 bps) are given in Table 1. For the embedding rates of 200 bps and 300 bps, only the MOS-LQO scores of the NID-based and DN-based steganographic methods are given, because the embedding capacity of CNV-based steganography may not be larger than 100 bps.

Table 1: MOS-LQO scores of the standard codec, CNV-based, NID-based, and our proposed steganography in four different rate modes and three embedding rates. Percentages in parentheses are changes relative to the standard codec.

| Embedding rate | Method | 12.65 kbit/s | 15.85 kbit/s | 19.85 kbit/s | 23.85 kbit/s |
|---|---|---|---|---|---|
| - | Standard | 2.929 | 3.073 | 3.199 | 3.269 |
| 100 bps | CNV | 2.871 (-2.0%) | 3.021 (-1.7%) | 3.153 (-1.4%) | 3.225 (-1.3%) |
| 100 bps | NID | 2.750 (-6.1%) | 2.895 (-5.8%) | 3.020 (-5.6%) | 3.091 (-5.4%) |
| 100 bps | Ours | 2.864 (-2.2%) | 3.010 (-2.0%) | 3.139 (-1.9%) | 3.216 (-1.6%) |
| 200 bps | CNV | - | - | - | - |
| 200 bps | NID | 2.601 (-11.2%) | 2.736 (-11.0%) | 2.875 (-10.7%) | 2.921 (-10.6%) |
| 200 bps | Ours | 2.807 (-4.2%) | 2.955 (-3.8%) | 3.084 (-3.6%) | 3.164 (-3.2%) |
| 300 bps | CNV | - | - | - | - |
| 300 bps | NID | 2.284 (-22.0%) | 2.386 (-22.3%) | 2.475 (-22.6%) | 2.533 (-22.5%) |
| 300 bps | Ours | 2.699 (-7.9%) | 2.841 (-7.5%) | 2.971 (-7.1%) | 3.046 (-6.8%) |

When the embedding rate is 100 bps, which is almost the limit of CNV steganography, we can see from Table 1 that the mean MOS-LQO scores of our proposed method are only about 0.3% worse than those of CNV-based steganography. Such a slight decrease may be almost imperceptible to the human auditory system (HAS). There are significant increases of approximately 3.8% in the mean MOS-LQO scores when our method is compared with NID-based steganography. It can also be observed that, when the embedding rates are 200 bps and 300 bps, the scores of our approach are improved by about 7% and 15%, respectively, in contrast to those of NID-based steganography.

Furthermore, we can also see that the experimental results of the four rate modes are analogous. The decrease in speech quality caused by NID-based steganography is more than twice that caused by DN-based steganography. The proposed method can obtain twice the embedding capacity of CNV-based steganography by sacrificing less than 2% of speech quality on average. In addition, the decline in speech quality observed when a 300 bps embedding rate is used in the proposed DN-based method is smaller than that observed when 200 bps is employed in the NID-based method.

4.3. Flexible Embedding Capacity. Compared to CNV-based steganography, our proposed method offers flexible embedding capacity to satisfy different practical demands. The steganographic capacity can be adjusted by changing the iteration parameter $N_i$. For different values of $N_i$, for example, $N_i = 32, 33, \ldots, 54$, the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.

From Figure 7, we can observe that the embedding rate increases significantly with $N_i$ while the MOS-LQO score goes down only slightly. For NID-based steganography, however, the MOS-LQO score declines rapidly as the embedding rate increases. Therefore, the proposed DN-based steganography can achieve higher embedding capacity with only a slight decrease in speech quality. For example, when $N_i = 48$, the size of each cluster in $S$ is equal to 4 and we can embed 4 bits per frame, that is, the embedding rate is 200 bps, whereas the CNV algorithm can embed at most 2 bits per frame (100 bps).
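The arithmetic behind these figures can be sketched as follows. The sketch assumes AMR-WB's 20 ms frames (50 frames per second), two ISF codebook indices carrying data in every frame (the second and third codebooks of the second stage, as noted in Section 4.4), and clusters of uniform size; the function name and its parameters are illustrative only.

```python
FRAMES_PER_SECOND = 50  # AMR-WB encodes one frame every 20 ms


def embedding_rate_bps(cluster_size, codebooks_used=2):
    """Embedding rate when every cluster has `cluster_size` codewords."""
    bits_per_index = cluster_size.bit_length() - 1  # floor(log2 N)
    bits_per_frame = codebooks_used * bits_per_index
    return FRAMES_PER_SECOND * bits_per_frame


print(embedding_rate_bps(2))  # 100 (1 bit per index, 2 bits per frame)
print(embedding_rate_bps(4))  # 200 (2 bits per index, 4 bits per frame)
```

When the merging stops partway through a pass, the cluster set contains clusters of different sizes, so the average rate falls between these values.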

4.4. Resistibility of Statistical Steganalysis. Speech steganography aims to hide a secret message in cover speech without arousing suspicion. It is very important for a steganographic method to resist statistical steganalysis, which is the technique of detecting the presence of a hidden message. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18] the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of a hidden message in given audio.

Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography. Panels (vertical axes: embedding rate (bps) and MOS-LQO score): (a) our proposed steganography (x-axis: times of cluster merging); (b) NID-based steganography (x-axis: number of sub-codebooks).

Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in 23.85 kbit/s mode. The value in parentheses in each column header is the training rate.

| Embedding rate | Method | Markov (0.4) | MFCC (0.4) | SS-QCCN (0.4) | RS-QCCN (0.4) | Markov (0.5) | MFCC (0.5) | SS-QCCN (0.5) | RS-QCCN (0.5) | Markov (0.6) | MFCC (0.6) | SS-QCCN (0.6) | RS-QCCN (0.6) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 bps | CNV | 49.8 | 49.8 | 43.7 | 49.0 | 50.1 | 50.2 | 44.0 | 49.2 | 50.0 | 50.5 | 41.9 | 50.0 |
| 100 bps | NID | 51.0 | 60.1 | 42.2 | 50.0 | 50.1 | 60.9 | 42.9 | 48.7 | 52.1 | 59.8 | 41.8 | 49.4 |
| 100 bps | Ours | 50.0 | 50.0 | 44.0 | 49.4 | 50.3 | 49.3 | 40.3 | 49.4 | 49.1 | 48.6 | 41.8 | 43.3 |
| 200 bps | CNV | - | - | - | - | - | - | - | - | - | - | - | - |
| 200 bps | NID | 53.5 | 74.5 | 46.9 | 50.0 | 53.3 | 76.2 | 47.6 | 50.0 | 53.6 | 75.8 | 44.4 | 50.1 |
| 200 bps | Ours | 51.0 | 48.3 | 45.2 | 50.0 | 49.8 | 48.7 | 42.2 | 50.0 | 50.5 | 48.6 | 45.0 | 50.0 |
| 300 bps | CNV | - | - | - | - | - | - | - | - | - | - | - | - |
| 300 bps | NID | 54.8 | 74.6 | 49.3 | 50.0 | 56.3 | 77.2 | 50.0 | 50.0 | 55.4 | 78.3 | 50.5 | 50.6 |
| 300 bps | Ours | 52.4 | 49.7 | 47.9 | 50.0 | 52.8 | 60.9 | 48.2 | 50.0 | 53.8 | 50.1 | 46.6 | 50.0 |

In our experiments, the sentences chosen from the TIMIT database, as stated in Section 4.1, are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. Then a secret message is embedded into each cover AMR-WB speech at different embedding rates, that is, 100 bps, 200 bps, and 300 bps, by CNV-based, NID-based, and DN-based steganography. The 200 bps and 300 bps rates are omitted for CNV-based steganography because of its limited embedding capacity. Seven stego speech sets are thus generated: one set for the CNV-based steganographic method and three sets each for NID-based and DN-based steganography. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.

In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, that is, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stego speech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and a radial basis function (RBF) kernel and the grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set be negatives and those in the stego speech set be positives. Hence, the accuracy may be defined as follows:

$$\text{Accuracy} = \frac{1}{2} \times \left( \frac{TP}{TP + FN} + \frac{TN}{FP + TN} \right), \tag{5}$$

where TP are true positives, TN are true negatives, FN are false negatives, and FP are false positives.
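As a small worked example of Eq. (5), the following Python sketch averages the true positive rate and the true negative rate. The function name and the confusion-matrix counts are made up for illustration and are not taken from the experiments.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Eq. (5): mean of the true positive rate and the true negative rate."""
    true_positive_rate = tp / (tp + fn)   # stego samples correctly flagged
    true_negative_rate = tn / (fp + tn)   # cover samples correctly passed
    return 0.5 * (true_positive_rate + true_negative_rate)


# Hypothetical test set of 600 stego and 600 cover samples.
accuracy = balanced_accuracy(tp=300, fn=300, tn=330, fp=270)
print(round(accuracy, 3))  # 0.525
```

A detector that guesses at random scores about 0.5 under this measure, which is why accuracies near 50% in Table 2 indicate that the steganalyzer cannot tell stego speech from cover speech.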

The steganalytic results are given in Table 2. It can be seen that, when the embedding rate is 100 bps, the accuracy of detecting the CNV-based and DN-based methods is almost the same, about 50%, while that of detecting NID-based steganography increases to 60% when the MFCC-based steganalytic method is applied. Moreover, there is an apparent increase in the accuracy of detecting the NID-based hiding method as the embedding rate increases to 200 bps or 300 bps when Liu et al.'s methods (i.e., the Markov and MFCC-based steganalytic methods) are applied. But the accuracy of steganalyzing our proposed DN-based steganography stays at the same level of about 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even with higher embedding rates.

Figure 8: The correlation index of 1000 AMR-WB speeches, where the interframe edge "$ii$" connects two vertices $V_i[k]$ and $V_i[k + 1]$ in two neighboring frames and the intraframe edge "$ij'$" connects two vertices $V_i[k]$ and $V_j[k]$ in the same frame (x-axis: edge; y-axis: correlation index).

According to the definition of the correlation index given in [18], the experimental results of the correlation indices of 1000 AMR-WB speeches, which are randomly selected from TIMIT, are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely, SS-QCCN and RS-QCCN, can be constructed as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography. The steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN is no better than about 50% for all of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that merely the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while none of them is utilized in Li et al.'s steganalytic method except for the edge "33" in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing the edges "22", "33", and "23′") targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis, according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].

In order to visualize the detection performance, some receiver operating characteristic (ROC) curves of steganalyzing CNV-based steganography with a 100 bps embedding rate and NID-based and DN-based steganography with 100 bps, 200 bps, and 300 bps embedding rates are provided in Figure 10 (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of the embedding capacity). It shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of a hidden message embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.

Figure 9: Two AMR-WB strong correlation network models: (a) SS-QCCN; (b) RS-QCCN.

5. Conclusion

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may therefore be a good candidate for the cover medium in speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed. The experimental results demonstrate the effectiveness of our proposed method. The main contributions of this paper are as follows:

(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality and better performance against statistical steganalysis than the NID-based method.

(2) Flexible embedding capacity may be easily achieved with different numbers of cluster merging iterations. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.

Figure 10: ROC curves for steganalysis of CNV-based, NID-based, and our proposed steganography (50% training rate). Panels (x-axis: false positive rate; y-axis: true positive rate): (a) Markov (TIMIT, 100 bps); (b) MFCC (TIMIT, 100 bps); (c) Markov (TIMIT, 200 bps); (d) MFCC (TIMIT, 200 bps); (e) Markov (TIMIT, 300 bps); (f) MFCC (TIMIT, 300 bps).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.

References

[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.

[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.

[3] K. Gopalan, "Audio steganography using bit modification," in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.

[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, "Audio steganography by amplitude or phase modification," in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.

[5] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.

[6] L. Liu, M. Li, Q. Li, and Y. Liang, "Perceptually transparent information hiding in G.729 bitstream," in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.

[7] T. Xu and Z. Yang, "Simple and effective speech steganography in G.723.1 low-rate codes," in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.

[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, "MELPe coded speech hiding on enhanced full rate compressed domain," in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.

[9] A. Nishimura, "Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec," in Proceedings of the IIH-MSP 2009: 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.

[10] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.

[11] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[12] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.

[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.

[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, "A new scheme for covert communication via 3G encoded speech," Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.

[15] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.

[16] J. Liu, H. Tian, J. Lu, and Y. Chen, "Neighbor-index-division steganography based on QIM method for G.723.1 speech streams," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.

[17] Q. Liu, A. H. Sung, and M. Qiao, "Derivative-based audio steganalysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.

[18] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.

[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.

[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.

[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.

[22] C. Chang and C. Lin, "LIBSVM: a Library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.


Page 4: A Novel AMR-WB Speech Steganography Based on Diameter ...In this section, a technical overview of AMR-WB codec is rstly presented. en two related codebook partition algorithms CNV

4 Security and Communication Networks

Codebook B

Initialize a cluster set Sby taking each codewordas a independent clusterand a empty cluster set S

Cluster set S

Put the remainingclusters in S into S

No

Yes

Put the clustersin S into S tomake S empty

S is empty

No

Yes

Search for the diametercluster pair(Cd1 Cd2) in S

Remove Cd1 Cd2 and theirneighbors from S put

Merge Cd1 Cd2 withtheir neighbors respectivelyinto two new clusters

Ni gt 0

Ni = Ni minus 1

into STemp1 and Temp2Temp1 and Temp2

Figure 2 Diagram of our proposed codebook partition

where |119878| is the number of clusters within the cluster set119878 The cluster pair with maximal Euclidean distance 119863119898called diameter cluster pair is denoted by (1198621198891 1198621198892) And theneighbor of a cluster119862 in 119878 is represented by119873(119862 119878) then wehave

119863 (119862119873 (119862 119878)) le 119863 (119862 119862119901) forall119901 = 1 2 |119878| (4)

Figure 2 illustrates the diagram of the proposed DNcodebook partition algorithm And its detailed procedureis given in Algorithm 1 The original codebook 119861 will bedivided into |119878| clusters by iteratively merging the diametercluster pair with their respective neighbors An iterationparameter119873119894 is applied to obtain flexible embedding capacitythrough controlling the merging procedure The relationshipbetween119873119894 and the embedding capacity will be discussed inSection 43

Figure 3 is provided as an example to illustrate theproposed codebook partition algorithmThe white circle ldquoIrdquodenotes a codeword And the oval ldquordquo with shadow denotesa codeword and its neighbor in 119878 being processed whilethe oval ldquordquo without shadow represents a cluster in 1198781015840 thathas been formed The ldquo0rdquo ldquo1rdquo ldquo00rdquo ldquo01rdquo ldquo10rdquo or ldquo11rdquo ina circle ldquoIrdquo is the label of a codeword in the cluster Thecross ldquotimesrdquo means the centroid of the cluster it belongs to anda line ldquominusrdquo represents the diameter of a cluster set The firstto third merging iterations are shown in Figures 3(a)ndash3(c)respectively The fourth merging iteration is comprised of

Figures 3(d) and 3(e) and Figure 3(f) demonstrates thelabelling of the codewords

32 Embedding Procedure In our proposed method the ISFindices corresponding to the codewords in the codebook arefirst obtained by parsing the host AMR-WB speechThen theISF indices are employed to embed secret message based oncodebook partition Generally the codewords in the samecluster as the codeword referred by 119868119886 lies in are consideredto be replaceable with each other According to the secretmessage to be embedded 119868119886 may be substituted by one of theother codewordsrsquo indiceswithin the same clusterThenumberof secret message bits that can be embedded depends on thesize of the specific cluster The embedding procedures aregiven in the following

Step 1 Search cluster set 119878 for the cluster 119862 which containsthe codeword referred by the ISF index 119868119886Step 2 If there are 119873 codewords in 119862 the number of secretbits that can be embedded into 119868119886 is calculated as 119899 = lfloorlog2119873rfloorStep 3 Read 119899 not-yet-embedded bits denoted by 119898 fromthe secret message 119868119886 is replaced with 119868119887 which indexes thecodeword with the same label as119898

Step 4 Repeat Steps 1ndash3 until all the secret bits are embedded

Security and Communication Networks 5

(a) 1st iteration (119873119894 = 4) (b) 2nd iteration (119873119894 = 3) (c) 3rd iteration (119873119894 = 2)

(d) 4th iteration (119878 = 1198781015840 1198781015840clear()) (e) 4th iteration (119873119894 = 1)

0

0

1

1

11

11

00

0001 01

1010

(f) Labelling

Figure 3 An example of our proposed codebook partition

Input: Codebook B, iterative parameter N_i
Output: Cluster set S
/* S′ is a helper cluster set */
S′.clear()
S.clear()
/* Each codeword is taken as an initial cluster */
for i = 0; i < N_c; ++i do
    S.push(C_i)
end
/* Iterative merging */
while N_i > 0 do
    if S is empty then
        S = S′
        S′.clear()
    end
    (C_d1, C_d2) = argmax_{i,j ∈ {1, 2, ..., |S|}} D(C_i, C_j)
    Temp1 = C_d1 ∪ N(C_d1, S)
    Temp2 = C_d2 ∪ N(C_d2, S)
    S′.push(Temp1)
    S′.push(Temp2)
    S.remove(C_d1)
    S.remove(C_d2)
    S.remove(N(C_d1, S))
    S.remove(N(C_d2, S))
    N_i = N_i − 1
end
/* Put the remaining clusters in S′ into S */
for iter = S′.begin(); iter < S′.end(); ++iter do
    S.push(*iter)
end
return S

Algorithm 1: DN-based codebook partition algorithm.
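The merging loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: D(C_i, C_j) is taken to be the Euclidean distance between cluster centroids, and N(C, S) is taken to be the single not-yet-selected cluster in S nearest to C. The function names, the exclusion of already selected clusters from the neighbor search, and the handling of fewer than four remaining clusters are hypothetical choices made only to keep the sketch well defined.

```python
import math
from itertools import combinations


def centroid(cluster, codebook):
    """Mean vector of the codewords whose indices are listed in `cluster`."""
    dim = len(codebook[0])
    return [sum(codebook[i][d] for i in cluster) / len(cluster) for d in range(dim)]


def dn_partition(codebook, n_iter):
    """Return a list of clusters (lists of codeword indices) after n_iter merges."""
    s = [[i] for i in range(len(codebook))]  # every codeword starts as a cluster
    s_prime = []                             # helper set S' of merged clusters
    while n_iter > 0:
        if len(s) < 4:
            # Assumption: leftovers are carried into the next pass together with S'.
            s, s_prime = s_prime + s, []
            if len(s) < 4:
                break
        cents = [centroid(c, codebook) for c in s]

        def dist(a, b):
            return math.dist(cents[a], cents[b])

        # Diameter: the two clusters of S that are farthest apart.
        d1, d2 = max(combinations(range(len(s)), 2), key=lambda p: dist(*p))

        def nearest(src, taken):
            return min((k for k in range(len(s)) if k not in taken),
                       key=lambda k: dist(src, k))

        n1 = nearest(d1, {d1, d2})       # neighbor of the first endpoint
        n2 = nearest(d2, {d1, d2, n1})   # neighbor of the second endpoint
        s_prime.append(s[d1] + s[n1])
        s_prime.append(s[d2] + s[n2])
        used = {d1, d2, n1, n2}
        s = [c for k, c in enumerate(s) if k not in used]
        n_iter -= 1
    return s + s_prime  # remaining clusters of S followed by those of S'


# Twelve toy 2-D codewords and four iterations, as in the Figure 3 example.
toy_codebook = [(float(i), float((i * i) % 7)) for i in range(12)]
toy_clusters = dn_partition(toy_codebook, 4)
```

With twelve codewords and four iterations the sketch ends with two clusters of two codewords and two clusters of four codewords, which matches the label lengths shown in Figure 3(f).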

Figure 4: Embedding two bits into one cover ISF index. (Diagram: the cluster set S holds two clusters; the cover ISF index I_a is searched for and replaced by the stego ISF index I_b, whose codeword W_b carries the label “01” of the secret bits.)

Figure 4 is an example of embedding two secret bits into one cover ISF index. Let us assume that the cluster set S contains two clusters and that the codeword indexed by I_x is W_x; for example, I_b indexes the codeword W_b. Hence the ISF index I_a shown in Figure 4 will be replaced with I_b, which indexes the codeword W_b with the same label as the secret bits “01”.
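Steps 1–3 of the embedding procedure can be sketched as below. This is a hedged illustration rather than the authors' code: a cluster is represented as a Python list of ISF indices, and the label of a codeword is assumed to be its position in that list written as an n-bit binary string (the actual labelling is the one produced by the codebook partition). The helper name embed_index and the example indices 10–13, 20, and 21 are hypothetical.

```python
def embed_index(clusters, cover_index, bits):
    """Embed the leading bits of `bits` (a '0'/'1' string) into one ISF index.

    Returns (stego_index, remaining_bits).
    """
    # Step 1: find the cluster that contains the cover index.
    cluster = next(c for c in clusters if cover_index in c)
    # Step 2: a cluster of N codewords carries n = floor(log2(N)) bits.
    n = len(cluster).bit_length() - 1
    if n == 0 or len(bits) < n:
        return cover_index, bits  # nothing can be embedded in this index
    # Step 3: replace the index by the one whose label equals the next n bits.
    message = bits[:n]
    return cluster[int(message, 2)], bits[n:]


# Mirrors Figure 4: I_a = 10 and I_b = 11 lie in the same four-codeword cluster.
clusters = [[10, 11, 12, 13], [20, 21]]
stego_index, rest = embed_index(clusters, 10, "01")
```

Step 4 then amounts to calling embed_index once per parsed ISF index and passing the returned remaining_bits to the next call until the string is empty.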

3.3. Extracting Procedure. When the stego AMR-WB speech is transferred to the intended receiver, the stego indices may be obtained by parsing the AMR-WB speech stream and used to extract the embedded secret message. The procedure for extracting the message from the stego index I_b is given below.

Step 1. Search the cluster set S, which is the same as that employed in the embedding procedure, for the cluster C which contains the codeword W_b referred to by the ISF index I_b.

Step 2. If there are N codewords in total in C, the number of secret bits carried by I_b is computed by n = ⌊log2 N⌋.

Step 3. Read the label of W_b as the extracted n bits, which are appended to the secret message bit sequence.

Step 4. Repeat Steps 1–3 until all the secret bits are recovered.

Figure 5: Extracting two bits from one stego ISF index. (Diagram: the stego ISF index I_b is searched for in the cluster set S, and the label “01” of its codeword W_b is read out as the extracted bits.)

Figure 5 is the corresponding example of extracting two secret bits from the stego index I_b generated by the previous embedding instance shown in Figure 4. It can easily be seen that the extracted secret bits are identical to the embedded secret bits.
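The extraction steps can be sketched in the same spirit. The sketch below again assumes that a cluster is a list of ISF indices and that a codeword's label is its list position in binary; the name extract_bits and the example indices are hypothetical, and the receiver is assumed to hold exactly the same cluster set as the sender.

```python
def extract_bits(clusters, stego_index):
    """Return the bits carried by one stego ISF index as a '0'/'1' string."""
    # Step 1: find the cluster that contains the stego index.
    cluster = next(c for c in clusters if stego_index in c)
    # Step 2: a cluster of N codewords carries n = floor(log2(N)) bits.
    n = len(cluster).bit_length() - 1
    position = cluster.index(stego_index)
    if n == 0 or position >= 2 ** n:
        return ""  # this index carries no label
    # Step 3: the label of the codeword is the extracted bit string.
    return format(position, "0{}b".format(n))


# Mirrors Figure 5: the stego index I_b = 11 yields the bits "01".
clusters = [[10, 11, 12, 13], [20, 21]]
recovered = extract_bits(clusters, 11)
```

Concatenating the strings returned for successive stego indices gives the recovered message; how the receiver learns the message length is outside the scope of this sketch.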

4. Experimental Results and Analysis

In order to demonstrate the performance of the proposed method, the perceptual quality of the stego AMR-WB speech with a secret message embedded using our method is computed and compared to that of the stego AMR-WB speech generated with CNV and NID steganography. Moreover, the flexibility of embedding capacity and the security regarding statistical detection are analyzed in detail.

4.1. Audio Database. The TIMIT acoustic-phonetic continuous speech corpus (https://catalog.ldc.upenn.edu/ldc93s1) is an audio database which contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences; all audio sentences are sampled at 16 kHz. In our experiments, 1000 audio sentences are randomly chosen from the TIMIT database. The average, maximum, and minimum lengths of the chosen audio sentences are 3.47 s, 3.96 s, and 3.12 s, respectively. All audio files are converted into AMR-WB format using the standard codec.

4.2. Speech Quality Evaluation. The perceptual evaluation of speech quality (PESQ) described in ITU-T Recommendation P.862 [20] may be employed to evaluate speech quality. Moreover, according to ITU-T P.862.2 [21], the raw PESQ score can be converted to the mean opinion score-listening quality objective (MOS-LQO), which is more suitable for evaluating wideband speech. Hence MOS-LQO is applied in our experiments. The normal range of the MOS-LQO score is 1.017 to 4.549. The higher the score, the better the quality.

Figure 6 shows the MOS-LQO scores of the 1000 cover AMR-WB speeches in 23.85 kbit/s mode and the corresponding stego AMR-WB speeches using three different codebook partition algorithms. Three progressive embedding rates, that is, 100 bps, 200 bps, and 300 bps, are employed in our experiments. The indices of the speech samples are sorted according to the MOS-LQO scores of our proposed method. It can be seen from Figure 6 that the overall scores of the stego AMR-WB speeches generated with our method are higher than those of the NID-based stego AMR-WB speeches, especially when the embedding rates are 200 bps and 300 bps. The MOS-LQO scores of the CNV-based stego AMR-WB speeches are only slightly higher than ours when the embedding rate is 100 bps, which means there are no obvious discrepancies in speech quality between them. Besides, when a high embedding rate, that is, 200 bps or 300 bps, is used, the decrease in the MOS-LQO scores of our stego AMR-WB speeches is significantly smaller than that of NID-based steganography.

Figure 6: Comparisons of MOS-LQO values for 1000 samples between the standard AMR-WB codec, CNV-based steganography, NID-based steganography, and the proposed DN-based steganography. Each panel plots the MOS-LQO score against the sample index. (a) The embedding rate is 100 bps. (b) The embedding rate is 200 bps. (c) The embedding rate is 300 bps.

Moreover, the average MOS-LQO scores of the cover AMR-WB speeches and of the stego AMR-WB speeches produced with the three codebook partition algorithms, that is, CNV, NID, and DN, in four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) and at three embedding rates (100 bps, 200 bps, and 300 bps) are given in Table 1. For the embedding rates of 200 bps and 300 bps, only the MOS-LQO scores of the NID-based and DN-based steganographic methods are given, because the embedding capacity of CNV-based steganography may not be larger than 100 bps.

Table 1: MOS-LQO scores of the standard codec, CNV-based, NID-based, and our proposed steganography in four different rate modes and three embedding rates. Values in parentheses are the changes (%) relative to the standard codec.

Embedding rate  Method    Rate mode (kbit/s)
                          12.65          15.85          19.85          23.85
-               Standard  2.929          3.073          3.199          3.269
100 bps         CNV       2.871 (−2.0)   3.021 (−1.7)   3.153 (−1.4)   3.225 (−1.3)
100 bps         NID       2.750 (−6.1)   2.895 (−5.8)   3.020 (−5.6)   3.091 (−5.4)
100 bps         Ours      2.864 (−2.2)   3.010 (−2.0)   3.139 (−1.9)   3.216 (−1.6)
200 bps         CNV       -              -              -              -
200 bps         NID       2.601 (−11.2)  2.736 (−11.0)  2.875 (−10.7)  2.921 (−10.6)
200 bps         Ours      2.807 (−4.2)   2.955 (−3.8)   3.084 (−3.6)   3.164 (−3.2)
300 bps         CNV       -              -              -              -
300 bps         NID       2.284 (−22.0)  2.386 (−22.3)  2.475 (−22.6)  2.533 (−22.5)
300 bps         Ours      2.699 (−7.9)   2.841 (−7.5)   2.971 (−7.1)   3.046 (−6.8)

When the embedding rate is 100 bps, which is almost the limit of CNV steganography, we can see from Table 1 that the mean MOS-LQO scores of our proposed method are only about 0.3% worse than those of CNV-based steganography. This slight decrease may be almost imperceptible to the human auditory system (HAS). There are significant increases of approximately 3.8% in the mean MOS-LQO scores when our presented method is compared to NID-based steganography. It can also be observed that when the embedding rates are 200 bps and 300 bps, the scores of our approach are improved by about 7% and 15%, respectively, in contrast to those of NID-based steganography.
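The percentages quoted from Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that each parenthesized value in Table 1 is the change of a stego score relative to the standard-codec score in the same rate mode; the helper name relative_change is hypothetical, and only the 12.65 kbit/s column at 100 bps is worked through.

```python
# Standard-codec MOS-LQO scores from Table 1, keyed by rate mode in kbit/s.
STANDARD_MOS = {"12.65": 2.929, "15.85": 3.073, "19.85": 3.199, "23.85": 3.269}


def relative_change(score, mode):
    """Change of a stego MOS-LQO score relative to the standard codec, in percent."""
    reference = STANDARD_MOS[mode]
    return 100.0 * (score - reference) / reference


# 100 bps, 12.65 kbit/s column of Table 1.
cnv = relative_change(2.871, "12.65")
dn = relative_change(2.864, "12.65")
nid = relative_change(2.750, "12.65")
gap_dn_vs_nid = dn - nid  # advantage of DN over NID in percentage points
```

For this column the DN advantage over NID comes out at just under 4 percentage points, in line with the approximately 3.8% average stated above.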

Furthermore, we can also see that the experimental results of the four rate modes are analogous. The decrease in speech quality caused by NID-based steganography is more than twice that caused by DN-based steganography. The proposed method can obtain twice the embedding capacity of CNV-based steganography by sacrificing less than 2% speech quality on average. In addition, the decline in speech quality observed when a 300 bps embedding rate is used in the proposed DN-based method is smaller than that observed when 200 bps is employed in the NID-based method.

4.3. Flexible Embedding Capacity. Compared to CNV-based steganography, flexible embedding capacity may be obtained with our proposed method to satisfy different practical demands. The steganographic capacity can be adjusted by changing the iteration parameter N_i. For different values of N_i, for example, N_i = 32, 33, ..., 54, the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.

From Figure 7, we can observe that the embedding rate increases significantly with N_i while the MOS-LQO score goes down only slightly. However, as far as NID-based steganography is concerned, the MOS-LQO score declines rapidly with the increase of the embedding rate. Therefore, the proposed DN-based steganography can achieve higher embedding capacity with a slight decrease in speech quality. For example, when N_i = 48, the size of each cluster in S is equal to 4 and we can embed 4 bits per frame, that is, the embedding rate is 200 bps, whereas the CNV algorithm can embed at most 2 bits per frame (100 bps).
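The link between N_i and the embedding rate can be illustrated with a small calculation. The sketch below rests on assumptions that are not stated in this section: each of the two codebooks used for embedding is taken to hold 128 codewords, AMR-WB is taken to produce 50 frames per second (20 ms frames), every merging iteration is taken to turn four clusters into two, and all codewords are treated as equally likely. The function names are hypothetical.

```python
def cluster_sizes(num_codewords, n_iter):
    """Map cluster size -> number of clusters after n_iter merging iterations."""
    s_count, s_size, sp_count = num_codewords, 1, 0
    for _ in range(n_iter):
        if s_count == 0:
            # S is exhausted: the merged clusters in S' become the new S.
            s_count, s_size, sp_count = sp_count, 2 * s_size, 0
        if s_count < 4:
            raise ValueError("too many merging iterations for this codebook")
        s_count -= 4   # two diameter endpoints and their two neighbors leave S
        sp_count += 2  # two merged clusters of twice the size enter S'
    sizes = {}
    if s_count:
        sizes[s_size] = s_count
    if sp_count:
        sizes[2 * s_size] = sp_count
    return sizes


def expected_rate_bps(n_iter, num_codewords=128, codebooks=2, frames_per_second=50):
    """Average embedding rate in bit/s under the uniform-usage assumption."""
    sizes = cluster_sizes(num_codewords, n_iter)
    # A codeword in a cluster of size N carries floor(log2(N)) bits.
    bits = sum(count * size * (size.bit_length() - 1) for size, count in sizes.items())
    return codebooks * frames_per_second * bits / num_codewords


rates = {n: expected_rate_bps(n) for n in (32, 40, 48, 56)}
```

Under these assumptions N_i = 48 leaves 32 clusters of size 4 in each codebook, that is, 2 bits per index and 4 bits per frame, which agrees with the 200 bps figure given above; intermediate values of N_i give mixed cluster sizes and therefore intermediate rates.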

4.4. Resistibility of Statistical Steganalysis. Speech steganography aims to hide a secret message in cover speech without arousing suspicion. It is very important for a steganographic method to resist statistical steganalysis, which is the technique of detecting the presence of a hidden message. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18] the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of a hidden message in a given audio file.

Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography. Each panel plots the embedding rate (bps) and the MOS-LQO score. (a) Our proposed steganography (horizontal axis: times of cluster merging). (b) NID-based steganography (horizontal axis: number of sub-codebooks).

Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in 23.85 kbit/s mode.

                     Training rate 0.4                  Training rate 0.5                  Training rate 0.6
Rate     Method  Markov  MFCC  SS-QCCN  RS-QCCN    Markov  MFCC  SS-QCCN  RS-QCCN    Markov  MFCC  SS-QCCN  RS-QCCN
100 bps  CNV     49.8    49.8  43.7     49.0       50.1    50.2  44.0     49.2       50.0    50.5  41.9     50.0
100 bps  NID     51.0    60.1  42.2     50.0       50.1    60.9  42.9     48.7       52.1    59.8  41.8     49.4
100 bps  Ours    50.0    50.0  44.0     49.4       50.3    49.3  40.3     49.4       49.1    48.6  41.8     43.3
200 bps  CNV     -       -     -        -          -       -     -        -          -       -     -        -
200 bps  NID     53.5    74.5  46.9     50.0       53.3    76.2  47.6     50.0       53.6    75.8  44.4     50.1
200 bps  Ours    51.0    48.3  45.2     50.0       49.8    48.7  42.2     50.0       50.5    48.6  45.0     50.0
300 bps  CNV     -       -     -        -          -       -     -        -          -       -     -        -
300 bps  NID     54.8    74.6  49.3     50.0       56.3    77.2  50.0     50.0       55.4    78.3  50.5     50.6
300 bps  Ours    52.4    49.7  47.9     50.0       52.8    60.9  48.2     50.0       53.8    50.1  46.6     50.0

In our experiments, the sentences chosen from the TIMIT database, as stated in Section 4.1, are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. Then a secret message is embedded into each cover AMR-WB speech at different embedding rates, that is, 100 bps, 200 bps, and 300 bps, by CNV-based, NID-based, and DN-based steganography. Of course, 200 bps and 300 bps are omitted for CNV-based steganography because of its limited embedding capacity. Seven stego speech sets are thus generated, among which one set is related to the CNV-based steganographic method and three sets each are associated with NID-based and DN-based steganography. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.

In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, that is, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stego speech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and a radial basis function (RBF) kernel and the grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set denote negatives and those in the stego speech set stand for positives. Hence the accuracy may be defined as follows:

$$\mathrm{Accuracy} = \frac{1}{2} \times \left( \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} + \frac{\mathrm{TN}}{\mathrm{FP} + \mathrm{TN}} \right), \tag{5}$$

where TP are true positives, TN are true negatives, FN are false negatives, and FP are false positives.
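Equation (5) averages the detection rate on stego samples and the correct-rejection rate on cover samples. A minimal sketch follows; the function name balanced_accuracy and the confusion counts in the example are hypothetical and are not results from the paper.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Accuracy of equation (5): mean of the true positive and true negative rates."""
    true_positive_rate = tp / (tp + fn)   # stego samples correctly flagged
    true_negative_rate = tn / (fp + tn)   # cover samples correctly passed
    return 0.5 * (true_positive_rate + true_negative_rate)


# Hypothetical test set of 600 stego and 600 cover samples.
example = balanced_accuracy(tp=300, fn=300, tn=330, fp=270)

# A detector that labels every sample as stego scores exactly 0.5.
always_stego = balanced_accuracy(tp=600, fn=0, tn=0, fp=600)
```

Because both classes are weighted equally, a value near 0.5 means the detector does no better than guessing, which is how the figures of about 50% in Table 2 are read.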

The steganalytic results are given in Table 2. It can be seen that when the embedding rate is 100 bps, the accuracy of detecting both CNV-based and DN-based methods is almost the same, say, 50% or so, while that of detecting NID-based steganography increases to 60% when the MFCC-based steganalytic method is applied. Moreover, there is an apparent increase in the accuracy of detecting the NID-based hiding method as the embedding rate increases to 200 bps or 300 bps when Liu et al.'s methods (i.e., the Markov-based and MFCC-based steganalytic methods) are applied. But the accuracy of steganalyzing our proposed method, DN-based steganography, stays at about the same level of 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even at higher embedding rates.

Figure 8: The correlation index of 1000 AMR-WB speeches (correlation index plotted against edge), where the interframe edge “ii” connects two vertices V_i[k] and V_i[k + 1] in two neighboring frames and the intraframe edge “ij′” connects two vertices V_i[k] and V_j[k] in the same frame.

According to the definition of the correlation index given in [18], the correlation indices of 1000 AMR-WB speeches randomly selected from TIMIT are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely, SS-QCCN and RS-QCCN, can be constructed as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography. The steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN is about 50% or lower for all of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that merely the vertices V_2[k] and V_3[k] in the kth frame may be changed during steganography, while neither of them is utilized in Li et al.'s steganalytic method except for the edge “33” in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing edges “22”, “33”, and “23′”) targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis, according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].

In order to visualize the detection performance, some receiver operating characteristic (ROC) curves of steganalyzing CNV-based steganography with a 100 bps embedding rate and NID-based and DN-based steganography with 100 bps, 200 bps, and 300 bps embedding rates are provided in Figure 10 (ROC curves for SS-QCCN and RS-QCCN are omitted, for these two methods fail to steganalyze AMR-WB steganography regardless of embedding capacity). It shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of a hidden message embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.

Figure 9: Two AMR-WB strong correlation network models over the vertices V_1[k], ..., V_5[k] and V_1[k + 1], ..., V_5[k + 1]. (a) SS-QCCN. (b) RS-QCCN.

5. Conclusion

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may therefore be a good candidate cover medium for speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed. The experimental results demonstrate the effectiveness of our proposed method. The main contributions of this paper are as follows.

(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality and better performance against statistical steganalysis than the NID-based method.

(2) Flexible embedding capacity may be easily achieved with different numbers of cluster-merging iterations. Twice the embedding capacity of the CNV-based embedding method may be obtained with N_i = 48.

Figure 10: ROC curves (true positive rate versus false positive rate) for steganalysis of CNV-based, NID-based, and our proposed DN-based steganography on TIMIT (50% training rate). (a) Markov, 100 bps. (b) MFCC, 100 bps. (c) Markov, 200 bps. (d) MFCC, 200 bps. (e) Markov, 300 bps. (f) MFCC, 300 bps.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.

References

[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, “Techniques for data hiding,” IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.

[2] D. Gruhl, A. Lu, and W. Bender, “Echo hiding,” in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.

[3] K. Gopalan, “Audio steganography using bit modification,” in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.

[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, “Audio steganography by amplitude or phase modification,” in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.

[5] D. Kirovski and H. S. Malvar, “Spread-spectrum watermarking of audio signals,” IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.

[6] L. Liu, M. Li, Q. Li, and Y. Liang, “Perceptually transparent information hiding in G.729 bitstream,” in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.

[7] T. Xu and Z. Yang, “Simple and effective speech steganography in G.723.1 low-rate codes,” in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.

[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, “MELPe coded speech hiding on enhanced full rate compressed domain,” in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.

[9] A. Nishimura, “Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec,” in Proceedings of the IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.

[10] B. Xiao, Y. Huang, and S. Tang, “An approach to information hiding in low bit-rate speech stream,” in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.

[11] B. Chen and G. W. Wornell, “Quantization index modulation: a class of provably good methods for digital watermarking and information embedding,” Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[12] Y. F. Huang, S. Tang, and J. Yuan, “Steganography in inactive frames of VoIP streams encoded by source codec,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.

[13] Y. Huang, C. Liu, S. Tang, and S. Bai, “Steganography integration into a low-bit rate speech codec,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.

[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, “A new scheme for covert communication via 3G encoded speech,” Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.

[15] H. Tian, J. Liu, and S. Li, “Improving security of quantization-index-modulation steganography in low bit-rate speech streams,” Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.

[16] J. Liu, H. Tian, J. Lu, and Y. Chen, “Neighbor-index-division steganography based on QIM method for G.723.1 speech streams,” Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.

[17] Q. Liu, A. H. Sung, and M. Qiao, “Derivative-based audio steganalysis,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.

[18] S. Li, Y. Jia, and C.-C. J. Kuo, “Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.

[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.

[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.

[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.

[22] C. Chang and C. Lin, “LIBSVM: a Library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.


Page 5: A Novel AMR-WB Speech Steganography Based on Diameter ...In this section, a technical overview of AMR-WB codec is rstly presented. en two related codebook partition algorithms CNV

Security and Communication Networks 5

(a) 1st iteration (119873119894 = 4) (b) 2nd iteration (119873119894 = 3) (c) 3rd iteration (119873119894 = 2)

(d) 4th iteration (119878 = 1198781015840 1198781015840clear()) (e) 4th iteration (119873119894 = 1)

0

0

1

1

11

11

00

0001 01

1010

(f) Labelling

Figure 3 An example of our proposed codebook partition

Input Codebook 119861 iterative parameter119873119894Output Cluster set 119878⋆ 1198781015840 is a helper cluster set ⋆1198781015840clear()119878clear()⋆ Each codeword is taken as a initial cluster ⋆for 119894 = 0 119894 lt 119873119888 ++119894 do119878push (119862119894)end⋆ Iterative merging ⋆while 119873119894 gt 0 do

if 119878 is empty then119878 = 11987810158401198781015840clear()end(1198621198891 1198621198891) = argmax119894119895isin12|119878|119863(119862119894 119862119895)1198791198901198981199011 = 1198621198891 cup 119873(1198621198891 119878)1198791198901198981199012 = 1198621198892 cup 119873(1198621198892 119878)1198781015840push (1198791198901198981199011)1198781015840push (1198791198901198981199012)119878remove (1198621198891)119878remove (1198621198892)119878remove (119873(1198621198891 119878))119878remove (119873(1198621198892 119878))119873119894 = 119873119894 minus 1

end⋆ Put the remaining clusters in 1198781015840 into 119878 ⋆for iter = 1198781015840begin() iter lt 1198781015840end() ++iterdo119878push (lowastiter)

endreturn 119878

Algorithm 1 DN-based codebook partition algorithm

Cluster set S

ISF

Stego ISF

Secret

00 01

1110

0 1

Search amp replace

ClusteL1

ClusteL2

index Ia

index Ib

Wa Wb

WcWd

bits ldquo01rdquo

Ia Ib

Figure 4 Embedding two bits into one cover ISF index

Figure 4 is an example of embedding two secret bits intoone cover ISF index Let us assume the cluster set 119878 containstwo clusters and the corresponding codeword indexed by 119868119909is119882119909 for example 119868119887 indexes the codeword119882119887 Hence theISF index 119868119886 shown in Figure 4 will be replaced with 119868119887 whichindexes the codeword119882119887 with the same label as the secret bitsldquo01rdquo

33 Extracting Procedure When the stego AMR-WB speechis transferred to the intended receiver the stego indices maybe obtained by parsing AMR-WB speech stream and used toextract the embedded secretmessageThemessage extractionprocedures from the stegoindex 119868119887 are given below

Step 1 Search cluster set 119878 which is the same as that employedin the embedding procedure for the cluster119862which containsthe codeword119882119887 referred by the ISF index 119868119887Step 2 If there are totally 119873 codewords in 119862 the number ofsecret bits carried by 119868119887 is computed by 119899 = lfloorlog2119873rfloor

6 Security and Communication Networks

Stego ISF

Cluster set S

Extracted

00 01

1110

0 1

Search amp read

ClusteL1

ClusteL2

index Ib

Wa Wb

Wc Wd

bits ldquo01rdquo

ldquo01rdquo

Ib

Figure 5 Extracting two bits from one stego-ISF index

Step 3 Read the label of119882119887 as the extracted 119899 bits which areappended to the secret message bit sequence

Step 4 Repeat Steps 1ndash3 until all the secret bits are recovered

Figure 5 is the corresponding example of extracting twosecret bits from the stegoindex 119868119887 generated by the previousembedding instance shown in Figure 4 It can be easily seenthat the extracted secret bits are identical to the embeddedsecret bits

4 Experimental Results and Analysis

In order to demonstrate the performance of the proposedmethod the perceptual quality of the stego AMR-WB speechwith secret message embedded using our method is com-puted and compared to that of the stego AMR-WB speechgenerated with CNV and NID steganography Moreover theflexibility of embedding capacity and the security regardingstatistical detection are analyzed in detail

41 Audio Database TIMIT acoustic-phonetic continuousspeech corpus (httpscatalogldcupenneduldc93s1) is anaudio database which contains broadband recordings of630 speakers of eight major dialects of American Englisheach reading ten phonetically rich sentences and all audiosentences are sampled at 16 kHz In our experiments 1000audio sentences are randomly chosen from TIMIT databaseThe average maximum and minimum length of the chosenaudio sentences are 347 s 396 s and 312 s All audio files areconverted into AMR-WB format using standard codec

42 Speech Quality Evaluation The perceptual evaluation ofspeech quality (PESQ) described in the ITU-T P862 Recom-mendation [20] may be employed to evaluate speech qualityMoreover according to ITU-T P8622 [21] the raw PESQscore can be converted to mean opinion score-listening qual-ity objective (MOS-LQO) which is more suitable for evalu-ating wideband speech Hence MOS-LQO is applied in ourexperimentsThe normal range ofMOS-LQO score is 1017 to4549 The higher the score the better the quality

Figure 6 shows the MOS-LQO scores of the 1000 coverAMR-WB speeches in 2385 kbits mode and the correspond-ing stego AMR-WB speeches using three different codebookpartition algorithmsThree progressive embedding rates that

StandardCNVNID

Ours

100 200 300 400 500 600 700 800 900 10000Sample index

18222630343842

MO

S-LQ

O sc

ore

(a) The embedding rate is 100 bps

StandardNIDOurs

18222630343842

MO

S-LQ

O sc

ore

100 200 300 400 500 600 700 800 900 10000Sample index

(b) The embedding rate is 200 bps

StandardNIDOurs

100 200 300 400 500 600 700 800 900 10000Sample index

1418222630343842

MO

S-LQ

O sc

ore

(c) The embedding rate is 300 bps

Figure 6 Comparisons of MOS-LQO values for 1000 samplesbetween the standard AMR-WB codec CNV-based steganographyNID-based steganography and the proposedDN-based steganogra-phy

is 100 bps 200 bps and 300 bps are employed in our experi-ments The indices of speech samples are sorted according totheMOS-LQO scores of our proposedmethod It can be seenfrom Figure 6 that the overall scores of the stego AMR-WBspeeches generated with our method are higher than thoseof the NID-based stego AMR-WB speeches especially whenthe embedding rates are 200 bps and 300 bps And the MOS-LQO scores of the CNV-based stego AMR-WB speeches areslightly higher than ours when the embedding rate is 100 bpswhich means there are no obvious discrepancies in speechquality between them Besides when the high embeddingrate that is 200 bps or 300 bps is used the decrease inMOS-LQO scores of our stego AMR-WB speeches is significantlysmaller than that of NID-based steganography

Moreover, the average MOS-LQO scores of the cover AMR-WB speeches and of the stego AMR-WB speeches produced with the three codebook partition algorithms, that is, CNV, NID, and DN, are given in Table 1 for four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) and three embedding rates (100 bps, 200 bps, and 300 bps). For the embedding rates of 200 bps and 300 bps, only the scores of the NID-based and DN-based methods are given, because the embedding capacity of CNV-based steganography may not be larger than 100 bps.

Table 1: MOS-LQO scores of the standard codec and of CNV-based, NID-based, and our proposed steganography in four different rate modes and at three embedding rates. Values in parentheses are the changes relative to the standard codec.

Embedding rate  Method    12.65 kbit/s     15.85 kbit/s     19.85 kbit/s     23.85 kbit/s
-               Standard  2.929            3.073            3.199            3.269
100 bps         CNV       2.871 (−2.0%)    3.021 (−1.7%)    3.153 (−1.4%)    3.225 (−1.3%)
100 bps         NID       2.750 (−6.1%)    2.895 (−5.8%)    3.020 (−5.6%)    3.091 (−5.4%)
100 bps         Ours      2.864 (−2.2%)    3.010 (−2.0%)    3.139 (−1.9%)    3.216 (−1.6%)
200 bps         CNV       -                -                -                -
200 bps         NID       2.601 (−11.2%)   2.736 (−11.0%)   2.875 (−10.7%)   2.921 (−10.6%)
200 bps         Ours      2.807 (−4.2%)    2.955 (−3.8%)    3.084 (−3.6%)    3.164 (−3.2%)
300 bps         CNV       -                -                -                -
300 bps         NID       2.284 (−22.0%)   2.386 (−22.3%)   2.475 (−22.6%)   2.533 (−22.5%)
300 bps         Ours      2.699 (−7.9%)    2.841 (−7.5%)    2.971 (−7.1%)    3.046 (−6.8%)

When the embedding rate is 100 bps, which is almost the limit of CNV steganography, we can see from Table 1 that the mean MOS-LQO scores of our proposed method are only about 0.3% worse than those of CNV-based steganography, measured as the difference between the relative losses listed in Table 1. Such a slight decrease may be almost imperceptible to the human auditory system (HAS). Compared with NID-based steganography, our method improves the mean MOS-LQO scores by approximately 3.8% on the same measure. When the embedding rates are 200 bps and 300 bps, the scores of our approach are improved by about 7% and 15%, respectively, relative to those of NID-based steganography.

Furthermore, we can also see that the experimental results of the four rate modes are analogous. The decrease in speech quality caused by NID-based steganography is more than twice that caused by DN-based steganography. The proposed method can obtain twice the embedding capacity of CNV-based steganography by sacrificing only about 2% of speech quality on average. In addition, the proposed DN-based method at 300 bps still loses less speech quality than the NID-based method does at 200 bps.
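The relative losses discussed above can be recomputed directly from the mean scores. The following sketch does this for the 12.65 kbit/s column of Table 1; the helper names `relative_loss` and `loss_table` are illustrative only, and the input values are copied from the table.

```python
# Mean MOS-LQO scores for the 12.65 kbit/s mode, copied from Table 1.
STANDARD = 2.929
STEGO = {
    100: {"CNV": 2.871, "NID": 2.750, "Ours": 2.864},
    200: {"NID": 2.601, "Ours": 2.807},
    300: {"NID": 2.284, "Ours": 2.699},
}


def relative_loss(cover: float, stego: float) -> float:
    """Relative MOS-LQO loss of the stego speech, in percent of the cover score."""
    return (cover - stego) / cover * 100.0


def loss_table(standard: float, stego: dict) -> dict:
    """Relative loss (rounded to one decimal) per embedding rate and method."""
    return {
        rate: {m: round(relative_loss(standard, s), 1) for m, s in methods.items()}
        for rate, methods in stego.items()
    }


if __name__ == "__main__":
    for rate, losses in loss_table(STANDARD, STEGO).items():
        print(f"{rate} bps: {losses}")
```

For this column, the gap between NID and the proposed method is 6.1 − 2.2 = 3.9 percentage points at 100 bps, which is in line with the figure of approximately 3.8% quoted for the mean over all rate modes.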

4.3. Flexible Embedding Capacity. Compared to CNV-based steganography, our proposed method offers a flexible embedding capacity that can satisfy different practical demands. The steganographic capacity can be adjusted by changing the iteration parameter $N_i$. For different values of $N_i$, for example, $N_i = 32, 33, \ldots, 54$, the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.

From Figure 7, we can observe that the embedding rate increases significantly with $N_i$, while the MOS-LQO score goes down only slightly. For NID-based steganography, however, the MOS-LQO score declines rapidly as the embedding rate increases. Therefore, the proposed DN-based steganography can achieve a higher embedding capacity with only a slight decrease in speech quality. For example, when $N_i = 48$, the size of each cluster in $S$ is equal to 4 and we can embed 4 bits per frame, that is, the embedding rate is 200 bps, whereas the CNV algorithm can embed at most 2 bits per frame (100 bps).
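The arithmetic behind these rates can be sketched as follows. The sketch assumes that AMR-WB produces one frame every 20 ms (50 frames per second) and that two codebooks per frame carry hidden bits, each contributing the base-2 logarithm of the cluster size. Both constants and the function names are assumptions made for illustration, not the authors' implementation.

```python
FRAMES_PER_SECOND = 50  # ASSUMPTION: one AMR-WB frame every 20 ms
CODEBOOKS_USED = 2      # ASSUMPTION: two codebooks per frame carry hidden bits


def bits_per_frame(cluster_size: int, codebooks: int = CODEBOOKS_USED) -> int:
    """Bits hidden per frame when every cluster holds `cluster_size` codewords."""
    # floor(log2(cluster_size)) computed exactly with integer arithmetic
    bits_per_codebook = cluster_size.bit_length() - 1
    return codebooks * bits_per_codebook


def embedding_rate_bps(cluster_size: int) -> int:
    """Embedding rate in bits per second for a given cluster size."""
    return bits_per_frame(cluster_size) * FRAMES_PER_SECOND


if __name__ == "__main__":
    for size in (2, 4, 8):
        print(f"cluster size {size}: {bits_per_frame(size)} bits/frame, "
              f"{embedding_rate_bps(size)} bps")
```

Under these assumptions, a cluster size of 4 gives 4 bits per frame and 200 bps, matching the example in the text, and a cluster size of 2 gives the 2 bits per frame (100 bps) quoted for CNV.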

Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography. Panels: (a) our proposed steganography (horizontal axis: times of cluster merging); (b) NID-based steganography (horizontal axis: number of sub-codebooks). Both panels plot the embedding rate (bps) and the MOS-LQO score.

4.4. Resistibility of Statistical Steganalysis. Speech steganography aims to hide secret messages in cover speech without arousing suspicion. It is very important for a steganographic method to resist statistical steganalysis, which is the technique of detecting the presence of hidden messages. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18], the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of hidden messages in given audio.

Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in 23.85 kbit/s mode.

Rate     Method  Training rate 0.4                 Training rate 0.5                 Training rate 0.6
                 Markov  MFCC  SS-QCCN  RS-QCCN    Markov  MFCC  SS-QCCN  RS-QCCN    Markov  MFCC  SS-QCCN  RS-QCCN
100 bps  CNV     49.8    49.8  43.7     49.0       50.1    50.2  44.0     49.2       50.0    50.5  41.9     50.0
100 bps  NID     51.0    60.1  42.2     50.0       50.1    60.9  42.9     48.7       52.1    59.8  41.8     49.4
100 bps  Ours    50.0    50.0  44.0     49.4       50.3    49.3  40.3     49.4       49.1    48.6  41.8     43.3
200 bps  CNV     -       -     -        -          -       -     -        -          -       -     -        -
200 bps  NID     53.5    74.5  46.9     50.0       53.3    76.2  47.6     50.0       53.6    75.8  44.4     50.1
200 bps  Ours    51.0    48.3  45.2     50.0       49.8    48.7  42.2     50.0       50.5    48.6  45.0     50.0
300 bps  CNV     -       -     -        -          -       -     -        -          -       -     -        -
300 bps  NID     54.8    74.6  49.3     50.0       56.3    77.2  50.0     50.0       55.4    78.3  50.5     50.6
300 bps  Ours    52.4    49.7  47.9     50.0       52.8    60.9  48.2     50.0       53.8    50.1  46.6     50.0

In our experiments, the sentences chosen from the TIMIT database, as stated in Section 4.1, are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. Then secret messages are embedded into each cover AMR-WB speech at different embedding rates, that is, 100 bps, 200 bps, and 300 bps, by CNV-based, NID-based, and DN-based steganography. The rates of 200 bps and 300 bps are omitted for CNV-based steganography because of its limited embedding capacity. Seven stego speech sets are thus generated: one for the CNV-based method and three each for the NID-based and DN-based methods. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.

In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to one of three training rates, that is, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stego speech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and a radial basis function (RBF) kernel and the grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set denote negatives and those in the stego speech set stand for positives. Hence, the accuracy may be defined as follows:

$$\mathrm{Accuracy} = \frac{1}{2} \times \left( \frac{TP}{TP + FN} + \frac{TN}{FP + TN} \right), \qquad (5)$$

where TP are true positives, TN are true negatives, FN are false negatives, and FP are false positives.
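A minimal sketch of the evaluation protocol is given below, assuming labels of 1 for stego speech (positive) and 0 for cover speech (negative). It shows the random split of one speech set by training rate and the accuracy of Eq. (5); the function names and the fixed seed are illustrative choices, and the classifier itself (LIBSVM with an RBF kernel and grid search) is not reproduced here.

```python
import random


def split_by_training_rate(samples, training_rate, seed=0):
    """Randomly split one speech set into (train, test) by the given rate."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * training_rate)
    return shuffled[:cut], shuffled[cut:]


def balanced_accuracy(y_true, y_pred):
    """Eq. (5): mean of the true positive rate and the true negative rate.

    Labels: 1 = stego (positive), 0 = cover (negative).
    Assumes both classes occur in y_true, so no denominator is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return 0.5 * (tp / (tp + fn) + tn / (fp + tn))


if __name__ == "__main__":
    train, test = split_by_training_rate(range(1000), 0.4)
    print(len(train), len(test))
    print(balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))
```

A detector that labels every sample as stego scores 0.5 under this definition, which is why values near 50% in Table 2 correspond to chance-level detection.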

The steganalytic results are given in Table 2. It can be seen that, when the embedding rate is 100 bps, the accuracy of detecting the CNV-based and DN-based methods is almost the same, about 50%, while that of detecting NID-based steganography increases to about 60% when the MFCC-based steganalytic method is applied. Moreover, there is an apparent increase in the accuracy of detecting the NID-based hiding method when the embedding rate increases to 200 bps or 300 bps and Liu et al.'s methods (i.e., the Markov-based and MFCC-based steganalytic methods) are applied. In contrast, the accuracy of steganalyzing our proposed DN-based steganography stays at roughly the same level of 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even at higher embedding rates.

Figure 8: The correlation index of 1000 AMR-WB speeches, where the interframe edge $ii$ connects two vertices $V_i[k]$ and $V_i[k+1]$ in two neighboring frames and the intraframe edge $ij'$ connects two vertices $V_i[k]$ and $V_j[k]$ in the same frame. Horizontal axis: edge; vertical axis: correlation index.

According to the definition of the correlation index given in [18], the correlation indices of 1000 AMR-WB speeches randomly selected from TIMIT are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely, SS-QCCN and RS-QCCN, can be constructed, as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography, and the steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN stays at or below roughly 50% for all of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that merely the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while neither of them is utilized in Li et al.'s steganalytic method except for the edge "33" in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing the edges "22", "33", and "23′") targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis, according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].
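The argument about which QCCN edges can see the embedding may be made concrete with a small sketch. The edge lists below are an assumed reading of the labels in Figure 9 (interframe edges "11", "33", "44" and intraframe edges between vertices 1, 4, and 5), and the tuple encoding and the function name `affected_edges` are hypothetical.

```python
# Edge = (kind, i, j): "inter" joins V_i[k] and V_j[k+1]; "intra" joins V_i[k] and V_j[k].
# ASSUMPTION: edge lists read off the Figure 9 labels, for illustration only.
SS_QCCN = {("inter", 1, 1), ("intra", 4, 5)}
RS_QCCN = {
    ("inter", 1, 1), ("inter", 3, 3), ("inter", 4, 4),
    ("intra", 1, 4), ("intra", 1, 5), ("intra", 4, 5),
}
ADAPTED_QCCN = {("inter", 2, 2), ("inter", 3, 3), ("intra", 2, 3)}

# Vertices that the embedding may change, as stated in the text.
MODIFIED_VERTICES = {2, 3}


def affected_edges(model, modified=MODIFIED_VERTICES):
    """Edges of a QCCN model with at least one endpoint changed by embedding."""
    return {edge for edge in model if edge[1] in modified or edge[2] in modified}


if __name__ == "__main__":
    for name, model in (("SS-QCCN", SS_QCCN), ("RS-QCCN", RS_QCCN),
                        ("adapted", ADAPTED_QCCN)):
        print(name, sorted(affected_edges(model)))
```

With these assumed edge lists, SS-QCCN has no edge touching $V_2$ or $V_3$ and RS-QCCN has only the edge "33", which is consistent with the explanation given above for the near-chance accuracy.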

Figure 9: Two AMR-WB strong correlation network models. Panels: (a) SS-QCCN; (b) RS-QCCN. Each panel connects the vertices $V_1[k]$ to $V_5[k]$ of frame $k$ and $V_1[k+1]$ to $V_5[k+1]$ of frame $k+1$.

To visualize the detection performance, receiver operating characteristic (ROC) curves for steganalyzing CNV-based steganography at the 100 bps embedding rate and NID-based and DN-based steganography at the 100 bps, 200 bps, and 300 bps embedding rates are provided in Figure 10. (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of the embedding capacity.) The curves show that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of hidden messages embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.
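For readers who want to reproduce such curves, the sketch below traces an ROC curve by sweeping the decision threshold over classifier scores and integrates it with the trapezoidal rule. It is a generic illustration, not the authors' plotting code; the label convention (1 = stego, 0 = cover) and the function names are assumptions.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) obtained by sweeping the decision threshold.

    labels: 1 = stego (positive), 0 = cover (negative).
    scores: larger values mean "more likely stego".
    Assumes both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
    print(pts, auc(pts))
```

A curve that hugs the diagonal (area close to 0.5) indicates that the detector cannot separate stego speech from cover speech, which is the behavior a secure embedding method aims for.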

Figure 10: ROC curves for steganalysis of CNV-based, NID-based, and our proposed steganography (50% training rate). Panels: (a) Markov (TIMIT, 100 bps); (b) MFCC (TIMIT, 100 bps); (c) Markov (TIMIT, 200 bps); (d) MFCC (TIMIT, 200 bps); (e) Markov (TIMIT, 300 bps); (f) MFCC (TIMIT, 300 bps). Panels (a) and (b) compare DN, CNV, and NID; panels (c) to (f) compare NID and DN. Horizontal axis: false positive rate; vertical axis: true positive rate.

5. Conclusion

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may therefore be a good candidate cover medium for speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed. The experimental results demonstrate the effectiveness of our proposed method. The main contributions of this paper are as follows:

(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality, and better performance against statistical steganalysis than the NID-based method.

(2) Flexible embedding capacity may be easily achieved with different numbers of cluster-merging iterations. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.

References

[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.

[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.

[3] K. Gopalan, "Audio steganography using bit modification," in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.

[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, "Audio steganography by amplitude or phase modification," in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.

[5] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.

[6] L. Liu, M. Li, Q. Li, and Y. Liang, "Perceptually transparent information hiding in G.729 bitstream," in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.

[7] T. Xu and Z. Yang, "Simple and effective speech steganography in G.723.1 low-rate codes," in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.

[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, "MELPe coded speech hiding on enhanced full rate compressed domain," in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.

[9] A. Nishimura, "Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec," in Proceedings of the IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.

[10] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.

[11] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[12] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.

[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.

[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, "A new scheme for covert communication via 3G encoded speech," Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.

[15] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.

[16] J. Liu, H. Tian, J. Lu, and Y. Chen, "Neighbor-index-division steganography based on QIM method for G.723.1 speech streams," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.

[17] Q. Liu, A. H. Sung, and M. Qiao, "Derivative-based audio steganalysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.

[18] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.

[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.

[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.

[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.

[22] C. Chang and C. Lin, "LIBSVM: a Library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.


Figure 5: Extracting two bits from one stego-ISF index. The stego index $I_b$ is searched for in the cluster set $S$, and the label "01" of the matching codeword $W_b$ is read as the extracted bits.

Step 3. Read the label of $W_b$ as the extracted $n$ bits, which are appended to the secret message bit sequence.

Step 4. Repeat Steps 1-3 until all the secret bits are recovered.

Figure 5 is the corresponding example of extracting two secret bits from the stego index $I_b$ generated by the previous embedding instance shown in Figure 4. It can be easily seen that the extracted secret bits are identical to the embedded secret bits.
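The search-and-read operation of the extraction steps can be sketched as a simple lookup. The sketch assumes that the cluster set is available as a list of clusters, each mapping a codeword index to its bit label; the indices and labels in `TOY_CLUSTERS` are invented for illustration and do not come from the DN partition of a real AMR-WB codebook.

```python
def extract_bits(stego_index, clusters):
    """Return the bit label of `stego_index` within the cluster that contains it.

    clusters: list of dicts, each mapping codeword index -> bit-string label.
    """
    for cluster in clusters:
        if stego_index in cluster:
            return cluster[stego_index]
    raise ValueError(f"index {stego_index} is not in any cluster")


def extract_message(stego_indices, clusters):
    """Concatenate the labels read from a sequence of stego indices."""
    return "".join(extract_bits(index, clusters) for index in stego_indices)


# HYPOTHETICAL toy cluster set: indices and labels are made up for illustration.
TOY_CLUSTERS = [
    {10: "00", 11: "01", 12: "10", 13: "11"},
    {20: "00", 21: "01", 22: "10", 23: "11"},
]

if __name__ == "__main__":
    print(extract_bits(11, TOY_CLUSTERS))
    print(extract_message([11, 22, 10], TOY_CLUSTERS))
```

Because sender and receiver derive the same cluster set and labels from the shared codebook, the receiver needs only the stego indices to recover the bits.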

4. Experimental Results and Analysis

In order to demonstrate the performance of the proposed method, the perceptual quality of the stego AMR-WB speech with secret messages embedded using our method is computed and compared to that of the stego AMR-WB speech generated with CNV and NID steganography. Moreover, the flexibility of embedding capacity and the security regarding statistical detection are analyzed in detail.

4.1. Audio Database. The TIMIT acoustic-phonetic continuous speech corpus (https://catalog.ldc.upenn.edu/ldc93s1) is an audio database which contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences; all audio sentences are sampled at 16 kHz. In our experiments, 1000 audio sentences are randomly chosen from the TIMIT database. The average, maximum, and minimum lengths of the chosen audio sentences are 3.47 s, 3.96 s, and 3.12 s, respectively. All audio files are converted into AMR-WB format using the standard codec.

42 Speech Quality Evaluation The perceptual evaluation ofspeech quality (PESQ) described in the ITU-T P862 Recom-mendation [20] may be employed to evaluate speech qualityMoreover according to ITU-T P8622 [21] the raw PESQscore can be converted to mean opinion score-listening qual-ity objective (MOS-LQO) which is more suitable for evalu-ating wideband speech Hence MOS-LQO is applied in ourexperimentsThe normal range ofMOS-LQO score is 1017 to4549 The higher the score the better the quality

Figure 6 shows the MOS-LQO scores of the 1000 coverAMR-WB speeches in 2385 kbits mode and the correspond-ing stego AMR-WB speeches using three different codebookpartition algorithmsThree progressive embedding rates that

StandardCNVNID

Ours

100 200 300 400 500 600 700 800 900 10000Sample index

18222630343842

MO

S-LQ

O sc

ore

(a) The embedding rate is 100 bps

StandardNIDOurs

18222630343842

MO

S-LQ

O sc

ore

100 200 300 400 500 600 700 800 900 10000Sample index

(b) The embedding rate is 200 bps

StandardNIDOurs

100 200 300 400 500 600 700 800 900 10000Sample index

1418222630343842

MO

S-LQ

O sc

ore

(c) The embedding rate is 300 bps

Figure 6 Comparisons of MOS-LQO values for 1000 samplesbetween the standard AMR-WB codec CNV-based steganographyNID-based steganography and the proposedDN-based steganogra-phy

is 100 bps 200 bps and 300 bps are employed in our experi-ments The indices of speech samples are sorted according totheMOS-LQO scores of our proposedmethod It can be seenfrom Figure 6 that the overall scores of the stego AMR-WBspeeches generated with our method are higher than thoseof the NID-based stego AMR-WB speeches especially whenthe embedding rates are 200 bps and 300 bps And the MOS-LQO scores of the CNV-based stego AMR-WB speeches areslightly higher than ours when the embedding rate is 100 bpswhich means there are no obvious discrepancies in speechquality between them Besides when the high embeddingrate that is 200 bps or 300 bps is used the decrease inMOS-LQO scores of our stego AMR-WB speeches is significantlysmaller than that of NID-based steganography

Moreover the average MOS-LQO scores of the coverAMR-WB speeches and the stego AMR-WB speeches withthree different codebook partition algorithms that is CNV

Security and Communication Networks 7

Table 1 MOS-LQO scores of the standard codec CNV-based NID-based and our proposed steganography in four different rate modes andthree embedding rates

Embedding rate Method Rate mode (kbits)1265 1585 1985 2385

Standard 2929 3073 3199 3269

100 bps

CNV 2871 3021 3153 3225(minus20) (minus17) (minus14) (minus13)NID 2750 2895 3020 3091(minus61) (minus58) (minus56) (minus54)Ours 2864 3010 3139 3216(minus22) (minus20) (minus19) (minus16)

200 bps

CNV

NID 2601 2736 2875 2921(minus112) (minus110) (minus107) (minus106)Ours 2807 2955 3084 3164(minus42) (minus38) (minus36) (minus32)

300 bps

CNV

NID 2284 2386 2475 2533(minus220) (minus223) (minus226) (minus225)Ours 2699 2841 2971 3046(minus79) (minus75) (minus71) (minus68)

NID and DN including four rate modes (1265 kbits1585 kbits 1985 kbits and 2385 kbits) together with threekinds of embedding rate (100 bps 200 bps and 300 bps) aregiven in Table 1 Only the MOS-LQO scores of NID-basedand DN-based steganographic methods with embeddingrates 200 bps and 300 bps are given in Table 1 because theembedding capacity of CNV-based steganography may notbe larger than 100 bps

When the embedding rate is 100 bps which is almostthe limit of CNV steganography we can see from Table 1that the mean MOS-LQO scores of our proposed methodare only about 03 worse than CNV-based steganographyThe slight decrease may be almost imperceptible by humanauditory system (HAS) And there are significant increases ofapproximately 38 in the meanMOS-LQO scores when ourpresented method is compared to NID-based steganographyAnd it can be observed that when the embedding rates are200 bps and 300 bps the scores of our approach are improvedby about 7 and 15 correspondingly in contrast to those ofNID-based steganography

Furthermore we can also see that the experimentalresults of four rate modes are analogous The decrease ofspeech quality caused by NID-based steganography is morethan twice that caused by DN-based steganography And theproposedmethod can obtain twice the embedding capacity ofCNV-based steganography by sacrificing less than 2 speechquality in average In addition only a slight decline in speechquality is observed when 300 bps embedding rate is used inthe proposed DN-based method while 200 bps is employedin NID-based method

43 Flexible Embedding Capacity Compared to CNV-basedsteganography flexible embedding capacity may be obtained

to satisfy different practical demand with our proposedmethod The steganographic capacity can be adjusted bychanging the iteration parameter 119873119894 For different values of119873119894 for example 119873119894 = 32 33 54 the average embeddingcapacity and the MOS-LQO scores are given in Figure 7(a)and the corresponding results of NID-based steganographyare provided in Figure 7(b) for comparison Without loss ofgenerality only 2385 kbits mode is used

From Figure 7 we can observe that the embedding ratesignificantly increases with 119873119894 while the MOS-LQO scoreslightly goes down However as NID-based steganographyis concerned the MOS-LQO score rapidly declines with theincrease of the embedding rateTherefore the proposed DN-based steganography can achieve higher embedding capacitywith slight decrease in speech quality For example when119873119894 = 48 the size of each cluster in 119878 is equal to 4 and we canembed 4 bits per frame that is the embedding rate is 200 bpsbut at the same time the CNV algorithm can embed at most2 bits per frame (100 bps)

44 Resistibility of Statistical Steganalysis Speech steganog-raphy aims to hide secret message into cover speech withoutarousing suspicion It is very important for a steganographicmethod to resist statistical steganalysis which is the tech-nique of detecting the presence of hidden message Twostate-of-the-art steganalytic methods [17 18] are used toevaluate the performance of statistical undetectability of ourproposed method In [17] mel-cepstrum coefficients andMarkov transition features from the second-order derivativeof the audio signal are extracted to capture the statisticaldistortions caused by audio steganography while in [18]the correlation characteristics of split vector quantizationcodewords of linear predictive coding filter coefficients are

8 Security and Communication Networks

Embe

ddin

g ra

te (b

ps)

Embedding rateMOS-LQO

330

290

250

210

170

130

90

Times of cluster merging

33

31

29

27

25

23

MO

S-LQ

O sc

ore

565350474441383532

(a) Our proposed steganography

Number of sub-codebooks

Embedding rateMOS-LQO

109876543290

130

170

210

250

290

330

Embe

ddin

g ra

te (b

ps)

23

25

27

29

31

33

MO

S-LQ

O sc

ore

(b) NID-based steganography

Figure 7 Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-basedsteganography

Table 2 Steganalysis results of different steganographic methods in 2385 kbitss mode

Training rate 04 05 06Method Markov MFCC SS-QCCN RS-QCCN Markov MFCC SS-QCCN RS-QCCN Markov MFCC SS-QCCN RS-QCCN100 bps

CNV 498 498 437 490 501 502 440 492 500 505 419 500NID 510 601 422 500 501 609 429 487 521 598 418 494Ours 500 500 440 494 503 493 403 494 491 486 418 433

200 bpsCNV NID 535 745 469 500 533 762 476 500 536 758 444 501Ours 510 483 452 500 498 487 422 500 505 486 450 500

300 bpsCNV NID 548 746 493 500 563 772 500 500 554 783 505 506Ours 524 497 479 500 528 609 482 500 538 501 466 500

utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G7231 and G729) Both steganalyticmethods use a support vector machine to predict the exis-tence of hidden message in given audios

In our experiments the sentences chosen from ldquoTIMITrdquodatabases as stated in Section 41 are first encoded using thestandard AMR-WB codec These AMR-WB recordings con-stitute the cover speech setThen secret message is embeddedinto each cover AMR-WB speech with different embeddingrates that is 100 bps 200 bps and 300 bps by CNV-basedNID-based andDN-based steganographyOf course 200 bpsand 300 bps may be omitted for CNV-based steganogra-phy because of its limited embedding capacity And sevenstegospeech sets are generated amongwhich one set is relatedto CNV-based steganographic method and each of three setsis associated with NID-based and DN-based steganographyrespectivelyMoreover only 2385 kbitsmode is usedwithoutloss of generality

In each experiment a pair of cover and stego speech setsis randomly divided into training and testing sets accordingto three kinds of training rates that is 04 05 and 06 For

example if the training rate is 04 the training set contains40 speech samples randomly chosen from each of the coverand stegospeech sets and the remaining 60 samples go intothe testing set As described in [17 18] LIBSVM [22] is usedas a classifier and radial basis function (RBF) kernel and grid-search technique are employed to obtain better classificationperformance For Li et alrsquos steganalytic method the principalcomponent analysis (PCA) is first used as suggested in [18]to reduce the dimension of feature vectors to 300 Let thesamples in cover speech set denote negatives and those instego speech set stand for positives Hence the accuracy maybe defined as follows

Accuracy = 12 times ( TPTP + FN

+ TNFP + TN

) (5)

where TP are true positives TN are true negatives FN arefalse negatives and FP are false positives

The steganalytic results are given in Table 2 It can beseen that when the embedding rate is 100 bps the accuracyof detecting both CNV-based and DN-based methods isalmost the same say 50 or so while that of detecting

Security and Communication Networks 9

0 11 12 13 14 15 21 22 23 24 25 31 32 33 34 35 41 42 43 44 45 51 52 53 54 55Edge

0123456789

10

Cor

relat

ion

inde

x

12

13

14

15

23

24

25

34

35

45

Figure 8 The correlation index of 1000 AMR-WB speeches wherethe interframe edge 119894119894 connects two vertices 119881119894[119896] and 119881119894[119896 + 1] intwo neighboring frames and the intraframe edge 1198941198951015840 connects twovertices 119881119894[119896] and 119881119895[119896] in the same frame

NID-based steganography increases to 60 when MFCC-based steganalytic method is applied Moreover there isan apparent increase in the accuracy of detecting NID-based hiding method with the embedding rate increases to200 bps or 300 bps when Liu et alrsquos methods (ie Markovand MFCC-based steganalytic methods) are applied But theaccuracy of steganalyzing our proposed method DN-basedsteganography stays at the same level of 50 Therefore theproposed method may defend against Liu et alrsquos statisticalsteganalysis [17] even with higher embedding rates

According to the definition of the correlation index given in [18], the correlation indices of 1000 AMR-WB speeches randomly selected from "TIMIT" are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely SS-QCCN and RS-QCCN, can be constructed as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography. The steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN does not exceed roughly 50% for any of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that merely the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while none of them are utilized in Li et al.'s steganalytic method except for the edge "33" in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing the edges "22", "33", and "23′") targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis, according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].

In order to visualize the detection performance, receiver operating characteristic (ROC) curves for steganalyzing CNV-based steganography at a 100 bps embedding rate and NID-based and DN-based steganography at 100 bps, 200 bps, and 300 bps embedding rates are provided in Figure 10 (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of embedding capacity). The figure shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of hidden messages embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.

Figure 9: Two AMR-WB strong correlation network models: (a) SS-QCCN; (b) RS-QCCN.

5. Conclusion

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may therefore be a good candidate cover medium for speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed. The experimental results demonstrate the effectiveness of the proposed method. The main contributions of this paper are as follows:

(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality, and better performance against statistical steganalysis than the NID-based method.

(2) Flexible embedding capacity may be easily achieved with different numbers of cluster-merging iterations. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.

Figure 10: ROC curves (true positive rate versus false positive rate) for steganalysis of CNV-based, NID-based, and our proposed (DN) steganography (50% training rate): (a) Markov (TIMIT, 100 bps); (b) MFCC (TIMIT, 100 bps); (c) Markov (TIMIT, 200 bps); (d) MFCC (TIMIT, 200 bps); (e) Markov (TIMIT, 300 bps); (f) MFCC (TIMIT, 300 bps).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.

References

[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.

[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.

[3] K. Gopalan, "Audio steganography using bit modification," in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.

[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, "Audio steganography by amplitude or phase modification," in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.

[5] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.

[6] L. Liu, M. Li, Q. Li, and Y. Liang, "Perceptually transparent information hiding in G.729 bitstream," in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.

[7] T. Xu and Z. Yang, "Simple and effective speech steganography in G.723.1 low-rate codes," in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.

[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, "MELPe coded speech hiding on enhanced full rate compressed domain," in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.

[9] A. Nishimura, "Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec," in Proceedings of the IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.

[10] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.

[11] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[12] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.

[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.

[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, "A new scheme for covert communication via 3G encoded speech," Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.

[15] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.

[16] J. Liu, H. Tian, J. Lu, and Y. Chen, "Neighbor-index-division steganography based on QIM method for G.723.1 speech streams," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.

[17] Q. Liu, A. H. Sung, and M. Qiao, "Derivative-based audio steganalysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.

[18] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.

[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.

[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.

[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.

[22] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.



Table 1: MOS-LQO scores of the standard codec, CNV-based, NID-based, and our proposed steganography in four different rate modes and three embedding rates. Values in parentheses are the changes relative to the standard codec.

| Embedding rate | Method | 12.65 kbit/s | 15.85 kbit/s | 19.85 kbit/s | 23.85 kbit/s |
|---|---|---|---|---|---|
| - | Standard | 2.929 | 3.073 | 3.199 | 3.269 |
| 100 bps | CNV | 2.871 (−2.0%) | 3.021 (−1.7%) | 3.153 (−1.4%) | 3.225 (−1.3%) |
| 100 bps | NID | 2.750 (−6.1%) | 2.895 (−5.8%) | 3.020 (−5.6%) | 3.091 (−5.4%) |
| 100 bps | Ours | 2.864 (−2.2%) | 3.010 (−2.0%) | 3.139 (−1.9%) | 3.216 (−1.6%) |
| 200 bps | CNV | - | - | - | - |
| 200 bps | NID | 2.601 (−11.2%) | 2.736 (−11.0%) | 2.875 (−10.7%) | 2.921 (−10.6%) |
| 200 bps | Ours | 2.807 (−4.2%) | 2.955 (−3.8%) | 3.084 (−3.6%) | 3.164 (−3.2%) |
| 300 bps | CNV | - | - | - | - |
| 300 bps | NID | 2.284 (−22.0%) | 2.386 (−22.3%) | 2.475 (−22.6%) | 2.533 (−22.5%) |
| 300 bps | Ours | 2.699 (−7.9%) | 2.841 (−7.5%) | 2.971 (−7.1%) | 3.046 (−6.8%) |

The MOS-LQO scores of the standard codec and of the CNV, NID, and DN methods, covering four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) and three embedding rates (100 bps, 200 bps, and 300 bps), are given in Table 1. For the embedding rates of 200 bps and 300 bps, only the MOS-LQO scores of the NID-based and DN-based steganographic methods are given, because the embedding capacity of CNV-based steganography may not be larger than 100 bps.

When the embedding rate is 100 bps, which is almost the limit of CNV steganography, we can see from Table 1 that the mean MOS-LQO scores of our proposed method are only about 0.3% worse than those of CNV-based steganography. Such a slight decrease may be almost imperceptible to the human auditory system (HAS). Compared to NID-based steganography, the mean MOS-LQO scores of our presented method are approximately 3.8% higher. When the embedding rates are 200 bps and 300 bps, the scores of our approach are better than those of NID-based steganography by about 7% and 15%, respectively.
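The percentages quoted above can be checked from Table 1 with a short calculation. The sketch below assumes that each figure is the relative change of a method's MOS-LQO score against the standard codec, averaged over the four rate modes; the helper names are illustrative and not part of the paper.

```python
# MOS-LQO scores from Table 1 for the 12.65, 15.85, 19.85, and 23.85 kbit/s modes.
STANDARD = [2.929, 3.073, 3.199, 3.269]
SCORES_100BPS = {
    "CNV": [2.871, 3.021, 3.153, 3.225],
    "NID": [2.750, 2.895, 3.020, 3.091],
    "Ours": [2.864, 3.010, 3.139, 3.216],
}


def relative_change(score: float, reference: float) -> float:
    """Change of a score relative to the reference, in percent."""
    return 100.0 * (score - reference) / reference


def mean_relative_change(scores, reference=STANDARD) -> float:
    """Average the per-mode relative changes (assumed averaging scheme)."""
    changes = [relative_change(s, r) for s, r in zip(scores, reference)]
    return sum(changes) / len(changes)


for method, scores in SCORES_100BPS.items():
    print(f"{method}: {mean_relative_change(scores):+.1f}%")
```

Under this assumption, the gap between the proposed method and CNV comes out at roughly 0.3 percentage points and the gap to NID at roughly 3.8, in line with the text.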

Furthermore, we can also see that the experimental results of the four rate modes are analogous. The decrease in speech quality caused by NID-based steganography is more than twice that caused by DN-based steganography. The proposed method can obtain twice the embedding capacity of CNV-based steganography by sacrificing less than 2% of speech quality on average. In addition, the decline in speech quality of the proposed DN-based method at a 300 bps embedding rate is still smaller than that of the NID-based method at 200 bps.

4.3. Flexible Embedding Capacity. Compared to CNV-based steganography, our proposed method offers flexible embedding capacity to satisfy different practical demands. The steganographic capacity can be adjusted by changing the iteration parameter $N_i$. For different values of $N_i$, for example, $N_i = 32, 33, \ldots, 54$, the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.

From Figure 7, we can observe that the embedding rate increases significantly with $N_i$, while the MOS-LQO score goes down only slightly. For NID-based steganography, however, the MOS-LQO score declines rapidly as the embedding rate increases. Therefore, the proposed DN-based steganography can achieve higher embedding capacity with only a slight decrease in speech quality. For example, when $N_i = 48$, the size of each cluster in $S$ is equal to 4 and we can embed 4 bits per frame, that is, the embedding rate is 200 bps, whereas the CNV algorithm can embed at most 2 bits per frame (100 bps).
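The arithmetic behind this example can be sketched as follows. The sketch assumes that a cluster of size $c$ lets one codeword choice carry $\lfloor \log_2 c \rfloor$ bits, that two codebooks per frame are used for embedding, and that an AMR-WB frame lasts 20 ms (50 frames per second); the function name and parameters are hypothetical.

```python
FRAME_DURATION_MS = 20                          # AMR-WB frame length
FRAMES_PER_SECOND = 1000 // FRAME_DURATION_MS   # 50 frames per second


def embedding_rate_bps(cluster_size: int, codebooks_used: int = 2) -> int:
    """Embedding rate when every cluster holds `cluster_size` codewords.

    Assumption: each embedding codebook carries floor(log2(cluster_size))
    bits per frame, and `codebooks_used` codebooks are modified per frame.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be positive")
    bits_per_codebook = cluster_size.bit_length() - 1  # exact floor(log2)
    bits_per_frame = bits_per_codebook * codebooks_used
    return bits_per_frame * FRAMES_PER_SECOND


print(embedding_rate_bps(4))  # cluster size 4: 4 bits per frame
print(embedding_rate_bps(2))  # cluster size 2: 2 bits per frame
```

With these assumptions, a cluster size of 4 gives 4 bits per frame and 200 bps, and a cluster size of 2 gives the 2 bits per frame and 100 bps quoted for CNV.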

4.4. Resistibility of Statistical Steganalysis. Speech steganography aims to hide secret messages in cover speech without arousing suspicion. It is very important for a steganographic method to resist statistical steganalysis, which is the technique of detecting the presence of hidden messages. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18], the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of hidden messages in given audio.

Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography: (a) our proposed steganography (horizontal axis: times of cluster merging; vertical axes: embedding rate (bps) and MOS-LQO score); (b) NID-based steganography (horizontal axis: number of sub-codebooks; vertical axes: embedding rate (bps) and MOS-LQO score).

Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in 23.85 kbit/s mode. Column headers give the steganalytic method with the training rate in parentheses.

| Embedding rate | Method | Markov (0.4) | MFCC (0.4) | SS-QCCN (0.4) | RS-QCCN (0.4) | Markov (0.5) | MFCC (0.5) | SS-QCCN (0.5) | RS-QCCN (0.5) | Markov (0.6) | MFCC (0.6) | SS-QCCN (0.6) | RS-QCCN (0.6) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 bps | CNV | 49.8 | 49.8 | 43.7 | 49.0 | 50.1 | 50.2 | 44.0 | 49.2 | 50.0 | 50.5 | 41.9 | 50.0 |
| 100 bps | NID | 51.0 | 60.1 | 42.2 | 50.0 | 50.1 | 60.9 | 42.9 | 48.7 | 52.1 | 59.8 | 41.8 | 49.4 |
| 100 bps | Ours | 50.0 | 50.0 | 44.0 | 49.4 | 50.3 | 49.3 | 40.3 | 49.4 | 49.1 | 48.6 | 41.8 | 43.3 |
| 200 bps | CNV | - | - | - | - | - | - | - | - | - | - | - | - |
| 200 bps | NID | 53.5 | 74.5 | 46.9 | 50.0 | 53.3 | 76.2 | 47.6 | 50.0 | 53.6 | 75.8 | 44.4 | 50.1 |
| 200 bps | Ours | 51.0 | 48.3 | 45.2 | 50.0 | 49.8 | 48.7 | 42.2 | 50.0 | 50.5 | 48.6 | 45.0 | 50.0 |
| 300 bps | CNV | - | - | - | - | - | - | - | - | - | - | - | - |
| 300 bps | NID | 54.8 | 74.6 | 49.3 | 50.0 | 56.3 | 77.2 | 50.0 | 50.0 | 55.4 | 78.3 | 50.5 | 50.6 |
| 300 bps | Ours | 52.4 | 49.7 | 47.9 | 50.0 | 52.8 | 60.9 | 48.2 | 50.0 | 53.8 | 50.1 | 46.6 | 50.0 |

In our experiments, the sentences chosen from the "TIMIT" database, as stated in Section 4.1, are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. Then secret messages are embedded into each cover AMR-WB speech at different embedding rates, that is, 100 bps, 200 bps, and 300 bps, by CNV-based, NID-based, and DN-based steganography. The rates of 200 bps and 300 bps are omitted for CNV-based steganography because of its limited embedding capacity. Seven stego speech sets are thus generated, of which one set is related to the CNV-based steganographic method and three sets each are associated with NID-based and DN-based steganography. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.

In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, namely 0.4, 0.5, and 0.6.

Page 8: A Novel AMR-WB Speech Steganography Based on Diameter ...In this section, a technical overview of AMR-WB codec is rstly presented. en two related codebook partition algorithms CNV

8 Security and Communication Networks

Embe

ddin

g ra

te (b

ps)

Embedding rateMOS-LQO

330

290

250

210

170

130

90

Times of cluster merging

33

31

29

27

25

23

MO

S-LQ

O sc

ore

565350474441383532

(a) Our proposed steganography

Number of sub-codebooks

Embedding rateMOS-LQO

109876543290

130

170

210

250

290

330

Embe

ddin

g ra

te (b

ps)

23

25

27

29

31

33

MO

S-LQ

O sc

ore

(b) NID-based steganography

Figure 7 Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-basedsteganography

Table 2 Steganalysis results of different steganographic methods in 2385 kbitss mode

Training rate 04 05 06Method Markov MFCC SS-QCCN RS-QCCN Markov MFCC SS-QCCN RS-QCCN Markov MFCC SS-QCCN RS-QCCN100 bps

CNV 498 498 437 490 501 502 440 492 500 505 419 500NID 510 601 422 500 501 609 429 487 521 598 418 494Ours 500 500 440 494 503 493 403 494 491 486 418 433

200 bpsCNV NID 535 745 469 500 533 762 476 500 536 758 444 501Ours 510 483 452 500 498 487 422 500 505 486 450 500

300 bpsCNV NID 548 746 493 500 563 772 500 500 554 783 505 506Ours 524 497 479 500 528 609 482 500 538 501 466 500

utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G7231 and G729) Both steganalyticmethods use a support vector machine to predict the exis-tence of hidden message in given audios

In our experiments the sentences chosen from ldquoTIMITrdquodatabases as stated in Section 41 are first encoded using thestandard AMR-WB codec These AMR-WB recordings con-stitute the cover speech setThen secret message is embeddedinto each cover AMR-WB speech with different embeddingrates that is 100 bps 200 bps and 300 bps by CNV-basedNID-based andDN-based steganographyOf course 200 bpsand 300 bps may be omitted for CNV-based steganogra-phy because of its limited embedding capacity And sevenstegospeech sets are generated amongwhich one set is relatedto CNV-based steganographic method and each of three setsis associated with NID-based and DN-based steganographyrespectivelyMoreover only 2385 kbitsmode is usedwithoutloss of generality

In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, that is, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stego speech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and a radial basis function (RBF) kernel and a grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set denote negatives and those in the stego speech set stand for positives. Hence, the accuracy may be defined as follows:

\[
\text{Accuracy} = \frac{1}{2} \times \left( \frac{TP}{TP + FN} + \frac{TN}{FP + TN} \right), \tag{5}
\]

where TP, TN, FN, and FP denote the numbers of true positives, true negatives, false negatives, and false positives, respectively.
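The splitting rule and the accuracy in (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `split_by_training_rate` and `balanced_accuracy` are hypothetical, labels are encoded as 0 for cover (negative) and 1 for stego (positive), and the classifier itself (LIBSVM with an RBF kernel in the paper) is left out.

```python
import random


def split_by_training_rate(cover, stego, rate, seed=0):
    """Take a fraction `rate` of each set for training and the rest for testing."""
    rng = random.Random(seed)
    train, test = [], []
    # Label 0 marks cover (negative) samples, label 1 marks stego (positive) samples.
    for label, samples in ((0, cover), (1, stego)):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(rate * len(shuffled))
        train += [(sample, label) for sample in shuffled[:cut]]
        test += [(sample, label) for sample in shuffled[cut:]]
    return train, test


def balanced_accuracy(y_true, y_pred):
    """Accuracy as in (5): the mean of the true positive and true negative rates.

    Assumes that both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return 0.5 * (tp / (tp + fn) + tn / (fp + tn))
```

With this definition, a detector that labels every sample as stego scores exactly 0.5, which is why accuracies near 50% in Table 2 correspond to chance-level detection.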

The steganalytic results are given in Table 2. It can be seen that when the embedding rate is 100 bps, the accuracy of detecting both CNV-based and DN-based methods is almost the same, about 50%, while that of detecting NID-based steganography increases to about 60% when the MFCC-based steganalytic method is applied. Moreover, there is an apparent increase in the accuracy of detecting the NID-based hiding method when the embedding rate increases to 200 bps or 300 bps and Liu et al.'s methods (i.e., Markov-based and MFCC-based steganalytic methods) are applied. In contrast, the accuracy of steganalyzing our proposed DN-based steganography stays at about the same level of 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even at higher embedding rates.

Figure 8: The correlation index of 1000 AMR-WB speeches, where the interframe edge "$ii$" connects two vertices $V_i[k]$ and $V_i[k+1]$ in two neighboring frames and the intraframe edge "$ij'$" connects two vertices $V_i[k]$ and $V_j[k]$ in the same frame (x-axis: edge; y-axis: correlation index).

According to the definition of the correlation index given in [18], the correlation indices of 1000 AMR-WB speeches randomly selected from "TIMIT" are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely SS-QCCN and RS-QCCN, can be constructed as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography. The steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN is no higher than about 50% for all of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that merely the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while neither of them is utilized in Li et al.'s steganalytic method except for the edge "33" in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing the edges "22", "33", and "23′") targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].
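The argument above rests on which model edges touch the vertices that embedding can modify. The following Python sketch makes that check explicit. The edge lists for SS-QCCN and RS-QCCN are an assumption read off the labels that survive from Figure 9, not a verified transcription; only the statements that edge "33" is the sole affected edge in RS-QCCN and that the adapted model uses "22", "33", and "23′" come from the text. The helper name `edges_touching` is hypothetical.

```python
# An edge (i, j, "inter") links V_i[k] to V_j[k+1] across neighboring frames;
# an edge (i, j, "intra") links V_i[k] to V_j[k] within the same frame.
# Assumption: SS_QCCN and RS_QCCN are read from the residue of Figure 9.
SS_QCCN = [(1, 1, "inter"), (4, 5, "intra")]
RS_QCCN = [
    (1, 1, "inter"), (3, 3, "inter"), (4, 4, "inter"),
    (1, 4, "intra"), (1, 5, "intra"), (4, 5, "intra"),
]
# From the text: the adapted model uses edges "22", "33", and "23'".
ADAPTED_QCCN = [(2, 2, "inter"), (3, 3, "inter"), (2, 3, "intra")]

# From the text: only V_2[k] and V_3[k] may change during embedding.
MODIFIED_VERTICES = {2, 3}


def edges_touching(model, modified=MODIFIED_VERTICES):
    """Return the edges of a model that involve at least one modified vertex."""
    return [edge for edge in model if edge[0] in modified or edge[1] in modified]


for name, model in (("SS", SS_QCCN), ("RS", RS_QCCN), ("adapted", ADAPTED_QCCN)):
    print(name, edges_touching(model))
```

Under these assumed edge lists, SS-QCCN observes no edge that embedding can disturb and RS-QCCN observes only "33", which is consistent with the explanation given in the paragraph above.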

Figure 9: Two AMR-WB strong correlation network models: (a) SS-QCCN; (b) RS-QCCN. Each panel draws the vertices $V_1[k]$ to $V_5[k]$ and $V_1[k+1]$ to $V_5[k+1]$ of two neighboring frames, joined by the labeled edges.

In order to visualize the detection performance, receiver operating characteristic (ROC) curves for steganalyzing CNV-based steganography at a 100 bps embedding rate and NID-based and DN-based steganography at 100 bps, 200 bps, and 300 bps embedding rates are provided in Figure 10. (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of the embedding capacity.) Figure 10 shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of hidden messages embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.
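For readers who want to reproduce curves of this kind, the sketch below shows one common way to obtain ROC points from classifier decision scores by sweeping a threshold. It is an illustration only: the function names are hypothetical, the scores are assumed to be higher for samples judged more likely to be stego, and the area-under-curve helper is an added summary that the paper does not report.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points for a score sweep.

    `labels` uses 1 for stego (positive) and 0 for cover (negative); both
    classes are assumed to be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Samples sharing a score cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve (not a quantity reported in the paper)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

A detector that cannot separate cover from stego speech yields a curve close to the diagonal, with an area near 0.5, which is the visual counterpart of the roughly 50% accuracies in Table 2.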

5. Conclusion

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE, so AMR-WB speech may be a good candidate cover medium for speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed, and the experimental results demonstrate its effectiveness. The main contributions of this paper are as follows:

(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality and better performance against statistical steganalysis than the NID-based method.

(2) Flexible embedding capacity may be easily achieved with different numbers of iterations of cluster merging. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.

Figure 10: ROC curves for steganalysis of CNV-based, NID-based, and our proposed (DN-based) steganography (50% training rate). Panels: (a) Markov (TIMIT, 100 bps); (b) MFCC (TIMIT, 100 bps); (c) Markov (TIMIT, 200 bps); (d) MFCC (TIMIT, 200 bps); (e) Markov (TIMIT, 300 bps); (f) MFCC (TIMIT, 300 bps). Each panel plots the true positive rate against the false positive rate; panels (a) and (b) show the DN, CNV, and NID curves, and panels (c) to (f) show the NID and DN curves.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.

References

[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.

[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.

[3] K. Gopalan, "Audio steganography using bit modification," in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.

[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, "Audio steganography by amplitude or phase modification," in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.

[5] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.

[6] L. Liu, M. Li, Q. Li, and Y. Liang, "Perceptually transparent information hiding in G.729 bitstream," in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.

[7] T. Xu and Z. Yang, "Simple and effective speech steganography in G.723.1 low-rate codes," in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.

[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, "MELPe coded speech hiding on enhanced full rate compressed domain," in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.

[9] A. Nishimura, "Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec," in Proceedings of the IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.

[10] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.

[11] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

[12] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.

[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.

[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, "A new scheme for covert communication via 3G encoded speech," Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.

[15] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.

[16] J. Liu, H. Tian, J. Lu, and Y. Chen, "Neighbor-index-division steganography based on QIM method for G.723.1 speech streams," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.

[17] Q. Liu, A. H. Sung, and M. Qiao, "Derivative-based audio steganalysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.

[18] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.

[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.

[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.

[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.

[22] C. Chang and C. Lin, "LIBSVM: a Library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.




Page 11: A Novel AMR-WB Speech Steganography Based on Diameter ...In this section, a technical overview of AMR-WB codec is rstly presented. en two related codebook partition algorithms CNV

Security and Communication Networks 11

International Conference on Wireless Communications and Sig-nal Processing WCSP 2009 China November 2009

[8] A Shahbazi A H Rezaie and R Shahbazi ldquoMELPe codedspeech hiding on enhanced full rate compressed domainrdquo inProceedings of the Asia Modelling Symposium 2010 4th Inter-national Conference on Mathematical Modelling and ComputerSimulation AMS2010 pp 267ndash270 Malaysia May 2010

[9] A Nishimura ldquoData hiding in pitch delay data of the adaptivemulti-rate narrow-band speech codecrdquo in Proceedings of theIIH-MSP 2009-2009 5th International Conference on IntelligentInformation Hiding and Multimedia Signal Processing pp 483ndash486 Japan September 2009

[10] B Xiao Y Huang and S Tang ldquoAn approach to informationhiding in low bit-rate speech streamrdquo in Proceedings of the2008 IEEE Global Telecommunications Conference GLOBE-COM 2008 pp 1940ndash1944 USA December 2008

[11] B Chen and G W Wornell ldquoQuantization index modulationa class of provably good methods for digital watermarking andinformation embeddingrdquo Institute of Electrical and ElectronicsEngineers Transactions on InformationTheory vol 47 no 4 pp1423ndash1443 2001

[12] Y F Huang S Tang and J Yuan ldquoSteganography in inactiveframes of VoIP streams encoded by source codecrdquo IEEETransactions on Information Forensics and Security vol 6 no2 pp 296ndash306 2011

[13] YHuang C Liu S Tang and S Bai ldquoSteganography integrationinto a low-bit rate speech codecrdquo IEEE Transactions on Informa-tion Forensics and Security vol 7 no 6 pp 1865ndash1875 2012

[14] H Miao L Huang Z Chen W Yang and A Al-Hawbani ldquoAnew scheme for covert communication via 3G encoded speechrdquoComputers and Electrical Engineering vol 38 no 6 pp 1490ndash1501 2012

[15] H Tian J Liu and S Li ldquoImproving security of quantization-index-modulation steganography in low bit-rate speechstreamsrdquoMultimedia Systems vol 20 no 2 pp 143ndash154 2014

[16] J Liu H Tian J Lu and Y Chen ldquoNeighbor-index-divisionsteganography based on QIM method for G7231 speechstreamsrdquo Journal of Ambient Intelligence and Humanized Com-puting vol 7 no 1 pp 139ndash147 2016

[17] Q Liu A H Sung and M Qiao ldquoDerivative-based audiosteganalysisrdquo ACM Transactions on Multimedia ComputingCommunications andApplications (TOMM) vol 7 no 3 articleno 18 2011

[18] S Li Y Jia and C-C J Kuo ldquoSteganalysis of QIM Steganogra-phy in Low-Bit-Rate Speech Signalsrdquo IEEEACM TransactionsonAudio Speech and Language Processing vol 25 no 5 pp 1011ndash1022 2017


