Research Article
A Novel AMR-WB Speech Steganography Based on Diameter-Neighbor Codebook Partition
Junhui He,1 Junxi Chen,1 Shichang Xiao,1 Xiaoyu Huang,2 and Shaohua Tang1
1School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
2School of Economics and Commerce, South China University of Technology, Guangzhou 510006, China
Correspondence should be addressed to Junhui He; hejh@scut.edu.cn
Received 28 September 2017; Accepted 26 December 2017; Published 13 February 2018
Academic Editor: Remi Cogranne
Copyright © 2018 Junhui He et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Steganography is a means of covert communication without revealing the occurrence and the real purpose of communication. The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. In this paper, a novel AMR-WB speech steganography is proposed based on a diameter-neighbor codebook partition algorithm. Different embedding capacities may be achieved by adjusting the iterative parameter during codebook division. The experimental results show that the presented AMR-WB steganography may provide higher and more flexible embedding capacity than the state-of-the-art methods without inducing perceptible distortion. With 48 iterations of cluster merging, twice the embedding capacity of the complementary-neighbor-vertices-based embedding method may be obtained with a decrease of only around 2% in speech quality and much the same undetectability. Moreover, both the quality of stego speech and the security regarding statistical steganalysis are better than those of the recent speech steganography based on neighbor-index-division codebook partition.
1. Introduction
With the rapid development of the Internet and the growing popularity of instant messaging applications, people are increasingly using audio-based communication. How to avoid interception and secure communication has become one of the most important research problems. Encryption is a conventional method of protecting communication; however, the transmission of ciphered content may easily arouse attackers' suspicion. In recent years, steganography has been presented as an effective means of covert communication. Audio steganography can transfer important messages secretly by embedding them into cover audio files with the use of information hiding techniques [1]. Data hiding in audio is especially challenging because the human auditory system operates over a wider dynamic range than the human visual system.
Many works on audio steganography have already been reported. Gruhl et al. [2] proposed an audio steganographic method of echo hiding by the introduction of synthetic resonances in the form of closely spaced echoes. Gopalan [3] presented a method of embedding a covert audio message into a cover utterance by altering one bit in each of the cover utterance samples. Gopalan et al. [4] provided two methods of secret message embedding by modifying the phase or amplitude of perceptually masked or significant regions of a host. A direct-sequence spread-spectrum watermarking method with strong robustness against common audio editing procedures was proposed in [5]. Many audio steganographic applications, including Steghide and Hide4PGP, can be freely downloaded from the Internet. However, most of these methods are not resilient to AMR-WB speech coding.
Based on segmental SNR analysis of modifications to the encoded bits in a frame, Liu et al. [6] selected the perceptually least important bits to embed the secret message in G.729 speech. In [7], a simple and effective steganographic approach applicable to 5.3 kbps G.723.1 speech was presented based on analyzing the redundancy of code parameters, and an augmented identity matrix was utilized to lower the distortion of the cover speech. Similarly, by calculating the speech quality sensitivity of each of the 244 encoded bits using the perceptual evaluation of speech quality (PESQ) criterion, a data hiding approach to embedding data in the enhanced full rate (EFR) compressed speech bitstream is proposed in [8]. In addition, Nishimura [9] proposed three methods of hiding data in the pitch delay data of AMR speech.

Hindawi, Security and Communication Networks, Volume 2018, Article ID 7080673, 11 pages, https://doi.org/10.1155/2018/7080673
Based on the complementary neighbor vertices (CNV) codebook partition algorithm, Xiao et al. [10] presented an approach to information hiding in compressed speech with the use of quantization index modulation (QIM) [11]. Huang et al. [12] proposed a steganographic algorithm for embedding data in different speech encoding parameters of the inactive frames, the embedding capacity of which is bounded by the number of inactive frames in the cover speech. In [13], Huang et al. also presented a method for steganography in low bit-rate VoIP streams based on pitch period prediction. It can achieve high quality of stego speech and prevent statistical steganalysis, but the embedding rate is still low (only about 133.3 bps). An adaptive suboptimal pulse combination constrained (ASOPCC) method was presented in [14] to embed data into the compressed speech signal of the AMR-WB codec. However, most of the PESQ scores in different coding modes are not high. In [15], a key-based codebook partition strategy, which dynamically determines the adopted division scheme, was designed to improve the security of QIM steganography in speech bitstreams. Although the stego speech quality is guaranteed to be good, the embedding capacity is very limited and not adjustable. Liu et al. [16] proposed a neighbor-index-division (NID) codebook division algorithm for G.723.1 speech. Differing from the existing CNV method, NID divides neighbor-indexed codewords into separate subcodebooks according to a suitable stego coding strategy. The embedding capacity is improved by using multiple division and a multi-ary coding strategy.
The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may therefore be a good candidate cover medium for audio steganography, and we focus on AMR-WB speech steganography in this paper. Firstly, a new diameter-neighbor (DN) codebook partition algorithm for AMR-WB speech is proposed. Based on DN codebook division, we develop a novel AMR-WB speech steganography capable of providing flexible embedding capacity with different values of the iterative parameter $N_i$. For example, when $N_i = 48$, twice the embedding capacity of the CNV-based method may be obtained with a decrease of only about 2% in speech quality and much the same undetectability. Moreover, both the quality of stego speech and the security against statistical steganalysis [17, 18] are better than those of the recent NID-based speech steganography.
The remainder of this paper is organized as follows. In Section 2, the related work is briefly introduced. In Section 3, the proposed DN codebook partition algorithm and the novel AMR-WB speech steganography are described in detail. The experimental results and analysis are provided in Section 4. Finally, conclusions are presented.
2. Related Work
In this section, a technical overview of the AMR-WB codec is first presented. Then two related codebook partition algorithms, CNV [10] and NID [16], are briefly reviewed.
2.1. AMR-WB Codec. The AMR-WB speech codec was standardized by 3GPP (3rd Generation Partnership Project) and adopted as the standard G.722.2 by ITU-T in 2002 [19]. It is a multirate wideband speech codec applied in modern mobile communication systems to remarkably improve the speech quality. The AMR-WB codec operates at a multitude of bit rates ranging from 6.6 kbit/s to 23.85 kbit/s.
The input audio signal, sampled at 16 kHz, is divided into 20 ms frames. Linear prediction (LP) analysis is performed on every frame, and the LP coefficients are converted to immittance spectrum pair (ISP) coefficients. The ISP coefficients are then converted to the frequency domain (ISF) for quantization. Except for mode 0 (6.6 kbit/s), the ISF coefficients are quantized using two-stage vector quantization with split-by-2 in the first stage and split-by-5 in the second stage. Both the second and the third codebooks in the second stage have 128 codewords, and the ISF indices of the codewords in these codebooks may be employed to embed the secret message.
In the decoder, the transmitted indices are first parsed from the received bitstream and then decoded to obtain the code parameters for each transmitted frame, such as the ISP vector, the 4 fractional pitch lags, the 4 LTP filtering parameters, the 4 innovative code vectors, and the 4 sets of vector quantized pitch and innovative gains. For a more detailed description, see [19]. From the received ISF indices, which may have been modified by secret message embedding, the receiver can recover the embedded secret message.
2.2. Complementary Neighbor Vertices. CNV is a codebook partition algorithm proposed in [10], in which each codeword in a codebook is viewed as a vertex in a multidimensional space. The relationship between two codewords $X$ and $Y$ is described as an edge connecting the two codewords' vertices, and the weight of the edge is defined as the Euclidean distance $D(X, Y)$ between $X$ and $Y$. A small value of $D(X, Y)$ indicates that $X$ and $Y$ closely resemble each other. The vertex nearest to $X$ is referred to as $X$'s neighbor vertex, denoted by $N(X)$. The vertex set $V$ together with the edge set $E$ forms a graph $G(V, E)$ in the multidimensional space.
The codebook partition is realized by the construction of the graph $G(V, E)$ and vertex labelling. First, each vertex $X$ in $G(V, E)$ is connected with its neighbor vertex $N(X)$ by an edge. The graph $G(V, E)$ is thus divided into several isolated subgraphs, each of which can be proved to be acyclic and 2-colorable. Second, every vertex and its neighbor vertex in a subgraph are labelled oppositely using "0" or "1". Third, all of the vertices with the same label are collected into a subcodebook; hence two subcodebooks are obtained.
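The three partition steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation of [10]: the function name `cnv_labels` and the toy one-dimensional codebook are hypothetical, ties between equally distant codewords are broken by the lowest index, and the first vertex visited in each subgraph arbitrarily receives label 0.

```python
import math
from collections import deque

def cnv_labels(codebook):
    """Return (nearest, labels) for a list of codeword tuples (hypothetical helper)."""
    n = len(codebook)
    # Step 1: connect every vertex X to its nearest neighbor N(X).
    nearest = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest.append(min(others, key=lambda j: math.dist(codebook[i], codebook[j])))
    adjacent = [set() for _ in range(n)]
    for i, j in enumerate(nearest):
        adjacent[i].add(j)
        adjacent[j].add(i)
    # Step 2: walk each isolated subgraph and give neighbors opposite labels.
    labels = [None] * n
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = 0                 # arbitrary choice for the first vertex
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacent[u]:
                if labels[v] is None:
                    labels[v] = 1 - labels[u]
                    queue.append(v)
    return nearest, labels

# Toy codebook (illustrative values only, not AMR-WB codewords).
codebook = [(0.0,), (1.0,), (5.0,), (6.5,)]
nearest, labels = cnv_labels(codebook)
# Step 3: collect the codeword indices of the two subcodebooks.
subcodebooks = [[i for i, b in enumerate(labels) if b == 0],
                [i for i, b in enumerate(labels) if b == 1]]
```

In this sketch every codeword ends up with a label opposite to that of its nearest neighbor, which is the property the embedding step relies on.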
Based on the generated subgraphs and the label assigned to each codeword in them, CNV-based steganography applies the QIM concept to embed the secret message. More specifically, when the label of the codeword $X$ associated with the cover quantization index $I_a$ agrees with the secret message bit, $I_a$ remains unchanged; otherwise it is replaced with the quantization index of the neighbor codeword $N(X)$, which belongs to the opposite subcodebook.

[Figure 1: Diagram of the proposed method. Block labels: AMR-WB speech, index parse, ISF indices, codebooks, partition, cluster set, secret message, embed, index update, stego ISF indices, stego AMR-WB speech, public channel, extract, AMR-WB decoder, decoded speech.]
The key characteristic of CNV-based steganography is that the distortion is bounded even in the worst case. However, the embedding capacity is limited, as analyzed experimentally in Section 4. Moreover, the number of possible combinations of flipping coefficients, which determine whether the labels in a subgraph are flipped, is large. Extra information about the flipping process must be transmitted to the receiver, and thus the effective embedding capacity may be decreased further.
2.3. Neighbor Index Division. NID assumes that codewords with neighboring indices (i.e., neighboring positions) in a codebook are close together. Hence, the codewords in a codebook can be easily separated into subcodebooks according to their indices instead of the Euclidean distance. Specifically, an appropriate integer $k$ is selected according to the demand for embedding capacity, and the $i$th codeword is labelled with the digit $(i - 1) \bmod k$. Then all the codewords with the same label are collected into a subcodebook, which yields $k$ different subcodebooks.
In order to make full use of the embedding capacity, the binary secret message should be transformed into $k$-ary digits denoted by $m$ ($m \in \{0, 1, \ldots, k - 1\}$). When the codeword related to the cover quantization index belongs to a subcodebook whose label differs from the $k$-ary digit $m$ to be embedded, this index should be substituted with that of the closest codeword in the corresponding subcodebook $m$.
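The NID labelling and substitution rule can be illustrated as follows. This is a hedged sketch rather than the code of [16]: the helper names `nid_label` and `nid_embed` and the toy codebook are hypothetical, and indices are 0-based here, so the label $(i - 1) \bmod k$ of the $i$th codeword becomes `position % k`.

```python
import math

def nid_label(position, k):
    """Label of the codeword at a 0-based position, i.e. (i - 1) mod k for 1-based i."""
    return position % k

def nid_embed(codebook, index, m, k):
    """Return the index that carries the k-ary digit m (hypothetical helper)."""
    if nid_label(index, k) == m:
        return index                       # label already matches: keep the cover index
    # Otherwise pick the closest codeword inside subcodebook m.
    candidates = [j for j in range(len(codebook)) if nid_label(j, k) == m]
    return min(candidates, key=lambda j: math.dist(codebook[index], codebook[j]))

# Toy codebook (illustrative values only) split into k = 3 subcodebooks.
codebook = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
stego_index = nid_embed(codebook, index=4, m=0, k=3)
extracted = nid_label(stego_index, k=3)    # the receiver only needs the label
```

Here the cover index 4 carries label 1, so embedding the digit 0 moves it to index 3, the nearest codeword labelled 0; the receiver recovers the digit from the label alone.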
NID-based steganography is an information hiding method based on neighbor-index codebook partition, whose embedding capacity may be controlled by the number of subcodebooks $k$. However, as illustrated in [16], only about 34 of the pairs of neighbor-index codewords happen to be pairs of neighbor-vertex codewords, and the mean distance between neighbor-index codewords is apparently larger than that between neighbor-vertex codewords. Therefore, the distortion induced by NID-based steganography may be relatively large, which is confirmed by the experimental results provided in Section 4.
3. Proposed Method
The diagram of the proposed method is shown in Figure 1. Based on DN partition of the codebooks described in Section 2.1, a secret message can be embedded into an AMR-WB speech file. After the stego AMR-WB speech file is received, the embedded secret message can be extracted without errors. At the same time, the decoded speech without perceptible distortion is also obtained. In the following, the diameter-neighbor (DN) codebook partition algorithm is first introduced. Then the embedding and extraction procedures of our proposed method are described.
3.1. Codebook Partition. A codebook may be viewed as a list of isolated code vectors (i.e., codewords) in a multidimensional space. The purpose of a codebook partition algorithm for audio steganography is to divide the codebook into several clusters, within each of which the codewords can replace one another without causing perceptible distortion.
Let $B$ denote the original codebook with $N_b$ codewords and $C$ denote a cluster with $N_c$ codewords $W_t$ ($t = 1, 2, \ldots, N_c$). The centroid $G$ of a cluster $C$ is defined as follows:

$$G(i) = \frac{1}{N_c} \sum_{t=1}^{N_c} W_t(i), \tag{1}$$

where $G(i)$ and $W_t(i)$ are the $i$th components of $G$ and $W_t$, respectively.
The centroid $G$ (average code vector) is used to represent the corresponding cluster $C$; hence the cluster $C$ may also be considered as a vector in the multidimensional codebook space. In order to describe the similarity between two clusters $C_1$ and $C_2$, the Euclidean distance between them is defined as follows:

$$D(C_1, C_2) = \sqrt{\sum_{i=1}^{n} \left(G_1(i) - G_2(i)\right)^2}, \tag{2}$$

where $G_1$ and $G_2$ are the centroids of the two clusters $C_1$ and $C_2$, $n$ is the dimension of a codeword, and $G_1(i)$ and $G_2(i)$ are the $i$th components of $G_1$ and $G_2$, respectively.
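Equations (1) and (2) translate directly into code. The following minimal sketch assumes that a cluster is represented as a list of equal-length codeword tuples; the function names and the toy two-dimensional codewords are illustrative only.

```python
import math

def centroid(cluster):
    """Component-wise mean of the codewords in a cluster, as in Eq. (1)."""
    n_c = len(cluster)
    dim = len(cluster[0])
    return [sum(w[i] for w in cluster) / n_c for i in range(dim)]

def cluster_distance(c1, c2):
    """Euclidean distance between the centroids of two clusters, as in Eq. (2)."""
    g1, g2 = centroid(c1), centroid(c2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))

# Toy 2-D codewords (illustrative values, not taken from the AMR-WB codebooks).
c1 = [(0.0, 0.0), (2.0, 0.0)]
c2 = [(1.0, 3.0), (1.0, 5.0)]
g1 = centroid(c1)
d12 = cluster_distance(c1, c2)
```

A cluster that holds a single codeword has that codeword as its centroid, so the same distance function also covers the initial state in which every codeword is its own cluster.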
Let $S$ denote a cluster set. The diameter of $S$ is defined as the maximal Euclidean distance $D_m$ over all cluster pairs in $S$, that is,

$$D(C_p, C_q) \le D_m, \quad \forall p, q = 1, 2, \ldots, |S|, \tag{3}$$

where $|S|$ is the number of clusters within the cluster set $S$. The cluster pair with the maximal Euclidean distance $D_m$, called the diameter cluster pair, is denoted by $(C_{d1}, C_{d2})$. The neighbor of a cluster $C$ in $S$ is represented by $N(C, S)$; then we have

$$D(C, N(C, S)) \le D(C, C_p), \quad \forall p = 1, 2, \ldots, |S|. \tag{4}$$

[Figure 2: Diagram of our proposed codebook partition. Flowchart: initialize the cluster set $S$ by taking each codeword of codebook $B$ as an independent cluster, together with an empty cluster set $S'$; while $N_i > 0$, refill $S$ from $S'$ if $S$ is empty, search for the diameter cluster pair $(C_{d1}, C_{d2})$ in $S$, merge $C_{d1}$ and $C_{d2}$ with their respective neighbors into two new clusters Temp1 and Temp2, remove $C_{d1}$, $C_{d2}$, and their neighbors from $S$, put Temp1 and Temp2 into $S'$, and set $N_i = N_i - 1$; finally, put the remaining clusters into the output cluster set $S$.]
Figure 2 illustrates the proposed DN codebook partition algorithm, and its detailed procedure is given in Algorithm 1. The original codebook $B$ is divided into $|S|$ clusters by iteratively merging the clusters of the diameter cluster pair with their respective neighbors. An iteration parameter $N_i$ is applied to obtain flexible embedding capacity by controlling the merging procedure. The relationship between $N_i$ and the embedding capacity is discussed in Section 4.3.
Figure 3 provides an example of the proposed codebook partition algorithm. A white circle denotes a codeword. A shaded oval denotes a codeword and its neighbor in $S$ being processed, while an unshaded oval represents a cluster in $S'$ that has already been formed. The "0", "1", "00", "01", "10", or "11" in a circle is the label of a codeword in the cluster. A cross "×" marks the centroid of the cluster it belongs to, and a line represents the diameter of a cluster set. The first to third merging iterations are shown in Figures 3(a) to 3(c), respectively. The fourth merging iteration comprises Figures 3(d) and 3(e), and Figure 3(f) demonstrates the labelling of the codewords.
3.2. Embedding Procedure. In our proposed method, the ISF indices corresponding to the codewords in the codebook are first obtained by parsing the host AMR-WB speech. Then the ISF indices are employed to embed the secret message based on the codebook partition. Generally, the codewords in the same cluster as the codeword referred to by the cover index $I_a$ are considered to be replaceable with each other. According to the secret message to be embedded, $I_a$ may be substituted by the index of one of the other codewords within the same cluster. The number of secret message bits that can be embedded depends on the size of the specific cluster. The embedding procedure is given in the following.
Step 1. Search the cluster set $S$ for the cluster $C$ which contains the codeword referred to by the ISF index $I_a$.
Step 2. If there are $N$ codewords in $C$, the number of secret bits that can be embedded into $I_a$ is calculated as $n = \lfloor \log_2 N \rfloor$.
Step 3. Read $n$ not-yet-embedded bits, denoted by $m$, from the secret message. $I_a$ is replaced with $I_b$, which indexes the codeword with the same label as $m$.
Step 4. Repeat Steps 1-3 until all the secret bits are embedded.
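The four embedding steps can be sketched as below. The sketch rests on several assumptions that are not stated in the paper: a cluster is a list of codeword indices whose size is a power of two, the label of a codeword is simply the binary form of its position inside its cluster, the secret message is a string of "0" and "1" characters, and a short final chunk is padded with zeros. The function name `embed` is hypothetical.

```python
def embed(cover_indices, clusters, bits):
    """Embed a bit string into a sequence of cover ISF indices (illustrative sketch)."""
    cluster_of = {idx: c for c in clusters for idx in c}
    stego, pos = [], 0
    for i_a in cover_indices:
        cluster = cluster_of[i_a]                  # Step 1: find the cluster of I_a
        n = len(cluster).bit_length() - 1          # Step 2: n = floor(log2(N))
        if n == 0 or pos >= len(bits):             # nothing can or needs to be embedded
            stego.append(i_a)
            continue
        m = bits[pos:pos + n].ljust(n, "0")        # Step 3: next n bits (zero padding is an assumption)
        stego.append(cluster[int(m, 2)])           # I_b indexes the codeword labelled m
        pos += n                                   # Step 4: continue with the next index
    return stego

# Two toy clusters of codeword indices: one of size 4 (2 bits) and one of size 2 (1 bit).
clusters = [[0, 1, 2, 3], [4, 5]]
stego = embed([2, 5, 0], clusters, "01110")
```

With these toy clusters, the three cover indices carry 2, 1, and 2 bits, so the five secret bits fit exactly.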
[Figure 3: An example of our proposed codebook partition. Panels: (a) 1st iteration ($N_i = 4$); (b) 2nd iteration ($N_i = 3$); (c) 3rd iteration ($N_i = 2$); (d) 4th iteration ($S = S'$, $S'$.clear()); (e) 4th iteration ($N_i = 1$); (f) labelling.]
Input: Codebook $B$, iterative parameter $N_i$
Output: Cluster set $S$
/* $S'$ is a helper cluster set */
$S'$.clear()
$S$.clear()
/* Each codeword is taken as an initial cluster */
for $i = 0$; $i < N_b$; ++$i$ do
    $S$.push($C_i$)
end
/* Iterative merging */
while $N_i > 0$ do
    if $S$ is empty then
        $S = S'$
        $S'$.clear()
    end
    $(C_{d1}, C_{d2}) = \arg\max_{i, j \in \{1, 2, \ldots, |S|\}} D(C_i, C_j)$
    Temp1 $= C_{d1} \cup N(C_{d1}, S)$
    Temp2 $= C_{d2} \cup N(C_{d2}, S)$
    $S'$.push(Temp1)
    $S'$.push(Temp2)
    $S$.remove($C_{d1}$)
    $S$.remove($C_{d2}$)
    $S$.remove($N(C_{d1}, S)$)
    $S$.remove($N(C_{d2}, S)$)
    $N_i = N_i - 1$
end
/* Put the remaining clusters in $S'$ into $S$ */
for iter = $S'$.begin(); iter < $S'$.end(); ++iter do
    $S$.push(*iter)
end
return $S$

Algorithm 1: DN-based codebook partition algorithm.
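A compact Python rendering of Algorithm 1 may make the control flow easier to follow. It is a sketch under assumptions, not the authors' implementation: clusters are lists of codeword tuples, the two neighbors are chosen one after the other among the clusters other than the diameter pair (so that they can never coincide), and merging stops early if fewer than four clusters remain in $S$. The names `dn_partition`, `centroid`, and `dist`, as well as the random toy codebook, are hypothetical.

```python
import math
import random
from itertools import combinations

def centroid(cluster):
    """Component-wise mean of a cluster's codewords (Eq. (1))."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def dist(c1, c2):
    """Euclidean distance between two cluster centroids (Eq. (2))."""
    return math.dist(centroid(c1), centroid(c2))

def dn_partition(codebook, n_iter):
    s = [[tuple(w)] for w in codebook]   # every codeword is an initial cluster
    s_prime = []                         # helper cluster set S'
    while n_iter > 0:
        if not s:                        # S is empty: S = S', S'.clear()
            s, s_prime = s_prime, []
        if len(s) < 4:                   # assumption: two disjoint pairs are needed
            break
        # Diameter cluster pair: the two clusters of S that are farthest apart.
        d1, d2 = max(combinations(range(len(s)), 2),
                     key=lambda p: dist(s[p[0]], s[p[1]]))
        rest = [k for k in range(len(s)) if k not in (d1, d2)]
        # Assumption: neighbors are picked sequentially so they never coincide.
        n1 = min(rest, key=lambda k: dist(s[d1], s[k]))
        rest.remove(n1)
        n2 = min(rest, key=lambda k: dist(s[d2], s[k]))
        s_prime.append(s[d1] + s[n1])    # Temp1
        s_prime.append(s[d2] + s[n2])    # Temp2
        s = [c for k, c in enumerate(s) if k not in (d1, d2, n1, n2)]
        n_iter -= 1
    return s + s_prime                   # remaining clusters of S' are put into S

# Toy codebook of 16 random 2-D codewords (illustrative only).
rng = random.Random(0)
codebook = [(rng.random(), rng.random()) for _ in range(16)]
clusters = dn_partition(codebook, 6)
```

With 16 codewords, the first four iterations produce eight clusters of two codewords, and the next two iterations merge them into four clusters of four codewords, so `clusters` holds four clusters of size 4.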
[Figure 4: Embedding two bits into one cover ISF index. The secret bits "01" are embedded by searching the cluster set $S$ and replacing the ISF index $I_a$ with the stego ISF index $I_b$.]
Figure 4 is an example of embedding two secret bits into one cover ISF index. Let us assume that the cluster set $S$ contains two clusters and that the codeword indexed by $I_x$ is $W_x$; for example, $I_b$ indexes the codeword $W_b$. Hence, the ISF index $I_a$ shown in Figure 4 is replaced with $I_b$, which indexes the codeword $W_b$ with the same label as the secret bits "01".
3.3. Extracting Procedure. When the stego AMR-WB speech is transferred to the intended receiver, the stego indices can be obtained by parsing the AMR-WB speech stream and used to extract the embedded secret message. The procedure for extracting the message from the stego index $I_b$ is given below.
Step 1. Search the cluster set $S$, which is the same as that employed in the embedding procedure, for the cluster $C$ which contains the codeword $W_b$ referred to by the ISF index $I_b$.
Step 2. If there are $N$ codewords in total in $C$, the number of secret bits carried by $I_b$ is computed as $n = \lfloor \log_2 N \rfloor$.
Step 3. Read the label of $W_b$ as the extracted $n$ bits, which are appended to the secret message bit sequence.
Step 4. Repeat Steps 1-3 until all the secret bits are recovered.

[Figure 5: Extracting two bits from one stego ISF index. The cluster set $S$ is searched for the stego ISF index $I_b$, and the label "01" of the codeword $W_b$ is read as the extracted bits.]
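The extraction steps admit an equally short sketch. As before, the representation is an assumption made for illustration: a cluster is a list of codeword indices whose size is a power of two, the label of a codeword is the binary form of its position in the cluster, and the receiver is assumed to know the message length `n_bits`. The function name `extract` is hypothetical.

```python
def extract(stego_indices, clusters, n_bits):
    """Recover the embedded bit string from stego ISF indices (illustrative sketch)."""
    label_of = {idx: (pos, len(c)) for c in clusters for pos, idx in enumerate(c)}
    out = []
    for i_b in stego_indices:
        label, size = label_of[i_b]                      # Step 1: locate the cluster of I_b
        n = size.bit_length() - 1                        # Step 2: n = floor(log2(N))
        if n > 0:
            out.append(format(label, "0{}b".format(n)))  # Step 3: read the n-bit label
    return "".join(out)[:n_bits]                         # Step 4: stop at the message length

# The same two toy clusters must be shared by sender and receiver.
clusters = [[0, 1, 2, 3], [4, 5]]
message = extract([1, 5, 2], clusters, n_bits=5)
```

The indices 1, 5, and 2 carry the labels "01", "1", and "10", so the recovered message is "01110".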
Figure 5 is the corresponding example of extracting two secret bits from the stego index $I_b$ generated by the embedding instance shown in Figure 4. It can easily be seen that the extracted secret bits are identical to the embedded secret bits.
4. Experimental Results and Analysis
In order to demonstrate the performance of the proposed method, the perceptual quality of the stego AMR-WB speech with the secret message embedded using our method is computed and compared to that of the stego AMR-WB speech generated with CNV and NID steganography. Moreover, the flexibility of embedding capacity and the security regarding statistical detection are analyzed in detail.
4.1. Audio Database. The TIMIT acoustic-phonetic continuous speech corpus (https://catalog.ldc.upenn.edu/ldc93s1) is an audio database which contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences; all audio sentences are sampled at 16 kHz. In our experiments, 1000 audio sentences are randomly chosen from the TIMIT database. The average, maximum, and minimum lengths of the chosen audio sentences are 3.47 s, 3.96 s, and 3.12 s, respectively. All audio files are converted into AMR-WB format using the standard codec.
4.2. Speech Quality Evaluation. The perceptual evaluation of speech quality (PESQ) described in ITU-T Recommendation P.862 [20] may be employed to evaluate speech quality. Moreover, according to ITU-T P.862.2 [21], the raw PESQ score can be converted to the mean opinion score-listening quality objective (MOS-LQO), which is more suitable for evaluating wideband speech. Hence, MOS-LQO is applied in our experiments. The normal range of the MOS-LQO score is 1.017 to 4.549. The higher the score, the better the quality.
Figure 6 shows the MOS-LQO scores of the 1000 cover AMR-WB speech samples in 23.85 kbit/s mode and of the corresponding stego AMR-WB speech samples produced using three different codebook partition algorithms. Three progressive embedding rates, namely, 100 bps, 200 bps, and 300 bps, are employed in our experiments. The speech samples are sorted according to the MOS-LQO scores of our proposed method. It can be seen from Figure 6 that the overall scores of the stego AMR-WB speech generated with our method are higher than those of the NID-based stego AMR-WB speech, especially when the embedding rates are 200 bps and 300 bps. The MOS-LQO scores of the CNV-based stego AMR-WB speech are only slightly higher than ours when the embedding rate is 100 bps, which means there is no obvious discrepancy in speech quality between them. Besides, when a high embedding rate, that is, 200 bps or 300 bps, is used, the decrease in the MOS-LQO scores of our stego AMR-WB speech is significantly smaller than that of NID-based steganography.

[Figure 6: Comparisons of MOS-LQO values for 1000 samples between the standard AMR-WB codec, CNV-based steganography, NID-based steganography, and the proposed DN-based steganography. Panels: (a) the embedding rate is 100 bps (Standard, CNV, NID, Ours); (b) the embedding rate is 200 bps (Standard, NID, Ours); (c) the embedding rate is 300 bps (Standard, NID, Ours). Horizontal axis: sample index; vertical axis: MOS-LQO score.]
Moreover, the average MOS-LQO scores of the cover AMR-WB speech and of the stego AMR-WB speech with three different codebook partition algorithms, that is, CNV, NID, and DN, for four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) and three embedding rates (100 bps, 200 bps, and 300 bps) are given in Table 1. For the embedding rates of 200 bps and 300 bps, only the MOS-LQO scores of the NID-based and DN-based steganographic methods are given, because the embedding capacity of CNV-based steganography cannot exceed 100 bps.

Table 1: MOS-LQO scores of the standard codec and of CNV-based, NID-based, and our proposed steganography in four different rate modes and at three embedding rates. Values in parentheses are the percentage changes relative to the standard codec.

Embedding rate   Method     12.65 kbit/s      15.85 kbit/s      19.85 kbit/s      23.85 kbit/s
-                Standard   2.929             3.073             3.199             3.269
100 bps          CNV        2.871 (−2.0%)     3.021 (−1.7%)     3.153 (−1.4%)     3.225 (−1.3%)
100 bps          NID        2.750 (−6.1%)     2.895 (−5.8%)     3.020 (−5.6%)     3.091 (−5.4%)
100 bps          Ours       2.864 (−2.2%)     3.010 (−2.0%)     3.139 (−1.9%)     3.216 (−1.6%)
200 bps          CNV        -                 -                 -                 -
200 bps          NID        2.601 (−11.2%)    2.736 (−11.0%)    2.875 (−10.7%)    2.921 (−10.6%)
200 bps          Ours       2.807 (−4.2%)     2.955 (−3.8%)     3.084 (−3.6%)     3.164 (−3.2%)
300 bps          CNV        -                 -                 -                 -
300 bps          NID        2.284 (−22.0%)    2.386 (−22.3%)    2.475 (−22.6%)    2.533 (−22.5%)
300 bps          Ours       2.699 (−7.9%)     2.841 (−7.5%)     2.971 (−7.1%)     3.046 (−6.8%)
When the embedding rate is 100 bps, which is almost the limit of CNV steganography, we can see from Table 1 that the mean MOS-LQO scores of our proposed method are only about 0.3% worse than those of CNV-based steganography. This slight decrease is likely to be almost imperceptible to the human auditory system (HAS). There are also significant increases of approximately 3.8% in the mean MOS-LQO scores when our method is compared to NID-based steganography. When the embedding rates are 200 bps and 300 bps, the scores of our approach are improved by about 7% and 15%, respectively, in contrast to those of NID-based steganography.
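The percentages quoted from Table 1 can be checked with a one-line formula. The sketch below assumes that each parenthesized value in the table is the change of the stego score relative to the standard-codec score in the same rate mode, and it reads the 12.65 kbit/s entries as 2.929 (standard), 2.871 (CNV), 2.750 (NID), and 2.864 (ours); the function name is hypothetical.

```python
def relative_change_percent(stego_score, cover_score):
    """Percentage change of a stego MOS-LQO score relative to the cover score."""
    return 100.0 * (stego_score - cover_score) / cover_score

standard = 2.929                                   # standard codec, 12.65 kbit/s mode
scores_100bps = {"CNV": 2.871, "NID": 2.750, "Ours": 2.864}
changes = {name: round(relative_change_percent(score, standard), 1)
           for name, score in scores_100bps.items()}
```

Under this reading the computed changes are −2.0%, −6.1%, and −2.2%, which agree with the corresponding column of the table.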
Furthermore, we can also see that the experimental results of the four rate modes are analogous. The decrease in speech quality caused by NID-based steganography is more than twice that caused by DN-based steganography. The proposed method can obtain twice the embedding capacity of CNV-based steganography by sacrificing less than 2% of speech quality on average. In addition, the decline in speech quality of the proposed DN-based method at an embedding rate of 300 bps is smaller than that of the NID-based method at 200 bps.
4.3. Flexible Embedding Capacity. Compared to CNV-based steganography, our proposed method offers flexible embedding capacity to satisfy different practical demands. The steganographic capacity can be adjusted by changing the iteration parameter $N_i$. For different values of $N_i$, for example, $N_i = 32, 33, \ldots, 54$, the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.
From Figure 7, we can observe that the embedding rate increases significantly with $N_i$, while the MOS-LQO score goes down only slightly. For NID-based steganography, however, the MOS-LQO score declines rapidly as the embedding rate increases. Therefore, the proposed DN-based steganography can achieve higher embedding capacity with only a slight decrease in speech quality. For example, when $N_i = 48$, the size of each cluster in $S$ is equal to 4, so 2 bits can be embedded into each of the two ISF indices of a frame, that is, 4 bits per frame; with 20 ms frames, the embedding rate is 200 bps. In contrast, the CNV algorithm can embed at most 2 bits per frame (100 bps).
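The arithmetic behind this example can be reproduced by counting clusters instead of running the full partition. The sketch below is illustrative and makes assumptions beyond the text: each frame carries two usable ISF indices from 128-codeword codebooks, every iteration moves four clusters of $S$ into two merged clusters of $S'$, and every codeword is taken to be equally likely when averaging the capacity for intermediate values of $N_i$. The function names are hypothetical.

```python
FRAMES_PER_SECOND = 1000 // 20      # one AMR-WB frame every 20 ms
INDICES_PER_FRAME = 2               # the two 128-codeword codebooks used for embedding

def average_bits_per_index(n_codewords, n_iter):
    """Average floor(log2(cluster size)) per codeword after n_iter merging iterations."""
    size, in_s, in_s_prime = 1, n_codewords, 0   # cluster size in S, cluster counts
    for _ in range(n_iter):
        if in_s == 0:                            # S is empty: S = S', S'.clear()
            size, in_s, in_s_prime = 2 * size, in_s_prime, 0
        if in_s < 4:                             # assumption: stop if two pairs cannot be formed
            break
        in_s -= 4                                # diameter pair and both neighbors leave S
        in_s_prime += 2                          # two merged clusters enter S'
    bits_small = size.bit_length() - 1           # floor(log2(size))
    bits_large = (2 * size).bit_length() - 1     # floor(log2(2 * size))
    total_bits = in_s * size * bits_small + in_s_prime * 2 * size * bits_large
    return total_bits / n_codewords

def embedding_rate_bps(n_iter, n_codewords=128):
    """Embedding rate under the uniform-usage assumption stated above."""
    return average_bits_per_index(n_codewords, n_iter) * INDICES_PER_FRAME * FRAMES_PER_SECOND

rate_48 = embedding_rate_bps(48)
```

Under these assumptions, $N_i = 32$, 48, and 56 give 100 bps, 200 bps, and 300 bps, matching the three embedding rates used in the experiments; intermediate values of $N_i$ give rates in between, although the actual rate depends on how often each index occurs in real speech.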
4.4. Resistance to Statistical Steganalysis. Speech steganography aims to hide a secret message in cover speech without arousing suspicion. It is very important for a steganographic method to resist statistical steganalysis, which is the technique of detecting the presence of a hidden message. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18] the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of a hidden message in given audio.

[Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography. Panels: (a) our proposed steganography (horizontal axis: times of cluster merging); (b) NID-based steganography (horizontal axis: number of sub-codebooks). Vertical axes: embedding rate (bps) and MOS-LQO score.]

Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in 23.85 kbit/s mode.

                  Training rate 0.4                 Training rate 0.5                 Training rate 0.6
Rate     Method   Markov  MFCC  SS-QCCN  RS-QCCN    Markov  MFCC  SS-QCCN  RS-QCCN    Markov  MFCC  SS-QCCN  RS-QCCN
100 bps  CNV      49.8    49.8  43.7     49.0       50.1    50.2  44.0     49.2       50.0    50.5  41.9     50.0
100 bps  NID      51.0    60.1  42.2     50.0       50.1    60.9  42.9     48.7       52.1    59.8  41.8     49.4
100 bps  Ours     50.0    50.0  44.0     49.4       50.3    49.3  40.3     49.4       49.1    48.6  41.8     43.3
200 bps  CNV      -       -     -        -          -       -     -        -          -       -     -        -
200 bps  NID      53.5    74.5  46.9     50.0       53.3    76.2  47.6     50.0       53.6    75.8  44.4     50.1
200 bps  Ours     51.0    48.3  45.2     50.0       49.8    48.7  42.2     50.0       50.5    48.6  45.0     50.0
300 bps  CNV      -       -     -        -          -       -     -        -          -       -     -        -
300 bps  NID      54.8    74.6  49.3     50.0       56.3    77.2  50.0     50.0       55.4    78.3  50.5     50.6
300 bps  Ours     52.4    49.7  47.9     50.0       52.8    60.9  48.2     50.0       53.8    50.1  46.6     50.0
In our experiments, the sentences chosen from the TIMIT database, as stated in Section 4.1, are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. Then the secret message is embedded into each cover AMR-WB speech with different embedding rates, that is, 100 bps, 200 bps, and 300 bps, by CNV-based, NID-based, and DN-based steganography. The rates of 200 bps and 300 bps are omitted for CNV-based steganography because of its limited embedding capacity. Thus seven stego speech sets are generated, of which one is related to the CNV-based steganographic method and three each are associated with NID-based and DN-based steganography. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.
In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, that is, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stego speech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and a radial basis function (RBF) kernel and the grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set be negatives and those in the stego speech set be positives. The accuracy is then defined as follows:
$$\text{Accuracy} = \frac{1}{2} \times \left( \frac{TP}{TP + FN} + \frac{TN}{FP + TN} \right), \quad (5)$$
where TP, TN, FN, and FP are the numbers of true positives, true negatives, false negatives, and false positives, respectively.
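The accuracy in (5) is the mean of the true positive rate and the true negative rate. A minimal sketch follows; the function name and the example counts are hypothetical and are not taken from the experiments.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Accuracy as defined in (5): mean of TP/(TP+FN) and TN/(FP+TN)."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both cover and stego samples must be present")
    true_positive_rate = tp / (tp + fn)   # stego samples detected as stego
    true_negative_rate = tn / (fp + tn)   # cover samples detected as cover
    return 0.5 * (true_positive_rate + true_negative_rate)


# Hypothetical counts for a test set of 600 cover and 600 stego samples.
example = balanced_accuracy(tp=300, fn=300, tn=330, fp=270)
```

With this definition, a detector that labels every sample as stego scores exactly 0.5, which is why values near 50% in Table 2 indicate that the steganalyzer does no better than guessing.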
The steganalytic results are given in Table 2. It can be seen that, when the embedding rate is 100 bps, the accuracy of detecting both CNV-based and DN-based methods is almost the same, about 50%, while that of detecting
Security and Communication Networks 9
[Figure 8 plot: correlation index (vertical axis) for each interframe and intraframe edge (horizontal axis).]
Figure 8: The correlation index of 1000 AMR-WB speeches, where the interframe edge "ij" connects two vertices $V_i[k]$ and $V_j[k+1]$ in two neighboring frames and the intraframe edge "ij′" connects two vertices $V_i[k]$ and $V_j[k]$ in the same frame.
NID-based steganography increases to 60% when the MFCC-based steganalytic method is applied. Moreover, there is an apparent increase in the accuracy of detecting the NID-based hiding method as the embedding rate increases to 200 bps or 300 bps when Liu et al.'s methods (i.e., the Markov-based and MFCC-based steganalytic methods) are applied. But the accuracy of steganalyzing our proposed method, DN-based steganography, stays at about the same level of 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even at higher embedding rates.
According to the definition of the correlation index given in [18], the experimental results of the correlation indices of 1000 AMR-WB speeches, which are randomly selected from "TIMIT", are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely, SS-QCCN and RS-QCCN, can be constructed, as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography. The steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN stays at or below about 50% for all of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that merely the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while none of them is utilized in Li et al.'s steganalytic method except for the edge "33" in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing the edges "22", "33", and "23′") targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis, according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].
In order to visualize the detection performance, some receiver operating characteristic (ROC) curves of steganalyzing CNV-based steganography with a 100 bps embedding rate and NID-based and DN-based steganography with 100 bps, 200 bps, and 300 bps embedding rates are
[Figure 9 diagrams over the vertices V1[k], ..., V5[k] and V1[k+1], ..., V5[k+1]. Edge labels shown in (a) SS-QCCN: 11, 45. Edge labels shown in (b) RS-QCCN: 11, 14, 15, 33, 44, 45.]
Figure 9: Two AMR-WB strong correlation network models: (a) SS-QCCN; (b) RS-QCCN.
provided in Figure 10. (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of the embedding capacity.) The curves show that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of a hidden message embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.
5 Conclusion
The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may be a good candidate for cover medium in speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed. The experimental results demonstrate the effectiveness of our proposed method. The main contributions of this paper are as follows:
(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality and better
[Figure 10 panels, each plotting true positive rate against false positive rate: (a) Markov (TIMIT, 100 bps), curves for DN, CNV, and NID; (b) MFCC (TIMIT, 100 bps), curves for DN, CNV, and NID; (c) Markov (TIMIT, 200 bps), curves for NID and DN; (d) MFCC (TIMIT, 200 bps), curves for NID and DN; (e) Markov (TIMIT, 300 bps), curves for NID and DN; (f) MFCC (TIMIT, 300 bps), curves for NID and DN.]
Figure 10: ROC curves for steganalysis of CNV-based, NID-based, and our proposed steganography (50% training rate).
performance against statistical steganalysis than the NID-based method.
(2) Flexible embedding capacity may be easily achieved with different numbers of cluster-merging iterations. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.
References
[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.
[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.
[3] K. Gopalan, "Audio steganography using bit modification," in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.
[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, "Audio steganography by amplitude or phase modification," in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.
[5] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.
[6] L. Liu, M. Li, Q. Li, and Y. Liang, "Perceptually transparent information hiding in G.729 bitstream," in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.
[7] T. Xu and Z. Yang, "Simple and effective speech steganography in G.723.1 low-rate codes," in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.
[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, "MELPe coded speech hiding on enhanced full rate compressed domain," in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.
[9] A. Nishimura, "Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec," in Proceedings of the IIH-MSP 2009: 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.
[10] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.
[11] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," IEEE Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.
[12] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.
[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.
[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, "A new scheme for covert communication via 3G encoded speech," Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.
[15] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.
[16] J. Liu, H. Tian, J. Lu, and Y. Chen, "Neighbor-index-division steganography based on QIM method for G.723.1 speech streams," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.
[17] Q. Liu, A. H. Sung, and M. Qiao, "Derivative-based audio steganalysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.
[18] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM steganography in low-bit-rate speech signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.
[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.
[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.
[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.
[22] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
data hiding approach to embedding data in enhanced full rate (EFR) compressed speech bitstream is proposed in [8]. In addition, Nishimura [9] proposed three methods of hiding data in the pitch delay data of the AMR speech.
Based on the complementary neighbor vertices (CNV) codebook partition algorithm, Xiao et al. [10] presented an approach to information hiding in compressed speech with the use of quantization index modulation (QIM) [11]. Huang et al. [12] proposed a steganographic algorithm for embedding data in different speech encoding parameters of the inactive frames, the embedding capacity of which is bounded by the number of inactive frames in the cover speech. In [13], Huang et al. also presented a method for steganography in low bit-rate VoIP streams based on pitch period prediction. It can achieve high quality of stego speech and prevent statistical steganalysis, but the embedding rate is still low (only about 133.3 bps). An adaptive suboptimal pulse combination constrained (ASOPCC) method was presented in [14] to embed data into the compressed speech signal of the AMR-WB codec. However, most of the PESQ scores in different coding modes are not high. In [15], a key-based codebook partition strategy, which dynamically determines the adopted division scheme, was designed to improve the security of QIM steganography in speech bitstreams. Although the stego speech quality is guaranteed to be good, the embedding capacity is very limited and not adjustable. Liu et al. [16] proposed a neighbor-index-division (NID) codebook division algorithm for G.723.1 speech. Differing from the existing CNV method, NID divides neighbor-indexed codewords into separate subcodebooks according to a suitable stego coding strategy. The embedding capacity is improved by using multiple division and a multi-ary coding strategy.
The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE, so AMR-WB speech may be a good candidate for cover medium in audio steganography. Therefore, we focus on AMR-WB speech steganography in this paper. First, a new diameter-neighbor (DN) codebook partition algorithm for AMR-WB speech is proposed. Based on DN codebook division, we develop a novel AMR-WB speech steganography capable of providing flexible embedding capacity with different values of the iterative parameter $N_i$. For example, when $N_i = 48$, twice the embedding capacity of the CNV-based method may be obtained with a decrease of only about 2% in speech quality and much the same undetectability. Moreover, both the quality of the stego speech and the security against statistical steganalysis [17, 18] are better than those of the recent NID-based speech steganography.
The remainder of this paper is organized as follows. In Section 2, the related work is briefly introduced. In Section 3, the proposed DN codebook partition algorithm and the novel AMR-WB speech steganography are described in detail. The experimental results and analysis are provided in Section 4. Finally, conclusions are presented.
2 Related Work
In this section, a technical overview of the AMR-WB codec is first presented. Then two related codebook partition algorithms, CNV [10] and NID [16], are briefly reviewed.
2.1. AMR-WB Codec. The AMR-WB speech codec was standardized by 3GPP (3rd Generation Partnership Project) and adopted as the standard G.722.2 by ITU-T in 2002 [19]. It is a multirate wideband speech codec applied in modern mobile communication systems to remarkably improve the speech quality. The AMR-WB codec operates at a multitude of bit rates ranging from 6.6 kbit/s to 23.85 kbit/s.
The input audio signal is separated into 20 ms long frames at a 16 kHz sampling rate. Linear prediction analysis (LPA) is performed for every frame, and the LP coefficients are converted to immittance spectrum pair (ISP) coefficients. The ISP coefficients are then converted to the frequency domain (ISF) for quantization. Except for mode 0 (6.6 kbit/s), the ISF coefficients are quantized using two-stage vector quantization with split-by-2 in the first stage and split-by-5 in the second stage. Both the second and the third codebooks in the second stage have 128 codewords, and the ISF indices of the codewords in these codebooks may be employed to embed the secret message.
In the decoder, the transmitted indices are first parsed from the received bitstream and then decoded to obtain the codec parameters for each transmitted frame, such as the ISP vector, the 4 fractional pitch lags, the 4 LTP filtering parameters, the 4 innovative code vectors, and the 4 sets of vector-quantized pitch and innovative gains. For a more detailed description, one should refer to [19]. From the received ISF indices, which may have been modified because of secret message embedding, the receiver can recover the embedded secret message.
2.2. Complementary Neighbor Vertices. CNV is a type of codebook partition algorithm proposed in [10], in which each codeword in a codebook is viewed as a vertex in the multidimensional space. The relationship between two codewords $X$ and $Y$ is described as an edge connecting the two codewords' vertices, and the weight of an edge is defined as the Euclidean distance $D(X, Y)$ between the two codewords $X$ and $Y$. A small value of $D(X, Y)$ indicates that $X$ and $Y$ bear a close resemblance to each other. The vertex nearest to $X$ is referred to as $X$'s neighbor vertex, which is denoted by $N(X)$. The vertex set $V$ together with the edge set $E$ forms a graph $G(V, E)$ in a multidimensional space.
The codebook partition is realized by the construction of the graph $G(V, E)$ and vertex labelling. First, each vertex $X$ in $G(V, E)$ is connected with its neighbor vertex $N(X)$ using an edge. Thus the graph $G(V, E)$ is divided into several isolated subgraphs, each of which may be proved to be acyclic and 2-colorable. Second, every vertex and its neighbor vertex in a subgraph are labelled oppositely using "0" or "1". Third, all of the vertices with the same label are collected into a subcodebook; hence two subcodebooks are obtained.
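The three steps above can be illustrated with a small sketch. It is not the implementation of [10]: the function names and the toy one-dimensional-like codebook are hypothetical, and the sketch assumes that all pairwise distances are distinct, so that the nearest-neighbor graph contains no odd cycle.

```python
import math
from collections import deque


def nearest_neighbor(codebook, i):
    """Index of the codeword closest (Euclidean distance) to codebook[i]."""
    return min((j for j in range(len(codebook)) if j != i),
               key=lambda j: math.dist(codebook[i], codebook[j]))


def cnv_labels(codebook):
    """Label each codeword 0 or 1 so that a codeword and its neighbor differ.

    Returns (labels, neighbor). Assumes distinct pairwise distances.
    """
    n = len(codebook)
    neighbor = [nearest_neighbor(codebook, i) for i in range(n)]
    # Step 1: connect every vertex with its neighbor vertex (undirected edge).
    adjacent = [set() for _ in range(n)]
    for i, j in enumerate(neighbor):
        adjacent[i].add(j)
        adjacent[j].add(i)
    # Step 2: 2-color every isolated subgraph by breadth-first search.
    labels = [None] * n
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacent[u]:
                if labels[v] is None:
                    labels[v] = 1 - labels[u]
                    queue.append(v)
                elif labels[v] == labels[u]:
                    raise ValueError("subgraph is not 2-colorable (distance tie?)")
    return labels, neighbor


# Toy codebook (hypothetical values, not AMR-WB codewords).
toy = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.5, 0.0), (10.0, 1.0)]
toy_labels, toy_neighbor = cnv_labels(toy)
# Step 3: the two subcodebooks are the codewords labelled 0 and those labelled 1.
subcodebooks = [[i for i, l in enumerate(toy_labels) if l == b] for b in (0, 1)]
```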
Based on the generated subgraphs and the label assigned to each codeword in them, CNV-based steganography applies the QIM concept to embed the secret message. More specifically, when the label of the codeword $X$, which is associated with the cover quantization index $I_a$, agrees with the secret message bit, $I_a$ remains unchanged; otherwise it should be replaced
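[Figure 1 block diagram. Sender side: the AMR-WB speech is parsed into ISF indices, the codebooks are partitioned into a cluster set, and the secret message is embedded by updating the indices, which gives the stego AMR-WB speech sent over a public channel. Receiver side: the stego ISF indices are parsed from the stego AMR-WB speech, the secret message is extracted using the cluster set, and the AMR-WB decoder outputs the decoded stego speech.]
Figure 1: Diagram of the proposed method.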
with the quantization index of the neighbor codeword $N(X)$, which belongs to the opposite subcodebook.
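The CNV embedding rule just described amounts to one comparison per index. The sketch below assumes that a label table and a nearest-neighbor table are already available; the tables, indices, and bits shown are hypothetical illustration values.

```python
def cnv_embed_bit(index, bit, labels, neighbor):
    """Return the stego index that carries `bit` (one bit per index)."""
    if labels[index] == bit:
        return index            # label agrees with the secret bit: keep I_a
    return neighbor[index]      # otherwise use N(X), which has the opposite label


def cnv_extract_bit(index, labels):
    """The receiver reads the label of the received codeword."""
    return labels[index]


# Hypothetical tables for a five-codeword codebook.
labels = [0, 1, 0, 1, 0]
neighbor = [1, 0, 3, 2, 3]

cover_indices = [0, 2, 4, 1]
secret_bits = [1, 0, 1, 1]
stego_indices = [cnv_embed_bit(i, b, labels, neighbor)
                 for i, b in zip(cover_indices, secret_bits)]
recovered_bits = [cnv_extract_bit(i, labels) for i in stego_indices]
```

Because an index is only ever replaced by its nearest neighbor, the distortion per index is bounded by the distance to that neighbor, and each index carries exactly one bit.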
The key characteristic of CNV-based steganography is that the distortion is bounded even in the worst case. However, the embedding capacity is limited, which is analyzed experimentally in Section 4. Moreover, the number of possible combinations of flipping coefficients, which determine whether the labels in a subgraph will be flipped, is large. Extra information about the flipping process must be transmitted to the receiver, and thus the effective embedding capacity may be decreased further.
2.3. Neighbor Index Division. NID assumes that the codewords of neighbor indices (i.e., neighbor positions) in a codebook are close together. Hence the codewords in a codebook can be easily separated into subcodebooks according to their indices instead of the Euclidean distance. Specifically, an appropriate integer $k$ is selected according to the demand for embedding capacity, and the $i$th codeword is labelled with the digit $(i - 1) \bmod k$. Then all the codewords with the same label are collected into a subcodebook, which yields $k$ different subcodebooks.
In order to make full use of the embedding capacity, the binary secret message should be transformed into $k$-ary digits, denoted by $m$ ($m \in \{0, 1, \ldots, k - 1\}$). When the codeword related to the cover quantization index belongs to a subcodebook whose label differs from the $k$-ary digit $m$ to be embedded, this index should be substituted with that of the closest codeword in the corresponding subcodebook $m$.
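A minimal sketch of the NID labelling and substitution rule follows. It is an illustration under stated assumptions rather than the code of [16]: indices are 0-based here (so the label is `index % k`, which matches $(i - 1) \bmod k$ for 1-based $i$), the helper names are hypothetical, and the simple base conversion drops leading zero bits of the message.

```python
import math


def nid_label(index, k):
    """Label of the codeword at a 0-based index."""
    return index % k


def nid_embed_digit(codebook, index, m, k):
    """Return the stego index whose label equals the k-ary digit m."""
    if nid_label(index, k) == m:
        return index
    # Closest codeword (Euclidean distance) within subcodebook m.
    candidates = [j for j in range(len(codebook)) if nid_label(j, k) == m]
    return min(candidates,
               key=lambda j: math.dist(codebook[index], codebook[j]))


def bits_to_kary(bits, k):
    """Convert a bit string such as '1011' to base-k digits, most significant first."""
    value = int(bits, 2)
    digits = []
    while value:
        value, remainder = divmod(value, k)
        digits.append(remainder)
    return digits[::-1] or [0]


# Toy one-dimensional codebook (hypothetical values) with k = 3 subcodebooks.
toy = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
stego_index = nid_embed_digit(toy, index=3, m=2, k=3)
digits = bits_to_kary("1011", 3)
```

Each index then carries $\log_2 k$ bits on average, which is how the number of subcodebooks controls the capacity.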
NID-based steganography is an information hiding method based on neighbor-index codebook partition, of which the embedding capacity may be controlled by the number of subcodebooks $k$. However, as illustrated in [16], only about 34% of the pairs of neighbor-index codewords happen to be pairs of neighbor-vertex codewords, and the mean distance between neighbor-index codewords is apparently larger than that between neighbor-vertex codewords. Therefore, the amount of distortion induced by NID-based steganography may be somewhat large, which is confirmed by the experimental results provided in Section 4.
3 Proposed Method
The diagram of the proposed method is shown in Figure 1. Based on DN codebook partition of the codebooks described in Section 2.1, a secret message can be embedded into an AMR-WB speech file. After the stego AMR-WB speech file is received, the embedded secret message can be extracted without errors. At the same time, the decoded speech without perceptible distortion is also obtained. In the following, the diameter-neighbor (DN) codebook partition algorithm is first introduced. Then the embedding and extraction procedures of our proposed method are described.
3.1. Codebook Partition. A codebook may be viewed as a list of isolated code vectors (i.e., codewords) in the multidimensional space. The codebook partition algorithm used for audio steganography divides the codebook into several clusters, in each of which the codewords can be replaced with each other without causing perceptible distortion.
Let $B$ denote the original codebook with $N_b$ codewords and $C$ denote a cluster with $N_c$ codewords $W_t$ ($t = 1, 2, \ldots, N_c$). The centroid $G$ of a cluster $C$ is defined as follows:

$$G(i) = \frac{1}{N_c} \sum_{t=1}^{N_c} W_t(i), \quad (1)$$

where $G(i)$ and $W_t(i)$ are the $i$th components of $G$ and $W_t$, respectively.
The centroid $G$ (average code vector) is used to represent the corresponding cluster $C$; hence the cluster $C$ may also be considered as a vector in the multidimensional codebook space. In order to describe the similarity between two clusters $C_1$ and $C_2$, the Euclidean distance between them is defined as follows:

$$D(C_1, C_2) = \sqrt{\sum_{i=1}^{n} \left( G_1(i) - G_2(i) \right)^2}, \quad (2)$$

where $G_1$ and $G_2$ are the centroids of the two clusters $C_1$ and $C_2$, $n$ is the dimension of a codeword, and $G_1(i)$ and $G_2(i)$ are the $i$th components of $G_1$ and $G_2$, respectively.
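Definitions (1) and (2) translate directly into code. In the sketch below a cluster is a list of equal-length tuples; the function names and the two toy clusters are hypothetical.

```python
import math


def centroid(cluster):
    """Centroid G of a cluster, computed component by component as in (1)."""
    n_c = len(cluster)
    dimension = len(cluster[0])
    return tuple(sum(w[i] for w in cluster) / n_c for i in range(dimension))


def cluster_distance(c1, c2):
    """Euclidean distance between the centroids of two clusters, as in (2)."""
    return math.dist(centroid(c1), centroid(c2))


# Two toy clusters of two-dimensional codewords (hypothetical values).
c1 = [(0.0, 0.0), (2.0, 0.0)]
c2 = [(1.0, 3.0), (1.0, 5.0)]
d = cluster_distance(c1, c2)
```

For a cluster with a single codeword the centroid is the codeword itself, so (2) reduces to the ordinary distance between two codewords.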
Let $S$ denote a cluster set. The diameter of $S$ is defined as the maximal Euclidean distance $D_m$ over all cluster pairs in the cluster set $S$; that is,

$$D(C_p, C_q) \le D_m, \quad \forall p, q = 1, 2, \ldots, |S|, \quad (3)$$
[Figure 2 flowchart. Starting from codebook B, initialize a cluster set S by taking each codeword as an independent cluster, together with an empty cluster set S′. While N_i > 0: if S is empty, put the clusters in S′ into S to make S′ empty; search for the diameter cluster pair (C_d1, C_d2) in S; merge C_d1 and C_d2 with their respective neighbors into two new clusters Temp1 and Temp2; remove C_d1, C_d2, and their neighbors from S; put Temp1 and Temp2 into S′; set N_i = N_i − 1. When N_i reaches 0, put the remaining clusters in S′ into S and output the cluster set S.]
Figure 2: Diagram of our proposed codebook partition.
where $|S|$ is the number of clusters within the cluster set $S$. The cluster pair with the maximal Euclidean distance $D_m$, called the diameter cluster pair, is denoted by $(C_{d1}, C_{d2})$. The neighbor of a cluster $C$ in $S$ is represented by $N(C, S)$; then we have

$$D(C, N(C, S)) \le D(C, C_p), \quad \forall p = 1, 2, \ldots, |S|, \; C_p \ne C. \quad (4)$$
Figure 2 illustrates the diagram of the proposed DN codebook partition algorithm, and its detailed procedure is given in Algorithm 1. The original codebook $B$ is divided into $|S|$ clusters by iteratively merging the diameter cluster pair with their respective neighbors. An iteration parameter $N_i$ is applied to obtain flexible embedding capacity through controlling the merging procedure. The relationship between $N_i$ and the embedding capacity will be discussed in Section 4.3.
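A rough feel for how $N_i$ controls capacity can be obtained by tracking cluster sizes only. The sketch below is a back-of-the-envelope model, not the analysis of Section 4.3, and it rests on several assumptions: a 128-codeword codebook, two usable ISF indices per 20 ms frame (the second and third second-stage codebooks, i.e., 50 frames per second), every merging iteration turning four clusters into two, and all codewords being equally likely as cover codewords. The function names are hypothetical.

```python
def cluster_sizes_after(n_codewords, n_iter):
    """Sizes of the clusters after n_iter merging iterations (sizes only)."""
    current = [1] * n_codewords      # sizes of the clusters in S
    helper = []                      # sizes of the clusters in S'
    for _ in range(n_iter):
        if not current:
            current, helper = helper, []
        if len(current) < 4:
            raise ValueError("fewer than four clusters left to merge")
        # One iteration: the diameter pair and their two neighbors leave S,
        # and the two merged clusters enter S'.
        a, b, c, d = (current.pop() for _ in range(4))
        helper += [a + b, c + d]
    return current + helper


def bits_per_second(sizes, indices_per_frame=2, frames_per_second=50):
    """Average rate if each codeword is equally likely to be the cover codeword."""
    total = sum(sizes)
    # A cluster of N codewords carries floor(log2 N) bits per index.
    average_bits = sum(s * (s.bit_length() - 1) for s in sizes) / total
    return average_bits * indices_per_frame * frames_per_second


rate_32 = bits_per_second(cluster_sizes_after(128, 32))   # 64 clusters of size 2
rate_48 = bits_per_second(cluster_sizes_after(128, 48))   # 32 clusters of size 4
```

Under these assumptions $N_i = 32$ gives one bit per index (100 bps) and $N_i = 48$ gives two bits per index (200 bps), which is consistent with the statement that $N_i = 48$ doubles the capacity of the one-bit-per-index CNV method.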
Figure 3 is provided as an example to illustrate the proposed codebook partition algorithm. A white circle denotes a codeword. A shaded oval denotes a codeword and its neighbor in $S$ being processed, while an unshaded oval represents a cluster in $S'$ that has been formed. The "0", "1", "00", "01", "10", or "11" in a circle is the label of a codeword in the cluster. The cross "×" marks the centroid of the cluster it belongs to, and a line represents the diameter of a cluster set. The first to third merging iterations are shown in Figures 3(a)–3(c), respectively. The fourth merging iteration is comprised of Figures 3(d) and 3(e), and Figure 3(f) demonstrates the labelling of the codewords.
3.2. Embedding Procedure. In our proposed method, the ISF indices corresponding to the codewords in the codebook are first obtained by parsing the host AMR-WB speech. Then the ISF indices are employed to embed the secret message based on the codebook partition. Generally, the codewords in the same cluster as the codeword referred to by the cover ISF index $I_a$ are considered to be replaceable with each other. According to the secret message to be embedded, $I_a$ may be substituted by the index of one of the other codewords within the same cluster. The number of secret message bits that can be embedded depends on the size of the specific cluster. The embedding procedure is given in the following.
Step 1. Search the cluster set $S$ for the cluster $C$ which contains the codeword referred to by the ISF index $I_a$.
Step 2. If there are $N$ codewords in $C$, the number of secret bits that can be embedded into $I_a$ is calculated as $n = \lfloor \log_2 N \rfloor$.
Step 3. Read $n$ not-yet-embedded bits, denoted by $m$, from the secret message. $I_a$ is replaced with $I_b$, which indexes the codeword with the same label as $m$.
Step 4. Repeat Steps 1–3 until all the secret bits are embedded.
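Steps 1–4 can be sketched as follows. This is an illustration under assumptions, not the authors' code: a cluster is a list of codeword indices, the label of a codeword is taken to be its position within that list written as an $n$-bit binary string (the actual label assignment is the one shown in Figure 3(f)), and a final group of fewer than $n$ bits is zero-padded. The function name and the example cluster set are hypothetical.

```python
def embed(cover_indices, bits, cluster_set):
    """Embed the bit string `bits` into a sequence of cover ISF indices."""
    cluster_of = {idx: cluster for cluster in cluster_set for idx in cluster}
    stego, pos = [], 0
    for i_a in cover_indices:
        cluster = cluster_of[i_a]                 # Step 1: find the cluster C
        n = len(cluster).bit_length() - 1         # Step 2: n = floor(log2 N)
        if n == 0 or pos >= len(bits):
            stego.append(i_a)                     # nothing to embed in this index
            continue
        m = bits[pos:pos + n].ljust(n, "0")       # Step 3: next n bits (zero-padded)
        pos += n
        stego.append(cluster[int(m, 2)])          # I_b: codeword whose label is m
    if pos < len(bits):
        raise ValueError("cover is too short for the message")
    return stego


# Hypothetical cluster set: one cluster of four codewords, one of two.
clusters = [[10, 11, 12, 13], [20, 21]]
stego = embed([10, 20, 13], "01110", clusters)
```

In this example the first index carries "01", the second carries "1", and the third carries "10", so indices that fall in larger clusters carry more bits.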
[Figure 3 panels: (a) 1st iteration ($N_i = 4$); (b) 2nd iteration ($N_i = 3$); (c) 3rd iteration ($N_i = 2$); (d) 4th iteration ($S = S'$, $S'$.clear()); (e) 4th iteration ($N_i = 1$); (f) labelling.]
Figure 3: An example of our proposed codebook partition.
Input: Codebook $B$, iterative parameter $N_i$
Output: Cluster set $S$
/* S′ is a helper cluster set */
S′.clear()
S.clear()
/* Each codeword is taken as an initial cluster */
for i = 0; i < N_b; ++i do
    S.push(C_i)
end
/* Iterative merging */
while N_i > 0 do
    if S is empty then
        S = S′
        S′.clear()
    end
    (C_d1, C_d2) = argmax over i, j ∈ {1, 2, ..., |S|} of D(C_i, C_j)
    Temp1 = C_d1 ∪ N(C_d1, S)
    Temp2 = C_d2 ∪ N(C_d2, S)
    S′.push(Temp1)
    S′.push(Temp2)
    S.remove(C_d1)
    S.remove(C_d2)
    S.remove(N(C_d1, S))
    S.remove(N(C_d2, S))
    N_i = N_i − 1
end
/* Put the remaining clusters in S′ into S */
for iter = S′.begin(); iter < S′.end(); ++iter do
    S.push(*iter)
end
return S
Algorithm 1: DN-based codebook partition algorithm.
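Algorithm 1 can be turned into a short runnable sketch. The version below is one possible reading, not the authors' implementation. Its assumptions are: the neighbor of each diameter cluster is searched among the clusters of $S$ other than the diameter pair, the two neighbors are forced to be distinct, and merging stops early if fewer than four clusters remain in $S$. The function names and the toy codebook are hypothetical.

```python
import math
from itertools import combinations


def centroid(cluster, codebook):
    """Centroid of a cluster given as a list of codeword indices."""
    dimension = len(codebook[0])
    return tuple(sum(codebook[w][i] for w in cluster) / len(cluster)
                 for i in range(dimension))


def dn_partition(codebook, n_iter):
    """Return the cluster set as a list of lists of codeword indices."""
    s = [[i] for i in range(len(codebook))]   # S: each codeword is a cluster
    s_prime = []                              # S': helper cluster set

    def dist(a, b):
        return math.dist(centroid(a, codebook), centroid(b, codebook))

    for _ in range(n_iter):
        if not s:
            s, s_prime = s_prime, []
        if len(s) < 4:
            break                             # assumption: no full merge possible
        # Diameter cluster pair: the two clusters of S that are farthest apart.
        d1, d2 = max(combinations(s, 2), key=lambda pair: dist(*pair))
        rest = [c for c in s if c is not d1 and c is not d2]
        n1 = min(rest, key=lambda c: dist(d1, c))     # neighbor of C_d1
        rest = [c for c in rest if c is not n1]
        n2 = min(rest, key=lambda c: dist(d2, c))     # neighbor of C_d2
        s = [c for c in rest if c is not n2]
        s_prime += [d1 + n1, d2 + n2]                 # Temp1 and Temp2
    return s + s_prime                        # remaining clusters of S' join S


# Toy one-dimensional codebook with eight codewords (hypothetical values).
toy = [(0.0,), (1.0,), (2.5,), (4.5,), (10.0,), (11.5,), (13.5,), (16.0,)]
two_iterations = dn_partition(toy, 2)
three_iterations = dn_partition(toy, 3)
```

On this toy codebook two iterations pair up adjacent codewords, and the third iteration starts a new pass that merges the pairs into two clusters of four codewords each.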
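[Figure 4 diagram: the cluster set S contains Cluster 1, whose codewords W_a, W_b, W_c, and W_d carry two-bit labels, and Cluster 2, whose codewords carry the labels "0" and "1". The cover ISF index I_a is searched for in S and replaced by the stego ISF index I_b according to the secret bits "01".]
Figure 4: Embedding two bits into one cover ISF index.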
Figure 4 is an example of embedding two secret bits into one cover ISF index. Let us assume that the cluster set $S$ contains two clusters and that the codeword indexed by $I_x$ is $W_x$; for example, $I_b$ indexes the codeword $W_b$. Hence the ISF index $I_a$ shown in Figure 4 will be replaced with $I_b$, which indexes the codeword $W_b$ with the same label as the secret bits "01".
3.3. Extracting Procedure. When the stego AMR-WB speech is transferred to the intended receiver, the stego indices may be obtained by parsing the AMR-WB speech stream and used to extract the embedded secret message. The message extraction procedure from the stego index $I_b$ is given below.
Step 1. Search the cluster set $S$, which is the same as that employed in the embedding procedure, for the cluster $C$ which contains the codeword $W_b$ referred to by the ISF index $I_b$.
Step 2. If there are $N$ codewords in total in $C$, the number of secret bits carried by $I_b$ is computed as $n = \lfloor \log_2 N \rfloor$.
[Figure 5 diagram: the stego ISF index I_b is searched for in the cluster set S, which contains Cluster 1 (codewords W_a, W_b, W_c, and W_d with two-bit labels) and Cluster 2 (codewords labelled "0" and "1"); the label "01" of W_b is read as the extracted bits.]
Figure 5: Extracting two bits from one stego ISF index.
Step 3. Read the label of $W_b$ as the extracted $n$ bits, which are appended to the secret message bit sequence.
Step 4. Repeat Steps 1–3 until all the secret bits are recovered.
Figure 5 is the corresponding example of extracting two secret bits from the stego index $I_b$ generated by the previous embedding instance shown in Figure 4. It can be easily seen that the extracted secret bits are identical to the embedded secret bits.
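The extraction steps can be sketched in the same spirit. As before, this is an illustration under assumptions: the receiver holds the same cluster set as the sender, the label of a codeword is its position within its cluster written as an $n$-bit binary string, and the message length in bits is known to the receiver. The function name and the example values are hypothetical.

```python
def extract(stego_indices, n_bits, cluster_set):
    """Recover an n_bits-long bit string from a sequence of stego ISF indices."""
    where = {idx: (cid, pos)
             for cid, cluster in enumerate(cluster_set)
             for pos, idx in enumerate(cluster)}
    bits = ""
    for i_b in stego_indices:
        if len(bits) >= n_bits:
            break                                      # Step 4: message complete
        cid, pos = where[i_b]                          # Step 1: find the cluster C
        n = len(cluster_set[cid]).bit_length() - 1     # Step 2: n = floor(log2 N)
        if n == 0:
            continue                                   # this index carries no bits
        bits += format(pos, "0{}b".format(n))          # Step 3: read the label
    return bits[:n_bits]


# Hypothetical cluster set shared by sender and receiver.
clusters = [[10, 11, 12, 13], [20, 21]]
message = extract([11, 21, 12], 5, clusters)
```

Extraction needs no information beyond the shared cluster set and the message length, because the label of the received codeword is itself the payload.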
4 Experimental Results and Analysis
In order to demonstrate the performance of the proposed method, the perceptual quality of the stego AMR-WB speech with the secret message embedded using our method is computed and compared with that of the stego AMR-WB speech generated with CNV and NID steganography. Moreover, the flexibility of the embedding capacity and the security regarding statistical detection are analyzed in detail.
4.1. Audio Database. The TIMIT acoustic-phonetic continuous speech corpus (https://catalog.ldc.upenn.edu/ldc93s1) is an audio database which contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences; all audio sentences are sampled at 16 kHz. In our experiments, 1000 audio sentences are randomly chosen from the TIMIT database. The average, maximum, and minimum lengths of the chosen audio sentences are 3.47 s, 3.96 s, and 3.12 s, respectively. All audio files are converted into AMR-WB format using the standard codec.
4.2. Speech Quality Evaluation. The perceptual evaluation of speech quality (PESQ) described in the ITU-T P.862 Recommendation [20] may be employed to evaluate speech quality. Moreover, according to ITU-T P.862.2 [21], the raw PESQ score can be converted to the mean opinion score-listening quality objective (MOS-LQO), which is more suitable for evaluating wideband speech. Hence MOS-LQO is applied in our experiments. The normal range of the MOS-LQO score is 1.017 to 4.549. The higher the score, the better the quality.
Figure 6 shows the MOS-LQO scores of the 1000 cover AMR-WB speeches in 23.85 kbit/s mode and of the corresponding stego AMR-WB speeches produced with three different codebook partition algorithms. Three progressive embedding rates, namely, 100 bps, 200 bps, and 300 bps, are employed in our experiments. The speech samples are indexed in the order of the MOS-LQO scores of our proposed method. It can be seen from Figure 6 that the overall scores of the stego AMR-WB speeches generated with our method are higher than those of the NID-based stego AMR-WB speeches, especially when the embedding rate is 200 bps or 300 bps. The MOS-LQO scores of the CNV-based stego AMR-WB speeches are only slightly higher than ours when the embedding rate is 100 bps, which means that there is no obvious discrepancy in speech quality between the two. Besides, at the higher embedding rates of 200 bps and 300 bps, the decrease in the MOS-LQO scores of our stego AMR-WB speeches is significantly smaller than that of NID-based steganography.
Figure 6: Comparison of MOS-LQO values (vertical axis: MOS-LQO score; horizontal axis: sample index) for 1000 samples between the standard AMR-WB codec, CNV-based steganography, NID-based steganography, and the proposed DN-based steganography. (a) The embedding rate is 100 bps. (b) The embedding rate is 200 bps. (c) The embedding rate is 300 bps.
Moreover, the average MOS-LQO scores of the cover AMR-WB speeches and of the stego AMR-WB speeches produced with the three codebook partition algorithms (CNV, NID, and DN) are given in Table 1 for four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) and three embedding rates (100 bps, 200 bps, and 300 bps). For the embedding rates of 200 bps and 300 bps, only the scores of the NID-based and DN-based methods are listed, because the embedding capacity of CNV-based steganography may not exceed 100 bps.
Table 1: MOS-LQO scores of the standard codec and of CNV-based, NID-based, and our proposed steganography in four rate modes and at three embedding rates. Values in parentheses are the relative change with respect to the standard codec.
Embedding rate   Method     12.65 kbit/s      15.85 kbit/s      19.85 kbit/s      23.85 kbit/s
n/a              Standard   2.929             3.073             3.199             3.269
100 bps          CNV        2.871 (−2.0%)     3.021 (−1.7%)     3.153 (−1.4%)     3.225 (−1.3%)
100 bps          NID        2.750 (−6.1%)     2.895 (−5.8%)     3.020 (−5.6%)     3.091 (−5.4%)
100 bps          Ours       2.864 (−2.2%)     3.010 (−2.0%)     3.139 (−1.9%)     3.216 (−1.6%)
200 bps          CNV        n/a               n/a               n/a               n/a
200 bps          NID        2.601 (−11.2%)    2.736 (−11.0%)    2.875 (−10.7%)    2.921 (−10.6%)
200 bps          Ours       2.807 (−4.2%)     2.955 (−3.8%)     3.084 (−3.6%)     3.164 (−3.2%)
300 bps          CNV        n/a               n/a               n/a               n/a
300 bps          NID        2.284 (−22.0%)    2.386 (−22.3%)    2.475 (−22.6%)    2.533 (−22.5%)
300 bps          Ours       2.699 (−7.9%)     2.841 (−7.5%)     2.971 (−7.1%)     3.046 (−6.8%)
When the embedding rate is 100 bps, which is almost the limit of CNV steganography, Table 1 shows that the mean MOS-LQO scores of our proposed method are only about 0.3% lower than those of CNV-based steganography. Such a slight decrease may be almost imperceptible to the human auditory system (HAS). Compared with NID-based steganography at the same rate, our method improves the mean MOS-LQO scores by approximately 3.8%. When the embedding rates are 200 bps and 300 bps, the scores of our approach are improved by about 7% and 15%, respectively, relative to those of NID-based steganography.
Furthermore, the experimental results of the four rate modes are analogous. The decrease in speech quality caused by NID-based steganography is more than twice that caused by DN-based steganography. The proposed method can obtain twice the embedding capacity of CNV-based steganography by sacrificing roughly 2% of speech quality on average. In addition, the DN-based method at an embedding rate of 300 bps still degrades speech quality less than the NID-based method does at 200 bps.
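The percentages quoted from Table 1 can be checked with a few lines of arithmetic. The sketch below recomputes the relative change with respect to the standard codec for the 23.85 kbit/s column only; the dictionary layout and the function name `relative_change` are illustrative choices, not part of the original experiments.

```python
# Mean MOS-LQO scores for the 23.85 kbit/s mode, copied from Table 1.
STANDARD = 3.269
SCORES = {
    ("CNV", 100): 3.225,
    ("NID", 100): 3.091,
    ("DN", 100): 3.216,
    ("NID", 200): 2.921,
    ("DN", 200): 3.164,
    ("NID", 300): 2.533,
    ("DN", 300): 3.046,
}


def relative_change(score: float, reference: float = STANDARD) -> float:
    """Return the change of `score` relative to `reference`, in percent."""
    return 100.0 * (score - reference) / reference


# Round to one decimal place, as in the table.
changes = {key: round(relative_change(value), 1) for key, value in SCORES.items()}

for (method, rate), pct in sorted(changes.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(f"{method:>3} at {rate} bps: {pct:+.1f}%")
```

For this column, the loss of the DN-based method at 300 bps (6.8%) is smaller than the loss of the NID-based method at 200 bps (10.6%), which is the comparison made in the last sentence above.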
4.3. Flexible Embedding Capacity. Compared with CNV-based steganography, our proposed method offers flexible embedding capacity to satisfy different practical demands. The steganographic capacity can be adjusted by changing the iteration parameter $N_i$. For different values of $N_i$, for example, $N_i = 32, 33, \ldots, 54$, the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.
From Figure 7, we can observe that the embedding rate increases significantly with $N_i$, while the MOS-LQO score goes down only slightly. For NID-based steganography, however, the MOS-LQO score declines rapidly as the embedding rate increases. Therefore, the proposed DN-based steganography can achieve higher embedding capacity with only a slight decrease in speech quality. For example, when $N_i = 48$, the size of each cluster in $S$ is equal to 4, and we can embed 4 bits per frame; that is, the embedding rate is 200 bps, whereas the CNV algorithm can embed at most 2 bits per frame (100 bps).
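The 200 bps figure follows from simple arithmetic, sketched below under stated assumptions: every cluster has the same size, two ISF codebooks carry payload in each frame (the second and third second-stage codebooks, as noted in Section 4.4), and AMR-WB produces 50 frames per second (20 ms frames). The function names are illustrative.

```python
FRAMES_PER_SECOND = 50  # AMR-WB frames are 20 ms long
CODEBOOKS_USED = 2      # assumption taken from Section 4.4: two codebooks carry payload


def bits_per_index(cluster_size: int) -> int:
    """n = floor(log2(N)) for a cluster of N codewords (N >= 1)."""
    return cluster_size.bit_length() - 1


def embedding_rate_bps(cluster_size: int) -> int:
    """Embedding rate when every cluster has `cluster_size` codewords."""
    return bits_per_index(cluster_size) * CODEBOOKS_USED * FRAMES_PER_SECOND


for size in (2, 4, 8):
    print(f"cluster size {size}: {embedding_rate_bps(size)} bps")
```

Clusters of size 4 give 2 bits per index, hence 4 bits per frame and 200 bps, which matches the example for $N_i = 48$.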
4.4. Resistibility of Statistical Steganalysis. Speech steganography aims to hide a secret message in cover speech without arousing suspicion. It is therefore very important for a steganographic method to resist statistical steganalysis, which is the technique of detecting the presence of a hidden message. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18], the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of a hidden message in given audio.
Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography. (a) Our proposed steganography (horizontal axis: times of cluster merging; vertical axes: embedding rate in bps and MOS-LQO score). (b) NID-based steganography (horizontal axis: number of sub-codebooks; vertical axes: embedding rate in bps and MOS-LQO score).
Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in 23.85 kbit/s mode. CNV-based steganography is not evaluated at 200 bps and 300 bps.
                          Training rate 0.4                    Training rate 0.5                    Training rate 0.6
Embedding rate   Method   Markov  MFCC  SS-QCCN  RS-QCCN       Markov  MFCC  SS-QCCN  RS-QCCN       Markov  MFCC  SS-QCCN  RS-QCCN
100 bps          CNV      49.8    49.8  43.7     49.0          50.1    50.2  44.0     49.2          50.0    50.5  41.9     50.0
100 bps          NID      51.0    60.1  42.2     50.0          50.1    60.9  42.9     48.7          52.1    59.8  41.8     49.4
100 bps          Ours     50.0    50.0  44.0     49.4          50.3    49.3  40.3     49.4          49.1    48.6  41.8     43.3
200 bps          NID      53.5    74.5  46.9     50.0          53.3    76.2  47.6     50.0          53.6    75.8  44.4     50.1
200 bps          Ours     51.0    48.3  45.2     50.0          49.8    48.7  42.2     50.0          50.5    48.6  45.0     50.0
300 bps          NID      54.8    74.6  49.3     50.0          56.3    77.2  50.0     50.0          55.4    78.3  50.5     50.6
300 bps          Ours     52.4    49.7  47.9     50.0          52.8    60.9  48.2     50.0          53.8    50.1  46.6     50.0
In our experiments, the sentences chosen from the TIMIT database, as stated in Section 4.1, are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. Secret messages are then embedded into each cover AMR-WB speech at different embedding rates (100 bps, 200 bps, and 300 bps) by CNV-based, NID-based, and DN-based steganography. The 200 bps and 300 bps rates are omitted for CNV-based steganography because of its limited embedding capacity. Seven stego speech sets are thus generated: one for the CNV-based method and three each for the NID-based and DN-based methods. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.
In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, namely, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stego speech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and a radial basis function (RBF) kernel and the grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set be negatives and those in the stego speech set be positives. The accuracy is then defined as follows:
$$\text{Accuracy} = \frac{1}{2} \times \left( \frac{\text{TP}}{\text{TP} + \text{FN}} + \frac{\text{TN}}{\text{FP} + \text{TN}} \right), \tag{5}$$
where TP, TN, FN, and FP are the numbers of true positives, true negatives, false negatives, and false positives, respectively.
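Equation (5) is the mean of the true positive rate and the true negative rate, often called balanced accuracy. A minimal sketch follows; the function name and the example counts are illustrative and do not come from the experiments.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Accuracy as in (5): the mean of TP/(TP+FN) and TN/(FP+TN)."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both the stego and the cover class need at least one sample")
    true_positive_rate = tp / (tp + fn)   # stego samples detected as stego
    true_negative_rate = tn / (fp + tn)   # cover samples recognised as cover
    return 0.5 * (true_positive_rate + true_negative_rate)


# A detector that guesses at random on 600 stego and 600 cover test samples.
print(balanced_accuracy(tp=300, fn=300, tn=300, fp=300))
```

A value close to 0.5 therefore means that the steganalyzer does no better than guessing, which is how the entries of Table 2 should be read.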
The steganalytic results are given in Table 2. When the embedding rate is 100 bps, the accuracy of detecting the CNV-based and DN-based methods is almost the same, about 50%, while that of detecting NID-based steganography rises to about 60% when the MFCC-based steganalytic method is applied. Moreover, the accuracy of detecting the NID-based hiding method increases noticeably when the embedding rate rises to 200 bps or 300 bps and Liu et al.'s methods (i.e., the Markov-based and MFCC-based steganalytic methods) are applied. In contrast, the accuracy of steganalyzing our proposed DN-based steganography stays at roughly the same level of 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even at higher embedding rates.
Figure 8: The correlation index of 1000 AMR-WB speeches (vertical axis: correlation index; horizontal axis: edge), where the interframe edge $ii$ connects two vertices $V_i[k]$ and $V_i[k+1]$ in two neighboring frames and the intraframe edge $ij'$ connects two vertices $V_i[k]$ and $V_j[k]$ in the same frame.
According to the definition of the correlation index given in [18], the correlation indices of 1000 AMR-WB speeches randomly selected from TIMIT are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely, SS-QCCN and RS-QCCN, can be constructed, as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography, and the steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN stays at or below roughly 50% for all of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that only the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while none of the edges involving them are utilized in Li et al.'s steganalytic method except for the edge "33" in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing the edges "22", "33", and "23′") targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis, according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].
To visualize the detection performance, receiver operating characteristic (ROC) curves for steganalyzing CNV-based steganography at a 100 bps embedding rate and NID-based and DN-based steganography at 100 bps, 200 bps, and 300 bps embedding rates are provided in Figure 10. (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of the embedding capacity.) Figure 10 shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of hidden messages embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.
Figure 9: Two AMR-WB strong correlation network models. (a) SS-QCCN. (b) RS-QCCN.
5. Conclusion
The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE, so AMR-WB speech may be a good candidate cover medium for speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed, and the experimental results demonstrate its effectiveness. The main contributions of this paper are as follows:
(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality and better performance against statistical steganalysis than the NID-based method.
(2) Flexible embedding capacity may be easily achieved with different numbers of cluster-merging iterations. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.
Figure 10: ROC curves (true positive rate versus false positive rate) for steganalysis of CNV-based, NID-based, and our proposed steganography (50% training rate). (a) Markov (TIMIT, 100 bps). (b) MFCC (TIMIT, 100 bps). (c) Markov (TIMIT, 200 bps). (d) MFCC (TIMIT, 200 bps). (e) Markov (TIMIT, 300 bps). (f) MFCC (TIMIT, 300 bps).
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.
References
[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.
[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.
[3] K. Gopalan, "Audio steganography using bit modification," in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.
[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, "Audio steganography by amplitude or phase modification," in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.
[5] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.
[6] L. Liu, M. Li, Q. Li, and Y. Liang, "Perceptually transparent information hiding in G.729 bitstream," in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.
[7] T. Xu and Z. Yang, "Simple and effective speech steganography in G.723.1 low-rate codes," in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.
[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, "MELPe coded speech hiding on enhanced full rate compressed domain," in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.
[9] A. Nishimura, "Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec," in Proceedings of the IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.
[10] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.
[11] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.
[12] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.
[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.
[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, "A new scheme for covert communication via 3G encoded speech," Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.
[15] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.
[16] J. Liu, H. Tian, J. Lu, and Y. Chen, "Neighbor-index-division steganography based on QIM method for G.723.1 speech streams," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.
[17] Q. Liu, A. H. Sung, and M. Qiao, "Derivative-based audio steganalysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.
[18] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM steganography in low-bit-rate speech signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.
[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.
[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.
[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.
[22] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
Figure 1: Diagram of the proposed method.
with the quantization index of the neighbor codeword $N(X)$, which belongs to the opposite subcodebook.
The key characteristic of CNV-based steganography is that the distortion is bounded even in the worst case. However, the embedding capacity is limited, as analyzed experimentally in Section 4. Moreover, the number of possible combinations of flipping coefficients, which determine whether the labels in a subgraph will be flipped, is large. Extra information about the flipping process must be transmitted to the receiver, and thus the effective embedding capacity may be decreased further.
2.3. Neighbor Index Division. NID assumes that codewords with neighboring indices (i.e., neighboring positions) in a codebook are close together. Hence the codewords in a codebook can be easily separated into subcodebooks according to their indices instead of the Euclidean distance. Specifically, select an appropriate integer $k$ according to the demand for embedding capacity, and label the $i$th codeword with the digit $(i - 1) \bmod k$. Then collect all the codewords with the same label into a subcodebook to obtain $k$ different subcodebooks.
To make full use of the embedding capacity, the binary secret message should be transformed into $k$-ary digits, denoted by $m$ ($m \in \{0, 1, \ldots, k - 1\}$). When the codeword related to the cover quantization index belongs to a subcodebook whose label differs from the $k$-ary digit $m$ to be embedded, this index should be substituted with that of the closest codeword in the corresponding subcodebook $m$.
NID-based steganography is an information hiding method based on neighbor-index codebook partition, whose embedding capacity may be controlled by the number of subcodebooks $k$. However, as illustrated in [16], only about 34% of the pairs of neighbor-index codewords happen to be pairs of neighbor-vertex codewords, and the mean distance between neighbor-index codewords is clearly larger than that between neighbor-vertex codewords. Therefore, the distortion induced by NID-based steganography may be relatively large, which is confirmed by the experimental results provided in Section 4.
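The labelling and substitution rules of NID described above can be sketched as follows. This is only an illustration under stated assumptions: indices are 0-based in the code (so the label of index `j` is `j % k`, which equals $(i - 1) \bmod k$ for the 1-based position $i = j + 1$), the toy codebook is invented, and the function names `nid_embed` and `nid_extract` are hypothetical rather than taken from [16].

```python
import math
from typing import List, Sequence

Codeword = Sequence[float]


def nid_labels(num_codewords: int, k: int) -> List[int]:
    """Label the i-th codeword (1-based) with (i - 1) mod k."""
    return [(i - 1) % k for i in range(1, num_codewords + 1)]


def nid_embed(codebook: Sequence[Codeword], cover_index: int, digit: int, k: int) -> int:
    """Return the 0-based index that carries the k-ary `digit`."""
    labels = nid_labels(len(codebook), k)
    if labels[cover_index] == digit:
        return cover_index  # the cover index already carries the digit
    # Otherwise pick the closest codeword inside subcodebook `digit`.
    candidates = [j for j, label in enumerate(labels) if label == digit]
    return min(candidates, key=lambda j: math.dist(codebook[cover_index], codebook[j]))


def nid_extract(stego_index: int, k: int) -> int:
    """The receiver only needs the index and k to recover the digit."""
    return stego_index % k


# Toy one-dimensional codebook with six codewords, split into k = 3 subcodebooks.
toy_codebook = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
stego = nid_embed(toy_codebook, cover_index=3, digit=2, k=3)
print(stego, nid_extract(stego, 3))
```

Note that the substitute is chosen by distance only within subcodebook $m$; nothing guarantees that this codeword is close to the cover codeword, which is the source of the distortion discussed above.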
3. Proposed Method
The diagram of the proposed method is shown in Figure 1. Based on the DN partition of the codebooks described in Section 2.1, a secret message can be embedded into an AMR-WB speech file. After the stego AMR-WB speech file is received, the embedded secret message can be extracted without errors. At the same time, the decoded speech is obtained without perceptible distortion. In the following, the diameter-neighbor (DN) codebook partition algorithm is introduced first. Then the embedding and extraction procedures of our proposed method are described.
3.1. Codebook Partition. A codebook may be viewed as a list of isolated code vectors (i.e., codewords) in a multidimensional space. The codebook partition algorithm used for audio steganography divides the codebook into several clusters, within each of which the codewords can replace one another without causing perceptible distortion.
Let $B$ denote the original codebook with $N_b$ codewords, and let $C$ denote a cluster with $N_c$ codewords $W_t$ ($t = 1, 2, \ldots, N_c$). The centroid $G$ of a cluster $C$ is defined as follows:
$$G(i) = \frac{1}{N_c} \sum_{t=1}^{N_c} W_t(i), \tag{1}$$
where $G(i)$ and $W_t(i)$ are the $i$th components of $G$ and $W_t$, respectively.
The centroid $G$ (the average code vector) is used to represent the corresponding cluster $C$; hence the cluster $C$ may also be considered as a vector in the multidimensional codebook space. To describe the similarity between two clusters $C_1$ and $C_2$, the Euclidean distance between them is defined as follows:
$$D(C_1, C_2) = \sqrt{\sum_{i=1}^{n} \left( G_1(i) - G_2(i) \right)^2}, \tag{2}$$
where $G_1$ and $G_2$ are the centroids of the two clusters $C_1$ and $C_2$, $n$ is the dimension of a codeword, and $G_1(i)$ and $G_2(i)$ are the $i$th components of $G_1$ and $G_2$, respectively.
Let $S$ denote a cluster set. The diameter of $S$ is defined as the maximal Euclidean distance $D_m$ over all cluster pairs in $S$; that is,
$$D(C_p, C_q) \le D_m, \quad \forall p, q = 1, 2, \ldots, |S|, \tag{3}$$
where $|S|$ is the number of clusters within the cluster set $S$. The cluster pair with the maximal Euclidean distance $D_m$, called the diameter cluster pair, is denoted by $(C_{d1}, C_{d2})$. The neighbor of a cluster $C$ in $S$, that is, the cluster in $S$ other than $C$ that is closest to $C$, is represented by $N(C, S)$; then we have
$$D(C, N(C, S)) \le D(C, C_p), \quad \forall p = 1, 2, \ldots, |S|. \tag{4}$$
Figure 2: Diagram of our proposed codebook partition.
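The quantities defined in (1)–(4) translate directly into a few helper functions. The sketch below is a plain-Python illustration, assuming that a cluster is a list of equal-length tuples and that a cluster set is a list of clusters; the function names and the toy data are illustrative only.

```python
import math
from itertools import combinations
from typing import List, Tuple

Codeword = Tuple[float, ...]
Cluster = List[Codeword]


def centroid(cluster: Cluster) -> Codeword:
    """Equation (1): component-wise mean of the codewords in a cluster."""
    size = len(cluster)
    return tuple(sum(w[i] for w in cluster) / size for i in range(len(cluster[0])))


def cluster_distance(c1: Cluster, c2: Cluster) -> float:
    """Equation (2): Euclidean distance between the two centroids."""
    return math.dist(centroid(c1), centroid(c2))


def diameter_pair(cluster_set: List[Cluster]) -> Tuple[int, int]:
    """Positions of the diameter cluster pair, i.e. the pair that attains D_m in (3)."""
    return max(
        combinations(range(len(cluster_set)), 2),
        key=lambda pq: cluster_distance(cluster_set[pq[0]], cluster_set[pq[1]]),
    )


def neighbor(position: int, cluster_set: List[Cluster]) -> int:
    """Position of N(C, S): the closest cluster other than C itself, as in (4)."""
    others = [p for p in range(len(cluster_set)) if p != position]
    return min(others, key=lambda p: cluster_distance(cluster_set[position], cluster_set[p]))


# Toy two-dimensional cluster set with four clusters.
toy_set = [[(0.0, 0.0)], [(1.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)], [(20.0, 0.0)]]
print(centroid(toy_set[2]), diameter_pair(toy_set), neighbor(0, toy_set), neighbor(3, toy_set))
```

In the toy set, the diameter pair consists of the first and last clusters, and their neighbors are the second and third clusters, respectively.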
Figure 2 illustrates the proposed DN codebook partition algorithm, and its detailed procedure is given in Algorithm 1. The original codebook $B$ is divided into $|S|$ clusters by iteratively merging the diameter cluster pair with their respective neighbors. An iteration parameter $N_i$ is applied to obtain flexible embedding capacity by controlling the merging procedure. The relationship between $N_i$ and the embedding capacity is discussed in Section 4.3.
Figure 3 is provided as an example to illustrate the proposed codebook partition algorithm. A white circle denotes a codeword. A shaded oval denotes a codeword and its neighbor in $S$ being processed, while an unshaded oval represents a cluster in $S'$ that has been formed. The "0", "1", "00", "01", "10", or "11" in a circle is the label of a codeword in the cluster. The cross "×" marks the centroid of the cluster it belongs to, and a line represents the diameter of a cluster set. The first to third merging iterations are shown in Figures 3(a)–3(c), respectively. The fourth merging iteration comprises Figures 3(d) and 3(e), and Figure 3(f) demonstrates the labelling of the codewords.
3.2. Embedding Procedure. In our proposed method, the ISF indices corresponding to the codewords in the codebook are first obtained by parsing the host AMR-WB speech. The ISF indices are then employed to embed the secret message based on the codebook partition. Generally, the codewords in the same cluster as the codeword referred to by $I_a$ are considered to be replaceable with each other. According to the secret message to be embedded, $I_a$ may be substituted by the index of one of the other codewords within the same cluster. The number of secret message bits that can be embedded depends on the size of the specific cluster. The embedding procedure is as follows.
Step 1. Search the cluster set $S$ for the cluster $C$ that contains the codeword referred to by the ISF index $I_a$.
Step 2. If there are $N$ codewords in $C$, the number of secret bits that can be embedded into $I_a$ is calculated as $n = \lfloor \log_2 N \rfloor$.
Step 3. Read $n$ not-yet-embedded bits, denoted by $m$, from the secret message. $I_a$ is replaced with $I_b$, which indexes the codeword with the same label as $m$.
Step 4. Repeat Steps 1–3 until all the secret bits are embedded.
Figure 3: An example of our proposed codebook partition. (a) 1st iteration ($N_i = 4$). (b) 2nd iteration ($N_i = 3$). (c) 3rd iteration ($N_i = 2$). (d) 4th iteration ($S = S'$, $S'$.clear()). (e) 4th iteration ($N_i = 1$). (f) Labelling.
Input: Codebook $B$, iteration parameter $N_i$
Output: Cluster set $S$
/* S' is a helper cluster set */
S'.clear()
S.clear()
/* Each codeword is taken as an initial cluster */
for i = 0; i < N_b; ++i do
    S.push(C_i)
end
/* Iterative merging */
while N_i > 0 do
    if S is empty then
        S = S'
        S'.clear()
    end
    (C_d1, C_d2) = argmax over i, j in {1, 2, ..., |S|} of D(C_i, C_j)
    Temp1 = C_d1 ∪ N(C_d1, S)
    Temp2 = C_d2 ∪ N(C_d2, S)
    S'.push(Temp1)
    S'.push(Temp2)
    S.remove(C_d1)
    S.remove(C_d2)
    S.remove(N(C_d1, S))
    S.remove(N(C_d2, S))
    N_i = N_i − 1
end
/* Put the remaining clusters in S' into S */
for iter = S'.begin(); iter < S'.end(); ++iter do
    S.push(*iter)
end
return S
Algorithm 1: DN-based codebook partition algorithm.
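Algorithm 1 can be turned into a short runnable sketch. The version below is an illustration, not the authors' implementation, and it makes two assumptions that the pseudocode leaves open: the neighbor of $C_{d1}$ is chosen first and removed before the neighbor of $C_{d2}$ is searched (so the two never coincide, and neither can be the other diameter cluster), and merging stops early when fewer than four clusters are available for a full merge step. The names `dn_partition`, `centroid`, and `distance` are illustrative.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Codeword = Tuple[float, ...]
Cluster = List[Codeword]


def centroid(cluster: Cluster) -> Codeword:
    size = len(cluster)
    return tuple(sum(w[i] for w in cluster) / size for i in range(len(cluster[0])))


def distance(c1: Cluster, c2: Cluster) -> float:
    return math.dist(centroid(c1), centroid(c2))


def dn_partition(codebook: Sequence[Codeword], n_iter: int) -> List[Cluster]:
    s: List[Cluster] = [[w] for w in codebook]  # every codeword starts as its own cluster
    s_prime: List[Cluster] = []                 # helper set holding freshly merged clusters
    while n_iter > 0:
        if not s:
            s, s_prime = s_prime, []            # start the next merging round
        if len(s) < 4:
            break                               # assumption: no full merge step is possible
        # Diameter cluster pair: the two clusters that are farthest apart.
        d1, d2 = max(combinations(range(len(s)), 2),
                     key=lambda pq: distance(s[pq[0]], s[pq[1]]))
        rest = [p for p in range(len(s)) if p not in (d1, d2)]
        n1 = min(rest, key=lambda p: distance(s[d1], s[p]))  # neighbor of C_d1
        rest.remove(n1)
        n2 = min(rest, key=lambda p: distance(s[d2], s[p]))  # neighbor of C_d2
        rest.remove(n2)
        s_prime.append(s[d1] + s[n1])
        s_prime.append(s[d2] + s[n2])
        s = [s[p] for p in rest]
        n_iter -= 1
    return s + s_prime                          # remaining clusters plus merged clusters


# Toy one-dimensional codebook with eight codewords.
toy = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (13.0,)]
for clusters in (dn_partition(toy, 2), dn_partition(toy, 3)):
    print([sorted(w[0] for w in c) for c in clusters])
```

With two iterations, the eight toy codewords end up in four clusters of size 2; a third iteration starts a new round and yields two clusters of size 4, mirroring how larger values of $N_i$ give larger clusters and therefore more bits per index.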
Figure 4: Embedding two bits into one cover ISF index.
Figure 4 is an example of embedding two secret bits into one cover ISF index. Assume that the cluster set $S$ contains two clusters and that the codeword indexed by $I_x$ is $W_x$; for example, $I_b$ indexes the codeword $W_b$. The ISF index $I_a$ shown in Figure 4 is then replaced with $I_b$, which indexes the codeword $W_b$ with the same label as the secret bits "01".
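The embedding steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: the cluster set is given as lists of ISF indices, the label of a codeword is simply its position inside its cluster written with $n$ bits, cluster sizes are powers of two, and a message that is too short for the last index is padded with zeros. The concrete index values and the names `CLUSTERS`, `find_cluster`, and `embed_index` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical cluster set: one cluster of four indices and one of two indices.
CLUSTERS: List[List[int]] = [[5, 9, 12, 30], [7, 21]]


def find_cluster(index: int, clusters: List[List[int]]) -> List[int]:
    """Step 1: locate the cluster that contains the given ISF index."""
    for cluster in clusters:
        if index in cluster:
            return cluster
    raise ValueError(f"index {index} is not covered by the cluster set")


def embed_index(cover_index: int, bits: str, clusters: List[List[int]]) -> Tuple[int, str]:
    """Embed the leading bits into one index; return (stego index, remaining bits)."""
    cluster = find_cluster(cover_index, clusters)
    n = len(cluster).bit_length() - 1          # Step 2: n = floor(log2(N))
    if n == 0:
        return cover_index, bits               # a singleton cluster carries no payload
    chunk = bits[:n].ljust(n, "0")             # Step 3 (assumption: zero padding at the end)
    return cluster[int(chunk, 2)], bits[n:]    # codeword whose label equals the chunk


stego_index, remaining = embed_index(5, "011", CLUSTERS)
print(stego_index, remaining)
```

Here the cover index 5 lies in the four-element cluster, so two bits are consumed and the index is replaced by the member labelled "01", which corresponds to the replacement of $I_a$ by $I_b$ in Figure 4.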
3.3. Extracting Procedure. When the stego AMR-WB speech is transferred to the intended receiver, the stego indices may be obtained by parsing the AMR-WB speech stream and used to extract the embedded secret message. The procedure for extracting the message from the stego index $I_b$ is given below.
Step 1. Search the cluster set $S$, which is the same as that employed in the embedding procedure, for the cluster $C$ that contains the codeword $W_b$ referred to by the ISF index $I_b$.
Step 2. If there are $N$ codewords in total in $C$, the number of secret bits carried by $I_b$ is computed as $n = \lfloor \log_2 N \rfloor$.
Step 3 Read the label of119882119887 as the extracted 119899 bits which areappended to the secret message bit sequence
Step 4 Repeat Steps 1ndash3 until all the secret bits are recovered
Figure 5 is the corresponding example of extracting two secret bits from the stego index $I_b$ generated by the previous embedding instance shown in Figure 4. It can easily be seen that the extracted secret bits are identical to the embedded secret bits.
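The extraction steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: clusters are modelled as lists of integer ISF indices, the label of a codeword is assumed to be its position inside its cluster written with $n$ bits, and the receiver is assumed to know the message length `n_bits`. The names `extract`, `clusters`, and `n_bits` are hypothetical.

```python
def extract(stego_indices, clusters, n_bits):
    """Read n = floor(log2 N) label bits from each stego ISF index (sketch)."""
    # Map every index to the cluster that contains it (Step 1).
    home = {idx: cluster for cluster in clusters for idx in cluster}
    bits = ""
    for ib in stego_indices:
        if len(bits) >= n_bits:
            break                              # all secret bits recovered (Step 4)
        cluster = home[ib]
        n = len(cluster).bit_length() - 1      # n = floor(log2 N) (Step 2)
        if n == 0:
            continue                           # single-codeword cluster carries no bits
        label = cluster.index(ib)              # assumed label: position in the cluster
        bits += format(label, f"0{n}b")        # append the n label bits (Step 3)
    return bits[:n_bits]


# Hypothetical partition: one cluster of four indices, one cluster of two.
clusters = [[10, 11, 12, 13], [20, 21]]
recovered = extract([11, 21, 12], clusters, n_bits=5)
```

In this toy partition the index 11 sits at position 1 of a four-codeword cluster, so it contributes the two bits "01", which mirrors the situation drawn in Figure 5.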
4. Experimental Results and Analysis

In order to demonstrate the performance of the proposed method, the perceptual quality of the stego AMR-WB speech with a secret message embedded using our method is computed and compared to that of the stego AMR-WB speech generated with CNV and NID steganography. Moreover, the flexibility of embedding capacity and the security regarding statistical detection are analyzed in detail.
4.1. Audio Database. The TIMIT acoustic-phonetic continuous speech corpus (https://catalog.ldc.upenn.edu/ldc93s1) is an audio database that contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences; all audio sentences are sampled at 16 kHz. In our experiments, 1000 audio sentences are randomly chosen from the TIMIT database. The average, maximum, and minimum lengths of the chosen audio sentences are 3.47 s, 3.96 s, and 3.12 s, respectively. All audio files are converted into AMR-WB format using the standard codec.
4.2. Speech Quality Evaluation. The perceptual evaluation of speech quality (PESQ) described in the ITU-T P.862 Recommendation [20] may be employed to evaluate speech quality. Moreover, according to ITU-T P.862.2 [21], the raw PESQ score can be converted to the mean opinion score-listening quality objective (MOS-LQO), which is more suitable for evaluating wideband speech. Hence, MOS-LQO is applied in our experiments. The normal range of the MOS-LQO score is 1.017 to 4.549. The higher the score, the better the quality.
Figure 6 shows the MOS-LQO scores of the 1000 cover AMR-WB speeches in 23.85 kbit/s mode and the corresponding stego AMR-WB speeches using three different codebook partition algorithms. Three progressive embedding rates, that is, 100 bps, 200 bps, and 300 bps, are employed in our experiments. The indices of the speech samples are sorted according to the MOS-LQO scores of our proposed method. It can be seen from Figure 6 that the overall scores of the stego AMR-WB speeches generated with our method are higher than those of the NID-based stego AMR-WB speeches, especially when the embedding rates are 200 bps and 300 bps. The MOS-LQO scores of the CNV-based stego AMR-WB speeches are only slightly higher than ours when the embedding rate is 100 bps, which means there are no obvious discrepancies in speech quality between them. Besides, when a high embedding rate, that is, 200 bps or 300 bps, is used, the decrease in the MOS-LQO scores of our stego AMR-WB speeches is significantly smaller than that of NID-based steganography.

Figure 6: Comparisons of MOS-LQO values for 1000 samples between the standard AMR-WB codec, CNV-based steganography, NID-based steganography, and the proposed DN-based steganography. Panels: (a) the embedding rate is 100 bps; (b) the embedding rate is 200 bps; (c) the embedding rate is 300 bps. Axes: sample index (horizontal), MOS-LQO score (vertical).
Moreover, the average MOS-LQO scores of the cover AMR-WB speeches and the stego AMR-WB speeches with three different codebook partition algorithms, that is, CNV, NID, and DN, for four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) and three embedding rates (100 bps, 200 bps, and 300 bps) are given in Table 1. For the embedding rates of 200 bps and 300 bps, only the MOS-LQO scores of the NID-based and DN-based steganographic methods are given in Table 1, because the embedding capacity of CNV-based steganography may not be larger than 100 bps.

Table 1: MOS-LQO scores of the standard codec, CNV-based, NID-based, and our proposed steganography in four different rate modes and three embedding rates. Values in parentheses are the relative change (%) with respect to the standard codec.

Embedding rate  Method    12.65 kbit/s    15.85 kbit/s    19.85 kbit/s    23.85 kbit/s
-               Standard  2.929           3.073           3.199           3.269
100 bps         CNV       2.871 (-2.0)    3.021 (-1.7)    3.153 (-1.4)    3.225 (-1.3)
100 bps         NID       2.750 (-6.1)    2.895 (-5.8)    3.020 (-5.6)    3.091 (-5.4)
100 bps         Ours      2.864 (-2.2)    3.010 (-2.0)    3.139 (-1.9)    3.216 (-1.6)
200 bps         CNV       -               -               -               -
200 bps         NID       2.601 (-11.2)   2.736 (-11.0)   2.875 (-10.7)   2.921 (-10.6)
200 bps         Ours      2.807 (-4.2)    2.955 (-3.8)    3.084 (-3.6)    3.164 (-3.2)
300 bps         CNV       -               -               -               -
300 bps         NID       2.284 (-22.0)   2.386 (-22.3)   2.475 (-22.6)   2.533 (-22.5)
300 bps         Ours      2.699 (-7.9)    2.841 (-7.5)    2.971 (-7.1)    3.046 (-6.8)
When the embedding rate is 100 bps, which is almost the limit of CNV steganography, we can see from Table 1 that the mean MOS-LQO scores of our proposed method are only about 0.3% worse than those of CNV-based steganography. This slight decrease may be almost imperceptible to the human auditory system (HAS). There are also significant increases of approximately 3.8% in the mean MOS-LQO scores when our presented method is compared to NID-based steganography. It can further be observed that, when the embedding rates are 200 bps and 300 bps, the scores of our approach are improved by about 7% and 15%, respectively, in contrast to those of NID-based steganography.
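As a worked check of how the parenthesized entries of Table 1 relate to the raw scores, the following small sketch recomputes the relative MOS-LQO change for the 23.85 kbit/s mode at 100 bps. The function name `relative_change_percent` is hypothetical, and the interpretation of the parenthesized values as percentages relative to the standard codec is an assumption that happens to agree with the listed numbers.

```python
def relative_change_percent(stego_score, cover_score):
    """Relative MOS-LQO change of a stego score with respect to the cover score."""
    return 100.0 * (stego_score - cover_score) / cover_score


cover = 3.269  # standard codec, 23.85 kbit/s mode (Table 1)
scores_100bps = {"CNV": 3.225, "NID": 3.091, "Ours": 3.216}

# Rounded to one decimal place, as in the table.
changes = {name: round(relative_change_percent(score, cover), 1)
           for name, score in scores_100bps.items()}
print(changes)
```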
Furthermore, we can also see that the experimental results of the four rate modes are analogous. The decrease in speech quality caused by NID-based steganography is more than twice that caused by DN-based steganography. The proposed method can obtain twice the embedding capacity of CNV-based steganography by sacrificing about 2% of speech quality on average. In addition, the decline in speech quality of the proposed DN-based method at 300 bps (about 7% to 8%) is smaller than that of the NID-based method at 200 bps (about 11%).
4.3. Flexible Embedding Capacity. Compared to CNV-based steganography, our proposed method offers flexible embedding capacity to satisfy different practical demands. The steganographic capacity can be adjusted by changing the iteration parameter $N_i$. For different values of $N_i$, for example, $N_i = 32, 33, \ldots, 54$, the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.
From Figure 7, we can observe that the embedding rate increases significantly with $N_i$, while the MOS-LQO score goes down only slightly. For NID-based steganography, however, the MOS-LQO score declines rapidly as the embedding rate increases. Therefore, the proposed DN-based steganography can achieve higher embedding capacity with only a slight decrease in speech quality. For example, when $N_i = 48$, the size of each cluster in $S$ is equal to 4 and we can embed 4 bits per frame, that is, the embedding rate is 200 bps, whereas the CNV algorithm can embed at most 2 bits per frame (100 bps).
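The arithmetic behind the 200 bps figure can be made explicit with a short sketch. It assumes that two ISF indices per frame carry secret bits (the second and third codebooks of the second stage, as stated in Section 4.4), that every cluster has the same size, and that AMR-WB produces 50 frames per second (20 ms frames). The function name `embedding_rate_bps` is hypothetical.

```python
FRAMES_PER_SECOND = 50  # AMR-WB encodes one frame every 20 ms


def embedding_rate_bps(cluster_sizes_per_frame):
    """Bits per second when each listed cluster size N carries floor(log2 N) bits."""
    bits_per_frame = sum(n.bit_length() - 1 for n in cluster_sizes_per_frame)
    return bits_per_frame * FRAMES_PER_SECOND


# Two embedding indices per frame, each falling in a cluster of four codewords.
rate_n48 = embedding_rate_bps([4, 4])   # 2 + 2 = 4 bits per frame
# Clusters of two codewords give 1 + 1 = 2 bits per frame.
rate_pairs = embedding_rate_bps([2, 2])
```

With clusters of size 4 the sketch yields 4 bits per frame and therefore 200 bps, matching the example for $N_i = 48$; clusters of size 2 yield the 100 bps quoted for CNV.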
4.4. Resistibility of Statistical Steganalysis. Speech steganography aims to hide a secret message in cover speech without arousing suspicion. It is very important for a steganographic method to resist statistical steganalysis, which is the technique of detecting the presence of a hidden message. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18] the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of a hidden message in given audio.

Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography. Panels: (a) our proposed steganography (horizontal axis: times of cluster merging); (b) NID-based steganography (horizontal axis: number of sub-codebooks). Vertical axes: embedding rate (bps) and MOS-LQO score.

Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in 23.85 kbit/s mode.

                     Training rate 0.4                  Training rate 0.5                  Training rate 0.6
Rate     Method  Markov  MFCC  SS-QCCN  RS-QCCN    Markov  MFCC  SS-QCCN  RS-QCCN    Markov  MFCC  SS-QCCN  RS-QCCN
100 bps  CNV     49.8    49.8  43.7     49.0       50.1    50.2  44.0     49.2       50.0    50.5  41.9     50.0
100 bps  NID     51.0    60.1  42.2     50.0       50.1    60.9  42.9     48.7       52.1    59.8  41.8     49.4
100 bps  Ours    50.0    50.0  44.0     49.4       50.3    49.3  40.3     49.4       49.1    48.6  41.8     43.3
200 bps  CNV     -       -     -        -          -       -     -        -          -       -     -        -
200 bps  NID     53.5    74.5  46.9     50.0       53.3    76.2  47.6     50.0       53.6    75.8  44.4     50.1
200 bps  Ours    51.0    48.3  45.2     50.0       49.8    48.7  42.2     50.0       50.5    48.6  45.0     50.0
300 bps  CNV     -       -     -        -          -       -     -        -          -       -     -        -
300 bps  NID     54.8    74.6  49.3     50.0       56.3    77.2  50.0     50.0       55.4    78.3  50.5     50.6
300 bps  Ours    52.4    49.7  47.9     50.0       52.8    60.9  48.2     50.0       53.8    50.1  46.6     50.0
In our experiments, the sentences chosen from the "TIMIT" database, as stated in Section 4.1, are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. Then a secret message is embedded into each cover AMR-WB speech with different embedding rates, that is, 100 bps, 200 bps, and 300 bps, by CNV-based, NID-based, and DN-based steganography. Of course, 200 bps and 300 bps are omitted for CNV-based steganography because of its limited embedding capacity. Thus seven stego speech sets are generated, among which one set is related to the CNV-based steganographic method and three sets each are associated with NID-based and DN-based steganography. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.
In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, that is, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stego speech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and the radial basis function (RBF) kernel and the grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set denote negatives and those in the stego speech set stand for positives. Hence, the accuracy may be defined as follows:
$$\text{Accuracy} = \frac{1}{2} \times \left( \frac{TP}{TP + FN} + \frac{TN}{FP + TN} \right), \qquad (5)$$
where TP, TN, FN, and FP denote the numbers of true positives, true negatives, false negatives, and false positives, respectively.
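Equation (5) is the average of the true positive rate and the true negative rate, so a detector that guesses at random scores about 0.5 regardless of how the classes are balanced. A minimal sketch follows; the function name `accuracy` and the example counts are illustrative assumptions, not results from the experiments.

```python
def accuracy(tp, fn, tn, fp):
    """Average of true positive rate and true negative rate, as in (5)."""
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (fp + tn)
    return 0.5 * (true_positive_rate + true_negative_rate)


# Hypothetical counts: a detector that is right half of the time on each class.
chance_level = accuracy(tp=300, fn=300, tn=300, fp=300)
# Hypothetical counts: 80% of stego and 60% of cover samples classified correctly.
better = accuracy(tp=80, fn=20, tn=60, fp=40)
```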
The steganalytic results are given in Table 2. It can be seen that, when the embedding rate is 100 bps, the accuracy of detecting both CNV-based and DN-based methods is almost the same, about 50%, while that of detecting NID-based steganography increases to 60% when the MFCC-based steganalytic method is applied. Moreover, there is an apparent increase in the accuracy of detecting the NID-based hiding method as the embedding rate increases to 200 bps or 300 bps when Liu et al.'s methods (i.e., Markov and MFCC-based steganalytic methods) are applied. But the accuracy of steganalyzing our proposed method, DN-based steganography, stays at the same level of about 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even with higher embedding rates.

Figure 8: The correlation index of 1000 AMR-WB speeches, where the interframe edge "ii" connects two vertices $V_i[k]$ and $V_i[k+1]$ in two neighboring frames and the intraframe edge "ij'" connects two vertices $V_i[k]$ and $V_j[k]$ in the same frame. Axes: edge (horizontal), correlation index (vertical).
According to the definition of the correlation index given in [18], the experimental results of the correlation indices of 1000 AMR-WB speeches, which are randomly selected from "TIMIT", are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely, SS-QCCN and RS-QCCN, can be constructed as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography. The steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN is about 50% or lower for all of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that merely the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while none of them are utilized in Li et al.'s steganalytic method except for the edge "33" in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing edges "22", "33", and "23'") targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].
In order to visualize the detection performance, some receiver operating characteristic (ROC) curves of steganalyzing CNV-based steganography with a 100 bps embedding rate and NID-based and DN-based steganography with 100 bps, 200 bps, and 300 bps embedding rates are provided in Figure 10 (ROC curves for SS-QCCN and RS-QCCN are omitted, for these two methods fail to steganalyze AMR-WB steganography regardless of embedding capacity). It shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of a hidden message embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.

Figure 9: Two AMR-WB strong correlation network models, built on the vertices $V_1[k], \ldots, V_5[k]$ and $V_1[k+1], \ldots, V_5[k+1]$ of neighboring frames. Panels: (a) SS-QCCN; (b) RS-QCCN.
5. Conclusion

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may be a good candidate for the cover medium in speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed. The experimental results demonstrate the effectiveness of our proposed method. The main contributions of this paper are as follows:
(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality, and better performance against statistical steganalysis than the NID-based method.

(2) Flexible embedding capacity may be easily achieved with different iterations of cluster merging. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.

Figure 10: ROC curves for steganalysis of CNV-based, NID-based, and our proposed steganography (50% training rate). Panels: (a) Markov (TIMIT, 100 bps); (b) MFCC (TIMIT, 100 bps); (c) Markov (TIMIT, 200 bps); (d) MFCC (TIMIT, 200 bps); (e) Markov (TIMIT, 300 bps); (f) MFCC (TIMIT, 300 bps). Axes: false positive rate (horizontal), true positive rate (vertical).
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.
References
[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3-4, pp. 313-335, 1996.
[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295-315, Springer Berlin Heidelberg, Berlin, Germany, 1996.
[3] K. Gopalan, "Audio steganography using bit modification," in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629-I632, USA, July 2003.
[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, "Audio steganography by amplitude or phase modification," in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67-76, USA, January 2003.
[5] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020-1033, 2003.
[6] L. Liu, M. Li, Q. Li, and Y. Liang, "Perceptually transparent information hiding in G.729 bitstream," in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406-409, China, August 2008.
[7] T. Xu and Z. Yang, "Simple and effective speech steganography in G.723.1 low-rate codes," in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.
[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, "MELPe coded speech hiding on enhanced full rate compressed domain," in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267-270, Malaysia, May 2010.
[9] A. Nishimura, "Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec," in Proceedings of the IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483-486, Japan, September 2009.
[10] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940-1944, USA, December 2008.
[11] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423-1443, 2001.
[12] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296-306, 2011.
[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865-1875, 2012.
[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, "A new scheme for covert communication via 3G encoded speech," Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490-1501, 2012.
[15] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Systems, vol. 20, no. 2, pp. 143-154, 2014.
[16] J. Liu, H. Tian, J. Lu, and Y. Chen, "Neighbor-index-division steganography based on QIM method for G.723.1 speech streams," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139-147, 2016.
[17] Q. Liu, A. H. Sung, and M. Qiao, "Derivative-based audio steganalysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.
[18] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011-1022, 2017.
[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.
[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.
[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.
[22] C. Chang and C. Lin, "LIBSVM: a Library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
Figure 2: Diagram of our proposed codebook partition. (Flowchart: starting from codebook $B$, initialize a cluster set $S$ by taking each codeword as an independent cluster, together with an empty cluster set $S'$; while $N_i > 0$, first move the clusters of $S'$ into $S$ if $S$ is empty, then search for the diameter cluster pair $(C_{d1}, C_{d2})$ in $S$, merge $C_{d1}$ and $C_{d2}$ with their respective neighbors into two new clusters Temp1 and Temp2, remove $C_{d1}$, $C_{d2}$, and their neighbors from $S$, put Temp1 and Temp2 into $S'$, and set $N_i = N_i - 1$; finally, put the remaining clusters in $S'$ into $S$ and output the cluster set $S$.)
where $|S|$ is the number of clusters within the cluster set $S$. The cluster pair with maximal Euclidean distance $D_m$, called the diameter cluster pair, is denoted by $(C_{d1}, C_{d2})$. The neighbor of a cluster $C$ in $S$ is represented by $N(C, S)$; then we have

$$D(C, N(C, S)) \le D(C, C_p), \quad \forall p = 1, 2, \ldots, |S|. \qquad (4)$$
Figure 2 illustrates the diagram of the proposed DN codebook partition algorithm, and its detailed procedure is given in Algorithm 1. The original codebook $B$ is divided into $|S|$ clusters by iteratively merging the diameter cluster pair with their respective neighbors. An iteration parameter $N_i$ is applied to obtain flexible embedding capacity by controlling the merging procedure. The relationship between $N_i$ and the embedding capacity is discussed in Section 4.3.
Figure 3 is provided as an example to illustrate the proposed codebook partition algorithm. A white circle denotes a codeword. A shaded oval denotes a codeword and its neighbor in $S$ being processed, while an unshaded oval represents a cluster in $S'$ that has been formed. The "0", "1", "00", "01", "10", or "11" in a circle is the label of a codeword in the cluster. The cross "×" marks the centroid of the cluster it belongs to, and a line represents the diameter of a cluster set. The first to third merging iterations are shown in Figures 3(a)-3(c), respectively. The fourth merging iteration is comprised of Figures 3(d) and 3(e), and Figure 3(f) demonstrates the labelling of the codewords.
3.2. Embedding Procedure. In our proposed method, the ISF indices corresponding to the codewords in the codebook are first obtained by parsing the host AMR-WB speech. Then the ISF indices are employed to embed the secret message based on the codebook partition. Generally, the codewords in the same cluster as the codeword referred to by $I_a$ are considered to be replaceable with each other. According to the secret message to be embedded, $I_a$ may be substituted by the index of one of the other codewords within the same cluster. The number of secret message bits that can be embedded depends on the size of the specific cluster. The embedding procedure is given in the following.
Step 1. Search the cluster set $S$ for the cluster $C$ that contains the codeword referred to by the ISF index $I_a$.

Step 2. If there are $N$ codewords in $C$, the number of secret bits that can be embedded into $I_a$ is calculated as $n = \lfloor \log_2 N \rfloor$.

Step 3. Read $n$ not-yet-embedded bits, denoted by $m$, from the secret message. $I_a$ is replaced with $I_b$, which indexes the codeword with the same label as $m$.

Step 4. Repeat Steps 1-3 until all the secret bits are embedded.
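The four embedding steps can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: clusters are lists of integer ISF indices, the label of a codeword is assumed to be its position in its cluster written with $n$ bits, and a trailing message fragment shorter than $n$ bits is simply left unembedded (a real system would pad the message). The names `embed` and `clusters` are hypothetical.

```python
def embed(cover_indices, clusters, secret_bits):
    """Replace each cover ISF index by the index labelled with the next secret bits."""
    # Map every index to the cluster that contains it (Step 1).
    home = {idx: cluster for cluster in clusters for idx in cluster}
    stego, pos = [], 0
    for ia in cover_indices:
        cluster = home[ia]
        n = len(cluster).bit_length() - 1        # n = floor(log2 N) (Step 2)
        m = secret_bits[pos:pos + n]             # next n not-yet-embedded bits (Step 3)
        if n == 0 or len(m) < n:
            stego.append(ia)                     # nothing to embed: keep the cover index
            continue
        stego.append(cluster[int(m, 2)])         # I_b: index whose assumed label equals m
        pos += n
    return stego, pos                            # pos = number of bits actually embedded


# Hypothetical partition: one cluster of four indices, one cluster of two.
clusters = [[10, 11, 12, 13], [20, 21]]
stego_indices, embedded = embed([10, 20, 13], clusters, "01110")
```

Because a replacement index always comes from the same cluster as the cover index, the receiver can locate the same cluster from the stego index alone and read the label back.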
Figure 3: An example of our proposed codebook partition. Panels: (a) 1st iteration ($N_i = 4$); (b) 2nd iteration ($N_i = 3$); (c) 3rd iteration ($N_i = 2$); (d) 4th iteration ($S = S'$, $S'$.clear()); (e) 4th iteration ($N_i = 1$); (f) labelling.
Input: Codebook B, iterative parameter N_i
Output: Cluster set S
/* S' is a helper cluster set */
S'.clear()
S.clear()
/* Each codeword is taken as an initial cluster */
for i = 0; i < N_c; ++i do
    S.push(C_i)
end
/* Iterative merging */
while N_i > 0 do
    if S is empty then
        S = S'
        S'.clear()
    end
    (C_d1, C_d2) = argmax over i, j in {1, 2, ..., |S|} of D(C_i, C_j)
    Temp1 = C_d1 ∪ N(C_d1, S)
    Temp2 = C_d2 ∪ N(C_d2, S)
    S'.push(Temp1)
    S'.push(Temp2)
    S.remove(C_d1)
    S.remove(C_d2)
    S.remove(N(C_d1, S))
    S.remove(N(C_d2, S))
    N_i = N_i - 1
end
/* Put the remaining clusters in S' into S */
for iter = S'.begin(); iter < S'.end(); ++iter do
    S.push(*iter)
end
return S
Algorithm 1: DN-based codebook partition algorithm.
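A runnable sketch of Algorithm 1 may help to see how the number of merging iterations controls the cluster sizes. Several points are assumptions made only for illustration: the distance $D$ between two clusters is taken to be the Euclidean distance between their centroids, the neighbor of $C_{d2}$ is searched after $C_{d1}$ and its neighbor have been removed (so the two neighbors never coincide), and the loop stops early when fewer than four clusters remain in $S$. The names `dn_partition`, `centroid`, `distance`, and `pop_neighbor` are hypothetical.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Mean vector of a cluster given as a list of equal-length tuples."""
    dim = len(cluster[0])
    return tuple(sum(w[d] for w in cluster) / len(cluster) for d in range(dim))


def distance(a, b):
    """Assumed cluster distance D: Euclidean distance between centroids."""
    return math.dist(centroid(a), centroid(b))


def pop_neighbor(c, s):
    """Remove and return the cluster in s that is closest to c."""
    idx = min(range(len(s)), key=lambda i: distance(c, s[i]))
    return s.pop(idx)


def dn_partition(codebook, n_iter):
    s = [[tuple(w)] for w in codebook]      # each codeword is an initial cluster
    s_new = []                              # helper cluster set S'
    while n_iter > 0:
        if not s:                           # S is empty: continue with merged clusters
            s, s_new = s_new, []
        if len(s) < 4:                      # guard added for this sketch only
            break
        # Diameter cluster pair: the two clusters that are farthest apart.
        i, j = max(combinations(range(len(s)), 2),
                   key=lambda p: distance(s[p[0]], s[p[1]]))
        c_d2 = s.pop(j)                     # pop the larger index first so i stays valid
        c_d1 = s.pop(i)
        s_new.append(c_d1 + pop_neighbor(c_d1, s))   # Temp1
        s_new.append(c_d2 + pop_neighbor(c_d2, s))   # Temp2
        n_iter -= 1
    s.extend(s_new)                         # put the remaining clusters in S' into S
    return s


# Toy one-dimensional codebook with eight codewords.
toy = [(0,), (1,), (2,), (3,), (10,), (11,), (12,), (13,)]
pairs = dn_partition(toy, 2)    # two iterations merge all codewords into pairs
quads = dn_partition(toy, 3)    # a third iteration merges pairs into groups of four
```

By the same counting, for a 128-codeword codebook, 32 iterations would consume all codewords into 64 pairs and 16 further iterations would merge those into 32 clusters of four, which agrees with the statement that $N_i = 48$ gives clusters of size 4.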
Cluster set S
ISF
Stego ISF
Secret
00 01
1110
0 1
Search amp replace
ClusteL1
ClusteL2
index Ia
index Ib
Wa Wb
WcWd
bits ldquo01rdquo
Ia Ib
Figure 4 Embedding two bits into one cover ISF index
Figure 4 is an example of embedding two secret bits intoone cover ISF index Let us assume the cluster set 119878 containstwo clusters and the corresponding codeword indexed by 119868119909is119882119909 for example 119868119887 indexes the codeword119882119887 Hence theISF index 119868119886 shown in Figure 4 will be replaced with 119868119887 whichindexes the codeword119882119887 with the same label as the secret bitsldquo01rdquo
33 Extracting Procedure When the stego AMR-WB speechis transferred to the intended receiver the stego indices maybe obtained by parsing AMR-WB speech stream and used toextract the embedded secretmessageThemessage extractionprocedures from the stegoindex 119868119887 are given below
Step 1 Search cluster set 119878 which is the same as that employedin the embedding procedure for the cluster119862which containsthe codeword119882119887 referred by the ISF index 119868119887Step 2 If there are totally 119873 codewords in 119862 the number ofsecret bits carried by 119868119887 is computed by 119899 = lfloorlog2119873rfloor
6 Security and Communication Networks
Stego ISF
Cluster set S
Extracted
00 01
1110
0 1
Search amp read
ClusteL1
ClusteL2
index Ib
Wa Wb
Wc Wd
bits ldquo01rdquo
ldquo01rdquo
Ib
Figure 5 Extracting two bits from one stego-ISF index
Step 3 Read the label of119882119887 as the extracted 119899 bits which areappended to the secret message bit sequence
Step 4 Repeat Steps 1ndash3 until all the secret bits are recovered
Figure 5 is the corresponding example of extracting twosecret bits from the stegoindex 119868119887 generated by the previousembedding instance shown in Figure 4 It can be easily seenthat the extracted secret bits are identical to the embeddedsecret bits
4 Experimental Results and Analysis
In order to demonstrate the performance of the proposedmethod the perceptual quality of the stego AMR-WB speechwith secret message embedded using our method is com-puted and compared to that of the stego AMR-WB speechgenerated with CNV and NID steganography Moreover theflexibility of embedding capacity and the security regardingstatistical detection are analyzed in detail
41 Audio Database TIMIT acoustic-phonetic continuousspeech corpus (httpscatalogldcupenneduldc93s1) is anaudio database which contains broadband recordings of630 speakers of eight major dialects of American Englisheach reading ten phonetically rich sentences and all audiosentences are sampled at 16 kHz In our experiments 1000audio sentences are randomly chosen from TIMIT databaseThe average maximum and minimum length of the chosenaudio sentences are 347 s 396 s and 312 s All audio files areconverted into AMR-WB format using standard codec
42 Speech Quality Evaluation The perceptual evaluation ofspeech quality (PESQ) described in the ITU-T P862 Recom-mendation [20] may be employed to evaluate speech qualityMoreover according to ITU-T P8622 [21] the raw PESQscore can be converted to mean opinion score-listening qual-ity objective (MOS-LQO) which is more suitable for evalu-ating wideband speech Hence MOS-LQO is applied in ourexperimentsThe normal range ofMOS-LQO score is 1017 to4549 The higher the score the better the quality
Figure 6 shows the MOS-LQO scores of the 1000 coverAMR-WB speeches in 2385 kbits mode and the correspond-ing stego AMR-WB speeches using three different codebookpartition algorithmsThree progressive embedding rates that
StandardCNVNID
Ours
100 200 300 400 500 600 700 800 900 10000Sample index
18222630343842
MO
S-LQ
O sc
ore
(a) The embedding rate is 100 bps
StandardNIDOurs
18222630343842
MO
S-LQ
O sc
ore
100 200 300 400 500 600 700 800 900 10000Sample index
(b) The embedding rate is 200 bps
StandardNIDOurs
100 200 300 400 500 600 700 800 900 10000Sample index
1418222630343842
MO
S-LQ
O sc
ore
(c) The embedding rate is 300 bps
Figure 6 Comparisons of MOS-LQO values for 1000 samplesbetween the standard AMR-WB codec CNV-based steganographyNID-based steganography and the proposedDN-based steganogra-phy
is 100 bps 200 bps and 300 bps are employed in our experi-ments The indices of speech samples are sorted according totheMOS-LQO scores of our proposedmethod It can be seenfrom Figure 6 that the overall scores of the stego AMR-WBspeeches generated with our method are higher than thoseof the NID-based stego AMR-WB speeches especially whenthe embedding rates are 200 bps and 300 bps And the MOS-LQO scores of the CNV-based stego AMR-WB speeches areslightly higher than ours when the embedding rate is 100 bpswhich means there are no obvious discrepancies in speechquality between them Besides when the high embeddingrate that is 200 bps or 300 bps is used the decrease inMOS-LQO scores of our stego AMR-WB speeches is significantlysmaller than that of NID-based steganography
Moreover, the average MOS-LQO scores of the cover AMR-WB speeches and the stego AMR-WB speeches with three different codebook partition algorithms, that is, CNV, NID, and DN, for four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) and three embedding rates (100 bps, 200 bps, and 300 bps) are given in Table 1. For the embedding rates of 200 bps and 300 bps, only the scores of the NID-based and DN-based methods are given, because the embedding capacity of CNV-based steganography may not be larger than 100 bps.

Table 1: MOS-LQO scores of the standard codec, CNV-based, NID-based, and our proposed steganography in four different rate modes and three embedding rates. Values in parentheses are the relative changes with respect to the standard codec.

Embedding rate  Method    Rate mode (kbit/s)
                          12.65            15.85            19.85            23.85
-               Standard  2.929            3.073            3.199            3.269
100 bps         CNV       2.871 (-2.0%)    3.021 (-1.7%)    3.153 (-1.4%)    3.225 (-1.3%)
100 bps         NID       2.750 (-6.1%)    2.895 (-5.8%)    3.020 (-5.6%)    3.091 (-5.4%)
100 bps         Ours      2.864 (-2.2%)    3.010 (-2.0%)    3.139 (-1.9%)    3.216 (-1.6%)
200 bps         CNV       -                -                -                -
200 bps         NID       2.601 (-11.2%)   2.736 (-11.0%)   2.875 (-10.7%)   2.921 (-10.6%)
200 bps         Ours      2.807 (-4.2%)    2.955 (-3.8%)    3.084 (-3.6%)    3.164 (-3.2%)
300 bps         CNV       -                -                -                -
300 bps         NID       2.284 (-22.0%)   2.386 (-22.3%)   2.475 (-22.6%)   2.533 (-22.5%)
300 bps         Ours      2.699 (-7.9%)    2.841 (-7.5%)    2.971 (-7.1%)    3.046 (-6.8%)
When the embedding rate is 100 bps, which is almost the limit of CNV steganography, we can see from Table 1 that the mean MOS-LQO scores of our proposed method are only about 0.3% worse than those of CNV-based steganography. Such a slight decrease may be almost imperceptible to the human auditory system (HAS). There are also significant increases of approximately 3.8% in the mean MOS-LQO scores when our presented method is compared to NID-based steganography. When the embedding rates are 200 bps and 300 bps, the scores of our approach are improved by about 7% and 15%, respectively, in contrast to those of NID-based steganography.
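The percentages quoted here can be checked against Table 1. The Python sketch below recomputes the relative change with respect to the standard codec for the 12.65 kbit/s column only, and the gap between two methods in percentage points. It assumes the table values are read with their decimal points restored (for example, 2.929 for the standard codec); the helper names are hypothetical.

```python
# MOS-LQO of the standard codec in 12.65 kbit/s mode (Table 1, assumed reading).
STANDARD = 2.929

# MOS-LQO at 12.65 kbit/s, keyed by (embedding rate in bps, method).
SCORES = {
    (100, "CNV"): 2.871, (100, "NID"): 2.750, (100, "Ours"): 2.864,
    (200, "NID"): 2.601, (200, "Ours"): 2.807,
    (300, "NID"): 2.284, (300, "Ours"): 2.699,
}


def relative_change(score, reference=STANDARD):
    """Relative change of a stego score with respect to the cover score, in percent."""
    return 100.0 * (score - reference) / reference


def gap(rate, better="Ours", worse="NID"):
    """Difference between the relative changes of two methods, in percentage points."""
    return relative_change(SCORES[(rate, better)]) - relative_change(SCORES[(rate, worse)])


if __name__ == "__main__":
    for (rate, method), score in sorted(SCORES.items()):
        print(f"{rate} bps {method:>4}: {relative_change(score):+.1f}%")
    for rate in (100, 200, 300):
        print(f"{rate} bps, Ours vs NID: {gap(rate):+.1f} points")
```

The averages quoted in the text are taken over all four rate modes, so a single column reproduces them only approximately.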
Furthermore, the experimental results of the four rate modes are analogous. The decrease in speech quality caused by NID-based steganography is more than twice that caused by DN-based steganography. The proposed method can obtain twice the embedding capacity of CNV-based steganography by sacrificing roughly 2% of speech quality on average. In addition, the decline in speech quality observed when a 300 bps embedding rate is used in the proposed DN-based method is smaller than that observed when 200 bps is employed in the NID-based method.
4.3. Flexible Embedding Capacity. Compared to CNV-based steganography, flexible embedding capacity may be obtained with our proposed method to satisfy different practical demands. The steganographic capacity can be adjusted by changing the iteration parameter N_i. For different values of N_i, for example, N_i = 32, 33, ..., 54, the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.
From Figure 7, we can observe that the embedding rate increases significantly with N_i while the MOS-LQO score goes down only slightly. For NID-based steganography, however, the MOS-LQO score declines rapidly as the embedding rate increases. Therefore, the proposed DN-based steganography can achieve higher embedding capacity with only a slight decrease in speech quality. For example, when N_i = 48, the size of each cluster in S is equal to 4 and we can embed 4 bits per frame, that is, the embedding rate is 200 bps, whereas the CNV algorithm can embed at most 2 bits per frame (100 bps).
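The relation between the iteration parameter and the embedding rate can be sketched numerically. The Python example below tracks only the cluster sizes produced by the merging schedule (each iteration merges four clusters into two) and converts them into an expected rate. It rests on several assumptions that are not stated in this section: each partitioned codebook has 128 entries, two codebooks carry secret bits in every frame, AMR-WB frames last 20 ms (50 frames per second), and all codeword indices are equally likely. The function names are hypothetical.

```python
FRAMES_PER_SECOND = 50  # AMR-WB uses 20 ms frames


def cluster_sizes(codebook_size, n_iter):
    """Sizes of the clusters left after n_iter merging iterations.

    Assumes codebook_size is a power of two, so all clusters in the working
    set have the same size within one pass.
    """
    s = [1] * codebook_size   # working set S: one cluster per codeword
    helper = []               # helper set S'
    for _ in range(n_iter):
        if not s:
            s, helper = helper, []
        if len(s) < 4:
            raise ValueError("fewer than four clusters left to merge")
        size = s[0]
        del s[:4]                          # two diameter ends plus their two neighbors
        helper += [2 * size, 2 * size]     # two merged clusters
    return s + helper


def expected_rate_bps(codebook_size, n_iter, codebooks_per_frame=2):
    """Expected embedding rate, assuming uniformly distributed codeword indices."""
    sizes = cluster_sizes(codebook_size, n_iter)
    # A cluster of N codewords carries floor(log2 N) bits and is hit with probability N / codebook_size.
    bits_per_index = sum(n * (n.bit_length() - 1) for n in sizes) / codebook_size
    return bits_per_index * codebooks_per_frame * FRAMES_PER_SECOND


if __name__ == "__main__":
    for n_i in (32, 40, 48, 56):
        print(n_i, expected_rate_bps(128, n_i))
```

Under these assumptions, N_i = 48 leaves only clusters of size 4, which agrees with the 4 bits per frame and 200 bps stated in the text; intermediate values of N_i give mixed cluster sizes and therefore intermediate rates.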
4.4. Resistibility of Statistical Steganalysis. Speech steganography aims to hide a secret message in cover speech without arousing suspicion. It is very important for a steganographic method to resist statistical steganalysis, which is the technique of detecting the presence of a hidden message. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18], the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of a hidden message in given audio.

Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography. Each panel plots the embedding rate (bps) and the MOS-LQO score. (a) Our proposed steganography (x-axis: times of cluster merging). (b) NID-based steganography (x-axis: number of sub-codebooks).

Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in 23.85 kbit/s mode. CNV-based steganography is evaluated at 100 bps only.

                  Training rate 0.4                  Training rate 0.5                  Training rate 0.6
Rate     Method   Markov  MFCC  SS-QCCN  RS-QCCN     Markov  MFCC  SS-QCCN  RS-QCCN     Markov  MFCC  SS-QCCN  RS-QCCN
100 bps  CNV      49.8    49.8  43.7     49.0        50.1    50.2  44.0     49.2        50.0    50.5  41.9     50.0
100 bps  NID      51.0    60.1  42.2     50.0        50.1    60.9  42.9     48.7        52.1    59.8  41.8     49.4
100 bps  Ours     50.0    50.0  44.0     49.4        50.3    49.3  40.3     49.4        49.1    48.6  41.8     43.3
200 bps  NID      53.5    74.5  46.9     50.0        53.3    76.2  47.6     50.0        53.6    75.8  44.4     50.1
200 bps  Ours     51.0    48.3  45.2     50.0        49.8    48.7  42.2     50.0        50.5    48.6  45.0     50.0
300 bps  NID      54.8    74.6  49.3     50.0        56.3    77.2  50.0     50.0        55.4    78.3  50.5     50.6
300 bps  Ours     52.4    49.7  47.9     50.0        52.8    60.9  48.2     50.0        53.8    50.1  46.6     50.0
In our experiments, the sentences chosen from the TIMIT database, as stated in Section 4.1, are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. Then a secret message is embedded into each cover AMR-WB speech at different embedding rates, that is, 100 bps, 200 bps, and 300 bps, by CNV-based, NID-based, and DN-based steganography. Of course, 200 bps and 300 bps are omitted for CNV-based steganography because of its limited embedding capacity. Seven stego speech sets are thus generated: one set for the CNV-based method and three sets each for NID-based and DN-based steganography. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.
In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, that is, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stego speech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and a radial basis function (RBF) kernel and the grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set denote negatives and those in the stego speech set stand for positives. Hence, the accuracy may be defined as follows:

$$\text{Accuracy} = \frac{1}{2} \times \left( \frac{TP}{TP + FN} + \frac{TN}{FP + TN} \right), \tag{5}$$

where TP are true positives, TN are true negatives, FN are false negatives, and FP are false positives.
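The evaluation protocol can be sketched in a few lines of Python. The example below is an illustration only: it splits one speech set by a given training rate and evaluates the accuracy of equation (5), which is the mean of the true positive rate and the true negative rate. The function names, the fixed seed, and the rounding of the split point are assumptions; the feature extraction and the SVM classifier themselves are not reproduced.

```python
import random


def split_by_training_rate(samples, training_rate, seed=0):
    """Randomly split one speech set into training and testing parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(training_rate * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(tp, fn, tn, fp):
    """Equation (5): mean of true positive rate and true negative rate."""
    return 0.5 * (tp / (tp + fn) + tn / (fp + tn))


if __name__ == "__main__":
    cover_train, cover_test = split_by_training_rate(range(1000), 0.4)
    stego_train, stego_test = split_by_training_rate(range(1000), 0.4, seed=1)
    print(len(cover_train), len(cover_test))   # sizes of the two parts
    # A detector that guesses at random gets about half of each class right.
    print(accuracy(tp=300, fn=300, tn=300, fp=300))
```

With this definition, a detector that cannot distinguish cover from stego speech scores about 0.5, which is why values near 50% in Table 2 indicate undetectability.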
The steganalytic results are given in Table 2. It can be seen that, when the embedding rate is 100 bps, the accuracy of detecting both the CNV-based and DN-based methods is almost the same, about 50%, while that of detecting NID-based steganography increases to about 60% when the MFCC-based steganalytic method is applied. Moreover, there is an apparent increase in the accuracy of detecting the NID-based hiding method as the embedding rate increases to 200 bps or 300 bps when Liu et al.'s methods (i.e., the Markov-based and MFCC-based steganalytic methods) are applied. But the accuracy of steganalyzing our proposed DN-based steganography stays at about the same level of 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even at higher embedding rates.

Figure 8: The correlation index of 1000 AMR-WB speeches, where the interframe edge "ij" connects two vertices V_i[k] and V_j[k + 1] in two neighboring frames, and the intraframe edge "ij'" connects two vertices V_i[k] and V_j[k] in the same frame. The plot shows the correlation index (0 to 10) for each edge.
According to the definition of the correlation index given in [18], the correlation indices of 1000 AMR-WB speeches randomly selected from TIMIT are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely, SS-QCCN and RS-QCCN, can be constructed as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography. The steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN is about 50% or lower for all of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that only the vertices V_2[k] and V_3[k] in the kth frame may be changed during steganography, while neither of them is utilized in Li et al.'s steganalytic method except through the edge "33" in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing the edges "22", "33", and "23'") targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].
In order to visualize the detection performance, receiver operating characteristic (ROC) curves for steganalyzing CNV-based steganography at the 100 bps embedding rate and NID-based and DN-based steganography at the 100 bps, 200 bps, and 300 bps embedding rates are provided in Figure 10. (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of the embedding capacity.) Figure 10 shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of a hidden message embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.

Figure 9: Two AMR-WB strong correlation network models, drawn over the vertices V_1[k] to V_5[k] and V_1[k + 1] to V_5[k + 1] of two neighboring frames. (a) SS-QCCN (edge labels shown: 11, 45). (b) RS-QCCN (edge labels shown: 11, 33, 44, 14, 15, 45).
Figure 10: ROC curves for steganalysis of CNV-based, NID-based, and our proposed steganography (50% training rate). Each panel plots the true positive rate against the false positive rate. (a) Markov (TIMIT, 100 bps). (b) MFCC (TIMIT, 100 bps). (c) Markov (TIMIT, 200 bps). (d) MFCC (TIMIT, 200 bps). (e) Markov (TIMIT, 300 bps). (f) MFCC (TIMIT, 300 bps). Panels (a) and (b) compare DN, CNV, and NID; panels (c) to (f) compare NID and DN.

5. Conclusion

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may therefore be a good candidate cover medium for speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed. The experimental results demonstrate the effectiveness of the proposed method. The main contributions of this paper are as follows:

(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality and better performance against statistical steganalysis than the NID-based method.

(2) Flexible embedding capacity may be easily achieved with different numbers of cluster merging iterations. Twice the embedding capacity of the CNV-based embedding method may be obtained with N_i = 48.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.
References
[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.
[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.
[3] K. Gopalan, "Audio steganography using bit modification," in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.
[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, "Audio steganography by amplitude or phase modification," in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.
[5] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.
[6] L. Liu, M. Li, Q. Li, and Y. Liang, "Perceptually transparent information hiding in G.729 bitstream," in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.
[7] T. Xu and Z. Yang, "Simple and effective speech steganography in G.723.1 low-rate codes," in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.
[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, "MELPe coded speech hiding on enhanced full rate compressed domain," in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.
[9] A. Nishimura, "Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec," in Proceedings of the IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.
[10] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.
[11] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.
[12] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.
[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.
[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, "A new scheme for covert communication via 3G encoded speech," Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.
[15] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.
[16] J. Liu, H. Tian, J. Lu, and Y. Chen, "Neighbor-index-division steganography based on QIM method for G.723.1 speech streams," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.
[17] Q. Liu, A. H. Sung, and M. Qiao, "Derivative-based audio steganalysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.
[18] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.
[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.
[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.
[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.
[22] C. Chang and C. Lin, "LIBSVM: a Library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
Figure 3: An example of our proposed codebook partition. (a) 1st iteration (N_i = 4). (b) 2nd iteration (N_i = 3). (c) 3rd iteration (N_i = 2). (d) 4th iteration (S = S', S'.clear()). (e) 4th iteration (N_i = 1). (f) Labelling (clusters of two codewords are labelled 0 and 1; clusters of four codewords are labelled 00, 01, 10, and 11).
Input: Codebook B, iterative parameter N_i
Output: Cluster set S
/* S' is a helper cluster set */
S'.clear()
S.clear()
/* Each codeword is taken as an initial cluster */
for i = 0; i < N_c; ++i do
    S.push(C_i)
end
/* Iterative merging */
while N_i > 0 do
    if S is empty then
        S = S'
        S'.clear()
    end
    (C_d1, C_d2) = argmax_{i, j ∈ {1, 2, ..., |S|}} D(C_i, C_j)
    Temp1 = C_d1 ∪ N(C_d1, S)
    Temp2 = C_d2 ∪ N(C_d2, S)
    S'.push(Temp1)
    S'.push(Temp2)
    S.remove(C_d1)
    S.remove(C_d2)
    S.remove(N(C_d1, S))
    S.remove(N(C_d2, S))
    N_i = N_i − 1
end
/* Put the remaining clusters in S' into S */
for iter = S'.begin(); iter < S'.end(); ++iter do
    S.push(*iter)
end
return S

Algorithm 1: DN-based codebook partition algorithm.
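The listing of Algorithm 1 can be turned into a small runnable sketch. The Python version below is not the authors' implementation. It assumes that D(C_i, C_j) is the Euclidean distance between cluster centroids and that N(C, S) is the cluster in S closest to C under the same distance, since neither definition appears in this part of the text. It also picks the second neighbor only after removing the first one, so that both diameter ends never claim the same neighbor; that tie handling, the guard for fewer than four clusters, and all function names are choices made for this illustration.

```python
import math
from itertools import combinations


def centroid(cluster, codebook):
    """Mean vector of the codewords whose indices are listed in `cluster`."""
    dim = len(codebook[0])
    return [sum(codebook[i][d] for i in cluster) / len(cluster) for d in range(dim)]


def cluster_distance(a, b, codebook):
    # Assumption: D(C_i, C_j) is the Euclidean distance between centroids.
    return math.dist(centroid(a, codebook), centroid(b, codebook))


def dn_partition(codebook, n_iter):
    """Diameter-neighbor partition; clusters are tuples of codeword indices."""
    s = [(i,) for i in range(len(codebook))]   # every codeword starts as a cluster
    helper = []                                # helper cluster set S'
    while n_iter > 0:
        if not s:
            s, helper = helper, []
        if len(s) < 4:
            raise ValueError("fewer than four clusters left to merge")
        # Diameter: the two clusters that are farthest apart.
        d1, d2 = max(combinations(s, 2),
                     key=lambda pair: cluster_distance(pair[0], pair[1], codebook))
        rest = [c for c in s if c not in (d1, d2)]
        # Neighbors: the closest remaining cluster to each diameter end.
        n1 = min(rest, key=lambda c: cluster_distance(d1, c, codebook))
        rest.remove(n1)
        n2 = min(rest, key=lambda c: cluster_distance(d2, c, codebook))
        rest.remove(n2)
        helper.extend([d1 + n1, d2 + n2])
        s = rest
        n_iter -= 1
    return s + helper


if __name__ == "__main__":
    toy_codebook = [(0, 0), (0, 1), (10, 0), (10, 1),
                    (0, 10), (0, 11), (10, 10), (10, 11)]
    print(dn_partition(toy_codebook, 2))   # four clusters of two codewords
    print(dn_partition(toy_codebook, 3))   # two clusters of four codewords
```

On the toy codebook, two iterations pair each point with its nearest neighbor, and a third iteration starts the second pass, in which the pairs themselves are merged.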
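Figure 4: Embedding two bits into one cover ISF index. (Diagram: the cluster set S holds Cluster 1, whose four codewords W_a, W_b, W_c, and W_d carry two-bit labels, with W_b labelled 01, and Cluster 2, whose two codewords carry the labels 0 and 1. A search-and-replace step maps the cover ISF index I_a to the stego ISF index I_b for the secret bits "01".)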
Figure 4 is an example of embedding two secret bits into one cover ISF index. Let us assume that the cluster set S contains two clusters and that the codeword indexed by I_x is W_x; for example, I_b indexes the codeword W_b. Hence, the ISF index I_a shown in Figure 4 will be replaced with I_b, which indexes the codeword W_b whose label is the same as the secret bits "01".
3.3. Extracting Procedure. When the stego AMR-WB speech is transferred to the intended receiver, the stego indices may be obtained by parsing the AMR-WB speech stream and used to extract the embedded secret message. The message extraction procedure from the stego index I_b is given below.

Step 1. Search the cluster set S, which is the same as that employed in the embedding procedure, for the cluster C which contains the codeword W_b referred to by the ISF index I_b.

Step 2. If there are N codewords in total in C, the number of secret bits carried by I_b is computed by n = ⌊log2 N⌋.

Step 3. Read the label of W_b as the extracted n bits, which are appended to the secret message bit sequence.

Step 4. Repeat Steps 1–3 until all the secret bits are recovered.

Figure 5: Extracting two bits from one stego ISF index. (Diagram: the stego ISF index I_b is searched in the cluster set S, which holds Cluster 1 with the codewords W_a, W_b, W_c, and W_d and Cluster 2 with two codewords; the label "01" of W_b is read as the extracted bits.)
Figure 5 is the corresponding example of extracting two secret bits from the stego index I_b generated by the previous embedding instance shown in Figure 4. It can be easily seen that the extracted secret bits are identical to the embedded secret bits.
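The embedding instance of Figure 4 and the extraction steps can be combined into a short round-trip sketch in Python. It is an illustration under stated assumptions rather than the authors' code: clusters are tuples of ISF indices, the label of a codeword is taken to be its position inside its cluster written with n = ⌊log2 N⌋ bits, and the index values and function names are hypothetical.

```python
def bits_per_index(cluster):
    """n = floor(log2 N) for a cluster of N codewords (N >= 1)."""
    return len(cluster).bit_length() - 1


def find_cluster(clusters, index):
    """Step 1: locate the cluster that contains the given ISF index."""
    for cluster in clusters:
        if index in cluster:
            return cluster
    raise KeyError(f"index {index} is not in any cluster")


def embed(clusters, cover_index, bits):
    """Replace cover_index by the index in the same cluster labelled with the next n bits.

    Returns the stego index and the secret bits that remain to be embedded.
    """
    cluster = find_cluster(clusters, cover_index)
    n = bits_per_index(cluster)
    if n == 0:
        return cover_index, bits          # singleton cluster carries no bits
    if len(bits) < n:
        raise ValueError("not enough secret bits left for this index")
    # Assumed labelling: position in the cluster, written as an n-bit string.
    return cluster[int(bits[:n], 2)], bits[n:]


def extract(clusters, stego_index):
    """Steps 1 to 3: read the label of the codeword referred to by stego_index."""
    cluster = find_cluster(clusters, stego_index)
    n = bits_per_index(cluster)
    if n == 0:
        return ""
    return format(cluster.index(stego_index), f"0{n}b")


if __name__ == "__main__":
    clusters = [(10, 11, 12, 13), (20, 21)]       # hypothetical ISF indices
    stego, remaining = embed(clusters, 10, "011")
    print(stego, remaining, extract(clusters, stego))
```

Because the receiver searches the same cluster set S as the sender, the label read in Step 3 is exactly the bit string chosen during embedding, which is the identity illustrated by Figures 4 and 5.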
4. Experimental Results and Analysis

In order to demonstrate the performance of the proposed method, the perceptual quality of the stego AMR-WB speech with a secret message embedded using our method is computed and compared to that of the stego AMR-WB speech generated with CNV and NID steganography. Moreover, the flexibility of embedding capacity and the security regarding statistical detection are analyzed in detail.
4.1. Audio Database. The TIMIT acoustic-phonetic continuous speech corpus (https://catalog.ldc.upenn.edu/ldc93s1) is an audio database which contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences; all audio sentences are sampled at 16 kHz. In our experiments, 1000 audio sentences are randomly chosen from the TIMIT database. The average, maximum, and minimum lengths of the chosen audio sentences are 3.47 s, 3.96 s, and 3.12 s, respectively. All audio files are converted into AMR-WB format using the standard codec.
According to the definition of the correlation index givenin [18] the experimental results of the correlation indices of1000 AMR-WB speeches which are randomly selected fromldquoTIMITrdquo are shown in Figure 8 Based on these results twostrong quantization codeword correlation network (QCCN)models say SS-QCCN and RS-QCCN can be constructedas illustrated in Figure 9 These two models are then usedto steganalyze our proposed steganography The steganalyticresults are also presented in Table 2 It can be seen fromTable 2 that the accuracy of both SS-QCCN and RS-QCCNis less than 50 for all of the AMR-WB stegospeeches Thepossible reasons may be that only the second and thirdcodebooks in the second stage are employed in the AMR-WB speech steganography which means merely the vertices1198812[119896] and 1198813[119896] in the 119896th frame may be changed duringsteganography while none of them are utilized in Li et alrsquossteganalytic method except for the edge ldquo33rdquo in RS-QCCNmodel Besides we also used an adapted QCCN model (ieutilize edges ldquo22rdquo ldquo33rdquo and ldquo231015840rdquo) targeted at AMR-WBspeech but the accuracy is still less than 50 It may bebecause the correlation of those edges is not strong enoughfor steganalysis according to Figure 8 Therefore it is reason-able to conclude that theAMR-WB speech steganography candefend against the steganalytic method proposed in [18]
In order to visualize the detection performance wegive some receiver operating characteristic (ROC) curvesof steganalyzing CNV-based steganography with 100 bpsembedding rate and NID-based and DN-based steganogra-phy with 100 bps 200 bps and 300 bps embedding rates are
V1[k]
V2[k]
V3[k]
V4[k]
V5[k]
V1[k + 1]
V2[k + 1]
V3[k + 1]
V4[k + 1]
V5[k + 1]
45 45
11
middot middot middot middot middot middot
(a) SS-QCCN
V1[k + 1]
V2[k + 1]
V3[k + 1]
V4[k + 1]
V5[k + 1]
V1[k]
V2[k]
V3[k]
V4[k]
V5[k]
45 45
15 15
1414
33
44
11
middot middot middot middot middot middot
(b) RS-QCCN
Figure 9 Two AMR-WB strong correlation network models
provided in Figure 10 (ROC curves for SS-QCCN and RS-QCCN are omitted for these two methods fail to steganalyzeAMR-WB steganography in spite of embedding capacity)It shows that all of the three steganographic methods canresist statistical steganalysis when the embedding rate is100 bps While the statistical steganalytic methods especiallyMFCC-based steganalysismay detect the existence of hiddenmessage embedded with NID-based steganography when theembedding rate is above 100 bps the proposed DN-basedsteganography may still have good security against bothMarkov-based and MFCC-based steganalysis
5 Conclusion
The adaptive multirate wideband (AMR-WB) is a widelyadapted format in mobile handsets and is also the recom-mended speech codec for VoLTE AMR-WB speech may bea good candidate for cover medium in speech steganographyIn this paper a novel AMR-WB speech steganographicmethod is proposed The experimental results demonstratedthe effectiveness of our proposed method The main contri-butions of this paper are as follows
(1) A novel AMR-WB speech steganography is pro-posed based on diameter-neighbor codebook parti-tion algorithm It can provide higher capacity with-out noticeable decrease in speech quality and better
10 Security and Communication Networks
DN
CNVNID
0
05
1
True
pos
itive
rate
02 04 06 08 10False positive rate
(a) Markov (TIMIT 100 bps)DN
CNVNID
02 04 06 08 10False positive rate
0
05
1
True
pos
itive
rate
(b) MFCC (TIMIT 100 bps)
NIDDN
0
05
1
True
pos
itive
rate
02 04 06 08 10False positive rate
(c) Markov (TIMIT 200 bps)
NIDDN
0
05
1
True
pos
itive
rate
02 04 06 08 10False positive rate
(d) MFCC (TIMIT 200 bps)
NIDDN
0
05
1
True
pos
itive
rate
02 04 06 08 10False positive rate
(e) Markov (TIMIT 300 bps)
NIDDN
0
05
1
True
pos
itive
rate
02 04 06 08 10False positive rate
(f) MFCC (TIMIT 300 bps)
Figure 10 ROC curves for steganalysis of CNV-based NID-based and our proposed steganography (50 training rate)
performance against statistical steganalysis thanNID-based method
(2) Flexible embedding capacity may be easily achievedwith different iterations of cluster merging Twicethe embedding capacity of CNV-based embeddingmethod may be obtained with119873119894 = 48
Conflicts of Interest
The authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This work was partially supported by the National NaturalScience Foundation of China under Grant no 61632013
References
[1] W Bender D Gruhl N Morimoto and A Lu ldquoTechniques fordata hidingrdquo IBM Systems Journal vol 35 no 3-4 pp 313ndash3351996
[2] D Gruhl A Lu and W Bender ldquoEcho hidingrdquo in InformationHiding R Anderson Ed vol 1174 of Lecture Notes in ComputerScience pp 295ndash315 Springer Berlin Heidelberg Berlin Ger-many 1996
[3] K Gopalan ldquoAudio steganography using bit modificationrdquo inProceedings of the 2003 International Conference on Multimediaand Expo ICME 2003 pp I629ndashI632 USA July 2003
[4] K Gopalan S Wenndt S Adams and D Haddad ldquoAudiosteganography by amplitude or phasemodificationrdquo in Proceed-ings of the Security andWatermarking ofMultimedia Contents Vpp 67ndash76 USA January 2003
[5] D Kirovski and H S Malvar ldquoSpread-spectrum watermarkingof audio signalsrdquo IEEE Transactions on Signal Processing vol 51no 4 pp 1020ndash1033 2003
[6] L Liu M Li Q Li and Y Liang ldquoPerceptually transparentinformation hiding in G729 bitstreamrdquo in Proceedings of the2008 4th International Conference on Intelligent InformationHiding andMultiedia Signal Processing IIH-MSP 2008 pp 406ndash409 China August 2008
[7] T Xu and Z Yang ldquoSimple and effective speech steganog-raphy in G7231 low-rate codesrdquo in Proceedings of the 2009
Security and Communication Networks 11
International Conference on Wireless Communications and Sig-nal Processing WCSP 2009 China November 2009
[8] A Shahbazi A H Rezaie and R Shahbazi ldquoMELPe codedspeech hiding on enhanced full rate compressed domainrdquo inProceedings of the Asia Modelling Symposium 2010 4th Inter-national Conference on Mathematical Modelling and ComputerSimulation AMS2010 pp 267ndash270 Malaysia May 2010
[9] A Nishimura ldquoData hiding in pitch delay data of the adaptivemulti-rate narrow-band speech codecrdquo in Proceedings of theIIH-MSP 2009-2009 5th International Conference on IntelligentInformation Hiding and Multimedia Signal Processing pp 483ndash486 Japan September 2009
[10] B Xiao Y Huang and S Tang ldquoAn approach to informationhiding in low bit-rate speech streamrdquo in Proceedings of the2008 IEEE Global Telecommunications Conference GLOBE-COM 2008 pp 1940ndash1944 USA December 2008
[11] B Chen and G W Wornell ldquoQuantization index modulationa class of provably good methods for digital watermarking andinformation embeddingrdquo Institute of Electrical and ElectronicsEngineers Transactions on InformationTheory vol 47 no 4 pp1423ndash1443 2001
[12] Y F Huang S Tang and J Yuan ldquoSteganography in inactiveframes of VoIP streams encoded by source codecrdquo IEEETransactions on Information Forensics and Security vol 6 no2 pp 296ndash306 2011
[13] YHuang C Liu S Tang and S Bai ldquoSteganography integrationinto a low-bit rate speech codecrdquo IEEE Transactions on Informa-tion Forensics and Security vol 7 no 6 pp 1865ndash1875 2012
[14] H Miao L Huang Z Chen W Yang and A Al-Hawbani ldquoAnew scheme for covert communication via 3G encoded speechrdquoComputers and Electrical Engineering vol 38 no 6 pp 1490ndash1501 2012
[15] H Tian J Liu and S Li ldquoImproving security of quantization-index-modulation steganography in low bit-rate speechstreamsrdquoMultimedia Systems vol 20 no 2 pp 143ndash154 2014
[16] J Liu H Tian J Lu and Y Chen ldquoNeighbor-index-divisionsteganography based on QIM method for G7231 speechstreamsrdquo Journal of Ambient Intelligence and Humanized Com-puting vol 7 no 1 pp 139ndash147 2016
[17] Q Liu A H Sung and M Qiao ldquoDerivative-based audiosteganalysisrdquo ACM Transactions on Multimedia ComputingCommunications andApplications (TOMM) vol 7 no 3 articleno 18 2011
[18] S Li Y Jia and C-C J Kuo ldquoSteganalysis of QIM Steganogra-phy in Low-Bit-Rate Speech Signalsrdquo IEEEACM TransactionsonAudio Speech and Language Processing vol 25 no 5 pp 1011ndash1022 2017
[19] ITU-T Wideband Coding of Speech at around 16 Kbps UsingAdaptive Multi-rate Wideband (AMR-WB) International Tele-communication Union Std G7222 2002
[20] Perceptual Evaluation of Speech Quality (PESQ) An ObjectiveMethod for End-to-end Speech Quality Assessment of Narrow-band Telephone Net-works and Speech Codecs InternationalTelecommunication Union Std P862 2001
[21] Wideband Extension to Recommendation P862 for the Assess-ment of Wideband Telephone Networks and Speech CodecsInternational Telecommunication Union Std P8622 2007
[22] C Chang and C Lin ldquoLIBSVM a Library for support vectormachinesrdquo ACM Transactions on Intelligent Systems and Tech-nology vol 2 no 3 article 27 2011
Figure 5: Extracting two bits from one stego-ISF index (the stego ISF index $I_b$ is looked up in the cluster set $S$, and the label “01” of its codeword $W_b$ is read out as the extracted bits).
Step 3. Read the label of $W_b$ as the extracted $n$ bits, which are appended to the secret message bit sequence.
Step 4. Repeat Steps 1–3 until all the secret bits are recovered.
Figure 5 gives the corresponding example of extracting two secret bits from the stego index $I_b$ generated by the embedding instance shown in Figure 4. The extracted secret bits are identical to the embedded secret bits.
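
The label lookup in Step 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cluster contents, the rule that a codeword's position inside its cluster serves as its binary label, and the helper names `find_cluster`, `embed_bits`, and `extract_bits` are all assumptions made for illustration, and cluster sizes are assumed to be powers of two.

```python
def label_width(cluster):
    """Number of bits a cluster can carry (sizes are assumed to be powers of two)."""
    return len(cluster).bit_length() - 1


def find_cluster(index, clusters):
    """Return the cluster that contains the given codeword index."""
    for cluster in clusters:
        if index in cluster:
            return cluster
    raise ValueError(f"codeword index {index} is not in any cluster")


def embed_bits(cover_index, bits, clusters):
    """Pick the codeword in the cover index's cluster whose label equals bits."""
    cluster = find_cluster(cover_index, clusters)
    width = label_width(cluster)
    if len(bits) != width:
        raise ValueError("payload length does not match the cluster label width")
    if width == 0:
        return cover_index  # a singleton cluster carries no payload
    return cluster[int(bits, 2)]


def extract_bits(stego_index, clusters):
    """Read the label of the stego index inside its cluster (Step 3)."""
    cluster = find_cluster(stego_index, clusters)
    width = label_width(cluster)
    if width == 0:
        return ""
    return format(cluster.index(stego_index), f"0{width}b")


# Hypothetical cluster set: one cluster of four codeword indices, one of two.
clusters = [[17, 42, 5, 90], [3, 8]]
stego_index = embed_bits(17, "01", clusters)
recovered = extract_bits(stego_index, clusters)
print(stego_index, recovered)  # prints: 42 01
```

Because embedding only moves an index within its own cluster, the receiver finds the same cluster and reads back the same label, which is why the extracted bits match the embedded ones.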
4 Experimental Results and Analysis
In order to demonstrate the performance of the proposed method, the perceptual quality of the stego AMR-WB speech with a secret message embedded using our method is computed and compared to that of the stego AMR-WB speech generated with CNV and NID steganography. Moreover, the flexibility of embedding capacity and the security regarding statistical detection are analyzed in detail.
4.1. Audio Database. The TIMIT acoustic-phonetic continuous speech corpus (https://catalog.ldc.upenn.edu/ldc93s1) is an audio database that contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences; all audio sentences are sampled at 16 kHz. In our experiments, 1000 audio sentences are randomly chosen from the TIMIT database. The average, maximum, and minimum lengths of the chosen audio sentences are 3.47 s, 3.96 s, and 3.12 s, respectively. All audio files are converted into AMR-WB format using the standard codec.
4.2. Speech Quality Evaluation. The perceptual evaluation of speech quality (PESQ) described in ITU-T Recommendation P.862 [20] may be employed to evaluate speech quality. Moreover, according to ITU-T P.862.2 [21], the raw PESQ score can be converted to the mean opinion score-listening quality objective (MOS-LQO), which is more suitable for evaluating wideband speech. Hence, MOS-LQO is applied in our experiments. The normal range of the MOS-LQO score is 1.017 to 4.549. The higher the score, the better the quality.
Figure 6 shows the MOS-LQO scores of the 1000 cover AMR-WB speeches in 23.85 kbit/s mode and the corresponding stego AMR-WB speeches using three different codebook partition algorithms. Three progressive embedding rates, that is, 100 bps, 200 bps, and 300 bps, are employed in our experiments. The indices of the speech samples are sorted according to the MOS-LQO scores of our proposed method. It can be seen from Figure 6 that the overall scores of the stego AMR-WB speeches generated with our method are higher than those of the NID-based stego AMR-WB speeches, especially when the embedding rates are 200 bps and 300 bps. The MOS-LQO scores of the CNV-based stego AMR-WB speeches are only slightly higher than ours when the embedding rate is 100 bps, which means there are no obvious discrepancies in speech quality between them. Besides, when a high embedding rate, that is, 200 bps or 300 bps, is used, the decrease in MOS-LQO scores of our stego AMR-WB speeches is significantly smaller than that of NID-based steganography.
Figure 6: Comparisons of MOS-LQO values for 1000 samples between the standard AMR-WB codec, CNV-based steganography, NID-based steganography, and the proposed DN-based steganography. Panels: (a) the embedding rate is 100 bps; (b) the embedding rate is 200 bps; (c) the embedding rate is 300 bps. Horizontal axis: sample index; vertical axis: MOS-LQO score.
Moreover, the average MOS-LQO scores of the cover AMR-WB speeches and the stego AMR-WB speeches with three different codebook partition algorithms, that is, CNV, NID, and DN, in four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) and at three embedding rates (100 bps, 200 bps, and 300 bps) are given in Table 1. For the embedding rates of 200 bps and 300 bps, only the MOS-LQO scores of the NID-based and DN-based steganographic methods are given, because the embedding capacity of CNV-based steganography may not be larger than 100 bps.
Table 1: MOS-LQO scores of the standard codec, CNV-based, NID-based, and our proposed steganography in four different rate modes and three embedding rates. Values in parentheses are the changes relative to the standard codec, in percent.

| Embedding rate | Method | 12.65 kbit/s | 15.85 kbit/s | 19.85 kbit/s | 23.85 kbit/s |
|---|---|---|---|---|---|
| | Standard | 2.929 | 3.073 | 3.199 | 3.269 |
| 100 bps | CNV | 2.871 (−2.0) | 3.021 (−1.7) | 3.153 (−1.4) | 3.225 (−1.3) |
| 100 bps | NID | 2.750 (−6.1) | 2.895 (−5.8) | 3.020 (−5.6) | 3.091 (−5.4) |
| 100 bps | Ours | 2.864 (−2.2) | 3.010 (−2.0) | 3.139 (−1.9) | 3.216 (−1.6) |
| 200 bps | CNV | | | | |
| 200 bps | NID | 2.601 (−11.2) | 2.736 (−11.0) | 2.875 (−10.7) | 2.921 (−10.6) |
| 200 bps | Ours | 2.807 (−4.2) | 2.955 (−3.8) | 3.084 (−3.6) | 3.164 (−3.2) |
| 300 bps | CNV | | | | |
| 300 bps | NID | 2.284 (−22.0) | 2.386 (−22.3) | 2.475 (−22.6) | 2.533 (−22.5) |
| 300 bps | Ours | 2.699 (−7.9) | 2.841 (−7.5) | 2.971 (−7.1) | 3.046 (−6.8) |

When the embedding rate is 100 bps, which is almost the limit of CNV steganography, we can see from Table 1 that the mean MOS-LQO scores of our proposed method are only about 0.3% worse than those of CNV-based steganography. This slight decrease may be almost imperceptible to the human auditory system (HAS). There are also significant increases of approximately 3.8% in the mean MOS-LQO scores when our method is compared to NID-based steganography. When the embedding rates are 200 bps and 300 bps, the scores of our approach are improved by about 7% and 15%, respectively, in contrast to those of NID-based steganography.
Furthermore, we can also see that the experimental results of the four rate modes are analogous. The decrease in speech quality caused by NID-based steganography is more than twice that caused by DN-based steganography. The proposed method can obtain twice the embedding capacity of CNV-based steganography by sacrificing less than 2% of speech quality on average. In addition, the decline in speech quality of the proposed DN-based method at 300 bps is still smaller than that of the NID-based method at 200 bps.
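
As a quick check of the percentages quoted from Table 1, the following Python sketch recomputes the relative change of each mean MOS-LQO score against the standard codec. It assumes that the parenthesized values in Table 1 are relative changes in percent; only the 23.85 kbit/s column is used, and the function name `relative_change_percent` is chosen here for illustration.

```python
def relative_change_percent(stego_score, standard_score):
    """Percentage change of a stego MOS-LQO score relative to the standard codec."""
    return 100.0 * (stego_score - standard_score) / standard_score


# Mean MOS-LQO scores in 23.85 kbit/s mode, copied from Table 1.
STANDARD = 3.269
scores = {
    ("CNV", 100): 3.225,
    ("NID", 100): 3.091,
    ("Ours", 100): 3.216,
    ("NID", 200): 2.921,
    ("Ours", 200): 3.164,
    ("NID", 300): 2.533,
    ("Ours", 300): 3.046,
}

changes = {
    key: round(relative_change_percent(score, STANDARD), 1)
    for key, score in scores.items()
}

# Print the changes grouped by embedding rate.
for (method, rate), change in sorted(changes.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(f"{rate} bps {method:>4}: {change:+.1f}%")
```

In this column the DN-based method at 200 bps loses about 3.2% against roughly 1.3% for CNV at 100 bps, a gap of about two percentage points for twice the capacity.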
4.3. Flexible Embedding Capacity. Compared to CNV-based steganography, flexible embedding capacity may be obtained with our proposed method to satisfy different practical demands. The steganographic capacity can be adjusted by changing the iteration parameter $N_i$. For different values of $N_i$, for example, $N_i = 32, 33, \ldots, 54$, the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.
From Figure 7, we can observe that the embedding rate increases significantly with $N_i$ while the MOS-LQO score goes down only slightly. For NID-based steganography, however, the MOS-LQO score declines rapidly as the embedding rate increases. Therefore, the proposed DN-based steganography can achieve higher embedding capacity with a slight decrease in speech quality. For example, when $N_i = 48$, the size of each cluster in $S$ is equal to 4 and we can embed 4 bits per frame, that is, the embedding rate is 200 bps, whereas the CNV algorithm can embed at most 2 bits per frame (100 bps).
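
The arithmetic behind the 200 bps figure can be sketched in a few lines of Python. The sketch assumes the standard 20 ms AMR-WB frame (50 frames per second) and that the 4 bits per frame come from two codebook indices per frame, each hiding log2(4) = 2 bits, which matches the statement in Section 4.4 that only the second and third codebooks of the second stage are used. The function names are illustrative only.

```python
FRAMES_PER_SECOND = 50  # AMR-WB encodes one frame every 20 ms


def bits_per_frame(cluster_size, codebooks_used):
    """Bits hidden per frame when each used codebook index carries log2(cluster_size) bits."""
    bits_per_index = cluster_size.bit_length() - 1  # exact log2 for powers of two
    return codebooks_used * bits_per_index


def embedding_rate_bps(bits):
    """Embedding rate in bits per second for a given number of bits per frame."""
    return bits * FRAMES_PER_SECOND


dn_bits = bits_per_frame(cluster_size=4, codebooks_used=2)
print(dn_bits, embedding_rate_bps(dn_bits))  # prints: 4 200
print(embedding_rate_bps(2))                 # CNV, 2 bits per frame; prints: 100
```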
4.4. Resistibility of Statistical Steganalysis. Speech steganography aims to hide a secret message in cover speech without arousing suspicion. It is very important for a steganographic method to resist statistical steganalysis, which is the technique of detecting the presence of a hidden message. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18] the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of a hidden message in given audio.
Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography. Panels: (a) our proposed steganography (horizontal axis: times of cluster merging); (b) NID-based steganography (horizontal axis: number of sub-codebooks). Vertical axes: embedding rate (bps) and MOS-LQO score.
Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in 23.85 kbit/s mode. The number in parentheses in each column header is the training rate.

| Embedding rate | Method | Markov (0.4) | MFCC (0.4) | SS-QCCN (0.4) | RS-QCCN (0.4) | Markov (0.5) | MFCC (0.5) | SS-QCCN (0.5) | RS-QCCN (0.5) | Markov (0.6) | MFCC (0.6) | SS-QCCN (0.6) | RS-QCCN (0.6) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 bps | CNV | 49.8 | 49.8 | 43.7 | 49.0 | 50.1 | 50.2 | 44.0 | 49.2 | 50.0 | 50.5 | 41.9 | 50.0 |
| 100 bps | NID | 51.0 | 60.1 | 42.2 | 50.0 | 50.1 | 60.9 | 42.9 | 48.7 | 52.1 | 59.8 | 41.8 | 49.4 |
| 100 bps | Ours | 50.0 | 50.0 | 44.0 | 49.4 | 50.3 | 49.3 | 40.3 | 49.4 | 49.1 | 48.6 | 41.8 | 43.3 |
| 200 bps | CNV | | | | | | | | | | | | |
| 200 bps | NID | 53.5 | 74.5 | 46.9 | 50.0 | 53.3 | 76.2 | 47.6 | 50.0 | 53.6 | 75.8 | 44.4 | 50.1 |
| 200 bps | Ours | 51.0 | 48.3 | 45.2 | 50.0 | 49.8 | 48.7 | 42.2 | 50.0 | 50.5 | 48.6 | 45.0 | 50.0 |
| 300 bps | CNV | | | | | | | | | | | | |
| 300 bps | NID | 54.8 | 74.6 | 49.3 | 50.0 | 56.3 | 77.2 | 50.0 | 50.0 | 55.4 | 78.3 | 50.5 | 50.6 |
| 300 bps | Ours | 52.4 | 49.7 | 47.9 | 50.0 | 52.8 | 60.9 | 48.2 | 50.0 | 53.8 | 50.1 | 46.6 | 50.0 |

In our experiments, the sentences chosen from the TIMIT database, as stated in Section 4.1, are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. Then a secret message is embedded into each cover AMR-WB speech at different embedding rates, that is, 100 bps, 200 bps, and 300 bps, by CNV-based, NID-based, and DN-based steganography. The 200 bps and 300 bps rates are omitted for CNV-based steganography because of its limited embedding capacity. Seven stego speech sets are thus generated: one for the CNV-based steganographic method and three each for NID-based and DN-based steganography. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.
In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, that is, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stego speech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and a radial basis function (RBF) kernel and a grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set denote negatives and those in the stego speech set stand for positives. Hence, the accuracy may be defined as follows:
$$\text{Accuracy} = \frac{1}{2} \times \left( \frac{TP}{TP + FN} + \frac{TN}{FP + TN} \right), \qquad (5)$$
where TP are true positives, TN are true negatives, FN are false negatives, and FP are false positives.
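
The split by training rate and the accuracy of equation (5) can be sketched in plain Python as follows. This is only an illustration of the protocol described above, not the authors' scripts: the function names, the label encoding (1 for stego, 0 for cover), and the toy predictions are assumptions, and the classifier itself (LIBSVM with an RBF kernel) is not reproduced here.

```python
import random


def split_by_training_rate(samples, training_rate, rng):
    """Randomly split one speech set into training and testing parts."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(training_rate * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(labels, predictions):
    """Equation (5): mean of the true positive rate and the true negative rate.

    Labels and predictions use 1 for stego (positive) and 0 for cover (negative).
    Both classes must be present in labels, otherwise a division by zero occurs.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return 0.5 * (tp / (tp + fn) + tn / (fp + tn))


rng = random.Random(0)  # fixed seed so the split is repeatable
cover_train, cover_test = split_by_training_rate(range(1000), 0.4, rng)
print(len(cover_train), len(cover_test))  # prints: 400 600

# Toy example: 3 of 4 stego samples and 2 of 4 cover samples classified correctly.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_predictions = [1, 1, 1, 0, 0, 0, 1, 1]
print(accuracy(toy_labels, toy_predictions))  # prints: 0.625
```

With this definition a detector that guesses at random scores about 0.5, which is the reference level used when reading Table 2.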
The steganalytic results are given in Table 2. It can be seen that when the embedding rate is 100 bps, the accuracy of detecting both CNV-based and DN-based methods is almost the same, about 50%, while that of detecting NID-based steganography increases to about 60% when the MFCC-based steganalytic method is applied. Moreover, there is an apparent increase in the accuracy of detecting the NID-based hiding method as the embedding rate increases to 200 bps or 300 bps when Liu et al.'s methods (i.e., the Markov-based and MFCC-based steganalytic methods) are applied. But the accuracy of steganalyzing our proposed method, DN-based steganography, stays at the same level of about 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even at higher embedding rates.
Figure 8: The correlation index of 1000 AMR-WB speeches, where the interframe edge $ii$ connects two vertices $V_i[k]$ and $V_i[k+1]$ in two neighboring frames, and the intraframe edge $ij'$ connects two vertices $V_i[k]$ and $V_j[k]$ in the same frame. Horizontal axis: edge; vertical axis: correlation index.
According to the definition of the correlation index given in [18], the correlation indices of 1000 AMR-WB speeches randomly selected from TIMIT are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely, SS-QCCN and RS-QCCN, can be constructed as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography. The steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN is about 50% or lower for all of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that only the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while none of them is utilized in Li et al.'s steganalytic method except for the edge “33” in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing the edges “22”, “33”, and “23′”) targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].
Figure 9: Two AMR-WB strong correlation network models: (a) SS-QCCN; (b) RS-QCCN. Each panel links the vertices $V_1[k], \ldots, V_5[k]$ of frame $k$ and $V_1[k+1], \ldots, V_5[k+1]$ of frame $k+1$ by the selected strongly correlated edges.
In order to visualize the detection performance, receiver operating characteristic (ROC) curves of steganalyzing CNV-based steganography at a 100 bps embedding rate and NID-based and DN-based steganography at 100 bps, 200 bps, and 300 bps embedding rates are provided in Figure 10 (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of embedding capacity). It shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of a hidden message embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.
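
For readers who want to reproduce curves of this kind, the following Python sketch shows one common way to build ROC points from classifier decision scores and to summarize them by the area under the curve. It is a generic illustration, not the procedure used for Figure 10: the function names, the label encoding (1 for stego, 0 for cover), and the toy scores are assumptions.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points for a threshold sweep.

    Labels use 1 for stego (positive) and 0 for cover (negative); a higher score
    means the classifier is more confident that the sample is stego.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    previous_score = None
    for score, label in ranked:
        # Emit a point each time the threshold moves past a group of equal scores.
        if previous_score is not None and score != previous_score:
            points.append((fp / negatives, tp / positives))
        if label == 1:
            tp += 1
        else:
            fp += 1
        previous_score = score
    points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of ROC points ordered by increasing FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy scores: three stego and three cover samples.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(toy_labels, toy_scores)
print(curve[0], curve[-1])
print(round(area_under_curve(curve), 4))
```

A curve close to the diagonal (area near 0.5) corresponds to the roughly 50% accuracy reported for the DN-based method, whereas a curve bending toward the top-left corner indicates a detectable method.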
Figure 10: ROC curves for steganalysis of CNV-based, NID-based, and our proposed steganography (50% training rate). Panels: (a) Markov (TIMIT, 100 bps); (b) MFCC (TIMIT, 100 bps); (c) Markov (TIMIT, 200 bps); (d) MFCC (TIMIT, 200 bps); (e) Markov (TIMIT, 300 bps); (f) MFCC (TIMIT, 300 bps). Horizontal axis: false positive rate; vertical axis: true positive rate.
5 Conclusion
The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may therefore be a good candidate for the cover medium in speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed. The experimental results demonstrate the effectiveness of our proposed method. The main contributions of this paper are as follows:
(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality and better performance against statistical steganalysis than the NID-based method.
(2) Flexible embedding capacity may be easily achieved with different numbers of iterations of cluster merging. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.
References
[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, “Techniques for data hiding,” IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.
[2] D. Gruhl, A. Lu, and W. Bender, “Echo hiding,” in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.
[3] K. Gopalan, “Audio steganography using bit modification,” in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.
[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, “Audio steganography by amplitude or phase modification,” in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.
[5] D. Kirovski and H. S. Malvar, “Spread-spectrum watermarking of audio signals,” IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.
[6] L. Liu, M. Li, Q. Li, and Y. Liang, “Perceptually transparent information hiding in G.729 bitstream,” in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.
[7] T. Xu and Z. Yang, “Simple and effective speech steganography in G.723.1 low-rate codes,” in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.
[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, “MELPe coded speech hiding on enhanced full rate compressed domain,” in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.
[9] A. Nishimura, “Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec,” in Proceedings of the IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.
[10] B. Xiao, Y. Huang, and S. Tang, “An approach to information hiding in low bit-rate speech stream,” in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.
[11] B. Chen and G. W. Wornell, “Quantization index modulation: a class of provably good methods for digital watermarking and information embedding,” Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.
[12] Y. F. Huang, S. Tang, and J. Yuan, “Steganography in inactive frames of VoIP streams encoded by source codec,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.
[13] Y. Huang, C. Liu, S. Tang, and S. Bai, “Steganography integration into a low-bit rate speech codec,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.
[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, “A new scheme for covert communication via 3G encoded speech,” Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.
[15] H. Tian, J. Liu, and S. Li, “Improving security of quantization-index-modulation steganography in low bit-rate speech streams,” Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.
[16] J. Liu, H. Tian, J. Lu, and Y. Chen, “Neighbor-index-division steganography based on QIM method for G.723.1 speech streams,” Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.
[17] Q. Liu, A. H. Sung, and M. Qiao, “Derivative-based audio steganalysis,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.
[18] S. Li, Y. Jia, and C.-C. J. Kuo, “Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.
[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.
[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.
[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.
[22] C. Chang and C. Lin, “LIBSVM: a Library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
Security and Communication Networks 7
Table 1: MOS-LQO scores of the standard codec, CNV-based, NID-based, and our proposed steganography in four different rate modes and three embedding rates. Values in parentheses are the percentage changes relative to the standard codec.

Embedding rate  Method    Rate mode (kbit/s)
                          12.65           15.85           19.85           23.85
                Standard  2.929           3.073           3.199           3.269
100 bps         CNV       2.871 (-2.0)    3.021 (-1.7)    3.153 (-1.4)    3.225 (-1.3)
                NID       2.750 (-6.1)    2.895 (-5.8)    3.020 (-5.6)    3.091 (-5.4)
                Ours      2.864 (-2.2)    3.010 (-2.0)    3.139 (-1.9)    3.216 (-1.6)
200 bps         CNV       n/a             n/a             n/a             n/a
                NID       2.601 (-11.2)   2.736 (-11.0)   2.875 (-10.7)   2.921 (-10.6)
                Ours      2.807 (-4.2)    2.955 (-3.8)    3.084 (-3.6)    3.164 (-3.2)
300 bps         CNV       n/a             n/a             n/a             n/a
                NID       2.284 (-22.0)   2.386 (-22.3)   2.475 (-22.6)   2.533 (-22.5)
                Ours      2.699 (-7.9)    2.841 (-7.5)    2.971 (-7.1)    3.046 (-6.8)
NID, and DN, including four rate modes (12.65 kbit/s, 15.85 kbit/s, 19.85 kbit/s, and 23.85 kbit/s) together with three embedding rates (100 bps, 200 bps, and 300 bps), are given in Table 1. For the embedding rates of 200 bps and 300 bps, only the MOS-LQO scores of the NID-based and DN-based steganographic methods are given, because the embedding capacity of CNV-based steganography cannot exceed 100 bps.

When the embedding rate is 100 bps, which is almost the limit of CNV steganography, Table 1 shows that the mean MOS-LQO scores of our proposed method are only about 0.3% worse than those of CNV-based steganography. Such a slight decrease is likely to be almost imperceptible to the human auditory system (HAS). Compared with NID-based steganography, the mean MOS-LQO scores of our method are approximately 3.8% higher. When the embedding rates are 200 bps and 300 bps, the scores of our approach are improved by about 7% and 15%, respectively, relative to those of NID-based steganography.

Furthermore, the experimental results of the four rate modes are similar. The decrease in speech quality caused by NID-based steganography is more than twice that caused by DN-based steganography. The proposed method can obtain twice the embedding capacity of CNV-based steganography by sacrificing roughly 2% of speech quality on average. In addition, the proposed DN-based method at an embedding rate of 300 bps shows a smaller decline in speech quality than the NID-based method at 200 bps.
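The percentages quoted above can be re-derived from Table 1. The following is a minimal sketch, assuming that the Table 1 entries are MOS-LQO values with three decimals (for example, 2.929 for the standard codec at 12.65 kbit/s) and that each parenthesized figure is the change relative to the standard codec in percent. The data layout and the function names are illustrative and are not taken from the paper.

```python
# MOS-LQO scores at 100 bps read from Table 1, ordered by rate mode
# (12.65, 15.85, 19.85, 23.85 kbit/s). The decimal placement is an assumption.
STANDARD = [2.929, 3.073, 3.199, 3.269]
SCORES_100BPS = {
    "CNV": [2.871, 3.021, 3.153, 3.225],
    "NID": [2.750, 2.895, 3.020, 3.091],
    "DN": [2.864, 3.010, 3.139, 3.216],
}


def relative_change(score, reference):
    """Change of `score` relative to `reference`, in percent."""
    return 100.0 * (score - reference) / reference


def mean_gap(method_a, method_b):
    """Mean difference, in percentage points, between the relative
    changes of two methods across the four rate modes."""
    gaps = [
        relative_change(a, ref) - relative_change(b, ref)
        for a, b, ref in zip(
            SCORES_100BPS[method_a], SCORES_100BPS[method_b], STANDARD
        )
    ]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    print(round(relative_change(2.871, 2.929), 1))  # CNV at 12.65 kbit/s
    print(round(mean_gap("DN", "CNV"), 1))          # DN versus CNV
    print(round(mean_gap("DN", "NID"), 1))          # DN versus NID
```

Under these assumptions, the DN-versus-CNV gap comes out at about 0.3 percentage points in favor of CNV, and the DN-versus-NID gap at about 3.8 percentage points in favor of DN, in line with the figures in the text.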
4.3. Flexible Embedding Capacity. Compared to CNV-based steganography, our proposed method offers flexible embedding capacity to satisfy different practical demands. The steganographic capacity can be adjusted by changing the iteration parameter $N_i$. For different values of $N_i$, for example, $N_i = 32, 33, \ldots, 54$, the average embedding capacity and the MOS-LQO scores are given in Figure 7(a), and the corresponding results of NID-based steganography are provided in Figure 7(b) for comparison. Without loss of generality, only the 23.85 kbit/s mode is used.

From Figure 7, we can observe that the embedding rate increases significantly with $N_i$, while the MOS-LQO score goes down only slightly. For NID-based steganography, however, the MOS-LQO score declines rapidly as the embedding rate increases. Therefore, the proposed DN-based steganography can achieve higher embedding capacity with only a slight decrease in speech quality. For example, when $N_i = 48$, the size of each cluster in $S$ is equal to 4 and we can embed 4 bits per frame, that is, the embedding rate is 200 bps, whereas the CNV algorithm can embed at most 2 bits per frame (100 bps).
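The arithmetic behind the 200 bps and 100 bps figures can be sketched as follows. The sketch relies on the 20 ms AMR-WB frame length (50 frames per second). It also assumes that each of the two modified codebooks (the second and third codebooks in the second stage, as stated in the steganalysis discussion of this paper) carries log2(cluster size) bits per frame. This per-codebook breakdown is an assumption that happens to be consistent with the 4 and 2 bits per frame quoted above; all names are illustrative.

```python
import math

FRAMES_PER_SECOND = 50  # AMR-WB uses 20 ms frames
CODEBOOKS_USED = 2      # assumption: second and third second-stage codebooks


def bits_per_codeword(cluster_size):
    """Bits carried by selecting one codeword out of a cluster."""
    return int(math.log2(cluster_size))


def bits_per_frame(cluster_size, codebooks=CODEBOOKS_USED):
    """Bits embedded in one frame when every used codebook hides data."""
    return codebooks * bits_per_codeword(cluster_size)


def embedding_rate_bps(frame_bits):
    """Embedding rate in bits per second for a given number of bits per frame."""
    return frame_bits * FRAMES_PER_SECOND


if __name__ == "__main__":
    # Cluster size 4 (the N_i = 48 case) versus a binary partition (CNV-like).
    print(embedding_rate_bps(bits_per_frame(4)))
    print(embedding_rate_bps(bits_per_frame(2)))
```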
Figure 7: Relationship between the embedding rates and the MOS-LQO scores for our proposed steganography and NID-based steganography. (a) Our proposed steganography: embedding rate (bps) and MOS-LQO score versus times of cluster merging. (b) NID-based steganography: embedding rate (bps) and MOS-LQO score versus number of sub-codebooks.

4.4. Resistibility of Statistical Steganalysis. Speech steganography aims to hide a secret message in cover speech without arousing suspicion. It is very important for a steganographic method to resist statistical steganalysis, which is the technique of detecting the presence of a hidden message. Two state-of-the-art steganalytic methods [17, 18] are used to evaluate the statistical undetectability of our proposed method. In [17], mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted to capture the statistical distortions caused by audio steganography, while in [18], the correlation characteristics of split vector quantization codewords of linear predictive coding filter coefficients are utilized to steganalyze QIM-based steganography in low-bit-rate speech (such as G.723.1 and G.729). Both steganalytic methods use a support vector machine to predict the existence of a hidden message in a given audio signal.

Table 2: Steganalysis results (detection accuracy, %) of different steganographic methods in the 23.85 kbit/s mode.

                          Training rate 0.4                   Training rate 0.5                   Training rate 0.6
Rate     Method   Markov  MFCC  SS-QCCN  RS-QCCN     Markov  MFCC  SS-QCCN  RS-QCCN     Markov  MFCC  SS-QCCN  RS-QCCN
100 bps  CNV      49.8    49.8  43.7     49.0        50.1    50.2  44.0     49.2        50.0    50.5  41.9     50.0
         NID      51.0    60.1  42.2     50.0        50.1    60.9  42.9     48.7        52.1    59.8  41.8     49.4
         Ours     50.0    50.0  44.0     49.4        50.3    49.3  40.3     49.4        49.1    48.6  41.8     43.3
200 bps  CNV      n/a     n/a   n/a      n/a         n/a     n/a   n/a      n/a         n/a     n/a   n/a      n/a
         NID      53.5    74.5  46.9     50.0        53.3    76.2  47.6     50.0        53.6    75.8  44.4     50.1
         Ours     51.0    48.3  45.2     50.0        49.8    48.7  42.2     50.0        50.5    48.6  45.0     50.0
300 bps  CNV      n/a     n/a   n/a      n/a         n/a     n/a   n/a      n/a         n/a     n/a   n/a      n/a
         NID      54.8    74.6  49.3     50.0        56.3    77.2  50.0     50.0        55.4    78.3  50.5     50.6
         Ours     52.4    49.7  47.9     50.0        52.8    60.9  48.2     50.0        53.8    50.1  46.6     50.0
In our experiments, the sentences chosen from the “TIMIT” database, as stated in Section 4.1, are first encoded using the standard AMR-WB codec. These AMR-WB recordings constitute the cover speech set. Then a secret message is embedded into each cover AMR-WB speech with different embedding rates, that is, 100 bps, 200 bps, and 300 bps, by CNV-based, NID-based, and DN-based steganography. The rates of 200 bps and 300 bps are omitted for CNV-based steganography because of its limited embedding capacity. Thus, seven stego speech sets are generated: one for the CNV-based steganographic method and three each for NID-based and DN-based steganography. Moreover, only the 23.85 kbit/s mode is used, without loss of generality.

In each experiment, a pair of cover and stego speech sets is randomly divided into training and testing sets according to three training rates, that is, 0.4, 0.5, and 0.6. For example, if the training rate is 0.4, the training set contains 40% of the speech samples, randomly chosen from each of the cover and stego speech sets, and the remaining 60% of the samples go into the testing set. As described in [17, 18], LIBSVM [22] is used as the classifier, and a radial basis function (RBF) kernel and the grid-search technique are employed to obtain better classification performance. For Li et al.'s steganalytic method, principal component analysis (PCA) is first used, as suggested in [18], to reduce the dimension of the feature vectors to 300. Let the samples in the cover speech set be negatives and those in the stego speech set be positives. The accuracy is then defined as follows:

$$\mathrm{Accuracy} = \frac{1}{2} \times \left( \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} + \frac{\mathrm{TN}}{\mathrm{FP} + \mathrm{TN}} \right) \tag{5}$$

where TP, TN, FN, and FP denote the numbers of true positives, true negatives, false negatives, and false positives, respectively.
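The evaluation protocol above can be illustrated with a short sketch: a random split at a given training rate, followed by the accuracy of equation (5), which averages the true positive rate and the true negative rate. The function names, the fixed seed, and the use of Python's standard random module are assumptions made for illustration; the classifier itself (LIBSVM with an RBF kernel) is not reproduced here.

```python
import random


def split_by_training_rate(samples, training_rate, seed=0):
    """Randomly split one speech set into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(training_rate * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(tp, fn, tn, fp):
    """Equation (5): mean of the true positive rate and true negative rate."""
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (fp + tn)
    return 0.5 * (true_positive_rate + true_negative_rate)


if __name__ == "__main__":
    cover = ["cover_%d" % i for i in range(100)]  # hypothetical sample names
    train, test = split_by_training_rate(cover, 0.4)
    print(len(train), len(test))
    # A detector that guesses at random lands at 0.5 under this measure.
    print(accuracy(tp=30, fn=30, tn=30, fp=30))
```

Because both rates are weighted equally, a detector that labels every sample as cover (or every sample as stego) also scores 0.5, which is why values near 50% in Table 2 indicate a failed detection.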
The steganalytic results are given in Table 2. It can be seen that when the embedding rate is 100 bps, the accuracy of detecting both CNV-based and DN-based methods is almost the same, about 50%, while that of detecting NID-based steganography increases to about 60% when the MFCC-based steganalytic method is applied. Moreover, there is an apparent increase in the accuracy of detecting the NID-based hiding method as the embedding rate increases to 200 bps or 300 bps when Liu et al.'s methods (i.e., Markov-based and MFCC-based steganalytic methods) are applied. In contrast, the accuracy of steganalyzing our proposed DN-based steganography stays at roughly the same level of 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even at higher embedding rates.

Figure 8: The correlation index of 1000 AMR-WB speeches, where the interframe edge $ii$ connects two vertices $V_i[k]$ and $V_i[k+1]$ in two neighboring frames and the intraframe edge $ij'$ connects two vertices $V_i[k]$ and $V_j[k]$ in the same frame. Axes: edge (horizontal), correlation index (vertical).
According to the definition of the correlation index given in [18], the correlation indices of 1000 AMR-WB speeches randomly selected from “TIMIT” are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely SS-QCCN and RS-QCCN, can be constructed as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography. The steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN does not exceed about 50% for any of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that only the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while neither of them is involved in the edges utilized in Li et al.'s steganalytic method, except for the edge “33” in the RS-QCCN model. Besides, we also used an adapted QCCN model (i.e., one utilizing edges “22”, “33”, and “23′”) targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].
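The argument about which edges can "see" the embedding may be made concrete with a small sketch. It lists the edges of each model and keeps those incident to the modified vertices $V_2$ and $V_3$. The edge lists for SS-QCCN and RS-QCCN are read from the edge labels of Figure 9 and should be treated as an assumption; the tuple encoding and the function name are illustrative only.

```python
# Each edge is (i, j, kind). "inter" joins V_i[k] and V_i[k+1] (so i == j);
# "intra" joins V_i[k] and V_j[k] within one frame.
# Assumption: these lists mirror the edge labels shown in Figure 9.
SS_QCCN = [(1, 1, "inter"), (4, 5, "intra")]
RS_QCCN = [
    (1, 1, "inter"), (3, 3, "inter"), (4, 4, "inter"),
    (1, 4, "intra"), (1, 5, "intra"), (4, 5, "intra"),
]
# Adapted model described in the text: edges "22", "33", and "23'".
ADAPTED_QCCN = [(2, 2, "inter"), (3, 3, "inter"), (2, 3, "intra")]

# Only the second and third second-stage codebooks are modified.
MODIFIED_VERTICES = {2, 3}


def edges_touching(edges, vertices=MODIFIED_VERTICES):
    """Return the edges that have at least one endpoint in `vertices`."""
    return [edge for edge in edges if edge[0] in vertices or edge[1] in vertices]


if __name__ == "__main__":
    for name, model in (("SS", SS_QCCN), ("RS", RS_QCCN), ("adapted", ADAPTED_QCCN)):
        print(name, edges_touching(model))
```

With these edge lists, SS-QCCN has no edge touching a modified vertex and RS-QCCN has only the edge “33”, which matches the explanation given in the text. The adapted model touches the modified vertices with all three edges, so its failure has to be attributed to weak correlation rather than to missing coverage.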
Figure 9: Two AMR-WB strong correlation network models over the vertices $V_1[k], \ldots, V_5[k]$ and $V_1[k+1], \ldots, V_5[k+1]$ of two neighboring frames. (a) SS-QCCN, with edge labels 11 and 45. (b) RS-QCCN, with edge labels 11, 33, 44, 14, 15, and 45.

In order to visualize the detection performance, some receiver operating characteristic (ROC) curves for steganalyzing CNV-based steganography at a 100 bps embedding rate and NID-based and DN-based steganography at 100 bps, 200 bps, and 300 bps embedding rates are provided in Figure 10 (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of the embedding capacity). Figure 10 shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of a hidden message embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.
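For readers who want to reproduce curves of this kind, the following is a minimal sketch of how ROC points and the area under the curve can be computed from classifier scores. It assumes that a higher score means "more likely stego" and that labels are 1 for stego and 0 for cover; the scores in the example are made up, and the helper names are not from the paper.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs obtained by
    lowering the decision threshold over all distinct scores."""
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    index = 0
    while index < len(pairs):
        threshold = pairs[index][0]
        # Samples with equal scores cross the threshold together.
        while index < len(pairs) and pairs[index][0] == threshold:
            if pairs[index][1] == 1:
                tp += 1
            else:
                fp += 1
            index += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    separable = roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
    uninformative = roc_points([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
    print(area_under_curve(separable), area_under_curve(uninformative))
```

A curve that hugs the diagonal (area close to 0.5) corresponds to a detector that cannot distinguish stego from cover speech, which is the behavior reported for the DN-based method.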
Figure 10: ROC curves (true positive rate versus false positive rate) for steganalysis of CNV-based, NID-based, and our proposed steganography (50% training rate). Panels: (a) Markov (TIMIT, 100 bps); (b) MFCC (TIMIT, 100 bps); (c) Markov (TIMIT, 200 bps); (d) MFCC (TIMIT, 200 bps); (e) Markov (TIMIT, 300 bps); (f) MFCC (TIMIT, 300 bps). Panels (a) and (b) show curves for DN, CNV, and NID; panels (c) to (f) show curves for NID and DN.

5. Conclusion

The adaptive multirate wideband (AMR-WB) codec is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may therefore be a good candidate for the cover medium in speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed. The experimental results demonstrate the effectiveness of our proposed method. The main contributions of this paper are as follows:

(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality and better performance against statistical steganalysis than the NID-based method.

(2) Flexible embedding capacity may be easily achieved with different numbers of iterations of cluster merging. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.
References
[1] W Bender D Gruhl N Morimoto and A Lu ldquoTechniques fordata hidingrdquo IBM Systems Journal vol 35 no 3-4 pp 313ndash3351996
[2] D Gruhl A Lu and W Bender ldquoEcho hidingrdquo in InformationHiding R Anderson Ed vol 1174 of Lecture Notes in ComputerScience pp 295ndash315 Springer Berlin Heidelberg Berlin Ger-many 1996
[3] K Gopalan ldquoAudio steganography using bit modificationrdquo inProceedings of the 2003 International Conference on Multimediaand Expo ICME 2003 pp I629ndashI632 USA July 2003
[4] K Gopalan S Wenndt S Adams and D Haddad ldquoAudiosteganography by amplitude or phasemodificationrdquo in Proceed-ings of the Security andWatermarking ofMultimedia Contents Vpp 67ndash76 USA January 2003
[5] D Kirovski and H S Malvar ldquoSpread-spectrum watermarkingof audio signalsrdquo IEEE Transactions on Signal Processing vol 51no 4 pp 1020ndash1033 2003
[6] L Liu M Li Q Li and Y Liang ldquoPerceptually transparentinformation hiding in G729 bitstreamrdquo in Proceedings of the2008 4th International Conference on Intelligent InformationHiding andMultiedia Signal Processing IIH-MSP 2008 pp 406ndash409 China August 2008
[7] T Xu and Z Yang ldquoSimple and effective speech steganog-raphy in G7231 low-rate codesrdquo in Proceedings of the 2009
Security and Communication Networks 11
International Conference on Wireless Communications and Sig-nal Processing WCSP 2009 China November 2009
[8] A Shahbazi A H Rezaie and R Shahbazi ldquoMELPe codedspeech hiding on enhanced full rate compressed domainrdquo inProceedings of the Asia Modelling Symposium 2010 4th Inter-national Conference on Mathematical Modelling and ComputerSimulation AMS2010 pp 267ndash270 Malaysia May 2010
[9] A Nishimura ldquoData hiding in pitch delay data of the adaptivemulti-rate narrow-band speech codecrdquo in Proceedings of theIIH-MSP 2009-2009 5th International Conference on IntelligentInformation Hiding and Multimedia Signal Processing pp 483ndash486 Japan September 2009
[10] B Xiao Y Huang and S Tang ldquoAn approach to informationhiding in low bit-rate speech streamrdquo in Proceedings of the2008 IEEE Global Telecommunications Conference GLOBE-COM 2008 pp 1940ndash1944 USA December 2008
[11] B Chen and G W Wornell ldquoQuantization index modulationa class of provably good methods for digital watermarking andinformation embeddingrdquo Institute of Electrical and ElectronicsEngineers Transactions on InformationTheory vol 47 no 4 pp1423ndash1443 2001
[12] Y F Huang S Tang and J Yuan ldquoSteganography in inactiveframes of VoIP streams encoded by source codecrdquo IEEETransactions on Information Forensics and Security vol 6 no2 pp 296ndash306 2011
[13] YHuang C Liu S Tang and S Bai ldquoSteganography integrationinto a low-bit rate speech codecrdquo IEEE Transactions on Informa-tion Forensics and Security vol 7 no 6 pp 1865ndash1875 2012
[14] H Miao L Huang Z Chen W Yang and A Al-Hawbani ldquoAnew scheme for covert communication via 3G encoded speechrdquoComputers and Electrical Engineering vol 38 no 6 pp 1490ndash1501 2012
[15] H Tian J Liu and S Li ldquoImproving security of quantization-index-modulation steganography in low bit-rate speechstreamsrdquoMultimedia Systems vol 20 no 2 pp 143ndash154 2014
[16] J Liu H Tian J Lu and Y Chen ldquoNeighbor-index-divisionsteganography based on QIM method for G7231 speechstreamsrdquo Journal of Ambient Intelligence and Humanized Com-puting vol 7 no 1 pp 139ndash147 2016
[17] Q Liu A H Sung and M Qiao ldquoDerivative-based audiosteganalysisrdquo ACM Transactions on Multimedia ComputingCommunications andApplications (TOMM) vol 7 no 3 articleno 18 2011
[18] S Li Y Jia and C-C J Kuo ldquoSteganalysis of QIM Steganogra-phy in Low-Bit-Rate Speech Signalsrdquo IEEEACM TransactionsonAudio Speech and Language Processing vol 25 no 5 pp 1011ndash1022 2017
[19] ITU-T Wideband Coding of Speech at around 16 Kbps UsingAdaptive Multi-rate Wideband (AMR-WB) International Tele-communication Union Std G7222 2002
[20] Perceptual Evaluation of Speech Quality (PESQ) An ObjectiveMethod for End-to-end Speech Quality Assessment of Narrow-band Telephone Net-works and Speech Codecs InternationalTelecommunication Union Std P862 2001
[21] Wideband Extension to Recommendation P862 for the Assess-ment of Wideband Telephone Networks and Speech CodecsInternational Telecommunication Union Std P8622 2007
[22] C Chang and C Lin ldquoLIBSVM a Library for support vectormachinesrdquo ACM Transactions on Intelligent Systems and Tech-nology vol 2 no 3 article 27 2011
[15] H Tian J Liu and S Li ldquoImproving security of quantization-index-modulation steganography in low bit-rate speechstreamsrdquoMultimedia Systems vol 20 no 2 pp 143ndash154 2014
[16] J Liu H Tian J Lu and Y Chen ldquoNeighbor-index-divisionsteganography based on QIM method for G7231 speechstreamsrdquo Journal of Ambient Intelligence and Humanized Com-puting vol 7 no 1 pp 139ndash147 2016
[17] Q Liu A H Sung and M Qiao ldquoDerivative-based audiosteganalysisrdquo ACM Transactions on Multimedia ComputingCommunications andApplications (TOMM) vol 7 no 3 articleno 18 2011
[18] S Li Y Jia and C-C J Kuo ldquoSteganalysis of QIM Steganogra-phy in Low-Bit-Rate Speech Signalsrdquo IEEEACM TransactionsonAudio Speech and Language Processing vol 25 no 5 pp 1011ndash1022 2017
[19] ITU-T Wideband Coding of Speech at around 16 Kbps UsingAdaptive Multi-rate Wideband (AMR-WB) International Tele-communication Union Std G7222 2002
[20] Perceptual Evaluation of Speech Quality (PESQ) An ObjectiveMethod for End-to-end Speech Quality Assessment of Narrow-band Telephone Net-works and Speech Codecs InternationalTelecommunication Union Std P862 2001
[21] Wideband Extension to Recommendation P862 for the Assess-ment of Wideband Telephone Networks and Speech CodecsInternational Telecommunication Union Std P8622 2007
[22] C Chang and C Lin ldquoLIBSVM a Library for support vectormachinesrdquo ACM Transactions on Intelligent Systems and Tech-nology vol 2 no 3 article 27 2011
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
Security and Communication Networks 9
[Figure 8 plot: correlation index (vertical axis, 0 to 10) for each edge (horizontal axis, edges 11 through 55).]
Figure 8: The correlation index of 1000 AMR-WB speeches, where the interframe edge $ii$ connects two vertices $V_i[k]$ and $V_i[k+1]$ in two neighboring frames, and the intraframe edge $ij'$ connects two vertices $V_i[k]$ and $V_j[k]$ in the same frame.
NID-based steganography increases to 60% when the MFCC-based steganalytic method is applied. Moreover, there is an apparent increase in the accuracy of detecting the NID-based hiding method as the embedding rate increases to 200 bps or 300 bps when Liu et al.'s methods (i.e., the Markov-based and MFCC-based steganalytic methods) are applied. In contrast, the accuracy of steganalyzing our proposed method, DN-based steganography, stays at the same level of 50%. Therefore, the proposed method may defend against Liu et al.'s statistical steganalysis [17] even at higher embedding rates.
According to the definition of the correlation index given in [18], the experimental results for the correlation indices of 1000 AMR-WB speeches, which are randomly selected from "TIMIT," are shown in Figure 8. Based on these results, two strong quantization codeword correlation network (QCCN) models, namely SS-QCCN and RS-QCCN, can be constructed as illustrated in Figure 9. These two models are then used to steganalyze our proposed steganography. The steganalytic results are also presented in Table 2. It can be seen from Table 2 that the accuracy of both SS-QCCN and RS-QCCN is less than 50% for all of the AMR-WB stego speeches. A possible reason is that only the second and third codebooks in the second stage are employed in the AMR-WB speech steganography, which means that only the vertices $V_2[k]$ and $V_3[k]$ in the $k$th frame may be changed during steganography, while neither of them is utilized in Li et al.'s steganalytic method except for the edge "33" in the RS-QCCN model. In addition, we also used an adapted QCCN model (i.e., one utilizing edges "22," "33," and "$23'$") targeted at AMR-WB speech, but the accuracy is still less than 50%. This may be because the correlation of those edges is not strong enough for steganalysis, according to Figure 8. Therefore, it is reasonable to conclude that the AMR-WB speech steganography can defend against the steganalytic method proposed in [18].
[Figure 9 diagrams: each panel shows the vertices $V_1[k]$ to $V_5[k]$ of frame $k$ and $V_1[k+1]$ to $V_5[k+1]$ of frame $k+1$, joined by labeled edges. (a) SS-QCCN, with edge labels 11 and 45. (b) RS-QCCN, with edge labels 11, 33, 44, 14, 15, and 45.]
Figure 9: Two AMR-WB strong correlation network models.
To visualize the detection performance, receiver operating characteristic (ROC) curves for steganalyzing CNV-based steganography with a 100 bps embedding rate, and NID-based and DN-based steganography with 100 bps, 200 bps, and 300 bps embedding rates, are provided in Figure 10. (ROC curves for SS-QCCN and RS-QCCN are omitted because these two methods fail to steganalyze AMR-WB steganography regardless of the embedding capacity.) Figure 10 shows that all three steganographic methods can resist statistical steganalysis when the embedding rate is 100 bps. While the statistical steganalytic methods, especially MFCC-based steganalysis, may detect the existence of a hidden message embedded with NID-based steganography when the embedding rate is above 100 bps, the proposed DN-based steganography may still have good security against both Markov-based and MFCC-based steganalysis.
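As a reading aid for the ROC curves of Figure 10, the following is a minimal sketch of how such a curve can be traced from the outputs of a steganalyzer. It is not the authors' evaluation code: the function names `roc_points` and `auc`, the label convention (1 for stego, 0 for cover), and the use of a real-valued detector score are all assumptions made for illustration.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs.

    labels: 1 for a stego sample, 0 for a cover sample (assumed convention).
    scores: detector output, where a higher value means "more likely stego".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one stego and one cover sample")

    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples sharing one score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Under these assumptions, a detector whose scores carry no information about the class produces the diagonal from (0, 0) to (1, 1) with an area of 0.5, which is the ROC counterpart of chance-level accuracy; a curve that bends toward the upper-left corner indicates that the hidden message is becoming detectable.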
[Figure 10 plots: six ROC panels on TIMIT, each plotting true positive rate against false positive rate. (a) Markov, 100 bps (DN, CNV, NID); (b) MFCC, 100 bps (DN, CNV, NID); (c) Markov, 200 bps (NID, DN); (d) MFCC, 200 bps (NID, DN); (e) Markov, 300 bps (NID, DN); (f) MFCC, 300 bps (NID, DN).]
Figure 10: ROC curves for steganalysis of CNV-based, NID-based, and our proposed steganography (50% training rate).
5 Conclusion
The adaptive multirate wideband (AMR-WB) is a widely adopted format in mobile handsets and is also the recommended speech codec for VoLTE. AMR-WB speech may be a good candidate for the cover medium in speech steganography. In this paper, a novel AMR-WB speech steganographic method is proposed. The experimental results demonstrate the effectiveness of the proposed method. The main contributions of this paper are as follows:
(1) A novel AMR-WB speech steganography is proposed based on the diameter-neighbor codebook partition algorithm. It can provide higher capacity without a noticeable decrease in speech quality, and better performance against statistical steganalysis than the NID-based method.
(2) Flexible embedding capacity may be easily achieved with different numbers of iterations of cluster merging. Twice the embedding capacity of the CNV-based embedding method may be obtained with $N_i = 48$.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China under Grant no. 61632013.
References
[1] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, no. 3-4, pp. 313–335, 1996.
[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Information Hiding, R. Anderson, Ed., vol. 1174 of Lecture Notes in Computer Science, pp. 295–315, Springer Berlin Heidelberg, Berlin, Germany, 1996.
[3] K. Gopalan, "Audio steganography using bit modification," in Proceedings of the 2003 International Conference on Multimedia and Expo, ICME 2003, pp. I629–I632, USA, July 2003.
[4] K. Gopalan, S. Wenndt, S. Adams, and D. Haddad, "Audio steganography by amplitude or phase modification," in Proceedings of the Security and Watermarking of Multimedia Contents V, pp. 67–76, USA, January 2003.
[5] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1020–1033, 2003.
[6] L. Liu, M. Li, Q. Li, and Y. Liang, "Perceptually transparent information hiding in G.729 bitstream," in Proceedings of the 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, pp. 406–409, China, August 2008.
[7] T. Xu and Z. Yang, "Simple and effective speech steganography in G.723.1 low-rate codes," in Proceedings of the 2009 International Conference on Wireless Communications and Signal Processing, WCSP 2009, China, November 2009.
[8] A. Shahbazi, A. H. Rezaie, and R. Shahbazi, "MELPe coded speech hiding on enhanced full rate compressed domain," in Proceedings of the Asia Modelling Symposium 2010: 4th International Conference on Mathematical Modelling and Computer Simulation, AMS2010, pp. 267–270, Malaysia, May 2010.
[9] A. Nishimura, "Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec," in Proceedings of the IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 483–486, Japan, September 2009.
[10] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proceedings of the 2008 IEEE Global Telecommunications Conference, GLOBECOM 2008, pp. 1940–1944, USA, December 2008.
[11] B. Chen and G. W. Wornell, "Quantization index modulation: a class of provably good methods for digital watermarking and information embedding," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.
[12] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 296–306, 2011.
[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1865–1875, 2012.
[14] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, "A new scheme for covert communication via 3G encoded speech," Computers and Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.
[15] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Systems, vol. 20, no. 2, pp. 143–154, 2014.
[16] J. Liu, H. Tian, J. Lu, and Y. Chen, "Neighbor-index-division steganography based on QIM method for G.723.1 speech streams," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 139–147, 2016.
[17] Q. Liu, A. H. Sung, and M. Qiao, "Derivative-based audio steganalysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, article no. 18, 2011.
[18] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 5, pp. 1011–1022, 2017.
[19] ITU-T, Wideband Coding of Speech at around 16 Kbps Using Adaptive Multi-rate Wideband (AMR-WB), International Telecommunication Union Std. G.722.2, 2002.
[20] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862, 2001.
[21] Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs, International Telecommunication Union Std. P.862.2, 2007.
[22] C. Chang and C. Lin, "LIBSVM: a Library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.