    ISTE-STTP on Network Security & Cryptography, LBSCE 2004

    Information Hiding Techniques: A Tutorial Review

    Sabu M Thampi
    Assistant Professor
    Department of Computer Science & Engineering
    LBS College of Engineering, Kasaragod
    Kerala - 671542, S. India

    [email protected]

    Abstract: The purpose of this tutorial is to present an overview of various information hiding techniques. A brief history of steganography is provided along with techniques that were used to hide information. Text, image and audio based information hiding techniques are discussed. This paper also provides a basic introduction to digital watermarking.

    1. History of Information Hiding

    The idea of communicating secretly is as old as communication itself. In this section, we briefly discuss the historical development of information hiding techniques such as steganography and watermarking.

    Early steganography was messy. Before phones, before mail, before horses, messages were sent on foot. If you wanted to hide a message, you had two choices: have the messenger memorize it, or hide it on the messenger.

    While information hiding techniques have received tremendous attention recently, their application goes back to Greek times. According to the Greek historian Herodotus, the famous Greek tyrant Histiaeus, while in prison, used an unusual method to send a message to his son-in-law. He shaved the head of a slave and tattooed a message on his scalp. Histiaeus then waited until the hair grew back on the slave's head before sending him off to his son-in-law.

    The second story also comes from Herodotus, who claims that a soldier named Demeratus needed to send a message to Sparta that Xerxes intended to invade Greece. Back then, the usual writing medium was a wax-covered tablet. Demeratus removed the wax from the tablet, wrote the secret message on the underlying wood, re-covered the tablet with wax to make it appear blank, and finally sent the document without being detected.

    Invisible inks have always been a popular method of steganography. Ancient Romans used to write between lines using invisible inks based on readily available substances such as fruit juices, urine and milk. When heated, the invisible inks would darken and become legible. Ovid, in his Art of Love, suggests using milk to write invisibly. Later, chemically affected sympathetic inks were developed. Invisible inks were used as recently as World War II. Modern invisible inks fluoresce under ultraviolet light and are used as anti-counterfeit devices. For example, "VOID" is printed on checks and other official documents in an ink that appears under the strong ultraviolet light used for photocopies.

    The monk Johannes Trithemius, considered one of the founders of modern cryptography, had ingenuity in spades. His three volume work Steganographia, written around 1500, describes an extensive system for concealing secret messages within innocuous texts. On its surface, the book seems to be a magical text, and the initial reaction in the 16th century was so strong that Steganographia was only circulated privately until publication in 1606. But less than five years ago, Jim Reeds of AT&T Labs deciphered mysterious codes in the third volume, showing that Trithemius' work is more a treatise on cryptology than demonology. Reeds' fascinating account of the code breaking process is quite readable.

    One of Trithemius' schemes was to conceal messages in long invocations of the names of angels, with the secret message appearing as a pattern of letters within the words. For example, as every other letter in every other word:

    padiel aporsy mesarpon omeuas peludyn malpreaxo

    which reveals "prymus apex."
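
The extraction rule can be sketched in a few lines of Python. This is only an illustration: it assumes the invocation is split into six words (padiel, aporsy, mesarpon, omeuas, peludyn, malpreaxo) and that "every other" means the second, fourth, sixth, and so on. The function names are ours, not Trithemius'.

```python
def every_other(seq):
    """Return the 2nd, 4th, 6th, ... items of a string or list."""
    return seq[1::2]


def decode_invocation(invocation):
    """Take every other word, then every other letter of each such word."""
    hidden_words = every_other(invocation.split())
    return "".join(every_other(word) for word in hidden_words)


invocation = "padiel aporsy mesarpon omeuas peludyn malpreaxo"
decoded = decode_invocation(invocation)
print(decoded)  # prymusapex
```

The decoder returns the letters without the word break, so "prymus apex" appears as "prymusapex".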

    Another clever invention in Steganographia was the "Ave Maria" cipher. The book contains a series of tables, each of which has a list of words, one per letter. To code a message, the message letters are replaced by the corresponding words. If the tables are used in order, one table per letter, then the coded message will appear to be an innocent prayer.

    The earliest actual book on steganography was a four hundred page work written by Gaspari Schott in 1665 and called Steganographica. Although most of the ideas came from Trithemius, it was a start.

    Further development in the field occurred in 1883, with the publication of Auguste Kerckhoffs' Cryptographie militaire. Although this work was mostly about cryptography, it describes some principles that are worth keeping in mind when designing a new steganographic system.


    hide messages inside other harmless messages in a way that does not allow any enemy to even detect that there is a second message present.

    In a digital world, Steganography and Cryptography are both intended to protect information from unwanted parties. Both Steganography and Cryptography are excellent means by which to accomplish this, but neither technology alone is perfect and both can be broken. It is for this reason that most experts would suggest using both to add multiple layers of security.

    Steganography Vs Cryptography

    The term Steganography means "covered writing", whereas cryptography means "secret writing". Cryptography is the study of methods of sending messages in disguised form so that only the intended recipients can remove the disguise and read the message. The message we want to send is called the plain text and the disguised message is called the cipher text. The process of converting a plain text to a cipher text is called enciphering or encryption, and the reverse process is called deciphering or decryption. Encryption protects contents during the transmission of the data from sender to receiver. However, after receipt and subsequent decryption, the data is no longer protected and is in the clear. Steganography hides messages in plain sight rather than encrypting them; the message is embedded in the data (that has to be protected) and doesn't require secret transmission. The message is carried inside the data.

    Steganography can be used with a large number of data formats in the digital world of today. The most popular data formats are .bmp, .doc, .gif, .jpeg, .mp3, .txt and .wav. Steganographic technologies are a very important part of the future of Internet security and privacy on open systems such as the Internet.

    Figure 2a : Cryptography

    Steganographic research is primarily driven by the lack of strength in cryptographic systems on their own and the desire to have complete secrecy in an open-systems environment. Many governments have created laws that either limit the strength of cryptosystems or prohibit them completely. This unfortunately leaves the majority of the Internet community either with relatively weak, and often breakable, encryption algorithms or with none at all. This is where Steganography comes in. Steganography can be used to hide important data inside another file so that only the parties intended to get the message even know that a secret message exists. It is good practice to use Cryptography and Steganography together.

    Neither Steganography nor Cryptography is considered a turnkey solution to open systems privacy, but using both technologies together can provide a very acceptable amount of privacy for anyone connecting to and communicating over these systems.

    Figure 2b: Steganography

    Figure 3

    3. A Detailed Look at Steganography

    In this section we will discuss Steganography at length. We will start by looking at the different types of Steganography generally used in practice today, along with some of the other principles that are used in Steganography.

    To start, let's look at what a theoretically perfect secret communication (Steganography) would consist of. To illustrate this concept, we will use three fictitious characters named Amy, Bret and Crystal. Amy wants to send a secret message (M) to Bret using a random (R) harmless message to create a cover (C), which can be sent to Bret without raising suspicion. Amy then changes the cover message (C) to a stego-object (S) by embedding the secret message (M) into the cover


    message (C) by using a stego-key (K). Amy should then be able to send the stego-object (S) to Bret without being detected by Crystal. Bret will then be able to read the secret message (M) because he knows the stego-key (K) used to embed it into the cover message (C).

    In order to embed secret data into a cover message, the cover must contain a sufficient amount of redundant data or noise. This is because in the embedding process Steganography actually replaces redundant data with the secret message. This limits the types of data that we can use with Steganography.

    There are basically three types of steganographic protocols used. They are Pure Steganography, Secret Key Steganography, and Public Key Steganography.

    Pure Steganography is defined as a steganographic system that does not require the exchange of a cipher such as a stego-key. This method of Steganography is the least secure means by which to communicate secretly because the sender and receiver can rely only upon the presumption that no other parties are aware of this secret message.

    Secret Key Steganography is defined as a steganographic system that requires the exchange of a secret key (stego-key) prior to communication. Secret Key Steganography takes a cover message and embeds the secret message inside of it using a secret key (stego-key). Only the parties who know the secret key can reverse the process and read the secret message. Unlike Pure Steganography, where a perceived invisible communication channel is present, Secret Key Steganography exchanges a stego-key, which makes it more susceptible to interception. The benefit of Secret Key Steganography is that even if the stego-object is intercepted, only parties who know the secret key can extract the secret message.

    Figure 4: Steganographic Protocols

    Public Key Steganography takes the concepts from Public Key Cryptography as explained below. Public Key Steganography is defined as a steganographic system that uses a public key and a private key to secure the communication between the parties wanting to communicate secretly. The sender will use the public key during the encoding process and only the private key, which has a direct mathematical relationship with the public key, can decipher the secret message. Public Key Steganography provides a more robust way of implementing a steganographic system because it can utilise a much more robust and researched technology in Public Key Cryptography.

    Throughout history, different media types have been used to hide information. With advancements in the computer industry this number is only increasing. Some of the media types are computer file systems, transmission protocols, audio files, text files and images. A brief introduction to encoding messages in various media types is given below.

    Kerckhoffs' Principle

    The security of the system has to be based on the assumption that the enemy has full knowledge of the design and implementation details of the steganographic system. The only missing information for the enemy is a short, easily exchangeable random number sequence, the secret key. Without this secret key, the enemy should not have the chance to even suspect that on an observed communication channel, hidden communication is taking place.

    3.2 Computer File System

    While it stores normal data, a computer file system can also be used to hide information between innocent files. For example, a hard drive, while showing the visible partition to a computer user, may contain hidden partitions that can carry hidden files inside them. As another example, sfspatch is a kernel patch that introduces module support for a steganographic file system on a Linux machine. Sfspatch employs encryption along with steganographic techniques to hide information on the disk so it is not visible to a casual user.

    A FAT16 file system on Microsoft Windows hosts can allocate disk space in clusters as large as 32 kilobytes. If a file occupies only a few kilobytes of its cluster, the rest of the space can be used to hide information.
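
As a rough illustration of how much room this leaves, the following sketch computes the unused tail of a file's last allocation unit. It assumes a fixed 32 KB allocation unit; the function name and the sample sizes are hypothetical.

```python
def slack_bytes(file_size, cluster_size=32 * 1024):
    """Unused bytes left in the last allocation unit of a file."""
    if file_size == 0:
        return 0
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size - file_size


print(slack_bytes(3 * 1024))   # 29696 bytes free behind a 3 KB file
print(slack_bytes(32 * 1024))  # 0, the file fills its allocation exactly
```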

    3.3 Transmission Protocol

    Covert channels can be established using the control data, the timing properties of transmission, or the user data. When control data or timing is used, it is very difficult or almost impossible to prove the existence of covert channels, because the information is stripped off at the receiver. But if the information is hidden using user data, it remains on the hard disk until it is specifically deleted. Thus network systems can be used to establish hidden channels of communication.

    Transmission Control Protocol (TCP) and Internet Protocol (IP) are two of the protocols that can be used to hide information inside certain header fields.


    Some TCP/IP fields are either changed or stripped off by packet filtering mechanisms or through fragment re-assembly. However, there are fields that are less likely to be changed or altered. These fields include the Identification field, the Sequence Number field and the Acknowledge Sequence Number field.

    Hiding information within an IP header

    The Identification field within an IP header provides network devices with a unique number to identify packets that may require reassembly. As presented by Neil F. Johnson in the INFS 762 class at George Mason University (GMU), replacing the identification field with the numerical ASCII representation of the character to be encoded provides an easy way to hide information within this field. In his example, Johnson selects an unsigned integer to be transmitted as the identification field. The ASCII value of the hidden character is recovered by dividing this integer by 256.

    At the transmitting end, the client host constructs a packet to include the desired identification number along with the source and destination addresses. In this example I have chosen 18432, 18688, 17408 and 17664 as the four identification field values for the four IP packets. This process is depicted in Table 1. Once the ASCII value of the identification field is calculated at the destination, the decoded message is found to be the word HIDE.
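
The arithmetic behind these four values can be checked with a short Python sketch. It only models the field values, not real packet construction, and the function names are illustrative assumptions.

```python
def encode_ids(message):
    """Map each ASCII character to a 16-bit identification value (code * 256)."""
    ids = []
    for ch in message:
        code = ord(ch)
        if code > 255:
            raise ValueError("only single-byte characters fit in this scheme")
        ids.append(code * 256)
    return ids


def decode_ids(ids):
    """Recover the characters by integer division by 256."""
    return "".join(chr(value // 256) for value in ids)


print(encode_ids("HIDE"))                        # [18432, 18688, 17408, 17664]
print(decode_ids([18432, 18688, 17408, 17664]))  # HIDE
```

Multiplying by 256 places the character in the high byte of the 16-bit field, which is why every value stays below 65536.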

    Table 1: The IP Header

    Similar techniques can also be used to encode information in the Sequence Number field of a TCP packet.

    Session Layer

    The Open Systems Interconnection (OSI) Reference model uses packet structures to send information across the network from one layer to another as well as from one network terminal to another. A network packet consists of packet headers, user data and packet trailers. All the packets sent across the network have the same packet structure.

    The session layer allows two machines to establish sessions over the network. These sessions allow ordinary data transfer plus enhanced services for some applications. This function is achieved via software that can "mount" remote discs on a local machine. Richard Popa, who has conducted research in this area at the University of Timisoara, has described the following scheme that can be used to establish a covert communication channel:

    "Suppose we have two files on the disk of Alice, Bob can read one of them. If he reads the first file then Alice records a zero and if he reads the second file she records a one."

    Table 2: Hiding data in the identification field

    The fact that Oscar can see this traffic should not arouse his suspicion, since it is irrelevant to him that Bob reads one file rather than another.

    Figure 5: Open Systems Interconnection Reference model (OSI)

    3.4 Encoding Secret Messages in Text

    Encoding secret messages in text can be a very challenging task. This is because text files have a very small amount of redundant data to replace with a secret message. Another drawback is the ease with which text


    based Steganography can be altered by unwanted parties simply by changing the text itself or reformatting it to some other form (from .txt to .doc, etc.). There are numerous methods by which to accomplish text based Steganography. These methods are: open space methods, syntactic methods and semantic methods (Figure 4).

    3.4.1 Open Space Methods

    There are a couple of ways to employ the open space in text files to encode information. These methods work because, to a casual reader, one extra space at the end of a line or an extra space between two words does not look abnormal. However, open space methods are only useful with ASCII format.

    Inter-sentence space method encodes a "0" by adding a single space after a period in English prose. Adding two spaces would encode a "1". This method works, but requires a large amount of data to hide only a little information. Also, many word processing tools automatically correct the spaces between sentences.
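
A minimal Python sketch of the inter-sentence space method follows. It assumes sentences end with a period followed by spaces, that unused gaps are left as single spaces, and that the receiver knows how many bits to read; the function names and the cover text are hypothetical.

```python
import re


def embed(cover, bits):
    """Encode one bit per sentence gap: one space = 0, two spaces = 1."""
    sentences = re.split(r"(?<=\.) +", cover.strip())
    if len(bits) > len(sentences) - 1:
        raise ValueError("cover text has too few sentence gaps")
    stego = sentences[0]
    for i, sentence in enumerate(sentences[1:]):
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        stego += gap + sentence
    return stego


def extract(stego, count):
    """Read the first `count` bits back from the sentence gaps."""
    gaps = re.findall(r"\.( +)(?=\S)", stego)
    return "".join("1" if len(gap) >= 2 else "0" for gap in gaps[:count])


cover = "It rained. We stayed in. The fire was warm. Nobody spoke."
stego = embed(cover, "101")
print(extract(stego, 3))  # 101
```

Four sentences carry only three bits here, which illustrates why the method needs a large cover text.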

    Line-shift encoding involves actually shifting each line of text vertically up or down by as little as 3 centimeters. Whether a line is shifted up or down relative to a stationary line determines the value that is encoded into the secret message.

    Figure 6: Steganography in Text

    End-of-line space method exploits white space at the end of each line. Data is encoded using a predetermined number of spaces at the end of each line. For example, two spaces will encode one bit, four spaces will encode two bits, eight spaces will encode three bits, and so on. This technique works better than the inter-sentence space method, because increasing the number of spaces can hide more data.

    Right-justification of text can also be used to encode data within text files. Data is encoded in innocent text files by calculating and controlling the spaces between words. One space between words represents a "0" and two spaces represent a "1". However, this approach makes it difficult to decode the data, as it becomes impossible to distinguish a single innocent space from an encoded one. For this purpose another technique based on Manchester coding is used. Hence "01" is interpreted as "1" and "10" is interpreted as "0", whereas "00" and "11" are considered null bit strings.
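
The Manchester-style pairing can be sketched as follows. This is a simplification that ignores the justification step itself: it assumes each gap between words is either one space (code 0) or two spaces (code 1), and that gaps are read in consecutive pairs from the start of the text. The function names and cover words are hypothetical.

```python
import re


def embed(words, bits):
    """Spend two word gaps per data bit: (0, 1) encodes 1 and (1, 0) encodes 0."""
    codes = []
    for bit in bits:
        codes += [0, 1] if bit == "1" else [1, 0]
    gaps = len(words) - 1
    if len(codes) > gaps:
        raise ValueError("cover text has too few word gaps")
    codes += [0] * (gaps - len(codes))  # unused gaps form null pairs (0, 0)
    text = words[0]
    for code, word in zip(codes, words[1:]):
        text += (" " if code == 0 else "  ") + word
    return text


def extract(text):
    """Decode gap pairs, skipping the null pairs (0, 0) and (1, 1)."""
    codes = [0 if len(gap) == 1 else 1 for gap in re.findall(r" +", text)]
    bits = ""
    for i in range(0, len(codes) - 1, 2):
        pair = (codes[i], codes[i + 1])
        if pair == (0, 1):
            bits += "1"
        elif pair == (1, 0):
            bits += "0"
    return bits


cover = "the quick brown fox jumps over the lazy dog again".split()
stego = embed(cover, "101")
print(extract(stego))  # 101
```

Because ordinary single-spaced text decodes only to null pairs, the receiver does not need to know the message length in this sketch.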

    Feature specific encoding involves encoding secret messages into formatted text by changing certain text attributes, such as the vertical or horizontal length of letters such as b, d, T, etc. This is by far the hardest text encoding method to intercept, as each type of formatted text has a large number of features that can be used for encoding the secret message.

    All of the above text based encoding methods require either the original file or knowledge of the original file's formatting to be able to decode the secret message.

    3.4.2 Syntactic Methods

    Syntactic methods exploit the use of punctuation and the structure of text to hide data without significantly altering the meaning of the message. For example, the two phrases "bread, butter, and milk" and "bread, butter and milk" are both grammatically correct but differ in the use of the comma. One can alternate between these structures in a text message, representing a "1" if one form is used and a "0" if the other form is employed.

    3.4.3 Semantic Methods

    Semantic methods assign two synonyms a primary or secondary value. These values are then translated into binary "1" or "0". For example, the word "big" is assigned primary and the word "large" is assigned secondary. Decoding a message would therefore translate the use of the primary word to a "1" and the secondary word to a "0". The problem with this approach is that replacing synonyms may change the meaning or structure of the sentence. For example, calling someone "cool" has a different meaning than calling him or her "chilly".
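
A small Python sketch of the synonym scheme is given below. Only the pair big/large comes from the text; the other pairs, the function names and the cover sentence are hypothetical, and the sketch assumes lowercase words separated by single spaces and a receiver who knows the number of bits.

```python
# (primary, secondary) synonym pairs; primary encodes 1, secondary encodes 0.
PAIRS = [("big", "large"), ("small", "little"), ("quick", "fast")]
PRIMARY = {primary: secondary for primary, secondary in PAIRS}
SECONDARY = {secondary: primary for primary, secondary in PAIRS}


def embed(cover, bits):
    """Rewrite each word that belongs to a pair so that it carries one bit."""
    remaining = iter(bits)
    out = []
    for word in cover.split():
        if word in PRIMARY or word in SECONDARY:
            bit = next(remaining, None)
            if bit is not None:
                primary = word if word in PRIMARY else SECONDARY[word]
                word = primary if bit == "1" else PRIMARY[primary]
        out.append(word)
    if next(remaining, None) is not None:
        raise ValueError("cover text has too few synonym slots")
    return " ".join(out)


def extract(stego, count):
    """Read the first `count` bits from the synonym choices."""
    bits = ""
    for word in stego.split():
        if word in PRIMARY:
            bits += "1"
        elif word in SECONDARY:
            bits += "0"
    return bits[:count]


cover = "a big dog chased a small cat with a quick jump"
stego = embed(cover, "010")
print(stego)              # a large dog chased a small cat with a fast jump
print(extract(stego, 3))  # 010
```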

    Our tool analysis for text file Steganography will cover a product named SNOW, which makes use of the ends of lines of text for encoding messages.

    3.4.4 Concealing Messages in Text Files: SNOW

    Snow is a free for non-commercial use program available at http://www.darkside.com.au/snow and is authored by Matthew Kwan.


    The encoding scheme used by snow relies on the fact that spaces and tabs (known as whitespace), when appearing at the end of lines, are invisible when displayed in nearly all text viewing programs. This allows messages to be hidden in ASCII text without affecting the text's visual representation. And since trailing spaces and tabs occasionally occur naturally, their existence should not be sufficient to immediately alert an observer who stumbles across them.
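
The general idea of hiding bits in trailing whitespace can be sketched in Python as below. This is a deliberately simplified illustration and not snow's actual encoding: it assumes a trailing space stands for 0 and a trailing tab for 1, with a hypothetical limit of eight bits per line.

```python
def conceal(cover, bits, per_line=8):
    """Append bits to line ends as whitespace: space = 0, tab = 1."""
    lines = cover.splitlines()
    chunks = [bits[i:i + per_line] for i in range(0, len(bits), per_line)]
    if len(chunks) > len(lines):
        raise ValueError("cover text has too few lines")
    out = []
    for i, line in enumerate(lines):
        line = line.rstrip(" \t")  # drop whitespace that is already there
        if i < len(chunks):
            line += "".join("\t" if bit == "1" else " " for bit in chunks[i])
        out.append(line)
    return "\n".join(out)


def extract(stego):
    """Collect the trailing whitespace of every line and turn it back into bits."""
    bits = ""
    for line in stego.splitlines():
        tail = line[len(line.rstrip(" \t")):]
        bits += "".join("1" if ch == "\t" else "0" for ch in tail)
    return bits


cover = "Dear Bob,\nsee you soon.\nAmy"
secret = "0100100001001001"  # the ASCII bits of "HI"
stego = conceal(cover, secret)
print(extract(stego) == secret)  # True
```

The visible text is unchanged: stripping trailing whitespace from each line of the stego text gives back the cover lines.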

    The snow program runs in two modes: message concealment and message extraction.

    During concealment, the following steps are taken.

    Message -> optional compression -> optional encryption -> concealment in text

    Extraction reverses the process.

    Extract data from text -> optional decryption -> optional uncompression -> message

    The program has the handy ability to tell you how much data it can fit in the desired cover file. Issuing the command

    snow -S cover.txt

    produces the following output: "File has storage capacity of between 1763 and 2012 bits." Our embedded data, mysecret.txt, is only 107 bytes (856 bits), so we know it will easily fit within this cover file. If space is of concern, the program also offers a flag to compress the data. However, the author notes that if the data is not text, or if there is a lot of data, the use of the built-in compression is not recommended, and suggests that the user pre-compress the data with a more robust compression tool such as WinZip.

    Snow also provides the ability to encrypt the data to be hidden with a password-protected key. It uses the author's own ICE encryption algorithm, allowing for passwords or pass phrases of up to 1170 characters. ICE stands for Information Concealment Engine. It is a 64-bit private key block cipher, in the tradition of DES. However, unlike DES, it was designed to be secure against differential and linear cryptanalysis, and has no key complementation weaknesses or weak keys. In addition, its key size can be any multiple of 64 bits, whereas the DES key is limited to 56 bits. The ICE algorithm is public domain, and source code can be downloaded.

    To embed our secret data in our file we issue the command:

    snow -C -f mysecret.txt -p mypassword cover.txt stego.txt

    This command compresses the message contained in mysecret.txt, encrypts it with the password-protected key using mypassword and embeds it in cover.txt, creating the stego file called stego.txt. The output from the command informs the user that it compressed the original message by 41.87% and that the message used approximately 25.14% of the available space in the cover file.

    The extraction process is just as straightforward. Issuing the command:

    snow -C -p mypassword stego.txt

    outputs the contents of our secret message file.

    The file stego.txt, when opened with a common text editor such as Microsoft Word, looks identical to the original, despite having gained 655 bytes in size with the addition of the secret message. However, by telling Microsoft Word to show special formatting marks, one can easily see the inserted tabs and spaces in the stego document. Thanks to the strong encryption scheme, an attacker who does not know the password cannot extract the hidden message.

    3.5 Data Hiding in the Graphic Files

    Coding secret messages in digital images is by far the most widely used of all methods in the digital world of today. This is because it can take advantage of the limited power of the human visual system (HVS). Almost any plain text, cipher text, image and any other media that can be encoded into a bit stream can be hidden in a digital image. With the continued growth of strong graphics power in computers and the research being put into image based Steganography, this field will continue to grow at a very rapid pace.

    To a computer, an image is an array of numbers that represent light intensities at various points, or pixels. These pixels make up the image's raster data. When dealing with digital images for use with Steganography, 8-bit and 24-bit per pixel image files are typical. Both have advantages and disadvantages, as we will explain below. 8-bit images are a great format to use because of their relatively small size. The drawback is that only 256 possible colors can be used, which can be a potential problem during encoding. Usually a gray scale color palette is used when dealing with 8-bit images such as GIF files, because its gradual change in color makes alterations harder to detect after the image has been encoded with the secret message. 24-bit images offer much more flexibility when used for Steganography. The large number of colors (over 16 million) that can be used goes well beyond the HVS, which makes it very hard to detect once a secret message has been encoded. The other benefit is that a much larger amount of hidden data can be encoded into a 24-bit digital image as opposed to an 8-bit digital image. The one major drawback to 24-bit digital images is that their large size (usually in MB) makes them


    more suspect than the much smaller 8-bit digital images (usually in KB) when sent over an open system such as the Internet. Digital image compression (lossy compression such as JPEG) is a good solution for large digital images such as the 24-bit images mentioned earlier.

    Information can be hidden in many different ways in images. Straight message insertion can be done, which will simply encode every bit of information in the image. More complex encoding can be done to embed the message only in "noisy" areas of the image that will attract less attention. The message may also be scattered randomly throughout the cover image. The most common approaches to information hiding in images are:

    Least significant bit (LSB) insertion

    Masking and filtering techniques

    Transformations

    Each of these can be applied to various images, with varying degrees of success. Each of them suffers to varying degrees from operations performed on images, such as cropping, resolution decrementing, or decreases in the colour depth.

    Least Significant Bit Insertion

    The least significant bit insertion method is probably the most well-known image steganography technique. It is a common, simple approach to embedding information in a graphical image file. Unfortunately, it is extremely vulnerable to attacks, such as image manipulation. A simple conversion from a GIF or BMP format to a lossy compression format such as JPEG can destroy the hidden information in the image.

    When applying LSB techniques to each byte of a 24-bit image, three bits can be encoded into each pixel, as each pixel is represented by three bytes. Changes to these bits are indiscernible to the human eye. For example, the letter A can be hidden in three pixels. Assume the original three pixels are represented by the three 24-bit words below:

    (00100111 11101001 11001000) (00100111 11001000 11101001) (11001000 00100111 11101001)

    The 8-bit ASCII value for the letter A is 01000001. Inserting the binary value of A into the least significant bits of the three pixels, starting from the top left byte, would result in:

    (00100110 11101001 11001000) (00100110 11001000 11101000) (11001000 00100111 11101001)

    Only three bits actually changed: the least significant bits of the first, fourth and sixth bytes. The ninth byte is not used. The main advantage of LSB insertion is that data can be hidden in the least and second least significant bits and the human eye would still be unable to notice it.
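
LSB insertion on the three example pixels can be reproduced with the following Python sketch. It assumes the letter A is taken as its 8-bit ASCII code 65 (01000001) and that bits are written into consecutive bytes starting with the first; the function names are illustrative.

```python
def embed_lsb(cover, bits):
    """Overwrite the least significant bit of each cover byte with one message bit."""
    if len(bits) > len(cover):
        raise ValueError("message is longer than the cover")
    stego = list(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0b11111110) | int(bit)
    return stego


def extract_lsb(stego, count):
    """Read `count` message bits back from the least significant bits."""
    return "".join(str(byte & 1) for byte in stego[:count])


pixels = [0b00100111, 0b11101001, 0b11001000,
          0b00100111, 0b11001000, 0b11101001,
          0b11001000, 0b00100111, 0b11101001]
message = format(ord("A"), "08b")  # "01000001"
stego = embed_lsb(pixels, message)

print(" ".join(format(byte, "08b") for byte in stego))
print(sum(a != b for a, b in zip(pixels, stego)))  # 3 bytes differ
print(chr(int(extract_lsb(stego, 8), 2)))          # A
```

On average about half of the touched bytes change, since a byte is left alone whenever its least significant bit already matches the message bit.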

    When using LSB techniques on 8-bit images, more care needs to be taken, as 8-bit formats are not as forgiving to data changes as 24-bit formats are. Care needs to be taken in the selection of the cover image, so that changes to the data will not be visible in the stego-image. Commonly known images (such as famous paintings, like the Mona Lisa) should be avoided. In fact, a simple picture of your dog would be quite sufficient.

    When modifying the LSB bits in 8-bit images, the pointers to entries in the palette are changed. It is important to remember that a change of even one bit could mean the difference between a shade of red and a shade of blue. Such a change would be immediately noticeable on the displayed image, and is thus unacceptable. For this reason, data-hiding experts recommend using grey-scale palettes, where the differences between shades are not as pronounced.

    Masking and Filtering

    Masking and filtering techniques hide information by marking an image in a manner similar to paper watermarks. Because watermarking techniques are more integrated into the image, they may be applied without fear of image destruction from lossy compression. By covering, or masking, a faint but perceptible signal with another to make the first non-perceptible, we exploit the fact that the human visual system cannot detect slight changes in certain temporal domains of the image.

    Masking techniques are more suitable for use in lossy JPEG images than LSB insertion because of their relative immunity to image operations such as compression and cropping.

    Transformations

    Transform domain tools utilize an algorithm such as the Discrete Cosine Transformation (DCT) or wavelet transformation to hide information in significant areas of the image. The JPEG image format uses a discrete cosine transformation (DCT) to transform successive 8x8 pixel blocks of the image into 64 DCT coefficients each. The least-significant bits of the quantized DCT coefficients are used as redundant bits into which the hidden message is embedded. The modification of a single DCT coefficient affects all 64 image pixels. For that reason, there are no known visual attacks against the JPEG image format.

    Stego-tools which utilize one of the many transform domain techniques are more robust and have a higher resilience to attacks against the stego-image such as compression, cropping and image processing. As of this writing all of the stego-tools which can manipulate JPEG images are transform domain tools such as


    Jpeg-Jsteg, JPHide, Outguess, PictureMarc and SysCop.

    Figure 7: Block Diagram of JPEG image compression

    3.5.1 Pixel Calculator

    There are a number of easy to use tools available on the Internet to hide information in image files. To better understand and appreciate some of the processes used by these tools, some understanding of digital image processing is essential. Written by Steve Tanimoto and his team at the University of Washington, Pixel Calculator is a very interesting tool for understanding digital images. Pixel Calculator also provides a neat way of achieving some very basic Steganography.

    Pixel Calculator is equipped with two basic tools. A zooming tool is provided to learn the exact pixel values by zooming into an image until the pixel values are visible. A calculator tool is then used to change or modify pixel values. Learning the pixel values and changing them using the calculator is the key to hiding information inside an image. Figure 6 depicts the image file I used as cover to hide the information.

    I used the zooming tool to find an area of interest where neighboring pixel values are close to each other. These values are visible in Figure 8. Using the calculator tool, I started to replace the pixel values in the image with a magic number, say 90. I repeated this process until I was done typing the hidden message.

    Figure 9 shows the image with a hidden message on the left-hand side and the decoded message on the right-hand side. The red circle on top of the mountain peak marks the area where the message is hidden. To decode the message, the calculator tool is used again to convert pixel values lower than the magic number to 0, whereas the higher ones are converted to 255. This process turns the rest of the image black and white, while revealing the hidden message in gray.
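
    The magic-number trick can be mimicked outside Pixel Calculator. The sketch below is only an illustration on a grayscale image stored as a list of rows; the names hide and reveal and the mask argument are hypothetical, and the magic number 90 is the value used in the text.

```python
MAGIC = 90  # the "magic number" written into the message pixels


def hide(image, mask):
    """Overwrite the pixels selected by mask (truthy entries) with MAGIC."""
    return [
        [MAGIC if selected else pixel for pixel, selected in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def reveal(image):
    """Push values below MAGIC to 0 and above MAGIC to 255; MAGIC stays gray."""
    def threshold(pixel):
        if pixel < MAGIC:
            return 0
        if pixel > MAGIC:
            return 255
        return pixel

    return [[threshold(pixel) for pixel in row] for row in image]


cover = [[88, 91, 89],
         [92, 88, 91]]
mask = [[0, 1, 0],
        [1, 1, 0]]  # shape of the hidden message

stego = hide(cover, mask)
print(reveal(stego))  # [[0, 90, 0], [90, 90, 255]]
```

    Any cover pixel that already equals the magic number would also survive the thresholding, which is one reason to pick an area where that value does not occur.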

    Figure 8: Cover Image and Uniform Pixel Area

    3.5.2 Concealing Messages in JPEG Image Files: Jsteg

    Jsteg hides the data inside images stored in the JFIF format of the JPEG standard. It was believed that this type of steganography was impossible, or at least infeasible, since the JPEG standard uses lossy encoding to compress its data. Any hidden data would be overwhelmed by the noise. The trick used by this steganographic implementation is to recognize that JPEG encoding is split into lossy and non-lossy stages. The lossy stages use a discrete cosine transform and a quantization step to compress the image data; the non-lossy stage then uses Huffman coding to further compress the image data. As such, we can insert the steganographic data into the image data between those two steps and not risk corruption.
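
    The step between quantization and Huffman coding can be sketched on a plain list of quantized coefficients. This is a simplified illustration and not Jsteg's actual code: it assumes the commonly described rule that coefficients equal to 0 or 1 are skipped, and the names usable, embed and extract are hypothetical.

```python
def usable(coefficient):
    """Assumed rule: coefficients 0 and 1 carry no hidden data."""
    return coefficient not in (0, 1)


def embed(coefficients, bits):
    """Write bits into the LSBs of the usable quantized coefficients."""
    bits = list(bits)
    capacity = sum(1 for c in coefficients if usable(c))
    if len(bits) > capacity:
        raise ValueError("data file is too large for this image")
    out = []
    used = 0
    for c in coefficients:
        if usable(c) and used < len(bits):
            c = (c & ~1) | bits[used]  # replace only the lowest bit
            used += 1
        out.append(c)
    return out


def extract(coefficients, n_bits):
    """Read back the first n_bits LSBs from the usable coefficients."""
    return [c & 1 for c in coefficients if usable(c)][:n_bits]


quantized = [12, 0, -3, 1, 5, 2, 0, -1, 7]  # illustrative values
stego = embed(quantized, [1, 0, 1, 1])
print(stego)              # [13, 0, -4, 1, 5, 3, 0, -1, 7]
print(extract(stego, 4))  # [1, 0, 1, 1]
```

    Because a usable coefficient can never turn into 0 or 1 by changing its lowest bit, the extractor skips exactly the same positions as the embedder.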

    Figure 9: Encoded Picture and Decoded Message

    To compile the package, simply follow the steps given:

    To inject a data file into a JPEG/JFIF image, simply add the option "-steg filename" to the "cjpeg" command line. If the data file is too large for the image, "cjpeg" will inform you. At this point, you can compress the data file, increase the quality of the image (thereby increasing image size), or try a different image.

    Extraction of a data file works similarly. The "-steg filename" option to "djpeg" writes the steganographic data to the file, wiping out its previous contents. Usually, the decoded image sent to standard output is redirected to "/dev/null".

    3.6 Data Hiding in Audio Files

    Encoding secret messages in audio is the most challenging technique to use when dealing with steganography. This is because the human auditory system (HAS) operates over such a wide dynamic range. The HAS perceives over a range of power greater than one billion to one and a range of frequencies greater than one thousand to one. Sensitivity to additive random noise is also acute: perturbations in a sound file can be detected at levels as low as one part in ten million. However, there are some holes in this perceptive range where data may be hidden. While the HAS has a large dynamic range, it often has a fairly small differential range. As a result, loud sounds tend to mask out quiet sounds. There are also some environmental distortions so common as to be ignored by the listener in most cases.

    There are two concepts to consider before choosing an encoding technique for audio. They are the digital format of the audio and the transmission medium of the audio.

    There are three main digital audio formats typically in use. They are Sample Quantization, Temporal Sampling Rate and Perceptual Sampling.

    Sample Quantization is a 16-bit linear sampling architecture used by popular audio formats such as .WAV and .AIFF.

    Temporal Sampling uses selectable frequencies (8 kHz, 9.6 kHz, 10 kHz, 12 kHz, 16 kHz, 22.05 kHz and 44.1 kHz) to sample the audio. The sampling rate puts an upper bound on the usable portion of the frequency range. Generally, the higher the sampling rate is, the larger the usable data space gets.

    Perceptual Sampling changes the statistics of the audio drastically by encoding only the parts the listener perceives, thus maintaining the sound but changing the signal. This format is used by the most popular digital audio on the Internet today, ISO MPEG (MP3).

    The transmission medium (the path the audio takes from sender to receiver) must also be considered when encoding secret messages in audio. The four transmission mediums are discussed below.

    Digital end-to-end environment: If a sound file is copied directly from machine to machine, but never modified, then it will go through this environment. As a result, the sampling will be exactly the same between the encoder and decoder. Very few constraints are put on data hiding in this environment.

    Increased/decreased resampling environment: In this environment, a signal is resampled to a higher or lower sampling rate, but remains digital throughout. Although the absolute magnitude and phase of most of the signal are preserved, the temporal characteristics of the signal are changed.

    Analog transmission and resampling: This occurs when a signal is converted to an analog state, played on a relatively clean analog line, and resampled. Absolute signal magnitude, sample quantisation and temporal sampling rate are not preserved. In general, phase will be preserved.

    "Over the air" environment: This occurs when the signal is "played into the air" and "resampled with a microphone". The signal will be subjected to possibly unknown nonlinear modifications causing phase changes, amplitude changes, drifting of different frequency components, echoes, etc.

    The signal representation and transmission environment both need to be considered when choosing a data-hiding method.

    3.6.1 Methods of Audio Data Hiding
    We now need to consider some methods of audio data hiding. In low-bit encoding, data is embedded by replacing the Least Significant Bit (LSB) of each sampling point by a coded binary string. This results in a large amount of data that can be encoded in a single audio file. For example, if the ideal noiseless channel capacity is 1 kbps per kilohertz of sampling rate, then the bit rate will be 8 kbps for a sequence sampled at 8 kHz. While it is the simplest way to hide data in audio files, the low-bit encoding scheme can be destroyed by channel noise and re-sampling.

    Phase coding, when it can be used, has proven to be one of the most effective coding techniques in terms of signal-to-noise ratio. In this method the phase of the original audio signal is replaced with a reference phase that represents the data to be hidden. It has been found that a channel capacity of approximately 8 bps can be achieved by allocating 128 frequency slots per bit with a little background noise. The procedure for phase coding is as follows:

    The original sound sequence is broken into a series of N short segments.

    A discrete Fourier transform (DFT) is applied to each segment, to create a matrix of the phases and magnitudes.

    The phase difference between each pair of adjacent segments is calculated.

    For segment S0, the first segment, an artificial absolute phase p0 is created.

    For all other segments, new phase frames are created.

    The new phase and original magnitude are combined to get a new segment, Sn.

    Finally, the new segments are concatenated to create the encoded output.
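
    The steps above can be sketched with a plain DFT from the standard library. This is an illustrative sketch under stated assumptions, not a reference implementation: the names dft, idft, encode and decode are hypothetical, one bit is stored per frequency bin of the first segment, and the convention "bit 0 becomes +pi/2, bit 1 becomes -pi/2" is an assumption that the text does not fix.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def encode(signal, bits, seg_len):
    # Step 1: break the signal into short segments.
    segments = [signal[i:i + seg_len]
                for i in range(0, len(signal) - seg_len + 1, seg_len)]
    if len(bits) > seg_len // 2 - 1:
        raise ValueError("too many bits for this segment length")
    # Step 2: DFT of each segment, split into magnitude and phase.
    spectra = [dft(s) for s in segments]
    mags = [[abs(c) for c in s] for s in spectra]
    phases = [[cmath.phase(c) for c in s] for s in spectra]
    # Step 3: phase differences between adjacent segments.
    deltas = [[phases[n][k] - phases[n - 1][k] for k in range(seg_len)]
              for n in range(1, len(segments))]
    # Step 4: artificial absolute phase for the first segment.
    current = list(phases[0])
    for i, bit in enumerate(bits):
        k = i + 1  # skip the DC bin
        current[k] = math.pi / 2 if bit == 0 else -math.pi / 2
        current[seg_len - k] = -current[k]  # mirror bin keeps the output real
    out = []
    for n in range(len(segments)):
        if n > 0:
            # Step 5: new phase frames keep the original phase differences.
            current = [current[k] + deltas[n - 1][k] for k in range(seg_len)]
        # Step 6: new phase plus original magnitude gives the new segment.
        out.extend(idft([cmath.rect(mags[n][k], current[k])
                         for k in range(seg_len)]))
    # Step 7: the concatenated segments are the encoded output.
    return out


def decode(stego, n_bits, seg_len):
    first = dft(stego[:seg_len])
    return [0 if cmath.phase(first[i + 1]) > 0 else 1 for i in range(n_bits)]


signal = [math.sin(0.37 * t) + 0.5 * math.cos(1.3 * t) + 0.25 * math.sin(2.9 * t)
          for t in range(48)]
message = [1, 0, 1, 1]
stego = encode(signal, message, seg_len=16)
print(decode(stego, len(message), seg_len=16))
```

    The receiver needs the segment length and the number of bits, and the bins that carry data must have non-zero magnitude in the first segment.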

    For the decoding process, the sequence is synchronised before decoding. The length of the segment, the DFT points, and the data interval must be known at the receiver. The value of the underlying phase of the first segment is detected as 0 or 1, which represents the coded binary string.

    Figure 10: A sinusoidal function*, characterized by a period ($L$), an amplitude ($A$) and a phase ($\phi$)

    *Note: The length of the cycle, $L$, is known as the period of the function. The amplitude is the size of the variation: the height of a peak or depth of a trough. The phase is the position of the start of the cycle, relative to some reference point (e.g., the origin). A sine function has $\phi = 0$, whereas a cosine function has $\phi = \pi/2$.

    Modern steganographic systems use spread-spectrum communications to transmit a narrowband signal over a much larger bandwidth so that the spectral density of the signal in the channel looks like noise. The two different spread-spectrum techniques these tools employ are called direct-sequence and frequency hopping. The former hides information by phase-modulating the data signal (carrier) with a pseudorandom number sequence that both the sender and the receiver know. The latter divides the available bandwidth into multiple channels and hops between these channels (also triggered by a pseudorandom number sequence).
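
    A toy direct-sequence example makes the shared pseudorandom sequence concrete. The sketch below is an illustration only: the names pn_sequence, spread and despread are hypothetical, the shared secret is reduced to an integer seed, and Python's random module stands in for whatever generator a real tool would use.

```python
import random


def pn_sequence(seed, length):
    """Pseudorandom +1/-1 chips that both sender and receiver can regenerate."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def spread(bits, seed, chips_per_bit):
    """Multiply each data bit (as +1/-1) by its own run of chips."""
    pn = pn_sequence(seed, len(bits) * chips_per_bit)
    chips = []
    for i, bit in enumerate(bits):
        symbol = 1 if bit else -1
        start = i * chips_per_bit
        chips.extend(symbol * p for p in pn[start:start + chips_per_bit])
    return chips


def despread(chips, seed, chips_per_bit):
    """Correlate each run of chips with the same sequence to recover the bit."""
    pn = pn_sequence(seed, len(chips))
    bits = []
    for start in range(0, len(chips), chips_per_bit):
        corr = sum(c * p for c, p in zip(chips[start:start + chips_per_bit],
                                         pn[start:start + chips_per_bit]))
        bits.append(1 if corr > 0 else 0)
    return bits


message = [1, 0, 0, 1, 1]
chips = spread(message, seed=2004, chips_per_bit=16)
# Add a disturbance of half the chip amplitude to every sample.
noisy = [c + 0.5 * (-1) ** t for t, c in enumerate(chips)]
print(despread(noisy, seed=2004, chips_per_bit=16))  # [1, 0, 0, 1, 1]
```

    Without the seed, the chip stream has no obvious structure, which is the property the text describes as looking like noise.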

    Echo hiding, a form of data hiding, is a method for embedding information into an audio signal. It seeks to do so in a robust fashion, while not perceivably degrading the host signal (cover audio). Echo hiding introduces changes to the cover audio that are characteristic of environmental conditions rather than random noise; thus it is robust in light of many lossy data compression algorithms.

    Like all good steganographic methods, echo hiding seeks to embed its data into a data stream with minimal degradation of the original data stream. By minimal degradation, we mean that the change in the cover audio is either imperceivable or simply dismissed by the listener as a common non-objectionable environmental distortion.

    The particular distortion we are introducing is similar to the resonances found in a room due to walls, furniture, etc. The difference between the stego audio and the cover audio is similar to the difference between listening to a compact disc on headphones and listening to it from speakers. With the headphones, we hear the sound as it was recorded. With the speakers, we hear the sound plus echoes caused by room acoustics. By correctly choosing the distortion we are introducing for echo hiding, we can make such distortions indistinguishable from those a room might introduce in the above speaker case.
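
    The embedding side of echo hiding can be sketched in a few lines. This is a hedged illustration: it assumes, as is common in descriptions of the technique, that a bit is signalled by choosing one of two echo delays, and the names add_echo and embed_bits as well as the delay and decay values are hypothetical. Decoding is not shown.

```python
def add_echo(samples, delay, decay):
    """Return samples plus a delayed, attenuated copy of themselves."""
    out = []
    for t, x in enumerate(samples):
        echo = decay * samples[t - delay] if t >= delay else 0.0
        out.append(x + echo)
    return out


def embed_bits(samples, bits, block_len, delay_zero, delay_one, decay):
    """Give each block an echo whose delay depends on the bit it carries."""
    out = []
    for i, bit in enumerate(bits):
        block = samples[i * block_len:(i + 1) * block_len]
        out.extend(add_echo(block, delay_one if bit else delay_zero, decay))
    out.extend(samples[len(bits) * block_len:])  # remainder is left untouched
    return out


# Two blocks, each containing a single click, make the echoes easy to see.
cover = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
stego = embed_bits(cover, [0, 1], block_len=4,
                   delay_zero=1, delay_one=2, decay=0.5)
print(stego)  # [1.0, 0.5, 0.0, 0.0, 1.0, 0.0, 0.5, 0.0]
```

    In real audio the delays would be a few milliseconds at most, short enough for the echo to blend into the original sound.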

    3.7 Concealing Messages in Image and Audio Files Using S-Tools
    S-Tools (Steganography Tools) brings you the capability of concealing files within various forms of data. Users of S-Tools can opt to encrypt their information using the strongest state-of-the-art encryption algorithms currently known within the academic world, so that even an enemy equipped with a copy of S-Tools cannot be completely sure data is hidden unless he has your secret passphrase.

    You could use S-Tools to conceal private or confidential information that you don't want to fall into the wrong hands. You could use it to send information to another individual via a broadcast network such as Usenet. By agreeing on a passphrase you can keep the information out of unauthorised hands. Alternatively you could use S-Tools to verify your copyright over an image by storing an encrypted copyright statement in the graphic and extracting it in the event of a dispute.

    How S-Tools hides your data
    S-Tools can hide multiple files in one object. If you have selected compression then the files are individually compressed and stored together with their names. If you are not using compression then just the raw file data is stored along with the names. Then S-Tools prepends some random garbage to the front of the data in order to prevent two identical sets of files from encrypting to the same output. The whole lot is then encrypted using a key generated from the passphrase that you chose (actually, MD5 is used to hash the passphrase down to 128 evenly distributed key bits). The encryption algorithms all operate in Cipher Feedback (CFB) mode.
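
    The key derivation and the random prefix can be illustrated with the standard library. This is a sketch of the idea only, not S-Tools code: the names derive_key and prepare_payload and the 8-byte prefix length are assumptions, and the encryption step itself is left out.

```python
import hashlib
import os


def derive_key(passphrase):
    """Hash the passphrase down to a 128-bit key with MD5, as the text describes."""
    return hashlib.md5(passphrase.encode("utf-8")).digest()


def prepare_payload(file_data, prefix_len=8):
    """Prepend random bytes so identical inputs do not encrypt identically."""
    return os.urandom(prefix_len) + file_data


key = derive_key("my secret passphrase")
print(len(key) * 8)  # 128

payload = prepare_payload(b"attack at dawn")
print(payload[8:])   # b'attack at dawn'
```

    The same passphrase always yields the same key, while the payload differs from run to run because of the random prefix.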

    It would be too easy to hide the data by just spreading it across the available bits in a linear fashion, so S-Tools seeds a cryptographically strong pseudo-random number generator from your passphrase and uses its output in order to choose the position of the next bit from the cover data to use.

    For instance, if your sound file had 100 bits available for hiding, and you wanted to hide 10 bits in it, then S-Tools would not choose bits 0 through 9 as that would be trivially detectable by a potential enemy. Instead it might choose bits 63, 32, 89, 2, 53, 21, 35, 44, 99, 80. Or it might choose any ten others; it all depends on the passphrase that you enter. As you can see, the job of a potential enemy has just become very difficult indeed.
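
    Passphrase-driven position selection can be sketched as follows. The function name choose_positions is hypothetical, and Python's random.Random (a Mersenne Twister) is used purely as a stand-in: it is not cryptographically strong, unlike the generator the text attributes to S-Tools.

```python
import hashlib
import random


def choose_positions(passphrase, available_bits, needed_bits):
    """Pick distinct bit positions in an order determined by the passphrase."""
    if needed_bits > available_bits:
        raise ValueError("not enough cover bits")
    digest = hashlib.md5(passphrase.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return rng.sample(range(available_bits), needed_bits)


positions = choose_positions("my secret passphrase", 100, 10)
print(positions)  # ten distinct positions between 0 and 99
```

    The receiver runs the same function with the same passphrase and obtains the same positions in the same order.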

    How data is hidden in sounds
    Sound samples are, by their very nature, inaccurate estimates of the correct value of the sound wave at a particular moment in time. The sound samples in Windows WAV files are stored as either 8 or 16 bit values that eventually get passed to the D/A converter in your sound board. For 8 bit samples this means that the values can range between 0 and 255. 16 bit samples range between 0 and 65535. All S-Tools does is distribute the bit-pattern that corresponds to the file that you want to hide across the least significant bits of the sound samples. For example, suppose that a sound sample had the following eight bytes of information in it somewhere:

    132 134 137 141 121 101 74 38

    In binary, this is:

    10000100 10000110 10001001 10001101 01111001 01100101 01001010 00100110

    (The LSB is the last bit of each byte.)

    Suppose that we want to hide the binary byte 11010101 (213) inside this sequence. We simply replace the LSB (Least Significant Bit) of each sample byte with the corresponding bit from the byte we are trying to hide. So the above sequence will change to:

    133 135 136 141 120 101 74 39

    In binary, this is:

    10000101 10000111 10001000 10001101 01111000 01100101 01001010 00100111

    As you can clearly see, the values of the sound samples have changed by, at most, one value either way. This will be inaudible to the human ear, yet we have concealed 8 bits of information within the sample. This is the theory behind how S-Tools does its job.
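
    The worked example can be reproduced directly. The sketch below uses the sample values and the hidden byte from the text; the function names embed_byte and extract_byte are illustrative, and the bits are taken most significant first to match the example.

```python
def embed_byte(samples, secret_byte):
    """Replace the LSB of the first eight samples with the bits of secret_byte."""
    if len(samples) < 8:
        raise ValueError("need at least eight samples")
    out = list(samples)
    for i in range(8):
        bit = (secret_byte >> (7 - i)) & 1  # most significant bit first
        out[i] = (out[i] & ~1) | bit
    return out


def extract_byte(samples):
    """Rebuild the hidden byte from the LSBs of the first eight samples."""
    value = 0
    for sample in samples[:8]:
        value = (value << 1) | (sample & 1)
    return value


cover = [132, 134, 137, 141, 121, 101, 74, 38]
stego = embed_byte(cover, 0b11010101)
print(stego)                # [133, 135, 136, 141, 120, 101, 74, 39]
print(extract_byte(stego))  # 213
```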

    How data is hidden in pictures
    All computer-based pictures are composed of an array of dots, called pixels, that make up a very fine grid. Each one of these pixels has its own colour, represented internally as separate quantities of red, green and blue. Within Windows, each of these colour levels may range between 0 (none of the colour) and 255 (a full amount of the colour). A pixel with an RGB value of 0 0 0 is black, and one with a value of 255 255 255 is white.

    S-Tools works by 'spreading' the bit-pattern of the file that you want to hide across the least-significant bits (LSBs) of the colour levels in the image.

    For a 24 bit image this is simple because 24 bit images are stored internally as RGB triples, and all we need to do is spread our bits and save out the new file. The drawback to this is that 24 bit images are uncommon at the moment, and would therefore attract the attention of those whose attention you are trying to avoid attracting! They are also very large as they contain 3 bytes for every pixel (for a 640x480 image this is 640x480x3=921600 bytes).

    It is considerably more difficult to hide anything within a 256-colour image. This is because the image may already have over 200 colours, which our meddling will carry to way over the absolute maximum of 256.

    Looking at a little theory, it is easy to see that an image with 32 or fewer colours will never exceed 256 colours, no matter how much we meddle with it. To see this, visualise the 3 LSBs of an RGB triple as a 3-bit number. As we pass through it in our hiding process we can change it to any one of 8 possible values, the binary digits from 000 to 111, one of which is the original pattern.

    If one colour can 'expand' to up to 8 colours, how many distinct colours can we have before we are in danger of exceeding the limit of 256? Simple, 256/8=32 colours. There is no guarantee that 32 colours is our upper limit for every file that you want to hide though. If you're lucky the file will not change a colour to all of its 8 possible combinations and then we are able to keep one more of the original colours. In practice, however, you will often find pictures being reduced to the minimum of 32 colours. S-Tools tries to reduce the number of image colours in a manner that preserves as much of the image detail as possible.
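
    The 256/8 = 32 argument can be checked by enumerating what one palette colour can turn into. The following is a small sketch; the function name lsb_variants and the sample colour are illustrative.

```python
from itertools import product


def lsb_variants(rgb):
    """All colours reachable by rewriting the LSB of each of R, G and B."""
    r, g, b = rgb
    return {
        ((r & ~1) | bit_r, (g & ~1) | bit_g, (b & ~1) | bit_b)
        for bit_r, bit_g, bit_b in product((0, 1), repeat=3)
    }


colour = (200, 17, 64)
variants = lsb_variants(colour)
print(len(variants))         # 8, and the original colour is one of them
print(256 // len(variants))  # 32 colours are always safe in a 256-entry palette
```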

    I used a program called FileRay to compare the binary values of two image files that were operated on using S-Tools. cov.gif was used as the cover medium to hide stego.gif. The resultant file was named hidden.gif. Comparison of the original image to the stego data reveals changes in the LSB. Figure 11 shows the comparison between the two image files. The original image is shown in the bottom pane and the stego image in the top pane.

    4. Steganalysis
    As the techniques to hide information get more complicated and computationally involved, detecting such cover media has become considerably more challenging as well. However, given time, dedication and technology it is possible to detect the presence of hidden information in some stego media. A few tools have known signatures that may reveal the presence of hidden information. Techniques like encryption and compression are used to make it difficult to decipher the hidden information. However, once the presence of hidden information in the cover is known, the purpose of steganography is defeated.

    Figure 11: Comparison of cover image with stego in binary

    Steganalysis is the practice of attacking steganographic methods by detection, destruction, extraction or modification of embedded data. This is the steganographic analogue to cryptanalysis, which refers to attempts to break cryptographic protocols. With cryptographic protocols, cryptanalysis is generally considered to be successful if the adversary can retrieve the encrypted message. Steganography adds the additional requirement that the steganographically hidden message is not even detectable by the adversary; that is, not only should the attacker not be able to find the message, but he should not even know it exists. The definition of success in steganalysis depends upon your intent. For the security professional charged with protecting his employer's data, a successful result would be proving the existence of hidden data being sent, and not necessarily the ability to extract it. For the data thief wishing, perhaps, to use a digital image that contains a protective watermark, success would require not only detecting the existence of the watermark but also destroying it without damaging the integrity of the desired cover file.

    Research shows that some well-known tools like S-Tools have known signatures and can be recognized if proper techniques are used. S-Tools works by reducing the number of colors of the cover image to 32 but then expands them over several color palette entries. If the palette is then sorted by luminance, blocks of colors appear to be the same but actually have a one-bit variance. This type of variance pattern is extremely rare in a natural image.
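
    A rough version of this palette check can be written in a few lines. It is a heuristic sketch and not a published detector: the names luminance and count_near_duplicates are hypothetical, the luminance weights are the usual ITU-R BT.601 ones, and only neighbours in the sorted palette are compared.

```python
def luminance(rgb):
    """Approximate brightness of an RGB colour (ITU-R BT.601 weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def count_near_duplicates(palette):
    """Count luminance-sorted neighbours that differ by at most 1 per channel."""
    ordered = sorted(palette, key=luminance)
    count = 0
    for first, second in zip(ordered, ordered[1:]):
        if first != second and all(abs(x - y) <= 1 for x, y in zip(first, second)):
            count += 1
    return count


natural = [(10, 20, 30), (200, 100, 50), (90, 90, 90)]
suspicious = [(10, 20, 30), (11, 20, 30), (10, 21, 30),
              (200, 100, 50), (201, 101, 50)]
print(count_near_duplicates(natural))     # 0
print(count_near_duplicates(suspicious))  # 3
```

    A high count relative to the palette size would be a reason to look more closely at the file; it is not proof on its own.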

    There are six formal categories of detection techniques available for steganalysis. The following table summarizes what the attacker has available to him in each case:

    Attack            Stego     Original       Hidden     Stego Algorithm
                      Object    Cover Object   Message    or Tool
    Stego only        X
    Known cover       X         X
    Known message     X                        X
    Chosen stego      X                                   X
    Chosen message                                        X
    Known stego       X         X                         X

    A stego-only attack, while considered the most difficult attack in that one has the least information to go on, is far from impossible, especially if one's goal is merely to detect that there is a hidden message and not necessarily to extract it. For text files using empty space methods, like the one used with the Snow tool, merely opening the document with an editor that shows formatting codes would indicate that there were oddities in the formatting. The message could easily be destroyed by simply removing the extra spaces and tabs. Similarly, for other text methods such as line and word shift methods, visual inspection of the text itself could indicate anomalies. For image and audio files with messages embedded using LSB methodology, detection with only the stego file available is a little more difficult. Detection in this case would rely on the appearance of visual or audible distortions or patterns.

    With the known-cover method, one has both the original innocent cover file and the resulting stego file. Anomalous patterns and excess noise in the stego file are much more easily detectable when comparing it to the original, particularly if the file format makes use of compression (as in JPEG files) that would show up as excess noise even in an innocent file. Destruction or distortion of LSB-encoded messages in image files can be simply a matter of zeroing out the LSB fields of the file in question, image conversion, cropping or the application of other image formatting changes. For audio files, methods to damage the hidden message include the introduction of a random relative amplitude signal and reconstructing the file by ignoring bad signals.
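
    Both ideas, comparing against the known cover and wiping the LSBs, are easy to illustrate. The sketch below reuses the sample values from the S-Tools sound example earlier in this paper; the function names differing_positions and wipe_lsbs are illustrative.

```python
def differing_positions(cover, stego):
    """Known-cover comparison: indices where the two files disagree."""
    return [i for i, (c, s) in enumerate(zip(cover, stego)) if c != s]


def wipe_lsbs(samples):
    """Destroy an LSB-embedded message by zeroing every least-significant bit."""
    return [sample & ~1 for sample in samples]


cover = [132, 134, 137, 141, 121, 101, 74, 38]
stego = [133, 135, 136, 141, 120, 101, 74, 39]

print(differing_positions(cover, stego))  # [0, 1, 2, 4, 7]
print(wipe_lsbs(stego))                   # [132, 134, 136, 140, 120, 100, 74, 38]
```

    After wiping, every LSB reads as 0, so the hidden byte can no longer be recovered, while each sample has moved by at most one.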

    A known-message attack gives the attacker knowledge of the secret message that is hidden in the file.

    A chosen-stego attack provides the attacker with the extraction tool to reach the data, and the chosen-message technique assumes the attacker has the stego tool itself and can embed and detect messages at will.

    There are other ways to break up attack types, and these are also useful in describing the vulnerabilities of various methods. Wayner divides common attack methods by functional properties rather than adversarial assumptions; attacks are divided into visual or aural attacks, structural attacks, and statistical attacks. Visual and aural attacks describe the human factor in attacks; humans can often perceive the modifications in the cover object because it doesn't look or sound right. In text steganography this can be extended to format-based, lexical, grammatical, semantic, and rhetorical attacks, among others. Structural attacks refer to detecting the patterns in modifications made in the data format (for example, using extra space in files or encoding schemes to store information is often detectable through structural attacks). Statistical attacks detect anomalies in the statistical profile of the stego-object (for example, images whose color palette has been changed to hide information often contain non-standard colors or ranges of colors which would not normally be generated by image software).

    5. Digital Watermarking -- Steering the Future of Security
    Watermarks were first used in Europe to identify the guild that manufactured paper. They were like trademarks or signatures. Varying the paper's density creates watermarks in paper. Normally invisible, a watermark image becomes visible as darker and lighter areas when the paper is held up to the light. Wire or relief sculptures are placed in the paper mold, and when the paper slurry is drained of its water and dried, the thinner areas created by the wire or sculpture show clearly when held up to the light. Watermarks are still used in quality stationery and have even been added to the currencies of various countries.

    A watermark is a form, image or text that is impressed onto paper, which provides evidence of its authenticity. Digital watermarking is an extension of this concept in the digital world. In recent years the phenomenal growth of the Internet has highlighted the need for mechanisms to protect ownership of digital media. Exactly identical copies of digital information, be it images, text or audio, can be produced and distributed easily. In such a scenario, who is the artist and who the plagiarist? It's impossible to tell--or was, until now. Digital watermarking is a technique that provides a solution to the longstanding problems faced with copyrighting digital data.

    Digital watermarks are pieces of information added to digital data (audio, video, or still images) that can be detected or extracted later to make an assertion about the data. This information can be textual data about the author, its copyright, etc.; or it can be an image itself. The digital watermarks remain intact under transmission / transformation, allowing us to protect our ownership rights in digital form.

    Figure 12: Watermark in currency

    The watermark on the new $100 bill shows Benjamin Franklin when you hold the bill up to the light.

    A given watermark may be unique to each copy (e.g. to identify the intended recipient), or be common to multiple copies (e.g. to identify the document source). In either case, the watermarking of the document involves the transformation of the original into another form. This distinguishes digital watermarking from digital fingerprinting, where the original file remains intact and a newly created file 'describes' the original file's content.

    5.1 General Framework for Watermarking
    A digital watermark is, in essence, a hidden message within a digitized image, video or audio recording. The watermark is integrated into the content itself, so it requires no additional storage space.

    In general, any watermarking scheme (algorithm) consists of three parts.

    The watermark

    The encoder (insertion algorithm)

    The decoder and comparator (verification or extraction or detection algorithm)

    Figure 13: Digital copy of fifteenth century drawing with digital watermark superimposed.

    Each owner has a unique watermark, or an owner can also put different watermarks in different objects. The marking algorithm incorporates the watermark into the object. The verification algorithm authenticates the object, determining both the owner and the integrity of the object.

    5.1.1 Encoding Process
    Let us denote an image by $I$, a signature by $S = s_1, s_2, \ldots$ and the watermarked image by $I'$. $E$ is an encoder function: it takes an image $I$ and a signature $S$ and generates a new image, which is called the watermarked image $I'$. Mathematically,

    $$E(I, S) = I' \tag{1}$$

    Figure 14: Encoder

    5.1.2 Decoding Process
    A decoder function $D$ takes an image $J$ ($J$ can be a watermarked or un-watermarked image, and possibly corrupted) whose ownership is to be determined and recovers a signature $S'$ from the image. In this process an additional image $I$ can also be included, which is often the original and un-watermarked version of $J$. This is because some encoding schemes may make use of the original images in the watermarking process to provide extra robustness against intentional and unintentional corruption of pixels. Mathematically,

    $$D(J, I) = S' \tag{2}$$

    The extracted signature $S'$ will then be compared with the owner signature sequence by a comparator function $C_\delta$ and a binary output decision generated. It is 1 if there is a match and 0 otherwise, which can be represented as follows.

    $$C_\delta(S', S) = \begin{cases} 1, & c \ge \delta \\ 0, & \text{otherwise} \end{cases}$$

    Here $C$ is the correlator, $x = C_\delta(S', S)$, $c$ is the correlation of the two signatures and $\delta$ is a certain threshold. Without loss of generality, a watermarking scheme can be treated as a three-tuple $(E, D, C_\delta)$. The following figures demonstrate the decoder and the comparator.
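
    The comparator can be sketched as a thresholded correlation. This is an illustration under stated assumptions rather than the definition used by any particular scheme: signatures are taken to be sequences of +1/-1, the correlation is their normalised inner product, a match is declared when the correlation reaches the threshold, and the names correlation and comparator are hypothetical.

```python
def correlation(extracted, owner):
    """Normalised inner product of two equal-length +1/-1 signatures."""
    if len(extracted) != len(owner) or not owner:
        raise ValueError("signatures must be non-empty and of equal length")
    return sum(x * y for x, y in zip(extracted, owner)) / len(owner)


def comparator(extracted, owner, threshold):
    """Binary decision: 1 if the signatures match closely enough, else 0."""
    return 1 if correlation(extracted, owner) >= threshold else 0


owner = [1, -1, 1, 1, -1, -1, 1, -1]
damaged = [1, -1, 1, 1, -1, -1, 1, 1]  # one element flipped by corruption
inverted = [-s for s in owner]         # an unrelated-looking signature

print(correlation(damaged, owner))       # 0.75
print(comparator(damaged, owner, 0.7))   # 1
print(comparator(inverted, owner, 0.7))  # 0
```

    The threshold trades false alarms against missed detections: a lower value tolerates more corruption but accepts more unrelated signatures.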

    Figure 15: Decoder

    Figure 16: Comparator

    A watermark must be detectable or extractable to be useful. Depending on the way the watermark is inserted and on the nature of the watermarking algorithm, the method can involve very distinct approaches. In some watermarking schemes, a watermark can be extracted in its exact form, a procedure we call watermark extraction. In other cases, we can detect only whether a specific given watermarking signal is present in an image, a procedure we call watermark detection. It should be noted that watermark extraction can prove ownership whereas watermark detection can only verify ownership.

    5.2 Watermarking Algorithms
    Watermarks and watermarking techniques can be divided into various categories in various ways. The watermarks can be applied in the spatial domain. An alternative to spatial domain watermarking is frequency domain watermarking. It has been pointed out that the frequency domain methods are more robust than the spatial domain techniques.

    A simple spatial watermarking algorithm -- The LSB technique

    The LSB technique is the simplest technique of watermark insertion. If we specifically consider still images, each pixel of the color image has three components -- red, green and blue. Let us assume we allocate 3 bytes for each pixel. Then, each colour has 1 byte, or 8 bits, in which the intensity of that colour can be specified on a scale of 0 to 255.

    So a pixel that is bright purple in colour would have full intensities of red and blue, but no green. Thus that pixel can be shown as

    X0 = {R=255, G=0, B=255}

    Now let's have a look at another pixel:

    X1 = {R=255, G=0, B=254}

    We've changed only the value of B here. But how much of a difference does it make to the human eye? For the eye, detecting a difference of 1 on a color scale of 256 is almost impossible. Now, since each color is stored in a separate byte, the last bit in each byte stores this difference of one. That is, the difference between values 255 and 254, or 127 and 126, is stored in the last bit, called the Least Significant Bit (LSB).

    Since this difference does not matter much, when we replace the color intensity information in the LSB with watermarking information, the image will still look the same to the naked eye. Thus, for every pixel of 3 bytes (24 bits), we can hide 3 bits of watermarking information in the LSBs.

    Thus a simple algorithm for this technique would be:

    Let W be the watermarking information
    For every pixel Xi in the image, Do Loop:
        Store the next bit from W in the LSB position of the Xi [red] byte
        Store the next bit from W in the LSB position of the Xi [green] byte
        Store the next bit from W in the LSB position of the Xi [blue] byte
    End Loop

    To extract the watermark information, we would simply need to take all the data in the LSBs of the color bytes and combine them. Image manipulations, such as resampling, rotation, format conversions and cropping, will in most cases result in the watermark information being lost.
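
    The loop and the extraction step can be written out as runnable code. The following is a minimal sketch on an image held as a list of (R, G, B) tuples; the function names embed_watermark and extract_watermark are illustrative, and the two pixels are the ones used in the example above.

```python
def embed_watermark(pixels, bits):
    """Store watermark bits in the LSBs of the red, green and blue bytes in turn."""
    bits = list(bits)
    if len(bits) > 3 * len(pixels):
        raise ValueError("watermark is too long for this image")
    remaining = iter(bits)
    out = []
    for pixel in pixels:
        channels = []
        for value in pixel:
            bit = next(remaining, None)
            # Once the watermark is used up, later bytes are left unchanged.
            channels.append(value if bit is None else (value & ~1) | bit)
        out.append(tuple(channels))
    return out


def extract_watermark(pixels, n_bits):
    """Collect the LSBs of the colour bytes, in the order they were written."""
    return [value & 1 for pixel in pixels for value in pixel][:n_bits]


image = [(255, 0, 255), (255, 0, 254)]
marked = embed_watermark(image, [0, 1, 1, 1, 0, 1])
print(marked)                        # [(254, 1, 255), (255, 0, 255)]
print(extract_watermark(marked, 6))  # [0, 1, 1, 1, 0, 1]
```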

    Frequency based Watermarking
    Watermarking in the frequency domain modifies the coefficients of a frequency transform of the image rather than the pixel values directly. This is to overcome the greatest disadvantage of techniques operating in the spatial domain, i.e. susceptibility to cropping. The mosaic attack defeats most implementations of digital watermarking operating in the spatial domain, but frequency domain watermarking is less susceptible.

    In a mosaic attack, the attacker breaks up the entire watermarked image into many small parts. For example, a watermarked image on a web page can be cut up and reassembled as a whole using tables in HTML. The only defence against this attack is to tile a very small watermark all over the image, and allow retrieval of the watermark from any of the small subsections of the fragmented image. However, the attacker can always create smaller blocks, and the watermarked image also has to be large enough to be distinguishable.

    The LSB technique can also be applied in the frequency domain, to the transform coefficients, though the result is not robust. Common transforms, such as the Fast Fourier Transform, express the image in terms of its frequency components. The watermark is more commonly applied to the lower frequencies within an image, as higher frequencies are usually lost when an image is compressed, or to frequencies considered to contain perceptually significant information. Frequency-based techniques result in a watermark that is dispersed throughout the image and is therefore less susceptible to attack by cropping. However, these techniques are susceptible to standard frequency filters and lossy compression algorithms, which tend to filter out less significant frequencies.

    5.3 Types of Digital Watermarks

    Watermarks and watermarking techniques can be divided into various categories in various ways. Watermarks can be applied in the spatial domain. An alternative to spatial domain watermarking is frequency domain watermarking.

    A visible watermark is a secondary translucent image overlaid onto the primary image. The watermark is apparent to a casual viewer on careful inspection.

    A fragile watermark is a mark that is sensitive to any modification of the stego-medium. A fragile watermarking scheme should be able to detect any change in the signal, identify where it has taken place and, possibly, what the signal was before modification. It serves to prove the authenticity of a document.
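The detect-and-localise behaviour of a fragile mark can be sketched as follows. This is only an illustration under stated assumptions: the data is a flat byte sequence, it is split into hypothetical 8-byte blocks, and each block's LSBs carry an 8-bit check value derived (via SHA-256 from Python's standard `hashlib`) from the upper seven bits of the same block. It localises a change to a block but does not attempt to recover the original signal.

```python
import hashlib

BLOCK = 8  # hypothetical block size: 8 bytes carry one 8-bit check value


def _check_bits(block):
    """8 check bits computed from the upper 7 bits of every byte in the block."""
    digest = hashlib.sha256(bytes(b & 0xFE for b in block)).digest()[0]
    return [(digest >> shift) & 1 for shift in range(7, -1, -1)]


def embed_fragile(data):
    """Write each block's check bits into the LSBs of that block."""
    marked = bytearray(data)
    for start in range(0, len(marked) - len(marked) % BLOCK, BLOCK):
        bits = _check_bits(marked[start:start + BLOCK])
        for offset, bit in enumerate(bits):
            marked[start + offset] = (marked[start + offset] & 0xFE) | bit
    return bytes(marked)


def tampered_blocks(data):
    """Return the indices of blocks whose LSBs no longer match their check bits."""
    bad = []
    for start in range(0, len(data) - len(data) % BLOCK, BLOCK):
        block = data[start:start + BLOCK]
        if [b & 1 for b in block] != _check_bits(block):
            bad.append(start // BLOCK)
    return bad


original = bytes(range(64))          # made-up cover data
marked = embed_fragile(original)

tampered = bytearray(marked)
tampered[20] ^= 0x01                 # flip one bit inside block 2
```

`tampered_blocks(marked)` reports nothing, while `tampered_blocks(tampered)` points at block 2. With only eight check bits per block, a change confined to the upper bits goes unnoticed with probability of roughly 1 in 256; a practical scheme would use a longer, keyed check value.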

    By contrast, a robust watermark should be bound to the document it has been embedded in, in such a way that no signal transform of reasonable strength can remove it. Hence a pirate wishing to remove the watermark will not succeed unless they degrade the document too much for it to be of commercial interest.

    A dual watermark is a combination of a visible and an invisible watermark. In this type of watermark, the invisible watermark is used as a backup for the visible watermark, as shown in the following diagram.

    Private watermarking and non-blind watermarking mean the same: the original cover signal is required during the detection process. By asymmetric watermarking or public-key watermarking, people refer to watermarking schemes with properties reminiscent of an asymmetric (public-key) cryptosystem. No such system really exists yet, although some possible suggestions have been made. In this case, the detection process (and in particular the detection key) is fully known to anyone, as opposed to blind watermarking, where a secret key is required. So here only a 'public key' is needed for verification, while a 'private key' (secret) is used for the embedding. Knowledge of the public key does not help to compute the private key, nor does it allow removal of the mark or allow an attacker to forge a mark.

    Source-based watermarks are desirable for ownership identification or authentication, where a unique watermark identifying the owner is introduced to all copies of a particular image being distributed. A source-based watermark could be used for authentication and to determine whether a received image or other electronic data has been tampered with. The watermark could also be destination-based, where each distributed copy gets a unique watermark identifying the particular buyer. The destination-based watermark could be used to trace the buyer in the case of illegal reselling.

    5.4 Applications of Digital Watermarks

    Visible Watermark

    Visible watermarks can be used in the following cases:

    Visible watermarking for enhanced copyright protection. In such situations, where images are made available through the Internet, the content owner is concerned that the images will be used commercially (e.g. imprinting coffee mugs) without payment of royalties. Here the content owner desires an ownership mark that is visually apparent but does not prevent the image being used for other purposes (e.g. scholarly research).

    Visible watermarking used to indicate ownership origins. In this case images are made available through the Internet and the content owner desires to indicate the ownership of the underlying materials (e.g. a library manuscript), so that an observer might be encouraged to patronize the institution that owns the material.

    Invisible Robust Watermark

    Invisible robust watermarks find application in the following cases.

    Invisible watermarking to detect misappropriated images. In this scenario, the seller of digital images is concerned that his fee-generating images may be purchased by an individual who will make them available for free, which would deprive the owner of licensing revenue.

    Invisible watermarking as evidence of ownership. In this scenario, the seller of the digital images suspects that one of his images has been edited and published without payment of royalties. Here, the detection of the seller's watermark in the image is intended to serve as evidence that the published image is the property of the seller.

    Invisible Fragile Watermarks

    The following are the applications of invisible fragile watermarks.

    Invisible watermarking for a trustworthy camera. In this scenario, images are captured with a digital camera for later inclusion in news articles. Here, it is the desire of a news agency to verify that an image is true to the original capture and has not been edited to falsify a scene. In this case, an invisible watermark is embedded at capture time; its presence at the time of publication is intended to indicate that the image has not been altered since it was captured.

    Invisible watermarking to detect alteration of images stored in a digital library. In this case, images (e.g. human fingerprints) have been scanned and stored in a digital library; the content owner desires the ability to detect any alteration of the images, without the need to compare the images to the scanned materials.

    6. Audio Watermarking

    Digital audio watermarking involves the concealment of data within a discrete audio file. Applications for this technology are numerous. Intellectual property protection is currently the main driving force behind research in this area. To combat online music piracy, a digital watermark could be added to all recordings prior to release, signifying not only the author of the work but also the user who has purchased a legitimate copy. Newer operating systems equipped with digital rights management (DRM) software will extract the watermark from audio files prior to playing them on the system. The DRM software will ensure that the user has paid for the song by comparing the watermark to the existing purchased licenses on the system.

    DC Watermarking Scheme

    This section details the implementation of a digital audio watermarking scheme, which can be used to hide auxiliary information within a sound file.

    Figure 17: Types of Watermarking Techniques

    The DC watermarking scheme hides watermark data in the lower frequency components of the audio signal, which are below the perceptual threshold of the human auditory system.

    Watermark Insertion

    The process of inserting a digital watermark into an audio file can be divided into four main processes (see Figure 18). An original audio file in wave format is fed into the system, where it is subsequently framed, analyzed, and processed to attach the inaudible watermark to the output signal.

    Framing

    The audio file is partitioned into frames which are 90 milliseconds in duration. With a 90 ms frame size, and one watermark bit carried per frame, our bit rate for watermarked data is equal to 1 / 0.09 = 11.1 bits per second.
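The framing arithmetic can be checked with a short sketch. The 90 ms frame duration comes from the text; the 44.1 kHz sampling rate, the choice to discard a trailing partial frame, and the function names are assumptions made for the example.

```python
FRAME_SECONDS = 0.09   # frame duration stated in the text


def frame_length(sample_rate):
    """Number of samples in one 90 ms frame."""
    return round(sample_rate * FRAME_SECONDS)


def split_into_frames(samples, sample_rate):
    """Cut the signal into whole frames; a trailing partial frame is dropped."""
    n = frame_length(sample_rate)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def watermark_bit_rate():
    """Bits per second when every frame carries one watermark bit."""
    return 1 / FRAME_SECONDS


SAMPLE_RATE = 44100                         # assumed CD-quality sampling rate
one_second = list(range(SAMPLE_RATE))       # placeholder samples
frames = split_into_frames(one_second, SAMPLE_RATE)
```

At the assumed rate each frame holds 3969 samples, so one second of audio yields 11 whole frames, in line with the figure of about 11.1 bits per second.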

    Spectral Analysis

    Next, spectral analysis is performed on the signal, consisting of a fast Fourier transform (FFT), which allows us to calculate the low frequency components of each frame, as well as the overall frame power. From the FFT, we are now able to determine the low frequency (DC) component of the frame as well as the frame spectral power.

    Figure 18. Watermark Insertion Process.

    DC Removal

    From the above spectral analysis of each frame, we have calculated the low frequency (DC) component F(1), which can now be removed by subtraction from each frame.

    Watermark Signal Addition

    From the spectral analysis completed previously, we calculated the spectral power for each frame, which is now utilised for embedding the watermark signal data. The power in each frame determines the amplitude of the watermark which can be added to the low frequency spectrum.
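The DC removal and watermark addition steps can be sketched per frame as follows. Several details are assumptions, since the text does not fix them: the DC component is computed as the frame mean (equal to the zero-frequency FFT bin divided by the frame length, if F(1) is read as that bin), the frame power is taken as the mean square of the DC-free frame, the watermark level is a hypothetical fraction `STRENGTH` of the frame RMS, and the sign of the added DC level encodes the bit.

```python
import math

STRENGTH = 0.1   # hypothetical ratio of watermark DC level to frame RMS


def frame_dc(frame):
    """DC component of a frame, taken here as the mean sample value."""
    return sum(frame) / len(frame)


def frame_power(frame):
    """Mean-square power of a frame."""
    return sum(x * x for x in frame) / len(frame)


def insert_bit(frame, bit):
    """Remove the frame's own DC level, then add a DC level whose sign encodes the bit."""
    dc = frame_dc(frame)
    centred = [x - dc for x in frame]                     # DC removal
    level = STRENGTH * math.sqrt(frame_power(centred))    # amplitude follows frame power
    offset = level if bit else -level
    return [x + offset for x in centred]                  # watermark signal addition


def insert_watermark(frames, bits):
    """Embed one bit per frame; frames without a matching bit are left out."""
    return [insert_bit(f, b) for f, b in zip(frames, bits)]


# One 90 ms frame of a 440 Hz tone at an assumed 44.1 kHz sampling rate.
frame = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(3969)]
one = insert_bit(frame, 1)
zero = insert_bit(frame, 0)
silent = insert_bit([0.0] * 10, 1)
```

Scaling the level by the frame power keeps the mark small in quiet passages. The same rule means a completely silent frame (`silent` above) receives no offset at all, so in this sketch it cannot carry a bit.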

    Watermark Extraction

    The process of extracting the digital watermark from the audio file is similar to the technique for inserting the watermark. The computer processing requirements for extraction are slightly lower. A marked audio file in wave format is fed into the system, where it is subsequently framed, analysed, and processed to extract the embedded data which exists as a digital watermark.

    Framing

    As with the insertion process, the audio file is partitioned into frames which are 90 milliseconds in duration.

    Figure 19: Watermark Extraction Process

    Spectral Analysis

    Subsequent to the framing of the watermarked audio signal, we perform spectral analysis on the signal, consisting of a fast Fourier transform (FFT), which again allows us to calculate the low frequency components of each frame, as well as the overall frame power.

    Watermark Signal Extraction

    From the spectral analysis completed previously, we calculated the spectral power for each frame, which allows us to examine the low frequency power in each frame and subsequently extract the watermark.
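A matching extraction sketch is given below. It assumes the convention that a positive DC level in a frame means bit 1 and a non-positive level means bit 0; the text does not specify the decision rule, so this is an illustrative choice, as are the function names and the tiny made-up test signal.

```python
def extract_bit(frame):
    """Read one bit from the sign of the frame's DC (mean) level."""
    return 1 if sum(frame) / len(frame) > 0 else 0


def extract_watermark(samples, frame_len):
    """Frame the signal and read one bit from every whole frame."""
    bits = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        bits.append(extract_bit(samples[start:start + frame_len]))
    return bits


# Made-up marked signal: a zero-mean alternating waveform, 8 samples per frame,
# with a small DC level of +0.1 or -0.1 added to each frame.
carrier = [1.0, -1.0] * 4
levels = [0.1, -0.1, 0.1, 0.1]
signal = [x + level for level in levels for x in carrier]

bits = extract_watermark(signal, frame_len=8)
```

For this signal `bits` comes out as `[1, 0, 1, 1]`. Extraction needs only the frame mean, with no DC removal or signal modification, which fits the remark that its processing requirements are slightly lower than those of insertion.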

    In order to attain higher hidden data density in the watermarked signal, more advanced techniques must be used, such as spread spectrum, phase encoding, or echo hiding.

    7. Conclusions

    In this tutorial, we have taken an introductory look at information hiding techniques. Historical detail has been discussed. Several methods for hiding data in text, image, and audio have been described, with appropriate introductions to the environment of each medium, as well as the strengths and weaknesses of each method. Most data hiding systems take advantage of human perceptual weaknesses, but have weaknesses of their own. In areas where cryptography and strong encryption are being outlawed, citizens are looking at steganography to circumvent such policies and pass messages covertly. Commercial applications of steganography in the form of digital watermarks are currently being used to track the copyright and ownership of electronic media. We conclude that, for now, no system of data hiding is totally immune to attack. However, steganography has its place in security. It can in no way replace cryptography, but is intended to supplement it. Its application in watermarking, for the detection of unauthorised, illegally copied material, is continually being realised and developed.
