

CHAPTER 3

POLYALPHABETIC CIPHERS

3.1 INTRODUCTION

In a polyalphabetic cipher, multiple cipher alphabets are used. To

facilitate encryption, all the alphabets are usually written out in a large table,

traditionally called a tableau. Usually the tableau is 26 × 26, so that 26 full

ciphertext alphabets are available. The method of filling the tableau, and of

choosing which alphabet to use next, defines the particular polyalphabetic

cipher. All such ciphers are easier to break than was once believed, since the

substitution alphabets repeat for sufficiently long plaintexts. One of the

most popular was the Vigenere cipher.

A simple substitution cipher involves a single mapping of the

plaintext alphabet onto ciphertext characters (Menezes et al 1997). A more

complex alternative is to use different substitution mappings (called multiple

alphabets) on various portions of the plaintext. This results in so-called

polyalphabetic substitution. In the simplest case, the different alphabets are

used sequentially and then repeated, so the position of each plaintext character

in the source string determines which mapping is applied to it. Under different

alphabets, the same plaintext character is thus encrypted to different

ciphertext characters, precluding simple frequency analysis as per mono-

alphabetic substitution. The simple Vigenere cipher is a polyalphabetic

substitution cipher. The definition is repeated here for convenience.


3.2 ADVANTAGES OF POLYALPHABETIC CIPHERS

The advantage of Polyalphabetic ciphers is that they make

frequency analysis more difficult. Frequency analysis is the practice of

decrypting a message by counting the frequency of ciphertext letters, and

equating it to the letter frequency of normal text. For instance if P occurred

most in a ciphertext whose plaintext is in English, one could suspect that P

corresponded to E, because E is the most frequently used letter in English.

With the Vigenere cipher, E can be enciphered as any of several letters of the

alphabet, depending on the key letter in use, thus defeating simple frequency analysis

(www.experiencefestival.com/vigenre_cipher).

3.3 VIGENERE CIPHER

The Vigenere cipher is a method of encryption invented by Giovan

Battista Bellaso and described in his 1553 book, “La cifra del Sig. Giovan

Battista Bellaso”. It was misattributed to Blaise de Vigenere in the 19th

century, and given his name. The cipher is a keyword-based system that uses

a series of different Caesar ciphers based on the letters of the keyword. It is a

simplified version of the more general polyalphabetic substitution cipher,

invented by Alberti ca 1465. This cipher is well-known because while it is

easy to understand and implement, it often appears to beginners to be

unbreakable. Consequently, many programmers have implemented

obfuscation or encryption schemes in their applications which are essentially

Vigenere ciphers, only to have them broken by the first cryptanalyst who

comes along. Use and cryptanalysis of the Vigenere cipher is therefore

frequently introduced at the beginning of courses on cryptography.

In the Vigenere cipher, the first row of the tableau is filled out with

a copy of the plaintext alphabet, and successive rows are simply shifted one


place to the left. (Such a simple tableau is called tabula recta, and

mathematically corresponds to adding the plaintext and key letters,

modulo 26.) A keyword is then used to choose which ciphertext alphabet to

use. Each letter of the keyword is used in turn, and the keyword is then repeated

from the beginning. So if the keyword is ’CAT’, the first letter of

plaintext is enciphered under alphabet ’C’, the second under ’A’, the third

under ’T’, the fourth under ’C’ again, and so on. In practice, Vigenere keys

were often phrases several words long. In 1863, Friedrich Kasiski published a

method, which enabled the calculation of the length of the keyword in a

Vigenere ciphered message. Once this was done, ciphertext letters that had

been enciphered under the same alphabet could be picked out and attacked

separately as a number of semi-independent simple substitutions complicated

by the fact that within one alphabet letters were separated and did not form

complete words, but simplified by the fact that usually a tabula recta had been

employed. As such, even today a Vigenere type cipher should theoretically be

difficult to break if mixed alphabets are used in the tableau, if the keyword is

random, and if the total length of ciphertext is less than 27.6 times the

length of the keyword. These requirements are rarely

understood in practice, and so the security of Vigenere enciphered messages is usually

less than it might have been.

Other notable polyalphabetics include:

• The Gronsfeld cipher. This is identical to the Vigenere except

that only 10 alphabets are used, and so the “keyword” is

numerical.

• The Beaufort cipher. This is practically the same as the

Vigenere, except the tabula recta is replaced by a backwards

one, mathematically equivalent to ciphertext = key – plaintext.


This operation is self-inverse, so that exactly the same table is

used in exactly the same way, for both encryption and

decryption.

• The autokey cipher, which mixes plaintext in to the keying to

avoid periodicity in the key.

• The running key cipher, where the key is made very long by

using a passage from a book or similar text.
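The self-inverse property claimed for the Beaufort cipher in the list above can be checked with a few lines of Python. This is only a sketch: it assumes an uppercase A to Z alphabet numbered from 0, and the function name `beaufort` is chosen here for illustration.

```python
import string

ALPHABET = string.ascii_uppercase  # assumption: A = 0, ..., Z = 25


def beaufort(text: str, key: str) -> str:
    """Apply ciphertext = (key - plaintext) mod 26, letter by letter."""
    out = []
    for i, ch in enumerate(text):
        k = ALPHABET.index(key[i % len(key)])  # key letter, repeated as needed
        p = ALPHABET.index(ch)
        out.append(ALPHABET[(k - p) % 26])
    return "".join(out)


ciphertext = beaufort("VIGENERE", "CAT")
recovered = beaufort(ciphertext, "CAT")  # the same function and key decrypt
print(ciphertext, recovered)
```

Because k - (k - p) = p modulo 26, calling the function twice with the same key returns the original text, so no separate decryption routine is needed.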

3.3.1 Definition of Vigenere Cipher

A simple Vigenere cipher of period t, over an s-character alphabet,

involves a t-character key $k_1 k_2 k_3 \ldots k_t$. The mapping of plaintext

$m = m_1 m_2 m_3 \ldots$ to ciphertext $c = c_1 c_2 c_3 \ldots$ is defined on individual characters

by $c_i = (m_i + k_i) \bmod s$, where the subscript i in $k_i$ is taken modulo t (the key is re-used).

The simple Vigenere uses t shift ciphers defined by t shift values $k_i$, each

specifying one of s (mono-alphabetic) substitutions; $k_i$ is used on the

characters in positions $i, i+t, i+2t, \ldots$. In general, each of the t substitutions is

different; this is referred to as using t alphabets rather than a single

substitution mapping. The shift cipher is a simple Vigenere with period t=1.
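The definition translates almost directly into code. The following is a minimal sketch, not a reference implementation: the function name `vigenere` and its parameters are illustrative, indices start at 0 rather than 1, and every character of the text and key is assumed to belong to the chosen alphabet.

```python
import string


def vigenere(text: str, key: str, alphabet: str = string.ascii_uppercase,
             decrypt: bool = False) -> str:
    """Compute c_i = (m_i + k_(i mod t)) mod s, or the inverse when decrypt=True."""
    s, t = len(alphabet), len(key)
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        m_i = alphabet.index(ch)
        k_i = alphabet.index(key[i % t])  # key position taken modulo the period t
        out.append(alphabet[(m_i + sign * k_i) % s])
    return "".join(out)


ciphertext = vigenere("POLYALPHABETIC", "CAT")
plaintext = vigenere(ciphertext, "CAT", decrypt=True)
shifted = vigenere("ABC", "D")  # period t = 1 reduces to a plain shift cipher
print(ciphertext, plaintext, shifted)
```

With a one-letter key the function behaves as a shift cipher, which matches the closing remark of the definition.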

3.3.2 Vigenere Table

Blaise de Vigenere was born in 1523 in Saint-Pourcain, France.

While serving as a diplomat in Rome, he came into contact with Giovanni

Battista della Porta in 1549 and learned of various encryption systems.

Vigenere’s book, Traicté des Chiffres (“A Treatise on Secret Writing”, 1585),

was published after Vigenere returned to Paris. It contains

the basic 26×26 Vigenere tableau shown in Table 3.1.


Table 3.1 Vigenere tableaux

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

A A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

B B C D E F G H I J K L M N O P Q R S T U V W X Y Z A

C C D E F G H I J K L M N O P Q R S T U V W X Y Z A B

D D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

E E F G H I J K L M N O P Q R S T U V W X Y Z A B C D

F F G H I J K L M N O P Q R S T U V W X Y Z A B C D E

G G H I J K L M N O P Q R S T U V W X Y Z A B C D E F

H H I J K L M N O P Q R S T U V W X Y Z A B C D E F G

I I J K L M N O P Q R S T U V W X Y Z A B C D E F G H

J J K L M N O P Q R S T U V W X Y Z A B C D E F G H I

K K L M N O P Q R S T U V W X Y Z A B C D E F G H I J

L L M N O P Q R S T U V W X Y Z A B C D E F G H I J K

M M N O P Q R S T U V W X Y Z A B C D E F G H I J K L

N N O P Q R S T U V W X Y Z A B C D E F G H I J K L M

O O P Q R S T U V W X Y Z A B C D E F G H I J K L M N

P P Q R S T U V W X Y Z A B C D E F G H I J K L M N O

Q Q R S T U V W X Y Z A B C D E F G H I J K L M N O P

R R S T U V W X Y Z A B C D E F G H I J K L M N O P Q

S S T U V W X Y Z A B C D E F G H I J K L M N O P Q R

T T U V W X Y Z A B C D E F G H I J K L M N O P Q R S

U U V W X Y Z A B C D E F G H I J K L M N O P Q R S T

V V W X Y Z A B C D E F G H I J K L M N O P Q R S T U

W W X Y Z A B C D E F G H I J K L M N O P Q R S T U V

X X Y Z A B C D E F G H I J K L M N O P Q R S T U V W

Y Y Z A B C D E F G H I J K L M N O P Q R S T U V W X

Z Z A B C D E F G H I J K L M N O P Q R S T U V W X Y


The Vigenere encipherment of plaintext x (identified by its column

position) with the key k (identified by its row number) is the table entry in the

kth row and column position x; for example, plaintext x = B is enciphered with

the key k = 2 (row C) to ciphertext y = D. Vigenere polyalphabetic encipherment

extends a sequence of r letters $(k_0, k_1, \ldots, k_{r-1})$ periodically to generate the

running key $k = (k_0, k_1, \ldots, k_{n-1}, \ldots)$ with $k_i = k_{i \bmod r}$ for $0 \le i < \infty$. For

example, the key of length 12

C R Y P T O G R A P H Y

2 17 24 15 19 14 6 17 0 15 7 24

enciphers plaintext of length 20 using the repeated key

C R Y P T O G R A P H Y C R Y P T O G R

2 17 24 15 19 14 6 17 0 15 7 24 2 17 24 15 19 14 6 17
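The periodic extension of a keyword into a running key can be sketched as follows. The numbering A = 0, ..., Z = 25 is the convention used for the rows of the tableau; the helper names `key_numbers` and `running_key` are illustrative only.

```python
import string
from itertools import cycle, islice


def key_numbers(keyword: str) -> list[int]:
    """Map each key letter to its position in the alphabet (A = 0, ..., Z = 25)."""
    return [string.ascii_uppercase.index(c) for c in keyword]


def running_key(keyword: str, n: int) -> list[int]:
    """Return the first n values of k_i = k_(i mod r) for a keyword of length r."""
    return list(islice(cycle(key_numbers(keyword)), n))


print(key_numbers("CRYPTOGRAPHY"))
print(running_key("CRYPTOGRAPHY", 20))
```

`itertools.cycle` repeats the 12 key values indefinitely and `islice` cuts the stream to the plaintext length, which mirrors the "for all i" form of the definition without materialising an infinite sequence.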

Vigenere’s original scheme subtracted the key from the plaintext

rather than adding it:

$x \to y = (y_0, y_1, \ldots, y_{n-1})$, $y_i = (x_i - k_i) \bmod m$,

where m is the size of the alphabet (26 for Table 3.1).

It was rediscovered nearly one hundred years later by Admiral Sir

Francis Beaufort, whose name is associated with the wind velocity scale.

3.3.3 Operation of Vigenere Cipher

The Vigenere was described as “impossible of translation” in the

respected journal, “Scientific American”. “The Alphabet Cipher” provides a

good description of how to use a table for encryption and decryption using

arbitrary keywords, but here is an alternate description:


1. The encipherer chooses a plaintext: VIGENERE

2. The encipherer chooses a keyword and repeats it to become

the length of the plaintext, e.g. the keyword, “CIPH”:

CIPHCIPH

3. To encipher letter L1 of the plaintext, the encipherer creates a

new alphabet wherein A is shifted to letter L1 of the

keyword, B is shifted to the next letter, etc.:

ABCDEFGHIJKLMNOPQRSTUVWXYZ

CDEFGHIJKLMNOPQRSTUVWXYZAB

4. The encipherer finds the letter that corresponds to L1 in the

substitution alphabet. This is now L1 of the ciphertext: V ⇒ X

5. This is repeated for each letter in the plaintext and its

corresponding letter in the key: VIGENERE + CIPHCIPH ⇒

XQVLPMGL

(www.mathdaily.com/lessons/Vigen%E8re_cipher).
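The steps above can be followed literally in code by building one shifted alphabet per key letter and looking the plaintext letter up in it. This is a sketch under the assumption of an uppercase A to Z alphabet; `shifted_alphabet` and `encipher` are names introduced here for illustration.

```python
import string

PLAIN = string.ascii_uppercase


def shifted_alphabet(key_letter: str) -> str:
    """Return the alphabet rotated so that A maps to key_letter (step 3)."""
    k = PLAIN.index(key_letter)
    return PLAIN[k:] + PLAIN[:k]


def encipher(plaintext: str, keyword: str) -> str:
    out = []
    for i, letter in enumerate(plaintext):
        row = shifted_alphabet(keyword[i % len(keyword)])  # keyword repeated (step 2)
        out.append(row[PLAIN.index(letter)])               # table lookup (step 4)
    return "".join(out)


print(shifted_alphabet("C"))         # CDEFGHIJKLMNOPQRSTUVWXYZAB
print(encipher("VIGENERE", "CIPH"))  # XQVLPMGL
```

The lookup form and the modular-addition form give the same result; the lookup form is closer to how the cipher was worked by hand.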

A simpler, but equivalent way to encode a message is to write out a

copy of the alphabet, and then write the keyword vertically beneath the letter

A. Then, starting with each letter, complete the alphabet (starting again with

A after reaching Z). For example, if the keyword is “CUP”, one would write:

ABCDEFGHIJKLMNOPQRSTUVWXYZ

CDEFGHIJKLMNOPQRSTUVWXYZAB

UVWXYZABCDEFGHIJKLMNOPQRST

PQRSTUVWXYZABCDEFGHIJKLMNO

To encode a message, one locates the plaintext letter in the top row,

and then reads the ciphertext letter from one of the alphabets below, using


each one in turn. One can also write out the entire set of these shifted

alphabets, picking the right row for any letter of the key; the resulting block of

alphabets is known as a tabula recta. This use of multiple alphabets in rotation

to encrypt a message is why this is called a polyalphabetic cipher; the

systematic and repeated use of multiple alphabets (all ordered as in the natural

alphabet) is why this cipher is not as secure as polyalphabetic ciphers can be.

3.4 SUBSTITUTION CIPHERS (BACKGROUND)

A substitution cipher is a cipher that replaces each plaintext symbol

with another ciphertext symbol. The receiver deciphers using the inverse

substitution. A substitution alphabet is the extended list of ciphertext symbols.

Examples are Caesar ciphers and the Atbash cipher.

This section considers the following types of classical ciphers:

simple (or mono-alphabetic) substitution, polygram substitution, and

homophonic substitution. The difference between codes and ciphers is also

noted. Polyalphabetic substitution ciphers are considered (Menezes et al

1997).

3.4.1 Mono-Alphabetic Substitution

The simple substitution cipher is one in which each plaintext

character is simply replaced by a corresponding one from a cipher alphabet.

The cipher alphabet may be shifted or reversed (creating the Caesar

and Atbash ciphers, respectively) or scrambled, in which case it is called a

“mixed alphabet” or “deranged alphabet”. Traditionally, mixed alphabets are

created by first writing out a keyword (omitting repeated letters), then all the remaining letters in order.

Traditionally, the ciphertext is written out in blocks of fixed length to help

avoid errors and to disguise word boundaries from the plaintext.


Suppose the ciphertext and plaintext character sets are the same.

Let $m = m_1 m_2 m_3 \ldots$ be a plaintext message consisting of juxtaposed

characters $m_i \in A$, where A is some fixed character alphabet such as

$A = \{A, B, C, \ldots, Z\}$. A simple substitution cipher or monoalphabetic substitution

cipher employs a permutation e over A, with encryption mapping

$E_e(m) = e(m_1)\, e(m_2)\, e(m_3) \ldots$. Here juxtaposition indicates concatenation

(rather than multiplication), and $e(m_i)$ is the character to which $m_i$ is mapped

by e.
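As an illustration of the permutation e, the sketch below builds a keyword-mixed cipher alphabet and applies it character by character. The keyword ZEBRAS and the helper names are hypothetical choices made for this example, and only uppercase A to Z is handled.

```python
import string

PLAIN = string.ascii_uppercase


def mixed_alphabet(keyword: str) -> str:
    """Keyword letters first (repeats dropped), then the unused letters in order."""
    ordered = dict.fromkeys(keyword.upper() + PLAIN)  # dict keeps first-seen order
    return "".join(c for c in ordered if c in PLAIN)


def substitute(text: str, source: str, target: str) -> str:
    """Apply the permutation that sends source[i] to target[i]."""
    return text.translate(str.maketrans(source, target))


cipher_alphabet = mixed_alphabet("ZEBRAS")
ciphertext = substitute("HELLO", PLAIN, cipher_alphabet)
plaintext = substitute(ciphertext, cipher_alphabet, PLAIN)  # inverse permutation
print(cipher_alphabet)  # ZEBRASCDFGHIJKLMNOPQTUVWXY
print(ciphertext)       # DAIIL
```

Decryption simply swaps the two alphabets, which is the inverse substitution mentioned at the start of this section.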

The disadvantage of mono-alphabetic substitution is that it is easy

to cryptanalyse, since every bigram (for example _A) is mapped to the

same encrypted bigram each time.

The time complexity of mono-alphabetic substitution is $O(n^2)$, where n is the

size of the alphabet; the number of possible keys is n!.

3.4.2 PolyGram Substitution (Example Playfair Cipher)

The Playfair cipher or Playfair square is a manual symmetric

encryption technique; this technique encrypts pairs of letters (digraphs),

instead of single letters as in the simple substitution cipher and rather more

complex Vigenere cipher systems. The Playfair is thus significantly harder to

break since the frequency analysis used for simple substitution ciphers does

not work with it. The usual form of the cipher used a 5 by 5 table and a key

word or phrase. Memorization of the key and 4 simple rules was all that was

required to create the 5 by 5 table and use the cipher. First fill in the spaces in

the table with the letters of the key (dropping any duplicate letters), then fill

the remaining spaces with the rest of the letters of the alphabet in order

(usually omitting “Q” to reduce the alphabet to fit, other versions put both “I”

and “J” in the same space). The key can be written in the top rows of the


table, from left to right, or in some other pattern, such as a spiral beginning in

the upper-left-hand corner and ending in the center. Then apply the following

4 rules, in order, to each pair of letters to encrypt a message:

• If the letters of a pair are both the same (or only one letter is

left), add an “X” after the first letter. Encrypt the new pair and

continue.

• If the letters appear on the same row of your table, replace

them with the letters to their immediate right respectively

(wrapping around to the left side of the row if a letter in the

original pair was on the right side of the row).

• If the letters appear on the same column of your table, replace

them with the letters immediately below respectively

(wrapping around to the top side of the column if a letter in

the original pair was on the bottom side of the column).

• If the letters are not on the same row or column, replace them

with the letters on the same row respectively but at the other

pair of corners of the rectangle defined by the original pair.

To decrypt, use the inverse of these 4 rules (dropping any extra

“X”s that don’t make sense in the final message when you finish).
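The four rules can be sketched in Python as shown below. Several choices here are assumptions rather than part of the description above: I and J share a cell (the alternative of omitting Q is not implemented), the key is written into the top rows from left to right, and the function names are illustrative. A doubled X in the input is not treated specially.

```python
import string

LETTERS = string.ascii_uppercase


def build_square(key: str) -> list[str]:
    """Key letters first (duplicates dropped), then the rest of the alphabet."""
    square = []
    for ch in key.upper() + LETTERS:
        if ch not in LETTERS:
            continue  # skip spaces and punctuation in a key phrase
        ch = "I" if ch == "J" else ch  # assumption: J is merged into I
        if ch not in square:
            square.append(ch)
    return square  # row-major layout: index = 5 * row + column


def split_pairs(text: str) -> list[tuple[str, str]]:
    """Rule 1: pad a doubled letter or a trailing single letter with X."""
    letters = ["I" if c == "J" else c for c in text.upper() if c in LETTERS]
    pairs, i = [], 0
    while i < len(letters):
        if i + 1 < len(letters) and letters[i] != letters[i + 1]:
            pairs.append((letters[i], letters[i + 1]))
            i += 2
        else:
            pairs.append((letters[i], "X"))
            i += 1
    return pairs


def map_pair(square: list[str], a: str, b: str, step: int) -> str:
    ra, ca = divmod(square.index(a), 5)
    rb, cb = divmod(square.index(b), 5)
    if ra == rb:  # rule 2: same row, move right (left when decrypting)
        return square[5 * ra + (ca + step) % 5] + square[5 * rb + (cb + step) % 5]
    if ca == cb:  # rule 3: same column, move down (up when decrypting)
        return square[5 * ((ra + step) % 5) + ca] + square[5 * ((rb + step) % 5) + cb]
    # rule 4: rectangle, keep each letter's row and take the other letter's column
    return square[5 * ra + cb] + square[5 * rb + ca]


def playfair(text: str, key: str, decrypt: bool = False) -> str:
    square = build_square(key)
    step = -1 if decrypt else 1
    return "".join(map_pair(square, a, b, step) for a, b in split_pairs(text))


secret = playfair("hide the gold in the tree stump", "playfair example")
print(secret)
print(playfair(secret, "playfair example", decrypt=True))
```

Decryption reuses the same pair mapping with the row and column steps reversed; the padding X inserted by rule 1 remains in the output and has to be dropped by the reader.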

PolyGram substitution ciphers involve groups of characters being

substituted by other groups of characters. For example, sequences of two

plaintext characters (bigrams) may be replaced by other bigrams. The same

may be done with sequences of three plaintext characters (trigrams), or more

generally using n-grams. In full bigram substitution over an alphabet of

26 characters, the key may be any permutation of the $26^2$ bigrams, arranged in a table with

row and column indices corresponding to the first and second characters in


the bigram, and the table entries being the ciphertext bigrams substituted for

the plaintext pairs. There are then $(26^2)!$ keys.

The advantage of this is, first the frequency distribution of digraphs

is much flatter than that of individual letters (though not actually flat in real

languages; for example, ’TH’ is much more common than ’XQ’ in English).

Second, the larger number of symbols requires correspondingly more

ciphertext to productively analyze letter frequencies. Because $26^2 = 676$, to

substitute pairs with a substitution alphabet would take an alphabet 676

symbols long, which would be rather unwieldy (Stewart lee 1999).

3.4.3 Homophonic Substitution

The idea of homophonic substitution is for each fixed key k to

associate with each plaintext unit (e.g., character) m a set S (k, m) of potential

corresponding ciphertext units (generally all of common size). To encrypt m

under k, randomly choose one element from this set as the ciphertext. To

allow decryption, for each fixed key this one-to-many encryption function

must be injective on ciphertext space. Homophonic substitution results in

ciphertext data expansion. In homophonic substitution, | S (k, m)| should be

proportional to the frequency of m in the message space. The motivation is to

smooth out obvious irregularities in the frequency distribution of ciphertext

characters, which result from irregularities in the plaintext frequency

distribution when simple substitution is used. While homophonic substitution

complicates cryptanalysis based on simple frequency distribution statistics,

sufficient ciphertext may nonetheless allow frequency analysis, in conjunction

with additional statistical properties of plaintext manifested in the ciphertext.

For example, in long ciphertexts each element of S (k, m) will occur roughly

the same number of times. Bigram distributions may also provide

information. (Menezes 1997).
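A toy version of the scheme may make the role of the sets S(k, m) clearer. The key below is entirely hypothetical: it covers only four letters, uses two-digit codes as ciphertext units, and gives more homophones to letters that are assumed to be more frequent.

```python
import secrets

# Hypothetical key: each plaintext letter owns a disjoint set of two-digit codes.
HOMOPHONES = {
    "E": ["11", "27", "35", "48"],
    "T": ["12", "29", "41"],
    "A": ["15", "33"],
    "N": ["19"],
}

# Decryption table; it is well defined only because the code sets do not overlap.
REVERSE = {code: letter for letter, codes in HOMOPHONES.items() for code in codes}


def encrypt(message: str) -> str:
    """Pick one homophone at random for every plaintext letter."""
    return " ".join(secrets.choice(HOMOPHONES[ch]) for ch in message)


def decrypt(ciphertext: str) -> str:
    return "".join(REVERSE[code] for code in ciphertext.split())


first = encrypt("EATEN")
second = encrypt("EATEN")  # usually differs from the first encryption
print(first, second, decrypt(first))
```

Each one-letter plaintext unit becomes a two-digit ciphertext unit, which is the data expansion mentioned above, and repeated encryptions of the same message generally produce different ciphertexts that still decrypt to the same text.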


3.5 SUBSTITUTION IN MODERN CRYPTOGRAPHY

The cryptographic concept of substitution carries on even today.

From a sufficiently abstract perspective, modern bit-oriented block ciphers

(e.g., Data Encryption Standard (DES) or Advanced Encryption Standard

(AES)) can be viewed as substitution ciphers on a very large binary alphabet.

In addition, block ciphers often include smaller substitution tables called

S-boxes.

3.6 SECURITY FOR SIMPLE SUBSTITUTION CIPHERS

A disadvantage of the keyword method of derangement (Section 3.4.1) is that the last letters

of the alphabet (which are mostly low frequency) tend to stay at the end. A

stronger way of constructing a mixed alphabet is to perform a columnar

transposition on the ordinary alphabet using the keyword, but this is not often

done. Although the number of possible keys is very large ($26! \approx 2^{88.4}$, or

about 88 bits), this cipher is not very strong, being easily broken. Provided the

message is of reasonable length, the cryptanalyst can deduce the probable

meaning of the most common symbols by analyzing the frequency

distribution of the ciphertext. This allows formation of partial words, which

can be tentatively filled in, progressively expanding the (partial) solution.

Many people solve such ciphers for recreation, as with cryptogram puzzles in

the newspaper. According to the unicity distance of English, 27.6 letters of

ciphertext are required to crack a mixed alphabet simple substitution. In

practice, typically about 50 letters are needed, although some messages can be

broken with fewer if particular unusual patterns are found. In other

cases, the plaintext can be contrived to have a nearly flat frequency

distribution, and much longer messages will then be required by the cryptanalyst.
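The key-space size and the unicity distance quoted in this section can be reproduced numerically, and the first step of a frequency attack is a simple letter count. In the sketch below the redundancy of English is taken as 3.2 bits per letter, a commonly used estimate that is an assumption here and not stated in the text; `letter_frequencies` is an illustrative helper.

```python
import math
from collections import Counter

key_bits = math.log2(math.factorial(26))  # size of the key space in bits
REDUNDANCY_BITS_PER_LETTER = 3.2          # assumed estimate for English
unicity_distance = key_bits / REDUNDANCY_BITS_PER_LETTER


def letter_frequencies(ciphertext: str) -> list[tuple[str, int]]:
    """Return ciphertext letters ordered from most to least frequent."""
    counts = Counter(c for c in ciphertext.upper() if c.isalpha())
    return counts.most_common()


print(f"{key_bits:.1f} bits, unicity distance about {unicity_distance:.1f} letters")
print(letter_frequencies("XQVLPMGL"))
```

The most frequent ciphertext letters are the natural first candidates for E, T, A and the other common plaintext letters, after which partial words can be filled in as described above.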


3.7 POLYALPHABETIC SUBSTITUTION CIPHER

The polyalphabetic substitution cipher is a simple extension of the

monoalphabetic one. The difference is that the message is broken into blocks

of equal length, say B, and then each position in the block (1… B) is

encrypted (or decrypted) using a different simple substitution cipher key. The

block size B is often referred to as the period of the cipher. An example of a

polyalphabetic substitution cipher is shown on Table 3.1. The block size (i.e.,

B) is chosen to be three, and Table 3.2 gives an example key and shows the

corresponding encryption. (Dimovski and Gligoroski 2003a).

Table 3.2 Example of the polyalphabetic substitution cipher key and encryption process

KEY:

Plaintext:                ABCDEFGHIJKLMNOPQRSTUVWXYZ_

Ciphertext (Position 1):  ND_WIEURYTLAKSJQHFGMZPXOBCV

Ciphertext (Position 2):  LP_MKONJIBHUVGYCFTXDRZSEAWQ

Ciphertext (Position 3):  GFTYHBVCDRUJNXSEIKM_ZAOLWQP

ENCRYPTION:

Position:    1 2 3 1 2 3 1 2 3 1 2

Plaintext:   H O W _ A R E _ Y O U

Ciphertext:  R Y O V L K I Q W J R


The decryption process is the reversal of the encryption. The

polyalphabetic substitution cipher is somewhat more difficult to cryptanalyse

than the simple substitution cipher because of the independent keys used to

encrypt successive characters in the plaintext, but it is still relatively simple to

cryptanalyse the polyalphabetic substitution cipher based on the n-gram

statistics of the plaintext language. In the monoalphabetic

substitution cipher every bigram (for example _A) is mapped to the

same encrypted bigram each time; this is not the case for the polyalphabetic

substitution cipher, where the encrypted value of a bigram depends upon

two factors: the individual key values and the position of the characters within

the block.
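The key of Table 3.2 can be exercised directly. The sketch below uses the three cipher alphabets exactly as printed in the table; the function names are illustrative, and positions are counted from 0 in the code rather than from 1.

```python
PLAIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ_"
KEYS = [
    "ND_WIEURYTLAKSJQHFGMZPXOBCV",  # position 1
    "LP_MKONJIBHUVGYCFTXDRZSEAWQ",  # position 2
    "GFTYHBVCDRUJNXSEIKM_ZAOLWQP",  # position 3
]


def encrypt(text: str) -> str:
    """Substitute each character with the alphabet selected by its block position."""
    return "".join(KEYS[i % len(KEYS)][PLAIN.index(ch)] for i, ch in enumerate(text))


def decrypt(text: str) -> str:
    """Invert each substitution by looking the character up in the same alphabet."""
    return "".join(PLAIN[KEYS[i % len(KEYS)].index(ch)] for i, ch in enumerate(text))


ciphertext = encrypt("HOW_ARE_YOU")
print(ciphertext)           # RYOVLKIQWJR
print(decrypt(ciphertext))  # HOW_ARE_YOU
```

The letter O appears at block positions 2 and 1 in the plaintext and is enciphered as Y and J respectively, which shows how the block position changes the substitution.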

3.8 VERNAM CIPHER

In modern terminology, a Vernam cipher is a stream cipher in

which the plaintext is XORed with a random or pseudorandom stream of data

of the same length to generate the ciphertext. If the stream of data is truly

random and used only once, this is the one-time pad. Substituting

pseudorandom data generated by a cryptographically secure pseudo-random

number generator is a common and effective construction for a stream cipher.

RC4 is an example of a Vernam cipher that was still widely used in 2004.

3.8.1 Vernam History

Gilbert Sandford Vernam was an AT&T Bell Labs engineer who, in

1917, invented the stream cipher and later co-invented the one-time pad

cipher. Vernam proposed a teletype cipher in which a previously-prepared

key, kept on paper tape, is combined character by character with the plaintext

message to produce the ciphertext. To decipher the ciphertext, the same key

would be again combined character by character, producing the plaintext.


3.8.2 Principle of the Vernam Cipher (One-Time Pad)

One type of substitution cipher, the One-Time Pad (OTP), is quite

special. In its most common implementation, the one-time pad can be called a

substitution cipher only from an unusual perspective; typically, the plaintext

letter is combined (not substituted) in some manner (e.g., XOR) with the key

material character at that position. The one time pad is, in most cases,

impractical as it requires that the key material be as long as the plaintext,

actually random, used once and only once, and kept entirely secret from all

except the sender and intended receiver. When these conditions are violated,

even marginally, the one-time pad is no longer unbreakable. Each character in

the message is combined with one from the (random, secret, and used only

once) pad in the manner of a Vernam cipher. So the pad must be at least the

length of the message. Theoretically there is no way to decipher the message

without knowing the contents of the pad. For this reason it is very important

that the pad be protected (i.e., secret), random (i.e., unpredictable by anyone),

and used only once, lest the cipher be easily compromised.

3.9 THEORETICALLY SECURE CRYPTOSYSTEM

Of all the methods of encryption ever devised, only one has been

theoretically proved to be completely secure. It is called the Vernam cipher or

one-time pad. The worth of all other ciphers is based on computational

security. If a cipher is computationally secure this means the probability of

cracking the encryption key using current computational technology and

algorithms within a reasonable time is supposedly extremely small, yet not

impossible. In theory, every cryptographic algorithm except for the Vernam

cipher can be broken given enough ciphertext and time. For example, the

Public Key (PK) cryptosystems such as Pretty Good Privacy (PGP) and

Rivest Shamir and Adleman (RSA) are based on the following:


Calculate an integer N such that it has only two prime number

factors f1 and f2. This triad of integers forms the basis of the encryption and

decryption keys used in PK cryptosystems. The security of these systems is

simply based on the computational difficulty of calculating f2 and f1 from N

if N is a very large integer. To break this cipher, N must be factored and at the

time these systems were devised the best publicly available factoring

algorithms would take millions of years to factor a 200 digit number. This

does not logically exclude the possibility of a new factoring algorithm being

discovered, or the existence of a secret factoring algorithm, or the invention of

technology capable of running current factoring algorithms at high speed.

Both the original design and the modern version of one-time pads are based

on the binary alphabet. The message, or plaintext, is converted to a sequence

of 0's and 1's, using some publicly known rule. The key is another sequence

of 0's and 1's of the same length. Each bit of the message, or the plaintext, is

then combined with the respective bit of the key, according to the rules of

addition in base 2:

0+0=0

0+1=1

1+0=1

1+1=0

The key is a random sequence of 0's and 1's, and therefore the

resulting cryptogram, the plaintext plus the key is also random and completely

scrambled unless one knows the key. The plaintext can be recovered by

adding (in base 2 again) the cryptogram and the key.

In the example, the sender adds each bit of the plaintext (01011100)

to the corresponding bit of the key (11001010) obtaining the cryptogram

(10010110), which is then transmitted to the receiver (Figure 3.1). The sender


and receiver must have exact copies of the key beforehand. The sender needs

the key to encrypt the plaintext, and the receiver needs the key to recover the

plaintext from the cryptogram. An eavesdropper, who has intercepted the

cryptogram and knows the general method of encryption but not the key, will

not be able to infer anything useful about the original message. It has been

proved that if the key is secret, the same length as the message, truly random,

and never reused, then the one-time pad is unbreakable. (Sergienko 2006).

Figure 3.1 Transmission of data
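The encryption and recovery rules can be written compactly. In the worked form below, the symbol $\oplus$ for addition in base 2 and the names $m_i$, $k_i$, $c_i$ for the i-th plaintext, key and cryptogram bits are notation introduced here; the bit strings are those of the example above.

```latex
\[
c_i = m_i \oplus k_i, \qquad
c_i \oplus k_i = m_i \oplus (k_i \oplus k_i) = m_i \oplus 0 = m_i .
\]
\[
\begin{array}{rl}
\text{plaintext } m: & 0\,1\,0\,1\,1\,1\,0\,0 \\
\text{key } k: & 1\,1\,0\,0\,1\,0\,1\,0 \\
\text{cryptogram } c = m \oplus k: & 1\,0\,0\,1\,0\,1\,1\,0
\end{array}
\]
```

Since every bit satisfies $k_i \oplus k_i = 0$, adding the key a second time cancels it, which is why the receiver recovers the plaintext with the same operation the sender used.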

3.10 KEY MANAGEMENT PROBLEM IN VERNAM

All one-time pads suffer from a serious practical drawback, known

as the key distribution problem. Potential users have to agree secretly and in

advance on the key - a long, random sequence of 0's and 1's. Once they have

done this they can use the key for enciphering and deciphering, and the

resulting cryptograms can be transmitted publicly, for example, broadcast

by radio, posted on the Internet or printed in a newspaper, without compromising

the security of messages. But the key itself must be established between the

sender and the receiver by means of a very secure channel, for example, a


very secure telephone line, a private meeting or hand-delivery by a trusted

courier. Such a secure channel is usually available only at certain times and

under certain circumstances. So users far apart, in order to guarantee perfect

security of subsequent crypto-communication, have to carry around with them

an enormous amount of secret and meaningless information (cryptographic

keys), equal in volume to all the messages they might later wish to send. This

is, to say the least, not very convenient. Furthermore, even if a ‘secure’

channel is available, this security can never be truly guaranteed.

A fundamental problem remains because, in principle, any classical

private channel can be monitored passively, without the sender or receiver

knowing that the eavesdropping has taken place. This is because classical

physics - the theory of ordinary-scale bodies and phenomena such as paper

documents, magnetic tapes and radio signals - allows all physical properties

of an object to be measured without disturbing those properties. Since all

information, including cryptographic keys is encoded in measurable physical

properties of some object or signal, classical theory leaves open the possibility

of passive eavesdropping, because in principle it allows the eavesdropper to

measure physical properties without disturbing them. The fastest method of

encrypting a message with a one-time pad is with a computer. A computer

simplifies the process because the message and pad are encoded in binary.

Each character is represented internally by a computer as a unique

combination of zeros and ones called bits; for example, the letter 'b' is

composed of the eight bits '01100010' (the leading zero is often omitted, giving '1100010'). This binary number is 98 in decimal. To

encrypt the message, each bit of each letter in the plaintext is combined with

the corresponding bit of the matching pad letter, in sequence, using a transformation

called the bitwise exclusive or (abbreviated to XOR). This operation is

performed on each letter in sequence, i.e., the first letter of the plaintext is

XORed with the first letter of the pad to produce the first letter of the


ciphertext, then the second letter of the plaintext is XORed with the second

letter of the pad to produce the second letter of the ciphertext and so on.

A basic example:

Suppose you wish to encrypt the message - begin at 17.30

Using the pad - #/KBZaF>TQV^Nc

First, all the bits in 'b' are XORed with the corresponding bits in '#'. This

produces the binary pattern for the character 'A'.

Table 3.3 shows Bitwise XOR operation.

Table 3.3 Bitwise XOR operation

Bit sequence for [b]    Bit sequence for [#]    Bitwise XOR [A]

1    0    1

1    1    0

0    0    0

0    0    0

0    0    0

1    1    0

0    1    1

The same process is repeated for the next letters:

'e' and '/' are XORed to produce 'J'

'g' and 'K' are XORed to produce ',' etc.


To do this manually necessitates that you have a list of all the

character binary codes, which is why a computer is helpful. The completed

ciphertext looks like [AJ,+4A'Jt`ap}S]. By XORing the ciphertext with their

duplicate pad, the receiver regenerates the plaintext.
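The example can be reproduced with a short byte-wise XOR. The sketch assumes the characters are encoded in ASCII, as the binary values quoted above suggest; `xor_bytes` is an illustrative helper name.

```python
def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR each byte of data with the byte at the same position in the pad."""
    if len(pad) < len(data):
        raise ValueError("the pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


message = b"begin at 17.30"
pad = b"#/KBZaF>TQV^Nc"

ciphertext = xor_bytes(message, pad)
recovered = xor_bytes(ciphertext, pad)  # the same operation decrypts

print(ciphertext[:3])  # b'AJ,'
print(recovered)       # b'begin at 17.30'
```

The length check reflects the requirement that the pad be at least as long as the message; a shorter pad would otherwise be silently truncated by `zip`.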

You can experimentally verify this procedure as follows:

1. Produce a table of the letters of the alphabet and numbers 0 to

9. Assign to each letter and digit a unique bit sequence. There

is no need to use eight bits, six are sufficient for this test. A

sample Table 3.4 is provided.

Table 3.4 Bit sequence for plaintext

Letter  Bit sequence  Letter  Bit sequence  Letter  Bit sequence  Letter  Bit sequence

a 111111 j 110110 s 101101 2 100100

b 111110 k 110101 t 101100 3 100011

c 111101 l 110100 u 101011 4 100010

d 111100 m 110011 v 101010 5 100001

e 111011 n 110010 w 101001 6 100000

f 111010 o 110001 x 101000 7 011111

g 111001 p 110000 y 100111 8 011110

h 111000 q 101111 z 100110 9 011101

2. Using the throws of two dice to index the rows and columns of

the Table 3.5, generate a pad of sufficient length for the

message.


Table 3.5 Random key generation

1 2 3 4 5 6

1 a g m s y 5

2 b h n t z 6

3 c i o u 1 7

4 d j p v 2 8

5 e k q w 3 9

6 f l r x 4 0

3. XOR each bit from each letter of the text with the

corresponding bit of the equivalent pad letter to create the

ciphertext.

4. XOR the ciphertext with the pad. The plaintext will be

regenerated.

5. One final test is to XOR the ciphertext with the plaintext. This

will reconstruct the pad.
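The five steps can also be simulated in Python. The sketch makes two assumptions: the six-bit codes count down from 111111 for 'a' through the symbols a to z, 1 to 9 and 0 (consistent with the entries shown in Table 3.4), and the two dice are replaced by calls to `secrets.randbelow`, one choosing the column and one the row of Table 3.5.

```python
import secrets

SYMBOLS = "abcdefghijklmnopqrstuvwxyz1234567890"

# Assumed code assignment: 'a' = 111111, decreasing by one for each later symbol.
CODE = {ch: 63 - i for i, ch in enumerate(SYMBOLS)}
SYMBOL_OF = {value: ch for ch, value in CODE.items()}


def roll_pad(length: int) -> str:
    """Step 2: two simulated dice pick a column and a row of Table 3.5."""
    pad = []
    for _ in range(length):
        column = secrets.randbelow(6)  # first die, 0..5
        row = secrets.randbelow(6)     # second die, 0..5
        pad.append(SYMBOLS[column * 6 + row])
    return "".join(pad)


message = "beginat1730"
pad = roll_pad(len(message))

# Step 3: XOR the six-bit codes; the results are kept as integers because
# not every six-bit value corresponds to a symbol in the table.
cipher_codes = [CODE[m] ^ CODE[p] for m, p in zip(message, pad)]

# Step 4: ciphertext XOR pad gives back the plaintext.
recovered = "".join(SYMBOL_OF[c ^ CODE[p]] for c, p in zip(cipher_codes, pad))

# Step 5: ciphertext XOR plaintext gives back the pad.
pad_again = "".join(SYMBOL_OF[c ^ CODE[m]] for c, m in zip(cipher_codes, message))

print([format(c, "06b") for c in cipher_codes])
print(recovered, pad_again == pad)
```

Step 5 is the reason a pad must never be reused: anyone who learns one plaintext and its ciphertext recovers the pad exactly.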

3.11 SECURITY OF THE VERNAM CIPHER

The one-time pad is unbreakable if used properly: the pad must be

composed of truly random data, it must never be used more than once, and it

must be kept secure.

If each key letter in the pad sequence is truly random, a

cryptanalyst can do no better than try every possible key letter for every

ciphertext message position. This is a hopeless situation for the attacker

because it is equivalent to trying all the possible messages the key could ever

encrypt. Even for a short pad such as the given example, the number of


possible messages is in the region of 200,000,000,000,000,000,000,000. The

ciphertext can provide no clues as to which one of these possibilities is the

real message.
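The argument that the ciphertext gives no clue can be illustrated with a short Python sketch. It assumes byte-wise XOR as the combining operation; the message and the alternative guesses are hypothetical. For any guess of the right length there is a pad that "decrypts" the ciphertext to that guess, so the ciphertext alone cannot single out the real message.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

message = b"ATTACK"                         # hypothetical real message
pad = secrets.token_bytes(len(message))     # random one-time pad
ciphertext = xor_bytes(message, pad)

# Every candidate plaintext of the same length is explained by some pad.
candidates = [b"ATTACK", b"DEFEND", b"RETIRE"]
explaining_pads = {guess: xor_bytes(ciphertext, guess) for guess in candidates}

for guess, candidate_pad in explaining_pads.items():
    # Decrypting with the candidate pad yields exactly the guessed message.
    assert xor_bytes(ciphertext, candidate_pad) == guess
```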

3.12 TRANSPOSITION CIPHERS

Classical ciphers were first used hundreds of years ago. So far as

security is concerned, they are no match for today’s ciphers; however, this

does not mean that they are any less important to the field of cryptology.

Their importance stems from the fact that most of the ciphers in common use

today utilize the operations of the classical ciphers as their building blocks.

For example, the Data Encryption Standard (DES) (U.S. Department of

Commerce 1988), an encryption algorithm used widely in the finance

community throughout the world, uses only three very simple operators,

namely substitution, permutation (transposition) and bit-wise exclusive-or

(admittedly, in a complicated fashion). Given their simplicity and the fact that

they are used to construct other ciphers, the classical ciphers are usually the

first ones considered when researching new attack techniques. Many flavors

of classical ciphers exist, although most fall into one of two broad categories:

substitution ciphers and transposition (permutation) ciphers (Clark and

Dawson 1998). The transposition cipher rearranges the positions of the

plaintext characters in a different and complex order but "leaves the value of a

character or character string unaltered when transforming plaintext into

Ciphertext" (Grundlingh and Van Vuuren 2003).

3.13 TYPES OF TRANSPOSITION SYSTEMS

Transposition systems are fundamentally different from substitution

systems. In substitution systems, plaintext values are replaced with other

values. In transposition systems, plaintext values are rearranged without


otherwise changing them. All the plaintext characters that were present before

encipherment are still present after encipherment. Only the order of the text

changes (Field manual 1990).

• Most transposition systems rearrange text by single letters. It

is possible to rearrange complete words or groups of letters

rather than single letters, but these approaches are not very

secure and have little practical value. Larger groups than

single letters preserve too much recognizable plaintext.

• Some transposition systems go through a single transposition

process. These are called single transposition. Others go

through two distinctly separate transposition processes. These

are called double transposition.

• Most transposition systems use a geometric process. Plaintext

is written into a geometric figure, most commonly a rectangle

or square and extracted from the geometric figure by a

different path than the way it was entered. When the

geometric figure is a rectangle or square and the plaintext is

entered by rows and extracted by columns, it is called

columnar transposition. When some route other than rows and

columns is used, it is called route transposition.

• Another category of transposition is grille transposition. There

are several types of grilles, but each type uses a mask with cut

out holes that is placed over the worksheet. The mask may in

turn be rotated or turned over to provide different patterns

when placed in different orientations. At each position, the

holes line up with different spaces on the worksheet. After

writing plaintext into the holes, the mask is removed and the


ciphertext extracted by rows or columns. In some variations,

the plaintext may be written in rows or columns and the

ciphertext extracted using the grille. These systems may be difficult to identify when first encountered, but once the process is recognized, the systems are generally solvable.

• Transposition systems are easy to identify. Their frequency

counts will necessarily look just like plaintext, since the same

letters are still present. There should be no repeats longer than

two or three letters, except for the rare longer accidental

repeat. The monographic phi will be within plaintext limits,

but a digraphic phi should be lower, since repeated digraphs

are broken up by transposition. Identifying which type of

transposition is used is much more difficult initially, and you

may have to try different possibilities until you find the

particular method used or take advantage of special situations

which can occur.

• Columnar transposition systems can be exploited when keys

are reused with messages of the same length. The plaintext to

messages with reused keys can often be recovered without

regard to the actual method of encipherment. Once the

plaintext is recovered, the method can be reconstructed.
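The identification clue in the list above (frequency counts that look just like plaintext) can be checked with a small Python sketch. It assumes a random rearrangement of letters as a stand-in for an arbitrary transposition, and it uses the normalised form of the monographic phi statistic, often called the index of coincidence; the sample sentence and the function name are illustrative only.

```python
from collections import Counter
import random

def index_of_coincidence(text: str) -> float:
    """Normalised monographic phi: sum f(f-1) divided by N(N-1)."""
    counts = Counter(text)
    n = len(text)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

plaintext = "TRANSPOSITIONSYSTEMSREARRANGETEXTBYSINGLELETTERS"

# Any transposition is just a reordering of positions; a seeded shuffle
# is used here as a stand-in for a real keyed system.
rng = random.Random(0)
order = list(range(len(plaintext)))
rng.shuffle(order)
ciphertext = "".join(plaintext[i] for i in order)

# Single-letter counts, and therefore the monographic statistic, are unchanged.
assert Counter(plaintext) == Counter(ciphertext)
assert index_of_coincidence(plaintext) == index_of_coincidence(ciphertext)
```

A digraphic version of the same statistic would count adjacent letter pairs instead of single letters, which is where the rearrangement becomes visible.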

3.14 DOUBLE TRANSPOSITION

A single columnar transposition could be attacked by guessing

possible column lengths, writing the message out in its columns (but in the

wrong order, as the key is not yet known) and then looking for possible

anagrams. Thus to make it stronger, a double transposition was often used.

This is simply a columnar transposition applied twice, with two different keys


of different (preferably relatively prime) length. Double transposition was

generally regarded as the most complicated cipher that an agent could operate

reliably under difficult field conditions.
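A double transposition can be sketched in Python as two passes of the same columnar routine. This is a minimal illustration under stated assumptions: the keys `ZEBRA` and `CIPHERS` (lengths 5 and 7, which are relatively prime), the message, and the use of `X` as the null are all hypothetical, and incomplete rows are padded rather than left irregular.

```python
def columnar_encrypt(text: str, key: str) -> str:
    """Write text in rows under the key, then read columns in sorted key order."""
    width = len(key)
    text += "X" * (-len(text) % width)            # pad the last row with nulls
    order = sorted(range(width), key=lambda i: (key[i], i))
    return "".join(text[col::width] for col in order)

def double_transposition(text: str, key1: str, key2: str) -> str:
    """Apply columnar transposition twice, with two different keys."""
    return columnar_encrypt(columnar_encrypt(text, key1), key2)

message = "MEETMEATTHEOLDBRIDGEATNOON"
ciphertext = double_transposition(message, "ZEBRA", "CIPHERS")
print(ciphertext)
```

Padding in both passes means nulls from the first pass are themselves transposed in the second; field systems often avoided nulls for that reason, which this sketch does not attempt to model.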

3.15 COMBINATIONS

Transposition is often combined with other techniques. For

example, a simple substitution cipher combined with a columnar transposition

avoids the weakness of both. Replacing high frequency ciphertext symbols

with high frequency plaintext letters does not reveal chunks of plaintext

because of the transposition. Anagramming the transposition does not work

because of the substitution. The technique is particularly powerful if

combined with fractionation. A disadvantage is that such ciphers are

considerably more laborious and error-prone than simpler ciphers.

3.16 TRANSPOSITION CIPHERS AND BIGRAMS

According to Russell et al (2003a), cryptography has a long and

colorful history. The earliest schemes, now termed the classical ciphers, were

designed to be carried out with pen and paper rather than by electronics.

Many were transposition algorithms which rearrange the order of letters in a

message. Classical cryptography became obsolete after the advent of

computers: more complex ciphers could be used and older ciphers broken

with greater ease. Nonetheless, modern analogues of classical schemes can

still be found as components of larger ciphers. In particular, some iterated

block ciphers such as the Data Encryption Standard (NIST 1999), incorporate

transpositions to provide diffusion. The cryptanalyst's tactic when presented

with a transposition was to exploit particular statistical features of the

ciphertext, as well as to rely upon intuition, luck and trial-and-error, to find

the correct decryption. As this was sometimes too slow a process, mechanized


aids were used as early as World War II (Bauer 1997) by which frequencies

of letter pairs (known as bigrams) were automatically examined in order to

narrow down the space of possible keys. The remaining few keys could then

be checked exhaustively by hand to recover the plaintext.

The possibility of fully automating this procedure is considered. A

straightforward implementation turns out to be incapable of decrypting harder

cryptograms due to random variation in the bigram heuristic. Cryptograms

which are hard for this algorithm are quantified. It will be shown that the

pheromone feedback mechanism of an Ant Colony System is capable of

overcoming some random variation and decrypting a wider variety of

messages. A preliminary version of this result was summarized in (Russell

et al 2003b).

3.17 FORMS OF THE TRANSPOSITION CIPHER

Two forms of the transposition cipher (Helen Fouché Gaines 1939)

are introduced and their cryptanalysis shown to be equivalent. The first is

known as columnar transposition. The Columnar Transposition Cipher

arranges the plaintext in a square matrix from left to right and from top to

bottom. It depends on the key to determine the number of columns for the

letters in the square. Each character in the key becomes a column header

followed by the plaintext message in successive rows beneath those headers.

Spaces are ignored or replaced with a "null" value. Finally, the encrypted

message is written in groups according to columns. The transposition cipher

basically rearranges the content according to a regular pattern. This could be

made more complex by additional shuffling of the positions of the characters.

The standard columnar transposition consists of writing the key out as column headers, then writing the message out in successive rows below these headers (filling in any spare spaces with nulls). Finally, the message is read off in columns, in alphabetical order of the headers. As an example, consider the plaintext "CRYPTANALYSISOFTRANSPOSITION CIPHERSISTOUGH", encrypted using the key (31524):

3 1 5 2 4

C R Y P T

A N A L Y

S I S O F

T R A N S

P O S I T

I O N S I

S T O U G

H X X X X

Ciphertext: RNIROOTX PLONISUX CASTPISH TYFSTIGX YASASNOX
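The grid procedure can be sketched in Python as follows. The input string is the sequence of letters exactly as they appear in the rows of the grid above (without the trailing nulls, which the code adds); the function names and the choice of `X` as the null are illustrative assumptions.

```python
def columnar_encrypt(plaintext: str, key: str, null: str = "X") -> list:
    """Return one ciphertext group per column, in sorted order of the headers."""
    width = len(key)
    padded = plaintext + null * (-len(plaintext) % width)
    order = sorted(range(width), key=lambda i: key[i])
    return [padded[col::width] for col in order]

def columnar_decrypt(groups: list, key: str) -> str:
    """Write each group back under its header and read the grid row by row."""
    width = len(key)
    order = sorted(range(width), key=lambda i: key[i])
    columns = [""] * width
    for group, col in zip(groups, order):
        columns[col] = group
    return "".join("".join(row) for row in zip(*columns))

grid_text = "CRYPTANALYSISOFTRANSPOSITIONSISTOUGH"   # rows of the grid above
groups = columnar_encrypt(grid_text, "31524")
print(" ".join(groups))
print(columnar_decrypt(groups, "31524"))
```

Decryption returns the padded text, so any trailing nulls still have to be recognised and dropped by the reader.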

Decryption is simply a matter of writing the ciphertext back into the grid using the same ordering of the columns. The second form is termed complete-unit transposition. The plaintext is divided into a series of blocks (units) of a fixed length w, again padding if necessary. A permutation of size w is applied to each block in turn, rearranging the letters. The sequence of permuted blocks is then used as the ciphertext. Here is an example with the same key and plaintext as before:

31524 31524 31524 31524 31524 31524 ...

CRYPT ANALY SISOF TRANS POSIT IONCI ...

RPCTY NLAYA IOSFS RNTSA OIPTS OCIIN
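Complete-unit transposition applies the same column ordering to each block separately. A minimal Python sketch, assuming the key is read the same way as in the columnar example (the output takes the letter under header 1 first, then header 2, and so on); the function name is illustrative. Under this rule the second block, ANALY, comes out as NLAYA.

```python
def complete_unit_encrypt(plaintext: str, key: str, null: str = "X") -> list:
    """Permute every block of len(key) letters by the sorted order of the key."""
    width = len(key)
    padded = plaintext + null * (-len(plaintext) % width)
    order = sorted(range(width), key=lambda i: key[i])
    blocks = [padded[i:i + width] for i in range(0, len(padded), width)]
    return ["".join(block[i] for i in order) for block in blocks]

# The six blocks shown in the example above
blocks = complete_unit_encrypt("CRYPTANALYSISOFTRANSPOSITIONCI", "31524")
print(" ".join(blocks))
```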

In a sense this latter form of the transposition is also the most

general, as any transposition can be recast as a complete-unit transposition


with key size set to the length of the plaintext. Both of these forms of the

transposition cipher are susceptible to an attack known as multiple

anagramming. The key size w is assumed to be known (there are statistical

tests for this purpose), and the ciphertext is written into a grid with w

columns. For columnar transpositions, the ciphertext must be written into the grid column by column from left to right; dually, for complete-unit transpositions, the ciphertext is entered row by row from top to bottom. The

columns then have to be rearranged to form readable plaintext in every row.

For example:

R P C T Y
N L A Y A
I O S F S
R N T S A
O I P T S
O S I I N
T U S G O
X X H X X

Rearrange:

C R Y P T
A N A L Y
S I S O F
T R A N S
P O S I T
I O N S I
S T O U G
H X X X X

Finding the correct rearrangement is clearly equivalent to finding the key. Certain patterns inherent in natural language can be exploited in order to do this efficiently.

3.18 HISTORICAL CRYPTANALYSIS OF TRANSPOSITION CIPHERS

According to the National Institute of Standards and Technology (NIST) (1999), one property of written natural language is that the distribution of pairs of letters, known as bigrams, is not uniform. In English, for example, 'TH' is common and 'QZ' is rare. Using some large sample of text, stripped of numbers, punctuation, white space and other non-letters, a standard

probability for each bigram can be obtained. For other texts, the observed

frequencies will tend to be close to these probabilities. Two columns placed

next to each other form several bigrams, one for each row. The bigram

adjacency score, Adj(I, J) is defined as the average probability of the bigrams

created by juxtaposing columns I and J, i.e.

\[ \mathrm{Adj}(I, J) = \frac{1}{h} \sum_{r=1}^{h} P_{\mathrm{std}}(I_r J_r) \qquad (3.1) \]

where $I_r$ and $J_r$ denote the $r$th letter in the column I or J respectively, $P_{\mathrm{std}}(xy)$ is the standard probability of the bigram 'xy', and h is the number of rows in a

column. The score will be higher for two correctly aligned columns, because

the bigrams will be from the plaintext. If they are incorrectly aligned, the

pairs will be much more random and likely to score lower. From the bigram

adjacency score, a pen-and-paper cryptanalyst infers the top candidates for

each column's neighbor. Together with other statistical clues, it is usually

straightforward to reassemble the columns correctly.
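Equation (3.1) can be sketched in Python. The standard probabilities would normally come from a large text sample; the short sample string below is only a toy stand-in, so the printed scores are illustrative and should not be read as real English statistics. The function names are assumptions made for this sketch.

```python
from collections import Counter

def bigram_probabilities(sample: str) -> dict:
    """Estimate P_std from a sample, keeping letters only."""
    letters = [c for c in sample.upper() if c.isalpha()]
    pairs = Counter(a + b for a, b in zip(letters, letters[1:]))
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()}

def adj(col_i: str, col_j: str, p_std: dict) -> float:
    """Average probability of the bigrams formed row by row from columns I and J."""
    h = len(col_i)
    return sum(p_std.get(a + b, 0.0) for a, b in zip(col_i, col_j)) / h

# Toy sample text (an assumption; far too small for real use)
sample = ("the cryptanalysis of a transposition cipher starts from the "
          "observation that some pairs of letters are common and others rare")
p_std = bigram_probabilities(sample)

# Two columns of the running example, scored in both orders
print(adj("CASTPISH", "RNIROOTX", p_std))
print(adj("RNIROOTX", "CASTPISH", p_std))
```

Bigrams that never occur in the sample are given probability zero here; a real implementation would more likely smooth these values.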

A more general method is also known (Bauer 1997) that is less reliant on ad hoc exploitation of particular features of the cryptogram. This method has been partially automated, and will now be considered in more detail. Some preliminaries are needed. By I || J it is meant that when columns I and J are adjacent in that order they form bigrams from the plaintext. If this is not the case, then I ≠ J can be written. The multiple anagramming problem can be represented as a graph, as has been done in Figure 3.2. The graph is called the anagramming graph of the problem. Each node denotes a column. A directed arc from column I to column J indicates that I || J has not been ruled out. Since the transposition key is a permutation, a candidate key can be represented as some path through the column nodes which does not pass through the same column twice. Normally, this would specify w! possible keys, where w is the width of the grid; even for small w this precludes an exhaustive search. In the historical attack, arcs on the anagramming graph are pruned to restrict the number of paths. The number of keys is hopefully reduced to a point where each can be checked by hand to see which produces a comprehensible plaintext. To prune the arcs, a cutoff value α is chosen; an arc is included from node I to J if and only if Adj(I, J) > α.
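The pruning rule can be sketched in Python. The adjacency scores and the cutoff below are made-up numbers for four hypothetical columns A to D, chosen only to show the mechanics; they are not taken from the running example. The brute-force enumeration of orderings is for illustration at small w and is not the historical procedure itself.

```python
from itertools import permutations

def anagramming_graph(scores: dict, alpha: float) -> set:
    """Keep the arc I -> J only when Adj(I, J) exceeds the cutoff alpha."""
    return {(i, j) for (i, j), score in scores.items() if score > alpha}

def candidate_keys(columns: list, arcs: set) -> list:
    """Column orders in which every consecutive pair is an arc of the pruned graph."""
    return [order for order in permutations(columns)
            if all((a, b) in arcs for a, b in zip(order, order[1:]))]

columns = ["A", "B", "C", "D"]

# Hypothetical Adj values: low everywhere except for a few plausible neighbours
scores = {(i, j): 0.001 for i in columns for j in columns if i != j}
scores.update({("A", "B"): 0.012, ("B", "C"): 0.010,
               ("C", "D"): 0.009, ("B", "D"): 0.006})

arcs = anagramming_graph(scores, alpha=0.005)
keys = candidate_keys(columns, arcs)
print(len(arcs), keys)
```

With these numbers the 24 possible column orders shrink to a single surviving path, which is the effect the cutoff is meant to have; a cutoff set too high would instead remove a correct arc and leave no valid key.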

Figure 3.2 The anagramming graph produced by the cryptogram used

as the running example

3.19 SIMPLE TRANSPOSITION (ROW TRANSPOSITION)

A simple transposition or permutation cipher works by breaking a

message into fixed size blocks and then permuting the characters within each

block according to a fixed permutation, say P. The key to the transposition

cipher is simply the permutation P. So, the transposition cipher has the


property that the encrypted message contains all the characters that were in

the plaintext message. In other words, the unigram statistics for the message

are unchanged by the encryption process. The size of the permutation is known as the period. Consider an example of a transposition cipher with a period of 10 and a key P = {7,10,4,2,8,1,5,9,6,3}. In this case, the message is broken into blocks of ten characters, and after encryption the seventh character in the block will be moved to position 1, the tenth to position 2, the fourth to position 3, the second to position 4, the eighth to position 5, the first to position 6, the fifth to position 7, the ninth to position 8, the sixth to position 9 and the third to position 10.

Table 3.6 shows the key and the encryption process of the transposition cipher described above. Note that a string of "X" characters was appended to the end of the message so that its length is a multiple of the block size.

Table 3.6 Example of the transposition cipher key and encryption process

KEY:

Plaintext:   1  2  3  4  5  6  7  8  9  10
Ciphertext:  7  10 4  2  8  1  5  9  6  3

ENCRYPTION:

Position:    1 to 10      1 to 10      1 to 10
Plaintext:   TRANSPOSIT   ION_ALGORI   THMXXXXXXX
Ciphertext:  OTNRSTSIPA   GI_OOIARLN   XXXHXTXXXM

Decryption can be achieved by following the same process as encryption using the "inverse" of the encryption permutation. In this case the decryption key, $P^{-1}$, is equal to {6,4,10,3,7,9,1,5,8,2}.
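The example of Table 3.6 can be reproduced with a short Python sketch. The function names are illustrative; the underscore stands for the space in the message, as in the table, and `X` is the padding character.

```python
P = [7, 10, 4, 2, 8, 1, 5, 9, 6, 3]

def invert(perm: list) -> list:
    """Return the inverse permutation (1-based, like the key in the text)."""
    inverse = [0] * len(perm)
    for out_pos, in_pos in enumerate(perm, start=1):
        inverse[in_pos - 1] = out_pos
    return inverse

def permute_blocks(text: str, perm: list, pad: str = "X") -> str:
    """Output position j of each block takes the character at input position perm[j]."""
    period = len(perm)
    text += pad * (-len(text) % period)        # pad to a multiple of the period
    return "".join(text[start + p - 1]
                   for start in range(0, len(text), period)
                   for p in perm)

ciphertext = permute_blocks("TRANSPOSITION_ALGORITHM", P)
print(ciphertext)
print(invert(P))
print(permute_blocks(ciphertext, invert(P)))
```

Using one routine for both directions reflects the point made above: decryption is the same process run with the inverse permutation.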
