part five: security issues in distributed systems

PART FIVE:Security issues in distributed systems

2

Suggested reading

Crittografia, P. Ferragina e F. Luccio, Ed. Bollati Boringhieri, € 16.

3

Roadmap

Introduction Computer security vs network

security Attacks to security

Criptography Private key (symmetric)

cryptography Public key (asymmetric)

cryptography

4

Computer security vs network (i.e., distributed systems) security

Computer security: aims at protecting information within a computer

Network security: aims at protecting the exchange of information among computers

5

Basic problems in network security

Networks are insecure because most of the communications occur in clear (e.g., e-mail, http, ftp,...)

No physical point-to-point connections but through shared lines through third-party routers

Often there is no router authentication, but only (and not always) of users

New security issues: e-commerce, e-banking, e-government, etc.

6

Attacks to information security

Information security: main issues

Confidentiality: prevent the data sent from one person A to person B to be understood by a third party C.

Authentication: verify the identity of who sends or receives data.

Integrity: be sure that the data received are identical to those sent.

Non-repudiation: prevent users who send data may in future negate to have sent them (digital signature).

8

Ensuring security: cryptography

To address the above security issues, one has to make use of cryptographic techniques, i.e., of communication protocols that overcome the influence of adversaries

Cryptography (from the greek kryptos, hidden, and graphein, writing) is the discipline that in the old days dealt with the study of the “secret scriptures”: today is a “set of techniques that permit the construction of an encrypted text and the decryption of a cryptogram.” (Garzanti, 1972)

Cryptography: a brief history Cryptography was used in the ancient antiquity to hide the content

of text messages (in modern terms, to ensure confidentiality) Cryptography experienced a tremendous development during the

Second World War, when British mathematician Alan Turing formalized the theory needed to decrypt the Nazi German cryptosystem known as Enigma

In 1949, Claude Shannon published a paper which gave start to what is now called Theory of Information, that along with Probability, Computational Complexity, and Number Theory forms the basis of Modern Cryptography

Modern cryptography is concerned with all the aspects (theoretical, computational, implementative) related to information security

Cryptosystem Definition: A cryptosystem is a quintuple

(P,C,K,Cod,Dec), where: P: finite set of plaintexts C: finite set of ciphertexts K: set of possible encryption keys Cod: P x K → C: encryption function (injective and invertible) Dec: C x K → P: decryption function

Kerckhoffs’ principle: a cryptosystem should be secure even if everything about the system, except the key, is public knowledge: thus, all its strength is based on the inviolability of the key

It can be rephrased in the Shannon maxim: the enemy knows the system!

Symmetric vs asymmetric cryptosystems If Cod and Dec use the same key to encrypt

and to decrypt a given text, we talk about symmetric cryptosystems, otherwise of asymmetric cryptosystems

Symmetric cryptosystem: since the same key is used to encrypt and to decrypt data, sender A and receiver B must share such a key

since this key must be kept secret, the main problem is the key exchange!

Symmetric key scenario

The problem of transmitting the key Q: If you want to use a symmetric

cipher to protect the dataflow between two parties, how to exchange the secret key?

A: You must use a secure channel of communication!

A first example of a symmetric key cipher: The Caesar cipher

Let us consider the Italian alphabet, and let us construct a cipher that replaces each letter of the alphabet by the letter which is 3 (this is the key!) positions forward:

For example, the clear text “distributed algorithm" is encrypted in the cryptogram “gnvzuneazhg dolrunzmpv”.

However, as most of the simple ciphers based on transpositions and/or shifting, it can be easily attacked by means of statistical analysis

Statistical cryptanalysis The plaintext is obtained by means of the use of

statistical techniques on the frequency of characters or substrings of the ciphertext, as compared to the corresponding frequencies of the associated language

Perfect cryptosystems A cryptosystem is called perfect if plaintext

and ciphertext are statistically independent

More formally, if we denote by P(m) the probability that a message from A to B contains m, and by P(m|c) the probability that a message is equal to m after observing an encrypted message c, then a cipher is perfect iff for any mM and for any cC it holds P(m|c)=P(m).

A necessary condition to be perfectTheorem (Shannon): A cryptosystem is perfect only if |K|≥|M|.Proof: For the sake of contradiction, assume the existence of a perfect cryptosystem with |K|<|M|. By the injectivity of the Cod function, |M|≤|C|, i.e., |K|<|C|. Let m be a message s.t. P(m)=p≠0. Then, m can generate at most |K|<|C| ciphered messages (one for each key). It follows that there exists at least a ciphered message c* which is not image of m, namely

P(m|c*)=0≠p=P(m)against the assumption of perfectness. □

Two very unperfect ciphers Assume that P(m)=p, 0<p<1, and that P(m|

c)=0≠p; then, a cryptoanalyst that sees c, is able to infer that the encrypted message is not m!

Assume now that P(m)=p, 0<p<1, and that P(m|c)=1≠p; then, a cryptoanalyst that sees c, is able to infer that the encrypted message is exactly m!

In all the intermediate cases in which P(m|c)≠p, a cryptoanalyst is able to infer something by observing the ciphered messages!

A perfect cryptosystem One-time pad (Vernam G., AT&T,

1917):1. Builds a large random (and not

pseudorandom) key, for example using a detector of cosmic rays.

2. The ciphertext is constructed by a bitwise XOR between the plaintext m and the key k (recall 10=01=0; 11=00=1)

3. A sends c=mk

4. B decrypts m=ck (indeed xyy=x)5. The key should never be reused (one-time

pad).

One-time pad is perfect!We have to show that P(m|c)=P(m). Let m and c be of n bits; by definition of conditional probability, we have:

P(m|c)=P(m∩c)/P(c)where P(m∩c) is the probability that A generates m and ciphers it as c; then

P(m∩c)=P(m∩c:=mk)and since k is truly random, and the XOR transforms a 0 of m in a 0 of c with probability 1/2, and 0 in 1 with probability 1/2, and a 1 in 0 with probability 1/2, and finally a 1 in 1 with probability 1/2, it follows that any bit of c is statistically independent of the corresponding bit of m. Thus, m and c are independent, and so

P(m∩c:=mk) = P(m)∙P(c)

from which it follows that P(m|c)=P(c).□

From perfection to reality ... One-time pad is only theoretically perfect: how A and B do actually exchange key k?!? If they exchange it a priori (by not using a traditional communication channel), then its length will bound the length of the message to be encrypted! (notice however that one-time pad was used for the Moscow–Washington red line)

Instead of being perfect (i.e., provably secure but practically unusable), all used cryptosystems are computationally secure: the cryptanalytic problem (namely, the decryption of a ciphertext without knowing the key) is computationally intractable

The state-of-the-art in symmetric key encryption: Rijndael

Developed by Joan Daemen and Vincent Rijmen.

This algorithm has won the selection for Advanced Encryption Standard (AES) in 2000. Officially, the Rijndael method has become the standard for symmetric key encryption of the XXI century.

The cipher uses a set of keys of variable length (128, 192, 256 bits), and a network of “shuffling of the message," in which multiple operations of transposition, substitution, and xoring of blocks of fixed length are performed. Keys are exchanged by encrypting them with RSA cryptography (to be seen next).

Limits of symmetric key ciphers Does a secure channel of communication to

exchange the secret key actually exist in reality? And if it does exist, why using encryption??

In addition, for secure communication between n users, one must exchange a quadratic number of keys!

Finally, the symmetric method does not distinguish between sender and receiver, and so it cannot address security issues like the authentication and the non-repudiability of a message

Asymmetric key (a.k.a. public key) algorithms Each subject S has a pair of keys:

A public key Kpu(S), known to all; A private key Kpr(S), known only by himself.

The requirements that a public key algorithm must enjoy are: Data encoded with one key can be decrypted

only with the other one; The private key should never be transmitted in

the network; It must be very difficult to derive a key from

the other one (in particular, the private key from the public key).

The various public key scenarios

First scenario: A encodes a message with the public key associated with B, which then decodes the message by using its own private key; in this way, confidentiality and integrity are guaranteed (B only can read the message)


Second scenario: A “signs” a message by encoding it with its own private key, and then sends it to B, which then authenticates the message by using the public key associated with A; in this way, authenticity and non-repudability are guaranteed (all can read the message, but A only can have signed it)


Third scenario: A “signs” a message by encoding it with its own private key, then re-encodes it with the public key associated with B; hence, it sends it to B, which decodes it by using its own private key, and then authenticates it by using the public key associated with A; in this way, confidentiality, integrity, authenticity, and non-repudability are guaranteed.

The birth of PKI systems• Where do I find the public keys of my

recipients?

• Creation of archives of public keys, the public key servers.

• But who guarantees the correspondence of public keys with the respective owners?

Birth of the Certification Authority (CA).

• At this point, who guarantees the validity of a certificate authority?

Act of faith!

The mathematics of public key systems It was introduced by Diffie and Hellman in 1976:Definition: A function f is called one-way if for

every x the computation of y=f(x) is simple (i.e., it is in P), while the calculation of x=f-1(y) is computationally hard (i.e., it is NP-hard).

Definition: A one-way function is called trapdoor if the calculation x=f-1(y) can be made easy once that additional information (private) are known.

... But unfortunately for them, they were not able

to build a one-way trapdoor function!

The RSA algorithm Designed in 1977 by Ron Rivest, Adi Shamir and

Leonard Adlemann, the cipher is patented, and has become public knowledge until 2000.

Basic idea: given two prime numbers p and q (very large), it is easy to calculate the product n = p∙q, while it is very difficult to compute the factorization of n (although this problem is not known to be NP-hard).

The best factorization algorithms currently available (Quadratic Sieve, Elliptic Curve Method, Pollard’s Heuristic, etc.). all have an exponential complexity, in the order of:

The RSA algorithm To ensure security, it is necessary that p and q are

at least 200 decimal digits. Indeed, in this way n=p∙q is 400 digits long, namely is in the order of 10400, and so:

≈ e79 ≈ 1034

which is computationally intractable. keys are typically 1024 bits long (21024 ≈ 10300)

RSA is much slower than symmetric key algorithms, and it is often applied for the transmission of small amount of data, like the private key in a symmetric key system (as noticed before).

RSA at work: key generation

1. Choose two large primes p and q and computes n=p∙q. 2. Compute the Euler totient function w.r.t. n, i.e., the

cardinality of all numbers less than n and prime with it: ϕ(n)=ϕ(pq)=pq-[(q-1)+(p-1)]-1=pq-(p+q)+1=

=(p-1)·(q-1)=ϕ(p)·ϕ(q) (since there are q-1 multiples of p less than n, and p-1

multiples of q less than n) 3. Choose a number 0<e<ϕ(n) s.t. GCD(e,ϕ(n))=1 (i.e., e,ϕ(n)

are coprime)4. Define the public key as (e,n). 5. Compute d such that e·d1 mod ϕ(n). 6. Define the private key as (d,n).

Recall: xy mod z the remainder of the integer division between x and z, and between y and z is the same, namely x mod z = y mod z (or, equivalently, there exists an integer k s.t. x=y+kz)

RSA at work

1. The encryption function of A is Cod(x):=xe mod n (with x<n), where (e,n) is the public key of B.

2. The decryption function of B is:Dec(Cod(x)):=Cod(x)d mod n = (xe mod n)d mod n

where (d,n) is the private key of B.

A sends a crypted message x to B

A sends a signed (i.e., non-repudable) message x to B1. The encryption function of A is Cod(x)=xd mod n

(with x<n), where (d,n) is the private key of A.2. The decryption function of B is:

Dec(Cod(x)):=Cod(x)e mod n = (xd mod n)e mod nwhere (e,n) is the public key of A.

Notice that public and private keys can be used interchangably, since Dec(Cod(x))=Cod(Dec(x)).

Correctness of RSA: some theorems of modular algebra

Theorem (modular equations): Equation axb mod n has a solution iff GCD(a,n) divides b. In such a case, there are exactly GCD(a,n) distinct solutions.

Corollary (existence of the inverse): If a and n are coprime, then ax1 mod n has eactly one positive solution less than n, known as the inverse of a modulo n.

Euler’s Theorem: For any n>1, and for any a prime with n, we have that aϕ(n)1 mod n.

Correctness of RSA Notice that e and ϕ(n) are coprime, and so from the

corollary on the existence of the inverse, there exists a unique d less than ϕ(n) s.t. e∙d1 mod ϕ(n).

Here is the strength of RSA: to compute d from e one must know ϕ(n), i.e., p and q, and so one must be able to factorize efficiently!

Secretation: we must prove that x<n, Dec(Cod(x))=x. ButDec(Cod(x))=(xe mod n)d mod n=xed mod n,

and so we have to show that x=xed mod n. Prove it!

We distinguish two cases:1. p and q do not divide x (and so GCD(p,x)=GCD(q,x)=1,

since they are prime);2. p (or q) divides x, but q (or p) does not divide x.

(notice that p and q cannot both divide x, since otherwise we should have x≥n, against the assumptions)

Case 1: We have GCD(x,n)=1, and so from Eulero’s theorem, we have xϕ(n)1 mod n; since ed1 mod ϕ(n), we have ed=1+kϕ(n), for some positive integer k. So, since x<n, we have:

xed mod n = x1+kϕ(n) mod n = x·(xϕ(n))k mod n = x·1k mod n = x.

Case 2: Since p divides x, for any positive integer k we have xxk0 mod p, namely (xk-x)0 mod p. Since instead q does not divide x, similarly to Case 1, we have also xedx mod q, and so (xed-x)0 mod q. It follows that (xed-x) is divided by both p and q, and then by their product n, from which it follows

(xed-x)0 mod n xedx mod n xed mod n = x mod n = x. □

Authentication: it follows from the RSA property:Dec(Cod(x))=Cod(Dec(x)).

Correctness of RSA (2)

RSA at work: an example (1 of 2) Assume that A wishes to send a secret message to B; then, by

the RSA protocol, B needs to provide its publik key to A. B needs to generate its keys; then, it selects two large primes,

for instance p=3 e q=11 (ehmm, not very large, actually!) Then, n=33 e ϕ(n)=2·10=20. Then, B takes e=3, since 3 is coprime with 20 (3,33) is the

public key of B Then, B searches d s.t. 3d1 mod 20. Hence, from

3d=1+k·20, by setting k=1, we have d=7 (7,33) is the private key of B

Now, to encrypt a message, A divides it in blocks of bits whose maximum value is less than n=33; then, a block P becomes:

C:=Cod(P)=P3 mod 33 A sends C to B; to decode it, B computes P=C7 mod 33 In our example, since n=33, a block contains at most 5 bits

(25<33); however, in the practice, n is in the order of 21024, and so blocks have a size of 1024 bits, i.e., 128 ASCII characters (8 bits each).

RSA at work: an example (2 of 2)

To visualize the example, let us suppose that the 26 letters of the English alphabet are represented by using 5 bits, and so, since n=33, each block is made up by a single character:

Computational Complexity of RSA It can be shown that the keys (and thus p,q,e,d) can

be generated in polynomial time w.r.t. to their binary representation (namely, logarithmic in their value).

In particular, e is usually chosen by taking a quite small prime number (e.g., e=3).

Instead, d is obtained by an extension (polynomial) of the Euclidean algorithm for computing the GCD (based on the fact that GCD(a,b)=GCD(b,a mod b)).

However, to find large prime numbers (i.e., p and q), probabilistic primality testing algorithms are used, since deterministic algorithms are too slow (although polynomial, but in the order of a degree of 10).

Finally, note that the processes of encryption and decryption can be performed efficiently by successive exponentiation (so-called modular exponentiation).

Searching for p and q Recall: remember the randomized algorithm for computing a

MIS: its answer was deterministically correct, while its computational complexity was given in probabilistic form. This is known as a randomized Las Vegas algorithm.

There exists another fundamental type of randomized algorithm, known as randomized Monte Carlo algorithm., in which the answer is probably correct, while the time complexity is deterministically bounded.

Definition (Monte Carlo algorithm): A Monte Carlo "no-biased" algorithm is a randomized algorithm for solving a given decision problem, such that the answer "no" is always correct, while the answer "yes" may be incorrect with a fixed probability ε. Monte Carlo "yes-biased" algorithms are similarly defined.

The Miller&Rabin algorithm is a Monte Carlo "no-biased" algorithm to test the primality of a number n. Its time complexity is O(log3 n), and its probability of inaccuracy is ε≈1/4 (i.e., YES answer is correct with probability ≈3/4).

Miller&Rabin algorithm It is based on the following property: given an odd integer n

(for which we want to test primality), we write it as n=2sr+1, with r odd (thus s is the multiplicity of factor 2 in the decomposition of the even number n-1). Now, given 2 ≤ t ≤ n-2, we define the following 2 predicates:(P1): GCD(n,t)=1;(P2): (tr mod n=1) OR (it exists 0 ≤ j ≤ s-1 s.t. t2

jr mod n=-

1). Theorem: If n is prime it satisfies both predicates, while if n

is composite, then the number of integers between 1 and n-1 that satisfy both predicates is less than n/4.

We run MR(n) a number of k times, testing each time the two predicates on a random integer 2 ≤ t ≤ n-2. If the algorithm answers "no“, even only once, the number is definitely composite, but if it always answers "yes", then the probability that the number is composite is 4-k, and therefore the probability that the number is prime is:

P(prime)=1-P(composite)=1-4-k

(e.g., if k=100, then P(prime)≈1-10-60 ≈ 1)

Miller&Rabin algorithm

Miller-Rabin(n)1. Set n-1=2sr with r odd2. For i=1 to k do

2.1 choose randomly an integer t s.t. 2≤t≤n-22.2 if GCD(n,t)>1 return composite %condition

(P1) is false2.3 compute y=tr mod n2.4 if y≠1 do %first condition of (P2) is false

2.4.1 j=02.4.2 while ((j≤s-1) and (y≠n-1))

y:=t2jr mod n

j++2.4.3 if y≠n-1 return composite %second condition of (P2) is false as well

3. Return prime (w.h.p. 1-4-k)

Is it easy to find prime numbers? Despite the efficiency of the primality test, it is still unknown

if the primes are well distributed and therefore easy to find at random (Riemann hypothesis!). However, we know that their density is quite high, as stated by the following result:

Gauss Theorem (prime numbers): Let π(n) be the distribution function of prime numbers, i.e., the number of primes less than n. Then the following is satisfied:

So, if you search for a prime number of 100 digits, and they would be uniformly distributed, you should check "only" ln (10100) ≈ 230 consecutive numbers.

part five: security issues in distributed systems

Documents

computer network security

usersnew security issues

theory of information

exchange of information

set of techniques

network securitynetworks

quintuple p

finite set of ciphertextsk