Information Theoretic Perspectives to Cloud Communications
Oscar N. Garcia, Professor, College of Engineering, UNT
SoMIC, April 3, 2013



Conventional wisdom seems to agree on the following issues:

1) No system can be guaranteed to be impenetrable to malware.

2) The Cloud will compete with, and win over, sales of anything that does not require direct personal services.

3) One of the obstacles to better security is the lack of a theory of encryption and secure software.

4) Decryption in the Cloud presents dangers even when using an honest provider.

Kinds of darknet sensing servers

Depending on the level of response to the attack:

1) No response to an attack ("black hole" sensor)
2) Low-level or minimal behavior as if having been attacked (SYN-ACK)
3) Full behavior as victim of the attack (interactive honeypot)

All examples below provide dynamic (black-box) analysis:

• Norman Corporation Sandbox (Norway): a Windows clone used without an Internet connection for the malware to propagate in

• CWSandbox, U. of Mannheim (Germany): Windows XP on a virtual machine, but allows Internet connectivity

• Anubis, U. of Vienna: uses a PC emulator and allows Internet connectivity

NICT’s approach via darknets

nicter makes a real-time estimate of the types of malware present in the Internet through sensors on "darknet" IPs by:

• Multipoint passive monitoring, since most if not all of what reaches the sensor servers is malware because the IPs are not in use
• Analyzing the collected traffic information for commonality (macro analysis)
• Analyzing each specific attack for malware feature detection and for how it causes damage (micro analysis)
• Using specimens from "honeypots" or darknets, via dynamic analysis (black-box sacrifices) or static analysis (manual white-box analysis by disassembly)
• Correlating the features (data mining, as in KDD 99)

We are now involved in forming clusters via classification techniques to eventually find common countermeasures for classes of attacks, rather than considering each kind individually. KDD 99 had a database with 41 features and thousands of incidents characterized by those features.

Wrappers and their generation

• My definition of a wrapper (a loaded word): code that "wraps" a method, delegating a task to it without doing much computation before the delegation, often just passing complex data types.

• Wrappers have been automatically generated (for example by SWIG, the Simplified Wrapper and Interface Generator, an open-source tool) to connect the wrapped program to another language, often of the scripting type.

• Wrappers can be used as security mediators (say, the SAW project at Stanford), particularly now that they can be automatically generated (Fujitsu).
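The delegation pattern described above can be sketched in a few lines. This is a minimal illustration, not any particular project's code; the class and method names are invented for the example, with the wrapper acting as a security mediator before delegating:

```python
class Storage:
    """The wrapped object: does the real work."""
    def read(self, key):
        return f"data for {key}"

class SecurityWrapper:
    """Delegates to Storage, mediating access first."""
    def __init__(self, wrapped, allowed_keys):
        self._wrapped = wrapped
        self._allowed = set(allowed_keys)

    def read(self, key):
        if key not in self._allowed:
            raise PermissionError(f"access to {key!r} denied")
        return self._wrapped.read(key)   # delegation: just pass the call through

store = SecurityWrapper(Storage(), allowed_keys=["public"])
print(store.read("public"))   # delegated call succeeds
```

The wrapper adds only a policy check and otherwise forwards the call unchanged, which is exactly why such wrappers lend themselves to automatic generation.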

A First Order Model of an Attack

A first-order sequence of stages:

1. Probing and scanning. Objective? Targeted or opportunistic.
2. Identification or location of valuable or unprotected accounts or DBs.
3. Discovering passwords and brute-force decryption.
4. Malware actions and timing: information gathering or destruction.
5. Hiding and planning future or slow re-entry.
6. Potential for denial of service.

Classification of attacks: by purpose, origin, target, evaluation of threat, etc.

The original KDD 99 listed 41 features; these were reduced to 22 in "Cyber Attack Classification Based on Parallel Support Vector Machine" by Mital Patel (MTech Scholar) and Yogdhar Pandey (Assistant Professor, CSE Dept.), SIRT, Bhopal.

One more item of consensus: theoretical contributors in the field lament the lack of an integrative theory of how to achieve security by encryption, or for that matter by any other approach. There are theoretical approaches for different encryption methods, but not one that encompasses them all.

We have been considering an information theory approach, with some basic views about the compression, encryption, and error correction sequence, and have oriented a course to the common aspects of these topics.

[email protected] 12

Two different approaches to information: information theoretic and algorithmic

How does one measure information in each case: in non-redundant bits or in complexity?

Why is measuring information an important endeavor? A “Gedanken” tool to measure in the information theoretic approaches: the Infometer.

Claude E. Shannon and Andrey N. Kolmogorov


The information model is the same in both approaches

SOURCE OF INFORMATION → CHANNEL OF INFORMATION (with or without memory; noisy or noiseless) → RECEIVER OR SINK OF INFORMATION

Kolmogorov used the minimum size of a description of the source as the measure of information. Shannon used the probabilities of the symbols transmitted through the channel to define the measure of information.


Kolmogorov relative complexity leads to a measure of information

Given a binary string s and a computational machine M, the complexity of s relative to M is given by the smallest number of bits needed to write a program p that exactly reproduces the sequence using the computer M. Call L(p) the length of the sequence of binary symbols needed to express a program that, run on M, produces s as output. The smallest possible L(p) for a given s, over all programs that output s, is the Kolmogorov measure of the information in s relative to M, represented as:

K_M(s) = min L(p) + C_M

where C_M is the number of bits that it takes to describe M, a quantity that is independent of s.
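K_M(s) is uncomputable in general, but any compressor gives a computable upper bound: the compressed length is the length of one particular "program" (the compressed data plus a fixed decompressor) that reproduces s. A small sketch using Python's zlib; the choice of compressor and the helper names are illustrative, not from the slides:

```python
import hashlib
import zlib

def compressed_length(s: bytes) -> int:
    # The zlib output length upper-bounds K(s), up to an additive
    # constant for the decompressor itself.
    return len(zlib.compress(s, 9))

def pseudorandom_bytes(n: int, seed: bytes = b"seed") -> bytes:
    # Deterministic high-entropy bytes built by chaining SHA-256.
    out, block = b"", seed
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:n]

regular = b"ab" * 500                # 1000 bytes, highly regular
random_ish = pseudorandom_bytes(1000)

print(compressed_length(regular))    # tiny: a short "program" suffices
print(compressed_length(random_ish)) # near 1000: nearly incompressible
```

The regular string compresses to a handful of bytes while the noise-like string barely compresses at all, mirroring the fact that random sequences have near-maximal Kolmogorov complexity.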


Shannon's entropic definition

Shannon's view of the degree of randomness, which corresponds to the amount of information being emitted by a source with more or less equiprobable outputs, is measured in bits and called the entropy H(X) over the set X of all possible messages that could be emitted. It is given by the negative of the sum, over the messages x ∈ X, of the probability of each emitted message times the logarithm of that probability:

H(X) = −Σ_{x∈X} p(x) log₂ p(x)
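The definition translates directly into code. A small sketch (the function name is mine) that computes the empirical entropy of a string from its symbol frequencies:

```python
import math
from collections import Counter

def entropy(message: str) -> float:
    """Empirical Shannon entropy of a string, in bits per symbol."""
    counts = Counter(message)
    n = len(message)
    # H = -sum over symbols of p * log2(p)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy("abab"))   # 1.0: two equiprobable symbols, one bit each
print(entropy("aaab"))   # about 0.811: a biased source carries less information
```

Applied to a long English text, this converges toward the roughly 1.5 bits/character figure used in the unicity-distance example below.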


THE CLOUD SYSTEM(Cryptocoding)


Unicity value (Shannon)

If the plaintext is not randomized, Eve with a supercomputer could estimate the key given a sufficiently large number n_u of ciphertext digits (a ciphertext-only attack). What is that number? It is called the unicity distance and is given by

n_u = H(K) / (redundancy of the plaintext, in bits/character)
    = H(K) / (max possible H(P) − actual H(P))

Then: change the key before we get past n_u samples? Better, figure out how much brute-force cryptanalysis the cryptosystem can tolerate. Another way to look at it: the unicity distance is the minimum amount of ciphertext that could yield a unique key through a brute-force attack by an adversary with infinite computational resources. The denominator can also be stated as the difference between the maximum possible information of the plaintext and the actual information of the transmitted message.


Example

Using the 26 letters and the space, the maximum entropy, if all characters were equally likely, would be log₂(27) ≈ 4.75 bits/character, while the actual average Shannon entropy of a character in a sufficiently long English plaintext is about 1.5 bits/character. The redundancy of the plaintext P is (max possible − actual) ≈ 4.7 − 1.5 = 3.2 bits/character. Assuming length(K) = 128 bits, then H(K) = 128 for equiprobable keys (rather than standard language words), and the unicity distance is

n_u = 128 bits / 3.2 bits per character = 40 characters

which is a scary number if an attacker had the computational capacity to try all 2^128 possible keys (a brute-force attack) on the 40 characters of ciphertext to extract some meaningful message and thereby figure out the key. Eve has a bid out for a teraflop computer.
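The arithmetic in this example is a one-liner; the function below (its name is mine) recomputes it. Using the exact log₂(27) rather than the slide's rounded 4.7 gives about 39 characters instead of 40:

```python
import math

def unicity_distance(key_entropy_bits, alphabet_size, actual_bits_per_char):
    """n_u = H(K) / D, with D the plaintext redundancy in bits/character."""
    redundancy = math.log2(alphabet_size) - actual_bits_per_char
    return key_entropy_bits / redundancy

# 26 letters plus space, ~1.5 bits/char of real English, a 128-bit key:
nu = unicity_distance(128, 27, 1.5)
print(round(nu, 1))   # 39.3 characters of ciphertext
```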


Classification of Compression Methods

Most compression methods fall into one of four categories:

i. Run-length encoding (RLE)
ii. Statistical methods, mostly entropy based (Huffman, arithmetic codes, etc.)
iii. Dictionary-based or LZ (Lempel-Ziv) methods
iv. Transform methods (Fourier, wavelets, DCT, etc.)

Removing redundancy leaves what remains carrying more information per bit, but the complete message keeps the same total entropy.


LOSSLESS COMPRESSION PROGRAMS and their use

• Lempel-Ziv (LZ): based on dynamic tables.
  – The tables themselves can be encoded with Huffman coding (SHRI, LZX); Microsoft uses this in its CAB format.
  – DEFLATE (optimized): used in PKZIP, gzip, and PNG.
  – LZ + Welch = LZW: used in GIF.
  – LZ + Renau = LZR: used in Zip.
• Arithmetic codes for compression: coding based on probabilities and statistics (Rissanen's idea); a generalization of Huffman.
• Methods that use partial matches, and grammar-based methods: SEQUITUR, RE-PAIR.

Run-length encoding (lossless) is used in simple colored icons but not on continuous-tone images, although JPEG uses it internally. Common formats for run-length data include Truevision TGA, PackBits, PCX, ILBM, and fax.
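Run-length encoding is simple enough to sketch completely. A minimal encoder/decoder pair (the function names are mine) over a row of icon pixels:

```python
def rle_encode(s):
    """Collapse a string into a list of (symbol, run_length) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((ch, 1))               # start a new run
    return runs

def rle_decode(runs):
    return "".join(ch * count for ch, count in runs)

icon_row = "WWWWWBBBWWWW"
encoded = rle_encode(icon_row)
print(encoded)                            # [('W', 5), ('B', 3), ('W', 4)]
assert rle_decode(encoded) == icon_row    # lossless round trip
```

Long uniform runs, as in icons or fax scan lines, compress well; continuous-tone images rarely contain exact runs, which is why RLE alone fails on them.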


Encryption

The ideal approach to encryption is to immerse the data in a randomized, noise-like stream so that it would be difficult to identify. A totally random sequence has the highest entropy possible but, of course, could not be compressed (although a couple of patents have been issued) or meaningfully decoded. Recurrences in the encrypted stream give clues for potential decryption.

DES was a failed encryption standard in the US; it was replaced by the Advanced Encryption Standard (AES), which may use keys of 128, 192, or 256 bits for increased security.


The general idea of encryption

Plaintext and secret key(s) enter the cipher (an encrypting algorithm), which produces the ciphertext to be stored or transmitted (not processed until decrypted!).

Golomb’s postulates for randomness are the following:

(G1): The numbers of 0s and 1s in the sequence are as near as possible to n/2 (that is, exactly n/2 if n is even, and (n±1)/2 if n is odd).

(G2): The number of runs of a given length should halve when the length is increased by one (for as long as possible), and where possible equally many runs of a given length should consist of 0s as of 1s.

(G3): The out-of-phase autocorrelation (for binary sequences, ANDing of cyclic shifts) should be constant (independent of the shift).
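The first two postulates are easy to check mechanically. A small sketch (the function names are mine) that verifies G1 and extracts the run lengths needed to inspect G2, applied to one period of a maximal-length shift-register sequence:

```python
def check_g1(bits):
    """Golomb G1: the counts of 0s and 1s differ by at most one."""
    return abs(bits.count("0") - bits.count("1")) <= 1

def run_lengths(bits):
    """Lengths of the maximal runs in the sequence, for inspecting G2."""
    if not bits:
        return []
    out, length = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            length += 1
        else:
            out.append(length)
            length = 1
    out.append(length)
    return out

m_sequence = "000100110101111"   # one period of a length-15 sequence
print(check_g1(m_sequence))              # True: eight 1s, seven 0s
print(sorted(run_lengths(m_sequence)))   # [1, 1, 1, 1, 2, 2, 3, 4]
```

For this period-15 sequence G1 holds, and the runs follow the halving pattern G2 asks for: four runs of length 1, two of length 2, and one each of lengths 3 and 4.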

Tests for Randomness

It should make sense that if we can test the randomness of a string (or of its generator), and understand what we are testing for, we should be able to improve it. Terry Ritter, a prolific author (whose webpage was last maintained in 2005, http://www.ciphersbyritter.com/AUTHOR.HTM#Addr), wrote a nice survey (http://www.ciphersbyritter.com/RES/RANDTEST.HTM) of the literature on testing for randomness from 1963 to 1995. Since that time: George Marsaglia and Wai Wan Tsang, "Some Difficult-to-pass Tests of Randomness," Journal of Statistical Software, Volume 7, Issue 3, 2002. But the most comprehensive suite of tests that I have found is R. G. Brown's: http://www.phy.duke.edu/~rgb/General/dieharder.php

[Figure: the total message information in bits (green) and the information per bit, tracked through the compression, encryption, and EC-encoding stages against a starting baseline (blue line). Lossless compression raises the information per bit with no loss; lossy compression loses some information per bit and some total information; error-control encoding adds redundancy. While the example is not to scale, it could be made precise in a specific case.]


Classification and examples of cryptosystems

Encryption/decryption systems:

• Symmetric: the encryption and decryption keys are shared by Alice and Bob, or the former is shared and the latter calculated. Examples: DES, AES.

• Public key (asymmetric; developed from about 1970 on): nothing is shared but the common public knowledge of a public key infrastructure. Alice knows her private key but not Bob's, and vice versa. Public key systems allow the secret transmission of a common key and therefore enable a symmetric system. Examples: RSA, ElGamal, NTRU, McEliece.


A High Level View of the Advanced Encryption Standard

A 16-byte block of plaintext, together with the encryption key and an initialization vector, feeds the AES software "blender," which outputs the ciphertext. For an animated description see:

http://www.cs.bc.edu/~straubin/cs381-05/blockciphers/rijndael_ingles2004.swf
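To make the roles of the key and the initialization vector concrete, here is a deliberately toy sketch of block chaining. The per-block "cipher" is a bare XOR with the key, standing in for the real AES rounds; it is NOT AES and NOT secure, but it shows how the IV chains 16-byte blocks so that identical plaintext blocks yield different ciphertext blocks:

```python
BLOCK = 16

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def toy_cbc_encrypt(plaintext, key, iv):
    assert len(key) == len(iv) == BLOCK and len(plaintext) % BLOCK == 0
    out, prev = b"", iv
    for i in range(0, len(plaintext), BLOCK):
        block = xor_bytes(plaintext[i:i + BLOCK], prev)  # chain with previous
        cipher_block = xor_bytes(block, key)             # stand-in for AES rounds
        out += cipher_block
        prev = cipher_block
    return out

def toy_cbc_decrypt(ciphertext, key, iv):
    out, prev = b"", iv
    for i in range(0, len(ciphertext), BLOCK):
        block = xor_bytes(ciphertext[i:i + BLOCK], key)
        out += xor_bytes(block, prev)
        prev = ciphertext[i:i + BLOCK]
    return out

key = b"0123456789abcdef"
iv = b"fedcba9876543210"
msg = b"sixteen byte blk" * 2            # two identical plaintext blocks
ct = toy_cbc_encrypt(msg, key, iv)
assert toy_cbc_decrypt(ct, key, iv) == msg
assert ct[:BLOCK] != ct[BLOCK:]          # chaining hides the repetition
```

Without the IV chaining, the two identical plaintext blocks would encrypt identically, producing exactly the kind of recurrence in the ciphertext that the earlier slide warns gives clues to an attacker.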

SUMMARY

• One-way functions lead to hash algorithms: SHA-1, SHA-2, ..., SHA-3; MD5; RIPEMD; DSA (NIST).

• With a trapdoor, one-way functions give public-key (asymmetric) encryption: Diffie-Hellman (principle and implementation), RSA (find proper large primes), ElGamal (prime exponentiation), NTRU (pairs of polynomials), McEliece (error-correcting codes: a distance-101 Goppa ECC generator matrix G), elliptic curves.

• Block and streaming cryptosystems with a common private key (symmetric): DES, AES, and the similar IDEA; PGP uses public-key and symmetric methods in tandem.

• Authentication and integrity (not privacy): electronic signatures.

• Security protocols: Kerberos (client/server), PKI, X.509 certificates.


Some background for elliptic curves: a singular elliptic curve is not an elliptic curve according to this definition!

The point at infinity is left for a later lesson, but imagine infinity as the additive identity element of the group of points. Also notice that the field is not specified: it could be the complex or real field, the field of rationals, a Galois field (the ground field modulo p, or an extension finite field), or any algebraic structure that fills the requirements for a field. Once you choose the field, you use its elements for the coefficients, the indeterminates (variables), and the solutions: they must all be elements of the field. However, the representation of those elements weighs heavily on the computational complexity of the solutions.


In choosing an elliptic curve as the foundation of a public key system there is a variety of choices. The National Institute of Standards and Technology (NIST) has standardized a list of 15 elliptic curves of varying sizes. Ten of these curves are for what are known as binary fields (section 16.4 in the text) and 5 are for prime fields. The curves listed provide cryptography equivalent to symmetric encryption algorithms (e.g., AES, DES, or SKIPJACK) with keys of length 80, 112, 128, 192, and 256 bits and beyond. For protecting both classified and unclassified National Security information, the National Security Agency has decided to move to elliptic curve based public key cryptography. Where appropriate, NSA plans to use the elliptic curves over finite fields with large prime moduli (256, 384, and 521 bits) published by NIST. (Source: NSA)
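The group law that makes all of this work is the chord-and-tangent construction reduced modulo p. A minimal sketch of point addition over a prime field, on a tiny textbook curve (the curve and points are toy illustration values, nothing like the NIST curves; `pow(x, -1, p)` for the modular inverse needs Python 3.8+):

```python
def ec_add(P, Q, a, p):
    """Add points on y^2 = x^3 + a*x + b over GF(p); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                      # P + (-P) = infinity
    if P == Q:
        s = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (s * s - x1 - x2) % p
    y3 = (s * (x1 - x3) - y1) % p
    return (x3, y3)

# Toy curve y^2 = x^3 + 2x + 2 over GF(17); P = (5, 1) lies on it.
a, b, p = 2, 2, 17
P = (5, 1)
assert (P[1] ** 2 - (P[0] ** 3 + a * P[0] + b)) % p == 0
print(ec_add(P, P, a, p))                    # doubling: (6, 3)
print(ec_add(P, ec_add(P, P, a, p), a, p))   # 3P: (10, 6)
```

Repeated addition of P to itself walks through the finite group of curve points; the hardness of undoing that walk (the discrete logarithm) is what the public-key schemes rely on.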


ERROR CONTROL

Error control in digital transmission and storage involves the detection and possible correction of errors due to noise. When designing error control systems it is important to know the types of errors that noise is most likely to cause. We classify those errors as random errors, burst errors, and erasure errors, among others.

When we code for error detection only, we may require a retransmission if we cannot tolerate discarding the message.

With error correction capabilities we can repair up to a certain number of errors and obtain the original message back.

In either case, we decrease the amount of information per bit when we encode for error control, but the amount of information transmitted is the same, just more redundant.
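As a concrete instance of trading information per bit for repair capability, here is a sketch of the classic Hamming(7,4) code: 4 data bits gain 3 parity bits (so each transmitted bit carries only 4/7 of a data bit), and any single flipped bit can be located and corrected:

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits at positions 1..7."""
    p1 = d[0] ^ d[1] ^ d[3]   # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers positions 2, 3, 6, 7
    p3 = d[1] ^ d[2] ^ d[3]   # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_correct(c):
    """Locate and repair up to one flipped bit, then return the data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3    # syndrome = 1-based error position (0 = clean)
    if pos:
        c[pos - 1] ^= 1           # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
codeword[4] ^= 1                            # simulate one bit flipped by noise
assert hamming74_correct(codeword) == data  # the original message comes back
```

The recomputed parities form a syndrome that spells out the 1-based position of the error in binary; a zero syndrome means no single-bit error occurred.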


HOMOMORPHIC ENCRYPTION

The last intruder or careless manager that one would suspect of disseminating or damaging the valuable user information is the Cloud provider. To insulate the user from that potential damage, it would be ideal not to have to decrypt the information sent to the provider while the utility processes the user's request. You wish the provider would never see your data.

This is the quest, not yet practical, of Homomorphic Encryption.


What is a homomorphism?

A mapping ƒ: A → B between universal algebras such that

ƒ(μ(a1, ..., an)) = μB(ƒ(a1), ..., ƒ(an))

for each n-ary operation μ (with μB its counterpart in B) and for all elements a1, ..., an ∈ A. (Re: Wikipedia)

For groups G and H, ƒ: G → H is a homomorphism when for any elements g1, g2 ∈ G,

ƒ(g1 * g2) = ƒ(g1) *' ƒ(g2)

where * denotes the group operation in G and *' denotes the operation in H. If there is more than one operation, the homomorphism must hold for all of them.

In the Cloud setting: elements r, s (encrypted) and operators in the user's domain correspond, under the encryption map, to elements and "proper" operators in the Cloud domain.


Homomorphic Encryption: PHE and FHE (partial and full)

Plaintext x maps to ciphertext Ɛ(x).

Fully homomorphic cryptosystems: Craig Gentry; Marten van Dijk, Craig Gentry, Shai Halevi, and Vinod Vaikuntanathan; Nigel P. Smart and Frederik Vercauteren; Riggio and Sicari; Coron, Naccache, and Tibouchi. (Credit to Wikipedia.)

Partially homomorphic cryptosystems: unpadded RSA, ElGamal, Goldwasser-Micali, Benaloh, Paillier, Okamoto–Uchiyama, Naccache–Stern, Damgård–Jurik, Boneh–Goh–Nissim.

If the encryption does not use randomness, it is called deterministic.
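Unpadded RSA from the list above gives the quickest demonstration of a partial homomorphism: multiplying two ciphertexts yields the ciphertext of the product of the plaintexts, so the provider can multiply without ever decrypting. A toy sketch with textbook-sized primes (utterly insecure, for illustration only):

```python
p, q = 61, 53
n = p * q                      # modulus: 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: 2753 (Python 3.8+)

def encrypt(m):
    return pow(m, e, n)        # unpadded ("textbook") RSA

def decrypt(c):
    return pow(c, d, n)

m1, m2 = 7, 12
# The Cloud multiplies the ciphertexts without seeing the plaintexts...
product_of_ciphertexts = encrypt(m1) * encrypt(m2) % n
# ...and the user decrypts the result to obtain the product of the plaintexts:
assert decrypt(product_of_ciphertexts) == (m1 * m2) % n   # 84
```

The same multiplicative structure is exactly why unpadded RSA is deterministic and therefore never used directly in practice; padding adds the randomness, but destroys the homomorphism.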


Elements of the Cloud Architecture

[Diagram:] The user (U) may have any or all of: desktop PCs, laptops, servers, mobile devices, sensors, etc., located in an enterprise, a home office, or a private cloud. Through the Internet (I), using various access methods and protocols (or a direct connection), the user reaches the PROVIDER's CLOUD, which must offer one or more of SaaS, PaaS, or IaaS, backed by servers (S) and (large) Network Attached Storage (NAS). A full-service cloud is a services utility; specialized-service clouds also exist.


[Diagram: example configurations.] A user (U) reaching only NAS over the Internet (I) is not a cloud, as there are no services from a utility (home office with a private cloud). One cloud, running the CLOUD operating system (stack) and some applications, supplies service(s) as needed from SaaS, PaaS, or IaaS, with NAS behind it; the same user can connect from different locations, and multiple related users can be serviced by one cloud through its servers (S).


[Diagram: the real Cloud environment.] Multiple PROVIDER clouds, each offering one or more of SaaS, PaaS, or IaaS, serve multiple ENTERPRISE sites and users (U) over the Internet (I). A PRIVATE CLOUD, when connected to a public cloud, becomes a hybrid cloud.


Research Areas in the Cloud: networking, operating systems, databases, storage, virtual machines, distributed systems, data mining, web search, network measurements, and multimedia.

A metric for clouds: the storage-to-computation ratio, e.g., 500 TB per 1000 cores at the Illinois CCT (Cloud Computing Testbed).

System stack: CentOS (the underlying OS beneath many popular parallel programming frameworks such as Hadoop and Pig Latin).

The Open Cloud Consortium (OCC) is a not-for-profit that manages and operates cloud computing infrastructure to support scientific, environmental, medical, and health care research.


Virtualization is a win-win solution

From the client/user point of view:
• Practically on-demand access to seemingly infinite autonomous resources and memory (illusory, but at lower cost than using individual real servers) at a very competitive cost.

From the server bank owner / service provider point of view:
• More efficient utilization of physical facilities, because peak loads can be handled by replicating VMs rather than by adding individual servers to cover them.
• Faster disaster recovery and maintenance due to the ease of VM migration.
• More competitive pricing, processing offerings, and variety of services due to economies of scale, with physical plant expansion costs growing at a much smaller rate.
• Easier adaptation to and satisfaction of special customer requests by specializing and dedicating software, not hardware.


WHITE PAPERS


CONCLUSION: Information is power but …

Quote from Herbert Simon (June 15, 1916 – February 9, 2001) : “In an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes…the attention of its recipients.”


IMPLICATIONS FOR THE FUTURE

As I see it:

1. Smaller or disappearing private IT installations
2. If security does not improve, many private Clouds with interfaces to public or specialized ones in a hybrid arrangement
3. A significant new struggle for malls and store commerce to compete with electronic commerce
4. Centralization of jobs in larger computing entities
5. Significant changes in high-quality distance education
6. Data loss to natural disasters less common
7. Data loss due to malware and cyberattacks more common
8. Significant increase in loss of privacy

In conclusion, if we thought that we were in a Brave New World:

GET READY FOR A BRAVER NEW ONE