
A FRAMEWORK FOR THE ANALYSIS OF MIX-BASED STEGANOGRAPHIC FILE SYSTEMS

Claudia Diaz, Carmela Troncoso, Bart Preneel

K.U.Leuven / COSIC

Cambridge, January 28, 2009

1

MOTIVATION

Problem: we want to keep stored information secure (confidential)

Encryption protects against the unwanted disclosure of information but… reveals the fact that hidden information exists!

The user can be threatened / tortured / coerced to disclose the decryption keys ("coercion attack")

We need to hide the existence of files

Property: plausible deniability

Allow users to believably deny that any further encrypted data is located on the storage device

If the password is not known, it is not possible to determine the existence of hidden files

2

ATTACKER MODEL: ONE SNAPSHOT

The attacker has never inspected the user's computer before coercion

Ability to coerce the user at any point in time

The user produces some keys

The attacker inspects the user's computer

Game: if the attacker is able to determine that the user has not provided all her keys, the attacker wins

3

ANDERSON, NEEDHAM & SHAMIR (1998)

1. Use cover files such that a linear combination (XOR) of them reveals the information

Password: subset of files to combine

Hierarchy (various levels of security)

The user can show some "low" security levels while hiding "high" security levels

It is not possible to know whether she has revealed the keys to all existing levels

Drawbacks:

File read operations have high cost

Needs a lot of cover files to be secure (computationally infeasible to try all combinations)

Assumes the adversary knows nothing about the plaintext

4
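The XOR construction can be sketched in a few lines of Python. This is an illustration only, not the original implementation; the block size and the "password" subset of cover-file indices are arbitrary choices for the example.

```python
import secrets

BLOCK = 32  # cover-file size in bytes (illustrative)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# The store holds only cover files filled with random bytes.
covers = [secrets.token_bytes(BLOCK) for _ in range(8)]

# "Password": the secret subset of covers whose XOR hides the data.
subset = [1, 3, 6]

def hide(plaintext, covers, subset):
    # Adjust the last cover in the subset so the XOR of the whole
    # subset equals the plaintext.
    acc = plaintext
    for i in subset[:-1]:
        acc = xor(acc, covers[i])
    covers[subset[-1]] = acc

def reveal(covers, subset):
    # Reading a file means XOR-ing all covers in the subset,
    # hence the high cost of read operations noted above.
    acc = bytes(BLOCK)
    for i in subset:
        acc = xor(acc, covers[i])
    return acc

secret = b"attack at dawn".ljust(BLOCK, b"\x00")
hide(secret, covers, subset)
```

Without knowledge of the subset, every combination of covers yields data that looks equally random, which is why security requires enough covers that trying all subsets is computationally infeasible.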

ANDERSON, NEEDHAM & SHAMIR (1998)

2. Real files hidden in encrypted form at pseudo-random locations amongst random data

Location derived from the name of the file and a password

Collisions (birthday paradox) overwrite data: use only a small part of the storage capacity (< √N of the N blocks)

Replication: all copies of a block need to be overwritten to lose the data

Linear hierarchy: higher security levels need more replication

5
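The location derivation and the birthday-paradox limit can be sketched as follows. The hash construction, file name, and password are illustrative stand-ins, not the scheme's actual key-derivation function; the store size N is arbitrary.

```python
import hashlib

N = 10_000  # total blocks in the store (illustrative size)

def location(name, password, i, n=N):
    """Pseudo-random location for the i-th block of a file,
    derived from the file name and a password."""
    h = hashlib.sha256(f"{name}|{password}|{i}".encode()).digest()
    return int.from_bytes(h, "big") % n

def first_collision(blocks):
    """Index of the first block whose location collides with an
    earlier one (a collision silently overwrites hidden data)."""
    seen = set()
    for i in range(blocks):
        loc = location("diary", "hunter2", i)
        if loc in seen:
            return i
        seen.add(loc)
    return None

# By the birthday paradox a collision is expected after only about
# sqrt(N) ~ 100 blocks here, which is why only a small fraction of
# the capacity can be used safely, and why replication is needed.
```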

STEGFS: MCDONALD & KUHN (1999)

o Implemented as an extension of the Linux file system (Ext2fs)
o Hidden files are placed into unused blocks of a "normal" partition
o Normal files are overwritten with random data when deleted
o The attacker cannot distinguish a deleted normal file from an encrypted hidden file
o Block allocation table with one entry per block on the partition:
  o Used blocks: entry encrypted with the same key as the data block
  o Unused blocks: random data
o The table helps locating data and detecting corrupted blocks (lower security levels can still overwrite higher ones)

6
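The block allocation table's property can be sketched like this. A keyed tag stands in for "entry encrypted with the same key as the data block"; block ids and sizes are made up for the example.

```python
import hashlib
import hmac
import secrets

def entry_for(block_id, key):
    """Table entry for a used block: a keyed tag over the block id,
    standing in for 'encrypted with the same key as the data block'."""
    return hmac.new(key, str(block_id).encode(), hashlib.sha256).digest()

def is_used(block_id, entry, key):
    # With the right key the entry verifies; without the key, a
    # used entry is indistinguishable from random filler.
    return hmac.compare_digest(entry, entry_for(block_id, key))

key = secrets.token_bytes(32)
table = {
    7: entry_for(7, key),        # block 7 holds a hidden file
    8: secrets.token_bytes(32),  # block 8 unused: random data
}
```

An inspector holding the key can locate data and detect blocks corrupted by lower-level overwrites; an inspector without it sees only uniformly random 32-byte entries.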

ATTACKER MODEL: CONTINUOUS OBSERVATION

What if the attacker can observe accesses to the store?

Remote or shared semi-trusted store

Distributed P2P system

Same game as before:
o Ability to coerce the user at any point in time
o The user produces keys to some security levels
o The attacker inspects the user's computer
o If the attacker is able to determine that the user has not provided all her keys, the attacker wins

BUT now the adversary has prior information (which blocks have been accessed/modified)

Previous systems do not provide plausible deniability against this adversary model

7

PREVIOUS WORK WHERE THIS ADVERSARY IS RELEVANT: P2P

8

PREVIOUS WORK WHERE THIS ADVERSARY IS RELEVANT: SEMI-TRUSTED REMOTE STORE

Semi-trusted remote store: Zhou et al. (2004)

Use of constant-rate cover traffic (dummy accesses) to disguise file accesses

Every time a block location is accessed, it is overwritten with different data (re-encrypted with a different IV)

Block updates no longer indicate file modifications

Every time a file block is accessed, it is moved to another (empty) location

Protects against simple access frequency analysis

Relocations are low-entropy

Broken by Troncoso et al. (2007) with traffic analysis attacks that find correlations between sets of accesses

Multi-block files are found prior to coercion if they are accessed twice

One-block files are found if accessed a few times

9
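The re-encryption trick can be sketched as follows. A SHA-256 counter keystream stands in for a real cipher here (this is not Zhou et al.'s actual construction); key and block contents are made up.

```python
import hashlib
import secrets

def keystream(key, iv, n):
    """Illustrative stream cipher: a SHA-256 counter keystream."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + iv + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt(key, data):
    # A fresh IV on every access makes the stored ciphertext change
    # even when the plaintext does not.
    iv = secrets.token_bytes(16)
    return iv, bytes(a ^ b for a, b in zip(data, keystream(key, iv, len(data))))

def decrypt(key, iv, ct):
    return bytes(a ^ b for a, b in zip(ct, keystream(key, iv, len(ct))))

key = secrets.token_bytes(32)
block = b"hidden file block".ljust(32, b"\x00")

# Two accesses to the same block yield different ciphertexts, so a
# block update no longer signals a file modification to an observer.
iv1, ct1 = encrypt(key, block)
iv2, ct2 = encrypt(key, block)
```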

HOW IT IS BROKEN (SIMPLIFIED VERSION)

[Diagram: at time t1, the file's blocks at locations 1, 2, 3, 4, … are relocated to 10, 20, 30, 40, …; at time t2, the same blocks move from 10, 20, 30, 40, … to 100, 200, 300, 400, … — the repeated set of locations links the two rounds of accesses]

10
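The correlation in the diagram can be sketched numerically. This is a deliberately simplified model of the Troncoso et al. attack: the attacker observes the write locations (relocation targets) at t1 and checks whether a later round of accesses revisits exactly that set. Store size, file size, and the random seed are arbitrary.

```python
import random

random.seed(1)    # deterministic illustration
N = 1000          # store locations
FILE_BLOCKS = 5   # size of a multi-block hidden file

# t1: the file's blocks are read and relocated; the attacker sees
# both the reads and the writes (the relocation targets).
relocation_targets = random.sample(range(N), FILE_BLOCKS)

# t2: accessing the file again touches exactly those targets.
t2_file_accesses = set(relocation_targets)

# A pure dummy round instead touches uniformly random locations.
t2_dummy_accesses = set(random.sample(range(N), FILE_BLOCKS))

# Correlation test: how many of the t1 write locations reappear?
overlap_file = len(t2_file_accesses & set(relocation_targets))
overlap_dummy = len(t2_dummy_accesses & set(relocation_targets))
# overlap_file is always FILE_BLOCKS; overlap_dummy is almost
# always 0, so two accesses suffice to expose a multi-block file.
```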

Can we provide plausible deniability against an adversary who monitors the store prior to coercion?

11

SYSTEM MODEL

Files are stored in fixed-size blocks

Blocks containing (encrypted) file data are indistinguishable from empty blocks containing random data

Several levels of security (we assume hierarchical): the user discloses keys to some of these levels while keeping others hidden

Data persistence: erasure codes for redundancy (impact on plausible deniability)

Traffic analysis resistance:

Constant-rate dummy traffic

High-entropy block relocation

12

USER LOGIN

The user logs in with security level s by providing key uks

The agent trial-decrypts every entry in the table:

Files in security levels s or lower can be found in the table

Files in higher security levels are indistinguishable from random (empty)

The agent starts making block accesses:

Process user file requests

Generate dummy traffic (uniform)

For each block, the agent performs an access cycle

13

BLOCK ENCRYPTION

[Diagram: a table entry, decryptable with the user key uks, corresponds to a block containing a file in security level s and holds its (one-time) block key bki; an empty block, or one containing a file in a security level higher than s, looks like random data]

14
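The two-key structure in the diagram can be sketched as follows. A hash-based one-time pad stands in for the actual encryption (illustrative only); key sizes and the block id label are made up.

```python
import hashlib
import secrets

def prf(key, msg):
    # Illustrative pseudorandom function used as a one-time pad.
    return hashlib.sha256(key + msg).digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

uk_s = secrets.token_bytes(32)  # user key for security level s
bk_i = secrets.token_bytes(32)  # one-time block key for block i

# Table entry for block i: the block key encrypted under uk_s.
entry = xor(bk_i, prf(uk_s, b"entry|i"))

# The data block itself is encrypted under the one-time block key.
data = b"hidden payload".ljust(32, b"\x00")
stored_block = xor(data, prf(bk_i, b"block|i"))

# Trial decryption at login: with uk_s the agent recovers bk_i and
# then the data; under any other user key, both the entry and the
# block are indistinguishable from random (empty) blocks.
recovered_bk = xor(entry, prf(uk_s, b"entry|i"))
recovered = xor(stored_block, prf(recovered_bk, b"block|i"))
```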

ACCESS CYCLE

[Diagram: the access cycle, involving the table and blocks containing either data or random filler]

15

ATTACK METHODOLOGY

1. The attacker profiles the system to extract:
   Typical access sequences when the user is idle (dummy traffic)
   Typical access sequences when the user is accessing a file
2. The attacker monitors accesses and looks for sequences that look like file accesses
3. The attacker coerces the user when a sequence indicates a possible file access (worst-case scenario)
4. The attacker obtains some user keys and inspects the computer
5. The attacker combines the evidence obtained before and after coercion to try to determine if there are more user keys the user has not provided
6. If the probability of undisclosed keys is high, deniability is low, and vice versa.

16

EXTRACTING INFORMATION FROM THE SEQUENCE OF ACCESSES TO THE STORE I

17

The attacker profiles the system to extract typical access sequences when the user is accessing a file

[Slide figure: MixSFS pool diagram and the equations for the probability that a block is in the pool at time t; the formulas are illegible in this transcript]

EXTRACTING INFORMATION FROM THE SEQUENCE OF ACCESSES TO THE STORE II

18

The attacker profiles the system to extract typical access sequences when the user is idle (dummy traffic):

Establish a baseline for dummy traffic

Analyze accesses to the store and find strong correlations (unlikely to be generated by dummy traffic)

For big files, the area that goes over the baseline is much bigger than for dummy traffic (i.e., distinguishable)
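The "area over the baseline" idea can be sketched numerically. The statistic below is a simplification of the paper's analysis; store size, observation window, file size, and the random seed are arbitrary choices for the example.

```python
import random

random.seed(7)  # deterministic illustration
N = 200         # store locations
ROUNDS = 2000   # observed accesses in the window

def area_over_baseline(accesses, n=N):
    """Sum of per-location access counts exceeding the uniform
    baseline (the expected count under pure dummy traffic)."""
    counts = [0] * n
    for loc in accesses:
        counts[loc] += 1
    baseline = len(accesses) / n
    return sum(c - baseline for c in counts if c > baseline)

# Idle user: uniform dummy traffic only.
dummy = [random.randrange(N) for _ in range(ROUNDS)]

# Active user: the same dummy traffic plus repeated accesses to a
# (hypothetical) 10-block file.
with_file = dummy + list(range(10)) * 30

a_dummy = area_over_baseline(dummy)
a_file = area_over_baseline(with_file)
# a_file clearly exceeds a_dummy: the bigger the file, the larger
# the excess area, i.e., the more distinguishable the access.
```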

SECURITY METRICS: UNOBSERVABILITY

Prior to coercion: we define unobservability (U) as the probability of a file operation being undetectable by the adversary; i.e., the sequence of store accesses generated by a file operation is considered by the adversary to be dummy traffic

19

SECURITY METRICS: DENIABILITY

After coercion: percentage of empty blocks in the pool compared to the percentage in the whole store

Worst-case scenario: coercion occurs immediately after a hidden file access – a large number of "empty" blocks in the pool

We define deniability (D) as the probability that the evidence collected by the adversary (before and after coercion) has been generated by dummy traffic (i.e., no evidence of hidden files).

20
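Deniability can be read as a Bayesian posterior. The toy version below is illustrative only (the paper's likelihood model over access sequences and pool contents is more involved); the probabilities and prior are made-up numbers.

```python
def deniability(p_ev_dummy, p_ev_file, prior_dummy=0.5):
    """Posterior probability that the evidence was produced by
    dummy traffic alone, via Bayes' rule (toy model)."""
    num = p_ev_dummy * prior_dummy
    return num / (num + p_ev_file * (1 - prior_dummy))

# Evidence equally likely under both hypotheses: the adversary
# learns nothing beyond the prior, deniability stays at 0.5.
d_equal = deniability(0.3, 0.3)

# Evidence far more likely under "hidden file accessed": low
# deniability; the reverse gives high deniability.
d_low = deniability(0.01, 0.9)
d_high = deniability(0.9, 0.01)
```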

CONCLUSIONS AND OPEN QUESTIONS

Conclusions:

Hard to protect against traffic analysis, even using constant-rate dummy traffic

Hard to conceal file accesses with dummy traffic that selects locations uniformly at random

When files occupy more blocks, access to them is harder to conceal

Open questions:

More sophisticated pattern recognition algorithms may extract more info from the sequence of accesses

Design of smarter traffic analysis strategies

Can such a system be implemented in practice?

21

Thank you!

22