A FRAMEWORK FOR THE ANALYSIS OF MIX-BASED STEGANOGRAPHIC FILE SYSTEMS
Claudia Diaz, Carmela Troncoso, Bart Preneel
K.U.Leuven / COSIC
Cambridge, January 28, 2009
MOTIVATION
o Problem: we want to keep stored information secure (confidential)
o Encryption protects against the unwanted disclosure of information, but it reveals the fact that hidden information exists!
    o User can be threatened / tortured / coerced to disclose the decryption keys ("coercion attack")
o We need to hide the existence of files
o Property: plausible deniability
    o Allow users to deny believably that any further encrypted data is located on the storage device
    o If the password is not known, it is not possible to determine the existence of hidden files
ATTACKER MODEL: ONE SNAPSHOT
o Attacker has never inspected the user's computer before coercion
o Ability to coerce the user at any point in time
    o User produces some keys
    o Attacker inspects the user's computer
o Game: if the attacker is able to determine that the user has not provided all her keys, the attacker wins
ANDERSON, NEEDHAM & SHAMIR (1998)
1. Use cover files such that a linear combination (XOR) of them reveals the information
o Password: subset of files to combine
o Hierarchy (various levels of security)
    o User can show some "low" security levels while hiding "high" security levels
    o Not possible to know whether she has revealed the keys to all existing levels
o Drawbacks:
    o File read operations have high cost
    o Needs a lot of cover files to be secure (computationally infeasible to try all combinations)
    o Assumes adversary knows nothing about the plaintext
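The XOR construction can be illustrated with a minimal sketch. It assumes a single hidden file whose length equals that of each cover file, and it derives the subset from the password with SHA-256; the function names and the derivation are hypothetical and are not taken from the original scheme, which uses linear algebra so that several files can coexist.

```python
import hashlib
import os
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def subset_from_password(password, n_covers, k):
    """Pick k distinct cover-file indices from the password (illustrative derivation)."""
    if k > n_covers:
        raise ValueError("subset larger than the number of cover files")
    chosen, counter = [], 0
    while len(chosen) < k:
        digest = hashlib.sha256(password.encode() + counter.to_bytes(4, "big")).digest()
        index = int.from_bytes(digest[:4], "big") % n_covers
        if index not in chosen:
            chosen.append(index)
        counter += 1
    return chosen


def embed(covers, password, secret, k=4):
    """Modify one cover file so that the XOR of the password's subset equals secret."""
    assert all(len(c) == len(secret) for c in covers)
    subset = subset_from_password(password, len(covers), k)
    current = reduce(xor_bytes, (covers[i] for i in subset))
    delta = xor_bytes(current, secret)
    covers[subset[0]] = xor_bytes(covers[subset[0]], delta)


def extract(covers, password, k=4):
    """Recover the hidden data by XORing the subset selected by the password."""
    subset = subset_from_password(password, len(covers), k)
    return reduce(xor_bytes, (covers[i] for i in subset))


covers = [os.urandom(16) for _ in range(16)]
embed(covers, "correct horse", b"sixteen byte msg")
recovered = extract(covers, "correct horse")
```

Every read has to touch all k cover files in the subset, which is one way to see why the slide lists read cost as a drawback.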
ANDERSON, NEEDHAM & SHAMIR (1998)
2. Real files hidden in encrypted form in pseudo-random locations amongst random data
o Location derived from the name of the file and a password
o Collisions (birthday paradox) overwrite data:
    o Use only a small part of the storage capacity (< √N blocks out of N)
    o Replication
        o All copies of a block need to be overwritten to lose the data
o Linear hierarchy: higher security levels need more replication
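A minimal sketch of deriving block locations from a file name and password, together with the birthday-paradox estimate that limits how much of the store can be used. The hash-based derivation, the separator format and the function names are assumptions made for illustration, not the construction from the original paper.

```python
import hashlib
import math


def block_location(filename, password, block_index, replica, n_blocks):
    """Map (file, password, block, replica) to a pseudo-random block number."""
    material = f"{password}|{filename}|{block_index}|{replica}".encode()
    digest = hashlib.sha256(material).digest()
    return int.from_bytes(digest[:8], "big") % n_blocks


def collision_probability(n_written, n_blocks):
    """Birthday approximation: chance that at least two written blocks collide."""
    return 1.0 - math.exp(-n_written * (n_written - 1) / (2.0 * n_blocks))


# Locations of three replicas of block 0 of a hypothetical file.
replicas = [block_location("diary.txt", "hunter2", 0, r, 10**6) for r in range(3)]

# Writing about sqrt(N) blocks into N = 10**6 slots already collides often.
p = collision_probability(1000, 10**6)
```

With 1000 blocks written into a million slots the estimate is already close to 0.39, which illustrates why only a small fraction of the capacity is usable and why replication is needed.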
STEGFS: MCDONALD & KUHN (1999)
o Implemented as an extension of the Linux file system (Ext2fs)
o Hidden files are placed into unused blocks of a "normal" partition
o Normal files are overwritten with random data when deleted
o Attacker cannot distinguish a deleted normal file from an encrypted hidden file
o Block allocation table with one entry per block on the partition:
    o Used blocks: entry encrypted with the same key as the data block
    o Unused blocks: random data
o The table helps locate data and detect corrupted blocks (lower security levels can still overwrite higher ones)
ATTACKER MODEL: CONTINUOUS OBSERVATION
o What if the attacker can observe accesses to the store?
    o Remote or shared semi-trusted store
    o Distributed P2P system
o Same game as before:
    o Ability to coerce the user at any point in time
    o User produces keys to some security levels
    o Attacker inspects the user's computer
    o If the attacker is able to determine that the user has not provided all her keys, the attacker wins
o BUT now the adversary has prior information (which blocks have been accessed/modified)
o Previous systems do not provide plausible deniability against this adversary model
PREVIOUS WORK WHERE THIS ADVERSARY IS RELEVANT: SEMI-TRUSTED REMOTE STORE
o Semi-trusted remote store: Zhou et al. (2004)
    o Use of constant rate cover traffic (dummy accesses) to disguise file accesses
    o Every time a block location is accessed, it is overwritten with different data (re-encrypted with a different IV)
        o Block updates no longer indicate file modifications
    o Every time a file block is accessed, it is moved to another (empty) location
        o Protects against simple access frequency analysis
        o Relocations are low-entropy
o Broken by Troncoso et al. (2007) with traffic analysis attacks that find correlations between sets of accesses
    o Multi-block files are found prior to coercion if they are accessed twice
    o One-block files are found if accessed a few times
HOW IT IS BROKEN (SIMPLIFIED VERSION)
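[Figure: access sequences observed at time t1 (1, 10, 2, 20, 3, 30, 4, 40, …) and at time t2 (…, 10, 100, 20, 200, 30, 300, 40, 400); locations 10, 20, 30 and 40 appear in both]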
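The correlation exploited by this simplified attack can be sketched as follows. The sketch assumes each observed access is recorded as a (read location, relocation target) pair; that pairing is an interpretation of the number sequences on the slide, and the function name is hypothetical.

```python
def relocation_overlap(earlier, later):
    """Return the read locations in `later` that were relocation targets in `earlier`.

    Each window is a list of (read_location, relocation_target) pairs.
    """
    targets = {target for _, target in earlier}
    return [source for source, _ in later if source in targets]


# Numbers taken from the slide: the file's blocks move 1->10, 2->20, ... at t1,
# and the same blocks are read again from 10, 20, ... at t2.
t1 = [(1, 10), (2, 20), (3, 30), (4, 40)]
t2 = [(10, 100), (20, 200), (30, 300), (40, 400)]
linked = relocation_overlap(t1, t2)

# Unrelated accesses share no locations with the t1 window.
unrelated = relocation_overlap(t1, [(7, 55), (9, 61)])
```

A full overlap between two windows is very unlikely under independent dummy accesses, which matches the slide's claim that multi-block files are found once they have been accessed twice.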
Can we provide plausible deniability against an adversary who monitors the store prior to coercion?
SYSTEM MODEL
o Files are stored in fixed-size blocks
o Blocks containing (encrypted) file data are indistinguishable from empty blocks containing random data
o Several levels of security (we assume hierarchical)
    o User discloses keys to some of these levels while keeping others hidden
o Data persistence: erasure codes for redundancy (impact on plausible deniability)
o Traffic analysis resistance
    o Constant rate dummy traffic
    o High entropy block relocation
[Figure labels: process user file requests; generate dummy traffic (uniform)]
USER LOGIN
o User logs in with security level s, by providing key uk_s
    o Agent trial-decrypts every entry in the table
    o Files in security levels s or lower can be found in the table
    o Files in higher security levels are indistinguishable from random (empty)
o Agent starts making block accesses (either dummy or to retrieve files requested by the user)
o For each block, the agent performs an access cycle
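The transcript does not spell out the access cycle. The following sketch assumes a pool-mix style cycle, suggested by the "mix-based" title and the pool mentioned under the deniability metric: the accessed block enters a local pool, and a uniformly chosen pool block is written back to the same location. The class and method names are hypothetical, and re-encryption of the outgoing block is omitted.

```python
import random


class PoolAgent:
    """Toy agent that relocates blocks through a fixed-size local pool."""

    def __init__(self, store, pool, rng=None):
        self.store = store            # list: store[location] -> block
        self.pool = list(pool)        # blocks currently held by the agent
        self.rng = rng or random.Random()

    def access(self, location):
        """One access cycle: fetch a block, then write a random pool block back."""
        fetched = self.store[location]
        self.pool.append(fetched)
        outgoing = self.pool.pop(self.rng.randrange(len(self.pool)))
        # A real system would re-encrypt `outgoing` under a fresh key here.
        self.store[location] = outgoing
        return fetched

    def dummy_access(self):
        """Dummy traffic: run an access cycle on a uniformly random location."""
        return self.access(self.rng.randrange(len(self.store)))


agent = PoolAgent(store=list(range(100)), pool=[100, 101, 102, 103],
                  rng=random.Random(1))
for _ in range(500):
    agent.dummy_access()
agent.access(7)   # a real request looks the same to an observer of the store
```

In this sketch the pool size stays constant and no block is ever lost; blocks only change location, so the observer sees one read and one write per cycle regardless of whether the access is real or dummy.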
BLOCK ENCRYPTION
[Figure labels: table; block containing a file in security level s; user key: uk_s; (one-time) block key: bk_i; empty block, or block containing a file in a security level higher than s; data; random]
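Trial decryption of the table at login can be sketched as below. The entry layout (nonce, ciphertext, tag), the hash-based keystream and all function names are assumptions made for this illustration; it is a toy construction, not the scheme's actual encryption, and it does not model the key hierarchy that lets a level-s key open lower levels.

```python
import hashlib
import hmac
import os

ENTRY_LEN = 16   # payload bytes per table entry (illustrative)
NONCE_LEN = 16
TAG_LEN = 16


def _keystream(key, nonce, length):
    """Toy keystream from SHA-256; good for at most 32 bytes."""
    return hashlib.sha256(key + nonce).digest()[:length]


def seal_entry(user_key, payload):
    """Encrypt and authenticate one table entry under user_key."""
    assert len(payload) == ENTRY_LEN
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(user_key, nonce, ENTRY_LEN)
    ciphertext = bytes(p ^ s for p, s in zip(payload, stream))
    tag = hmac.new(user_key, nonce + ciphertext, hashlib.sha256).digest()[:TAG_LEN]
    return nonce + ciphertext + tag


def empty_entry():
    """An unused entry is just random bytes of the same length."""
    return os.urandom(NONCE_LEN + ENTRY_LEN + TAG_LEN)


def trial_decrypt(table, user_key):
    """Return {index: payload} for every entry that verifies under user_key."""
    found = {}
    for index, entry in enumerate(table):
        nonce = entry[:NONCE_LEN]
        ciphertext = entry[NONCE_LEN:NONCE_LEN + ENTRY_LEN]
        tag = entry[NONCE_LEN + ENTRY_LEN:]
        expected = hmac.new(user_key, nonce + ciphertext, hashlib.sha256).digest()[:TAG_LEN]
        if hmac.compare_digest(tag, expected):
            stream = _keystream(user_key, nonce, ENTRY_LEN)
            found[index] = bytes(c ^ s for c, s in zip(ciphertext, stream))
    return found


low_key, high_key = os.urandom(32), os.urandom(32)
table = [empty_entry(), seal_entry(low_key, b"low-level-file-1"),
         empty_entry(), seal_entry(high_key, b"high-level-file1")]
visible = trial_decrypt(table, low_key)
```

With only the low key, the entry sealed under the high key fails verification exactly like a random entry does, which is the property the login slide relies on.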
ATTACK METHODOLOGY
1. Attacker profiles the system to extract:
    o Typical access sequences when the user is idle (dummy traffic)
    o Typical access sequences when the user is accessing a file
2. Attacker monitors accesses and looks for sequences that look like file accesses
3. Attacker coerces the user when a sequence indicates a possible file access (worst case scenario)
4. Attacker obtains some user keys and inspects the computer
5. Attacker combines the evidence obtained before and after coercion to try to determine whether there are more keys that the user has not provided
6. If the probability of undisclosed keys is high, deniability is low, and vice versa.
EXTRACTING INFORMATION FROM THE SEQUENCE OF ACCESSES TO THE STORE I
o Attacker profiles the system to extract typical access sequences when the user is accessing a file
[Equations and figure not recoverable from the transcript: expressions in E_pool, q and P at times t, t+1 and t+2, and a diagram labelled MixSFS showing numbered blocks]
EXTRACTING INFORMATION FROM THE SEQUENCE OF ACCESSES TO THE STORE II
o Attacker profiles the system to extract typical access sequences when the user is idle (dummy traffic)
    o Establish a baseline for dummy traffic
o Analyze accesses to the store and find strong correlations (unlikely to be generated by dummy traffic)
o For big files, the area that goes over the baseline is much bigger than for dummy traffic (i.e., distinguishable)
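One simple way to make the baseline idea concrete is to count how many accesses in a window revisit locations from an earlier window, and compare that with what uniform dummy traffic would produce on average. The statistic and the function names below are assumptions chosen for illustration; the talk does not specify which correlation measure the attacker uses.

```python
def expected_repeats(previous, current_len, n_blocks):
    """Expected number of revisits if the current accesses were uniform dummies.

    Each uniform access hits one of the distinct earlier locations with
    probability len(set(previous)) / n_blocks.
    """
    return current_len * len(set(previous)) / n_blocks


def observed_repeats(previous, current):
    """Number of accesses in `current` that hit a location seen in `previous`."""
    seen = set(previous)
    return sum(1 for location in current if location in seen)


def excess_over_baseline(previous, current, n_blocks):
    """Positive values mean more revisits than dummy traffic would explain on average."""
    return observed_repeats(previous, current) - expected_repeats(
        previous, len(current), n_blocks)


# 100 earlier accesses, 100 current accesses, store of 10,000 blocks:
# dummy traffic alone would revisit about one location on average.
baseline = expected_repeats(range(100), 100, 10_000)
excess = excess_over_baseline(range(100), list(range(50, 150)), 10_000)
```

In this toy statistic the excess grows with the number of blocks a file occupies, in line with the slide's remark that big files stand out from the baseline.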
SECURITY METRICS: UNOBSERVABILITY
o Prior to coercion: we define unobservability (U) as the probability of a file operation being undetectable by the adversary; i.e., the sequence of store accesses generated by a file operation is considered by the adversary as dummy traffic
SECURITY METRICS: DENIABILITY
o After coercion
    o Percentage of empty blocks in the pool compared to the percentage in the whole store
    o Worst case scenario: coercion occurs immediately after a hidden file access, leaving a large number of "empty" blocks in the pool
o We define deniability (D) as the probability that the evidence collected by the adversary (before and after coercion) has been generated by dummy traffic (i.e., no evidence of hidden files).
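The comparison between the pool and the whole store can be illustrated with a binomial tail. This is a simplification assumed for this sketch, not necessarily the computation behind the authors' metric D: if dummy traffic fills the pool with blocks drawn uniformly from the store, the number of "empty" blocks among the pool's n blocks is roughly Binomial(n, p), where p is the fraction of empty blocks in the store. All names and the example numbers are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def pool_plausibility(empty_in_pool, pool_size, empty_fraction_in_store):
    """Chance that dummy traffic alone leaves at least this many empty blocks in the pool."""
    return binomial_tail(empty_in_pool, pool_size, empty_fraction_in_store)


# Hypothetical numbers: half of the store looks empty, the pool holds 10 blocks.
typical = pool_plausibility(5, 10, 0.5)     # unremarkable pool
suspicious = pool_plausibility(10, 10, 0.5)  # every pool block looks empty
```

A small value means the pool contents are hard to explain with dummy traffic alone, which corresponds to the worst case on the slide where coercion follows a hidden file access.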
CONCLUSIONS AND OPEN QUESTIONS
o Conclusions
    o Hard to protect against traffic analysis, even using constant rate dummy traffic
    o Hard to conceal file accesses with dummy traffic that selects locations uniformly at random
    o When files occupy more blocks, access to them is harder to conceal
o Open questions
    o More sophisticated pattern recognition algorithms may extract more info from the sequence of accesses
    o Design of smarter traffic analysis strategies
    o Can such a system be implemented in practice?