An Analysis of Parallel Mixing with Attacker-Controlled Inputs
Nikita Borisov
formerly of UC Berkeley
Definitions
• “Parallel Mixing”
  - A latency optimization for synchronous re-encryption mixnets [Golle & Juels 2004]
• “Attacker-Controlled Inputs”
  - Inputs to a mixnet which can be linked to corresponding outputs
  - Either directly controlled by attackers or discovered through other means
• “Analysis”
  - Low anonymity if most inputs are known
  - If few inputs are known, anonymity loss can be amplified with repeated mixings
Synchronous Re-encryption Mixes
• Messages are mixed by all mix servers
• Re-encryption of each message under the same decryption key
[Figure: messages M1 to M4 pass through Mix 1, giving M’1 to M’4, then through Mix 2, giving M’’1 to M’’4]
Parallel Mixing
[Figure: parallel mixing of M1 to M4 across Mix 1 and Mix 2; phases labeled Rotation, Distribution, Rotation]
Properties
• Initial public permutation to assign inputs to batches
• T rotations, followed by 1 distribution, followed by T more rotations
• Defends against up to T dishonest mixes
• Latency is 2(T+1)*N/M re-encryptions
  - N: number of messages
  - M: number of mix servers
• Even with T=M-1, faster than conventional cascade with N*M re-encryptions (for M>2)
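The latency claim can be checked numerically. A minimal sketch, taking the two formulas from this slide at face value; the function names are illustrative and do not come from the talk.

```python
def parallel_latency(n, m, t):
    """Re-encryptions for parallel mixing, per the slide: 2(T+1)*N/M."""
    return 2 * (t + 1) * n / m


def cascade_latency(n, m):
    """Re-encryptions for a conventional cascade, per the slide: N*M."""
    return n * m


# Compare the two for the strongest setting, T = M-1.
n = 1008
for m in (2, 3, 4, 6, 12):
    print(m, parallel_latency(n, m, m - 1), cascade_latency(n, m))
```

With T=M-1 the parallel latency reduces to 2N, so the two schemes tie at M=2 and parallel mixing is ahead for any larger M.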
Attacker-Controlled Inputs
[Figure: parallel mixing diagram with inputs M1 to M4, Mix 1 and Mix 2, and labels 1 and 2]
Overview
• Introduction
• Analysis Methods
• Analysis Results
• Multiple-round analysis
• Open problems
• Conclusions
Theorem 1 Definitions
• (j) = # of known inputs in batch j ((1) = 1)
• (j’) = # of known outputs in batch j’ ((1) = 1)
• (j,j’) = # of known inputs in batch j matching outputs in batch j’ ((1,1) = 0)
[Figure: parallel mixing diagram with inputs M1 to M4, Mix 1 and Mix 2, and labels 1 and 2]
Theorem 1
[Figure: parallel mixing diagram with inputs M1 to M4, Mix 1 and Mix 2, and labels 1 and 2]
Pr[s1 -> s1] = (1-0)/((2-1)(2-1)) = 1
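The worked example is consistent with a general form (N/M^2 - known pairs) / ((N/M - known inputs) * (N/M - known outputs)), using N=4 and M=2. The sketch below assumes that form; it is reconstructed from the numbers on this slide rather than quoted from the theorem, and the function and parameter names are hypothetical.

```python
from fractions import Fraction


def link_probability(n, m, known_in_j, known_out_jp, known_pair):
    """Probability that an unknown input in batch j is a given unknown
    output in batch j'.

    Assumed form (inferred from the slide's example, not quoted from it):
        (N/M^2 - known_pair) / ((N/M - known_in_j) * (N/M - known_out_jp))
    """
    batch = Fraction(n, m)         # messages per batch
    per_pair = Fraction(n, m * m)  # messages sent from each input batch to each output batch
    unknown_in = batch - known_in_j
    unknown_out = batch - known_out_jp
    if unknown_in <= 0 or unknown_out <= 0:
        raise ValueError("no unknown inputs or outputs in the selected batches")
    return (per_pair - known_pair) / (unknown_in * unknown_out)


# The slide's example: N=4, M=2, one known input, one known output, no known pair.
print(link_probability(4, 2, 1, 1, 0))
```

With nothing known, the same function gives 1/N for every pair, which is the uniform case.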
Anonymity Metrics
• Anon [Golle and Juels ‘04]
• Entropy [SD’02, DSCP’02]
• Can compute either metric using Theorem 1
  - Need to know (j), (j’), and (j,j’) for each j,j’
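A sketch of how either metric could be computed for a single input once its link probabilities are known. Two assumptions are made here and should be checked against the cited papers: Anon is read as the inverse of the largest link probability, and Entropy as the Shannon entropy in bits. The function names are illustrative.

```python
import math


def anon_metric(probs):
    """Inverse of the largest link probability (assumed reading of Anon)."""
    return 1.0 / max(probs)


def entropy_bits(probs):
    """Shannon entropy, in bits, of the distribution over candidate outputs."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


uniform = [1 / 8] * 8            # eight equally likely outputs
skewed = [0.5] + [0.5 / 7] * 7   # one output is far more likely than the rest

print(anon_metric(uniform), entropy_bits(uniform))
print(anon_metric(skewed), entropy_bits(skewed))
```

Under these readings Anon reacts only to the single most likely output, while entropy reflects the whole distribution.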
Scenarios
• Given a scenario:
  - # of known inputs
  - Distribution of known inputs among input batches
  - Distribution of known outputs among output batches
• We can compute:
  - (j), (j’), and (j,j’)
  - Anonymity metrics
• What’s a typical scenario?
  - Distribution of anonymity metrics
Combinatorial Enumeration
• Given # of known inputs, enumerate through all scenarios
  - All initial permutations
  - All mix shuffle choices
• Compute (j), (j’), and (j,j’) for each possibility
• Improvements:
  - Partition states into equivalence classes
  - Combinatorial enumeration
3 Mixes, 18 Inputs
[Figure: Anon metric (y-axis, 0 to 20) versus # of unknown inputs (x-axis, 1 to 18); series: Optimal, First quartile, Median, Third quartile]
17311151454831150294756284149883771654705774592000000000000000000000 possible scenarios
Sampling
• Full enumeration still impractical for large systems
• Instead, we use sampling:
  - Given a # of known inputs, simulate a random scenario
  - Compute (j), (j’), and (j,j’) and anonymity metrics
  - Repeat
• Get a sampled distribution of metrics
  - Misses the tail of distribution, but we don’t care
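The sampling loop might look like the sketch below. It is not the author's simulator and rests on stated assumptions: N is divisible by M^2, the distribution step sends N/M^2 messages from each input batch to each output batch, a random scenario amounts to choosing the known messages uniformly among those slots, and Anon is the inverse of the largest link probability. All names are hypothetical.

```python
import random


def sample_anon(n, m, known, rng):
    """Simulate one random scenario and return the (assumed) Anon metric.

    Requires n divisible by m*m and known < n.
    """
    batch, per_pair = n // m, n // (m * m)
    # One slot per message: (input batch, output batch).
    slots = [(j, jp) for j in range(m) for jp in range(m) for _ in range(per_pair)]
    pair = [[0] * m for _ in range(m)]   # known pairs per (j, j')
    for j, jp in rng.sample(slots, known):
        pair[j][jp] += 1
    known_in = [sum(row) for row in pair]
    known_out = [sum(pair[j][jp] for j in range(m)) for jp in range(m)]
    best = 0.0                           # largest link probability seen
    for j in range(m):
        for jp in range(m):
            unknown_in = batch - known_in[j]
            unknown_out = batch - known_out[jp]
            if unknown_in > 0 and unknown_out > 0:
                p = (per_pair - pair[j][jp]) / (unknown_in * unknown_out)
                best = max(best, p)
    return 1.0 / best


# 1008 inputs, 12 mixes, 100 unknown inputs: report sampled quartiles.
rng = random.Random(0)
samples = sorted(sample_anon(1008, 12, 908, rng) for _ in range(200))
print(samples[50], samples[100], samples[150])
```

Sorting the samples gives the quartiles directly, and the optimal value for comparison is the number of unknown inputs.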
1008 Inputs, 900 unknown
[Figure: # of samples (y-axis, 0 to 1000) versus Anon metric (x-axis, 650 to 900); series: 2 mixes, 3 mixes, 4 mixes, 6 mixes, 12 mixes]
1008 inputs, 100 unknown
[Figure: # of samples (y-axis, 0 to 1000) versus Anon metric (x-axis, 0 to 30); series: 12 mixes]
Multiple-Round Analysis
• Anonymity may be short of optimal, but with Anon > 10, who cares?
• Consider repeated mixing of the same inputs
  - Unlikely to happen with e-voting
  - Likely if parallel mixing used for TCP forwarding
• Each mixing is a new, random observation
  - Reveals new information each time
• Over time, input-output correspondence identified w.h.p.
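One way to picture the accumulation of evidence is the toy simulation below. It is a sketch under assumptions, not the analysis from the talk: all mixes are honest, the same inputs are known in every round, each round is an independent random scenario, and the attacker scores each candidate output by summing the log of the assumed link probability (N/M^2 - known pairs) / ((N/M - known inputs) * (N/M - known outputs)). All names are hypothetical.

```python
import math
import random


def guess_after_rounds(n, m, known, rounds, rng):
    """Return True if the attacker's top-scoring candidate is the target's true output.

    Messages 0..known-1 are known; message `known` is the target.
    Requires n divisible by m*m and known < n.
    """
    batch, per_pair = n // m, n // (m * m)
    slots = [(j, jp) for j in range(m) for jp in range(m) for _ in range(per_pair)]
    target = known
    score = {c: 0.0 for c in range(known, n)}  # log-likelihood per candidate output
    for _ in range(rounds):
        rng.shuffle(slots)  # slots[i] = (input batch, output batch) of message i
        pair = [[0] * m for _ in range(m)]
        for j, jp in slots[:known]:
            pair[j][jp] += 1
        known_in = [sum(row) for row in pair]
        known_out = [sum(pair[j][jp] for j in range(m)) for jp in range(m)]
        j = slots[target][0]  # batch the target input was assigned to
        for c in score:
            jp = slots[c][1]  # batch where candidate output c appeared
            p = (per_pair - pair[j][jp]) / ((batch - known_in[j]) * (batch - known_out[jp]))
            score[c] += math.log(p) if p > 0 else -math.inf
    top = max(score.values())
    winners = [c for c in score if score[c] == top]
    return rng.choice(winners) == target  # break ties at random


rng = random.Random(0)
for rounds in (1, 50, 200):
    hits = sum(guess_after_rounds(18, 3, 9, rounds, rng) for _ in range(200))
    print(rounds, hits / 200)
```

Ties are broken at random so that the target gains no advantage from its position among the candidates.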
Repeated mixing with 500 unknown inputs
• Note: all mixes here are honest!
[Figure: Probability (y-axis, 0 to 1) versus # of repetitions (x-axis, 1 to 476); series: Correct guess, Incorrect guess]
Conclusions
• Parallel mixing reveals information when attackers control some inputs
  - Big problem if most inputs are controlled
  - When fewer inputs are known, repeated mixings may still be a problem
• This problem exists even if all mixes are honest
• Statistical approximations should be checked by simulations