The Evolution of Protocol Security and Insecurity
John A Clark
Dept. of Computer Science
University of York
Canterbury 19.02.2002
Overview
• Motivation
• Introduction to heuristic optimisation techniques
• Part I: making security protocols
• Part II: breaking protocols based on NP-hardness
Motivation
• Search techniques such as simulated annealing and genetic algorithms have proved hugely successful across many domains: a major success story of computer science.
• They have seen little application to cryptology:
  • most work has been concerned with breaking classical permutation and substitution ciphers (easy)
  • very little application to modern-day cryptology (hard)
• I want to attack this lack of interest systematically.
• Aim to show possibilities across various domains, from bit twiddling to high levels of abstraction.
Heuristic Optimisation
(Local search via simulated annealing as an example)
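Local Optimisation - Hill Climbing
[Figure: z(x) plotted against x, with the points $x_0$, $x_1$, $x_2$, $x_3$ and the global optimum $x_{opt}$ marked.]
Neighbourhood of a point x might be N(x) = {x+1, x-1}.
Hill-climb goes $x_0 \to x_1 \to x_2$ since $f(x_0) < f(x_1) < f(x_2) > f(x_3)$, and gets stuck at $x_2$ (a local optimum).
Really want to obtain $x_{opt}$.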
Simulated Annealing
[Figure: z(x) plotted against x, showing a search path through the points $x_0$ to $x_{13}$.]
Allows non-improving moves so that it is possible to go down in order to rise again to reach the global optimum.
In practice the neighbourhood may be very large and the trial neighbour is chosen randomly. It is possible to accept a worsening move even when improving ones exist.
Simulated Annealing
• Improving moves are always accepted.
• Non-improving moves may be accepted probabilistically, in a manner depending on the temperature parameter T. Loosely:
  • the worse the move, the less likely it is to be accepted
  • a worsening move is less likely to be accepted the cooler the temperature
• The temperature T starts high and is gradually cooled as the search progresses.
• Initially virtually anything is accepted; at the end only improving moves are allowed (and the search effectively reduces to hill-climbing).
Simulated Annealing
Current candidate x. Minimisation formulation.

    x = x0
    Temp = T0
    Until frozen do
        Do 400 times
            y = generateNeighbour(x)
            Δ = f(x) - f(y)
            if (Δ > 0) x = y (accept)
            else if (exp(Δ/Temp) > U(0,1)) x = y (accept)
            else reject
        Temp = Temp × 0.95
    Solution is best so far

• At each temperature consider 400 moves.
• Always accept improving moves.
• Accept worsening moves probabilistically.
• Gets harder to do this the worse the move.
• Gets harder as Temp decreases.
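The loop above can be sketched in Python roughly as follows. This is a minimal illustration rather than the speaker's implementation: the function name `anneal`, its parameters, and the "frozen" test (temperature falling below `t_frozen`) are assumptions made for the example.

```python
import math
import random

def anneal(cost, neighbour, x0, t0=100.0, alpha=0.95, moves_per_temp=400,
           t_frozen=1e-5, rng=None):
    """Minimise cost(x) by simulated annealing; returns (best_x, best_cost).

    Assumption: the search counts as "frozen" once temp <= t_frozen.
    """
    rng = rng or random.Random()
    current, current_cost = x0, cost(x0)
    best, best_cost = current, current_cost
    temp = t0
    while temp > t_frozen:
        for _ in range(moves_per_temp):
            y = neighbour(current, rng)
            y_cost = cost(y)
            delta = current_cost - y_cost  # > 0 means y improves on current
            # Improving moves are always accepted; worsening moves are accepted
            # with probability exp(delta / temp), which shrinks as the move gets
            # worse and as the temperature falls.
            if delta > 0 or math.exp(delta / temp) > rng.random():
                current, current_cost = y, y_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temp *= alpha  # geometric cooling
    return best, best_cost

# Toy usage: minimise (x - 7)^2 over the integers with neighbourhood {x-1, x+1}.
best, best_cost = anneal(lambda x: (x - 7) ** 2,
                         lambda x, r: x + r.choice((-1, 1)),
                         x0=50, rng=random.Random(0))
print(best, best_cost)
```

Tracking the best solution seen separately from the current one matches the final line of the pseudocode ("Solution is best so far").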
Simulated Annealing: Temperature cycle
• Start at T = 100 and do 400 trial moves.
• Cool with T = 0.95 × T and do 400 trial moves; repeat.
• Continue until T = 0.00001, doing 400 trial moves at each temperature.
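As a rough check on how much work this schedule implies, the number of temperature cycles can be counted directly. The sketch assumes that the search stops once T has fallen to 0.00001 or below; the function name is illustrative.

```python
def temperature_cycles(t0=100.0, alpha=0.95, t_final=1e-5):
    """Count the temperatures visited while cooling geometrically from t0.

    Assumption: the search stops as soon as the temperature is <= t_final.
    """
    cycles = 0
    temp = t0
    while temp > t_final:
        cycles += 1
        temp *= alpha
    return cycles

cycles = temperature_cycles()
print(cycles, cycles * 400)  # number of cycles, total trial moves
```

Under these assumptions a single run performs a little over a hundred thousand trial moves.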
Making Protocols with Heuristic Optimisation
Security Protocols
• Examples:
  • secure session key exchange
  • “I am alive” protocols
  • various electronic transaction protocols
• Probably the highest-profile area of academic security research.
• Problems: rather hard to get right. “We cannot even get three-line programs right.”
• Major impetus was given to the area by Burrows, Abadi and Needham’s belief logic, “BAN logic”.
BAN Logic
• Allows the assumptions and goals of a protocol to be stated abstractly in a belief logic.
• Messages contain beliefs actually held by the sender.
• Rules govern how a receiver may legitimately update his belief state when he receives a message.
• Protocols are series of messages. At the end of the protocol the belief states of the principals should contain the goals.
BAN Logic: Basic elements
• $P, Q$: stand for arbitrary protocol principals.
• $P \stackrel{K}{\leftrightarrow} Q$: K is a good key for communicating between P and Q.
• $N_P$: a well-typed ‘nonce’, a number to be used only once in the current protocol run, e.g. a randomly generated number used as a challenge.
• $\#(N_P)$: $N_P$ is ‘fresh’, meaning that it really is a valid ‘nonce’.
BAN Logic
• $P \mid\equiv X$: P believes X. The general idea is that principals should only issue statements they actually believe. Thus, P might have believed that the number Na was fresh yesterday and said so, but it would be wrong to conclude that he believes it now. If the message is recent (see later) then we might conclude he believes it.
• $P \mid\sim X$: P once said X, i.e. has issued a message containing X at some point.
• $P \mid\Rightarrow X$: P has jurisdiction over X. This captures the notion that P is an authority about the statement X. If you believe P believes X and you trust him on the matter, then you should believe X too (see later).
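BAN Logic - Assumptions and Goals
Assumptions:
• $A \mid\equiv A \stackrel{K_{as}}{\leftrightarrow} S$ and $S \mid\equiv A \stackrel{K_{as}}{\leftrightarrow} S$: A and S share a common belief in the goodness of the key Kas and so they can use it to communicate.
• $S \mid\equiv A \stackrel{K_{ab}}{\leftrightarrow} B$: S also believes that the key Kab is a good session key for A and B.
• $A \mid\equiv \#(N_a)$: A has a number Na that he also believes is fresh.
• $A \mid\equiv S \mid\Rightarrow A \stackrel{K_{ab}}{\leftrightarrow} B$: A believes that S is the authority on statements about the goodness of key Kab.
Goal:
• $A \mid\equiv A \stackrel{K_{ab}}{\leftrightarrow} B$: the goal of the protocol is to get A to believe the key Kab is good for communication with B.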
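BAN Logic - Message Meaning Rule
If P sees X encrypted using key K, $P \triangleleft \{X\}_K$, and P believes that key K is shared securely only with principal Q, $P \mid\equiv P \stackrel{K}{\leftrightarrow} Q$, then P should believe that Q once uttered or ‘once said’ X, $P \mid\equiv Q \mid\sim X$:
$$\frac{P \mid\equiv P \stackrel{K}{\leftrightarrow} Q, \quad P \triangleleft \{X\}_K}{P \mid\equiv Q \mid\sim X}$$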
BAN Logic - Nonce Verification Rule
If P believes that Q once said X, $P \mid\equiv Q \mid\sim X$, and P believes that X is ‘fresh’, $P \mid\equiv \#(X)$, then P should believe that Q currently believes X, $P \mid\equiv Q \mid\equiv X$:
$$\frac{P \mid\equiv Q \mid\sim X, \quad P \mid\equiv \#(X)}{P \mid\equiv Q \mid\equiv X}$$
This rule promotes ‘once saids’ to actual beliefs.
BAN Logic - Jurisdiction Rule
If P believes that Q has jurisdiction over X, $P \mid\equiv Q \mid\Rightarrow X$, and P believes Q believes X, $P \mid\equiv Q \mid\equiv X$, then P should believe X too, $P \mid\equiv X$:
$$\frac{P \mid\equiv Q \mid\Rightarrow X, \quad P \mid\equiv Q \mid\equiv X}{P \mid\equiv X}$$
• Jurisdiction captures the notion of being an authority.
• A typical use would be to give a key server authority over statements of belief about keys.
• If I believe that a key is good and you reckon I am an authority on such matters then you should believe the key is good too.
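The three rules can be chained mechanically. The sketch below is one possible encoding, not the speaker's tool: formulas are nested tuples, a belief state is a set of formulas, and all function names are illustrative. To stay within the three rules shown, it assumes the receiver already believes the received statement itself is fresh; in a full analysis that freshness would normally come from a nonce carried in the same message.

```python
def key(k, p, q):
    """Formula: k is a good key for communication between p and q."""
    return ("key", k, frozenset((p, q)))

def message_meaning(me, beliefs, k, x):
    """me sees {x}_k and believes k is shared with Q => me believes Q once said x."""
    new = set()
    for b in beliefs:
        if b[0] == "key" and b[1] == k and me in b[2]:
            for q in b[2] - {me}:
                new.add(("said", q, x))
    return new

def nonce_verification(beliefs):
    """Q once said X and X is fresh => Q believes X."""
    return {("believes", b[1], b[2]) for b in beliefs
            if b[0] == "said" and ("fresh", b[2]) in beliefs}

def jurisdiction(beliefs):
    """Q believes X and Q has jurisdiction over X => X."""
    return {b[2] for b in beliefs
            if b[0] == "believes" and ("controls", b[1], b[2]) in beliefs}

def receive(me, beliefs, k, x):
    """Update me's belief state after seeing x encrypted under k."""
    state = set(beliefs) | message_meaning(me, beliefs, k, x)
    while True:
        new = (nonce_verification(state) | jurisdiction(state)) - state
        if not new:
            return state
        state |= new

goal = key("Kab", "A", "B")
a_beliefs = {
    key("Kas", "A", "S"),     # A believes Kas is good for A and S
    ("fresh", goal),          # assumption made for this sketch only
    ("controls", "S", goal),  # A believes S has jurisdiction over the key statement
}
after = receive("A", a_beliefs, "Kas", goal)
print(goal in after)
```

Each pass of the loop corresponds to one rule application, so the final belief state records every intermediate conclusion ('once said', 'believes') as well as the goal.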
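Messages as Integer Sequences
Say 3 principals P, Q and S, numbered P=0, Q=1, S=2.
Message components are beliefs in the sender’s current belief state (and so if P has 5 beliefs, integers are interpreted modulo 5).
Belief state of P:
  0: $P \stackrel{K}{\leftrightarrow} Q$
  1: $\#(N_P)$
  2: $N_P$
  3: $Q \mid\sim X$
  4: null
Example message:
  sender     21    21 mod 3 = 0, i.e. P
  receiver   19    19 mod 3 = 1, i.e. Q
  Belief_1    8     8 mod 5 = 3, i.e. $Q \mid\sim X$
  Belief_2   12    12 mod 5 = 2, i.e. $N_P$
Decoded message: P → Q : $N_P$, $Q \mid\sim X$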
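A small decoder makes the modular interpretation concrete. The field order (sender, receiver, then belief indices) and the function name are assumptions for illustration; the slide only fixes which integer plays which role.

```python
def decode_message(ints, principals, belief_states):
    """Interpret a list of integers as one protocol message.

    Assumed layout: [sender, receiver, belief_1, belief_2, ...].
    Principal fields are taken modulo the number of principals; belief fields
    are taken modulo the size of the sender's current belief state.
    """
    sender = principals[ints[0] % len(principals)]
    receiver = principals[ints[1] % len(principals)]
    held = belief_states[sender]
    components = [held[i % len(held)] for i in ints[2:]]
    return sender, receiver, components

principals = ["P", "Q", "S"]
belief_states = {"P": ["P <-K-> Q", "#(Np)", "Np", "Q |~ X", "null"]}
print(decode_message([21, 19, 8, 12], principals, belief_states))
```

Because every integer is reduced modulo the relevant range, any integer sequence decodes to some well-formed message, which is what lets a search move change integers freely.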
Search Strategy
• We can now interpret sequences of integers as valid protocols.
• Interpret each message in turn, updating belief states after each message.
• This is the execution of the abstract protocol.
• Every protocol achieves something! The issue is whether it is something we want!
• We also have a move strategy for the search, e.g. just randomly change an integer element.
• This can change the sender, receiver or specific belief of a message (and indeed subsequent ones).
Fitness Function
• We need a fitness function to capture the attainment of goals.
• Could simply count the number of goals attained at the end of the protocol.
• In practice this is awful. A protocol that achieves a goal after 6 messages would be ‘as good as’ one that achieved a goal after 1 message.
• Much better to reward the early attainment of goals in some way.
• Have investigated a variety of strategies.
Fitness Functions
The fitness of a protocol is given by
$$\mathrm{Fitness}(protocol) = \sum_{mess=1}^{m} w_{mess} \times \mathrm{GoalsAchievedAfter}(mess)$$
One strategy (uniform credit) would be to make all the weights the same. Note that credit is cumulative. A goal achieved after the first message is also achieved after the second and third and so on.
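The effect of cumulative credit is easy to see numerically. In this sketch the helper names and the representation (the message number at which each goal is first achieved) are assumptions for illustration.

```python
def cumulative_goals(first_achieved, n_messages):
    """GoalsAchievedAfter(k) for k = 1..n_messages.

    first_achieved lists, for each goal that is attained, the number of the
    message after which it first holds.
    """
    return [sum(1 for t in first_achieved if t <= k)
            for k in range(1, n_messages + 1)]

def fitness(first_achieved, weights):
    """Weighted sum of the cumulative goal counts after each message."""
    counts = cumulative_goals(first_achieved, len(weights))
    return sum(w * g for w, g in zip(weights, counts))

uniform = [1] * 6                # uniform credit over a 6-message protocol
print(fitness([1], uniform))     # goal reached after message 1
print(fitness([6], uniform))     # same goal reached only after message 6
```

With uniform weights the early protocol scores 6 and the late one scores 1, so early attainment is rewarded without any special-case logic.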
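Examples
1. A → S : $\{N_a\}_{K_{as}}$
2. S → A : $\{A \mid\sim N_a,\ A \stackrel{K_{ab}}{\leftrightarrow} B\}_{K_{as}}$
3. B → S : $\{N_b\}_{K_{bs}}$
4. S → B : $\{A \mid\sim N_a,\ B \mid\sim N_b,\ A \stackrel{K_{ab}}{\leftrightarrow} B\}_{K_{bs}}$
5. B → A : $\{A \mid\sim N_a,\ N_b,\ A \stackrel{K_{ab}}{\leftrightarrow} B\}_{K_{ab}}$
6. A → B : $\{B \mid\sim N_b,\ A \stackrel{K_{ab}}{\leftrightarrow} B\}_{K_{ab}}$
One of the assumptions made was that B would take S’s word on whether $A \mid\sim N_a$.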
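Examples
1. A → S : $\{N_a\}_{K_{as}}$
2. B → S : $\{N_b\}_{K_{bs}}$
3. S → A : $\{A \mid\sim N_a,\ B \mid\sim N_b,\ A \stackrel{K_{ab}}{\leftrightarrow} B\}_{K_{as}}$
4. S → B : $\{B \mid\sim N_b,\ A \mid\sim N_a,\ A \stackrel{K_{ab}}{\leftrightarrow} B\}_{K_{bs}}$
5. A → B : $\{B \mid\sim N_b,\ N_a,\ A \stackrel{K_{ab}}{\leftrightarrow} B\}_{K_{ab}}$
6. B → A : $\{A \mid\sim N_a,\ A \stackrel{K_{ab}}{\leftrightarrow} B\}_{K_{ab}}$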
General Observations
• Able to generate protocols whose abstract executions are proofs of their own correctness.
• Have done so for protocols requiring up to 9 messages to achieve the required goals.
• Another method for protocol synthesis is search via model checking: exhaustive but limited to short protocols.
• Limited by the power of the logic used.
• Can generalise the notion of fitness function to include aspects other than correctness (e.g. amount of encryption).
General Observations
• In a sense there is a notion of progress implicit in the idea of a protocol. Gradually a protocol moves towards its eventual goals.
• Seems sensible to adopt a guided search rather than an enumerative type of search.
• Nothing to stop you using model checking as an analysis technique after generating examples using guided search.
• Generally capable of generating example protocols in under a minute (1.8 GHz PC).
• Real need to increase the power of the logic.
• Believe that this is the most abstract application of heuristic search in cryptology.
Breaking Protocols with Heuristic Optimisation
Identification Problems
• Notion of zero-knowledge introduced by Goldwasser and Micali (1985).
• Indicate that you have a secret without revealing it.
• Early scheme by Shamir.
• Several schemes of late based on NP-complete problems:
  • Permuted Kernel Problem (Shamir)
  • Syndrome Decoding (Stern)
  • Constrained Linear Equations (Stern)
  • Permuted Perceptron Problem (Pointcheval)
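Pointcheval’s Perceptron Schemes
Interactive identification protocols based on an NP-complete problem.
Perceptron Problem (PP).
Given an $m \times n$ matrix with entries $a_{ij} = \pm 1$:
$$A_{m \times n} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
Find
$$S_{n \times 1} = \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{pmatrix}, \quad s_j = \pm 1$$
So That
$$A_{m \times n} S_{n \times 1} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{pmatrix} \ge \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$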
Pointcheval’s Perceptron Schemes
Permuted Perceptron Problem (PPP). Make the problem harder by imposing an extra constraint.
Given an $m \times n$ matrix $A_{m \times n}$ with entries $a_{ij} = \pm 1$.
Find $S_{n \times 1} = (s_1, s_2, \ldots, s_n)^T$ with $s_j = \pm 1$.
So That $A_{m \times n} S_{n \times 1} = (w_1, w_2, \ldots, w_m)^T$ has a particular histogram H of positive values (1, 3, 5, ...).
Example: Pointcheval’s Scheme
PP and PPP example. Every PPP solution is a PP solution.
[Example: a 4 × 5 matrix A and a secret S, both with ±1 entries, whose image AS has the elements 5, 1, 1 and 3.]
The image has a particular histogram H of positive values 1, 3, 5:
$$H = (h(1), h(3), h(5)) = (2, 1, 1)$$
Generating Instances
Suggested method of generation:
• Generate random matrix A
• Generate random secret S
• Calculate AS
• If any $(AS)_i < 0$ then negate the ith row of A
[Worked example: a 4 × 5 matrix A with ±1 entries and a secret S; after the rows with negative image elements are negated, the image AS has the elements 5, 1, 1 and 3.]
Significant structure in this problem: high correlation between the majority values of matrix columns and the corresponding secret bits.
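The generation steps can be sketched directly. The function names and the use of Python lists are choices made for this example; n is taken to be odd, as in the recommended sizes, so that no image element can be zero.

```python
import random

def image(matrix, y):
    """Matrix-vector product A y, as a list of integers."""
    return [sum(a * v for a, v in zip(row, y)) for row in matrix]

def generate_instance(m, n, rng):
    """Random m x n PP instance with a planted +/-1 secret (n assumed odd)."""
    secret = [rng.choice((-1, 1)) for _ in range(n)]
    matrix = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    for row in matrix:
        # Negate any row whose dot product with the secret is negative.
        if sum(a * s for a, s in zip(row, secret)) < 0:
            row[:] = [-a for a in row]
    return matrix, secret

A, S = generate_instance(101, 117, random.Random(1))
W = image(A, S)
print(min(W), max(W))
```

Negating a row only flips the sign of its image element, so after this step every element of AS is positive and the planted secret is a valid PP solution.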
Instance Properties
• Each matrix row/secret dot product is the sum of n Bernoulli (+1/-1) variables.
• Initial image histogram has Binomial shape and is symmetric about 0 (values ..., -7, -5, -3, -1, 1, 3, 5, 7, ...).
• After negation it simply folds over to be positive (values 1, 3, 5, 7, ...).
• Image elements tend to be small.
PP Using Search: Pointcheval
• Pointcheval couched the Perceptron Problem as a search problem.
• Neighbourhood defined by single bit flips on the current solution Y: for a 5-element Y, the neighbours $Y_1, \ldots, Y_5$ each differ from Y in exactly one element.
• Cost function punishes any negative image components. For example, if the image AY has negative components -1 and -3 then costNeg(y) = |-1| + |-3| = 4.
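A minimal sketch of this neighbourhood and cost function follows. The small matrix is made up for the example (it is not the one on the slide) and is chosen so that the image has the negative components -3 and -1; the function names are illustrative.

```python
def image(matrix, y):
    """Matrix-vector product A y, as a list of integers."""
    return [sum(a * v for a, v in zip(row, y)) for row in matrix]

def cost_neg(matrix, y):
    """Sum of the magnitudes of the negative image components."""
    return sum(-w for w in image(matrix, y) if w < 0)

def neighbours(y):
    """All vectors obtained from y by flipping a single element."""
    for i in range(len(y)):
        yield y[:i] + [-y[i]] + y[i + 1:]

A = [[1, 1, -1],
     [-1, -1, -1],
     [-1, -1, 1],
     [1, -1, 1]]
y = [1, 1, 1]
print(image(A, y), cost_neg(A, y))
print([cost_neg(A, z) for z in neighbours(y)])
```

A solution of the Perceptron Problem is exactly a vector with cost 0, so the search is a minimisation of `cost_neg`.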
Using Annealing: Pointcheval
• A PPP solution is also a PP solution.
• Based estimates of cracking PPP on the ratio of PP solutions to PPP solutions.
• Calculated the sizes of matrix for which this should be most difficult: gave rise to (m,n) = (m,m+16).
• Recommended (m,n) = (101,117), (131,147), (151,167).
• Gave estimates for the number of years needed to solve PPP using annealing as the means of finding PP solutions.
• Instances with matrices of size 200 ‘could usually be solved within a day’.
• But no PPP problem instance greater than 71 was ever solved this way ‘despite months of computation’.
Perceptron Problem (PP)
• Knudsen and Meier approach (loosely):
  • Carry out sets of runs.
  • Note where the results obtained all agree.
  • Fix those elements where there is complete agreement and carry out a new set of runs, and so on.
• If repeated runs give the same values for particular bits, the assumption is that those bits are actually set correctly.
• Used this sort of approach to solve instances of the PP problem up to 180 times faster than Pointcheval for the (151,167) problem, but no upper bound was given on the sizes achievable.
Profiling Annealing
• Approach is not without its problems.
• Not all bits that have complete agreement are correct.
[Figure: the actual secret compared bit by bit with the results of Run 1 to Run 6, marking positions where all runs agree and positions where all agree (wrongly).]
Knudsen and Meier
• Have used this method to attack PPP problem sizes (101,117).
• Needs a hefty enumeration stage (to search for wrong bits), allowed up to $2^{64}$ search complexity.
• Used a new cost function, w1=30, w2=1, with histogram punishment:

cost(y) = w1 costNeg(y) + w2 costHist(y)

Example: with hist(s) = (2,1,1) for the secret and hist(y) = (3,0,0) for the image Ay of the current solution,

costHist(y) = |3-2| + |0-1| + |0-1| = 3
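The combined cost can be sketched as below. The function names are illustrative, and two details are assumptions: histograms are kept as dictionaries from image value to count, and only positive image values are counted in the histogram. The example image is made up so that its histogram is (3,0,0) as on the slide.

```python
from collections import Counter

def cost_neg(img):
    """Sum of the magnitudes of the negative image components."""
    return sum(-w for w in img if w < 0)

def cost_hist(img, target_hist):
    """Sum of absolute differences between actual and target histograms.

    Assumption: only positive image values are counted.
    """
    actual = Counter(w for w in img if w > 0)
    values = set(actual) | set(target_hist)
    return sum(abs(actual.get(v, 0) - target_hist.get(v, 0)) for v in values)

def cost(img, target_hist, w1=30, w2=1):
    """Weighted sum of negativity and histogram punishments."""
    return w1 * cost_neg(img) + w2 * cost_hist(img, target_hist)

target = {1: 2, 3: 1, 5: 1}   # hist(s) = (h(1), h(3), h(5)) = (2, 1, 1)
img = [1, 1, 1, -1]           # made-up image with hist(y) = (3, 0, 0)
print(cost_hist(img, target), cost(img, target))
```

With w1 much larger than w2, removing negative components dominates the search and the histogram term mainly discriminates between candidates that are already PP solutions or close to it.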
Analogy Time I: Encryption
Key
Plaintext P
Ciphertext C
The Black Box Assumption – essentially considering encryption only as a mathematical function.
In the public arena only really challenged in the 90’s when attacks based on physical implementation arrived
• Paul Kocher’s Timing Attacks
• Simple Power Analysis
• Differential Power Analysis
• Fault Injection Attacks (Bellcore, and others)
The computational dynamics of the implementation can leak vast amounts of information
Analogy Time II: Annealing
Initialisation data
Problem P
Final Solution C
The Black Box Assumption – virtually every application of annealing simply throws the technique at problem and awaits the final output.
Is this really the most efficient use of information?
Let’s look inside the box…..
Analogy Time III: Internal Computational Dynamics
Initialisation data
Problem P, e.g. minimise cost(y,A,Hist)
Final Solution C
The algorithm carries out 100 000s of cost function evaluations which guide the search.
Why did it take the path it did?
Bear in mind the whole search process is public and so we can monitor it.
Analogy Time IV: Fault Injection
Initialisation data
Warped or Faulty Problem P’
Final Solution C’
Invariably people assume you need to solve the problem at hand.
Reflected in ‘well-motivated’ or direct cost functions.
What happens if we inject a ‘fault’ into the process? Mutate the problem into a similar but different one.
Can we make use of the solutions obtained to help solve original problem?
PP Move Effects
• What limits the ability of annealing to find a PP solution?
• A move changes a single element of the current solution.
• Want current negative image values to go positive.
• But changing a bit to cause negative values to go positive will often cause small positive values to go negative.
[Figure: histograms of the image values $W_i = (AY)_i$ over 0 to 7 before and after a move; a single bit flip changes each image value to $W_i' = W_i \pm 2$.]
Problem Fault Injection
• Can significantly improve results by punishing at a positive value K.
• For example, punish any value less than K=4 during the search.
• Drags the elements away from the boundary during search.
• Also use the square of differences $|W_i - K|^2$ rather than the simple deviation.
[Figure: histogram of the image values W = AY over 0 to 7.]
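The warped cost amounts to a one-line change to the negativity punishment. In this sketch the function name and the `power` parameter are illustrative; K=0 with power 1 recovers the original costNeg.

```python
def cost_below_k(img, k=4, power=2):
    """Punish every image value below k by (k - w) ** power."""
    return sum((k - w) ** power for w in img if w < k)

img = [5, 1, -1, 3]
print(cost_below_k(img, k=0, power=1))  # original costNeg: only -1 is punished
print(cost_below_k(img, k=4, power=2))  # warped cost: 1, -1 and 3 are punished
```

Small positive values such as 1 now contribute to the cost, which is what drags image elements away from the boundary at 0 during the search.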
Problem Fault Injection
Comparative results:
• Generally allows solution within a few runs of annealing for sizes (201,217).
• Number of bits correct is generally worst when K=0.
• Best value for K varies between sizes (but can do profiling to test what it is).
• Has proved possible to solve for size (601,617) and higher.
• Enormous increase in power for essentially a change to one line of the program:
  • using powers of 2 rather than just modulus
  • use of K factor
Morals…
• Small changes may make a big difference. The real issue is how the cost function and the search technique interact.
• The cost function need not be the most ‘natural’ direct expression of the problem to be solved. Cost functions are a means to an end.
• This is a form of fault injection on the problem.
Profiling Annealing
• But look again at the cost function template:

cost(x) = w1 costNeg(x) + w2 costHist(x)

• Different weights w1 and w2 will give different results, yet the resulting cost functions seem plausibly well-motivated.
• We can view different choices of weights as different viewpoints on the problem.
• Now carry out runs using the different cost functions.
Radical Viewpoint Analysis
Problem P
Problem P1 Problem P2 Problem Pn-1 Problem Pn
Essentially create mutant problems and attempt to solve them.
If the solutions agree on particular elements then they generally will do so for a reason, generally because they are correct.
Can think of mutation as an attempt to blow the search away from actual original solution
Faults are Good
• Consequence is that warped problems typically give rise to solutions with more agreement with the original secret than non-warped ones.
• For example:
  • (101,117): up to 108 bits correct
  • (131,147): up to 139 bits correct
  • (151,167): up to 157 bits correct
Profiling Annealing: Timing
• But this throws away a lot of information: better to monitor the search process as it cools down.
• Based on the notion of thermostatistical annealing.
• Watch the elements of the secret vector as the search proceeds.
• Record the temperature cycle at which the last change to an element’s value occurs, i.e. +1 to -1 or vice versa.
• At the end of the search all elements are fixed.
• Analysis shows that some elements will take some values early in the search and then never subsequently change. They get ‘stuck’ early in the search.
• The ones that get stuck early often do so for good reason: they are the correct values.
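Recording sticking times only needs a snapshot of the current solution at the end of each temperature cycle. The sketch below assumes such a list of snapshots is available; the function name and the convention that an element which never changes has sticking time 0 are choices made for the example.

```python
def sticking_times(history):
    """For each element, the index of the last cycle at which its value changed.

    history[t] is the current solution at the end of temperature cycle t.
    An element that never changes after cycle 0 gets sticking time 0.
    """
    n = len(history[0])
    stuck_at = [0] * n
    for t in range(1, len(history)):
        for i in range(n):
            if history[t][i] != history[t - 1][i]:
                stuck_at[i] = t
    return stuck_at

history = [
    [1, 1, -1],
    [1, -1, -1],
    [1, -1, 1],
    [1, -1, 1],
]
print(sticking_times(history))
```

Elements with the smallest sticking times are the candidates for bits that were fixed early and, according to the slides, are often correct.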
Profiling Annealing: Timing
• Tested 30 PPP instances (101,117) with 32 different strategies (different weights wi for negativity and histogram component costs and different values of K). Ten runs at each strategy.
• Maximum number of initial bits fixed at correct values:

  Bits correct   <40   40-49   50-59   60-69   70-79
  Instances        2      10      14       2       2

• Some strategies far better than others.
• Channel is highly volatile, hence the need for repeated runs.
• For small K the minimum number of bits correct in the final solution is radically worse than for larger values of K. Also initial bits correct is radically worse.
Profiling Annealing: Timing
• Tested 30 PPP instances (151,167) with 16 different strategies (different weights wi for negativity and histogram component costs and different values of K). Ten runs at each strategy.
• Maximum number of initial bits fixed at correct values:

  Bits correct   <40   40-49   50-59   60-69   70-79   80+
  Instances        1       5       9      11       2     2

• Similar general results as before.
• Also tried for (201,217): some runs in excess of 100 initial stuck bits correct.
Profiling Annealing: Timing
• Try running many runs of annealing on the various mutant problems.
• Add up the sticking times for each element over all runs.
• For the elements with the least aggregate times, the majority vector is generally correct.
• Calculate the majority vectors over all runs.
• Rank the element values by degree of agreement:
  • all runs say value[I]=1: top ranked
  • half the runs say value[I]=1, half give value[I]=-1: worst ranked
  • the top ranked bits tend to have a majority value that is correct.
• In some cases the top 100 out of 157 bits had only one incorrect majority value.
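The majority vector and the ranking by agreement can be sketched as follows. The function name, the tie-break towards +1, and the output format are assumptions for illustration.

```python
def rank_by_agreement(solutions):
    """Rank element positions by how strongly the runs agree on their value.

    solutions is a list of +/-1 vectors, one per annealing run. Returns
    (index, majority_value, fraction_of_runs_agreeing) tuples, with the
    strongest agreement first. Ties in the vote are broken towards +1.
    """
    runs = len(solutions)
    ranked = []
    for i in range(len(solutions[0])):
        total = sum(sol[i] for sol in solutions)
        majority = 1 if total >= 0 else -1
        agreeing = (runs + abs(total)) // 2  # runs voting with the majority
        ranked.append((i, majority, agreeing / runs))
    ranked.sort(key=lambda item: -item[2])  # stable: ties keep index order
    return ranked

solutions = [
    [1, 1, -1],
    [1, -1, -1],
    [1, 1, 1],
    [1, 1, -1],
]
print(rank_by_agreement(solutions))
```

The top of the ranking gives the bits to trust first; the remainder would still need to be resolved by further runs or enumeration.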
Some Questions
• Can you fix an element of the solution at +1 and -1 and determine the likelihood of correctness based on the distribution of results obtained?
• Effects of different parameters (e.g. power parameters)?
• How well can we profile the distribution of results in order to isolate those at the extremes of correctness?
• Can we apply similar profiling tricks to other NP-complete problems?
  • Permuted Kernel Problem
  • Syndrome Decoding
Some Questions
• Why does everyone try to find the secret/key directly?
  • e.g. for block ciphers, can we use guided search techniques to generate better approximations?
  • Use search to generate better (or more) cryptanalytic tools, e.g. multiple approximations?
• Very loose: what would happen if you tried to search for a key on a difficult traditional encryption algorithm?

Encrypt(K: P) = C

• Suppose you tried a guided search based on Hamming distance:

Encrypt(K’: P) = C’
Cost(C’,C) = hamming(C,C’) (or sum of such costs over Pi)

• No chance of success at all. But what is the distribution of the failures? Is there a cost function that would induce an exploitable distribution of solutions?
Some Questions
• Work combines fault injection and a ‘timing’ attack?
• What is the equivalent of differential power analysis for heuristic search?
Finally
• Only given some of the results of work so far.
• More technical work on Boolean functions for cryptography has given results better than mathematical constructions to date. But approaches taken are similarly non-standard.
• There is a great deal of work to be done, particularly in the application of these techniques to the cryptanalysis of modern-day crypto algorithms.