Post on 20-Dec-2015
Assumptions in the Use of Heuristic Optimisation in Cryptography
John A Clark
Dept. of Computer Science
University of York, [email protected]
Overview
• Purpose of talk
• Introduction to heuristic optimisation techniques
• Examples of current use and assumptions made
• Why such assumptions may usefully be relaxed
• Some idle speculation about possibilities
Purpose of Talk
• Heuristic search techniques have proven extraordinarily useful at solving 'hard' problems in a great number of engineering fields.
• Little application to cryptographic problems. Why?
• Limited success, but close inspection suggests that some limitations are self-imposed.
• Wish to question some basic assumptions about the way these techniques are used.
Optimisation
• Subject of huge practical importance.
• An optimisation problem may be stated as follows: given a domain $D$ and a function $z: D \to \mathbb{R}$, find $x$ in $D$ such that
  $$z(x) = \sup\{z(y) : y \in D\}$$
• That is, find the value $x$ that maximises the function $z$ over $D$. Similarly for minimising.
Optimisation
Traditional optimisation techniques include:
• calculus (e.g. solve differential equations for extrema)
• hill-climbing: inspired by the notion of gradient ascent in calculus
• (quasi-)enumerative or otherwise exact methods: brute force, dynamic programming, branch and bound, linear programming
Optimisation Problems
Traditional techniques are not without their problems:
• assumptions may simply not hold, e.g. non-differentiable or discontinuous functions, non-linear functions
• the problem may suffer from the 'curse (joy?) of dimensionality': it is simply too big to handle exactly (e.g. by brute force). NP-hard problems.
• some techniques tend to get stuck in local optima for non-linear problems (see later)
These difficulties have led researchers to investigate heuristic techniques, typically inspired by natural processes, that usually give good solutions to optimisation problems (but forgo guarantees).
Heuristic Optimisation
• A variety of techniques have been developed to deal with non-linear and discontinuous problems.
• The highest-profile one is probably genetic algorithms:
  • work with a population of solutions and breed new solutions by aping the processes of natural reproduction
  • Darwinian survival of the fittest
  • proven very robust across a huge range of problems
  • can be very efficient
• Simulated annealing: a local search technique based on the cooling processes of molten metals (used in this paper).
• Will illustrate problems with non-linearity and then describe simulated annealing.
Local Optimisation - Hill Climbing
• Let the current solution be x.
• Define the neighbourhood N(x) to be the set of solutions that are 'close' to x.
• If possible, move to a neighbouring solution that improves the value of z(x); otherwise stop.
• Loose hill-climbing: choose any y in N(x) as the next solution provided z(y) >= z(x).
• Steepest gradient ascent: choose y as the next solution such that z(y) = sup{z(v): v in N(x)}.
Local Optimisation - Hill Climbing
[Figure: plot of z(x) against x, marking the points x0, x1, x2, x3 and the global optimum xopt.]
• The neighbourhood of a point x might be N(x) = {x+1, x-1}.
• The hill-climb goes x0 → x1 → x2 since z(x0) < z(x1) < z(x2) > z(x3), and gets stuck at x2 (a local optimum).
• Really want to obtain xopt.
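The hill-climbing behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the names `hill_climb`, `landscape` and `nbrs` and the toy landscape values are all assumptions made for the example. Strict improvement is required so that the loop always terminates on plateaus.

```python
def hill_climb(z, x, neighbours, steepest=True):
    """Maximise z starting from x; stop when no neighbour strictly improves."""
    while True:
        better = [y for y in neighbours(x) if z(y) > z(x)]
        if not better:
            return x  # local (possibly not global) optimum
        # steepest ascent takes the best neighbour, loose climbing takes any improver
        x = max(better, key=z) if steepest else better[0]


# Hypothetical 1-D landscape: local peak at index 2, global peak at index 7.
landscape = [0, 2, 3, 1, 0, 4, 6, 9, 5]
z = lambda i: landscape[i]
nbrs = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]

stuck = hill_climb(z, 0, nbrs)   # climbs 0 -> 1 -> 2 and stops at the local peak
lucky = hill_climb(z, 4, nbrs)   # a different start reaches the global peak
```

Starting from index 0 the climb stops at the local peak, which is exactly the failure mode the figure illustrates.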
Simulated Annealing
[Figure: plot of z(x) against x showing a search path x0, x1, ..., x13 that descends before climbing to the global optimum.]
• Allows non-improving moves, so that it is possible to go down in order to rise again to reach the global optimum.
• In practice the neighbourhood may be very large and the trial neighbour is chosen randomly. It is possible to accept a worsening move when improving ones exist.
Simulated Annealing
• Improving moves are always accepted.
• Non-improving moves may be accepted probabilistically, in a manner depending on the temperature parameter T. Loosely:
  • the worse the move, the less likely it is to be accepted
  • a worsening move is less likely to be accepted the cooler the temperature
• The temperature T starts high and is gradually cooled as the search progresses.
• Initially virtually anything is accepted; at the end only improving moves are allowed (and the search effectively reduces to hill-climbing).
Simulated Annealing
Current candidate x. Minimisation formulation.

    current x = x0
    Temp = T0
    Until frozen do                                   (temperature cycle)
        Do 400 times
            y = generateNeighbour(x)
            δ = f(y) - f(x)
            if (δ < 0) current x = y                  (accept)
            else if (U(0,1) < exp(-δ/Temp)) current x = y   (accept)
            else reject
        Temp = Temp × 0.95
    Solution is best so far

• At each temperature consider 400 moves.
• Always accept improving moves.
• Accept worsening moves probabilistically.
• Gets harder to do this the worse the move.
• Gets harder as Temp decreases.
Simulated Annealing
[Figure: temperature cycle.] Start at T = 100 and do 400 trial moves; set T = 0.95 × T and do 400 trial moves; repeat, with 400 trial moves at each temperature, down to T = 0.00001.
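A minimal Python sketch of the annealing loop described on these slides follows. The defaults (start temperature 100, cooling factor 0.95, 400 moves per temperature, stop near 0.00001) are taken from the slides; the function name `anneal`, the `neighbour(x, rng)` callback signature, the seeded random generator and the toy cost in the usage lines are assumptions for illustration only.

```python
import math
import random


def anneal(cost, x0, neighbour, t0=100.0, alpha=0.95, moves=400,
           t_min=1e-5, rng=None):
    """Minimise cost by simulated annealing; returns (best solution, best cost)."""
    rng = rng or random.Random(0)          # seeded for repeatability (assumption)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    temp = t0
    while temp > t_min:                    # "until frozen"
        for _ in range(moves):             # trial moves at this temperature
            y = neighbour(x, rng)
            fy = cost(y)
            delta = fy - fx
            # improving moves always accepted; worsening ones with prob exp(-delta/T)
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = x, fx    # remember the best solution so far
        temp *= alpha                      # geometric cooling
    return best, fbest


# Hypothetical usage: minimise (x - 3)^2 over the integers with +/-1 steps.
toy_cost = lambda x: (x - 3) ** 2
toy_step = lambda x, rng: x + rng.choice((-1, 1))
result = anneal(toy_cost, 20, toy_step, moves=50)
```

With the slide parameters the loop runs a little over 300 temperature cycles, so the cost function is evaluated on the order of 100 000 times.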
Problem Fault Injection and Side Channels on An Analysis Technique
Identification Problems
• Notion of zero-knowledge introduced by Goldwasser, Micali and Rackoff (1985).
• Indicate that you have a secret without revealing it.
• Early scheme by Shamir.
• Several schemes of late based on NP-complete problems:
  • Permuted Kernel Problem (Shamir)
  • Syndrome Decoding (Stern)
  • Constrained Linear Equations (Stern)
  • Permuted Perceptron Problem (Pointcheval)
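Pointcheval’s Perceptron Schemes
Interactive identification protocols based on an NP-complete problem.
Perceptron Problem (PP). Given an $m \times n$ matrix $A$ with entries $a_{ij} = \pm 1$,
$$A_{m \times n} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
find an $n \times 1$ vector $S = (s_1, s_2, \ldots, s_n)^T$ with $s_j = \pm 1$ so that
$$A_{m \times n} S_{n \times 1} = (w_1, w_2, \ldots, w_m)^T \quad \text{with } w_i \ge 0 \text{ for all } i$$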
Pointcheval’s Perceptron Schemes
Permuted Perceptron Problem (PPP). Make the problem harder by imposing an extra constraint. Given $A$ as before, find $S$ with $s_j = \pm 1$ so that
$$A_{m \times n} S_{n \times 1} = (w_1, w_2, \ldots, w_m)^T$$
has a particular histogram H of positive values (1, 3, 5, ...).
Example: Pointcheval’s Scheme
PP and PPP example. Every PPP solution is a PP solution.
[Figure: a 4×5 matrix A over {+1, -1} and a secret S whose image AS has the values 5, 1, 1, 3.]
The image has a particular histogram H of positive values 1, 3, 5:
$$H = (h(1), h(3), h(5)) = (2, 1, 1)$$
Generating Instances
Suggested method of generation:
• Generate random matrix A
• Generate random secret S
• Calculate AS
• If any (AS)i < 0 then negate the ith row of A
[Figure: worked 4×5 example; after negating the rows with negative image, AS has the values 5, 1, 1, 3.]
Significant structure in this problem: high correlation between the majority values of the matrix columns and the corresponding secret bits.
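The generation steps listed above can be sketched as follows. This is an illustrative implementation under assumptions: the function name `generate_ppp_instance`, the use of Python lists, and the seeded generator are not from the talk. With n odd (as in the recommended sizes) every image value is odd and so never zero.

```python
import random


def generate_ppp_instance(m, n, rng=None):
    """Return (A, S, W, hist) for a random instance built by the suggested method."""
    rng = rng or random.Random(0)          # seeded for repeatability (assumption)
    A = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    S = [rng.choice((-1, 1)) for _ in range(n)]
    for row in A:
        # negate any row whose dot product with the secret is negative
        if sum(a * s for a, s in zip(row, S)) < 0:
            row[:] = [-a for a in row]
    W = [sum(a * s for a, s in zip(row, S)) for row in A]
    hist = {}
    for w in W:                            # histogram of the (now positive) image
        hist[w] = hist.get(w, 0) + 1
    return A, S, W, hist


A, S, W, hist = generate_ppp_instance(101, 117)
```

The pair (A, hist) would be the public data and S the secret; the row negation is also the source of the column-majority correlation mentioned above.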
Instance Properties
• Each matrix row/secret dot product is the sum of n Bernoulli (+1/-1) variables.
• The initial image histogram has a binomial shape and is symmetric about 0.
• After negation it simply folds over to be positive.
[Figure: histogram over -7, -5, -3, -1, 1, 3, 5, 7, ... folded onto 1, 3, 5, 7, ...]
• Image elements tend to be small.
PP Using Search: Pointcheval
• Pointcheval couched the Perceptron Problem as a search problem.
• Neighbourhood defined by single bit flips on the current solution Y.
[Figure: current solution Y and neighbours Y1, ..., Y5, each differing from Y in a single element.]
• Cost function punishes any negative image components. For example, if the image AY has the negative components -1 and -3, then
  costNeg(y) = |-1| + |-3| = 4
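A small sketch of the negativity cost and the single-flip neighbourhood is given below. The helper names `image`, `cost_neg` and `flip`, and the 4×3 toy matrix, are hypothetical and chosen only so that the image contains the values -1 and -3 used in the slide's example.

```python
def image(A, y):
    """Image vector AY for a +/-1 matrix A and a +/-1 candidate y."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def cost_neg(A, y):
    """Sum of the magnitudes of the negative image components."""
    return sum(-w for w in image(A, y) if w < 0)


def flip(y, j):
    """Neighbour of y obtained by flipping element j."""
    z = list(y)
    z[j] = -z[j]
    return z


# Hypothetical toy instance whose image is (1, -3, 1, -1).
A_toy = [[1, 1, -1], [-1, -1, -1], [1, -1, 1], [-1, -1, 1]]
y_toy = [1, 1, 1]
before = image(A_toy, y_toy)
after = image(A_toy, flip(y_toy, 0))
```

Comparing `before` and `after` shows that a single flip moves every image component by exactly 2 in one direction or the other.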
Using Annealing: Pointcheval
• A PPP solution is also a PP solution.
• Based estimates of cracking PPP on the ratio of PP solutions to PPP solutions.
• Calculated the sizes of matrix for which this should be most difficult: gave rise to (m,n) = (m, m+16).
• Recommended (m,n) = (101,117), (131,147), (151,167).
• Gave estimates for the number of years needed to solve PPP using annealing as the means of finding PP solutions.
• Instances with matrices of size 200 ‘could usually be solved within a day’.
• But no PPP problem instance greater than 71 was ever solved this way ‘despite months of computation’.
Perceptron Problem (PP)
Knudsen and Meier approach (loosely):
• Carry out sets of runs.
• Note where the results obtained all agree.
• Fix those elements where there is complete agreement, carry out a new set of runs, and so on.
• If repeated runs give the same values for particular bits, the assumption is that those bits are actually set correctly.
Used this sort of approach to solve instances of the PP problem up to 180 times faster than Pointcheval for the (151,167) problem, but no upper bound was given on the sizes achievable.
Profiling Annealing
• Approach is not without its problems.
• Not all bits that have complete agreement are correct.
[Figure: table comparing the actual secret with Runs 1 to 6, marking positions where all runs agree and positions where all agree (wrongly).]
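The agreement step can be illustrated with a short sketch. The function name `agreed_positions` and the toy secret and run outputs are invented for the example; they are not results from the talk.

```python
def agreed_positions(runs):
    """Map position -> value for every position on which all runs agree."""
    first = runs[0]
    return {j: v for j, v in enumerate(first) if all(r[j] == v for r in runs)}


# Hypothetical secret and three annealing outputs.
secret = [1, -1, 1, 1, -1, -1]
runs = [
    [1, -1, -1, 1, -1, -1],
    [1, 1, -1, 1, -1, -1],
    [1, -1, -1, -1, -1, -1],
]
fixed = agreed_positions(runs)
wrong = [j for j, v in fixed.items() if secret[j] != v]
```

In this toy data the runs agree on four positions, but one of those agreed values differs from the secret, which is the situation that forces an enumeration stage to search for wrong bits.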
Knudsen and Meier
• Have used this method to attack PPP problem size (101,117).
• Needs a hefty enumeration stage (to search for wrong bits); allowed up to $2^{64}$ search complexity.
• Used a new cost function with histogram punishment, with weights w1 = 30 and w2 = 1:
  cost(y) = w1 costNeg(y) + w2 costHist(y)
• Example: with hist(y) = (3,0,0) for the current image Ay and hist(s) = (2,1,1) for the secret,
  costHist(y) = |3-2| + |1-0| + |1-0| = 3
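The histogram punishment can be sketched as below, using the slide's example values hist(y) = (3,0,0) and hist(s) = (2,1,1). The function names, the dictionary representation of the histogram, and the choice to count only positive image values are assumptions made for illustration.

```python
from collections import Counter


def cost_neg(img):
    """Sum of the magnitudes of the negative image components."""
    return sum(-w for w in img if w < 0)


def cost_hist(img, target_hist):
    """Sum of absolute differences between the image histogram and the target."""
    got = Counter(w for w in img if w > 0)   # assumption: histogram of positive values only
    keys = set(got) | set(target_hist)
    return sum(abs(got.get(k, 0) - target_hist.get(k, 0)) for k in keys)


def cost(img, target_hist, w1=30, w2=1):
    """Weighted combination w1*costNeg + w2*costHist."""
    return w1 * cost_neg(img) + w2 * cost_hist(img, target_hist)


target = {1: 2, 3: 1, 5: 1}      # hist(s) = (2, 1, 1)
img = [1, 1, 1, -1]              # hypothetical image with hist(y) = (3, 0, 0)
total = cost(img, target)
```

With weights 30 and 1, a single negative component dominates the histogram mismatch, so the search is pushed towards PP solutions first.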
Analogy Time I: Encryption
Key
Plaintext P
Ciphertext C
The Black Box Assumption – essentially considering encryption only as a mathematical function.
In the public arena only really challenged in the 90’s when attacks based on physical implementation arrived
• Paul Kocher’s Timing Attacks
• Simple Power Analysis
• Differential Power Analysis
• Fault Injection Attacks (Bellcore, and others)
The computational dynamics of the implementation can leak vast amounts of information
Analogy Time II: Annealing
Initialisation data
Problem P
Final Solution C
The Black Box Assumption – virtually every application of annealing simply throws the technique at problem and awaits the final output.
Is this really the most efficient use of information?
Let’s look inside the box…..
Analogy Time III: Internal Computational Dynamics
Initialisation data
Problem P, e.g. minimise cost(y,A,Hist)
Final Solution C
The algorithm carries out 100 000s of cost function evaluations which guide the search.
Why did it take the path it did?
Bear in mind the whole search process is public and so we can monitor it.
Analogy Time IV: Fault Injection
Initialisation data
Warped or Faulty Problem P’
Final Solution C’
Invariably people assume you need to solve the problem at hand.
Reflected in ‘well-motivated’ or direct cost functions.
What happens if we inject a ‘fault’ into the process? Mutate the problem into a similar but different one.
Can we make use of the solutions obtained to help solve original problem?
PP Move Effects
• What limits the ability of annealing to find a PP solution?
• A move changes a single element of the current solution.
• Want current negative image values to go positive.
• But changing a bit to cause negative values to go positive will often cause small positive values to go negative.
[Figure: histograms of image values over 0..7 before and after a move.] With $W_i = (AY)_i$, a single-element move to $Y'$ gives $W'_i = (AY')_i = W_i \pm 2$.
Problem Fault Injection
• Can significantly improve results by punishing at a positive value K.
• For example, punish any value less than K = 4 during the search.
• Drags the elements away from the boundary during the search.
• Also use the square of the differences, $|W_i - K|^2$, rather than the simple deviation.
[Figure: histogram of image values W = AY over 0..7.]
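The warped cost can be written as a one-line change to the negativity cost, as the sketch below suggests. The name `cost_k` and its parameters `K` and `R` are illustrative; the sample image values are taken from the earlier slide examples.

```python
def cost_k(img, K=4, R=2):
    """Punish every image value below K by |w - K| ** R."""
    return sum(abs(w - K) ** R for w in img if w < K)


solution_image = [5, 1, 1, 3]    # image values of a genuine PP solution
plain = cost_k(solution_image, K=0, R=1)    # K=0, R=1 is the ordinary negativity cost
warped = cost_k(solution_image, K=4, R=2)   # the 'faulty' problem still punishes it
```

The point of the example is that a genuine solution no longer has zero cost under the warped problem; the warped cost is only a device for steering the search.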
Problem Fault Injection
Comparative results:
• Generally allows solution within a few runs of annealing for sizes (201,217).
• Number of bits correct is generally worst when K=0.
• Best value for K varies between sizes (but can do profiling to test what it is).
• Has proved possible to solve for size (401,417) and higher.
• Enormous increase in power for what is essentially a change to one line of the program:
  • using powers of 2 rather than just the modulus
  • use of the K factor
Morals…
• Small changes may make a big difference. The real issue is how the cost function and the search technique interact.
• The cost function need not be the most ‘natural’ direct expression of the problem to be solved. Cost functions are a means to an end.
• This is a form of fault injection on the problem.
Profiling Annealing
But look again at the cost function template:
cost(x) = w1 costNeg(x) + w2 costHist(x)
Different weights w1 and w2 will give different results, yet the resulting cost functions seem plausibly well-motivated.
We can view different choices of weights as different viewpoints on the problem.
Now carry out runs using the different cost functions.
• Very effective: using about 30 cost functions have managed to get agreement on about 25% of the key with fewer than 0.5 bits on average in error.
• Additional cost functions remove incorrect agreement (but may also reduce correct agreement).
Radical Viewpoint Analysis
Problem P
Problem P1 Problem P2 Problem Pn-1 Problem Pn
Essentially create mutant problems and attempt to solve them.
If the solutions agree on particular elements then they generally will do so for a reason, generally because they are correct.
Can think of mutation as an attempt to blow the search away from actual original solution
Profiling Annealing: Timing
• Simulated annealing can make progress, typically getting solutions with around 80% of the vector entries correct (but we don’t know which 80%).
• But this throws away a lot of information: better to monitor the search process as it cools down.
• Based on the notion of thermostatistical annealing.
• Watch the elements of the secret vector as the search proceeds.
• Record the temperature cycle at which the last change to an element’s value occurs, i.e. +1 to -1 or vice versa.
• At the end of the search all elements are fixed.
• Analysis shows that some elements will take some values early in the search and then never subsequently change. They get ‘stuck’ early in the search.
• The ones that get stuck early often do so for good reason: they are the correct values.
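The timing channel can be sketched by recording, for each element, the last temperature cycle at which its value changed. In this illustration the search is assumed to expose a snapshot of the current solution at the end of every temperature cycle; the function names and the toy snapshots are hypothetical.

```python
def last_change_cycle(snapshots):
    """For each element, the index of the last cycle at which its value changed."""
    n = len(snapshots[0])
    last = [0] * n
    for c in range(1, len(snapshots)):
        for j in range(n):
            if snapshots[c][j] != snapshots[c - 1][j]:
                last[j] = c
    return last


def earliest_stuck(snapshots, how_many):
    """Indices of the elements that froze earliest in the search."""
    last = last_change_cycle(snapshots)
    return sorted(range(len(last)), key=lambda j: last[j])[:how_many]


# Hypothetical end-of-cycle snapshots of a 4-element solution.
snaps = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, -1, -1, 1],
    [1, -1, -1, -1],
]
stuck_first = earliest_stuck(snaps, 2)
```

Snapshots taken once per cycle miss changes that are undone within a cycle, so a finer-grained monitor would record the cycle inside the move loop instead.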
Profiling Annealing: Timing
Tested 30 PPP instances (101,117) with 32 different strategies (different weights wi for negativity and histogram component costs and different values of K). Ten runs at each strategy.
Maximum number of initial bits fixed at correct values (number of instances in each range):

  Bits correct   <40   40-49   50-59   60-69   70-79
  Instances        2      10      14       2       2

• Some strategies are far better than others: the value of K is very important, and K=13 seems a very good candidate.
• The channel is highly volatile, hence the need for repeated runs.
• Note also that some runs had up to 108 of 117 bits set correctly in the final solution.
• For small K the minimum number of bits correct in the final solution is radically worse than for larger values of K.
Profiling Annealing: Timing
Tested 30 PPP instances (151,167) with 16 different strategies (different weights wi for negativity and histogram component costs and different values of K). Ten runs at each strategy.
Maximum number of initial bits fixed at correct values (number of instances in each range):

  Bits correct   <40   40-49   50-59   60-69   70-79   80+
  Instances        1       5       9      11       2     2

• Similar general results as before.
• Also tried for (201,217): some runs had in excess of 100 initial stuck bits correct.
Some Questions
• Can you fix an element of the solution at +1 and -1 and determine the likelihood of correctness based on the distribution of results obtained?
• Effects of different parameters (e.g. power parameters)?
• How well can we profile the distribution of results in order to isolate those at the extremes of correctness?
• Can we apply similar profiling tricks to other NP-complete problems?
  • Permuted Kernel Problem
  • Syndrome Decoding
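Example – Permuted Kernel Problem
Arithmetic carried out mod p.
Given an $m \times n$ matrix $A$ and an $n \times 1$ vector $V$, find a secret permutation $V_{perm}$ of $V$ such that
$$A_{m \times n} V_{perm} = 0$$
[Figure: example with a 3×6 matrix A (rows 635223, 562452, 441253) and a 6-element vector with entries 5, 4, 6, 5, 3, 2.]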
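Example – Syndrome Decoding
Arithmetic carried out mod 2.
Given a matrix $A_{m \times n}$ and an image $W_{m \times 1}$, find the secret $S_{n \times 1}$ with $k$ set bits such that
$$W_{m \times 1} = A_{m \times n} S_{n \times 1}$$
Only a small number k of the bits in S are set to 1.
Small example:
$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$
For example, $W_{256 \times 1} = A_{256 \times 512} S_{512 \times 1}$, where S has 56 set bits.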
Some Questions
• Why does everyone try to find the secret/key directly?
  • e.g. for block ciphers, can we use guided search techniques to generate better approximations?
  • Use search to generate better (or more) cryptanalytic tools, e.g. multiple approximations?
• Very loose: what would happen if you tried to search for a key on a difficult traditional encryption algorithm?
  Encrypt(K: P) = C
• Suppose you tried a guided search based on Hamming distance:
  Encrypt(K’: P) = C’
  Cost(C’,C) = hamming(C,C’) (or sum of such costs over Pi)
• No chance of success at all. But what is the distribution of the failures? Is there a cost function that would induce an exploitable distribution of solutions?
Some Questions
Work combines fault injection and a `timing’ attack? What is the equivalent of differential power analysis for
heuristic search?
Optimisation for Design
Boolean Functions
Design as Optimisation
• Let DS be the design space or search space.
• Let f(y) be a function over the design space that signifies how good (bad) a candidate y is. When measuring goodness we talk in terms of a fitness function (when measuring badness we talk in terms of a cost function).
• Find z in DS such that f(z) = sup{f(y): y in DS}.
• Traditional techniques such as hill-climbing tend to get stuck in local optima. Need the ability to escape from these to achieve the global optimum.
Boolean Function Design
A Boolean function $f: \{0,1\}^n \to \{0,1\}$.
For present purposes we shall use the polar representation $\hat f(x) = (-1)^{f(x)}$.

  x (bits)   x   f(x)   polar f(x)
  0 0 0      0    1        -1
  0 0 1      1    0         1
  0 1 0      2    0         1
  0 1 1      3    0         1
  1 0 0      4    1        -1
  1 0 1      5    0         1
  1 1 0      6    1        -1
  1 1 1      7    1        -1

Will talk only about balanced functions, where there are equal numbers of 1s and -1s.
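Preliminary Definitions
Definitions relating to a Boolean function f of n variables.
Linear function: $L_\omega(x) = \omega_1 x_1 \oplus \cdots \oplus \omega_n x_n$, with polar form $\hat L_\omega(x) = (-1)^{L_\omega(x)}$.
Walsh-Hadamard transform:
$$\hat F(\omega) = \sum_{x=0}^{2^n-1} \hat f(x)\, \hat L_\omega(x)$$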
Preliminary Definitions
Non-linearity:
$$N_f = \frac{1}{2}\left(2^n - \max_\omega |\hat F(\omega)|\right)$$
Auto-correlation:
$$AC_f = \max_{s \neq 0} \left| \sum_x \hat f(x)\, \hat f(x \oplus s) \right|$$
For present purposes we need simply note that these can be easily evaluated given a function f. They can therefore be used as the functions to be optimised. Traditionally they are.
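As a concrete check of how easily these quantities are evaluated, the sketch below computes the Walsh-Hadamard spectrum, non-linearity and autocorrelation of the 3-variable example function from the earlier truth table. The function names and the choice of a fast butterfly transform are assumptions for illustration; the index order of the spectrum depends on the bit-ordering convention, but the set of values does not.

```python
def polar(f):
    """Polar form: 0 -> +1, 1 -> -1."""
    return [(-1) ** b for b in f]


def walsh_hadamard(fhat):
    """Fast Walsh-Hadamard transform of a polar truth table of length 2**n."""
    F = list(fhat)
    h = 1
    while h < len(F):
        for i in range(0, len(F), 2 * h):
            for j in range(i, i + h):
                F[j], F[j + h] = F[j] + F[j + h], F[j] - F[j + h]
        h *= 2
    return F


def nonlinearity(f):
    F = walsh_hadamard(polar(f))
    return (len(f) - max(abs(v) for v in F)) // 2


def autocorrelation(f):
    fh = polar(f)
    N = len(fh)
    return max(abs(sum(fh[x] * fh[x ^ s] for x in range(N))) for s in range(1, N))


f_example = [1, 0, 0, 0, 1, 0, 1, 1]        # truth table from the slide, x = 0..7
spectrum = walsh_hadamard(polar(f_example))
```

For this function the spectrum is zero at index 0, as expected for a balanced function, and the non-linearity comes out as 2.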
Using Parseval’s Theorem
Parseval’s Theorem:
$$\sum_{\omega=0}^{2^n-1} \hat F(\omega)^2 = 2^{2n}$$
• Loosely, push down on $\hat F(\omega)^2$ for some particular $\omega$ and it appears elsewhere.
• Suggests that arranging for uniform values of $\hat F(\omega)^2$ will lead to good non-linearity. This is the initial motivation for our new cost function.
New function:
$$\mathrm{cost}(f) = \sum_{\omega=0}^{2^n-1} \left| \hat F(\omega)^2 - 2^n \right|$$
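The sketch below checks Parseval's theorem numerically and evaluates a spectrum-flattening cost. It assumes the new cost function has the form of a sum over all ω of |F(ω)² - 2^n|, which is a reading of the slide rather than a confirmed formula; the function names are likewise illustrative.

```python
def walsh_hadamard(fhat):
    """Fast Walsh-Hadamard transform of a polar truth table of length 2**n."""
    F = list(fhat)
    h = 1
    while h < len(F):
        for i in range(0, len(F), 2 * h):
            for j in range(i, i + h):
                F[j], F[j + h] = F[j] + F[j + h], F[j] - F[j + h]
        h *= 2
    return F


def parseval_sum(fhat):
    """Sum of squared spectrum values; equals 2**(2n) for any polar function."""
    return sum(v * v for v in walsh_hadamard(fhat))


def uniformity_cost(fhat):
    """Assumed form of the new cost: total deviation of F(w)^2 from 2**n."""
    N = len(fhat)                            # N = 2**n
    return sum(abs(v * v - N) for v in walsh_hadamard(fhat))


fhat_example = [-1, 1, 1, 1, -1, 1, -1, -1]  # polar form of the slide's example function
total_energy = parseval_sum(fhat_example)
flatness = uniformity_cost(fhat_example)
```

Because the total is fixed by Parseval, the cost can only fall by spreading the squared values more evenly, which in turn lowers the largest spectrum magnitude.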
Moves Preserving Balance
• Start with a balanced (but otherwise random) solution. The move strategy preserves balance.
• The neighbourhood of a particular function f is the set of all functions obtained by exchanging (flipping) any two dissimilar values.
• Here we have swapped f(2) and f(4) to obtain g:

  x (bits)   x   f(x)   polar f(x)   polar g(x)
  0 0 0      0    1        -1           -1
  0 0 1      1    0         1            1
  0 1 0      2    0         1           -1
  0 1 1      3    0         1            1
  1 0 0      4    1        -1            1
  1 0 1      5    0         1            1
  1 1 0      6    1        -1           -1
  1 1 1      7    1        -1           -1
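A balance-preserving move can be sketched as a swap of one +1 entry with one -1 entry of the polar truth table. The function names and the seeded generator are assumptions; the explicit swap of positions 2 and 4 reproduces the slide's example.

```python
import random


def swap(fhat, i, j):
    """Exchange the values at positions i and j of a polar truth table."""
    g = list(fhat)
    g[i], g[j] = g[j], g[i]
    return g


def random_balanced_move(fhat, rng):
    """Pick one +1 position and one -1 position at random and swap them."""
    plus = [i for i, v in enumerate(fhat) if v == 1]
    minus = [i for i, v in enumerate(fhat) if v == -1]
    return swap(fhat, rng.choice(plus), rng.choice(minus))


fhat_example = [-1, 1, 1, 1, -1, 1, -1, -1]   # polar form of the slide's f
g_example = swap(fhat_example, 2, 4)          # the move shown on the slide
neighbour = random_balanced_move(fhat_example, random.Random(1))
```

Every neighbour produced this way still sums to zero, so the search never leaves the space of balanced functions.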
Getting in the Right Area
Previous work (QUT) has shown strongly:
• Heuristic techniques can be very effective for cryptographic design synthesis (Boolean function, S-box design, etc.).
• Hill-climbing works far better than random search.
• Combining heuristic search and hill-climbing generally gives the best results.
Aside: the notion applies more generally too, and has led to the development of memetic algorithms in GA work. GAs are known to be robust but not suited for ‘fine tuning’.
We will adopt this strategy too: use simulated annealing to get in the ‘right area’, then hill-climb. But we will adopt the new cost function for the first stage.
Hill-climbing With Traditional CF (n=8)
Number of runs by autocorrelation (rows) and non-linearity (columns):

  AC \ NL   106   108   110   112   114   116
  104         0     1     0     0     0     0
   96         0     0     0     0     0     0
   88         0     2     1     0     0     0
   80         0     5     7     2     0     0
   72         1    19    31     6     0     0
Varying the Technique (n=8)
Number of runs by autocorrelation (rows) and non-linearity (columns).

Simulated Annealing With Traditional CF

  AC \ NL   108   110   112   114   116
  80          0     0     0     0     0
  72          0     0    10     0     0
  64          0     0    59     0     0
  56          0     0   186     0     0
  48          0     0   140     1     0
  40          0     0     2     0     0
  32          0     0     0     0     0
  24          0     0     0     0     0

Simulated Annealing With New CF

  AC \ NL   108   110   112   114   116
  80          0     0     0     0     0
  72          0     0     0     0     0
  64          0     0     0     0     0
  56          0     1     0     0     0
  48          2     7    35     5     0
  40          4    39   158    27     0
  32          4    26    79    11     0
  24          0     0     2     0     0

Simulated Annealing With New CF + Hill Climbing With Traditional CF

  AC \ NL   108   110   112   114   116
  80          0     0     0     0     0
  72          0     0     0     0     0
  64          0     0     0     0     0
  56          0     0     1     7     1
  48          0     0    14    56     2
  40          0     0    27   176    18
  32          0     0    23    64    11
  24          0     0     0     0     0
Tuning the Technique
• Experience has shown that experimentation is par for the course with optimisation.
• Initial cost function motivated by theory, but the real issue is how the cost function and search technique interact.
• Have generalised the initial cost function to give a parametrised family of new cost functions:
$$\mathrm{Cost}(f) = \sum_\omega \left| \, |\hat F(\omega)| - (2^{n/2} + K) \, \right|^R$$
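The parametrised family can be sketched as follows, summing over the whole Walsh-Hadamard spectrum. The function names and default parameter values are assumptions; the 2-variable test function is a small bent function (not balanced), chosen only because its spectrum is perfectly flat at 2^(n/2).

```python
def walsh_hadamard(fhat):
    """Fast Walsh-Hadamard transform of a polar truth table of length 2**n."""
    F = list(fhat)
    h = 1
    while h < len(F):
        for i in range(0, len(F), 2 * h):
            for j in range(i, i + h):
                F[j], F[j + h] = F[j] + F[j + h], F[j] - F[j + h]
        h *= 2
    return F


def cost_family(fhat, K=0.0, R=3.0):
    """Sum over the spectrum of | |F(w)| - (2**(n/2) + K) | ** R."""
    n = len(fhat).bit_length() - 1           # len(fhat) == 2**n
    target = 2 ** (n / 2) + K
    return sum(abs(abs(v) - target) ** R for v in walsh_hadamard(fhat))


bent_polar = [1, 1, 1, -1]                   # polar form of f(x1, x2) = x1 AND x2
flat = cost_family(bent_polar, K=0.0, R=3.0)
shifted = cost_family(bent_polar, K=-1.0, R=2.0)
```

Varying K moves the target magnitude away from 2^(n/2), and varying R changes how sharply large deviations are punished, which is the tuning explored in the tables that follow.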
Tuning the Technique (n=8)
Illustration of how results change as K is varied. 400 runs. Number of runs by autocorrelation (rows) and non-linearity (columns).

K=4

  AC \ NL   112   114   116
  56          0     0     0
  48          2     5    13
  40          9    68    80
  32         29    74   115
  24          1     1     3
  16          0     0     0

K=-12

  AC \ NL   112   114   116
  56          0     0     0
  48          0     1     3
  40         11    16     8
  32         41    42    24
  24        201     6     5
  16         42     0     0
Tuning the Technique (n=8)
Further illustration of how results change as K is varied. 100 runs. Number of runs by autocorrelation (rows) and non-linearity (columns).

K=-14

  AC \ NL   112   114   116
  48          0     0     0
  40          3     2     2
  32         19    10     6
  24         45     2     0
  16         11     0     0

K=-12

  AC \ NL   112   114   116
  48          0     1     0
  40          3     2     3
  32         15     8     2
  24         53     2     0
  16         11     0     0

K=-10

  AC \ NL   112   114   116
  48          0     0     1
  40          2     5     2
  32         15     9     3
  24         51     2     0
  16         10     0     0

K=-8

  AC \ NL   112   114   116
  48          0     0     0
  40          2     2     1
  32         11    12     6
  24         44     3     1
  16         18     0     0

K=-6

  AC \ NL   112   114   116
  48          0     0     1
  40          0     5    13
  32          2    27    42
  24          0     0    10
  16          0     0     0

K=-4

  AC \ NL   112   114   116
  48          0     3     2
  40          3    19    20
  32          5    17    27
  24          1     1     2
  16          0     0     0

K=-2

  AC \ NL   112   114   116
  48          0     8     5
  40          6    32    15
  32          3    16    15
  24          0     0     0
  16          0     0     0

K=0

  AC \ NL   112   114   116
  48          5    12     1
  40         12    43     1
  32          6    19     1
  24          0     0     0
  16          0     0     0
Comparison of Results
Non-linearity achieved for n = 4 to 12:

  Method                     4    5    6    7     8     9    10     11     12
  Least Upper Bound          4   12   26   56   118   244   494   1000   2014
  Best Known Example         4   12   26   56   116   240   492    992   2010
  Bent Concatenation         4   12   26   56   116   240   480    992   1984
  Genetic Algorithms         4   12   26   56   116   236   484    980   1976
  Our Simulated Annealing    4   12   26   56   116   238   484    984   1990
Summary and Conclusions
• Have shown that local search can be used effectively for a cryptographic non-linear optimisation problem: Boolean function design.
• ‘Direct’ cost functions are not necessarily best. The cost function is a means to an end. Whatever works will do.
• Cost function efficacy depends on the problem, the problem parameters, and the search technique used.
• You can take short cuts with annealing parameters (and computationally there may be little choice).
• Experimentation is highly beneficial. Should we look to engage theory more?
Uses and Abuses
• Can use optimisation to maximise the non-linearity, minimise autocorrelation elements, etc.
• These are publicly recognised good properties that we might wish to demonstrate to others.
• From an optimisation point of view one way of satisfying these requirements is as good as another.
• But for a malicious designer this may not be the case.
• Who says that optimisation has to be used honestly?
• What’s to stop me creating Boolean functions or S-boxes with good public properties but with hidden (unknown) properties?
Planting Trapdoors
• Can use these techniques to generate cryptographic elements with good public properties using an honest fitness function honestFit(x).
• But can also try to hide useful (but privately known) properties using a malicious fitness function trapFit(x).
• Now take a combination and do both at the same time:
$$\mathrm{fitness}(x) = \lambda\, \mathrm{honestFit}(x) + (1 - \lambda)\, \mathrm{trapFit}(x)$$
• Want $\lambda$ as low as you can get away with for the next N years! The result must still possess the required good properties.
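The combined fitness is a weighted sum, which the sketch below makes explicit. It assumes the weighting takes the form λ·honestFit(x) + (1 - λ)·trapFit(x); the function name `combined_fitness` and the two toy fitness functions are hypothetical stand-ins, not the measures used in the talk.

```python
def combined_fitness(honest_fit, trap_fit, lam):
    """Return fitness(x) = lam * honest_fit(x) + (1 - lam) * trap_fit(x)."""
    def fitness(x):
        return lam * honest_fit(x) + (1 - lam) * trap_fit(x)
    return fitness


# Hypothetical stand-ins: 'honest' counts +1 entries, 'trap' rewards a hidden pattern.
honest = lambda x: sum(1 for v in x if v == 1)
hidden = [1, -1, 1, -1]
trap = lambda x: sum(1 for a, b in zip(x, hidden) if a == b)

candidate = [1, 1, 1, -1]
pure_honest = combined_fitness(honest, trap, 1.0)(candidate)
mixed = combined_fitness(honest, trap, 0.5)(candidate)
```

Setting the weight to 1 recovers the honest search; lowering it trades public quality for trapdoor bias, which is why the result must still be checked against the required public properties.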
Planting Trapdoors
[Figure: two sets of publicly good solutions, e.g. Boolean functions with the same very high non-linearity: those found by annealing with the honest cost function, and those with high trapdoor bias found by annealing with the combined honest and trapdoor cost functions.]
There appears nothing to distinguish the sets of solutions obtained, unless you know what form the trapdoor takes!
Or is there…
Vector Representations
[Figure: a solution written as a vector of +1/-1 entries, e.g. (+1, -1, +1, +1, -1, +1, -1, -1).]
• Different cost functions may give similar goodness results but may do so in radically different ways.
• Results using honest and dishonest cost functions cluster in different parts of the design space.
• Basically distinguish using discriminant analysis.
• If you don’t have an alternative hypothesis then you can generate a family of honest results and ask how probable the offered one is.
Evolving Protocols
Recent IEEE S&P Oakland paper using genetic algorithms to evolve abstract protocols (with proofs!).
Fitness function is based on the number of stated goals met at each message.
Random bit strings can be decoded as protocols expressed in the BAN-logic formalism and executed.
When a receiver gets a message he uses BAN inference rules to update his belief state according to what he knows already and what is in the message. This is a form of abstract execution.
Future Work
• Integrating quantum search and traditional optimisation: at its simplest, let QS find a good starting point and then use traditional techniques to hill climb. Others possible.
• Statistical profiling of traditional optimisation techniques: potentially a very rich seam to mine (both in analysis and design).
Future Work
Opportunities for expansion:
• detailed variation of parameters
• use of more efficient annealing processes (e.g. thermostatistical annealing)
• evolution of artefacts with hidden properties (you do not need to be honest, e.g. develop S-boxes with hidden trapdoors)
• experiment with different cost function families, multiple criteria, etc.
• evolve sets of Boolean functions
• other local techniques (e.g. tabu search, TS)
• more generally, when do GAs, SA, TS work best?