
Backpacking in a World of Secrets

Adam Hamilton

Supervised by Professor Matthew Roughan

The University of Adelaide

Vacation Research Scholarships are funded jointly by the Department of Education and Training and the Australian Mathematical Sciences Institute.

Contents

1 Introduction

2 Background and Related Work
  2.1 What is The Knapsack Problem
  2.2 What is a hard problem?
  2.3 Secure Multiparty Computation (SMC)
  2.4 Public key cryptography and semantic security
  2.5 The Paillier Cryptosystem
  2.6 Genetic Algorithms and Heuristics
  2.7 Related Works

3 What We're Doing
  3.1 Symbiotic Genetic algorithms
  3.2 Paillier encryption
  3.3 A solution to Yao's millionaire problem
  3.4 An oblivious transfer solution to the millionaire's problem
  3.5 A genetic algorithm with round by round encryption
  3.6 Complexity and run-times

4 Experimental results
  4.1 Efficiency of the genetic algorithm
  4.2 Accuracy of the genetic algorithm
  4.3 Effects of encryption on the efficiency of the genetic algorithm

5 Future work

6 Appendices
  6.1 Proof the knapsack problem is NP-complete
  6.2 Types of security
  6.3 Carmichael's theorem
  6.4 Stopping criteria in the Genetic Algorithm

References


1 Introduction

This paper focusses on a problem faced by two parties called Alice and Bob. Both of these parties are in possession of a series of items, each of which has an associated weight and value. The goal of both parties is to fill a container with these items in such a way that the combined weight does not exceed some limit and the combined value is maximised. This is the two party case of a famous problem in combinatorial optimisation called the knapsack problem. The knapsack problem has been well studied, and even in the two party case the well established methods of solving it would still be applicable. There are, however, two catches. The first is that both parties wish to keep their objects a closely guarded secret, and if they can't be assured that their inputs will remain private they will be unwilling to disclose any information. The second catch is that the knapsack problem is a well known NP-hard problem. This means that, unless P = NP, any algorithm designed to solve it exactly will need to perform a number of operations that grows faster than any polynomial in the size of the problem's inputs. This has the very inconvenient effect that the knapsack problem quickly becomes intractable for large collections of objects. This paper uses a metaheuristic called the genetic algorithm to approximate the solution to the knapsack problem. Protocols from the field of Secure Multiparty Computation are applied to the subprotocols of the genetic algorithm so that the inputs of both parties can remain private. An implementation of this protocol was coded using MATLAB. The MATLAB code designed to solve the privacy preserving two-party knapsack problem with a symbiotic genetic algorithm turned out to be incredibly computationally expensive, so expensive that I wasn't able to run it at all. I was still able to get results demonstrating that the genetic algorithm did indeed work and that, on the sizes of problems considered in this paper, the algorithm would on average produce an answer whose value was approximately 0.86 times the true optimal solution's value. The protocols used to create this algorithm are included in the paper.

2 Background and Related Work

Two mutually distrustful parties Alice and Bob are each in possession of a collection of items, the details of which neither party is willing to disclose. Each object belonging to either Alice or Bob has an associated weight and value. The task for Alice and Bob is, given a container with a certain capacity known to one or the other or both of the parties, to jointly choose which items to place in the container so that the combined weight does not exceed the capacity and the combined value is as large as possible.

2.1 What is The Knapsack Problem

Imagine that you’re at a restaurant, you only had a twenty in your pocket and you arevery hungry. You want to have the best meal that you can buy for $20 so what doyou do? do you spend $16 on you favourite thing and have $4 left over that won’t getspent, or do you buy two less tasty apetizers for $10 each and get a more filling meal.Problems like these where you have to maximise a given function (the quality of themeal) by selecting a collection of discrete objects (you can’t buy half of a meal) from a

3

given set under certain constraints are called knapsack problems and are part of the areaof combinatorial opimization are of great practical importance to applied mathematics.

The knapsack problem can be mathematically formulated by introducing a vector of n binary variables x_j (j = 1, 2, ..., n), which have the meaning

\[
x_j = \begin{cases} 1 & \text{if object } j \text{ is selected} \\ 0 & \text{otherwise.} \end{cases}
\]

To each binary variable x_j assign two numbers v_j and c_j; these numbers are called the value and the cost of the variable. Our problem is then to select, from among all binary vectors x satisfying the constraint

\[
\sum_{i=1}^{n} c_i x_i \le C,
\]

the one maximising the objective function

\[
\sum_{i=1}^{n} v_i x_i.
\]

In our example of the restaurant the binary variables indicate which meals you order, the cost c_j of each meal is its price and v_j is the quality of the meal, which may be some function of the meal's taste, quantity, nutrition and so on. This problem need not be constrained to buying food at a restaurant. A huge number of problems in both pure and applied mathematics turn out to be different ways of expressing the knapsack problem.
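To make the formulation concrete, the sketch below evaluates the constraint and the objective for every binary vector x. It is a minimal Python illustration (the values, costs and capacity are made-up numbers standing in for the restaurant example); the report's own implementation was written in MATLAB.

```python
from itertools import product

def best_knapsack(values, costs, capacity):
    """Exhaustively search all 2^n binary vectors x and return the best feasible one.
       Fine for tiny n, hopeless for large n (see Section 2.2)."""
    best_x, best_value = None, -1
    for x in product((0, 1), repeat=len(values)):
        cost  = sum(c * xi for c, xi in zip(costs, x))
        value = sum(v * xi for v, xi in zip(values, x))
        if cost <= capacity and value > best_value:
            best_x, best_value = x, value
    return best_x, best_value

# Restaurant example with made-up qualities (values) and prices (costs).
print(best_knapsack(values=[9, 5, 5, 3], costs=[16, 10, 10, 4], capacity=20))
```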

The drawback with the knapsack problem is that it is what is called NP-complete. The meaning of this result is discussed further in Section 2.2 and a proof can be found in Appendix 6.1, but it can essentially be summarised as "if the number of objects in the knapsack problem is particularly large then the entire problem is intractable". As important as these problems are, mathematicians have no desire to spend months or even years solving them. It is for these reasons that we use heuristics to find approximate solutions to the knapsack problem. These methods have no guarantee of finding the optimal solution, but in the real world solutions don't have to be perfect, they just have to be good enough. Any reader not convinced of this should play chess against a computer. Even though the computer doesn't play a perfect game of chess 1 (because computers aren't that good yet), the number of times you would win would be exactly zero if you turn the difficulty up.

1 There is a really cool theorem attributed to Zermelo: in chess, either one of the players has a strategy that forces a win, or both players can force at least a draw, regardless of the opponent's moves.


2.2 What is a hard problem?

Some problems are harder than others 2. There are various problems that can only be tackled with a powerful tool or huge amounts of free time. Some may argue at this point that hardness is surely just a matter of opinion and that different people have different sets of skills, making some things easier and other things harder. In this section I will define mathematically what it means for a problem to be considered hard, provide an overview of complexity theory and give examples of particular problems in various complexity categories. The complexity of a problem is a mechanism for classifying computational problems based on the resources required to solve them. This resource may be the time taken to completely solve the problem, the storage space, the number of processors, etc.; the classification of the problem shouldn't depend precisely on which resource is being considered but should instead depend on the intrinsic difficulty of the problem. A problem's complexity is usually defined in terms of the most efficient (fastest) algorithm that solves it, and the running time of that algorithm depends on the size of the input (for example, longer passwords are more difficult to break).

Generally the computational complexity of a problem is expressed in what's called "big O" notation. In this notation we calculate (or at least approximate) the number of operations the computer needs to perform in order to solve the problem as a function of the size of the input. This is called the complexity function. The order of magnitude of the complexity is just the term of the complexity function which grows the fastest as n gets larger. For example, if the complexity function of a given algorithm were 4n³ + 3 + 4/n then we would say that the algorithm's complexity is of the order n³ and is expressed O(n³). Whilst this expression does not tell you much about the actual run time of the algorithm, the notation is useful for getting a feel for how the problem's complexity grows as the input gets larger. For example, with an algorithm of complexity O(2ⁿ), were you to provide an input of n bits and another input of n+1 bits, the algorithm would take twice as long to solve the instance with n+1 bits. It is easy to see that if you keep multiplying a number by two the result grows incredibly large incredibly quickly. For example, if the universe is closed 3 then the total lifetime of the universe in seconds is approximately 2⁶¹. If an algorithm of complexity O(2ⁿ) took one second to complete on an input of one bit, then if the input's size were increased to 60 bits 4 the algorithm would not only fail to return an output in your lifetime but also in your children's or grandchildren's lives, or in fact the lifetime of any other human being who has ever been or will ever be born. Hopefully the reader can appreciate that some problems, whilst they may be computable, would not be worth computing directly.

2 Citation needed.
3 This is a big if. I do not plan on turning this into a cosmology article.
4 For comparison, the size of a single number in floating point form in MATLAB is approximately 64 bits. Any computer scientist would consider 60 bits to be a ridiculously small amount of information.


Algorithms like the ones described before are put into groups called complexity classes based on their complexity functions. Figure 1 shows a diagram containing the more important complexity classes and their presumed relationships, with the more tractable classes towards the bottom of the diagram. At the bottom we have P, the class of problems that can be solved in polynomial time; that is, any problem in this class is of complexity O(n^k) for some constant k. Above P is the class NP. These have the more complicated definition of "a problem is in NP if it can be solved in polynomial time on a non-deterministic Turing machine". Explaining concepts like non-deterministic Turing machines is a little beyond the scope of this report 5, so I will summarise by saying that if a problem is in NP then a correct answer can be confirmed in polynomial time. For example, when it comes to factorising a number into its prime factors, given a list of prime numbers and one composite number it can be confirmed that the list of primes is the factorisation of the integer by simply multiplying them together, which can be done in polynomial time. It is true that P ⊆ NP, since if a problem can be solved in polynomial time it can be checked in polynomial time by just solving the problem again. It has never been proven whether P = NP, but just about everybody thinks this isn't the case.6 Above the dotted line is the class of NP-complete problems. This class refers to a set of specific problems in NP that can be proven to be as difficult as any problem in NP; for a more detailed definition of what this means see Appendix 6.1. Beyond this is EXPTIME, the class of all problems which can be solved in exponential time. Some problems in this class have been proven not to lie in P, so while we do not know for sure that P ≠ NP, we do know for sure that P ≠ EXPTIME.
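The factorisation example can be made concrete in a couple of lines: checking a claimed answer only needs one multiplication per factor, whereas finding the factors is the part believed to be hard. The numbers below are arbitrary toy values.

```python
from math import prod

def verify_factorisation(n, claimed_factors):
    """Polynomial-time check of a claimed factorisation: multiply and compare.
       (Primality of each claimed factor can also be checked in polynomial time.)"""
    return prod(claimed_factors) == n

print(verify_factorisation(1022117, [1009, 1013]))   # True: 1009 * 1013 = 1022117
```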

2.3 Secure Multiparty Computation (SMC)

Secure multi-party computation refers to computations between two or more partieswhere each party i is in possession of some input xi the details of which they wish tokeep secret. On top of this these parties also wish to compute some set of (possibly iden-tical) functions f1(x1, x2, ..., xn), f2(x1, x2, ..., xn), ..., fn(x1, x2, ..., xn) such that party 1receives only f1(x1, x2, ..., xn), party 2 receives only f2(x1, x2, ..., xn) etc and neitherparty is able to infer any details about the other parties inputs. For example, in thefield of network engineering, providers would need to share network traffic informationwith one another in order to optimise network traffic. The simplest way to solve thisoptimization problem would be to give a single network operator complete control overeverything and access to all information pertaining to the networks. However, most peo-ple would not be ok with having their network information (phone calls, emails, internethistory, etc) being given to anybody, and are willing to sacrifice the efficiency of thenetwork in order to maintain some level of privacy. This is where secure multi-partycomputation comes in. Secure multi-party computation would allow these network op-erators to optimize the inter network traffic with minimal loss of information.

5 For a detailed description of NP and Turing machines take a look at
6 There is a million dollars up for grabs for whoever can prove or disprove this.


Figure 1: Complexity classes.

Another example would be two students who are both very competitive and insecure. After a particular test these two students, named Alice and Bob 7, have just received their marks and they are both curious to see who did better. The simplest way to compare their marks would be to just tell each other what grades they received 8. However, if Alice had aced the test with full marks and Bob had only managed to achieve one mark out of ten then Bob would feel very silly in front of his rival. Another approach would be to find some sort of trusted third party, tell this trusted third party their inputs, and have this trusted third party say who scored higher. There are two problems with this particular approach. One is that the third party might just want to stir things up and lie about which student scored higher. The other problem is that trusted third parties are almost impossible to come by, and if Alice and Bob really cared about the secrecy of their information they would have to kill this third party lest their test scores become gossip. So now the problem requires that only Alice and Bob can hold on to their private information; there can't be any third parties because trustworthy people are presumed not to exist in the paranoid world of cryptography 9. As difficult as this

7 Because in every problem in cryptography, if there are two parties they are called Alice and Bob.
8 This problem is called Yao's millionaire problem and is one of the more canonical problems of secure multi-party computation.
9 Cryptographers tend to make worst case assumptions. In a practical setting trusted third parties can and often do exist.


Figure 2: Alice. Figure 3: Bob.

initially sounds, this problem can be easily solved with just some basic concepts and public key cryptography. Two methods of solving this problem are given in Section 3. Whilst the problem may seem contrived, it is invaluable in solving the two-party privacy preserving knapsack problem and in encrypting genetic algorithms in general.

A question that naturally arises at this point is: can any problem be computed securely, or are there some problems that need both parties to give far more information than they would feel comfortable with? In 1986 the first generic solution for SMC was proposed by Andrew Yao. It was a constant-round protocol for computing a two party function in such a way that the inputs of each party are kept secret. This protocol works by expressing the problem as a Boolean circuit, encrypting the wires corresponding to the inputs of a certain party and using oblivious transfer (also described in Section 3) to distribute the keys needed to evaluate f, where f is a polynomial-time function 10. Since computers essentially work by using Boolean algebra 11, Yao was able to prove that if a protocol between two parties is computable then it is securely computable, with no party learning anything other than its respective output.

Because of Yao's work we can say with certainty that a solution to the knapsack problem can be securely computed between two parties. The straightforward solution would be to express an algorithm that could solve the knapsack problem, such as branch and bound, as a Boolean circuit and to garble this circuit. This method has two drawbacks. The first is that the knapsack problem is NP-complete and any algorithm which solves it exactly for large inputs would take a huge amount of time. The second is that Yao's garbled circuits add a lot of overhead to the computation time, making some ordinarily simple problems intractably time consuming. Provided Alice and Bob were content to only solve small instances of the two party knapsack problem they would be able to use this method. This is why in this project another method of secure multi-party computation of a heuristic is used.

10 Described in Section 2.2.
11 Find a source to link this to.

2.4 Public key cryptography and semantic security

All of the secure multiparty computation protocols that will be employed in this project require the use of a public key cryptosystem that has the property of semantic security. These systems rely on two separate keys. The first key, called the public key, is published and can be viewed by anybody. This key is used to encrypt a message, mapping it from the plaintext space to the ciphertext space. The second key, called the private key, is kept a closely guarded secret by one single party. This key is used to decrypt encrypted messages. This means that Alice, or indeed anybody, can send encrypted messages to Bob but only Bob is capable of decrypting these messages.

The most crucial part of public key cryptosystems is creating a method of encryption such that it is incredibly difficult to deduce the method of decryption. For many ciphers and encryption techniques it is trivial to deduce the decryption algorithm from the encryption. Take for example a Caesar shift cipher. This cipher encrypts each letter of the plaintext one at a time by shifting it a certain number of positions forward or backward, e.g. if the cipher shifted each letter forward two spaces, "a" would be encrypted to "c", "p" would be encrypted to "r" and so on. It is easy to see that if encryption works by shifting each letter forward in the alphabet by two places then decryption obviously consists of shifting the encrypted letters back two places. This is precisely the problem that public key cryptography tries to avoid. So how is it possible to create such a cryptosystem? This is where the NP problems discussed in Section 2.2 come in. Public key cryptosystems rely on these NP problems in order to make it very difficult to deduce the private key from the public key. For example the RSA cryptosystem 12 is based on the NP problem of factorising integers. The RSA cipher is based on two prime numbers p and q. Part of the public key, which anybody can view, is n = pq, the product of these two primes. In order to decrypt a message it is necessary to know what these prime numbers are, and hence to factorise n. For sufficiently large primes p and q the time needed to factor n would be astronomically large, whilst for the party in possession of the prime numbers it does not take much time at all to successfully encrypt and decrypt messages. It is for this reason that the RSA cryptosystem is one of the most commonly used ciphers today.

12 Writing about the RSA cryptosystem in detail could easily fill several books. Any reader who wants to read more than the few lines I've written about this cipher should read [] from the bibliography.
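As a tiny illustration of why the Caesar shift fails this requirement, the sketch below (plain Python, lowercase letters only) shows that whoever knows the encryption shift can decrypt simply by shifting back.

```python
def caesar(text, shift):
    """Shift each lowercase letter by `shift` positions, wrapping around the alphabet."""
    return ''.join(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')) for ch in text)

ciphertext = caesar('attack', 2)       # 'cvvcem'
recovered  = caesar(ciphertext, -2)    # knowing the encryption rule gives decryption for free
print(ciphertext, recovered)           # cvvcem attack
```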


Despite the popularity of RSA in the real world, the SMC protocols used in this paper require another public key cryptosystem. This is because RSA is semantically insecure. Later on we will be performing oblivious transfer, a process that involves encrypting either a one or a zero. If Alice and Bob were to use RSA then, even though Alice doesn't know the private key, she could still work out the contents of all of Bob's messages. Imagine that Alice saw that Bob had sent an encrypted message of, say, 45. Alice knows that Bob either encrypted a 0 or a 1. Since Alice knows Bob's public key she can just encrypt 0 and 1 herself and see which of the two is encrypted as 45. This would let her break Bob's cipher in no time at all. Therefore something more is needed. A semantically secure cipher has the property that encryptions of the same message are not all identical; that is, if Alice were to encrypt 1 using the public key three separate times there would be no guarantee that those encryptions would be the same. In this project we use the Paillier Cryptosystem, which has been proven to be semantically secure.
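The attack described above takes only a few lines. The sketch below uses a deliberately tiny textbook RSA public key (the primes 1009 and 1013 and exponent 17 are made-up toy values) and shows that anyone holding the public key can identify a ciphertext whose plaintext comes from a small known set, simply by re-encrypting every candidate.

```python
N, E = 1009 * 1013, 17                 # Bob's toy public key (insecure sizes, illustration only)

def encrypt(m):
    """Textbook (deterministic) RSA encryption with Bob's public key."""
    return pow(m, E, N)

bobs_mark = 7                          # Bob's secret, known to lie in 0..10
ciphertext = encrypt(bobs_mark)        # what Alice observes

# Alice, without the private key, just re-encrypts every possible plaintext.
recovered = next(m for m in range(11) if encrypt(m) == ciphertext)
print(recovered)                       # 7
```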

2.5 The Paillier Cryptosystem

The Paillier Cryptosystem is a modular, public key, semantically secure encryption scheme based on the decisional composite residuosity assumption (DCRA). Informally this is the assumption that, given a composite number n and an integer z, it is computationally hard to decide whether there exists an integer y such that z ≡ yⁿ (mod n²). However, it is easy to check whether a given integer y satisfies the aforementioned condition. This is another problem in the NP class which is used in a public key cryptosystem.

The encryption algorithm of the Paillier cryptosystem is a bijection from Z_n × Z_n^* to Z_{n^2}^*. Here the plaintext space is the set of integers modulo n and the ciphertext space is the multiplicative group of the integers modulo n². It is easy to see that the ciphertext space contains far more elements than the plaintext space Z_n. This is remedied by assigning to every plaintext message a randomly chosen integer r ∈ Z_n^*: the two sets Z_n × Z_n^* and Z_{n^2}^* have the same number of elements (proof omitted), and a bijection can be defined between them. Using the private key and Carmichael's theorem (discussed further in the appendices) it is possible to remove this random term and recover the original plaintext. Since the Paillier cryptosystem maps every element of the plaintext space to a set of ciphertext elements, the system is semantically secure provided the ciphertext space is large enough.

2.6 Genetic Algorithms and Heuristics

Genetic algorithms are a type of metaheuristic based on Darwinian evolution. To best get a handle on how genetic algorithms work it is helpful to consider the analogy of drinking beer. Whenever a human drinks any alcoholic beverage it kills some of the drinker's brain cells. However, the brain cells that are first to die are usually the weakest cells in the brain, and since the dead cells were all of below average fitness, the average brain cell strength increases. Hence drinking alcohol makes you smarter.

Although not entirely true, this thought experiment provides a good way of understanding the principle behind Darwinian evolution. Members of a population who are not fit tend to die sooner and miss out on the opportunity to pass their genes on to subsequent generations. This has the effect of causing each generation to be, on average, more suited for survival than the previous one. Genetic algorithms work in a similar manner. Instead of a population of living organisms they deal with a collection of binary strings, each string encoding a particular solution to a given optimization problem. The ability to survive in the wild is replaced by the value of the objective function. Instead of reproducing, solutions are combined together so that offspring incorporate parts of their parents' attributes.

The algorithm works by encoding the solutions to the optimisation problem as binary strings called chromosomes. For instance, in the case of the knapsack problem with n objects each solution can be expressed as a string of ones and zeros where a one in the ith place corresponds to the ith object being selected for the knapsack. The algorithm then generates a random sample of these solutions. Once this "population" of solutions is created, the value of the objective function associated with each chromosome is calculated. Solutions which have a high value of the objective function are assigned a relatively large probability. Solutions from the population are selected according to these probabilities and are "bred" together. The breeding process consists of selecting two chromosomes and randomly selecting a crossover point. Each chromosome is split into two substrings at this crossover point, for example the string 10001011 could become 100 and 01011; the first substring of one chromosome is attached to the second substring of the other to create an offspring (see Figure 5). Once a number of offspring are created they replace the previous generation. In theory the favourable traits of the two chromosomes are passed on to their offspring, while the weaker chromosomes in the population have little chance of being selected and their less favourable characteristics have a high probability of being eradicated from the population. In order to increase the genetic diversity, and hence the probability that the genetic algorithm converges to the true optimum rather than some local optimum, some random noise or mutation is added to the offspring.
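The two operators described above, single-point crossover and mutation, fit in a few lines. The following Python sketch works on chromosomes represented as bit strings; the example strings and the mutation rate are arbitrary.

```python
import random

def crossover(parent_a, parent_b):
    """Single-point crossover: split both bit strings at a random point and glue the halves."""
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rate=0.01):
    """Flip each bit independently with a small probability to add random noise."""
    return ''.join(bit if random.random() > rate else '10'[int(bit)] for bit in chromosome)

child = mutate(crossover('10001011', '01100110'))
print(child)
```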

2.7 Related Works

There has been little research into the field of privacy preserving genetic algorithms. The majority of my work is based on a paper written by my supervisor Associate Professor Matthew Roughan et al. [?]. In this paper a symbiotic genetic algorithm is used to optimise shared traffic between two internet service providers. Most of the secure multiparty computation is taken from a thesis by Wilko Henecka [?], a former PhD student at the University of Adelaide.


Figure 4: A man seeking to better his intellect.

Figure 5: An example of the breeding process between solutions.


3 What We’re Doing

The purpose of this research project is to present a method of solving the privacy preserving, two party knapsack problem using a genetic algorithm. That is, Alice and Bob both have a list of objects, each with an associated value and weight that they don't wish to share with each other, yet they wish to work out which items they should include in a container such that it is of a feasible weight and optimal value. Now, some information leakage is inevitable. For example, if the genetic algorithm were to converge on a solution where Alice does not add any of her objects to the knapsack then it would be trivial to conclude that Bob's collection of objects is more suited to the knapsack. Some factors such as the order of the problem and the capacity of the knapsack can be presumed to be public knowledge, and knowing these facts is of minimal consequence.

This section contains a description of the various components of this problem as well as an algorithm which combines some of these individual protocols into a single algorithm that securely solves the two party instance of the knapsack problem. After this protocol has been presented we will go over it piece by piece and attempt to quantify the amount of information leaked between parties as well as the complexities and expected run-times associated with this solution.

3.1 Symbiotic Genetic algorithms

The symbiotic genetic algorithm is an extension of the metaphor of evolution to include symbiosis. In biology, the term is used to mean a mutually beneficial interaction between two or more species. For example, the goby fish and the shrimp have a relationship where the shrimp digs a hole in the sea floor where both organisms can live and the fish repays the shrimp by warning it whenever predators are approaching. In the field of optimisation the genetic algorithm can incorporate a form of symbiosis. This works by decomposing the problem into n subproblems, one for each of n different species. By dividing the problem into subproblems we can formulate solutions based on simple structures which combine to solve a more complex problem. In this particular case the two species are the inputs of Alice and Bob.

The symbiotic genetic algorithm works in a manner very similar to the canonical genetic algorithm; the steps used are given below.

1. Alice and Bob both generate their initial populations of the same size. They agree on the size of each population beforehand, which may require Alice to share the order of the problem (how large it is) with Bob.

2. Every chromosome ai in Alice's population is paired with a corresponding chromosome bi in Bob's population. This will be called the population of joint chromosomes and each joint chromosome ai ∪ bi is referred to as ji. Together Alice and Bob calculate the fitnesses and weights of each of the joint chromosomes. If a particular chromosome is overweight then it is assigned a fitness of zero. If it isn't, then the fitness of the joint chromosome is equal to the sum of the fitnesses of Alice's and Bob's parts of the chromosome.

3. The fitnesses of the joint chromosomes are calculated and the joint chromosomes are ranked according to their fitnesses. If a particular joint chromosome is overweight then it is assigned a fitness of zero. At the end of this process Alice and Bob should have a list of their joint chromosomes ranked in order from the most fit to the least fit.

4. Based on their respective ranks each joint chromosome is assigned a probability. This is done in such a way that joint chromosomes with a higher level of fitness are given a larger probability. The method used in the implementation of this algorithm is: if, in a list of n chromosomes, a joint chromosome is in the ith position from the top of the list (counting from i = 0 for the fittest) then it is assigned a probability of 2(n − i)/(n² + n). It is left as an exercise for the reader to verify that if all n chromosomes in the population are assigned probabilities using this method then the sum of all n of these probabilities is 1 (see the sketch after this list).

5. Out of the n chromosomes in the population, 2n − 4 joint chromosomes are selected according to their rank-based probabilities. This is so that the more fit chromosomes are selected to pass on their genes to subsequent generations, increasing the probability that an optimal solution will be found.

6. From the joint chromosomes selected in the previous step, select the first two, denoted by j1 and j2. These will be bred together using the crossover procedure. Alice and Bob take their parts of the joint chromosomes j1 and j2, obtaining the two pairs of chromosomes (a1, a2) and (b1, b2) respectively. Alice performs crossover on her parts of the chromosomes by taking the binary strings that represent a1 and a2. Alice then randomly selects an integer i in the set 1, 2, ..., l, where l is the length of a1 and a2. Alice then takes the first i elements of the binary string of a1 and the last l − i elements of a2. She then concatenates these two binary strings to obtain an offspring a′1,2. Bob performs this same process with his parts of the chromosomes b1 and b2 to obtain an offspring b′1,2. Both Alice and Bob repeat this process for the next two joint chromosomes selected in the previous step, until the list has been exhausted and Alice and Bob are left with the sets of offspring a′1,2, a′3,4, a′5,6, ..., a′2n−5,2n−4 and b′1,2, b′3,4, b′5,6, ..., b′2n−5,2n−4 (see Figure 5 for an example of the crossover operation).

7. Once this new generation of chromosome parts is created it is mutated by selecting certain bits in the offspring generation with a very small probability and then randomly changing them. This adds an element of random noise to the algorithm, preventing it from getting stuck in a local extremum.

8. The two most fit chromosomes are retained and are added to the next generation unaltered. This is so that each generation contains a chromosome that is at least as fit as the best chromosome from the previous generation. This practice, called elitism, is crucial for the genetic algorithm to eventually find the optimal solution.

9. Alice and Bob recombine their populations together and perform steps 3-8 again for a set number of generations or until some stopping criterion is met.
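The rank-based probabilities in step 4 are easy to check numerically. The sketch below (with an arbitrary population size of 10) computes them exactly with rational arithmetic and confirms they sum to 1.

```python
from fractions import Fraction

def rank_probabilities(n):
    """Selection probability for the chromosome in position i from the top of the list
       (i = 0 is the fittest): p_i = 2(n - i) / (n^2 + n), as in step 4 above."""
    return [Fraction(2 * (n - i), n * n + n) for i in range(n)]

probs = rank_probabilities(10)
assert sum(probs) == 1                    # the ranks define a genuine probability distribution
print([float(p) for p in probs[:3]])      # the fittest chromosomes receive the largest share
```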

3.2 Paillier encryption

This subsection contains the protocols involving the creation of the public and privatekeys used in Paillier encryption.

Key Generation

1. Begin by generating two large random prime numbers p and q. These primes should have the property that gcd(pq, (p − 1)(q − 1)) = 1.

2. Compute n = pq and the Carmichael function of n, λ(n) = lcm(p − 1, q − 1).

3. Select a random integer g ∈ Z_{n^2}^*. That is, select a random integer less than n² such that gcd(g, n²) = 1.

4. Check whether the multiplicative inverse µ = (L(g^λ mod n²))^{-1} mod n exists, where L(x) = (x − 1)/n. If no such number exists then select another random integer g ∈ Z_{n^2}^*.

5. The public key, used for encryption, is (n, g).

6. The private key, used for decryption, is (µ, λ).

Encryption

1. Let m be the message to be encrypted, where m is an element of the plaintext space Z_n.

2. Select a random integer r from the multiplicative group Z_n^*.

3. Compute the ciphertext c = g^m r^n mod n².

Decryption

1. Let c represent the ciphertext, where c ∈ Z_{n^2}^*.

2. The plaintext of the corresponding ciphertext is calculated as m = L(c^λ mod n²) · µ mod n.


3.3 A solution to Yao’s millionaire problem

This protocol was taken from Applied Cryptography by Schneier. In this protocol Alice is in possession of a private input i and Bob is in possession of a private input j. It is known that both i and j are natural numbers less than some upper bound; without loss of generality we assume this upper bound to be 100. We also make the assumption that Bob has a secure public key cryptosystem with an encryption key that has been made public to Alice.

1. Alice begins by choosing a large random number, x, and encrypts it using Bob's public key to get the number c = E(x).

2. Alice computes c − i and sends this to Bob.

3. Bob receives this and then computes the following 100 numbers

y_u = D(c − i + u), where u ranges from 1 to 100.

Bob then chooses a large random prime p which is somewhat smaller than Alice's random number x (it can be assumed that if Alice tells Bob the order of x then no information of great importance is leaked).

Bob then uses this large prime to compute the following 100 numbers:

z_u = y_u mod p

Bob then makes sure that, for all u ≠ v,

|z_u − z_v| ≥ 2

and for all u

0 < z_u < p − 1.

If this is not the case then Bob randomly generates another prime number and tries again.

4. Bob sends Alice this sequence of numbers in this exact order:

z_1, z_2, ..., z_j, z_{j+1} + 1, z_{j+2} + 1, ..., z_{100} + 1, p

5. Alice takes this ordered sequence and checks whether the ith number is congruent to x mod p. If it is, she concludes that i ≤ j; otherwise she concludes i > j.

6. Alice tells Bob the conclusion

The purpose of all the checks Bob had to go through in step 3 was to ensure that all of the numbers in the sequence generated in step 4 are distinct. Had there been two identical numbers in the sequence, Alice would have been able to establish both lower and upper bounds on Bob's private input j.

The protocol suffers from a few drawbacks. The first is that the protocol is not particularly robust. If Alice selects too small an x value, or Bob selects too small a p value, the entire protocol could be rendered unusable. The second drawback is that Alice learns the output before Bob, and in the last step Bob relies entirely on Alice to tell the truth and let him know the conclusion. If Alice were feeling particularly malicious she could just lie or keep silent; Bob therefore has no guarantee that his output is correct.

This protocol also has some advantages. The first is that it is information theoretically secure (for a definition of this see the appendices), and the second is that it is quite simple to implement.
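For concreteness, here is a toy end-to-end run of the protocol above in Python. Bob's public key system is stood in for by textbook RSA with tiny made-up primes, and the bound is lowered from 100 to 10 so the example runs instantly; none of the parameter sizes are meant to be secure.

```python
import random

# Bob's toy public key system (textbook RSA, insecure sizes, illustration only).
P_RSA, Q_RSA = 1009, 1013
N = P_RSA * Q_RSA
D = pow(17, -1, (P_RSA - 1) * (Q_RSA - 1))    # private exponent (Python 3.8+)

def E(m): return pow(m, 17, N)                # Bob's public encryption function
def D_(c): return pow(c, D, N)                # Bob's private decryption function

def is_prime(k):
    return k > 1 and all(k % d for d in range(2, int(k ** 0.5) + 1))

def i_less_or_equal_j(i, j, M=10):
    """Return True iff Alice's secret i <= Bob's secret j, with 1 <= i, j <= M."""
    # Steps 1-2: Alice picks a random x, encrypts it, and sends c - i to Bob.
    x = random.randrange(N // 2, N)
    msg = E(x) - i
    # Step 3: Bob decrypts the M shifted values, then reduces them modulo a prime p,
    # retrying with a new prime until the reduced values are well separated.
    y = [D_((msg + u) % N) for u in range(1, M + 1)]
    while True:
        p = random.choice([q for q in range(50, 1000) if is_prime(q)])
        z = [v % p for v in y]
        if all(0 < v < p - 1 for v in z) and \
           all(abs(z[a] - z[b]) >= 2 for a in range(M) for b in range(a + 1, M)):
            break
    # Step 4: Bob adds 1 to every entry after position j and sends the list (and p) to Alice.
    sent = [z[u] if u + 1 <= j else z[u] + 1 for u in range(M)]
    # Step 5: Alice checks whether the i-th entry is congruent to x modulo p.
    return sent[i - 1] == x % p

print(i_less_or_equal_j(3, 7), i_less_or_equal_j(7, 3))   # True False
```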

3.4 An oblivious transfer solution to the millionaire’s problem

Oblivious transfer is a cryptographic protocol between two parties that forms the basis of most secure multi-party computation. In oblivious transfer Bob is in possession of an array of n values and Alice wishes to learn the ith value from this array. Alice, however, does not want Bob to know which of the n values she wishes to look at, and Bob doesn't want to share any of the values in his array besides the ith value that Alice wants. The process of securely extracting one value from an array of length n is called 1-of-n oblivious transfer, and the protocol is as follows:

1. Bob begins by generating n distinct encryption/decryption key pairs (K_u, K_u^{-1}), where u ranges from 1 to n. He associates each of these n key pairs with an element of his array and publishes the public keys K_u.

2. Alice generates her own encryption/decryption key pair (A, A^{-1}), which she keeps secret from Bob. She then selects the ith public key from Bob's published list and encrypts her own key A using that public key.

3. Bob then receives the value K_i(A) and calculates the n values K_1^{-1}(K_i(A)), K_2^{-1}(K_i(A)), ..., K_n^{-1}(K_i(A)). The ith of these values will be Alice's encryption key A, but since A was randomly generated it is indistinguishable from any of the other numbers in this sequence.

4. Bob then takes all of the values generated in step 3 and uses each of them as a key to encrypt the corresponding element of his array. For instance the number x_j will be encrypted under the key K_j^{-1}(K_i(A)). Since K_j^{-1}(K_i(A)) is not Alice's key (for j ≠ i), the resulting ciphertext is just random nonsense that Alice is unable to decrypt to get x_j. Bob then sends this list to Alice.

5. Alice then selects the ith element of the list generated in step 4 and decrypts it with her decryption key A^{-1}. If the protocol is done correctly then the value x_i will have been encrypted using Alice's encryption key A, and Alice will be able to decrypt it and find out the value of x_i.

This 1-of-n oblivious transfer can be used to solve the millionaire's problem, where Alice and Bob have private inputs i and j respectively which are known to be between zero and some upper bound n.

1. Bob creates the binary vector (0, 0, ..., 0, 1, 1, ..., 1, 1) where the jth element and all elements thereafter are 1.

2. Alice uses 1-of-n oblivious transfer to find the ith element of the array.

3. If the ith element of this array is a 1 then Alice can conclude that i ≥ j; otherwise Alice knows that i < j.

4. Alice then informs Bob of this result.

This protocol is superior to the previous solution to the millionaire's problem in that it is far less complicated and easier and more robust to implement. It does, however, suffer from the same drawback in that Alice must be trusted to truthfully tell Bob the result of the output.
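The sketch below is a toy Python rendering of this 1-of-n oblivious transfer and of the millionaire's-problem reduction built on top of it. Bob's n key pairs are textbook RSA built from a hard-coded list of small primes, Alice's key A is used as an XOR mask, and the bound is kept tiny; all parameter choices are made-up and for illustration only.

```python
import random
from math import gcd

PRIMES = [1009, 1013, 1019, 1021, 1031, 1033, 1039, 1049, 1051, 1061]  # toy primes for Bob's keys

def toy_rsa_keypair(p, q):
    """Textbook RSA key pair: returns ((e, n), (d, n)) with the smallest valid exponent e."""
    n, phi = p * q, (p - 1) * (q - 1)
    e = next(k for k in range(3, phi, 2) if gcd(k, phi) == 1)
    return (e, n), (pow(e, -1, phi), n)

def ot_1_of_n(array, i):
    """Alice learns array[i] (0-based) and nothing else; Bob does not learn i."""
    n = len(array)
    # Step 1: Bob generates n key pairs and publishes the public halves.
    pairs = [toy_rsa_keypair(PRIMES[2 * u], PRIMES[2 * u + 1]) for u in range(n)]
    publics = [pub for pub, _ in pairs]
    # Step 2: Alice picks a random key A and encrypts it under the i-th public key only.
    A = random.randrange(2, min(m for _, m in publics))
    e_i, n_i = publics[i]
    msg = pow(A, e_i, n_i)
    # Step 3: Bob decrypts msg under every private key; only the i-th result equals A.
    derived = [pow(msg, d, m) for _, (d, m) in pairs]
    # Step 4: Bob masks each array element with the corresponding derived key and sends the list.
    masked = [x ^ k for x, k in zip(array, derived)]
    # Step 5: Alice unmasks the i-th element with A; the others remain unintelligible to her.
    return masked[i] ^ A

def millionaire(i, j, bound=5):
    """Alice holds i, Bob holds j (both in 1..bound); decide whether i >= j."""
    vec = [1 if u >= j else 0 for u in range(1, bound + 1)]   # Bob's indicator vector
    return ot_1_of_n(vec, i - 1) == 1                          # Alice reads the i-th entry obliviously

print(millionaire(4, 2), millionaire(2, 4))   # True False
```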

3.5 A genetic algorithm with round by round encryption

In this version of the genetic algorithm each generation is encrypted, as opposed to the entire algorithm. In each generation Alice's and Bob's chromosomes are ranked in terms of fitness and assigned probabilities of reproduction. In this protocol we show how to securely rank these chromosomes based on fitness and thus learn the probabilities of being selected. Our goal is therefore to design a two party functionality which takes as inputs Alice's and Bob's lists of chromosomes and outputs the probabilities associated with each of these chromosomes. This protocol can be easily implemented using our solution to the millionaire's problem, and a description of the algorithm is as follows.

1. Alice and Bob both begin by generating their initial populations of the same size. They agree on the size of each population beforehand, which may require Alice to share the order of the problem (how large it is) with Bob. It can be assumed that sharing a rough idea of the problem's size leaks no information of consequence. The details of these initial populations are kept private.

2. Every chromosome ai in Alice's population is paired with a corresponding chromosome bi in Bob's population, as was the case in the unencrypted symbiotic genetic algorithm. This will be called the population of joint chromosomes and each joint chromosome ai ∪ bi is referred to as ji.

3. Each chromosome pair ji is checked to see whether it is overweight. This is done by Alice calculating the weight of her chromosome ai and Bob calculating the weight of his chromosome bi. If either of these chromosomes exceeds the capacity of the knapsack by itself, the party that owns that chromosome notifies the other party and the chromosome pair is given a fitness of 0. If neither party's chromosome is overweight then both parties use the solution of Yao's millionaire's problem to securely find which is larger out of weight(ai) and C − weight(bi), where C is the capacity of the knapsack. If it is the case that C − weight(bi) > weight(ai) then C − weight(bi) − weight(ai) > 0 and the chromosome pair ji is of a healthy weight. The value of the joint chromosome ji is equal to value(ai) + value(bi). If it is the case that the joint chromosome is overweight, that is C − weight(bi) < weight(ai), then the chromosome is assigned a value of 0, and this is made known to both parties.

4. For each joint chromosome ji let the chromosome's fitness be denoted by fi. Using the protocol designed to solve the millionaire's problem it is possible to rank the joint chromosomes based on their fitnesses without revealing what those fitnesses are. In order to compare the fitnesses of two joint chromosomes jx and jy, let value(ax) and value(ay) represent Alice's contributions to the fitnesses of jx and jy respectively, and let value(bx) and value(by) represent Bob's contributions. To compare jx with jy, all that needs to be done is to use the solution to the millionaire's problem to ascertain which is larger out of value(ax) − value(ay) and value(by) − value(bx).

5. Once every joint chromosome has been ranked, joint chromosomes are selected with probabilities based on their ranks. Which chromosomes are selected is public knowledge. Alice and Bob then perform crossover and mutation on their parts of the joint chromosomes using the steps specified in Section 3.1 to obtain a new generation of chromosomes. Since these processes involve Alice and Bob acting only on their own private inputs, they can both be done using the steps in Section 3.1 without any further encryption.

6. Once these new populations have been obtained Alice and Bob repeat steps 2-5 for a set number of generations or until some stopping criterion (see Appendix 6.4) is met.
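The only encrypted step is the ranking, so the heart of the protocol is an ordinary merge sort whose comparisons are answered by a secure oracle. The sketch below shows this structure in Python; the secure_less argument stands in for the millionaire's-problem protocol applied to the fitness differences described in step 4, and the plaintext comparator at the bottom is only there to exercise the sorting logic.

```python
def merge_sort(items, secure_less):
    """Rank items from most fit to least fit using only a comparison oracle.
       In the encrypted GA the oracle is one run of the millionaire's-problem protocol:
       joint chromosome x outranks y when
       value_A(x) - value_A(y) > value_B(y) - value_B(x),
       so neither party ever reveals an actual fitness value."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid], secure_less)
    right = merge_sort(items[mid:], secure_less)
    merged = []
    while left and right:
        # exactly one call to the secure comparison per merge step
        merged.append(right.pop(0) if secure_less(left[0], right[0]) else left.pop(0))
    return merged + left + right

# Plaintext stand-in for the secure oracle, purely to test the control flow.
fitness = {'j1': 40, 'j2': 75, 'j3': 10, 'j4': 55}
print(merge_sort(list(fitness), lambda x, y: fitness[x] < fitness[y]))  # ['j2', 'j4', 'j1', 'j3']
```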

3.6 Complexity and run-times

Now that we have a genetic algorithm which leaks minimal information about the nature of the chromosomes, all that remains is to analyse the complexity of the algorithm to determine how it will behave given arbitrarily large inputs. There are several different inputs and parameters in this genetic algorithm which can affect its run time. Many of these parameters, such as the probability mass function and the stopping criteria of the genetic algorithm, affect the run time by changing the probability that the genetic algorithm reaches the optimum solution in a shorter time. Other inputs and parameters, such as the number of chromosomes in each generation and the number of variables in the knapsack problem, change the algorithm's run-time by forcing it to perform more operations. Since we are attempting to derive the complexity function of this algorithm in big-O notation (i.e. an overestimate), we will be looking at the worst case scenario and assume that the genetic algorithm has "bad luck". As such we choose to ignore the parameters that would probabilistically affect the run-time of the genetic algorithm. Whilst it would be more accurate to also include the effects of things such as the type of pseudo-random number generator used and the method of generating prime numbers, doing so would detract from the essence of the algorithm. In this section the complexity of the algorithm will be calculated in terms of the following six variables, each of which is assumed to be common knowledge (at least approximately) to both Alice and Bob.

1. G: The maximum number of generations the genetic algorithm goes through before giving up.

2. P : The number of chromosomes in each generation of the algorithm.

3. Va: The number of variables in the knapsack problem available to Alice

4. Vb: The number of variables in the knapsack problem available to Bob

5. R: An upper bound on the values that Alice’s and Bob’s part of the chromosomecan take.

6. S: The size of the primes used by Alice and Bob in the generation of their Paillier cryptosystems.

The genetic algorithm is required to compute no more than G generations. This means that every operation performed in each generation of the genetic algorithm is performed at most G times. This in turn means that the genetic algorithm with round by round encryption has a complexity function of the form O(G · f(P, Va, Vb, R, S)) for some function f.

Each generation of the genetic algorithm begins with each party calculating the weights and fitnesses of the parts of the chromosomes available to them. The complexity of this process is a linear function of Va and Vb.

Each chromosome in the population is checked to see whether it is overweight. This requires the algorithm to solve the millionaire's problem P times, once to check each chromosome's weight.

Each chromosome is securely ranked using the mergesort algorithm, the quickest sorting algorithm that I was able to code. On a list of n elements the complexity of the mergesort algorithm is of order n log(n). Since chromosomes found to be overweight are given a fitness of zero, the algorithm will need to solve the millionaire's problem at most P log(P) times.

The complexity of the solution to the millionaire's problem from [?] can be calculated in terms of R and S. In this subprotocol Alice is required to perform one encryption using Bob's public key and Bob is required to perform R decryptions using his private key. Bob is also required to select a prime number p with the properties mentioned in subsection 3.3. Since Bob is using a public key cryptosystem based on two primes p and q, both of which are less than or equal to S, the complexity of this solution is of order O(e(S) + R · d(S) + S), where e(S) and d(S) are the complexity functions associated with one encryption and one decryption respectively, expressed in terms of S.

The solution to the millionaire’s problem using oblivious transfer is far more com-putationally intense. Bob is required to generate R key pairs and Alice is required togenerate one key pair. Alice has to perform one encryption using one of Bob’s publickeys, Bob has to perform R decryptions of alice’s encrypted message. Then Bob needs toperform R encryptions using encryptions keys derived from step 3 of the protocol.Alicethen has to perform one decryption. For reasons mentioned in subsection 2.4 Alice andBob must use to the Paillier cryptosystem which more complex than other public keycryptosystems.

On top of this, Bob is obliged to use a set of much larger prime numbers in his key pairs than Alice. This is because in the first step of the protocol Alice encrypts her public key (n, g) using one of Bob's public keys. Alice's public and private keys are based on two large prime numbers p and q, both of which are less than or equal to S. The first element of her public key, n, is equal to the product of the primes p and q, and the other element of her public key is a randomly chosen element of Z_{n^2}^*. In order for Alice to encrypt her public key, Bob needs to create a cryptosystem with a plaintext space of size at least n², which means that Bob needs to use primes approximately as large as n and he needs to perform arithmetic on numbers as large as S⁴. Since n is approximately the size of S², the complexity of this particular solution is of order O(g(S) + e(S) + d(S) + R(g(S²) + e(S²) + d(S²))), where g(S) is the complexity function of the key generation protocol expressed in terms of S.

The complexities associated with the Paillier cryptosystem can be calculated quite easily. The process of key generation requires one computation of the Carmichael function of n, one subtraction, one division and one use of the Euclidean algorithm. Since these steps are far less complex than the encryption and decryption algorithms mentioned below, the complexity of key generation can be effectively ignored.

The encryption function consists of two instances of modular exponentiation and one of modular multiplication. The modular exponentiation's complexity depends on randomly generated numbers, but since the plaintext space of the algorithm is Z_n, the encryption function can be given an upper bound of O(S⁶).

The decryption function involves one modular exponentiation, one subtraction, one division, and one multiplication. Like the encryption function this can be assigned the upper bound of O(S⁶).

Since we’ve concluded that the solution to the millionaire’s problem ifrom [?] is moreefficient to implement than the solution using oblivious transfer we will be using thisin our algorithm. The total complexity of the entire symbiotic genetic algorithm withround by round encryption is O(GPlogPS6R)

In conclusion, the encrypted genetic algorithm is of complexity class P, a polynomial time algorithm. Whilst this result seems to come with the promise of higher efficiency than a brute force solution, readers should not make the mistake of thinking that it is fast in general. Complexities in big O notation focus only on the asymptotic behaviour of algorithms. For smaller input values the encrypted genetic algorithm is excessively slow, as is mentioned in the next section.

4 Experimental results

In this section we look at an actual implementation of the insecure symbiotic genetic algorithm used to solve the knapsack problem on sets of randomly generated problems. The secure version of the algorithm proved to be too computationally intensive to run. However, I was able to implement the encrypted merge sort algorithm, which is also analysed in this section. Every algorithm here was coded using MATLAB. Whilst it is widely known that MATLAB is not the most efficient language for cryptography, it was the only language that I knew well enough to create any workable code. The experiments examine:

1. how efficiently the genetic algorithm runs given a variety of inputs

2. how accurate the genetic algorithm is in solving the knapsack problem compared to a non-heuristic method

3. whether the encrypted genetic algorithm slows the convergence of the genetic algorithm

4.1 Efficiency of the genetic algorithm

The symbiotic genetic algorithm was allowed to run for a fixed number of generations. After each generation the most fit member of the population was recorded and plotted in a graph. Figure 6 gives an example, in which the viewer can see that the fitness of the best member of the population increases rapidly at first and then levels off as the GA has increasing difficulty improving the current best solution. Once the GA has reached this point of levelling off we say it has reached convergence. The number of generations the GA goes through before reaching convergence is called, rather unimaginatively, the number of generations before convergence is reached. In figure 7 a 95% confidence interval for the number of generations until convergence is plotted against the number of variables in a problem. For every number n from 2 to 44 the symbiotic genetic algorithm was given 100 knapsack problems of n variables.


Figure 6: An example of a genetic algorithm solving the knapsack problem. Here the fitness of each generation's fittest member is plotted against the number of generations that the GA has gone through. As can be expected the solution quickly improves before levelling off as the algorithm finds it increasingly harder to improve upon the current best solution.


Figure 7: The number of generations until convergence (with a 95% confidence interval) plotted against the number of variables in the knapsack problem.

The first thing that is noticeable in this graph is that for low numbers of variables the graph is, with a few exceptions, almost a straight line. This is because the number of variables is so small that the algorithm can effectively guess the correct solution in the first generation. There are a few points on this line where the GA needed multiple generations in order to optimise the problem; these can be seen as apparent spikes in the otherwise straight line. Once the number of variables reaches approximately 15 the GA can no longer accurately guess the solution, and from this point on the number of generations until convergence increases as one would expect. One of the more obvious features of this graph is the large amount of random noise. This is owing to the intrinsically stochastic nature of the genetic algorithm: each number n of variables was only solved 100 times, and it seems that a more accurate picture of the behaviour of the symbiotic GA would require a much larger sample size.

4.2 Accuracy of the genetic algorithm

Here a symbiotic genetic algorithm was allowed to run for a fixed number of generations on a series of randomly generated knapsack problems. Each of the randomly generated knapsack problems was also solved using MATLAB's inbuilt bintprog function, a process that I knew would tell me the true optimal solution. In figure 8 the ratio of the GA's solution value to the true optimal value was plotted against the number of variables in the problem. As in the previous experiment, each knapsack problem size was solved 100 times. Here we find that the heuristic starts off with a high level of accuracy, before quickly decreasing to approximately 75% accuracy and fluctuating from that point


Figure 8: In this graph the mean accuracy of the symbiotic genetic algorithm is plotted against the number of variables used in a given problem.

onwards. Like the previous graph there is a large amount of random noise owing to the stochastic nature of the algorithm.

4.3 Effects of encryption on the efficiency of the genetic algorithm

A very natural question to ask about the encrypted version of the symbiotic genetic algorithm is whether the encryption process has any effect on the efficiency of the symbiotic GA. Although I was not able to produce any data to support this claim, using the solution to the millionaire's problem will not affect the number of generations the GA needs in order to find a good enough solution. This is because the solution to the millionaire's problem from [?] has guaranteed correctness: given two numbers a and b within a certain range, Schneier's protocol will always tell with absolute certainty whether a < b or not. This means that the secure merge sort algorithm is guaranteed to produce a list of elements in the correct order. Since the ranking process, the only encrypted part of the symbiotic GA, is guaranteed to produce the correct results, all steps thereafter will not affect the GA's efficiency. Hence the symbiotic GA with round by round encryption will converge to an optimal result in the same number of generations as an unencrypted GA. It can, in fact, be shown that if the encryption process no longer guarantees correctness of the sorting algorithm then the GA is no longer guaranteed to eventually find the true optimal solution (proof omitted).

5 Future work

Were I to be given more time and funding I would look into other ways to implement an encrypted genetic algorithm with less information leakage. It is possible to encrypt all information in the genetic algorithm, rather than just using round by round encryption.


Not only would such a system be more secure than the one proposed in this paper, it may also turn out to be more computationally efficient. One way to do this would be to use a threshold homomorphic cryptosystem or garbled circuits.

6 appendices

6.1 proof the knapsack problem is NP-complete

Theorem: the knapsack problem is NP-complete

Proof: A decision problem L is said to be NP-complete if L ∈ NP and every decision problem Lx in NP is polytime reducible to L, written Lx ≤p L. A decision problem Lx is polytime reducible to L if there exists an algorithm that solves Lx by using an algorithm that solves L as a subroutine, and that runs in polynomial time whenever the subroutine does.

By Cook's theorem any problem in NP can be reduced to an NP-complete problem. In order to show that the knapsack problem is NP-complete it is sufficient to show that the knapsack problem is in NP and that a known NP-complete problem is reducible to the decision version of the knapsack problem. If the decision problem associated with the knapsack problem (whether a solution with value above a certain threshold exists) is NP-complete, then the optimisation version (finding the solution itself) is NP-hard. In this particular proof we start by showing that the knapsack problem is in NP, and then we show that the partition problem, a problem known to be NP-complete, is reducible to the knapsack problem.

The knapsack problem is in NP: a proposed set S of chosen items can be verified by computing ∑_{i∈S} c_i and ∑_{i∈S} v_i, both of which can be done in polynomial time.
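In code the certificate check is only one line. Assuming c and v are the vectors of item weights and values, B the capacity, V the value threshold, and S an index vector for the candidate items (all names chosen here purely for illustration), it amounts to:

isYes = (sum(c(S)) <= B) && (sum(v(S)) >= V);   % two sums, each linear in the number of items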

Secondly we show that the partition problem (known to be NP-complete) can be polynomially reduced to the knapsack problem. It is enough to show that there exists a polynomial-time transformation Q(·) such that Q(x) is a 'yes' instance of the knapsack problem if and only if x is a 'yes' instance of the partition problem. Suppose we are given a_1, a_2, ..., a_n for the partition problem; consider the knapsack problem with w_i = a_i and v_i = a_i for all i = 1, 2, ..., n, and let B = V = (1/2) ∑_{i=1}^{n} a_i. This transformation clearly takes polynomial time.

Now it is easy to show that x is a 'yes' instance of the partition problem if and only if Q(x) is a 'yes' instance of the transformed knapsack problem: since each item has weight equal to its value, any set of items with total weight at most B and total value at least V = B must have weight and value exactly B, and therefore corresponds exactly to a subset of the a_i summing to half the total.
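As a purely illustrative sketch of the transformation Q, the lines below build the knapsack instance from a small partition instance; the construction of w, v, B and V is the reduction itself, while the brute-force loop is only there to demonstrate the equivalence on a toy example (here the answer is 'yes' because, for instance, {3, 2} already reaches weight and value exactly 5).

a = [3 1 1 2 2 1];              % example partition instance, total 10
n = numel(a);
w = a;  v = a;                  % weights and values both equal a_i
B = sum(a) / 2;  V = B;         % capacity and value threshold

yes = false;
for s = 0:2^n - 1                           % enumerate every subset of items
    pick = bitget(s, 1:n) == 1;             % logical mask for this subset
    if sum(w(pick)) <= B && sum(v(pick)) >= V
        yes = true;
        break;
    end
end
fprintf('knapsack decision answer: %d (1 means the a_i can be partitioned)\n', yes);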

6.2 Types of security

In this paper three types of security are mentioned. In this section these types of security are defined and the key differences between them are highlighted. The three types of security mentioned in this paper are


1. Computational security

2. Semantic security

3. Information theoretic security.

Computational security: Roughly speaking, computational security means that a cryptosystem is secure provided that the adversary is computationally limited. For example, the RSA cryptosystem can only be broken if the adversary can factor a large number into its prime factors. This cannot be done without large amounts of computational power, so the RSA system is called computationally secure.

Semantic security: An adversary is given n plaintexts and an encryption of one of them. The cryptosystem is semantically secure if the adversary cannot guess, with probability better than 1/n, which of the n plaintexts corresponds to the ciphertext.

Information theoretic security: A cryptosystem is information theoretically secure if it cannot be broken even when the adversary has unlimited computing power. The adversary simply does not have enough information to break the encryption, and the system is called cryptanalytically unbreakable. This is the most powerful security a cryptosystem can have.

6.3 Carmichael’s theorem

The Paillier cryptosystem is based on a very important result in number theory called Carmichael's theorem. For an arbitrary natural number n and every integer a coprime to n, the Carmichael function λ(n) is defined as the smallest positive integer m such that a^m ≡ 1 (mod n). Carmichael's theorem states that if n is 2, 4, a power of an odd prime, or twice a power of an odd prime, then λ(n) is equal to the Euler totient function φ(n), while if n is a power of 2 greater than 4 then λ(n) = φ(n)/2:

λ(n) = φ(n)    if n = 2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 14, 17, ...
λ(n) = φ(n)/2  if n = 8, 16, 32, 64, 128, ...

For a general natural number n, the fundamental theorem of arithmetic states that n can be uniquely factorised as n = p_1^{a_1} p_2^{a_2} ··· p_{ω(n)}^{a_{ω(n)}}, where p_1 < p_2 < ... < p_{ω(n)} are primes and each a_i > 0. The Carmichael function of such an n is then given by

λ(n) = lcm[ λ(p_1^{a_1}), λ(p_2^{a_2}), ..., λ(p_{ω(n)}^{a_{ω(n)}}) ].

This theorem (proof omitted) ensures correctness of the Paillier cryptosystem.
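A brute-force check of the theorem is easy to write. The function below is my own sketch, not part of the report's code, and is only sensible for small n; it computes λ(n) straight from the definition, and returns, for example, carmichael(15) = 4 = lcm(λ(3), λ(5)) and carmichael(16) = 4 = φ(16)/2.

function m = carmichael(n)
% Carmichael's lambda(n) from the definition: the smallest m >= 1 such that
% a^m = 1 (mod n) for every a coprime to n.  Brute force, small n only.
    if n == 1
        m = 1;
        return;
    end
    as  = find(gcd(1:n, n) == 1);   % all residues coprime to n
    pow = mod(as, n);               % current powers a^m mod n, starting at m = 1
    m = 1;
    while any(pow ~= 1)
        pow = mod(pow .* as, n);    % multiply one more factor of a into each power
        m = m + 1;
    end
end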

6.4 Stopping criteria in the Genetic Algorithm

In this paper the genetic algorithm is set to run over a fixed number of iterations; 20,000 iterations was enough to achieve reliable approximations for any problem I cared to pose.


For many problems the number of generations a genetic algorithm should run for is not immediately obvious, and most people have better things to do with their time than wait for a genetic algorithm to spend an excessive amount of time attempting to improve an answer that is already correct. There exist methods of estimating upper bounds for certain problems, but in the general case reliable estimates of the number of generations needed cannot be calculated. There are many stopping criteria for different genetic algorithms, many using esoteric techniques like Genetic Learning Automata or fuzzy logic. The simplest stopping criterion, the one I use, is to set a threshold value less than but not equal to one; the value used in my algorithms is 0.999, though for greater precision higher values could be used. The idea of the stopping criterion is to notice when the genetic algorithm has run for enough generations without any improvement to the solution. I take the current solution, the fittest chromosome in the current generation, and count the number of generations in which this chromosome was also the fittest member of the population. I divide that count by the total number of generations run by the algorithm so far. If this ratio is greater than my threshold of 0.999 but less than 1, I assume that the genetic algorithm has gone on for long enough and can stop. Requiring the ratio to be strictly less than one prevents the algorithm from stopping too early. It is possible that the genetic algorithm correctly guesses the true optimal solution in the first generation, whereupon the stopping criterion will never be met, but for large problems the chances of this are remote [3].
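In code the rule is only a few lines. The sketch below is my own, with assumed variable names: bestUnchangedGenerations counts how many generations the current best chromosome has been the fittest, and totalGenerations counts all generations run so far.

threshold = 0.999;                 % fraction of the run with no improvement
ratio = bestUnchangedGenerations / totalGenerations;
% stop only when the best chromosome has dominated almost the entire run,
% but not when ratio == 1, which would trigger on the very first generation
stopGA = (ratio > threshold) && (ratio < 1);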


References

[1] Kazue Sako, Semantic Security, Springer, 2011.

[2] Sennur Ulukus, Information Theoretic Security, lecture proceedings, University of Maryland, April 2012.

[3] Mathematical Programming, lecture 25, Cornell University, November 2014.

[4] www.bigocheatsheet.com

[5] Michael O'Keefe, The Paillier Cryptosystem, The College of New Jersey, 2008.

[6] Bruce Schneier, Applied Cryptography, John Wiley & Sons, 1999.

[7] Wilko Henecka, Network Management in a World of Secrets, PhD thesis, University of Adelaide, Mathematical Sciences Department, 2015.

[8] Silvano Martello, Paolo Toth, Knapsack Problems: Algorithms and Computer Implementations, John Wiley & Sons, 1990.
