
Post on 23-Dec-2015


Search Problems and Search Techniques

Suranga Hettiarachchi

Computer Science Department

University of Wyoming

Graduate Student Symposium, Spring 2004

Outline

We will investigate problems where a path is the solution, and problems where the goal state is the solution.

We will use traditional search (depth-first, breadth-first, and A*) as well as newer techniques (evolutionary algorithms) to solve these problems.

Terminology

Search Space: The set of possible candidate solutions (all the problems except path problems).

State: a snapshot of the data at a particular time. A node/vertex is the representation of a state in a search tree.

Operator: transition from one state to another by modifying the data. A directed edge/arrow is the representation of a state-to-state transition (i.e., an operator or action having been applied) in a search tree.

Search strategy: the order in which you will traverse the states in the search space by choosing the operator(s).

A solution: either a goal state or a path from the start state to the goal state

Pruning: Eliminating states along the search. Done when it is determined that it is pointless to search further on a certain path. The symbol ‘//’ represents pruning in this lecture.

Taxonomy of Problems

Path problems
• Missionary/Cannibal
• 8 Puzzle

Goal state problems
1. Combination problems
   1. 0/1 Knapsack [Optimization problem]
   2. Boolean Satisfiability [Decision problem]
2. Permutation problems
   1. N Queen [Decision problem]
   2. TSP [Optimization problem]

Path problems

Missionary/Cannibal

8 Puzzle

Note: Consider the search space as all the states in the search tree.

Missionary-Cannibal Problem

Three missionaries and three cannibals come to a river. There is a boat on their side of the river that can be used by either one or two persons. How should they use this boat to cross the river in such a way that cannibals never outnumber missionaries on either side of the river?

Characteristics of the Problem

Solution to this problem is a path in the search tree.

We know the start state and the goal state

Search space is not extremely large

Missionary-Cannibal Problem Representation

Represent a state with a 3 element structure.

Element                  Possible values
Number of missionaries   {0,1,2,3}
Number of cannibals      {0,1,2,3}
Side of the boat         {L, R}

|Search Space| = 4 X 4 X 2 = 32

Start State : {3, 3, L}

Goal State : {0, 0, R}

[numbers represent the number of missionaries and cannibals on the left side of the river]

Missionary-Cannibal Problem Solution

{3,3,L} start → {2,2,R} → {3,2,L} → {3,0,R} → {3,1,L} → {1,1,R} → {2,2,L} → {0,2,R} → {0,3,L} → {0,1,R} → {1,1,L} → {0,0,R} goal

Missionary-Cannibal Problem Search Strategy

Operators:
• Move 2 missionaries, if there are 2 or more missionaries.
• Move 2 cannibals, if there are 2 or more cannibals.
• Move 1 missionary, if there is 1 or more missionaries.
• Move 1 cannibal, if there is 1 or more cannibals.
• Move 1 missionary and 1 cannibal, if there is 1 or more of both.

The location of the boat always alternates between Left and Right, and the boat does not move by itself.

Missionary-Cannibal Problem Partial Search Tree

[Figure: partial search tree; pruned branches are marked with //]

Root: {3,3,L}

Level 1: {1,3,R} {3,1,R} {2,3,R} {3,2,R} {2,2,R}

Prune if there are more cannibals than missionaries, but do not prune if there are no missionaries.

Level 2: {3,3,L} {3,2,L} {3,2,L} {2,3,L} {3,3,L}

Prune repeating states.

Level 3: {1,2,R} {3,0,R} {2,2,R} {2,2,L} {2,1,R}

Missionary-Cannibal Problem – BFS Solutions

PATH 0 (3,3,L)(3,1,R)(3,2,L)(3,0,R)(3,1,L)(1,1,R)(2,2,L) (0,2,R)(0,3,L)(0,1,R)(1,1,L)(0,0,R)

PATH 1 (3,3,L)(3,1,R)(3,2,L)(3,0,R)(3,1,L)(1,1,R)(2,2,L) (0,2,R)(0,3,L)(0,1,R)(0,2,L)(0,0,R)

PATH 2 (3,3,L)(2,2,R)(3,2,L)(3,0,R)(3,1,L)(1,1,R)(2,2,L) (0,2,R)(0,3,L)(0,1,R)(1,1,L)(0,0,R)

PATH 3 (3,3,L)(2,2,R)(3,2,L)(3,0,R)(3,1,L)(1,1,R)(2,2,L) (0,2,R)(0,3,L)(0,1,R)(0,2,L)(0,0,R)
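The search described above can be sketched in Python as follows. This is a minimal illustration, not the author's program: the names `is_safe`, `successors`, and `shortest_paths` are made up for this sketch, states are `(missionaries_left, cannibals_left, boat_side)` tuples as in the slides, and the safety check is assumed to apply to both banks.

```python
from collections import deque

START, GOAL = (3, 3, "L"), (0, 0, "R")
# (missionaries, cannibals) carried by the boat: the five operators from the slides
MOVES = [(2, 0), (0, 2), (1, 0), (0, 1), (1, 1)]

def is_safe(m, c):
    """Cannibals may not outnumber missionaries on a bank that has missionaries."""
    left_ok = m == 0 or m >= c
    right_ok = (3 - m) == 0 or (3 - m) >= (3 - c)
    return left_ok and right_ok

def successors(state):
    m, c, side = state
    sign = -1 if side == "L" else 1  # leaving the left bank lowers the left counts
    for dm, dc in MOVES:
        nm, nc = m + sign * dm, c + sign * dc
        if 0 <= nm <= 3 and 0 <= nc <= 3 and is_safe(nm, nc):
            yield (nm, nc, "R" if side == "L" else "L")

def shortest_paths(start=START, goal=GOAL):
    """BFS that prunes repeated states but remembers every shortest-path parent."""
    depth, parents = {start: 0}, {start: []}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for n in successors(s):
            if n not in depth:                 # first visit: record depth and parent
                depth[n] = depth[s] + 1
                parents[n] = [s]
                queue.append(n)
            elif depth[n] == depth[s] + 1:     # another equally short way to reach n
                parents[n].append(s)

    def build(s):
        if s == start:
            return [[s]]
        return [p + [s] for par in parents[s] for p in build(par)]

    return build(goal) if goal in depth else []

for i, path in enumerate(shortest_paths()):
    print("PATH", i, path)
```

Keeping all equally short parents, rather than only the first one found, is what lets the sketch report every optimal path instead of a single one.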

Application

Lessons Learned

Due to the simplicity of this problem, and the possibility of effective pruning, BFS is an acceptable search strategy.

BFS will give us the optimal solution path(s) for this problem.

Now let’s look at a harder problem.

Eight Puzzle Problem

Given a start state, how to get to the goal state?

Start State:

 4  3 -1
 5  0  7
 6  2  1

Goal State:

 0  1  2
 3  4  5
 6  7 -1

Represent the board in a 3X3 matrix and the blank square with -1.

Characteristics of the Problem

The search space is reasonably large.

Best representation I could come up with is a 3 X 3 matrix, so we cannot improve the representation for better performance.

There are two classes of states in this problem

A* algorithm is a better algorithm to solve this problem compared to DFS.

[Figure: Two Classes of States, with 181440 states in each class]

Eight Puzzle Problem

Represent the board in a 3X3 matrix or a 9 element vector.

Operators:
• Move blank up
• Move blank down
• Move blank left
• Move blank right

Size of the Search Space = 9! = 362880

Structure of A* for 8-Puzzle

Idea : Avoid expanding nodes that will be expensive.

Heuristic function : estimates the cost of the path from the current state to the closest goal state.

Maintain a Queue with nodes in a sorted order by cost.

Lower cost nodes kept in the front of the queue, and higher cost nodes are kept at the end of the queue.

A* Terminology

f(n): evaluation function of A*, estimated total cost of the cheapest solution path through node n, as a measure of the merit of node n. f(n) = g(n) + h(n)

g(n): path cost from start node to node n.

h(n): estimated cost of path from n to the closest goal node.

Heuristic Functions

Algorithm 1: h1(n) = total Manhattan distance (i.e., the number of squares each tile is from its desired location, summed over all tiles).

Algorithm 2: h2(n) = number of tiles out of place.

Algorithm 3: h3(n) = 0 : uniform cost search, regular BFS.

h3 <= h2 <= h1 <= h* (true minimal cost)
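The A* structure and the three heuristics can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the author's implementation: every move costs 1, a board is a 9-element tuple read row by row with -1 for the blank, a binary heap stands in for the sorted queue, and the function names are invented here. The start and goal boards are taken from the last and first boards of the solution trace in these slides.

```python
import heapq

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, -1)

def h_manhattan(state, goal=GOAL):
    """h1: sum over tiles of row distance plus column distance to the goal square."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == -1:
            continue
        g = goal.index(tile)
        total += abs(idx // 3 - g // 3) + abs(idx % 3 - g % 3)
    return total

def h_misplaced(state, goal=GOAL):
    """h2: number of tiles (blank excluded) that are out of place."""
    return sum(1 for s, g in zip(state, goal) if s != -1 and s != g)

def h_zero(state, goal=GOAL):
    """h3: no information, so A* degenerates to uniform cost search."""
    return 0

def neighbors(state):
    """Apply the four operators: move the blank up, down, left, right."""
    blank = state.index(-1)
    r, c = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            board = list(state)
            board[blank], board[nr * 3 + nc] = board[nr * 3 + nc], board[blank]
            yield tuple(board)

def astar(start, goal=GOAL, h=h_manhattan):
    """Return (path from start to goal, number of nodes expanded)."""
    frontier = [(h(start, goal), 0, start)]   # entries are (f, g, state)
    best_g, parent = {start: 0}, {start: None}
    expanded = 0
    while frontier:
        f, g, state = heapq.heappop(frontier)
        if g > best_g[state]:
            continue                          # stale entry: a cheaper path was found later
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1], expanded
        expanded += 1
        for nxt in neighbors(state):
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt], parent[nxt] = ng, state
                heapq.heappush(frontier, (ng + h(nxt, goal), ng, nxt))
    return None, expanded

START = (4, 3, -1, 5, 0, 7, 6, 2, 1)
path, expanded = astar(START)
print(len(path) - 1, "moves,", expanded, "nodes expanded")
```

Passing `h=h_misplaced` or `h=h_zero` reruns the same search with the other two heuristics; node counts will differ from the figures in the slides because tie-breaking and bookkeeping differ between implementations.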

Results of Three Algorithms

Algorithm 1 : h1(n)
• Number of moves to goal : 22
• Nodes expanded : 245
• Time taken to find the goal : 0.07 seconds [average over 3 runs]

Algorithm 2 : h2(n)
• Number of moves to goal : 22
• Nodes expanded : 6126
• Time taken to find the goal : 27.23 seconds [average over 3 runs]

Algorithm 3 : h3(n)
• Number of moves to goal : 22
• Nodes expanded : 155507
• Time taken to find the goal : 3141.04 seconds [average over 2 runs]

[Figure: Difference In Times For The Three Heuristics. Bar chart of time (seconds) by heuristic function: h(1) = 0.07, h(2) = 27.23, h(3) = 3141.04.]

Eight Puzzle Problem Solution Using A* - Algorithm 1

Each line is one board, rows separated by "|". The boards are listed from the goal state back to the start state.

 0  1  2 |  3  4  5 |  6  7 -1   cost = 22
 0  1  2 |  3  4 -1 |  6  7  5   cost = 22
 0  1 -1 |  3  4  2 |  6  7  5   cost = 22
 0 -1  1 |  3  4  2 |  6  7  5   cost = 22
-1  0  1 |  3  4  2 |  6  7  5   cost = 22
 3  0  1 | -1  4  2 |  6  7  5   cost = 22
 3  0  1 |  4 -1  2 |  6  7  5   cost = 22
 3  0  1 |  4  7  2 |  6 -1  5   cost = 22
 3  0  1 |  4  7  2 |  6  5 -1   cost = 22
 3  0  1 |  4  7 -1 |  6  5  2   cost = 22
 3  0 -1 |  4  7  1 |  6  5  2   cost = 22
 3 -1  0 |  4  7  1 |  6  5  2   cost = 22
 3  7  0 |  4 -1  1 |  6  5  2   cost = 22
 3  7  0 |  4  5  1 |  6 -1  2   cost = 20
 3  7  0 |  4  5  1 |  6  2 -1   cost = 20
 3  7  0 |  4  5 -1 |  6  2  1   cost = 20
 3  7 -1 |  4  5  0 |  6  2  1   cost = 20
 3 -1  7 |  4  5  0 |  6  2  1   cost = 20
-1  3  7 |  4  5  0 |  6  2  1   cost = 20
 4  3  7 | -1  5  0 |  6  2  1   cost = 20
 4  3  7 |  5 -1  0 |  6  2  1   cost = 20
 4  3  7 |  5  0 -1 |  6  2  1   cost = 18
 4  3 -1 |  5  0  7 |  6  2  1   cost = 0

Lessons Learned

A* is actually a better algorithm to solve this problem, if a good h(n) is used.

When h(n) = 0 (algorithm 3), A* becomes BFS because it expands and explores all the nodes.

Better heuristic improves both the time and space requirements.

A* always finds the optimal path given an admissible heuristic.

Combination Problems

1. Boolean Satisfiability [Decision Problem]

2. 0/1 Knapsack [Optimization Problem]

Boolean Satisfiability Problem

As an example, given an expression F(X) in conjunctive normal form with three Boolean variables, find a truth assignment for each variable Xi, for i = 1 through 3, such that F(X) = True.

~Xi is the negation of Xi

Expression may look like: (AND (OR X1 X2 ~X3) (OR ~X1 X3 ))

Variation of SAT: 3-SAT Problem

3-SAT has the same form as SAT presented earlier, but each clause has exactly three literals, e.g. (OR X1 X2 ~X3). The goal is to find any assignment to the Boolean variables such that the expression evaluates to true. There are N Boolean variables and M clauses in the expression.

Characteristics of the Problem

All the variations of SAT could be represented the same way.

Number of clauses in the problem does not affect the representation because the representation is based on the number of variables in the expression.

We may or may not be able to satisfy the expression.

Boolean Satisfiability Problem Representation

A bit vector with six elements (for example).

Each element corresponds to a variable in the problem, so there are 2^6 = 64 possible candidate solutions here, and 2^N possible candidate solutions in the general problem.

Start State: vector with unknown elements, e.g. ??????

Candidate State: each element contains 1 or 0, e.g. 110110

Operator: Assign truth value 1 or 0 to a variable

Boolean Satisfiability Problem Search Strategy

( )

( 1 ) ( 0 )

( 11 ) ( 10 ) ( 01 ) ( 00 )

( 111 ) ( 110 ) ( 101 ) ( 100 ) ( 011 ) ( 010 ) ( 001 ) ( 000 )

Leaf nodes represent the set of candidate solutions. Number of candidate solutions in the tree = 2^k, where k is the number of variables.

Boolean Satisfiability Problem Results Using DFS

Number of Variables = 100

Number of Clauses = 400

{AND (OR -48 43 76 ) (OR 99 -56 67 ) (OR -99 -19 -42 ) (OR 11 -43 -97 ) (OR -77 14 -86

) (OR 90 -65 57 ) (OR 49 -73 -33 ) (OR 14 -86 -77 ) (OR -34 37 -82 ) (OR -77 91 -57 )

(OR -4 -51 -26 ) (OR 65 -16 -2 ) (OR -73 9 1 ) (OR -8 55 -92 ) (OR -15 -50 -19 ) (OR 15 99 -59 ) (OR -74 -93 1 ) (OR 8 -20 33 ) (OR 66 -22 -88 ) (OR 57 33 90 ) ………………..

……………….. (OR 53 93 20 ) (OR -62 99 80 ) (OR 56 -38 46 ) (OR -72 -86 -75 ) (OR -21 94 -33 ) (OR -89 -6 67 ) (OR -53 -67 83 ) (OR -32 -86 -14 ) (OR -33 3 31 ) (OR 14 -2 97 ) (OR -13 19 28 ) (OR 51 -52 49 ) (OR 87 -87 65 ) (OR 90 -3 -67 ) (OR 96 88 -6 ) (OR 47 7 -29 ) (OR 88 31 -25 ) (OR 85 -14 -47 ) (OR 36 -4 -47 ) (OR 56 -88 33 ) (OR 18 -62 33 ) }

Best results achieved:

1 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0 1 0 1 1 0 1 1 1 0 0 1 0 0 0 0 1 1 1 0 1 0 0 1 1 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 1 1 1 0 1 0 1 1 0 1 0 1 1 0

Number of Clauses Satisfied = 370 [3600 seconds]

Pruning Strategy for SAT

Any one of the states in our search tree may be a candidate solution, because just a single variable occurring in all of the clauses may satisfy the expression. Every state should be evaluated and decided upon whether to continue in that path or not.

Also, if any clause already evaluates to false at a state of the search tree, no state below it can satisfy the expression, hence we prune the search at that state.
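A small Python sketch of DFS with this pruning rule follows. It is an assumption-laden illustration rather than the program behind the reported results: clauses are lists of signed integers in the style of the expression above (a negative number is a negated variable), variables are assigned in index order with 1 tried before 0, and the names `clause_status` and `dfs_sat` are hypothetical. It solves the decision problem only and does not track the best number of satisfied clauses.

```python
def clause_status(clause, assignment):
    """True if some literal is already true, False if every literal is false,
    None if the clause still depends on unassigned variables."""
    undecided = False
    for lit in clause:
        value = assignment.get(abs(lit))
        if value is None:
            undecided = True
        elif value == (lit > 0):
            return True
    return None if undecided else False

def dfs_sat(clauses, n_vars, assignment=None, var=1):
    """Return a satisfying {variable: bool} assignment, or None if there is none."""
    assignment = {} if assignment is None else assignment
    # Prune: one falsified clause makes the whole AND false below this state.
    if any(clause_status(c, assignment) is False for c in clauses):
        return None
    if var > n_vars:
        return dict(assignment)   # everything assigned and no clause is false
    for value in (True, False):
        assignment[var] = value
        result = dfs_sat(clauses, n_vars, assignment, var + 1)
        if result is not None:
            return result
        del assignment[var]       # undo the assignment before trying the other value
    return None

# (AND (OR X1 X2 ~X3) (OR ~X1 X3)) from the earlier slide
example = [[1, 2, -3], [-1, 3]]
print(dfs_sat(example, 3))
```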

Lessons Learned

DFS is a better search strategy for this problem than BFS, because we are only looking for a single solution, which could be any of the states.

However, it didn’t solve this particular problem – the search space is too large.

Let’s look at an alternative approach.

EA Philosophy

[Figure: the search space, containing a population of individuals (the candidate solutions of traditional search), each with a fitness.]

EA Terminology

Representation: an individual, typically a candidate solution in traditional search.

Initial Population: set of individuals, a fixed population size

Evaluation Function: evaluate the fitness of an individual for survival in the environment.

Selection Procedure: who can produce offspring.

Reproduction: generation of offspring from selected parents. Apply variation operators like crossover and mutation.

Unlike DFS and BFS, EAs are not exhaustive search – they are sound but not complete.

Outline of an EA

Set up initial population of candidate solutions.

Evaluate initial population.

Generate successive generations of populations by:
• Parent selection;
• Generate children via the genetic operators:
  • Crossover (recombination);
  • Mutation;
• Evaluate fitness of children;
• Decide which individuals survive;

Repeat generations until termination criterion satisfied.

Boolean Satisfiability Problem

Assume a SAT with 100 variables and 400 clauses.

An individual is a bit vector with 100 elements, each element is a 1 or a 0.

Initial Population contains 300 individuals with randomly allocated bit values.

Evaluation function is to count how many clauses are True. Maximize the number of True clauses, because making all the clauses True will make F(X) = True.

Selection procedure: modified (μ,λ) method. Ex: generate k new individuals from each parent μi, then choose the fittest μ individuals from the λ offspring.

Reproduction could be accomplished by performing 1-point or N-point crossover on two (or more) randomly chosen parents. Occasional mutation, i.e. flipping a randomly chosen bit with a very small probability, may generate better individuals.
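The EA just described might look roughly like the Python sketch below. It is not the Satga program: the parameter values (`mu`, `k`, `rate`, `generations`), the choice of a random mate for every child, and all function names are assumptions made for illustration, and at least two variables are assumed so that a crossover point exists.

```python
import random

def fitness(bits, clauses):
    """Number of clauses with at least one true literal; bits[i-1] is variable i."""
    return sum(
        any((bits[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
        for clause in clauses
    )

def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))        # cut point strictly inside the vector
    return a[:cut] + b[cut:]

def mutate(bits, rate, rng):
    """Flip each bit independently with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(clauses, n_vars, mu=30, k=7, rate=0.01, generations=200, seed=0):
    rng = random.Random(seed)
    parents = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(mu)]
    best = max(parents, key=lambda ind: fitness(ind, clauses))
    for _ in range(generations):
        if fitness(best, clauses) == len(clauses):
            break                          # every clause is satisfied
        children = []
        for p in parents:                  # each parent produces k children
            for _ in range(k):
                mate = rng.choice(parents)
                children.append(mutate(one_point_crossover(p, mate, rng), rate, rng))
        # comma-style selection: the next parents come from the children only
        children.sort(key=lambda ind: fitness(ind, clauses), reverse=True)
        parents = children[:mu]
        if fitness(parents[0], clauses) > fitness(best, clauses):
            best = parents[0]
    return best

example = [[1, 2, -3], [-1, 3]]
best = evolve(example, 3)
print(best, fitness(best, example))
```

Because the parents are discarded each generation, the sketch keeps a separate `best` so that a good individual is not lost when its children happen to be worse.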

Results for 3-SAT from EA

Number of Variables = 100

Number of Clauses = 400

Partial Expression

{AND (OR -48 43 76 ) (OR 99 -56 67 ) (OR -99 -19 -42 ) (OR 11 -43 -97 ) (OR -77 14 -86 ) (OR 90 -65 57 ) (OR 49 -73 -33 ) (OR 14 -86 -77 ) (OR -34 37 -82 ) (OR -77 91 -57 )

(OR -4 -51 -26 ) (OR 65 -16 -2 ) (OR -73 9 1 ) (OR -8 55 -92 ) (OR -15 -50 -19 ) (OR 15 99 -59 ) (OR -74 -93 1 ) (OR 8 -20 33 ) (OR 66 -22 -88 ) (OR 57 33 90 ) ………………..

……………….. (OR 53 93 20 ) (OR -62 99 80 ) (OR 56 -38 46 ) (OR -72 -86 -75 ) (OR -21 94 -33 ) (OR -89 -6 67 ) (OR -53 -67 83 ) (OR -32 -86 -14 ) (OR -33 3 31 ) (OR 14 -2 97 ) (OR -13 19 28 ) (OR 51 -52 49 ) (OR 87 -87 65 ) (OR 90 -3 -67 ) (OR 96 88 -6 ) (OR 47 7 -29 ) (OR 88 31 -25 ) (OR 85 -14 -47 ) (OR 36 -4 -47 ) (OR 56 -88 33 ) (OR 18 -62 33 ) }

Variable assignment: 1 1 1 0 0 0 0 1 1 0 1 1 1 1 1 0 1 1 0 0 0 0 1 1 0 1 0 1 0 1 0 0 0 0 1 1 1 1 1 0 0 1 0 1 0 0 0 0 1 1 0 0 1 1 0 1 1 1 0 1 1 0 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 1 1 1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 1 0 1 1 1 1

Fitness : 400 [240 seconds] [15 times faster than DFS]

Satga.exe

Observation of 3-SAT from EA

[Figure: Satisfiability Based on Clauses/Variable Ratio. x-axis: Ratio (2.5, 3, 3.5, 4, 4.1, 4.2, 4.3, 4.5, 4.6); y-axis: satisfied/not satisfied (0 to 1.2); series: solution.]

0/1 Knapsack Problem

There are N objects and each object has a weight and a value. There is a maximum amount of weight the knapsack can hold. Find the set of objects that maximize their value, while staying below the weight constraint of the knapsack.

Characteristics of the Problem

Based on the number of objects in the problem, search space could become extremely large.

We need to know which objects we pick and which objects we don’t pick to maximize the value while staying below the weight limit.

As an example, consider the following.

0/1 Knapsack Problem

Assume there are 3 items and the maximum weight the knapsack can hold is 50.

Object    Weight  Value
Object1   10      60
Object2   20      100
Object3   30      120

Number of candidate solutions in n item general problem = 2^n

Candidate solutions in our example = 2^3 = 8

0/1 Knapsack Problem – Search Tree

Each node is ( total weight, total value ).

Start state: ( )

After object 1: ( 10, 60 ) ( 0, 0 )

After object 2: ( 30, 160 ) ( 10, 60 ) ( 20, 100 ) ( 0, 0 )

After object 3: ( 60, 280 ) // ( 30, 160 ) ( 40, 180 ) ( 10, 60 ) ( 50, 220 ) ( 20, 100 ) ( 30, 120 ) ( 0, 0 )

Goal State: ( 50, 220 )

We can prune the paths where the weight at a node exceeds the maximum weight the knapsack can hold. As an example, (60,280) is not a solution to the problem. Since we are looking for an optimal solution, we cannot prune the nodes with (0,0).
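The pick / don't-pick tree and its weight pruning can be sketched in Python as below, using the three-object example. The function name `knapsack_dfs` and the recursion layout are illustrative assumptions, not the author's implementation.

```python
def knapsack_dfs(weights, values, capacity):
    """Depth-first search over pick (1) / don't pick (0) decisions, one object per level."""
    best = {"value": 0, "picks": [0] * len(weights)}

    def visit(i, weight, value, picks):
        if weight > capacity:
            return                          # prune: this path is already too heavy
        if i == len(weights):               # leaf: a complete candidate solution
            if value > best["value"]:
                best["value"], best["picks"] = value, picks[:]
            return
        visit(i + 1, weight + weights[i], value + values[i], picks + [1])
        visit(i + 1, weight, value, picks + [0])

    visit(0, 0, 0, [])
    return best["value"], best["picks"]

# Three-object example from the slides: capacity 50
print(knapsack_dfs([10, 20, 30], [60, 100, 120], 50))
```

Only the current path is held on the recursion stack, which is the space advantage of DFS on this problem; the running time still grows with the 2^n leaves.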

0/1 Knapsack Problem – DFS Implementation Results

Number of objects = 20

Maximum weight of the Knapsack = 2.5

Objects that are picked (1) and not picked (0): 0 0 1 1 0 0 1 0 1 0 1 1 0 1 0 0 1 0 1 0

Total Weight of the objects picked = 2.48

Total Value of the objects picked = 5.95

Number of objects = 100

Maximum weight of the Knapsack = 12.5

Best Results achieved

Total Weight of the objects picked = 12.4703

Total Value of the objects picked = 17.3843 [18000 seconds or 5hrs]

Data set (s) obtained from Dr. Terence Soule Department of Computer Science University of Idaho

http://www.cs.uidaho.edu/~tsoule/cs472/Knapsack.html

0/1 Knapsack Problem – DFS Implementation Results

Number of objects = 1000

Maximum weight of the Knapsack = 125.0

Best Results achieved

Total Weight of the objects picked = 124.999

Total Value of the objects picked = 134.414 [18hrs]

Lessons Learned

For the 0/1 Knapsack problem, DFS is a better search strategy.

DFS will minimize the space requirement.

To find a candidate solution, we may need to reach the maximum depth of the search tree.

In larger search space, time requirement is very high.

0/1 Knapsack Problem with EAs

Assume a knapsack problem with 20 objects.

An individual is a bit vector with 20 elements, each element is a 1 or a 0.

Initial Population contains 200 individuals with randomly allocated bit values.

Evaluation function: an individual with a higher cumulative value, whose cumulative weight is still below the maximum weight of the knapsack, has a better chance to reproduce.

Selection procedure: modified (μ,λ) method. Ex: generate k new individuals from each parent μi, then choose the fittest μ individuals from the λ offspring.

Reproduction could be accomplished by performing 1-point or N-point crossover on two randomly chosen parents. Occasional mutation, i.e. flipping a randomly chosen bit with a very small probability, may generate better individuals.
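One way to turn this evaluation rule into a number is sketched below. The slides do not say how an over-weight individual is scored, so giving it fitness 0 is an assumption of this sketch, and `knapsack_fitness` is a hypothetical name.

```python
def knapsack_fitness(bits, weights, values, capacity):
    """Total value of the picked objects, or 0 if they exceed the weight limit."""
    weight = sum(w for b, w in zip(bits, weights) if b)
    value = sum(v for b, v in zip(bits, values) if b)
    return value if weight <= capacity else 0

# Three-object example from the earlier slide: capacity 50
print(knapsack_fitness([0, 1, 1], [10, 20, 30], [60, 100, 120], 50))
print(knapsack_fitness([1, 1, 1], [10, 20, 30], [60, 100, 120], 50))
```

A softer penalty that subtracts an amount proportional to the excess weight is a common alternative, since it lets slightly infeasible individuals still guide the search.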

Results

Number of objects = 20

Maximum weight of the Knapsack = 2.5

Objects that are picked (1) and not picked (0): 0 0 1 1 0 0 1 0 1 0 1 1 0 1 0 0 1 0 1 0

Total Weight of the objects picked = 2.48

Total Value of the objects picked = 5.95

(same as DFS)

Results

Number of objects = 100

Maximum weight of the Knapsack = 12.5

Objects that are picked (1) and not picked (0):

1 0 1 0 1 0 1 0 1 0 0 0 0 0 1 0 0 1 1 0 1 1 1 0 0 1 1 1 1 0 1 0 0 1 1 0 0 1 1 0 0 0 1 1 1 1 0 0 1 0 0 0 1 0 0 1 0 1 1 0 0 0 1 1 1 0 0 0 1 0 1 1 1 1 0 1 0 1 0 1 1 1 1 0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 0

Total Weight of the objects picked = 12.47

Total Value of the objects picked = 25.30 (compare with 17.38 with DFS)

[1500 seconds or 25min, 12 times faster than DFS]

Results

Number of objects = 1000

Maximum weight of the Knapsack = 125.0

Objects that are picked (1) and not picked (0):

0 1 0 0 1 1 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 1 1 0 1 1 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 0 0 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 1 1 1 0 0 1 1 0 1 0 0 0 1 1 1 1 1 1 0 1 1 0 1 0 1 0 1 0 1 1 1 0 1 0 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 1 1 1 0 1 1 1 0 1 1 0 1 0 0 0 0 0 1 0 0 1 0 1 1 0 1 1 1 1 0 1 0 0 1 1 0 0 0 1 1 1 0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 0 1 0 1 1 1 0 1 0 1 0 0 1 0 0 0 1 1 0 1 1 0 0 1 1 1 1 1 1 0 0 0 1 1 1 0 0 0 0 1 0 0 1 0 1 0 1 1 0 1 1 1 0 1 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 1 0 0 1 1 0 0 0 0 1 0 1 1 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 1 0 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0 0 0 1 0 1 1 1 1 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 1 0 0 0 0 1 1 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 1 1 1 1 0 1 1 1 0 1 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 0 0 0 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 1 1 1 1 0 0 1 0 1 0 1 0 0 1 0 1 0 0 0 0 0 1 1 1 1 0 1 0 0 1 1 1 0 0 0 0 1 1 1 0 0 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 1 1 0 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 1 0 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 0 1 1 0 1 1 0 0 1 0 1 1 0 0 0 1 1 1 0 0 1 1 1 0 0 0 1 0 1 0 1 1 0 0 1 1 0 0 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 0 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 1 0 0 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0 1 0 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 0 0 0 1 1 0 1 1 0 0 1 1 1 0 0 0 0 1 1 0 0 1 1 1 0 1 1 1 0 0 1 1 0 1 0 1 1 1 1 1 1 0 0 1 0 1 0 1 1 0 1 1 0 0 0 1 0 1 1 1 1 0 0 0 0 0 0 0 1 0 1 0 1 1 1 1 1 0 1 1 0 0 1 0 0 0 1 0 0 1 0 0 1 1 0 1 1 1 0 0 0 1 0 1 1 1 0 1 0 0 1 0 0 0 1 0 0 1 1 1 0 1 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 1 0 1 0 0 1 1 1 1 1 0 0 1 0 1 1 0 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1

Total Weight of the objects picked = 124.82

Total Value of the objects picked = 257.06 (compare with 134.414 with DFS)

[40min, 27 times faster than DFS]

Lessons Learned

EAs are easy to implement and produce results faster. However, EAs are not guaranteed to find the optimal solution, and optimality is not provable.

Mutation is an insurance policy against premature loss of important subparts of a solution.

We have to be careful when deciding the size of the population; neither choosing extremely large population size nor choosing a very small population size will produce effective results.

It is difficult to avoid getting stuck in a local maximum; the only solution to this problem is a better selection strategy.

Permutation Problems

1. N Queen [Decision Problem]

2. TSP [Optimization Problem]

N Queens Problem

The N queens puzzle is the problem of putting N queens on an NxN chessboard such that none of them is able to capture any other. That is to say, no two queens should share the same row, column, or diagonal.

Characteristics of the Problem

The search space of the problem is very large, for large N.

Need to Explore the possibility of reducing the search space.

Smart representation may give us faster results.

Amount of space required to solve the problem could be extremely large, if all the states are expanded and explored.

Eight Queen Problem

To the right: board with all the eight queens on the board without any conflicts. This could also be shown in an eight element vector.

Index:  0 1 2 3 4 5 6 7

Vector: 5 2 6 1 3 7 0 4

Index represents the row; element represents the column.

[Figure: 8X8 board with the eight queens (Q) placed.]

Eight Queens Problem - Search space for 3 Different Representation

Representation                                                       [# of candidate solutions]
8X8 board                                                            64 choose 8 = [4,426,165,368]
8 element vector, contains duplicates (remove row conflicts)         8^8 = [16,777,216]
8 element vector, no duplicates (remove column conflicts)            8! = [40,320]
Remove diagonal conflicts                                            92 solutions

8 Queens ProblemDepth-first Search (DFS) Results

Results for the first implementation – 8 element vector with duplicates. Row conflicts are removed.

8 queens

Solutions: 92

Actual representation of a Solution in the output file:

Q E E E E E E E
E E E E Q E E E
E E E E E E E Q
E E E E E Q E E
E E Q E E E E E
E E E E E E Q E
E Q E E E E E E
E E E Q E E E E

8 Queens Problem DFS Results

Results for the Second implementation – 8 element vector with no duplicates. Row and Column conflicts are removed.

Some Partial Solutions

Board : [3 7 4 2 0 5 ] Queens :[1 6 ]
Board : [3 7 4 2 0 5 1 ] Queens :[6 ]
Board : [3 7 4 2 0 6 ] Queens :[1 5 ]
Board : [3 7 4 2 0 6 1 ] Queens :[5 ]
Board : [3 7 4 2 0 6 1 5 ] Queens :[ ] Solution
Board : [3 7 4 2 5 ] Queens :[0 1 6 ]
Board : [4 ] Queens :[0 1 2 3 5 6 7 ]
Board : [4 0 ] Queens :[1 2 3 5 6 7 ]
Board : [4 0 3 ] Queens :[1 2 5 6 7 ]
Board : [4 0 3 5 ] Queens :[1 2 6 7 ]
Board : [4 0 3 5 2 ] Queens :[1 6 7 ]
Board : [4 0 3 5 7 ] Queens :[1 2 6 ]
Board : [4 0 3 5 7 1 ] Queens :[2 6 ]
Board : [4 0 3 5 7 1 6 ] Queens :[2 ]
Board : [4 0 3 5 7 1 6 2 ] Queens :[ ] Solution

8 Queens Problem DFS Results

Results for the Second implementation, Cont.

Some Solutions

[ 6 4 2 0 5 7 1 3 ]
[ 7 1 3 0 6 4 2 5 ]
[ 7 1 4 2 0 6 3 5 ]
[ 7 2 0 5 1 4 6 3 ]
[ 7 3 0 2 5 1 6 4 ]

Number of Solutions = 92
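The second implementation can be sketched in Python as follows, mirroring the Board / Queens vectors in the partial solutions above: `board` holds the columns already placed (one per row) and `remaining` holds the columns not yet used, so row and column conflicts cannot occur and only diagonals need checking. The function names are invented for this sketch.

```python
def solve_n_queens(n):
    """DFS over permutations of columns; only diagonal conflicts are tested."""
    solutions = []

    def conflicts(board, col):
        row = len(board)                    # the row the next queen would occupy
        return any(abs(row - r) == abs(col - c) for r, c in enumerate(board))

    def place(board, remaining):
        if not remaining:                   # all queens placed without conflict
            solutions.append(board)
            return
        for col in remaining:
            if not conflicts(board, col):   # otherwise prune this branch
                place(board + [col], [c for c in remaining if c != col])

    place([], list(range(n)))
    return solutions

solutions = solve_n_queens(8)
print("Number of Solutions =", len(solutions))
```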

8 - 13 Queens Time Comparison on Two Previous Algorithms - Average Over 3 Runs

Queens   No Row Conflicts   No Row and Column Conflicts   Solutions
8        0.04               0.01                          92
9        0.22               0.05                          352
10       0.67               0.2                           724
11       3.35               0.99                          2680
12       17.12              5.23                          14200
13       103.55             30.08                         73712
14       NA                 186.83                        365596
15       NA                 1233.72                       2279184

[Figure: Number of Queens Vs Algorithm Execution Time. x-axis: Number of Queens (8 to 15); y-axis: execution time (0 to 1300); series: row conflicts removed, row and column conflicts removed.]

[Figure: Number of Queens Vs Execution log(Time) for Two Algorithms. x-axis: Number of Queens (8 to 15); y-axis: log(Time) (-3 to 4); series: No Row Conflicts, No Row and Column Conflicts.]

Lessons Learned

DFS is a better algorithm to solve this problem because of less space requirement.

We could improve the performance of DFS based on the representation of a problem.

We could attempt to remove the symmetries as another improvement.

It is impossible to come up with a representation that does not contain diagonal conflicts [then we don’t have a problem to solve].

With more than 30 queens, it is very hard for DFS to even come up with a single solution.

N Queens Problem with EAs

Assume an N queens problem with 200 queens.

An individual is a 200 element vector with natural numbers from 0 to 199.

Initial Population contains 30 individuals with randomly allocated integer values.

Evaluation function would be to minimize the diagonal conflicts. Since our implementation does not contain duplicate integers in the vector, there are no row or column conflicts.

Selection procedure: modified (μ,λ) method. Ex: generate k new individuals from each parent μi, then choose the fittest μ individuals from the λ offspring.

Reproduction: only mutation is used in the N Queens problem; standard 1-point or N-point crossover is not used, since it can introduce duplicates, and we should avoid duplicates to eliminate row/column conflicts.
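A mutation-only EA of this kind could be sketched in Python as below. The swap mutation, the parameter values, and the function names are assumptions chosen for illustration (the slides do not specify the mutation operator); swapping two positions keeps the vector a permutation, so no row or column conflicts are introduced.

```python
import random

def diagonal_conflicts(perm):
    """Number of queen pairs that share a diagonal; perm[row] is the column."""
    n = len(perm)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(i - j) == abs(perm[i] - perm[j])
    )

def swap_mutation(perm, rng):
    """Exchange two positions, which preserves the permutation property."""
    child = perm[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def evolve_queens(n, mu=30, k=5, generations=2000, seed=0):
    rng = random.Random(seed)
    parents = [rng.sample(range(n), n) for _ in range(mu)]   # random permutations
    best = min(parents, key=diagonal_conflicts)
    for _ in range(generations):
        if diagonal_conflicts(best) == 0:
            break                                            # conflict-free board found
        children = [swap_mutation(p, rng) for p in parents for _ in range(k)]
        children.sort(key=diagonal_conflicts)
        parents = children[:mu]                              # fittest mu of the offspring
        if diagonal_conflicts(parents[0]) < diagonal_conflicts(best):
            best = parents[0]
    return best

best = evolve_queens(8)
print(best, diagonal_conflicts(best))
```

The pairwise conflict count is quadratic in the number of queens, so a run at 200 queens would want an incremental count per diagonal; the simple version is kept here for readability.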

Results for 200 Queens from EA

22 24 99 82 56 101 87 42 97 186 116 154 49 17 94 57 127 102 79 25 50 175 178 51 108 7 70 152 23 86 139 9 153 192 183 75 158 113 190 162 176 169 160 120 136 34 95 137 119 58 132 83 182 115 107 69 67 12 151 105 32 187 41 46 40 92 20 16 146 171 18 64 84 180 157 138 77 37 114 38 181 141 106 100 188 123 26 173 59 13 165 54 150 93 185 80 130 1 131 55 0 36 197 172 45 6 19 155 196 89 30 184 85 144 133 179 15 35 33 170 117 166 11 191 88 53 78 2 31 61 111 29 122 60 142 174 110 44 163 126 8 103 63 3 27 147 156 124 118 74 168 177 148 62 195 21 159 149 125 90 98 112 76 73 121 104 52 10 189 134 4 68 109 164 96 161 48 145 28 66 71 194 5 199 72 14 193 65 135 43 143 39 167 198 140 129 47 91 128 81

200,000 queens problem has been solved this way!

Impossible with DFS!

Traveling Salesman Problem

There are N cities in a graph and there are roads from every city to every other city, which have costs associated with them. Find a “tour” that starts at some city, visits every other city once, ends back at the start city, and has the minimal cost.

Characteristics of Traveling Salesman Problem

A tour can be represented in two different ways for symmetrical TSP.

In symmetrical TSP for a pair of cities i, j cost (i, j) = cost (j, i)

Represent a tour in a N+1 element vector, last element to store the cost of the tour.

Candidate solutions are permutations of natural numbers from 1 to N.

Characteristics of Traveling Salesman Problem

Each number corresponds to a city to be visited in the sequence, so the search space = N!

In symmetrical TSP, shrink the search space by ½, so |s| = N! / 2

Since the tour could be the same regardless of the starting city, we can reduce the search space by N. So |s| = (N-1)! / 2

Traveling Salesman Problem Search Strategy

We implement the TSP similar to the second implementation of the N Queen problem.

Maintain Two vectors, one for the visited cities and one for the cities to be visited.

Visited [ 0 1 2 ] : To be visited [ 3 4 5 6 7 ]

So we avoid producing duplicate cities in the tour.

The starting city does not change, so the permutations always begin with the same city.

Traveling Salesman Problem Search Strategy

Whether we use DFS or BFS, we have to do an exhaustive search.

We need to explore all the possible tours to find the minimum cost tour. Also, all the solutions are at the leaf nodes at the maximum depth of the tree.

In DFS, we can prune partial tours, if the current partial tour cost already exceeds a previously found cost of a full tour.
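The two-vector DFS with this pruning rule can be sketched in Python as below. It is a minimal illustration under assumptions: costs are non-negative (otherwise the pruning test is not valid), city 0 is the fixed starting city, the cost matrix is a made-up 4-city example rather than the author's data, and `tsp_dfs` is a hypothetical name.

```python
import math

def tsp_dfs(cost):
    """Exhaustive DFS over tours starting at city 0, pruning costly partial tours."""
    n = len(cost)
    best = {"cost": math.inf, "tour": None}

    def visit(visited, to_visit, partial):
        if partial >= best["cost"]:
            return                          # prune: cannot beat the best full tour
        if not to_visit:                    # leaf: close the tour back to the start
            total = partial + cost[visited[-1]][visited[0]]
            if total < best["cost"]:
                best["cost"], best["tour"] = total, visited[:]
            return
        for city in to_visit:
            visit(
                visited + [city],
                [c for c in to_visit if c != city],
                partial + cost[visited[-1]][city],
            )

    visit([0], list(range(1, n)), 0)
    return best["cost"], best["tour"]

# Hypothetical symmetric cost matrix for four cities
example = [
    [0, 1, 4, 2],
    [1, 0, 3, 5],
    [4, 3, 0, 1],
    [2, 5, 1, 0],
]
print(tsp_dfs(example))
```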

Results for TSP Using DFS

Time is averaged over Three runs

Fifteen Cities
Results [ 0 5 1 14 2 6 3 8 11 12 4 9 13 10 7 ]
best cost = 941
Execution Time (hrs): 0.6730560

Fourteen Cities
Results [ 0 5 8 3 2 6 13 12 4 9 1 10 11 7 ]
best cost = 910
Execution Time (hrs): 0.1580560

Thirteen Cities
Results [ 0 5 6 2 4 9 10 1 12 11 3 8 7 ]
best cost = 870
Execution Time (hrs): 0.0247222

Twelve Cities
Results [ 0 5 6 8 3 2 11 10 1 4 9 7 ]
best cost = 837
Execution Time (hrs): 0.0072222

Eleven Cities
Results [ 0 5 7 10 1 6 9 4 2 3 8 ]
best cost = 825
Execution Time (hrs): 0.002500

Ten Cities
Results [ 0 5 1 9 4 6 2 3 8 7 ]
best cost = 718
Execution Time (hrs): 0.0003703

Nine Cities
Results [ 0 1 2 3 8 4 7 6 5 ]
best cost = 631
Execution Time (hrs): 0.0002777

[Figure: TSP Time Analysis for 9-15 Cities Using DFS. x-axis: Cities (9 to 15); y-axis: Time (hrs), 0 to 0.8; series: TSP Time.]

Lessons Learned

Time taken to solve the problem grows exponentially with the number of cities.

DFS is chosen to solve this problem to minimize the space requirement.

To find a complete tour, algorithm has to reach the maximum depth of the search tree.

It is extremely difficult to solve this problem for 29 city TSP - the best path cost I got is 2790.

TSP with EAs

Assume a TSP with 29 cities.

An individual is a 29 element vector with natural numbers from 0 to 28.

Initial population contains 200 candidate tours that are randomly initialized with natural numbers from 0 to 28.

Evaluation function is to minimize the cost of the tour. Only the tours with lower costs are fit to survive in the environment.

Selection procedure: modified (μ,λ) method. Ex: generate k new individuals from each parent μi, then choose the fittest μ individuals from the λ offspring.

Reproduction: only mutation is used in TSP; standard 1-point or N-point crossover is not used, since it can introduce duplicate cities in the tour.

Results

Best results found so far for 29 city TSP using an EA.

Tour: 0 27 5 11 8 25 2 28 4 20 1 19 9 3 14 17 13 16 21 10 18 24 6 22 7 26 15 12 23 0

Cost of the tour: 1610 (this is the best known)

This was impossible with DFS that we discussed earlier.

END

Acknowledgement

Thanks to Dr. William Spears and Dr. Diana Spears for all the help. I appreciate all their guidance and advice. Thanks to Dr. Thomas Bailey for answering my questions and all his help through out this project. Also like to thank Ms. Nadezda Kuzmina for fixing the bug in N Queens program and Mr. Dimitri Zarzhitsky for helping me with this presentation.

Estimated total Cost Vs True Minimal Cost

h*(n) = the true minimal cost to goal node from n.

f*(n) : true minimal total cost of the cheapest solution path through node n.

g*(n) : true path cost from start node to node n. so g*(n) = g(n).

h(n) is an admissible heuristic if h(n) <= h*(n) for every node n.

A* with an Admissible Heuristic is Optimal

[Figure: search graph with nodes s, n, n’, G, and G’.]

Proof (by contradiction): Suppose the optimal path is the thick one shown in darker green, and the path found by A* with an admissible h function is the thin one shown in green, which is longer, i.e., A* terminates at a suboptimal node G’. Let the green path deviate from the optimal path at node n. Let f* be the optimal path cost...

with permission of Prof. Spears

Formal Proof of A*’s Optimality

Lemma: f(n) <= f*. [proved earlier]

Proof: f(n) = g(n) + h(n) [by definition]

= g*(n) + h(n) [A* has found optimal path to n]

<= g*(n) + h*(n) [h is admissible]

= f*(n) [by definition]

= f* [f* is the same for every node on optimal path]


Proof Cont.

f(n) >= f(G’) [A* terminates at G’ rather than expanding n]

f* >= f(n) [lemma]

f* >= f(G’) [consequence of above two facts, transitivity]

f(G’) = g(G’) [because h(G’)=0]

f* >= g(G’) [consequence of above two facts]

Contradicts that G’ is sub-optimal!

Conclusion: A* cannot terminate at G’.

