graph


ASSIGNMENT # 06

GRAPH SEARCH TECHNIQUES

Course Title : Optimization techniques

Semester : FALL 2013

Instructor : Dr. Sohail Razzak

Student : Ahmad Masood

Reg. No. : FA13-R09-017

Date : 26 Dec 2013

Department of Electrical Engineering, COMSATS Institute of Information Technology, Abbottabad


Graph

A graph is a representation of a set of objects where some pairs of objects are connected

by links. The interconnected objects are represented by mathematical abstractions called vertices,

and the links that connect some pairs of vertices are called edges. Typically, a graph is depicted

in diagrammatic form as a set of dots for the vertices, joined by lines or curves for the edges.

Graphs are one of the objects of study in discrete mathematics.
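As a small illustration of the definition above, a graph can be stored as an adjacency list that maps each vertex to the set of its neighbors. The vertex names and edges below are made up for the example, and the helper name build_adjacency is only an assumption of this sketch.

```python
# Hypothetical example graph: vertices are strings, edges are unordered pairs.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("e", "f")]


def build_adjacency(edge_list):
    """Return a dict mapping each vertex to the set of its neighbors."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)  # undirected: store both directions
    return adjacency


graph = build_adjacency(edges)
```

Here graph["a"] holds the neighbors of vertex a. A directed graph would store each edge in one direction only.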

Graph Search

One of the most fundamental tasks on graphs is searching a graph by starting at some

vertex, or set of vertices, and visiting new vertices by crossing (out) edges until there is nothing

left to search. In such a search we need to be systematic to make sure that we visit all vertices

that we can reach and that we do not visit vertices multiple times. This will require recording

what vertices we have already visited so we don’t visit them again. Graph searching can be used

to determine various properties of graphs, such as whether the graph is connected or whether it is

bipartite, as well as various properties relating vertices, such as whether a vertex u is reachable

from v, or finding the shortest path between u and v.

Standard methods for searching graphs:

There are three standard methods for searching graphs: breadth first search (BFS), depth

first search (DFS), and priority first search. All these methods visit every vertex that is reachable

from a source, but the order in which they visit the vertices can differ. All search methods when

starting on a single source vertex generate a rooted search tree, either implicitly or explicitly.

This tree is a subset of the edges from the original graph. In particular a search always visits a

vertex v by entering from one of its neighbors u via an edge (u, v). This visit to v adds the edge

(u, v) to the tree. These edges form a tree (i.e., have no cycles) since no vertex is visited twice and

hence there will never be an edge that wraps around and visits a vertex that has already been

visited. We refer to the source vertex as the root of the tree.
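The search tree described above can be sketched with a breadth first search that records, for every newly visited vertex v, the neighbor u it was entered from. This is a minimal sketch, not a definitive implementation: the function names bfs_tree and path_to and the example graph are assumptions made for illustration.

```python
from collections import deque


def bfs_tree(adjacency, source):
    """Return {vertex: parent} for every vertex reachable from source."""
    parent = {source: None}            # also serves as the record of visited vertices
    queue = deque([source])
    while queue:
        u = queue.popleft()            # FIFO order gives breadth first search
        for v in adjacency.get(u, []):
            if v not in parent:        # v has not been visited yet
                parent[v] = u          # the visit adds tree edge (u, v)
                queue.append(v)
    return parent


def path_to(parent, v):
    """Follow parent pointers from v back to the root of the search tree."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]


# Hypothetical graph with two components; "e" and "f" are unreachable from "a".
adjacency = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
    "d": ["b", "c"], "e": ["f"], "f": ["e"],
}
tree = bfs_tree(adjacency, "a")
```

Because each vertex receives exactly one parent, the recorded edges cannot contain a cycle, and path_to(tree, "d") walks back to the root "a".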

Comparison between DFS and BFS:

Graph searching has played a very important role in the design of sequential algorithms,

but the approach can be problematic when trying to achieve good parallelism. Depth first search

(DFS) is inherently sequential. Because of this, one often uses other techniques in designing

good parallel algorithms. Breadth first search (BFS) can be parallelized effectively as long as the


graph is shallow (the longest shortest path from the source to any vertex is reasonably small). In

fact, the depth of the graph will show up in the bounds for span. Fortunately many real-world

graphs are shallow, but if we are concerned with worst-case behavior over any graph, then BFS

is also sequential.

Uninformed Search

An uninformed (blind) search algorithm generates the search tree

without using any domain-specific knowledge. If a state is not a goal, we

cannot tell how close to the goal it might be. Hence, all we can do is move

systematically between states until we stumble on a goal. In contrast,

informed (heuristic) search uses a guess on how close to the goal a state

might be.

Uninformed Search Methods

Breadth-First Search(BFS):

• Enqueue nodes on the nodes list in FIFO (first-in, first-out) order.

• Complete

• Optimal (i.e., admissible) if all operators have the same cost. Otherwise, not optimal but

finds solution with shortest path length.

• Exponential time and space complexity, O(b^d), where d is the depth of the solution and b

is the branching factor (i.e., number of children) at each node

• Will take a long time to find solutions with a large number of steps because must look at

all shorter length possibilities first

– A complete search tree of depth d where each non-leaf node has b children has a

total of 1 + b + b^2 + ... + b^d = (b^(d+1) - 1)/(b - 1) nodes

– For a complete search tree of depth 12, where every node at depths 0, ..., 11 has

10 children and every node at depth 12 has 0 children, there are 1 + 10 + 100 +

1000 + ... + 10^12 = (10^13 - 1)/9 = O(10^12) nodes in the complete search tree. If BFS

expands 1000 nodes/sec and each node uses 100 bytes of storage, then BFS will

take 35 years to run in the worst case, and it will use 111 terabytes of memory!
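The figures in the depth-12 example can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year and 1 terabyte = 10^12 bytes; the function name tree_nodes is chosen only for this illustration and requires b > 1.

```python
def tree_nodes(b, d):
    """Nodes in a complete tree of depth d with branching factor b (b > 1)."""
    return (b ** (d + 1) - 1) // (b - 1)


nodes = tree_nodes(10, 12)                 # 1111111111111 nodes
seconds = nodes / 1000                     # at 1000 node expansions per second
years = seconds / (365 * 24 * 3600)        # roughly 35 years
terabytes = nodes * 100 / 1e12             # at 100 bytes per node, roughly 111 TB
```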


Depth-First Search(DFS):

• Enqueue nodes on the nodes list in LIFO (last-in, first-out) order. That is, the nodes list is used

as a stack data structure to order nodes.

• May not terminate without a “depth bound,” i.e., cutting off search below a fixed depth D

(“depth-limited search”)

• Not complete in general: it can loop forever without cycle detection, follow an unbounded path in an infinite state space, or miss any goal that lies below the cutoff depth

• Exponential time, O(b^d), but only linear space, O(bd)

• Can find long solutions quickly if lucky (and short solutions slowly if unlucky!)

• When search hits a dead-end, can only back up one level at a time even if the “problem”

occurs because of a bad operator choice near the top of the tree. Hence, only does

“chronological backtracking”
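The properties listed above can be seen in a minimal depth-limited DFS that uses a Python list as the stack. This is a sketch under stated assumptions: the function name depth_limited_dfs, the example graph, and the choice to check for cycles only along the current path are all illustrative rather than taken from the source.

```python
def depth_limited_dfs(adjacency, start, goal, depth_bound):
    """Return a path of at most depth_bound edges from start to goal, or None."""
    stack = [(start, [start])]               # LIFO: the last node pushed is expanded first
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        if len(path) - 1 >= depth_bound:
            continue                          # cut off the search below the depth bound
        # Push children in reverse so the first listed child is expanded first.
        for child in reversed(adjacency.get(node, [])):
            if child not in path:             # cycle check along the current path only
                stack.append((child, path + [child]))
    return None


# Hypothetical graph: a short route S-B-G and a long route S-A-C-D-G.
adjacency = {"S": ["A", "B"], "A": ["C"], "C": ["D"], "D": ["G"], "B": ["G"]}
long_path = depth_limited_dfs(adjacency, "S", "G", 4)
short_path = depth_limited_dfs(adjacency, "S", "G", 2)
no_path = depth_limited_dfs(adjacency, "S", "G", 1)
```

With a bound of 4 the search commits to the first branch and returns the long route; with a bound of 2 that branch is cut off and the short route is found; with a bound of 1 no solution is found at all.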

Uniform-Cost Search(UCS):

• Enqueue nodes by path cost. That is, let g(n) = cost of the path from the start node to the

current node n. Sort nodes by increasing value of g.

• Called “Dijkstra’s Algorithm” in the algorithms literature and similar to “Branch and

Bound Algorithm” in operations research literature

• Complete (*)

• Optimal/Admissible (*)

• Admissibility depends on the goal test being applied when a node is removed from the

nodes list, not when its parent node is expanded and the node is first generated

• Exponential time and space complexity, O(b^d)
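A minimal sketch of uniform-cost search, using Python's heapq module as the priority queue ordered by g(n). It assumes non-negative step costs; the function name uniform_cost_search and the example graph are made up for illustration. Note that the goal test is applied when a node is removed from the queue, as the admissibility point above requires.

```python
import heapq


def uniform_cost_search(graph, start, goal):
    """graph maps node -> list of (neighbor, step_cost); costs assumed non-negative."""
    frontier = [(0, start, [start])]          # entries are (g, node, path)
    expanded = set()
    while frontier:
        g, node, path = heapq.heappop(frontier)   # node with the smallest g first
        if node == goal:                      # goal test on removal, not on generation
            return g, path
        if node in expanded:
            continue                          # a cheaper copy was already expanded
        expanded.add(node)
        for neighbor, cost in graph.get(node, []):
            if neighbor not in expanded:
                heapq.heappush(frontier, (g + cost, neighbor, path + [neighbor]))
    return None


# Hypothetical graph: the direct edge S-G costs 10, the detour S-A-B-G costs 3.
graph = {"S": [("A", 1), ("G", 10)], "A": [("B", 1)], "B": [("G", 1)]}
result = uniform_cost_search(graph, "S", "G")
```

If the goal test were applied when G is first generated from S, the search would return the direct edge of cost 10; testing on removal returns the cheaper three-step path.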

Informed Search

A problem determines the graph and the goal but not which path to select from the

frontier. This is the job of a search strategy. A search strategy specifies which paths are selected

from the frontier. Different strategies are obtained by modifying how the selection of paths in the

frontier is implemented.


It is not difficult to see that uninformed search will pursue options that lead away from

the goal as easily as it pursues options that lead towards the goal. For any but the smallest

problems this leads to searches that take unacceptable amounts of time and/or space. Informed

search tries to reduce the amount of search that must be done by making intelligent choices for

the nodes that are selected for expansion. This implies the existence of some way of evaluating

the likelihood that a given node is on the solution path. In general this is done using

a heuristic function.

Informed Search Techniques:

Informed strategies use the agent’s background information about the problem: map, costs of

actions, approximation of solutions, ...

• Best-first search

– Greedy search

– A* search

• Local search

– Hill-climbing

– Simulated annealing

– Genetic algorithms

– Local search in continuous spaces

Best-first search:

Idea: use an evaluation function for each node

– estimate of “desirability”

Expand the most desirable unexpanded node

Implementation:

The fringe is a queue sorted in decreasing order of desirability

Special cases:

– Greedy search

– A* search


Greedy Search:

Evaluation function h(n) (heuristic) = estimate of cost from n to the closest goal

E.g., hSLD(n) = straight-line distance from n to Bucharest

Greedy search expands the node that appears to be closest to the goal

Properties of greedy search:

• Complete? No, it can get stuck in loops, e.g., when going from Iasi to Fagaras: Iasi → Neamt → Iasi → Neamt → ... It is complete in a finite space with repeated-state checking

• Optimal? No

• Time? O(b^m), where m is the maximum depth of the search space, but a good heuristic can give dramatic improvement

• Space? O(b^m)
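The looping behaviour and the effect of repeated-state checking can be reproduced on a toy problem. The sketch below is illustrative only: the four-node graph and its heuristic values are invented (they are not the Romania map distances), and the names greedy_best_first, check_repeats and max_expansions are assumptions of this example.

```python
import heapq


def greedy_best_first(neighbors, h, start, goal, check_repeats=True, max_expansions=200):
    """Always expand the fringe node with the smallest h; return a path or None."""
    fringe = [(h[start], start, [start])]     # entries are (h, node, path)
    visited = set()
    expansions = 0
    while fringe and expansions < max_expansions:
        _, node, path = heapq.heappop(fringe)
        if node == goal:
            return path
        if check_repeats:
            if node in visited:
                continue                      # skip states that were already expanded
            visited.add(node)
        expansions += 1
        for nxt in neighbors[node]:
            heapq.heappush(fringe, (h[nxt], nxt, path + [nxt]))
    return None                               # fringe empty or expansion budget used up


# Invented example: N looks closest to the goal G but is a dead end.
neighbors = {"I": ["N", "V"], "N": ["I"], "V": ["I", "G"], "G": ["V"]}
h = {"I": 2, "N": 1, "V": 3, "G": 0}
looping = greedy_best_first(neighbors, h, "I", "G", check_repeats=False)
found = greedy_best_first(neighbors, h, "I", "G", check_repeats=True)
```

Without repeated-state checking the search bounces between I and N until the expansion budget runs out and returns None; with checking it is forced onto V and reaches G.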

Uninformed vs. informed search:

• Uninformed search strategies

– Also known as “blind search,” uninformed search strategies use no information about the

likely “direction” of the goal node(s)

– Uninformed search methods: Breadth-first, depth-first, depth-limited, uniform-cost,

depth-first iterative deepening, bidirectional


• Informed search strategies

– Also known as “heuristic search,” informed search strategies use information about the

domain to (try to) (usually) head in the general direction of the goal node(s)

– Informed search methods: Hill climbing, best-first, greedy search, beam search, A, A*

A* Algorithm

The A* algorithm combines features of uniform-cost search and pure heuristic search to

efficiently compute optimal solutions. A* algorithm is a best-first search algorithm in which the

cost associated with a node is f(n) = g(n) + h(n), where g(n) is the cost of the path from the initial

state to node n and h(n) is the heuristic estimate of the cost of a path from node n to a goal. Thus,

f(n) estimates the lowest total cost of any solution path going through node n. At each point a

node with lowest f value is chosen for expansion. Ties among nodes of equal f value should be

broken in favor of nodes with lower h values. The algorithm terminates when a goal is chosen for

expansion.

The A* algorithm is guaranteed to find an optimal path to a goal if the heuristic function h(n) is admissible,

meaning it never overestimates actual cost. For example, since airline distance never

overestimates actual highway distance, and Manhattan distance never overestimates actual moves

in the sliding-tile puzzles, the A* algorithm, using these evaluation functions, can find optimal solutions to

these problems. In addition, A* makes the most efficient use of the given heuristic function in

the following sense: among all shortest-path algorithms using the given heuristic function h(n),

the A* algorithm expands the fewest number of nodes.

The defining characteristics of the A* algorithm are the building of a "closed list" to

record areas already evaluated, a "fringe list" to record areas adjacent to those already evaluated,

and the calculation of distances traveled from the "start point" with estimated distances to the

"goal point".

The fringe list, often called the "open list", is a list of all locations immediately adjacent

to areas that have already been explored and evaluated (the closed list).

The closed list is a record of all locations which have been explored and evaluated by the

algorithm.


Figure 1. The current location is the yellow square, which is now part of the closed list. The orange

squares surrounding the yellow square are the fringe. These are the possible options which the

algorithm can experiment with.

Figure 2. As the path progresses, the closed and fringe lists grow. Note that this path cuts

corners. If the gray area represents an obstacle, like a wall, this path might be invalid since it

passes unhindered through the wall.


Figure 3. When cornering rules are imposed, the path will be better suited to avoiding obstacles.

The evaluation function used to rank nodes in A* is:

f(n) = g(n) + h(n)

where g(n) represents the cost (distance) of the path from the starting point to any vertex n, and

h(n) represents the estimated cost from vertex n to the goal.

Euclidean distance (straight-line distance) is a common choice for h(n):

x1, y1 = coordinates of the current location

x2, y2 = coordinates of the goal location

dx = | x2 - x1 |

dy = | y2 - y1 |

Distance = sqrt(dx^2 + dy^2)
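The distance formula above translates directly into a heuristic function. In this sketch locations are assumed to be (x, y) tuples, and the function names are chosen for illustration; the Manhattan distance mentioned earlier for tile puzzles is included for comparison.

```python
import math


def euclidean_distance(current, goal):
    """Straight-line distance between two (x, y) locations."""
    x1, y1 = current
    x2, y2 = goal
    dx = abs(x2 - x1)
    dy = abs(y2 - y1)
    return math.sqrt(dx * dx + dy * dy)


def manhattan_distance(current, goal):
    """Sum of horizontal and vertical offsets between two (x, y) locations."""
    x1, y1 = current
    x2, y2 = goal
    return abs(x2 - x1) + abs(y2 - y1)


h_euclid = euclidean_distance((0, 0), (3, 4))      # 5.0
h_manhattan = manhattan_distance((0, 0), (3, 4))   # 7
```

When movement is restricted to horizontal and vertical unit steps, neither function overestimates the remaining number of moves, so both are admissible in that setting.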

The A* algorithm is fairly simple. There are two sets, FRINGE and CLOSED. The

FRINGE set contains those nodes that are candidates for examining. Initially, the FRINGE set

contains just one element: the starting position. The CLOSED set contains those nodes that have

already been examined. Initially, the CLOSED set is empty. Graphically, the FRINGE set is the

"frontier" and the CLOSED set is the "interior" of the visited areas. Each node also keeps a

pointer to its parent node so that we can determine how it was found.

There is a main loop that repeatedly pulls out the best node n in FRINGE (the node with

the lowest f value) and examines it. If n is the goal, then we're done. Otherwise, node n is

removed from FRINGE and added to CLOSED. Then, its neighbors n' are examined. A neighbor


that is in CLOSED has already been seen, so we don't need to look at it. A neighbor that is in

FRINGE will be examined if its f value becomes the lowest in FRINGE. Otherwise, we add it to

FRINGE, with its parent set to n. The path cost to n', g(n'), will be set to g(n) + movementcost(n,

n').

Pseudo code:

Inputs

    map

    start and goal locations

Internal Data

    fringe - a list of map locations to be evaluated, in ascending order of estimated distance

    closedList - a list of map locations that have been fully evaluated

RouteNode, contains

    a map location

    pointer to this node's parent node

    d, the actual distance traveled to reach this node

    dPlusL2, which is d + linear distance to goal

Search()

    Put start node onto fringe

    endNode = findRoute()

findRoute()

    if fringe is empty

        // No route exists between start and goal.

        return 0

    else

        node = remove first fringe node (it will have the shortest estimated distance to the goal)

        if node's location is the goal

            return RouteNode data for current location

        else

            if node's location is not on the closedList

                add node to closedList

                addChildrenToFringe(node)

            return findRoute()

addChildrenToFringe(parentNode)

    for all children of parentNode

        if child's location is not on closedList

            childNode = new RouteNode()

            childNode.parent = parentNode

            childNode.d = parentNode.d + linearDistance(parentNode, child)

            L2 = linearDistance(childNode, goal)

            childNode.dPlusL2 = childNode.d + L2

            Add childNode to fringe, maintaining ascending dPlusL2 order
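The pseudo code can be turned into a short runnable sketch. The version below is one possible reading, not the original author's implementation: it assumes a grid map given as a list of strings with '#' marking walls, allows only horizontal and vertical moves of cost 1 (so the path cannot cut corners), uses Euclidean distance as h(n), and replaces the recursion with a loop over a heapq priority queue. The names a_star, g_cost and the example grids are hypothetical.

```python
import heapq
import math


def a_star(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None if no route exists."""
    def h(cell):
        return math.hypot(goal[0] - cell[0], goal[1] - cell[1])

    fringe = [(h(start), h(start), start)]    # (f, h, cell); ties on f favor lower h
    g_cost = {start: 0}                       # best known distance from start
    parent = {start: None}                    # pointer to each node's parent
    closed = set()
    while fringe:
        _, _, node = heapq.heappop(fringe)    # node with the lowest f value
        if node == goal:
            path = []
            while node is not None:           # walk parent pointers back to start
                path.append(node)
                node = parent[node]
            return path[::-1]
        if node in closed:
            continue                          # stale entry for an already examined node
        closed.add(node)
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue                      # outside the map
            if grid[nr][nc] == "#" or nxt in closed:
                continue                      # wall, or already examined
            new_g = g_cost[node] + 1          # g(n') = g(n) + movement cost
            if new_g < g_cost.get(nxt, math.inf):
                g_cost[nxt] = new_g
                parent[nxt] = node
                heapq.heappush(fringe, (new_g + h(nxt), h(nxt), nxt))
    return None


# Hypothetical maps: S and G mark the start and goal cells for the reader only.
grid = ["S..#",
        ".#..",
        "...G"]
path = a_star(grid, (0, 0), (2, 3))
blocked = a_star(["S#", "#G"], (0, 0), (1, 1))
```

On the first map the returned path has six cells (five moves), which matches the Manhattan distance between start and goal; on the second map the start is walled in and the function returns None.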

Drawback of A* algorithm:

The main drawback of the A* algorithm, and indeed of any best-first search, is its memory

requirement. Since at least the entire open list must be saved, the A* algorithm is severely

space-limited in practice, and is no more practical than breadth-first search on current machines.

For example, while it can be run successfully on the eight puzzle, it exhausts available memory

in a matter of minutes on the fifteen puzzle.

References:

http://en.wikipedia.org/wiki/Graph_(mathematics)

http://mnemstudio.org/path-finding-a-star.htm

http://en.wikipedia.org/wiki/Bidirectional_search

Lecture : Solving Problems by Searching by Marco Chiarandini

University of Southern Denmark
