Search Algorithms for Agents
problems that have been addressed by search algorithms can be divided into three classes:
• path-finding problems
• constraint satisfaction problems (CSP)
• two-player games
Two-player games
Studies of two-player games are clearly related to DAI/multiagent systems in which agents are competitive.
CSP & Path-finding
Most algorithms for these classes were originally developed for a single agent.
Among them, which kinds of algorithms are useful for cooperative problem solving by multiple agents?
Search algorithms: graph representation
A search problem can be represented by using a graph.
Some of the search problems can be solved by accumulating local computations for each node in the graph.
Asynchronous search algorithms definition
• Asynchronous search algorithm: solves a search problem by accumulating local computations.
• The execution order of these local computations can be arbitrary or highly flexible, and they can be executed asynchronously and concurrently.
CSP – a quick reminder
• A CSP consists of n variables x1,…,xn, whose values are taken from finite, discrete domains D1,…,Dn, respectively, and a set of constraints on their values.
• The constraint pk(xk1,…,xkj) is a predicate
that is defined on the Cartesian product
Dk1 x … x Dkj. This predicate is true iff the
value assignment of these variables satisfies
this constraint.
CSP
Since constraint satisfaction is NP-complete in general, a trial-and-error exploration of alternatives is inevitable.
For simplicity, we will focus our attention on binary CSPs, i.e., all the constraints are between two variables.
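Such trial-and-error exploration can be sketched as a naive backtracking search over a binary CSP. The following is a hypothetical Python sketch; the variables and constraints follow the next slide's example, while the domains {1, 2} are our own assumption, since the slides do not specify them:

```python
def backtrack(domains, constraints, assignment=None):
    # Trial-and-error exploration: assign variables one at a time and
    # backtrack as soon as a constraint between assigned variables fails.
    if assignment is None:
        assignment = {}
    if len(assignment) == len(domains):
        return dict(assignment)
    var = next(v for v in domains if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if all(pred(assignment[x], assignment[y])
               for (x, y), pred in constraints.items()
               if x in assignment and y in assignment):
            result = backtrack(domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None

# The example constraints x1 != x3 and x1 = x2, with assumed domains {1, 2}.
domains = {"x1": [1, 2], "x2": [1, 2], "x3": [1, 2]}
constraints = {("x1", "x3"): lambda a, b: a != b,
               ("x1", "x2"): lambda a, b: a == b}
solution = backtrack(domains, constraints)
```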
Example: binary CSP graph
The figure shows three variables x1, x2, x3 and the constraints x1 != x3, x1 = x2.
Distributed CSP
• Assuming that the variables of a CSP are distributed among agents, solving the distributed CSP consists of achieving coherence among the agents.
• Problems like multiagent truth maintenance
tasks, interpretation problems, and assignment
problems can be formalized as distributed CSPs.
CSP and asynchronous algorithms
Each process corresponds to a variable.
We assume the following communication model:
• Processes communicate by sending messages.
• The delay in delivering a message is finite.
• Between any two processes, messages are received in the order they were sent.
Processes that have links to xi are called the neighbors of xi.
Filtering Algorithm
A process xi performs the following procedure revise(xi, xj) for each neighboring process xj.

procedure revise(xi, xj)
  for all vi in Di do
    if there is no value vj in Dj such that vj is consistent with vi
    then delete vi from Di; end if;
  end do;

• When a value is deleted, the process sends its new domain to its neighboring processes.
• When xi receives a new domain from a neighbor xj, the procedure revise(xi, xj) is performed again.
The execution order of these processes is arbitrary.
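The revise procedure can be sketched in Python. This is a sequential simulation of the distributed algorithm (a single loop stands in for the asynchronous message passing; the function names and the 3-queens encoding are ours):

```python
def revise(domains, xi, xj, pred):
    """Delete from Di every value with no consistent partner in Dj.
    Returns True if Di changed (xi would then send its new domain
    to all of its neighbors)."""
    removed = {vi for vi in domains[xi]
               if not any(pred(vi, vj) for vj in domains[xj])}
    domains[xi] -= removed
    return bool(removed)

def filtering(domains, constraints):
    # Sequential simulation of the filtering algorithm: keep applying
    # revise along every constraint, in both directions, until no
    # domain changes (the execution order is arbitrary).
    changed = True
    while changed:
        changed = False
        for (xi, xj), pred in constraints.items():
            changed |= revise(domains, xi, xj, pred)
            changed |= revise(domains, xj, xi, lambda a, b: pred(b, a))
    return domains

# 3-queens: xi is the queen in row i; its value is a column in {1, 2, 3}.
row = {"x1": 1, "x2": 2, "x3": 3}
def attacks(a, b):
    def pred(va, vb):   # consistent iff not same column and not diagonal
        return va != vb and abs(va - vb) != abs(row[a] - row[b])
    return pred
domains = {x: {1, 2, 3} for x in row}
cons = {(a, b): attacks(a, b) for a in row for b in row if row[a] < row[b]}
filtering(domains, cons)
# every domain empties: 3-queens is over-constrained and has no solution
```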
Filtering example: 3-Queens
(Figure: the domains of x1, x2, x3 shrink as revise(x1,x2), revise(x2,x3), revise(x3,x2), and then revise(x1,x3) are applied.)
Filtering Algorithm
• If a domain of some variable becomes an empty set, the problem is over-constrained and has no solution.
• If each domain has a unique value, then the remaining values are a solution.
• If there exist multiple values for some variables, we cannot tell whether the problem has a solution or not, and further search is required.
Filtering should be considered a preprocessing procedure that is invoked before the application of other search methods.
K-Consistency
A CSP is k-consistent iff given any instantiation of any k-1 variables satisfying all the constraints among them, it is possible to find an instantiation of any kth variable such that these k variable values satisfy all the constraints among them.
If the problem is k-consistent and j-consistent for all j<k, the problem is called strongly k-consistent.
Next, we’ll see an algorithm that transforms a given problem into an equivalent strongly k-consistent problem.
Hyper-Resolution-Based Consistency Algorithm
The hyper-resolution rule is described as follows (Ai is a proposition such as x1 = 1):

A1 ∨ A2 ∨ … ∨ Am
¬(A1 ∧ A11 ∧ …)
¬(A2 ∧ A21 ∧ …)
  ⋮
¬(Am ∧ Am1 ∧ …)
────────────────────────────────
¬(A11 ∧ … ∧ A21 ∧ … ∧ Am1 ∧ …)
In this algorithm, all constraints are represented as nogoods; a nogood is a prohibited combination of variable values (see the example on the next slide).
Graph coloring example
• The constraint between x1 and x2 can be represented as two nogoods, {x1=red, x2=red} and {x1=blue, x2=blue}.
• By using the hyper-resolution rule, we can obtain from {x1=red, x2=red} and {x1=blue, x3=blue} a new nogood {x2=red, x3=blue}.
(Figure: three variables x1, x2, x3, each with domain {red, blue}.)
Hyper-Resolution-Based Consistency Algorithm
• Each process represents its constraints as nogoods.
• Each process generates new nogoods by combining the information about its domain and existing nogoods using the hyper-resolution rule.
• A newly obtained nogood is communicated to related processes.
• If a new nogood is communicated, the process tries to generate further new nogoods using the communicated nogood.
Hyper-Resolution-Based Consistency Algorithm
• A nogood is a combination of variable values that is prohibited; therefore, a superset of a nogood cannot be a solution.
• If the empty set becomes a nogood, the problem is over-constrained and has no solution.
The hyper-resolution rule can generate a very large number of nogoods. If we restrict the application of the rule so that only nogoods whose length is less than k are produced, the problem becomes strongly k-consistent.
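The core resolution step can be sketched as follows. This hypothetical helper combines one nogood per domain value of a variable into a new nogood that no longer mentions that variable, as in the graph-coloring example above (the function name and the dict-based nogood encoding are ours):

```python
def hyper_resolve(var, domain, nogoods):
    """Combine one nogood per value of `var` (each containing var=value)
    with the fact that var must take some value in its domain, producing
    a new nogood that no longer mentions var.  Each nogood is a dict
    {variable: value}.  Returns None if the chosen nogoods disagree on
    some other variable (the resolvent would then be trivially true)."""
    resolvent = {}
    for value, nogood in zip(domain, nogoods):
        assert nogood.get(var) == value
        for v, d in nogood.items():
            if v == var:
                continue
            if resolvent.get(v, d) != d:
                return None      # contradictory assignments: no useful nogood
            resolvent[v] = d
    return resolvent

# The graph-coloring example: domain of x1 is {red, blue};
# from {x1=red, x2=red} and {x1=blue, x3=blue} we resolve out x1.
new = hyper_resolve("x1", ["red", "blue"],
                    [{"x1": "red", "x2": "red"},
                     {"x1": "blue", "x3": "blue"}])
# new == {"x2": "red", "x3": "blue"}
```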
Asynchronous Backtracking
An asynchronous version of the backtracking algorithm, which is a standard method for solving CSPs. The completeness of the algorithm is guaranteed.
• The processes are ordered by the alphabetical order of the variable identifiers. Each process chooses an assignment.
• Each process maintains the current values of the other processes from its viewpoint (its local view). A process changes its assignment if its current value is not consistent with the assignments of the higher-priority processes.
• If there exists no value that is consistent with the higher-priority processes, the process generates a new nogood and communicates it to a higher-priority process.
Asynchronous Backtracking
• The local view may contain obsolete information. Therefore, the receiver of a new nogood must check whether the nogood is actually violated in its own local view.
• The main message types communicated among processes are ‘ok?’, which communicates the current value, and ‘nogood’, which communicates a new nogood.
Asynchronous Backtracking example
(Figure: x1 with domain {1,2}, x2 with domain {2}, and x3 with domain {1,2}; x3 must differ from both x1 and x2. x1 and x2 send (ok?, (x1,1)) and (ok?, (x2,2)) to x3, whose local view becomes {(x1,1),(x2,2)}.)
Asynchronous Backtracking example – continued (1)
(Figure: x3 has no consistent value, so it sends (nogood, {(x1,1),(x2,2)}) to x2, and its local view becomes {(x1,1)}. Since x1 is not a neighbor of x2, a new link is added: x2 asks x1 to add it as a neighbor and to send its value.)
Asynchronous Backtracking example – continued (2)
(Figure: x2 has no value consistent with (x1,1), so it sends (nogood, {(x1,1)}) to x1.)
Asynchronous Backtracking
when received (ok?, (xj, dj)) do
  add (xj, dj) to local_view;
  check_local_view;
end do;

when received (nogood, nogood) do
  record nogood as a new constraint;
  when (xk, dk) where xk is not a neighbor do
    request xk to add xi to its neighbors;
    add xk to neighbors;
    add (xk, dk) to local_view;
  end do;
  check_local_view;
end do;
Asynchronous Backtracking
procedure check_local_view
  when local_view and current_value are not consistent do
    if no value in Di is consistent with local_view then
      resolve a new nogood using the hyper-resolution rule and
      send the nogood to the lowest-priority process in the nogood;
      when an empty nogood is found do
        broadcast to the other processes that there is no solution,
        and terminate this algorithm;
      end do;
    else
      select d in Di where local_view and d are consistent;
      current_value ← d;
      send (ok?, (xi, d)) to neighbors;
    end if;
  end do;
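The procedures above can be sketched as a compact, sequential simulation. The sketch below assumes FIFO delivery through one shared queue; the class and function names are ours, and the link-request handshake is simplified to a single message. It runs the three-variable example from the previous slides, in which x3 must differ from both x1 and x2:

```python
from collections import deque

ORDER = ["x1", "x2", "x3"]                 # x1 has the highest priority
def prio(name): return ORDER.index(name)

class ABTAgent:
    def __init__(self, name, domain, constraints):
        self.name, self.domain = name, list(domain)
        self.constraints = constraints     # {other_var: pred(my_val, its_val)}
        self.out_links = set()             # lower-priority agents to notify
        self.view = {}                     # local view of higher-priority values
        self.nogoods = []
        self.value = None

def consistent(a, value):
    for other, pred in a.constraints.items():
        if other in a.view and not pred(value, a.view[other]):
            return False
    for ng in a.nogoods:                   # a nogood rules out `value` when
        if ng.get(a.name, value) == value and all(   # the rest matches the view
                a.view.get(v) == d for v, d in ng.items() if v != a.name):
            return False
    return True

def check_view(a, queue):                  # check_local_view
    if a.value is not None and consistent(a, a.value):
        return
    for d in a.domain:
        if consistent(a, d):
            a.value = d
            for low in a.out_links:
                queue.append(("ok?", low, a.name, d))
            return
    ng = dict(a.view)                      # backtrack: the view is a nogood
    if not ng:
        raise RuntimeError("empty nogood: no solution")
    a.nogoods.append(ng)
    target = max(ng, key=prio)             # lowest-priority agent in the nogood
    queue.append(("nogood", target, a.name, ng))
    del a.view[target]
    check_view(a, queue)

def run(agents):
    queue = deque()
    for a in agents.values():
        check_view(a, queue)               # choose initial values
    while queue:
        kind, to, frm, payload = queue.popleft()
        a = agents[to]
        if kind == "ok?":
            a.view[frm] = payload
            check_view(a, queue)
        elif kind == "nogood":
            a.nogoods.append(payload)
            for var, val in payload.items():   # unknown variable in the nogood:
                if var != a.name and var not in a.view:
                    a.view[var] = val          # adopt its value, request a link
                    queue.append(("link", var, a.name, None))
            check_view(a, queue)
        else:                              # "link": add sender as a neighbor
            a.out_links.add(frm)
            queue.append(("ok?", frm, a.name, a.value))
    return {n: a.value for n, a in agents.items()}

# x3 (lowest priority) evaluates the constraints against x1 and x2.
neq = lambda a, b: a != b
agents = {"x1": ABTAgent("x1", [1, 2], {}),
          "x2": ABTAgent("x2", [2], {}),
          "x3": ABTAgent("x3", [1, 2], {"x1": neq, "x2": neq})}
agents["x1"].out_links = {"x3"}
agents["x2"].out_links = {"x3"}
solution = run(agents)                     # a consistent assignment
```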
Asynchronous Weak-Commitment Search
This algorithm introduces a method for dynamically ordering processes so that a bad decision can be revised without an exhaustive search.
• For each process, the initial priority value is 0.
• If there exists no consistent value for xi, the priority value of xi is changed to k+1, where k is the largest priority value among the related processes.
• The order is defined such that any process with a larger priority value has higher priority. If the priority values of two processes are the same, the order is determined by the alphabetical order of the variables.
Asynchronous Weak-Commitment Search
As in asynchronous backtracking, each process concurrently assigns a value to its variable and sends the value to the other processes.
• The priority value, as well as the current assignment, is communicated through the ‘ok?’ message.
• If the current value is not consistent with the local view, the agent changes its value using the min-conflict heuristic: it chooses a value that is consistent with the local view and minimizes the number of constraint violations with the variables of lower-priority processes.
Asynchronous Weak-Commitment Search
• Each process records the nogoods that have been resolved.
• When xi cannot find a value consistent with its local view, xi sends nogood messages to the other processes, and increments its priority only if it has created a new nogood.
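The min-conflict value selection can be sketched as follows (a hypothetical helper; the 4-queens numbers in the usage example are our own toy instance, not the figure on the next slides):

```python
def min_conflict_value(domain, ok_with_higher, conflicts_with_lower):
    """Among the values consistent with the higher-priority local view,
    pick one minimizing constraint violations with lower-priority
    processes (ties broken by domain order)."""
    candidates = [d for d in domain if ok_with_higher(d)]
    if not candidates:
        return None        # no consistent value: a new nogood is resolved
    return min(candidates, key=conflicts_with_lower)

# Hypothetical 4-queens fragment: x2 (row 2) chooses a column given
# higher-priority x1 = 1 (row 1) and lower-priority x3 = 1, x4 = 2.
def ok_with_higher(c):
    return c != 1 and abs(c - 1) != 1          # no attack from x1 = 1

def conflicts_with_lower(c):
    n = 0
    if c == 1 or abs(c - 1) == 1:              # attacks x3 = 1 (row 3)
        n += 1
    if c == 2 or abs(c - 2) == 2:              # attacks x4 = 2 (row 4)
        n += 1
    return n

choice = min_conflict_value([1, 2, 3, 4], ok_with_higher, conflicts_with_lower)
# columns 3 and 4 are consistent with x1; 3 conflicts with neither lower queen
```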
Asynchronous Weak-Commitment Search example
(Figure: states (a) and (b) of a 4-queens run; in (a) all priority values are 0, and in (b) x4's priority has been raised to 1.)
Asynchronous Weak-Commitment Search example – continued
(Figure: states (c) and (d); x4's priority is 1 and x3's priority has been raised to 2.)
Asynchronous Weak-Commitment Search Completeness
The completeness of the algorithm is guaranteed by the fact that the processes record all nogoods found so far.
Handling a large number of nogoods is time- and space-consuming. We can restrict the number of recorded nogoods so that each process records only the most recently found nogoods. In this case, theoretical completeness is not guaranteed; yet, when the number of recorded nogoods is reasonably large, an infinite processing loop rarely occurs.
Path Finding Problem
A path-finding problem consists of the following components:
• A set of nodes N, each representing a state.
• A set of directed links L, each representing an operator available to a problem-solving agent.
• A unique node s called the start node.
• A set of nodes G, each representing a goal state.
Path Finding Problem
More definitions:
• h*(i) is the shortest distance from node i to the goal nodes.
• If j is a neighbor of i, the shortest distance via j is given by f*(j) = k(i,j) + h*(j), where k(i,j) is the cost of the link between i and j.
• If i is not a goal node, then h*(i) = minj f*(j) holds.
Asynchronous Dynamic Programming Algorithm
Let us assume the following situation:
• For each node i there exists a corresponding process.
• Each process records h(i), the estimated value of h*(i). The initial value of h(i) is infinity, except for goal nodes.
• For each goal node g, h(g) is 0.
• Each process can refer to the h values of its neighboring nodes.
The algorithm: each process updates h(i) by the following procedure. For each neighboring node j, compute f(j) = k(i,j) + h(j), and update h(i) as follows: h(i) ← minj f(j).
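The update procedure can be sketched as a sequential fixpoint iteration (one loop stands in for the asynchronous processes; the example graph is hypothetical, not the figure on the next slide):

```python
import math

def async_dp(nodes, edges, goals):
    # h(i) starts at infinity except h(goal) = 0; each sweep applies
    # h(i) <- min_j (k(i,j) + h(j)) until a fixpoint is reached (the
    # execution order of the local updates does not affect the fixpoint).
    h = {i: (0.0 if i in goals else math.inf) for i in nodes}
    changed = True
    while changed:
        changed = False
        for i in nodes:
            if i in goals:
                continue
            best = min((c + h[j] for j, c in edges[i]), default=math.inf)
            if best < h[i]:
                h[i] = best
                changed = True
    return h

# Hypothetical graph: s -> a (2), s -> b (1), a -> g (3), b -> a (1), b -> g (4).
edges = {"s": [("a", 2), ("b", 1)], "a": [("g", 3)],
         "b": [("a", 1), ("g", 4)], "g": []}
h = async_dp(["s", "a", "b", "g"], edges, {"g"})
# h converges to the true distances: h(a) = 3, h(b) = 4, h(s) = 5
```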
Asynchronous Dynamic Programming Example
(Figure: a graph with start node s, intermediate nodes a, b, c, d, and goal node g, with link costs on the edges; the h values, initially infinite except h(g) = 0, converge as the updates propagate back from the goal.)
Asynchronous Dynamic Programming
• If the costs of all links are positive, it is proved that for each node i, h(i) converges to the true value h*(i).
• In reality, the number of nodes can be huge, and we cannot afford to have processes for all nodes.
Learning Real-Time A* Algorithm (LRTA*)
As with asynchronous dynamic programming, each agent records the estimated distance h(i).
Each agent repeats the following procedure:
1. Lookahead: calculate f(j) = k(i,j) + h(j) for each neighbor j of the current node i.
2. Update: h(i) ← minj f(j).
3. Action selection: move to the neighbor j that has the minimum f(j) value.
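The three steps can be sketched in Python (a minimal sketch; the line graph in the usage example is our own assumption):

```python
def lrta_star(start, goals, edges, h, max_steps=100):
    """One LRTA* trial.  h maps nodes to heuristic estimates and is
    updated in place; edges[i] lists (neighbor, cost) pairs."""
    current, path = start, [start]
    while current not in goals and len(path) < max_steps:
        f = {j: c + h[j] for j, c in edges[current]}   # 1. lookahead
        h[current] = min(f.values())                   # 2. update
        current = min(f, key=f.get)                    # 3. action selection
        path.append(current)
    return path

# Hypothetical line graph s - a - g with unit costs; h initialized to 0
# (trivially admissible, since it never overestimates).
edges = {"s": [("a", 1)], "a": [("s", 1), ("g", 1)], "g": []}
h = {"s": 0, "a": 0, "g": 0}
path = lrta_star("s", {"g"}, edges, h)
# path == ["s", "a", "g"]; h("s") has been raised to 1 along the way
```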
LRTA*
• The initial value of h is determined using an admissible heuristic function.
• By using an admissible heuristic function on a problem with a finite number of nodes, in which all link costs are positive and there exists a path from every node to a goal node, completeness is guaranteed.
• Since LRTA* never overestimates, it learns the optimal solutions through repeated trials.
Real-Time A* Algorithm (RTA*)
• Similar to LRTA*, except that the updating phase is different:
- instead of setting h(i) to the smallest value of f(j), the second-smallest value is assigned to h(i);
- as a result, RTA* learns more efficiently than LRTA*, but can overestimate heuristic costs.
In a finite space with positive edge costs, in which there exists a path from every state to a goal, and using non-negative admissible initial heuristic values, RTA* is complete.
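A sketch of RTA*, which differs from LRTA* only in the update line, as the slide notes (the toy graph is again our own assumption):

```python
def rta_star(start, goals, edges, h, max_steps=100):
    # Same loop as LRTA*, but the update stores the SECOND-smallest
    # f value, so h can overestimate (faster learning, no admissibility).
    current, path = start, [start]
    while current not in goals and len(path) < max_steps:
        f = {j: c + h[j] for j, c in edges[current]}
        ranked = sorted(f.values())
        h[current] = ranked[1] if len(ranked) > 1 else ranked[0]
        current = min(f, key=f.get)
        path.append(current)
    return path

# Same hypothetical line graph s - a - g with unit costs, h initialized to 0.
edges = {"s": [("a", 1)], "a": [("s", 1), ("g", 1)], "g": []}
h = {"s": 0, "a": 0, "g": 0}
path = rta_star("s", {"g"}, edges, h)
# path == ["s", "a", "g"]; h("a") is set to the second-smallest f, which is 2
```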
Moving Target Search (MTS)
• The MTS algorithm is a generalization of LRTA* to the case where the target can move.
• We assume that the problem solver and the target move alternately, and each can traverse at most one edge in a single move.
• The task is accomplished when the problem solver and the target occupy the same node.
• MTS maintains a matrix of heuristic values, representing the function h(x,y) for all pairs of states x and y.
• The matrix is initialized to the values returned by the static evaluation function.
MTS
To simplify the following discussion, we assume that all edges in the graph have unit cost.
When the problem solver moves:
1. Calculate h(xj, yi) for each neighbor xj of xi.
2. Update the value of h(xi, yi) as follows:
   h(xi, yi) ← max{ h(xi, yi), minxj { h(xj, yi) + 1 } }
3. Move to the neighbor xj with the minimum h(xj, yi).
MTS
When the target moves:
1. Calculate h(xi, yj) for the target’s new position yj.
2. Update the value of h(xi, yi) as follows:
   h(xi, yi) ← max{ h(xi, yi), h(xi, yj) − 1 }
3. Assign yj to yi (yj is the target’s new position).

MTS completeness: in a finite problem space with positive edge costs, in which there exists a path from every state to the goal state, starting with non-negative admissible initial heuristic values, and under the other assumptions we mentioned, the problem solver will eventually reach the target.
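The two update rules can be sketched together on a hypothetical 4-node line graph (unit edge costs, as assumed above; the fleeing-target policy in the driver loop is our own toy choice):

```python
# h is a table over (solver, target) pairs on the line graph 0 - 1 - 2 - 3,
# initialized with the exact distance |x - y| (admissible).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h = {(x, y): abs(x - y) for x in range(4) for y in range(4)}

def solver_move(x, y):
    # Raise h(x, y) if the lookahead proves it too low, then step to
    # the neighbor with the smallest estimate.
    best = min(neighbors[x], key=lambda n: h[(n, y)])
    h[(x, y)] = max(h[(x, y)], h[(best, y)] + 1)
    return best

def target_move(x, y_old, y_new):
    # Positions one move apart differ in true distance by at most one,
    # so h(x, y_old) may be raised using the new position's estimate.
    h[(x, y_old)] = max(h[(x, y_old)], h[(x, y_new)] - 1)
    return y_new

x, y = 0, 3
while x != y:
    x = solver_move(x, y)
    if x != y:
        y = target_move(x, y, min(3, y + 1))   # target tries to flee right
# the problem solver eventually occupies the target's node
```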
Real-Time Bidirectional Search Algorithm (RTBS)
• Two problem solvers, starting from the initial and goal states, move toward each other.
• Each of them knows its current location and can communicate with the other.
The following steps are executed until the solvers meet:
1. Control strategy: select a forward or backward move.
2. Forward move: the forward solver moves toward the other solver.
3. Backward move: the backward solver moves toward the other solver.
RTBS
There are two categories of RTBS:
1. Centralized RTBS where the best action is selected from among all possible moves of the two solvers.
2. Decoupled RTBS where the two solvers independently make their own decisions.
The evaluation results show that when the heuristic function returns accurate values, decoupled RTBS performs better than centralized RTBS; otherwise, centralized RTBS is better.
Is RTBS better than unidirectional search?
• The number of moves for centralized RTBS is around 1/2 of that for real-time unidirectional search in 15-puzzles, and around 1/6 in 24-puzzles.
• In mazes, the number of moves for RTBS is double that for unidirectional search.
The key to understanding these results is to view the difference between RTBS and unidirectional search as a difference in their problem spaces.
RTBS
• We call a pair of locations (x, y) a p-state.
• We call the problem space consisting of p-states the combined problem space.
• A heuristic depression is a set of connected states whose heuristic values are less than or equal to those of the immediately surrounding states.
• The performance of real-time search is sensitive to the topography of the problem space, especially to heuristic depressions.
RTBS
Heuristic depressions of the original problem space have been observed to become large and shallow in the combined problem space.
- if the original heuristic depressions are deep, they become large, and that makes the problem harder to solve.
- if the original depressions are shallow, they become very shallow, and this makes the problem easier to solve.