cs b659: intelligent robotics

CS B659: INTELLIGENT ROBOTICSPlanning Under Uncertainty

Perception(e.g., filtering,

SLAM)

Prior knowledge (often probabilistic)

Sensors

Decision-making(control laws, optimization,

planning)

Offline learning(e.g., calibration)

Actuators

Actions

DEALING WITH UNCERTAINTY Sensing uncertainty

Localization error Noisy maps Misclassified objects

Motion uncertainty Noisy natural processes Odometry drift Imprecise actuators Uncontrolled agents (treat as state)

MOTION UNCERTAINTY

g

obstacle

obstacle

obstacle

s

DEALING WITH MOTION UNCERTAINTY Hierarchical strategies

Use generated trajectory as input into a low-level feedback controller

Reactive strategies Approach #1: re-optimize when the state has

diverged from the planned path (online optimization) Approach #2: precompute optimal controls over state

space, and just read off the new value from the perturbed state (offline optimization)

Proactive strategies Explicitly consider future uncertainty Heuristics (e.g., grow obstacles, penalize nearness) Markov Decision Processes Online and offline approaches

PROACTIVE STRATEGIES TO HANDLE MOTION UNCERTAINTY

g

obstacle

obstacle

obstacle

s

DYNAMIC COLLISION AVOIDANCE ASSUMING WORST-CASE BEHAVIORS Optimizing

safety in real-time under worst case behavior model

T=1

T=2

TTPF=2.6

Robot

MARKOV DECISION PROCESS APPROACHES

Alterovitz et al 2007

DEALING WITH SENSING UNCERTAINTY Reactive heuristics often work well

Optimistic costs Assume most-likely state Penalize uncertainty

Proactive strategies Explicitly consider future uncertainty Active sensing Reward actions that yield information gain Partially Observable Markov Decision Processes

(POMDPs)

10

Assuming no obstacles in the unknown region and taking the shortest path to the goal

11


12


Works well for navigation because the space of all maps is too huge, and certainty is monotonically nondecreasing

13


Works well for navigation because the space of all maps is too huge, and certainty is monotonically nondecreasing

14

What if the sensor was directed (e.g., a camera)?

15


16


17


18


19


20


21


22


23


24


At this point, it would have made sense to turn a bit more to see more of the unknown map

ANOTHER EXAMPLE

? Locked?

Key?Key?

ACTIVE DISCOVERY OF HIDDEN INTENT

No motion

Perpendicular motion

MAIN APPROACHES TO PARTIAL OBSERVABILITY Ignore and react

Doesn’t know what it doesn’t know Model and react

Knows what it doesn’t know Model and predict

Knows what it doesn’t know AND what it will/won’t know in the future

• Better decisions• More components in

implementation• Harder

computationally

UNCERTAINTY MODELS Model #1: Nondeterministic uncertainty

f(x,u) -> a set of possible successors Model #2: Probabilistic uncertainty

P(x’|x,u): a probability distribution over successors x’, given state x, control u

Markov assumption

NONDETERMINISTIC UNCERTAINTY : REASONING WITH SETS x’ = x + e, [-a,a]

t=0

t=1-a a

t=2-2a 2a

Belief State: x(t) [-ta,ta]

UNCERTAINTY WITH SENSING Plan = policy (mapping from states to

actions) Policy achieves goal for every possible

sensor result in belief stateMove

Sense1 2

3 4

Observations should be chosen wisely to keep branching factor low

Outcomes

Special case: fully observable state

TARGET TRACKING

targetrobot

The robot must keep a target in its field of view

The robot has a prior map of the obstacles

But it does not know the target’s trajectory in advance

TARGET-TRACKING EXAMPLE Time is discretized into small

steps of unit duration At each time step, each of

the two agents moves by at most one increment along a single axis

The two moves are simultaneous

The robot senses the new position of the target at each step

The target is not influenced by the robot (non-adversarial, non-cooperative target)

targetrobot

TIME-STAMPED STATES (NO CYCLES POSSIBLE)

State = (robot-position, target-position, time) In each state, the robot can execute 5 possible actions :

{stop, up, down, right, left} Each action has 5 possible outcomes (one for each possible action

of the target), with some probability distribution[Potential collisions are ignored for simplifying the presentation]

([i,j], [u,v], t)

• ([i+1,j], [u,v], t+1)• ([i+1,j], [u-1,v], t+1)• ([i+1,j], [u+1,v], t+1)• ([i+1,j], [u,v-1], t+1)• ([i+1,j], [u,v+1], t+1)

right

REWARDS AND COSTSThe robot must keep seeing the target as long as possible Each state where it does not see the target is

terminal

The reward collected in every non-terminal state is 1; it is 0 in each terminal state[ The sum of the rewards collected in an execution run is exactly the amount of time the robot sees the target]

No cost for moving vs. not moving

EXPANDING THE STATE/ACTION TREE

...

horizon 1 horizon h

ASSIGNING REWARDS

...

horizon 1 horizon h

Terminal states: states where the target is not visible

Rewards: 1 in non-terminal states; 0 in others

But how to estimate the utility of a leaf at horizon h?

ESTIMATING THE UTILITY OF A LEAF

...

horizon 1 horizon h

Compute the shortest distance d for the target to escape the robot’s current field of view

If the maximal velocity v of the target is known, estimate the utility of the state to d/v [conservative estimate]

targetrobotd

SELECTING THE NEXT ACTION

...

horizon 1 horizon h

Compute the optimal policy over the state/action tree using estimated utilities at leaf nodes

Execute only the first step of this policy

Repeat everything again at t+1… (sliding horizon)

PURE VISUAL SERVOING

Computing and Using a Policy

NEXT WEEK More algorithm details

FINAL INFORMATION Final presentations

20 minutes + 10 minutes questions Final reports due 5/3

Must include a technical report Introduction Background Methods Results Conclusion

May include auxiliary website with figures / examples / implementation details

I will not review code

cs b659: intelligent robotics

Documents