AI in Game Programming / IT University of Copenhagen
Learning From Observations
Marco Loog


Page 1:

Learning From Observations

Marco Loog

Page 2:

Learning from Observations

Idea is that percepts should be used for improving the agent's ability to act in the future, not only for acting per se

Page 3:

Outline

Learning agents

Inductive learning

Decision tree learning

Page 4:

Learning

Learning is essential for unknown environments, i.e., when designer lacks omniscience

Learning is useful as a system construction method, i.e., expose the agent to reality rather than trying to write it down

Learning modifies the agent’s decision mechanisms to improve performance

Page 5:

Learning Agent [Revisited]

Four conceptual components

Learning element : responsible for making improvements

Performance element : takes percepts and decides on actions

Critic : provides feedback on how agent is doing and determines how performance element should be modified

Problem generator : responsible for suggesting actions leading to new and informative experience

Page 6:

Figure 2.15 [Revisited]

Page 7:

Learning Element

Design of learning element is affected by

Which components of the performance element are to be learned

What feedback is available to learn these components

What representation is used for the components

Page 8:

Agent’s Components

Direct mapping from conditions on current state to actions [instructor : brake!]

Means to infer relevant properties about world from percept sequence [learning from images]

Info about evolution of the world and results of possible actions [braking on wet road]

Utility indicating desirability of world state [no tip / component of utility function]

...

Each component can be learned from appropriate feedback

Page 9:

Types of Feedback

Supervised learning : correct answers for each example

Unsupervised learning : correct answers not given

Reinforcement learning : occasional rewards

Page 10:

Inductive Learning

Simplest form : learn a function from examples

I.e. learn the target function f

Examples : input / output pairs (x, f(x))

Page 11:

Inductive Learning

Problem

Find a hypothesis h, such that h ≈ f, based on given training set of examples

= highly simplified model of real learning

Ignores prior knowledge

Assumes examples are given

Page 12:

Hypothesis

A good hypothesis will generalize well, i.e., be able to predict on unseen examples

Page 13:

Inductive Learning Method

E.g. function fitting

Goal is to estimate real underlying functional relationship from example observations

Page 14:

Inductive Learning Method

Construct h to agree with f on training set

Page 15:

Inductive Learning Method

Construct h to agree with f on training set

Page 16:

Inductive Learning Method

Construct h to agree with f on training set

Page 17:

Inductive Learning Method

Construct h to agree with f on training set

h is consistent if it agrees with f on all examples

Page 18:

Inductive Learning Method

Construct h to agree with f on training set

h is consistent if it agrees with f on all examples
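A minimal sketch of this idea in Python, assuming NumPy is available; the (x, f(x)) pairs below are made-up toy data, and each fitted polynomial is a candidate hypothesis h constructed to agree with f on the training set:

    import numpy as np

    # Toy training set: observed input / output pairs (x, f(x)) (made-up data).
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

    # Candidate hypotheses: least-squares polynomials of increasing degree.
    for degree in (1, 2, 4):
        h = np.poly1d(np.polyfit(x, y, degree))     # construct h to agree with f
        training_error = np.mean((h(x) - y) ** 2)
        print(f"degree {degree}: training error = {training_error:.4f}")

    # Higher-degree polynomials fit the training set better (degree 4 is exactly
    # consistent with these 5 points), but are not necessarily better hypotheses.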

Page 19:

So, which ‘Fit’ is Best?

Page 20:

So, which ‘Fit’ is Best?

Ockham’s razor : prefer simplest hypothesis consistent with the data

Page 21:

So, which ‘Fit’ is Best?

Ockham’s razor : prefer simplest hypothesis consistent with the data

What’s consistent? What’s simple?

Page 22:

Hypothesis

A good hypothesis will generalize well, i.e., be able to predict on unseen examples

Not-exactly-consistent may be preferable over exactly consistent

Nondeterministic behavior

Consistency not even always possible

Nondeterministic functions : trade-off complexity of hypothesis / degree of fit

Page 23:

Decision Trees

‘Decision tree induction is one of the simplest, and yet most successful forms of learning algorithm’

Good intro to the area of inductive learning

Page 24:

Decision Tree

Input : object or situation described by set of attributes / features

Output [discrete or continuous] : decision / prediction

Continuous -> regression

Discrete -> classification

Boolean classification : output is binary / ‘true’ or ‘false’

Page 25:

Decision Tree

Performs a sequence of tests in order to reach a decision

Tree [as in : graph without closed loops]

Internal node : test of the value of a single property

Branches labeled with possible test outcomes

Leaf node : specifies output value

Resembles a ‘how to’ manual
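To make the structure concrete, a small sketch in Python; the Leaf / Node classes and the hand-built tree below are illustrative, not the slides' restaurant tree:

    from dataclasses import dataclass

    @dataclass
    class Leaf:
        value: str                   # leaf node: output value, e.g. 'wait' / 'leave'

    @dataclass
    class Node:
        attribute: str               # internal node: attribute to test
        branches: dict               # test outcome -> subtree (Node or Leaf)

    def decide(tree, example):
        """Follow the branch matching the example's value until a leaf is reached."""
        while isinstance(tree, Node):
            tree = tree.branches[example[tree.attribute]]
        return tree.value

    # Illustrative hand-built tree (assumption, not the slides' tree).
    tree = Node("Patrons", {
        "None": Leaf("leave"),
        "Some": Leaf("wait"),
        "Full": Node("Hungry", {"Yes": Leaf("wait"), "No": Leaf("leave")}),
    })
    print(decide(tree, {"Patrons": "Full", "Hungry": "No"}))   # -> 'leave'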

Page 26:

Decide whether to wait for a Table at a Restaurant

Based on the following attributes

Alternate : is there an alternative restaurant nearby?

Bar : is there a comfortable bar area to wait in?

Fri/Sat : is today Friday or Saturday?

Hungry : are we hungry?

Patrons : number of people in the restaurant [None, Some, Full]

Price : price range [$, $$, $$$]

Raining : is it raining outside?

Reservation : have we made a reservation?

Type : kind of restaurant [French, Italian, Thai, Burger]

WaitEstimate : estimated waiting time [0-10, 10-30, 30-60, >60]
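For the code sketches that follow, each example can be represented as a plain mapping from attribute names to values, plus its decision; the particular values below are illustrative, not a row copied from the slides' table:

    # One illustrative example (assumed values) and its Boolean classification.
    example = {
        "Alternate": "Yes", "Bar": "No", "Fri/Sat": "No", "Hungry": "Yes",
        "Patrons": "Some", "Price": "$$$", "Raining": "No",
        "Reservation": "Yes", "Type": "French", "WaitEstimate": "0-10",
    }
    decision = "Yes"   # WillWait: the output value for this example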

Page 27:

Attribute-Based Representations

Examples of decisions

Page 28:

Decision Tree

Possible representation for hypotheses

Below is the ‘true’ tree [note Type? plays no role]

Page 29:

Expressiveness

Decision trees can express any function of the input attributes

E.g., for Boolean functions, each truth table row corresponds to a path to a leaf

Page 30:

Expressiveness

There is a consistent decision tree for any training set, with one path to a leaf for each example [unless f is nondeterministic in x], but it probably won't generalize to new examples

Prefer to find more compact decision trees [Ockham again...]

Page 31:

Attribute-Based Representations

Is simply a lookup table

Cannot generalize to unseen examples

Page 32:

Decision Tree

Applying Ockham’s razor : smallest tree consistent with examples

Page 33:

Decision Tree

Applying Ockham’s razor : smallest tree consistent with examples

Able to generalize to unseen examples

No need to program everything out / specify everything in detail

‘true’ tree = smallest tree?

Page 34:

Decision Tree Learning

Unfortunately, finding the ‘smallest’ tree is intractable in general

New aim : find a ‘smallish’ tree consistent with the training examples

Idea : [recursively] choose ‘most significant’ attribute as root of [sub]tree

‘Most significant’ : making the most difference to the classification
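A compact sketch of that recursion, reusing the Leaf / Node classes from the earlier sketch and assuming examples are (attribute-dict, label) pairs; choose_attribute stands for whatever scoring rule picks the 'most significant' attribute (information gain, introduced on the next slides, is the usual choice):

    from collections import Counter

    def learn_tree(examples, attributes, choose_attribute):
        labels = [label for _, label in examples]
        if len(set(labels)) == 1:                       # all examples agree: leaf
            return Leaf(labels[0])
        if not attributes:                              # no tests left: majority value
            return Leaf(Counter(labels).most_common(1)[0][0])
        best = choose_attribute(examples, attributes)   # 'most significant' attribute
        branches = {}
        for value in {ex[best] for ex, _ in examples}:
            subset = [(ex, label) for ex, label in examples if ex[best] == value]
            rest = [a for a in attributes if a != best]
            branches[value] = learn_tree(subset, rest, choose_attribute)
        return Node(best, branches)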

Page 35:

Choosing an Attribute Test

Idea : a good attribute splits the examples into subsets that are [ideally] ‘all positive’ or ‘all negative’

Patrons? is a better choice [than e.g. Type?]

Page 36:

Using Information Theory

Information content [entropy] :

$I(P(v_1), \ldots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)$

For a training set containing p positive examples and n negative examples :

$I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}$

Specifies the minimum number of bits of information needed to encode the classification of an arbitrary member
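In code, the two-class case of this formula might look like the following small helper (not from the slides); the 0 * log2(0) terms are taken to be 0:

    from math import log2

    def entropy_pn(p, n):
        """I(p/(p+n), n/(p+n)) in bits, for p positive and n negative examples."""
        total = p + n
        bits = 0.0
        for count in (p, n):
            if count:                      # 0 * log2(0) contributes nothing
                q = count / total
                bits -= q * log2(q)
        return bits

    print(entropy_pn(6, 6))   # -> 1.0 bit for a 50/50 training set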

Page 37:

Information Gain

Chosen attribute A divides training set E into subsets E1, … , Ev according to their values for A, where A has v distinct values

Information gain [IG] : expected reduction in entropy caused by partitioning the examples

$remainder(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} \, I\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right)$

$IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - remainder(A)$
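Continuing the sketch, and assuming the entropy_pn helper above plus examples represented as (attribute-dict, label) pairs with labels 'Yes' / 'No':

    def remainder(examples, attribute):
        """Expected entropy after splitting the examples on the given attribute."""
        p = sum(1 for _, label in examples if label == "Yes")
        n = len(examples) - p
        rem = 0.0
        for value in {ex[attribute] for ex, _ in examples}:
            subset = [label for ex, label in examples if ex[attribute] == value]
            p_i = sum(1 for label in subset if label == "Yes")
            n_i = len(subset) - p_i
            rem += (p_i + n_i) / (p + n) * entropy_pn(p_i, n_i)
        return rem

    def information_gain(examples, attribute):
        p = sum(1 for _, label in examples if label == "Yes")
        n = len(examples) - p
        return entropy_pn(p, n) - remainder(examples, attribute)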

Page 38:

Information Gain

Information gain [IG] : expected reduction in entropy caused by partitioning the examples

Choose the attribute with the largest IG

[Wanna know more : Google it...]

$IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - remainder(A)$

Page 39:

Information Gain [E.g.]

For the training set : p = n = 6, I(6/12, 6/12) = 1 bit

Consider Patrons? and Type? [and others]

Patrons? has the highest IG of all attributes and so is chosen as the root

Why is IG of Type? equal to zero?

$IG(Patrons) = 1 - \left[ \frac{2}{12} I(0,1) + \frac{4}{12} I(1,0) + \frac{6}{12} I\left(\frac{2}{6}, \frac{4}{6}\right) \right] \approx 0.541$ bits

$IG(Type) = 1 - \left[ \frac{2}{12} I\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{2}{12} I\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{4}{12} I\left(\frac{2}{4}, \frac{2}{4}\right) + \frac{4}{12} I\left(\frac{2}{4}, \frac{2}{4}\right) \right] = 0$ bits
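These numbers are easy to check with the entropy_pn helper sketched above, plugging in the positive / negative counts per subset from the 12-example training set:

    # IG(Patrons): subsets with counts 2 (0+/2-), 4 (4+/0-) and 6 (2+/4-)
    ig_patrons = entropy_pn(6, 6) - (2/12 * entropy_pn(0, 2)
                                     + 4/12 * entropy_pn(4, 0)
                                     + 6/12 * entropy_pn(2, 4))
    # IG(Type): four subsets, each exactly half positive / half negative
    ig_type = entropy_pn(6, 6) - (2/12 * entropy_pn(1, 1) + 2/12 * entropy_pn(1, 1)
                                  + 4/12 * entropy_pn(2, 2) + 4/12 * entropy_pn(2, 2))
    print(round(ig_patrons, 3), ig_type)   # -> 0.541 and 0.0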

Page 40:

Decision Tree Learning

Plenty of other measures for ‘best’ attributes possible...

Page 41:

Back to The Example...

‘Training data’

Page 42:

Decision Tree Learned

Based on the 12 examples; substantially simpler solution than ‘true’ tree

More complex hypothesis isn’t justified by small amount of data

Page 43:

Performance Measurement

How do we know that h ≈ f?

Or : how the h*ll do we know that our decision tree performs well?

Most often we don’t know... for sure

Page 44:

Performance Measurement

However, prediction quality can be estimated using theory from computational / statistical learning theory / PAC-learning

Or we could, for example, simply try h on a new test set of examples

The crux being, of course, that there should actually be a new test set...

If no test set is available, several possibilities exist for creating ‘training’ and ‘test’ sets from the available data [e.g., a holdout split or cross-validation]

Page 45:

Performance Measurement

Learning curve : ‘%’ correct on test set as function of training set size
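One minimal way to produce such a curve, assuming scikit-learn is available and using synthetic data as a stand-in for the restaurant examples (which are not reproduced here):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Synthetic stand-in data (assumption): 6 discrete attributes, hidden target f.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(300, 6))
    y = (X[:, 0] + X[:, 1] > 2).astype(int)
    X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]

    for m in (10, 25, 50, 100, 200):             # growing training set sizes
        h = DecisionTreeClassifier().fit(X_train[:m], y_train[:m])
        acc = accuracy_score(y_test, h.predict(X_test))
        print(f"{m:4d} training examples -> {acc:.2%} correct on the test set")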

Page 46:

Bad Conduct in AI

Training on the test set!

May happen before you know it

Often very hard to justify... if at all possible

All I can say is : try to avoid it

Page 47:

Ensemble-Learning-in-1-Slide

Idea : collection [ensemble] of hypotheses is used / predictions are combined

Motivation : hope that it is much less likely to misclassify [obviously!]

E.g. independence can be exploited

Examples : majority voting / boosting

Ensemble learning simply creates a new, more expressive hypothesis space
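A one-function sketch of majority voting, where each hypothesis is assumed to be any callable mapping an example to a label:

    from collections import Counter

    def majority_vote(hypotheses, example):
        """Combine predictions by taking the most common label among the ensemble."""
        votes = Counter(h(example) for h in hypotheses)
        return votes.most_common(1)[0][0]

    # E.g. three (deliberately trivial) hypotheses voting on one example:
    ensemble = [lambda ex: "wait", lambda ex: "leave", lambda ex: "wait"]
    print(majority_vote(ensemble, example={}))   # -> 'wait'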

Page 48:

Summary

In general : learning needed for unknown environments or lazy designers

Learning agent = performance element + learning element [Chapter 2]

Supervised learning : the aim is to find a simple hypothesis [approximately] consistent with the training examples

Decision tree learning using IG

Difficult to measure learning performance

Learning curve

Page 49:

Next Week

More...
